The Data Daily

COVID-19: What Are Data Saying?

There are different types of coronavirus that can infect humans and animals. A novel strain of coronavirus was first detected in December 2019 in Wuhan, China, and the World Health Organization (WHO) declared it a pandemic in March 2020. Since then, COVID-19 has remained controversial, and people (including me) have not stopped asking questions about the novel coronavirus.

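In this post, I explore and analyze a dataset collected across the world to answer the following questions: which countries have the most confirmed cases and deaths, whether those numbers are related to population, and which countries have the highest case fatality rate.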

For this purpose, I used COVID-19 public data for 170 countries provided by Mendeley Data [1].

Before we proceed, it is worth taking a quick look at the data used for this purpose.

This dataset has 14 columns/variables and 50418 rows/observations. However, for the purposes of this analysis, I removed some columns.

I explored the data to answer the abovementioned questions using the Plotly library. According to total_cases and total_deaths, most countries had fewer than one million confirmed cases by October 2020. However, figure 1 below shows the three countries with the highest numbers of COVID-19 cases: the United States, India and Brazil (highest to lowest).

The figure below shows the death toll for countries worldwide. Overall, the values ranged between zero and 50K for the majority of countries, while the United States, Brazil, India and Mexico have the highest death tolls (highest to lowest).
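The ranking behind such charts can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code used for the figures: it assumes the dataset holds one row per country per date with cumulative totals, and the field names country and date, as well as the placeholder rows, are hypothetical (only total_cases and total_deaths are named in this post).

```python
def latest_per_country(rows):
    """Keep only the most recent row for each country.

    Assumes dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["country"])
        if current is None or row["date"] > current["date"]:
            latest[row["country"]] = row
    return latest


def top_countries(rows, column, n=3):
    """Return the n countries with the largest latest value in `column`."""
    latest = latest_per_country(rows)
    ranked = sorted(latest.values(), key=lambda r: r[column], reverse=True)
    return [(r["country"], r[column]) for r in ranked[:n]]


# Placeholder rows for illustration only; these are NOT values from the dataset.
toy_rows = [
    {"country": "A", "date": "2020-10-01", "total_cases": 50, "total_deaths": 2},
    {"country": "A", "date": "2020-10-02", "total_cases": 80, "total_deaths": 3},
    {"country": "B", "date": "2020-10-02", "total_cases": 300, "total_deaths": 1},
    {"country": "C", "date": "2020-10-02", "total_cases": 10, "total_deaths": 5},
]

top_cases = top_countries(toy_rows, "total_cases", n=2)
top_deaths = top_countries(toy_rows, "total_deaths", n=2)
```

Taking the latest row per country first matters when the totals are cumulative: summing every daily row would count the same cases many times.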

In other words, does the number of cases in each country have a relationship with its population? A lot of people think that the high numbers can be related to population. Thanks to Python libraries, it is easy to get the answer using a correlation matrix.

A correlation matrix is a very common way to summarize data. It is a table of correlation coefficients, each measuring the strength of the relationship between two variables; each cell shows the coefficient for one pair of variables.
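To make the definition concrete, here is a minimal sketch of how one cell of such a matrix can be computed, using the Pearson coefficient and only the standard library. The column name population and the toy numbers are assumptions for illustration, not values from the dataset. In practice, pandas' DataFrame.corr() computes the same coefficient for every pair of numeric columns at once.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict where matrix[a][b] is the coefficient of a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns for illustration only; these are NOT values from the dataset.
toy = {
    "population": [1.0, 2.0, 3.0, 4.0],
    "total_cases": [2.0, 4.0, 6.0, 8.0],
    "total_deaths": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(toy)
```

A coefficient close to 1 means the two variables rise together, a coefficient close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.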

Cases and deaths are positively associated with population (a correlation coefficient between 0 and 1 indicates a positive relationship). In other words, countries with larger populations tend to report more cases and deaths. Furthermore, in my opinion, the more a country tests, the more cases it will find.

Case Fatality Rate (CFR) is defined in [2] as the proportion of deaths within a defined population of interest. It measures the severity of a disease in terms of how often it leads to death. The data we used does not include the CFR, but we can easily calculate it from the number of cases and deaths for each country.
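As a rough sketch of that calculation, the snippet below reads CFR as total deaths divided by total confirmed cases, expressed as a percentage. The country labels and numbers are placeholders for illustration, not values from the dataset, and skipping countries with zero cases is a choice of this sketch.

```python
def case_fatality_rate(total_deaths, total_cases):
    """Return CFR as a percentage, or None when there are no confirmed cases."""
    if total_cases <= 0:
        return None  # avoid dividing by zero for countries with no cases
    return 100.0 * total_deaths / total_cases


# Placeholder totals for illustration only; these are NOT values from the dataset.
toy_totals = {
    "Country A": {"total_cases": 1000, "total_deaths": 25},
    "Country B": {"total_cases": 200, "total_deaths": 20},
    "Country C": {"total_cases": 0, "total_deaths": 0},
}

cfr_by_country = {
    name: case_fatality_rate(t["total_deaths"], t["total_cases"])
    for name, t in toy_totals.items()
}

# Rank countries from highest to lowest CFR, skipping those without a value.
ranked_cfr = sorted(
    ((name, cfr) for name, cfr in cfr_by_country.items() if cfr is not None),
    key=lambda item: item[1],
    reverse=True,
)
```

Note that a country with few confirmed cases can show a very high CFR from a small number of deaths, so the ranking is best read alongside the case counts.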

From the histogram above, we can easily see which countries have the highest COVID-19 case fatality rates: Yemen, Mexico and Italy, with 28.99%, 10.12% and 8.82% respectively.

You can access the code developed for this analysis through my GitHub link.

Knowing which countries have the highest numbers of cases and deaths and the highest CFR allows researchers and governments to focus on them and investigate the main causes of spread and fatality.

[1] Vitenu-Sackey, Prince Asare (2020), “The Impact of Covid-19 Pandemic on the Global Economy: Emphasis on Poverty Alleviation and Economic Growth”, Mendeley Data, V1, doi: 10.17632/b2wvnbnpj9.1