Comparing socio-economic information with suicide rates by year and country. Part 1: Data Description.

tambounstephanekem
Mar 23, 2020
2 min read

It is estimated that approximately one million people die from suicide each year. According to the World Health Organization (WHO), this represents a global mortality rate of 16 people per 100,000, or one death every 40 seconds. More specifically, it is the 11th leading cause of death among Americans and the 3rd leading cause of death among young people aged 15 to 24, as reported by the Centers for Disease Control and Prevention (CDC).

Predictions show that in 2020 the rate will increase to one death every 20 seconds. It is therefore important to understand why suicide happens and how we can change this trend for future generations.

The data is a subset of a compiled dataset from https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016, which was itself pulled from four other datasets linked by time and place:

• United Nations Development Program. (2018). Human development index (HDI).

• World Bank. (2018). World development indicators: GDP (current US$).

• Szamil. (2017). Suicide in the Twenty-First Century [dataset].

• World Health Organization. (2018). Suicide prevention.

The original dataset has 27820 observations and 12 dimensions, collected over the period 1985 to 2016 across 101 countries. It was built to find signals correlated with increased suicide rates among different cohorts globally, across the socio-economic spectrum. Information was collected on the country, year, sex, age group, count of suicides, population, suicide rate, HDI (Human Development Index) for year, gdp_for_year (gross domestic product for the year), gdp_per_capita, and generation (based on age grouping average).
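The suicide rate relates the count of suicides to the population of each cohort. Here is a minimal sketch of that calculation, assuming the rate is expressed per 100,000 people (the unit used for the global figure earlier in this post). The function name `rate_per_100k` and the numbers are hypothetical and are not taken from the dataset.

```python
def rate_per_100k(suicides, population):
    """Suicides per 100,000 people for one country/year/sex/age cohort.

    Assumption: the dataset's rate column uses this per-100,000 unit.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return suicides / population * 100_000


# Made-up cohort for illustration: 21 suicides in a population of 312,900.
example_rate = rate_per_100k(21, 312_900)
print(round(example_rate, 2))
```

Normalising by population in this way is what makes cohorts of very different sizes comparable.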

As mentioned above, we work with a subset of this dataset to reduce the number of observations and therefore the running time of the analysis. Our new dataset covers only the years 1985, 1986, 1999, 2000, 2015 and 2016 across 94 countries, totalling 4084 observations and 12 dimensions: 6 categorical and 6 quantitative.
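The year-based subsetting described above can be sketched in a few lines of Python using only the standard library. This is an illustration, not the code used for this post: the function name `subset_by_years`, the column name `year`, and the tiny inline sample are all assumptions, and the real Kaggle CSV would be read from disk instead.

```python
import csv
import io

# Years kept in the reduced dataset, as listed in the post.
KEEP_YEARS = {1985, 1986, 1999, 2000, 2015, 2016}


def subset_by_years(rows, years=KEEP_YEARS, year_column="year"):
    """Return only the rows whose year is in `years`.

    `rows` is any iterable of dicts (for example from csv.DictReader).
    The column name "year" is an assumption about the CSV header.
    """
    return [row for row in rows if int(row[year_column]) in years]


# Hypothetical sample standing in for the real file; values are made up.
SAMPLE_CSV = """country,year,sex
A,1985,male
A,1990,female
B,2000,male
B,2016,female
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = subset_by_years(sample_rows)
countries = {row["country"] for row in subset}
print(len(subset), "rows across", len(countries), "countries")
```

Counting the rows and distinct countries after filtering is a quick way to check the subset against the figures quoted above.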

From a preliminary look at the data, we can recognise long-term trends and differences between countries and across generations. We will go further in our analysis in the next blog posts.

Overall, we seek to answer the following questions by looking critically at the dataset using data visualization techniques:

• What are the main causes of suicide?

• Can we develop techniques or a model to identify a person with suicidal tendencies?

• How can we prevent suicide?
