# Know it before it happens: Potential factors associated with suicides

According to the WHO, close to 800,000 people die by suicide every year, which is one person every 40 seconds. Suicide is a global phenomenon and occurs throughout the lifespan.

As researchers in human cognitive neuroscience, we are investigating the brain mechanisms underlying suicide and trying to find ways to predict and prevent it. Beyond the underlying neural correlates, however, there are explicit factors linked to suicide rates, such as gender, age, and culture. An overview and exploratory analysis of these potential factors will help us predict suicides and implement effective interventions at the population, sub-population, and individual levels to prevent suicides and suicide attempts.

Given this background, the current project uses the suicide rates overview 1985 to 2016 dataset provided on Kaggle to conduct exploratory analysis and visualization, outlining world suicide rates across 30 years and exploring the potential factors associated with suicide. This project is one of the Udacity Data Scientist Nanodegree projects.

**How did the world suicide rate change across 30 years?**

First, I wanted to know how suicide rates around the world changed from 1985 to 2016. To answer this question, I took suicides/100k pop as the variable of interest. Although the raw number of suicides is also informative, it is not a reasonable index for comparison because population sizes vary hugely between countries. From the bar plot below (**Figure 1**), we can see that the worldwide suicide rate fluctuated from 1985 to 2016. Specifically, the rate clearly increased during the five years before the turn of the 21st century and then showed a decreasing trend. However, it rose sharply again in 2016.
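To make the choice of a rate over a raw count concrete, here is a minimal, dependency-free sketch of one way to compute a yearly rate per 100k population. The row layout `(year, suicides_no, population)` and all values are assumptions made up for illustration, and the aggregation shown (total deaths divided by total population) is only one option, not necessarily the one used for Figure 1.

```python
from collections import defaultdict

# Hypothetical rows: (year, suicides_no, population). All values are made up.
rows = [
    (1985, 10, 100_000),
    (1985, 30, 100_000),
    (1986, 5, 50_000),
    (1986, 15, 200_000),
]

def yearly_rate_per_100k(rows):
    """Return {year: suicides per 100k}, pooling deaths and population per year."""
    deaths = defaultdict(int)
    people = defaultdict(int)
    for year, suicides_no, population in rows:
        deaths[year] += suicides_no
        people[year] += population
    return {year: 100_000 * deaths[year] / people[year] for year in sorted(deaths)}

rates = yearly_rate_per_100k(rows)
print(rates)
```

Pooling deaths and population weights every group by its size. Simply averaging the per-row rates instead would give small and large groups equal weight, so the two approaches can produce different yearly values.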

**Differences between females and males in suicide rates**

A large body of research has shown that gender plays an important role in suicide risk assessment. Studies from America and Europe show that suicide rates are higher in males than in females [1].

Thus, the second question explores this gender difference in suicide rates around the world in each year. As shown in **Figure 2** below, males clearly show higher suicide rates than females across the 30 years, and the differences appear substantial. This finding is consistent with the relevant research.

Another interesting finding from the visualization is that suicide rates in males fluctuated more over time, in line with the overall change shown in **Figure 1**, while rates in females remained relatively stable.

**How does age affect suicide rates?**

Does a specific age group show a higher suicide rate? In addition to gender, I wanted to understand the influence of age on suicide rates. Moreover, I wanted to explore whether there is a potential interaction effect between age and gender on suicide rates.

From **Figure 3** above, we can see that, in general, the age trend is similar in the male and female groups. Specifically, the 75+ years age group shows the highest suicide rate, and the 5–14 years age group shows the lowest suicide rate compared to the other age groups.

However, it is worth mentioning that in males, the 25–34 years group shows a clearly higher suicide rate than the 15–24 years group, while there is no such difference in females.

In addition, an important finding is that, except in the 5–14 years group, males show higher suicide rates than females in every age group. So, visually, there is no obvious interaction effect between age and gender. Furthermore, even the 75+ years group in females shows a lower suicide rate than the 15–24 years group in males.
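The visual comparison above can be complemented with a simple numeric check. The sketch below computes a rate per sex and age group and then the male-to-female ratio within each age group. The row layout and every value are hypothetical, chosen only to illustrate the calculation, and the ratio is one rough descriptive measure rather than a formal interaction test.

```python
from collections import defaultdict

# Hypothetical rows: (sex, age_group, suicides_no, population). Values are made up.
rows = [
    ("male", "15-24 years", 30, 200_000),
    ("female", "15-24 years", 10, 200_000),
    ("male", "75+ years", 40, 100_000),
    ("female", "75+ years", 10, 100_000),
]

def rate_by_group(rows):
    """Return {(sex, age_group): suicides per 100k} from pooled counts."""
    totals = defaultdict(lambda: [0, 0])  # [deaths, population]
    for sex, age, suicides_no, population in rows:
        totals[(sex, age)][0] += suicides_no
        totals[(sex, age)][1] += population
    return {key: 100_000 * d / p for key, (d, p) in totals.items()}

rates = rate_by_group(rows)

# Male-to-female rate ratio within each age group.
age_groups = ("15-24 years", "75+ years")
ratios = {age: rates[("male", age)] / rates[("female", age)] for age in age_groups}
print(ratios)
```

If the ratio stays roughly constant across age groups, the gender gap scales similarly at every age. A ratio that changes a lot from one age group to the next would hint at an interaction worth testing formally.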

**Is there any relationship between suicide rates and the other numerical variables?**

In this section, I looked at the relationships among all numerical variables. In particular, I wanted to explore the relationship between gdp/capita ($) and suicide rates. Are good economic conditions linked to lower suicide rates?

Surprisingly, as shown in the correlation heat map (Figure 4), none of the numerical variables except the number of suicides appears to correlate with suicide rates. The correlation coefficient between suicide rates and gdp/capita is only 0.002. Basically, we can say that there is no direct linear relationship between suicide rates and the economy in this dataset.
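For readers who want to see what a correlation coefficient measures, here is a small, self-contained sketch of the Pearson coefficient. The function name `pearson_r` and the sample values are illustrative assumptions, not taken from the dataset. The example is deliberately built so that the two variables are clearly related (a U shape) yet the coefficient is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: the rate is high at both low and high GDP (a U shape).
gdp_per_capita = [1000, 2000, 3000, 4000]
suicide_rate = [12.0, 8.0, 8.0, 12.0]

r = pearson_r(gdp_per_capita, suicide_rate)
print(r)
```

A coefficient near zero therefore rules out a linear trend, but it does not by itself rule out other kinds of association, so a scatter plot is a useful companion to the heat map.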

**Which factors are the best predictors of suicide rates?**

Since there is no obvious relationship between the numerical variables and suicide rates, the remaining categorical factors might be better predictors. Based on the previous results, are age and gender the two best predictors? What about countries? The dataset contains far too many countries to compare one by one. Therefore, I used a linear regression model to investigate all variables and find out which factors are the best predictors of suicide rates. In other words, the suicide rate is the response variable, and all other variables are predictors.

Using sklearn, the dataset was split into training and test sets, and a linear regression model was then fit. *The r-squared score was 0.55 on the training data and 0.54 on the test data. Overall, the model explains roughly half of the variance in suicide rates, and the similar scores suggest that it is not overfitting.*
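As a reminder of what the r-squared score means, the sketch below computes it from its standard definition, 1 - SS_res / SS_tot. This is the same quantity that sklearn's `r2_score` and `LinearRegression.score` report. The function name `r_squared` and the numbers are illustrative assumptions, not the project's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1 - ss_res / ss_tot

y_true = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(y_true, y_true)                 # predictions match exactly
baseline = r_squared(y_true, [5.0] * 4)             # always predict the mean
partial = r_squared(y_true, [3.0, 3.0, 7.0, 7.0])   # captures part of the pattern
print(perfect, baseline, partial)
```

Read this way, a score of about 0.55 means the model accounts for a little over half of the variation in the response, with the rest left unexplained.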

Finally, I listed the top 50 predictors of suicide rates, and it turns out that the top 20 predictors are all countries (see Table 1).

From Table 1 above, we can see that most of the top 50 predictors are countries, and all of them have positive coefficients.

Males and the 75+ years age group were also listed in the top 50, which is consistent with the previous findings.

Also, the top 25 countries include some with strong economies and welfare systems, such as Finland, Switzerland, France, Japan, Singapore, and the United States. Again, this suggests that the economy does not influence suicide rates directly.
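A ranking like the one discussed above can be produced by sorting the fitted coefficients. The sketch below ranks by absolute value, which is a common convention but an assumption here, since the ranking rule behind Table 1 is not stated. The feature names and coefficient values are hypothetical.

```python
# Hypothetical fitted coefficients keyed by feature name. All values are made up.
coefficients = {
    "country_A": 25.0,
    "country_B": 18.5,
    "sex_male": 14.0,
    "age_75+ years": 12.0,
    "gdp_per_capita": -0.001,
    "country_C": -6.0,
}

def top_predictors(coefs, k):
    """Return the k (name, coefficient) pairs with the largest absolute coefficient."""
    return sorted(coefs.items(), key=lambda item: abs(item[1]), reverse=True)[:k]

top3 = top_predictors(coefficients, 3)
for name, value in top3:
    print(f"{name}: {value}")
```

One caveat when reading such a ranking: raw coefficients are only comparable across features on similar scales. A 0/1 indicator column and a variable measured in thousands of dollars are not on the same scale, so a tiny coefficient on the latter does not by itself mean a tiny effect.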

**Take-home message**

Since this project is based on exploratory analysis, I avoid drawing strong conclusions, but we can still draw a preliminary summary from the visualizations:

*1.* *The world suicide rate fluctuated over the past three decades, but it never fell below 10 per 100k population.*

*2.* *Males show markedly higher suicide rates than females, and this trend holds in every age group except 5–14 years. This alerts us that more attention needs to be paid to males’ mental health.*

*3.* *The 75+ years age group shows the highest suicide rate among all age groups for both males and females. This reminds us that elderly people with relatively poor quality of life face a higher suicide risk.*

*4.* *Overall, countries appear to be the best predictors of suicide rates, but this seems to have nothing to do with the countries’ economies. Therefore, each country probably has other specific factors associated with its suicide rate that are not reflected in the current dataset. Further research is needed to identify these underlying factors.*

**Limitations**

Of course, this project has limitations:

*1. The project is exploratory, and few statistical analyses were done. For example, the differences in suicide rates between males and females were only observed visually; a quantitative analysis should be done in future work.*

*2. The dataset could be split into groups based on suicide rates so that the low-rate and high-rate groups can be compared.*

*3. A more detailed analysis could be performed on the top 20 countries.*

[1] Sidhu, N., & Friedman, S. H. (2020). Suicide and Gender. *The American Psychiatric Association Publishing Textbook of Suicide Risk Assessment and Management*, 293.