Addressing Depression Globally

Utilizing advanced analytics to understand global mental health needs

Optimize resource use and support the delivery of quality care

Determine which countries had the highest rates of depression
Analyze correlations between rates of depression and other mental illnesses
Identify trends in depression diagnoses over time across countries

Data

Context

Depression affects people across the world, yet not all countries have the resources necessary to provide adequate care. I wanted to learn how data could be used to predict where needs were highest and possibly offer insights into when and why depression rates rise. I completed this personal project as the capstone of a data analytics course with CareerFoundry.

Technical Competencies

Descriptive analysis
Data wrangling
Subsetting
Supervised regression
KMeans clustering

Correlation matrices
Pair plotting
Categorical plotting
Choropleth mapping
Data visualization

Objectives

This project uses data from international health organizations and governments across the world. Data Set: Mental Health Depression Disorder Data, Kaggle

Phase 1: Identifying Correlations

My first objective was to identify correlations between depression and other mental illnesses. This posed an immediate challenge: depression showed very low correlations with the other mental illnesses in the data set.

The correlation matrix shows the results of my initial analysis. Oranges and lighter reds represent stronger correlations.
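A correlation matrix like this one can be produced in pandas with `DataFrame.corr`. The following is a minimal sketch on hypothetical toy data, not the project's script. The column names are stand-ins, not the Kaggle schema, and the values are made up.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "depression":    [3.1, 3.4, 2.9, 4.0, 3.6, 2.7],
    "anxiety":       [4.2, 3.9, 4.5, 4.1, 3.8, 4.4],
    "bipolar":       [0.7, 0.8, 0.6, 0.9, 0.8, 0.6],
    "schizophrenia": [0.20, 0.21, 0.19, 0.22, 0.20, 0.19],
})

# Pearson correlation coefficient for every pair of columns.
corr = df.corr(method="pearson")

# Depression's correlation with each other illness, strongest first.
print(corr["depression"].drop("depression").sort_values(ascending=False))
```

The resulting square matrix is what a heatmap colors. Each cell holds a coefficient between -1 and 1.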

My mentor encouraged me to dig deeper and look for non-linear relationships, starting with pair plots and categorical plots.
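Pearson's r only measures linear association, so a curved but consistent relationship can score lower than expected. A quick numeric complement to plotting is to compare Pearson's r with Spearman's rank correlation, which is Pearson's r computed on ranks and rewards any monotonic relationship. The sketch below uses synthetic data rather than the project data set.

```python
import numpy as np
import pandas as pd

# Synthetic example: y rises with x at every step, but along a steep curve.
x = np.arange(1, 11, dtype=float)
df = pd.DataFrame({"x": x, "y": np.exp(x)})

# Pearson: strength of the *linear* relationship.
pearson = df["x"].corr(df["y"])

# Spearman's rho: Pearson's r applied to the ranks of each column.
ranks = df.rank()
spearman = ranks["x"].corr(ranks["y"])

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Here Pearson's r lands noticeably below 1 while the rank-based coefficient is exactly 1. This kind of gap hints at a non-linear pattern worth plotting.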

My initial pair plot showed too much detail to be helpful. Unsure of which direction to take, I consulted my mentor, who explained that the clusters and “snakes” represented data points from the same country across different years.

My choices were to filter the data to focus on specific years and countries or conduct a cluster analysis. I decided to move forward with the cluster analysis, with the goal of finding more valuable relationships between variables.
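The filtering option would look something like the following minimal pandas sketch. It assumes long-format data with one row per country per year. The column names, placeholder countries, values and year window are all hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per country per year (made-up values).
df = pd.DataFrame({
    "country":    ["A", "A", "B", "B", "C", "C"],
    "year":       [1990, 2016, 1990, 2016, 1990, 2016],
    "depression": [5.1, 5.5, 4.3, 4.9, 4.4, 4.6],
    "anxiety":    [3.9, 4.0, 4.8, 5.0, 5.1, 5.3],
})

# Keep selected countries within a year window (between() is inclusive).
countries = ["A", "B"]
subset = df[df["country"].isin(countries) & df["year"].between(2010, 2016)]
print(subset)
```

Filtering shrinks each country's "snake" to a few points. The trade-off is that it discards the other years and countries, which cluster analysis keeps.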

Cluster Analysis: Challenges and Results

The Python cluster plot presented me with more data points than I could interpret. I recalled that “clumps” of data most likely represented timelines for particular countries, but I couldn’t interact with the Python output (top) to check which points belonged to which country.

To solve this problem, I exported the pandas data set with the cluster assignments added and turned to Tableau. Building on my experience using filters and tooltips, I created a “lookalike” cluster analysis that let me view the data for individual points (bottom).
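The cluster-then-export step can be sketched as follows. This is a hedged sketch, not the project's code. It swaps in a plain NumPy implementation of Lloyd's k-means in place of whatever library the original scripts used (commonly scikit-learn's `KMeans`), so it runs without extra dependencies. The rows, column names and `k=3` are hypothetical.

```python
import numpy as np
import pandas as pd


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's-algorithm k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy rows (placeholder countries, made-up values), one per country-year.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2014, 2015, 2016] * 3,
    "depression": [2.1, 2.2, 2.0, 4.0, 4.1, 4.2, 6.0, 5.9, 6.1],
    "anxiety": [3.0, 3.1, 3.0, 4.5, 4.4, 4.6, 3.2, 3.3, 3.1],
})

# Standardize features so each contributes on the same scale.
features = df[["depression", "anxiety"]].to_numpy()
X = (features - features.mean(axis=0)) / features.std(axis=0)

labels, _ = kmeans(X, k=3)
df["cluster"] = labels

# Export with the cluster column so Tableau can filter and show tooltips per point.
# Passing a path, e.g. df.to_csv("clusters.csv", index=False), writes a file instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Keeping `country` and `year` in the export alongside `cluster` lets Tableau tooltips identify each point.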

This revealed that some countries that appeared to have rising depression rates actually had their more recent years located to the left, indicating a decrease.
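You can also check trend direction numerically instead of reading it off point positions. One way is to fit a least-squares slope of depression against year for each country, where a negative slope means the rate fell. The following is a minimal sketch on made-up rows with hypothetical column names.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: placeholder countries, made-up depression rates.
df = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "year": [2013, 2014, 2015, 2016] * 2,
    "depression": [4.0, 3.9, 3.7, 3.6, 3.0, 3.1, 3.3, 3.4],
})


def trend(group):
    # Least-squares slope of depression vs. year (rate units per year).
    slope, _intercept = np.polyfit(group["year"], group["depression"], deg=1)
    return slope


slopes = df.groupby("country")[["year", "depression"]].apply(trend)
print(slopes.apply(lambda s: "decreasing" if s < 0 else "increasing"))
```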

Phase 2: Mapping Data

My first attempt at working with JSON files was difficult. Multiple countries were blacked out, and the visualization didn’t line up with what I’d learned from the data. The country names in my data had to match those in the JSON file exactly. After conferring with experienced analysts, I learned to edit the JSON data so the names matched.
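Name mismatches like these can be found and fixed with Python's standard `json` module. The sketch below uses a tiny hypothetical GeoJSON. It assumes country names live under a `"name"` property, which varies between files, and the rename mapping is a hypothetical example.

```python
import json

# Hypothetical minimal GeoJSON (geometry omitted); real files may use a
# different property key than "name".
geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "United States of America"}, "geometry": null},
  {"type": "Feature", "properties": {"name": "Finland"}, "geometry": null}
]}
""")

# Country names as spelled in the data set (hypothetical).
data_names = {"United States", "Finland"}

# Names with no matching feature are the ones that render blank or black.
geo_names = {f["properties"]["name"] for f in geojson["features"]}
unmatched = sorted(data_names - geo_names)
print("Unmatched:", unmatched)

# Hypothetical manual mapping from the GeoJSON spelling to the data-set spelling.
renames = {"United States of America": "United States"}
for feature in geojson["features"]:
    props = feature["properties"]
    props["name"] = renames.get(props["name"], props["name"])

geo_names_fixed = {f["properties"]["name"] for f in geojson["features"]}
```

Listing the set difference first shows exactly which names need a mapping entry, rather than fixing countries one at a time on the map.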

Improving the accuracy of the results was much more difficult. My mentor and I used Excel pivot tables to create an aggregated data set based on numbers from more recent years. I then used that data set to generate a choropleth map in Python.
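The Excel pivot-table step can be mirrored in pandas with `pivot_table`, averaging each country's rate over recent years. This is a hedged sketch. The `2010` cutoff, column names and values are hypothetical, not the ones used in the project.

```python
import pandas as pd

# Hypothetical rows: placeholder countries, made-up rates per 100,000 people.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2006, 2012, 2016, 2006, 2012, 2016],
    "depression_per_100k": [3500.0, 3600.0, 3700.0, 2800.0, 2700.0, 2600.0],
})

# Keep only recent years (hypothetical cutoff), then average per country.
recent = df[df["year"] >= 2010]
agg = recent.pivot_table(index="country", values="depression_per_100k", aggfunc="mean")

# One row per country, ready to join to map features by country name.
print(agg.reset_index())
```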

Conclusion: Greenland, Finland, and Morocco have particularly high rates of depression per 100,000 people. High incidences are also visible in the United States, Iran, Afghanistan, Australia, and certain pockets of Eastern Europe and Africa.

Phase 3: Integrating Map and Cluster Analyses

For the final phase of this project, I used Tableau to overlay clustering data on a choropleth map. Cluster 1 represents the highest level of need. From a best-practices standpoint, it would be more effective for those symbols to be the darkest.

I looked for ways to order the clusters as “0, 2, 1” but was unable to do so with my current level of Tableau skill. Knowing I would need to call the audience’s attention to Cluster 1 in another way, I highlighted it in my presentation text.
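Another possible workaround is to relabel the clusters in Python before exporting, so that label order follows level of need. A sequential color scale would then shade the highest-need cluster darkest. The sketch below assumes "need" is summarized by each cluster's mean depression rate, which is a simplification. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: placeholder countries, cluster labels, made-up rates.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "cluster": [0, 1, 2, 1, 0, 2],
    "depression": [3.0, 5.2, 4.1, 5.0, 2.8, 4.3],
})

# Order clusters from lowest to highest mean depression rate.
order = df.groupby("cluster")["depression"].mean().sort_values().index

# Map each original label to its need rank (0 = lowest need).
rank = {int(old): new for new, old in enumerate(order)}
df["need_rank"] = df["cluster"].map(rank)
print(df)
```

With these toy values the ascending order is 0, 2, 1, so the original Cluster 1 becomes rank 2 and gets the darkest shade.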

Conclusions

Despite the initially disappointing lack of connection between depression and other mental illnesses, there were valuable insights in this data.

For the most part, high-need Cluster 1 countries also had high depression rates. Cluster 1 and Cluster 2 designations also show noticeable geographic grouping, which suggests a need to dig into potential causes.

Recommendations for Further Analysis

What resources are currently available in high-need countries? Several Cluster 1 countries are well-resourced, but many are in the developing world.
How have trends changed in the past five years? This data stops at 2016, and much has changed in mental health care since then.
What is happening in countries with high rates of anxiety and depression? Can sociopolitical trends help us predict intensifying mental health needs?

Dive Deeper

View project scripts on GitHub

View project presentation in Tableau