Machine Learning Analysis

Analyzing climate change patterns with machine learning

Evaluate the potential of common machine learning models to assess climate change impact and plan for the future

Identify weather patterns outside the regional norm in Europe
Determine if unusual weather patterns are increasing
Generate possibilities for future weather conditions over the next 25 to 50 years based on current trends
Determine the safest places for people to live in Europe over the next 25 to 50 years

Data

Context

Machine learning allows organizations to analyze complex patterns such as those caused by climate change. This student project explores the machine learning models available to climate analysts and proposes three potential approaches for a fictionalized research organization, ClimateWins.

Professional Competencies

Data profiling & cleaning
Data wrangling & subsetting
Principal Component Analysis (PCA)

Supervised & unsupervised models
Deep learning models
Hyperparameter optimization

Objectives

This project uses data from the European Climate Assessment and Dataset (ECAD) project. This specific dataset includes weather observations from 18 different weather stations across Europe containing data from the late 1800s to 2022.

Phase 1: Explore Supervised Machine Learning Models

As this was my first venture into machine learning, I began by familiarizing myself with the basics. I studied how a model is fitted to the available data, including techniques such as gradient descent optimization. Taking time to understand these steps gave me a better grasp of how supervised machine learning models work.
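To make the idea of fitting a model with gradient descent concrete, here is a minimal sketch in plain Python. It is not the project's code: the data is synthetic, and the function name, learning rate, and step count are assumptions chosen for illustration.

```python
import random

def fit_line_gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Synthetic data (not ECAD values): y = 2x + 1 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

w, b = fit_line_gradient_descent(xs, ys)
print(f"slope={w:.2f}, intercept={b:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data. A learning rate that is too large makes the updates diverge, which is one reason it is usually treated as a hyperparameter to tune.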

With this understanding, I ran supervised machine learning models on the ECAD data set, paired with generated labels classifying each daily observation as pleasant or unpleasant weather. I used the following three models:

K-Nearest Neighbor
Decision Tree
Artificial Neural Network (ANN)

My goal was to assess the accuracy of each model and identify which had the most potential for evaluating weather data. Overall, test data accuracy ranged from 47% to 54%, though one mountaintop station with 100% unpleasant weather distorted the average.

Station-specific accuracy was higher, at 85.8% to 94.4%. The confusion matrix pictured here offers more detail. As the ANN model had the highest overall accuracy, I recommended this option for further study.
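The write-up does not say how the overall figure was computed, so the following is only one possible illustration of how overall and station-specific accuracy can diverge. It assumes a multi-station setup in which a day counts as correct overall only when every station is predicted correctly. The labels are hypothetical, not project results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = pleasant, 0 = unpleasant."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(y_true, y_pred):
    """Share of individual labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def exact_match_accuracy(true_rows, pred_rows):
    """Share of days on which the prediction is right for every station at once."""
    return sum(t == p for t, p in zip(true_rows, pred_rows)) / len(true_rows)

# Hypothetical labels: 5 days (rows) x 3 stations (columns).
# The third station is always unpleasant, like a mountaintop site.
true_rows = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
pred_rows = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]

# Transpose rows into per-station columns and score each station separately.
station_accuracies = [
    accuracy(t, p) for t, p in zip(zip(*true_rows), zip(*pred_rows))
]
overall = exact_match_accuracy(true_rows, pred_rows)
first_station_confusion = confusion_counts(
    [row[0] for row in true_rows], [row[0] for row in pred_rows]
)

print("per-station accuracy:", station_accuracies)
print("exact-match accuracy:", overall)
print("first station (tn, fp, fn, tp):", first_station_confusion)
```

In this toy case each station scores higher than the exact-match figure, because a single wrong station makes the whole day count as a miss. The always-unpleasant station is trivially easy to predict, which shows how such a site can skew any average it is included in.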

Phase 2: Assess and Optimize Unsupervised Models

The next step was to evaluate unsupervised machine learning models using the same data sets. I learned and explored the following models:

Hierarchical Clustering
Random Forest
Convolutional Neural Network (CNN)
Generative Adversarial Network (GAN)
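As a small illustration of the first model in this list, here is a sketch of agglomerative (bottom-up) hierarchical clustering with single linkage, written in plain Python. The function name and the sample points are assumptions for illustration. The project itself may have used a library implementation and a different linkage method.

```python
import math

def single_linkage_clusters(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Cluster distance is single linkage: the smallest distance between
    any point in one cluster and any point in the other.
    """
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    math.dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical daily observations: (mean temperature in C, precipitation in mm).
points = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5), (20.0, 0.0), (21.0, 0.5), (19.5, 1.0)]
clusters = single_linkage_clusters(points, n_clusters=2)
print(clusters)
```

No labels are involved: the cold days and the warm days end up in separate groups purely because of their distances from each other, which is what makes this an unsupervised method.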

The biggest challenge at this phase was understanding the functionality of each model. As a novice in machine learning, I relied on the explanations of more experienced users, including my CareerFoundry mentor. I learned to ask targeted questions that would help me understand the key functionalities of each model so I could optimize the hyperparameters.

Optimization was a trial-and-error process. My goal was to find the combination of inputs that gave the best accuracy for each model. After optimizing each model, my next step was to evaluate the feature importances of each weather station and type of weather, to get more information about which weather data to prioritize.
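The trial-and-error search described above can be made systematic with a grid search: try every combination of candidate hyperparameter values and keep the one that scores best on held-out data. The sketch below is an illustration in plain Python, not the project's code. It uses a tiny K-Nearest Neighbor classifier, synthetic data, and a made-up labeling rule, all of which are assumptions.

```python
import itertools
import math
import random
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, point, k, metric):
    """Predict a label by majority vote among the k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], point))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def validation_accuracy(train_X, train_y, val_X, val_y, k, metric):
    hits = sum(
        knn_predict(train_X, train_y, x, k, metric) == y for x, y in zip(val_X, val_y)
    )
    return hits / len(val_X)

# Synthetic observations (not ECAD data): (mean temperature in C, precipitation in mm).
# Hypothetical labeling rule: a day is pleasant (1) when it is warm and fairly dry.
rng = random.Random(42)
X = [(rng.uniform(0, 30), rng.uniform(0, 10)) for _ in range(300)]
y = [1 if temp >= 18 and rain <= 3 else 0 for temp, rain in X]
train_X, train_y = X[:200], y[:200]
val_X, val_y = X[200:], y[200:]

# Candidate values for each hyperparameter; every combination is evaluated.
grid = {"k": [1, 3, 5, 7], "metric": [euclidean, manhattan]}
results = []
for k, metric in itertools.product(grid["k"], grid["metric"]):
    score = validation_accuracy(train_X, train_y, val_X, val_y, k, metric)
    results.append((score, k, metric.__name__))

best_score, best_k, best_metric = max(results, key=lambda r: r[0])
print(f"best: k={best_k}, metric={best_metric}, validation accuracy={best_score:.3f}")
```

Scoring on a validation split rather than the training data matters here: with k=1 the classifier would score perfectly on its own training points, which says nothing about how it handles new observations.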

Phase 3: Compare Models and Identify Next Steps

Optimization had the best results with the CNN and Random Forest models. The CNN had the most improved results, with accuracy increasing from 10.4% to 61.2%. The goal was to determine strategies for next steps, so I evaluated the viability of applying GAN data to the model. Optimizing the GAN model allowed me to achieve 98.1% accuracy, suggesting this approach was viable.

To learn more about how this approach works, I conducted some independent research. I found several studies that had applied GAN data to a CNN model to predict weather outcomes. These researchers were publishing promising results, so I suggested this as a promising track for ClimateWins' future study.

Recommendations for Further Analysis

What additional weather data is available? ClimateWins wants to predict weather crises in order to protect human health. To do that, we need data on the frequency of extreme weather events. Radar data would also be beneficial, as the CNN model excels at processing image data.
What types of weather are associated with the greatest risk? The available data divided weather into pleasant vs. unpleasant. The analysis would benefit from information on where weather-related injuries, illnesses, and deaths have occurred.

Dive Deeper

View full presentation

View preliminary analysis presentation

View code on GitHub