Performing Analysis of Meteorological Data

Rithik Alias
May 21, 2021

The Dataset Used

Weather data is among the easier kinds of data to find online. Many sites provide historical records of meteorological parameters such as pressure, temperature, humidity, wind speed, and visibility. One such weather dataset is available on Kaggle (source: https://www.kaggle.com/muthuj7/weather-dataset).

The dataset contains hourly temperature readings, along with other weather parameters, recorded over roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe. You can download the dataset from this Google Drive link: https://drive.google.com/open?id=1ScF_1a-bkHi1qe8Rn78uxK6_5QwUD9Bu

Objective

We will transform this raw data into information and then turn that information into knowledge. The hypothesis to test is:

“Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming”

We will analyze the raw data against the hypothesis stated above.

Testing it means finding out whether the average apparent temperature for a given month, say April, has increased from 2006 to 2016, and whether the average humidity for that month has increased over the same period.

The Code

The code is available, along with the dataset, in the GitHub repository linked below.

Colab notebook: https://colab.research.google.com/drive/1ajdcaXBVisMOUHDw77J9V4mF9dHXGGaS?usp=sharing

GitHub: https://github.com/Rithik-Alias/Analysis-of-Meteorological-Data

Cleaning the Dataset

As the first cleaning step, I converted the unformatted date column to a proper datetime column, which makes the later analysis much easier. I also removed the columns that are not needed: ['Summary', 'Precip Type', 'Temperature (C)', 'Wind Speed (km/h)', 'Wind Bearing (degrees)', 'Visibility (km)', 'Pressure (millibars)', 'Daily Summary']. Finally, I set the date column as the index.
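The cleaning steps described here can be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names `Formatted Date`, `Apparent Temperature (C)` and `Humidity`, and the helper `clean_weather`, are assumptions, and a tiny made-up frame stands in for the real CSV.

```python
import pandas as pd

# Columns the article removes during cleaning.
UNWANTED_COLUMNS = [
    "Summary", "Precip Type", "Temperature (C)", "Wind Speed (km/h)",
    "Wind Bearing (degrees)", "Visibility (km)", "Pressure (millibars)",
    "Daily Summary",
]


def clean_weather(raw, date_col="Formatted Date"):
    """Parse the date column, drop unused columns, and index by date.

    `date_col` is an assumed column name; adjust it to match the CSV.
    """
    df = raw.copy()
    # utc=True normalizes timestamps that carry offsets such as +0200.
    df[date_col] = pd.to_datetime(df[date_col], utc=True)
    # errors="ignore" skips any listed column that is not present.
    df = df.drop(columns=UNWANTED_COLUMNS, errors="ignore")
    return df.set_index(date_col).sort_index()


# Tiny made-up sample standing in for the real file (not real measurements).
sample = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 01:00:00.000 +0200",
        "2006-04-01 00:00:00.000 +0200",
    ],
    "Summary": ["Partly Cloudy", "Partly Cloudy"],
    "Temperature (C)": [9.4, 9.5],
    "Apparent Temperature (C)": [7.2, 7.4],
    "Humidity": [0.86, 0.89],
})

cleaned = clean_weather(sample)
print(cleaned)
```

With the real file, `sample` would be replaced by the result of `pd.read_csv(...)` on the downloaded CSV.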

The Result Based on a Yearly Resampling

We now have two variables, apparent temperature and humidity, so the analysis is conducted on these two. We then check whether the hypothesis we chose holds.

Here I have visualized the yearly change in humidity and apparent temperature from 2006 to 2016, using the mean of each year's data to represent that year.
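The yearly averaging can be sketched with pandas resampling. The sketch below assumes a frame indexed by datetime with columns named `Apparent Temperature (C)` and `Humidity` (assumed names), and uses made-up values rather than the real data. Since the data starts in April 2006 and ends in September 2016, the first and last yearly means would cover only part of a year.

```python
import pandas as pd

# Made-up readings on a UTC datetime index (not real measurements).
index = pd.to_datetime(
    ["2006-04-01", "2006-10-01", "2007-04-01", "2007-10-01"], utc=True
)
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [8.0, 12.0, 9.0, 13.0],
        "Humidity": [0.70, 0.80, 0.72, 0.78],
    },
    index=index,
)

# "YS" groups rows by calendar year, labelled by the first day of the year.
yearly = df.resample("YS").mean()
print(yearly)
```

Each row of `yearly` is then one point on the yearly plot for that variable.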

Humidity (on left) and Apparent temperature (on right).

From these graphs I observed that the mean apparent temperature rose from about 10.2 °C to more than 10.8 °C by the end of 2015. Humidity did not change much overall, although it showed a marked change around 2010.

Analysis Based on Each Month

One plot per month, January through December.
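The month-by-month comparison in this section can be sketched as follows: average the data per calendar month, then keep only the rows for one month so that each year contributes a single value. The helper `monthly_means_for` and the column names are assumptions, and the values are made up.

```python
import pandas as pd


def monthly_means_for(df, month):
    """Return one mean per year for the given calendar month (1 to 12)."""
    # "MS" groups rows by calendar month, labelled by the month's first day.
    monthly = df.resample("MS").mean()
    # Keep only the requested month, e.g. every April.
    return monthly[monthly.index.month == month]


# Made-up readings on a UTC datetime index (not real measurements).
index = pd.to_datetime(
    ["2006-04-01", "2006-04-15", "2006-05-01", "2007-04-01", "2007-04-15"],
    utc=True,
)
df = pd.DataFrame(
    {
        "Apparent Temperature (C)": [6.0, 8.0, 14.0, 7.0, 9.0],
        "Humidity": [0.60, 0.70, 0.50, 0.62, 0.72],
    },
    index=index,
)

april = monthly_means_for(df, 4)
print(april)
```

Calling the helper with `month` from 1 to 12 gives the twelve series that the monthly plots compare across years.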

Conclusion

From the visualizations above, we can conclude that apparent temperature has increased slightly over these years. Humidity has also shown a small increase.
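One way to put a number on a "slight increase" is to fit a straight line through the per-year means and read off its slope. This is not something the article's notebook is known to do; it is a sketch of a least-squares fit using `numpy.polyfit`, with a hypothetical helper `linear_trend` and made-up values.

```python
import numpy as np


def linear_trend(years, values):
    """Return the least-squares slope of values against years (units per year)."""
    x = np.asarray(years, dtype=float)
    # Centering the years keeps the fit well conditioned; the slope is unchanged.
    x = x - x.mean()
    slope, _intercept = np.polyfit(x, np.asarray(values, dtype=float), 1)
    return slope


# Made-up yearly means (not real measurements).
years = [2006, 2007, 2008, 2009]
means = [10.0, 10.2, 10.4, 10.6]

slope = linear_trend(years, means)
print(f"{slope:.3f} degrees C per year")
```

A positive slope points to an increase and a negative one to a decrease; the same helper could be applied to the humidity means.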

Inference

Our hypothesis was “Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming”. We know that global warming increased steadily over these 10 years. As shown in the graph below, solar irradiance has been decreasing over the years, including the period we are looking at, 2006 to 2016, while global temperature kept rising. Since the two moved in opposite directions, the rise in temperature cannot be attributed to the sun's output, and we can conclude that global warming was increasing through these years.

Solar Irradiance and Temperature change due to global warming

Hence, as apparent temperature and humidity also showed an increase over these years, we can conclude that our hypothesis holds for this dataset.

Reference

“I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com.”

https://www.kaggle.com/muthuj7/weather-dataset

Rithik Alias

Masters Student at IIITMK specialized in Data Analytics