Performing Analysis of Meteorological Data
Author : Rithik Alias
The Dataset Used
Weather data is one of the easier kinds of data to find online. Many sites provide historical data on meteorological parameters such as pressure, temperature, humidity, wind speed and visibility. One such weather dataset is available on Kaggle. (Source URL: https://www.kaggle.com/muthuj7/weather-dataset)
The dataset contains hourly temperature records covering roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe. You can download the dataset from this Google Drive link: https://drive.google.com/open?id=1ScF_1a-bkHi1qe8Rn78uxK6_5QwUD9Bu
Objective
We will transform this raw data into information and then convert that information into knowledge.
“Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming”
We will analyze the raw data above against this null hypothesis.
Testing it means finding out whether the average apparent temperature for a given month, say April, has increased from 2006 to 2016, and whether the average humidity for the same month over the same period has increased as well.
The Code
The code is available with the dataset in the GitHub repository mentioned below.
The link to the code : https://colab.research.google.com/drive/1ajdcaXBVisMOUHDw77J9V4mF9dHXGGaS?usp=sharing
GitHub : https://github.com/Rithik-Alias/Analysis-of-Meteorological-Data
Cleaning the Dataset
As part of cleaning, the very first thing I did was convert the unformatted date column into a proper datetime column, which makes the rest of the analysis much easier. I also removed the columns that are not needed: ['Summary', 'Precip Type', 'Temperature (C)', 'Wind Speed (km/h)', 'Wind Bearing (degrees)', 'Visibility (km)', 'Pressure (millibars)', 'Daily Summary']. Finally, I set the date column as the index.
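The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration and not the exact notebook code: the date column name "Formatted Date", the two remaining column names, and the tiny sample frame are assumptions made for the example, and parsing to UTC is one possible way to handle the +0200 offset.

```python
import pandas as pd

# Columns the article drops; the names are taken from the text above.
UNWANTED = ['Summary', 'Precip Type', 'Temperature (C)', 'Wind Speed (km/h)',
            'Wind Bearing (degrees)', 'Visibility (km)', 'Pressure (millibars)',
            'Daily Summary']

def clean(raw, date_col="Formatted Date"):
    """Parse the date column, drop unwanted columns, and index by date.

    date_col is an assumed column name; adjust it to match the CSV.
    """
    df = raw.copy()
    # The timestamps carry a UTC offset (e.g. +0200), so parse them to UTC.
    df[date_col] = pd.to_datetime(
        df[date_col], format="%Y-%m-%d %H:%M:%S.%f %z", utc=True
    )
    # errors="ignore" skips any listed column that is absent from the frame.
    df = df.drop(columns=UNWANTED, errors="ignore")
    return df.set_index(date_col).sort_index()

# Tiny made-up sample standing in for the real CSV.
sample = pd.DataFrame({
    "Formatted Date": ["2006-04-01 01:00:00.000 +0200",
                       "2006-04-01 00:00:00.000 +0200"],
    "Summary": ["Partly Cloudy", "Partly Cloudy"],
    "Apparent Temperature (C)": [7.2, 7.4],
    "Humidity": [0.86, 0.89],
})
cleaned = clean(sample)
print(cleaned)
```

With the real file, the sample frame would be replaced by something like `pd.read_csv("weatherHistory.csv")`, where the file name is hypothetical. Note that converting to UTC shifts each timestamp by its offset, so a few records near midnight move into the previous day.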
The Result Based on a Yearly Resampling
After cleaning, we are left with the apparent temperature data and the humidity data, so we will conduct our analysis on these two variables and then check whether the chosen hypothesis holds.
Here I have visualized the yearly change in humidity and apparent temperature from 2006 to 2016, using the mean of each year's data to represent that year.
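As a rough sketch of the yearly averaging step, the snippet below groups a date-indexed frame by calendar year and takes the mean of each column. Grouping by `index.year` is used here as a version-independent stand-in for a yearly resample; the column names and the synthetic daily values are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the cleaned dataset (values are made up).
idx = pd.date_range("2006-01-01", "2007-12-31", freq="D", tz="UTC")
df = pd.DataFrame({
    "Apparent Temperature (C)": np.where(idx.year == 2006, 10.0, 11.0),
    "Humidity": np.where(idx.year == 2006, 0.70, 0.75),
}, index=idx)

# One row per calendar year, holding the mean of each column for that year.
yearly = df.groupby(df.index.year).mean()
print(yearly)
```

If matplotlib is installed, `yearly.plot(subplots=True)` would draw one line chart per column from this table.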
From these graphs I observed that the yearly mean apparent temperature rose from about 10.2 °C to more than 10.8 °C by the end of 2015. Humidity did not change much overall, although it showed a marked deviation around 2010.
Analysis Based on Each Month
January
February
March
April
May
June
July
August
September
October
November
December
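The month-by-month comparison above can be sketched as follows: keep only the rows for one calendar month, then average them per year, so that, for example, every April from 2006 to 2016 becomes one data point. The helper name `monthly_means`, the column names and the synthetic values are assumptions made for the example, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

def monthly_means(df, month):
    """Mean of every column for one calendar month (1-12), one row per year."""
    one_month = df[df.index.month == month]
    return one_month.groupby(one_month.index.year).mean()

# Synthetic data standing in for the cleaned dataset (values are made up).
idx = pd.date_range("2006-01-01", "2008-12-31", freq="D", tz="UTC")
years_since_start = np.asarray(idx.year) - 2006
df = pd.DataFrame({
    "Apparent Temperature (C)": 5.0 + 0.5 * years_since_start,
    "Humidity": 0.80 - 0.01 * years_since_start,
}, index=idx)

april = monthly_means(df, 4)  # 4 = April
print(april)
```

Calling the helper in a loop over `range(1, 13)` would give one such table per month, each of which can then be plotted against the year.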
Conclusion
From the visualizations shown above we can conclude that the apparent temperature has shown a slight increase over these years. Humidity has also shown a small increase over the same period.
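One way to put a number on a "slight increase" is to fit a straight line through the yearly (or per-month) means and read off its slope. This is not part of the original analysis, which relies on the plots; it is a simple least-squares sketch, and the function name and sample values are made up for illustration.

```python
import numpy as np

def trend_per_year(years, values):
    """Slope of a least-squares straight line through (year, value) points."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    # Centering the years keeps the fit well conditioned; the slope is unchanged.
    slope, _intercept = np.polyfit(x - x.mean(), y, 1)
    return slope

# Made-up yearly means, for illustration only.
years = [2006, 2007, 2008, 2009]
temps = [10.0, 10.1, 10.2, 10.3]
slope = trend_per_year(years, temps)
print(f"trend: {slope:.3f} units per year")
```

A positive slope indicates an upward trend over the period, although with only about ten yearly points the estimate is sensitive to individual unusual years.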
Inference
Our null hypothesis was “Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming”. We know that global warming increased steadily over these 10 years. As the graph below shows, solar irradiance decreased over the years in question, i.e. 2006–2016. Since global temperatures rose even as solar irradiance fell, we can conclude that global warming increased through these years.
Hence, since apparent temperature and humidity also showed an increase over these years, we can conclude that our null hypothesis is true.
Reference
“I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com”