Air pollution has become a serious problem across the world in recent years. Its effects are devastating, and the harm is not limited to humans - animals and plants suffer as well. It also contributes to global warming, which is increasing air and ocean temperatures around the world.
Indian cities regularly top the list of the world's most polluted cities. To tackle air pollution, the first step is to track it in real time, so that people can be alerted to avoid outdoor activities when pollution is high. This post explains how to fetch the real-time Air Quality Index (AQI) of Indian cities, with code for both Python and R programmers to pull pollution data.
You can download the dataset below, which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used later to fetch real-time data.
  id        stationID                                     longitude latitude  live  avg cityID             stateID
1 site_5331 Kariavattom, Thiruvananthapuram - Kerala PCB  76.88650  8.563700  FALSE NA  Thiruvananthapuram Kerala
2 site_252  Plammoodu, Thiruvananthapuram - Kerala PCB    76.94359  8.514909  TRUE  20  Thiruvananthapuram Kerala
3 site_5272 Kacheripady, Ernakulam - Kerala PCB           76.28134  9.985653  TRUE  27  Ernakulam          Kerala
4 site_5276 Thavakkara, Kannur - Kerala PCB               75.37320  11.875000 TRUE  56  Kannur             Kerala
5 site_5334 Polayathode, Kollam - Kerala PCB              76.60730  8.878700  TRUE  54  Kollam             Kerala
6 site_5271 Palayam, Kozhikode - Kerala PCB               75.78437  11.249077 TRUE  70  Kozhikode          Kerala
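If you want to explore this dataset programmatically, it can be read straight from the GitHub URL used in the code later in this post. A minimal Python sketch:
import pandas as pd

# Static station metadata: ids, station names, coordinates, city and state
stationsData = pd.read_csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/stations.csv")
print(stationsData.head())            # first few stations
print(stationsData.cityID.nunique())  # number of distinct cities covered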
To get your own API key, sign up for a free account on data.gov.in and then go to Dashboard > MyAccount. You will find the API key there.
In case you face any issue registering and getting an API key, you can use the sample API key below for experimentation. Note that the sample API key can fetch a maximum of 10 records per run.
579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b
The data.gov.in portal provides an API resource named "Real Time Air Quality Index From Various Locations", which we will use in this step. The getData function defined below takes two inputs -
API Key
Filter criteria
Filter criteria can have "state", "city", "station" and "pollutant_id". To see the unique values of state, city and station, you can download and refer to the dataset shown above. Distinct values of pollutant_id are as follows -
"PM2.5" "PM10" "NO2" "NH3" "SO2" "CO" "OZONE"
Python Code
import requests
import json
import pandas as pd
import re
import datetime
import time
import base64
from itertools import product
stationsData = pd.read_csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/stations.csv")
def getData(api, filters):
    # Base URL of the "Real Time Air Quality Index" resource on api.data.gov.in
    url1 = "https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=" + api + "&format=json&limit=500"
    # URL-encode spaces in each filter value
    criteriaAll = [[(k, re.sub(r'\s+', '%20', v)) for v in filters[k]] for k in filters]
    # Build one request URL per combination of filter values
    url2 = [url1 + ''.join(f'&filters[{ls}]={value}' for ls, value in p) for p in product(*criteriaAll)]
    # Call the API for each URL and stack the returned records
    pollutionDfAll = pd.DataFrame()
    for i in url2:
        response = requests.get(i, verify=True)
        response_dict = json.loads(response.text)
        pollutionDf = pd.DataFrame(response_dict['records'])
        pollutionDfAll = pd.concat([pollutionDfAll, pollutionDf])
    return pollutionDfAll
# Sample key
api = "579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b"
criteria = {'city':["Greater Noida","Delhi"], 'pollutant_id': ["PM10", "PM2.5"]}
mydata = getData(api, criteria)
R Code
library(httr)
library(jsonlite)
library(dplyr)

stationsData <- read.csv("https://raw.githubusercontent.com/deepanshu88/Datasets/master/UploadedFiles/stations.csv")

getData <- function(api, filters) {
  # Base URL of the "Real Time Air Quality Index" resource on api.data.gov.in
  url1 <- paste0("https://api.data.gov.in/resource/3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69?api-key=", api, "&format=json&limit=500")
  # Build one request URL per combination of filter values (spaces URL-encoded)
  url2 <- paste0('%s', paste0('&filters[', names(filters), ']=%s', collapse = ''))
  urlAll <- do.call(sprintf, c(url2, url1, expand.grid(lapply(filters, function(x) gsub("\\s+", "%20", x)))))
  # Call the API for each URL and stack the returned records
  pollutionDfAll <- data.frame()
  for (i in urlAll) {
    request <- GET(url = i)
    response <- content(request, as = "text", encoding = "UTF-8")
    df <- fromJSON(response, flatten = TRUE)
    pollutionDf <- df[["records"]]
    pollutionDfAll <- rbind(pollutionDfAll, pollutionDf)
  }
  return(pollutionDfAll)
}

# Sample key
api <- "579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b"
criteria <- list(city = c("Greater Noida", "Delhi"), pollutant_id = c("PM10", "PM2.5"))
mydata <- getData(api, criteria)
    id country state         city          station                                     last_update         pollutant_id pollutant_min pollutant_max pollutant_avg pollutant_unit
1 1686 India   Uttar_Pradesh Greater Noida Knowledge Park - III, Greater Noida - UPPCB 11-07-2022 05:00:00 PM10                    57           136            93             NA
2 1693 India   Uttar_Pradesh Greater Noida Knowledge Park - V, Greater Noida - UPPCB   11-07-2022 05:00:00 PM10                    56           147            98             NA
3  297 India   Delhi         Delhi         Alipur, Delhi - DPCC                        11-07-2022 05:00:00 PM10                    45           118            77             NA
4  304 India   Delhi         Delhi         Anand Vihar, Delhi - DPCC                   11-07-2022 05:00:00 PM10                    96           179           132             NA
5  311 India   Delhi         Delhi         Ashok Vihar, Delhi - DPCC                   11-07-2022 05:00:00 PM10                    80           122            95             NA
6  318 India   Delhi         Delhi         Aya Nagar, Delhi - IMD                      11-07-2022 05:00:00 PM10                    38            83            65             NA
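The filter criteria are not limited to city; the "state" filter mentioned earlier works the same way. A small sketch, assuming state names follow the stateID column of the stations dataset (e.g. "Kerala"):
# Hedged example: PM2.5 readings for all stations in a state
criteria = {'state': ["Kerala"], 'pollutant_id': ["PM2.5"]}
mydata_state = getData(api, criteria)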
To pull data only for specific monitoring stations, pass the station names in the filter criteria.
Python Code
criteria = {"station":["Anand Vihar, Delhi - DPCC", "Okhla Phase-2, Delhi - DPCC"], "pollutant_id":["PM10"]}
mydata = getData(api, criteria)
R Code
criteria <- list(station = c("Anand Vihar, Delhi - DPCC", "Okhla Phase-2, Delhi - DPCC"), pollutant_id = c("PM10"))
mydata <- getData(api, criteria)
We can pass all the cities as criteria to pull the AQI of every city, and then take the median of AQI values by city (a sketch of the median step follows the R code below).
Python Code
criteria = {'city' : stationsData.cityID.unique(), 'pollutant_id' : ["PM10"]}
mydata = getData(api, criteria)
R Code
criteria <- list(city = unique(stationsData$cityID), pollutant_id = c("PM10"))
mydata <- getData(api, criteria)
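The median step mentioned above is not shown in the original snippets, so here is a minimal Python sketch that continues from the mydata dataframe returned by getData. It assumes pollutant_avg may arrive as text, so it is coerced to numeric first:
# Median PM10 value per city, highest first ("NA" strings become NaN and are ignored)
mydata['pollutant_avg'] = pd.to_numeric(mydata['pollutant_avg'], errors='coerce')
city_median = mydata.groupby('city')['pollutant_avg'].median().sort_values(ascending=False)
print(city_median.head(10))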
The program below pulls station-level data from the CPCB AQI dashboard (app.cpcbccr.com) and returns two dataframes - summary and pollutants. The pollutants dataframe contains scores for the various pollutants measured at the location. The function has two arguments - id and dt. id refers to the unique identifier assigned to each station (format: site_*) and dt refers to a datetime object.
Python Code
import requests
import json
import pandas as pd
import re
import datetime
import time
import base64
from itertools import product
def get_data_cpcb(id, dt):
    # Request body: station id and timestamp (ISO format), base64-encoded
    datetime2 = dt.strftime('%Y-%m-%dT%H:%M:%SZ')
    key = '{"station_id":"' + id + '","date":"' + datetime2 + '"}'
    body = base64.b64encode(key.encode()).decode()
    # Access token: current epoch time and local timezone offset in minutes, base64-encoded
    timeZoneoffset = int((datetime.datetime.utcnow() - datetime.datetime.now()).total_seconds()/60)
    token = '{"time":' + str(int(time.time())) + ',"timeZoneOffset":' + str(timeZoneoffset) + '}'
    accessToken = base64.b64encode(str(token).encode()).decode()
    headers = {
        'accept': 'application/json, text/javascript, */*; q=0.01',
        'accesstoken': accessToken,
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'origin': 'https://app.cpcbccr.com',
        'referer': 'https://app.cpcbccr.com/AQI_India/',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'accept-language': 'en-US,en;q=0.9'
    }
    response = requests.post('https://app.cpcbccr.com/aqi_dashboard/aqi_all_Parameters', headers=headers, data=body, verify=True)
    response_dict = json.loads(response.text)
    # Overall AQI plus station title and date
    info = pd.DataFrame({'title': response_dict['title'], 'date': response_dict['date']}, index=[0])
    pollutionDf = pd.concat([pd.DataFrame([response_dict['aqi']]), info], axis=1)
    # Pollutant-wise scores
    pollutants = pd.concat([pd.DataFrame(response_dict['metrics']), info], axis=1)
    return pollutionDf, pollutants
id = stationsData.id[0]
summary, pollutants = get_data_cpcb(id, datetime.datetime(2022, 7, 9, 18, 44, 59, 0))
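A small usage note (an assumption, not from the original post): because dt is an ordinary datetime object, you can pass the current time, truncated to the hour since readings are published roughly hourly, to get the latest snapshot for a station:
# Latest snapshot for the same station, using the current hour
now_hour = datetime.datetime.now().replace(minute=0, second=0, microsecond=0)
summary_now, pollutants_now = get_data_cpcb(id, now_hour)
print(summary_now)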
R Code
library(httr)
library(jsonlite)
library(dplyr)

get_data_cpcb <- function(id, datetime) {
  is.POSIXct <- function(x) inherits(x, "POSIXct")
  if (!is.POSIXct(datetime)) {stop("datetime must be POSIXct object")}
  # Request body: station id and timestamp, base64-encoded
  key = paste0('{"station_id":"', id, '","date":"', gsub("\\s+", "T", as.character(datetime)), "Z", '"}')
  body = gsub("\\n", "", base64_enc(key))
  # Access token: epoch time and timezone offset in minutes, base64-encoded
  timeZoneoffset <- ceiling((as.numeric(as.POSIXct(format(datetime), tz = "UTC")) - as.numeric(datetime))/60)
  token = paste0('{"time":', ceiling(as.numeric(datetime)), ',"timeZoneOffset":', timeZoneoffset, '}')
  accesstoken = base64_enc(token)
  URL <- "https://app.cpcbccr.com/aqi_dashboard/aqi_all_Parameters"
  headers <- add_headers(
    `user-agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
    accept = "application/json, text/javascript, */*; q=0.01",
    `accept-encoding` = "gzip, deflate, br",
    `accept-language` = "en-US,en;q=0.9,id;q=0.8,pt;q=0.7",
    accesstoken = accesstoken,
    `content-type` = "application/x-www-form-urlencoded; charset=UTF-8",
    origin = "https://app.cpcbccr.com",
    referer = "https://app.cpcbccr.com/AQI_India/",
    `sec-fetch-dest` = "empty",
    `sec-fetch-mode` = "cors",
    `sec-fetch-site` = "same-origin"
  )
  request <- POST(URL, headers, body = body, encode = "form")
  response <- content(request, as = "text", encoding = "UTF-8")
  df <- fromJSON(response, flatten = TRUE)
  return(df)
}

id <- stationsData$id[1]
datetime <- as.POSIXct("2022-07-08 16:35:00")
df <- get_data_cpcb(id, datetime)
# Overall AQI with station title and date
summary <- data.frame(df[c("title", "date")], t(unlist(df$aqi)))
# Pollutant-wise scores
pollutants <- data.frame(df[c("title", "date")], df$metrics)