
Collecting data from an API

  • staniszradek
  • Sep 6, 2023
  • 3 min read

Updated: Jun 19, 2024


As I mentioned in the previous post, one way to get data is to use an API. In this article I am going to demonstrate how we can do that and what we can do with the data afterwards. I am going to use the API of the Air Quality Service run by the Chief Inspectorate for Environmental Protection in Poland. The goal of this exercise is to create a csv file with the following information:


  • monitoring stations' names and ids

  • monitoring stations' coordinates

  • what is being measured and the sensors' ids


Read the API documentation


The first thing we need to do is visit the API website and read through the documentation. It usually explains how to use the API and which endpoints are offered. In this case we have 4 endpoints available:


  1. https://api.gios.gov.pl/pjp-api/rest/station/findAll - returns a list of all monitoring stations

  2. https://api.gios.gov.pl/pjp-api/rest/station/sensors/{stationId} - returns a list of the sensors available at a particular monitoring station

  3. https://api.gios.gov.pl/pjp-api/rest/data/getData/{sensorId} - returns the measurements made by a particular sensor

  4. https://api.gios.gov.pl/pjp-api/rest/aqindex/getIndex/{stationId} - returns the air quality index for a particular monitoring station


Along with the endpoints, the documentation also lists request limits. For example, the first endpoint has a limit of 2 requests per minute. It is important to respect these limits, otherwise the API will respond with an error. To control the request rate we can use the 'sleep' function from the 'time' module to pause between requests, so that we never exceed the allowed number per minute.
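A minimal sketch of this kind of throttling, assuming we simply space the calls evenly. The helper name `call_with_rate_limit` and its parameters are hypothetical and not part of the API or of any library:

```python
import time


def call_with_rate_limit(func, args_list, max_per_minute, sleep=time.sleep):
    """Call func(arg) for each arg, spacing calls 60 / max_per_minute seconds apart."""
    delay = 60.0 / max_per_minute
    results = []
    for i, arg in enumerate(args_list):
        if i > 0:
            sleep(delay)  # pause before every call except the first one
        results.append(func(arg))
    return results
```

With `max_per_minute=2` the helper waits 30 seconds between consecutive calls. The `sleep` parameter exists only so the pause can be swapped out when trying the helper without actually waiting.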


Making the first requests


Once we are familiar with the API documentation, we can start working with particular endpoints. To make a request, we need to import the requests library and use its GET method:
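A minimal sketch of such a request against the first endpoint. The helper name `fetch_json` and the 10-second timeout are illustrative choices, not something the API requires:

```python
import requests

STATIONS_URL = "https://api.gios.gov.pl/pjp-api/rest/station/findAll"


def fetch_json(url, timeout=10):
    """Send a GET request and return the decoded JSON body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.json()
```

Calling `fetch_json(STATIONS_URL)` should then give us the stations as plain Python objects, provided the endpoint is reachable and we stay within the request limit.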



The response is a list of records, so it can easily be turned into a pandas DataFrame:
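A small sketch of that conversion. The two records are made-up samples, and the field names (`stationName`, `gegrLat`, `gegrLon`, `city`, `addressStreet`) are assumptions about the shape of the response, so check them against what the endpoint actually returns:

```python
import pandas as pd

# Made-up sample records standing in for the parsed JSON response.
stations = [
    {"id": 14, "stationName": "Station A", "gegrLat": "50.97", "gegrLon": "14.94",
     "city": {"id": 1, "name": "City A"}, "addressStreet": None},
    {"id": 52, "stationName": "Station B", "gegrLat": "52.23", "gegrLon": "21.01",
     "city": {"id": 2, "name": "City B"}, "addressStreet": "Example Street 1"},
]

# Each dictionary becomes a row and each key becomes a column.
df = pd.DataFrame(stations)
```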



Since we only want the stations' names, coordinates and ids, we remove the last two columns using the drop method.
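One way to drop the last two columns by position, sketched on a made-up DataFrame. The column names here are assumptions, not taken from the API documentation:

```python
import pandas as pd

# Made-up stations table; the last two columns are the ones we do not need.
df = pd.DataFrame({
    "id": [14, 52],
    "stationName": ["Station A", "Station B"],
    "gegrLat": ["50.97", "52.23"],
    "gegrLon": ["14.94", "21.01"],
    "city": [{"name": "City A"}, {"name": "City B"}],
    "addressStreet": [None, "Example Street 1"],
})

# df.columns[-2:] selects the labels of the last two columns.
df = df.drop(columns=df.columns[-2:])
```

Dropping by name, for example `df.drop(columns=["city", "addressStreet"])`, is an alternative that keeps working if the column order ever changes.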

Now we are ready to proceed with the second endpoint. The idea is the same; however, this endpoint returns the sensors of a single station, and what we need is the sensors of all stations. Since we already have a list of all stations, we can use a 'for' loop to request the sensors of each one:
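A sketch of such a loop. The function names and the one-second pause are illustrative assumptions; the pause should be adjusted to the limit documented for this endpoint:

```python
import time

import requests

SENSORS_URL = "https://api.gios.gov.pl/pjp-api/rest/station/sensors/{station_id}"


def fetch_sensors(station_id):
    """Request the list of sensors installed at one station."""
    response = requests.get(SENSORS_URL.format(station_id=station_id), timeout=10)
    response.raise_for_status()
    return response.json()


def collect_sensors(station_ids, fetch=fetch_sensors, pause_seconds=1.0):
    """Loop over all station ids and gather one list of sensors per station."""
    all_sensors = []
    for station_id in station_ids:
        all_sensors.append(fetch(station_id))
        time.sleep(pause_seconds)  # stay below the request limit
    return all_sensors
```

With the stations in a DataFrame, `collect_sensors(df["id"])` would return a list that contains one inner list per station.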



Unfortunately, the response is too nested to convert directly into a DataFrame, so we need to flatten the list first:
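Flattening a list of lists can be done with a list comprehension. The sketch below uses made-up sensor records; the keys inside `param` are assumptions about the response shape:

```python
import pandas as pd

# One inner list per station, as collected in the loop.
all_sensors = [
    [
        {"id": 92, "stationId": 14, "param": {"paramName": "PM10", "paramCode": "PM10"}},
        {"id": 88, "stationId": 14, "param": {"paramName": "NO2", "paramCode": "NO2"}},
    ],
    [
        {"id": 101, "stationId": 52, "param": {"paramName": "O3", "paramCode": "O3"}},
    ],
]

# Walk through every station's list and pull each sensor into one flat list.
flat_sensors = [sensor for station_sensors in all_sensors for sensor in station_sensors]

df_2 = pd.DataFrame(flat_sensors)
```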



Now we have almost what we need. We just have to break down the 'param' column. We will use a pandas function called 'json_normalize()':
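A short sketch of that step on made-up data. The keys inside `param` are assumed for illustration and may differ from what the API returns:

```python
import pandas as pd

df_2 = pd.DataFrame([
    {"id": 92, "stationId": 14, "param": {"paramName": "PM10", "paramCode": "PM10"}},
    {"id": 88, "stationId": 14, "param": {"paramName": "NO2", "paramCode": "NO2"}},
    {"id": 101, "stationId": 52, "param": {"paramName": "O3", "paramCode": "O3"}},
])

# Turn the dictionaries stored in 'param' into a table with one column per key.
df_3 = pd.json_normalize(df_2["param"].tolist())
```

The rows of `df_3` come out in the same order as the rows of `df_2`, which is what lets us line the two tables up afterwards.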



As a result we got another DataFrame out of the 'param' column, which is what we expected. Now we need to merge the 3 DataFrames we have created, clean the result up a little and save it as a csv file.





Notice that we can merge df and df_2 using the id of a monitoring station. In df it is the 'id' column, while in df_2 it is the 'stationId' column.
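One possible way to put the three tables together, sketched on made-up data. The sample values, the renamed `sensorId` column and the file name `stations_sensors.csv` are illustrative assumptions, and df_2 is shown here without its 'param' column:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [14, 52],
    "stationName": ["Station A", "Station B"],
    "gegrLat": ["50.97", "52.23"],
    "gegrLon": ["14.94", "21.01"],
})
df_2 = pd.DataFrame({"id": [92, 88, 101], "stationId": [14, 14, 52]})
df_3 = pd.DataFrame({"paramName": ["PM10", "NO2", "O3"], "paramCode": ["PM10", "NO2", "O3"]})

# df_2 and df_3 share the same row order, so they can be joined side by side.
# Renaming the sensor 'id' avoids a clash with the station 'id' in df.
sensors = pd.concat([df_2, df_3], axis=1).rename(columns={"id": "sensorId"})

# Match each sensor to its station: 'id' in df equals 'stationId' in sensors.
result = df.merge(sensors, left_on="id", right_on="stationId", how="inner")

# Clean up: keep a single station id column with a descriptive name.
result = result.drop(columns="stationId").rename(columns={"id": "stationId"})

result.to_csv("stations_sensors.csv", index=False)
```

An inner merge keeps only stations that have at least one sensor; `how="left"` would keep every station instead.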



We have just created a DataFrame (table) with all the required information:


  • monitoring stations' names and ids

  • monitoring stations' coordinates

  • what is being measured and the sensors' ids

and saved it as a csv file.


Along the way, we not only collected data using an API but also employed a couple of useful Python functions and methods to manipulate the data. To sum up, let's list some of the key takeaways from this exercise:


  • making a request with GET method

  • creating a DataFrame in Pandas

  • using list comprehension

  • merging DataFrames









