Building live air quality index application

#API #Air quality #Dash #Data Visualization

Today I will continue working with the data collected from the Air Quality Service API. I will use the CSV file that was created in this project and build another Dash app on top of it. This time the goal of the app is to return the most recent sensor measurements for the chosen city (monitoring station). The deployed app can be seen here (it may take up to 30 s to load, since the app is deployed on a free tier).

Let's start with what the Air Quality Index actually is. There are several methodologies for calculating the index, depending on the region. The Polish methodology calculates the index according to this table and is based on 1-hour measurements of the concentrations of the following pollutants: sulfur dioxide (SO2), nitrogen dioxide (NO2), PM10 dust, PM2.5 dust, and ozone (O3). The general index then takes the value of the worst individual index among the pollutants measured at the monitoring station. This app skips that last step and returns only the individual indexes. Some monitoring stations measure additional pollutants, for example C6H6. In these cases the app reports that the pollutant is not included in the AQI.
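Although the app stops at the individual indexes, the skipped last step is short. Here is a minimal sketch of it, assuming the six category names used in this post, ordered from best to worst. The `overall_index` helper and the sample categories are made up for illustration and are not part of the app:

```python
# Category names as used in this post, ordered from best to worst.
CATEGORY_ORDER = ['Very Good', 'Good', 'Moderate', 'Passable', 'Bad', 'Very Bad']


def overall_index(individual_categories):
    """Return the worst category among the individual indexes (hypothetical helper)."""
    # Pollutants outside the index (e.g. C6H6) carry no category and are ignored.
    ranked = [c for c in individual_categories if c in CATEGORY_ORDER]
    if not ranked:
        return None  # nothing measured at the station counts towards the index
    return max(ranked, key=CATEGORY_ORDER.index)


# Hypothetical station: PM10 is 'Good', NO2 is 'Very Good', O3 is 'Moderate'
print(overall_index(['Good', 'Very Good', 'Moderate']))  # Moderate
```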

The very first step in building this app is to create a dictionary with all 5 pollutants as keys. Each value is a list of tuples with 3 elements: the first two give the concentration range and the third gives the corresponding AQI category.

aqi_ranges = {
    'PM10': [(0, 20, 'Very Good'), (20.1, 50, 'Good'), (50.1, 80, 'Moderate'), (80.1, 110, 'Passable'),
             (110.1, 150, 'Bad'), (150.1, 1000, 'Very Bad')],
    'PM2.5': [(0, 13, 'Very Good'), (13.1, 35, 'Good'), (35.1, 55, 'Moderate'), (55.1, 75, 'Passable'),
              (75.1, 110, 'Bad'), (110.1, 1000, 'Very Bad')],
    'O3': [(0, 70, 'Very Good'), (70.1, 120, 'Good'), (120.1, 150, 'Moderate'), (150.1, 180, 'Passable'),
           (180.1, 240, 'Bad'), (240.1, 1000, 'Very Bad')],
    'NO2': [(0, 40, 'Very Good'), (40.1, 100, 'Good'), (100.1, 150, 'Moderate'), (150.1, 230, 'Passable'),
            (230.1, 400, 'Bad'), (400.1, 1500, 'Very Bad')],
    'SO2': [(0, 50, 'Very Good'), (50.1, 100, 'Good'), (100.1, 200, 'Moderate'), (200.1, 350, 'Passable'),
            (350.1, 500, 'Bad'), (500.1, 2000, 'Very Bad')]
}

The next step is to define 2 functions. Since the application returns the most recent measurements for the chosen city (monitoring station), the first function needs to collect all sensors installed at the chosen monitoring station and then fetch the measurements from those sensors. This can be achieved by filtering our dataframe (loaded from monitoring_stations_pl.csv), making a list of sensor IDs and iterating over it. In addition, based on the dictionary above, the function assigns an AQI category by calling the second function (get_aqi_category).

Since we are interested only in the most recent measurement rather than in all 1-hour measurements, we need another for loop that breaks as soon as it finds the first value that is not None. Initially I intended the loop to return the very first value, but I soon realized that the measurements from individual sensors are not synchronized, which results in None values for some sensors.
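The selection step on its own can be sketched with a hand-written payload. The shape below (a `key` plus a list of `values`, newest first) mirrors the fields that the code in this post reads from the response. The `date` field, the `latest_reading` helper and all numbers are illustrative assumptions, not real measurements:

```python
# Hand-written sample shaped like the fields read from the API response.
# The 'date' field and all numbers are made up for illustration.
sample = {
    'key': 'PM10',
    'values': [
        {'date': '2024-01-01 12:00:00', 'value': None},  # newest, no value yet
        {'date': '2024-01-01 11:00:00', 'value': None},
        {'date': '2024-01-01 10:00:00', 'value': 18.4},  # first usable reading
        {'date': '2024-01-01 09:00:00', 'value': 21.0},
    ],
}


def latest_reading(values):
    """Return the first entry whose value is not None, or None if there is none."""
    return next((v for v in values if v.get('value') is not None), None)


reading = latest_reading(sample['values'])
print(reading['date'], reading['value'])  # 2024-01-01 10:00:00 18.4
```

Comparing against None rather than relying on truthiness matters here: a legitimate reading of 0 would otherwise be skipped.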

import pandas as pd
import requests
from dash import html

# Stations and their sensors, collected in the previous project
df = pd.read_csv('monitoring_stations_pl.csv')


def get_air_quality_data(station):
    # All sensors installed at the chosen monitoring station
    sensors = df[df['station_name'] == station]['sensor_id'].tolist()
    all_data = []

    for sensor_id in sensors:
        api_endpoint = f'https://api.gios.gov.pl/pjp-api/rest/data/getData/{sensor_id}'
        try:
            # Connection problems raise an exception instead of returning a status code
            response = requests.get(api_endpoint, timeout=10)
            response.raise_for_status()
        except requests.RequestException as exc:
            print(f"Error while collecting data from API for sensor ID {sensor_id}: {exc}")
            return html.Div(
                f'Error while collecting data from API for sensor ID {sensor_id}. Check internet connection.')

        payload = response.json()  # parse the response body only once
        values = payload.get('values')
        if not values:
            print(f"No values found for sensor ID {sensor_id}")
            continue

        # Sensors are not synchronized, so take the first entry that has a value
        sensor_data = next((v for v in values if v.get('value') is not None), None)
        pollutant = payload.get('key')
        if sensor_data is None or pollutant is None:
            continue

        aqi_category = get_aqi_category(pollutant, sensor_data['value'])
        all_data.append(
            {'sensor_id': sensor_id, 'key': pollutant, 'value': sensor_data, 'aqi_category': aqi_category}
        )

    return all_data

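def get_aqi_category(pollutant, value):
    # Pollutants outside the index (e.g. C6H6) have no ranges defined
    ranges = aqi_ranges.get(pollutant)
    if not ranges:
        return "Not included in AQI index"

    # Compare against the upper bounds only, so that a reading between two
    # listed ranges (e.g. 20.05 for PM10) still gets a category
    for _, upper, category in ranges:
        if value <= upper:
            return category

    # Above the last upper bound the worst category still applies
    return ranges[-1][2]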

I thought it would be nice to visualize the location of the chosen monitoring station. In order to do so, let's define another function:

import plotly.express as px


def generate_map(station):
    # The first row matching the station supplies its coordinates
    station_data = df[df['station_name'] == station].iloc[0]
    fig = px.scatter_mapbox(
        lat=[station_data['latitude']],
        lon=[station_data['longitude']],
        hover_name=[station],
        zoom=10,
        height=500
    )
    # OpenStreetMap tiles do not require a Mapbox access token
    fig.update_layout(mapbox_style="open-street-map", margin={"r": 0, "t": 0, "l": 0, "b": 0})
    return fig

With the functions ready, we can focus on building the Dash app. To let the user choose the city (monitoring station), we need a dropdown component whose options are the unique values of the station_name column in our df.

import dash
from dash import dcc, html

app = dash.Dash(__name__)

# Unique station names, computed once and reused for the options and the default
stations = df['station_name'].unique()

app.layout = html.Div([
    html.Label('Choose monitoring station:'),
    dcc.Dropdown(
        id='station-dropdown',
        options=[{'label': station, 'value': station} for station in stations],
        value=stations[0],  # First station is set as default value
        style={'width': '50%'}
    ),
    html.Br(),
    html.Div(id='output-container'),
    html.Br(),
    dcc.Graph(id='map')
])

Based on the input (chosen city), the app builds an HTML table with the required data and shows the location of the monitoring station on the map. In addition, the background color of the assigned AQI category changes with the returned value to strengthen the message.
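The coloring can be sketched without any Dash code. The snippet below is only an assumption about how such a mapping might look: the colors, the `CATEGORY_COLORS` name and the `category_style` helper are hypothetical and not taken from the app. The returned dictionary has the shape that Dash HTML components accept in their `style` argument:

```python
# Hypothetical color per AQI category; the deployed app may use different ones.
CATEGORY_COLORS = {
    'Very Good': '#2e8b57',
    'Good': '#9acd32',
    'Moderate': '#ffd700',
    'Passable': '#ffa500',
    'Bad': '#ff4500',
    'Very Bad': '#8b0000',
}


def category_style(aqi_category):
    """Build a CSS style dict for a table cell showing the given category."""
    # Anything outside the index (e.g. C6H6) falls back to a neutral grey.
    color = CATEGORY_COLORS.get(aqi_category, '#d3d3d3')
    return {'backgroundColor': color, 'padding': '4px 8px'}


print(category_style('Good'))
print(category_style('Not included in AQI index'))
```

Inside a callback, the result could then be passed to a table cell, for example `html.Td(aqi_category, style=category_style(aqi_category))`.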

The project repository is available here.
