surveyweathertool.src.weather.weather_pipeline

Module Contents

Functions

aggr_monthly(df[, column_aggr])

Aggregates the weather data in the DataFrame on a monthly basis.

aggr_yearly(df[, column_aggr])

Aggregates the weather data in the DataFrame on a yearly basis.

convert_point_crs(df, target_epsg[, source_epsg])

Convert a DataFrame with 'lon' and 'lat' columns to a GeoDataFrame with specified geometry.

convert_map_crs(→ geopandas.GeoDataFrame)

Convert the CRS of a DataFrame or GeoDataFrame to a specified EPSG code.

aggr_seasonal(df)

Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values.

aggr_seosonal_nigeria(df)

Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values.

combine(map_df, other_df, col_dissolve[, group_cols, ...])

Function to combine two GeoDataFrames based on a spatial join operation and group by operations.

create_df_final(nigeria_shape_df, weather_df, ...)

This function creates a final dataframe by combining weather and shapefile data.

extreme_thresholding_delta(→ Tuple[float, bool])

Calculate the delta between an event's value and its threshold value, and check if it exceeds a delta parameter.

create_extreme_weather_event_delta(→ None)

Calculate deltas and identify extreme weather events based on given thresholds.

heatwave_heavy_rainfall_indicators(data, event, ...[, ...])

Compute heatwave or heavy rainfall indicators for extreme weather event definition.

combine_map_weather(nigeria_shape_df, weather_df, ...)

Combines map data with different types of weather data and returns a unified dataframe.

calculate_SPI(df[, column_rol])

Calculates the Standardized Precipitation Index (SPI) for given input data.

xarray_upsample(da, lat_res, lon_res, method)

Upsample an xarray DataArray based on the provided latitude and longitude resolutions.

clip_data(df, geo_df[, epsg])

Processes and clips weather data in the DataFrame to a specified geographical region.

interpolate_data(df, geo_df, value_col[, lat_res, ...])

Processes and interpolates weather data for each unique date in the DataFrame, and clips the interpolated data to a specified geographical region.

clip_dataarray(→ xarray.DataArray)

Clip a DataArray based on the shape of a GeoDataFrame.

plot_heatmap_grid_on_map(df, value_col, geo_df, cmap, ...)

Plots a heatmap for a given value column on a grid based on the provided GeoDataFrame.

interpolate_data_for_all_dates(df, geo_df, value_col)

Interpolates weather data for each unique date in the DataFrame.

surveyweathertool.src.weather.weather_pipeline.aggr_monthly(df, column_aggr='')

Aggregates the weather data in the DataFrame on a monthly basis.

Parameters:

- df (pandas.DataFrame): DataFrame containing weather data with a DateTime column.

- column_aggr (str): Column to aggregate, for example, temp or precip.

Returns: df_monthly (pandas.DataFrame): New DataFrame containing the mean, max, and min weather grouped by month, year, latitude, and longitude.
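The grouping described above can be sketched with plain pandas. This is a minimal illustration, not the module's implementation: the helper name `monthly_stats` and the assumption that the timestamp column is called `date` are hypothetical.

```python
import pandas as pd


def monthly_stats(df: pd.DataFrame, column_aggr: str) -> pd.DataFrame:
    """Group by year, month, lat and lon, then take mean, max and min."""
    out = df.copy()
    # Assumption: the timestamp column is named 'date'.
    out["date"] = pd.to_datetime(out["date"])
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    return (
        out.groupby(["year", "month", "lat", "lon"])[column_aggr]
        .agg(["mean", "max", "min"])
        .reset_index()
    )


weather = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-02", "2020-02-01"],
        "lat": [9.0, 9.0, 9.0],
        "lon": [7.5, 7.5, 7.5],
        "temp": [30.0, 32.0, 28.0],
    }
)
monthly = monthly_stats(weather, "temp")
```

The result has one row per (year, month, lat, lon) combination with `mean`, `max`, and `min` columns, which is the column layout that aggr_seasonal documents as its input.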

surveyweathertool.src.weather.weather_pipeline.aggr_yearly(df, column_aggr='')

Aggregates the weather data in the DataFrame on a yearly basis.

Parameters:

- df (pandas.DataFrame): DataFrame containing weather data with a DateTime index.

- column_aggr (str): Column to aggregate, for example, temp or precip.

Returns: pandas.DataFrame: New DataFrame containing the mean, max, and min weather grouped by year, latitude, and longitude.

surveyweathertool.src.weather.weather_pipeline.convert_point_crs(df, target_epsg, source_epsg=4326)

Convert a DataFrame with ‘lon’ and ‘lat’ columns to a GeoDataFrame with specified geometry.

Parameters:

- df (pd.DataFrame): Input DataFrame with ‘lon’ and ‘lat’ columns.

- target_epsg (str): Desired EPSG code for output GeoDataFrame.

- source_epsg (str, optional): The EPSG code of the input DataFrame’s ‘lon’ and ‘lat’ columns. Default is 4326 (WGS84 latitude-longitude).

Returns: - gpd.GeoDataFrame: GeoDataFrame with geometry set and reprojected to the desired EPSG.

surveyweathertool.src.weather.weather_pipeline.convert_map_crs(geo_df: pandas.DataFrame, epsg: int) → geopandas.GeoDataFrame

Convert the CRS of a DataFrame or GeoDataFrame to a specified EPSG code.

Parameters:

- geo_df (pd.DataFrame): The input DataFrame or GeoDataFrame.

- epsg (int): The EPSG code for the desired Coordinate Reference System.

Returns: - gpd.GeoDataFrame: The GeoDataFrame with the new CRS.

surveyweathertool.src.weather.weather_pipeline.aggr_seasonal(df)

Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values.

This function adds a new column ‘season’ to the dataframe, which categorizes each month into one of the four seasons: Spring (March, April, May), Summer (June, July, August), Autumn (September, October, November), Winter (December, January, February). It then groups the data by season, latitude, longitude, and year and calculates the mean values for ‘mean’, ‘max’, and ‘min’ columns.

Parameters: df (pandas.DataFrame): A pandas DataFrame that contains columns ‘month’, ‘lat’, ‘lon’, ‘year’, ‘mean’, ‘max’, and ‘min’.

Returns: df_seasonal (pandas.DataFrame): A pandas DataFrame with the aggregated data.
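The month-to-season mapping and the grouping described above can be sketched as follows. The helper name `seasonal_means` and the lookup table `SEASON_BY_MONTH` are hypothetical and only illustrate the idea; the module's own code may differ.

```python
import pandas as pd

# Month number -> season, following the grouping described above.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def seasonal_means(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 'season' column and average 'mean', 'max', 'min' per group."""
    out = df.copy()
    out["season"] = out["month"].map(SEASON_BY_MONTH)
    # Note: grouping by calendar year puts December together with
    # January and February of the same year.
    return (
        out.groupby(["season", "lat", "lon", "year"])[["mean", "max", "min"]]
        .mean()
        .reset_index()
    )


monthly = pd.DataFrame(
    {
        "year": [2020, 2020, 2020],
        "month": [3, 4, 7],
        "lat": [9.0, 9.0, 9.0],
        "lon": [7.5, 7.5, 7.5],
        "mean": [30.0, 32.0, 27.0],
        "max": [35.0, 37.0, 30.0],
        "min": [25.0, 27.0, 22.0],
    }
)
seasonal = seasonal_means(monthly)
```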

surveyweathertool.src.weather.weather_pipeline.aggr_seosonal_nigeria(df)

Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values. This function is the same as aggr_seasonal but divides the year into Nigeria’s two seasons.

surveyweathertool.src.weather.weather_pipeline.combine(map_df, other_df, col_dissolve, group_cols=None, agg_dict={}, method='intersects')

Function to combine two GeoDataFrames based on a spatial join operation and group by operations.

Parameters

map_df: geopandas.GeoDataFrame

The GeoDataFrame with the geometry column to be used for the spatial join.

other_df: geopandas.GeoDataFrame

The GeoDataFrame with data to be joined with map_df.

col_dissolve: str

The column name in map_df used to dissolve the geometry into unique polygons.

group_cols: list, optional

List of columns to group by in addition to col_dissolve and geometry.

agg_dict: dict, optional

Dictionary of aggregation functions for specified columns.

method: str, optional

The method to be used for the spatial join operation. It could be “intersects”, “within”, “contains”, etc. Default is “intersects”.

Returns

admin_db: geopandas.GeoDataFrame

The GeoDataFrame resulting from the spatial join and group by operations.

surveyweathertool.src.weather.weather_pipeline.create_df_final(nigeria_shape_df, weather_df, col_dissolve, target_epsg, weather_data_name, agg_dict, level)

This function creates a final dataframe by combining weather and shapefile data.

Parameters:

nigeria_shape_df: geopandas.GeoDataFrame

Nigeria map with admin columns.

weather_df: geopandas.GeoDataFrame

Weather data with columns [‘precipitation’, ‘temperature’, ‘spi’, ‘heatwaves’, ‘heavyrain’].

col_dissolve: str

Column name on which to dissolve boundaries when merging the shapefile and weather data.

target_epsg: int

Target EPSG code to use.

weather_data_name: str

Name of the weather variable, for example precipitation or temperature. Currently not used properly; it is intended to be used once each weather event has its own position in the dashboard.

agg_dict: dict

Dictionary with column names and the aggregation method to perform.

level: str

“month” or “season” level to group on.

Returns:

pandas.DataFrame

A dataframe containing the merged shapefile and weather data.

surveyweathertool.src.weather.weather_pipeline.extreme_thresholding_delta(row: Dict[str, Any], event: str, geo_columns: List[str], threshold_values: Dict[str, List[float]], delta_param: int | None) → Tuple[float, bool]

Calculate the delta between an event’s value and its threshold value, and check if it exceeds a delta parameter.

Parameters:

row (Dict[str, Any]): A dictionary representing a row of data.

event (str): The name of the event.

geo_columns (List[str]): List of column names containing longitude and latitude.

threshold_values (Dict[str, List[float]]): A dictionary of threshold values indexed by unique geo keys.

delta_param (Optional[int]): The delta parameter value. If not provided, defaults to 0.

Returns:

Tuple[float, bool]: A tuple containing the calculated delta and a boolean indicating if the delta exceeds the delta parameter.
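The delta check described above can be sketched in plain Python. The reference does not say how the geo key is built or how one threshold is picked from the list stored under that key, so both are assumptions here: the key joins the geo column values with an underscore, and a hypothetical `day_index` field in the row selects the threshold. The function name `threshold_delta` is also hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def threshold_delta(
    row: Dict[str, Any],
    event: str,
    geo_columns: List[str],
    threshold_values: Dict[str, List[float]],
    delta_param: Optional[int] = None,
) -> Tuple[float, bool]:
    # Assumption: the geo key joins the row's geo column values with "_".
    key = "_".join(str(row[col]) for col in geo_columns)
    # Assumption: a hypothetical 'day_index' field selects the threshold
    # for that day from the list stored under the geo key.
    threshold = threshold_values[key][row["day_index"]]
    delta = row[event] - threshold
    # A missing delta parameter is treated as 0.
    limit = 0 if delta_param is None else delta_param
    return delta, delta > limit


row = {"lat": 9.0, "lon": 7.5, "day_index": 1, "temperature": 41.0}
thresholds = {"9.0_7.5": [33.0, 34.0, 35.0]}
delta, is_extreme = threshold_delta(
    row, "temperature", ["lat", "lon"], thresholds, delta_param=5
)
```

Here the observed value is 7 degrees above the threshold for that day, which is more than the delta parameter of 5, so the row is flagged as extreme.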

surveyweathertool.src.weather.weather_pipeline.create_extreme_weather_event_delta(data: pandas.DataFrame, event: str, geo_columns: List[str], daily_thresholds: Dict[str, List[float]], delta_param: int | None = None) → None

Calculate deltas and identify extreme weather events based on given thresholds.

Parameters:

data (pd.DataFrame): DataFrame containing weather data.

event (str): The name of the event.

geo_columns (List[str]): List of column names containing longitude and latitude.

daily_thresholds (Dict[str, List[float]]): A dictionary of daily threshold values indexed by unique geo keys.

delta_param (Optional[int]): The delta parameter value. If not provided, defaults to None.

Returns:

None: The function modifies the ‘data’ DataFrame by adding ‘delta’ and ‘extreme’ columns.

surveyweathertool.src.weather.weather_pipeline.heatwave_heavy_rainfall_indicators(data: pandas.DataFrame, event: str, geo_columns: List[str], rolling_window: int = 3, delta_param: int = 5, days_param: int = 3, batch_size: int = 3)

Compute heatwave or heavy rainfall indicators for extreme weather event definition.

Parameters:

data (pd.DataFrame): The DataFrame containing weather data with ‘date’, ‘event’, and specified geo_columns.

event (str): The weather event name (e.g., ‘temperature’, ‘rainfall’).

geo_columns (List[str]): List of columns representing geographic information.

on_survey (bool, optional): Flag indicating whether the function is being run on a survey. Default is False.

rolling_window (int, optional): Rolling window size for computing daily averaging thresholds. Default is 3.

delta_param (int, optional): Delta parameter for creating extreme weather event delta. Default is 5.

days_param (int, optional): Days parameter for creating extreme weather event delta. Default is 3.

Returns:

None. Computes and saves the relevant indicators based on the specified parameters.
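One common way to turn per-day extreme flags into a heatwave or heavy-rainfall indicator is to keep only runs of at least `days_param` consecutive extreme days. The sketch below shows that run-length step for a single location. It assumes a boolean `extreme` column already exists, and it is not confirmed to be how this function uses `days_param`. The helper name `flag_events` is hypothetical, and the rolling-threshold and delta steps are not shown.

```python
import pandas as pd


def flag_events(extreme: pd.Series, days_param: int = 3) -> pd.Series:
    """Mark days that belong to a run of at least `days_param` extreme days."""
    extreme = extreme.astype(bool)
    # A new run starts whenever the flag changes from the previous day.
    run_id = extreme.astype(int).diff().ne(0).cumsum()
    # Length of the run each day belongs to.
    run_length = extreme.groupby(run_id).transform("size")
    return extreme & (run_length >= days_param)


daily = pd.DataFrame(
    {
        "date": pd.date_range("2020-03-01", periods=8, freq="D"),
        "extreme": [False, True, True, True, False, True, True, False],
    }
)
daily["heatwave"] = flag_events(daily["extreme"], days_param=3)
```

With these values only the three-day run from 2 to 4 March is kept; the two-day run that follows is too short to count.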

surveyweathertool.src.weather.weather_pipeline.combine_map_weather(nigeria_shape_df, weather_df, col_dissolve, weather_data_name, agg_dict, level, target_epsg)

Combines map data with different types of weather data and returns a unified dataframe.

Parameters:

nigeria_shape_df: geopandas.GeoDataFrame

Nigeria map with administrative columns.

weather_precipitation_df: geopandas.GeoDataFrame

Weather data containing precipitation.

weather_temperature_df: geopandas.GeoDataFrame

Weather data containing temperature.

weather_spi_df: geopandas.GeoDataFrame

Weather data containing spi.

col_dissolve: str

Column name to dissolve boundaries when merging the shapefile and weather data.

weather_data_name: str

Specifies the type of weather data. Options include “temperature”, “precipitation”, “spi”, “heatwave”, and “heavyrain”.

agg_dict: dict

Dictionary with column names and the aggregation method to perform.

level: str

Specifies the level to group on. e.g. “month” or “season”.

target_epsg: int

The target epsg code to use for the geographical transformations.

Returns:

pandas.DataFrame

A dataframe containing the merged shapefile and specified weather data.

surveyweathertool.src.weather.weather_pipeline.calculate_SPI(df, column_rol='precipitation')

Calculates the Standardized Precipitation Index (SPI) for given input data.

Parameters:

- df (pd.DataFrame): Input DataFrame containing precipitation data.

- column_rol (str, optional): Column in the DataFrame containing precipitation data that will be used to calculate the rolling mean. Defaults to ‘precipitation’.

- save (bool, optional): Whether to save the SPI to a CSV file. Defaults to False.

- save (str, optional): Path to save the SPI file. Defaults to spi_file.csv.

Returns:

- df (pd.DataFrame): DataFrame with added columns for 3-month rolling mean of precipitation, cumulative probability, and the 3-month SPI.
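The three derived columns can be illustrated with a simplified sketch. It assumes one row per month for a single location, and it swaps in an empirical, rank-based cumulative probability in place of fitting a distribution (SPI is conventionally computed from a fitted gamma distribution), so the numbers will differ from the module's output. The names `spi_sketch`, `rolling_mean`, `cum_prob`, and `spi_3` are hypothetical.

```python
from statistics import NormalDist

import pandas as pd


def spi_sketch(df: pd.DataFrame, column_rol: str = "precipitation") -> pd.DataFrame:
    out = df.copy()
    # 3-month rolling mean of precipitation (the first two rows are NaN).
    out["rolling_mean"] = out[column_rol].rolling(window=3).mean()
    valid = out["rolling_mean"].dropna()
    # Simplification: empirical (rank-based) cumulative probability,
    # kept strictly inside (0, 1) so the inverse normal CDF is finite.
    out["cum_prob"] = valid.rank(method="average") / (len(valid) + 1)
    # Map each probability to a standard normal quantile.
    inv_cdf = NormalDist().inv_cdf
    out["spi_3"] = out["cum_prob"].map(inv_cdf, na_action="ignore")
    return out


monthly_rain = pd.DataFrame(
    {"precipitation": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]}
)
result = spi_sketch(monthly_rain)
```

Negative values of `spi_3` mark drier-than-median periods and positive values mark wetter ones.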

surveyweathertool.src.weather.weather_pipeline.xarray_upsample(da, lat_res, lon_res, method)

Upsample an xarray DataArray based on the provided latitude and longitude resolutions.

Parameters:

da: xarray.DataArray

The input DataArray to be upsampled.

lat_res: float

The desired latitude resolution for upsampling.

lon_res: float

The desired longitude resolution for upsampling.

method: str

The interpolation method to be used for upsampling. Supported methods are the same as those for xarray’s interp method (e.g., “linear”, “nearest”).

Returns:

xarray.DataArray

The upsampled DataArray with new latitude and longitude resolutions.

surveyweathertool.src.weather.weather_pipeline.clip_data(df: pandas.DataFrame, geo_df: geopandas.GeoDataFrame, epsg: int = 4326)

Processes and clips weather data in the DataFrame to a specified geographical region.

Parameters:

df: pd.DataFrame

Input DataFrame with weather data that includes columns: date, lat, lon, and the specified value (e.g., temperature).

geo_df: gpd.GeoDataFrame

Geographic DataFrame (e.g., a map of Nigeria) used to clip the weather data.

epsg: int, optional

EPSG code to set the Coordinate Reference System (CRS) for the data. Defaults to 4326 (WGS 84).

Returns:

pd.DataFrame

A DataFrame with the combined, processed, and interpolated weather data, clipped to the specified region (from ‘geo_df’). The DataFrame includes columns: lat, lon, the specified weather value (e.g., temperature), and date.

surveyweathertool.src.weather.weather_pipeline.interpolate_data(df: geopandas.GeoDataFrame, geo_df: geopandas.GeoDataFrame, value_col: str, lat_res: float = 0.1, lon_res: float = 0.1, method='linear', epsg: int = 4326)

Processes and interpolates weather data for each unique date in the DataFrame, and clips the interpolated data to a specified geographical region.

Parameters:

df: gpd.GeoDataFrame

Input DataFrame with weather data that includes columns: date, lat, lon, and the specified value (e.g., temperature).

geo_df: gpd.GeoDataFrame

Geographic DataFrame (e.g., a map of Nigeria) used to clip the interpolated weather data.

value_col: str

Column name in ‘df’ with weather values (e.g., temperature) to be visualized and interpolated.

lat_res: float, optional

Latitude resolution for interpolation. Defaults to 0.1.

lon_res: float, optional

Longitude resolution for interpolation. Defaults to 0.1.

method: str, optional

Interpolation method to use. Can be ‘linear’, ‘nearest’, etc. depending on the available methods in the underlying library. Defaults to ‘linear’.

epsg: int, optional

EPSG code to set the Coordinate Reference System (CRS) for the data. Defaults to 4326 (WGS 84).

Returns:

pd.DataFrame

A DataFrame with the combined, processed, and interpolated weather data, clipped to the specified region (from ‘geo_df’). The DataFrame includes columns: lat, lon, the specified weather value (e.g., temperature), and date.

surveyweathertool.src.weather.weather_pipeline.clip_dataarray(da: xarray.DataArray, geo_df: geopandas.GeoDataFrame) → xarray.DataArray

Clip a DataArray based on the shape of a GeoDataFrame.

Parameters:

- da (xarray.DataArray): The input DataArray.

- geo_df (gpd.GeoDataFrame): The GeoDataFrame to use for clipping.

Returns: - xarray.DataArray: The clipped DataArray.

surveyweathertool.src.weather.weather_pipeline.plot_heatmap_grid_on_map(df: geopandas.GeoDataFrame, value_col: str, geo_df: geopandas.GeoDataFrame, cmap: str, legend_title: str, clip: bool = False, epsg: int = 4326)

Plots a heatmap for a given value column on a grid based on the provided GeoDataFrame. NB: Use ‘Reds’ for temperature-related plots and ‘Greens’ for precipitation-related plots.

Parameters:

df: gpd.GeoDataFrame

The GeoDataFrame containing the data to be plotted.

value_col: str

The column name in df which contains the values to be plotted on the heatmap.

geo_df: gpd.GeoDataFrame

The GeoDataFrame to use for clipping. Defaults to None.

clip: bool, optional

Clips the data to the map. If True, geo_df should be provided.

epsg: int, optional (default=4326)

The EPSG code representing the coordinate reference system (CRS) of the input data. Default is EPSG:4326 (WGS 84).

Returns:

None

Displays a heatmap plot.

surveyweathertool.src.weather.weather_pipeline.interpolate_data_for_all_dates(df: geopandas.GeoDataFrame, geo_df: geopandas.GeoDataFrame, value_col: str, lat_res: float = 0.1, lon_res: float = 0.1, method: str = 'linear', epsg: int = 4326)

Interpolates weather data for each unique date in the DataFrame.

Parameters are the same as in the ‘interpolate_data’ function.

Returns:

pd.DataFrame

A DataFrame with interpolated weather data for all dates, clipped to the specified region.