surveyweathertool.src.weather.weather_pipeline¶
Module Contents¶
Functions¶
aggr_monthly | Aggregates the weather data in the DataFrame on a monthly basis.
aggr_yearly | Aggregates the weather data in the DataFrame on a yearly basis.
convert_point_crs | Convert a DataFrame with 'lon' and 'lat' columns to a GeoDataFrame with specified geometry.
convert_map_crs | Convert the CRS of a DataFrame or GeoDataFrame to a specified EPSG code.
aggr_seasonal | Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values.
aggr_seosonal_nigeria | Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values.
combine | Combine two GeoDataFrames based on a spatial join operation and group-by operations.
create_df_final | Create a final dataframe by combining weather and shapefile data.
extreme_thresholding_delta | Calculate the delta between an event's value and its threshold value, and check if it exceeds a delta parameter.
create_extreme_weather_event_delta | Calculate deltas and identify extreme weather events based on given thresholds.
heatwave_heavy_rainfall_indicators | Compute heatwave or heavy rainfall indicators for extreme weather event definition.
combine_map_weather | Combines map data with different types of weather data and returns a unified dataframe.
calculate_SPI | Calculates the Standardized Precipitation Index (SPI) for given input data.
xarray_upsample | Upsample an xarray DataArray based on the provided latitude and longitude resolutions.
clip_data | Processes and clips weather data in the DataFrame to a specified geographical region.
interpolate_data | Processes and interpolates weather data for each unique date in the DataFrame, and clips the interpolated data to a specified geographical region.
clip_dataarray | Clip a DataArray based on the shape of a GeoDataFrame.
plot_heatmap_grid_on_map | Plots a heatmap for a given value column on a grid based on the provided GeoDataFrame.
interpolate_data_for_all_dates | Interpolates weather data for each unique date in the DataFrame.
- surveyweathertool.src.weather.weather_pipeline.aggr_monthly(df, column_aggr='')¶
Aggregates the weather data in the DataFrame on a monthly basis.
Parameters:
- df (pandas.DataFrame): DataFrame containing weather data with a DateTime column.
- column_aggr (str): Column to aggregate, for example temp or precip.
Returns: df_monthly (pandas.DataFrame): New DataFrame containing the mean, max, and min weather grouped by month, year, latitude, and longitude.
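The aggregation described above can be sketched with a plain pandas group-by. This is a minimal illustration and not the module's implementation: the function name `monthly_stats` and the column names `date`, `lat`, `lon`, and `temp` are assumptions made for the example.

```python
import pandas as pd


def monthly_stats(df: pd.DataFrame, column_aggr: str) -> pd.DataFrame:
    """Hypothetical stand-in for aggr_monthly: mean/max/min per month, year, lat, lon."""
    out = df.copy()
    # Assumption: the DateTime information lives in a column named "date".
    out["date"] = pd.to_datetime(out["date"])
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    return (
        out.groupby(["year", "month", "lat", "lon"])[column_aggr]
        .agg(["mean", "max", "min"])
        .reset_index()
    )


sample = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-02", "2020-02-01"],
        "lat": [9.0, 9.0, 9.0],
        "lon": [7.5, 7.5, 7.5],
        "temp": [30.0, 32.0, 28.0],
    }
)
monthly = monthly_stats(sample, "temp")
```

The result has one row per (year, month, lat, lon) combination, with `mean`, `max`, and `min` columns that match the input expected by the seasonal aggregation functions.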
- surveyweathertool.src.weather.weather_pipeline.aggr_yearly(df, column_aggr='')¶
Aggregates the weather data in the DataFrame on a yearly basis.
Parameters:
- df (pandas.DataFrame): DataFrame containing weather data with a DateTime index.
- column_aggr (str): Column to aggregate, for example temp or precip.
Returns: pandas.DataFrame: New DataFrame containing the mean, max, and min weather grouped by year, latitude, and longitude.
- surveyweathertool.src.weather.weather_pipeline.convert_point_crs(df, target_epsg, source_epsg=4326)¶
Convert a DataFrame with ‘lon’ and ‘lat’ columns to a GeoDataFrame with specified geometry.
Parameters:
- df (pd.DataFrame): Input DataFrame with ‘lon’ and ‘lat’ columns.
- target_epsg (str): Desired EPSG code for the output GeoDataFrame.
- source_epsg (str, optional): The EPSG code of the input DataFrame’s ‘lon’ and ‘lat’ columns. Default is 4326 (EPSG:4326, WGS84 latitude-longitude).
Returns:
- gpd.GeoDataFrame: GeoDataFrame with geometry set and reprojected to the desired EPSG.
- surveyweathertool.src.weather.weather_pipeline.convert_map_crs(geo_df: pandas.DataFrame, epsg: int) → geopandas.GeoDataFrame¶
Convert the CRS of a DataFrame or GeoDataFrame to a specified EPSG code.
Parameters:
- geo_df (pd.DataFrame): The input DataFrame or GeoDataFrame.
- epsg (int): The EPSG code for the desired Coordinate Reference System.
Returns:
- gpd.GeoDataFrame: The GeoDataFrame with the new CRS.
- surveyweathertool.src.weather.weather_pipeline.aggr_seasonal(df)¶
Aggregate data by season, latitude, longitude, and year, and calculates the mean, max, and min values.
This function adds a new column ‘season’ to the dataframe, which categorizes each month into one of the four seasons: Spring (March, April, May), Summer (June, July, August), Autumn (September, October, November), Winter (December, January, February). It then groups the data by season, latitude, longitude, and year and calculates the mean values for ‘mean’, ‘max’, and ‘min’ columns.
Parameters: df (pandas.DataFrame): A pandas DataFrame that contains columns ‘month’, ‘lat’, ‘lon’, ‘year’, ‘mean’, ‘max’, and ‘min’.
Returns: df_seasonal (pandas.DataFrame): A pandas DataFrame with the aggregated data.
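The month-to-season mapping and grouping described above can be sketched as follows. The mapping comes from the description; the helper name `seasonal_means` and the sample values are assumptions for illustration, not the module's code.

```python
import pandas as pd

# Month-to-season mapping as listed in the description above.
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
    12: "Winter", 1: "Winter", 2: "Winter",
}


def seasonal_means(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for aggr_seasonal."""
    out = df.copy()
    out["season"] = out["month"].map(SEASON_BY_MONTH)
    # Average the monthly 'mean', 'max', and 'min' columns within each group.
    return out.groupby(["season", "lat", "lon", "year"], as_index=False)[
        ["mean", "max", "min"]
    ].mean()


sample = pd.DataFrame(
    {
        "month": [3, 4, 12],
        "year": [2020, 2020, 2020],
        "lat": [9.0, 9.0, 9.0],
        "lon": [7.5, 7.5, 7.5],
        "mean": [30.0, 32.0, 26.0],
        "max": [35.0, 37.0, 31.0],
        "min": [24.0, 26.0, 20.0],
    }
)
seasonal = seasonal_means(sample)
```

Because the grouping uses the calendar year column, December in this sketch is grouped with January and February of the same year rather than with the following winter.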
- surveyweathertool.src.weather.weather_pipeline.aggr_seosonal_nigeria(df)¶
Aggregate data by season, latitude, longitude, and year, and calculate the mean, max, and min values. This function is the same as aggr_seasonal but uses Nigeria’s two seasons instead of four.
- surveyweathertool.src.weather.weather_pipeline.combine(map_df, other_df, col_dissolve, group_cols=None, agg_dict={}, method='intersects')¶
Function to combine two GeoDataFrames based on a spatial join operation and group by operations.
Parameters¶
- map_df: geopandas.GeoDataFrame
The GeoDataFrame with the geometry column to be used for the spatial join.
- other_df: geopandas.GeoDataFrame
The GeoDataFrame with data to be joined with map_df.
- col_dissolve: str
The column name in map_df used to dissolve the geometry into unique polygons.
- group_cols: list
List of columns to group by in addition to col_dissolve and geometry.
- agg_dict: dict
Dictionary of aggregation functions for specified columns.
- method: str, optional
The method to be used for the spatial join operation. It could be “intersects”, “within”, “contains”, etc. Default is “intersects”.
Returns¶
- admin_db: geopandas.GeoDataFrame
The GeoDataFrame resulting from the spatial join and group-by operations.
- surveyweathertool.src.weather.weather_pipeline.create_df_final(nigeria_shape_df, weather_df, col_dissolve, target_epsg, weather_data_name, agg_dict, level)¶
This function creates a final dataframe by combining weather and shapefile data
Parameters:¶
- nigeria_shape_df: geopandas.GeoDataFrame
Nigeria map with admin columns
- weather_df: geopandas.GeoDataFrame
Weather data with columns [‘precipitation’, ‘temperature’, ‘spi’, ‘heatwaves’, ‘heavyrain’].
- col_dissolve: str
Column name on which to dissolve boundaries when merging the shapefile and weather data.
- target_epsg:
Target EPSG code to use.
- weather_data_name:
Name of the weather variable, for example precipitation or temperature. Currently not used properly; it will be put to better use once separate positions are created for each weather event in the dashboard.
- agg_dict: dict
Dictionary of column names and the aggregation method to apply to each.
- level: str
“month” or “season” level to group on.
Returns:¶
- pandas.DataFrame
A dataframe containing the merged shapefile and weather data.
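To illustrate the shape an agg_dict can take, the sketch below passes a column-to-method dictionary to a pandas group-by. The column names (`state`, `month`, `precipitation`, `temperature`), the values, and the chosen aggregation methods are assumptions for the example; the actual grouping inside create_df_final is not shown in this reference.

```python
import pandas as pd

# Hypothetical table of weather points already joined to an admin column.
joined = pd.DataFrame(
    {
        "state": ["Kano", "Kano", "Lagos", "Lagos"],
        "month": [1, 1, 1, 1],
        "precipitation": [2.0, 4.0, 10.0, 30.0],
        "temperature": [30.0, 34.0, 27.0, 29.0],
    }
)

# Each key is a column name; each value is the aggregation applied to it.
agg_dict = {"precipitation": "sum", "temperature": "mean"}

# Group on the dissolve column plus the chosen level ("month" here).
by_state = joined.groupby(["state", "month"], as_index=False).agg(agg_dict)
```

Using a dictionary lets each weather variable carry its own aggregation, for example totals for precipitation and averages for temperature.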
- surveyweathertool.src.weather.weather_pipeline.extreme_thresholding_delta(row: Dict[str, Any], event: str, geo_columns: List[str], threshold_values: Dict[str, List[float]], delta_param: int | None) → Tuple[float, bool]¶
Calculate the delta between an event’s value and its threshold value, and check if it exceeds a delta parameter.
- Parameters:
row (Dict[str, Any]): A dictionary representing a row of data.
event (str): The name of the event.
geo_columns (List[str]): List of column names containing longitude and latitude.
threshold_values (Dict[str, List[float]]): A dictionary of threshold values indexed by unique geo keys.
delta_param (Optional[int]): The delta parameter value. If not provided, defaults to 0.
- Returns:
Tuple[float, bool]: A tuple containing the calculated delta and a boolean indicating if the delta exceeds the delta parameter.
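A rough sketch of the delta check described above is given below. Several details are assumptions rather than documented behavior: the geo key is built by joining the geo column values with an underscore, the threshold list is indexed by a hypothetical `day_index` field on the row, and the function name `thresholding_delta_sketch` is invented for the example.

```python
from typing import Any, Dict, List, Optional, Tuple


def thresholding_delta_sketch(
    row: Dict[str, Any],
    event: str,
    geo_columns: List[str],
    threshold_values: Dict[str, List[float]],
    delta_param: Optional[int] = None,
) -> Tuple[float, bool]:
    # Assumption: the geo key is the geo column values joined by "_".
    geo_key = "_".join(str(row[col]) for col in geo_columns)
    # Assumption: thresholds are stored per day and looked up by "day_index".
    threshold = threshold_values[geo_key][row["day_index"]]
    delta = row[event] - threshold
    # A missing delta parameter is treated as 0, as stated in the docstring.
    limit = 0 if delta_param is None else delta_param
    return delta, delta > limit


thresholds = {"7.5_9.0": [30.0, 31.0]}
hot_row = {"lon": 7.5, "lat": 9.0, "day_index": 1, "temperature": 37.5}
mild_row = {"lon": 7.5, "lat": 9.0, "day_index": 0, "temperature": 32.0}

hot = thresholding_delta_sketch(hot_row, "temperature", ["lon", "lat"], thresholds, 5)
mild = thresholding_delta_sketch(mild_row, "temperature", ["lon", "lat"], thresholds, 5)
```

With a delta parameter of 5, the first row sits 6.5 above its threshold and is flagged, while the second sits 2.0 above and is not.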
- surveyweathertool.src.weather.weather_pipeline.create_extreme_weather_event_delta(data: pandas.DataFrame, event: str, geo_columns: List[str], daily_thresholds: Dict[str, List[float]], delta_param: int | None = None) → None¶
Calculate deltas and identify extreme weather events based on given thresholds.
- Parameters:
data (pd.DataFrame): DataFrame containing weather data.
event (str): The name of the event.
geo_columns (List[str]): List of column names containing longitude and latitude.
daily_thresholds (Dict[str, List[float]]): A dictionary of daily threshold values indexed by unique geo keys.
delta_param (Optional[int]): The delta parameter value. If not provided, defaults to None.
- Returns:
None: The function modifies the ‘data’ DataFrame by adding ‘delta’ and ‘extreme’ columns.
- surveyweathertool.src.weather.weather_pipeline.heatwave_heavy_rainfall_indicators(data: pandas.DataFrame, event: str, geo_columns: List[str], rolling_window: int = 3, delta_param: int = 5, days_param: int = 3, batch_size: int = 3)¶
Compute heatwave or heavy rainfall indicators for extreme weather event definition.
- Parameters:
data (pd.DataFrame): The DataFrame containing weather data with ‘date’, ‘event’, and specified geo_columns.
event (str): The weather event name (e.g., ‘temperature’, ‘rainfall’).
geo_columns (List[str]): List of columns representing geographic information.
on_survey (bool, optional): Flag indicating whether the function is being run on a survey. Default is False.
rolling_window (int, optional): Rolling window size for computing daily averaging thresholds. Default is 3.
delta_param (int, optional): Delta parameter for creating extreme weather event delta. Default is 5.
days_param (int, optional): Days parameter for creating extreme weather event delta. Default is 3.
- Returns:
None. Computes and saves the relevant indicators based on the specified parameters.
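One plausible reading of days_param is that an event only counts when the threshold is exceeded on several consecutive days. Under that assumption, which this reference does not confirm, runs of consecutive extreme days can be detected as sketched below. The helper name `flag_runs` and the sample series are hypothetical.

```python
import pandas as pd


def flag_runs(extreme: pd.Series, days_param: int = 3) -> pd.Series:
    """Keep only extreme days that belong to a run of at least days_param days."""
    # A new run starts wherever the value differs from the previous day.
    run_id = extreme.astype(int).diff().ne(0).cumsum()
    # Length of the run each day belongs to.
    run_length = extreme.groupby(run_id).transform("size")
    return extreme & (run_length >= days_param)


# One boolean per day for a single location, e.g. the 'extreme' column
# added by create_extreme_weather_event_delta.
extreme = pd.Series([False, True, True, True, False, True, True, False])
heatwave = flag_runs(extreme, days_param=3)
```

Here the three-day run is kept and the two-day run is dropped. In a multi-location table the same logic would need to be applied per location, for example inside a group-by on the geo columns.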
- surveyweathertool.src.weather.weather_pipeline.combine_map_weather(nigeria_shape_df, weather_df, col_dissolve, weather_data_name, agg_dict, level, target_epsg)¶
Combines map data with different types of weather data and returns a unified dataframe.
Parameters:¶
- nigeria_shape_df: geopandas.GeoDataFrame
Nigeria map with administrative columns.
- weather_precipitation_df: geopandas.GeoDataFrame
Weather data containing precipitation.
- weather_temperature_df: geopandas.GeoDataFrame
Weather data containing temperature.
- weather_spi_df: geopandas.GeoDataFrame
Weather data containing SPI.
- col_dissolve: str
Column name to dissolve boundaries when merging the shapefile and weather data.
- weather_data_name: str
Specifies the type of weather data. Options include “temperature”, “precipitation”, “spi”, “heatwave”, and “heavyrain”.
- agg_dict: dict
Dictionary with column names and the aggregation method to perform.
- level: str
Specifies the level to group on. e.g. “month” or “season”.
- target_epsg: int
The target epsg code to use for the geographical transformations.
Returns:¶
- pandas.DataFrame
A dataframe containing the merged shapefile and specified weather data.
- surveyweathertool.src.weather.weather_pipeline.calculate_SPI(df, column_rol='precipitation')¶
Calculates the Standardized Precipitation Index (SPI) for given input data.
Parameters:
- df (pd.DataFrame): Input DataFrame containing precipitation data.
- column_rol (str, optional): Column in the DataFrame containing precipitation data that will be used to calculate the rolling mean. Defaults to ‘precipitation’.
- save (bool, optional): Whether to save the SPI to a CSV file. Defaults to False.
- save (str, optional): Path to save the SPI file. Defaults to spi_file.csv.
Returns:
- df (pd.DataFrame): DataFrame with added columns for the 3-month rolling mean of precipitation, the cumulative probability, and the 3-month SPI.
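The three output columns can be illustrated with a simplified sketch. SPI is conventionally computed by fitting a gamma distribution to the accumulated precipitation; to stay dependency-light, the sketch below swaps in an empirical, rank-based cumulative probability instead, so it is not the standard SPI and not necessarily what calculate_SPI does. The function name `spi_sketch` and the output column names are assumptions.

```python
from statistics import NormalDist

import pandas as pd


def spi_sketch(df: pd.DataFrame, column_rol: str = "precipitation", window: int = 3) -> pd.DataFrame:
    out = df.copy()
    # Step 1: rolling mean over the window (3 months for monthly data).
    out["rolling_mean"] = out[column_rol].rolling(window).mean()
    valid = out["rolling_mean"].dropna()
    # Step 2: empirical cumulative probability. rank / (n + 1) keeps values
    # strictly inside (0, 1). Assumption: this replaces the usual gamma fit.
    prob = valid.rank(method="average") / (len(valid) + 1)
    out["cum_prob"] = prob
    # Step 3: map the probability to a standard normal quantile.
    out["spi"] = prob.map(NormalDist().inv_cdf)
    return out


monthly = pd.DataFrame({"precipitation": [10.0, 20.0, 30.0, 40.0, 50.0]})
result = spi_sketch(monthly)
```

The first `window - 1` rows have no rolling mean and therefore no SPI. Negative values indicate drier-than-median conditions and positive values wetter-than-median conditions.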
- surveyweathertool.src.weather.weather_pipeline.xarray_upsample(da, lat_res, lon_res, method)¶
Upsample an xarray DataArray based on the provided latitude and longitude resolutions.
Parameters:¶
- da: xarray.DataArray
The input DataArray to be upsampled.
- lat_res: float
The desired latitude resolution for upsampling.
- lon_res: float
The desired longitude resolution for upsampling.
- method: str
The interpolation method to be used for upsampling. Supported methods are the same as those for xarray’s interp method (e.g., “linear”, “nearest”).
Returns:¶
- xarray.DataArray
The upsampled DataArray with new latitude and longitude resolutions.
- surveyweathertool.src.weather.weather_pipeline.clip_data(df: pandas.DataFrame, geo_df: geopandas.GeoDataFrame, epsg: int = 4326)¶
Processes and clips weather data in the DataFrame to a specified geographical region.
Parameters:¶
- df: pd.DataFrame
Input DataFrame with weather data that includes columns: date, lat, lon, and the specified value (e.g., temperature).
- geo_df: gpd.GeoDataFrame
Geographic DataFrame (e.g., a map of Nigeria) used to clip the weather data.
- epsg: int, optional
EPSG code to set the Coordinate Reference System (CRS) for the data. Defaults to 4326 (WGS 84).
Returns:¶
- pd.DataFrame
A DataFrame with the combined, processed, and interpolated weather data, clipped to the specified region (from ‘geo_df’). The DataFrame includes columns: lat, lon, the specified weather value (e.g., temperature), and date.
- surveyweathertool.src.weather.weather_pipeline.interpolate_data(df: geopandas.GeoDataFrame, geo_df: geopandas.GeoDataFrame, value_col: str, lat_res: float = 0.1, lon_res: float = 0.1, method='linear', epsg: int = 4326)¶
Processes and interpolates weather data for each unique date in the DataFrame, and clips the interpolated data to a specified geographical region.
Parameters:¶
- df: gpd.GeoDataFrame
Input DataFrame with weather data that includes columns: date, lat, lon, and the specified value (e.g., temperature).
- geo_df: gpd.GeoDataFrame
Geographic DataFrame (e.g., a map of Nigeria) used to clip the interpolated weather data.
- value_col: str
Column name in ‘df’ with weather values (e.g., temperature) to be visualized and interpolated.
- lat_res: float, optional
Latitude resolution for interpolation. Defaults to 0.1.
- lon_res: float, optional
Longitude resolution for interpolation. Defaults to 0.1.
- method: str, optional
Interpolation method to use. Can be ‘linear’, ‘nearest’, etc. depending on the available methods in the underlying library. Defaults to ‘linear’.
- epsg: int, optional
EPSG code to set the Coordinate Reference System (CRS) for the data. Defaults to 4326 (WGS 84).
Returns:¶
- pd.DataFrame
A DataFrame with the combined, processed, and interpolated weather data, clipped to the specified region (from ‘geo_df’). The DataFrame includes columns: lat, lon, the specified weather value (e.g., temperature), and date.
- surveyweathertool.src.weather.weather_pipeline.clip_dataarray(da: xarray.DataArray, geo_df: geopandas.GeoDataFrame) → xarray.DataArray¶
Clip a DataArray based on the shape of a GeoDataFrame.
Parameters:
- da (xarray.DataArray): The input DataArray.
- geo_df (gpd.GeoDataFrame): The GeoDataFrame to use for clipping.
Returns:
- xarray.DataArray: The clipped DataArray.
- surveyweathertool.src.weather.weather_pipeline.plot_heatmap_grid_on_map(df: geopandas.GeoDataFrame, value_col: str, geo_df: geopandas.GeoDataFrame, cmap: str, legend_title: str, clip: bool = False, epsg: int = 4326)¶
Plots a heatmap for a given value column on a grid based on the provided GeoDataFrame. Note: use ‘Reds’ for temperature-related plots and ‘Greens’ for precipitation-related plots.
Parameters:¶
- df: gpd.GeoDataFrame
The GeoDataFrame containing the data to be plotted.
- value_col: str
The column name in df which contains the values to be plotted on the heatmap.
- geo_df: gpd.GeoDataFrame
The GeoDataFrame to use for clipping. Defaults to None.
- clip: bool, optional
Clips the data to the map. If True, geo_df should be provided.
- epsg: int, optional (default=4326)
The EPSG code representing the coordinate reference system (CRS) of the input data. Default is EPSG:4326 (WGS 84).
Returns:¶
- None
Displays a heatmap plot.
- surveyweathertool.src.weather.weather_pipeline.interpolate_data_for_all_dates(df: geopandas.GeoDataFrame, geo_df: geopandas.GeoDataFrame, value_col: str, lat_res: float = 0.1, lon_res: float = 0.1, method: str = 'linear', epsg: int = 4326)¶
Interpolates weather data for each unique date in the DataFrame.
Parameters are the same as in the ‘interpolate_data’ function.
Returns:¶
- pd.DataFrame
A DataFrame with interpolated weather data for all dates, clipped to the specified region.