surveyweathertool.src.weather.utils
Module Contents
Functions
| get_data | Download weather data files for a range of years from a given URL. |
| ncToCSV | Convert a NetCDF file to CSV format and filter data based on coordinates. |
| get_correlation_matrix | Performs correlation analysis on a given dataframe. |
| time_series_clustering | Performs time series analysis and clustering on a given dataframe. |
| compute_dtw | Compute the Dynamic Time Warping (DTW) distance between two time series. |
| create_spi_category | Create a new feature 'SPI_Category' in the dataframe based on the '3_month_SPI' values. |
| read_shape_file | Reads a shapefile and pre-processes it by removing certain columns. |
| check_file_existence | Check if a file exists at the given file path. |
| extract_timescales | Extract day, month, and year from the 'date' column of the DataFrame. |
| thresholds_saver_loader | Save or load data using pickle to/from a specified file path. |
| compute_years_daily_averaging_threshold_per_grid | Compute the rolling daily averages of an event per grid coordinate for each year. |
| severity_measure | Calculate severity measure for consecutive positive values. |
| assign_severity | Assign severity to a row based on the severity measure. |
| initialize_features_columns | Initialize severity and ranking columns based on the 'extreme' column values. |
| create_severity_value | Create and compute severity values for extreme weather events in a given year. |
| compute_severity_ranking | Compute severity and ranking measures for extreme weather events in the provided data. |
| batching | Split a container into batches of a specified size. |
| plot_heatmap | Plots a heatmap based on the given matrix (numpy array). |
| batching | Split a container into batches of a specified size. |
| get_weather_data_open_meteo | Fetches weather data from the Open-Meteo API for a specified location and time range. |
| get_weather_for_household | Fetches weather data for each household's location and merges it with household data. |
| preprocess_weather_datasets | Preprocesses the precipitation and temperature datasets. |
- surveyweathertool.src.weather.utils.get_data(baseURL, destination_path, weather, year_start, year_end)
Download weather data files for a range of years from a given URL.
Parameters:
baseURL (str): Base URL of the weather data files.
destination_path (str): Directory where downloaded files will be stored.
weather (str): Type of weather data (e.g., “precip”, “tmax”, “tmin”).
year_start (int): Starting year of the range.
year_end (int): Ending year of the range.
- surveyweathertool.src.weather.utils.ncToCSV(filepath, destination_path, coordinate_limits)
Convert a NetCDF file to CSV format and filter data based on coordinates.
Parameters:
filepath (Path): Path to the input NetCDF file.
destination_path (Path): Path where the CSV file will be saved.
coordinate_limits (list): List of latitude and longitude coordinate limits.
Notes: This function assumes that the NetCDF file has ‘time’, ‘lat’, and ‘lon’ columns plus a ‘precip’ or ‘temp’ column.
- surveyweathertool.src.weather.utils.get_correlation_matrix(df, col_index, col_columns, col_values)
Performs correlation analysis on a given dataframe.
Parameters:
- df : pandas.DataFrame
Input dataframe with time series data.
- col_index : str
Name of the column in df to use as the new dataframe index (time period).
- col_columns : str
Name of the column in df to use for creating the new dataframe columns (regions).
- col_values : str
Name of the column in df to use for the data values (mean precipitation).
Returns:
- pandas.DataFrame
A correlation matrix of the data.
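The parameter roles above (index, columns, values) suggest a pivot followed by a pairwise correlation. The following is a minimal sketch under that assumption, not the module's actual implementation; the name `correlation_matrix_sketch` and the sample column names are hypothetical.

```python
import pandas as pd


def correlation_matrix_sketch(df, col_index, col_columns, col_values):
    """Pivot long-format data to wide format, then correlate the columns."""
    # One row per time period, one column per region.
    wide = df.pivot_table(index=col_index, columns=col_columns, values=col_values)
    # Pairwise Pearson correlation between the region columns.
    return wide.corr()


# Hypothetical sample data: two regions observed over three months.
sample = pd.DataFrame(
    {
        "month": [1, 2, 3, 1, 2, 3],
        "region": ["A", "A", "A", "B", "B", "B"],
        "mean_precip": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
    }
)
corr = correlation_matrix_sketch(sample, "month", "region", "mean_precip")
```

Here `corr` is a 2 x 2 matrix indexed by region on both axes.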
- surveyweathertool.src.weather.utils.time_series_clustering(df, col_dissolve, time_cols, value_col, eps=2, min_samples=4)
Performs time series analysis and clustering on a given dataframe.
Parameters:
- df : pandas.DataFrame
Input dataframe with time series data.
- col_dissolve : str
Name of the column in df to use as the new dataframe index.
- time_cols : list of str
Names of the columns in df to use for creating the time period labels.
- value_col : str
Name of the column in df to use for the time series values.
- eps : float, optional
The maximum distance between two samples for them to be considered as in the same neighborhood in DBSCAN. Defaults to 2.
- min_samples : int, optional
The number of samples in a neighborhood for a point to be considered as a core point in DBSCAN. Defaults to 4.
Returns:
- pandas.DataFrame
A dataframe with the same index as the input dataframe and an additional column ‘Cluster’ with the DBSCAN cluster labels.
- surveyweathertool.src.weather.utils.compute_dtw(series_1, series_2)
Compute the Dynamic Time Warping (DTW) distance between two time series.
Parameters:
- series_1, series_2 : array-like
Input time series.
Returns:
- dtw_distance : float
DTW distance between the input time series.
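For readers unfamiliar with DTW, the distance can be illustrated with the textbook dynamic-programming recurrence. This is a sketch only: the page does not say whether `compute_dtw` uses a library or its own loop, and the absolute difference used as the local cost is an assumption. The name `dtw_distance_sketch` is hypothetical.

```python
import math


def dtw_distance_sketch(series_1, series_2):
    """Textbook DTW with absolute difference as the local cost (assumption)."""
    n, m = len(series_1), len(series_2)
    # cost[i][j] = minimal accumulated cost of aligning the first i points
    # of series_1 with the first j points of series_2.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_1[i - 1] - series_2[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in series_1 only
                cost[i][j - 1],      # advance in series_2 only
                cost[i - 1][j - 1],  # advance in both series
            )
    return cost[n][m]


stretched = dtw_distance_sketch([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance_sketch([0, 0], [1, 1])
```

The first call shows that a series and a time-stretched copy of it align at zero cost, which is the property that distinguishes DTW from a point-by-point distance.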
- surveyweathertool.src.weather.utils.create_spi_category(df)
Create a new feature ‘SPI_Category’ in the dataframe based on the ‘3_month_SPI’ values.
Parameters:
df (pd.DataFrame): DataFrame with ‘3_month_SPI’ values.
Returns:
df (pd.DataFrame): DataFrame with the added ‘SPI_Category’ column.
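This page does not list the thresholds or labels behind ‘SPI_Category’. As an illustration only, the sketch below maps a single SPI value to the commonly cited Standardized Precipitation Index classes; the cut-offs, the label strings, and the name `spi_category_sketch` are assumptions and may differ from what the module uses.

```python
def spi_category_sketch(spi: float) -> str:
    """Map an SPI value to a commonly used wet/dry class (assumed thresholds)."""
    if spi >= 2.0:
        return "Extremely wet"
    if spi >= 1.5:
        return "Very wet"
    if spi >= 1.0:
        return "Moderately wet"
    if spi > -1.0:
        return "Near normal"
    if spi > -1.5:
        return "Moderately dry"
    if spi > -2.0:
        return "Severely dry"
    return "Extremely dry"


labels = [spi_category_sketch(value) for value in (0.3, -1.0, 1.7, -2.5)]
```

With pandas, a function of this shape could be applied column-wise, for example through `Series.apply` on the ‘3_month_SPI’ column.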
- surveyweathertool.src.weather.utils.read_shape_file(data_path)
Reads a shapefile and pre-processes it by removing certain columns.
Parameters:
- data_path : str
The file path to the shapefile.
Returns:
- geo_nig : GeoDataFrame
The pre-processed geopandas dataframe.
Notes:
This function is specifically designed for shapefiles that contain the following columns: ‘admin2AltN’, ‘admin2Al_1’, ‘ValidTo’, ‘date’, ‘validOn’, ‘Shape_Leng’, ‘Shape_Area’, ‘admin0Name’, ‘admin0Pcod’. These columns are removed from the original data. If some of them are needed, the function has to be changed accordingly.
- surveyweathertool.src.weather.utils.check_file_existence(file_path: str) -> bool
Check if a file exists at the given file path.
- Parameters:
file_path (str): The path to the file to be checked.
- Returns:
bool: True if the file exists, False otherwise.
- surveyweathertool.src.weather.utils.extract_timescales(data: pandas.DataFrame)
Extract day, month, and year from the ‘date’ column of the DataFrame.
- Parameters:
data (pd.DataFrame): The DataFrame containing the ‘date’ column.
- Returns:
None. Modifies the DataFrame in-place by adding ‘day’, ‘month’, and ‘year’ columns.
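The in-place behaviour described above can be sketched with the pandas datetime accessor. This is an illustration under the assumption that ‘date’ holds values `pd.to_datetime` can parse; the name `extract_timescales_sketch` is hypothetical.

```python
import pandas as pd


def extract_timescales_sketch(data: pd.DataFrame) -> None:
    """Add 'day', 'month', and 'year' columns to `data` in place."""
    # Parse once so string dates and datetime dates are handled alike.
    dates = pd.to_datetime(data["date"])
    data["day"] = dates.dt.day
    data["month"] = dates.dt.month
    data["year"] = dates.dt.year


frame = pd.DataFrame({"date": ["2015-03-09", "2016-12-31"]})
extract_timescales_sketch(frame)  # returns None; `frame` now has three new columns
```

Because the function returns None, callers keep working with the same DataFrame object rather than a returned copy.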
- surveyweathertool.src.weather.utils.thresholds_saver_loader(file_path: str | pathlib.Path, data: Any = None, action: str = 'save')
Save or load data using pickle to/from a specified file path.
- Parameters:
file_path (Union[str, Path]): The path to the file to be saved or loaded.
data (Any, optional): The data to be saved. Required when action is “save”.
action (str, optional): The action to perform. Options: “save” or “load”. Defaults to “save”.
- Returns:
Any: Loaded data if action is “load”.
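A save/load switch of this kind can be sketched with the standard `pickle` module. The sketch below is not the module's code: the name `saver_loader_sketch` is hypothetical, and raising `ValueError` for an unknown action is an assumption, since the page does not describe that case.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def saver_loader_sketch(
    file_path: Union[str, Path], data: Any = None, action: str = "save"
) -> Any:
    """Pickle `data` to `file_path`, or unpickle and return its contents."""
    path = Path(file_path)
    if action == "save":
        with path.open("wb") as handle:
            pickle.dump(data, handle)
        return None
    if action == "load":
        with path.open("rb") as handle:
            return pickle.load(handle)
    # Assumption: the documented function may treat other values differently.
    raise ValueError(f"Unknown action: {action!r}")


# Round trip with hypothetical per-grid thresholds keyed by (lat, lon).
thresholds = {(9.0, 7.5): 12.3, (9.5, 7.5): 10.1}
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "thresholds.pkl"
    saver_loader_sketch(target, thresholds, action="save")
    restored = saver_loader_sketch(target, action="load")
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.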
- surveyweathertool.src.weather.utils.compute_years_daily_averaging_threshold_per_grid(data: pandas.DataFrame, event: str, geo_columns: list, window: int)
Compute the rolling daily averages of an event per grid coordinate for each year.
- Parameters:
data (pd.DataFrame): The DataFrame containing the weather event data.
event (str): The name of the weather event column.
geo_columns (list): List of columns representing the grid coordinates.
window (int): The rolling window size.
- Returns:
dict: A dictionary containing rolling daily averages per grid coordinate.
- surveyweathertool.src.weather.utils.severity_measure(data: Dict[int, float], days_param: int = 3) -> Dict[int, float]
Calculate severity measure for consecutive positive values.
- Parameters:
data (Dict[int, float]): A dictionary of values indexed by day.
days_param (int): The minimum number of consecutive positive days to calculate the severity. Defaults to 3.
- Returns:
Dict[int, float]: A dictionary containing severity values indexed by day.
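The page states that severity is computed over runs of consecutive positive days of at least `days_param` days, but not how the severity value itself is derived. The sketch below assumes, purely for illustration, that every day in a qualifying run receives the sum of the run's values and every other day receives 0.0. The name `severity_measure_sketch` is hypothetical.

```python
from typing import Dict, List


def severity_measure_sketch(data: Dict[int, float], days_param: int = 3) -> Dict[int, float]:
    """Assign each day in a long-enough positive run the run's total (assumption)."""
    severity = {day: 0.0 for day in data}
    run: List[int] = []

    def close_run() -> None:
        # Only runs of at least `days_param` days contribute a severity.
        if len(run) >= days_param:
            total = sum(data[day] for day in run)
            for day in run:
                severity[day] = total
        run.clear()

    for day in sorted(data):
        positive = data[day] > 0
        if positive and (not run or day == run[-1] + 1):
            run.append(day)       # extend the current run (or start the first one)
        else:
            close_run()           # a gap or a non-positive day ends the run
            if positive:
                run.append(day)   # start a new run after a gap in the day index
    close_run()
    return severity


daily = {1: 0.5, 2: 1.0, 3: 1.5, 4: -0.2, 5: 2.0, 6: 1.0}
result = severity_measure_sketch(daily, days_param=3)
```

In this example days 1 to 3 form a qualifying run, while days 5 and 6 are positive but too short a run to count.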
- surveyweathertool.src.weather.utils.assign_severity(row: Dict[str, Any], measure: Dict[int, float]) -> float
Assign severity to a row based on the severity measure.
- Parameters:
row (Dict[str, Any]): The row of data containing ‘day’ information.
measure (Dict[int, float]): The severity measure dictionary.
- Returns:
float: The assigned severity value.
- surveyweathertool.src.weather.utils.initialize_features_columns(data: pandas.DataFrame) -> None
Initialize severity and ranking columns based on the ‘extreme’ column values.
- Parameters:
data (pd.DataFrame): The input DataFrame containing weather event data.
- Returns:
None. Modifies the input DataFrame by setting ‘severity’ and ‘ranking’ columns based on ‘extreme’ column values.
- surveyweathertool.src.weather.utils.create_severity_value(data: pandas.DataFrame, year: int, geo_columns: List[str], days_param: int) -> pandas.DataFrame
Create and compute severity values for extreme weather events in a given year.
- Parameters:
data (pd.DataFrame): The input DataFrame containing weather event data.
year (int): The year for which severity values are computed.
geo_columns (List[str]): List of column names representing geographical coordinates.
days_param (int): Parameter for computing severity.
- Returns:
pd.DataFrame: A DataFrame containing computed severity and ranking values for the given year.
- surveyweathertool.src.weather.utils.compute_severity_ranking(data: pandas.DataFrame, event: str, days_param: int = 3)
Compute severity and ranking measures for extreme weather events in the provided data.
- Parameters:
data (pd.DataFrame): The input DataFrame containing weather event data.
event (str): The name of the weather event column.
days_param (int, optional): Parameter for computing severity. Default is 3.
- Returns:
None
- surveyweathertool.src.weather.utils.batching(container: List, batch_size: int) -> List[List]
Split a container into batches of a specified size.
- Parameters:
container (list): The container to be split.
batch_size (int): The size of each batch.
- Returns:
list of lists: List of batches, each containing a subset of the input container.
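Batching of this kind is usually a slice per step. The following is a minimal sketch, assuming that the final batch may be shorter than `batch_size` and that a non-positive size is rejected; neither detail is stated on this page. The name `batching_sketch` is hypothetical.

```python
from typing import List


def batching_sketch(container: List, batch_size: int) -> List[List]:
    """Split `container` into consecutive slices of at most `batch_size` items."""
    if batch_size <= 0:
        # Assumption: the documented function may not validate this.
        raise ValueError("batch_size must be a positive integer")
    return [
        container[start:start + batch_size]
        for start in range(0, len(container), batch_size)
    ]


batches = batching_sketch([1, 2, 3, 4, 5], batch_size=2)
```

In this example the five elements yield two full batches and one final batch holding the single remaining element.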
- surveyweathertool.src.weather.utils.plot_heatmap(matrix, title, save_path, figsize=(8, 6), cmap='coolwarm', annot=False, fmt='.1f', save=False)
Plots a heatmap based on the given matrix (numpy array).
Parameters:
matrix (numpy array): The matrix containing the values to be plotted.
title (str): Title of the heatmap.
save_path (str): Path where the heatmap visualization will be saved.
figsize (tuple): Figure size. Default is (8, 6).
cmap (str): Color map used for the heatmap. Default is ‘coolwarm’.
annot (bool): Whether to annotate each cell with its value. Default is False.
fmt (str): String formatting code to use when adding annotations. Default is “.1f”.
Returns:
None. Displays and saves the heatmap visualization.
- surveyweathertool.src.weather.utils.batching(list_of_elements: List, batch_size: int) -> List[List]
Split a list of elements into batches of a specified size.
- Parameters:
list_of_elements (list): The list to be split.
batch_size (int): The size of each batch.
- Returns:
list of lists: List of batches, each containing a subset of the input list.
- surveyweathertool.src.weather.utils.get_weather_data_open_meteo(latitude: float, longitude: float, start: str, end: str = None) -> pandas.DataFrame
Fetches weather data from the Open-Meteo API for a specified location and time range.
This function constructs a URL from the provided latitude, longitude, start date, and end date, and uses it to fetch daily weather data: maximum, minimum, and mean temperatures, as well as total precipitation. The fetched data is processed and returned as a DataFrame.
- Parameters:
latitude (float): The latitude of the location for weather data retrieval.
longitude (float): The longitude of the location for weather data retrieval.
start (str): The start date of the data range in ‘YYYY-MM-DD’ format.
end (str, optional): The end date of the data range in ‘YYYY-MM-DD’ format.
- Returns:
- pd.DataFrame: A DataFrame containing processed weather data with columns for:
date (str): The date of the weather data in ‘YYYY-MM-DD’ format.
temperature_2m_max (float): Maximum 2-meter temperature in Celsius.
temperature_2m_min (float): Minimum 2-meter temperature in Celsius.
temperature_2m_mean (float): Mean 2-meter temperature in Celsius.
precipitation_sum (float): Total daily precipitation in millimeters.
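The URL construction step can be sketched with the standard library alone. The endpoint below is Open-Meteo's historical archive API, which is an assumption: this page does not say which Open-Meteo endpoint the function calls. Falling back to `start` when `end` is omitted is likewise an assumption, and the name `build_open_meteo_url` is hypothetical. The sketch only builds the URL and performs no network request.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the historical archive endpoint; the module may use another one.
ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"

# Daily variables matching the columns listed in the Returns section above.
DAILY_VARIABLES = (
    "temperature_2m_max",
    "temperature_2m_min",
    "temperature_2m_mean",
    "precipitation_sum",
)


def build_open_meteo_url(
    latitude: float, longitude: float, start: str, end: Optional[str] = None
) -> str:
    """Build a request URL for daily weather data at one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start,
        # Assumption: a missing end date means a single-day request.
        "end_date": end if end is not None else start,
        "daily": ",".join(DAILY_VARIABLES),
    }
    # Keep commas unescaped so the variable list stays readable in the URL.
    return f"{ARCHIVE_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_open_meteo_url(9.08, 7.4, "2019-01-01", "2019-01-31")
```

Separating URL construction from the HTTP call makes the request parameters easy to check without network access.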
- surveyweathertool.src.weather.utils.get_weather_for_household(hh_data: pandas.DataFrame) -> pandas.DataFrame
Fetches weather data for each household’s location and merges it with household data.
This function takes household data containing latitude, longitude, and date information, fetches weather data for each household’s location and date, and combines it with the original household data. The weather data includes maximum, minimum, and mean temperatures, as well as total daily precipitation.
- Parameters:
hh_data (pd.DataFrame): Household data containing ‘lat’, ‘lon’, and ‘date’ columns.
- Returns:
pd.DataFrame: A DataFrame containing the combined household and weather data.
- surveyweathertool.src.weather.utils.preprocess_weather_datasets(precip_data: pandas.DataFrame, temp_data: pandas.DataFrame, needed_cols_precip: List[str] = ['date', 'lat', 'lon', 'precipitation', 'month', 'year', 'delta', 'SPI'], needed_cols_temp: List[str] = ['date', 'lat', 'lon', 'temperature', 'month', 'year', 'delta'], cols_rename_dict: Dict[str, Dict[str, str]] = {'precipitation': {'delta': 'heavy_rain_index', 'SPI': 'spi_index'}, 'temperature': {'delta': 'heatwave_index'}}, non_float_cols: List[str] = ['date', 'month', 'year']) -> Tuple[pandas.DataFrame, pandas.DataFrame]
Preprocesses the precipitation and temperature datasets.
This function preprocesses the precipitation and temperature datasets by selecting only the needed columns, rounding the float columns to 2 decimals, and renaming the columns.
- Parameters:
precip_data (pd.DataFrame): The precipitation dataset.
temp_data (pd.DataFrame): The temperature dataset.
needed_cols_precip (List[str]): The list of needed columns for the precipitation dataset.
needed_cols_temp (List[str]): The list of needed columns for the temperature dataset.
cols_rename_dict (Dict[str, Dict[str, str]]): The dictionary containing the columns to be renamed.
- Returns:
Tuple[pd.DataFrame, pd.DataFrame]: The preprocessed precipitation and temperature datasets.
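The three steps named above (select, round, rename) can be sketched for a single dataset as follows. This is an illustration, not the module's code: the helper `preprocess_sketch` is hypothetical, and treating every column outside `non_float_cols` as a float column to round is an assumption drawn from the parameter name.

```python
from typing import Dict, List

import pandas as pd


def preprocess_sketch(
    data: pd.DataFrame,
    needed_cols: List[str],
    rename_map: Dict[str, str],
    non_float_cols: List[str],
) -> pd.DataFrame:
    """Select columns, round the float columns to 2 decimals, then rename."""
    out = data[needed_cols].copy()
    # Assumption: everything not listed in `non_float_cols` is numeric.
    float_cols = [col for col in needed_cols if col not in non_float_cols]
    out[float_cols] = out[float_cols].astype(float).round(2)
    return out.rename(columns=rename_map)


# Hypothetical temperature rows, using the default column names from the signature.
temp_data = pd.DataFrame(
    {
        "date": ["2020-01-01"],
        "lat": [9.123456],
        "lon": [7.987654],
        "temperature": [31.456],
        "month": [1],
        "year": [2020],
        "delta": [0.128],
        "extra": ["dropped"],
    }
)
temp_clean = preprocess_sketch(
    temp_data,
    needed_cols=["date", "lat", "lon", "temperature", "month", "year", "delta"],
    rename_map={"delta": "heatwave_index"},
    non_float_cols=["date", "month", "year"],
)
```

The same helper would be called once more with the precipitation columns and the ‘precipitation’ entry of `cols_rename_dict` to obtain the second element of the returned tuple.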