surveyweathertool.src.survey.harmonizer
Module Contents
Functions
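prepare_files_descriptions_data(json_file_path) | Prepares data descriptions for files from a JSON file.
write_dict_to_file(data, output_path, filename, file_type) | Writes a dictionary to a file in the specified format.
add_extract_dict_to_excel(workbook_path, data, excluding_columns, sheet_name, column1, column2) | Adds a dictionary to an existing Excel workbook in the specified sheet.
get_all_harmonized_files_dictionary(folder_path, file_extension_choices) | Retrieves a dictionary containing all harmonized files in a folder.
preprocessing_transformer(df, primary_columns, columns_to_check) | Preprocesses the given dataframe by dropping duplicate rows, dropping NaN values, extracting year from 'wave' column, and converting selected columns to specific data types.
prepare_concatenated_data(filenames_indicator, files_paths, geodata_path) | Prepares concatenated dataframes based on the given filenames and their associated indicators.
indicator_merger(concatenated_data) | Merge and process a list of DataFrames containing indicator data.
harmonize() | Runs all setup steps for the harmonized dataset and saves the results for all selected indicators.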
- surveyweathertool.src.survey.harmonizer.prepare_files_descriptions_data(json_file_path: str) -> dict
Prepares data descriptions for files from a JSON file.
- Args:
json_file_path (str): The path of the JSON file.
- Returns:
dict: The prepared data descriptions for files.
- surveyweathertool.src.survey.harmonizer.write_dict_to_file(data, output_path, filename, file_type='xlsx')
Writes a dictionary to a file in the specified format.
- Args:
data: The dictionary to write to the file.
output_path (str): The path of the output directory.
filename (str): The name of the output file.
file_type (str, optional): The file format to use (“csv”, “xlsx”, or “json”). Defaults to “xlsx”.
- Returns:
None
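As a rough illustration of how the file_type argument might select an output format, the sketch below dispatches on “csv” and “json” using only the standard library. It is not the module's implementation: the helper name write_dict_sketch is hypothetical, appending the extension to filename is an assumption, and so is the CSV layout (one key/value pair per row). The “xlsx” branch is left out because it would need an Excel writer, and, unlike the documented function, the sketch returns the written path so the result is easy to inspect.

```python
import csv
import json
import os


def write_dict_sketch(data, output_path, filename, file_type="json"):
    """Hypothetical stand-in for write_dict_to_file (csv and json only)."""
    os.makedirs(output_path, exist_ok=True)
    # Assumption: the extension is appended to the bare file name.
    target = os.path.join(output_path, f"{filename}.{file_type}")
    if file_type == "json":
        with open(target, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
    elif file_type == "csv":
        # Assumed layout: a header row, then one "key,value" row per entry.
        with open(target, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["key", "value"])
            for key, value in data.items():
                writer.writerow([key, value])
    else:
        raise ValueError(f"Unsupported file_type: {file_type!r}")
    return target
```

Raising on an unknown format, rather than silently falling back to a default, keeps a mistyped file_type from producing an unexpected file.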
- surveyweathertool.src.survey.harmonizer.add_extract_dict_to_excel(workbook_path: str, data: dict, excluding_columns: list, sheet_name: str = 'Indicator-Domain - Indicator', column1: str = 'Indicator_Domain', column2: str = 'Indicator') -> tuple
Adds a dictionary to an existing Excel workbook in the specified sheet.
- Args:
workbook_path (str): The path of the Excel workbook.
data (dict): The dictionary to add to the workbook.
excluding_columns (list): The list of columns to exclude.
sheet_name (str, optional): The name of the sheet to add the data. Defaults to “Indicator-Domain - Indicator”.
column1 (str, optional): The column header for Indicator_Domain. Defaults to “Indicator_Domain”.
column2 (str, optional): The column header for Indicator. Defaults to “Indicator”.
- Returns:
tuple: A tuple containing the indicator_domain_mapping, columns_filename_lookup, and file_columns_question_lookup.
- surveyweathertool.src.survey.harmonizer.get_all_harmonized_files_dictionary(folder_path: str, file_extension_choices: List[str]) -> Dict[str, str]
Retrieves a dictionary containing all harmonized files in a folder.
- Args:
folder_path (str): The path to the folder containing the files.
file_extension_choices (List[str]): The list of file extensions to consider.
- Returns:
Dict[str, str]: A dictionary mapping file names to their full paths.
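The name-to-path mapping described here can be sketched with pathlib. This is only an illustration under stated assumptions, not the module's code: the helper name list_files_sketch is hypothetical, the scan is assumed to be non-recursive, keys are assumed to keep the file extension, and extensions are assumed to match with or without a leading dot.

```python
from pathlib import Path
from typing import Dict, List


def list_files_sketch(folder_path: str, file_extension_choices: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in: map file names to full paths for matching extensions."""
    # Normalise extensions so "csv" and ".csv" are treated alike (an assumption).
    wanted = {"." + ext.lower().lstrip(".") for ext in file_extension_choices}
    return {
        path.name: str(path)
        for path in sorted(Path(folder_path).iterdir())
        if path.is_file() and path.suffix.lower() in wanted
    }
```

A call such as list_files_sketch("data/harmonized", ["csv", "dta"]) would then return only the CSV and Stata files directly inside that (illustrative) folder.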
- surveyweathertool.src.survey.harmonizer.preprocessing_transformer(df: pandas.DataFrame, primary_columns: List[str], columns_to_check: List[str] = None) -> pandas.DataFrame
Preprocesses the given dataframe by dropping duplicate rows, dropping NaN values, extracting year from ‘wave’ column, and converting selected columns to specific data types.
- Parameters:
df (pd.DataFrame): The input dataframe.
primary_columns (List[str]): A list of primary columns to be used for preprocessing.
columns_to_check (List[str], optional): A list of columns to check for NaN values. Defaults to None.
- Returns:
pd.DataFrame: The preprocessed dataframe.
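The four steps named in the description (drop duplicates, drop NaN values, extract the year from ‘wave’, convert data types) might look roughly like the pandas sketch below. Several details are assumptions rather than documented behaviour: the helper name preprocess_sketch is hypothetical, the ‘wave’ values are assumed to be text containing a four-digit year, the derived column is assumed to be called year, and the primary columns are assumed to be identifiers cast to strings.

```python
from typing import List, Optional

import pandas as pd


def preprocess_sketch(
    df: pd.DataFrame,
    primary_columns: List[str],
    columns_to_check: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the four documented preprocessing steps."""
    out = df.drop_duplicates()
    # Drop rows with NaN in the checked columns (in any column when None).
    out = out.dropna(subset=columns_to_check).copy()
    # Assumption: 'wave' holds text with a four-digit year, e.g. "Wave 2012".
    # Rows without such a year would make the integer cast fail.
    out["year"] = (
        out["wave"].astype(str).str.extract(r"(\d{4})", expand=False).astype(int)
    )
    # Assumption: primary columns are identifiers and are stored as strings.
    out[primary_columns] = out[primary_columns].astype(str)
    return out.reset_index(drop=True)
```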
- surveyweathertool.src.survey.harmonizer.prepare_concatenated_data(filenames_indicator: Dict[str, set], files_paths: Dict[str, str], geodata_path: str) -> List[pandas.DataFrame] | None
Prepares concatenated dataframes based on the given filenames and their associated indicators.
- Parameters:
filenames_indicator (Dict[str, set]): A dictionary where keys are filenames and values are sets of indicators.
files_paths (Dict[str, str]): A dictionary where keys are filenames and values are file paths.
geodata_path (str): The file path of the geolocation data.
- Returns:
List[pd.DataFrame] or None: A list of concatenated dataframes.
- surveyweathertool.src.survey.harmonizer.indicator_merger(concatenated_data: List[pandas.DataFrame]) -> pandas.DataFrame
Merge and process a list of DataFrames containing indicator data.
- Args:
concatenated_data (List[pd.DataFrame]): A list of DataFrames to be merged.
- Returns:
pd.DataFrame: The merged DataFrame after processing.
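One common way to merge a list of indicator DataFrames is to fold them together with successive joins on shared key columns. The sketch below shows that pattern only. It is not taken from the module: the helper name merge_indicators_sketch and its keys parameter are hypothetical (the documented function takes just the list), and the outer join is an assumption.

```python
from functools import reduce
from typing import List

import pandas as pd


def merge_indicators_sketch(frames: List[pd.DataFrame], keys: List[str]) -> pd.DataFrame:
    """Hypothetical stand-in: outer-join indicator frames on shared key columns."""
    if not frames:
        return pd.DataFrame(columns=keys)
    # Fold the list left to right, keeping rows that appear in any frame.
    return reduce(
        lambda left, right: left.merge(right, on=keys, how="outer"), frames
    )
```

An outer join keeps households that are missing from some indicator files, at the cost of NaN values in the columns they lack. An inner join would instead keep only rows present in every frame.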
- surveyweathertool.src.survey.harmonizer.harmonize()
Runs all setup steps for the harmonized dataset and saves the results for all selected indicators.