surveyweathertool.src.survey.harmonizer

Module Contents

Functions

prepare_files_descriptions_data(→ dict)

Prepares data descriptions for files from a JSON file.

write_dict_to_file(data, output_path, filename[, ...])

Writes a dictionary to a file in the specified format.

add_extract_dict_to_excel(→ tuple)

Adds a dictionary to an existing Excel workbook in the specified sheet.

get_all_harmonized_files_dictionary(→ Dict[str, str])

Retrieves a dictionary containing all harmonized files in a folder.

preprocessing_transformer(→ pandas.DataFrame)

Preprocesses the given dataframe by dropping duplicate rows, dropping NaN values, extracting year from 'wave' column, and converting selected columns to specific data types.

prepare_concatenated_data(...)

Prepares concatenated dataframes based on the given filenames and their associated indicators.

indicator_merger(→ pandas.DataFrame)

Merge and process a list of DataFrames containing indicator data.

harmonize()

Runs all setup steps on the harmonized dataset and saves the output for all selected indicators.

surveyweathertool.src.survey.harmonizer.prepare_files_descriptions_data(json_file_path: str) → dict

Prepares data descriptions for files from a JSON file.

Args:

json_file_path (str): The path of the JSON file.

Returns:

dict: The prepared data descriptions for files.

surveyweathertool.src.survey.harmonizer.write_dict_to_file(data, output_path, filename, file_type='xlsx')

Writes a dictionary to a file in the specified format.

Args:

data: The dictionary to write to the file.
output_path (str): The path of the output directory.
filename (str): The name of the output file.
file_type (str, optional): The file format to use (“csv”, “xlsx”, or “json”). Defaults to “xlsx”.

Returns:

None
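The format dispatch described above can be illustrated with a minimal sketch. It is not the module's implementation: `write_mapping` is a hypothetical stand-in, it covers only the “json” and “csv” formats (an “xlsx” writer would need a third-party package), it defaults to “json” rather than “xlsx”, and it assumes the extension is appended to `filename`.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_mapping(data: dict, output_path: str, filename: str, file_type: str = "json") -> None:
    """Hypothetical stand-in: write a flat dict as JSON or as a two-column CSV."""
    # Assumption: the extension is derived from file_type and appended to filename.
    target = Path(output_path) / f"{filename}.{file_type}"
    if file_type == "json":
        target.write_text(json.dumps(data, indent=2), encoding="utf-8")
    elif file_type == "csv":
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["key", "value"])  # header row
            writer.writerows(data.items())     # one row per dictionary entry
    else:
        raise ValueError(f"Unsupported file_type: {file_type!r}")


# Round trip through a temporary directory to check the JSON branch.
with tempfile.TemporaryDirectory() as tmp:
    write_mapping({"food_security": "fs_score"}, tmp, "lookup", "json")
    round_trip = json.loads((Path(tmp) / "lookup.json").read_text(encoding="utf-8"))
```

Rejecting unknown formats with a `ValueError` is a design choice of this sketch; the source does not say how the real function handles an unsupported `file_type`.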

surveyweathertool.src.survey.harmonizer.add_extract_dict_to_excel(workbook_path: str, data: dict, excluding_columns: list, sheet_name: str = 'Indicator-Domain - Indicator', column1: str = 'Indicator_Domain', column2: str = 'Indicator') → tuple

Adds a dictionary to an existing Excel workbook in the specified sheet.

Args:

workbook_path (str): The path of the Excel workbook.
data (dict): The dictionary to add to the workbook.
excluding_columns (list): The list of columns to exclude.
sheet_name (str, optional): The name of the sheet to add the data. Defaults to “Indicator-Domain - Indicator”.
column1 (str, optional): The column header for Indicator_Domain. Defaults to “Indicator_Domain”.
column2 (str, optional): The column header for Indicator. Defaults to “Indicator”.

Returns:

tuple: A tuple containing the indicator_domain_mapping, columns_filename_lookup, and file_columns_question_lookup.

surveyweathertool.src.survey.harmonizer.get_all_harmonized_files_dictionary(folder_path: str, file_extension_choices: List[str]) → Dict[str, str]

Retrieves a dictionary containing all harmonized files in a folder.

Args:

folder_path (str): The path to the folder containing the files.
file_extension_choices (List[str]): The list of file extensions to consider.

Returns:

Dict[str, str]: A dictionary mapping file names to their full paths.
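As an illustration of the name-to-path mapping described above, here is a minimal sketch using only the standard library. `list_files_by_extension` is a hypothetical stand-in, not the module's code. Whether the real function keeps the extension in the key, recurses into subfolders, or accepts extensions with a leading dot is not stated in the source; the choices below are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_files_by_extension(folder_path: str, extensions: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in: map each matching file name to its full path."""
    # Assumption: extensions may be given with or without a leading dot.
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    result: Dict[str, str] = {}
    # Assumption: only the top level of the folder is scanned (no recursion).
    for path in sorted(Path(folder_path).iterdir()):
        if path.is_file() and path.suffix.lower().lstrip(".") in wanted:
            result[path.name] = str(path)
    return result


# Build a throwaway folder with three files and keep only two extensions.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("wave1.csv", "wave2.dta", "notes.txt"):
        (Path(tmp) / name).touch()
    found = list_files_by_extension(tmp, ["csv", ".dta"])
```

Here `found` contains entries for `wave1.csv` and `wave2.dta` only; `notes.txt` is skipped because its extension is not in the list.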

surveyweathertool.src.survey.harmonizer.preprocessing_transformer(df: pandas.DataFrame, primary_columns: List[str], columns_to_check: List[str] = None) → pandas.DataFrame

Preprocesses the given dataframe by dropping duplicate rows, dropping NaN values, extracting year from ‘wave’ column, and converting selected columns to specific data types.

Parameters:

df (pd.DataFrame): The input dataframe.
primary_columns (List[str]): A list of primary columns to be used for preprocessing.
columns_to_check (List[str], optional): A list of columns to check for NaN values. Defaults to None.

Returns:

pd.DataFrame: The preprocessed dataframe.
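The four steps named in the description (drop duplicates, drop NaN values, extract the year from ‘wave’, convert column types) might look roughly like the sketch below. `preprocess` is a hypothetical stand-in. The source does not specify the format of the ‘wave’ column, the name of the derived column, or the target data types, so the four-digit-year pattern, the `year` column, and the cast of primary columns to strings are all assumptions.

```python
from typing import List, Optional

import pandas as pd


def preprocess(
    df: pd.DataFrame,
    primary_columns: List[str],
    columns_to_check: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in for the documented preprocessing steps."""
    out = df.drop_duplicates()
    # subset=None makes dropna consider every column.
    out = out.dropna(subset=columns_to_check).copy()
    # Assumption: 'wave' contains a four-digit year, e.g. "2015-16".
    out["year"] = out["wave"].astype(str).str.extract(r"(\d{4})", expand=False).astype(int)
    # Assumption: primary columns are identifiers and are stored as strings.
    out[primary_columns] = out[primary_columns].astype(str)
    return out


raw = pd.DataFrame(
    {
        "hhid": [1, 1, 2, 3],
        "wave": ["2010-11", "2010-11", "2012-13", "2015-16"],
        "indicator": [0.5, 0.5, None, 0.9],
    }
)
clean = preprocess(raw, primary_columns=["hhid"], columns_to_check=["indicator"])
```

In this example the duplicated first household row is dropped once, the row with a missing `indicator` is removed, and the two remaining rows get `year` values of 2010 and 2015.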

surveyweathertool.src.survey.harmonizer.prepare_concatenated_data(filenames_indicator: Dict[str, set], files_paths: Dict[str, str], geodata_path: str) → List[pandas.DataFrame] | None

Prepares concatenated dataframes based on the given filenames and their associated indicators.

Parameters:

filenames_indicator (Dict[str, set]): A dictionary where keys are filenames and values are sets of indicators.
files_paths (Dict[str, str]): A dictionary where keys are filenames and values are file paths.
geodata_path (str): The file path of the geolocation data.

Returns:

List[pd.DataFrame] or None: A list of concatenated dataframes.

surveyweathertool.src.survey.harmonizer.indicator_merger(concatenated_data: List[pandas.DataFrame]) → pandas.DataFrame

Merge and process a list of DataFrames containing indicator data.

Args:

concatenated_data (List[pd.DataFrame]): A list of DataFrames to be merged.

Returns:

pd.DataFrame: The merged DataFrame after processing.
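One common way to merge a list of DataFrames is to fold `pandas.merge` over the list, sketched below. This is only an illustration of that general technique: `merge_on_keys` is a hypothetical helper, and the key columns (`hhid`, `year`), the outer join, and the sample indicator columns are assumptions. The source does not say which keys or join type `indicator_merger` uses, nor what its processing step does.

```python
from functools import reduce
from typing import List

import pandas as pd


def merge_on_keys(frames: List[pd.DataFrame], keys: List[str]) -> pd.DataFrame:
    """Hypothetical helper: outer-merge all frames pairwise on shared key columns."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=keys, how="outer"),
        frames,
    )


# Two made-up indicator tables that share the assumed key columns.
food = pd.DataFrame({"hhid": ["1", "2"], "year": [2010, 2010], "food": [0.4, 0.7]})
assets = pd.DataFrame({"hhid": ["1", "3"], "year": [2010, 2010], "assets": [3, 5]})

merged = merge_on_keys([food, assets], keys=["hhid", "year"])
```

With an outer join, households that appear in only one table are kept and the missing indicator is filled with NaN; an inner join would instead keep only household "1".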

surveyweathertool.src.survey.harmonizer.harmonize()

Runs all setup steps on the harmonized dataset and saves the output for all selected indicators.