esat.data package

Submodules

esat.data.analysis module

esat.data.datahandler module

class esat.data.datahandler.DataHandler(input_path: str, uncertainty_path: str, index_col: str = None, drop_col: list = None, sn_threshold: float = 2.0, load: bool = True)

Bases: object

The class for cleaning and preparing input datasets for use in ESAT.

The DataHandler class provides a standardized way to clean and prepare data, from file to ESAT models.

The input and uncertainty data files are specified by their file paths. Input files can be .csv or tab-separated text files. Other file formats are not supported at this time.

#TODO: Add additional supported file formats by expanding the __read_data function.
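As an illustration of the file-format restriction described above, the following is a minimal sketch of choosing a delimiter from the file extension. It is not the ESAT `__read_data` implementation: the helper names `pick_delimiter` and `read_table` are hypothetical, and treating `.txt` and `.tsv` as the tab-separated extensions is an assumption.

```python
import csv
import io
from pathlib import Path


def pick_delimiter(path: str) -> str:
    """Return the delimiter for a supported file type (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return ","
    # Assumption: tab-separated text files use a .txt or .tsv extension.
    if suffix in (".txt", ".tsv"):
        return "\t"
    raise ValueError(f"Unsupported file format: {suffix!r}")


def read_table(text: str, delimiter: str) -> list:
    """Parse delimited text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Parse a small in-memory, tab-separated sample instead of a real file.
sample = "date\tSO4\tNO3\n2020-01-01\t1.2\t0.4\n"
rows = read_table(sample, pick_delimiter("input.txt"))
print(rows)
```

Raising an error for unknown extensions mirrors the statement that other formats are not supported; supporting a new format would mean adding another branch.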

Parameters:
  • input_path (str) – The file path to the input dataset.

  • uncertainty_path (str) – The file path to the uncertainty dataset. #TODO: Add the option of generating an uncertainty dataset from a provided input dataset, using a random selection of some percentage range of the input dataset cell values.

  • index_col (str) – The name of the index column if it is not the first column in the dataset. Default = None, which uses the first column.

  • drop_col (list) – A list of columns to drop from the dataset. Default = None.

  • sn_threshold (float) – The threshold for the signal-to-noise (S/N) ratio values. Default = 2.0.

  • load (bool) – Whether to load the input and uncertainty data files. Used internally by load_dataframe. Default = True.

get_data()

Get the processed input and uncertainty datasets, ready for use in ESAT.

Returns:

The processed input dataset and the processed uncertainty dataset as numpy arrays.

Return type:

np.ndarray, np.ndarray

static load_dataframe(input_df: DataFrame, uncertainty_df: DataFrame)

Pass in pandas dataframes for the input and uncertainty datasets, instead of using files.

Parameters:
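  • input_df (DataFrame) – The input dataset as a pandas dataframe.

  • uncertainty_df (DataFrame) – The uncertainty dataset as a pandas dataframe.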

Returns:

Instance of DataHandler using dataframes as input.

Return type:

DataHandler

plot_data_uncertainty()

Create a plot of the data vs the uncertainty for a specified feature, with a dropdown menu for feature selection.

plot_feature_data(x_idx, y_idx)

Create a plot of one data feature (column) against another data feature (column), each specified by its feature index.

Parameters:
  • x_idx (int) – The feature index for the x-axis values.

  • y_idx (int) – The feature index for the y-axis values.

plot_feature_timeseries(feature_selection)

Create a plot of a feature, or list of features, as a timeseries.

Parameters:

feature_selection (int or list) – A single feature index or a list of feature indices to plot as a timeseries.

set_category(feature: str, category: str = 'strong')

Set the S/N category for the feature. The options are ‘strong’, ‘weak’, or ‘bad’. All features are set to ‘strong’ by default, which does not modify the feature’s behavior in models. Features categorized as ‘weak’ have their uncertainty tripled, and ‘bad’ features are excluded from analysis.

Parameters:
  • feature (str) – The name or label of the feature.

  • category (str) – The new category of the feature: ‘strong’, ‘weak’, or ‘bad’. Default = ‘strong’.

Returns:

True if the change was successful, otherwise False.

Return type:

bool
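The effect of the three categories described for set_category can be sketched in plain Python. This is an illustration of the documented behavior only, not ESAT's internal code: the function `apply_categories` and the sample feature names are hypothetical, and data are held in nested lists rather than numpy arrays to keep the sketch dependency-free.

```python
def apply_categories(data, uncertainty, features, categories):
    """Apply S/N categories to column-oriented data (illustrative only).

    'strong' columns are unchanged, 'weak' columns have their uncertainty
    tripled, and 'bad' columns are dropped from both tables.
    """
    scale = {"strong": 1.0, "weak": 3.0}
    # Features without an explicit category default to 'strong'.
    cats = [categories.get(name, "strong") for name in features]
    keep = [j for j, cat in enumerate(cats) if cat != "bad"]

    kept_features = [features[j] for j in keep]
    kept_data = [[row[j] for j in keep] for row in data]
    kept_unc = [[row[j] * scale[cats[j]] for j in keep] for row in uncertainty]
    return kept_data, kept_unc, kept_features


# Hypothetical sample: one row of concentrations and uncertainties.
features = ["SO4", "NO3", "Pb"]
data = [[1.0, 2.0, 3.0]]
uncertainty = [[0.25, 0.5, 1.0]]
categories = {"NO3": "weak", "Pb": "bad"}

new_data, new_unc, new_features = apply_categories(
    data, uncertainty, features, categories
)
print(new_features, new_data, new_unc)
```

In this sample, "NO3" keeps its data value but carries three times the uncertainty, while "Pb" is removed from both tables.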

esat.data.test_tools module

class esat.data.test_tools.CompareAnalyzer(input_df, pmf_profile_df, pmf_contributions_df, ls_profile_df, ws_profile_df, ls_mapping, ws_mapping, ls_contributions_df, ws_contributions_df, features, datetimestamps)

Bases: object

Compare ESAT output with the PMF5 output.

feature_histogram(feature: str = None, feature_i: int = 0, normalized: bool = False, threshold: float = 3.0)
plot_factor_contribution(feature: str = None, feature_i: int = 0)
plot_factors()
plot_feature_timeseries(factor_n: int, feature_n, show_input: bool = True)
plot_fingerprints(ls_nmf_r2: -1, ws_nmf_r2: -1)
timeseries_plot(feature: str = None, feature_i: int = 0)

Module contents