esat.data package#
Submodules#
esat.data.analysis module#
- class esat.data.analysis.BatchAnalysis(batch_sa: BatchSA, data_handler: DataHandler | None = None)#
Bases:
object
Class for running batch solution analysis.
- Parameters:
batch_sa (BatchSA) – A completed ESAT batch source apportionment to run solution analysis on.
- plot_loss()#
Plot the loss value for each model in the batch solution as it changes over time.
A model stops updating once the convergence criteria are met, so converged models can be identified as those that stop before reaching the maximum number of iterations. The ideal loss curve should resemble a y=1/x hyperbola, but because of data uncertainty the curve may not be entirely smooth.
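The early-stopping behavior described above can be illustrated with a minimal sketch. It assumes that a model stops once its loss has changed by less than converge_delta over the most recent converge_n iterations. This is an assumed rule for illustration only, and ESAT's actual criterion may differ. The helpers has_converged and run_until_converged are hypothetical and not part of the esat package.

```python
def has_converged(losses, converge_n, converge_delta):
    """Assumed rule (not ESAT's code): converged when the loss changed by
    less than converge_delta over the last converge_n iterations."""
    if len(losses) <= converge_n:
        return False
    recent = losses[-(converge_n + 1):]
    return max(recent) - min(recent) < converge_delta


def run_until_converged(loss_fn, max_iter, converge_n, converge_delta):
    """Record the loss at each iteration and stop early once converged."""
    losses = []
    for i in range(1, max_iter + 1):
        losses.append(loss_fn(i))
        if has_converged(losses, converge_n, converge_delta):
            break
    return losses


# A toy loss that decays like 1/x, mimicking the ideal curve shape.
history = run_until_converged(lambda i: 100.0 / i, max_iter=1000,
                              converge_n=10, converge_delta=0.1)
print(f"stopped after {len(history)} of 1000 iterations")
```

Under this assumed rule, a smaller converge_delta or a larger converge_n makes the model run longer before it is treated as converged.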
- plot_loss_distribution()#
Plot the distribution of batch model Q(True) and Q(Robust).
A very broad distribution is often the result of ‘loose’ convergence criteria; increasing converge_n and decreasing converge_delta will tighten the criteria. If the Q(True) and Q(Robust) distributions are very similar, the solution may be overfit, meaning enough sources/factors are available to capture the majority of outlier behavior. In this case, reducing the number of factors can resolve the overfitting.
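To make the relationship between the two loss values concrete, here is a minimal sketch. It assumes the EPA PMF5 convention, in which Q(True) sums every squared uncertainty-scaled residual and Q(Robust) leaves out residuals whose absolute scaled value exceeds 4. Whether ESAT uses the same cutoff is an assumption, and q_values is a hypothetical helper, not an esat function.

```python
def q_values(observed, estimated, uncertainty, robust_cutoff=4.0):
    """Return (Q_true, Q_robust) for flat sequences of values.

    Assumption: residuals with |scaled residual| > robust_cutoff are
    excluded from Q_robust, following the PMF5 convention.
    """
    q_true = 0.0
    q_robust = 0.0
    for v, est, u in zip(observed, estimated, uncertainty):
        scaled = (v - est) / u
        q_true += scaled ** 2
        if abs(scaled) <= robust_cutoff:
            q_robust += scaled ** 2
    return q_true, q_robust


observed = [10.0, 12.0, 50.0, 9.0]     # third value is an outlier
estimated = [9.0, 12.5, 11.0, 9.5]
uncertainty = [1.0, 0.5, 2.0, 0.5]

q_true, q_robust = q_values(observed, estimated, uncertainty)
print(q_true, q_robust)
```

In this toy data the single poorly fit point dominates Q(True) but is left out of Q(Robust), so the two values differ widely. When a model fits its outliers well, the two values move closer together.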
- plot_temporal_residuals(feature_idx: int)#
Plot the temporal residuals for a specified feature, by index, of all models in the SA batch.
- Parameters:
feature_idx (int) – The index of the feature to plot.
- class esat.data.analysis.ModelAnalysis(datahandler: DataHandler, model: SA, selected_model: int | None = None)#
Bases:
object
Class for running model analysis and generating plots. A collection of model statistic methods and plot generation functions.
- Parameters:
datahandler (DataHandler) – The datahandler instance used for processing the input and uncertainty datasets used by the SA model.
model (SA) – A completed SA model whose output is used for calculating model statistics and generating plots.
selected_model (int) – If SA model is part of a batch, the model id/index that will be used for plot labels.
- calculate_statistics(results: ndarray | None = None)#
Calculate general statistics from the results of the NMF model run.
Generates a pd.DataFrame with a set of metrics for each feature. The resulting dataframe is accessible as .statistics. These metrics focus on residual analysis, including normality tests of the residuals using three different test metrics.
- Parameters:
results (np.ndarray) – By default, this function uses the ESAT model WH matrix for calculating metrics. This can be overridden by providing an np.ndarray in the ‘results’ parameter. Default = None.
- features_metrics(est_V: ndarray | None = None)#
Create a dataframe of the feature metrics and error for model analysis.
- Parameters:
est_V (np.ndarray) – Overrides the use of the ESAT model’s WH matrix in the residual calculation. Default = None.
- Returns:
The features of the input dataset compared to the results of the model, as a pd.DataFrame
- Return type:
pd.DataFrame
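As an illustration of the residual calculation that est_V overrides, the sketch below builds an estimated matrix as the product of W and H and subtracts it from the input V. It uses plain nested lists to stay dependency-free. The helpers matmul and residual_matrix are hypothetical, and the matrix orientation (samples x factors for W, factors x features for H) is an assumption.

```python
def matmul(W, H):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)]
            for row in W]


def residual_matrix(V, est_V):
    """Element-wise difference between observed and estimated values."""
    return [[v - e for v, e in zip(v_row, e_row)]
            for v_row, e_row in zip(V, est_V)]


W = [[1.0, 0.0],
     [0.5, 2.0]]            # assumed shape: 2 samples x 2 factors
H = [[2.0, 4.0, 0.0],
     [1.0, 0.0, 3.0]]       # assumed shape: 2 factors x 3 features
V = [[2.5, 4.0, 0.0],
     [3.0, 1.5, 6.0]]       # observed data: 2 samples x 3 features

est_V = matmul(W, H)
res = residual_matrix(V, est_V)
print(est_V)
print(res)
```

Passing a different est_V would change only the second argument of the subtraction; the observed data stay the same.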
- plot_estimated_observed(feature_idx: int)#
Create a plot that shows the estimated concentrations of a feature vs the observed concentrations.
- Parameters:
feature_idx (int) – The index of the feature to plot.
- plot_estimated_timeseries(feature_idx: int)#
Create a plot that shows the estimated values of a timeseries for a specific feature, selected by feature index.
- Parameters:
feature_idx (int) – The index of the feature to plot.
- plot_factor_composition()#
Create a radar plot showing the composition of every factor across all features.
- plot_factor_contributions(feature_idx: int, contribution_threshold: float = 0.05)#
Create a plot of the factor contributions and the normalized contribution.
- Parameters:
feature_idx (int) – The index of the feature to plot.
contribution_threshold (float) – The contribution percentage a factor must exceed to be included in the plot.
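The effect of contribution_threshold can be sketched as follows. The example assumes the threshold is a fraction of the feature's total mass (so the default 0.05 means 5%) and that the comparison is strict. Both are assumptions, and factors_above_threshold is a hypothetical helper rather than part of esat.

```python
def factors_above_threshold(factor_mass, contribution_threshold=0.05):
    """Return the share of each factor whose share of the total exceeds
    contribution_threshold. factor_mass maps factor label -> mass."""
    total = sum(factor_mass.values())
    if total == 0:
        return {}
    shares = {name: mass / total for name, mass in factor_mass.items()}
    return {name: share for name, share in shares.items()
            if share > contribution_threshold}


# Toy contributions of three factors to one feature.
mass = {"Factor 1": 60.0, "Factor 2": 37.0, "Factor 3": 3.0}
kept = factors_above_threshold(mass)
print(kept)
```

Here the third factor supplies 3% of the total, so it falls under the default threshold and is left out.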
- plot_factor_fingerprints(grouped: bool = False)#
Create a stacked bar plot of the factor profiles (fingerprints).
- plot_factor_profile(factor_idx: int, H: ndarray | None = None, W: ndarray | None = None)#
Create a bar plot of a factor profile.
- Parameters:
factor_idx (int) – The index of the factor to plot (1 -> k).
H (np.ndarray) – Overrides the factor profile matrix in the ESAT model used for the plot.
W (np.ndarray) – Overrides the factor contribution matrix in the ESAT model used for the plot.
- plot_factor_surface(factor_idx: int = 1, feature_idx: int | None = None, percentage: bool = True, zero_threshold: float = 0.0001)#
Create a 3D surface plot of the specified factor_idx’s concentration percentage or mass.
- Parameters:
factor_idx (int) – The factor index to plot, showing all features for that factor. If factor_idx is None, the plot shows feature_idx for all factors.
feature_idx (int) – The feature to include in the plot if factor_idx is None. Otherwise, all features are shown for the specified factor_idx.
percentage (bool) – If True, plot the concentration as a scaled value (percentage of the sum of all factors); if False, plot the calculated mass. Default = True.
zero_threshold (float) – Values below this threshold are considered zero on the plot.
- plot_g_space(factor_1: int, factor_2: int)#
Create a scatter plot showing one factor’s contributions vs another factor’s contributions.
- Parameters:
factor_1 (int) – The index of the factor to plot along the x-axis.
factor_2 (int) – The index of the factor to plot along the y-axis.
- plot_residual_histogram(feature_idx: int, abs_threshold: float = 3.0, est_V: ndarray | None = None)#
Create a plot of a histogram of the residuals for a specific feature.
- Parameters:
feature_idx (int) – The index of the feature for the plot.
abs_threshold (float) – The function returns the residuals whose absolute value exceeds this limit.
est_V (np.ndarray) – Overrides the use of the ESAT model’s WH matrix in the residual calculation. Default = None.
- Returns:
The list of residuals that exceed the absolute value of the threshold, as a pd.DataFrame
- Return type:
pd.DataFrame
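The role of abs_threshold can be shown with a minimal sketch that keeps only the residuals whose absolute value exceeds the limit. It works on a plain list rather than a pd.DataFrame, treats the comparison as strict (an assumption), and uses a hypothetical helper name, residuals_exceeding.

```python
def residuals_exceeding(scaled_residuals, abs_threshold=3.0):
    """Return (sample index, residual) pairs where |residual| > abs_threshold."""
    return [(i, r) for i, r in enumerate(scaled_residuals)
            if abs(r) > abs_threshold]


# Toy scaled residuals for one feature across five samples.
feature_residuals = [0.4, -3.2, 2.9, 5.1, -0.7]
flagged = residuals_exceeding(feature_residuals)
print(flagged)
```

Both large positive and large negative residuals are flagged, since the test is applied to the absolute value.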
esat.data.datahandler module#
- class esat.data.datahandler.DataHandler(input_path: str, uncertainty_path: str, index_col: str | None = None, drop_col: list | None = None, sn_threshold: float = 2.0, load: bool = True)#
Bases:
object
The class for cleaning and preparing input datasets for use in ESAT.
The DataHandler class is intended to provide a standardized way of cleaning and preparing data from file to ESAT models.
The input and uncertainty data files are specified by their file paths. Input files can be .csv or tab separated text files. Other file formats are not supported at this time.
#TODO: Add additional supported file formats by expanding the __read_data function.
- Parameters:
input_path (str) – The file path to the input dataset.
uncertainty_path (str) – The file path to the uncertainty dataset. #TODO: Add the option of generating an uncertainty dataset from a provided input dataset, using a random selection of some percentage range of the input dataset cell values.
index_col (str) – The name of the index column if it is not the first column in the dataset. Default = None, which will use the 1st column.
drop_col (list) – A list of columns to drop from the dataset. Default = None.
sn_threshold (float) – The threshold for the signal to noise ratio values.
load (bool) – Load the input and uncertainty data files, used internally for load_dataframe.
- get_data()#
Get the processed input and uncertainty datasets ready for use in ESAT.
- Returns:
The processed input dataset and the processed uncertainty dataset as numpy arrays.
- Return type:
np.ndarray, np.ndarray
- static load_dataframe(input_df: DataFrame, uncertainty_df: DataFrame)#
Pass in pandas dataframes for the input and uncertainty datasets, instead of using files.
- Parameters:
input_df (pd.DataFrame) – The input dataset.
uncertainty_df (pd.DataFrame) – The uncertainty dataset.
- Returns:
Instance of DataHandler using dataframes as input.
- Return type:
DataHandler
- plot_data_uncertainty(feature_idx)#
Create a plot of the data vs the uncertainty for a specified feature, by the feature index.
- Parameters:
feature_idx (int) – The index of the feature (column) of the input and uncertainty datasets to plot.
- plot_feature_data(x_idx, y_idx)#
Create a plot of one data feature (column) vs another data feature (column), specified by their feature indices.
- Parameters:
x_idx (int) – The feature index for the x-axis values.
y_idx (int) – The feature index for the y-axis values.
- plot_feature_timeseries(feature_selection)#
Create a plot of a feature, or list of features, as a timeseries.
- Parameters:
feature_selection (int or list) – A single feature index or a list of feature indices to plot as a timeseries.
- set_category(feature: str, category: str = 'strong')#
Set the S/N category for the feature. Options are ‘strong’, ‘weak’ or ‘bad’. All features are set to ‘strong’ by default, which doesn’t modify the feature’s behavior in models. Features categorized as ‘weak’ have their uncertainty tripled, and ‘bad’ features are excluded from analysis.
- Parameters:
feature (str) – The name or label of the feature.
category (str) – The new category of the feature: ‘strong’, ‘weak’ or ‘bad’.
- Returns:
True if the change was successful, otherwise False.
- Return type:
bool
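The effect of the three categories on the uncertainty data can be sketched as below. This is a simplified illustration of the behavior described above, not the DataHandler implementation. The helper apply_categories and the dictionary layout (feature name mapped to a list of uncertainty values) are assumptions made for the example.

```python
# Multipliers applied to the uncertainty of each kept category.
CATEGORY_FACTOR = {"strong": 1.0, "weak": 3.0}


def apply_categories(uncertainty, categories):
    """Return adjusted uncertainties: 'strong' unchanged, 'weak' tripled,
    'bad' features dropped. Features without a category default to 'strong'."""
    adjusted = {}
    for feature, values in uncertainty.items():
        category = categories.get(feature, "strong")
        if category == "bad":
            continue  # excluded from analysis
        if category not in CATEGORY_FACTOR:
            raise ValueError(f"Unknown category: {category!r}")
        factor = CATEGORY_FACTOR[category]
        adjusted[feature] = [u * factor for u in values]
    return adjusted


uncertainty = {
    "sulfate": [0.5, 0.25],
    "nitrate": [0.5, 0.25],
    "lead": [1.0, 2.0],
}
categories = {"nitrate": "weak", "lead": "bad"}

adjusted = apply_categories(uncertainty, categories)
print(adjusted)
```

The feature names are placeholders. Tripling the uncertainty of a ‘weak’ feature reduces its influence on the fit without removing it entirely.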
esat.data.test_tools module#
- class esat.data.test_tools.CompareAnalyzer(input_df, pmf_profile_df, pmf_contributions_df, ls_profile_df, ws_profile_df, ls_mapping, ws_mapping, ls_contributions_df, ws_contributions_df, features, datetimestamps)#
Bases:
object
Compare ESAT output with the PMF5 output.
- feature_histogram(feature: str | None = None, feature_i: int = 0, normalized: bool = False, threshold: float = 3.0)#
- plot_factor_contribution(feature: str | None = None, feature_i: int = 0)#
- plot_factors()#
- plot_feature_timeseries(factor_n: int, feature_n, show_input: bool = True)#
- plot_fingerprints(ls_nmf_r2: -1, ws_nmf_r2: -1)#
- timeseries_plot(feature: str | None = None, feature_i: int = 0)#