esat.error package#

Submodules#

esat.error.bootstrap module#

class esat.error.bootstrap.Bootstrap(sa: SA, feature_labels: list | None = None, model_selected: int = -1, bootstrap_n: int = 20, block_size: int = 10, threshold: float = 0.6, parallel: bool = True, cpus: int = -1, seed: int | None = None)#

Bases: object

The Bootstrap (BS) method is used to detect and estimate disproportionate effects of a small set of data samples on the solution. The BS method assembles a dataset by randomly selecting blocks of consecutive samples from the original dataset, with replacement.

The BS method implemented here is called the block bootstrap method. The block BS method is useful on timeseries data that may contain temporal correlations that would otherwise be lost if single samples were resampled.
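To illustrate why consecutive blocks matter, the following minimal sketch resamples a synthetic autocorrelated series in two ways and compares the lag-1 autocorrelation of each result. It is not ESAT's implementation: the helper names `block_indices` and `lag1_autocorr`, the AR(1) test series, and the block size of 50 are all assumptions chosen for illustration.

```python
import numpy as np


def block_indices(n_samples: int, block_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw random blocks of consecutive row indices, with replacement."""
    n_blocks = int(np.ceil(n_samples / block_size))
    # Any start index that leaves room for a full block is allowed.
    starts = rng.integers(0, n_samples - block_size + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_size)[None, :]).ravel()
    return idx[:n_samples]  # trim to the original number of samples


def lag1_autocorr(x: np.ndarray) -> float:
    """Pearson correlation between a series and itself shifted by one step."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


rng = np.random.default_rng(0)

# Synthetic AR(1) series standing in for one feature of a timeseries dataset.
n = 2000
series = np.zeros(n)
noise = rng.normal(size=n)
for t in range(1, n):
    series[t] = 0.9 * series[t - 1] + noise[t]

block_sample = series[block_indices(n, block_size=50, rng=rng)]
single_sample = series[rng.integers(0, n, size=n)]  # one row at a time

block_ac = lag1_autocorr(block_sample)
single_ac = lag1_autocorr(single_sample)
print(f"original: {lag1_autocorr(series):.2f}, "
      f"block: {block_ac:.2f}, single: {single_ac:.2f}")
```

In this toy setup the block-resampled series keeps most of the original autocorrelation, while the single-sample resampling largely destroys it.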

For each BS run, a unique BS dataset is created and run through NMF to convergence. The factors of the BS output are then compared with the factors of the original base model: each BS factor is mapped to the base model factor with which it has the highest correlation, provided that correlation is above the user-specified threshold. Multiple BS factors can map to the same base model factor.

Parameters:
  • sa (SA) – A completed SA base model that used the same data and uncertainty datasets.

  • feature_labels (list) – The labels for the features (the columns of the dataset), as specified by the data handler.

  • model_selected (int) – The index of the model selected from a batch NMF run, used for labeling.

  • bootstrap_n (int) – The number of bootstrap runs to make.

  • block_size (int) – The block size for the BS resampling.

  • threshold (float) – The correlation threshold that must be met for a BS factor to be mapped to a base model factor. Factor correlations must be greater than the threshold; otherwise the factor is labeled unmapped.

  • parallel (bool) – Run the individual models in parallel. This is not the same as the optimized parallelized option for an SA ws-nmf model. Default = True.

  • cpus (int) – The number of cpus to use for parallel processing. Default = -1, which uses the number of cpus minus 1.

  • seed (int) – The random seed for random resampling of the BS datasets. The base model random seed is used for all BS runs, which results in the same initial W matrix.
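The two roles of the random seeds described above can be sketched as follows. This is an illustrative assumption about how seeded generators behave in NumPy, not ESAT's internal code: `init_W` is a hypothetical stand-in for a seeded W initialization, and the seed values are arbitrary.

```python
import numpy as np


def init_W(n_samples: int, n_factors: int, seed: int) -> np.ndarray:
    """Hypothetical stand-in for a seeded random W initialization."""
    rng = np.random.default_rng(seed)
    return rng.random((n_samples, n_factors))


base_seed = 42       # assumed seed of the base model
resample_seed = 7    # assumed value passed as the `seed` parameter

# Same base seed: every BS run starts from an identical W.
W_run_a = init_W(100, 4, base_seed)
W_run_b = init_W(100, 4, base_seed)

# One resampling generator, advanced across runs: each run draws new rows.
resample_rng = np.random.default_rng(resample_seed)
rows_run_a = resample_rng.integers(0, 100, size=100)
rows_run_b = resample_rng.integers(0, 100, size=100)

print(np.array_equal(W_run_a, W_run_b))        # compares the two initializations
print(np.array_equal(rows_run_a, rows_run_b))  # compares the two resampled row sets
```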

static load(file_path: str)#

Load a previously saved BS SA pickle file.

Parameters:

file_path (str) – File path to a previously saved BS SA pickle file

Returns:

On successful load, will return a previously saved BS SA object. Will return None on load fail.

Return type:

Bootstrap

map_contributions(W1: ndarray, H1: ndarray, W2: ndarray, H2: ndarray, threshold: float = 0.6)#

Map all the factors of H1 to the factors of H2 by the factor contributions.

Parameters:
  • W1 (np.ndarray) – The first factor contribution matrix for the mapping.

  • H1 (np.ndarray) – The first factor profile matrix for the mapping.

  • W2 (np.ndarray) – The second factor contribution matrix for the mapping.

  • H2 (np.ndarray) – The second factor profile matrix for the mapping.

  • threshold (float) – The threshold that a factor correlation must exceed to be mapped to another factor.

Returns:

A dictionary of the mapping of the H1 factors to the H2 factors.

Return type:

dict

map_factors(H1: ndarray, H2: ndarray, threshold: float = 0.6)#

Map all the factors of one factor profile to the factors of a second factor profile.

Parameters:
  • H1 (np.ndarray) – The first factor profile for the mapping.

  • H2 (np.ndarray) – The second factor profile for the mapping.

  • threshold (float) – The threshold that a factor correlation must exceed to be mapped to another factor.

Returns:

A dictionary of the mapping of the H1 factors to the H2 factors.

Return type:

dict
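A minimal sketch of threshold-based factor mapping is shown here, assuming Pearson correlation between factor profile rows is the similarity measure. The function `map_by_correlation` and the use of `None` for unmapped factors are hypothetical; the actual return format of `map_factors` is not specified beyond being a dictionary.

```python
import numpy as np


def map_by_correlation(H1: np.ndarray, H2: np.ndarray, threshold: float = 0.6) -> dict:
    """Map each row (factor) of H1 to its best-correlated row of H2.

    Returns {H1 index: H2 index or None}; None marks an unmapped factor.
    """
    k1 = H1.shape[0]
    # np.corrcoef stacks the rows of both inputs; the top-right block holds
    # the H1-vs-H2 Pearson correlations.
    corr = np.corrcoef(H1, H2)[:k1, k1:]
    mapping = {}
    for i in range(k1):
        j = int(np.argmax(corr[i]))
        mapping[i] = j if corr[i, j] > threshold else None
    return mapping


rng = np.random.default_rng(1)
base_H = rng.random((3, 12))                            # 3 factors x 12 features
bs_H = base_H[[2, 0, 1]] + 0.01 * rng.random((3, 12))   # permuted, slightly noisy
print(map_by_correlation(bs_H, base_H))
```

Because each H1 factor independently picks its best match, several H1 factors can map to the same H2 factor, which matches the behavior described for the BS method.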

plot_contribution(factor: int)#

Plot the BS factor contributions for a specific factor.

Parameters:

factor (int) – The index of the factor to plot.

plot_factor(factor: int)#

Plot the BS factor profile for a specific factor.

Parameters:

factor (int) – The index of the factor to plot.

plot_results(factor: int)#

Plot both the factor profile and factor contributions for a specific index.

Parameters:

factor (int) – The index of the factor to plot.

run(keep_H: bool = True, reuse_seed: bool = True, block: bool = True, overlapping: bool = False)#

Run the BS method.

Executes all the BS runs and compiles the results.

Parameters:
  • keep_H (bool) – When retraining the SA models using the resampled input and uncertainty datasets, keep the base model H matrix instead of reinitializing. The W matrix is always reinitialized when NMF is run on the BS datasets. Default = True

  • reuse_seed (bool) – Reuse the base model seed for initializing the W matrix, and the H matrix if keep_H = False. Default = True

  • block (bool) – Use block resampling instead of full resampling. Default = True

  • overlapping (bool) – Allow resampled blocks to overlap. Default = False
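One common reading of the `block` and `overlapping` options is sketched below: full resampling draws rows independently, non-overlapping blocks come from a fixed partition of the rows, and overlapping blocks may start at any row. This interpretation and the helper `resample_rows` are assumptions for illustration and may differ from what ESAT does internally.

```python
import numpy as np


def resample_rows(n_samples, block_size, block, overlapping, rng):
    """Return row indices for one resampled dataset (with replacement)."""
    if not block:
        # Full resampling: every row is drawn independently.
        return rng.integers(0, n_samples, size=n_samples)
    if overlapping:
        # Any row may start a block, so two blocks can share rows.
        candidates = np.arange(0, n_samples - block_size + 1)
    else:
        # Blocks come from a fixed, non-overlapping partition of the rows.
        candidates = np.arange(0, n_samples - block_size + 1, block_size)
    n_blocks = int(np.ceil(n_samples / block_size))
    starts = rng.choice(candidates, size=n_blocks, replace=True)
    rows = (starts[:, None] + np.arange(block_size)).ravel()
    return rows[:n_samples]


rng = np.random.default_rng(3)
rows = resample_rows(100, block_size=10, block=True, overlapping=False, rng=rng)
print(rows[:20])
```

The same row indices would be applied to both the input data and the uncertainty data so that each sample keeps its own uncertainty.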

save(bs_name: str, output_directory: str, pickle_result: bool = True)#

Save the BS results.

Parameters:
  • bs_name (str) – The name to use for the BS file.

  • output_directory (str) – The output directory to save the BS file to.

  • pickle_result (bool) – Pickle the BS model. Default = True.

Returns:

The path to the saved file.

Return type:

str

show_factor_results(factor: int)#

Create the table showing the factor metrics from the BS runs for a specific factor.

Parameters:

factor (int) – The index of the factor to show.

show_mapping_table()#

Plots the factor mapping table.

show_q_table()#

Plots the BS run Q(robust) statistics.

summary()#

Prints a summary of the BS parameters and results.

esat.error.bs_disp module#

class esat.error.bs_disp.BSDISP(sa: SA, feature_labels: list, model_selected: int = -1, bootstrap: Bootstrap | None = None, bootstrap_n: int = 20, block_size: int = 10, threshold: float = 0.6, max_search: int = 50, threshold_dQ: float = 0.1, features: list | None = None, seed: int | None = None)#

Bases: object

The Bootstrap-Displacement (BS-DISP) method combines both the Bootstrap and Displacement methods to estimate the errors with both random and rotational ambiguity. For each BS run/dataset, the DISP method is run on that dataset.

The BS-DISP method uses a base model and an optional BS instance. For each bootstrap run (BS dataset), DISP is run on the corresponding BS model for the specified features. If no features are specified, DISP is run on all features.

Parameters:
  • sa (SA) – A completed SA base model that used the same data and uncertainty datasets.

  • feature_labels (list) – The labels for the features (the columns of the dataset), as specified by the data handler.

  • model_selected (int) – The index of the model selected from a batch NMF run, used for labeling.

  • bootstrap (Bootstrap) – A previously completed BS model.

  • bootstrap_n (int) – The number of bootstrap runs to make.

  • block_size (int) – The block size for the BS resampling.

  • threshold (float) – The correlation threshold that must be met for a BS factor to be mapped to a base model factor. Factor correlations must be greater than the threshold; otherwise the factor is labeled unmapped.

  • max_search (int) – The maximum number of search steps to complete when trying to find a factor feature value. Default = 50

  • threshold_dQ (float) – The threshold range of the dQ value for the factor feature value to be considered found. For example, if dQ=4 and threshold_dQ=0.1, then any value between 3.9 and 4.0 is considered valid.

  • features (list) – A list of the feature indices to run DISP on, default is None which will run DISP on all features.

  • seed (int) – The random seed for random resampling of the BS datasets. The base model random seed is used for all BS runs, which results in the same initial W matrix.

dQmax = [4, 2, 1, 0.5]#

static load(file_path: str)#

Load a previously saved BS-DISP SA pickle file.

Parameters:

file_path (str) – File path to a previously saved BS-DISP SA pickle file

Returns:

On successful load, will return a previously saved BS-DISP SA object. Will return None on load fail.

Return type:

BSDISP

plot_contribution(factor: int, dQ: float = 0.5)#

Plot the BS-DISP factor contribution results.

Parameters:
  • factor (int) – The index of the BS-DISP factor results to display.

  • dQ (float) – The dQ value to show in the results, valid values are (0.5, 1, 2, 4). Default = 0.5, will use default if invalid value provided.

plot_profile(factor: int, dQ: float = 0.5)#

Plot the BS-DISP factor profile results.

Parameters:
  • factor (int) – The index of the BS-DISP factor results to display.

  • dQ (float) – The dQ value to show in the results, valid values are (0.5, 1, 2, 4). Default = 0.5, will use default if invalid value provided.

plot_results(factor: int, dQ: float = 0.5)#

Plot the BS-DISP results for a specified factor and dQ value. The output results are grouped by dQ, with dQ=0.5 being the default value displayed for results.

Parameters:
  • factor (int) – The index of the BS-DISP factor results to display.

  • dQ (float) – The dQ value to show in the results, valid values are (0.5, 1, 2, 4). Default = 0.5, will use default if invalid value provided.

run(parallel: bool = True, keep_H: bool = True, reuse_seed: bool = True, block: bool = True, overlapping: bool = False)#

Run the BS-DISP error estimation method. If no prior BS run has been completed, this will execute a BS run and then a DISP run for each of the BS runs.

Parameters:
  • parallel (bool) – Run the individual models in parallel. Default = True

  • keep_H (bool) – When retraining the SA models using the resampled input and uncertainty datasets, keep the base model H matrix instead of reinitializing. The W matrix is always reinitialized when SA is run on the BS datasets. Default = True

  • reuse_seed (bool) – Reuse the base model seed for initializing the W matrix, and the H matrix if keep_H = False. Default = True

  • block (bool) – Use block resampling instead of full resampling. Default = True

  • overlapping (bool) – Allow resampled blocks to overlap. Default = False

save(bsdisp_name: str, output_directory: str, pickle_result: bool = True)#

Save the BS-DISP results.

Parameters:
  • bsdisp_name (str) – The name to use for the BS-DISP pickle file.

  • output_directory (str) – The output directory to save the BS-DISP pickle file to.

  • pickle_result (bool) – Pickle the BS-DISP model. Default = True.

Returns:

The path to the saved file.

Return type:

str

summary()#

Prints a summary of the BS-DISP results table.

The summary shows the largest change in Q across all DISP runs and the % of cases with a drop in Q, a swap in best fit, and a swap in the DISP phase, followed by the swap % table as shown in the regular DISP summary. The dQmax values in BS-DISP differ from those in DISP to account for increased variability: BS-DISP dQmax values are (0.5, 1, 2, 4), while DISP dQmax values are (4, 8, 16, 32).

esat.error.displacement module#

class esat.error.displacement.Displacement(sa: SA, feature_labels: list, model_selected: int = -1, max_search: int = 50, threshold_dQ: float = 0.1, features: list | None = None)#

Bases: object

The displacement method (DISP) for error estimation explores the rotational ambiguity in the solution by assessing the largest range of source profile values without an appreciable increase in the loss value (Q).

The DISP method finds the change in a factor profile feature value required to cause a specific increase in the loss function value (dQ). The change is found for dQ=(4, 8, 16, 32), for both increasing and decreasing changes to the factor profile feature values. The search for these changes is limited to max_search steps, where a step is one iteration of the binary search for the value, with bounds based on the initial value. Factor profile values must be greater than 0, so the search stops once the modified value is below 1e-8 or no longer changes between steps, and the final value in the search is used.
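The search described above can be sketched as a bisection on a toy one-parameter problem. This is only an illustration under stated assumptions: `find_step` and `Q` are hypothetical names, the loss is a simple uncertainty-weighted sum of squares in a single profile value, only the increasing direction is searched, and the acceptance window follows the `threshold_dQ` description (dQ minus the threshold up to dQ). The real DISP search operates on a full factor model.

```python
import numpy as np


def find_step(q_increase, target_dQ, threshold_dQ=0.1, max_search=50, upper=1.0):
    """Bisect for a step whose loss increase is in [target - threshold, target].

    Assumes q_increase(0) == 0 and that q_increase grows with the step.
    Returns (step, dQ reached, number of search steps used).
    """
    lower = 0.0
    for _ in range(60):  # widen the bracket until it overshoots the target
        if q_increase(upper) >= target_dQ:
            break
        lower, upper = upper, 2.0 * upper
    step, dq = upper, q_increase(upper)
    for used in range(1, max_search + 1):
        step = 0.5 * (lower + upper)
        dq = q_increase(step)
        if target_dQ - threshold_dQ <= dq <= target_dQ:
            return step, dq, used
        if dq > target_dQ:
            upper = step
        else:
            lower = step
    return step, dq, max_search  # best effort after max_search steps


# Toy problem: one profile value h scaling a fixed contribution vector w.
rng = np.random.default_rng(0)
w = rng.random(50)
u = np.full(50, 0.1)                      # uncertainties
x = 2.0 * w + rng.normal(0.0, 0.1, 50)    # data generated with h = 2


def Q(h):
    """Uncertainty-weighted sum of squared residuals."""
    return float(np.sum(((x - w * h) / u) ** 2))


h_best = float(np.sum(w * x / u**2) / np.sum(w**2 / u**2))  # minimizer of Q
results = {}
for dQ in (4, 8, 16, 32):
    results[dQ] = find_step(lambda s: Q(h_best + s) - Q(h_best), dQ)
    step, reached, used = results[dQ]
    print(f"dQ={dQ}: h moves {h_best:.4f} -> {h_best + step:.4f} "
          f"(dQ reached {reached:.3f}, {used} steps)")
```

A decreasing search would mirror this with a negative step and an additional stop once the modified value falls below 1e-8.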

The process is repeated for all factors and features. With k factors, N features, and 4 dQ values, the search is completed 2*k*N*4 times.
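As a quick check of the count above, the following small helper (a hypothetical name, `disp_search_count`, not part of ESAT) multiplies out the two search directions, the factors, the features, and the dQ levels for an assumed model with 6 factors and 40 features.

```python
def disp_search_count(n_factors: int, n_features: int, dQ_values=(4, 8, 16, 32)) -> int:
    """Number of searches: 2 directions x factors x features x dQ levels."""
    directions = 2  # one increasing and one decreasing search
    return directions * n_factors * n_features * len(dQ_values)


searches = disp_search_count(n_factors=6, n_features=40)
print(searches)  # 2 * 6 * 40 * 4 = 1920
```

Since each search is capped at max_search steps, the total number of search steps is bounded by this count multiplied by max_search.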

Parameters:
  • sa (SA) – The base model to run the DISP method on.

  • feature_labels (list) – The list of feature (column) labels from the original input dataset, as provided by the data handler.

  • model_selected (int) – The index of the model selected in the case of a batch NMF run, used for labeling.

  • max_search (int) – The maximum number of search steps to complete when trying to find a factor feature value. Default = 50

  • threshold_dQ (float) – The threshold range of the dQ value for the factor feature value to be considered found. For example, if dQ=4 and threshold_dQ=0.1, then any value between 3.9 and 4.0 is considered valid.

  • features (list) – A list of the feature indices to run DISP on, default is None which will run DISP on all features.

dQmax = [32, 16, 8, 4]#

static load(file_path: str)#

Load a previously saved DISP SA pickle file.

Parameters:

file_path (str) – File path to a previously saved DISP SA pickle file

Returns:

On successful load, will return a previously saved DISP SA object. Will return None on load fail.

Return type:

Displacement

plot_contribution(factor: int, dQ: int = 4)#

Plot the DISP factor contribution results.

Parameters:
  • factor (int) – The index of the DISP factor results to display.

  • dQ (int) – The dQ value to show in the results, valid values are (4, 8, 16, 32). Default = 4, will use default if invalid value provided.

plot_profile(factor: int, dQ: int = 4)#

Plot the DISP factor profile results.

Parameters:
  • factor (int) – The index of the DISP factor results to display.

  • dQ (int) – The dQ value to show in the results, valid values are (4, 8, 16, 32). Default = 4, will use default if invalid value provided.

plot_results(factor: int, dQ: int = 4)#

Plot the DISP results for a specified factor and dQ value. The output results are grouped by dQ, with dQ=4 being the default value displayed for results.

Parameters:
  • factor (int) – The index of the DISP factor results to display.

  • dQ (int) – The dQ value to show in the results, valid values are (4, 8, 16, 32). Default = 4, will use default if invalid value provided.
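The fallback behavior described for the dQ argument ("will use default if invalid value provided") can be pictured with a small helper. `resolve_dQ` is a hypothetical name used only for this sketch; how the plotting methods actually validate the value is not documented here.

```python
def resolve_dQ(dQ, valid=(4, 8, 16, 32), default=4):
    """Return dQ if it is one of the valid levels, otherwise the default."""
    return dQ if dQ in valid else default


print(resolve_dQ(16))  # a valid DISP level is kept
print(resolve_dQ(5))   # an invalid level falls back to the default
print(resolve_dQ(8, valid=(0.5, 1, 2, 4), default=0.5))  # BS-DISP levels
```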

run(batch: int = -1)#

Run the DISP method on the provided SA model.

Parameters:

batch (int) – Batch number identifier, used for labeling DISP during parallel runs with BS-DISP.

save(disp_name: str, output_directory: str, pickle_result: bool = True)#

Save the DISP results.

Parameters:
  • disp_name (str) – The name to use for the DISP pickle file.

  • output_directory (str) – The output directory to save the DISP pickle file to.

  • pickle_result (bool) – Pickle the DISP model. Default = True.

Returns:

The path to the saved file.

Return type:

str

summary()#

Print the summary table showing the largest change in dQ and the % of factor flips that occurred.

esat.error.error module#

class esat.error.error.Error(bs: Bootstrap | None = None, disp: Displacement | None = None, bsdisp: BSDISP | None = None)#

Bases: object

Calculate the combined summary error statistics from the bootstrap, displacement and BS-DISP error estimation results.

Parameters:
  • bs (Bootstrap) – The BS run to calculate the summary error.

  • disp (Displacement) – The DISP run to calculate the summary error.

  • bsdisp (BSDISP) – The BS-DISP run to calculate the summary error.

plot_summary(factor: int)#

Plot the combined error estimation results from all provided method results.

Parameters:

factor (int) – The index of the factor to plot.

Module contents#