esat package#

Subpackages#

Submodules#

esat.configs module#

esat.metrics module#

Collection of metric functions which are used throughout the code base.

esat.metrics.calculate_Q(residuals, uncertainty)#
esat.metrics.q_factor(V, U, W, H)#
esat.metrics.q_loss(V, U, W, H, uncertainty: bool = True)#
esat.metrics.qr_loss(V, U, W, H, alpha=4.0)#

esat.utils module#

Collection of utility functions used throughout the code base.

esat.utils.calculate_factor_correlation(factor1, factor2)#
esat.utils.compare_all_factors(matrix1, matrix2)#
esat.utils.np_encoder(object)#

Convert any numpy type to a generic type for json serialization.

Parameters:

object – Object to be converted.

Returns:

Generic object or an unchanged object if not a numpy type

Return type:

object
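The conversion described above can be sketched as a `default` hook for `json.dumps`. This is a minimal illustration and not the esat implementation: the name `np_encoder_sketch` and the sample payload are hypothetical, and the handling of arrays via `tolist()` is an assumption about what "generic type" covers.

```python
import json

import numpy as np


def np_encoder_sketch(obj):
    """Hypothetical stand-in for esat.utils.np_encoder (assumed behaviour)."""
    if isinstance(obj, np.generic):
        # numpy scalar (np.int64, np.float32, ...) -> built-in Python scalar
        return obj.item()
    if isinstance(obj, np.ndarray):
        # numpy array -> nested Python lists (assumption: arrays are handled too)
        return obj.tolist()
    # Not a numpy type: hand the object back unchanged
    return obj


# Hypothetical payload mixing numpy scalars and an array
payload = {
    "q_true": np.float64(12.5),
    "factors": np.int64(4),
    "profile": np.array([0.25, 0.75]),
}

# json.dumps calls the hook for every value it cannot serialize on its own
text = json.dumps(payload, default=np_encoder_sketch)
```

Passing the function through `default=` means only values that the standard encoder rejects are routed to it, so plain Python values are serialized as usual.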

esat.utils.solution_bump(profile: ndarray, contribution: ndarray, bump_range: tuple = (0.9, 1.1), seed: int = 42)#

esat.estimator module#

class esat.estimator.FactorEstimator(V: ndarray, U: ndarray, seed: int = 42, test_percent: float = 0.1, k_coef: float = 1.0)#

Bases: object

Factor search uses Monte Carlo sampling to test different factor counts with cross-validation. A train MSE and a test MSE are calculated for each model in the search. These MSE values are averaged for each factor count, and the change in test MSE is used to estimate the factor count for the dataset.

Reference: http://alexhwilliams.info/itsneuronalblog/2018/02/26/crossval/

Parameters:
  • V (np.ndarray) – The input dataset to use for the factor search.

  • U (np.ndarray) – The uncertainty dataset to use for the factor search.

  • seed (int) – The random seed to use for the model initialization, cross-validation masking, and factor selection.

  • test_percent (float) – The fraction of values in the input dataset, expressed as a decimal (for example 0.1 for 10%), to use for the test MSE calculation.

  • k_coef (float) – A tunable coefficient used in the K estimate metric calculation.
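The train/test split behind the cross-validation can be sketched as follows. This is an illustration under stated assumptions, not the code FactorEstimator runs: the helper names `make_test_mask` and `masked_mse` are hypothetical, the element-wise random mask is one common way to hold out values for matrix factorization, and `V_hat` is a placeholder for a fitted reconstruction.

```python
import numpy as np


def make_test_mask(shape, test_percent=0.1, seed=42):
    """Boolean mask where True marks the entries held out for the test MSE."""
    rng = np.random.default_rng(seed)
    n_total = int(np.prod(shape))
    n_test = int(round(n_total * test_percent))
    mask = np.zeros(n_total, dtype=bool)
    # Pick exactly n_test distinct entries of the flattened matrix
    mask[rng.choice(n_total, size=n_test, replace=False)] = True
    return mask.reshape(shape)


def masked_mse(V, V_hat, mask):
    """Mean squared error restricted to the entries selected by mask."""
    return float(np.mean((V[mask] - V_hat[mask]) ** 2))


# Toy data: V_hat is a placeholder reconstruction with a constant 0.1 offset
rng = np.random.default_rng(0)
V = rng.random((20, 5))
V_hat = V + 0.1

test_mask = make_test_mask(V.shape, test_percent=0.1, seed=42)
train_mse = masked_mse(V, V_hat, ~test_mask)  # scored on the retained entries
test_mse = masked_mse(V, V_hat, test_mask)    # scored on the held-out entries
```

In a real search the held-out entries would also have to be excluded while fitting each model; this sketch covers only the split and the scoring.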

plot(actual_count: int | None = None)#

Plot the results of the factor search as shown in the results table. When the actual number of factors is known, it can be provided using the actual_count parameter. The estimated factor count is shown as a red dashed vertical line. The actual factor count, when provided, is shown as a black dashed vertical line.

Parameters:

actual_count (int | None) – The known factor count, for example when the data was generated with the Simulator.

run(samples: int = 200, min_factors: int = 2, max_factors: int = 15, max_iterations: int = 2000, converge_delta: float = 1.0, converge_n: int = 10)#

Run the Monte Carlo sampling for a random set of models with factor counts between min_factors and max_factors. The number of models sampled is set by samples.

When the results are inconclusive or there are several large peaks in the delta MSE line, increasing the sample count can help narrow the estimation.

Parameters:
  • samples (int) – The number of random samples to take for the factor estimation.

  • min_factors (int) – The minimum number of factors to consider in the random sampling.

  • max_factors (int) – The maximum number of factors to consider in the random sampling.

  • max_iterations (int) – The maximum number of iterations to run the models.

  • converge_delta (float) – The change in the loss value, over a specified number of steps, below which the model is considered converged.

  • converge_n (int) – The number of steps over which the loss must change by less than converge_delta for the model to be considered converged.

Returns:

The results of the factor search showing the metrics used to estimate the factor count.

Return type:

pd.DataFrame
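How converge_delta and converge_n interact can be sketched with a small stopping check. This is one plausible reading of the parameter descriptions, not the library's actual logic: the function name `has_converged` is hypothetical, and measuring "change" as the spread between the largest and smallest of the last converge_n loss values is an assumption.

```python
def has_converged(losses, converge_delta=1.0, converge_n=10):
    """Return True when the last converge_n losses vary by less than converge_delta."""
    if len(losses) < converge_n:
        # Not enough history yet to judge convergence
        return False
    recent = losses[-converge_n:]
    # Assumed measure of change: spread of the loss over the recent window
    return max(recent) - min(recent) < converge_delta


# Loss drops quickly, then flattens out over the last ten steps
flat_tail = [100.0, 50.0, 20.0] + [10.0 - 0.01 * i for i in range(10)]

# Loss is still falling by several units inside the last ten steps
still_falling = [100.0 - 5.0 * i for i in range(12)]

converged_flat = has_converged(flat_tail)
converged_falling = has_converged(still_falling)
```

Under this reading, a training loop would stop at the first step where the check passes, or after max_iterations steps, whichever comes first.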

Module contents#