lf2i.test_statistics package

Module contents

class lf2i.test_statistics.TestStatistic(acceptance_region: str, estimation_method: str)

Bases: ABC

Base class for test statistics. This is a template from which every test statistic should inherit.

Parameters:

acceptance_region (str) – Whether the acceptance region for the corresponding test lies to the right or to the left of the critical value. Must be either ‘left’ or ‘right’.
estimation_method (str) – The method with which the test statistic is estimated. Use ‘likelihood’ for likelihood-based test statistics (e.g., ACORE and BFF), and ‘prediction’ or ‘posterior’ for prediction- or posterior-based test statistics (e.g., Waldo).
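
To make the role of acceptance_region concrete, the sketch below decides whether a test statistic value falls inside the acceptance region relative to a critical value. The helper name in_acceptance_region and the choice to include the boundary are assumptions made for illustration; they are not part of lf2i.

```python
def in_acceptance_region(statistic: float, critical_value: float, acceptance_region: str) -> bool:
    """Hypothetical helper: True if the statistic lies in the acceptance region."""
    if acceptance_region == "left":
        # Acceptance region lies to the left of the critical value.
        return statistic <= critical_value
    if acceptance_region == "right":
        # Acceptance region lies to the right of the critical value.
        return statistic >= critical_value
    raise ValueError("acceptance_region must be either 'left' or 'right'")
```

With a critical value of 2.0, a statistic of 1.0 is accepted under ‘left’ and rejected under ‘right’.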

abstract estimate()

abstract evaluate()
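
The template pattern described above can be sketched with a stand-in base class. StatisticTemplate mirrors the documented constructor arguments and abstract methods but is not the lf2i class itself, and ConstantStatistic is a toy subclass invented for this example.

```python
from abc import ABC, abstractmethod


class StatisticTemplate(ABC):
    """Stand-in for the documented base class (hypothetical, not imported from lf2i)."""

    def __init__(self, acceptance_region: str, estimation_method: str) -> None:
        self.acceptance_region = acceptance_region
        self.estimation_method = estimation_method

    @abstractmethod
    def estimate(self):
        """Train whatever estimator the statistic relies on."""

    @abstractmethod
    def evaluate(self):
        """Compute the statistic."""


class ConstantStatistic(StatisticTemplate):
    """Toy subclass: nothing to train, always returns the same value."""

    def __init__(self, value: float) -> None:
        super().__init__(acceptance_region="left", estimation_method="prediction")
        self.value = value

    def estimate(self) -> None:
        return None

    def evaluate(self) -> float:
        return self.value
```

Because both methods are abstract, instantiating StatisticTemplate directly raises TypeError; only subclasses that implement estimate and evaluate can be created.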

class lf2i.test_statistics.ACORE(estimator: str | Any, poi_dim: int, nuisance_dim: int, batch_size: int, data_dim: int, estimator_kwargs: Dict = {}, n_jobs: int = -2)

Bases: TestStatistic

Implements the ACORE test statistic as described in https://proceedings.mlr.press/v119/dalmasso20a.html and https://arxiv.org/abs/2107.03920.

Parameters:

estimator (Union[str, Any]) – Probabilistic classifier used to estimate odds (i.e., likelihood up to a normalization constant). If str, must be one of the predefined estimators listed in test_statistics/estimators.py. If Any, a trained estimator is expected. Needs to implement estimator.predict_proba(X=…).
poi_dim (int) – Dimensionality (number) of the parameters of interest.
nuisance_dim (int) – Dimensionality (number) of the nuisance parameters (systematics). Should be 0 if all parameters are objects of inference.
batch_size (int) – Size of a batch of datapoints from a specific parameter configuration. Must be the same for observations and simulations. A simulated/observed batch from a specific parameter configuration will have dimensions (batch_size, data_dim).
data_dim (int) – Dimensionality of a single datapoint X.
estimator_kwargs (Dict, optional) – Hyperparameters and settings for the estimator, by default {}.
n_jobs (int, optional) – Number of workers to use when computing ACORE over multiple inputs, by default -2, which uses all cores minus one.
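
The default n_jobs = -2 can be read through the joblib convention for negative values, where the worker count is n_cpus + 1 + n_jobs. The sketch below assumes that convention applies here; effective_workers is a hypothetical helper, not an lf2i function.

```python
import os
from typing import Optional


def effective_workers(n_jobs: int, n_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: resolve n_jobs to a worker count, joblib-style."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # -1 -> all cores, -2 -> all cores minus one, and so on (at least 1).
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

On an 8-core machine this gives 7 workers for n_jobs = -2 and 8 workers for n_jobs = -1.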

estimate(labels: ndarray | Tensor, parameters: ndarray | Tensor, samples: ndarray | Tensor) → None

Train the estimator for odds (i.e. likelihood up to a normalization constant). The training dataset should contain two classes:

label 1, with pairs \((\theta, X)\) where \(X \sim p(\cdot;\theta)\) is drawn from the likelihood/simulator.

label 0, with pairs \((\theta, X)\) where \(X \sim G\) is drawn from a dominating reference distribution (e.g., empirical marginal).

The goal is to train a classifier that can distinguish whether a sample comes from the likelihood or not. See https://arxiv.org/abs/2107.03920 for a more detailed explanation.

Parameters:

labels (Union[np.ndarray, torch.Tensor]) – Class labels 0/1.
parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.
samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.
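
A minimal sketch of how such a two-class training set could be assembled, using only the standard library. The Gaussian simulator, the uniform proposal on [-1, 1], and the use of shuffling to approximate the empirical marginal are all assumptions chosen for the example; in practice the three outputs would be converted to arrays or tensors before being passed to estimate.

```python
import random
from typing import List, Tuple


def build_training_set(n: int, seed: int = 0) -> Tuple[List[int], List[float], List[float]]:
    """Hypothetical helper: build labels, parameters and samples for odds estimation."""
    rng = random.Random(seed)
    # Parameters drawn from an assumed uniform proposal on [-1, 1].
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Label 1: X ~ p(. ; theta), here a toy simulator N(theta, 1).
    x_sim = [rng.gauss(theta, 1.0) for theta in thetas]
    # Label 0: X ~ G, here the empirical marginal, obtained by shuffling the
    # simulated samples so they are no longer tied to the theta that produced them.
    x_ref = x_sim[:]
    rng.shuffle(x_ref)
    labels = [1] * n + [0] * n
    parameters = thetas + thetas
    samples = x_sim + x_ref
    return labels, parameters, samples
```

Each parameter value appears once with label 1 and once with label 0, so the two classes are balanced by construction.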

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str, param_space_bounds: List[Tuple[float]] | None) → ndarray

Evaluate the ACORE test statistic over the given parameters and samples. Behaviour differs depending on mode:

‘critical_values’ and ‘diagnostics’ compute ACORE once for each pair \((\theta, X)\).

‘confidence_sets’ computes ACORE over all pairs given by the cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:

parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.
samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.
mode (str) – One of ‘critical_values’, ‘confidence_sets’, or ‘diagnostics’.
param_space_bounds (Optional[List[Tuple[float]]]) – Bounds of the parameter space, both POIs and nuisances. Must be in the same order as in parameters.

Returns:

ACORE test statistics evaluated over parameters and samples.

Return type:

np.ndarray

Raises:

ValueError – If mode is not among the pre-specified values.
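
The difference between the paired modes and ‘confidence_sets’ comes down to how parameters and samples are matched before the statistic is computed. The sketch below illustrates only that matching step with scalar inputs; pair_inputs is a hypothetical helper, and the parameter-major ordering of the cartesian product is an assumption.

```python
from itertools import product
from typing import List, Sequence, Tuple

MODES = ("critical_values", "confidence_sets", "diagnostics")


def pair_inputs(parameters: Sequence[float], samples: Sequence[float], mode: str) -> List[Tuple[float, float]]:
    """Hypothetical helper: list the (theta, x) pairs a given mode would evaluate."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if mode == "confidence_sets":
        # Every grid parameter is paired with every sample.
        return list(product(parameters, samples))
    if len(parameters) != len(samples):
        raise ValueError("paired modes need as many parameters as samples")
    # One pair per row: (theta_i, x_i).
    return list(zip(parameters, samples))
```

With two parameters and two samples, the paired modes yield two pairs while ‘confidence_sets’ yields four.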

class lf2i.test_statistics.BFF(estimator: str | Any, poi_dim: int, nuisance_dim: int, batch_size: int, data_dim: int, estimator_kwargs: Dict = {}, n_jobs: int = -2)

Bases: TestStatistic

Implements the BFF test statistic as described in https://arxiv.org/abs/2107.03920. NOTE: currently only box uniform proposal distributions over the parameter space are supported.

Parameters:

estimator (Union[str, Any]) – Probabilistic classifier used to estimate odds (i.e., likelihood up to a normalization constant). If str, must be one of the predefined estimators listed in test_statistics/_estimators.py. If Any, a trained estimator is expected. Needs to implement estimator.predict_proba(X=…).
poi_dim (int) – Dimensionality (number) of the parameters of interest.
nuisance_dim (int) – Dimensionality (number) of the nuisance parameters (systematics). Should be 0 if all parameters are objects of inference.
batch_size (int) – Size of a batch of datapoints from a specific parameter configuration. Must be the same for observations and simulations. A simulated/observed batch from a specific parameter configuration will have dimensions (batch_size, data_dim).
data_dim (int) – Dimensionality of a single datapoint X.
estimator_kwargs (Dict, optional) – Hyperparameters and settings for the estimator, by default {}.
n_jobs (int, optional) – Number of workers to use when computing BFF over multiple inputs, by default -2, which uses all cores minus one.

estimate(labels: ndarray | Tensor, parameters: ndarray | Tensor, samples: ndarray | Tensor) → None

Train the estimator for odds (i.e. likelihood up to a normalization constant). The training dataset should contain two classes:

label 1, with pairs \((\theta, X)\) where \(X \sim p(\cdot;\theta)\) is drawn from the likelihood/simulator.

label 0, with pairs \((\theta, X)\) where \(X \sim G\) is drawn from a dominating reference distribution (e.g., empirical marginal).

The goal is to train a classifier that can distinguish whether a sample comes from the likelihood or not. See https://arxiv.org/abs/2107.03920 for a more detailed explanation.

Parameters:

labels (Union[np.ndarray, torch.Tensor]) – Class labels 0/1.
parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.
samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str, param_space_bounds: List[Tuple[float]] | None = None) → ndarray

Evaluate the BFF test statistic over the given parameters and samples. Behaviour differs depending on mode:

‘critical_values’ and ‘diagnostics’ compute BFF once for each pair \((\theta, X)\).

‘confidence_sets’ computes BFF over all pairs given by the cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:

parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.
samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.
mode (str) – One of ‘critical_values’, ‘confidence_sets’, or ‘diagnostics’.
param_space_bounds (Optional[List[Tuple[float]]]) – Bounds of the parameter space, both POIs and nuisances. Must be in the same order as in parameters. NOTE: Bounds are needed because we support only box uniform proposal distributions over the parameter space at the moment.

Returns:

BFF test statistics evaluated over parameters and samples.

Return type:

np.ndarray

Raises:

ValueError – If mode is not among the pre-specified values.
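
As a rough illustration of why bounds are enough to describe a box uniform proposal, the sketch below computes its constant density and draws a point from it, given bounds in the List[Tuple[float]] form used by param_space_bounds. Both helper names are hypothetical, and how lf2i uses the bounds internally is not shown here.

```python
import random
from math import prod
from typing import List, Tuple


def box_uniform_density(bounds: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: constant density of a uniform distribution over a box."""
    # The density is one over the volume of the box.
    return 1.0 / prod(high - low for low, high in bounds)


def sample_box_uniform(bounds: List[Tuple[float, float]], rng: random.Random) -> Tuple[float, ...]:
    """Hypothetical helper: draw one point, one coordinate per (low, high) pair."""
    return tuple(rng.uniform(low, high) for low, high in bounds)
```

For bounds [(0.0, 2.0), (-1.0, 1.0)] the box has volume 4, so the density is 0.25 everywhere inside it. The order of the tuples has to match the order of the parameter columns.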

class lf2i.test_statistics.Waldo(estimator: str | Any, poi_dim: int, estimation_method: str, num_posterior_samples: int | None = None, cond_variance_estimator: str | Any | None = None, estimator_kwargs: Dict = {}, cond_variance_estimator_kwargs: Dict = {}, n_jobs: int = -2)

Bases: TestStatistic

Implements the Waldo test statistic, as described in arXiv:2205.15680.

Parameters:

estimator (Union[str, Any]) –
If estimation_method == ‘prediction’, this is the conditional mean estimator. If estimation_method == ‘posterior’, this is the posterior estimator. Currently compatible with posterior objects from the SBI package (https://github.com/mackelab/sbi).

If str, will use one of the predefined estimators. If Any, a trained estimator is expected. Needs to implement estimator.predict(X=…) (“prediction”), or estimator.sample(sample_shape=…, x=…) (“posterior”).
poi_dim (int) – Dimensionality (number) of the parameters of interest.
estimation_method (str) – Whether the estimator is a prediction algorithm (“prediction”) or a posterior estimator (“posterior”).
num_posterior_samples (Optional[int], optional) – Number of posterior samples to draw to approximate the conditional mean and variance if estimation_method == ‘posterior’, by default None.
cond_variance_estimator (Optional[Union[str, Any]], optional) – If estimation_method == ‘prediction’, this is the conditional variance estimator, by default None.
estimator_kwargs (Dict) – Hyperparameters and settings for the conditional mean estimator, by default {}.
cond_variance_estimator_kwargs (Dict) – Hyperparameters and settings for the conditional variance estimator, by default {}.
n_jobs (int, optional) – Number of workers to use when evaluating Waldo over multiple inputs if using a posterior estimator. By default -2, which uses all cores minus one.
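
For a single parameter of interest, the Waldo statistic in arXiv:2205.15680 reduces to (E[θ|x] - θ0)² / V[θ|x]. The sketch below approximates both moments from posterior samples, as num_posterior_samples suggests. The function name waldo_1d and the use of the unbiased sample variance are assumptions for illustration, not the lf2i implementation.

```python
from statistics import fmean, variance
from typing import Sequence


def waldo_1d(theta0: float, posterior_samples: Sequence[float]) -> float:
    """Hypothetical helper: one-dimensional Waldo statistic from posterior samples."""
    # Monte Carlo approximations of the conditional mean and variance.
    cond_mean = fmean(posterior_samples)
    cond_var = variance(posterior_samples)
    return (cond_mean - theta0) ** 2 / cond_var
```

The statistic is zero when θ0 equals the estimated conditional mean and grows as θ0 moves away from it, scaled by the posterior spread.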

estimate(parameters: ndarray | Tensor, samples: ndarray | Tensor) → None

Train the estimator(s) for the conditional mean and conditional variance.

Parameters:

parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.
samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str) → ndarray

Evaluate the Waldo test statistic over the given parameters and samples.

Behaviour differs depending on mode:

If mode is ‘critical_values’ or ‘diagnostics’, evaluate Waldo once for each pair \((\theta_i, x_i)\).
If mode is ‘confidence_sets’, evaluate Waldo over all pairs given by the cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:

parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.
samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.
mode (str) – One of ‘critical_values’, ‘confidence_sets’, or ‘diagnostics’.

Returns:

Waldo test statistics evaluated over parameters and samples.

Return type:

np.ndarray