lf2i.test_statistics package#

Module contents#

class lf2i.test_statistics.TestStatistic(acceptance_region: str, estimation_method: str)[source]#

Bases: ABC

Base class for test statistics. This is a template from which every test statistic should inherit.

Parameters:
  • acceptance_region (str) – Whether the acceptance region for the corresponding test is defined to be on the right or on the left of the critical value. Must be either ‘left’ or ‘right’.

  • estimation_method (str) – The method with which the test statistic is estimated. If likelihood-based test statistics are used, e.g. ACORE and BFF, then ‘likelihood’. If prediction/posterior-based test statistics are used, e.g. WALDO, then ‘prediction’ or ‘posterior’.

abstract estimate()[source]#
abstract evaluate()[source]#
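
A minimal sketch of a custom subclass, assuming only the constructor signature and abstract methods shown above; the statistic itself is a toy illustration, not part of the package:

    import numpy as np

    from lf2i.test_statistics import TestStatistic

    class MyStatistic(TestStatistic):
        def __init__(self):
            # Acceptance region lies to the left of the critical value.
            super().__init__(acceptance_region='left', estimation_method='prediction')

        def estimate(self, parameters, samples):
            # Fit whatever internal estimator the statistic needs; no return value.
            self.param_mean_ = np.mean(parameters, axis=0)

        def evaluate(self, parameters, samples, mode):
            # Toy statistic: distance of each parameter from the training mean.
            return np.linalg.norm(parameters - self.param_mean_, axis=-1)
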
class lf2i.test_statistics.ACORE(estimator: str | Any, poi_dim: int, nuisance_dim: int, batch_size: int, data_dim: int, estimator_kwargs: Dict = {}, n_jobs: int = -2)[source]#

Bases: TestStatistic

Implements the ACORE test statistic as described in https://proceedings.mlr.press/v119/dalmasso20a.html and https://arxiv.org/abs/2107.03920.
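
In brief, for a composite null hypothesis \(H_0: \theta \in \Theta_0\), ACORE is the estimated log odds-ratio \(\Lambda(\mathcal{D}; \Theta_0) = \log \frac{\sup_{\theta \in \Theta_0} \mathbb{O}(\mathcal{D}; \theta)}{\sup_{\theta \in \Theta} \mathbb{O}(\mathcal{D}; \theta)}\), where \(\mathbb{O}(\mathcal{D}; \theta) = \prod_{i=1}^{B} \frac{\mathbb{P}(Y=1 \mid \theta, X_i)}{\mathbb{P}(Y=0 \mid \theta, X_i)}\) is the product of the classifier's estimated odds over a batch \(\mathcal{D} = (X_1, \dots, X_B)\) (paraphrasing the references above).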

Parameters:
  • estimator (Union[str, Any]) – Probabilistic classifier used to estimate odds (i.e., likelihood up to a normalization constant). If str, must be one of the predefined estimators listed in test_statistics/_estimators.py. If Any, a trained estimator is expected. Needs to implement estimator.predict_proba(X=…).

  • poi_dim (int) – Dimensionality (number) of the parameters of interest.

  • nuisance_dim (int) – Dimensionality (number) of the nuisance parameters (systematics). Should be 0 if all parameters are objects of inference.

  • batch_size (int) – Size of a batch of datapoints from a specific parameter configuration. Must be the same for observations and simulations. A simulated/observed batch from a specific parameter configuration will have dimensions (batch_size, data_dim).

  • data_dim (int) – Dimensionality of a single datapoint X.

  • estimator_kwargs (Dict, optional) – Hyperparameters and settings for the probabilistic classifier used to estimate odds, by default {}.

  • n_jobs (int, optional) – Number of workers to use when computing ACORE over multiple inputs, by default -2, which uses all cores minus one.
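
A hedged construction sketch: the scikit-learn classifier below stands in for any object implementing estimator.predict_proba(X=…), the dimensions are illustrative, and we assume the classifier may be passed unfitted because estimate() (below) trains it:

    from sklearn.neural_network import MLPClassifier

    from lf2i.test_statistics import ACORE

    acore = ACORE(
        estimator=MLPClassifier(),  # trained later via acore.estimate(...)
        poi_dim=1,        # one parameter of interest
        nuisance_dim=0,   # no nuisance parameters
        batch_size=1,     # one datapoint per parameter configuration
        data_dim=1,       # each datapoint X is a scalar
    )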

estimate(labels: ndarray | Tensor, parameters: ndarray | Tensor, samples: ndarray | Tensor) None[source]#

Train the estimator for odds (i.e. likelihood up to a normalization constant). The training dataset should contain two classes:

  • label 1, with pairs \((\theta, X)\) where \(X \sim p(\cdot;\theta)\) is drawn from the likelihood/simulator.

  • label 0, with pairs \((\theta, X)\) where \(X \sim G\) is drawn from a dominating reference distribution (e.g., empirical marginal).

The goal is to train a classifier that can distinguish whether a sample was drawn from the likelihood or from the reference distribution. See https://arxiv.org/abs/2107.03920 for a more detailed explanation, and the sketch following the parameter list below for an illustrative construction of such a training set.

Parameters:
  • labels (Union[np.ndarray, torch.Tensor]) – Class labels 0/1.

  • parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.

  • samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.
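
A minimal sketch of assembling the two-class training set described above, assuming a toy Gaussian simulator \(X \sim N(\theta, 1)\) with a uniform proposal over \(\theta\); the reference distribution \(G\) is approximated by permuting the simulated samples (empirical marginal):

    import numpy as np

    rng = np.random.default_rng(0)
    B = 10_000

    theta = rng.uniform(-5, 5, size=(B, 1))          # proposal over the parameter space
    x_likelihood = rng.normal(loc=theta, scale=1.0)  # label 1: X ~ p(.; theta)
    x_reference = rng.permutation(x_likelihood)      # label 0: X ~ G (empirical marginal)

    labels = np.concatenate([np.ones(B), np.zeros(B)])
    parameters = np.concatenate([theta, theta])      # same thetas paired with both draws
    samples = np.concatenate([x_likelihood, x_reference])

    acore.estimate(labels=labels, parameters=parameters, samples=samples)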

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str, param_space_bounds: List[Tuple[float]] | None) ndarray[source]#

Evaluate the ACORE test statistic over the given parameters and samples. Behaviour differs depending on mode:

  • ‘critical_values’ and ‘diagnostics’ compute ACORE once for each pair \((\theta, X)\).

  • ‘confidence_sets’ computes ACORE over all pairs given by the Cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:
  • parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.

  • samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.

  • mode (str) – Either ‘critical_values’, ‘confidence_sets’, ‘diagnostics’.

  • param_space_bounds (Optional[List[Tuple[float]]]) – Bounds of the parameter space, both POIs and nuisances. Must be in the same order as in parameters.

Returns:

ACORE test statistics evaluated over parameters and samples.

Return type:

np.ndarray

Raises:

ValueError – If mode is not among the pre-specified values.
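
A hedged evaluation sketch, reusing the toy setup above; shapes follow the conventions stated in the parameter descriptions:

    theta_grid = np.linspace(-5, 5, 100).reshape(-1, 1)  # parameter grid for confidence sets
    x_obs = rng.normal(loc=0.0, scale=1.0, size=(1, 1))  # one observed batch (batch_size=1)

    ts_grid = acore.evaluate(
        parameters=theta_grid,
        samples=x_obs,
        mode='confidence_sets',            # Cartesian product of grid points and batches
        param_space_bounds=[(-5.0, 5.0)],  # one (low, high) tuple per parameter
    )
    # ts_grid contains one ACORE value per (grid point, observed batch) pair.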

class lf2i.test_statistics.BFF(estimator: str | Any, poi_dim: int, nuisance_dim: int, batch_size: int, data_dim: int, estimator_kwargs: Dict = {}, n_jobs: int = -2)[source]#

Bases: TestStatistic

Implements the BFF test statistic as described in https://arxiv.org/abs/2107.03920. NOTE: for now supports only box uniform proposal distributions over the parameter space.

Parameters:
  • estimator (Union[str, Any]) – Probabilistic classifier used to estimate odds (i.e., likelihood up to a normalization constant). If str, must be one of the predefined estimators listed in test_statistics/_estimators.py. If Any, a trained estimator is expected. Needs to implement estimator.predict_proba(X=…).

  • poi_dim (int) – Dimensionality (number) of the parameters of interest.

  • nuisance_dim (int) – Dimensionality (number) of the nuisance parameters (systematics). Should be 0 if all parameters are objects of inference.

  • batch_size (int) – Size of a batch of datapoints from a specific parameter configuration. Must be the same for observations and simulations. A simulated/observed batch from a specific parameter configuration will have dimensions (batch_size, data_dim).

  • data_dim (int) – Dimensionality of a single datapoint X.

  • estimator_kwargs (Dict, optional) – Hyperparameters and settings for the probabilistic classifier used to estimate odds, by default {}.

  • n_jobs (int, optional) – Number of workers to use when computing BFF over multiple inputs, by default -2, which uses all cores minus one.

estimate(labels: ndarray | Tensor, parameters: ndarray | Tensor, samples: ndarray | Tensor) None[source]#

Train the estimator for odds (i.e. likelihood up to a normalization constant). The training dataset should contain two classes:

  • label 1, with pairs \((\theta, X)\) where \(X \sim p(\cdot;\theta)\) is drawn from the likelihood/simulator.

  • label 0, with pairs \((\theta, X)\) where \(X \sim G\) is drawn from a dominating reference distribution (e.g., empirical marginal).

The goal is to train a classifier that can distinguish whether a sample was drawn from the likelihood or from the reference distribution. See https://arxiv.org/abs/2107.03920 for a more detailed explanation.

Parameters:
  • labels (Union[np.ndarray, torch.Tensor]) – Class labels 0/1.

  • parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.

  • samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str, param_space_bounds: List[Tuple[float]] | None = None) ndarray[source]#

Evaluate the BFF test statistic over the given parameters and samples. Behaviour differs depending on mode:

  • ‘critical_values’ and ‘diagnostics’ compute BFF once for each pair \((\theta, X)\).

  • ‘confidence_sets’ computes BFF over all pairs given by the Cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:
  • parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.

  • samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.

  • mode (str) – Either ‘critical_values’, ‘confidence_sets’, ‘diagnostics’.

  • param_space_bounds (Optional[List[Tuple[float]]]) – Bounds of the parameter space, both POIs and nuisances. Must be in the same order as in parameters. NOTE: Bounds are needed because we support only box uniform proposal distributions over the parameter space at the moment.

Returns:

BFF test statistics evaluated over parameters and samples.

Return type:

np.ndarray

Raises:

ValueError – If mode is not among the pre-specified values.
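
BFF mirrors the ACORE workflow; a hedged sketch reusing the toy training set and grid from the ACORE examples above, with param_space_bounds defining the box uniform proposal:

    from sklearn.neural_network import MLPClassifier

    from lf2i.test_statistics import BFF

    bff = BFF(estimator=MLPClassifier(), poi_dim=1, nuisance_dim=0, batch_size=1, data_dim=1)
    bff.estimate(labels=labels, parameters=parameters, samples=samples)
    ts_grid = bff.evaluate(
        parameters=theta_grid,
        samples=x_obs,
        mode='confidence_sets',
        param_space_bounds=[(-5.0, 5.0)],
    )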

class lf2i.test_statistics.Waldo(estimator: str | Any, poi_dim: int, estimation_method: str, num_posterior_samples: int | None = None, cond_variance_estimator: str | Any | None = None, estimator_kwargs: Dict = {}, cond_variance_estimator_kwargs: Dict = {}, n_jobs: int = -2)[source]#

Bases: TestStatistic

Implements the Waldo test statistic, as described in https://arxiv.org/abs/2205.15680.
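
In brief, Waldo has the Wald-like form \(\tau^{\mathrm{Waldo}}(\mathcal{D}; \theta_0) = (\mathbb{E}[\theta \mid \mathcal{D}] - \theta_0)^{\top} \, \mathbb{V}[\theta \mid \mathcal{D}]^{-1} \, (\mathbb{E}[\theta \mid \mathcal{D}] - \theta_0)\), where the conditional mean \(\mathbb{E}[\theta \mid \mathcal{D}]\) and conditional variance \(\mathbb{V}[\theta \mid \mathcal{D}]\) are estimated either directly with prediction algorithms or from posterior samples (paraphrasing the reference above).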

Parameters:
  • estimator (Union[str, Any]) –

    If estimation_method == “prediction”, then this is the conditional mean estimator. If estimation_method == “posterior”, then this is the posterior estimator. Currently compatible with posterior objects from the SBI package (https://github.com/mackelab/sbi).

    If str, will use one of the predefined estimators. If Any, a trained estimator is expected. Needs to implement estimator.predict(X=…) (“prediction”), or estimator.sample(sample_shape=…, x=…) (“posterior”).

  • poi_dim (int) – Dimensionality (number) of the parameters of interest.

  • estimation_method (str) – Whether the estimator is a prediction algorithm (“prediction”) or a posterior estimator (“posterior”).

  • num_posterior_samples (Optional[int], optional) – Number of posterior samples to draw in order to approximate the conditional mean and variance if estimation_method == “posterior”, by default None.

  • cond_variance_estimator (Optional[Union[str, Any]], optional) – If estimation_method == “prediction”, then this is the conditional variance estimator, by default None.

  • estimator_kwargs (Dict) – Hyperparameters and settings for the conditional mean estimator, by default {}.

  • cond_variance_estimator_kwargs (Dict) – Hyperparameters and settings for the conditional variance estimator, by default {}.

  • n_jobs (int, optional) – Number of workers to use when evaluating Waldo over multiple inputs if using a posterior estimator. By default -2, which uses all cores minus one.
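
A hedged construction sketch for both estimation methods; the scikit-learn regressors stand in for any estimators implementing estimator.predict(X=…), and trained_posterior is a hypothetical, already-trained SBI posterior implementing sample(sample_shape=…, x=…):

    from sklearn.ensemble import GradientBoostingRegressor

    from lf2i.test_statistics import Waldo

    # Prediction-based variant: separate conditional mean and variance estimators.
    waldo = Waldo(
        estimator=GradientBoostingRegressor(),                # E[theta | X]
        poi_dim=1,
        estimation_method='prediction',
        cond_variance_estimator=GradientBoostingRegressor(),  # V[theta | X]
    )

    # Posterior-based variant (commented out; needs a trained posterior object):
    # waldo = Waldo(
    #     estimator=trained_posterior,
    #     poi_dim=1,
    #     estimation_method='posterior',
    #     num_posterior_samples=10_000,
    # )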

estimate(parameters: ndarray | Tensor, samples: ndarray | Tensor) None[source]#

Train the estimator(s) for the conditional mean and conditional variance.

Parameters:
  • parameters (Union[np.ndarray, torch.Tensor]) – Simulated parameters to be used for training.

  • samples (Union[np.ndarray, torch.Tensor]) – Simulated samples to be used for training.

evaluate(parameters: ndarray | Tensor, samples: ndarray | Tensor, mode: str) ndarray[source]#

Evaluate the Waldo test statistic over the given parameters and samples.

Behaviour differs depending on mode:

  • ‘critical_values’ and ‘diagnostics’ compute Waldo once for each pair \((\theta, X)\).

  • ‘confidence_sets’ computes Waldo over all pairs given by the Cartesian product of parameters (the parameter grid to construct confidence sets) and samples.

Parameters:
  • parameters (Union[np.ndarray, torch.Tensor]) – Parameters over which to evaluate the test statistic.

  • samples (Union[np.ndarray, torch.Tensor]) – Samples over which to evaluate the test statistic.

  • mode (str) – Either ‘critical_values’, ‘confidence_sets’, ‘diagnostics’.

Returns:

Waldo test statistics evaluated over parameters and samples.

Return type:

np.ndarray
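
An end-to-end sketch with the toy Gaussian simulator used in the ACORE examples above, reusing the prediction-based waldo object from the construction sketch; values and shapes are illustrative only:

    theta_train = rng.uniform(-5, 5, size=(B, 1))
    x_train = rng.normal(loc=theta_train, scale=1.0)

    waldo.estimate(parameters=theta_train, samples=x_train)
    ts_grid = waldo.evaluate(
        parameters=theta_grid,   # grid over the parameter of interest
        samples=x_obs,           # observed batch
        mode='confidence_sets',
    )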