lf2i.diagnostics package
Submodules
lf2i.diagnostics.diagnostics module
- lf2i.diagnostics.diagnostics.estimate_coverage_proba(indicators: ndarray, parameters: ndarray, estimator: str, estimator_kwargs: Dict, param_dim: int, new_parameters: ndarray | None = None, n_sigma: int = 2) -> Tuple[Any, ndarray, ndarray, ndarray]
Estimate conditional coverage probabilities by regressing indicators, which signal if the corresponding value in parameters was included or not in the parameter region, against the parameters themselves.
Note that indicators can be computed from any parameter region (posterior credible sets, confidence sets, prediction sets, etc …).
- Parameters:
indicators (np.ndarray) – Array of zeros and ones to mark which parameters were included or not in the corresponding parameter regions.
parameters (np.ndarray) – Array of p-dimensional parameters.
estimator (str) – Name of the probabilistic classifier to use to estimate coverage probabilities.
estimator_kwargs (Dict) – Settings for estimator.
param_dim (int) – Dimensionality of the parameter.
new_parameters (Optional[np.ndarray], optional) – Array of parameters over which to estimate/evaluate coverage probabilities. If not provided, both training and evaluation of the probabilistic classifier are done over parameters.
n_sigma (int, optional) – Uncertainties around the estimated mean coverage probabilities are computed as \(\mu \pm se \cdot n\_sigma\). If using the splines estimator, the standard errors are based on the posterior distribution of the model coefficients. By default 2.
- Returns:
Fitted estimator, evaluated parameters, and estimated coverage probabilities – mean, upper-n_sigma bound, lower-n_sigma bound.
- Return type:
Tuple[Any, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
- Raises:
ValueError – Estimator must be one of [splines]
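The regression of indicators on parameters can be illustrated with a much simpler stand-in for the splines estimator. The sketch below is not lf2i code: it swaps in a binned-proportion estimate for a one-dimensional parameter and builds the bounds as mean ± n_sigma · se with a binomial standard error. The helper name binned_coverage and the n_bins argument are hypothetical.

```python
import numpy as np

def binned_coverage(indicators, parameters, n_bins=5, n_sigma=2):
    """Hypothetical stand-in for a coverage estimator (not lf2i's splines model)."""
    indicators = np.asarray(indicators, dtype=float)
    parameters = np.asarray(parameters, dtype=float).reshape(-1)  # 1D parameter only
    edges = np.linspace(parameters.min(), parameters.max(), n_bins + 1)
    # Assign each parameter to a bin; clip so the maximum falls in the last bin.
    bin_idx = np.clip(np.digitize(parameters, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = np.full(n_bins, np.nan)
    se = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = indicators[bin_idx == b]
        if in_bin.size > 0:
            mean[b] = in_bin.mean()  # empirical coverage in this bin
            se[b] = np.sqrt(mean[b] * (1 - mean[b]) / in_bin.size)  # binomial s.e.
    upper = np.clip(mean + n_sigma * se, 0, 1)
    lower = np.clip(mean - n_sigma * se, 0, 1)
    return centers, mean, upper, lower

# Synthetic data: regions that cover the true parameter about 90% of the time.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=2000)
covered = rng.binomial(1, 0.9, size=2000)
centers, mean, upper, lower = binned_coverage(covered, theta)
```

A smooth estimator such as the splines model pools information across neighbouring parameter values, which a binned estimate cannot do; the sketch only shows how indicators and parameters combine into a mean with upper and lower bounds.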
- lf2i.diagnostics.diagnostics.compute_indicators_lf2i(test_statistics: ndarray, critical_values: ndarray, parameters: ndarray, acceptance_region: str, param_dim: int) -> ndarray
Construct an array of indicators which mark whether each value in parameters is included or not in the corresponding LF2I confidence region.
This assumes that parameters is an array containing the “true” values, as simulated for the diagnostics branch. Instead of checking whether the parameter is geometrically included in the confidence region, a value is deemed included if the corresponding test does not reject it.
- Parameters:
test_statistics (np.ndarray) – Array of test statistics. Each value must be computed for the test with corresponding (null) value of parameters, given a sample generated from it.
critical_values (np.ndarray) – Array of critical values, each computed for the test with corresponding (null) value of parameters, against which to compare the test statistics.
parameters (np.ndarray) – True (simulated) parameter values. If a parameter is in the acceptance region of the corresponding test, then it is included in the confidence set.
acceptance_region (str) – Whether the acceptance region for the corresponding test is defined to be on the right or on the left of the critical value. Must be either left or right.
param_dim (int) – Dimensionality of the parameter.
- Returns:
Array of zeros and ones that indicate whether the corresponding value in parameters is included or not in the confidence region.
- Return type:
np.ndarray
- Raises:
ValueError – acceptance_region must be either left or right.
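The inclusion rule described above amounts to an elementwise comparison between test statistics and critical values. The sketch below is a minimal illustration, not the library's implementation: the helper name indicators_from_tests is hypothetical, and treating a statistic exactly equal to the critical value as accepted is an assumption.

```python
import numpy as np

def indicators_from_tests(test_statistics, critical_values, acceptance_region):
    """Hypothetical helper: 1 if the test does not reject the true parameter, else 0."""
    test_statistics = np.asarray(test_statistics, dtype=float).reshape(-1)
    critical_values = np.asarray(critical_values, dtype=float).reshape(-1)
    if acceptance_region == "left":
        # Assumed convention: accept when the statistic is at or below the cutoff.
        accepted = test_statistics <= critical_values
    elif acceptance_region == "right":
        # Assumed convention: accept when the statistic is at or above the cutoff.
        accepted = test_statistics >= critical_values
    else:
        raise ValueError("acceptance_region must be either 'left' or 'right'")
    return accepted.astype(int)

stats = np.array([0.5, 2.0, 1.2])
cutoffs = np.array([1.0, 1.5, 1.2])
left = indicators_from_tests(stats, cutoffs, "left")    # [1, 0, 1]
right = indicators_from_tests(stats, cutoffs, "right")  # [0, 1, 1]
```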
- lf2i.diagnostics.diagnostics.compute_indicators_posterior(posterior: NeuralPosterior | KDEWrapper | Sequence[NeuralPosterior | KDEWrapper], parameters: Tensor, samples: Tensor, parameter_grid: Tensor, confidence_level: float, param_dim: int, batch_size: int, num_p_levels: int = 100000, tol: float = 0.01, return_credible_regions: bool = False, n_jobs: int = -2) -> ndarray | Tuple[ndarray, Sequence[ndarray]]
Construct an array of indicators which mark whether each value in parameters is included or not in the corresponding posterior credible region.
- Parameters:
posterior (Union[NeuralPosterior, KDEWrapper, Sequence[Union[NeuralPosterior, KDEWrapper]]]) – Estimated posterior distribution. If a Sequence of posteriors, the i-th posterior is assumed to be estimated given the i-th element of samples. Must have a log_prob() method.
parameters (torch.Tensor) – True (simulated) parameter values, for which inclusion in the corresponding credible region is checked.
samples (torch.Tensor) – Array of d-dimensional samples, each generated from the corresponding value in parameters.
parameter_grid (torch.Tensor) – Parameter space over which posterior is defined. This is used to construct the credible region.
confidence_level (float) – Confidence level of the credible regions to be constructed. Must be in (0, 1).
param_dim (int) – Dimensionality of the parameter.
batch_size (int) – Number of samples drawn from the same parameter value, for each batch in samples. Each element of samples is of size (batch_size, data_dim).
num_p_levels (int, optional) – Number of level sets to consider to construct the high-posterior-density credible region, by default 100_000.
tol (float, optional) – Tolerance for the coverage probability of the credible region, used as stopping criterion to construct it, by default 0.01.
return_credible_regions (bool, optional) – Whether to return the credible regions computed along the way or not.
n_jobs (int, optional) – Number of workers to use when computing indicators over a sequence of inputs. By default -2, which uses all cores minus one.
- Returns:
Array of zeros and ones that indicate whether the corresponding value in parameters is included or not in the credible region. If return_credible_regions, then return a tuple whose second element is a sequence of credible regions (one for each parameter/sample).
- Return type:
Union[np.ndarray, Tuple[np.ndarray, Sequence[np.ndarray]]]
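To illustrate what a grid-based high-posterior-density region looks like, the sketch below uses plain NumPy on a one-dimensional grid. It is not the library's algorithm: it sorts grid points by density and accumulates mass until the confidence level is reached, rather than searching over num_p_levels level sets with a tolerance. The helper name hpd_indicator and the unnormalised Gaussian log-density are assumptions made for the example.

```python
import numpy as np

def hpd_indicator(log_prob_grid, grid, true_param, confidence_level):
    """Hypothetical 1D helper: is true_param inside the grid-based HPD region?"""
    density = np.exp(log_prob_grid - np.max(log_prob_grid))  # stable exponentiation
    mass = density / density.sum()          # normalise over the grid
    order = np.argsort(mass)[::-1]          # highest density first
    cumulative = np.cumsum(mass[order])
    # Smallest set of grid points whose total mass reaches the confidence level.
    n_keep = int(np.searchsorted(cumulative, confidence_level) + 1)
    n_keep = min(n_keep, grid.size)
    region = np.sort(grid[order[:n_keep]])
    # The true parameter is mapped to its nearest grid point (an approximation).
    nearest = grid[np.argmin(np.abs(grid - true_param))]
    return int(nearest in region), region

grid = np.linspace(-5, 5, 1001)
log_prob = -0.5 * grid**2  # assumed posterior: standard normal, up to a constant
inside, region = hpd_indicator(log_prob, grid, true_param=0.3, confidence_level=0.95)
outside, _ = hpd_indicator(log_prob, grid, true_param=3.0, confidence_level=0.95)
```

For this symmetric, unimodal density the region is approximately the interval from -1.96 to 1.96, so a true value of 0.3 is marked as included and 3.0 is not.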
- lf2i.diagnostics.diagnostics.compute_indicators_prediction(test_statistic: Waldo, parameters: ndarray, samples: ndarray, confidence_level: float, param_dim: int) -> ndarray
Construct an array of indicators which mark whether each value in parameters is included or not in the corresponding prediction set. The (central) prediction set is computed using a Gaussian approximation.
- Parameters:
test_statistic (Waldo) – An instance of the Waldo test statistic object, where Waldo.estimator and Waldo.cond_variance_estimator have been trained and have a predict(X=…) method to estimate the conditional mean and conditional variance given samples.
parameters (np.ndarray) – True (simulated) parameter values, for which inclusion in the corresponding prediction set is checked.
samples (np.ndarray) – Samples, each generated from the corresponding value in parameters, given which the conditional mean and conditional variance are estimated.
confidence_level (float) – Confidence level of the prediction sets to be constructed. Must be in (0, 1).
param_dim (int) – Dimensionality of the parameter.
- Returns:
Array of zeros and ones that indicate whether the corresponding value in parameters is included or not in the prediction set.
- Return type:
np.ndarray
- Raises:
NotImplementedError – Only implemented for one-dimensional parameters.
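A central prediction set under a Gaussian approximation is the conditional mean plus or minus a normal quantile times the conditional standard deviation. The sketch below shows that construction for one-dimensional parameters. It is an illustration under stated assumptions, not the library's code: the helper name gaussian_prediction_indicators is hypothetical, and the conditional means and variances are hard-coded rather than predicted by trained Waldo estimators.

```python
from statistics import NormalDist

import numpy as np

def gaussian_prediction_indicators(cond_mean, cond_var, parameters, confidence_level):
    """Hypothetical helper: 1 if the parameter lies in the central Gaussian interval."""
    # Two-sided normal quantile, e.g. about 1.96 for a 0.95 confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half_width = z * np.sqrt(cond_var)
    lower = cond_mean - half_width
    upper = cond_mean + half_width
    inside = (parameters >= lower) & (parameters <= upper)
    return inside.astype(int)

# Assumed estimates of the conditional mean and variance, one per sample.
cond_mean = np.array([0.0, 1.0, -2.0])
cond_var = np.array([1.0, 0.25, 4.0])
theta = np.array([1.5, 2.5, -2.0])  # true (simulated) parameter values
ind = gaussian_prediction_indicators(cond_mean, cond_var, theta, 0.95)  # [1, 0, 1]
```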
- lf2i.diagnostics.diagnostics.fit_r_estimator(estimator: str, indicators: ndarray, parameters: ndarray, param_dim: int) -> Any
Estimate coverage probabilities across the whole parameter space using a pre-defined estimator available in R.
- Parameters:
estimator (str) – Name of the estimator to use.
indicators (np.ndarray) – Array of zeros and ones that indicate whether the corresponding value in parameters is included or not in the parameter region.
parameters (np.ndarray) – True (simulated) parameter values.
param_dim (int) – Dimensionality of the parameter.
- Returns:
A model object returned by the corresponding R code.
- Return type:
Any
- Raises:
NotImplementedError – Estimator must be one of [gam, TBD]
- lf2i.diagnostics.diagnostics.predict_r_estimator(fitted_estimator: Any, parameters: ndarray, param_dim: int, n_sigma: int) -> Tuple[ndarray, ndarray, ndarray]
Evaluate the trained R estimator and estimate the coverage probabilities given parameters.
- Parameters:
fitted_estimator (Any) – Trained estimator, as returned by fit_r_estimator.
parameters (np.ndarray) – Parameter values at which to estimate the coverage probabilities.
param_dim (int) – Dimensionality of the parameter.
n_sigma (int) – Uncertainties around the estimated mean coverage probabilities are computed as mean +- se * n_sigma.
- Returns:
Estimated conditional coverage probabilities – mean, upper-n_sigma bound, lower-n_sigma bound.
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray]