Bootstrapping for Uncertainty Estimation

TemporalMixtureModels.jl provides functionality to perform bootstrapping to estimate the uncertainty of the fitted model parameters. Bootstrapping involves resampling the data with replacement and refitting the model multiple times to obtain a distribution of parameter estimates.
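
The resample-and-refit idea can be illustrated independently of the package. The following is a minimal sketch that uses only Julia's standard libraries; the straight-line model and the helper names `fit_line` and `bootstrap_line` are assumptions made for illustration and are not part of TemporalMixtureModels.jl.

```julia
using LinearAlgebra
using Random
using Statistics

# Ordinary least-squares fit of y ≈ a + b*t; stands in for "fitting the model".
function fit_line(t::AbstractVector, y::AbstractVector)
    X = hcat(ones(length(t)), t)
    return X \ y                      # [intercept, slope]
end

# Nonparametric bootstrap: resample observations with replacement and refit.
function bootstrap_line(t, y, n_bootstrap::Int; rng::AbstractRNG=Random.default_rng())
    n = length(t)
    estimates = Matrix{Float64}(undef, n_bootstrap, 2)
    for b in 1:n_bootstrap
        idx = rand(rng, 1:n, n)       # n indices drawn with replacement
        estimates[b, :] = fit_line(t[idx], y[idx])
    end
    return estimates                  # one row of [intercept, slope] per resample
end

# Synthetic data: true intercept 1.0, true slope 2.0, unit-variance noise.
rng = MersenneTwister(42)
t = collect(range(0, 10; length=50))
y = 1.0 .+ 2.0 .* t .+ randn(rng, 50)

estimates = bootstrap_line(t, y, 500; rng=rng)
slope_se = std(estimates[:, 2])                        # bootstrap standard error
slope_ci = quantile(estimates[:, 2], [0.025, 0.975])   # 95% percentile interval
```

The spread of the rows in `estimates` is the bootstrap distribution of the parameters. How the package itself draws its resamples (for example, whether it resamples whole subjects) is not specified here.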

Performing Bootstrapping

To perform bootstrapping with TemporalMixtureModels.jl, use the bootstrap function. It is called in the same way as the fit_mixture function, but takes one additional argument, n_bootstrap, which specifies the number of bootstrap samples to generate.

TemporalMixtureModels.bootstrap (Function)
bootstrap(component::Component, n_components::Int, n_bootstrap::Int,
          t::AbstractVector, y::AbstractMatrix, ids::AbstractVector;
          n_repeats::Int=5,
          error_model::ErrorModel=NormalError(),
          inputs=nothing,
          max_iter::Int=100,
          tol::Float64=1e-6,
          separation_threshold::Float64=0.01,
          rng::AbstractRNG=Random.GLOBAL_RNG,
          show_progress_bar::Bool=true)

Run bootstrap resampling to estimate confidence intervals for the coefficients of each component in a mixture model. The function fits the mixture model repeatedly on bootstrap samples drawn with replacement from the original data. After each fit, it matches the components to those of the original fit to prevent label switching, then collects the parameter estimates. Separation scores are computed to detect ambiguities in component matching, which may indicate unreliable confidence intervals. Ambiguous matches can occur when the components are not well separated, for example when too many components are specified.
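
To make the matching step concrete, here is a minimal sketch of one way to align the components of a bootstrap fit with a reference fit. It is not the package's implementation: the helpers `all_permutations`, `matching_cost` and `match_components`, the squared-distance cost, and the definition of the separation score as the relative gap between the best and second-best matching are all assumptions chosen for illustration.

```julia
# Hypothetical helper: all permutations of 1:k (adequate for a handful of components).
function all_permutations(k::Int)
    k == 1 && return [[1]]
    perms = Vector{Vector{Int}}()
    for p in all_permutations(k - 1), pos in 1:k
        q = copy(p)
        insert!(q, pos, k)            # place k at every possible position
        push!(perms, q)
    end
    return perms
end

# Cost of pairing reference component j with bootstrap component perm[j]:
# sum of squared coefficient differences (an assumed cost, for illustration).
matching_cost(ref, boot, perm) =
    sum(sum(abs2, ref[j] .- boot[perm[j]]) for j in eachindex(ref))

function match_components(ref::Vector{Vector{Float64}}, boot::Vector{Vector{Float64}})
    perms = all_permutations(length(ref))
    costs = [matching_cost(ref, boot, p) for p in perms]
    order = sortperm(costs)
    best = perms[order[1]]
    # Assumed separation score: relative gap between the best and runner-up cost.
    separation = length(perms) > 1 ?
        (costs[order[2]] - costs[order[1]]) / (costs[order[2]] + eps()) : 1.0
    return boot[best], separation
end

ref  = [[0.0, 1.0], [5.0, -1.0]]      # coefficients from the original fit
boot = [[5.2, -0.9], [0.1, 1.1]]      # same components, labels swapped
matched, separation = match_components(ref, boot)
ambiguous = separation < 0.01         # threshold value taken from the default above
```

In this example the two candidate matchings have very different costs, so the separation score is close to 1 and the match is unambiguous. When components overlap, the costs become similar, the score approaches 0, and the relabelling (and any interval built on it) becomes unreliable.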

Arguments

  • component::Component: The component model to use (e.g., PolynomialRegression).
  • n_components::Int: Number of mixture components (clusters).
  • n_bootstrap::Int: Number of bootstrap samples to draw.
  • t::AbstractVector: Time points vector.
  • y::AbstractMatrix: Observations matrix (rows: time points, columns: measurements).
  • ids::AbstractVector: Subject IDs vector.

Optional keyword arguments

  • n_repeats::Int=5: Number of random initializations for fitting.
  • error_model::ErrorModel=NormalError(): Error model to use. Only NormalError is currently supported.
  • inputs=nothing: Additional inputs for the component model (if applicable).
  • max_iter::Int=100: Maximum number of EM iterations.
  • tol::Float64=1e-6: Convergence tolerance for the EM algorithm.
  • separation_threshold::Float64=0.01: Threshold for detecting ambiguities in component matching.
  • rng::AbstractRNG=Random.GLOBAL_RNG: Random number generator to use.
  • show_progress_bar::Bool=true: Whether to show a progress bar.

Returns

  • bootstrap_results::Vector{MixtureResult}: A vector of MixtureResult objects from each bootstrap sample.
  • ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.

The bootstrap function returns a vector of fitted mixture models, one per bootstrap sample, together with the number of ambiguities detected during component matching. Predicting with each of these models yields a distribution of predictions, from which confidence intervals can also be estimated.
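
As a sketch of that last step, the snippet below computes a pointwise percentile interval from a matrix of predictions with one row per bootstrap fit and one column per time point. The helper `percentile_band` is hypothetical, and the predictions are simulated stand-ins, because the package's prediction call is not shown in this section.

```julia
using Random
using Statistics

# Pointwise percentile interval across bootstrap replicates.
# `predictions` has one row per bootstrap fit and one column per time point.
function percentile_band(predictions::AbstractMatrix; level::Real=0.95)
    alpha = (1 - level) / 2
    lower = [quantile(col, alpha) for col in eachcol(predictions)]
    upper = [quantile(col, 1 - alpha) for col in eachcol(predictions)]
    return lower, upper
end

# Stand-in for predictions from 200 bootstrap fits on a grid of time points.
rng = MersenneTwister(1)
t_grid = 0.0:0.5:5.0
predictions = [2.0 * t + 0.3 * randn(rng) for _ in 1:200, t in t_grid]

lower, upper = percentile_band(predictions)   # 95% band, one value per time point
```

With real results, each row of `predictions` would hold the predicted trajectory of one (matched) component from one bootstrap fit, evaluated on a common time grid.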