Fitting Models and Estimating Uncertainty

The primary function for fitting temporal mixture models is fit!. It takes a mixture model and a dataset and fits the model with the Expectation-Maximization (EM) algorithm. The input model is modified in place: its parameters are updated to best fit the data.

TemporalMixtureModels.fit! - Function

Fit a mixture model to the given DataFrame.

Usage

fit!(model, df, [id_col], [time_col], [value_col], [var_name_col]; kwargs...)

Arguments

  • model::AbstractMixtureModel: The mixture model to fit.
  • df::DataFrame: The input data.

Optional Arguments

  • id_col::String="id": The name of the column containing individual IDs.
  • time_col::String="time": The name of the column containing time points.
  • value_col::String="value": The name of the column containing observed values.
  • var_name_col::String="var_name": The name of the column containing variable names. Can be ignored for univariate data.
  • kwargs...: Additional keyword arguments passed to the internal fit! function.

Example

model = UnivariateMixtureModel(2, PolynomialRegression(2))
fit!(model, df)

Additional keyword arguments used by fit!

  • rng: An optional random number generator for reproducibility. Default is Random.default_rng().
  • verbose: A boolean flag to control the verbosity of the fitting process. Default is true.
  • max_iter: The maximum number of iterations for the EM algorithm. Default is 100.
  • tol: The tolerance for convergence. The fitting process stops when the change in log-likelihood is less than this value. Default is 1e-6.
  • hard_assignment: A boolean flag indicating whether the E-step of the EM algorithm uses hard assignments (true) or soft assignments (false). Default is false.
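
To make the hard_assignment and tol options more concrete, the following is a minimal sketch of what they plausibly control. It is not the package's internal code: the names responsibilities and has_converged are hypothetical, and using the absolute change in log-likelihood for the stopping rule is an assumption.

```julia
# Illustrative sketch only; not the package's internal implementation.

# Turn per-component log scores (rows: individuals, columns: components)
# into E-step responsibilities.
function responsibilities(logscores::AbstractMatrix{<:Real}; hard_assignment::Bool=false)
    n, k = size(logscores)
    r = zeros(Float64, n, k)
    for i in 1:n
        row = logscores[i, :]
        if hard_assignment
            # Hard assignment: all weight goes to the best-scoring component.
            r[i, argmax(row)] = 1.0
        else
            # Soft assignment: normalized weights, shifted by the row maximum
            # for numerical stability.
            w = exp.(row .- maximum(row))
            r[i, :] = w ./ sum(w)
        end
    end
    return r
end

# Stopping rule in the spirit of `tol` (assumed to use the absolute change).
has_converged(prev_ll::Real, curr_ll::Real; tol::Real=1e-6) = abs(curr_ll - prev_ll) < tol

logscores = [-1.0 -3.0; -2.5 -0.5]
soft = responsibilities(logscores)
hard = responsibilities(logscores; hard_assignment=true)
```

With soft assignments every individual contributes to every component in proportion to its responsibility; with hard assignments each individual contributes only to its most likely component.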

Evaluating Model Fit

To evaluate the fit of a temporal mixture model, the package provides functions to compute the log-likelihoods and posterior responsibilities.
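
As a rough illustration of what these two quantities mean for a single individual, the sketch below computes them by hand for polynomial components. The Gaussian noise model, the sigma parameter, and the function names polyval, trajectory_loglik, and posterior are all assumptions made for this example; they are not the package's API.

```julia
# Illustrative sketch only; noise model and all names are assumptions.

# Evaluate a polynomial c[1] + c[2]*t + c[3]*t^2 + ... at time t.
polyval(coefs, t) = sum(c * t^(j - 1) for (j, c) in enumerate(coefs))

# Log-density of one trajectory under one component, assuming independent
# Gaussian residuals with standard deviation `sigma`.
function trajectory_loglik(times, values, coefs, sigma)
    ll = 0.0
    for (t, y) in zip(times, values)
        resid = y - polyval(coefs, t)
        ll += -0.5 * log(2π * sigma^2) - resid^2 / (2 * sigma^2)
    end
    return ll
end

# Posterior responsibilities and log-likelihood contribution of one individual.
function posterior(times, values, components, weights, sigma)
    logjoint = [log(w) + trajectory_loglik(times, values, c, sigma)
                for (w, c) in zip(weights, components)]
    m = maximum(logjoint)
    loglik = m + log(sum(exp.(logjoint .- m)))   # log-sum-exp over components
    return exp.(logjoint .- loglik), loglik
end

components = [[0.0, 1.0, 0.0], [5.0, -1.0, 0.0]]  # two quadratic components
weights = [0.5, 0.5]
times = [0.0, 1.0, 2.0]
values = [0.1, 0.9, 2.1]                           # close to the first component

resp, loglik = posterior(times, values, components, weights, 1.0)
```

Summing the per-individual log-likelihoods gives the log-likelihood of the whole dataset, and the responsibilities indicate how confidently each individual is attributed to each component.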

Bootstrap Confidence Intervals

To estimate the uncertainty of the fitted model parameters, the package provides a bootstrap_ci function. This function performs bootstrap resampling to compute confidence intervals for the model parameters.
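
The resampling step can be pictured as follows. This is a sketch under the assumption that whole individuals (all rows sharing an id) are resampled with replacement; the source does not state the resampling unit. The rows vector stands in for a long-format table, and resample_individuals is a hypothetical helper.

```julia
using Random

# Illustrative sketch only; resampling whole individuals is an assumption.
# `rows` stands in for a long-format table with id, time, and value columns.
rows = [(id = 1, time = 0.0, value = 0.2), (id = 1, time = 1.0, value = 1.1),
        (id = 2, time = 0.0, value = 4.9), (id = 2, time = 1.0, value = 4.2),
        (id = 3, time = 0.0, value = 0.1), (id = 3, time = 1.0, value = 0.8)]

function resample_individuals(rows, rng::AbstractRNG)
    ids = unique(r.id for r in rows)
    drawn = rand(rng, ids, length(ids))   # sample ids with replacement
    sample = similar(rows, 0)             # empty vector with the same row type
    for (new_id, id) in enumerate(drawn)
        for r in rows
            # Relabel so a repeated individual counts as a separate trajectory.
            r.id == id && push!(sample, (id = new_id, time = r.time, value = r.value))
        end
    end
    return sample
end

boot = resample_individuals(rows, MersenneTwister(42))
```

Each bootstrap sample would then be refitted, and the spread of the refitted coefficients across samples is what the confidence intervals summarize.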

TemporalMixtureModels.bootstrap_ci - Function

Run bootstrap resampling to estimate confidence intervals for the coefficients of each component in a mixture model.

Arguments

  • model::AbstractMixtureModel{T}: The fitted mixture model.
  • df::DataFrame: The input DataFrame containing the data.

Optional keyword arguments

  • n_bootstrap::Int=100: Number of bootstrap samples to draw.
  • alpha::Float64=0.05: Significance level for the confidence intervals (e.g., 0.05 for 95% CI).
  • rng::AbstractRNG=Random.GLOBAL_RNG: Random number generator to use.
  • prog::Bool=true: Whether to show a progress bar.

Returns

In case of a univariate mixture model

  • ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary with keys :lower and :upper for the confidence intervals of the coefficients.
  • component_samples::Vector{Vector{Vector{T}}}: A vector where each element corresponds to a component and contains a vector of coefficient samples from the bootstrap resampling.
  • ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.

In case of a multivariate mixture model

  • ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary maps variable names to tuples with keys :lower and :upper, which hold the confidence intervals of the coefficients for that variable.
  • component_samples::Vector{Dict{Symbol, Vector{Vector{T}}}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary has variable names as keys and values are vectors of coefficient samples from the bootstrap resampling for that variable.
  • ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.
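
To show how the documented univariate return shapes relate to each other, the sketch below derives a ci_results-style vector from a hand-written component_samples value. The percentile method and the helper name percentile_ci are assumptions; the source does not say how the package computes its intervals.

```julia
using Statistics

# Illustrative sketch only; the percentile method is an assumption.
# Shape follows the documented univariate `component_samples`:
# component -> bootstrap replicate -> coefficient vector.
component_samples = [
    [[0.9, 2.1], [1.1, 1.9], [1.0, 2.0], [1.2, 2.2]],
    [[4.8, -1.0], [5.2, -1.1], [5.0, -0.9], [5.1, -1.2]],
]

function percentile_ci(samples::Vector{Vector{Float64}}; alpha::Float64=0.05)
    n_coefs = length(first(samples))
    # For each coefficient, take the alpha/2 and 1 - alpha/2 quantiles
    # across bootstrap replicates.
    lower = [quantile([s[j] for s in samples], alpha / 2) for j in 1:n_coefs]
    upper = [quantile([s[j] for s in samples], 1 - alpha / 2) for j in 1:n_coefs]
    return Dict{Symbol, Any}(:lower => lower, :upper => upper)
end

ci_results = [percentile_ci(s) for s in component_samples]

for (k, ci) in enumerate(ci_results)
    println("component ", k, ": lower = ", ci[:lower], ", upper = ", ci[:upper])
end
```

With alpha = 0.05 this yields 95% intervals. In practice far more than four replicates are needed for stable quantiles, which is why n_bootstrap defaults to 100.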