Fitting Models and Estimating Uncertainty

The primary function for fitting temporal mixture models is fit!. It takes a mixture model and a DataFrame, fits the model with the Expectation-Maximization (EM) algorithm, and updates the model's parameters in place. Following Julia convention, the trailing ! signals that the model argument is mutated.

TemporalMixtureModels.fit! — Function

Fit a mixture model to the given DataFrame.

Usage

fit!(model, df, [id_col], [time_col], [value_col], [var_name_col]; kwargs...)

Arguments

model::AbstractMixtureModel: The mixture model to fit.
df::DataFrame: The input data.

Optional Arguments

id_col::String="id": The name of the column containing individual IDs.
time_col::String="time": The name of the column containing time points.
value_col::String="value": The name of the column containing observed values.
var_name_col::String="var_name": The name of the column containing variable names; can be ignored for univariate data.
kwargs...: Additional keyword arguments passed to the internal fit! function.

Example

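using TemporalMixtureModels, DataFrames
# df: a long-format DataFrame with columns "id", "time" and "value" (the default column names)
model = UnivariateMixtureModel(2, PolynomialRegression(2))
fit!(model, df)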


Additional keyword arguments used by `fit!`

rng: An optional random number generator for reproducibility. Default is Random.default_rng().
verbose: A boolean flag to control the verbosity of the fitting process. Default is true.
max_iter: The maximum number of iterations for the EM algorithm. Default is 100.
tol: The tolerance for convergence. The fitting process stops when the change in log-likelihood is less than this value. Default is 1e-6.
hard_assignment: A boolean flag selecting hard assignments (true) or soft assignments (false) in the E-step of the EM algorithm. Default is false.
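To make the roles of max_iter, tol and hard_assignment concrete, here is a minimal EM sketch for a one-dimensional Gaussian mixture on plain numbers rather than trajectories. It is not the package's implementation: the function name `em_sketch`, the quantile-based initialisation and the stopping rule (absolute change in log-likelihood below tol) are assumptions made for illustration.

```julia
using Statistics

# Illustrative EM for a 1-D Gaussian mixture; NOT the package's implementation.
# max_iter, tol and hard_assignment play the roles described for fit!.
# Assumed stopping rule: stop when |loglik - previous loglik| < tol.
function em_sketch(y::Vector{Float64}, K::Int; max_iter::Int=100, tol::Float64=1e-6,
                   hard_assignment::Bool=false)
    n = length(y)
    μ = [quantile(y, (k - 0.5) / K) for k in 1:K]   # deterministic start at quantiles
    σ² = fill(var(y), K)                            # component variances
    w = fill(1.0 / K, K)                            # mixing weights
    R = zeros(n, K)                                 # responsibilities
    ll_old, ll, iters = -Inf, -Inf, 0
    for iter in 1:max_iter
        iters = iter
        # E-step: logp[i, k] = log w_k + log N(y_i | μ_k, σ²_k)
        logp = [log(w[k]) - 0.5 * (log(2π * σ²[k]) + (y[i] - μ[k])^2 / σ²[k])
                for i in 1:n, k in 1:K]
        m = maximum(logp; dims=2)
        lse = m .+ log.(sum(exp.(logp .- m); dims=2))  # row-wise log-sum-exp
        ll = sum(lse)                                  # log-likelihood at current parameters
        R .= exp.(logp .- lse)                         # soft assignments
        if hard_assignment                             # replace each row by a one-hot argmax
            for i in 1:n
                k = argmax(view(R, i, :))
                R[i, :] .= 0.0
                R[i, k] = 1.0
            end
        end
        # M-step: weighted means and variances
        for k in 1:K
            Nk = sum(view(R, :, k))
            Nk > 0 || continue                         # leave an empty component unchanged
            μ[k] = sum(view(R, :, k) .* y) / Nk
            σ²[k] = max(sum(view(R, :, k) .* (y .- μ[k]) .^ 2) / Nk, 1e-8)  # floor avoids collapse
        end
        w .= vec(sum(R; dims=1)) ./ n                  # weights sum to one
        abs(ll - ll_old) < tol && break                # convergence check
        ll_old = ll
    end
    return (; μ, σ², w, R, loglik=ll, iterations=iters)
end
```

With hard_assignment=true every row of R is one-hot, so each observation contributes to exactly one component in the M-step; with soft assignments it contributes fractionally to all of them.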

Evaluating Model Fit

To evaluate the fit of a temporal mixture model, the package provides functions to compute the log-likelihood and the posterior responsibilities.

TemporalMixtureModels.log_likelihood — Function

Compute the total log-likelihood of the mixture model for the given DataFrame.
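As an illustration of what a total log-likelihood means for trajectory data, the sketch below assumes each component has a polynomial mean curve with Gaussian noise and that an individual's observations are independent given the component. Under those assumptions, individual i contributes log(sum over k of w_k * product over its time points of N(y_t | f_k(t), σ_k²)), computed stably with log-sum-exp. `PolyComponent`, `total_loglik` and the helper functions are hypothetical names, not part of the package API.

```julia
# Hypothetical component: polynomial mean curve plus Gaussian noise (assumption).
struct PolyComponent
    coefs::Vector{Float64}   # c[1] + c[2]*t + c[3]*t^2 + ...
    σ::Float64               # noise standard deviation
end

# Evaluate the polynomial mean curve at time t.
poly(c::Vector{Float64}, t::Real) = sum(c[j] * t^(j - 1) for j in eachindex(c))

# log N(y | μ, σ²)
normal_logpdf(y, μ, σ) = -0.5 * log(2π * σ^2) - (y - μ)^2 / (2 * σ^2)

# Numerically stable log(sum(exp.(x))).
function logsumexp(x::AbstractVector{<:Real})
    m = maximum(x)
    return m + log(sum(exp.(x .- m)))
end

# ids, times and values are the long-format columns (one row per observation);
# w holds the mixing weights.
function total_loglik(ids::AbstractVector, times::AbstractVector{<:Real},
                      values::AbstractVector{<:Real},
                      comps::Vector{PolyComponent}, w::Vector{Float64})
    ll = 0.0
    for id in unique(ids)
        rows = findall(==(id), ids)
        # per_k[k] = log w_k + sum over this individual's rows of log N(y_t | f_k(t), σ_k²)
        per_k = [log(w[k]) + sum(normal_logpdf(values[r], poly(comps[k].coefs, times[r]), comps[k].σ)
                                 for r in rows)
                 for k in eachindex(comps)]
        ll += logsumexp(per_k)  # log of this individual's mixture likelihood
    end
    return ll
end
```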


TemporalMixtureModels.posterior_responsibilities — Function

Compute posterior responsibilities for the given DataFrame.
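Posterior responsibilities are the probabilities that each individual belongs to each component given the data: r_ik = w_k p(y_i | θ_k) / sum over j of w_j p(y_i | θ_j). A minimal sketch, assuming the log joint terms log w_k + log p(y_i | θ_k) are already available as an n × K matrix (the function name `responsibilities` is hypothetical):

```julia
# logjoint[i, k] = log w_k + log p(y_i | θ_k) for individual i and component k.
# Returns an n × K matrix whose rows sum to one.
function responsibilities(logjoint::AbstractMatrix{<:Real})
    m = maximum(logjoint; dims=2)   # row-wise maximum for numerical stability
    p = exp.(logjoint .- m)         # unnormalised; the largest entry in each row is 1
    return p ./ sum(p; dims=2)      # normalise each row
end
```

Subtracting the row maximum before exponentiating keeps the computation stable even when the log-likelihoods are very negative, as they typically are for long trajectories.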


Bootstrap Confidence Intervals

To estimate the uncertainty of the fitted model parameters, the package provides a bootstrap_ci function. This function performs bootstrap resampling to compute confidence intervals for the model parameters.
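Because the data are in long format, a natural resampling unit is the individual: whole trajectories are drawn with replacement rather than single rows. The sketch below assumes that scheme (the documentation does not state which one bootstrap_ci uses) and relabels each draw so that an individual drawn twice enters the refit as two distinct trajectories. `bootstrap_rows` is a hypothetical helper.

```julia
using Random

# Individual-level (cluster) bootstrap on long-format data.
# Returns the row indices of the resampled data and a new id for each row.
function bootstrap_rows(rng::AbstractRNG, ids::AbstractVector)
    uniq = unique(ids)
    drawn = rand(rng, uniq, length(uniq))     # sample individuals with replacement
    rows = Int[]
    new_ids = Int[]
    for (j, id) in enumerate(drawn)
        r = findall(==(id), ids)              # all rows belonging to this individual
        append!(rows, r)
        append!(new_ids, fill(j, length(r)))  # relabel as bootstrap individual j
    end
    return rows, new_ids
end
```

Each replicate would then refit the model on df[rows, :] with its id column replaced by new_ids.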

TemporalMixtureModels.bootstrap_ci — Function

Run bootstrap resampling to estimate confidence intervals for the coefficients of each component in a mixture model.

Arguments

model::AbstractMixtureModel{T}: The fitted mixture model.
df::DataFrame: The input DataFrame containing the data.

Optional Keyword Arguments

n_bootstrap::Int=100: Number of bootstrap samples to draw.
alpha::Float64=0.05: Significance level for the confidence intervals (e.g., 0.05 for 95% CI).
rng::AbstractRNG=Random.GLOBAL_RNG: Random number generator to use.
prog::Bool=true: Whether to show a progress bar.

Returns

In case of a univariate mixture model

ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary with keys :lower and :upper for the confidence intervals of the coefficients.
component_samples::Vector{Vector{Vector{T}}}: A vector where each element corresponds to a component and contains a vector of coefficient samples from the bootstrap resampling.
ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.
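One common way to turn bootstrap samples laid out like component_samples into :lower and :upper bounds is the percentile method: for each coefficient, take the alpha/2 and 1 - alpha/2 quantiles of its bootstrap draws. The sketch below assumes that method (bootstrap_ci may compute its intervals differently); `percentile_ci` is a hypothetical name.

```julia
using Statistics

# samples: one coefficient vector per bootstrap replicate, for a single component.
# Returns a Dict with :lower and :upper vectors, one entry per coefficient.
function percentile_ci(samples::Vector{Vector{Float64}}; alpha::Float64=0.05)
    p = length(first(samples))
    lower = [quantile(getindex.(samples, j), alpha / 2) for j in 1:p]
    upper = [quantile(getindex.(samples, j), 1 - alpha / 2) for j in 1:p]
    return Dict(:lower => lower, :upper => upper)
end
```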

In case of a multivariate mixture model

ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary maps variable names to tuples with keys :lower and :upper holding the confidence intervals of the coefficients for that variable.
component_samples::Vector{Dict{Symbol, Vector{Vector{T}}}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary maps variable names to the vectors of coefficient samples obtained by bootstrap resampling for that variable.
ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.
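Bootstrap refits can return the same components in a different order (label switching), so each replicate has to be matched to the reference fit before its coefficients are pooled. The sketch below shows one possible matching rule, not necessarily the one bootstrap_ci uses: try every permutation of the K components, keep the one with the smallest total squared distance between coefficient vectors, and flag the replicate as ambiguous when the runner-up is within a relative margin of the best. `match_components`, `all_perms` and `margin` are hypothetical.

```julia
# All permutations of 1:K, built by inserting K into each permutation of 1:K-1.
function all_perms(K::Int)
    K == 1 && return [[1]]
    out = Vector{Vector{Int}}()
    for p in all_perms(K - 1), pos in 1:K
        q = copy(p)
        insert!(q, pos, K)
        push!(out, q)
    end
    return out
end

# ref[k] and boot[k] are the coefficient vectors of component k in the reference
# fit and in one bootstrap refit. Returns perm (boot[perm[k]] is matched to ref[k])
# and a flag that is true when the best match is not clearly better than the next.
function match_components(ref::Vector{Vector{Float64}}, boot::Vector{Vector{Float64}};
                          margin::Float64=0.1)
    K = length(ref)
    cost(p) = sum(sum(abs2, ref[k] .- boot[p[k]]) for k in 1:K)
    perms = all_perms(K)
    costs = map(cost, perms)
    order = sortperm(costs)
    best = perms[order[1]]
    ambiguous = K > 1 && costs[order[2]] <= (1 + margin) * costs[order[1]]
    return best, ambiguous
end
```

Summing the ambiguity flags over all replicates gives a count comparable in spirit to ambiguities_detected. Trying all K! permutations is only practical for a handful of components; larger K would call for an assignment algorithm such as the Hungarian method.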

