Fitting Models and Estimating Uncertainty
The primary function for fitting temporal mixture models is fit!. This function takes a mixture model and a dataset as input and performs the fitting process using the Expectation-Maximization (EM) algorithm. The function modifies the input model in place, updating its parameters to best fit the data.
TemporalMixtureModels.fit! — Function
Fit a mixture model to the given DataFrame.
Usage
fit!(model, df, [id_col], [time_col], [value_col], [var_name_col]; kwargs...)
Arguments
model::AbstractMixtureModel: The mixture model to fit.
df::DataFrame: The input data.
Optional Keyword Arguments
id_col::String="id": The name of the column containing individual IDs.
time_col::String="time": The name of the column containing time points.
value_col::String="value": The name of the column containing observed values.
var_name_col::String="var_name": The name of the column containing variable names. Can be ignored for univariate data.
kwargs...: Additional keyword arguments passed to the internal fit! function.
Example
model = UnivariateMixtureModel(2, PolynomialRegression(2))
fit!(model, df)
Additional keyword arguments used by fit!
rng: An optional random number generator for reproducibility. Default is Random.default_rng().
verbose: A boolean flag to control the verbosity of the fitting process. Default is true.
max_iter: The maximum number of iterations for the EM algorithm. Default is 100.
tol: The tolerance for convergence. The fitting process stops when the change in log-likelihood between iterations is less than this value. Default is 1e-6.
hard_assignment: A boolean flag indicating whether to use hard assignments (true) or soft assignments (false) during the E-step of the EM algorithm. Default is false.
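To make the hard_assignment and tol options more concrete, the following is a minimal standalone sketch of an E-step with soft and hard assignments, plus a convergence check. It does not use the package: e_step and has_converged are hypothetical helpers, the log-likelihood matrix is made-up illustrative input, and the use of an absolute difference in the stopping rule is an assumption. The package's internal implementation may differ.

```julia
# Hypothetical helper, not part of TemporalMixtureModels.
# loglik[i, k] is the log-likelihood of individual i under component k.
function e_step(loglik::AbstractMatrix{<:Real}, weights::AbstractVector{<:Real};
                hard_assignment::Bool=false)
    n, k = size(loglik)
    resp = zeros(Float64, n, k)
    for i in 1:n
        # Unnormalized log-posterior: log(weight_k) + log p(y_i | component k)
        logpost = loglik[i, :] .+ log.(weights)
        # Subtract the maximum before exponentiating to avoid underflow
        p = exp.(logpost .- maximum(logpost))
        p ./= sum(p)
        if hard_assignment
            resp[i, argmax(p)] = 1.0   # all mass on the most probable component
        else
            resp[i, :] .= p            # fractional membership
        end
    end
    return resp
end

# Assumed stopping rule: absolute change in log-likelihood below tol
has_converged(ll_new, ll_old; tol=1e-6) = abs(ll_new - ll_old) < tol

# Illustrative input: 3 individuals, 2 components
loglik = [-10.0 -12.0; -30.0 -25.0; -8.0 -8.0]
weights = [0.5, 0.5]

soft = e_step(loglik, weights)
hard = e_step(loglik, weights; hard_assignment=true)
```

With soft assignments every individual contributes to every component in proportion to its posterior probability. With hard assignments each individual contributes to exactly one component, which resembles a k-means style update.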
Evaluating Model Fit
To evaluate the fit of a temporal mixture model, the package provides functions to compute the log-likelihoods and posterior responsibilities.
TemporalMixtureModels.log_likelihood — Function
Compute the total log-likelihood of the mixture model for the given DataFrame.
TemporalMixtureModels.posterior_responsibilities — Function
Compute posterior responsibilities for the given DataFrame.
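As a rough illustration of what a total mixture log-likelihood represents, the sketch below computes it from per-component log-likelihoods using the log-sum-exp trick. It is standalone and does not call the package: logsumexp and mixture_log_likelihood are hypothetical names, and the input values are chosen only so the result can be checked by hand.

```julia
# Numerically stable log(sum(exp.(x)))
logsumexp(x) = (m = maximum(x); m + log(sum(exp.(x .- m))))

# Total mixture log-likelihood:
#   sum over individuals i of log( sum over components k of weight_k * p(y_i | k) )
# loglik[i, k] holds log p(y_i | k). Hypothetical helper, not the package API.
function mixture_log_likelihood(loglik::AbstractMatrix{<:Real},
                                weights::AbstractVector{<:Real})
    return sum(logsumexp(loglik[i, :] .+ log.(weights)) for i in 1:size(loglik, 1))
end

# One individual with component likelihoods 0.2 and 0.4 and equal weights:
# the mixture likelihood is 0.5 * 0.2 + 0.5 * 0.4 = 0.3
ll = mixture_log_likelihood([log(0.2) log(0.4)], [0.5, 0.5])
```

Working in log space matters for longitudinal data, because the likelihood of a whole trajectory is a product of many small densities and can underflow if exponentiated directly.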
Bootstrap Confidence Intervals
To estimate the uncertainty of the fitted model parameters, the package provides a bootstrap_ci function. This function performs bootstrap resampling to compute confidence intervals for the model parameters.
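The general idea of bootstrap confidence intervals can be sketched without the package. The example below applies a percentile bootstrap to a simple statistic. The helper percentile_bootstrap_ci is hypothetical, its keyword names only mirror n_bootstrap, alpha, and rng for readability, and the use of the percentile method is an assumption rather than a statement about how bootstrap_ci is implemented. For a mixture model, the resampled units would presumably be individuals and the statistic would be the refitted coefficients.

```julia
using Random, Statistics

# Hypothetical helper: percentile bootstrap CI for an arbitrary statistic.
function percentile_bootstrap_ci(stat, data::AbstractVector;
                                 n_bootstrap::Int=100,
                                 alpha::Float64=0.05,
                                 rng::AbstractRNG=Random.default_rng())
    n = length(data)
    samples = Vector{Float64}(undef, n_bootstrap)
    for b in 1:n_bootstrap
        idx = rand(rng, 1:n, n)        # resample indices with replacement
        samples[b] = stat(data[idx])   # recompute the statistic on the resample
    end
    # The alpha/2 and 1 - alpha/2 quantiles bound the (1 - alpha) interval
    lower = quantile(samples, alpha / 2)
    upper = quantile(samples, 1 - alpha / 2)
    return (lower=lower, upper=upper, samples=samples)
end

rng = MersenneTwister(42)
data = randn(rng, 200) .+ 3.0   # synthetic data, for illustration only
ci = percentile_bootstrap_ci(mean, data; n_bootstrap=500, alpha=0.05, rng=rng)
```

Passing an explicit rng makes the resampling reproducible, and a larger n_bootstrap generally gives more stable interval endpoints at the cost of more refits.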
TemporalMixtureModels.bootstrap_ci — Function
Run bootstrap resampling to estimate confidence intervals for the coefficients of each component in a mixture model.
Arguments
model::AbstractMixtureModel{T}: The fitted mixture model.
df::DataFrame: The input DataFrame containing the data.
Optional keyword arguments
n_bootstrap::Int=100: Number of bootstrap samples to draw.
alpha::Float64=0.05: Significance level for the confidence intervals (e.g., 0.05 for 95% CI).
rng::AbstractRNG=Random.GLOBAL_RNG: Random number generator to use.
prog::Bool=true: Whether to show a progress bar.
Returns
In case of a univariate mixture model
ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary with keys :lower and :upper for the confidence intervals of the coefficients.
component_samples::Vector{Vector{Vector{T}}}: A vector where each element corresponds to a component and contains a vector of coefficient samples from the bootstrap resampling.
ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.
In case of a multivariate mixture model
ci_results::Vector{Dict{Symbol, Any}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary has variable names as keys, and the values are tuples with keys :lower and :upper for the confidence intervals of the coefficients for that variable.
component_samples::Vector{Dict{Symbol, Vector{Vector{T}}}}: A vector where each element corresponds to a component and contains a dictionary. Each dictionary has variable names as keys, and the values are vectors of coefficient samples from the bootstrap resampling for that variable.
ambiguities_detected::Int: The number of ambiguities detected during component matching in the bootstrap resampling. A high number may indicate unreliable confidence intervals due to uncertain sample assignment to components.
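Component matching is needed because the order of mixture components is arbitrary, so a model refitted on a bootstrap sample may return the same components in a different order. The sketch below shows one simple way such matching and ambiguity detection could work. It is an assumption for illustration only: match_components is a hypothetical helper, nearest-coefficient matching by squared Euclidean distance is a swapped-in technique, and the definition of an ambiguity used here (two refitted components claiming the same reference component) may not be the one the package uses.

```julia
# Hypothetical helper: match each refitted component to the closest
# reference component by squared Euclidean distance between coefficients.
function match_components(reference::Vector{Vector{Float64}},
                          candidate::Vector{Vector{Float64}})
    # For each candidate component, index of the nearest reference component
    assignment = [argmin([sum(abs2, c .- r) for r in reference]) for c in candidate]
    # Assumed definition: ambiguous if two candidates map to the same reference
    ambiguous = length(unique(assignment)) < length(assignment)
    return assignment, ambiguous
end

# Illustrative quadratic coefficients for a two-component reference fit
reference = [[0.0, 1.0, 0.0], [5.0, -1.0, 0.2]]

# A bootstrap refit that returned the components in swapped order
swapped = [[5.1, -0.9, 0.2], [0.1, 1.1, 0.0]]
assignment, ambiguous = match_components(reference, swapped)

# A bootstrap refit where both components collapsed onto the first one
collapsed = [[0.1, 1.0, 0.0], [0.2, 0.9, 0.0]]
assignment2, ambiguous2 = match_components(reference, collapsed)
```

In the swapped case the matching recovers the relabeling cleanly. In the collapsed case both refitted components map to the same reference component, which is the kind of situation where per-component intervals become hard to trust.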