Univariate Model
This tutorial provides a step-by-step guide on how to use the TemporalMixtureModels.jl package to fit temporal mixture models to time series data. It provides a simple example using synthetic data to illustrate the main functionalities of the package, including data preparation, model fitting, prediction, and uncertainty estimation.
Prerequisites
While the package uses an internal data structure for the time series data, the package by default implements automatic conversion from a DataFrame to its internal format.
using TemporalMixtureModels, DataFramesGenerating Synthetic Data
For this tutorial, we will generate synthetic univariate time series data. The data will consist of three clusters, each generated from a different polynomial function with added Gaussian noise. The input DataFrame should be a so-called "long" format, with columns for the individual ID, time points, and observed values.
# Set random seed for reproducibility
using Random
Random.seed!(1234)
# Generate time points
individuals_per_group = [20, 30]
t_values = 0:0.1:10
n_groups = 2
group_coefficients = [ [2.0, -0.5, 0.05], # Group 1: y = 2 - 0.5*t + 0.05*t^2
[1.0, 0.3, -0.02] ] # Group 2: y = 1 + 0.3*t - 0.02*t^2
ids = Int[]
timepoints = Float64[]
measurements = Float64[]
for group in 1:n_groups
id_start = sum(individuals_per_group[1:group-1]) + 1
id_end = sum(individuals_per_group[1:group])
for individual in id_start:id_end
for t in t_values
y = group_coefficients[group][1] + group_coefficients[group][2]*t + group_coefficients[group][3]*t^2 + randn()*0.25
push!(ids, individual)
push!(timepoints, t)
push!(measurements, y)
end
end
end
input_data = DataFrame(id = ids, time = timepoints, value = measurements)
first(input_data, 5) # Display the first 5 rows of the DataFrame| Row | id | time | value |
|---|---|---|---|
| Int64 | Float64 | Float64 | |
| 1 | 1 | 0.0 | 1.91007 |
| 2 | 1 | 0.1 | 2.2223 |
| 3 | 1 | 0.2 | 1.7971 |
| 4 | 1 | 0.3 | 2.03423 |
| 5 | 1 | 0.4 | 1.91306 |
Here you can see the first few rows of the generated DataFrame. Each row corresponds to a measurement for a specific individual at a specific time point. The internal structure of the model takes care of the bookkeeping during the fitting process.
Fitting a Temporal Mixture Model
Now that we have our synthetic data, we can fit a temporal mixture model using the fit! function. We will use a polynomial regression model as the component model for this example. We will specify the number of components (clusters) we expect in the data.
We first define the model, specifying the number of components and the type of component model to use. In this case, we will use a polynomial regression model of degree 2.
# Define the number of components and the component model
n_components = 2
component_model = PolynomialRegression(2)
model = UnivariateMixtureModel(n_components, component_model)UnivariateMixtureModel{Float64}(AbstractMixtureModelComponent{Float64}[PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0]), PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0])], [0.5, 0.5], [0.0, 0.0], -Inf, false, 0)We can now fit the model to our data using the fit! function. This function takes the model and the input DataFrame as arguments and performs the fitting process.
# Fit the model to the data
fit!(model, input_data)UnivariateMixtureModel{Float64}(AbstractMixtureModelComponent{Float64}[PolynomialRegression{Float64}(2, [1.9918590501073516, -0.49286490747138556, 0.049279349186218915]), PolynomialRegression{Float64}(2, [0.989186937982434, 0.3062137555464907, -0.020612764194961962])], [0.4, 0.6], [0.062871976937173, 0.06294039234699463], -212.13465861833342, true, 3)We can also compute the confidence intervals for the model parameters using bootstrap resampling. This provides an estimate of the uncertainty in the parameter estimates.
# Compute bootstrap confidence intervals
n_bootstrap = 100 # Number of bootstrap samples
confidence_intervals, _, _ = bootstrap_ci(model, input_data; n_bootstrap=n_bootstrap)
confidence_intervals2-element Vector{Any}:
(lower = [1.9619378486909782, -0.5040298747290615, 0.04810395842021009], upper = [2.026180513936858, -0.4808429504588593, 0.05013344858781813])
(lower = [0.9544293489367097, 0.2927516811190185, -0.022293345025492427], upper = [1.0128094093921465, 0.3224244442686258, -0.01927421348782255])The output shows the confidence intervals for the parameters of each component model in the mixture. We can use these intervals and the fitted model to visualize the results.
using CairoMakie
# Create a range of time points for prediction
t_pred = 0:0.1:10
# Prepare a figure
figure_mixture = let f = Figure()
ax = Axis(f[1, 1], xlabel="Time", ylabel="Value", title="Temporal Mixture Model Fit")
# Plot the original data points
scatter!(ax, input_data.time, input_data.value; color=:gray, markersize=4, label="Data")
# Plot the fitted component models
predictions = predict(model, t_pred)
COMPONENT_COLORS = [colorant"#c74300", colorant"#008aa1"]
for k in axes(predictions, 2)
lines!(ax, t_pred, predictions[:, k];
label="Component $k", color=COMPONENT_COLORS[k], linewidth=2)
# get the confidence bounds. This is still a bit clunky because there is no API for this yet
lower_bound_parameters = confidence_intervals[k][:lower]
upper_bound_parameters = confidence_intervals[k][:upper]
model_lb = PolynomialRegression(2, lower_bound_parameters)
model_ub = PolynomialRegression(2, upper_bound_parameters)
y_lower = predict(model_lb, t_pred)
y_upper = predict(model_ub, t_pred)
# Plot the confidence intervals as shaded areas
band!(ax, t_pred, y_lower, y_upper;
color=(COMPONENT_COLORS[k], 0.2), label="Component $k")
end
axislegend(ax; merge=true)
f
end
figure_mixture