Multivariate Model

In this section, we will demonstrate how to fit a multivariate temporal mixture model using the TemporalMixtureModels.jl package. We will use synthetic data for this example.

Prerequisites

While the package uses an internal data structure for the time series data, the package by default implements automatic conversion from a DataFrame to its internal format.

using TemporalMixtureModels, DataFrames

Generating Synthetic Data

We will generate some synthetic multivariate time series data. The data will consist of two variables, each generated from different polynomial functions with added Gaussian noise. The input DataFrame should be a so-called "long" format, with columns for the individual ID, time points, variable names, and observed values.

# Set random seed for reproducibility
using Random
Random.seed!(1234)

# Generate time points
individuals_per_group = [20, 30]
t_values = 0:0.1:10
n_groups = 2

group_coefficients_y = [ [2.0, -0.5, 0.05],  # Group 1: y = 2 - 0.5*t + 0.05*t^2
                       [1.0, 0.3, -0.02] ] # Group 2: y = 1 + 0.3*t - 0.02*t^2
group_coefficients_z = [ [0.5, 0.4, -0.03],  # Group 1: z = 0.5 + 0.4*t - 0.03*t^2
                       [3.0, -0.2, 0.04] ] # Group 2: z = 3 - 0.2*t + 0.04*t^2

ids = Int[]
timepoints = Float64[]
measurements_y = Float64[]
measurements_z = Float64[]
for group in 1:n_groups
    id_start = sum(individuals_per_group[1:group-1]) + 1
    id_end = sum(individuals_per_group[1:group])
    for individual in id_start:id_end
        for t in t_values
            y = group_coefficients_y[group][1] + group_coefficients_y[group][2]*t + group_coefficients_y[group][3]*t^2 + randn()*0.25
            z = group_coefficients_z[group][1] + group_coefficients_z[group][2]*t + group_coefficients_z[group][3]*t^2 + randn()*0.25
            push!(ids, individual)
            push!(timepoints, t)
            push!(measurements_y, y)
            push!(measurements_z, z)
        end
    end
end

input_data = DataFrame(id = [ids; ids], time = [timepoints; timepoints], value = [measurements_y; measurements_z], var_name = repeat(["y", "z"], inner=length(ids)))
first(input_data, 5)  # Display the first 5 rows of the DataFrame
5×4 DataFrame
Rowidtimevaluevar_name
Int64Float64Float64String
110.01.91007y
210.11.8456y
310.22.00706y
410.32.36819y
510.41.73177y

Here you can see the first few rows of the generated DataFrame. Each row corresponds to a measurement for a specific individual at a specific time point for one of the two variables.

Fitting a Multivariate Temporal Mixture Model

Now that we have our synthetic multivariate data, we can fit a multivariate temporal mixture model using the fit! function. We will use polynomial regression models as the component models for this example. We will specify the number of components (clusters) we expect in the data.

We first define the model, specifying the number of components and the type of component models to use for each variable. In this case, we will use polynomial regression models of degree 2 for both variables.

# Define the number of components and the component models for each variable
n_components = 2
component_models = Dict(:y => PolynomialRegression(2), :z => PolynomialRegression(2))
model = MultivariateMixtureModel(n_components, component_models)
MultivariateMixtureModel{Float64}(Dict{Symbol, AbstractMixtureModelComponent{Float64}}[Dict(:y => PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0]), :z => PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0])), Dict(:y => PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0]), :z => PolynomialRegression{Float64}(2, [0.0, 0.0, 0.0]))], [0.5, 0.5], [0.0 0.0; 0.0 0.0], -Inf, false, 0)

We can now fit the model to our data using the fit! function. This function takes the model and the input DataFrame as arguments and performs the fitting process.

# Fit the model to the data
fit!(model, input_data)
MultivariateMixtureModel{Float64}(Dict{Symbol, AbstractMixtureModelComponent{Float64}}[Dict(:y => PolynomialRegression{Float64}(2, [1.0005471675119773, 0.29730782662765787, -0.019680612927625955]), :z => PolynomialRegression{Float64}(2, [2.9833876700157558, -0.1948543443396753, 0.03967899708946765])), Dict(:y => PolynomialRegression{Float64}(2, [1.98633556351034, -0.48904758236841955, 0.04867909688884508]), :z => PolynomialRegression{Float64}(2, [0.5069300712194061, 0.39836390116509673, -0.029734077768493152]))], [0.6, 0.4], [0.061580955471056006 0.0643541024198967; 0.06267539351032858 0.05976239625697861], -357.09848322308216, true, 3)

As with the univariate case, we can also estimate the uncertainty of the fitted model parameters using the bootstrap_ci function. This function performs bootstrap resampling to compute confidence intervals for the model parameters.

# Compute bootstrap confidence intervals
n_bootstrap = 100  # Number of bootstrap samples
confidence_intervals, _, _ = bootstrap_ci(model, input_data; n_bootstrap=n_bootstrap)
confidence_intervals
2-element Vector{Any}:
 Dict{Symbol, Any}(:y => (lower = [0.9777883917870603, 0.2876007762445627, -0.020338176148768546], upper = [1.0267758220338905, 0.3041904063304761, -0.018883592062307095]), :z => (lower = [2.9571897917084833, -0.20564872844080367, 0.038524797569084834], upper = [3.0172107496373997, -0.18165510104695098, 0.04061399684471874]))
 Dict{Symbol, Any}(:y => (lower = [1.9410436374714606, -0.5056756860524578, 0.04717293399643091], upper = [2.024336455771562, -0.47032897754477737, 0.05012472023555149]), :z => (lower = [0.48102522010885146, 0.3861485908135129, -0.03062011920669294], upper = [0.5396725851808255, 0.40817937597514264, -0.028840212950910746]))

The output shows the confidence intervals for the parameters of each component model and variable in the mixture. We can use these intervals and the fitted model to visualize the results.

using CairoMakie

# Create a range of time points for prediction
t_pred = 0:0.1:10

# Prepare a figure
figure_mixture = let f = Figure()

    # Plot the fitted component models
    predictions = predict(model, t_pred)

    COMPONENT_COLORS = [colorant"#c74300", colorant"#008aa1"]

    for (axi, variable) in enumerate([:y, :z])
        ax = Axis(f[1, axi], xlabel="Time", ylabel="Value", title="Temporal Mixture, variable: $variable")
        # Plot the original data points
        scatter!(ax, input_data[input_data[!,:var_name] .== String(variable),:time], input_data[input_data[!,:var_name] .== String(variable),:value]; color=:gray, markersize=2, label="Data - $variable")
        for k in axes(predictions[variable], 2)
            lines!(ax, t_pred, predictions[variable][:, k];
            label="Component $k - $variable", color=COMPONENT_COLORS[k], linewidth=2)

            # get the confidence bounds. This is still a bit clunky because there is no API for this yet
            lower_bound_parameters = confidence_intervals[k][variable][:lower]
            upper_bound_parameters = confidence_intervals[k][variable][:upper]

            model_lb = PolynomialRegression(2, lower_bound_parameters)
            model_ub = PolynomialRegression(2, upper_bound_parameters)

            y_lower = predict(model_lb, t_pred)
            y_upper = predict(model_ub, t_pred)

            # Plot the confidence intervals as shaded areas
            band!(ax, t_pred, y_lower, y_upper;
            color=(COMPONENT_COLORS[k], 0.3), label="Component $k - $variable")
        end
    end

    f
end
figure_mixture
Example block output