Basic Usage of TemporalMixtureModels.jl

The main functionality revolves around the fit_mixture function, which fits a temporal mixture model to the provided time series data, using the EM algorithm.

First, we need to import the package:

using TemporalMixtureModels, Random
Random.seed!(27052023)  # For reproducibility
Random.TaskLocalRNG()

Data Preparation

First of all, you'll need some data that is in the right format. The basic format for the data consists of three elements:

  • t: A vector of time points. This should be a one-dimensional array of real numbers representing the time axis for the time series data.
  • Y: A vector or matrix of observed values. Each row corresponds to a time point in t, and each column corresponds to a different outcome variable (for multivariate time series). For univariate time series, this can be a one-dimensional array.
  • ids: A vector of identifiers for each independent time series. The length of this vector should match the number of rows in Y and indicates which time series each observation belongs to.

Optionally, you can also provide:

  • inputs: A matrix of input variables (covariates) that may influence the observed values. Each row corresponds to a time point, and each column corresponds to a different input variable. Inputs are currently not used in the built-in component models, but the support is there for custom models.

Say, we have a dataset with blood pressure measurements over time for multiple patients while they have been given either a placebo or a drug. Our table of data might look like this:

idtimesystolic_pressurediastolic_pressuretreatment
10.012080drug
11.011878drug
12.011575drug
20.013085placebo
21.012883placebo
22.012580placebo
...............
432.011070placebo

For the examples in the manual, the package contains a function called example_bp_data() that can be used to generate a dataset like this.

TemporalMixtureModels.example_bp_dataFunction
example_bp_data(;n_subjects_drug=50, n_subjects_placebo=50, n_timepoints=5, rng::AbstractRNG=Random.GLOBAL_RNG)

Generate example blood pressure data for testing and examples.

Arguments

  • n_subjects_drug: Number of subjects in the drug group (default: 50)
  • n_subjects_placebo: Number of subjects in the placebo group (default: 50)
  • n_timepoints: Number of time points per subject (default: 5)
  • rng: Random number generator (default: Random.GLOBAL_RNG)

Returns

  • t: Vector of time points
  • y: Matrix of blood pressure measurements (systolic and diastolic)
  • ids: Vector of subject IDs
  • class_labels: Vector of class labels (1 for drug, 0 for placebo)

Example

using TemporalMixtureModels: example_bp_data
t, y, ids, class_labels = example_bp_data(n_subjects_drug=30, n_subjects_placebo=30, n_timepoints=4)
source

Let's generate the example blood pressure dataset:

t, y, ids, class_labels = example_bp_data(;n_subjects_drug=21, n_subjects_placebo=22, n_timepoints=5)
([0.0, 1.25, 2.5, 3.75, 5.0, 0.0, 1.25, 2.5, 3.75, 5.0  …  0.0, 1.25, 2.5, 3.75, 5.0, 0.0, 1.25, 2.5, 3.75, 5.0], [125.48572170434635 82.70734137033764; 103.06459556000243 71.3956312347923; … ; 109.19769689115391 71.89320129586659; 108.0637238450581 77.0854240014355], [1, 1, 1, 1, 1, 2, 2, 2, 2, 2  …  42, 42, 42, 42, 42, 43, 43, 43, 43, 43], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

We can quickly visualize the systolic blood pressure measurements over time for all subjects:

systolic_bp_plot

Missing Data

It may occur that some time series have missing values at certain time points. The current implementation of fit_mixture handles missing data by ignoring those specific observations during the fitting process.

This may for example also happen if you measure two variables that are not always both measured at the same time points. In that case, you can fill the missing values with missing and the fitting process will skip those values. TemporalMixtureModels.jl is designed to handle such cases gracefully.

An example of such a dataset, where we may measure systolic pressure at the start and after 1 hour, while we diastolic pressure at the start and at 2 hours, would look like this:

idtimesystolic_pressurediastolic_pressuretreatment
10.012080drug
11.0118missingdrug
12.0missing75drug
20.013085placebo
21.0128missingplacebo
22.0missing80placebo
...............
432.0missing70placebo

As long as we have enough data points overall, the fitting process will still work.

Fitting a Temporal Mixture Model

Once you have your data prepared, you can fit a temporal mixture model using the fit_mixture function. You need to specify the number of components (clusters) you want to fit, as well as the component model to use (e.g., PolynomialRegression). For now, we are interested in the systolic pressure only, so we will extract that column from Y for fitting.

component_model = PolynomialRegression(2)
num_components = 2  # For example, we want to fit 2 clusters
model = fit_mixture(component_model, num_components, t, y[:, 1], ids)
MixtureResult(PolynomialRegression(2), 2, [[119.34891353034865, -7.2868884094666875, 0.5764883810632598], [122.7913507650051, 0.2949062907715054, -0.08341428634647943]], [[94.96960000162001], [59.68091478886761]], [0.5259997562454423, 0.4740002437545577], [0.9999979104182807 2.0895817192977623e-6; 0.9999998337442878 1.6625571223359278e-7; … ; 0.0004310743373957102 0.9995689256626042; 0.9393909635620004 0.06060903643799958], -797.0558338777277, true, 30, NormalError())

This will fit a temporal mixture model with 2 polynomial regression components of degree 2 to the systolic blood pressure data.

The Polynomial Regression Component

The PolynomialRegression component model fits a polynomial regression of a specified degree to the time series data. In this example, we used a polynomial of degree 2, which means that each component will fit a quadratic curve to the data. You can choose different degrees based on the complexity of the temporal patterns you expect in your data.

Customizing the fitting process

The fit_mixture function provides several optional keyword arguments to customize the fitting process:

TemporalMixtureModels.fit_mixtureFunction
fit_mixture(component, n_components, t, y, ids;
            n_repeats=5, error_model=NormalError(), inputs=nothing, 
            max_iter=100, tol=1e-6, verbose=true, initial_assignments=nothing)

Fit a mixture model using the Expectation-Maximization (EM) algorithm. By default, fit_mixture will run 5 EM restarts and return the best fitting model based on log-likelihood.

Arguments

  • component: A Component model (e.g., PolynomialRegression(2))
  • n_components: Number of mixture components/clusters
  • t: Time points (vector)
  • y: Measurements (vector for single measurement, matrix for multiple)
  • ids: Subject identifiers (vector)

Keyword Arguments

  • n_repeats: Number of EM restarts (default: 5)
  • error_model: Error distribution (currently only NormalError() is supported, included for possible future extensions)
  • inputs: Additional input variables to be passed to the component model (default: nothing)
  • max_iter: Maximum EM iterations (default: 100)
  • tol: Convergence tolerance (default: 1e-6)
  • verbose: Print progress (default: true)
  • initial_assignments: Optional vector of initial cluster assignments (1 to n_components) for each subject (default: nothing, uses random initialization)

Returns

  • MixtureResult containing fitted parameters and cluster assignments, with fields:
    • component: The component model used
    • n_clusters: Number of clusters
    • parameters: Fitted parameters for each cluster
    • params_error: Estimated error model parameters for each measurement and cluster
    • cluster_probs: Mixing proportions for each cluster
    • responsibilities: Posterior responsibilities for each subject and cluster
    • loglikelihood: Final log-likelihood of the fitted model
    • converged: Boolean indicating if the EM algorithm converged
    • n_iterations: Number of iterations performed
    • error_model: The error model used

Examples

# Single measurement with normal errors
result = fit_mixture(PolynomialRegression(2), 3, t, y, ids)

# With initial cluster assignments
initial_clusters = [1, 1, 2, 2, 3, 3]  # subjects 1-2 in cluster 1, 3-4 in cluster 2, etc.
result = fit_mixture(PolynomialRegression(2), 3, t, y, ids; initial_assignments=initial_clusters)
source

Making Predictions

After fitting the model, you can make predictions for new time points using the predict function. You can specify the time points at which you want predictions, and the function will return the predicted values for each component.

new_time_points = 0:0.1:2.0
predictions = predict(model, new_time_points)

An example of the resulting mixture model is shown below:

systolic_bp_mixture_model