Getting Started#
We’re going to demonstrate how to use hbMEP to estimate recruitment curves from MEP size data using a hierarchical Bayesian model. The model is based on a rectified-logistic function (Tyagi et al., 2024, see Methods 4.1.1, Eq. 4.1.4), that was introduced for estimation of threshold.
A Simple Example#
Note
This tutorial assumes that you are working on a Linux or MacOS system. If you are using Windows, you can use the Windows Subsystem for Linux (WSL), or update the bash commands and paths accordingly.
Begin by downloading the mock_data.csv
file using the following command.
wget https://raw.githubusercontent.com/hbmep/hbmep/docs-data/data/mock_data.csv
This file contains transcranial magnetic stimulation (TMS) MEP size data for 3 participants, with responses from two muscles (see columns PKPK_ECR and PKPK_FCR in the file). The responses are recorded in peak-to-peak values in millivolts (mV), and the intensity is recorded in maximum stimulator output (0-100% MSO).
Configuration file#
hbMEP uses a configuration file that contains all the necessary information to run the model.
Begin by creating a TOML file, config.toml
, and enter the following content.
[paths]
csv_path = "./mock_data.csv"
build_directory = "./mock_data_output/"
[variables]
intensity = "TMSIntensity"
features = ["participant"]
response = ['PKPK_ECR', 'PKPK_FCR']
[mcmc]
num_chains = 4
num_warmup = 1000
num_samples = 1000
thinning = 1
[misc]
base = 10
Here,
paths
table contains the paths to the dataset and the build directory.csv
is the path to the dataset in csv format. In the above config file, it points to the mock_data.csv file.build_dir
is the directory where hbMEP will store model artefacts, such as recruitment curve plots and parameter estimates. If this directory does not already exists, hbMEP will create it. If it already exists, hbMEP will do nothing to the content present in it.
variables
table contains the names of the columns in the dataset that are used by the model.intensity
is the column name that contains the TMS intensity.features
is a list of columns that uniquely identify the set of data points for a single recruitment curve. In this case, the model will yield recruitment curve for each participant.response
is a list of columns that contain the MEP size values. In this case, the columns PKPK_ECR and PKPK_FCR are included in this list and the model will be run on these two muscles.
mcmc
table configures the No-U-Turn sampler (NUTS) sampler.num_chains
is the number of chains to run in parallel.num_warmup
is the number of warmup samples.num_samples
is the number of samples to draw from the posterior.thinning
is the thinning factor for the samples.
Running the model#
Tip
If you have trouble running the commands in this tutorial, please copy the command and its output, then open an issue on the hbMEP repository on GitHub. We’ll do our best to help you!
import os
import pickle
import pandas as pd
from hbmep.config import Config
from hbmep.model.tms import RectifiedLogistic
from hbmep.model.utils import Site as site
Begin by building the model and reading the mock_data.csv
file.
# Build model
config = Config(toml_path="./config.toml")
model = RectifiedLogistic(config=config)
# Read data
df = pd.read_csv(model.csv_path)
print(f"Dataframe shape:\t{df.shape}")
print(f"Dataframe columns:\t{', '.join(df.columns.tolist())}")
print(f"All participants:\t{', '.join(df['participant'].unique().tolist())}")
n = 5
print(f"\nDataframe first {n} rows:\n\n{df.head(n=n).to_string()}")
Dataframe shape: (245, 4)
Dataframe columns: TMSIntensity, participant, PKPK_ECR, PKPK_FCR
All participants: P1, P2, P3
Dataframe first 5 rows:
TMSIntensity participant PKPK_ECR PKPK_FCR
0 43.79 P1 0.197 0.048
1 55.00 P1 0.224 0.068
2 41.00 P1 0.112 0.110
3 43.00 P1 0.149 0.058
4 14.00 P1 0.014 0.011
Now, we will process this dataset with it so that it’s usable by hbMEP.
# Process data
df, encoder_dict = model.load(df=df)
Before running inference, we can visualize the dataset.
# Plot data
model.plot(df=df, encoder_dict=encoder_dict)
The above command created a pdf dataset.pdf
and saved it to the build directory. As it can be seen, there are three participants with their MEPs recorded from two muscles.
Now, lets run inference on this data.
# Run inference
mcmc, posterior_samples = model.run_inference(df=df)
We can print the convergence diagnostics with the following command.
model.print_summary(samples=posterior_samples)
mean std median 2.5% 97.5% n_eff r_hat
H[0,0] 0.24 0.01 0.24 0.21 0.27 2302.41 1.00
H[0,1] 0.16 0.01 0.16 0.14 0.18 3441.27 1.00
H[1,0] 0.19 0.04 0.18 0.14 0.26 1853.05 1.00
H[1,1] 0.16 0.07 0.14 0.08 0.29 1617.89 1.00
H[2,0] 0.23 0.05 0.22 0.15 0.34 3450.24 1.00
H[2,1] 0.26 0.13 0.22 0.10 0.52 2259.82 1.00
H_scale 0.28 0.13 0.25 0.12 0.54 1015.16 1.00
L[0,0] 0.01 0.00 0.01 0.01 0.02 3529.07 1.00
L[0,1] 0.01 0.00 0.01 0.01 0.01 3712.93 1.00
L[1,0] 0.01 0.00 0.01 0.01 0.01 4051.42 1.00
L[1,1] 0.01 0.00 0.01 0.01 0.01 3993.32 1.00
L[2,0] 0.01 0.00 0.01 0.00 0.01 4709.97 1.00
L[2,1] 0.01 0.00 0.01 0.01 0.01 4185.50 1.00
L_scale 0.01 0.01 0.01 0.01 0.02 690.70 1.00
a[0,0] 31.91 0.46 31.97 30.77 32.64 702.71 1.00
a[0,1] 31.59 0.53 31.67 30.60 32.59 2652.48 1.00
a[1,0] 45.59 0.81 45.79 44.22 46.47 340.41 1.00
a[1,1] 44.84 0.80 44.96 43.19 46.19 2482.68 1.00
a[2,0] 30.08 0.38 30.15 29.32 30.69 2455.03 1.00
a[2,1] 31.17 0.67 31.11 30.01 32.52 3348.39 1.00
a_loc 35.30 5.77 35.77 23.62 46.19 1216.14 1.00
a_scale 11.09 6.88 9.10 3.93 24.66 1071.15 1.00
b[0,0] 0.17 0.05 0.17 0.09 0.28 2313.03 1.00
b[0,1] 0.22 0.11 0.19 0.07 0.42 2782.91 1.00
b[1,0] 0.09 0.05 0.08 0.02 0.22 405.55 1.00
b[1,1] 0.07 0.04 0.07 0.01 0.15 2347.51 1.00
b[2,0] 0.09 0.04 0.08 0.03 0.15 2109.42 1.00
b[2,1] 0.05 0.03 0.04 0.01 0.10 2771.50 1.00
b_scale 0.18 0.09 0.15 0.05 0.34 1295.98 1.00
c_1_scale 3.16 2.79 2.35 0.08 8.92 2959.09 1.00
c_2_scale 0.29 0.14 0.26 0.12 0.53 1018.04 1.00
c₁[0,0] 2.66 3.40 1.47 0.05 9.25 3508.38 1.00
c₁[0,1] 2.70 3.38 1.48 0.04 9.41 3436.13 1.00
c₁[1,0] 2.37 3.37 1.14 0.01 9.00 3046.44 1.00
c₁[1,1] 2.50 3.36 1.27 0.03 9.13 3193.02 1.00
c₁[2,0] 1.36 2.82 0.10 0.01 7.32 1883.38 1.00
c₁[2,1] 2.38 3.15 1.17 0.03 8.78 3381.21 1.00
c₂[0,0] 0.07 0.01 0.07 0.05 0.10 4110.79 1.00
c₂[0,1] 0.13 0.02 0.12 0.09 0.17 4951.67 1.00
c₂[1,0] 0.08 0.02 0.08 0.06 0.12 3143.90 1.00
c₂[1,1] 0.23 0.04 0.22 0.16 0.30 4193.70 1.00
c₂[2,0] 0.32 0.12 0.29 0.16 0.57 1906.25 1.00
c₂[2,1] 0.34 0.06 0.33 0.23 0.45 3524.25 1.00
ell_scale 8.72 6.14 7.47 0.15 20.76 2873.60 1.00
ℓ[0,0] 7.74 8.14 4.85 0.08 24.39 3434.41 1.00
ℓ[0,1] 7.65 8.33 4.79 0.05 24.38 3215.37 1.00
ℓ[1,0] 6.41 8.07 3.49 0.00 23.11 2196.32 1.00
ℓ[1,1] 6.74 8.27 3.95 0.01 22.86 3548.49 1.00
ℓ[2,0] 6.80 8.06 4.02 0.01 23.35 3589.39 1.00
ℓ[2,1] 6.71 7.86 3.96 0.02 22.54 3877.42 1.00
The different site names present in the first column of the above output correspond to model parameters (Tyagi et al., 2024, see Methods 4.1.1, Supplementary Fig. S3g,h)
Plotting the curves#
Now, we will plot the estimated recruitment curves. Before that, we need to use the estimated parameters and predict on a “template” prediction dataframe.
This prediction dataframe is similar to the input dataframe on which inference was run, except it has missing MEP size response values. Additionally, it has many more intensity values (controlled by the argument num_points=100
). These are the intensity values on which the model will predict the response values.
# Create prediction dataframe
prediction_df = model.make_prediction_dataset(df=df, num_points=100)
# Use the model to predict on the prediction dataframe
posterior_predictive = model.predict(
df=prediction_df, posterior_samples=posterior_samples
)
This returns the posterior predictive of the model. We can use this to plot the recruitment curves.
# Plot recruitment curves
model.render_recruitment_curves(
df=df,
encoder_dict=encoder_dict,
posterior_samples=posterior_samples,
prediction_df=prediction_df,
posterior_predictive=posterior_predictive
)
The above command created a pdf recruitment_curves.pdf
and saved it to the build directory.
As it can be seen in the pdf, the participants are aligned top to bottom, and the muscles are aligned left to right. The plots are colored by muscle name. For a given participant and muscle combination, the first plot shows the MEP size and stimulation intensity data, the second plot shows the estimated recruitment curve overlaid on top of the data, and the third plot shows posterior distribution of the threshold parameter.
The estimated curves look good, except for participant P1 and muscle FCR, which is indexed by the tuple (0, 1). The curve seems to be biased towards a few outliers. Later on, we will see how to tackle this using a mixture model (Tyagi et al., 2024, see Fig. 4, Methods 4.1.1).
Additionally, we can also plot the posterior predictive of the model using the following command.
# Posterior predictive check
model.render_predictive_check(
df=df,
encoder_dict=encoder_dict,
prediction_df=prediction_df,
posterior_predictive=posterior_predictive
)
The above command created a pdf posterior_predictive_check.pdf
and saved it to the build directory.
Accessing parameters#
The estimated parameters are stored in the posterior_samples
dictionary. The keys are the parameter names and the values are the samples from the posterior distribution. The samples are stored as a NumPy array. The samples can be accessed as follows:
# Threshold parameter
a = posterior_samples[site.a]
print(f"Shape of a:\t{a.shape}")
print(f"\nFirst dimension corresponds to the number of samples:\t\t{a.shape[0]}")
print(f"Second dimension corresponds to the number of participants:\t{a.shape[1]}")
print(f"Third dimension corresponds to the number of muscles:\t\t{a.shape[2]}")
Shape of a: (4000, 3, 2)
First dimension corresponds to the number of samples: 4000
Second dimension corresponds to the number of participants: 3
Third dimension corresponds to the number of muscles: 2
Note that there are 4000 samples because we ran 4 chains with 1000 samples each.
Additionally, we can reshape the array so that the first dimension corresponds to the chains and the second dimension corresponds to the samples. Although this is not necessary, it can be useful for some analyses.
# Reshape samples to group them by chain
a_grouped_by_chain = a.reshape(
model.mcmc_params["num_chains"],
-1,
*a.shape[1:]
)
print(f"Shape of a_grouped_by_chain:\t{a_grouped_by_chain.shape}")
Shape of a_grouped_by_chain: (4, 1000, 3, 2)
The other curve parameters can be accessed in a similar way.
Saving the model#
# Save the model
destination_path = os.path.join(model.build_dir, "inference.pkl")
with open(destination_path, "wb") as f:
pickle.dump((model, mcmc, posterior_samples,), f)
print(f"Model saved to:\t{destination_path}")
Model saved to: ./mock_data_output/inference.pkl
The saved model can be loaded and used for further analysis:
# Load the model
source_path = destination_path
with open(source_path, "rb") as f:
model, mcmc, posterior_samples = pickle.load(f)
Using other functions#
Alternatively, we can use other functions to estimate the recruitment curves. The available choices in hbMEP (Tyagi et al., 2024, see Results 2.3 and Fig. 3 for a comparison of their predictive performance to the rectified-logistic function, Methods 4.1, Eq. 4.1.1-4.1.3) are:
Logistic-4, also known as the Boltzmann sigmoid.
Logistic-5, which is a more generalized version of Logistic4.
Rectified-linear.
We recommend using Logistic-5 if estimating threshold is not the goal.
For example, to use Logistic-5 function, we need to import it and modify the code appropriately.
# Import Logistic5 model
from hbmep.model.tms import Logistic5
# Build model
config = Config(toml_path="./config.toml")
model = Logistic5(config=config)
References#
Vishweshwar Tyagi, Lynda M. Murray, Ahmet S. Asan, Christopher Mandigo, Michael S. Virk, Noam Y. Harel, Jason B. Carmel, and James R. McIntosh. Hierarchical Bayesian estimation of motor-evoked potential recruitment curves yields accurate and robust estimates. arXiv preprint arXiv:2407.08709, 2024. doi:http://doi.org/10.48550/arXiv.2407.08709.