Selecting, smoothing, and deriving measures from motion tracking, and merging with acoustics and annotations

Wim Pouw & James Trujillo

2023-07-03


Info documents

Video Tutorial

Background

In multimodal analysis you will often encounter signals that are continuous but sampled at different rates, and such signals then also need to be related to qualitative codings such as ELAN gesture annotations, or to the trial information of your experiment. It is often convenient to have everything you need in one time series file, so that you can apply your multimodal analysis easily. A lot of the initial steps of even beginning to do quantitative multimodal recording and analysis are covered in Pouw, Trujillo, Dixon (2020); here we provide some basic data wrangling steps that are often required to set up your dataset (e.g., merging data, smoothing data, extracting kinematic variables).

Set up folders and check data formats

For this module we only demonstrate the steps for the cartoon retelling example from our multimedia samples. For this sample we have already generated A) a motion tracking time series for a set of body keypoints sampled at 30Hz, B) an amplitude envelope time series of the speech sampled at 100Hz, and C) annotations of the gestures produced with the dominant (right) hand. Here we show a way to merge A, B, and C that is convenient for further analysis. Let's first identify the relevant files and set the relevant folders.

#When running this in Rmarkdown yourself: 
#first make sure to set "Session" -> "Set Working Directory" -> "To Source File Location"

#get the current working directory
curfolder <- getwd()
#Load in the motion tracking data
MT <- read.csv(paste0(curfolder, "/MotionTracking/video_cartoon.csv"))
#Load in the amplitude envelope
ENV <- read.csv(paste0(curfolder, "/AmplitudeEnvelope/audio_cartoon_ENV.csv"))
#load in the relevant annotations                                            
ANNO <- read.csv(paste0(curfolder, "/MultimediaAnnotations/annotations_cartoon.csv"))
#This is the folder where your merged output is saved                                                 
outputfolder <- paste0(curfolder, "/output/")

The motion tracking data has 133 columns (a time column plus 132 body keypoint columns), sampled at 33.33 ms intervals (30Hz).

head(MT[,1:4]) #let's select only the first 4 columns
##        time       X_NOSE    Y_NOSE     Z_NOSE
## 1   0.00000 -0.063733011 0.5028050 -0.3956979
## 2  33.33333 -0.043210629 0.5022403 -0.3946956
## 3  66.66667 -0.020162869 0.5035036 -0.3863501
## 4 100.00000 -0.008440451 0.5331317 -0.3519033
## 5 133.33333  0.002766718 0.5590019 -0.2622746
## 6 166.66667  0.011595617 0.5731958 -0.2162768
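
We only printed the first four columns here. If you want to see which other keypoint columns are available (for example, to find the right index finger columns we select later on), you can inspect the column names; a quick check on the MT object loaded above:

#list all column names that belong to right-side keypoints
grep("RIGHT", names(MT), value = TRUE)
#total number of columns
ncol(MT)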

The amplitude envelope data has 2 columns, sampled at 10 ms intervals (100Hz).

head(ENV)
##   time_ms env
## 1      10   0
## 2      20   0
## 3      30   0
## 4      40   0
## 5      50   0
## 6      60   0

And we have the annotations of the right-hand gestures, with a begin time, an end time, and the annotation value. In total we have 36 annotations in three columns.

head(ANNO)
##   Begin.Time...msec End.Time...msec annot_gesture_right
## 1              7775            8255                BEAT
## 2             12885           13705                 REP
## 3             14265           14835                BEAT
## 4             14975           15655                 REP
## 5             16645           17355                 REP
## 6             21005           21685                 REP
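
If you want to double-check these sampling rates and counts yourself, you can look at the time steps and row counts in each object (a quick sanity check on the objects loaded above):

#median time step of the motion tracking data (should be about 33.33 ms, i.e., 30Hz)
median(diff(MT$time))
#median time step of the amplitude envelope (should be 10 ms, i.e., 100Hz)
median(diff(ENV$time_ms))
#number of gesture annotations (should be 36)
nrow(ANNO)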

Select MT and merge with acoustic envelope data

Let's not merge all of the data. From the motion tracking output we generated for the cartoon video we are now only interested in a specific body part; say we are interested in the right index finger traces only. Let's select them first.

selection <- c("time", "X_RIGHT_INDEX", "Y_RIGHT_INDEX" ,"Z_RIGHT_INDEX") #concatenate some variable names in a vector "selection"
MTs <- MT[,selection] #select all column positions of which the names are also in the selection variable and save to a new object called MTs

This selection of the motion tracking data we then want to align with the acoustic data. We use R's native 'merge' function for this, and we align the acoustic and motion tracking data based on their common information (namely time in milliseconds). We do want to make sure that we keep information from both objects, instead of only keeping rows where both have a value (we therefore set the argument 'all' to 'TRUE').

merged <- merge(x=MTs, y = ENV, by.x = "time", by.y = "time_ms", all=TRUE)
head(merged)
##       time X_RIGHT_INDEX Y_RIGHT_INDEX Z_RIGHT_INDEX env
## 1  0.00000     0.2304090   -0.01144398    -0.3587353  NA
## 2 10.00000            NA            NA            NA   0
## 3 20.00000            NA            NA            NA   0
## 4 30.00000            NA            NA            NA   0
## 5 33.33333     0.2277382   -0.03190942    -0.3322890  NA
## 6 40.00000            NA            NA            NA   0

We can see that, while we have ordered and aligned the two objects in a single merged object, we have a lot of missing (NA) rows. This is because at the exact times a sample is taken for the amplitude envelope there is no sample for the motion tracking. The solution is to linearly interpolate and upsample your data. We will do this by approximating, for each NA motion tracking sample, what its value would be given that it is at time x and we know the values at particular times before and after. The function na.approx from library(zoo) allows us to do this, by stating which vector you want to interpolate NA's for (e.g., X_RIGHT_INDEX), given some information about the time (x = time). We can leave NA's that we aren't able to interpolate; e.g., if your merged time series ends with NA's we can't interpolate there because we don't have values on both sides to use for interpolation (so we set na.rm=FALSE).

library(zoo)
merged$X_RIGHT_INDEX <- na.approx(merged$X_RIGHT_INDEX, x = merged$time, na.rm=FALSE)
merged$Y_RIGHT_INDEX <- na.approx(merged$Y_RIGHT_INDEX, x = merged$time, na.rm=FALSE)
merged$Z_RIGHT_INDEX <- na.approx(merged$Z_RIGHT_INDEX, x = merged$time, na.rm=FALSE)
#if you want to do this in one line of code, you can just do this:
#merged[,2:4] <- apply(merged[,2:4], 2, FUN = function(y) na.approx(y, x = merged$time, na.rm=FALSE))

head(merged)
##       time X_RIGHT_INDEX Y_RIGHT_INDEX Z_RIGHT_INDEX env
## 1  0.00000     0.2304090   -0.01144398    -0.3587353  NA
## 2 10.00000     0.2296077   -0.01758361    -0.3508014   0
## 3 20.00000     0.2288065   -0.02372324    -0.3428675   0
## 4 30.00000     0.2280053   -0.02986288    -0.3349336   0
## 5 33.33333     0.2277382   -0.03190942    -0.3322890  NA
## 6 40.00000     0.2214939   -0.02437886    -0.3254713   0

We are now almost done with merging the acoustics and the motion tracking. First we should note that there is an important reason why we chose to upsample the motion tracking data from 30Hz to 100Hz: we would be losing information if we instead downsampled the amplitude envelope from 100Hz to 30Hz. Now that we have upsampled the motion tracking data, we can go ahead and only keep rows for which we have information from the amplitude envelope; this yields a time series object steadily sampled at 100Hz, with original data points for the amplitude envelope and interpolated, upsampled values for the motion tracking (we discard the original motion tracking samples).

## Let's only keep rows at the original sampling rate of the amplitude envelope (so exclude rows where the envelope is NA)
merged <- merged[(!is.na(merged$env)),]
merged <- na.trim(merged) #also remove any leading/trailing NA's
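
If you want to verify that the merged object is now indeed uniformly sampled at 100Hz, you can tabulate the time steps (a quick sanity check; all steps should be approximately 10 ms):

#tabulate the rounded time steps of the merged time series
table(round(diff(merged$time)))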

Inspecting data, deriving some motion tracking measures, and applying smoothing

So we now have a 'merged' data file that contains fully time-aligned data about movement and acoustics. Our first multimodal time series object! Let's plot the amplitude envelope against the position traces we have of the index finger for an arbitrary few-second stretch.

library(ggplot2)
library(plotly)

a <- ggplot(merged, aes(x=time))+geom_path(aes(y=env))+xlim(13000,16000)+theme_bw()
b <- ggplot(merged, aes(x=time))+geom_path(aes(y=Z_RIGHT_INDEX), color = "red")+geom_path(aes(y=Y_RIGHT_INDEX), color = "gold")+geom_path(aes(y=X_RIGHT_INDEX), color = "blue")+xlim(13000,16000)+ylim(-0.5, 0.05)+theme_bw()
subplot(ggplotly(a), ggplotly(b), nrows=2) #we use ggplotly here as it creates some nice interactive graphs

Smoothing

One thing that you will run into when using motion tracking data, especially video-based motion tracking data, is noise-related jitter in your time series. Sometimes such noise may be minimal, e.g., when using very accurate device-based motion tracking. But in other cases you will see sudden jumps or kinks from timepoint to timepoint due to tracking inaccuracies (which can be caused by occlusions, non-ideal lighting, camera position changes, etc.).

It is therefore good to apply some smoothing to the position traces of your motion tracking data, as well as to any derivatives that are computed afterwards (e.g., 3D speed, vertical velocity). You can for example apply a low-pass filter, whereby you only allow slow fluctuations (gradual changes from point to point) so as to filter out (i.e., reduce the amplitude of) the jitter that occurs at very high frequencies (which shows up as sudden changes from point to point). Note that low-pass filters can introduce a time shift, so it is good to undo that shift by running the filter forwards and backwards (we do this by using signal::filtfilt). This matters if you care about temporal precision relative to some other time series (e.g., the acoustics).
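
If you want to see this time shift for yourself, you can compare a single forward pass of a Butterworth filter with the forwards-and-backwards (zero-phase) version on a simple test signal. This is just a minimal sketch; the test signal and filter settings are made up for illustration:

library(signal)
#a 2Hz sine wave sampled at 100Hz, with a bit of added noise
t <- seq(0, 2, by = 0.01)
x <- sin(2*pi*2*t) + rnorm(length(t), sd = 0.05)
bf <- butter(2, 10/(100/2), type = "low") #2nd-order low-pass filter, 10Hz cutoff
forward_only <- signal::filter(bf, x)     #single pass: smoothed, but shifted in time
zero_phase   <- signal::filtfilt(bf, x)   #forwards and backwards: smoothed, no shift
#plot the three signals on top of each other; the single pass lags behind the original
matplot(t, cbind(x, forward_only, zero_phase), type = "l", lty = 1,
        col = c("grey", "red", "blue"), xlab = "time (s)", ylab = "signal")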

We can also use a different kind of smoothing filter where the cut-off frequency is less strictly defined, such as a running weighted average or a Gaussian filter, such that sudden changes in the time series are smoothed out by relating each point in some weighted way to its neighboring data points. Below we use such a neighbor-averaging filter called a Kolmogorov-Zurbenko filter.

So we provide two filters, and we also show some differences in settings. The number of possible filters is immense. To pick one without becoming an expert on filters (but see [2], chapter 4, for a really nice resource to read up on this), you can try a few, assess them on your own data, and see how well each filter captures the variability you are interested in. For example, if you care about small-amplitude fluctuations in your time series that might have to do with jerky movement, a heavy filter can destroy this variability. Too weak a filter might leave the signal riddled with noise, which may look acceptable, but can cascade into dramatically noisy estimates when taking derivatives (e.g., speed -> acceleration -> jerk). There are also ways to empirically assess what the high-frequency noise in the data is, given that noise generally has a particular random structure (see [2]).
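
One simple way to inspect this high-frequency content empirically is to look at the power spectrum of a position trace; power at frequencies well above what human movement typically produces is likely dominated by tracking noise. A minimal sketch using base R's spectrum() on one of the index finger traces (assuming the merged object from above):

#estimate the power spectrum of the index finger position trace
#spec$freq is in cycles per sample; multiply by the sampling rate (100Hz) to express it in Hz
spec <- spectrum(na.omit(merged$Z_RIGHT_INDEX), plot = FALSE)
plot(spec$freq * 100, spec$spec, type = "l", log = "y",
     xlab = "frequency (Hz)", ylab = "power (log scale)")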

#Butterworth filter function
  #take some time series, apply the filter, then return it
library(signal)
butter.it <- function(x, samplingrate, order, lowpasscutoff)
{bf <- butter(order, lowpasscutoff/(samplingrate/2), type="low") #normalized frequency (cutoff divided by the Nyquist frequency)
as.numeric(signal::filtfilt(bf, x))} #apply forwards and backwards using filtfilt, and return the result

#Kolmogorov-Zurbenko filter
  #take some time series, apply the filter, then return it
library(kza)
kolmogorov.it <- function(timeseries, span, order)
{timeseries <- kza(x= timeseries, m=span, k = order)
return(timeseries$kza)}

#apply the Butterworth filter to our temporary dataset
tempmerge <- merged #just copy our merged data
tempmerge$Z_RIGHT_INDEXlowpass10 <- butter.it(tempmerge$Z_RIGHT_INDEX, samplingrate = 100, order = 1, lowpasscutoff = 10) #add a heavy filter
tempmerge$Z_RIGHT_INDEXlowpass30 <- butter.it(tempmerge$Z_RIGHT_INDEX, samplingrate = 100, order = 1, lowpasscutoff = 30) #add a milder filter

#plot the time series with the original, and the low pass filtered signals
p1 <- ggplot(tempmerge, aes(x=time))+geom_path(aes(y=Z_RIGHT_INDEX))+
                                    geom_path(aes(y=Z_RIGHT_INDEXlowpass10), color = "red", alpha=0.5)+
                                    geom_path(aes(y=Z_RIGHT_INDEXlowpass30), color = "purple", alpha=0.5)+
  xlim(14000,16000)+ggtitle("zero-lag butterworth, 10 Hz cutoff (red), 30 Hz cutoff (purple)")+theme_bw()+ylim(-0.45, -0.2)
ggplotly(p1)
#apply the Kolmogorov-Zurbenko filter
tempmerge <- merged 
tempmerge$Z_RIGHT_INDEXs42 <- kolmogorov.it(timeseries = tempmerge$Z_RIGHT_INDEX, span = 4, order = 2) #a milder filter (2 smoothing iterations)
tempmerge$Z_RIGHT_INDEXs46 <- kolmogorov.it(timeseries = tempmerge$Z_RIGHT_INDEX, span = 4, order = 6) #a heavier filter (6 smoothing iterations)
#repeat a similar procedure as above
p2 <- ggplot(tempmerge, aes(x=time))+geom_path(aes(y=Z_RIGHT_INDEX))+
                                    geom_path(aes(y=Z_RIGHT_INDEXs42), color = "red", alpha=0.5)+
                                    geom_path(aes(y=Z_RIGHT_INDEXs46), color = "purple", alpha=0.5)+
  xlim(14000,16000)+ggtitle("kolmogorov zurbenko, span = 4, order 2 (red), span = 4, order 6 (purple)")+theme_bw()+ylim(-0.45, -0.2)
ggplotly(p2)

In the figures above you can zoom in and see the differences between the different filter settings. Our Mediapipe tracking sample is already quite smooth, but in our experience video-based motion tracking can be quite jittery, and smoothing is crucial then. For now we will apply a 2nd-order zero-phase Butterworth filter with a 30Hz frequency cutoff to all our motion tracking data.

to_smooth <- c("X_RIGHT_INDEX", "Y_RIGHT_INDEX" ,"Z_RIGHT_INDEX") 
merged[,to_smooth] <- 
  apply(merged[,to_smooth], 2, 
        FUN = function(x) butter.it(x, samplingrate = 100, order = 2, lowpasscutoff = 30))

Computing speed and acceleration (and smoothing again)

For only one body joint we already have three variables describing its position. Sometimes we are only interested in a 1-dimensional signal, such as the rate of movement in a particular dimension (e.g., vertical velocity), the rate of movement in any direction (3D speed), or the change in 3D speed (acceleration). Below are some simple examples of how to compute these from the position data using forward differentiation.

#FUNCTION FOR GETTING A 3D SPEED VECTOR, expressed in units per second
speedXYZ.it <- function(x, y, z, time_millisecond)
{
  #calculate the Euclidean distance from time point t to time point t+1, for 3 dimensions
  speed <- c(0, sqrt( rowSums( cbind(diff(x)^2, diff(y)^2, diff(z)^2)) ))
  #smooth the result
  speed <- butter.it(speed, samplingrate = 100, order = 1, lowpasscutoff = 30)
  #scale the speed vector so that we express it in units change per second
  speed/(c(1, diff(time_millisecond))/1000)
}

#derive the change of x and smooth
derive.it <- function(x){
  butter.it(c(0, diff(x)), samplingrate = 100, order = 1, lowpasscutoff = 30)}

#calculate speed
merged$speed <- speedXYZ.it(merged$X_RIGHT_INDEX, merged$Y_RIGHT_INDEX, merged$Z_RIGHT_INDEX, merged$time)
merged$vertical_velocity <- derive.it(merged$Y_RIGHT_INDEX) #here we take the Y trace as the vertical dimension
#calculate acceleration
merged$acceleration <- derive.it(merged$speed)
merged$jerk <- derive.it(merged$acceleration)

#plot the kinematic variables (scaled, so we can compare them in one figure)
sc.it <- function(x) {scale(x, center = FALSE)}  #let's scale all the variables, without centering
p <- ggplot(merged, aes(x=time))+geom_path(aes(y=sc.it(speed))) +
                                    geom_path(aes(y=sc.it(vertical_velocity)), color = "blue", alpha=0.5)+
                                    geom_path(aes(y=sc.it(acceleration)), color = "red", alpha=0.5)+
                                    geom_path(aes(y=sc.it(jerk)), color = "purple", alpha=0.5)+geom_hline(yintercept=0, color = "gold")+
  xlim(14000,16000)+ggtitle("kinematic variables")+theme_bw()+ylim(-3.5, 3.5)
ggplotly(p)

Note. Notice in the figure that vertical velocity can go positive (upward rate of motion) or negative (downward rate of motion); it is a vector quantity, as the values have a direction. The same holds for acceleration (positive = speeding up, negative = slowing down) and jerk (positive = increasing acceleration, negative = decreasing acceleration). Speed is a scalar value and never goes negative.
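
A tiny toy example makes the difference concrete: the velocity components can be negative, but speed is the (Euclidean) length of the velocity vector and is therefore never negative. The numbers below are made up purely for illustration:

#velocity components for a single made-up time step (units per second)
vx <- -0.2; vy <- 0.5; vz <- -0.1
#speed is the length of the velocity vector, so it is always >= 0
sqrt(vx^2 + vy^2 + vz^2)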

Adding annotations and saving data

We have now merged the acoustic data with an upsampled selection of the motion tracking data, and we have derived some kinematic variables and added them to our time series dataset. The only thing left is to merge the annotations into this dataset. The following code loads the annotations (with begin time, end time, and annotation) into the time series data. We also add an identifier to the gesture coding, such that each gesture event gets a unique identifier.

#make a new gesture ID for each annotation using the original begin and end times
ANNOID <- cbind(ANNO[,1:2], paste0("GID_", c(1:nrow(ANNO)))) 

##this function loads an annotation dataframe with columns (begintime, endtime, annotation) into a time series
load.in.event <- function(time_original, anno)
{
  output <- rep(NA, length(time_original))
  for(i in 1:length(anno[,1])) #loop through each annotation event
  {
    #print(anno[i,3]) #you can print the value to check if this is running correctly
    output <- ifelse((time_original >= anno[i,1] & time_original <= anno[i,2]), as.character(anno[i,3]), output)
  }
  return(output)
}

#apply the function to load the annotations into the time series object
merged$gesture_type <- merged$gesture_ID <- NA #initialize two new variables in our dataframe
merged$gesture_type <- load.in.event(time_original = merged$time, anno = ANNO) #apply function
merged$gesture_ID   <- load.in.event(merged$time, anno = ANNOID) #apply function

#lets save the data now we have everything merged
write.csv(merged, paste0(outputfolder, "mergedMM1.csv"))

Some applications

This new merged dataset is very flexible: we can, for example, loop through relevant events and extract information from them. In the lines of code below, we ask for each gesture what the peak speed was, as well as the maximum positive change in the amplitude envelope, and then plot the association between these for each gesture type, to see how they scale.

Peak analysis

maxs <- maxenv <- g_type <- vector() #initialize some variables that we want to fill
for(id in unique(merged$gesture_ID)) #loop through gesture ids in our data
{
  if(!is.na(id)) #ignore NA events
  {
  indices <- which(merged$gesture_ID==id) #check which block of data has this gesture id
  maxs <- c(maxs, max(merged$speed[indices])) #fill maxs with the maximum speed for these indices
  maxenv <- c(maxenv, max(diff(merged$env[indices]))) #fill maxenv with the maximum positive change in the amplitude envelope for these indices
  g_type <- c(g_type,unique(merged$gesture_type[indices]))#also retrieve the gesture type for this ID
  }
}
mag_D <- cbind.data.frame(maxs,maxenv, g_type) #merge vectors as columns into a dataframe
a <- ggplot(mag_D, aes(x=maxs, y = maxenv, color = g_type))+ geom_point()+geom_smooth(method = "lm")+theme_bw()+facet_wrap(.~g_type, scales ="free")+
  ylab("peak amplitude envelope")+xlab("peak speed")#plot
ggplotly(a)
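
If you also want a number to go with these scatterplots, you could for example compute the correlation between peak speed and peak envelope change per gesture type (a quick sketch using the mag_D dataframe built above):

#correlation between peak speed and peak positive envelope change, split by gesture type
by(mag_D, mag_D$g_type, FUN = function(d) cor(d$maxs, d$maxenv))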