Human (communicative) body movements are often recurrent: similar shapes or trajectories are used to consistently convey or act on something. How can we make sense of the variability and consistency with which movement events are produced? One way to do this is by quantifying the similarities between all the produced movement events. Once we have documented how similar each movement event is relative to all other events, we can visualize and quantify the structure of these interrelationships and discover novel patterns that hold over the entire set of behaviors. For example, in previous work we found that when communicative silent gestures have been practiced and communicated more, those silent gestures start to change their interrelationships in consistent ways (e.g., the kinematic space reduces in complexity) (see [3] Pouw, Dingemanse, Motamedi, & Ozyurek, 2021).
Note that the procedure outlined here is introduced in [1] Pouw & Dixon (2020), and we advise you to read this first before starting (or [3]). The main procedure is as follows: we compute a distance measure between two N-dimensional time series i and j, and do this for all pairs of the n events. Similar to previous work [1] we will use dynamic time warping (see [2] Giorgino, 2009) as the distance calculation. We then obtain what is called a distance matrix (M), which has n rows and n columns and contains in each cell M[i,j] the distance score between movement i and movement j. We can then use dimensionality reduction techniques (e.g., classic multidimensional scaling, t-SNE, UMAP, PCA) to visualize the interrelationships between all movement events in a 2-dimensional space. A minimal toy sketch of these two steps is given below.
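The following sketch is not part of the original pipeline; it only illustrates, with hypothetical toy data (the names ts_a, ts_b, toy_events are assumptions), how a normalized DTW distance between two multivariate time series can be computed with the dtw package and how such pairwise distances fill a distance matrix M.

```r
library(dtw)

# two toy 3-dimensional (x, y, z) time series of unequal length (hypothetical data)
set.seed(1)
ts_a <- matrix(rnorm(30 * 3), ncol = 3)  # event i: 30 samples
ts_b <- matrix(rnorm(45 * 3), ncol = 3)  # event j: 45 samples

# DTW distance between event i and j, normalized for the length of the warping path
dtw(ts_a, ts_b)$normalizedDistance

# fill an n x n distance matrix M, where cell M[i, j] holds the distance between events i and j
toy_events <- list(ts_a, ts_b, matrix(rnorm(20 * 3), ncol = 3))
n <- length(toy_events)
M <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    M[i, j] <- dtw(toy_events[[i]], toy_events[[j]])$normalizedDistance
  }
}
M  # symmetric, with a zero diagonal
```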
For the current movement events we use a set of silent gestures produced by an actor conveying certain concepts (open dataset by [4] Ortega & Ozyurek, 2020). We extracted the movement information using mediapipe 3D body tracking for each silent gesture, and this is the input to our procedure. Note that the current procedure can be applied to any signal (e.g., acoustic trajectories) and also to multiple persons producing movement events.
Overview of processing steps
This markdown file contains the following processing steps.
Create a distance matrix with dynamic time warping distances between events
Visualize the distance matrix in a 2D space
An application: Do ratings of iconicity of a particular gesture depend on the interrelationships of that gesture?
The folder ‘Time_series’ contains mediapipe body tracking time series for each video. From each time series we extract the nose, elbow, and index finger (x, y, z) positions (elbows and index fingers for both hands). Figure 1 shows an example of the kinematic time series for the silent gesture “TO-SWIM”, which is also shown below. The swimming gesture has a characteristic rhythmic movement, as reflected in the oscillating movements in the horizontal dimension for the left and right index finger.
Figure 1. Example raw time series of TO-SWIM for the right (in red) and left (in blue) hand
```r
library(ggplot2) # for plotting
library(plotly)  # for interactive plots via ggplotly()

MT <- read.csv(file.path(timeserfolder, "TO-SWIM_silentgesture.csv"))
a <- ggplot(MT, aes(x = time)) +
  geom_path(aes(y = X_RIGHT_INDEX), color = "red") +
  geom_path(aes(y = X_LEFT_INDEX), color = "blue") +
  theme_bw() +
  ylab("horizontal\nposition normalized")
ggplotly(a)
```
Main routine: constructing a distance matrix
The next step is to compute an average dynamic time warping distance between joint position traces. This means we perform a dependent DTW, in which the x, y, z time series of a keypoint are jointly submitted for DTW comparison between gesture event i and gesture event j; we repeat this for each keypoint separately and average the resulting distances (averaging by-keypoint DTW comparisons is called an independent DTW, so we are combining the dependent and independent DTW versions). A minimal sketch of a single pairwise comparison is shown below, before the full routine.
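This sketch is purely illustrative (hypothetical toy data; gesture_i, gesture_j, and the column subsets are assumptions) and shows the logic of one pairwise comparison: a dependent DTW per keypoint over its x, y, z columns, followed by averaging across keypoints.

```r
library(dtw)

# toy (x, y, z) traces of two keypoints for gesture event i and gesture event j (hypothetical data)
set.seed(2)
gesture_i <- matrix(rnorm(40 * 6), ncol = 6)
gesture_j <- matrix(rnorm(55 * 6), ncol = 6)
colnames(gesture_i) <- colnames(gesture_j) <- c("X_RIGHT_INDEX", "Y_RIGHT_INDEX", "Z_RIGHT_INDEX",
                                                "X_LEFT_INDEX",  "Y_LEFT_INDEX",  "Z_LEFT_INDEX")
right <- c("X_RIGHT_INDEX", "Y_RIGHT_INDEX", "Z_RIGHT_INDEX")
left  <- c("X_LEFT_INDEX",  "Y_LEFT_INDEX",  "Z_LEFT_INDEX")

# dependent DTW per keypoint: the x, y, z dimensions are warped together on one shared time axis
d_right <- dtw(gesture_i[, right], gesture_j[, right])$normalizedDistance
d_left  <- dtw(gesture_i[, left],  gesture_j[, left])$normalizedDistance

# independent part: average the by-keypoint distances into a single score for this gesture pair
mean(c(d_right, d_left))
```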
Script procedure 2. This loop makes a distance matrix
```r
library(dtw)        # dynamic time warping
library(parallel)   # detectCores, makeCluster
library(foreach)    # looping constructs
library(doParallel) # parallel backend for foreach
library(iterators)  # icount

timeseriess <- list.files(timeserfolder)

######################### FUNCTIONS

# identify which columns contain the relevant variables
keypointsindex_right <- which(colnames(MT) %in% c("X_RIGHT_INDEX", "Y_RIGHT_INDEX", "Z_RIGHT_INDEX"))
keypointsindex_left  <- which(colnames(MT) %in% c("X_LEFT_INDEX",  "Y_LEFT_INDEX",  "Z_LEFT_INDEX"))
keypointselbow_right <- which(colnames(MT) %in% c("X_RIGHT_ELBOW", "Y_RIGHT_ELBOW", "Z_RIGHT_ELBOW"))
keypointselbow_left  <- which(colnames(MT) %in% c("X_LEFT_ELBOW",  "Y_LEFT_ELBOW",  "Z_LEFT_ELBOW"))
keypointsnose        <- which(colnames(MT) %in% c("X_NOSE", "Y_NOSE", "Z_NOSE"))

# Averaged DTW
DTW.compare <- function(TS1, TS2)
{
  # perform the dependent (x, y, z) dynamic time warping for each keypoint separately (independently)
  # and average the five normalized distances collected with c()
  distancedtw <- mean(c(dtw(TS1[keypointsindex_right], TS2[keypointsindex_right])$normalizedDistance,
                        dtw(TS1[keypointsindex_left],  TS2[keypointsindex_left])$normalizedDistance,
                        dtw(TS1[keypointselbow_right], TS2[keypointselbow_right])$normalizedDistance,
                        dtw(TS1[keypointselbow_left],  TS2[keypointselbow_left])$normalizedDistance,
                        dtw(TS1[keypointsnose],        TS2[keypointsnose])$normalizedDistance))
  return(distancedtw)
}

######################### MAIN ROUTINE

# initialize some variables
# ncores: number of available cores for parallel processing -- ideally one less than the
# number of available physical cores (one core for other applications).
# MacOS, SunOS and Windows provide this number reliably via detectCores(logical = FALSE)
# Other systems: we assume 2 virtual cores per physical core to avoid resource hogging, crashes, etc.
# However, users familiar with their processors should adjust ncores as they see fit
ncores <- ifelse(Sys.info()["sysname"] %in% c("Darwin", "SunOS", "Windows"),
                 detectCores(logical = FALSE) - 1, detectCores()/2 - 1)

numges     <- length(timeseriess)                            # how many gestures (should be 109)
catchnames <- substr(timeseriess, 1, nchar(timeseriess) - 4) # get vector of gesture IDs

# initially, we neglect the first row (leaving numges-1 rows). this row is by definition
# a single zero followed by numges-1 NAs, and can be added more efficiently later
distmat <- matrix(nrow = numges - 1, ncol = numges)          # 108x109 matrix initially

# load all time series into data frames for quick access
for(tsi in 1:length(timeseriess))
{
  assign(catchnames[tsi], read.csv(file.path(timeserfolder, timeseriess[tsi])))
}

if (ncores > 1) # if at least two devoted cores are available,
{
  cl <- makeCluster(ncores) # set up a local cluster for parallel processing
  registerDoParallel(ncores)
  # first just calculate the lower triangular matrix (can be reflected across the diagonal later, since the matrix is symmetric)
  # create a parallel process for each row. On Windows, packages/data called by these procs must be named in the .packages/.export arguments
  distmat <- foreach(TS1index = icount(numges - 1), .combine = "rbind",
                     .packages = c("foreach", "iterators", "dtw"), .export = catchnames) %dopar%
    {
      # each parallel process produces a distance vector
      distvec <- foreach(TS2index = icount(TS1index), .combine = "cbind") %do%
        {
          # compute DTW. We start with "TS1index+1" because we are neglecting the first row (see above)
          TS1 <- get(catchnames[TS1index + 1])
          TS2 <- get(catchnames[TS2index])
          dtwoutput <- DTW.compare(TS1, TS2)
        }
      # ensure correct dimensions by filling any remaining empty cells in distvec with NAs
      distvec <- cbind(distvec, t(rep(NA, numges - length(distvec))))
    }
  stopCluster(cl)
} else { # serial version: used if only one single devoted or shared core is available
  # loop through all combinations and save the DTW score
  # first loop through rows
  # initially, we neglect the first row, so we start with "timeseriess[2]" (see above)
  for(ts1 in timeseriess[2:length(timeseriess)]) # loop through i
  {
    TS1index <- which(timeseriess == ts1) # index of this gesture
    TS1 <- get(catchnames[TS1index])      # get first time series
    # loop through columns. End index is "TS1index-1" because DTW.compare(TS1index, TS1index) is by definition zero
    # so we can neglect the final column in every loop (the diagonal) and add the zeros later
    for(ts2 in timeseriess[1:(TS1index - 1)])
    {
      TS2index <- which(timeseriess == ts2) # index of this gesture
      TS2 <- get(catchnames[TS2index])      # get second time series
      # compute DTW and save into the matrix
      dtwoutput <- DTW.compare(TS1, TS2)
      # "TS1index-1" because the first row is initially neglected
      distmat[TS1index - 1, TS2index] <- dtwoutput
    }
  }
}

# add the previously neglected first row to the matrix
distmat <- rbind(t(c(0, rep(NA, numges - 1))), distmat)
diag(distmat) <- 0                                  # set diagonal to zero: distance from self is by definition zero
distmat <- pmax(distmat, t(distmat), na.rm = TRUE)  # reflect values across the diagonal
colnames(distmat) <- catchnames                     # attach the names to the distance matrix

# save the distance matrix
write.csv(distmat, file.path(distancematfolder, "distance_matrix.csv"), row.names = FALSE)
rm(list = catchnames) # remove time series data frames from memory
```
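As a quick usage example (not part of the original script), one could read the saved matrix back and check that it has the expected shape and properties; the object name distmat_check is just illustrative.

```r
# quick sanity checks on the saved distance matrix
distmat_check <- as.matrix(read.csv(file.path(distancematfolder, "distance_matrix.csv")))
dim(distmat_check)                  # expect 109 x 109 (one row/column per gesture)
all(diag(distmat_check) == 0)       # self-distances should be zero
isSymmetric(unname(distmat_check))  # distances should be symmetric
```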
Visualizing interrelationships between movement/gesture events
Now that we have produced a distance matrix containing all the dynamic time warping distances between the silent gestures, we can visualize how these gestures relate to one another. We do this by applying a dimensionality reduction technique, classic multidimensional scaling, to project the high-dimensional distance matrix into a 2D space. We can now inspect some of the nodes. For example, if we look up the gestures Accordion and Butterfly, we see that the current procedure judges them to be very similar, as they are close together in the kinematic space. Indeed, as can be seen in the gestures shown below, they are highly similar in nature.
```r
distmat <- read.csv(file.path(distancematfolder, "distance_matrix.csv"))

# plot distances
plot.space <- function(top, title, cluster1, sizepoint)
{
  top <- cbind.data.frame(top, cluster1)
  colnames(top) <- c("Xpos", "Ypos", "grouping")
  pl <- ggplot(top, aes(x = Xpos, y = Ypos, color = grouping)) +
    geom_point(size = sizepoint, alpha = 0.5)
  pl <- pl + theme_bw() + theme(legend.position = "none") + # theme_bw() is applied first so it does not overwrite the legend setting
    ggtitle(title) + xlab("D1") + ylab("D2")
  return(pl)
}

# plot the distance matrix
topology <- cmdscale(as.matrix(log(distmat + 1))) # we do a log(x+1) transform because the distances are long-tailed distributed

# plot the classic MDS solution
ggplotly(plot.space(topology, "distances represented through classic mds", colnames(distmat), 10))
```
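Classic MDS is just one of the dimensionality reduction options mentioned above. As a hedged sketch (not part of the original pipeline), the same log-transformed distance matrix could instead be fed to t-SNE via the Rtsne package (assumed installed); the perplexity value shown is an assumption you may need to tune.

```r
library(Rtsne) # assumed installed; not used in the main pipeline

distmat_m <- as.matrix(read.csv(file.path(distancematfolder, "distance_matrix.csv")))

set.seed(42) # t-SNE is stochastic, so fix the seed for reproducibility
tsne_out <- Rtsne(log(distmat_m + 1), is_distance = TRUE, perplexity = 15)

# reuse the plotting helper defined above
ggplotly(plot.space(tsne_out$Y, "distances represented through t-SNE", colnames(distmat_m), 10))
```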
Animated Visualisations
But we can do even better! We can take all gesture .gif files in our Images folder and position them according to the scaled coordinates of the multidimensional scaling applied to the distance matrix. In this way we can directly inspect whether the kinematic distances make sense when looking at the gestures.
```r
library(stringr) # for string manipulation
library(scales)  # for rescaling

distmat  <- read.csv(file.path(distancematfolder, "distance_matrix.csv")) # read in the distance matrix
topology <- as.data.frame(cmdscale(as.matrix(log(distmat + 1))))          # multidimensional scaling of the logarithmically transformed distance matrix
topology[, 1:2] <- apply(topology[, 1:2], 2,
                         FUN = function(x) scales::rescale(x, from = range(x), c(0, 1300))) # rescale the topology to a 0-1300 pixel range

nams <- colnames(distmat)
topology$nams <- stringr::str_replace_all(nams, "[.]", "-") # make sure names are the same as the video (so no "." but "-")

todraw         <- ""
layer          <- 1
adjustright    <- 300
adjustvertical <- -100

# we are only showing 33% of the videos, otherwise it is very crowded
for(nam in topology$nams[1:(length(topology$nams)/3)])
{
  todraw <- paste0(todraw,
                   paste0('<img src="Images/', nam, '.gif" style="position: absolute; top: ',
                          round(topology[topology$nams == nam, 1] + adjustvertical), 'px; right:',
                          round(topology[topology$nams == nam, 2] + adjustright),
                          'px; width: 6%; height: auto; z-index:', layer, '; border: solid 1px #CCC"/>'))
  layer <- layer + 1
}

fileConn <- file("animated_kinematicspace.html") # save to a separate html file
writeLines(todraw, fileConn)
close(fileConn)
```
There are many things you can tweak in this general procedure. For example, you can expand the gesture set to comprise gestures of multiple individuals (see gesture network module 2). If you care about the general shape of a gesture but not its size, you might want to normalize (e.g., z-scale) the time series first. Also consider that we currently use only a few keypoints, which do not capture much of the hand shape, so you could add more keypoints. Further, what if a gesture consists of multiple elements, and you want to judge another gesture as similar when it has the same elements regardless of their order? Then you might not want to use dynamic time warping, as it will judge a time series with elements a-b-c to be completely different from a time series with elements c-b-a (a time series distance measure that does judge such time series to be similar is the matrix profile distance, or MPdist). Finally, there is a range of settings that can optimize dynamic time warping performance, such as not letting the distance calculation be driven by the beginning and trailing portions of the time series ([5] Silva et al., 2016; for further references see e.g., [1] Pouw & Dixon, 2020; [3] Pouw et al., 2021). A sketch of two of these tweaks is shown below.
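The following is a hedged sketch with toy data (not part of the main routine) of two such tweaks: z-scaling a trace before comparison, and an open-begin/open-end DTW that leaves the beginning and trailing portions of the reference unmatched (the dtw package supports this via the open.begin and open.end arguments, which require an asymmetric step pattern; whether these settings suit your data is something to evaluate yourself).

```r
library(dtw)

# toy 1-dimensional traces (hypothetical data): trace_j contains the same core shape as trace_i,
# but padded with extra begin and end portions
set.seed(3)
trace_i <- cumsum(rnorm(60))
trace_j <- c(rep(0, 10), trace_i + rnorm(60, sd = 0.1), rep(0, 10))

# z-scaling removes differences in overall size/amplitude while keeping the shape
z_i <- as.numeric(scale(trace_i))
z_j <- as.numeric(scale(trace_j))

# standard DTW: the begin and trailing portions count fully towards the distance
dtw(z_i, z_j)$normalizedDistance

# open-begin/open-end DTW: a prefix and suffix of the reference (second argument) may be left unmatched
dtw(z_i, z_j, open.begin = TRUE, open.end = TRUE, step.pattern = asymmetric)$normalizedDistance
```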
Applications
The open dataset by Ortega and Ozyurek (2020) also contains information about how each gesture was rated for its iconicity (how well the gesture represents its target referent). Since we now have information about how each gesture relates to other gestures, we can for example ask whether gestures that are more differentiable from other gestures are also likely to be seen as more iconic; after all, such a gesture is less likely to be confused with others. We can do all kinds of more complex operations to characterize how a particular gesture relates to other gestures (see for example [3] Pouw et al., 2021), but for now we keep it simple: what is the average kinematic distance of a gesture to all others (higher average distance = more differentiable?) and what was the mean iconicity rating for that gesture? Below we show this in a plot. It seems that gestures with generally higher kinematic distances to other gestures are also more likely to be rated as more iconic (i.e., more transparent in meaning).
```r
diag(distmat) <- NA # let's ignore the diagonal

# the distances are long-tailed distributed so we log-transform them with log(x+1)
distmatl <- log(distmat + 1)

# compute the average distance of a gesture to all others
avdist <- apply(distmatl, 2, FUN = function(x) mean(x, na.rm = TRUE))

# collect the names of the gestures (stripping the trailing part of the file name)
nam <- substr(colnames(distmatl), 1, nchar(colnames(distmatl)) - 14)
distsum <- cbind(nam, avdist) # combine in one matrix

# merge our summarized distance data with the iconicity ratings
mergedc <- merge(iconicityratings, distsum, by.x = "English", by.y = "nam")
mergedc$avdist <- as.numeric(mergedc$avdist) # make sure that avdist is a numeric variable

# plot the relation
ggplot(mergedc, aes(x = MEAN, y = avdist)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", size = 3) +
  theme_bw() +
  xlab("iconicity") + ylab("average distance")
```
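To put a number on the trend in the plot, one could, as a simple sketch, run a correlation test between the mean iconicity rating and the average kinematic distance (base R's cor.test; a fuller analysis might control for other gesture properties).

```r
# quantify the association shown in the plot above
cor.test(mergedc$MEAN, mergedc$avdist, method = "pearson")

# a rank-based alternative that is less sensitive to outliers and the long-tailed distances
cor.test(mergedc$MEAN, mergedc$avdist, method = "spearman")
```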