Processing output for 3D single person full-body tracking with FrankMocap

Wim Pouw ( wim.pouw@donders.ru.nl )
James Trujillo ( james.trujillo@donders.ru.nl )
18-11-2021

Info documents

This module demonstrates how to extract a time series from FrankMocap output. FrankMocap is a heavy-duty, GPU-based, single-person 3D motion tracking procedure that takes 2D video as input and estimates 3D joint positions. To run it (smoothly) you therefore need a (strong) GPU, i.e., a good video card; see the repository for specifications.

FrankMocap is special in that it is dedicated to 3D motion tracking from 2D video. Whereas the lightweight MediaPipe body and hand tracking also provides some depth information, FrankMocap is specifically optimized to estimate, as well as possible, the 3D joint positions (and rotations) of the hands and body from 2D video. See the main FrankMocap paper in the resources for performance indicators. Note that FrankMocap only allows for tracking one person, and it will actually give an error if there is more than one person in the frame.

The installation of FrankMocap can be quite cumbersome, and the procedure depends on your operating system (for a description, see the FrankMocap repository linked below). We installed FrankMocap on a Windows machine following the exact instructions of this helpful tutorial: https://www.youtube.com/watch?v=MYLMM7jOMS4. If your institute has such resources, you can ask your IT department to assist with the installation.

But even when the installation is complete, the output you can then generate is not yet a time series and a tracking video that we can work with. FrankMocap generates only single video frames with the pose projected onto them, plus one .pkl file per frame that contains the tracking information. This Python script therefore assists with this last step: transforming the FrankMocap output into workable time series data and creating a video from the frames. We will extract the joint positions, but note that the script can be adapted if you want the 3D joint rotation angles instead.

Code executed in the conda command prompt

We ran the full-body tracking (hands and body) on the sample video to produce the motion tracking output with FrankMocap.

"python -m demo.demo_frankmocap --input_path ./sampledata/TanDun_youtube_conductorexample.mp4 --out_dir ./mocap_output --save_pred_pkl"



Resources

  • FrankMocap Github Repository: https://github.com/facebookresearch/frankmocap

  • Rong, Y., Shiratori, T., & Joo, H. (2020). FrankMocap: Fast monocular 3D hand and body motion capture by regression and integration. arXiv preprint arXiv:2008.08324.
In [1]:
from IPython.display import HTML

HTML('<iframe width="935" height="584" src="https://www.youtube.com/embed/mw8RymohMp0?start=8668" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')
Out[1]:
In [4]:
import pickle #for opening pkl files
import cv2 #for video processing functions
import os
import numpy as np
import pandas as pd
import csv

#folder that contains the per-frame .pkl files with the predicted points
framefol = "../frankmocap_output/mocap/"
framedata = sorted(os.listdir(framefol)) #sort so that frames are in temporal order

#folder that contains the rendered frame images
renderfol = "../frankmocap_output/rendered/"
renderdata = sorted(os.listdir(renderfol)) #sort so that frames are in temporal order

#folder for saving the rendered video to
videofold = '../MTVideos/'

#folder for saving the timeseries to
outtsfol = '../OutputTimeseries/'
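
The rendering and saving steps below assume that the MTVideos and OutputTimeseries folders already exist; a minimal (optional) addition to create them if needed:

#make sure the output folders exist before writing to them
os.makedirs(videofold, exist_ok=True)
os.makedirs(outtsfol, exist_ok=True)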

Let's first render the frames that were produced by FrankMocap into a motion tracking video and save it to the MTVideos folder.

In [5]:
#check the frame properties
images = renderdata
checkpixels = cv2.imread(renderfol + images[0])
height, width = checkpixels.shape[:2]
print(width, height)

# choose the codec according to the format needed
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
#note that the frame rate (29.97 here) should match that of the original video
video = cv2.VideoWriter(videofold + 'MTvideo.mp4', fourcc, 29.97, (width, height))

#loop through the images and write them to the video
for i in images:
    img = cv2.imread(renderfol + i)
    video.write(img)

#release the video writer
print("done")
cv2.destroyAllWindows()
video.release()
1280 480
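
Note that the frame rate passed to the video writer (29.97 above) should match the frame rate of the original video, or the rendered video will play too fast or too slow. As a sketch, assuming the sample video is reachable at the path below, you can read the frame rate directly from the source instead of hard-coding it:

cap = cv2.VideoCapture('../sampledata/TanDun_youtube_conductorexample.mp4') #adjust to your video path
fps = cap.get(cv2.CAP_PROP_FPS) #frame rate of the source video
cap.release()
print(fps)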

We will now extract all the tracked body points from the .pkl files. We will only do this for the upper body, as the rest was not visible in our sample video. You can of course adapt the script to extract all the data.

In [6]:
#each left or right hand entry contains:
#pred_joints_smpl
#faces
#bbox_scale_ratio
#bbox_top_left
#pred_camera
#img_cropped
#pred_hand_pose -> this gives the joint angles
#pred_hand_betas
#pred_joints_img*** -> this gives the joint positions (rescaled for the camera)
#***we will extract these rescaled joint positions

#https://github.com/facebookresearch/frankmocap/issues/31
#in body+hand mode the body points are the same as in OpenPose
markerspositionhand = ['Wrist', 'Thumb_00', 'Thumb_01', 'Thumb_02', 'Thumb_03', 'Index_00', 'Index_01',
                'Index_02', 'Index_03', 'Middle_00', 'Middle_01', 'Middle_02', 'Middle_03', 'Ring_00',
                'Ring_01', 'Ring_02', 'Ring_03', 'Little_00', 'Little_01', 'Little_02', 'Little_03']

#let's make a list of the column names with x,y,z coordinates for each marker
markerhands = []
for i in ['L_', 'R_']:
    for j in markerspositionhand:
        for k in ['x_', 'y_', 'z_']:
            markerhands.append(str.lower(k + i + j))

#for the upper body we only want these points; their order follows the OpenPose convention
markerspositionbody = ['Nose', 'Neck', 'RShoulder', 'RElbow', 'RWrist', 'LShoulder',
               'LElbow', 'LWrist', 'MidHip', 'RHip', 'RKnee', 'RAnkle', 'LHip']
              #remaining OpenPose points, not extracted here:
              #'LKnee', 'LAnkle', 'REye', 'LEye', 'REar', 'LEar', 'LBigToe', 'LSmallToe',
              #'LHeel', 'RBigToe', 'RSmallToe', 'RHeel', 'Background'

markerbody = []
for j in markerspositionbody:
    for k in ['x_', 'y_', 'z_']:
        markerbody.append(str.lower(k + j))
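
As a quick sanity check (an optional addition), the hand list should contain 2 hands × 21 markers × 3 coordinates = 126 column names, and the body list 13 markers × 3 coordinates = 39:

print(len(markerhands)) #2 hands x 21 markers x 3 coordinates = 126
print(len(markerbody)) #13 markers x 3 coordinates = 39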
In [10]:
columns = markerhands+markerbody #these are your column names

#make a time series by concatenating row information
timeseries = []
timeseries.append(columns) #the first row is the list of column names
for frame in framedata:
    with open(framefol+frame, 'rb') as f:
        data = pickle.load(f)    
    timsdat = data['pred_output_list']
    timsdat = pd.DataFrame(timsdat)
    #extract body joint positions
    leftjoints = np.ndarray.tolist(timsdat['pred_lhand_joints_img'][0])
    rightjoints = np.ndarray.tolist(timsdat['pred_rhand_joints_img'][0])
    bodyjoints = np.ndarray.tolist(timsdat['pred_body_joints_img'][0])
    #get everything into flat lists
    lj = []
    rj = []
    bj = []
    for i in leftjoints:
        lj.extend(i)
    for i in rightjoints:
        rj.extend(i)
    for it in range(len(markerspositionbody)): #only take the first 13 (upper-body) points
        bj.extend(bodyjoints[it])
    #append this frame's values to the time series
    timeseries.append(lj + rj + bj)

print('done! Now saving to file')
#data to be written row-wise to a csv file
data = timeseries

#open the csv file in write mode and write all rows
with open(outtsfol + 'timeseries.csv', 'w+', newline='') as file:
    write = csv.writer(file)
    write.writerows(data)
print('done! Timeseries saved')
done! Now saving to file
done! Timeseries saved
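
To verify the result you can load the saved time series back in and plot a marker trajectory. This is a minimal sketch, assuming matplotlib is installed; the column names follow from the marker lists constructed above (e.g., the right-hand wrist becomes x_r_wrist, y_r_wrist, z_r_wrist):

import pandas as pd
import matplotlib.pyplot as plt

#load the saved time series; the first row holds the column names
ts = pd.read_csv(outtsfol + 'timeseries.csv')

#plot the right-hand wrist position (in image coordinates) over frames
plt.plot(ts['x_r_wrist'], label='x_r_wrist')
plt.plot(ts['y_r_wrist'], label='y_r_wrist')
plt.xlabel('frame')
plt.ylabel('position (image coordinates)')
plt.legend()
plt.show()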