On December 9th, 2013, the 15th ACM International Conference on Multimodal Interaction, held in Sydney, announced the results of the ChaLearn Multi-modal Gesture Recognition Challenge. The team from the National Laboratory of Pattern Recognition, led by Professor Hanqing Lu and Associate Professor Jian Cheng, won the championship.
For the user-independent gesture spotting and recognition problem in continuous data streams, the team proposed a high-speed gesture recognition approach that fuses information from both the audio and skeleton data provided by the Kinect sensor. In the final evaluation stage of the competition, the proposed method achieved a recognition error rate of only 12.756%, a 17% reduction compared with that of the second-place team.
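The article does not describe the fusion scheme in detail. As a generic illustration only, combining the per-class scores of independent audio and skeleton classifiers via weighted score-level (late) fusion can be sketched as follows; the function name, inputs, and weight are assumptions, not the winning team's actual method:

```python
import numpy as np

def fuse_modalities(audio_scores, skeleton_scores, w_audio=0.5):
    """Weighted late fusion of per-class scores from two modalities.

    audio_scores, skeleton_scores: sequences of per-class confidence
    scores (same length, same class order) produced by separate
    classifiers. w_audio: illustrative mixing weight, not taken from
    the article. Returns the index of the highest fused score.
    """
    audio = np.asarray(audio_scores, dtype=float)
    skeleton = np.asarray(skeleton_scores, dtype=float)
    # Convex combination of the two modalities' scores.
    fused = w_audio * audio + (1.0 - w_audio) * skeleton
    return int(np.argmax(fused))

# Example: audio favors class 1, skeleton favors class 2;
# with equal weights the fused decision is class 1.
predicted = fuse_modalities([0.1, 0.7, 0.2], [0.2, 0.3, 0.5])
```

A design note on this style of fusion: because each modality is modeled separately, one stream (e.g. audio) can help spot gesture boundaries while the other (skeleton) discriminates between gesture classes.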
The dataset used in the ChaLearn Multi-modal Gesture Recognition Challenge was collected with the Microsoft Kinect sensor, and the available modalities include color images, depth images, skeleton, and audio data. The data totals nearly 24 hours and consists of 13,858 gesture instances performed by 27 individuals. This dataset establishes a benchmark for the development and evaluation of user-independent multi-modal gesture spotting and recognition algorithms. Each team was required to train its models on a subset of the full dataset and then submit its final recognition model. The final evaluation was performed by the competition organizers on another, unpublished subset for ranking.
The ChaLearn Challenges have focused on Kinect-based gesture spotting and recognition tasks since 2011, and each year's competition attracts more than 50 teams from various nations. These competitions greatly promote the comparison and integration of different algorithms and are believed to have far-reaching effects on both industry and academia.