Using Gesture Controls in FascinatE

Gesture recognition technologies are being widely applied in human-machine interaction. There is a global trend to replace external devices, such as remote controls, keyboards or mice, with device-less gesture recognition solutions. Indeed, the objective is a system that is not only device-less but also marker-less, allowing users to interact as naturally as possible and providing a truly immersive experience.

Within the FascinatE project, the Universitat Politècnica de Catalunya (UPC) is working on providing seamless user interaction with the system by detecting and recognizing user gestures. A user of the FascinatE system will therefore be able to interact with it from his/her couch without the need for any external device. Several gestures are being investigated, ranging from simple interactions, such as selecting different channels on the TV, to more innovative ones, such as automatically following a chosen player in a football match or navigating through high-resolution panoramic views of the scene.

A home setup consisting of a central Time-of-Flight (TOF) camera [1] and two lateral color cameras is proposed. In order to interpret user gestures, the head and hands are tracked by exploiting the TOF camera's depth estimates. In this context, an ellipse is resized to the expected head size, which depends on the distance between the user and the camera. This ellipse is projected onto the TOF camera image plane in order to find the image zone that best matches the elliptical shape. A matching score is computed at every image position, the best score giving an estimate of the head position.
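The head localization step can be sketched as follows. This is a minimal illustration, not the project's implementation: the focal length, the physical head dimensions, the depth tolerance and the scoring rule are all assumed values chosen for the example.

```python
import numpy as np

# Assumed constants (not from the article): TOF focal length in pixels
# and an approximate physical head size in metres.
FOCAL_PX = 525.0
HEAD_WIDTH_M, HEAD_HEIGHT_M = 0.16, 0.22

def ellipse_mask(w, h):
    """Binary elliptical mask of size h x w."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return ((xs - cx) / (w / 2.0)) ** 2 + ((ys - cy) / (h / 2.0)) ** 2 <= 1.0

def find_head(depth, search_box):
    """Slide a depth-scaled ellipse over a search zone of the TOF depth
    image; return the candidate centre with the best matching score."""
    y0, x0, y1, x1 = search_box
    rows, cols = depth.shape
    best_score, best_pos = -1.0, None
    for y in range(y0, y1):
        for x in range(x0, x1):
            z = depth[y, x]
            if z <= 0:                      # invalid depth sample
                continue
            # Project the physical head size to pixels at this depth.
            w = max(3, int(FOCAL_PX * HEAD_WIDTH_M / z))
            h = max(3, int(FOCAL_PX * HEAD_HEIGHT_M / z))
            top, left = y - h // 2, x - w // 2
            if top < 0 or left < 0 or top + h > rows or left + w > cols:
                continue
            patch = depth[top:top + h, left:left + w]
            mask = ellipse_mask(w, h)
            # Simple matching score: fraction of pixels inside the ellipse
            # whose depth is close to the candidate depth.
            score = np.mean(np.abs(patch[mask] - z) < 0.15)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

A real system would use a more robust score (e.g. penalizing foreground depth outside the ellipse) and restrict the search zone around the previous estimate, as the article describes.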

Knowing the camera parameters and the depth at the estimated position, a 3D estimate of the head may be obtained. The size of the search zone is updated depending on the variance of the estimated head position and the matching score value. In a second step, a three-dimensional bounding box is attached to the head position, in such a way that the hands lie inside the box when moved in front of the body. An estimate of the position of the hand(s) is then obtained by segmenting and grouping the 3D points inside the bounding box.
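The back-projection and hand-grouping steps can be sketched like this. The pinhole intrinsics, the bounding-box dimensions, the clustering radius and the minimum cluster size are all illustrative assumptions, and the greedy grouping stands in for whatever segmentation the project actually uses.

```python
import numpy as np

# Assumed pinhole intrinsics for the TOF camera (not from the article).
FX = FY = 525.0
CX, CY = 159.5, 119.5

def backproject(depth):
    """Back-project every valid depth pixel to a 3D point (pinhole model,
    camera looking along +z, image y pointing down)."""
    ys, xs = np.nonzero(depth > 0)
    z = depth[ys, xs]
    x = (xs - CX) * z / FX
    y = (ys - CY) * z / FY
    return np.stack([x, y, z], axis=1)

def hand_candidates(points, head_xyz, box=(0.8, 0.8, 0.6)):
    """Keep the 3D points inside a bounding box attached below and in
    front of the head, group them by a greedy distance threshold, and
    return one centroid per group (candidate hand positions)."""
    hx, hy, hz = head_xyz
    bw, bh, bd = box
    sel = ((np.abs(points[:, 0] - hx) < bw / 2) &
           (points[:, 1] > hy) & (points[:, 1] < hy + bh) &
           (points[:, 2] > hz - bd) & (points[:, 2] < hz - 0.15))
    clusters = []
    for p in points[sel]:
        for c in clusters:
            if np.linalg.norm(p - np.mean(c, axis=0)) < 0.15:
                c.append(p)
                break
        else:
            clusters.append([p])
    # Discard tiny groups (likely noise); keep one centroid per hand.
    return [np.mean(c, axis=0) for c in clusters if len(c) >= 20]
```

In practice the points would be vectorized and clustered more carefully (e.g. connected components in 3D), but the pipeline is the same: back-project, crop to the box attached to the head, group, and take centroids.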

The head+hands tracking module runs in real time at more than 20 frames per second, enabling many interesting applications. In the figure below a user feedback is presented, where the user can visualize the relative position of his/her hands on a TV screen. Eventually, the user could point at zones on the screen, navigate through menus or perform gestures to control functionalities of FascinatE's TV-based home system.
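The on-screen feedback amounts to mapping the tracked hand position, relative to the head, onto screen coordinates. A minimal sketch, in which the interaction-zone size, the vertical offset and the screen resolution are all assumed values:

```python
import numpy as np

def hand_to_screen(hand_xyz, head_xyz, screen_w=1920, screen_h=1080,
                   zone_w=0.6, zone_h=0.4):
    """Map a 3D hand position, relative to the head, onto TV screen
    pixel coordinates for cursor-style feedback (illustrative only)."""
    dx = hand_xyz[0] - head_xyz[0]        # lateral offset in metres
    dy = hand_xyz[1] - head_xyz[1] - 0.3  # vertical offset, re-centred on an assumed chest height
    u = (dx / zone_w + 0.5) * screen_w
    v = (dy / zone_h + 0.5) * screen_h
    return (int(np.clip(u, 0, screen_w - 1)),
            int(np.clip(v, 0, screen_h - 1)))
```

With this mapping, a hand held at chest height directly in front of the head lands at the centre of the screen, and moving the hand within the interaction zone moves the cursor proportionally.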

References

[1] A. Kolb, E. Barth, R. Koch, and R. Larsen, "Time-of-Flight Cameras in Computer Graphics," Computer Graphics Forum, vol. 29, no. 1, pp. 141–159, 2010.

© FascinatE Project