FascinatE Rendering Node with ROI zoom and Gesture Control
A first terminal prototype, shown at the FascinatE stand at IBC 2011, demonstrates navigation within content captured by selected FascinatE sensors, such as panoramic views. The demonstrator uses gesture recognition to simplify interaction between the terminal and the end user, and it targets a home scenario in which the end user interacts with the rendered content on a high-definition TV set.
For this purpose, the Universitat Politècnica de Catalunya (UPC) developed a fast and robust head and hand tracking algorithm that uses depth information from a range sensor, enabling interactive and immersive applications. This functionality controls a real-time rendering platform developed by researchers at Technicolor. The platform is configurable through scripts and provides virtual camera navigation with pan, tilt and zoom commands.
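The kind of pan, tilt and zoom state such a virtual camera maintains can be sketched as below. The class, its command names and its limits are hypothetical and are not Technicolor's scripting interface; the sketch assumes a full 360° panorama, so pan wraps around while tilt and field of view are clamped.

```python
from dataclasses import dataclass


@dataclass
class VirtualCamera:
    """Hypothetical virtual camera state for navigating a panorama.

    Angles are in degrees. Assumes a full 360-degree panorama, so pan wraps
    around, while tilt and field of view are clamped to assumed limits.
    """
    pan: float = 0.0        # horizontal viewing direction
    tilt: float = 0.0       # vertical viewing direction
    fov: float = 60.0       # horizontal field of view; smaller means zoomed in
    max_tilt: float = 30.0  # assumed vertical extent of the panorama
    min_fov: float = 10.0   # assumed maximum zoom
    max_fov: float = 90.0   # assumed widest view

    def pan_by(self, degrees: float) -> None:
        self.pan = (self.pan + degrees) % 360.0

    def tilt_by(self, degrees: float) -> None:
        self.tilt = max(-self.max_tilt, min(self.max_tilt, self.tilt + degrees))

    def zoom_by(self, factor: float) -> None:
        # A factor above 1 zooms in by narrowing the field of view.
        self.fov = max(self.min_fov, min(self.max_fov, self.fov / factor))

    def apply(self, command: str, value: float) -> None:
        """Dispatch a textual command such as ("pan", 10.0)."""
        handlers = {"pan": self.pan_by, "tilt": self.tilt_by, "zoom": self.zoom_by}
        handlers[command](value)
```

For example, `VirtualCamera().apply("pan", -20.0)` leaves the camera looking at 340°, because pan wraps around the panorama instead of stopping at an edge.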
To interpret user gestures as a means of navigating within a panorama, hands and heads are tracked using depth estimation. Head detection relies on head templates and an elliptical matching score, with the template resized according to the person's distance from the sensor. Within a given search zone, the matching score yields head position probabilities and confidence values for the position estimates.
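The head search can be sketched as follows. This is a minimal illustration, not UPC's algorithm: the focal length, head dimensions, depth tolerance and the exact form of the score (depth consistency inside the ellipse multiplied by depth contrast in an upper border ring) are all assumptions, and `ellipse_score` and `locate_head` are hypothetical names. The depth map is a plain list of rows in metres.

```python
import math

FOCAL_PX = 525.0    # assumed focal length of the range sensor, pixels
HEAD_W_M = 0.16     # assumed head width, metres
HEAD_H_M = 0.22     # assumed head height, metres
DEPTH_TOL_M = 0.10  # assumed tolerance for "same surface as the head"
RING = 1.3          # assumed outer radius of the border ring (x ellipse)


def ellipse_score(depth, cx, cy):
    """Hypothetical elliptical matching score in [0, 1] for a head at (cx, cy).

    depth: list of rows of distances in metres (0 = no reading).
    """
    z = depth[cy][cx]
    if z <= 0:
        return 0.0
    # Resize the template with distance (pinhole projection of head size).
    a = 0.5 * FOCAL_PX * HEAD_W_M / z
    b = 0.5 * FOCAL_PX * HEAD_H_M / z
    inside = hits = ring = farther = 0
    for y in range(math.floor(cy - RING * b), math.ceil(cy + RING * b) + 1):
        for x in range(math.floor(cx - RING * a), math.ceil(cx + RING * a) + 1):
            if not (0 <= y < len(depth) and 0 <= x < len(depth[0])):
                continue
            r = math.hypot((x - cx) / a, (y - cy) / b)  # normalised radius
            if r <= 1.0:
                # Inside the ellipse, pixels should lie on the head surface.
                inside += 1
                hits += int(abs(depth[y][x] - z) < DEPTH_TOL_M)
            elif r <= RING and y < cy:
                # Upper border ring: background should be clearly farther.
                # The lower half is skipped because neck and shoulders sit
                # at roughly the same depth as the head.
                ring += 1
                farther += int(depth[y][x] > z + DEPTH_TOL_M)
    if inside == 0 or ring == 0:
        return 0.0
    return (hits / inside) * (farther / ring)


def locate_head(depth, zone):
    """Score every pixel in zone = (x0, y0, x1, y1), end-exclusive.

    Returns the best position, its score (used as a confidence value) and
    the scores normalised to sum to 1 as rough position probabilities.
    """
    x0, y0, x1, y1 = zone
    scores = {(x, y): ellipse_score(depth, x, y)
              for y in range(y0, y1) for x in range(x0, x1)}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    probs = {p: s / total for p, s in scores.items()} if total > 0 else {}
    return best, scores[best], probs
```

The ring term is what keeps a flat background from scoring well: a region that is uniform in depth fills the ellipse perfectly but has no farther surface around it, so its score drops to zero.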
To track the hands and interpret the performed gestures, a workspace is defined as a 3D box placed relative to the detected head position. Within this box, hands are detected by merging and filtering samples of similar size and depth.
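A minimal sketch of the workspace idea, assuming the range sensor output has already been converted to 3D points in metres (x to the right, y upwards, z distance from the sensor). The box offsets, the linking distance and the size limits are illustrative values, `detect_hands` is a hypothetical name, and the merging step is done here with simple single-linkage clustering rather than the demonstrator's own procedure.

```python
import math

# Assumed workspace, in metres, relative to the head centre.
BOX_X = (-0.60, 0.60)
BOX_Y = (-0.70, 0.30)
BOX_Z = (-0.70, -0.15)      # hands are expected in front of the head
LINK_M = 0.05               # assumed max gap between samples of one hand
MIN_PTS, MAX_PTS = 3, 500   # assumed plausible hand size, in samples


def in_workspace(p, head):
    """True if 3D point p lies in the box anchored at the head position."""
    return all(lo <= p[i] - head[i] <= hi
               for i, (lo, hi) in enumerate((BOX_X, BOX_Y, BOX_Z)))


def detect_hands(points, head):
    """Return centroids of hand candidates, nearest to the sensor first."""
    pts = [p for p in points if in_workspace(p, head)]
    unvisited = set(range(len(pts)))
    hands = []
    while unvisited:
        # Merge samples into one cluster while they are close in 3D.
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited if math.dist(pts[i], pts[j]) <= LINK_M]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        # Filter out clusters that are too small (noise) or too large (arm, body).
        if MIN_PTS <= len(cluster) <= MAX_PTS:
            n = len(cluster)
            hands.append(tuple(sum(pts[k][d] for k in cluster) / n for d in range(3)))
    return sorted(hands, key=lambda c: c[2])
```

Anchoring the box to the head means the same offsets work wherever the user stands, and the head itself is excluded because it lies behind the box.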
Finally, an empirical law is derived that relates the area of a surface in the image to its real-world counterpart. Open and closed hands are then distinguished by segmenting the area of the detected hand. Figure 1 shows an example of all these steps.
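One way to obtain and use such a law is sketched below. It assumes the pinhole-camera form `real_area = k * pixel_area * depth**2`, fits the single constant `k` by least squares to calibration measurements, and then thresholds the real-world hand area. The model form, the threshold `OPEN_HAND_M2` and the function names are assumptions, not the demonstrator's actual values.

```python
def fit_area_law(samples):
    """Least-squares fit of k in  real_area = k * pixel_area * depth**2.

    samples: (pixel_area, depth_m, real_area_m2) calibration triples.
    The quadratic dependence on depth is assumed from the pinhole model.
    """
    num = sum(px * z * z * real for px, z, real in samples)
    den = sum((px * z * z) ** 2 for px, z, _ in samples)
    return num / den


OPEN_HAND_M2 = 0.010  # assumed threshold between closed and open hand, m^2


def hand_state(pixel_area, depth_m, k):
    """Classify a segmented hand as 'open' or 'closed' from its real area."""
    real_area = k * pixel_area * depth_m ** 2
    return "open" if real_area >= OPEN_HAND_M2 else "closed"
```

Because the estimated real-world area no longer depends on how far the user stands from the sensor, a single threshold can separate open from closed hands at any distance: a hand covering 5000 pixels at 1 m and one covering 1250 pixels at 2 m map to the same area.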
The variety of end terminals available today calls for format-agnostic production, so that content can be prepared to suit each of them. FascinatE terminals and services will offer interactive, personalized visual perspectives that enrich the user experience. Content navigation through pan, tilt and zoom gives the user a truly immersive experience that goes beyond simple channel switching. The scalable architecture of the rendering platform developed for FascinatE supports different target terminals, such as home theaters or smartphones.
An XML-based scripting mechanism controls and scales the visual rendering performed on camera clusters that offer multiple regions of interest (see Figure 2). This supports workflow automation and the optimization of delivery channels. These layered scenes are rendered into personalized perspectives on end-user screens by transforming the circular panorama onto flat surfaces (Figure 3). Additional effort goes into placing graphical elements for user information relative to the selected region of interest and the display surface used for presentation.
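The transformation from the panorama onto a flat display can be sketched as an inverse mapping: for each pixel of the virtual view, a ray is rotated by the current tilt and pan and intersected with the panorama surface. The sketch assumes a full 360° cylindrical panorama with azimuth 0 at column 0 and the horizon on the middle row, and uses nearest-neighbour sampling; `view_to_panorama` and `render_view` are hypothetical names.

```python
import math


def view_to_panorama(u, v, view_w, view_h, fov_deg, pan_deg, tilt_deg,
                     pano_w, pano_h):
    """Map output pixel (u, v) of a flat virtual view to panorama coordinates.

    Assumes a full 360-degree cylindrical panorama whose column 0 is azimuth 0
    and whose horizon lies on the middle row; tilt must stay strictly between
    -90 and 90 degrees. Camera frame: x right, y up, z forward.
    Returns continuous (column, row) coordinates.
    """
    f = (view_w / 2) / math.tan(math.radians(fov_deg) / 2)  # focal, pixels
    x = u + 0.5 - view_w / 2        # ray through the pixel centre
    y = -(v + 0.5 - view_h / 2)     # image rows grow downwards
    z = f
    t, p = math.radians(tilt_deg), math.radians(pan_deg)
    # Tilt: rotate about the x axis (positive tilt looks up).
    y, z = y * math.cos(t) + z * math.sin(t), -y * math.sin(t) + z * math.cos(t)
    # Pan: rotate about the y axis (positive pan turns right).
    x, z = x * math.cos(p) + z * math.sin(p), -x * math.sin(p) + z * math.cos(p)
    # Intersect the ray with a cylinder of radius pano_w / (2 * pi) pixels.
    radius = pano_w / (2 * math.pi)
    azimuth = math.atan2(x, z)
    col = (azimuth / (2 * math.pi)) % 1.0 * pano_w
    row = pano_h / 2 - radius * y / math.hypot(x, z)
    return col, row


def render_view(pano, view_w, view_h, fov_deg, pan_deg, tilt_deg, fill=0):
    """Nearest-neighbour rendering of a flat view from a panorama (rows of values)."""
    pano_h, pano_w = len(pano), len(pano[0])
    view = []
    for v in range(view_h):
        line = []
        for u in range(view_w):
            col, row = view_to_panorama(u, v, view_w, view_h, fov_deg,
                                        pan_deg, tilt_deg, pano_w, pano_h)
            r = math.floor(row)
            # Rays passing above or below the panorama strip get the fill value.
            line.append(pano[r][math.floor(col) % pano_w] if 0 <= r < pano_h else fill)
        view.append(line)
    return view
```

Working backwards from output pixels ensures that every pixel of the flat view receives a value; a production renderer would typically run the same mapping on a GPU with bilinear filtering.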
In conclusion, the demonstrator presented at IBC 2011 performs fast (68 fps) and robust hand and head tracking with an error of less than 6 cm. The resulting smooth hand trajectories can be used for further gesture classification and analysis. This technology is applied to a real-time terminal platform for pan, tilt and zoom navigation within a panoramic scene. Easy personalization through gestures is complemented by scripting support that offers perspective options such as prepared regions of interest.
You can see a video demo of the FascinatE rendering node below: