Wrapping Snakes

A key step in the process of lip-reading is determining the shape of the speaker's lips. This has previously been achieved through an energy method known as "snakes"; however, this approach has some limitations. Here I present an adapted approach called wrapping snakes, where the image forces are modified based on the snake's location and orientation. This modification encourages wrapping snakes to continue along features they have already partially found, overcoming one of the problems of traditional snakes. The use of wrapping snakes allows for more accurate and robust lip segmentation, and also speeds up the segmentation.

Much of the recent work in lip segmentation has focused on deformable models [Chen98,Eveno04,Wu02]. These mathematical models use an energy function to fit a parameterised model to the image. One of the more promising approaches has been the use of snakes.

What are snakes?

Snakes are a series of connected points that are controlled by a mathematical model. They are a form of active contour model, which uses an energy-minimising spline that is guided by internal and external forces [Kass88]. The internal forces are due to the rigidity and tension of the spline, while the external forces are chosen to track the desired features.
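To make the internal forces concrete, here is a minimal sketch of how tension and rigidity are commonly discretised for a closed snake, following the usual finite-difference form of the model in [Kass88]. The function name, the weights alpha and beta, and their default values are assumptions chosen for illustration, not values from my implementation.

```python
import numpy as np

def internal_force(points, alpha=0.1, beta=0.05):
    """Internal force on each point of a closed snake.

    points: (N, 2) array of x, y positions, ordered around the contour.
    alpha:  tension weight (illustrative value), resists stretching.
    beta:   rigidity weight (illustrative value), resists bending.
    """
    prev1 = np.roll(points, 1, axis=0)   # prev1[i] is points[i - 1]
    next1 = np.roll(points, -1, axis=0)  # next1[i] is points[i + 1]
    prev2 = np.roll(points, 2, axis=0)
    next2 = np.roll(points, -2, axis=0)
    # Discrete second derivative: pulls each point towards the midpoint
    # of its two neighbours (tension).
    d2 = prev1 - 2 * points + next1
    # Discrete fourth derivative: penalises changes in curvature (rigidity).
    d4 = prev2 - 4 * prev1 + 6 * points - 4 * next1 + next2
    return alpha * d2 - beta * d4
```

On a circular snake both terms point towards the centre, which is why a snake with no external force shrinks.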

Snakes were introduced by Kass et al. in [Kass88]; for more information about them I would highly recommend reading that paper.

Wrapping snakes - my modification to traditional snakes.

Wrapping snakes are an adaptation of traditional snakes, where the image forces are modified based on the snake's location and orientation. This modification encourages wrapping snakes to continue along features they have already partially found, overcoming a major source of problems for traditional snakes.

What makes wrapping snakes different to traditional snakes?

The difference between wrapping snakes and traditional snakes is how the image force is used to modify the position of the snake. In wrapping snakes, instead of using the full image force at a given location, only the component of the force in the direction of the normal to the snake is used. That is, the force always acts perpendicularly to the snake. This can be seen in Figure 1 below.

Calculating the image force.

Fig 1: Determining the wrapping force (red) from the image force (green) and the snake position (blue).
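The projection in Figure 1 can be sketched in a few lines. This is a minimal illustration rather than my actual implementation: the function names are hypothetical, and estimating the normal from a central difference of neighbouring points is an assumption made to keep the example short.

```python
import numpy as np

def snake_normals(points):
    """Unit normals of a closed snake, from central-difference tangents."""
    # Tangent at point i, estimated from its two neighbours (assumption).
    tangent = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    # Rotate the tangent by 90 degrees to get a normal direction.
    normal = np.column_stack([tangent[:, 1], -tangent[:, 0]])
    length = np.linalg.norm(normal, axis=1, keepdims=True)
    return normal / np.maximum(length, 1e-12)  # avoid division by zero

def wrapping_force(points, image_force):
    """Keep only the component of the image force along the snake normal.

    points:      (N, 2) array of snake point positions.
    image_force: (N, 2) array, the image force sampled at each point.
    """
    n = snake_normals(points)
    along_normal = np.sum(image_force * n, axis=1, keepdims=True)
    return along_normal * n
```

Because the tangential component is discarded, an image force that points along the snake contributes nothing, while one that points across it is kept in full.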

By modifying the image force, wrapping snakes have a number of advantages over traditional snakes. These include locating features faster, increased robustness to strong noise and weak target features, the ability to handle multiple enclosed regions, and successfully locating the desired feature even from a very poor initial position. The following sections explain these characteristics further, in the context of locating lips for use with visual speech recognition.

Locating the lips faster

In situations where traditional snakes work well, wrapping snakes can locate the lips with fewer iterations. Figure 2 below shows the traditional snake taking 4 iterations to locate the lips under ideal initial conditions, whereas Figure 3 shows the wrapping snake taking only 1 iteration to locate the lips from the same initial conditions. The wrapping snake is faster because the wrapping force encourages the snake to continue along a feature it has already partially found.

Traditional snakes under ideal conditions.

Fig 2: Traditional snake, under ideal conditions, takes 4 iterations to locate the lips.
(Blue: initial position; Green: intermediate positions; Red: final position.)

Wrapping snakes under ideal conditions.

Fig 3: Wrapping snake, under ideal conditions, takes only 1 iteration to locate the lips.
(Blue: initial position; Green: intermediate positions; Red: final position.)

Increased robustness to strong noise and weak target features

Unlike traditional snakes, wrapping snakes are still able to successfully locate the lips in the presence of strong noise and weak target features. The problem with traditional snakes is that they only have the two internal forces, tension and rigidity, to try to overcome the noise in the image forces. As you can see in Figure 4, if either of these is increased, the traditional snake will fall off the corner of the mouth before being pulled away from the noise itself (in this case the noise is due to the shadow under the nose).

As you can see, the wrapping snake doesn't have this problem, as the wrapping force can be used, together with the tension, to pull the snake off the noise and continue along the lip feature. This is possible because the wrapping force encourages the snake to continue along the features it has already found, and the tension is used to ensure the snake favours the feature it has enclosed by pulling it inwards (but not with sufficient force to pull it off the corner of the mouth).

Strong noise and weak target features.

Fig 4: Traditional snakes (left) cannot handle strong noise and weak target features, whereas wrapping snakes (right) can.

Multiple enclosed regions

When the snake encloses multiple regions, wrapping snakes really show their usefulness. When the initial position includes multiple enclosed regions, traditional snakes cannot locate any of the regions successfully (see Figure 5). Wrapping snakes, on the other hand, can successfully locate each feature. This is because the wrapping force allows the snake to continue along the partially found features, eventually wrapping completely around each.

To handle multiple regions, the snake needs to be split when it crosses itself, forming separate independent snakes. In this example, only the longer snake is shown, as it is the one we are interested in.
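One way the split could be done is sketched below: look for two non-adjacent edges of the closed snake that cross, then cut the point list at the crossing into two closed snakes. The function names and the brute-force pairwise search are assumptions made for illustration; this is not necessarily how my implementation detects or handles crossings.

```python
import numpy as np

def _cross(o, a, b):
    """Z component of (a - o) x (b - o); its sign gives the turn direction."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def split_snake(points):
    """Split a closed snake at its first self-crossing.

    points: (N, 2) float array ordered around the contour.
    Returns a list with one snake (no crossing) or two snakes (split).
    """
    n = len(points)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # edge n-1 ends at point 0, so it touches edge 0
            p1, p2 = points[i], points[i + 1]
            p3, p4 = points[j], points[(j + 1) % n]
            d1, d2 = _cross(p3, p4, p1), _cross(p3, p4, p2)
            d3, d4 = _cross(p1, p2, p3), _cross(p1, p2, p4)
            if d1 * d2 < 0 and d3 * d4 < 0:  # the two edges properly cross
                # Crossing point, interpolated along edge i.
                x = p1 + (d1 / (d1 - d2)) * (p2 - p1)
                inner = np.concatenate([x[None, :], points[i + 1:j + 1]])
                outer = np.concatenate(
                    [points[:i + 1], x[None, :], points[j + 1:]])
                return [inner, outer]
    return [points]
```

Each returned snake can then be evolved independently. Keeping only the longer one could be as simple as max(split_snake(points), key=len), if point count is an acceptable stand-in for length.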

Multiple enclosed regions.

Fig 5: When the initial position encloses multiple regions, traditional snakes (left) fail to fully locate any of the features, whereas wrapping snakes (right) successfully locate the lips. (In this example, when the wrapping snake is split, only the longer snake is shown).

This characteristic of wrapping snakes is extremely useful in image segmentation, as the snake does not need to be initialised perfectly around the lips without enclosing any noise. In the case of lip segmentation, there is often noise close to the lips, primarily due to shadow under the nose and under the chin and neck region. Some noise is also present on the cheeks due to imperfect lip-colour detection (the method I've used to generate the images that produce the image forces). As the shadow under the nose is extremely close to the lips, there is a high probability of it being enclosed within the initial snake position, along with any other noise nearby.

The ability of wrapping snakes to still successfully locate the lips under these conditions greatly improves their robustness and the accuracy of the lip segmentation.

Summary of advantages of wrapping snakes

The use of the wrapping force encourages the snake to continue along partially found features. This speeds up the snake's convergence on a feature, and increases its robustness to noise and weak target features. Even from a very poor initial position, wrapping snakes can still successfully locate the lips.

Wrapping snakes also handle multiple enclosed regions very easily. This is beneficial for visual speech recognition because visual noise, such as the shadow under the nose, is often close to the lips.

These features are all desired for improved lip segmentation for visual speech recognition.

My publications and resources on wrapping snakes

Here is a paper I wrote on wrapping snakes for the ICASSP09 conference: Wrapping Snakes for Improved Lip Segmentation, and its associated poster.

Here is a presentation I gave to my Mechatronic Engineering colleagues at Curtin University: Wrapping snakes (powerpoint presentation).

References

  • [Chen98] T. Chen and R. R. Rao, Audio-visual integration in multimodal communication, Proceedings of the IEEE, vol. 86, no. 5, pp. 837–852, 1998.
  • [Eveno04] N. Eveno, A. Caplier, and P. Y. Coulon, Automatic and accurate lip tracking, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 706–715, 2004.
  • [Wu02] Z. Wu, P. S. Aleksic, and A. K. Katsaggelos, Lip tracking for MPEG-4 facial animation, in Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, 2002, pp. 293–298.
  • [Kass88] M. Kass, A. Witkin, and D. Terzopoulos, Snakes: Active contour models, International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, 1988.