Here is an idea I've been thinking over: giving a skeletally animated mesh the ability to lip-sync to speech.
- Imagine a human ragdoll mesh stored as an MS3D skeletal model. The head can be represented by a single body. If we remove the head mesh but nothing else, we still have a fully functioning ragdoll. The neck joint to which the head was attached must stay in the skeleton. Let us call this neck joint "NTAG".
- Now imagine that the head mesh that we have removed is stored as a separate file in MD3 format. This MD3 file contains a single "tag", which is also called "NTAG".
- When the ragdoll is loaded, the head mesh is loaded too and is rejoined with the body by linking the MS3D joint "NTAG" and the MD3 tag "NTAG".
- What have we achieved so far? Not much: we have the same model as before, only now the head is stored as a frame-by-frame model while the body is stored as a ragdoll.
- Next step: for the MD3 head model, create a separate animation frame for every sound the human mouth can make. Name each frame after its sound (e.g., the frame of the head making an "eeh" sound should be named "eeh").
- Using SDL_mixer, load some speech sounds into the project.
- Hook up the SDL_mixer channel that plays the speech to an effect callback (SDL_mixer lets you register one per channel with Mix_RegisterEffect). The callback receives the raw sample data for that channel just before it is played.
- Analyse the speech WAV inside this callback as it is played. Whenever an "ahh" sound is detected, set the MD3 head mesh to the frame named "ahh", and likewise for every other sound.
- To improve the effect, use linear interpolation for the MD3 frame transitioning to smooth the mouth movement and make it look more natural.
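
To make the last few steps a bit more concrete, here is a rough C sketch of looking up a head frame by name and blending linearly between two frames. Everything in it is an assumption for illustration: the names HeadFrame, find_frame and lerp_frames are made up, the mesh is only three vertices, and positions are plain floats (a real MD3 loader would first decode the file's compressed vertex data). The audio analysis and the SDL_mixer hookup are left out entirely.

```c
#include <assert.h>
#include <string.h>

#define HEAD_VERTS 3            /* tiny stand-in for a real head mesh */

typedef struct { float x, y, z; } Vec3;

/* One named key frame of the head, e.g. "ahh" or "eeh" (hypothetical layout). */
typedef struct {
    const char *name;
    Vec3 verts[HEAD_VERTS];
} HeadFrame;

/* Return the index of the frame called `name`, or -1 if there is none. */
int find_frame(const HeadFrame *frames, int count, const char *name)
{
    for (int i = 0; i < count; ++i) {
        if (strcmp(frames[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Blend from frame a to frame b into out[HEAD_VERTS].
 * t is clamped to [0, 1]: t = 0 gives a, t = 1 gives b. */
void lerp_frames(const HeadFrame *a, const HeadFrame *b, float t, Vec3 *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;

    for (int i = 0; i < HEAD_VERTS; ++i) {
        out[i].x = a->verts[i].x + t * (b->verts[i].x - a->verts[i].x);
        out[i].y = a->verts[i].y + t * (b->verts[i].y - a->verts[i].y);
        out[i].z = a->verts[i].z + t * (b->verts[i].z - a->verts[i].z);
    }
}
```

In use, the audio callback would only record which sound it last detected. The render loop would then call find_frame with that name and advance t from 0 to 1 over a short interval, passing the previous frame as a and the new frame as b, so the mouth glides between shapes instead of snapping.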