Because the present method iterates over successively smaller unaligned segments, each with a more constrained vocabulary and language model, it is better able to overcome noise and other difficulties in the audio stream, for example, in audio streams where speech and music are overlaid.
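The iteration described above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `align` and `toy_recognize`, the one-frame-per-word audio model, and the per-frame "largest vocabulary at which the word is still recognizable" value are all hypothetical assumptions chosen only to show why re-recognizing a shrinking gap with a smaller vocabulary can recover words that a single pass misses. Matching words between transcript and hypothesis are found here with Python's standard `difflib.SequenceMatcher`, which stands in for whatever alignment step an actual system would use.

```python
from difflib import SequenceMatcher


def toy_recognize(audio, vocab):
    # Hypothetical stand-in for a speech recognizer. Each toy frame is
    # (spoken_word, cap); the word is recognized only if it is in the active
    # vocabulary and that vocabulary has at most `cap` entries. A low cap
    # models a noisy frame (e.g. speech overlaid with music).
    return [w if w in vocab and len(vocab) <= cap else "<unk>" for w, cap in audio]


def align(transcript, audio, recognize, t_off=0, a_off=0, max_depth=10):
    """Map transcript word indices to audio frame indices by iterative anchoring."""
    if not transcript or not audio:
        return {}
    # Constrain the vocabulary to the words of the current unaligned segment.
    vocab = set(transcript)
    hyp = recognize(audio, vocab)
    blocks = SequenceMatcher(a=transcript, b=hyp, autojunk=False).get_matching_blocks()
    if all(b.size == 0 for b in blocks):
        return {}  # no anchor found; the segment stays unaligned
    anchors = {}
    prev_t = prev_a = 0
    # The last block is a zero-size sentinel, so the trailing gap is handled too.
    for b in blocks:
        if max_depth > 0:
            # Recurse into the gap before this block; it is strictly smaller
            # than the current segment, so its vocabulary shrinks as well.
            anchors.update(
                align(transcript[prev_t:b.a], audio[prev_a:b.b], recognize,
                      t_off + prev_t, a_off + prev_a, max_depth - 1)
            )
        for k in range(b.size):
            anchors[t_off + b.a + k] = a_off + b.b + k
        prev_t, prev_a = b.a + b.size, b.b + b.size
    return anchors


transcript = "the quick brown fox jumps over the lazy dog".split()
caps = [100, 3, 3, 100, 100, 2, 1, 100, 100]  # assumed noise levels per frame
audio = list(zip(transcript, caps))

single_pass = align(transcript, audio, toy_recognize, max_depth=0)
iterated = align(transcript, audio, toy_recognize)
print(len(single_pass), len(iterated))  # 5 9
```

In this toy input a single pass anchors only the five clearly spoken words, while iterating over the remaining gaps anchors all nine, because each gap is re-recognized with a vocabulary of only one or two words.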