Typically, the division into segments is done by a specially modified speech recognizer set to a forced-alignment mode, followed by some manual correction guided by visual representations such as the waveform and spectrogram.[11] An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as the fundamental frequency (pitch), duration, posit