Specifically, the real voice prosody extracting part 23 extracts, from the speech data output from the utterance input part 21, the real voice prosody information that determines the manner of speaking, such as voice pitch, intonation, rhythm, and the like.
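As an illustration only, the following minimal sketch shows one way such prosody information could be derived from sampled speech data. It is not the method of the real voice prosody extracting part 23, which the description does not specify. The function names `estimate_pitch_hz` and `extract_prosody`, the frame length, the hop size, and the 60 to 400 Hz search range are all assumptions introduced for this example. Voice pitch is estimated per frame by autocorrelation peak picking, the sequence of per-frame pitch values stands in for intonation, and the per-frame RMS energy contour is used as a rough proxy for rhythm.

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame (hypothetical helper).

    Picks the lag with the largest autocorrelation inside the lag range
    that corresponds to [fmin, fmax]. Returns 0.0 when no positive peak
    is found, e.g. for a silent frame.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def extract_prosody(samples, sample_rate, frame_len=400, hop=200):
    """Return per-frame pitch and energy contours (assumed representation).

    pitch_hz:   one pitch estimate per frame; its change over time
                approximates intonation.
    energy_rms: one RMS value per frame; its rise and fall over time
                is a crude indicator of rhythm.
    """
    pitch, energy = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        pitch.append(estimate_pitch_hz(frame, sample_rate))
        energy.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return {"pitch_hz": pitch, "energy_rms": energy}


if __name__ == "__main__":
    # Synthetic stand-in for speech data: a 200 Hz tone sampled at 8 kHz.
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    print(extract_prosody(tone, rate))
```

For the synthetic tone in the usage example, each frame's pitch estimate should come out close to 200 Hz. Real speech would typically also need voiced/unvoiced detection and smoothing of the pitch contour, which this sketch omits.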