Speech technology
Many automatic voice services are built on speech technology, a field at the intersection of computer science, linguistics and phonetics that deals with generating human speech (speech synthesis) and understanding it (speech recognition). It is used in a wide range of applications where people want to use their voice and ears to communicate with a computer, giving or receiving information that the computer handles as text.
Speech synthesis
Speech can be generated by computer algorithms. Most often the input is text, and the technology is then referred to as text-to-speech (TTS). TTS systems take many different forms. The most common are rule-based TTS, where the sound waves are produced purely by computer algorithms, and concatenative speech synthesis, where small segments of recorded speech are joined together and computer algorithms are used to select the best-matching segments and to smooth the transitions between them.
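The two jobs of the algorithms in concatenative synthesis, choosing a recorded segment and smoothing the joint, can be illustrated with a toy sketch. It assumes a hypothetical inventory that maps unit labels to several recorded candidates (plain lists of samples), uses a made-up join cost (the amplitude jump at the boundary) and picks candidates greedily. Real systems typically also weigh how well each unit matches the target and search the whole sequence jointly, so treat this as an illustration of the idea only.

```python
from typing import Dict, List

Samples = List[float]


def crossfade(a: Samples, b: Samples, overlap: int) -> Samples:
    """Join a and b, fading a out and b in linearly over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head = a[: len(a) - overlap]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises towards 1 across the overlap
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + b[overlap:]


def join_cost(prev: Samples, cand: Samples) -> float:
    """Hypothetical join cost: size of the amplitude jump at the boundary."""
    return abs(prev[-1] - cand[0])


def synthesize(units: List[str], inventory: Dict[str, List[Samples]],
               overlap: int = 2) -> Samples:
    """Greedily pick, for each unit, the candidate that joins most smoothly."""
    out: Samples = []
    for label in units:
        candidates = inventory[label]
        if not out:
            out = list(candidates[0])  # nothing to join to yet
            continue
        best = min(candidates, key=lambda c: join_cost(out, c))
        out = crossfade(out, best, overlap)
    return out


inventory = {
    "a": [[0.0, 0.5, 1.0]],
    "b": [[0.9, 0.5, 0.0], [-1.0, 0.0, 0.0]],  # two recordings of the same unit
}
print(synthesize(["a", "b"], inventory, overlap=1))
```

Here the first recording of "b" is chosen because it starts close to where "a" ends, and the one-sample crossfade blends the boundary instead of butting the two segments together.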
Speech recognition
Automatic speech recognition (ASR) is used to transcribe what a person says into a microphone as text. The computer can then use this text to understand what the user wants it to do (commands) or what kind of information the user is looking for (in a dialogue).
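Once ASR has produced text, a simple way for the computer to turn it into a command is pattern matching on the transcript. The sketch below assumes the transcript is already available as a string (no real ASR engine is called), and the command names and patterns are hypothetical; practical systems more often use a grammar or a trained intent classifier.

```python
import re
from typing import Optional, Tuple

# Hypothetical command set: (command name, pattern with an optional "arg" group).
COMMANDS = [
    ("call", re.compile(r"\b(?:call|phone|ring)\s+(?P<arg>\w+)")),
    ("weather", re.compile(r"\bweather\b(?:\s+in\s+(?P<arg>\w+))?")),
]


def interpret(transcript: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map an ASR transcript to (command, argument), or None if nothing matches."""
    text = transcript.lower().strip()
    for name, pattern in COMMANDS:
        match = pattern.search(text)
        if match:
            return name, match.group("arg")  # arg is None if the optional part is absent
    return None


print(interpret("Please call Anna"))
print(interpret("What's the weather in Oslo?"))
```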
Dialogue systems
Dialogue systems, or more precisely spoken dialogue systems, are computer programs that combine TTS, ASR and other algorithms (AI-based or otherwise) to hold a conversation with a user. They handle everything that makes the dialogue between the user and the computer work, apart from the conversion between sound and text.
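The division of labour described above can be sketched as a loop in which ASR and TTS are plugged in as black boxes, while the dialogue logic decides what to say next. This is a minimal sketch assuming a hypothetical slot-filling task (asking for a city and a day); `recognize` and `speak` stand in for real ASR and TTS components and are replaced by stubs in the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DialogueManager:
    """Hypothetical slot-filling manager: asks for each missing slot, then confirms."""
    slots: List[str]
    filled: Dict[str, str] = field(default_factory=dict)

    def done(self) -> bool:
        return all(slot in self.filled for slot in self.slots)

    def next_prompt(self) -> str:
        for slot in self.slots:
            if slot not in self.filled:
                return f"Which {slot} would you like?"
        summary = ", ".join(f"{k}: {v}" for k, v in self.filled.items())
        return f"Booking confirmed ({summary})."

    def update(self, user_text: str) -> None:
        # Naively assign the whole utterance to the first unfilled slot.
        for slot in self.slots:
            if slot not in self.filled:
                self.filled[slot] = user_text.strip()
                return


def run_dialogue(manager: DialogueManager,
                 recognize: Callable[[], str],
                 speak: Callable[[str], None]) -> None:
    """recognize() plays the role of ASR (sound to text), speak() of TTS (text to sound)."""
    while not manager.done():
        speak(manager.next_prompt())
        manager.update(recognize())
    speak(manager.next_prompt())


# Stubs in place of real ASR/TTS: canned answers in, printed prompts out.
answers = iter(["Oslo", "Friday"])
run_dialogue(DialogueManager(["city", "day"]), lambda: next(answers), print)
```

Keeping ASR and TTS behind two function parameters mirrors the distinction in the text: the dialogue system itself only ever sees and produces text.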