Hoy, M. B. Alexa, Siri, Cortana, and More: An introduction to voice assistants. Med. Ref. Serv. Q. 37, 81–88 (2018).
Google Scholar
Olmstead, K. Nearly half of Americans use digital voice assistants, mostly on their smartphones. Pew Res. Cent. (2017).
Plummer, D. C. et al. ’Top Strategic Predictions for 2017 and Beyond: Surviving the Storm Winds of Digital Disruption’ Gartner Report G00315910 (Gartner. Inc, 2016).
Fernald, A. Meaningful melodies in mothers’ speech to infants. in Nonverbal Vocal Communication: Comparative and Developmental Approaches, 262–282 (Cambridge University Press, 1992).
Grieser, D. L. & Kuhl, P. K. Maternal speech to infants in a tonal language: Support for universal prosodic features in motherese. Dev. Psychol. 24, 14 (1988).
Google Scholar
Hilton, C. B. et al. Acoustic regularities in infant-directed speech and song across cultures. Nat. Hum. Behav. https://doi.org/10.1038/s41562-022-01410-x (2022).
Google Scholar
Cox, C. et al. A systematic review and Bayesian meta-analysis of the acoustic features of infant-directed speech. Nat. Hum. Behav. 7, 114–133 (2023).
Google Scholar
Uther, M., Knoll, M. A. & Burnham, D. Do you speak E-NG-LI-SH? A comparison of foreigner-and infant-directed speech. Speech Commun. 49, 2–7 (2007).
Google Scholar
Scarborough, R., Dmitrieva, O., Hall-Lew, L., Zhao, Y. & Brenier, J. An acoustic study of real and imagined foreigner-directed speech. in Proceedings of the International Congress of Phonetic Sciences, 2165–2168 (2007).
Burnham, D. K., Joeffry, S. & Rice, L. Computer-and human-directed speech before and after correction. in Proceedings of the 13th Australasian International Conference on Speech Science and Technology, 13–17 (2010).
Oviatt, S., MacEachern, M. & Levow, G.-A. Predicting hyperarticulate speech during human–computer error resolution. Speech Commun. 24, 87–110 (1998).
Google Scholar
Clark, H. H. & Murphy, G. L. Audience design in meaning and reference. In Advances in Psychology Vol. 9 (eds LeNy, J.-F. & Kintsch, W.) 287–299 (Elsevier, 1982).
Hwang, J., Brennan, S. E. & Huffman, M. K. Phonetic adaptation in non-native spoken dialogue: Effects of priming and audience design. J. Mem. Lang. 81, 72–90 (2015).
Google Scholar
Tippenhauer, N., Fourakis, E. R., Watson, D. G. & Lew-Williams, C. The scope of audience design in child-directed speech: Parents’ tailoring of word lengths for adult versus child listeners. J. Exp. Psychol. Learn. Mem. Cogn. 46, 2123 (2020).
Google Scholar
Cohn, M., Ferenc Segedin, B. & Zellou, G. Acoustic-phonetic properties of Siri- and human-directed speech. J. Phon. 90, 101123 (2022).
Google Scholar
Cohn, M., Liang, K.-H., Sarian, M., Zellou, G. & Yu, Z. Speech rate adjustments in conversations with an Amazon Alexa Socialbot. Front. Commun. 6, 1–8 (2021).
Google Scholar
Cohn, M. & Zellou, G. Prosodic differences in human- and Alexa-directed speech, but similar local intelligibility adjustments. Front. Commun. 6, 1–13 (2021).
Google Scholar
Cohn, M., Mengesha, Z., Lahav, M. & Heldreth, C. African American English speakers’ pitch variation and rate adjustments for imagined technological and human addressees. JASA Express Lett. 4, 1–4 (2024).
Google Scholar
Raveh, E., Steiner, I., Siegert, I., Gessinger, I. & Möbius, B. Comparing phonetic changes in computer-directed and human-directed speech. in Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2019, 42–49 (TUDpress, 2019).
Siegert, I. & Krüger, J. “Speech melody and speech content didn’t fit together”—differences in speech behavior for device directed and human directed interactions. in Advances in Data Science: Methodologies and Applications, vol. 189, 65–95 (Springer, 2021).
Ibrahim, O. & Skantze, G. Revisiting robot directed speech effects in spontaneous human–human–robot interactions. in Human Perspectives on Spoken Human–Machine Interaction (2021).
Cowan, B. R., Branigan, H. P., Obregón, M., Bugis, E. & Beale, R. Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human−computer dialogue. Int. J. Hum.-Comput. Stud. 83, 27–42 (2015).
Google Scholar
Kalashnikova, N., Hutin, M., Vasilescu, I. & Devillers, L. Do we speak to robots looking like humans as we speak to humans? A study of pitch in french human–machine and human–human interactions. in Companion Publication of the 25th International Conference on Multimodal Interaction, 141–145 (2023).
Lu, Y. & Cooke, M. The contribution of changes in F0 and spectral tilt to increased intelligibility of speech produced in noise. Speech Commun. 51, 1253–1262 (2009).
Google Scholar
Brumm, H. & Zollinger, S. A. The evolution of the Lombard effect: 100 years of psychoacoustic research. Behaviour 148, 1173–1198 (2011).
Google Scholar
Nass, C., Steuer, J. & Tauber, E. R. Computers are social actors. in Proceedings of the SIGCHI Conference on Human factors in Computing Systems, 72–78 (ACM, 1994). https://doi.org/10.1145/259963.260288.
Nass, C., Moon, Y., Morkes, J., Kim, E.-Y. & Fogg, B. J. Computers are social actors: A review of current research. Hum. Values Des. Comput. Technol. 72, 137–162 (1997).
Lee, K. M. Media equation theory. in The International Encyclopedia of Communication, vol. 1, 1–4 (Wiley, 2008).
Epley, N., Waytz, A. & Cacioppo, J. T. On seeing human: A three-factor theory of anthropomorphism. Psychol. Rev. 114, 864–886 (2007).
Google Scholar
Waytz, A., Cacioppo, J. & Epley, N. Who sees human?: The Stability and importance of individual differences in anthropomorphism. Perspect. Psychol. Sci. 5, 219–232 (2010).
Google Scholar
Urquiza-Haas, E. G. & Kotrschal, K. The mind behind anthropomorphic thinking: Attribution of mental states to other species. Anim. Behav. 109, 167–176 (2015).
Google Scholar
Ernst, C.-P. & Herm-Stapelberg, N. Gender Stereotyping’s Influence on the Perceived Competence of Siri and Co. in Proceedings of the 53rd Hawaii International Conference on System Sciences, 4448–44453 (2020).
Cohn, M., Ferenc Segedin, B. & Zellou, G. Imitating Siri: Socially-mediated alignment to device and human voices. in Proceedings of International Congress of Phonetic Sciences, 1813–1817 (2019).
Cohn, M., Predeck, K., Sarian, M. & Zellou, G. Prosodic alignment toward emotionally expressive speech: Comparing human and Alexa model talkers. Speech Commun. 135, 66–75 (2021).
Google Scholar
Cohn, M., Sarian, M., Predeck, K. & Zellou, G. Individual variation in language attitudes toward voice-AI: The role of listeners’ autistic-like traits. in Proceedings of Interspeech 2020, 1813–1817 (2020). https://doi.org/10.21437/Interspeech.2020-1339.
Tarłowski, A. & Rybska, E. Young children’s inductive inferences within animals are affected by whether animals are presented anthropomorphically in films. Front. Psychol. 12, 634809 (2021).
Google Scholar
Gjersoe, N. L., Hall, E. L. & Hood, B. Children attribute mental lives to toys when they are emotionally attached to them. Cogn. Dev. 34, 28–38 (2015).
Google Scholar
Moriguchi, Y. et al. Imaginary agents exist perceptually for children but not for adults. Palgrave Commun. 5, 1–9 (2019).
Google Scholar
Taylor, M. & Mottweiler, C. M. Imaginary companions: Pretending they are real but knowing they are not. Am. J. Play 1, 47–54 (2008).
Read, J. C. & Bekker, M. M. The nature of child computer interaction. in Proceedings of the 25th BCS conference on human-computer interaction, 163–170 (British Computer Society, 2011).
Lovato, S. & Piper, A. M. Siri, is this you?: Understanding young children’s interactions with voice input systems. in Proceedings of the 14th International Conference on Interaction Design and Children, 335–338 (ACM, 2015).
Garg, R. & Sengupta, S. He is just like me: A study of the long-term use of smart speakers by parents and children. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4, 1–24 (2020).
Google Scholar
Gambino, A., Fox, J. & Ratan, R. A. Building a stronger CASA: Extending the computers are social actors paradigm. Hum. Mach. Commun. 1, 71–85 (2020).
Google Scholar
Mayo, C., Aubanel, V. & Cooke, M. Effect of prosodic changes on speech intelligibility. in Thirteenth Annual Conference of the International Speech Communication Association, 1706–1709 (2012).
Li, Q. & Russell, M. J. Why is automatic recognition of children’s speech difficult? in Interspeech, 2671–2674 (2001).
Russell, M. & D’Arcy, S. Challenges for computer recognition of children’s speech. in Workshop on Speech and Language Technology in Education (2007).
Kennedy, J. et al. Child speech recognition in human-robot interaction: Evaluations and recommendations. in Proceedings of the 2017 ACM/IEEE international conference on human-robot interaction, 82–90 (2017).
Kim, M. K. et al. Examining voice assistants in the context of children’s speech. Int. J. Child Comput. Interact. 34, 100540 (2022).
Google Scholar
Mallidi, S. H. et al. Device-directed utterance detection. in Interspeech 2018 (2018).
Swerts, M., Litman, D. & Hirschberg, J. Corrections in spoken dialogue systems. in Sixth International Conference on Spoken Language Processing (2000).
Stent, A. J., Huffman, M. K. & Brennan, S. E. Adapting speaking after evidence of misrecognition: Local and global hyperarticulation. Speech Commun. 50, 163–178 (2008).
Google Scholar
Lindblom, B. Explaining phonetic variation: A sketch of the H&H theory. in Speech Production and Speech Modelling, vol. 55, 403–439 (Springer, 1990).
Szendrői, K., Bernard, C., Berger, F., Gervain, J. & Höhle, B. Acquisition of prosodic focus marking by English, French, and German three-, four-, five-and six-year-olds. J. Child Lang. 45, 219–241 (2018).
Google Scholar
Esteve-Gibert, N., Lœvenbruck, H., Dohen, M. & d’Imperio, M. Pre-schoolers use head gestures rather than prosodic cues to highlight important information in speech. Dev. Sci. 25, e13154 (2022).
Google Scholar
Cheng, Y., Yen, K., Chen, Y., Chen, S. & Hiniker, A. Why doesn’t it work? Voice-driven interfaces and young children’s communication repair strategies. in Proceedings of the 17th ACM Conference on Interaction Design and Children, 337–348 (ACM, 2018).
Bell, L. & Gustafson, J. Child and adult speaker adaptation during error resolution in a publicly available spoken dialogue system. in Eighth European Conference on Speech Communication and Technology (2003).
Ramirez, A., Cohn, M., Zellou, G. & Graf Estes, K. Es una pelota, do you like the ball?” Pitch in Spanish-English Bilingual Infant Directed Speech. (under review).
Picheny, M. A., Durlach, N. I. & Braida, L. D. Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. J. Speech Lang. Hear. Res. 28, 96–103 (1985).
Google Scholar
Scarborough, R. & Zellou, G. Clarity in communication:“Clear” speech authenticity and lexical neighborhood density effects in speech production and perception. J. Acoust. Soc. Am. 134, 3793–3807 (2013).
Google Scholar
Burnham, D. et al. Are you my little pussy-cat? Acoustic, phonetic and affective qualities of infant-and pet-directed speech. in Fifth International Conference on Spoken Language Processing Paper 0916 (1998).
Burnham, D., Kitamura, C. & Vollmer-Conna, U. What’s new, pussycat? On talking to babies and animals. Science 296, 1435–1435 (2002).
Google Scholar
Zellou, G., Cohn, M. & FerencSegedin, B. Age- and gender-related differences in speech alignment toward humans and voice-AI. Front. Commun. 5, 1–11 (2021).
Google Scholar
Song, J. Y., Pycha, A. & Culleton, T. Interactions between voice-activated AI assistants and human speakers and their implications for second-language acquisition. Front. Commun. 7, 9475 (2022).
Google Scholar
Koenecke, A. et al. Racial disparities in automated speech recognition. Proc. Natl. Acad. Sci. 117, 7684–7689 (2020).
Google Scholar
Wassink, A. B., Gansen, C. & Bartholomew, I. Uneven success: Automatic speech recognition and ethnicity-related dialects. Speech Commun. 140, 50–70 (2022).
Google Scholar
Sachs, J. & Devin, J. Young children’s use of age-appropriate speech styles in social interaction and role-playing*. J. Child Lang. 3, 81–98 (1976).
Google Scholar
Syrett, K. & Kawahara, S. Production and perception of listener-oriented clear speech in child language. J. Child Lang. 41, 1373–1389 (2014).
Google Scholar
Wellman, H. M. Making Minds: How Theory of Mind Develops (Oxford University Press, 2014).
Google Scholar
Slaughter, V. Theory of mind in infants and young children: A review. Aust. Psychol. 50, 169–172 (2015).
Google Scholar
Severson, R. L. & Lemm, K. M. Kids see human too: Adapting an individual differences measure of anthropomorphism for a child sample. J. Cogn. Dev. 17, 122–141 (2016).
Google Scholar
Severson, R. L. & Woodard, S. R. Imagining others’ minds: The positive relation between children’s role play and anthropomorphism. Front. Psychol. https://doi.org/10.3389/fpsyg.2018.02140 (2018).
Google Scholar
Siegert, I. et al. Voice assistant conversation corpus (VACC): A multi-scenario dataset for addressee detection in human–computer-interaction using Amazon’s ALEXA. in Proceeding of the 11th LREC (2018).
Garnier, M., Ménard, L. & Alexandre, B. Hyper-articulation in Lombard speech: An active communicative strategy to enhance visible speech cues?. J. Acoust. Soc. Am. 144, 1059–1074 (2018).
Google Scholar
Trujillo, J., Özyürek, A., Holler, J. & Drijvers, L. Speakers exhibit a multimodal Lombard effect in noise. Sci. Rep. 11, 16721 (2021).
Google Scholar
Gampe, A., Zahner-Ritter, K., Müller, J. J. & Schmid, S. R. How children speak with their voice assistant Sila depends on what they think about her. Comput. Hum. Behav. 143, 107693 (2023).
Google Scholar
Gessinger, I., Cohn, M., Zellou, G. & Möbius, B. Cross-Cultural Comparison of Gradient Emotion Perception: Human vs. Alexa TTS Voices. Proceedings Interspeech 2022 23rd Conference International Speech Communication Association, 4970–4974 (2022).
Kornai, A. Digital language death. PLoS ONE 8, e77056 (2013).
Google Scholar
Zaugg, I. A., Hossain, A. & Molloy, B. Digitally-disadvantaged languages. Internet Policy Rev. 11, 1654 (2022).
Google Scholar
Kuperman, V., Stadthagen-Gonzalez, H. & Brysbaert, M. Age-of-acquisition ratings for 30,000 English words. Behav. Res. Methods 44, 978–990 (2012).
Google Scholar
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A. & Sloetjes, H. ELAN: A professional framework for multimodality research. in 5th International Conference on Language Resources and Evaluation (LREC 2006), 1556–1559 (2006).
Boersma, P. & Weenink, D. Praat: Doing Phonetics by Computer. (2021).
DiCanio, C. Extract Pitch Averages. https://www.acsu.buffalo.edu/~cdicanio/scripts/Get_pitch.praat (2007).
Bürkner, P.-C. brms: An R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80, 1–28 (2017).
Google Scholar
Carpenter, B. et al. Stan: A probabilistic programming language. J. Stat. Softw. 76, 01 (2017).
Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, 2016).