Tu Le (University of Virginia) Danny Yuxing Huang (New York University) Noah Apthorpe (Colgate University) Yuan Tian (University of Virginia)
Abstract
Many households include children who use voice personal assistants (VPAs) such as Amazon Alexa. Children benefit from the rich functionality of VPAs and third-party apps but are also exposed to new risks in the VPA ecosystem (e.g., inappropriate content or information collection). To study the risks VPAs pose to children, we built a Natural Language Processing (NLP)-based system that automatically interacts with VPA apps and analyzes the resulting conversations to identify content risky to children. We identified 28 child-directed apps with risky content and maintain a growing dataset of 31,966 non-overlapping app behaviours collected from 3,434 Alexa apps. Our findings suggest that although voice apps designed for children are subject to more policy requirements and more intensive vetting, children remain vulnerable to risky content. We conducted a user study showing that parents are more concerned about VPA apps with inappropriate content than about those that ask for personal information, but that many parents are unaware that risky apps of either type exist. Finally, we identified a new threat to users of VPA apps: confounding utterances, or voice commands shared by multiple apps that may cause a user to invoke or interact with a different app than intended. We identified 4,487 confounding utterances, including 581 shared by child-directed and non-child-directed apps.
Risks to Children from VPAs.
Researchers have found that 91% of children between ages 4 and 11 in the U.S. have access to VPAs, 26% of children are exposed to a VPA between 2 and 4 hours a week, and 20% talk to VPA devices for more than 5 hours a week. The lack of robust authentication on commercial VPAs makes it challenging to regulate children’s use of skills, especially as anyone in the same physical vicinity of a VPA can interact with the device. As a result, children may have access to risky skills that deliver inappropriate content (e.g., expletives) or collect personal information through voice interactions.
The 1998 Children’s Online Privacy Protection Act (COPPA) regulates the information collected from children under 13 online, but widespread COPPA violations have been shown in the mobile application market and compliance in the VPA space is far from guaranteed. Additionally, parental control modes provided by VPAs (e.g., Amazon FreeTime and Google Family App) often place a burden on parents during setup and receive complaints from parents due to their limitations.
Purpose
Protecting children in the era of voice devices, therefore, raises several pressing questions:
• Can we automate the analysis of VPA skills to identify content risky for children without requiring manual human voice interactions?
• Are VPA skills targeted to children that claim to follow additional content requirements – hereafter referred to as “kid skills” – actually safe for child users?
• What are parents’ attitudes and awareness of the risks posed by VPAs to children?
• How likely is it for children to be exposed to risky skills through confounding utterances—voice commands shared by multiple skills which could cause a child to accidentally invoke or interact with a different skill than intended.
Contributions
Automated System for Skill Analysis: We present SkillBot, a system that automatically interacts with Alexa skills and collects their content at scale. The system can be run longitudinally to identify new conversations and new conversation branches in previously analyzed skills.
Identification of Risks to Children: We analyzed 31,966 conversations collected from 3,434 Alexa kid skills to detect potentially risky skills directed at children. We found 8 skills that contain inappropriate content for children and 20 skills that ask for personal information through voice interaction.
User Study of Parents’ Awareness and Experiences: We conducted a user study demonstrating that a majority of parents express concern about the content of the risky kid skills identified by SkillBot, tempered by disbelief that these skills are actually available for Alexa VPAs. This lack of risk awareness is compounded by our findings that many parents do not use VPA parental controls and allow their children to use VPA versions that do not have parental controls enabled by default.
Confounding Utterances: We identified confounding utterances as a novel threat to VPA users. The SkillBot analysis reveals 4,487 confounding utterances shared between two or more skills, and we highlight those that place child users at risk by invoking a non-kid skill instead of an expected kid skill.
Alexa Parental Control.
Amazon FreeTime is a parental control feature that allows parents to manage what content their children can access on their Amazon devices. FreeTime on Alexa provides a Parent Dashboard user interface for parents to set daily time limits, monitor activities, and manage allowed content. If FreeTime is enabled, users can only use skills in the kids category by default; to use other skills, parents need to manually add them to a whitelist. FreeTime Unlimited is a subscription that offers thousands of pieces of kid-friendly content, including a list of kid skills available on compatible Echo devices, for children under 13. Parents can purchase this subscription via their Amazon account and use it across all compatible Amazon devices.
Children can potentially access an Amazon Echo device located in a shared space and invoke such “risky” skills because child-protection features are often absent on the device: FreeTime is turned off by default on the regular version of Amazon Echo.
Exploring and Classifying Utterances: Amazon allows developers to list up to three sample utterances on their skill’s information page. The system first extracts these sample utterances.
Detecting Questions in Skill Responses: To extend the conversation, our system first classifies responses collected from the skill into three main categories: yes/no questions, WH questions, and non-question statements.
For this classification task, we employ spaCy and StanfordCoreNLP, which are popular NLP toolkits. We first tokenize the skill’s response into sentences and each sentence into words. We then annotate each sentence with part-of-speech (POS) tags, using both TreeBank POS tags and Universal POS tags. From the POS tags, we identify the role of each word in the sentence, such as auxiliary, subject, or object.
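The intuition behind this three-way classification can be sketched in a simplified, self-contained form. The sketch below replaces the spaCy/StanfordCoreNLP POS pipeline with illustrative word lists (the lists and function name are our own, not SkillBot's actual implementation), keeping only the core rule: a response is a WH question if it starts with a WH word, a yes/no question if it starts with an auxiliary verb and ends with a question mark, and a statement otherwise.

```python
# Simplified sketch of the response classifier. The real system
# derives word roles from POS tags; here we approximate with
# hand-picked word lists for illustration only.

WH_WORDS = {"what", "who", "whom", "whose", "which",
            "where", "when", "why", "how"}
AUX_VERBS = {"do", "does", "did", "is", "are", "was", "were",
             "can", "could", "will", "would", "should", "shall",
             "may", "might", "have", "has"}

def classify_response(sentence: str) -> str:
    """Label a skill response as 'wh_question', 'yes_no_question',
    or 'statement' based on its leading word."""
    words = sentence.strip().rstrip("?.!").lower().split()
    if not words:
        return "statement"
    if words[0] in WH_WORDS:
        return "wh_question"
    if words[0] in AUX_VERBS and sentence.strip().endswith("?"):
        return "yes_no_question"
    return "statement"
```

For example, "What is your name?" would be labeled a WH question, while "Do you want to play again?" would be labeled a yes/no question; the POS-based version generalizes this to questions whose auxiliary or WH word is not sentence-initial.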
For each skill, SkillBot runs multiple rounds to explore different paths within the conversation trees. Each node in this tree is a unique response from Alexa. There is an edge between nodes i and j if there exists an interaction where Alexa says i, the user (i.e., SkillBot) says something, and then Alexa says j. We call the progression from i to j a path in the tree. Furthermore, multiple paths of interactions could exist for a skill. For instance, node i could have two edges: one with j and another one with k. Effectively, two paths lead from i. In one path, the user says something after hearing i, and Alexa responds with j. In another path, the user says something else after hearing i, and Alexa responds with k.
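The multi-round path exploration described above can be sketched as a traversal that enumerates root-to-leaf paths in the conversation tree. The data structure below (a dict mapping each Alexa response to the responses reachable by one user utterance) is an illustrative simplification, not SkillBot's actual representation:

```python
def explore_paths(tree, root):
    """Enumerate all root-to-leaf paths in a conversation tree.
    `tree` maps a node (an Alexa response) to its child nodes
    (responses reachable after one user utterance)."""
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        children = tree.get(path[-1], [])
        if not children:
            paths.append(path)  # leaf reached: one complete path
        else:
            for child in children:
                stack.append(path + [child])
    return paths
```

In the example from the text, a node i with edges to j and k yields two paths, [i, j] and [i, k]; running multiple rounds per skill corresponds to covering each such path once.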
Kids Only: We found that 64 (58.2%) of the 110 utterances in this category invoked an irrelevant skill that was not in the list of skills associated with the utterance. The remaining 46 utterances (41.8%) invoked a relevant skill within the list of associated skills.
Both Kids and Non-kids (Joint): We found that 367 (63.2%) of the 581 utterances in this category invoked an irrelevant skill that was not in the list of skills associated with the utterance. The remaining 214 utterances (36.8%) invoked a relevant skill within the list of associated skills. However, 157 of these 214 utterances (73.4%) invoked a non-kid skill in preference to a kid skill.
Non-kids Only: We found that 1,999 (52.7%) of the 3,796 utterances in this category invoked an irrelevant skill that was not in the list of skills associated with the utterance. The remaining 1,797 utterances (47.3%) invoked a relevant skill within the list of associated skills.
Takeaway: It is risky if a confounding utterance is shared between a kid skill and a non-kid skill. Our analysis shows that kids can accidentally invoke a non-kid skill while trying to use a kid skill. An adversary can exploit this problem to get kid users to invoke risky non-kid skills.
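Finding confounding utterances and sorting them into the three categories above can be sketched as follows. The input format (a dict mapping each skill name to a kid-skill flag and its utterance set) is an illustrative assumption, not SkillBot's actual data format:

```python
from collections import defaultdict

def find_confounding(skills):
    """Return utterances shared by two or more skills, labeled by
    the audience of the skills that share them.
    `skills` maps skill name -> (is_kid_skill, set of utterances)."""
    by_utterance = defaultdict(list)
    for name, (is_kid, utterances) in skills.items():
        for u in utterances:
            by_utterance[u].append((name, is_kid))

    result = {}
    for u, sharers in by_utterance.items():
        if len(sharers) < 2:
            continue  # used by only one skill: not confounding
        kid_flags = {is_kid for _, is_kid in sharers}
        if kid_flags == {True}:
            result[u] = "kids_only"
        elif kid_flags == {False}:
            result[u] = "non_kids_only"
        else:
            result[u] = "joint"  # the risky kid/non-kid overlap
    return result
```

The "joint" label corresponds to the riskiest case discussed in the takeaway: an utterance shared by a kid skill and a non-kid skill, where invocation priority decides which one a child actually reaches.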
Conclusion
We designed and implemented an automated skill interaction system called SkillBot and used it to analyze 3,434 Alexa kid skills. We identified a number of risky skills with inappropriate content or personal data requests, as well as the confounding utterance threat. To further evaluate the impact of these risky skills on kids, we conducted a user study of 232 U.S. parents who use Alexa in their households. We found widespread concern about the content of these skills, combined with general disbelief that such skills might actually be available to kids, and low adoption of parental control features.