Author’s Note: Product teams at Facebook rely on research along with other external factors to design and build products. This article discusses research conducted by Facebook's Research Team to better understand people’s experiences with privacy-enhancing technology.
Privacy-enhancing technologies such as “on-device learning” can protect people’s privacy but can be complex and challenging to understand. We interviewed privacy experts and non-experts to learn how we might better communicate about these kinds of technologies, with a focus on on-device learning.
We learned that people may get more value from explanations of on-device learning that focus on the immediate impact on their experience and the implications for their lives, rather than focusing on the technical details of how the technology works.
These results provide initial direction for how companies might improve explanations of on-device learning and similar technologies; we hope that sharing these insights inspires further explorations of how to create even more effective education for privacy-enhancing technologies.
Privacy-enhancing technologies use advanced techniques from cryptography and statistics to protect people’s personal information in digital experiences. Although people can benefit from the application of privacy-enhancing technologies, these technologies can be complex and difficult to understand. For example, consider the Wikipedia definition of one such technology, on-device learning (also known as federated learning): “A machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them” (Wikipedia, 2021). This explanation is dense and technical, and it will likely not be particularly helpful for someone who isn’t an expert in the field of machine learning. That’s a problem because it’s important that people understand the privacy-enhancing technologies that companies offer or use for their benefit. Without understanding the value of these technologies, people may struggle to make informed decisions about their data or may feel vulnerable even when their privacy is protected. At Facebook, we wanted to learn how we might better communicate to people about these kinds of technologies and their value.
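For readers who do want the technical view, the quoted definition can be made concrete with a minimal sketch of federated averaging, one common way to realize on-device learning. This is an illustration under stated assumptions, not a description of how any Facebook product implements the technology: the function names (`local_update`, `federated_average`, `run_rounds`), the toy linear model, and the synthetic data are all hypothetical.

```python
import random


def local_update(weights, local_data, lr=0.1, epochs=5):
    """Train a tiny linear model y = w*x + b on one device's own data.

    The raw (x, y) samples never leave this function; only the
    updated weights are returned.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in local_data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Combine per-device weights into one model, weighted by sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def run_rounds(devices, rounds=50):
    """Repeat: send the shared model out, train locally, average the results."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in devices]
        sizes = [len(data) for data in devices]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical setup: three devices, each holding private samples of y = 2x + 1.
rng = random.Random(0)
devices = [
    [(x, 2 * x + 1) for x in (rng.uniform(-1, 1) for _ in range(20))]
    for _ in range(3)
]
w, b = run_rounds(devices)
print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch the "server" only ever sees the weight pairs returned by each device, which is the idea behind the phrase "without exchanging them" in the definition. Production systems typically add further protections on top of this basic loop, which are out of scope here.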
What we did
We interviewed 16 AI privacy experts and 16 people who had experience using Facebook or Oculus (“non-experts”). During each interview, we presented four explanations of on-device learning (a type of privacy-enhancing technology). The explanations varied in their level of technical detail, and we asked participants questions in order to understand which parts of the explanations might be most helpful and why.
Explanation 1: Now you can keep your data on your device without sharing it.
Explanation 2: Now you can use a data protection technology called on-device learning to keep your data on-device without sharing it with the cloud.
Explanation 3: Now you can use a data protection technology called on-device learning to protect your privacy. On-device learning protects your privacy by training machine learning models on your device. On-device learning only shares aggregated model summaries, not your data, to the cloud to improve everyone’s experience, including yours.
Explanation 4: Now you can use a data protection technology called on-device learning to improve your experience and protect the privacy of data on your device. With standard machine learning, models train on centralized data centers to learn how to predict what might happen in the future to improve everyone’s experience, including yours. On-device learning lets your device train models by collaborating with other devices, instead of sharing your data with the cloud.
See the Research Methods appendix for additional method details, including information about how we sampled participants and the specific questions we asked during the interviews.
What we learned
On-device learning is not easy to understand
The AI privacy experts and non-experts in our study viewed on-device learning as a complex topic that could be difficult for non-experts to understand. Technical terms like “machine learning models” or “aggregated model summaries” that are core to this technology were unfamiliar and difficult to understand for non-experts; this was true even for participants with university-level education who had generally high levels of digital skills. This lack of baseline familiarity ultimately blocked some participants from understanding how or why their experiences would differ if a product used on-device learning.
Focus on the impact of technology on people’s experiences, not technical jargon
Privacy experts and non-experts alike suggested that explanations of on-device learning should focus on the impact and implications of the technology (e.g., how someone’s immediate experiences would change or how the technology could help keep them safer) rather than the specific details of how the technology works (e.g., “On-device learning only shares aggregated model summaries”). What matters is that people have confidence that a product is protecting their privacy and understand how privacy-enhancing technologies affect their experiences. Technical explanations aren’t necessarily helpful for a typical non-expert who won’t have the knowledge to evaluate the merits of the technology itself. A helpful analogy is the information someone needs when they’re thinking about buying an electric or gas car. Detailed explanations of the inner workings of electric motors versus gas engines are less helpful than a clear explanation of the relative benefits of each option for the individual and society (e.g., expected repair costs, environmental impact).
Similarly, when it comes to communicating about privacy-enhancing technologies like on-device learning, non-experts are more likely to benefit from a clear explanation of the value that they’ll receive from using the technology rather than an explanation of the inner workings of the technology itself. Of course, information about those inner workings should still be made available to review elsewhere in order to support transparency and accountability. However, that level of technical depth does not meet non-experts’ needs when they first encounter the concept of on-device learning. Instead, technical details best fit into opportunities to “learn more.”
Through our interviews, we learned about the type of information that non-experts might find helpful when introduced to on-device learning. Specifically, participants wanted answers to the following questions:
- What specific data is being collected and used?
- Why is data being used? What are the benefits to me, as well as the risks?
- When is data being collected, and when is it not being collected?
- Where is the data being stored? On my device, or somewhere else?
- Who has access to the data?
Explanations of on-device learning that focus on these questions have the potential to communicate clearly about what matters most to people: the personal impact and implications of on-device learning. When people understand the purpose and implications of using on-device learning, they can be in a better position to make informed choices about their personal information. Based on this research, we’d encourage product developers to consider the questions above when exploring the best ways to communicate about on-device learning (and potentially other privacy-enhancing technologies), particularly when communicating to people who are non-experts in the field of AI or machine learning. In some cases, people may benefit from answers to all five questions, whereas in other cases they may benefit from a more succinct explanation that focuses on only one or two. Ultimately, the information that is necessary and sufficient to effectively communicate about privacy-enhancing technologies may vary across contexts, but the list of questions above can serve as a useful starting point to help identify concepts to test and explore in support of that goal.
People are more likely to benefit from explanations of privacy-enhancing technologies like on-device learning that explain the personal impact and implications of data collection rather than explanations that detail the inner workings of the technology itself. Of course, it’s still important to explain how these technologies work in order to support transparency and accountability for privacy technologies. It may therefore be important for companies to pursue dual communication strategies for privacy-enhancing technologies like on-device learning: (1) when communicating to non-experts, help them understand the relative benefits of the technology for their own lives, and (2) provide opportunities for experts or curious non-experts to learn more about the technical details, so that those who are interested can evaluate the efficacy of the privacy-enhancing technology. Ultimately, we hope that sharing these insights inspires further explorations of how to craft even more effective education for privacy-enhancing technologies.
Acknowledgements: Thanks to Ethan Ye, Amy Connors, Christiana Chae, Polina Zvyagina, Ilya Mironov, Dzmitry Huba, Daphne Chen, Pierre Jammes, Yogesh Ingole, Kevin Hannan, and Selena Chan for contributing ideas and designs that were important to the successful completion of this project. Thanks to Justin Hepler for contributing editorial support for this publication. Thanks to Dana Beaty and the AI Creative Design team for contributing to the visual for this article.
Appendix: Research Methods
- Ages 18-65
- Current or former Facebook app users, or current Oculus users
- Mix based on demographics (e.g., age, gender), Facebook app usage (heavy vs. light), device (iOS vs. Android), literacy (digital and language), and familiarity with AI technology
During each interview, we presented four explanations of on-device learning (a type of privacy-enhancing technology). The explanations varied in their level of technical detail, and we asked participants questions in order to understand which parts of the explanations might be most helpful and why.
- Explanation 1: Now you can keep your data on your device without sharing it.
- Explanation 2: Now you can use a data protection technology called on-device learning to keep your data on-device without sharing it with the cloud.
- Explanation 3: Now you can use a data protection technology called on-device learning to protect your privacy. On-device learning protects your privacy by training machine learning models on your device. On-device learning only shares aggregated model summaries, not your data, to the cloud to improve everyone’s experience, including yours.
- Explanation 4: Now you can use a data protection technology called on-device learning to improve your experience and protect the privacy of data on your device. With standard machine learning, models train on centralized data centers to learn how to predict what might happen in the future to improve everyone’s experience, including yours. On-device learning lets your device train models by collaborating with other devices, instead of sharing your data with the cloud.
For each statement, participants were asked the following questions:
- Meaning: In your own words, tell me what you think this means. Please “think out loud” as you tell me about what it means to you.
- Company Transparency: Based on this description, is the company being transparent with you about how your data is being protected? Why or why not?
- Sufficiency of information: Do you feel this description provides sufficient information about how your privacy is being protected? Why or why not?
- Comprehension: How easy or difficult is this statement for you to understand? What, if anything, is difficult to understand? What information would help clarify the parts that are difficult to understand?
- Ideas to improve language: How might you change this description? What else would you want to know?
We conducted thematic and comparative analysis to identify emergent themes and develop insights.