In April, Digital Promise launched its newest product certification, Responsibly Designed AI, which helps districts make more informed procurement decisions. At a time when many edtech solutions are rapidly integrating artificial intelligence (AI) capabilities, it’s important for developers to think critically about how they are doing so responsibly. This blog is the final in a series of four posts exploring how edtech can be powered by AI in ways that best support educators’ and learners’ pedagogical needs, agency, and safety. Each blog post is written by an edtech developer whose product was among the first cohort to earn the Responsibly Designed AI certification. Read the third post here.
With great opportunities come great challenges. One particular challenge is that when we use generative AI models, the training data is generally unknown to us, which can introduce unexpected biases in how a model responds, or variations in how helpful it is. New challenges do not change our guiding principles—to develop impactful products that improve outcomes for all students—but they do push us to find new ways to measure and apply those principles. The Responsibly Designed AI product certification from Digital Promise—a certification that focuses on privacy, data security, algorithmic fairness, and user control—aligns well with our principles and has helped us think through best practices more thoroughly.
We built LiveHint AI based on three major principles: helpfulness, consistency, and safety and fairness.
It’s easy to be overly excited about new technologies, but can we use them in ways that significantly benefit student learning? We always aim to measure this impact and never take it for granted. That’s why we integrated LiveHint AI with UpGrade, Carnegie Learning’s open-source platform for designing and running large-scale randomized controlled trials. This integration allows us to measure how students’ interactions with LiveHint AI impact their performance, and how that changes with different models, prompting strategies, features, or approaches.
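To make that concrete, here is a minimal sketch of the kind of experiment such an integration supports: students are deterministically assigned to one of two conditions and mean outcomes are compared afterward. The condition names, assignment scheme, and outcome field are illustrative assumptions, not UpGrade’s actual API.

```python
import hashlib
from statistics import mean

# Illustrative conditions only; real experiments are configured in UpGrade.
CONDITIONS = ["baseline_hints", "generative_hints"]

def assign_condition(student_id: str) -> str:
    """Stable assignment: hash the student id so each student always sees the same condition."""
    digest = int(hashlib.sha256(student_id.encode()).hexdigest(), 16)
    return CONDITIONS[digest % len(CONDITIONS)]

def summarize(results: list[dict]) -> dict:
    """Compare mean post-test scores across conditions (toy analysis)."""
    by_condition: dict[str, list[float]] = {c: [] for c in CONDITIONS}
    for r in results:
        by_condition[r["condition"]].append(r["score"])
    return {c: round(mean(scores), 3) for c, scores in by_condition.items() if scores}

if __name__ == "__main__":
    # Toy data standing in for logged student outcomes.
    results = [{"condition": assign_condition(f"s{i}"), "score": 0.5 + (i % 5) * 0.05}
               for i in range(100)]
    print(summarize(results))
```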
LiveHint AI is designed to provide all learners with the same high-quality learning experiences found in our curricula. We do this by ensuring that LiveHint AI knows the vocabulary and instructional design principles our curricula are based on, as well as the precise problem a student is working on. We’re working on making LiveHint AI even more aware of student usage and performance across all our products.
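As a rough sketch of what that grounding might look like in practice, the snippet below assembles curriculum vocabulary, a pedagogical instruction, and the exact problem into the model’s instructions. The specific vocabulary, instruction text, and function name are hypothetical, not LiveHint AI’s actual implementation.

```python
# Hypothetical curriculum context; the real system draws this from the curricula themselves.
CURRICULUM_CONTEXT = {
    "vocabulary": ["ratio table", "unit rate", "proportional relationship"],
    "pedagogy": "Guide the student with questions and hints; never give away the final answer.",
}

def build_system_prompt(problem_text: str) -> str:
    """Combine curriculum vocabulary, pedagogy, and the student's exact problem into one instruction block."""
    return (
        f"{CURRICULUM_CONTEXT['pedagogy']}\n"
        f"Use this curriculum vocabulary where it fits: {', '.join(CURRICULUM_CONTEXT['vocabulary'])}.\n"
        f"The student is currently working on: {problem_text}"
    )

print(build_system_prompt("Maria reads 12 pages in 30 minutes. How many pages does she read per minute?"))
```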
Student interactions should be free from harmful language. While many modern LLMs have guardrails built in, we generally found them insufficient for interactions with students and decided to supplement them with guardrails of our own. For example, we use toxicity detectors to ensure that conversations do not continue when students use harmful or toxic language. We also add instructions to keep conversations from going off track while still allowing for creative analogies and examples.
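The post does not name a specific toxicity detector, so the sketch below uses a toy stand-in classifier to show the general guardrail pattern: score the incoming message first, and end the session instead of forwarding it to the model when the score crosses a threshold. The threshold, blocklist, and function names are illustrative assumptions.

```python
TOXICITY_THRESHOLD = 0.8          # illustrative cutoff; a real system would tune this
BLOCKLIST = {"hate", "stupid"}    # toy stand-in for a learned toxicity classifier

def toxicity_score(text: str) -> float:
    """Toy scorer: returns 1.0 on a blocklist hit, 0.0 otherwise."""
    return 1.0 if set(text.lower().split()) & BLOCKLIST else 0.0

SAFE_CLOSING = "Let's keep our conversation respectful. This session has ended."

def handle_student_message(message: str, send_to_llm) -> str:
    """Screen a student message before it ever reaches the tutoring model."""
    if toxicity_score(message) >= TOXICITY_THRESHOLD:
        return SAFE_CLOSING          # stop the conversation rather than replying
    return send_to_llm(message)      # otherwise continue the tutoring dialogue

# Example usage with a fake model call:
print(handle_student_message("I need help with fractions", lambda m: "Sure! What step are you on?"))
```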
We are intentional and thoughtful about rolling out LiveHint AI to students. So far, we have only launched it in school districts that explicitly approved it, and only after circulating opt-out letters to parents and guardians. We provide a version of LiveHint without generative AI to anyone who opts out, and we clearly communicate which version a student is interacting with before every session. Speaking of communication: as generative AI applications become widely adopted, you might think that adding the “AI sparkles” is enough to inform users that they are interacting with generative AI—but we can do better. That’s why, in line with the certification requirements, we clearly explain to students whether they will interact with generative AI when entering a chat with LiveHint AI.
In an effort to measure potential bias in a robust way, we partnered with researchers at Cornell University, Columbia University, and the University of Michigan to build and use a framework for fairness evaluation, monitoring, and mitigation. Specifically, we looked at how different underlying foundation models behave as part of the LiveHint AI system when interacting with “simulated” students. Simulated student prompts were varied in terms of explicit or implicit student identifiers, such as adding “I am Spanish,” switching the dialect from American English to Indian English, or adding typos and slang to a baseline prompt. We noticed that different models responded differently. For example, Claude 3.5 Sonnet was more likely to switch to a different language (e.g., Spanish when the student said “I am Spanish,” as in the example above), and to offer extra explanations and simpler language when the simulated prompt contained typos or slang. More details about this study will be published in the proceedings of the International Conference on AI in Education in Italy in July 2025.
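A highly simplified sketch of that perturb-and-compare idea is below: a baseline prompt is varied with identity markers or typos, each variant is sent to the model, and simple response features are recorded for comparison. The prompts, features, and helper names are illustrative assumptions, not the published framework.

```python
BASELINE = "Can you help me with problem 3? I don't understand the second step."

# Simulated student variants of the same underlying question.
VARIANTS = {
    "baseline": BASELINE,
    "explicit_identity": "I am Spanish. " + BASELINE,
    "typos_and_slang": "can u help w problm 3? i dont get the 2nd step lol",
}

def evaluate(model_call, variants: dict) -> dict:
    """Send each simulated prompt to the model and record simple features of the reply."""
    report = {}
    for name, prompt in variants.items():
        reply = model_call(prompt)
        report[name] = {
            "reply_chars": len(reply),               # does response length shift?
            "non_english": "aquí" in reply.lower(),  # crude check: did the model switch language?
        }
    return report

if __name__ == "__main__":
    fake_model = lambda p: "Claro, aquí tienes una pista..." if "Spanish" in p else "Sure, here's a hint..."
    print(evaluate(fake_model, VARIANTS))
```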
While there is more work to do to understand and mitigate bias, we continue to develop and apply principled frameworks for choosing and monitoring new models as they become available, and for building guardrails around them to achieve the desired behavior. LiveHint AI is built so that a different model can be used for each component of the overall system, giving us maximum flexibility when choosing models.
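As a rough illustration of that flexibility, the sketch below maps each pipeline component to a configurable model name and dispatches calls through that mapping, so swapping the model behind one component is a one-line configuration change. The component names, model names, and placeholder call are assumptions for illustration.

```python
# Hypothetical component-to-model mapping; actual components and models may differ.
MODEL_CONFIG = {
    "hint_generation": "model-a",
    "toxicity_screening": "model-b",
    "language_detection": "model-c",
}

def call_model(model_name: str, prompt: str) -> str:
    """Placeholder for an actual LLM API call."""
    return f"[{model_name}] response to: {prompt}"

def run_component(component: str, prompt: str) -> str:
    """Dispatch a pipeline step to whichever model is currently configured for it."""
    return call_model(MODEL_CONFIG[component], prompt)

print(run_component("hint_generation", "Give a hint for problem 3."))

# Swapping the model behind a single component is a one-line configuration change:
MODEL_CONFIG["hint_generation"] = "model-d"
print(run_component("hint_generation", "Give a hint for problem 3."))
```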
Some design decisions may be perceived differently by individual students and teachers. Therefore, it is often best to leave such decisions up to students or their teachers when possible, such as allowing students to choose a different language if they prefer rather than switching to it automatically. Ultimately, granting students and teachers the power to make these decisions delivers the very qualities we strive to design toward: helpfulness, consistency, and above all else, safety and fairness.
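To illustrate what leaving that choice to the user might look like, this small sketch respects an explicit language preference instead of inferring one from the prompt; the preference key and default value are illustrative assumptions.

```python
DEFAULT_LANGUAGE = "en"

def response_language(student_prefs: dict) -> str:
    """Respond in the student's explicitly chosen language; never infer it from the prompt."""
    return student_prefs.get("preferred_language", DEFAULT_LANGUAGE)

print(response_language({"preferred_language": "es"}))  # student opted in to Spanish
print(response_language({}))                            # no choice made: keep the default
```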