Guest Post: How We Respond to the Unique Challenges of AI in EdTech Development

July 10, 2025

In April, Digital Promise launched its newest product certification, Responsibly Designed AI, which helps districts make more informed procurement decisions. At a time when many edtech solutions are rapidly integrating artificial intelligence (AI) capabilities, it’s important for developers to think critically about how they are doing so responsibly. This blog is the final in a series of four posts exploring how edtech can be powered by AI in ways that best support educators’ and learners’ pedagogical needs, agency, and safety. Each blog post is written by an edtech developer whose product was among the first cohort to earn the Responsibly Designed AI certification. Read the third post here.

At Carnegie Learning, we’ve been developing evidence-based educational products with AI for more than 25 years. Recent advances in generative AI have opened new opportunities for personalizing the student experience. As the director of AI engineering at Carnegie Learning, I lead the development of AI products, tools, and features, focusing on ways to materially improve student outcomes. One product we’re developing is LiveHint AI, a multimodal, curriculum-specific AI tutor. LiveHint AI offers conversational text interactions, avatar-based video, and real-time plot generation. It supports students by providing step-by-step guidance as they work on math problems from our existing math products, such as our digital book (MATHbook) and our interactive video program (MATHstream).

With great opportunities come great challenges. One particular challenge is that when we use generative AI models, the training data is generally unknown to us. This has the potential to introduce unexpected biases in how an AI model responds, or variations in its helpfulness. New challenges do not change our guiding principles—to develop impactful products that improve student outcomes for all students—but they do encourage us to think of new ways to measure and apply those guiding principles. The Responsibly Designed AI product certification from Digital Promise—a certification that focuses on privacy, data security, algorithmic fairness, and user control—aligns well with our principles and has helped us think through best practices more thoroughly.

We built LiveHint AI based on three major principles: helpfulness, consistency, and safety and fairness.

Helpfulness

It’s easy to be overly excited about new technologies, but can we use them in ways that significantly benefit student learning? We always aim to measure this impact and never take it for granted. That’s why we integrated LiveHint AI with UpGrade, Carnegie Learning’s open-source platform for designing and running large-scale randomized controlled trials. This integration allows us to measure how students’ interactions with LiveHint AI impact their performance, and how that changes with different models, prompting strategies, features, or approaches.
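To make the experimentation idea concrete, here is a minimal sketch of the kind of deterministic condition assignment an A/B testing platform like UpGrade performs. The experiment name, condition names, and hashing scheme are all illustrative assumptions, not UpGrade’s actual API.

```python
import hashlib

# Illustrative condition names -- not from UpGrade itself.
CONDITIONS = ["baseline_prompt", "revised_prompt"]


def assign_condition(student_id: str, experiment: str) -> str:
    """Deterministically assign a student to an experimental condition.

    Hashing (experiment, student_id) yields a stable, roughly uniform
    bucket, so a student always sees the same variant within a given
    experiment -- a common pattern in randomized-trial platforms.
    """
    digest = hashlib.sha256(f"{experiment}:{student_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(CONDITIONS)
    return CONDITIONS[bucket]
```

With assignments fixed per student, downstream analysis can compare outcome metrics (e.g., problem-completion rates) between the two groups.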

Consistency

LiveHint AI is designed to provide all learners with the same high-quality learning experiences found in our curricula. We do this by ensuring that LiveHint AI knows the vocabulary and instructional design principles our curricula are based on, as well as the precise problem a student is working on. We’re working on making LiveHint AI even more aware of student usage and performance across all our products.

Safety and Fairness

Student interactions should be free from harmful language. While many modern LLMs already have guardrails built in, we generally found them to be insufficient for interactions with students and decided to supplement such guardrails with our own. For example, we use toxicity detectors to ensure that conversations do not continue when students use language that is harmful or toxic. We also add instructions to make sure conversations do not go off track while still allowing for creative analogies and examples.
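The layered-guardrail idea can be sketched as a simple pre-check that runs before a message reaches the model. The blocklist heuristic below is a toy stand-in for a real toxicity classifier, and the threshold is an assumed value for illustration only.

```python
# Placeholder terms and threshold for illustration; a production system
# would call a trained toxicity model instead of a word list.
BLOCKLIST = {"hate", "stupid"}
THRESHOLD = 0.2


def toxicity_score(message: str) -> float:
    """Toy stand-in for a toxicity classifier: the fraction of
    blocklisted words in the message."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    if not words:
        return 0.0
    return sum(w in BLOCKLIST for w in words) / len(words)


def guardrail(message: str):
    """Return None to end the conversation, else pass the message on."""
    if toxicity_score(message) >= THRESHOLD:
        return None
    return message
```

In a real pipeline a check like this would run alongside the model’s built-in guardrails, supplementing rather than replacing them.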

We are intentional and thoughtful about rolling out LiveHint AI to students. So far, we have only launched it in school districts that explicitly approved it, and only after circulating opt-out letters to parents and guardians. We provide a version of LiveHint, without generative AI, to anyone who opts out, and we clearly communicate which version a student is interacting with before every session. Speaking of communication, as generative AI applications become widely implemented, you might think that adding the “AI sparkles” is enough to inform users that they are interacting with generative AI—but we can do better. That’s why, in line with the certification requirements, we clearly explain to students whether they will interact with generative AI when entering a chat with LiveHint AI.

To measure potential bias in a robust way, we partnered with researchers at Cornell University, Columbia University, and the University of Michigan to build and use a framework for fairness evaluation, monitoring, and mitigation. Specifically, we looked at how different underlying foundation models behave as part of the LiveHint AI system when interacting with “simulated” students. Simulated student prompts were varied in terms of explicit or implicit student identifiers (such as adding “I am Spanish,” switching the dialect from American English to Indian English, or adding typos and slang to a baseline prompt). We noticed that different models responded differently. For example, Claude 3.5 Sonnet was more likely to switch to a different language (e.g., responding in Spanish when a prompt included “I am Spanish,” as in the example above), and to add extra explanations and use simpler language when the simulated prompt had typos or used slang. More details about this study will be published in the proceedings of the International Conference on AI in Education in Italy in July 2025.
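The identity-perturbation setup described above can be sketched as generating prompt variants that differ only in identity markers, then comparing the tutor’s responses across variants. The baseline prompt and perturbations below are hypothetical examples, not the study’s actual materials.

```python
# Hypothetical baseline prompt for a simulated student.
BASELINE = "Can you help me solve 2x + 3 = 11?"


def perturbations(prompt: str) -> dict:
    """Generate identity-marked variants of a baseline prompt.

    Each variant changes only an explicit or implicit identifier, so
    any systematic difference in the tutor's responses across variants
    can be attributed to that marker.
    """
    return {
        "baseline": prompt,
        "explicit_identity": "I am Spanish. " + prompt,
        "typos_and_slang": prompt.replace("help me solve", "helpp me slove"),
    }
```

A fairness probe would then send every variant through the same tutoring pipeline and compare response properties (language, length, reading level) between each variant and the baseline.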

Designing with Maximum Flexibility

While there is more work to do to understand and mitigate bias, we continue to develop and use principled frameworks to choose and monitor new models as they become available and build guardrails around them to reach desired behavior. LiveHint AI is built so that it is possible to use a different model for every small piece of the overall system, giving us maximum flexibility when choosing models.
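The per-component flexibility described above amounts to a routing layer between pipeline stages and models. The component and model names below are placeholders; the point is only that each stage resolves its model through configuration rather than hard-coding.

```python
# Placeholder component-to-model routing table; names are illustrative.
MODEL_ROUTES = {
    "hint_generation": "model-a",
    "toxicity_check": "model-b",
    "plot_description": "model-c",
}


def model_for(component: str) -> str:
    """Resolve which model a pipeline component should call.

    Swapping a model for one stage is a one-line config change and
    leaves every other stage untouched.
    """
    try:
        return MODEL_ROUTES[component]
    except KeyError:
        raise ValueError(f"No model configured for component: {component}")
```

This indirection is what makes it cheap to re-evaluate a new model on a single stage before rolling it out more broadly.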

Some design decisions may be perceived differently by different individuals. Therefore, it might be best to leave such decisions up to the student or their teachers when possible, such as allowing students to choose a different language if they prefer, rather than switching to it automatically. Ultimately, granting students and teachers the power to make these decisions serves the same goals we strive to design toward: helpfulness, consistency, and above all else, safety and fairness.

Learn more about ways to evaluate the tools you consider through Digital Promise product certifications. Sign up for our newsletter and follow Digital Promise on Instagram, Facebook, and LinkedIn to stay updated on edtech and emerging technologies.