How Three Organizations Are Using GenAI to Advance Equity—and Combating Bias within It

October 8, 2024 | By Dr. Debshila Basu Mallick, Dr. Jennifer de Forest, Chris Shaw and Dr. Rachel Burstein

Given the rapid advances in AI and the momentum in the education field to understand how these technologies can support teaching and learning, last year the Gates Foundation launched a pilot initiative to provide funding to test new AI ideas that are in support of equitable K-12 mathematics outcomes. This is the fourth in a series of five blog posts elevating key learnings from this set of investments. This series will culminate in the release of a Digital Promise report on the cohort’s work in October. Today’s blog explores how three grantees centered equity and reduced bias in education-based AI. Check out last week’s post here.

When product developers at UPchieve, a nonprofit that connects students with free math tutoring powered by human volunteers, began evaluating AI tools to power a system for forecasting student performance, they expected to find serious problems with bias. Instead, their product team was pleasantly surprised. “Our evaluations of different AI tools showed that AI model providers were basically doing their jobs,” said Chris Shaw, head of product at UPchieve. “Contrary to some of the headlines we were seeing, we couldn’t find any blatant racism or any type of inequities in what the models were putting out.”

Yet Shaw and his colleagues quickly found that GenAI tools had more subtle and complicated biases—and that there was a need to define such instances as bias. “We found failures in the model that aren’t typically associated with equity, but that have implications for educational equity,” Shaw says.

For example, Asian students tended to receive lower scores on the assessment than other students at the same competence level. Shaw and his colleagues suggest that the discrepancy could arise because the GenAI model tended to underrate students who expressed less confidence in their abilities. Shaw explains, “If the student says they are bad at math, the model has a tendency to believe them.” This can disadvantage students whose families or cultures place a strong value on humility.

In another example of bias, the model tended to give high scores to students. “The model shrunk everyone into a ‘normal’ category,” Shaw explains. This made it difficult for the model to pinpoint students who would benefit from extra help or students who needed to be challenged.
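
A minimal sketch of how both patterns might be checked, under stated assumptions: the records, field names, and scores below are invented for illustration and are not UPchieve's data or method. The first function compares average model scores across groups among students at the same competence level. The second compares the spread of model scores with the spread of a reference measure; a ratio well below 1 would suggest the model is compressing students toward the middle.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (group, competence_level, reference_score, model_score).
# All values are invented for illustration only.
RECORDS = [
    ("A", "high", 90, 70),
    ("A", "high", 88, 68),
    ("B", "high", 90, 80),
    ("B", "high", 88, 78),
    ("A", "low", 40, 55),
    ("B", "low", 42, 60),
]


def mean_score_by_group(records, level):
    """Average model score per group, restricted to one competence level."""
    scores = defaultdict(list)
    for group, competence, _reference, model in records:
        if competence == level:
            scores[group].append(model)
    return {group: mean(values) for group, values in scores.items()}


def spread_ratio(records):
    """Std. dev. of model scores divided by std. dev. of reference scores."""
    reference = [record[2] for record in records]
    model = [record[3] for record in records]
    return pstdev(model) / pstdev(reference)


high = mean_score_by_group(RECORDS, "high")
gap = high["B"] - high["A"]  # score gap between groups at equal competence
ratio = spread_ratio(RECORDS)
print(f"High-competence gap (B minus A): {gap}")
print(f"Spread ratio (model / reference): {ratio:.2f}")
```

Holding competence level fixed before comparing groups matters here: a raw difference in average scores could reflect real differences in preparation, while a gap among equally competent students points to the model itself.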

As leaders at OpenStax, YouthTruth, and UPchieve, we are excited about the possibilities for GenAI in education. But we are also mindful of subtle biases that can exist in GenAI tools. As members of the Bill & Melinda Gates Foundation’s K-12 AI Pilot Cohort, we developed approaches for reducing bias in AI models used in education, and sought to leverage AI as a tool for advancing equity in educational systems.

Recognizing When to Use AI

In our work on our pilot projects, we found that the current crop of GenAI tools can help promote equity when used for tasks that are highly structured and where the dataset is so large that human review is impossible. For example, YouthTruth initially used GenAI to analyze open-ended responses in 83,000 student surveys about math belonging. Encouraged by the quality and speed of the analysis, YouthTruth staff expanded their use of GenAI to analyze open-ended responses alongside students’ quantitative responses.

This process made it easy for YouthTruth and its partners to understand how students of different demographic groups and with different levels of involvement in their communities feel about math. YouthTruth’s pilot project suggests that GenAI is a valuable tool for advancing equity by elevating student voices. In addition, the speed of GenAI may allow school districts without an institutional researcher to make sense of large volumes of data, leveling the playing field between districts of varying resources.
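
As a rough illustration of this kind of analysis, the sketch below assumes that a GenAI step has already assigned one theme to each open-ended response. The group names, themes, and ratings are hypothetical and do not come from YouthTruth's surveys. It tallies how often each theme appears within each demographic group and pairs each theme with the average quantitative rating given by the same students.

```python
from collections import Counter, defaultdict

# Hypothetical coded survey rows: (demographic_group, theme, belonging_rating).
# Themes are assumed to come from an earlier GenAI coding step; ratings are 1-5.
ROWS = [
    ("group_1", "teacher support", 4),
    ("group_1", "teacher support", 5),
    ("group_1", "pace too fast", 2),
    ("group_2", "pace too fast", 2),
    ("group_2", "pace too fast", 1),
    ("group_2", "teacher support", 4),
]


def theme_share_by_group(rows):
    """Fraction of each group's responses that fall under each theme."""
    counts = defaultdict(Counter)
    for group, theme, _rating in rows:
        counts[group][theme] += 1
    return {
        group: {theme: n / sum(themes.values()) for theme, n in themes.items()}
        for group, themes in counts.items()
    }


def mean_rating_by_theme(rows):
    """Average quantitative rating among students who raised each theme."""
    ratings = defaultdict(list)
    for _group, theme, rating in rows:
        ratings[theme].append(rating)
    return {theme: sum(values) / len(values) for theme, values in ratings.items()}


shares = theme_share_by_group(ROWS)
ratings = mean_rating_by_theme(ROWS)
for group, themes in shares.items():
    print(group, {theme: round(share, 2) for theme, share in themes.items()})
print({theme: round(value, 2) for theme, value in ratings.items()})
```

A district without a dedicated researcher could read a table like this directly: which concerns are concentrated in which groups, and whether those concerns coincide with lower ratings.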

The OpenStax team focused on understanding the value that GenAI brought to teachers seeking resources in its free library of high-quality educational content. Researchers highlighted the responses that were most effective in directing teachers to high-quality math resources that matched their queries. In another project, OpenStax engaged subject matter experts to evaluate the outputs of open source large language models for bias, safety, and other considerations. This initial human review was necessary for choosing a model to power OpenStax’s project and reduce bias in its implementation. It will be impossible for humans to review every single GenAI output as the project scales, but this initial review gave OpenStax researchers confidence that the model served its instructional use case.

Consulting Users

Listening to users is critical to reducing the bias of tools and advancing equity in educational systems. UPchieve developed a bias mitigation framework to benchmark its systems against the attributes of its human users. These benchmarks examined a variety of factors, including whether the model displays the same kind of supportiveness that is expected from its volunteer math tutors. UPchieve then used its benchmarks to improve its prompts for the GenAI system. In addition, UPchieve implemented a student-facing rating system as a way for students to provide ongoing direct feedback on the outputs of the model. UPchieve cannot correct biases within the data that a GenAI system is trained on, but it can prioritize its organizational goals and those of the students and tutors who participate in its programs to arrive at prompts that generate less biased results.

Researchers at OpenStax developed two prototypes for finding high-quality, free teaching materials within OpenStax’s content library. One prototype used GenAI to identify resources based on a lesson plan, problem set, academic standard, and other materials that a teacher submitted. The second prototype used existing search technology instead of GenAI to guide teachers to high-quality resources. Subsequently, OpenStax created a space for teachers to discuss the benefits and drawbacks of the AI and non-AI approaches.

The OpenStax R&D team found that teachers appreciated the opportunity to discuss whether the GenAI or status quo worked best for their needs. Given the time and other resource constraints that teachers have to balance every day, OpenStax found that it was important to narrow the focus of their prototype to make searches straightforward and speedy, instead of expecting teachers to become prompt engineers. By prioritizing teachers’ time and ensuring that all teachers can engage effectively with the system, OpenStax has aimed to make the search tool effective for all users.

Our experiences as members of the Bill & Melinda Gates Foundation’s AI Pilot Cohort made us hopeful about AI’s potential as a tool for advancing equity in educational systems, but only if the use of GenAI is accompanied by thoughtful efforts to identify and minimize bias. Listening to users of the technological systems that we build is a critical first step in that work.

Be on the lookout for Digital Promise’s full report on the AI Pilot Cohort, to be published in October.
