The National Institute of Standards and Technology measures how many photons pass through a chicken. Now, it wants to quantify transparency around algorithms.

The National Institute of Standards and Technology (NIST) is a federal agency best known for measuring things like time or the number of photons that pass through a chicken. Now NIST wants to put a number on a person’s trust in artificial intelligence.

Trust is part of how we judge the potential for danger, and it’s an important factor in the adoption of AI. As AI takes on more and more complicated tasks, officials at NIST say, trust is an essential part of the evolving relationship between people and machines.

In a research paper, the authors of the effort to quantify user trust in AI say they want to help businesses and developers who deploy AI systems make informed decisions and identify areas where people don’t trust AI. NIST views the AI initiative as an extension of its more traditional work establishing trust in measurement systems. Public comment is being accepted until July 30.

Brian Stanton is coauthor of the paper and a NIST clinical psychologist who focuses on AI system trustworthiness. Without trust, Stanton says, adoption of AI will slow or halt. He says many factors may affect a person’s trust in AI, such as their exposure to science fiction or the presence of AI skeptics among friends and family.

NIST is a part of the US Department of Commerce that has grown in prominence in the age of artificial intelligence. Under an executive order by former president Trump, NIST in 2019 released a plan for engaging with private industry to create standards for the use of AI. In January, Congress directed NIST to create a framework for trustworthy AI to guide use of the technology. One problem area: Studies by academics and NIST itself have found that some facial-recognition systems misidentify Asian and Black people up to 100 times more often than white people.

The trust initiative comes amid increased government scrutiny of AI. The Office of Management and Budget has said acceptance and adoption of AI will depend on public trust. Mentions of AI in Congress are increasing, and historic antitrust cases continue against tech giants including Amazon, Facebook, and Google. In April, the Federal Trade Commission told businesses to tell the truth about AI they use and not exaggerate what’s possible. “Hold yourself accountable—or be ready for the FTC to do it for you,” the statement said. 

NIST wants to measure trust in AI in two ways. A user trust potential score is meant to measure things about a person using an AI system, including their age, gender, cultural beliefs, and experience with other AI systems. The second score, the perceived system trustworthiness score, will cover more technical factors such as whether an outdated user interface makes people call AI into doubt. The proposed system score assigns weights to nine characteristics like accuracy and explainability. Factors that play into trusting AI and weights for factors like reliability and security are still being determined.

The NIST paper says expectations around an AI system will reflect its use. For example, a system used by doctors to diagnose disease should be more accurate than one recommending music.
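To make the idea of a weighted system score concrete, here is a minimal sketch of a weighted average over per-characteristic ratings. It is an illustration only, not NIST's formula: the function name `perceived_trustworthiness`, the 0-to-1 rating scale, and every weight and rating below are assumptions invented for this example, and only the four characteristics named in this article are used, since NIST says the weights are still being determined.

```python
# Hypothetical sketch: a weighted average of per-characteristic ratings.
# Names, scale, weights, and ratings are placeholders, not NIST values.

def perceived_trustworthiness(ratings, weights):
    """Return the weighted average of ratings, each assumed to lie in 0..1."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same characteristics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ratings[name] * weights[name] for name in ratings)
    return weighted_sum / total_weight


# Made-up ratings a user might give one AI system.
ratings = {
    "accuracy": 0.9,
    "explainability": 0.5,
    "reliability": 0.8,
    "security": 0.7,
}

# Made-up weights: accuracy counts for more in a diagnostic setting
# than in a music recommender, echoing the example in the paper.
medical_weights = {
    "accuracy": 0.5,
    "explainability": 0.2,
    "reliability": 0.2,
    "security": 0.1,
}
music_weights = {
    "accuracy": 0.1,
    "explainability": 0.3,
    "reliability": 0.3,
    "security": 0.3,
}

medical_score = perceived_trustworthiness(ratings, medical_weights)
music_score = perceived_trustworthiness(ratings, music_weights)
print(f"diagnosis context: {medical_score:.2f}")
print(f"music context: {music_score:.2f}")
```

The same ratings produce different scores under the two weightings, which is one way a score could reflect how a system is used.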

Masooda Bashir, a professor in the University of Illinois’ School of Information Sciences, studies how people trust or mistrust autonomous vehicles. She wants to see user trust measurement evolve to the point that people can pick their trust settings the same way they pick a color for a car.

Bashir called the NIST proposal a positive development, but she thinks the user trust score should reflect more factors, including a person’s mood and changing attitude toward AI as they get to know how a system performs. In a 2016 study, Bashir and coauthors found that stress levels can influence people’s attitudes about trust in AI. Those kinds of differences, she said, should help determine the weight given to the factors for trust identified by NIST.

Harvard University assistant professor Himabindu Lakkaraju studies the role trust plays in human decisionmaking in professional settings. She’s working with nearly 200 doctors at hospitals in Massachusetts to understand how trust in AI can change how doctors diagnose a patient.

For common illnesses like the flu, AI isn’t very helpful, since human professionals can recognize them pretty easily. But Lakkaraju found that AI can help doctors diagnose hard-to-identify illnesses like autoimmune diseases. In her latest work, Lakkaraju and coworkers gave doctors records of roughly 2,000 patients and predictions from an AI system, then asked them to predict whether each patient would have a stroke in six months. They varied the information supplied about the AI system, including its accuracy, confidence interval, and an explanation of how the system works. They found doctors’ predictions were the most accurate when they were given the most information about the AI system.

Lakkaraju says she’s happy to see that NIST is trying to quantify trust, but she says the agency should consider the role explanations can play in human trust of AI systems. In the experiment, the accuracy of predicting strokes by doctors went down when doctors were given an explanation without data to inform the decision, implying that an explanation alone can lead people to trust AI too much.

“Explanations can bring about unusually high trust even when it is not warranted, which is a recipe for problems,” she says. “But once you start putting numbers on how good the explanation is, then people’s trust slowly calibrates.”

Other nations are also trying to confront the question of trust in AI. The US is among 40 countries that signed onto AI principles that emphasize trustworthiness. A document signed by about a dozen European countries says trustworthiness and innovation go hand in hand, and can be considered “two sides of the same coin.”

NIST and the OECD, a group of 38 countries with advanced economies, are working on tools to designate AI systems as high or low risk. The Canadian government created an algorithm impact assessment process in 2019 for businesses and government agencies. There, AI falls into four categories—from no impact on people’s lives or the rights of communities to very high risk and perpetuating harm on individuals and communities. Rating an algorithm takes about 30 minutes. The Canadian approach requires that developers notify users for all but the lowest-risk systems.

European Union lawmakers are considering AI regulations that could help define global standards for the kind of AI that’s considered low or high risk and how to regulate the technology. Like Europe’s landmark GDPR privacy law, the EU AI strategy could lead the largest companies in the world that deploy artificial intelligence to change their practices worldwide.

The proposed regulation calls for a public registry of high-risk forms of AI in use, kept in a database managed by the European Commission. Examples of high-risk AI listed in the document include systems used for education, employment, or as safety components for utilities like electricity, gas, or water. The draft will likely be amended before passage, but it calls for a ban on AI for social scoring of citizens by governments and real-time facial recognition.

The EU report also encourages allowing businesses and researchers to experiment in areas called “sandboxes,” designed to make sure the legal framework is “innovation-friendly, future-proof, and resilient to disruption.” Earlier this month, the Biden administration introduced the National Artificial Intelligence Research Resource Task Force aimed at sharing government data for research on issues like health care or autonomous driving. Ultimate plans would require approval from Congress.

For now, the AI user trust score is being developed for AI practitioners. Over time, though, the scores could empower individuals to avoid untrustworthy AI and nudge the marketplace toward deploying robust, tested, trusted systems. Of course, that’s if they know AI is being used at all.
