How to Evaluate the Risk of Generative AI Tools Like ChatGPT

Many organizations don’t know how their employees are using AI tools, and worse, they don’t know much about the tools themselves. Here’s what to look for.

You can’t be too careful when it comes to generative artificial intelligence (AI), according to a June 2023 report from MIT Sloan Management Review and Boston Consulting Group (BCG). Enterprise users of generative AI tools such as ChatGPT, DALL-E and Midjourney aren’t doing enough to protect themselves from the risks.

The report, Building Robust RAI [responsible AI] Programs as Third-Party AI Tools Proliferate, reveals that 78% of organizations access, buy, license, or otherwise use third-party generative AI tools. Some organizations are not even aware of all the ways employees are using generative AI tools, a phenomenon termed “shadow AI.”

In light or shadow, generative AI tools pose real risks. Using these tools improperly can result in reputational damage, regulatory penalties, financial loss and litigation, according to the report, which emphasizes that “outsourcing AI from third parties doesn’t inoculate organizations from these hazards.” The report found third-party tools account for more than half of all AI-related failures.

So what can organizations do to guard against these risks? Start by putting those third-party tools to the test, and doing it properly.

How to evaluate generative AI tools responsibly

As generative AI has exploded in popularity, so too has the number of third-party tools that incorporate it. There are now countless AI apps for generating text, images, code, sound, presentations and much more, and enterprise users, including engineers, are finding creative ways to use these tools as AI copilots.

But how many are taking the time to ensure the tools are safe? Not enough, according to the report, which reveals that 20% of companies using third-party generative AI tools do not evaluate their risks at all.

Even those that do evaluate the risks may be stopping short. One of the report’s key findings is that organizations should use several different methods to evaluate third-party generative AI tools, and the more the better. The report found that organizations using seven evaluation methods were more than twice as likely to uncover AI failures as organizations using only three (51% versus 24%).

There are several approaches to evaluating generative AI tools, according to the report, which suggests that organizations:

  • evaluate a vendor’s responsible AI practices
  • require contractual language mandating adherence to RAI principles
  • rely on vendor pre-certification and audits, where available
  • conduct internal product-level reviews (for cases in which third-party tools are integrated into another product or service)
  • check adherence to regulatory requirements and industry standards

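For teams that want to make that coverage visible, one lightweight option is to track which of these methods have been applied to each third-party tool. The Python sketch below is a hypothetical illustration, not something from the report: the tool names, the minimum-coverage threshold and the helper function are all assumptions. It simply flags tools that have been checked against too few of the methods listed above, echoing the report’s finding that using more evaluation methods uncovers more failures.

```python
# Hypothetical sketch: track how many of the report's suggested evaluation
# methods have been applied to each third-party generative AI tool.
# All names, data and thresholds here are illustrative assumptions.

EVALUATION_METHODS = {
    "vendor responsible AI practices",
    "contractual RAI language",
    "vendor pre-certification or audit",
    "internal product-level review",
    "regulatory and industry-standard compliance",
}

# Illustrative record of which methods have been applied to which tool.
evaluations = {
    "text-generation copilot": {
        "vendor responsible AI practices",
        "contractual RAI language",
        "internal product-level review",
    },
    "image-generation plugin": {
        "vendor responsible AI practices",
    },
}

MIN_METHODS = 3  # arbitrary threshold for this sketch; the report only says more is better


def under_evaluated(evals: dict[str, set[str]], minimum: int = MIN_METHODS) -> list[str]:
    """Return the tools checked with fewer than `minimum` recognized methods."""
    return [
        tool
        for tool, methods in evals.items()
        if len(methods & EVALUATION_METHODS) < minimum
    ]


if __name__ == "__main__":
    for tool in under_evaluated(evaluations):
        applied = len(evaluations[tool] & EVALUATION_METHODS)
        print(f"{tool}: only {applied} of {len(EVALUATION_METHODS)} evaluation methods applied")
```

In practice a spreadsheet or a governance platform could fill the same role; the point is simply to make under-evaluated tools easy to spot.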
Overall, the report emphasizes that third-party generative AI tools should be evaluated in the same way as an organization’s internal AI tools and in line with its overall RAI strategy.

“To effectively address the risks associated with third-party AI tools, RAI programs should include a comprehensive set of policies and procedures, such as guidelines for ethical AI development, risk assessment frameworks, and monitoring and auditing protocols,” Oarabile Mudongo, a policy specialist at the African Observatory on Responsible AI, says in the report.

For more help implementing AI responsibly, read Why You Need a Generative AI Policy.

Written by

Michael Alba

Michael is a senior editor at engineering.com. He covers computer hardware, design software, electronics, and more. Michael holds a degree in Engineering Physics from the University of Alberta.