How can engineers reduce AI model hallucinations – part 2

More best practices engineers can use to significantly reduce model hallucinations.

Many engineers have adopted generative AI at a record pace as part of their organization’s digital transformation. They like its tangible business benefits, the breadth of its applications, and often its ease of implementation.

Hallucinations can significantly undermine end-user trust. They arise from various factors, including:

  • Patchy, insufficient or false training data, which leads the Large Language Model (LLM or model) to fabricate information when it’s unsure of the correct answer.
  • A model that lacks the grounding and context needed to detect factual inaccuracies.
  • Excessive model complexity for the application.
  • Inadequate software testing.
  • Poorly crafted, imprecise or vague end-user prompts.

Organizations can mitigate the risk and frequency of these hallucinations, and avoid embarrassing the company and misleading its customers, by adopting multiple strategies, including:


  • Clear model goal.
  • Balanced training data.
  • Accurate training data.
  • Adversarial fortification.
  • Sufficient model tuning.
  • Limit responses.
  • Comprehensive model testing.
  • Precision prompts.
  • Fact-check outputs.
  • Human oversight.

Let’s explore the last five of these mitigations in more detail. The first five are covered in part 1 of this article.

Limit responses

Models produce hallucinations more often when they lack constraints that limit the scope of possible outputs. To improve the overall accuracy of outputs and reduce the risk of hallucinations, define boundaries for models using filtering tools, maximum response lengths and clear probabilistic thresholds for accepting outputs.

For example, when the model cannot assign a sufficient confidence level to a proposed recommendation about optimizing a production process, it should not provide that output to an engineer.
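
The short Python sketch below illustrates one way such a gate might work. The confidence value, the 0.75 threshold and the word limit are illustrative assumptions to be tuned per application, not features of any particular model or product.

  # Sketch: gate model recommendations behind a confidence threshold and a
  # length limit. The confidence value is assumed to come from the serving
  # layer (e.g. an average token probability or a calibrated score).

  CONFIDENCE_THRESHOLD = 0.75  # illustrative cutoff; tune per application
  MAX_WORDS = 200              # illustrative response-length limit

  def gate_response(text: str, confidence: float) -> str:
      """Return the model's text only if it clears the configured limits."""
      if confidence < CONFIDENCE_THRESHOLD:
          return ("No recommendation provided: the model's confidence "
                  f"({confidence:.2f}) is below the acceptance threshold.")
      words = text.split()
      if len(words) > MAX_WORDS:
          text = " ".join(words[:MAX_WORDS]) + " ..."
      return text

  # A low-confidence process-optimization suggestion is withheld.
  print(gate_response("Increase furnace temperature by 40 C.", confidence=0.62))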

Comprehensive model testing

Inadequately tested models produce more hallucinations than comprehensively tested models.

Testing typically detects hallucinations by cross-referencing model-generated output with other trusted and authoritative sources.
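
As a minimal illustration of that cross-referencing, the Python sketch below compares model answers against a small set of trusted reference answers and flags weak matches. The questions, reference answers, similarity heuristic and the ask_model() placeholder are all assumptions for illustration; a real test suite would be far larger and use a stronger comparison method.

  from difflib import SequenceMatcher

  # Trusted reference answers (illustrative); a real suite would be much larger.
  REFERENCE_ANSWERS = {
      "What is the melting point of pure aluminum?": "about 660 degrees Celsius",
      "Which ASME code section covers pressure vessel design?": "ASME BPVC Section VIII",
  }

  def ask_model(question: str) -> str:
      # Placeholder for a call to the model under test.
      return "about 660 degrees Celsius"

  def similarity(a: str, b: str) -> float:
      return SequenceMatcher(None, a.lower(), b.lower()).ratio()

  def run_hallucination_checks(threshold: float = 0.8) -> None:
      for question, expected in REFERENCE_ANSWERS.items():
          answer = ask_model(question)
          score = similarity(answer, expected)
          status = "PASS" if score >= threshold else "CHECK: possible hallucination"
          print(f"{status}: {question} -> {answer!r} (similarity {score:.2f})")

  run_hallucination_checks()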

Rigorous testing before production use is easy to recommend and is vital to preventing, or at least dramatically reducing, hallucinations. However, software development teams are always under schedule pressure, and testing is the easiest task to shortchange because it occurs near the end of the project.

To counter that pressure, project managers must assertively remind management of the costs and reputational risks of releasing inadequately tested models for production use.

Precision prompts

Ambiguity or lack of specificity in prompts can result in the model generating hallucinations or output that doesn’t align with the end-user’s intent. Such output decreases confidence in the model and can lead to misinterpretation or misinformation.

Asking the right question is essential to achieve superior outputs from models. Accurate, relevant outputs depend on the clarity and specificity of engineers’ prompts. Precision prompts that reduce hallucinations exhibit these features:

  • Maximize clarity and specificity by writing prompts that are as short and precise as possible.
  • Provide context such as time, location or unique identifiers to narrow the scope.
  • Use descriptive language by specifying relevant characteristics such as profession, discipline, industry or geographic region.
  • Plan an iterative approach by refining successive prompts based on previous outputs.
  • Minimize the risk of biased outputs by ensuring fairness and inclusivity.

For example, write a specific prompt like “How is consistency achieved in stamping steel automotive wheels?” Avoid a general prompt like “How is quality achieved in manufacturing wheels?”
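
The snippet below contrasts the two prompts and shows how audience, scope and a length limit can be folded into the precise version; the added wording is illustrative, not a prescribed template.

  # A vague prompt and a precision prompt for the same engineering question.
  vague_prompt = "How is quality achieved in manufacturing wheels?"

  precise_prompt = (
      "Answer for a manufacturing engineer in the automotive industry. "
      "In no more than 300 words, explain how dimensional consistency is "
      "achieved when stamping steel automotive wheels, covering die "
      "maintenance, press settings and in-line inspection."
  )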

Fact-check outputs

Sometimes engineers fail to recognize hallucinations and use them in their work, with dangerous or expensive consequences.

Engineers can reduce this hallucination risk by:

  • Fact-checking the output against other sources.
  • Asking the model to describe its reasoning and data sources.
  • Checking if the output is logically consistent and aligns with general world knowledge.
  • Writing a slightly different prompt to see if the model produces the same output.

For example, a prompt about a chemical additive should not produce output about a closely related but materially different chemical.
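
The Python sketch below illustrates the last of these checks: ask the model the same question phrased two ways and flag disagreement between the answers. The ask_model() placeholder, the example prompts and the simple similarity heuristic are assumptions for illustration only.

  from difflib import SequenceMatcher

  def ask_model(prompt: str) -> str:
      # Placeholder for a call to the model being checked.
      return "Add 0.5% sodium benzoate as the corrosion inhibitor."

  def answers_agree(a: str, b: str, threshold: float = 0.8) -> bool:
      return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

  # The same question phrased two ways (illustrative prompts).
  prompts = [
      "Which additive inhibits corrosion in this coolant formulation?",
      "What corrosion inhibitor should be used in this coolant formulation?",
  ]
  first, second = (ask_model(p) for p in prompts)
  if answers_agree(first, second):
      print("Answers are consistent; still verify them against a trusted source.")
  else:
      print("Answers differ; treat the output as unverified and fact-check it.")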

Human oversight

Once a model is in routine production use, it’s tempting for engineers to move on to address the next AI opportunity. However, not monitoring the performance of your AI application means you have no sense of the:

  • Number of hallucinations it’s producing.
  • Need to adjust or retrain the model as data ages and evolves.
  • Evolving end-user requirements that need to be addressed through model enhancements.

A better practice is to assign an analyst to regularly sample model outputs to validate their accuracy and relevance. Analysts can spot hallucinations that suggest model refinement is necessary.

For example, a model designed to support problem diagnosis for complex production machinery may occasionally provide an inaccurate investigation recommendation.
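
A lightweight way to set up that sampling is sketched below in Python; the log format and the five per cent sampling rate are illustrative assumptions, not a recommended standard.

  import random

  SAMPLE_RATE = 0.05  # review roughly five per cent of outputs (illustrative)

  def select_for_review(output_log: list) -> list:
      """Return a random sample of logged outputs for analyst validation."""
      sample_size = max(1, int(len(output_log) * SAMPLE_RATE))
      return random.sample(output_log, k=sample_size)

  # Illustrative log entries; in practice these come from production telemetry.
  log = [
      {"id": i, "prompt": f"Diagnose fault code {i}", "response": "..."}
      for i in range(200)
  ]
  for entry in select_for_review(log):
      print(f"Review output {entry['id']}: {entry['prompt']}")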

By implementing these best practices, engineers can significantly reduce model hallucinations and build confidence in the reliability of model outputs to advance their digital transformation.

Written by

Yogi Schulz

Yogi Schulz has over 40 years of Information Technology experience in various industries. He writes for ITWorldCanada and other trade publications. Yogi works extensively in the petroleum industry to select and implement financial, production revenue accounting, land & contracts, and geotechnical systems. He manages projects that arise from changes in business requirements, from the need to leverage technology opportunities and from mergers. His specialties include IT strategy, web strategy, and systems project management.