When AI Gets It Wrong: What We Learned About Intent and Structure


From brittle regex to contract-first prompting, and everything in between

Abstract

Large Language Models are amazing at generating text, but when it comes to answering questions about your company’s data, they are often lost without a translator. I have been working on building that translator: an “intent classifier” that can understand a user’s request and send it to the right place. Spoiler: it is harder than it sounds.


The Core Problem

When you build an AI chat system on top of real business data, the big challenge is not just getting the answer; it is making sure the AI understands the question well enough to know where to look.

We are leveraging Amplify Gen2, Bedrock, OpenSearch, and other AWS services to process and access our data. In our current system, we have multiple data types: clients, operational entities, and activity records. Each serves a different purpose, but they are related in ways that make query routing tricky. And more data types are coming. Each has its own OpenSearch index, populated from Amazon DynamoDB using AWS zero-ETL pipelines. Sounds neat in theory.

The problem is figuring out which indices to search, in which order, and how to form the query based on what a human types in a chat.
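To make that routing problem concrete, here is a rough sketch of what the “translator” ultimately has to produce: a small plan that names the indices, the order, and the query style. The type names, index names, and intents below are illustrative stand-ins, not our actual schema.

```ts
// A sketch of the routing output, with illustrative (not real) index names.
type TargetIndex = "clients" | "operational-entities" | "activity-records";

interface SearchStep {
  index: TargetIndex;                        // which index this step searches
  queryType: "term" | "multi_match" | "knn"; // how the query body is built
  // Values produced by an earlier step that feed this one, e.g. entity IDs
  // found in step 1 become a filter in step 2.
  inputsFromPreviousStep?: string[];
}

interface SearchPlan {
  intent: string;              // e.g. "records_by_location"
  steps: SearchStep[];         // executed in order
  clarifyingQuestion?: string; // ask the user instead of searching
}
```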

Why It Is Hard: Real Examples

A few real queries show how quickly this gets complicated:

  1. “How many records do you see in location <number>?”
    The AI needs to know this is an operational entity search, not just a search for where that number appears anywhere in a record.
  2. “How many records do you see in <city>?”
    The AI has to recognize <city> as a location property, find all operational entities in that city (possibly across multiple states), and then run a follow-up search for records tied to those entities (a sketch of this two-step search follows the list).
  3. “How many related records do you see based on <recordId>?”
    This could mean:
    • Using a field populated during ingest that directly links related records.
    • Running a semantic similarity search to find records like the given record.
      The AI might need to clarify which one the user wants, or try both.

These all might require clarification from the user and multiple turns in the search process. And when you are building an AI that is supposed to handle these seamlessly, each case you “fix” can unexpectedly break others.

Approaches We Have Tried

  • Regular expression-based intent detection – Works for some queries, but it is brittle. One slight wording change and it breaks.
  • Multi-match OpenSearch queries – Let OpenSearch guess. Great when it works, but unpredictable. OpenSearch does have additional tools we may try in the future.
  • A middle-layer classifier (Claude 3.5 Haiku) – Similar to Anthropic’s “ticket routing” example, this classifier reads the user’s question, determines which type of request it is, and decides how to handle it, sometimes asking clarifying questions (a sketch of the pattern follows this list). Promising, but fixing one broken case often breaks others.
  • Schema-aware prompts – Giving the AI a map of what fields exist. Helps… sometimes.
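For the middle-layer classifier, the shape of the idea is roughly the following. This is a simplified sketch, not our production prompt: the intent names are made up for illustration, and depending on your region the model may have to be called through an inference profile ID rather than the bare model ID.

```ts
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({});

// Illustrative intent list; the real one is longer and schema-aware.
const SYSTEM_PROMPT = `You are a query router. Classify the user's question as
exactly one of: entity_lookup, records_by_location, related_records, unknown.
If the question is ambiguous, use "unknown" and supply a clarifying question.
Respond with JSON only: {"intent": "...", "clarify": "..." or null}`;

async function classifyIntent(userQuestion: string) {
  const response = await bedrock.send(
    new ConverseCommand({
      // May need a regional inference profile ID in some accounts/regions.
      modelId: "anthropic.claude-3-5-haiku-20241022-v1:0",
      system: [{ text: SYSTEM_PROMPT }],
      messages: [{ role: "user", content: [{ text: userQuestion }] }],
      inferenceConfig: { maxTokens: 200, temperature: 0 },
    })
  );
  const text = response.output?.message?.content?.[0]?.text ?? "{}";
  return JSON.parse(text); // e.g. { intent: "records_by_location", clarify: null }
}
```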

A More Promising Direction: Contract‑First Prompting

Recently, we have started experimenting with a technique called “contract‑first prompting”, a method described by Nate Jones. It is a way to define a strict “contract” for how the AI should respond before it ever sees a user request. Instead of letting the model guess what format or structure we want, we lock down the rules and have it fill in the blanks.

This forces consistency and predictability in how the AI interprets intent. We modified this a bit to fit our current architecture. It is early, but this approach already feels more stable than anything we have tried before. The AI has less room to improvise, which in this case is exactly what we want.
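To make that concrete, here is a stripped-down illustration of the idea as we apply it; the field names, intents, and rules below are stand-ins, not our actual contract.

```ts
// The contract is fixed before the model ever sees the question; the model
// only fills in the blanks. Fields and intents here are illustrative.
const CONTRACT = `Respond with a single JSON object matching this contract exactly:
{
  "intent": one of ["entity_lookup", "records_by_location", "related_records"],
  "targetIndices": array of index names, in the order they should be searched,
  "filters": object mapping field name -> value extracted from the question,
  "clarifyingQuestion": string or null
}
Rules:
- Never add fields, never omit fields, never return prose outside the JSON.
- If a required filter value is missing, set it to null and put a question in
  "clarifyingQuestion" instead of guessing.`;

// The user's question is appended only after the contract is locked down.
function buildPrompt(userQuestion: string): string {
  return `${CONTRACT}\n\nUser question: ${userQuestion}`;
}
```

A nice side effect is that the model’s response can be validated against the contract before any search runs, so a malformed answer fails fast instead of quietly turning into the wrong query.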

Final Thoughts 

The lesson so far is that AI chat needs a translator layer, something that lives between the user and the database, inferring human intent from natural-language queries and reliably turning them into structured queries. Building that translator is as much an engineering problem as it is an AI problem. It can also mean adjusting the prompt that is being used to guide the AI behind the scenes.

With methods like contract‑first prompting, I am finally starting to feel like that translator might speak both languages fluently. But this is still a work in progress, and I know others are tackling similar problems.

If you have been down this road, or are building your own AI translator layer, I would love to hear what has worked and what has not for you. You can reach out through our contact page or connect with me on LinkedIn. Let us compare notes before our AIs guess us into a corner.

Matt Pitts, Sr Architect
