Vertex AI Search vs Text-to-SQL: A Comparative Study for Natural Language Search

February 15, 2025·4 min read
Architecture · AI · GCP · BigQuery

When working with large volumes of corporate data, one of the most common requirements is letting non-technical users query it in natural language. But what is the best way to implement this?

In this article, I share the results of a technical study I conducted evaluating three different approaches for natural language search over data stored in BigQuery.

The Problem

The scenario was clear: we had millions of records in BigQuery (test reports, Change Requests, KPIs), and stakeholders needed to query this data without writing SQL. The central question: how do we turn natural language questions into accurate, reliable results?

The Three Evaluated Approaches

1. Vertex AI Search

Vertex AI Search is Google's managed solution for semantic search. It indexes data and allows natural language searches with context understanding.

Pros:

  • Relatively simple setup — just connect the data source
  • Native semantic understanding, without the need for complex prompt engineering
  • Direct integration with data in BigQuery and Cloud Storage

Cons:

  • High cost for large volumes of data
  • Less control over ranking and relevance logic
  • Latency can be high for complex queries

2. BigQuery ML with Embeddings

In this approach, we use BigQuery ML to generate embeddings of the data and the user's query, comparing them by vector similarity.
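The core of this approach can be sketched in a few lines, assuming the embeddings have already been generated (for example with BigQuery ML's embedding functions) and are available alongside each record. The record names and 3-dimensional vectors below are toy illustrations, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_records(query_embedding, records):
    """Rank (record_id, embedding) pairs by similarity to the query embedding."""
    scored = [(rid, cosine_similarity(query_embedding, emb)) for rid, emb in records]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy 3-dimensional embeddings; real ones come from an embedding model.
records = [("report-1", [0.9, 0.1, 0.0]),
           ("report-2", [0.0, 1.0, 0.2]),
           ("cr-7",     [0.8, 0.2, 0.1])]
best = rank_records([1.0, 0.0, 0.0], records)[0]
print(best[0])  # → report-1
```

In production, the same ranking would run inside BigQuery itself over a vector column, so the data never leaves the warehouse.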

Pros:

  • Data stays in BigQuery, no need to move it to another service
  • Full control over the model and embeddings
  • Predictable cost (based on query processing)

Cons:

  • Requires data engineering to generate and maintain embeddings
  • Quality depends directly on the chosen embedding model
  • Significantly higher implementation complexity

3. Text-to-SQL with LLM

The most direct approach: using an LLM (like Gemini) to convert the user's question directly into a SQL query.
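The essential moving parts are a prompt that carries the schema and a call to the model. A minimal sketch follows; the table definition is illustrative, and `llm` stands in for whatever client you use (a Gemini call, for instance) — it is not a real SDK function:

```python
# Illustrative schema; a real system would inject the actual table definitions.
SCHEMA = "Table test_reports(id INT64, status STRING, executed_at TIMESTAMP)"

PROMPT_TEMPLATE = """You are a SQL assistant for BigQuery.
Schema:
{schema}

Question: {question}
Return only a single SELECT statement."""

def build_prompt(question: str) -> str:
    """Assemble the prompt with the schema injected."""
    return PROMPT_TEMPLATE.format(schema=SCHEMA, question=question)

def generate_sql(question: str, llm) -> str:
    """`llm` is any callable mapping a prompt string to model text (placeholder)."""
    return llm(build_prompt(question)).strip()
```

Everything interesting in this approach lives in the prompt: the schema, the examples, and the constraints you impose on the output.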

Pros:

  • Maximum flexibility — any question can become any query
  • Accurate result when the generated SQL is correct
  • User experience closer to a "conversation"

Cons:

  • Risk of hallucination: the LLM may generate invalid or semantically incorrect SQL
  • Requires detailed prompt engineering (schema, examples, validations)
  • Need for security layers to prevent destructive queries

Comparative Results

| Criterion        | Vertex AI Search | BigQuery ML | Text-to-SQL |
| ---------------- | ---------------- | ----------- | ----------- |
| Accuracy         | High             | Medium-High | Variable    |
| Cost             | High             | Medium      | Low-Medium  |
| Setup Complexity | Low              | High        | Medium      |
| Maintenance      | Low              | High        | Medium      |
| Flexibility      | Medium           | High        | Very High   |
| Latency          | Medium           | Low         | Medium      |

The Architectural Decision

We opted for a hybrid approach: Text-to-SQL as the main layer (for flexibility and cost), with a validation mechanism that checks the generated SQL against the real schema before execution.

To mitigate hallucinations, we implemented:

  • Schema injection in the prompt: the LLM receives the full schema of the relevant tables
  • Few-shot examples: examples of questions and expected SQLs to guide the model
  • Syntactic validation: the generated SQL is parsed before being executed
  • Fallback to simple search: if the SQL fails validation, we offer a keyword search
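The validation and fallback steps can be sketched with cheap pre-execution checks. The table names and helper callables (`run_sql`, `keyword_search`) are placeholders for the real execution and fallback paths; a production system would use a proper SQL parser rather than regexes:

```python
import re

ALLOWED_TABLES = {"test_reports", "change_requests", "kpis"}  # illustrative names
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|MERGE|CREATE)\b", re.I
)

def validate_sql(sql: str) -> bool:
    """Pre-execution checks: SELECT-only, no DML/DDL, known tables only."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        return False
    if FORBIDDEN.search(stripped):
        return False
    tables = set(re.findall(r"\bFROM\s+([A-Za-z_]\w*)", stripped, re.I))
    return bool(tables) and tables <= ALLOWED_TABLES

def answer(question, generated_sql, run_sql, keyword_search):
    """Execute validated SQL; otherwise fall back to keyword search."""
    if validate_sql(generated_sql):
        return run_sql(generated_sql)
    return keyword_search(question)
```

The point is that the generated SQL is never trusted blindly: anything that fails the checks degrades gracefully to the keyword search instead of reaching BigQuery.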

Lessons Learned

  1. There is no silver bullet. Each approach has its place. The important thing is to understand the trade-offs and choose based on business requirements.

  2. Cost is an architectural variable. Vertex AI Search solves the problem elegantly, but the cost at scale can be prohibitive. Text-to-SQL is cheaper but requires investment in prompt engineering.

  3. Caching is indispensable. Regardless of the approach, implementing semantic caching (similar queries return cached results) reduced costs by more than 60%.

  4. Validation is as important as generation. In systems based on generative AI, the validation and security layer is what separates a prototype from a product.
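The semantic caching idea from lesson 3 can be sketched as a cache keyed by query embeddings: a lookup hits when a stored query's embedding is close enough to the new one. The `embed` callable is a placeholder for a real embedding model, and the threshold is illustrative:

```python
import math

class SemanticCache:
    """Return cached results for queries whose embeddings are close enough."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # callable: text -> embedding vector (placeholder)
        self.threshold = threshold
        self.entries = []           # list of (embedding, cached_result)

    @staticmethod
    def _cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def get(self, query):
        """Return a cached result for a semantically similar query, or None."""
        qe = self.embed(query)
        for emb, result in self.entries:
            if self._cos(qe, emb) >= self.threshold:
                return result
        return None

    def put(self, query, result):
        self.entries.append((self.embed(query), result))
```

Two differently worded questions with the same meaning then share one cached answer, which is where the cost reduction comes from.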


This study was conducted within the context of a large-scale internal system. Numbers and trade-offs may vary depending on data volume and specific requirements of your case.