technical
November 17, 2025 · 13 min read
by Dan Starns

Why Your AI Agent Needs Semantic Tool Discovery

As AI agents gain access to hundreds of tools through MCP servers, traditional tool selection breaks down. Learn how vector search and RAG can cut prompt tokens by nearly 90% without sacrificing tool selection accuracy.


Want to try this yourself? Check out the MCP-RAG repository on GitHub and see semantic tool selection in action with our free, open-source MCP Connect Inspector UI. Follow our step-by-step guide to set it up and watch vector search reduce your token usage by 89%.

The Model Context Protocol has unlocked something remarkable: AI agents can now seamlessly integrate with dozens, even hundreds, of tools. GitHub's MCP server alone exposes over 90 operations. Slack, Linear, databases, APIs—the ecosystem grows daily.

But here's the problem: current AI agents send every single tool definition to the LLM on every request.

When you ask Claude to "create an issue in my repo," it receives descriptions for all available tools, even though it only needs create_issue. That's wasted tokens. Multiply this across conversations, and costs add up while response quality can degrade from information overload.

The Challenge: Too Many Tools

Let's look at a concrete example. When you connect a GitHub MCP server to an AI agent, you expose over 90 tools. Without intelligent filtering, every request includes:

  • Tool names and descriptions
  • Parameter schemas for each tool
  • Return type definitions
  • Required vs optional parameter flags

This information repeats for every single tool in every single request, even when most tools are completely irrelevant to the user's query.
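
To make that overhead concrete, here is what a single tool definition looks like in the shape MCP tools take (name, description, and a JSON Schema for inputs); the specific fields below are illustrative. Multiply a payload like this by 90+ tools on every request:

typescript
const createIssueTool = {
  name: 'create_issue',
  description: 'Create a new issue in a GitHub repository',
  inputSchema: {
    type: 'object',
    properties: {
      owner: { type: 'string', description: 'Repository owner' },
      repo: { type: 'string', description: 'Repository name' },
      title: { type: 'string', description: 'Issue title' },
      body: { type: 'string', description: 'Issue body text' },
    },
    required: ['owner', 'repo', 'title'], // body is optional
  },
};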

How Semantic Tool Discovery Works

Traditional tool selection is binary: include everything or nothing. Semantic discovery uses vector embeddings to match user intent with relevant tools.

Here's the flow:

  1. Index Phase (one-time): Generate embeddings for every tool, parameter, and return type. Store these in a vector database like Neo4j.

  2. Query Phase (every request): Embed the user's query, search for semantically similar tools, and send only the top matches to your LLM.

Instead of sending all available tools when someone asks "create an issue," you send the most semantically relevant ones: create_issue, update_issue, create_issue_comment.
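
As a minimal sketch of the two phases (embed, vectorStore, and toToolMap are hypothetical helpers standing in for your embedding provider and vector database, not MCP-RAG's API):

typescript
// 1. Index phase (one-time): embed and store every tool definition.
for (const tool of allTools) {
  const embedding = await embed(`${tool.name}: ${tool.description}`);
  await vectorStore.upsert({ id: tool.name, embedding, payload: tool });
}

// 2. Query phase (every request): embed the prompt, keep only the top matches.
const queryEmbedding = await embed('create an issue in my repo');
const matches = await vectorStore.search(queryEmbedding, { limit: 10 });

// Only the matched tools are sent to the LLM.
const result = await generateText({
  model,
  prompt: 'create an issue in my repo',
  tools: toToolMap(matches),
});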

The Decomposed Approach

Most tool selection systems embed only tool names and descriptions. MCP-RAG uses a decomposed approach that indexes tools at a granular level:

typescript
// Instead of just:
Tool → Embedding

// We decompose:
Tool → Embedding
  ├─ Parameter 1 → Embedding
  ├─ Parameter 2 → Embedding
  └─ Return Type → Embedding

This granular approach means queries like "add a comment about the bug" will match not just tools with "comment" in their name, but also tools with comment parameters or return types.

In Neo4j, this creates a graph structure:

cypher
(ToolSet)-[:HAS_TOOL]->(Tool)
(Tool)-[:HAS_PARAM]->(Parameter)
(Tool)-[:RETURNS]->(ReturnType)

[Figure: Neo4j graph model showing ToolSet, Tool, Parameter, and ReturnType nodes with their relationships]

Each node (Tool, Parameter, ReturnType) has its own vector embedding, enabling nuanced semantic matching across the entire tool schema.
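
At query time, a search over this structure can use Neo4j's native vector index procedure. A sketch (the index name and embedding property are assumptions, not MCP-RAG's actual identifiers):

cypher
// Find the 10 Tool nodes whose embeddings are closest to the query embedding,
// then pull their parameters along for context.
CALL db.index.vector.queryNodes('tool_embeddings', 10, $queryEmbedding)
YIELD node AS tool, score
OPTIONAL MATCH (tool)-[:HAS_PARAM]->(param:Parameter)
RETURN tool.name AS name, score, collect(param.name) AS parameters
ORDER BY score DESC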

Real-World Results

We benchmarked MCP-RAG against traditional tool selection using a GitHub server scenario with 90+ tools across 5 different queries in a sequential conversation. The tests simulate realistic multi-turn interactions where context accumulates across prompts.

Benchmark Methodology: Each test runs 5 sequential prompts that trigger different tools, mirroring real-world agent conversations. Both approaches use the complete GitHub MCP Server toolset (90+ tools) to represent authentic large-scale scenarios.

Performance Comparison

Both approaches achieved 100% accuracy (5/5 tests passed), demonstrating that RAG maintains perfect tool selection while dramatically improving efficiency.

| Metric | Base Tool Selection | RAG Tool Selection | Improvement |
| --- | --- | --- | --- |
| Average Response Time | 4,595 ms | 1,742 ms | 62.1% faster |
| Total Response Time | 22,975 ms | 8,710 ms | 62.1% faster |
| Min Response Time | 2,019 ms | 1,532 ms | 24.1% faster |
| Max Response Time | 8,627 ms | 2,190 ms | 74.6% faster |

Token Efficiency

| Metric | Base Tool Selection | RAG Tool Selection | Reduction |
| --- | --- | --- | --- |
| Total Tokens | 47,323 | 5,164 | 89.1% reduction |
| Average Tokens/Test | 9,465 | 1,033 | 89.1% reduction |
| Prompt Tokens | 47,161 | 5,002 | 89.4% reduction |
| Completion Tokens | 162 | 162 | No change |

Individual Test Performance

| Test Case | Base Response Time | RAG Response Time | Token Reduction |
| --- | --- | --- | --- |
| Get PR #42 | 7,808 ms | 1,659 ms | 91.7% |
| List Issues | 2,446 ms | 1,653 ms | 86.1% |
| Create Issue | 2,019 ms | 1,676 ms | 88.5% |
| Get README | 2,075 ms | 2,190 ms | 91.6% |
| Add Comment | 8,627 ms | 1,532 ms | 87.5% |

Key Findings:

  • RAG reduces token usage by approximately 89%, translating to significant cost savings
  • Response times improve by over 60% on average, enhancing user experience
  • Perfect accuracy maintained across all test cases
  • More consistent performance with lower variance across runs

These benchmark results are generated automatically by the test suite; the latest performance data lives in the repository under benchmarks/results/.

Sample Query Results

Query: "Get pull request #42 from rocket-connect/mcp-rag"

  • Selected tool: get_pull_request
  • Tokens used: 785
  • Response time: 1,659ms

Query: "List all open issues in the repository rocket-connect/mcp-rag"

  • Selected tool: list_issues
  • Tokens used: 1,309
  • Response time: 1,653ms

Query: "Create a new issue in rocket-connect/mcp-rag with title 'Test Issue' and body 'This is a test'"

  • Selected tool: create_issue
  • Tokens used: 1,088
  • Response time: 1,676ms

Query: "Get the contents of the README.md file from the main branch in rocket-connect/mcp-rag"

  • Selected tool: get_file_contents
  • Tokens used: 795
  • Response time: 2,190ms

Query: "Add a comment to issue #1 in rocket-connect/mcp-rag saying 'This is a test comment'"

  • Selected tool: create_issue_comment
  • Tokens used: 1,187
  • Response time: 1,532ms

Every query selected the correct tool on the first try. The system successfully matched natural language queries to specific GitHub operations through semantic similarity.

Why Vector Search Works

Traditional keyword matching struggles with natural language queries:

Query: "Let the team know about the deployment"

Keyword Match: Looks for exact words like "team", "know", "deployment"
Problem: Misses tools like send_message, post_to_channel, notify_users

Vector Match: Understands semantic similarity
Success: Correctly finds messaging and notification tools

Vector embeddings capture meaning rather than just matching words. They understand that "let the team know" and "send a message" are semantically similar, even without word overlap.
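
Similarity between two embeddings is typically measured with cosine similarity; a minimal sketch of the score being computed:

typescript
// Cosine similarity: 1.0 = same direction (same meaning), ~0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

The embeddings for "let the team know" and "send a message" point in nearly the same direction, so their cosine score is high despite sharing no words.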

Implementation: Core Components

The MCP-RAG implementation consists of two main packages:

1. Neo4j Package (@mcp-rag/neo4j)

Handles all graph database operations through a CypherQueryBuilder class that generates parameterized Cypher queries:

Tool Indexing: The createDecomposedTools method stores tool schemas with vector embeddings in a graph structure.

Vector Search: The vectorSearchDecomposed method queries using Neo4j's native vector index with configurable similarity thresholds.

Schema Management: Methods like createVectorIndex and checkVectorIndex manage the vector index infrastructure.
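
Under the hood, createVectorIndex issues a statement along these lines (the index name and label here are illustrative, not necessarily the exact identifiers MCP-RAG uses):

cypher
CREATE VECTOR INDEX tool_embeddings IF NOT EXISTS
FOR (t:Tool) ON (t.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}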

2. Client Package (@mcp-rag/client)

Provides the high-level API as a wrapper around the AI SDK:

typescript
import { createMCPRag } from '@mcp-rag/client';
import { openai } from '@ai-sdk/openai';
import neo4j from 'neo4j-driver';

// Connect to your Neo4j instance (NEO4J_URI and NEO4J_PASSWORD as in the
// benchmark setup below)
const driver = neo4j.driver(
  process.env.NEO4J_URI,
  neo4j.auth.basic('neo4j', process.env.NEO4J_PASSWORD),
);

const rag = createMCPRag({
  model: openai('gpt-4'),
  neo4j: driver,
  tools: {
    // Your MCP tools here
  },
  openaiApiKey: process.env.OPENAI_API_KEY,
});

// Sync tools to Neo4j
await rag.sync();

// Generate text with automatic tool selection
const result = await rag.generateText({
  prompt: 'create an issue about the bug',
});

The client uses OpenAI's text-embedding-3-small model (1536 dimensions) for generating embeddings, creating a consistent embedding space for all tool components.
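
For reference, generating one of these embeddings with the OpenAI SDK looks roughly like this (the input string format is an assumption):

typescript
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'create_issue: Create a new issue in a GitHub repository',
});

const embedding = response.data[0].embedding; // number[] with 1536 entries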

The Sync Process

The sync() method is where the indexing happens:

  1. Creates Vector Index: Sets up a Neo4j vector index for cosine similarity search with 1536 dimensions

  2. Generates Embeddings: For each tool, creates embeddings for:

    • The tool itself (name + description)
    • Each parameter (name + description)
    • The return type
  3. Builds Graph Structure: Creates nodes and relationships in Neo4j using the decomposed model

  4. Idempotent Design: Uses MERGE operations so multiple syncs won't create duplicates

The sync process is required before first use and should be called again if you modify your toolset with addTool() or removeTool().
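
The idempotency falls out of Cypher's MERGE semantics, which match an existing node by its key before creating a new one. A sketch of the pattern (property names assumed):

cypher
// Re-running this for the same tool updates it in place instead of duplicating.
MERGE (t:Tool {name: $name, toolsetHash: $hash})
SET t.description = $description,
    t.embedding = $embedding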

Toolset Hashes

Hashes uniquely identify toolset versions for change detection and multi-version support.

How Hashes Work

  1. Serialization – Tools are deep-cloned (excluding execute functions)
  2. Sorting – Tools sorted by name, all nested keys sorted recursively
  3. Hashing – JSON string passed to hash function

typescript
// Hash changes when tools change
const hash1 = rag.getToolsetHash();
rag.addTool('newTool', myTool);
const hash2 = rag.getToolsetHash(); // Different from hash1

The default hash uses a simple bitwise function and returns an identifier of the form toolset-<hex>. You can provide a custom hash function, for example in browser environments:

typescript
const rag = createMCPRag({
  // ...
  hashFunction: async (input) => {
    // SHA-256 via the Web Crypto API (available in browsers and modern Node)
    const encoder = new TextEncoder();
    const data = encoder.encode(input);
    const hashBuffer = await crypto.subtle.digest('SHA-256', data);
    return Array.from(new Uint8Array(hashBuffer))
      .map((b) => b.toString(16).padStart(2, '0'))
      .join('');
  },
});

Hash Properties

  • Deterministic – Same toolset always produces same hash
  • Order-independent – Tool/property order doesn't affect hash
  • Change-sensitive – Any definition change produces different hash
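
Putting the three steps together, a hash with these properties can be derived roughly like this (a sketch mirroring the described behavior, not the package's exact code):

typescript
// Sort keys recursively before stringifying so property order never matters.
// Tool sorting by name and stripping of execute functions happen upstream.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

function toolsetHash(tools: Record<string, unknown>): string {
  const input = stableStringify(tools);
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) - hash + input.charCodeAt(i)) | 0; // hash * 31 + char
  }
  return `toolset-${(hash >>> 0).toString(16)}`;
}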

Migrations

Migration syncs tool definitions to Neo4j, creating the graph structure for vector search.

Migration Flow

  1. Check – Determines if migration needed (hash exists in Neo4j?)
  2. Index – Creates vector index (1536 dimensions, cosine similarity)
  3. Embed – Generates embeddings for tools, parameters, return types
  4. Store – Creates graph structure in Neo4j

Custom Migration Hooks

typescript
const rag = createMCPRag({
  // ...
  migration: {
    shouldMigrate: async (session) => {
      const result = await session.run('...');
      return result.records.length === 0;
    },
    migrate: async (session, tools) => {
      // Your migration code
    },
    onBeforeMigrate: async (statements) => {
      console.log('Migrating:', statements.length, 'statements');
      return statements;
    },
  },
});

Multi-Version Toolsets

Multiple toolset versions can coexist in Neo4j:

typescript
// Version 1
await rag.sync();
const v1Hash = rag.getToolsetHash();

// Version 2 (both exist in DB)
rag.addTool('newTool', myTool);
await rag.sync();
const v2Hash = rag.getToolsetHash();

// Manage versions
const v1Info = await rag.getToolsetByHash(v1Hash);
await rag.deleteToolsetByHash(v1Hash); // Clean up old version

The generateText Flow

When you call rag.generateText(), here's what happens:

  1. Ensures Sync: Automatically runs sync if tools haven't been indexed

  2. Semantic Selection:

    • Generates an embedding for your prompt using the same model
    • Performs Neo4j vector similarity search
    • Retrieves the top N most relevant tools (default: 10)
  3. Passes to AI SDK: Calls the standard AI SDK generateText with only the selected tools

  4. Returns Full Result: Returns the complete AI SDK result including tool calls, token usage, and response content

This means MCP-RAG is a drop-in replacement for the AI SDK—it accepts the same parameters and returns the same result structures.
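
In practice, you can keep reading the same fields you already use with the AI SDK:

typescript
const result = await rag.generateText({
  prompt: 'Get pull request #42 from rocket-connect/mcp-rag',
});

console.log(result.text);      // the model's final response
console.log(result.toolCalls); // e.g. a call to get_pull_request
console.log(result.usage);     // token counts for the request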

[Figure: Example tools model visualization showing how semantic search selects relevant tools]

When to Use Semantic Tool Discovery

RAG-based tool selection makes sense in specific scenarios:

Consider RAG When:

  • You have 20+ tools from one or more MCP servers
  • Your tools have overlapping functionality
  • Users make natural language requests with ambiguous intent
  • You're building production agents with token/cost concerns
  • You're integrating multiple MCP servers with combined tool counts of 50+

RAG May Not Be Needed If:

  • You have fewer than 10 clearly distinct tools
  • Tool selection is straightforward with no ambiguity
  • You're prototyping and token efficiency isn't a priority yet
  • Your tools have very different purposes with minimal overlap

Getting Started

The MCP-RAG project is open source and available on GitHub:

Repository: rocket-connect/mcp-rag

The monorepo includes:

  1. packages/client: High-level wrapper around AI SDK with semantic tool selection
  2. packages/neo4j: Neo4j-specific Cypher query builder and graph operations
  3. benchmarks: Test suite with GitHub tool selection benchmarks comparing RAG vs baseline
  4. examples/github: Complete working example with 93 GitHub MCP tools

Installation:

bash
npm install @mcp-rag/client @ai-sdk/openai neo4j-driver ai

export OPENAI_API_KEY=your_key_here

You'll need:

  • A Neo4j instance (cloud, or local via Docker; a one-liner is sketched below)
  • An OpenAI API key for embeddings (uses text-embedding-3-small)
  • Your MCP tools as AI SDK tool definitions
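
For the local option, a Docker command along these lines gets Neo4j running (set the password to match your NEO4J_PASSWORD):

bash
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/your_password \
  neo4j:5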

Running the Benchmarks

To see the system in action:

bash
git clone https://github.com/rocket-connect/mcp-rag
pnpm install

export OPENAI_API_KEY=your_key_here
export NEO4J_URI=neo4j://localhost:7687
export NEO4J_PASSWORD=your_password

cd benchmarks
pnpm test

The benchmark suite outputs detailed metrics including token counts, response times, and success rates. Results are saved in benchmarks/results/ with both latest snapshots and timestamped history.

The GitHub Example

Want to see MCP-RAG with real tools? The examples/github directory demonstrates the complete workflow:

[Figure: GitHub MCP tools visualized in Neo4j Browser, showing the graph structure]

  1. Mocks all 93 tools from the GitHub MCP server
  2. Syncs them to Neo4j with embeddings
  3. Performs vector similarity search on user prompts
  4. Shows the top 10 selected tools with debug output
  5. Allows interactive testing with different queries

This example is perfect for understanding how semantic search reduces context overhead in production scenarios.

What's Next

The MCP ecosystem is evolving rapidly. Potential future enhancements include:

  • Custom Embedding Models: Support for other embedding providers beyond OpenAI
  • Adaptive Tool Selection: Dynamically adjusting the number of retrieved tools based on query complexity
  • Context-Aware Filtering: Using conversation history to bias tool selection toward recently used tools
  • Tool Composition: Chaining multiple tools together based on semantic relationships in the graph
  • Multi-Vector Search: Combining tool embeddings with parameter and return type embeddings for more nuanced matching

Final Thoughts

As MCP servers proliferate and AI agents gain access to more capabilities, intelligent tool selection becomes increasingly important. Sending every tool definition on every request doesn't scale—it wastes tokens, increases latency, and can overwhelm the model with irrelevant context.

Semantic tool discovery via vector search addresses this elegantly. The implementation is relatively straightforward: wrap the AI SDK, generate embeddings for tools at initialization, and perform similarity search on each request. The results speak for themselves: 89% reduction in tokens with 62% faster response times, all while maintaining perfect accuracy.

MCP-RAG provides a production-ready foundation for building AI agents that can scale to hundreds of tools while maintaining fast responses and accurate tool selection. It's a drop-in replacement for the AI SDK that adds semantic intelligence without changing your existing workflow.

Try It Yourself

Ready to see semantic tool selection in action? Our free, open-source MCP Connect Inspector UI lets you experiment with vector search directly in your browser.

Follow our step-by-step guide to:

  • Connect your MCP servers
  • Set up Neo4j for vector search
  • Watch semantic selection reduce tokens in real-time
  • Compare before/after metrics on your own queries

Resources

  • MCP-RAG repository: https://github.com/rocket-connect/mcp-rag

Dan Starns is the Founder & CTO of Rocket Connect, former core contributor to Neo4j, and advocate for graph databases in AI applications. He's currently based in Southeast Asia, organizing developer events and building tools to make AI agents more efficient.

TAGS

AI agents, semantic search, tool discovery, MCP, Model Context Protocol, vector search, RAG, Neo4j, Claude, LLM optimization, token efficiency, AI tooling, developer productivity, graph databases, embeddings
