rag – Jyoti Agarwal

Grounding Your Salesforce Agent With Real-World Data (RAG, Chunking, Data Library & More!)

If Part 1 was about understanding what Agentforce is, Part 2 is all about understanding how your agent becomes smart, trustworthy, and actually useful in the real world.

And the secret is Grounding.
(Yes, the dramatic capital G is intentional 😄)

Let’s dive in.

🌍 What Is Grounding? (And Why Your Agent Needs It)

Grounding = connecting your AI agent to trusted, authoritative data so it answers based on facts — not imagination.

When you ask an agent a question like:

“What is the refund policy for our subscription product?”

It shouldn’t hallucinate. It should look at:

Your internal Knowledge Articles
Your Pricing policies
Your Product documentation
Your CRM records
Your Product database, etc.

That is grounding.

It tells the LLM:
👉 “Use THIS data only. Stay within THIS reality.”

The Building Blocks of an Agent

Even a perfectly grounded agent needs the right internal structure. Salesforce defines three essential elements that make up an agent:

1. Topics

Define what the agent is responsible for
Example: “Refund Requests”, “Appointment Scheduling”, “Order Status”

2. Instructions

Tell the agent how to behave, what to avoid, and what rules to follow
Example: “Always verify customer identity before sharing account details.”

3. Actions

Specific things the agent can perform
Examples:

Create a Case
Update an Order
Fetch Customer Details
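
To make the three building blocks concrete, here is a small Python sketch. It is not how Agentforce stores agents (you configure those in Salesforce itself); the `Topic` and `Action` classes and the `create_case` function are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A specific thing the agent can do (hypothetical structure)."""
    name: str
    run: Callable[..., dict]


@dataclass
class Topic:
    """What the agent is responsible for, plus its rules and allowed actions."""
    name: str
    instructions: List[str] = field(default_factory=list)
    actions: Dict[str, Action] = field(default_factory=dict)

    def add_action(self, action: Action) -> None:
        self.actions[action.name] = action


def create_case(subject: str) -> dict:
    # Stand-in for a real "Create a Case" action; it only returns a fake record.
    return {"object": "Case", "subject": subject, "status": "New"}


refunds = Topic(
    name="Refund Requests",
    instructions=["Always verify customer identity before sharing account details."],
)
refunds.add_action(Action(name="Create a Case", run=create_case))

result = refunds.actions["Create a Case"].run(subject="Refund for order 1001")
print(result)
```

The point of the shape: a topic scopes the job, instructions constrain behavior, and actions are the only way the agent actually changes anything.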

⭐ Connect Actions to Data with Four Mechanisms

Grounding isn’t just about finding the right information — your agent must also know how to use that information when performing real actions.
In Agentforce, this connection happens through four powerful data-access mechanisms. Each mechanism tells the agent where the data lives and how it should be retrieved or modified.

These mechanisms act like different “doors” through which the agent can reach your business data, depending on what the task requires.

1️⃣ Grounded Actions — When your data is stored natively in Salesforce

Use Grounded Actions when the agent needs to work directly with Salesforce data you already trust — such as:

Accounts
Contacts
Leads
Cases
Opportunities
Custom objects

Grounded Actions allow the agent to read and write this data safely, using the platform’s built-in permissions and security model.
Perfect for CRM-centric tasks like:

“Update the case priority.”
“Create a follow-up task.”
“Find all opportunities closing this month.”

Because the agent uses real Salesforce objects, its decisions stay grounded in accurate, structured information.

2️⃣ Data Graph — When you need connected, contextual information

Sometimes data lives across many related objects. That’s where the Data Graph comes in.

A Data Graph gives your agent a relationship-aware view of your Salesforce data. You define a “graph” of objects and their connections — for example:

Customer → Orders → Order Line Items → Products

Your agent can then reason across the entire graph as a single interconnected dataset.

Useful for:

Customer 360 tasks
Order history analyses
Eligibility checks
Product recommendations

The Data Graph works best when decisions depend on multiple objects connected through relationships.
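
As a rough mental model, you can picture the graph as nested records that can be walked in one pass. The sketch below uses plain Python dictionaries with made-up sample data; it is an analogy, not the actual Data Graph format or API.

```python
# Hypothetical in-memory "graph": Customer -> Orders -> Order Line Items -> Products.
customer = {
    "name": "Acme Corp",
    "orders": [
        {"id": "O-1", "line_items": [
            {"product": {"name": "Router"}, "quantity": 2},
            {"product": {"name": "Cable"}, "quantity": 5},
        ]},
        {"id": "O-2", "line_items": [
            {"product": {"name": "Router"}, "quantity": 1},
        ]},
    ],
}


def products_purchased(customer: dict) -> dict:
    """Walk the graph and total the quantity bought per product."""
    totals: dict = {}
    for order in customer["orders"]:
        for item in order["line_items"]:
            name = item["product"]["name"]
            totals[name] = totals.get(name, 0) + item["quantity"]
    return totals


print(products_purchased(customer))  # {'Router': 3, 'Cable': 5}
```

Because the relationships are already resolved, a question like "what has this customer bought?" becomes a single traversal instead of several separate lookups.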

3️⃣ Actions on CRM and External Systems — When data lives beyond Salesforce

Businesses don’t live in one system, and neither should your agent.

This mechanism allows your Agentforce agent to interact with:

External APIs
Integration platforms
Back-office applications
Custom REST endpoints

Examples:

Fetching shipment tracking from a logistics system
Pulling a credit score from a partner API
Checking inventory in a warehouse system

This expands your agent’s capabilities far beyond CRM and ensures it has access to real-time operational data, even if it lives outside Salesforce.
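
Here is a minimal sketch of what such an action could look like, using the shipment-tracking example. Everything in it is an assumption for illustration: `make_tracking_action`, `fake_fetch`, and the sample tracking data are hypothetical, and the external call is replaced by a dictionary lookup so the example runs without a network.

```python
from typing import Callable, Dict


def make_tracking_action(fetch: Callable[[str], Dict[str, str]]) -> Callable[[str], str]:
    """Wrap an external lookup so the agent gets a short, readable answer."""
    def track_shipment(tracking_id: str) -> str:
        try:
            data = fetch(tracking_id)
        except KeyError:
            return f"No shipment found for {tracking_id}."
        return f"Shipment {tracking_id} is {data['status']} (last seen: {data['location']})."
    return track_shipment


# Stand-in for a real logistics API; a live version would make an HTTP request here.
_FAKE_LOGISTICS_DB = {"TRK123": {"status": "in transit", "location": "Pune hub"}}


def fake_fetch(tracking_id: str) -> Dict[str, str]:
    return _FAKE_LOGISTICS_DB[tracking_id]


track_shipment = make_tracking_action(fake_fetch)
print(track_shipment("TRK123"))
print(track_shipment("TRK999"))
```

Passing the fetch function in as a parameter keeps the action testable: you can swap the fake lookup for a real client later without touching the agent-facing logic.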

4️⃣ RAG: The Heart of Grounding

Retrieval-Augmented Generation (RAG) means the agent:

Receives a user query
Retrieves relevant, real-world data
Uses that data to generate grounded, factual output

LLMs don’t know your business.
RAG lets them pull knowledge from YOUR data before generating an answer.
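
The three steps can be sketched in a few lines of Python. This is a toy version under loud assumptions: the documents are invented sample text, retrieval is plain word overlap, and `generate` is a stub standing in for a real LLM call.

```python
import re

# Made-up sample content standing in for your business documents.
DOCUMENTS = [
    "Refunds for subscription products are available within 30 days of purchase.",
    "Enterprise customers can cancel with 60 days written notice.",
    "Standard shipping takes 5 to 7 business days.",
]


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list, top_k: int = 1) -> list:
    """Step 2: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def generate(query: str, context: list) -> str:
    """Step 3: stub for the LLM; a real system would send query + context to a model."""
    return f"Based on our records: {context[0]}"


def answer(query: str) -> str:
    """Step 1 is receiving the query; then retrieve, then generate."""
    context = retrieve(query, DOCUMENTS)
    return generate(query, context)


print(answer("What is the refund policy for our subscription product?"))
```

The key idea is the order of operations: the model only sees the question after the relevant text has been fetched and placed next to it.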

Structured vs. Unstructured Data in RAG

RAG can ground using structured, unstructured, and semi-structured data:

1️⃣ Structured Data

Highly organized. Searchable by fields.
Examples:

Salesforce Objects (Lead, Case, Product, Contract)
Database tables
CSVs

Great for:
✔ precise lookups
✔ numerical or identifier-based queries

Example:

“What is the warranty period for product XYZ123?”

A simple CRM lookup might be enough.
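
To show how direct a structured lookup is, here is a small sketch using Python's built-in `sqlite3` module with an in-memory table. The `product` table, its columns, and the warranty values are made up for illustration; in Salesforce the equivalent would typically be a record lookup or a SOQL query.

```python
import sqlite3

# In-memory database with a hypothetical product table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (code TEXT PRIMARY KEY, name TEXT, warranty_months INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("XYZ123", "Model Z", 24), ("ABC999", "Model A", 12)],
)


def warranty_months(code: str):
    """Return the warranty period in months, or None if the product is unknown."""
    row = conn.execute(
        "SELECT warranty_months FROM product WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


print(warranty_months("XYZ123"))  # 24
```

No chunking or similarity search is needed here: the product code is an exact key, so the answer is one query away.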

2️⃣ Unstructured Data

Humans love writing. Machines don’t love parsing it.
Examples:

PDFs
Policy documents
Web pages
Meeting transcripts
User manuals
Knowledge articles

This is where LLMs shine — but only if you help them access the right parts.

3️⃣ Semi-Structured

A mix.
Examples:

JSON
XML
Chat logs
Formatted docs

🔥 Most organizations have tons of unstructured content lying around, and it's rich with answers. RAG makes unstructured data searchable, relevant, and safe to use inside an AI workflow.

📚 Introducing Agentforce Data Library

(Where Chunking, Indexing & Retrieval Live)

Agentforce uses the Agentforce Data Library (ADL) to ingest, transform, index, and prepare your data for retrieval.

Think of ADL as the “data brain” behind your agent.

🔨 How Data Library Works (The Real Magic)

Let’s break it down into digestible steps.

🧩 1. Chunking — Breaking Large Content Into Smart Pieces

Handing an LLM an entire 40-page PDF for every question is slow and costly, and the one passage that matters can get lost in the noise.
So ADL automatically chops your documents into smaller, meaningful “chunks.”

Example (illustrative counts; real numbers depend on chunk size):

A 20-page Refund Policy PDF → 200 chunks
A product manual → 100 chunks

Each chunk becomes a small searchable unit.

👉 This makes retrieval fast, accurate, and context-rich.
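
The article does not spell out ADL's exact chunking strategy, so treat the following as a sketch of one common approach: fixed-size word windows with a small overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. The function name and default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list:
    """Split text into chunks of up to `chunk_size` words.

    Neighbouring chunks share `overlap` words so context is not cut off abruptly.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# A fake 100-word document: "word0 word1 ... word99".
document = " ".join(f"word{i}" for i in range(100))
chunks = chunk_text(document, chunk_size=40, overlap=10)
print(len(chunks))  # 3
```

Smaller chunks give more precise matches but less surrounding context; larger chunks do the opposite. The overlap is a cheap way to soften that trade-off.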

🗂 2. Indexing — Creating a High-Speed Search Layer

After chunking, ADL builds a vector index.

In simple terms:

Each chunk becomes an embedding (a mathematical representation of its meaning)
These embeddings are placed in an index
When the agent gets a question, the question is embedded the same way and the index returns the chunks whose embeddings are most similar

This is the backbone of RAG.
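
Here is a toy version of that idea. Big caveat: a real vector index uses a learned embedding model that captures meaning, while this sketch uses simple word counts, which only capture shared words. The sample chunks and function names are made up; the similarity measure (cosine similarity) is the standard one.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. Real systems use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunks = [
    "Refund requests must be submitted within 30 days.",
    "The warranty covers manufacturing defects for two years.",
    "Enterprise plans include priority support.",
]

# "Indexing": embed every chunk once and keep the vector next to the text.
index = [(chunk, embed(chunk)) for chunk in chunks]


def most_similar(question: str, top_k: int = 1) -> list:
    """Embed the question and return the top_k closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


print(most_similar("How long is the warranty?"))
```

Notice that the chunks are embedded once, up front. Only the question is embedded at query time, which is what keeps lookups quick.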

🧭 3. Retriever — The Engine That Finds Relevant Chunks

The retriever is what actually searches the index.

When a user asks:
👉 “What are the cancellation rules for Enterprise Customers?”

The retriever fetches the most relevant chunks from sources such as:

Enterprise contract policies
SLA docs
Pricing schedules
Relevant knowledge articles

These chunks are inserted into the prompt template, and the completed prompt is sent to the LLM.
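
To picture that hand-off, here is a small sketch using Python's standard `string.Template`. The template wording and the two retrieved chunks are invented for illustration; Agentforce prompt templates have their own format, which this does not try to reproduce.

```python
from string import Template

# Hypothetical prompt template with two slots: the retrieved context and the question.
PROMPT_TEMPLATE = Template(
    "Answer the question using only the context below.\n"
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
)


def build_prompt(question: str, retrieved_chunks: list) -> str:
    """Number each chunk and drop it into the template alongside the question."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return PROMPT_TEMPLATE.substitute(context=context, question=question)


# Made-up chunks standing in for what a retriever might return.
retrieved = [
    "Enterprise contracts may be cancelled with 60 days' written notice.",
    "Cancellation fees are listed in the pricing schedule.",
]
prompt = build_prompt("What are the cancellation rules for Enterprise Customers?", retrieved)
print(prompt)
```

Numbering the chunks is a small design choice that pays off later: it makes it easier to trace which piece of content an answer came from.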

⚙️ 4. Setup-Time vs Run-Time — What Happens When?

Setup-Time (When You Configure ADL):

✔ You add data sources (files, knowledge articles, objects)
✔ ADL creates a Data Stream
✔ Chunking happens
✔ Indexing happens
✔ Retriever is prepared
✔ Metadata + mappings are generated
✔ You reference the retriever in your agent’s design

Run-Time (When the Agent Is Live):

User asks a question
Retriever searches the index
Most relevant chunks are selected
Prompt template is filled with these chunks
LLM generates grounded response
Agent returns accurate, policy-compliant output
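
The split between the two phases can be sketched as a class with one method per phase. `TinyLibrary` is a hypothetical stand-in, not ADL itself, and it uses a simple inverted index (word to chunk ids) rather than embeddings so the setup-time work is easy to see. The sample documents are made up.

```python
import re
from collections import defaultdict


class TinyLibrary:
    """Hypothetical stand-in for a data library: index once, search many times."""

    def __init__(self):
        self.chunks = []
        self.index = defaultdict(set)  # word -> ids of chunks containing it

    def setup(self, documents: list) -> None:
        """Setup time: store each document as a chunk and index its words."""
        for doc in documents:
            chunk_id = len(self.chunks)
            self.chunks.append(doc)
            for word in re.findall(r"[a-z0-9]+", doc.lower()):
                self.index[word].add(chunk_id)

    def retrieve(self, question: str, top_k: int = 1) -> list:
        """Run time: score chunks by how many distinct question words they contain."""
        scores = defaultdict(int)
        for word in set(re.findall(r"[a-z0-9]+", question.lower())):
            for chunk_id in self.index.get(word, ()):
                scores[chunk_id] += 1
        best = sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_k]
        return [self.chunks[cid] for cid in best]


library = TinyLibrary()
library.setup([
    "Premium users receive an extended warranty of 36 months.",
    "Returns require an invoice or an order number.",
])
print(library.retrieve("Can I return a product without an invoice?"))
```

All the expensive work happens in `setup`, which runs once. `retrieve` only does cheap lookups, which is why a live agent can respond quickly even over a large library.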

🧪 A Practical Example — Making a “Refund & Warranty Support Agent”

Imagine you upload:

3 Warranty policy PDFs
50 Knowledge articles
A troubleshooting guide
A CSV of product models

ADL will:
🟦 Chunk PDFs → 700 chunks
🟦 Chunk support documents → 300 chunks
🟦 Create embedding index
🟦 Build retriever
🟦 Allow agent to pull relevant blocks at runtime

Then your agent can answer:
💬 “What’s the refund window for Model Z?”
💬 “Do premium users get extended warranty?”
💬 “Can I return a product without an invoice?”

And the answers stay grounded, because they come from YOUR content.
