Search isn’t just about rankings anymore. If you’re optimizing for Perplexity, Copilot, or Google’s AI Overviews, you’re up against layered retrieval systems and generative models that don’t work like traditional indexes. The only way to know if your content stands a chance is to simulate how those systems actually think.
Simulation closes the gap between guesswork and execution. You can test how your content gets chunked, see what gets retrieved, and compare it to what AI actually cites. It’s not about copying Google’s system – it’s about building a “close enough” test rig that gives you answers fast. Teams using simulation aren’t just reacting to updates. They’re predicting outcomes before content goes live.
Making GEO Work: Why Simulation Changes the Game
Search optimization used to be a reaction game. Google would tweak the algorithm, rankings would shuffle, and we’d respond with updates based on what moved and why. That rhythm doesn’t cut it anymore. Generative Engine Optimization (GEO) demands something else entirely – not just watching the system, but anticipating it. The search engines we’re dealing with now – Perplexity, Copilot, AI Overviews – aren’t keyword matchers. They’re layered reasoning machines with hidden retrieval steps, generative synthesis, and their own filters. If we keep treating them like static search boxes, we’ll stay one step behind.
Simulation flips the dynamic. It’s not theory. It’s how you start to see your content through the eyes of an AI system – how it breaks your page into chunks, ranks those fragments, and decides what’s worth showing. Whether you’re using LLMs to score retrievability, running synthetic queries, testing prompts for hallucinations, or analyzing chunk overlap, the goal stays the same: recreate enough of the internal logic so you’re not guessing when content hits the real thing. You’re testing, refining, and shipping with intent.
Why Simulation Makes or Breaks GEO Performance
Working with AI search means you’re dealing with systems that don’t behave like the old-school SERPs. There’s no clear breadcrumb trail from keyword to ranking. Instead, you’re up against vector embeddings, transformer models, entity graphs, and layered orchestration logic. One tiny change to your content might move the needle – or do absolutely nothing – depending on how it lands in that stack.
That’s exactly why simulation is a must. It gives you two big advantages:
You can control the inputs
By running synthetic queries through a defined retrieval model, you get a clean view of what’s being pulled and why. That separation helps you understand what’s driving retrieval versus what gets shaped later in generation. (This was the core approach used in the original GEO experiments with Perplexity.)
You get answers fast
Instead of waiting around for a production model to reprocess your content, you can make changes, test locally, and adjust within a few hours – not days or weeks. That tight loop is a game-changer for teams trying to move at speed.
Looking Ahead
AI search systems aren’t standing still. As platforms evolve, we’re likely to see regular changes in how they handle retrieval – from new embedding architectures to shifts in how entities are linked or how much context gets passed to the model. Having a solid simulation setup in place isn’t just helpful here – it’s strategic. The sharper your internal view of how a specific AI surface works, the quicker you can spot and act on new ranking opportunities before the rest of the field catches up.
How NUOPTIMA Uses Simulation to Shape GEO Outcomes
At NUOPTIMA, simulation isn’t a side tool – it’s built into how we work. We use it to see how content behaves before it ever reaches an AI Overview. That means stress-testing content structures, refining answers, and running retrieval scenarios that mimic what engines like Google’s AI Mode and Perplexity actually do. Instead of just hoping to be included, we give our clients a roadmap to becoming the foundation of the answer.
This method is part of our broader GEO strategy – one we’ve developed through real-world experimentation across dozens of industries. We identify high-value questions, analyze how competitors show up in generative results, and build structured content that AI models can trust and surface. The goal isn’t to chase rankings. It’s to shape the output itself – to make sure your brand becomes the cited authority in your niche.
If you’re curious how this works in practice, we regularly share behind-the-scenes examples and insights over on LinkedIn. Or if you’re ready to explore what GEO could look like for your business, we’d be happy to walk you through our framework.
Building Your Own Local Retrieval Simulator with LlamaIndex
If you want to see how your content actually performs in a RAG-style setup, the best place to start is with a small simulation environment. Keep it lightweight. A Python script is enough. The idea is simple: feed in a query and your content, and see what chunks the retriever pulls out to answer the question.
Here’s what goes into the setup:
- Trafilatura to cleanly extract text from any URL
- LlamaIndex for breaking your content into chunks, indexing them, and simulating retrieval
- FetchSERP to bring in live citations from Google’s AI Overviews or other AI Modes for comparison
And here’s what you’ll get back:
- A list of retrieved chunks: Showing which parts of your page the model would hand off to the LLM
- Overlap analysis: So you can spot where your content matches or misses against actual citations
- A relevance chart: A quick visual to help you gauge how strongly each chunk scored
We’ll walk through the full setup step-by-step – minimal tooling, maximum clarity.
Step 1 – Set Up Your Tools
Before you can run simulations, you need to get your environment in order. The goal here is to pull text from any URL, break it into chunks, simulate how an AI retriever would handle it, and compare that to what actually shows up in AI Overviews.
You’ll need a few core tools:
- Trafilatura to extract clean, readable text from web pages
- LlamaIndex to handle chunking, embedding, and retrieval logic
- FetchSERP to fetch real-time AI search citations from Google
- Gemini embeddings to power the vector-based search layer
The setup is pretty straightforward. Once Python is installed, you’ll bring in the right libraries with a few pip commands, and you’ll want to keep your API keys organized in a .env file for easy reuse across steps.
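As a rough sketch, the installs usually boil down to a couple of commands – package names can shift between LlamaIndex releases, so treat this as a starting point rather than a definitive list (the leading "!" is for notebook cells; drop it in a terminal):

!pip install trafilatura requests python-dotenv
!pip install llama-index llama-index-embeddings-gemini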
If you’re working in Colab or a local environment, this step should take just a few minutes – and then you’re ready to move on to pulling live page content.
Step 2 – Plug In Your API Keys
To get the full simulation pipeline running, you’ll need to connect to a couple of external services. These APIs power everything from live AI search comparisons to vector embeddings – so without them, the system won’t move.
Here’s what you’ll need to set up:
- An API key from FetchSERP to pull real-time results from Google’s AI Overview or AI Mode
- A Gemini API key to embed your content and optionally run LLM prompts inside the retrieval layer
Once you’ve got those, the easiest way to keep everything clean is to drop them into a .env file. This keeps your keys out of the main code and makes switching environments less of a pain.
The format is simple: just set GOOGLE_API_KEY and FETCHSERP_API_KEY as environment variables. Your code will read them in automatically when it runs.
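Here’s a minimal sketch of that wiring, assuming you use python-dotenv to load the file (the placeholder values in the comments are obviously yours to fill in):

# Assumes a .env file in the working directory containing two lines:
#   GOOGLE_API_KEY=your-gemini-key
#   FETCHSERP_API_KEY=your-fetchserp-key
from dotenv import load_dotenv

load_dotenv()  # exposes both keys to os.getenv() for the rest of the scripts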
No need to overthink it. Get the keys, drop them into .env, and move on – you’ll test that everything’s wired up in the next step.
Step 3 – Extract Content for Indexing
Before you can simulate anything, you need to get the raw content out of your page – no clutter, no layout junk. That’s where Trafilatura comes in. It’s a lightweight tool that scrapes web pages and gives you just the structured text you need for indexing.
Here’s a simplified version of how you’d do it:
import trafilatura

def get_clean_text(url):
    html = trafilatura.fetch_url(url)
    return trafilatura.extract(html, include_comments=False, include_tables=True)
This function takes a URL, fetches the HTML, and returns a clean version of the page – stripped of anything that would confuse your retriever later.
Once you’ve got the text, you’ll pass it into LlamaIndex for chunking and embedding. But before that, make sure your Gemini embedding model is set up. Here’s the minimal setup using environment variables:
import os

from llama_index.core import Settings
from llama_index.embeddings.gemini import GeminiEmbedding

Settings.embed_model = GeminiEmbedding(
    model_name="models/gemini-embedding-001",
    api_key=os.getenv("GOOGLE_API_KEY"),
)
If you’re not working with live URLs, no problem – you can paste content directly instead. The rest of the workflow doesn’t change.
This step ensures the text you’re feeding into the retriever is clean, structured, and optimized for embedding – which is key to getting meaningful results later.
Step 4 – Chunk and Index Your Content
Once your content is cleaned and ready, the next step is to make it searchable – not by users, but by the AI retrieval layer.
This is where LlamaIndex comes in. You feed it the raw text, and it automatically splits it into smaller, logical chunks. Each chunk then gets embedded using Gemini’s gemini-embedding-001 model. That’s what allows a vector retriever to later match a query to the most relevant parts of your page – not just by keywords, but by semantic proximity.
The result is a searchable index of your content, prepped and ready for simulated retrieval. No manual tagging. No hardcoding. Just text in, embeddings out.
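Here’s a minimal sketch of what that could look like as a build_index_from_text helper (the name matches the one used in the final script further down; the chunk size and overlap values are illustrative defaults, not tuned recommendations):

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

def build_index_from_text(text):
    # Split the page into overlapping chunks, then embed each one with the
    # Gemini model configured in Settings (see Step 3)
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
    return VectorStoreIndex.from_documents(
        [Document(text=text)],
        transformations=[splitter],
    )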
This step may feel invisible, but it’s where a lot of the magic happens. A strong index is the backbone of any serious GEO simulation.
Step 5 – See What the Retriever Actually Pulls
Now that your content is indexed, it’s time to put it to the test. The point here is to simulate what a vector-based retriever would actually pull from your page when given a real query – before that content ever hits production.
You run the query through your local index, and the retriever surfaces the chunks it thinks are the most relevant – typically the top 5, but that’s adjustable. Along with each chunk, you get a similarity score showing how closely it matches the intent of the prompt.
This is the same shortlist that, in a live RAG pipeline, would get passed into the LLM to generate an answer. So what you’re looking at is a clear preview: these are the parts of your content that actually “compete” in the final generation step.
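A sketch of that retrieval call, written as the simulate_retrieval helper the final script relies on (the default of five results is just the figure mentioned above):

def simulate_retrieval(index, query, top_k=5):
    # Pull the top_k chunks whose embeddings sit closest to the query
    retriever = index.as_retriever(similarity_top_k=top_k)
    nodes = retriever.retrieve(query)
    # Return (chunk text, similarity score) pairs - the shortlist a live RAG
    # pipeline would hand to the LLM
    return [(n.node.get_content(), n.score) for n in nodes]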
You’ll start to see patterns – which sections consistently surface, which ones get ignored, and where you may be missing relevance cues. That insight becomes your roadmap for editing, expanding, or reworking the page.
Step 6 – Pull Live Citations from AI Search
Now that you’ve seen what your retriever pulls locally, it’s time to compare that with what’s actually showing up in live AI search results. That’s where FetchSERP comes in – it’s an API that gives you real-time access to what Google’s AI Overview (or AI Mode) cites for a given query.
Here’s a simplified way to fetch that data:
import os

import requests

def get_ai_citations(query, country="us"):
    url = "https://api.fetchserp.com/api/v1/serp_ai_mode"
    headers = {"x-api-key": os.getenv("FETCHSERP_API_KEY")}
    params = {
        "q": query,
        "search_engine": "google.com",
        "ai_overview": "true",
        "gl": country,
    }
    response = requests.get(url, headers=headers, params=params, timeout=30)
    return response.json()
Once you have the raw JSON response, you can extract just the citation URLs like this:
def parse_citations(data):
    overview = data.get("ai_overview", {})
    return [item["url"] for item in overview.get("citations", []) if "url" in item]
What you get back is a list of URLs that AI Overviews is currently using to answer that query – the sources it’s surfacing in real time.
This step is key for grounding your simulation. You’re not just testing in a vacuum anymore – now you’re lining up your retrieved chunks against what Google’s system actually trusts enough to cite. That comparison helps validate whether your page is close to making the cut – or still needs work to get there.
Step 7 – Compare What You Retrieved vs. What AI Search Cited
Now that you’ve got both sets of results – your simulated chunks and what AI search is actually surfacing – it’s time to compare. The goal here isn’t to be overly precise. You just want a quick way to see whether your page is getting picked up and cited in generative results.
Here’s a minimal way to scan for overlap between your content and the links returned by something like AI Overview:
def find_matches(live_urls, retrieved_chunks):
    matched = []
    for live_url in live_urls:
        live_host = live_url.lower()
        for chunk, score in retrieved_chunks:
            if live_host in chunk.lower():
                matched.append((live_url, chunk, score))
    return matched
This is deliberately simple – basically a string check. It’s not perfect, and it’s definitely not fuzzy, but it gets you in the ballpark. If your retrieved chunk includes a citation from the live result, great. If not, that’s a signal too. Either your content didn’t register, or your chunking missed the right part.
Down the line, you can expand this. Add domain-level comparison, check for semantic similarity, or flag fragments that show partial mentions. But for a first pass? This works.
Step 8 – Put the Whole Workflow in Motion
Time to put it all together. This final step lets you go end-to-end – starting with a real URL and finishing with a direct comparison between your content and what AI search actually shows in the wild.
Here’s a simplified walkthrough of how the full process might look:
if __name__ == "__main__":
    query = "best ultralight tents for backpacking"
    source_url = "https://www.example.com/ultralight-tent-guide"

    # Step 1: Pull the content
    content = get_clean_text(source_url)

    # Step 2: Create a retrieval index from that content
    index = build_index_from_text(content)

    # Step 3: Run a retrieval query against your local simulation
    results = simulate_retrieval(index, query)

    print("\n--- Local Retrieval (Gemini Embeddings) ---")
    for idx, (chunk, score) in enumerate(results, 1):
        print(f"\n[{idx}] score = {score if score is not None else 'n/a'}")
        print(chunk.strip()[:600] + "…")

    # Step 4: Get live citations from Google AI Overview
    try:
        ai_data = get_ai_citations(query)
        urls = parse_citations(ai_data)

        print("\n--- Live Citations from AI Overview ---")
        for url in urls:
            print("-", url)

        # Step 5: Compare retrieved content with URLs in the live response
        matches = find_matches(urls, results)

        print("\n--- Matches (Retrieved Chunks Containing Live URLs) ---")
        if not matches:
            print("No overlaps found - not unusual, since your page is the test source.")
        else:
            for url, chunk, score in matches:
                print(f"- {url} | score = {score}")
    except Exception as error:
        print("\n[Warning] Couldn't fetch or process AI Overview data:", error)
What’s happening here is a tight loop between simulation and production. You’re taking a real-world query, running it through your local model to see what would be retrieved, and then lining that up with what’s cited by live generative search. It’s not about pixel-perfect alignment – it’s about surfacing patterns, catching edge cases, and making more informed optimization decisions.
Once this pipeline is wired up, you can run dozens of queries at scale and track how often your content appears locally vs. in the wild. That’s when the insights really start compounding.
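A rough sketch of that batching, reusing the helpers from the walkthrough (the query list is just an example):

queries = [
    "best ultralight tents for backpacking",
    "ultralight tent weight comparison",
    "two person ultralight tent recommendations",
]

for query in queries:
    results = simulate_retrieval(index, query)
    live_urls = parse_citations(get_ai_citations(query))
    matches = find_matches(live_urls, results)
    # A quick read on how often the local simulation and the live AI surface agree
    print(f"{query}: {len(results)} chunks retrieved, "
          f"{len(live_urls)} live citations, {len(matches)} overlaps")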
Why This Workflow Actually Moves the Needle
Running this simulation setup isn’t just an academic exercise – it gives you a clear line of sight into how your content behaves inside a retrieval-based search system. You’re no longer guessing what AI will pull. You can see exactly which chunks make the cut, which ones are skipped, and where your coverage falls short.
This is especially useful when you put those simulated results next to live citations from something like Google’s AI Overview. If key sections of your page aren’t surfacing in either view, you’ve got a starting point: fix what’s missing, re-test it, and see what changes. No need to wait weeks for AI systems to reprocess your page – you can tweak, rerun, and check results in real time.
And this is just the beginning. You can layer on synthetic query expansion to simulate how your content performs across dozens of related questions. You can roll up relevance scores across your entire site to spot weak spots at scale. And you can go deeper with hallucination testing – not just asking if your content gets retrieved, but how it gets represented in a generated response.
Done right, this turns a one-off testing tool into a living framework for GEO strategy – helping you tune your site based on how AI systems actually behave, not how we wish they did.
Scoring Content Like an AI Retriever Would
One of the simplest ways to simulate how AI systems see your content? Let a large language model act as the retriever. You’re not asking it for opinions – you’re testing whether your content is structured in a way that makes it easy to pull, rank, and use in a real search response.
Instead of vague quality scores, you give it specific criteria tied to how AI retrieval actually works.
What to Look For:
- AI Readability: Are the paragraphs clean and skimmable? Can they be sliced into discrete, answerable segments? Do the headings clearly map to subtopics?
- Extractability: If you feed the model a query, can it grab the right passage without hallucinating, blending unrelated ideas, or rewording something crucial?
- Semantic Density: Does the content pack in the right signals – named entities, close synonyms, contextually linked terms – to stay aligned with the topic in vector space?
Once you’ve defined the scoring rules, you can have the model rate each section from 0 to 10 based on how “retrievable” it is. Roll those up across a set of pages, and you’ll start to see a heatmap of where your content is ready – and where it needs a rewrite.
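One way to wire that up is a sketch like the one below, using the google-generativeai client; the model choice and rubric wording are assumptions to adapt, not a fixed recipe:

import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

SCORING_PROMPT = """You are acting as a retrieval layer, not an editor.
Score the passage below from 0 to 10 on each criterion, with one line of reasoning each:
1. AI readability - clean, skimmable, sliceable into discrete answers
2. Extractability - a query could be answered by quoting it directly
3. Semantic density - entities, synonyms, and linked terms that anchor the topic

Passage:
{passage}
"""

def score_section(passage):
    # Returns the model's raw scoring text; parse it into numbers if you want
    # to roll scores up across a whole site
    response = model.generate_content(SCORING_PROMPT.format(passage=passage))
    return response.text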
Is it perfect? No. It’s still a proxy. But when you calibrate the scoring against live outcomes (like how often Perplexity cites your pages), it turns into something incredibly useful: a fast, low-friction way to predict whether AI systems will surface your content at all.
Synthetic Queries: Testing What AI Actually Retrieves
While LLM scoring gives you a sense of how retrievable your content should be, retrieval testing shows you whether it actually gets pulled in real-world conditions.
The idea is to simulate how AI search engines expand user intent. A single prompt like “best ultralight tents for backpacking” isn’t handled in isolation – models often break it down into multiple underlying questions:
“Which ultralight tents are most durable?”
“What’s the typical weight range for backpacking tents?”
“Are there good two-person ultralight options?”
That’s how systems like AI Overviews fan out a topic before deciding what to cite. To mimic this, you can generate synthetic subqueries using several approaches:
- Embedding-based neighbors: Use a vector model (e.g. mxbai-embed-large-v1) to find semantically similar queries from your keyword data
- LLM-driven expansions: Prompt the model to generate variations and alternate phrasings tied to the original question
- Entity injection: Add in specific products, names, or vertical-relevant terms to force the model to test edge cases in matching
Once you’ve got a pool of synthetic queries, you run them through your retrieval setup. That could mean:
- A local retriever trained on your own content – useful if you’ve got domain-specific language
- A remote retriever, like Gemini embeddings paired with a vector database such as Pinecone or Weaviate, or a nearest-neighbor library like Google’s ScaNN
The goal is to track which passages show up across variations and how closely the retriever behavior aligns with what you’d expect. If you manage the index and the embeddings, you can even simulate what happens when you update content – seeing exactly how those changes shift retrieval behavior over time.
This kind of testing doesn’t just confirm your assumptions – it highlights blind spots, weak signals, and false positives you’d never catch by looking at LLM output alone.
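Here’s a sketch of what the LLM-driven expansion plus local retrieval loop might look like, reusing the index and simulate_retrieval helper from the walkthrough above (the expansion prompt and model choice are assumptions):

import os

import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
llm = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

def expand_query(seed_query, n=5):
    # Ask the LLM for the sub-questions an AI engine might fan out to
    prompt = (f"List {n} distinct questions a searcher asking '{seed_query}' "
              "would also want answered. One per line, no numbering.")
    lines = llm.generate_content(prompt).text.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()]

# Run every synthetic variation through the local retriever built earlier
for sub_query in expand_query("best ultralight tents for backpacking"):
    chunk, score = simulate_retrieval(index, sub_query, top_k=1)[0]
    print(f"{sub_query} -> top score {score:.3f}: {chunk[:80]}…")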
- Looking Ahead: We’re likely heading toward a new class of tools – third-party GEO testing suites that let you simulate retrieval patterns across multiple AI search platforms in one place. Think of it as a preflight checklist for content, similar to how Core Web Vitals gave SEOs a standard way to validate site performance before launch. Only now, it’s about retrieval, not rendering.
Testing for Hallucinations with Prompt Templates
Just because your content gets retrieved doesn’t mean it gets used correctly. The generative layer – where the actual response is written – can still misquote you, skip key details, or invent something that never appeared on your page. That’s where hallucination testing comes in.
The approach is simple but powerful: use structured prompts to test how reliably different models represent your content when generating answers. Some prompts refer to your brand directly, others don’t – the point is to simulate both controlled and organic conditions.
- Does the AI stick to the facts? Or does it pull in unrelated details or mix things up?
- Is the attribution correct? If the page includes quotes or data, does the model cite you – or someone else?
- Are important qualifiers preserved? Think legal disclaimers, market-specific notes, or anything that could change the meaning if left out.
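A minimal sketch of such a template set – the wording, the two variants, and the placeholders are all hypothetical starting points to adapt:

# Two prompt variants: one pins the model to your page (controlled),
# one leaves source selection to the model (organic).
BRANDED_PROMPT = (
    "Using only the page at {url}, answer: {question}\n"
    "Quote exact figures where possible and name the source you used."
)
ORGANIC_PROMPT = (
    "Answer the question: {question}\n"
    "Name the sources you would rely on."
)

def build_hallucination_tests(url, question):
    # Returns both variants so each model can be checked against the
    # fact/attribution/qualifier questions above
    return {
        "branded": BRANDED_PROMPT.format(url=url, question=question),
        "organic": ORGANIC_PROMPT.format(question=question),
    }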
Closing the Loop: Connecting Simulation to Reality
A simulation is only as valuable as the real-world feedback you tie it to. The most effective GEO teams don’t treat testing environments as isolated sandboxes – they treat them like staging grounds for production. Before anything goes live, every page gets scored, retrieval-tested, and checked for hallucination risk. Once it’s published, live data takes over: citations, inclusion frequency, and actual rankings feed back into the system.
Let’s say your internal tests showed a 90% chance a page would be pulled by Perplexity. But in the wild, it only shows up 20% of the time. That gap tells you something’s off. Maybe the production retriever is using a different embedding model. Maybe a competitor’s page hits stronger entity signals. Maybe your key fact got buried during synthesis.
Over time, these gaps shape a calibration loop. Your simulation starts to behave more like the real thing. Eventually, you’ll be able to run predictive tweaks – restructure a heading, add a stat block, dial up semantic density – and get a reliable read on whether it’s likely to improve inclusion.
That’s where this gets powerful: not just testing ideas, but forecasting impact before anyone hits “publish.”
Simulation Isn’t a Shortcut – It’s Leverage
The goal of simulation isn’t to reverse-engineer every hidden detail of how AI search models work. It’s to create an environment where you can test content changes quickly, spot weak spots early, and move faster than the rest of your market.
In traditional SEO, we had tools like rank trackers, backlink profiles, and keyword difficulty scores. In GEO, that toolkit shifts – now it’s about scoring content with LLMs, testing retrieval patterns synthetically, and checking how cleanly the AI models reflect your message.
What’s Next
As AI search shifts toward more multimodal, context-aware systems – think MUM combined with user history and conversational memory – your simulation setup will need to evolve too. It won’t be enough to score plain text. You’ll need to test how image captions, video transcripts, and even user interaction paths affect retrieval and generation.
That’s the real shift: moving from reactive SEO cleanup to proactive content architecture. Instead of trying to decode the black box, you’re building a working model of it – one that’s close enough to guide your next move with evidence, not guesswork.
The real value of simulation in GEO isn’t about trying to perfectly mirror every detail of how Google, Perplexity, or Copilot behave. It’s about taking control in a landscape where the rules are vague, and the ground shifts constantly.
Success won’t come from guessing what the model might do. It’ll come from building test environments that let you model behavior, pressure-test your content, and iterate with real signals – not assumptions.
The teams who take this seriously – who build feedback loops, who test before they publish, who keep tuning based on what retrieval systems actually reward – those are the ones who will stop chasing updates and start shaping outcomes.
In a generative-first search world, advantage goes to the ones who see how the system moves – and act before anyone else does.
FAQ
1. How close can a simulation really get to the real thing?
It won’t be a perfect match – and that’s not the goal. The point is to get close enough to run experiments with confidence. If your setup reflects how content is chunked, embedded, and retrieved, then it becomes a reliable way to test ideas, spot blind spots, and make changes before your content hits production. You’re not cloning the system. You’re building a model of it that’s good enough to guide action.
2. Why not just wait and see what gets cited in AI Overviews?
Because by the time you’re cited – or not – you’ve already lost weeks. Simulation flips that timeline. You can test a page before it’s published, see which parts surface in retrieval, and fix what doesn’t land. That shortens your feedback loop and lets you build with intention, not guesswork.
3. Do I need technical skills to build a simulation setup?
You don’t need a PhD, but yes – you’ll need someone comfortable with Python, APIs, and vector tools like LlamaIndex or Pinecone. It’s not hard once you’ve done it once. And once it’s running, your content team can use it like any other testing tool – just more powerful.
4. Isn’t this all just overkill for SEO?
Not anymore. Traditional SEO was about static pages, keywords, and crawl budgets. GEO is about how AI systems interpret, reshape, and cite your content. If you’re still thinking in terms of just rankings, you’re behind. Simulation is what lets you catch up – and eventually, stay ahead.
5. How do I know if my simulation is working?
The best signal is alignment between your test output and real-world results. If your top retrieved chunks start showing up in AI citations – or at least reflect what gets pulled – you’re on track. If not, it’s a sign your model or content needs tuning. Either way, you’re learning, fast.