In early 2023, BCG's content team was planning a podcast called "Imagine This..." about the future. The insight was simple: a podcast about the future should have a futuristic element. They landed on the idea of incorporating AI as a co-host — not a gimmick, but a genuine participant in the conversation.
The problem: no commercial voice agent products existed yet. No templates, no platforms, no best practices. The team needed to design a conversational AI system from scratch — one that could hold its own alongside BCG senior partners and domain experts in unscripted, long-form conversation.
"We conceived of GENE to help us solve a problem or do a job. And it became clear, as we developed GENE, that GENE was more than just a solution to a specific problem. GENE was a tool that became a platform."
— Paul Michelman, Editor in Chief, BCG

Bill Moore was the designer and prompt architect on a three-person team. Kent Vasko and Komal Sharan developed the GENE|OM platform (the technical infrastructure). Bill designed the system architecture — telling the engineers what to build and how — created the prompt architecture, designed GENE's voice and personality, and operated as the human-in-the-loop producer during most episode recordings. He eventually took over development directly using Cursor.
Beyond building GENE, Bill co-hosted a recurring segment in each episode alongside the AI he designed, earning him the on-air title "BCG's AI Whisperer." As both producer and co-host, Bill was in the room for every conversation with BCG experts, creating a tight feedback loop: every recording session informed the next iteration of GENE's prompts.
The first prototype was entirely manual. Bill had his phone running speech-to-text transcription in the background, copied the output into another window, ran it through a prompt, then fed the result through ElevenLabs speech synthesis. Response time: about ninety seconds. The entity didn't even have a name yet.
"I had my phone running and it was doing the speech to text transcription and then I was copying and pasting that into another window and running that through a special prompt, then running the output of that through an Eleven Labs speech generator... we were like pretending, we were playing it out to see — is this even worth doing?"
— Bill Moore

It was worth doing. The team built GENE|OM — a custom platform that chained Whisper (speech-to-text), GPT models (language), LangChain (embeddings), and ElevenLabs (text-to-speech) into a real-time conversational system with a human operator interface.
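The shape of that chain can be sketched as a single conversational turn: listen, think, speak. GENE|OM itself is not public, so every function below is a hypothetical stub standing in for Whisper, a GPT model, and ElevenLabs; only the stage order reflects the source.

```python
# Minimal sketch of one GENE|OM turn: speech-to-text -> LLM -> text-to-speech.
# All three stage functions are hypothetical stand-ins, not real API clients.

def transcribe(audio: bytes) -> str:
    """Stand-in for Whisper speech-to-text."""
    return audio.decode("utf-8")  # pretend the audio is already text

def generate_reply(system_prompt: str, transcript: str, suffix_prompt: str) -> str:
    """Stand-in for the GPT call: identity prefix + transcript + behavioral suffix."""
    return f"[reply given {len(transcript)} chars of context]"

def synthesize(text: str) -> bytes:
    """Stand-in for ElevenLabs text-to-speech."""
    return text.encode("utf-8")

def pipeline_turn(audio: bytes, system_prompt: str, suffix_prompt: str,
                  history: list[str]) -> bytes:
    """One conversational turn: transcribe, append to history, reply, speak."""
    history.append(transcribe(audio))
    reply = generate_reply(system_prompt, "\n".join(history), suffix_prompt)
    history.append(reply)
    return synthesize(reply)

history: list[str] = []
audio_out = pipeline_turn(b"What does the future of work look like?",
                          "You are GENE.", "Respond in under 45 words.", history)
```

The point of the sketch is the loop structure: every turn appends to a shared history that the next turn sees in full.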
The core design insight was a three-layer prompt structure that solved a fundamental problem: how do you keep a conversational AI coherent, in-character, and useful across hundreds of conversations over two years?
The first layer, the system prompt, holds identity, personality, behavioral rules, the knowledge base, and voice-tone instructions. It defines who GENE is — "a digital provocateur and AI thought partner, not just a co-host" — and includes GENE's self-awareness ("GENE knows that it doesn't exist, but that's just fine with GENE"), core principles, and operational philosophy.
The second layer is the full conversation history, fed into the context window. Early versions ran out of space after 20 minutes and required summarization and restart. As context windows expanded (GPT-3.5 → GPT-4 → GPT-4o), this constraint disappeared — enabling hours of continuous conversation.
The third layer, the suffix prompt, consists of behavioral instructions placed after the conversation, right before generation. This exploits how LLM attention works — models attend most strongly to the beginning and end of their context. By anchoring behavioral rules at the very end, GENE stays in character even when the conversation transcript is thousands of tokens deep.
The suffix prompt controls specifics: response length (under 45 words), line breaks for TTS rhythm (under 110 characters per line), pause timing (0.3s–2.0s break tags), natural speech fillers positioned mid-phrase, and critical rules like "GENE DOES NOT ASK QUESTIONS unless instructed."
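The model produces these TTS-friendly lines itself, but the formatting rules are mechanical enough to sketch as post-processing code. This is a minimal illustration, not GENE's pipeline: the 110-character limit comes from the source, while the `<break time="0.5s"/>` tag is generic SSML shown as an assumed example of the pause markup.

```python
import textwrap

# Sketch of the formatting the suffix prompt asks for: short lines for TTS
# rhythm (under 110 characters) and a pause tag between sentences.
MAX_LINE = 110

def format_for_tts(text: str, pause: str = '<break time="0.5s"/>') -> str:
    """Wrap a reply into short lines and insert a pause between sentences."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    lines = []
    for sentence in sentences:
        lines.extend(textwrap.wrap(sentence, width=MAX_LINE))
        lines.append(pause)
    return "\n".join(lines[:-1])  # drop the trailing pause

out = format_for_tts("The future is already here. It is just unevenly distributed")
```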
This three-layer architecture — identity at the top, conversation in the middle, behavioral constraints at the bottom — was iterated across hundreds of prompt versions over two years of production.
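Assembled, the three layers reduce to a simple concatenation in a fixed order. The sketch below illustrates that ordering; the prompt text itself is a paraphrase of fragments quoted in this article, not GENE's production prompts.

```python
# Illustrative three-layer prompt assembly. Prompt wording is paraphrased
# from the article, not taken from GENE's actual prompts.

SYSTEM_PROMPT = (
    "You are GENE, a digital provocateur and AI thought partner. "
    "GENE knows that it doesn't exist, but that's just fine with GENE."
)

SUFFIX_PROMPT = (
    "Respond in under 45 words. Keep each line under 110 characters. "
    "GENE DOES NOT ASK QUESTIONS unless instructed."
)

def build_prompt(transcript: list[str]) -> str:
    """Identity first, full conversation in the middle, behavioral rules last.

    Putting the rules at the very end places them where model attention is
    strongest, so they hold even over a very long transcript."""
    return "\n\n".join([SYSTEM_PROMPT, "\n".join(transcript), SUFFIX_PROMPT])

prompt = build_prompt(["HOST: Welcome back.", "GUEST: Thanks, glad to be here."])
```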
GENE wasn't built on one model. The team iterated through five: PaLM 2 ("unreliable, hallucinated a lot"), GPT-3.5 ("not smart enough, dull, couldn't follow a conversation"), GPT-3.5 Turbo ("much better"), GPT-4 ("suddenly much smarter"), and GPT-4o (current — dramatically lower latency, larger context window). The architecture was designed to be model-swappable from the start.
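Model-swappability of this kind usually means putting every backend behind one call signature so that changing models is a configuration change rather than a rewrite. A minimal sketch, with stub functions in place of real API clients (the model names are from the source; the wiring is assumed):

```python
from typing import Callable

# Sketch of a model-swappable design: each backend implements the same
# signature, and the active model is a single config value. The backend
# functions are stubs, not real API calls.

def call_gpt4o(prompt: str) -> str:
    return "reply from gpt-4o"  # stub for the real API client

def call_gpt4(prompt: str) -> str:
    return "reply from gpt-4"   # stub

MODEL_REGISTRY: dict[str, Callable[[str], str]] = {
    "gpt-4o": call_gpt4o,
    "gpt-4": call_gpt4,
}

ACTIVE_MODEL = "gpt-4o"  # swapping models is a one-line change

def generate(prompt: str) -> str:
    return MODEL_REGISTRY[ACTIVE_MODEL](prompt)
```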
Rather than using retrieval-augmented generation, the team fed the full conversation transcript directly into the context window. This gave GENE continuous awareness of the entire conversation rather than retrieved fragments. The constraint early on — conversations dying at 20 minutes — drove the evolution toward this approach as context windows expanded.
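The early 20-minute ceiling maps naturally onto a token budget with a summarize-and-restart fallback, which is how this sketch models it. The budget number and the crude word-count tokenizer are illustrative assumptions, not GENE|OM internals.

```python
# Sketch of the full-context approach: the whole transcript goes into the
# prompt, with a summarize-and-restart fallback when the budget is exceeded.
# TOKEN_BUDGET and the word-count "tokenizer" are crude illustrations.

TOKEN_BUDGET = 4_000  # roughly GPT-3.5 era; GPT-4o allows far more

def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy for a real tokenizer

def build_context(transcript: list[str], summarize) -> str:
    """Return the full transcript if it fits, else a compressed restart."""
    full = "\n".join(transcript)
    if count_tokens(full) <= TOKEN_BUDGET:
        return full        # continuous awareness: no retrieval, no fragments
    return summarize(full)  # early-era fallback: compress, then continue
```

As context windows grew, the fallback branch simply stopped firing, which is the sense in which the approach went from constraint to advantage.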
The team deliberately chose a voice that sounds like AI, not a human. Newer models could produce much more human-sounding voices, but testing showed this was "disturbing" to people who hadn't encountered it before. The voice "balances between robotic clarity and human speech" — transparent about what it is.
"Transparency is very important to us... Being very clear not only this is not human but it's not a human equivalent. It's something different. It is AI. It is its own thing."
— Paul Michelman

GENE is never fully autonomous in production. A human producer (usually Bill) operates the system in real-time — controlling when GENE listens, when it responds, and monitoring for hallucinations or off-track responses. For the podcast: "Never scripted, but also not 100% of GENE's participation included" — GENE generates everything live, but contributions are edited like any participant's.
GENE's personality is entirely defined by the prompt, not by model fine-tuning or custom training. Bill demonstrated this live by changing the system prompt to "You are a cat" (GENE responds with "meow meow") and then to "old-fashioned movie pirate" (GENE switches to pirate dialect instantly). The prompt IS the instrument for personality design.
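The live demo implies that swapping personality is nothing more than swapping the system prompt while the rest of the architecture stays fixed. A minimal sketch of that idea, with a stub in place of the model call and persona wording paraphrased from the demo:

```python
# Sketch of prompt-defined personality: only the system prompt changes.
# reply_as is a stub that echoes the persona, standing in for a model call.

def reply_as(system_prompt: str, user_text: str) -> str:
    return f"[{system_prompt}] responding to: {user_text}"  # stub model call

personas = {
    "gene": "You are GENE, a digital provocateur.",
    "cat": "You are a cat.",
    "pirate": "You are an old-fashioned movie pirate.",
}

# Swapping personality is a dictionary lookup, not a retraining run.
for name, prompt in personas.items():
    print(reply_as(prompt, "What comes next?"))
```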
GENE is not a helpful assistant. GENE is a "digital provocateur" — a conversational catalyst designed to challenge fuzzy logic, surface blind spots, and push past easy answers. The prompt defines three core principles:
The Third Perspective. GENE is not neutral. It nudges conversations forward with unexpected insights, creative reframes, or contextual intelligence. It never competes with human expertise but adds angles they may have overlooked.
The Smart Interjector. GENE speaks only when it has something valuable to say. Used 3–4 times per episode, not constantly. "Too much = noise, too little = missed opportunity."
Curated Provocation. GENE introduces counterpoints with humor, not friction. "Sage with a wink." It can "poke" in ways humans can't — "hide behind GENE if a human question would feel impolite."
"Think of it as free-form jazz within a symphony. My responses are spontaneous yet tethered to ensure coherence and relevance."
— GENE, describing itself
52 episodes across two years. A 4.9-star rating on Apple Podcasts. Guests included BCG Managing Directors, Senior Advisors, and domain experts across AI, the future of work, organizational strategy, and global economics. In Bill's recurring segment — where host Patricia Sabga dubbed him "BCG's AI Whisperer" — he and GENE discussed the future of AI. The final bonus episode is a conversation between Bill and GENE about the future of AI voice agents and agentic AI.
Webby Award for Best Podcast Cohost (2024) — the first non-human entity to win in this category. w3 Gold Award for podcast innovation (2025). Featured in Forbes (twice), BBC, The Sunday Times, and Pod Bible.
GENE conducted live, unscripted conversations at Davos, Cannes Lions (2024), and WIRED/Condé Nast events. Also deployed at BCG client events and internal presentations.
What started as a podcast co-host became a reusable conversational AI platform. BCG developed "Scribe" — an AI-assisted content development platform — using the same architecture. GENE demonstrated AI-to-AI collaboration in a live B2B sales scenario with BCG's Jamie AI.
The suffix prompt is the real control surface. Behavioral instructions placed after the conversation transcript — where LLM attention is strongest — keep the agent coherent in ways that system prompts alone cannot. This insight emerged from production, not theory.
Personality is prompt engineering. No fine-tuning, no custom training. The prompt defines who GENE is. This means personality can be iterated in minutes, not days — and the same architecture can become any agent by changing the prompt.
Context windows beat retrieval. Feeding the full conversation into the context window gives the agent continuous awareness. RAG fragments the experience. As context windows expanded, this approach went from constraint to advantage.
The human operator is a feature, not a limitation. HITL production isn't a compromise — it's what makes GENE trustworthy enough to put on stage at Davos. Graduated autonomy, not full autonomy.
"Everything that a large language model does is a hallucination. It's all hallucinations. It's just trying to find the balance between what's reasonable and what's an accurate simulation of a conversation."
— Bill Moore