Founder Essay: Conversation is the Interface
January 6, 2026
Why the next leap for AI is moving from the desk into the room
We're building JoinIn.ai because the intelligence and power of AI are still stuck in prompt and response.
Most of what matters in human life doesn't happen in documents—or even in transcript summaries. It happens in conversation: messy, overlapping, emotional, time-bound conversation. It's where decisions get made, trust gets built, conflict gets repaired, plans get negotiated, and culture gets formed. A spreadsheet can record a decision. A slide deck can summarize a strategy. But neither one captures—let alone participates in—the moment where a group finally aligns, or the subtle shift when someone changes their mind, or the awkward pause that signals confusion, or the interruption that derails a meeting.
Conversation is the most human thing we do. And if we want intelligence to interact naturally with humans, conversation is the next interface AI must master. As Sherry Turkle puts it, face-to-face conversation is "the most human—and humanizing—thing we do."[1]
In 2023, 2024, and 2025 we watched AI surge in capability—writing, reasoning, coding, creating—largely powered by models trained through next-token (next-word) prediction.[2] But that paradigm doesn't inherently live in time the way conversation does. These systems generate sequences; they don't "feel" a pause, predict a turn boundary, or understand that an interruption carries social cost unless we explicitly build those capabilities around them.[2]
And that's not as alien as it sounds: humans also shift between fast, automatic reactions and slower, deliberative reasoning depending on the moment.[3] Real conversation forces that switching constantly. Turn transitions often land in the ~100–300 millisecond range, so listeners have to anticipate endings and start preparing a response before the other person is fully done.[4]
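To make that timing constraint concrete, here is a minimal sketch in Python. Every name and number in it (the Frame fields, the thresholds, the stubbed turn-end probability) is an assumption made for illustration, not a description of any real pipeline; it only contrasts the classic "wait for silence" gate with an anticipatory gate that starts preparing a reply before the speaker finishes.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One snapshot from a hypothetical streaming speech pipeline."""
    silence_ms: int        # trailing silence measured so far
    turn_end_prob: float   # stubbed estimate that the current turn is ending (0..1)

def naive_gate(frame: Frame, silence_threshold_ms: int = 700) -> bool:
    """Classic voice-assistant behavior: respond only after a long pause.
    Requiring ~700 ms of silence guarantees the reply lands well after the
    ~100-300 ms gaps typical of human turn-taking."""
    return frame.silence_ms >= silence_threshold_ms

def predictive_gate(frame: Frame, prepare_at: float = 0.6, speak_at: float = 0.9) -> str:
    """Anticipatory behavior: draft a response while the speaker is still
    talking, and speak only once the turn end is near-certain."""
    if frame.turn_end_prob >= speak_at and frame.silence_ms > 100:
        return "speak"
    if frame.turn_end_prob >= prepare_at:
        return "prepare"   # formulate the reply now, hold it until the floor opens
    return "listen"

if __name__ == "__main__":
    # A speaker winding down: rising turn-end probability, then a short gap.
    for frame in [Frame(0, 0.20), Frame(0, 0.65), Frame(120, 0.93)]:
        print(frame, "| naive:", naive_gate(frame), "| predictive:", predictive_gate(frame))
```

The real version of that turn-end estimate is a hard research problem in its own right; the sketch only shows why the naive gate can never feel conversational.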
Yet for all that intelligence, AI has remained stuck behind "one human at a time" interfaces: chat boxes, push-to-talk, and assistants that politely wait to be addressed.
That's not how humans work.
We work in groups. We make side comments. We use pronouns and inside jokes. We gesture. We negotiate who has the floor. We repair misunderstandings. We read the room. We hold back. We jump in. We do the dance.
And today's AI, for all its power, still struggles to join that dance without becoming a nuisance.
The missing layer: interaction intelligence
The problem isn't that AI can't generate content. The problem is that conversation isn't primarily content. It's timing, intent, roles, and social context.
If you've ever used an AI assistant in a live meeting, you've felt the gap:
- It doesn't know when it's being addressed vs. when you're thinking out loud.
- It can't reliably tell who someone is referring to with "she" or "that idea."
- It interrupts at the wrong time—or stays silent when it should help.
- It treats group talk like clean turns, when real talk has overlap, backchannels, and half-utterances.
- It can summarize after the fact, but it can't participate in the moment.
The industry built many voice interfaces on foundations that made sense for the last era: call centers, IVR (interactive voice response) systems, one speaker, one channel, one goal. But human conversation isn't a call center. To bring AI into the room, we need the next innovation: systems that can coordinate with humans in real time.
Why conversation is the next killer feature
The next big leap for AI won't be another benchmark score. It will be AI moving from the desk into the room—into meetings, collaborative work, family discussions, and any live group setting where communication is happening in real time.
That leap requires a different skill set (a rough sketch of how these judgments might combine follows the list):
- knowing when to speak vs. listen
- knowing who it's talking to
- knowing what "we" are doing right now (decision? debate? brainstorming? conflict?)
- knowing how to ask for clarification without derailing momentum
- knowing how to offer help privately vs. broadcasting to everyone
- knowing how to repair after a mistake
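Here is a toy sketch, again in Python with invented names and thresholds, of how those judgments might combine into a single decision. It is not JoinIn's policy; the point is only that "when to speak" is a function of social state, not of whether the model has something to say.

```python
from dataclasses import dataclass
from typing import Literal

Action = Literal["stay_silent", "suggest_privately", "ask_clarification", "speak_aloud"]

@dataclass
class Moment:
    """A hypothetical snapshot of live conversational state (illustrative fields only)."""
    addressed_to_assistant: bool   # was the assistant actually spoken to?
    activity: str                  # "decision", "debate", "brainstorm", "conflict", ...
    floor_open: bool               # is there a natural gap in which to take a turn?
    confidence: float              # how sure the assistant is that it can help (0..1)
    driver_present: bool           # is someone "driving" who can receive private help?

def choose_action(m: Moment) -> Action:
    """A toy policy for the skill set above: when to speak, when to hold back,
    and when help should go to one person rather than the whole room."""
    if m.addressed_to_assistant and m.floor_open:
        return "speak_aloud" if m.confidence >= 0.7 else "ask_clarification"
    if m.activity == "conflict":
        return "stay_silent"            # stay out of emotionally loaded moments
    if m.confidence >= 0.8 and m.driver_present:
        return "suggest_privately"      # help without broadcasting to everyone
    return "stay_silent"

if __name__ == "__main__":
    print(choose_action(Moment(False, "debate", False, 0.85, True)))  # suggest_privately
    print(choose_action(Moment(True, "decision", True, 0.90, True)))  # speak_aloud
```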
This is what makes the difference between a clever tool and a helpful presence.
And there's a deeper reason this matters: as AI gets smarter, the risk isn't just "wrong answers." The risk is misfit: systems that force humans to adapt to them instead of fitting into human life. A truly human-compatible AI shouldn't need to be prompted and supervised like a toddler. It should live inside our norms without hijacking them.
That's what conversation-native AI unlocks: intelligence that's present without being annoying.
How we're approaching it at JoinIn.ai
JoinIn.ai is built on a simple idea: AI should be able to join in.
Not in the sense of talking all the time. Not as a commentator. Not as an always-on microphone with opinions.
Join in as a participant that understands the flow of group communication and supports it—carefully, respectfully, and usefully.
To do that, we're starting where the pain is obvious today: meetings.
Meetings are where people feel the gap between AI's intelligence and its inability to behave. Meetings are where time is expensive. Meetings are where miscommunication turns into wasted work. Meetings are where people need help in the moment, not just a transcript afterward.
So we're starting with small, high-value behaviors that teach us the rules of real interaction (a rough sketch of how they might be logged and learned from appears below):
- micro catch-up when someone joins late
- next-turn suggestions when a point is being missed
- real-time context: "what are we talking about, and what's the unresolved question?"
- action proposals that can be accepted, edited, or rejected
- private backchannels for the person "driving" the meeting
- and yes—sometimes speaking aloud, but only when it's socially clear
Most importantly: humans should be able to shape the assistant's behavior. When people flag an intervention as helpful or annoying, they're not just giving feedback—they're teaching the system what "good participation" means in their culture, their team, their environment.
Because there isn't one universal "right way" to join a conversation. A board meeting, a design critique, a family dinner, a gaming session, and a therapy group all have different norms. The assistant has to learn them—not impose one pattern everywhere.
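One way to picture that learning loop, as a deliberately simplified sketch: each intervention gets logged with its kind, its channel, the setting it happened in, and any helpful-or-annoying flag, and those flags shift what the assistant is willing to do in that setting next time. The types and scoring below are assumptions invented for this essay; a real norm model would be far richer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal, Optional

Kind = Literal["catch_up", "next_turn_suggestion", "live_context",
               "action_proposal", "backchannel", "spoken_reply"]
Channel = Literal["private_to_driver", "shared_panel", "aloud"]
Flag = Literal["helpful", "annoying"]

@dataclass
class Intervention:
    """A hypothetical record of one assistant intervention and how people reacted to it."""
    kind: Kind
    channel: Channel
    context: str               # e.g. "board_meeting", "design_critique", "family_dinner"
    flag: Optional[Flag] = None

class NormModel:
    """A tiny sketch of 'learning the room': per-context scores rise when a kind of
    intervention is flagged helpful and fall when it is flagged annoying."""
    def __init__(self) -> None:
        self.scores: dict = defaultdict(float)

    def record(self, event: Intervention) -> None:
        if event.flag == "helpful":
            self.scores[(event.context, event.kind)] += 1.0
        elif event.flag == "annoying":
            self.scores[(event.context, event.kind)] -= 1.0

    def allowed(self, context: str, kind: Kind) -> bool:
        # Keep offering an intervention only where this group has welcomed it.
        return self.scores[(context, kind)] >= 0.0

if __name__ == "__main__":
    norms = NormModel()
    norms.record(Intervention("spoken_reply", "aloud", "design_critique", "annoying"))
    norms.record(Intervention("catch_up", "private_to_driver", "design_critique", "helpful"))
    print(norms.allowed("design_critique", "spoken_reply"))  # False: this room said no
    print(norms.allowed("design_critique", "catch_up"))      # True
```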
So we build incremental tools that work now, and we use them to learn fast.
What success looks like
Success isn't an AI that talks more. It's an AI that makes conversation better.
It reduces the cognitive load of moderating, remembering, and translating talk into action. It helps quieter voices get space. It catches misunderstandings early. It keeps groups aligned. It's present, but it's not sticky. It doesn't compete with humans for attention—it protects human attention.
If we get this right, JoinIn becomes more than a meeting product. It becomes infrastructure: a conversation-native layer that other products can plug into—so collaboration software, productivity tools, games, smart-home experiences, education platforms, and future interfaces can all tap into real-time conversational understanding.
That's the long arc. But the north star is simple:
If conversation is the most human interface, AI has to learn to live there.
That's what we're building at JoinIn.ai.
References
[1] Greater Good Magazine (UC Berkeley), "How Smartphones Are Killing Conversation" (interview/feature with Sherry Turkle).
[2] CSET, Georgetown University, "Large Language Models (LLMs): An Explainer" and "The Surprising Power of Next Word Prediction."
[3] The Decision Lab, "System 1 and System 2 Thinking" (a summary of Kahneman's framework).
[4] Levinson, S. C., & Torreira, F. (2015). "Timing in turn-taking and its implications for processing models of language." Frontiers in Psychology.
