
Many organizations are hesitant to overhaul their tech stack and start from scratch.
Not Notion.
For the 3.0 version of its productivity software (launched in September), the company didn't hesitate to rebuild from the ground up; it recognized that doing so was essential, in fact, to support agentic AI at enterprise scale.
While traditional AI-powered workflows involve explicit, step-by-step instructions based on few-shot learning, AI agents powered by advanced reasoning models are deliberate about tool definition: they can identify and understand the tools at their disposal and plan next steps.
“Rather than trying to retrofit into what we were building, we wanted to play to the strengths of reasoning models,” Sarah Sachs, Notion’s head of AI modeling, told VentureBeat. “We've rebuilt a new architecture because workflows are different from agents.”
Re-orchestrating so models can work autonomously
Notion has been adopted by 94% of the Forbes AI 50 companies, has 100 million total users and counts OpenAI, Cursor, Figma, Ramp and Vercel among its customers.
In a rapidly evolving AI landscape, the company identified the need to move beyond simpler, task-based workflows to goal-oriented reasoning systems that allow agents to autonomously select, orchestrate and execute tools across connected environments.
In a short time, reasoning models have become “far better” at learning to use tools and following chain-of-thought (CoT) instructions, Sachs noted. This allows them to be “far more independent” and make multiple decisions within one agentic workflow. “We rebuilt our AI system to play to that," she said.
From an engineering perspective, this meant replacing rigid prompt-based flows with a unified orchestration model, Sachs explained. This core model is supported by modular sub-agents that search Notion and the web, query and add to databases and edit content.
Each agent uses tools contextually; for instance, it will decide whether to search Notion itself or another platform like Slack. The model will perform successive searches until the relevant information is found. It can then, for instance, convert notes into proposals, create follow-up messages, track tasks, and spot and make updates in knowledge bases.
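This kind of self-selecting tool loop can be sketched roughly as follows. This is a minimal illustration, not Notion's actual architecture; all names (`Tool`, `search_workspace`, `search_slack`, `pick_tool`) are hypothetical, and the heuristic stands in for the reasoning model's own tool choice:

```python
# Illustrative sketch: an agent that selects among declared tools
# contextually, rather than following a few-shot script per scenario.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str        # what the reasoning model reads to pick a tool
    run: Callable[[str], str]

def search_workspace(query: str) -> str:
    return f"workspace results for {query!r}"

def search_slack(query: str) -> str:
    return f"slack results for {query!r}"

TOOLS = [
    Tool("search_workspace", "Search pages and databases in the workspace", search_workspace),
    Tool("search_slack", "Search connected Slack channels", search_slack),
]

def pick_tool(task: str) -> Tool:
    # Stand-in for the model's contextual choice; a real agent reasons
    # over the tool descriptions instead of using a keyword heuristic.
    if "slack" in task.lower():
        return TOOLS[1]
    return TOOLS[0]

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    """Perform successive tool calls until enough information is found."""
    observations = []
    for _ in range(max_steps):
        tool = pick_tool(task)
        observations.append(tool.run(task))
        if observations:  # a real agent would let the model judge sufficiency
            break
    return observations
```

The key design point from the article is that the tool list is declared, not scripted: adding a new sub-agent means adding a `Tool` entry, not rewriting prompts for every scenario.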
In Notion 2.0, the team focused on having AI perform specific tasks, which required them to “think exhaustively” about how to prompt the model, Sachs noted. With version 3.0, however, users can assign tasks to agents, and agents can actually take action and perform multiple tasks concurrently.
“We reorchestrated it to be self-selecting on the tools, rather than few-shotting, which is explicitly prompting how to go through all these different scenarios,” Sachs explained. The goal is to ensure everything interfaces with AI and that “anything you can do, your Notion agent can do.”
Bifurcating to isolate hallucinations
Notion’s philosophy of “better, faster, cheaper” drives a continuous iteration cycle that balances latency and accuracy through fine-tuned vector embeddings and Elasticsearch optimization. Sachs’ team employs a rigorous evaluation framework that combines deterministic tests, vernacular optimization, human-annotated data and LLMs-as-a-judge, with model-based scoring identifying discrepancies and inaccuracies.
“By bifurcating the evaluation, we're able to identify where the problems come from, and that helps us isolate unnecessary hallucinations,” Sachs explained. Further, keeping the architecture itself simpler makes it easier to make changes as models and techniques evolve.
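A bifurcated evaluation of this sort can be sketched as below. This is a hypothetical illustration of the general technique, not Notion's pipeline: the deterministic check and the judge scorer are stubs, and the 0.5 threshold is an assumed value:

```python
# Illustrative sketch: split evaluation into deterministic checks
# (did retrieval surface the required sources?) and a model-based
# quality score (did generation stay faithful?), so a failure can be
# attributed to one stage or the other.

def deterministic_checks(answer: str, required_citations: list[str]) -> bool:
    """Pass only if every required source is actually cited in the answer."""
    return all(src in answer for src in required_citations)

def judge_score(answer: str, reference: str) -> float:
    # Stand-in for an LLM-as-judge grader; here, crude token overlap in [0, 1].
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / max(len(r), 1)

def evaluate(answer: str, reference: str, required_citations: list[str]) -> dict:
    grounded = deterministic_checks(answer, required_citations)
    quality = judge_score(answer, reference)
    if not grounded:
        source = "retrieval"   # missing citations: the context was wrong
    elif quality < 0.5:
        source = "generation"  # context was fine, the model drifted
    else:
        source = "ok"
    return {"grounded": grounded, "quality": round(quality, 2), "source": source}
```

The payoff of the split is attribution: a hallucination flagged as "generation" despite passing the deterministic checks points at the model, not the retrieval layer.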
“We optimize latency and parallel thinking as much as possible,” which leads to “way better accuracy,” Sachs noted. Models are grounded in data from the web and the connected Notion workspace.
Ultimately, Sachs reported, the investment in rebuilding its architecture has already paid off for Notion in capability and a faster rate of change.
She added, “We are fully open to rebuilding it again, when the next breakthrough happens, if we have to.”
Understanding contextual latency
When building and fine-tuning models, it’s important to understand that latency is subjective: AI should provide the most relevant information, not necessarily the most information at the expense of speed.
“You'd be surprised at the different ways customers are willing to wait for things and not wait for things,” Sachs said. It makes for an interesting experiment: How slow can you go before people abandon the model?
With pure navigational search, for instance, users may not be as patient; they want answers near-instantly. “If you ask, ‘What's two plus two,’ you don't want to wait for your agent to be searching everywhere in Slack and JIRA,” Sachs pointed out.
But the more time it is given, the more exhaustive a reasoning agent can be. For instance, Notion can perform 20 minutes of autonomous work across hundreds of websites, files and other materials. In those cases, users are more willing to wait, Sachs explained; they let the model execute in the background while they attend to other tasks.
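A per-use-case latency budget like the one described can be sketched as a simple router. The cue words, budgets and route names here are all assumed values for illustration, not Notion's actual behavior:

```python
# Illustrative sketch: assign a latency budget per query type, then
# route quick navigational lookups to an instant path and open-ended
# research to a long-running background agent.

def latency_budget_seconds(query: str) -> float:
    research_cues = ("research", "compare", "summarize", "report")
    if any(cue in query.lower() for cue in research_cues):
        return 20 * 60.0   # long background work, e.g. ~20 minutes of research
    return 2.0             # navigational search: users expect near-instant answers

def route(query: str) -> str:
    """Pick an execution path based on the query's latency budget."""
    return "background_agent" if latency_budget_seconds(query) > 60 else "instant_search"
```

As Sachs frames it, the thresholds themselves are a product decision: the UI has to set the expectation that a "background_agent" task will take minutes, not seconds.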
“It's a product question,” said Sachs. “How do we set user expectations from the UI? How do we ascertain user expectations on latency?”
Notion is its own biggest user
Notion understands the importance of using its own product; in fact, its employees are among its biggest power users.
Sachs explained that teams have active sandboxes that generate training and evaluation data, as well as a “really active” thumbs-up/thumbs-down user feedback loop. Users aren’t shy about saying what they think should be improved or what features they’d like to see.
Sachs emphasized that when a user gives an interaction a thumbs-down, they’re explicitly giving permission for a human annotator to analyze that interaction in a way that de-identifies them as much as possible.
“We are using our own tool as a company all day, every day, and so we get really fast feedback loops,” said Sachs. “We’re really dogfooding our own product.”
That said, it’s their own product they’re building, Sachs noted, so they understand they may have goggles on when it comes to quality and functionality. To balance this out, Notion relies on trusted "very AI-savvy" design partners who are granted early access to new capabilities and provide critical feedback.
Sachs emphasized that this is just as important as internal prototyping.
“We're all about experimenting in the open, I think you get much richer feedback,” said Sachs. “Because at the end of the day, if we just look at how Notion uses Notion, we're not really giving the best experience to our customers.”
Just as importantly, continuous internal testing allows teams to gauge progress and make sure models aren't regressing (when accuracy and performance degrade over time). "Everything you're doing stays faithful," Sachs explained. "You know that your latency is within bounds."
Many companies make the mistake of focusing too intently on retrospective evals; this makes it difficult for them to understand how or where they're improving, Sachs pointed out. Notion treats some evals as a "litmus test" of development and forward-looking progress, and others as tools for observability and regression-proofing.
“I think a big mistake a lot of companies make is conflating the two,” stated Sachs. “We use them for both purposes; we think about them really differently.”
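Keeping the two eval purposes separate, as Sachs describes, can be sketched as two suites with different jobs. This is a hypothetical illustration; the function names and gating logic are assumptions, not Notion's implementation:

```python
# Illustrative sketch: a regression suite gates releases on cases that
# previously passed, while a forward-looking suite tracks progress on
# aspirational cases the model does not fully handle yet.

def run_suite(cases, model):
    """Run a model over eval cases; True where the output matches."""
    return {case["id"]: model(case["input"]) == case["expected"] for case in cases}

def release_gate(regression_results: dict) -> bool:
    # Regression suite: every previously passing case must still pass.
    return all(regression_results.values())

def progress_score(frontier_results: dict) -> float:
    # Forward-looking suite: fraction of aspirational cases now handled.
    return sum(frontier_results.values()) / max(len(frontier_results), 1)
```

The point of the separation is that the two suites answer different questions: the gate says "did we break anything?", the score says "are we getting better?".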
Takeaways from Notion's journey
For enterprises, Notion can serve as a blueprint for how to responsibly and dynamically operationalize agentic AI in a connected, permissioned enterprise workspace.
Sachs’ takeaways for other tech leaders:
Don’t be afraid to rebuild when foundational capabilities change; Notion fully re-engineered its architecture to align with reasoning-based models.
Treat latency as contextual: Optimize per use case, rather than universally.
Ground all outputs in trustworthy, curated enterprise data to ensure accuracy and trust.
She advised: “Be willing to make the hard decisions. Be willing to sit at the top of the frontier, so to speak, on what you're developing to build the best product you can for your customers.”

