TruAbutment AI Sales Agent · Proposal by SoluteLabs

Section 01 / Executive Summary

A clear picture of what we're building.

Based on our review of TruAbutment's prototype documentation, the call on April 27, and the technical walkthrough, we have a clear picture of what you are building and what it will take to get there.

The prototype validated the right instincts: an orchestrator-expert agent pattern, a proxy layer between the agent and your CP APIs, and a split between informational products (RAG) and catalog items (direct API). The gap between the prototype and a production system is well-defined, and this proposal addresses it directly.

Decisions confirmed on April 27

Full Python stack for the agent layer: FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability.
React (TypeScript) for web chat; React Native for mobile, iOS-first, Android with minimal additional effort.
90 to 95% of end users are on iOS. Mobile is the primary surface, not secondary.
Multimodal confirmed: text and voice both required from day one.
Non-logged-in users are in scope: FAQ and product Q&A only, with lead capture.
No HIPAA / PII compliance constraints currently.
Timeline: 22 weeks build, 23 weeks end-to-end including discovery, UAT, and go-live. Total fixed-price investment: $114,000, milestone-based.
US-hours team overlap guaranteed for every engineer, the technical lead, and the project lead on this engagement. No async-only handoffs.

Section 02 / Scope

What you are building.

A fully conversational interface for product discovery and ordering, embedded in your website and iOS app. Two fundamentally different types of data, served by two fundamentally different mechanisms: a distinction that shapes the entire architecture.

Layer	Description
Knowledge	8 core informational products (ioConnect, Tru Dual Align Kit, Tru Reamer Kit, HS Cap, T-Marker, TruBolt Kit, ASC Pro Kit, Bite Scope Kit). Answered by RAG. Citation-backed. The agent refuses rather than guesses when context is missing.
Catalog	3,000+ SKUs (TruBase, AOT, T-L, TruScan Body, Lab Analog, fixtures, screws) served by direct CP API lookup. No embedding. Always live, always accurate.
Order	Every order placed through a deterministic state machine. Multi-step, mid-flow modifiable, account-aware pricing, idempotent submit. Voice cannot bypass the confirmation button.
Maintenance	An admin platform that product and marketing can own. Template-enforced KB editor, policy controls, quality gates, one-click rollback.
Lead capture	Non-logged-in visitors get a FAQ and product Q&A experience. High-intent signals captured as leads and routed to a lightweight CRM (initially Google Sheets or a simple database).

Section 03 / Personas

Who uses this system.

Five personas use this system. The Buyer is the primary design constraint: every architectural decision is tested against their needs first.

Persona	Description
The Buyer	Dental labs, clinicians, and resellers ordering implant components mid-case. Time-pressured, mixed catalog fluency, often describing a clinical situation rather than a part number. Needs to arrive at a verified, correctly priced order without leaving the conversation. Surface: web chat (desktop) and iOS app (primary). Logged in. Account-specific pricing applied automatically via CP API.
Non-logged Visitor	New prospects discovering TruAbutment through SEO or referral. Can ask product and FAQ questions, cannot place orders. High-intent signals trigger lead capture and handoff to a sales rep.
KB Editor	Product or marketing staff. No engineering background. Owns content updates through a structured dashboard. Template enforcement at input means their edits cannot degrade retrieval quality.
Policy Admin	Sales operations. Modifies system prompts, guardrail settings, escalation rules, and sales policies through the admin dashboard, without code deployments.
Human Sales Rep	Receives escalations with full context: conversation transcript, current cart state, retrieved KB chunks, and account profile. HITL integration via email or a lightweight CRM.

Section 04 / Architecture

Proposed architecture.

Five layers. Every channel (web, iOS, future touchpoints) communicates through a Python FastAPI gateway built entirely by SoluteLabs. LangGraph orchestration, RAG retrieval, and all business logic run within this unified Python service. BotPress is removed entirely.

L1 · Clients

React (TS) Web Chat

Streaming · Generative UI order cards · Voice toggle

React Native iOS

iOS-first · Voice-native · Android with minimal delta

Custom

Admin Dashboard

KB editor · Policy manager · RBAC · Quality gate

L2 · Gateway

Python FastAPI Gateway · New · built by SoluteLabs

Auth & SSO bridge · Conversation logger (async, DB) · Signal parser (intent → iOS UI transitions) · Rate limiter & CP API retry logic · Single /chat endpoint for all channels

L3 · Orchestration

LangGraph + LangChain

Orchestrator agent · Expert agents per product category · Guardrail node: confidence scoring, source citation, clarification gate · Dual routing: logged-in (full flow) vs non-logged (FAQ + lead capture) · LangSmith observability and tracing

Tool registry: product_search · price_lookup · inventory_check · order_create · upsell_recommend · escalate · lead_capture

L4 · Data

pgvector

Category A KB · Hybrid search · Cross-encoder re-ranking

TruAbutment API

CP API Bridge

Category B · Live pricing · Inventory · Orders

PostgreSQL + Redis

Cart state · Session cache · Order & conv logs

Custom

Policy Store

System prompts · Guardrails · Escalation rules

L5 · External

OpenAI / Azure

LLM inference · Embeddings (benchmarked wk 1)

ElevenLabs TTS

Streaming <1.5s · SoluteLabs Solutions Partner

Whisper / Azure STT

Speech-to-text · Selected on latency benchmark

Lead CRM

Google Sheets or lightweight DB · Upgradeable to HubSpot

Section 05 / Technical Approach

Technical approach.

5.1 Dual user flow

The agent uses session state to route conversations to two distinct flows. The user sees one interface.

User type	Agent behaviour
Logged-in buyer	Full flow: product Q&A, compatibility, ordering, account-specific pricing. Order state machine active. Voice enabled.
Non-logged visitor	FAQ and product Q&A only. No order access. High-intent signals (pricing, ordering, compatibility questions) trigger lead capture: name, contact, intent summary routed to CRM.

Lead capture does not require a full CRM. Initially: Google Sheets via API, or a minimal DB table (name, email, conversation summary, product interest, timestamp). Sales rep receives an email notification with the transcript. Upgradeable to HubSpot or similar without changing the agent logic.
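The routing rule and the minimal lead record described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: the `Session` and `Lead` types, the split of registry tools between the two flows, and the keyword-based `should_capture_lead` check are hypothetical stand-ins for LangGraph routing and a real intent classifier.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Tool names come from the tool registry in Section 04.
# Which tools each flow may call is an assumption made for this sketch.
BUYER_TOOLS = {"product_search", "price_lookup", "inventory_check",
               "order_create", "upsell_recommend", "escalate"}
VISITOR_TOOLS = {"product_search", "lead_capture", "escalate"}

# Hypothetical keyword list standing in for a real high-intent classifier.
HIGH_INTENT_TERMS = ("price", "pricing", "order", "compatible", "compatibility")


@dataclass
class Session:
    session_id: str
    account_id: Optional[str] = None  # None means the user is not logged in


@dataclass
class Lead:
    # Fields mirror the minimal DB table named in the proposal.
    name: str
    email: str
    conversation_summary: str
    product_interest: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def allowed_tools(session: Session) -> set[str]:
    """Pick the tool set from session state; the user sees one interface."""
    return BUYER_TOOLS if session.account_id else VISITOR_TOOLS


def should_capture_lead(session: Session, message: str) -> bool:
    """Only non-logged visitors with a high-intent message trigger lead capture."""
    if session.account_id:
        return False
    text = message.lower()
    return any(term in text for term in HIGH_INTENT_TERMS)
```

Because the lead is a plain record, the storage target (Google Sheets, a DB table, or later HubSpot) can change without touching the routing logic.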

5.2 Order workflow, as a state machine

The LLM proposes. Application logic gates. No order can be created without passing through every state in sequence.

01

Idle → Intent

User describes a need. Orchestrator detects order intent and routes to Cart Assembly.
02

Cart Assembly ⇌ Modification · Loops freely

Items extracted via structured LLM output (product_id, quantity, variant). Validated against CP catalog API before entering cart. Ambiguous items trigger clarification cards, never a guess. Phase 1 supports ordering by SKU lookup across the full 3,000+ catalog (buyer names the item, agent fetches and validates). Phase 3 adds conversational discovery ("find me a 4.5mm fixture compatible with this case") over the same catalog.
03

Estimate

Read-only CP API call. Shows live account-discounted pricing. No order created yet.
04

Review

Generative UI card rendered in chat: product list, quantities, total price. User can modify or confirm.
05

Submit · Gated, explicit button

Voice cannot trigger submit. The Submit button must be pressed explicitly. Idempotent: dedup key prevents duplicate orders on retry.
06

Confirmed · Terminal

CP order created. Confirmation shown in chat with order number.
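The six states above can be expressed as an explicit transition table that application code enforces regardless of what the LLM proposes. The sketch below is a minimal illustration: the state names follow the list above, while the back-edges from Estimate and Review to Cart Assembly and the `source` argument are assumptions about how "modify" and the voice gate would be modelled.

```python
from enum import Enum


class OrderState(Enum):
    IDLE = "idle"
    INTENT = "intent"
    CART_ASSEMBLY = "cart_assembly"
    ESTIMATE = "estimate"
    REVIEW = "review"
    SUBMIT = "submit"
    CONFIRMED = "confirmed"


# Allowed transitions. Cart Assembly may loop on itself (modification);
# returning to it from Estimate or Review is an assumption of this sketch.
TRANSITIONS = {
    OrderState.IDLE: {OrderState.INTENT},
    OrderState.INTENT: {OrderState.CART_ASSEMBLY},
    OrderState.CART_ASSEMBLY: {OrderState.CART_ASSEMBLY, OrderState.ESTIMATE},
    OrderState.ESTIMATE: {OrderState.REVIEW, OrderState.CART_ASSEMBLY},
    OrderState.REVIEW: {OrderState.SUBMIT, OrderState.CART_ASSEMBLY},
    OrderState.SUBMIT: {OrderState.CONFIRMED},
    OrderState.CONFIRMED: set(),  # terminal
}


class TransitionError(Exception):
    """Raised when a requested transition is not permitted."""


def advance(current: OrderState, target: OrderState, *, source: str = "text") -> OrderState:
    """Return the new state, or raise if the move skips a state or bypasses the button."""
    if target not in TRANSITIONS[current]:
        raise TransitionError(f"{current.value} -> {target.value} is not allowed")
    if target is OrderState.SUBMIT and source != "button":
        # Voice and free text can never trigger submit.
        raise TransitionError("submit requires an explicit button press")
    return target
```

Keeping the table as data makes it straightforward to test that no path reaches Confirmed without passing through Estimate, Review, and Submit.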

Failure modes handled

Inventory drift: live re-check at submit time; user notified of any changes between Estimate and Submit.
Session drop: cart state persisted to PostgreSQL (Redis is cache only, not source of truth). Recovery UX on return.
CP API timeout: retry with exponential backoff at the FastAPI gateway layer. User sees a loading state, not an error.
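As a rough sketch of how the timeout retry and the dedup key could fit together at the gateway: the code below retries on `TimeoutError` with exponentially growing delays and returns the stored result when the same key is submitted twice. `OrderSubmitter`, `new_dedup_key`, and the `create_order` callable are hypothetical names, and the in-memory dictionary stands in for persistent storage.

```python
import time
import uuid


def new_dedup_key() -> str:
    """Create one key per checkout attempt; every retry of that attempt reuses it."""
    return uuid.uuid4().hex


def call_with_backoff(fn, *, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TimeoutError with delays of base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)


class OrderSubmitter:
    """In-memory stand-in; a production version would persist keys in PostgreSQL."""

    def __init__(self, create_order, sleep=time.sleep):
        self._create_order = create_order  # hypothetical wrapper around the CP order call
        self._sleep = sleep
        self._results = {}

    def submit(self, dedup_key, payload):
        if dedup_key in self._results:
            # Duplicate submit (double tap, client retry): return the first result.
            return self._results[dedup_key]
        result = call_with_backoff(
            lambda: self._create_order(dedup_key, payload), sleep=self._sleep
        )
        self._results[dedup_key] = result
        return result
```

Note the limit of this sketch: if a call times out after the CP API has already created the order, a retry is only safe when the CP API itself deduplicates on the key. That is the idempotency question listed among the open items.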

5.3 RAG & knowledge layer

The most important decision in the RAG architecture is what not to RAG. Category B catalog items are structured spec data: direct API lookups are faster, cheaper, and always accurate. RAG is reserved exclusively for Category A.

Category	Architecture
Category A: 8 informational SKUs	pgvector database. Template-aligned chunking (one chunk per KB section: Overview, Components, Compatibility, Workflow, Specs, FAQ, Ordering). Each chunk tagged with product_id, section_type, version. Hybrid retrieval: dense vector + BM25 keyword. Cross-encoder re-ranking before LLM context. Citation-backed responses. Agent refuses rather than guesses when confidence is below threshold.
Category B: 3,000+ catalog SKUs	Direct CP catalog API. Real-time MSRP plus account-discounted pricing. Live inventory. No embedding, no staleness risk.
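To make the Category A retrieval path concrete, here is a small sketch of merging the dense and BM25 result lists and then refusing when nothing clears a confidence threshold. The proposal does not name a fusion method; reciprocal rank fusion is used here as an assumption, and the chunk IDs, the `threshold` value, and the shape of the re-ranker output are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk IDs; k=60 is the commonly used RRF constant."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


def answer_or_refuse(reranked, threshold=0.5):
    """Gate the LLM call on re-ranker confidence.

    reranked: list of (chunk_id, score) pairs from the cross-encoder, best first.
    The threshold is a placeholder to be tuned against the golden QA set.
    """
    grounded = [(chunk_id, score) for chunk_id, score in reranked if score >= threshold]
    if not grounded:
        return {"action": "refuse", "citations": []}
    return {"action": "answer", "citations": [chunk_id for chunk_id, _ in grounded]}


dense_hits = ["ioconnect:compatibility", "ioconnect:overview", "hs-cap:specs"]
bm25_hits = ["ioconnect:overview", "ioconnect:faq", "ioconnect:compatibility"]
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Chunks that appear in both lists rise to the top of `fused`, which is the property that makes hybrid search useful for exact product names alongside paraphrased clinical questions.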

On vector store and embedding model selection: we commit to pgvector on PostgreSQL as the vector store. It adds no new infrastructure, keeps transactional consistency with the rest of the data layer, and is proven at this volume (eight knowledge-base products with sectioned chunks). The embedding model is benchmarked in week 1 of the discovery sprint against actual KB content and real user queries; dental and clinical terminology is specific enough that the right choice must be validated, not assumed.
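One way the embedding benchmark could be scored is mean recall@k over a labelled query set. The sketch below assumes each candidate model is wrapped in a `retrieve(query)` function that returns ranked chunk IDs; the metric choice, the function names, and the toy retrievers are illustrative assumptions, not the agreed evaluation protocol.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunk IDs found in the top k retrieved IDs."""
    if not relevant:
        raise ValueError("each benchmark query needs at least one relevant chunk")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def benchmark(retrievers, labelled_queries, k=5):
    """Mean recall@k per candidate model.

    retrievers: model name -> function(query) returning ranked chunk IDs.
    labelled_queries: list of (query, relevant_chunk_ids) pairs.
    """
    results = {}
    for name, retrieve in retrievers.items():
        total = sum(recall_at_k(retrieve(q), relevant, k) for q, relevant in labelled_queries)
        results[name] = total / len(labelled_queries)
    return results


# Toy retrievers standing in for real embedding models.
def model_a(query):
    return ["ioconnect:compatibility", "ioconnect:faq"]


def model_b(query):
    return ["hs-cap:specs", "ioconnect:faq"]


queries = [("Is ioConnect compatible with my scanner?", ["ioconnect:compatibility"])]
scores = benchmark({"model_a": model_a, "model_b": model_b}, queries, k=2)
```

With real user queries and KB chunks in place of the toy data, the same loop gives a like-for-like comparison between candidate models.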

On iterative improvement: we distinguish three things that are often conflated. (1) RAG iteration: improving chunking, retrieval, and re-ranking based on real query telemetry. Ongoing, high-leverage. (2) Prompt iteration: refining system prompts and few-shot examples. Also ongoing. (3) Model fine-tuning: retraining the LLM on TruAbutment's domain. We recommend against this. The gains are marginal when RAG is well-designed, and the maintenance cost is high.

5.4 KB management dashboard

The admin dashboard is a blocking dependency for sustainable operation. Without it, every KB update requires an IT ticket. It is in scope for Phase 1.

Publish pipeline: Edit → Validate → Re-embed (changed chunks only) → Index → Regression eval → Live. Triggered by clicking Publish. No manual steps.
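The "changed chunks only" step can be illustrated with content hashing: store a hash per published chunk and re-embed only where the draft's hash differs. This is a sketch under assumptions; the `product:section` key format and the `plan_reembedding` helper are hypothetical.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Hash chunk text after trimming, so whitespace-only edits do not trigger re-embedding."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def plan_reembedding(published_hashes: dict, draft_chunks: dict):
    """Compare a draft against the published index.

    published_hashes: chunk key -> hash stored at last publish.
    draft_chunks: chunk key -> current section text.
    Returns (keys to embed, keys to delete), both sorted.
    """
    to_embed = [key for key, text in draft_chunks.items()
                if published_hashes.get(key) != chunk_hash(text)]
    to_delete = [key for key in published_hashes if key not in draft_chunks]
    return sorted(to_embed), sorted(to_delete)


published = {
    "ioconnect:overview": chunk_hash("Old overview"),
    "ioconnect:faq": chunk_hash("FAQ text"),
    "ioconnect:specs": chunk_hash("Specs"),
}
draft = {
    "ioconnect:overview": "New overview",  # edited
    "ioconnect:faq": "FAQ text ",          # trailing space only
    "ioconnect:workflow": "Workflow",      # new section
}
to_embed, to_delete = plan_reembedding(published, draft)
```

Here only the edited overview and the new workflow section would be embedded, and the removed specs section would be dropped from the index.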

Role	Access level
Content Editor (Product / Marketing)	Section content fields only. Cannot modify chunk boundaries, templates, or system prompts.
Policy Admin (Sales Ops)	System prompts, guardrail settings, escalation rules, sales policies.
Engineering	Template schema, eval set, retrieval configuration.

Quality gate is a soft warning, not a hard block. When a KB update causes a regression on the golden QA set, the publisher sees a warning and must explicitly override to proceed. Hard blocks cause operational friction: a product manager cannot fix a typo because an unrelated test regressed.
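The soft-warning behaviour could look roughly like this: a regression blocks publishing only until the publisher explicitly overrides it. The `GateResult` type and the pass/fail dictionaries keyed by golden-question ID are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GateResult:
    published: bool
    regressions: List[str]
    warning: Optional[str] = None


def quality_gate(baseline: dict, candidate: dict, *, override: bool = False) -> GateResult:
    """Compare golden QA results before and after a KB update.

    baseline / candidate: question ID -> True if the answer passed.
    A regression is a question that passed before and fails (or is missing) now.
    """
    regressions = sorted(
        qid for qid, passed in baseline.items()
        if passed and not candidate.get(qid, False)
    )
    if not regressions:
        return GateResult(published=True, regressions=[])
    warning = f"{len(regressions)} golden QA regression(s): {', '.join(regressions)}"
    # Soft gate: the publisher can proceed, but only with an explicit override.
    return GateResult(published=override, regressions=regressions, warning=warning)
```

A typo fix that trips an unrelated regression therefore shows the warning, and the product manager can still publish by passing `override=True`.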

5.5 Voice mode

Component	Approach
Speech-to-text (STT)	OpenAI Whisper API or Azure Speech-to-Text. Selected based on latency benchmark in discovery.
Text-to-speech (TTS)	ElevenLabs streaming TTS. SoluteLabs is an ElevenLabs Solutions Partner. Target: <1.5s time-to-first-audio via streaming token pipeline.
Safety gate	Voice cannot submit an order. The Submit button must be pressed explicitly. Confirmed as a design requirement on the April 27 call.
Toggle	Per-session button on both web chat and iOS. Does not affect conversation state.

5.6 HITL escalation

The agent escalates when it detects: negative sentiment, three consecutive clarification failures, an explicit user request, or a high-value order threshold (configurable).

On escalation, the agent pauses the conversation and sends the sales rep a package containing: full conversation transcript, current cart state, retrieved KB chunks that grounded the last response, and the user's account profile (if logged in). Initial delivery via email. Upgradeable to Intercom or Zendesk without changing the agent logic.
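The four triggers and the context package can be sketched as plain data plus one check function. Everything named here is illustrative: the `Signals` and `EscalationPolicy` types, the sentiment scale, and the default threshold values are placeholders that the Policy Admin would configure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signals:
    sentiment: float              # hypothetical score in [-1.0, 1.0]
    clarification_failures: int   # consecutive failed clarifications
    user_requested_human: bool
    cart_total: float


@dataclass
class EscalationPolicy:
    negative_sentiment_below: float = -0.5   # placeholder value
    max_clarification_failures: int = 3      # three consecutive failures, per the proposal
    high_value_threshold: float = 10_000.0   # placeholder; configurable


def escalation_reasons(signals: Signals, policy: EscalationPolicy) -> List[str]:
    """Return every trigger that fired; an empty list means no escalation."""
    reasons = []
    if signals.sentiment < policy.negative_sentiment_below:
        reasons.append("negative_sentiment")
    if signals.clarification_failures >= policy.max_clarification_failures:
        reasons.append("clarification_failures")
    if signals.user_requested_human:
        reasons.append("user_request")
    if signals.cart_total >= policy.high_value_threshold:
        reasons.append("high_value_order")
    return reasons


def build_escalation_package(transcript, cart, kb_chunks, account_profile: Optional[dict] = None):
    """Bundle the context a sales rep receives; the profile is None for non-logged users."""
    return {
        "transcript": list(transcript),
        "cart": cart,
        "kb_chunks": list(kb_chunks),
        "account_profile": account_profile,
    }
```

Because the package is a plain dictionary, the delivery channel (email first, Intercom or Zendesk later) stays separate from the trigger logic.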

5.7 Performance targets

The following are design targets, not contractual SLAs. They represent the user-perceived latency we engineer toward. Production SLAs (uptime, error-rate guarantees) are scoped in the post-launch retainer once real load patterns are established.

Metric	Target
Chat first-token latency	<2s P50 · <4s P95
Voice time-to-first-audio	<1.5s P50 (streaming TTS)
CP API tool calls	<3s typical · exponential backoff retry on timeout
Order state transitions	Synchronous · idempotent on retry (dedup key prevents duplicates)
Build / UAT phase availability	Best-effort during US business hours · staging environment available 24/7
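As an illustration of how the chat latency targets could be checked against measured samples, the sketch below computes P50 and P95 with the nearest-rank method and compares them with the thresholds in the table. The percentile method and the sample values are assumptions; production measurement would come from LangSmith traces or gateway logs.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_chat_targets(first_token_latencies_s):
    """Design targets from the table: first token under 2s at P50 and under 4s at P95."""
    return (percentile(first_token_latencies_s, 50) < 2.0
            and percentile(first_token_latencies_s, 95) < 4.0)


# Made-up first-token latencies in seconds, for illustration only.
samples = [0.8, 1.0, 1.2, 1.5, 1.9, 2.5, 3.0, 3.5, 3.8, 3.9]
within_target = meets_chat_targets(samples)
```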

Section 06 / Phased Delivery

Three phases plus a paid discovery sprint.

Phase 1 is the production foundation: everything else builds on it. Phases 1 and 2 overlap by ~2 weeks once the /chat API is stable.

Phase 01

Production foundation.

9 weeks / 2 dev + TL @ 50% / $39,000

Python FastAPI Gateway: auth/SSO bridge, conversation logger, signal parser, rate limiting, CP API bridge · 1.5 wks
pgvector setup, embed all Category A KB (8 products), hybrid search (dense + BM25) · 1 wk
LangGraph orchestration layer (replaces BotPress brain) + LangSmith tracing · 2.5 wks
Hallucination guardrails: confidence gate, citation enforcement, clarification node · 0.5 wk
KB Management Dashboard MVP: template editor, publish pipeline, quality gate · 1.5 wks
React (TypeScript) web chat UI (replaces BotPress Webchat) · 1 wk
Eval framework: golden QA set, regression suite · 0.5 wk
Integration testing and staging deploy · 0.5 wk

Outcome

Production web agent with hybrid-search RAG over the Category A knowledge base, hallucination guardrails (confidence gate, citation enforcement, clarification node), and non-technical KB editing. BotPress fully removed. Full Python stack.

Phase 02

Mobile, voice & sales intelligence.

9 weeks / 3 dev + TL @ 50% / $48,000

React Native iOS app + Chat API integration · 3 wks
Generative UI order cards (product selection, cart review, confirmation) · 1 wk
Voice pipeline: Whisper STT, ElevenLabs TTS, streaming, toggle button · 1.5 wks
Non-logged-in user flow + lead capture pipeline (CRM integration) · 1 wk
Upselling engine: product relation graph, upsell tool, conversational suggestion flow · 1 wk
HITL escalation: sentiment detection, context package, email/CRM handover · 0.5 wk
Account-specific pricing surfaced in agent responses · 0.5 wk
UAT + bug fix cycle · 1 wk

Outcome

iOS app live with voice, generative UI order flow, lead capture, and upselling. UAT milestone.

Phase 03

Scale & personalisation.

6 weeks / 2 dev + TL @ 50% / $24,000

Conversational catalog discovery: natural-language search across the full 3,000+ SKU catalog ("find me a 4.5mm fixture compatible with this case"). Builds on the SKU-lookup ordering already shipped in Phase 1. · 2 wks
Mobile catalog search UI (iOS) · 1.5 wks
Personalised recommendations from order_logs (purchase history) · 1.5 wks
Analytics: conversation quality dashboard, intent tracking, funnel metrics · 1 wk
Go-live + hypercare · 0.5 wk

Outcome

Full SKU coverage, personalised recommendations, analytics for the sales team. Full go-live.

Total timeline / 22 weeks build, 23 weeks E2E

Discovery sprint (1 week) runs before the build clock starts and resolves the open items in Section 8. CP API documentation, SSO scope, and idempotency questions are answered before Phase 1 begins.
Phase 1 (9 wks) and Phase 2 (9 wks) overlap by ~2 weeks once the /chat API endpoint is stable. Phase 3 (6 wks) begins immediately after Phase 2 UAT sign-off.
Compression to ~18 weeks is feasible by adding a fourth developer in Phase 1, subject to discovery-sprint findings on CP API and SSO complexity.
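The headline timeline follows directly from the phase lengths and the overlap stated above. The short calculation below treats the "~2 weeks" overlap as exactly 2 weeks, which is an assumption for the arithmetic.

```python
PHASE_WEEKS = {"phase_1": 9, "phase_2": 9, "phase_3": 6}
OVERLAP_WEEKS = 2    # Phases 1 and 2 overlap by ~2 weeks; taken as exactly 2 here
DISCOVERY_WEEKS = 1  # runs before the build clock starts

# Build time is the sum of the phases minus the overlap between Phases 1 and 2.
build_weeks = sum(PHASE_WEEKS.values()) - OVERLAP_WEEKS

# End-to-end adds the discovery sprint in front of the build.
end_to_end_weeks = DISCOVERY_WEEKS + build_weeks

print(f"build: {build_weeks} weeks, end-to-end: {end_to_end_weeks} weeks")
```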

Section 07 / Investment

Investment & pricing.

A fixed-price proposal. The price absorbs the discovery-phase risk on the open items in Section 8 (CP API documentation, SSO status, idempotency). No change orders on items resolved in the discovery sprint.

Pricing is based on our standard rates: $40/hr for developers and $60/hr for technical leads, 40 billable hours per engineer per week. Dev hours reflect a 25% efficiency gain from agentic tooling (Claude Code, Cursor). All team members carry US-hours availability.

7.1 Phase schedule

00

Discovery Sprint

1 week · 1 dev (40h) + 1 TL (20h)

$3,000

01

Production Foundation

9 weeks · 2 dev + TL @ 50%

$39,000

02

Mobile, Voice & Sales Intelligence

9 weeks · 3 dev + TL @ 50%

$48,000

03

Scale & Personalisation

6 weeks · 2 dev + TL @ 50%

$24,000

Total

Fixed price.

22 weeks build · 23 weeks end-to-end

$114,000

7.2 Investment by capability

Same total, viewed by what each capability delivers. Useful when prioritising features or evaluating ROI per surface.

Capability	What it delivers	Investment
Discovery Sprint	Open items resolved · architecture locked · evaluation set agreed · per-phase SOW finalized	$3,000
Logged-in Buyer Core	FastAPI gateway · LangGraph orchestration · pgvector RAG with hybrid search · hallucination guardrails · React web chat · order state machine · eval framework · account-specific pricing	$35,000
KB Management Dashboard	Template-enforced editor for non-technical content owners · publish pipeline · quality gate · RBAC	$6,500
Mobile App (iOS-first, RN)	iOS app · Chat API integration · generative UI order cards · mobile catalog search · UAT	$30,500
Voice Mode	Whisper STT · ElevenLabs streaming TTS · per-session toggle · <1.5s time-to-first-audio	$7,500
Non-logged + Lead Capture	FAQ & product Q&A flow · high-intent detection · CRM handoff · email notification with transcript	$5,000
Sales Intelligence	Upselling engine (product relation graph) · HITL escalation with context package · personalised recommendations	$13,000
Catalog Scale, Analytics & Go-Live	Conversational discovery over 3,000+ SKU catalog · conversation analytics dashboard · go-live + hypercare	$13,500
Total		$114,000
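Both views of the investment, by phase and by capability, add up to the same fixed price. The figures below are copied from the two breakdowns in this section; only the dictionary key names are introduced for the check.

```python
BY_PHASE = {
    "discovery_sprint": 3_000,
    "phase_1_production_foundation": 39_000,
    "phase_2_mobile_voice_sales": 48_000,
    "phase_3_scale_personalisation": 24_000,
}

BY_CAPABILITY = {
    "discovery_sprint": 3_000,
    "logged_in_buyer_core": 35_000,
    "kb_management_dashboard": 6_500,
    "mobile_app": 30_500,
    "voice_mode": 7_500,
    "non_logged_lead_capture": 5_000,
    "sales_intelligence": 13_000,
    "catalog_scale_analytics_go_live": 13_500,
}

FIXED_PRICE = 114_000

phase_total = sum(BY_PHASE.values())
capability_total = sum(BY_CAPABILITY.values())
```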

7.3 Operating costs (client-paid)

LLM, STT, TTS, and embedding inference costs are paid directly by TruAbutment via your own API keys (OpenAI or Azure for LLM and embeddings, ElevenLabs for TTS). This keeps data ownership, rate limits, and any enterprise-rate negotiations on your side. SoluteLabs configures the keys; we do not mark up inference.

Indicative monthly inference

$500 to $1,500 / month depending on conversation volume (assumes GPT-4o-mini for orchestration routing, GPT-4o for expert agents, Whisper STT, ElevenLabs streaming TTS at moderate voice usage).
A per-conversation cost model is delivered with Phase 1 so you can forecast spend as adoption grows. Final monthly figure is firmed up post-discovery once volume assumptions are validated.

7.4 Post-launch retainer

After go-live, a monthly retainer covers RAG iteration (chunking, retrieval, re-ranking improvements based on real query telemetry), prompt iteration, KB content support, eval set expansion, monitoring, and minor feature work. Tier can be adjusted month-to-month with 30 days' notice. Unused hours roll forward one month.

Tier	Monthly	Coverage
Light	$2,500	1 dev × 10h/week · KB support, monitoring, minor bug fixes
Standard	$4,500	1 dev × 20h/week + TL × 5h/week · RAG iteration, prompt tuning, eval expansion, KB support, monitoring
Heavy	$8,000	2 dev × 20h/week + TL × 10h/week · active feature development, multi-quarter product roadmap

How we hold this fixed price

Discovery sprint absorbs the open items: CP API documentation, SSO scope, and idempotency questions are resolved in week 1, before the build clock starts. The fixed price absorbs what an indicative range would have left ambiguous, so you do not see scope-creep change orders mid-build.
TL at 50% allocation: the technical lead provides architecture oversight, code review, and client coordination, not full-time coding. Billed at 20h/week.
Agentic tooling efficiency: we use Claude Code, Cursor, and AI-assisted scaffolding throughout. We pass that efficiency back as a 25% reduction on dev hours, baked into the fixed price.
Milestone-based payment: each phase has clear deliverables and acceptance criteria before the next begins. Payment is per-phase against acceptance, not lump sum upfront.

Section 08 / Open Items

Open items before SOW.

These items need to be resolved before we produce a fixed-price Statement of Work. We propose resolving them in a paid 1-week discovery sprint.

Item	Why it matters	Owner
CP API documentation	Determines discovery buffer. Undocumented APIs add 1 to 2 weeks per phase.	TruAbutment
CP Order API idempotency	If the API does not dedup on retry, we add a reconciliation layer at the gateway. Affects Phase 1 scope.	TruAbutment
SSO status	Account-specific pricing depends on SSO. Built, in progress, or to be built?	TruAbutment
Existing regression eval set	Quality gate is meaningless without one. If none exists, we build it in week 1 with the product expert.	Joint
KB content owner	The highest-leverage role on your side. Who from product or marketing owns KB during the build?	TruAbutment
Lead CRM target	Google Sheets, simple DB, or an existing tool? Affects Phase 2 integration scope.	TruAbutment

Section 09 / Engagement

Engagement model.

Three steps from this proposal to Phase 1 kickoff. Discovery sprint runs first to lock the items in Section 8, then a fixed-price SOW per phase, then build.

01

Discovery Sprint

1 week · paid ($3,000). Structured workshop with your product and technical team. We resolve all six open items, lock the architecture, agree the evaluation set, and produce a detailed SOW with weeks, deliverables, and acceptance criteria per phase.

02

Statement of Work

Milestone-based pricing per phase. Each phase has a clear deliverable and acceptance criteria before the next begins. In/out scope explicit. No lump sum.

03

Build

Phase 1 starts immediately after SOW signature. Weekly standups, fortnightly demos. US-hours overlap guaranteed for all team members on this engagement.

Section 10 / Why SoluteLabs

Why SoluteLabs.

US-hours overlap	Every engineer, the technical lead, and the project lead on this engagement work US business hours. Live standups, fortnightly demos, and Slack response within your working day. No 12-hour async lag, no "we'll catch up tomorrow." This is a non-negotiable for our team composition on this project.
AI agent engineering	Production RAG pipelines, LangGraph orchestration, hallucination guardrails. This is our primary practice, not a side capability.
Python & AI ecosystem	Full Python stack: FastAPI gateway, LangGraph orchestration, LangChain RAG pipeline, LangSmith observability. We own the entire agent surface end-to-end.
ElevenLabs Solutions Partner	Voice pipeline delivered without integration overhead. We have shipped text and voice AI agents in healthcare contexts.
React & React Native	React TypeScript web chat plus React Native iOS-first mobile. No second codebase. Android available with minimal delta.
Health tech experience	We work with clients in the health tech sector and understand the precision and trust requirements of clinical workflows.
12 years · US entity	40-person team. Founded 2014, US entity in Delaware. Long-running engagements with B2B SaaS and enterprise teams.

Section 11 / Next Steps

Next steps.

Review this proposal and flag any corrections or additions, particularly on the architecture and scope sections.
Confirm the open items list in Section 8: identify owners and any items already resolved.
Sign off on the discovery sprint engagement. We can start within 5 business days.
Introduce us to the CP API owner so we can begin documentation review during discovery.

Ready to move when you are

Full Python stack end-to-end. FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability. React TypeScript web chat. React Native iOS app.
Every architectural decision will be documented and owned by your team. No black boxes.
Phase 1 ends with a production-deployed, evaluated system. Not a second prototype.
