SoluteLabs × TruAbutment / Proposal & Architecture Confidential / 28 April 2026
A Proposal for TruAbutment, by SoluteLabs

AI Sales Agent

A conversational interface for product discovery and ordering: text and voice, web and iOS, grounded in live catalog data and a single Python service.

Fixed-price investment: $114,000
22 weeks build, three milestone-based phases · 23 weeks end-to-end including the discovery sprint · No change orders for items resolved in discovery
US-hours overlap

Full US business-hours overlap for the entire engagement team: developers, technical lead, and project lead. Same-day Slack and email turnaround, live calls in your time zone, no async-only handoffs.

Section 01 / Executive Summary

A clear picture of what we're building.

Based on our review of TruAbutment's prototype documentation, the call on April 27, and the technical walkthrough, we have a clear picture of what you are building and what it will take to get there.

The prototype validated the right instincts: an orchestrator-expert agent pattern, a proxy layer between the agent and your CP APIs, and a split between informational products (RAG) and catalog items (direct API). The gap between the prototype and a production system is well-defined, and this proposal addresses it directly.

Decisions confirmed on April 27
  • Full Python stack for the agent layer: FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability.
  • React (TypeScript) for web chat; React Native for mobile, iOS-first, Android with minimal additional effort.
  • 90 to 95% of end users are on iOS. Mobile is the primary surface, not secondary.
  • Multimodal confirmed: text and voice both required from day one.
  • Non-logged-in users are in scope: FAQ and product Q&A only, with lead capture.
  • No HIPAA / PII compliance constraints currently.
  • Timeline: 22 weeks build, 23 weeks end-to-end including discovery, UAT, and go-live. Total fixed-price investment: $114,000, milestone-based.
  • US-hours team overlap guaranteed for every engineer, the technical lead, and the project lead on this engagement. No async-only handoffs.
Section 02 / Scope

What you are building.

A fully conversational interface for product discovery and ordering, embedded in your website and iOS app. Two fundamentally different types of data, served by two fundamentally different mechanisms: a distinction that shapes the entire architecture.

Layer | Description
Knowledge | 8 core informational products (ioConnect, Tru Dual Align Kit, Tru Reamer Kit, HS Cap, T-Marker, TruBolt Kit, ASC Pro Kit, Bite Scope Kit). Answered by RAG. Citation-backed. The agent refuses rather than guesses when context is missing.
Catalog | 3,000+ SKUs (TruBase, AOT, T-L, TruScan Body, Lab Analog, fixtures, screws) served by direct CP API lookup. No embedding. Always live, always accurate.
Order | Every order placed through a deterministic state machine. Multi-step, mid-flow modifiable, account-aware pricing, idempotent submit. Voice cannot bypass the confirmation button.
Maintenance | An admin platform that product and marketing can own. Template-enforced KB editor, policy controls, quality gates, one-click rollback.
Lead capture | Non-logged-in visitors get a FAQ and product Q&A experience. High-intent signals captured as leads and routed to a lightweight CRM (initially Google Sheets or a simple database).
Section 03 / Personas

Who uses this system.

Five personas use this system. The Buyer is the primary design constraint: every architectural decision is tested against their needs first.

Persona | Description
The Buyer | Dental labs, clinicians, and resellers ordering implant components mid-case. Time-pressured, mixed catalog fluency, often describing a clinical situation rather than a part number. Needs to arrive at a verified, correctly priced order without leaving the conversation. Surface: web chat (desktop) and iOS app (primary). Logged in. Account-specific pricing applied automatically via CP API.
Non-logged Visitor | New prospects discovering TruAbutment through SEO or referral. Can ask product and FAQ questions, cannot place orders. High-intent signals trigger lead capture and handoff to a sales rep.
KB Editor | Product or marketing staff. No engineering background. Owns content updates through a structured dashboard. Template enforcement at input means their edits cannot degrade retrieval quality.
Policy Admin | Sales operations. Modifies system prompts, guardrail settings, escalation rules, and sales policies through the admin dashboard, without code deployments.
Human Sales Rep | Receives escalations with full context: conversation transcript, current cart state, retrieved KB chunks, and account profile. HITL integration via email or a lightweight CRM.
Section 04 / Architecture

Proposed architecture.

Five layers. Every channel (web, iOS, future touchpoints) communicates through a Python FastAPI gateway built entirely by SoluteLabs. LangGraph orchestration, RAG retrieval, and all business logic run within this unified Python service. BotPress is removed entirely.

L1 · Clients
  • React (TS) Web Chat: Streaming · Generative UI order cards · Voice toggle
  • React Native iOS: iOS-first · Voice-native · Android with minimal delta
  • Admin Dashboard (custom): KB editor · Policy manager · RBAC · Quality gate

L2 · Gateway
  • Python FastAPI Gateway (new · built by SoluteLabs): Auth & SSO bridge · Conversation logger (async, DB) · Signal parser (intent → iOS UI transitions) · Rate limiter & CP API retry logic · Single /chat endpoint for all channels

L3 · Orchestration
  • LangGraph + LangChain: Orchestrator agent · Expert agents per product category · Guardrail node: confidence scoring, source citation, clarification gate · Dual routing: logged-in (full flow) vs non-logged (FAQ + lead capture) · LangSmith observability and trace.
  • Tool registry: product_search · price_lookup · inventory_check · order_create · upsell_recommend · escalate · lead_capture

L4 · Data
  • pgvector: Category A KB · Hybrid search · Cross-encoder re-ranking
  • CP API Bridge (TruAbutment API): Category B · Live pricing · Inventory · Orders
  • PostgreSQL + Redis: Cart state · Session cache · Order & conv logs
  • Policy Store (custom): System prompts · Guardrails · Escalation rules

L5 · External
  • OpenAI / Azure: LLM inference · Embeddings (benchmarked wk 1)
  • ElevenLabs TTS: Streaming <1.5s · SoluteLabs Solutions Partner
  • Whisper / Azure STT: Speech-to-text · Selected on latency benchmark
  • Lead CRM: Google Sheets or lightweight DB · Upgradeable to HubSpot

Section 05 / Technical Approach

Technical approach.

5.1 Dual user flow

The agent uses session state to route conversations to two distinct flows. The user sees one interface.

User type | Agent behaviour
Logged-in buyer | Full flow: product Q&A, compatibility, ordering, account-specific pricing. Order state machine active. Voice enabled.
Non-logged visitor | FAQ and product Q&A only. No order access. High-intent signals (pricing, ordering, compatibility questions) trigger lead capture: name, contact, intent summary routed to CRM.
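
The routing rule in this table can be sketched in a few lines. This is an illustration only, not the production LangGraph graph: `Session`, `allowed_tools`, and `can_call` are hypothetical names, the tool names come from the tool registry in Section 04, and the split of tools between the two flows is an assumption to be confirmed in discovery.

```python
from dataclasses import dataclass

# Tool names from the Section 04 tool registry. Which tools each flow may
# call is an assumption made for this sketch.
FULL_FLOW_TOOLS = frozenset({
    "product_search", "price_lookup", "inventory_check",
    "order_create", "upsell_recommend", "escalate",
})
VISITOR_TOOLS = frozenset({"product_search", "lead_capture"})


@dataclass(frozen=True)
class Session:
    """Hypothetical session record held by the gateway."""
    session_id: str
    logged_in: bool


def allowed_tools(session: Session) -> frozenset:
    """Return the tools the orchestrator may call for this session."""
    return FULL_FLOW_TOOLS if session.logged_in else VISITOR_TOOLS


def can_call(session: Session, tool: str) -> bool:
    """Gate a tool call in application code rather than in the prompt."""
    return tool in allowed_tools(session)
```

Keeping the gate in application code means a visitor session cannot reach `order_create` even if the model proposes it.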

Lead capture does not require a full CRM. Initially: Google Sheets via API, or a minimal DB table (name, email, conversation summary, product interest, timestamp). Sales rep receives an email notification with the transcript. Upgradeable to HubSpot or similar without changing the agent logic.
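
A minimal sketch of the "minimal DB table" option, assuming the five fields listed above. It uses an in-memory SQLite database from the Python standard library purely for illustration; the table name `leads` and the function `capture_lead` are hypothetical, and the production store would be chosen in discovery.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    conversation_summary TEXT NOT NULL,
    product_interest TEXT,
    created_at TEXT NOT NULL
)
"""


def capture_lead(conn, name, email, summary, product_interest=None):
    """Insert one lead row and return its id. Values are bound, never interpolated."""
    created_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO leads "
        "(name, email, conversation_summary, product_interest, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, email, summary, product_interest, created_at),
    )
    conn.commit()
    return cur.lastrowid


# Example data is invented for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
lead_id = capture_lead(conn, "Dr. Example", "lab@example.com",
                       "Asked about pricing for a reamer kit", "Tru Reamer Kit")
```

Because the agent only calls `capture_lead`, swapping the storage behind it for Google Sheets or HubSpot would not touch the agent logic.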

5.2 Order workflow, as a state machine

The LLM proposes. Application logic gates. No order can be created without passing through every state in sequence.

  1. Idle · Intent
     User describes a need. Orchestrator detects order intent and routes to Cart Assembly.

  2. Cart Assembly · Modification Loops freely
     Items extracted via structured LLM output (product_id, quantity, variant). Validated against CP catalog API before entering cart. Ambiguous items trigger clarification cards, never a guess. Phase 1 supports ordering by SKU lookup across the full 3,000+ catalog (buyer names the item, agent fetches and validates). Phase 3 adds conversational discovery ("find me a 4.5mm fixture compatible with this case") over the same catalog.

  3. Estimate
     Read-only CP API call. Shows live account-discounted pricing. No order created yet.

  4. Review
     Generative UI card rendered in chat: product list, quantities, total price. User can modify or confirm.

  5. Submit · Gated, explicit button
     Voice cannot trigger submit. The Submit button must be pressed explicitly. Idempotent: dedup key prevents duplicate orders on retry.

  6. Confirmed · Terminal
     CP order created. Confirmation shown in chat with order number.
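
The six states can be sketched as an explicit transition table with the two gates (button-only submit, dedup key) enforced in code. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the loop-backs from Estimate and Review to Cart Assembly are our reading of "mid-flow modifiable", and `create_order` stands in for the CP order call.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    CART_ASSEMBLY = "cart_assembly"
    ESTIMATE = "estimate"
    REVIEW = "review"
    SUBMIT = "submit"
    CONFIRMED = "confirmed"


# Forward transitions follow the six states in order. Estimate and Review may
# loop back to Cart Assembly so the cart stays modifiable (assumption).
TRANSITIONS = {
    State.IDLE: {State.CART_ASSEMBLY},
    State.CART_ASSEMBLY: {State.ESTIMATE},
    State.ESTIMATE: {State.REVIEW, State.CART_ASSEMBLY},
    State.REVIEW: {State.SUBMIT, State.CART_ASSEMBLY},
    State.SUBMIT: {State.CONFIRMED},
    State.CONFIRMED: set(),  # terminal
}


class OrderFlow:
    def __init__(self):
        self.state = State.IDLE
        self._submitted = {}  # dedup_key -> order number

    def advance(self, target, source="button"):
        """Move to `target`, or raise if application logic does not allow it."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        if target is State.SUBMIT and source != "button":
            raise PermissionError("submit requires an explicit button press")
        self.state = target

    def confirm(self, dedup_key, create_order):
        """Create the order once per dedup key; a retry returns the same number."""
        if dedup_key not in self._submitted:
            if self.state is not State.SUBMIT:
                raise ValueError("order can only be created from SUBMIT")
            self._submitted[dedup_key] = create_order()
            self.state = State.CONFIRMED
        return self._submitted[dedup_key]
```

The LLM can only request a transition; `advance` decides whether it happens, which is the "LLM proposes, application logic gates" rule in code form.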

Failure modes handled
  • Inventory drift: live re-check at submit time; user notified of any changes between Estimate and Submit.
  • Session drop: cart state persisted to PostgreSQL (Redis is cache only, not source of truth). Recovery UX on return.
  • CP API timeout: retry with exponential backoff at the FastAPI gateway layer. User sees a loading state, not an error.
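
A minimal sketch of the retry behaviour for CP API timeouts, assuming the call being retried is safe to repeat (read-only, or protected by the dedup key). The function name, attempt count, and delay values are illustrative assumptions, not agreed settings.

```python
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, retry_on=(TimeoutError,)):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, capped, plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% jitter so simultaneous retries do not line up.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the behaviour can be tested without waiting; the gateway would keep the user's loading state visible while the retries run.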

5.3 RAG & knowledge layer

The most important decision in the RAG architecture is what not to RAG. Category B catalog items are structured spec data: direct API lookups are faster, cheaper, and always accurate. RAG is reserved exclusively for Category A.

Category | Architecture
Category A: 8 informational products | pgvector database. Template-aligned chunking (one chunk per KB section: Overview, Components, Compatibility, Workflow, Specs, FAQ, Ordering). Each chunk tagged with product_id, section_type, version. Hybrid retrieval: dense vector + BM25 keyword. Cross-encoder re-ranking before LLM context. Citation-backed responses. Agent refuses rather than guesses when confidence is below threshold.
Category B: 3,000+ catalog SKUs | Direct CP catalog API. Real-time MSRP plus account-discounted pricing. Live inventory. No embedding, no staleness risk.
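
To make the Category A retrieval path concrete, here is a minimal sketch of merging a dense ranking with a BM25 ranking and then refusing when nothing clears a confidence threshold. The proposal does not fix a fusion method; reciprocal rank fusion is used here as one common option, and the function names, the constant `k=60`, the threshold, and the chunk ids are all illustrative assumptions. In the real pipeline the scores would come from the cross-encoder re-ranker.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, then by id so ties are deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def answer_or_refuse(fused_ids, rerank_scores, threshold=0.5, top_n=3):
    """Keep chunks whose re-rank score clears the threshold; refuse if none do."""
    kept = [cid for cid in fused_ids
            if rerank_scores.get(cid, 0.0) >= threshold][:top_n]
    if not kept:
        return {"refuse": True, "citations": []}
    return {"refuse": False, "citations": kept}
```

The `citations` list is what backs the citation requirement: a response is only generated from chunks that survived the gate.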

On vector store and embedding model selection: we commit to pgvector on PostgreSQL as the vector store: no new infrastructure, transactional consistency with the rest of the data layer, and proven at the volume of 8 KB products with sectioned chunks. The embedding model is benchmarked in week 1 of the discovery sprint against actual KB content and real user queries; dental and clinical terminology is specific enough that the right choice must be validated, not assumed.
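
The week-1 embedding benchmark could be scored along these lines. This is a sketch under assumptions: the metric (hit rate at k over a labelled query set), the function names, and the model labels are ours; the actual metric and query set are agreed in the discovery sprint.

```python
def hit_rate_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose top-k results contain at least one relevant chunk."""
    if not retrieved:
        return 0.0
    hits = 0
    for query, ranked_ids in retrieved.items():
        if set(ranked_ids[:k]) & relevant.get(query, set()):
            hits += 1
    return hits / len(retrieved)


def pick_model(results_by_model, relevant, k=5):
    """Return (model_name, score) for the best-scoring candidate embedding model."""
    scored = {name: hit_rate_at_k(results, relevant, k)
              for name, results in results_by_model.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Each candidate model is run over the same real user queries, and `relevant` holds the chunk ids a product expert marked as correct for each query.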

On iterative improvement: we distinguish three things that are often conflated. (1) RAG iteration: improving chunking, retrieval, and re-ranking based on real query telemetry. Ongoing, high-leverage. (2) Prompt iteration: refining system prompts and few-shot examples. Also ongoing. (3) Model fine-tuning: retraining the LLM on TruAbutment's domain. We recommend against this. The gains are marginal when RAG is well-designed, and the maintenance cost is high.

5.4 KB management dashboard

The admin dashboard is a blocking dependency for sustainable operation. Without it, every KB update requires an IT ticket. It is in scope for Phase 1.

Publish pipeline: Edit → Validate → Re-embed (changed chunks only) → Index → Regression eval → Live. Triggered by clicking Publish. No manual steps.
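
The "re-embed changed chunks only" step can be sketched with a content hash per chunk. This is an illustration, not the final pipeline: the function names and the dictionary layout are hypothetical, while the section list and the three tags (product_id, section_type, version) come from Section 5.3.

```python
import hashlib

SECTIONS = ("Overview", "Components", "Compatibility", "Workflow",
            "Specs", "FAQ", "Ordering")


def chunk_product(product_id, sections, version):
    """One chunk per template section, tagged with product_id, section_type, version."""
    chunks = {}
    for section_type in SECTIONS:
        text = sections.get(section_type, "").strip()
        if text:
            chunks[(product_id, section_type)] = {
                "product_id": product_id,
                "section_type": section_type,
                "version": version,
                "text": text,
                "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            }
    return chunks


def chunks_to_reembed(live, draft):
    """Keys of chunks that are new or whose text hash changed since the live version."""
    return sorted(key for key, chunk in draft.items()
                  if key not in live or live[key]["sha256"] != chunk["sha256"])
```

Only the keys returned by `chunks_to_reembed` go to the embedding model on Publish. Removing chunks that were deleted from the draft is a separate step not shown here.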

Role | Access level
Content Editor (Product / Marketing) | Section content fields only. Cannot modify chunk boundaries, templates, or system prompts.
Policy Admin (Sales Ops) | System prompts, guardrail settings, escalation rules, sales policies.
Engineering | Template schema, eval set, retrieval configuration.

Quality gate is a soft warning, not a hard block. When a KB update causes a regression on the golden QA set, the publisher sees a warning and must explicitly override to proceed. Hard blocks cause operational friction: a product manager cannot fix a typo because an unrelated test regressed.
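
A minimal sketch of the soft gate, assuming the regression eval reports a pass rate on the golden QA set before and after the edit. `GateResult`, `quality_gate`, and the `tolerance` parameter are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    published: bool
    warning: str = ""


def quality_gate(baseline_pass_rate, candidate_pass_rate, override=False, tolerance=0.0):
    """Soft gate: a regression warns and needs an explicit override; it never hard-blocks."""
    regressed = candidate_pass_rate < baseline_pass_rate - tolerance
    if not regressed:
        return GateResult(published=True)
    warning = (f"Golden QA pass rate fell from {baseline_pass_rate:.0%} "
               f"to {candidate_pass_rate:.0%}")
    return GateResult(published=override, warning=warning)
```

The publisher always sees the warning text; `override=True` records that they chose to proceed anyway.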

5.5 Voice mode

Component | Approach
Speech-to-text (STT) | OpenAI Whisper API or Azure Speech-to-Text. Selected based on latency benchmark in discovery.
Text-to-speech (TTS) | ElevenLabs streaming TTS. SoluteLabs is an ElevenLabs Solutions Partner. Target: <1.5s time-to-first-audio via streaming token pipeline.
Safety gate | Voice cannot submit an order. The Submit button must be pressed explicitly. Confirmed as a design requirement on the April 27 call.
Toggle | Per-session button on both web chat and iOS. Does not affect conversation state.
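
One way to approach the time-to-first-audio target is to hand text to the TTS stream sentence by sentence instead of waiting for the full LLM response. The sketch below shows only that buffering step and makes no TTS API calls; the function name, the `max_chars` cap, and the sentence-boundary rule are assumptions.

```python
import re

# A buffer ending in ., ! or ? (plus optional whitespace) is treated as a sentence.
SENTENCE_END = re.compile(r"[.!?]\s*$")


def sentences_from_tokens(tokens, max_chars=200):
    """Group streamed LLM tokens into sentence-sized pieces for a TTS stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        if SENTENCE_END.search(buffer) or len(buffer) >= max_chars:
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends
```

The first sentence can be synthesised while the model is still generating the rest. A production version would need a smarter boundary rule, since a token ending in a decimal point (as in "4.5mm") would split early under this one.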

5.6 HITL escalation

The agent escalates when it detects: negative sentiment, three consecutive clarification failures, an explicit user request, or a high-value order threshold (configurable).

On escalation, the agent pauses the conversation and sends the sales rep a package containing: full conversation transcript, current cart state, retrieved KB chunks that grounded the last response, and the user's account profile (if logged in). Initial delivery via email. Upgradeable to Intercom or Zendesk without changing the agent logic.
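
The four triggers can be expressed as a single check that returns which ones fired. This is a sketch: `TurnSignals`, the sentiment scale, and both threshold values are placeholders we introduce for illustration, and the real values would live in the Policy Store so sales operations can change them.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    sentiment: float             # -1.0 (negative) to 1.0 (positive); scale assumed
    clarification_failures: int  # consecutive failed clarifications
    user_requested_human: bool
    cart_total: float


def escalation_reasons(signals, sentiment_floor=-0.5, high_value_threshold=5000.0):
    """Return the triggers that fired; an empty list means no escalation."""
    reasons = []
    if signals.sentiment <= sentiment_floor:
        reasons.append("negative_sentiment")
    if signals.clarification_failures >= 3:
        reasons.append("clarification_failures")
    if signals.user_requested_human:
        reasons.append("user_request")
    if signals.cart_total >= high_value_threshold:
        reasons.append("high_value_order")
    return reasons
```

Returning the reasons rather than a bare boolean lets the escalation package tell the sales rep why the handoff happened.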

5.7 Performance targets

The following are design targets, not contractual SLAs. They represent the user-perceived latency we engineer toward. Production SLAs (uptime, error-rate guarantees) are scoped in the post-launch retainer once real load patterns are established.

Metric | Target
Chat first-token latency | <2s P50 · <4s P95
Voice time-to-first-audio | <1.5s P50 (streaming TTS)
CP API tool calls | <3s typical · exponential backoff retry on timeout
Order state transitions | Synchronous · idempotent on retry (dedup key prevents duplicates)
Build / UAT phase availability | Best-effort during US business hours · staging environment available 24/7
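
For reference, this is how the P50 and P95 chat targets could be checked against measured first-token latencies. It is a minimal sketch using the nearest-rank percentile definition; the function names are hypothetical and the measurement source (for example, exported trace timings) is an assumption.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_chat_targets(first_token_seconds, p50_target=2.0, p95_target=4.0):
    """Check measured first-token latencies against the chat targets in the table."""
    return (percentile(first_token_seconds, 50) < p50_target
            and percentile(first_token_seconds, 95) < p95_target)
```
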
Section 06 / Phased Delivery

Three phases plus a paid discovery sprint.

Phase 1 is the production foundation: everything else builds on it. Phases 1 and 2 overlap by ~2 weeks once the /chat API is stable.

Phase 01

Production foundation.

9 weeks / 2 dev + TL @ 50% / $39,000
  • Python FastAPI Gateway: auth/SSO bridge, conversation logger, signal parser, rate limiting, CP API bridge (1.5 wks)
  • pgvector setup, embed all Category A KB (8 products), hybrid search (dense + BM25) (1 wk)
  • LangGraph orchestration layer (replaces BotPress brain) + LangSmith tracing (2.5 wks)
  • Hallucination guardrails: confidence gate, citation enforcement, clarification node (0.5 wk)
  • KB Management Dashboard MVP: template editor, publish pipeline, quality gate (1.5 wks)
  • React (TypeScript) web chat UI (replaces BotPress Webchat) (1 wk)
  • Eval framework: golden QA set, regression suite (0.5 wk)
  • Integration testing and staging deploy (0.5 wk)
Outcome

Production web agent with real RAG, hallucination guardrails (confidence gate, citation enforcement, clarification node), and non-technical KB editing. BotPress fully removed. Full Python stack.

Phase 02

Mobile, voice & sales intelligence.

9 weeks / 3 dev + TL @ 50% / $48,000
  • React Native iOS app + Chat API integration (3 wks)
  • Generative UI order cards (product selection, cart review, confirmation) (1 wk)
  • Voice pipeline: Whisper STT, ElevenLabs TTS, streaming, toggle button (1.5 wks)
  • Non-logged-in user flow + lead capture pipeline (CRM integration) (1 wk)
  • Upselling engine: product relation graph, upsell tool, conversational suggestion flow (1 wk)
  • HITL escalation: sentiment detection, context package, email/CRM handover (0.5 wk)
  • Account-specific pricing surfaced in agent responses (0.5 wk)
  • UAT + bug fix cycle (1 wk)
Outcome

iOS app live with voice, generative UI order flow, lead capture, and upselling. UAT milestone.

Phase 03

Scale & personalisation.

6 weeks / 2 dev + TL @ 50% / $24,000
  • Conversational catalog discovery: natural-language search across the full 3,000+ SKU catalog ("find me a 4.5mm fixture compatible with this case"). Builds on the SKU-lookup ordering already shipped in Phase 1. (2 wks)
  • Mobile catalog search UI (iOS) (1.5 wks)
  • Personalised recommendations from order_logs (purchase history) (1.5 wks)
  • Analytics: conversation quality dashboard, intent tracking, funnel metrics (1 wk)
  • Go-live + hypercare (0.5 wk)
Outcome

Full SKU coverage, personalised recommendations, analytics for the sales team. Full go-live.

Total timeline / 22 weeks build, 23 weeks E2E
  • Discovery sprint (1 week) runs before the build clock starts and resolves the open items in Section 8. CP API documentation, SSO scope, and idempotency questions are answered before Phase 1 begins.
  • Phase 1 (9 wks) and Phase 2 (9 wks) overlap by ~2 weeks once the /chat API endpoint is stable. Phase 3 (6 wks) begins immediately after Phase 2 UAT sign-off.
  • Compression to ~18 weeks is feasible by adding a fourth developer in Phase 1, subject to discovery-sprint findings on CP API and SSO complexity.
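
The 22-week and 23-week figures follow from the phase durations and the overlap. The short check below spells out that arithmetic; it treats the "~2 weeks" overlap as exactly 2, which is an assumption, and the variable names are ours.

```python
# Durations in weeks, taken from the phase list above.
DISCOVERY = 1
PHASE_1 = 9
PHASE_2 = 9
PHASE_3 = 6
OVERLAP_1_2 = 2  # "~2 weeks" in the plan; treated as exactly 2 for this check

build_weeks = PHASE_1 + PHASE_2 - OVERLAP_1_2 + PHASE_3
end_to_end_weeks = DISCOVERY + build_weeks

# Week numbers on the build clock (week 1 = first week of Phase 1).
phase_2_start = PHASE_1 - OVERLAP_1_2 + 1
phase_3_start = phase_2_start + PHASE_2
phase_3_end = phase_3_start + PHASE_3 - 1
```

On these assumptions Phase 2 starts in build week 8 and Phase 3 in build week 17, finishing at the end of week 22.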
Section 07 / Investment

Investment & pricing.

A fixed-price proposal. The price absorbs the discovery-phase risk on the open items in Section 8 (CP API documentation, SSO status, idempotency). No change orders on items resolved in the discovery sprint.

Pricing is based on our standard rates: $40/hr for developers and $60/hr for technical leads, 40 billable hours per engineer per week. Dev hours reflect a 25% efficiency gain from agentic tooling (Claude Code, Cursor). All team members carry US-hours availability.

7.1 Phase schedule

00 | Discovery Sprint | 1 week · 1 dev (40h) + 1 TL (20h) | $3,000
01 | Production Foundation | 9 weeks · 2 dev + TL @ 50% | $39,000
02 | Mobile, Voice & Sales Intelligence | 9 weeks · 3 dev + TL @ 50% | $48,000
03 | Scale & Personalisation | 6 weeks · 2 dev + TL @ 50% | $24,000
Total | Fixed price. | 22 weeks build · 23 weeks end-to-end | $114,000

7.2 Investment by capability

Same total, viewed by what each capability delivers. Useful when prioritising features or evaluating ROI per surface.

Capability | What it delivers | Investment
Discovery Sprint | Open items resolved · architecture locked · evaluation set agreed · per-phase SOW finalized | $3,000
Logged-in Buyer Core | FastAPI gateway · LangGraph orchestration · pgvector RAG with hybrid search · hallucination guardrails · React web chat · order state machine · eval framework · account-specific pricing | $35,000
KB Management Dashboard | Template-enforced editor for non-technical content owners · publish pipeline · quality gate · RBAC | $6,500
Mobile App (iOS-first, RN) | iOS app · Chat API integration · generative UI order cards · mobile catalog search · UAT | $30,500
Voice Mode | Whisper STT · ElevenLabs streaming TTS · per-session toggle · <1.5s time-to-first-audio | $7,500
Non-logged + Lead Capture | FAQ & product Q&A flow · high-intent detection · CRM handoff · email notification with transcript | $5,000
Sales Intelligence | Upselling engine (product relation graph) · HITL escalation with context package · personalised recommendations | $13,000
Catalog Scale, Analytics & Go-Live | Conversational discovery over 3,000+ SKU catalog · conversation analytics dashboard · go-live + hypercare | $13,500
Total | | $114,000
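
As a cross-check, the phase view (7.1) and the capability view (7.2) can be summed side by side. The figures below are copied from the two breakdowns; the dictionary and variable names are ours.

```python
BY_PHASE = {
    "Discovery Sprint": 3_000,
    "Production Foundation": 39_000,
    "Mobile, Voice & Sales Intelligence": 48_000,
    "Scale & Personalisation": 24_000,
}

BY_CAPABILITY = {
    "Discovery Sprint": 3_000,
    "Logged-in Buyer Core": 35_000,
    "KB Management Dashboard": 6_500,
    "Mobile App (iOS-first, RN)": 30_500,
    "Voice Mode": 7_500,
    "Non-logged + Lead Capture": 5_000,
    "Sales Intelligence": 13_000,
    "Catalog Scale, Analytics & Go-Live": 13_500,
}

TOTAL = 114_000
phase_total = sum(BY_PHASE.values())
capability_total = sum(BY_CAPABILITY.values())
```

Both views sum to the same fixed price, so dropping or deferring a capability can be priced directly from the second table.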

7.3 Operating costs (client-paid)

LLM, STT, TTS, and embedding inference costs are paid directly by TruAbutment via your own API keys (OpenAI or Azure for LLM and embeddings, ElevenLabs for TTS). This keeps data ownership, rate limits, and any enterprise-rate negotiations on your side. SoluteLabs configures the keys; we do not mark up inference.

Indicative monthly inference
  • $500 to $1,500 / month depending on conversation volume (assumes GPT-4o-mini for orchestration routing, GPT-4o for expert agents, Whisper STT, ElevenLabs streaming TTS at moderate voice usage).
  • A per-conversation cost model is delivered with Phase 1 so you can forecast spend as adoption grows. Final monthly figure is firmed up post-discovery once volume assumptions are validated.
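
The per-conversation cost model could take roughly this shape. This is a sketch of the structure only: the class and function names are hypothetical, and every unit price and usage figure passed in must come from the providers' current price lists and from measured traffic. None are supplied here.

```python
from dataclasses import dataclass


@dataclass
class UnitPrices:
    """Per-unit prices in USD, filled in from the providers' current price lists."""
    routing_per_1k_tokens: float   # orchestration routing model
    expert_per_1k_tokens: float    # expert agent model
    stt_per_minute: float          # speech-to-text
    tts_per_1k_chars: float        # text-to-speech


def cost_per_conversation(prices, routing_tokens, expert_tokens,
                          stt_minutes=0.0, tts_chars=0):
    """Inference cost of one conversation; voice terms are zero for text-only chats."""
    return (routing_tokens / 1000 * prices.routing_per_1k_tokens
            + expert_tokens / 1000 * prices.expert_per_1k_tokens
            + stt_minutes * prices.stt_per_minute
            + tts_chars / 1000 * prices.tts_per_1k_chars)


def monthly_cost(per_text_conversation, per_voice_conversation,
                 conversations, voice_share):
    """Blend text and voice conversation costs by the share of voice sessions."""
    voice = conversations * voice_share
    text = conversations - voice
    return text * per_text_conversation + voice * per_voice_conversation
```

Feeding in measured token counts and voice share from Phase 1 telemetry turns the indicative range above into a forecast that can be re-run as adoption grows.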

7.4 Post-launch retainer

After go-live, a monthly retainer covers RAG iteration (chunking, retrieval, re-ranking improvements based on real query telemetry), prompt iteration, KB content support, eval set expansion, monitoring, and minor feature work. Tier can be adjusted month-to-month with 30 days' notice. Unused hours roll forward one month.

Tier | Monthly | Coverage
Light | $2,500 | 1 dev × 10h/week · KB support, monitoring, minor bug fixes
Heavy | $8,000 | 2 dev × 20h/week + TL × 10h/week · active feature development, multi-quarter product roadmap
How we hold this fixed price
  • Discovery sprint absorbs the open items: CP API documentation, SSO scope, and idempotency questions are resolved in week 1, before the build clock starts. The fixed price absorbs what an indicative range would have left ambiguous, so you do not see scope-creep change orders mid-build.
  • TL at 50% allocation: the technical lead provides architecture oversight, code review, and client coordination, not full-time coding. Billed at 20h/week.
  • Agentic tooling efficiency: we use Claude Code, Cursor, and AI-assisted scaffolding throughout. We pass that efficiency back as a 25% reduction on dev hours, reflected in the fixed price.
  • Milestone-based payment: each phase has clear deliverables and acceptance criteria before the next begins. Payment is per-phase against acceptance, not lump sum upfront.
Section 08 / Open Items

Open items before SOW.

These items need to be resolved before we produce a fixed-price Statement of Work. We propose resolving them in a paid 1-week discovery sprint.

Item | Why it matters | Owner
CP API documentation | Determines discovery buffer. Undocumented APIs add 1 to 2 weeks per phase. | TruAbutment
CP Order API idempotency | If the API does not dedup on retry, we add a reconciliation layer at the gateway. Affects Phase 1 scope. | TruAbutment
SSO status | Account-specific pricing depends on SSO. Built, in progress, or to be built? | TruAbutment
Existing regression eval set | Quality gate is meaningless without one. If none exists, we build it in week 1 with the product expert. | Joint
KB content owner | The highest-leverage role on your side. Who from product or marketing owns KB during the build? | TruAbutment
Lead CRM target | Google Sheets, simple DB, or an existing tool? Affects Phase 2 integration scope. | TruAbutment
Section 09 / Engagement

Engagement model.

Three steps from this proposal to Phase 1 kickoff. Discovery sprint runs first to lock the items in Section 8, then a fixed-price SOW per phase, then build.

01

Discovery Sprint

1 week · paid ($3,000). Structured workshop with your product and technical team. We resolve all six open items, lock the architecture, agree the evaluation set, and produce a detailed SOW with weeks, deliverables, and acceptance criteria per phase.

02

Statement of Work

Milestone-based pricing per phase. Each phase has a clear deliverable and acceptance criteria before the next begins. In-scope and out-of-scope items are listed explicitly. No lump sum.

03

Build

Phase 1 starts immediately after SOW signature. Weekly standups, fortnightly demos. US-hours overlap guaranteed for all team members on this engagement.

Section 10 / Why SoluteLabs

Why SoluteLabs.

US-hours overlap | Every engineer, the technical lead, and the project lead on this engagement work US business hours. Live standups, fortnightly demos, and Slack response within your working day. No 12-hour async lag, no "we'll catch up tomorrow." This is a non-negotiable for our team composition on this project.
AI agent engineering | Production RAG pipelines, LangGraph orchestration, hallucination guardrails. This is our primary practice, not a side capability.
Python & AI ecosystem | Full Python stack: FastAPI gateway, LangGraph orchestration, LangChain RAG pipeline, LangSmith observability. We own the entire agent surface end-to-end.
ElevenLabs Solutions Partner | Voice pipeline delivered without integration overhead. We have shipped text and voice AI agents in healthcare contexts.
React & React Native | React TypeScript web chat plus React Native iOS-first mobile. No second codebase. Android available with minimal delta.
Health tech experience | We work with clients in the health tech sector and understand the precision and trust requirements of clinical workflows.
12 years · US entity | 40-person team. Founded 2014, US entity in Delaware. Long-running engagements with B2B SaaS and enterprise teams.
Section 11 / Next Steps

Next steps.

  1. Review this proposal and flag any corrections or additions, particularly on the architecture and scope sections.
  2. Confirm the open items list in Section 8: identify owners and any items already resolved.
  3. Sign off on the discovery sprint engagement. We can start within 5 business days.
  4. Introduce us to the CP API owner so we can begin documentation review during discovery.
Ready to move when you are
  • Full Python stack end-to-end. FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability. React TypeScript web chat. React Native iOS app.
  • Every architectural decision will be documented and owned by your team. No black boxes.
  • Phase 1 ends with a production-deployed, evaluated system. Not a second prototype.