SoluteLabs × TruAbutment / Proposal & Architecture
Confidential / 28 April 2026
A Proposal for TruAbutment, by SoluteLabs
AI Sales Agent
A conversational interface for product discovery and ordering: text and voice, web and iOS, grounded in live catalog data and a single Python service.
Fixed-price investment: $114,000
22 weeks build, three milestone-based phases
23 weeks end-to-end including the discovery sprint
No change orders for items resolved in discovery
US-hours overlap
Full US business-hours overlap for the entire engagement team: developers, technical lead, and project lead. Same-day Slack and email turnaround, live calls in your time zone, no async-only handoffs.
Section 01 / Executive Summary
A clear picture of what we're building.
Based on our review of TruAbutment's prototype documentation, the call on April 27, and the technical walkthrough, we have a clear picture of what you are building and what it will take to get there.
The prototype validated the right instincts: an orchestrator-expert agent pattern, a proxy layer between the agent and your CP APIs, and a split between informational products (RAG) and catalog items (direct API). The gap between the prototype and a production system is well-defined, and this proposal addresses it directly.
Decisions confirmed on April 27
Full Python stack for the agent layer: FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability.
React (TypeScript) for web chat; React Native for mobile, iOS-first, Android with minimal additional effort.
90 to 95% of end users are on iOS. Mobile is the primary surface, not secondary.
Multimodal confirmed: text and voice both required from day one.
Non-logged-in users are in scope: FAQ and product Q&A only, with lead capture.
No HIPAA / PII compliance constraints currently.
Timeline: 22 weeks build, 23 weeks end-to-end including discovery, UAT, and go-live. Total fixed-price investment: $114,000, milestone-based.
US-hours team overlap guaranteed for every engineer, the technical lead, and the project lead on this engagement. No async-only handoffs.
Section 02 / Scope
What you are building.
A fully conversational interface for product discovery and ordering, embedded in your website and iOS app. Two fundamentally different types of data, served by two fundamentally different mechanisms: a distinction that shapes the entire architecture.
Layer
Description
Knowledge
8 core informational products (ioConnect, Tru Dual Align Kit, Tru Reamer Kit, HS Cap, T-Marker, TruBolt Kit, ASC Pro Kit, Bite Scope Kit). Answered by RAG. Citation-backed. The agent refuses rather than guesses when context is missing.
Catalog
3,000+ SKUs (TruBase, AOT, T-L, TruScan Body, Lab Analog, fixtures, screws) served by direct CP API lookup. No embedding. Always live, always accurate.
Order
Every order placed through a deterministic state machine. Multi-step, mid-flow modifiable, account-aware pricing, idempotent submit. Voice cannot bypass the confirmation button.
Maintenance
An admin platform that product and marketing can own. Template-enforced KB editor, policy controls, quality gates, one-click rollback.
Lead capture
Non-logged-in visitors get a FAQ and product Q&A experience. High-intent signals captured as leads and routed to a lightweight CRM (initially Google Sheets or a simple database).
Section 03 / Personas
Who uses this system.
Five personas use this system. The Buyer is the primary design constraint: every architectural decision is tested against their needs first.
Persona
Description
The Buyer
Dental labs, clinicians, and resellers ordering implant components mid-case. Time-pressured, mixed catalog fluency, often describing a clinical situation rather than a part number. Needs to arrive at a verified, correctly priced order without leaving the conversation. Surface: web chat (desktop) and iOS app (primary). Logged in. Account-specific pricing applied automatically via CP API.
Non-logged Visitor
New prospects discovering TruAbutment through SEO or referral. Can ask product and FAQ questions, cannot place orders. High-intent signals trigger lead capture and handoff to a sales rep.
KB Editor
Product or marketing staff. No engineering background. Owns content updates through a structured dashboard. Template enforcement at input means their edits cannot degrade retrieval quality.
Policy Admin
Sales operations. Modifies system prompts, guardrail settings, escalation rules, and sales policies through the admin dashboard, without code deployments.
Human Sales Rep
Receives escalations with full context: conversation transcript, current cart state, retrieved KB chunks, and account profile. HITL integration via email or a lightweight CRM.
Section 04 / Architecture
Proposed architecture.
Five layers. Every channel (web, iOS, future touchpoints) communicates through a Python FastAPI gateway built entirely by SoluteLabs. LangGraph orchestration, RAG retrieval, and all business logic run within this unified Python service. BotPress is removed entirely.
L1 · Clients
React (TS) Web Chat
Streaming · Generative UI order cards · Voice toggle
React Native iOS
iOS-first · Voice-native · Android with minimal delta
Custom
Admin Dashboard
KB editor · Policy manager · RBAC · Quality gate
L2 · Gateway
Python FastAPI Gateway (new · built by SoluteLabs)
Auth & SSO bridge · Conversation logger (async, DB) · Signal parser (intent → iOS UI transitions) · Rate limiter & CP API retry logic · Single /chat endpoint for all channels
L3 · Orchestration
LangGraph + LangChain
Orchestrator agent · Expert agents per product category · Guardrail node: confidence scoring, source citation, clarification gate · Dual routing: logged-in (full flow) vs non-logged (FAQ + lead capture) · LangSmith observability and trace.
Category A KB · Hybrid search · Cross-encoder re-ranking
TruAbutment API
CP API Bridge
Category B · Live pricing · Inventory · Orders
PostgreSQL + Redis
Cart state · Session cache · Order & conv logs
Custom
Policy Store
System prompts · Guardrails · Escalation rules
L5 · External
OpenAI / Azure
LLM inference · Embeddings (benchmarked wk 1)
ElevenLabs TTS
Streaming <1.5s · SoluteLabs Solutions Partner
Whisper / Azure STT
Speech-to-text · Selected on latency benchmark
Lead CRM
Google Sheets or lightweight DB · Upgradeable to HubSpot
Section 05 / Technical Approach
Technical approach.
5.1 Dual user flow
The agent uses session state to route conversations to two distinct flows. The user sees one interface.
User type
Agent behaviour
Logged-in buyer
Full flow: product Q&A, compatibility, ordering, account-specific pricing. Order state machine active. Voice enabled.
Non-logged visitor
FAQ and product Q&A only. No order access. High-intent signals (pricing, ordering, compatibility questions) trigger lead capture: name, contact, intent summary routed to CRM.
Lead capture does not require a full CRM. Initially: Google Sheets via API, or a minimal DB table (name, email, conversation summary, product interest, timestamp). Sales rep receives an email notification with the transcript. Upgradeable to HubSpot or similar without changing the agent logic.
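The routing rule above can be sketched in a few lines. This is a minimal illustration, not the production design: the `Session` type, the `route` function, the flow names, and the keyword list are all hypothetical, and in the real system intent would more likely come from the orchestrator's classification than from substring matching.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword list standing in for the orchestrator's intent signal.
HIGH_INTENT_TERMS = ("price", "pricing", "order", "compatible", "compatibility")


@dataclass
class Session:
    user_id: Optional[str]  # None means the visitor is not logged in


def route(session: Session, message: str) -> str:
    """Return the name of the flow that should handle this message."""
    if session.user_id is not None:
        # Logged-in buyer: Q&A, compatibility, ordering, account pricing.
        return "full_flow"
    text = message.lower()
    if any(term in text for term in HIGH_INTENT_TERMS):
        # Non-logged visitor showing intent: collect name, contact, summary.
        return "lead_capture"
    return "faq_only"
```

Both flows sit behind the same interface, so the user never sees the branch; only the set of tools available to the agent changes.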
5.2 Order workflow, as a state machine
The LLM proposes. Application logic gates. No order can be created without passing through every state in sequence.
01
Idle → Intent
User describes a need. Orchestrator detects order intent and routes to Cart Assembly.
02
Cart Assembly ⇌ Modification (loops freely)
Items extracted via structured LLM output (product_id, quantity, variant). Validated against CP catalog API before entering cart. Ambiguous items trigger clarification cards, never a guess. Phase 1 supports ordering by SKU lookup across the full 3,000+ catalog (buyer names the item, agent fetches and validates). Phase 3 adds conversational discovery ("find me a 4.5mm fixture compatible with this case") over the same catalog.
03
Estimate
Read-only CP API call. Shows live account-discounted pricing. No order created yet.
04
Review
Generative UI card rendered in chat: product list, quantities, total price. User can modify or confirm.
05
Submit (gated, explicit button)
Voice cannot trigger submit. The Submit button must be pressed explicitly. Idempotent: dedup key prevents duplicate orders on retry.
06
Confirmed (terminal)
CP order created. Confirmation shown in chat with order number.
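The six steps above can be sketched as an explicit transition table. This is an illustration under assumptions: the class and state names are hypothetical, the "Intent" step is modelled as the transition from Idle into Cart Assembly, the back-edges from Estimate and Review to Cart Assembly are our reading of "mid-flow modifiable", and the in-memory set stands in for a durable dedup store.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    CART_ASSEMBLY = "cart_assembly"
    ESTIMATE = "estimate"
    REVIEW = "review"
    SUBMIT = "submit"
    CONFIRMED = "confirmed"


# Application logic, not the LLM, decides which transitions are legal.
ALLOWED = {
    State.IDLE: {State.CART_ASSEMBLY},
    State.CART_ASSEMBLY: {State.CART_ASSEMBLY, State.ESTIMATE},
    State.ESTIMATE: {State.CART_ASSEMBLY, State.REVIEW},
    State.REVIEW: {State.CART_ASSEMBLY, State.SUBMIT},
    State.SUBMIT: {State.CONFIRMED},
    State.CONFIRMED: set(),  # terminal
}


class TransitionError(Exception):
    pass


class OrderFlow:
    def __init__(self):
        self.state = State.IDLE
        self.submitted_keys = set()  # stand-in for a persistent dedup table

    def advance(self, target, source="button"):
        # Voice (or any non-button source) can never reach Submit.
        if target is State.SUBMIT and source != "button":
            raise TransitionError("submit requires an explicit button press")
        if target not in ALLOWED[self.state]:
            raise TransitionError(f"{self.state.name} -> {target.name} not allowed")
        self.state = target

    def confirm(self, dedup_key):
        """Create the order once; return False if this key was already used."""
        if dedup_key in self.submitted_keys:
            return False  # retry of an order that already went through
        self.advance(State.CONFIRMED)
        self.submitted_keys.add(dedup_key)
        return True
```

Keeping the table as data means a new state or back-edge is a one-line change that can be reviewed and tested without touching prompts.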
Failure modes handled
Inventory drift: live re-check at submit time; user notified of any changes between Estimate and Submit.
Session drop: cart state persisted to PostgreSQL (Redis is cache only, not source of truth). Recovery UX on return.
CP API timeout: retry with exponential backoff at the FastAPI gateway layer. User sees a loading state, not an error.
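The retry behaviour for CP API timeouts can be sketched as below. The helper name, the retry count, and the base delay are illustrative assumptions; the actual values would be set once CP API behaviour is known from discovery.

```python
import time


def call_with_backoff(fn, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TimeoutError wait base_delay * 2**attempt, then retry.

    `sleep` is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == retries:
                raise  # retries exhausted; the caller decides what the user sees
            sleep(base_delay * (2 ** attempt))
```

With the defaults shown, the waits between attempts are 0.5s, 1s, and 2s, which is why the user sees a loading state rather than an immediate error.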
5.3 RAG & knowledge layer
The most important decision in the RAG architecture is what not to RAG. Category B catalog items are structured spec data: direct API lookups are faster, cheaper, and always accurate. RAG is reserved exclusively for Category A.
Category
Architecture
Category A: 8 informational SKUs
pgvector database. Template-aligned chunking (one chunk per KB section: Overview, Components, Compatibility, Workflow, Specs, FAQ, Ordering). Each chunk tagged with product_id, section_type, version. Hybrid retrieval: dense vector + BM25 keyword. Cross-encoder re-ranking before LLM context. Citation-backed responses. Agent refuses rather than guesses when confidence is below threshold.
Category B: 3,000+ catalog SKUs
Direct CP catalog API. Real-time MSRP plus account-discounted pricing. Live inventory. No embedding, no staleness risk.
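For the hybrid retrieval step in Category A, one common way to merge a dense-vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The proposal does not commit to a specific fusion method, so treat this as an assumption for illustration; the function name and chunk ids are hypothetical, and `k=60` is simply the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one list, best first.

    Each chunk scores 1 / (k + rank) per list it appears in; chunks that
    rank well in both the dense and the keyword list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

The fused list would then go to the cross-encoder re-ranker, which scores only the top candidates before they enter the LLM context.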
On vector store and embedding model selection: we commit to pgvector on PostgreSQL as the vector store: no new infrastructure, transactional consistency with the rest of the data layer, and proven at the volume of 8 KB products with sectioned chunks. The embedding model is benchmarked in week 1 of the discovery sprint against actual KB content and real user queries; dental and clinical terminology is specific enough that the right choice must be validated, not assumed.
On iterative improvement: we distinguish three things that are often conflated. (1) RAG iteration: improving chunking, retrieval, and re-ranking based on real query telemetry. Ongoing, high-leverage. (2) Prompt iteration: refining system prompts and few-shot examples. Also ongoing. (3) Model fine-tuning: retraining the LLM on TruAbutment's domain. We recommend against this. The gains are marginal when RAG is well-designed, and the maintenance cost is high.
5.4 KB management dashboard
The admin dashboard is a blocking dependency for sustainable operation. Without it, every KB update requires an IT ticket. It is in scope for Phase 1.
Publish pipeline: Edit → Validate → Re-embed (changed chunks only) → Index → Regression eval → Live. Triggered by clicking Publish. No manual steps.
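The "re-embed changed chunks only" step can be sketched with content hashes. This is one plausible approach, not a committed design: the function names and the `(product_id, section_type)` key shape are assumptions based on the chunk tags described in 5.3.

```python
import hashlib


def chunk_hash(text):
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunks_to_reembed(published_hashes, edited_chunks):
    """Return keys of chunks that are new or whose text has changed.

    published_hashes: {(product_id, section_type): sha256 hex digest}
    edited_chunks:    {(product_id, section_type): section text}
    """
    return sorted(
        key
        for key, text in edited_chunks.items()
        if published_hashes.get(key) != chunk_hash(text)
    )
```

Because chunks follow the KB template one-to-one, a typo fix in one section re-embeds one chunk rather than the whole product.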
Role
Access level
Content Editor (Product / Marketing)
Section content fields only. Cannot modify chunk boundaries, templates, or system prompts.
Policy Admin (Sales Ops)
System prompts, guardrail settings, escalation rules, sales policies.
Quality gate is a soft warning, not a hard block. When a KB update causes a regression on the golden QA set, the publisher sees a warning and must explicitly override to proceed. Hard blocks cause operational friction: a product manager cannot fix a typo because an unrelated test regressed.
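The soft-gate behaviour reduces to a small decision function. The name `publish_decision` and the return shape are hypothetical, shown only to make the warn-then-override rule concrete.

```python
def publish_decision(regressed_cases, override=False):
    """Soft gate: a regression produces a warning and needs an explicit
    override to publish; it never blocks outright."""
    if not regressed_cases:
        return {"publish": True, "warning": None}
    warning = f"{len(regressed_cases)} golden QA case(s) regressed"
    return {"publish": override, "warning": warning}
```

The warning is returned even when the publisher overrides, so the decision can be logged alongside the cases that regressed.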
5.5 Voice mode
Component
Approach
Speech-to-text (STT)
OpenAI Whisper API or Azure Speech-to-Text. Selected based on latency benchmark in discovery.
Text-to-speech (TTS)
ElevenLabs streaming TTS. SoluteLabs is an ElevenLabs Solutions Partner. Target: <1.5s time-to-first-audio via streaming token pipeline.
Safety gate
Voice cannot submit an order. The Submit button must be pressed explicitly. Confirmed as a design requirement on the April 27 call.
Toggle
Per-session button on both web chat and iOS. Does not affect conversation state.
5.6 HITL escalation
The agent escalates when it detects: negative sentiment, three consecutive clarification failures, an explicit user request, or a high-value order threshold (configurable).
On escalation, the agent pauses the conversation and sends the sales rep a package containing: full conversation transcript, current cart state, retrieved KB chunks that grounded the last response, and the user's account profile (if logged in). Initial delivery via email. Upgradeable to Intercom or Zendesk without changing the agent logic.
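The four escalation triggers can be expressed as a single check run after each turn. The `Turn` fields and reason labels are hypothetical names, and the high-value threshold is passed in because it is configurable; how sentiment is detected is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    negative_sentiment: bool = False
    clarification_failures: int = 0  # consecutive failures so far
    user_requested_human: bool = False
    order_total: float = 0.0


def escalation_reasons(turn, high_value_threshold):
    """Return every trigger that fired; an empty list means no escalation."""
    reasons = []
    if turn.negative_sentiment:
        reasons.append("negative_sentiment")
    if turn.clarification_failures >= 3:
        reasons.append("clarification_failures")
    if turn.user_requested_human:
        reasons.append("user_request")
    if turn.order_total >= high_value_threshold:
        reasons.append("high_value_order")
    return reasons
```

Returning the reasons, not just a boolean, lets the package sent to the sales rep say why the conversation was handed over.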
5.7 Performance targets
The following are design targets, not contractual SLAs. They represent the user-perceived latency we engineer toward. Production SLAs (uptime, error-rate guarantees) are scoped in the post-launch retainer once real load patterns are established.
Metric
Target
Chat first-token latency
<2s P50 · <4s P95
Voice time-to-first-audio
<1.5s P50 (streaming TTS)
CP API tool calls
<3s typical · exponential backoff retry on timeout
Order state transitions
Synchronous · idempotent on retry (dedup key prevents duplicates)
Build / UAT phase availability
Best-effort during US business hours · staging environment available 24/7
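As an illustration of how the P50/P95 chat latency targets could be checked against measured samples, the sketch below uses the nearest-rank percentile definition. The function names and the choice of percentile method are assumptions; in practice these figures would come from LangSmith traces or gateway logs.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Chat first-token targets from the table above, in seconds.
CHAT_FIRST_TOKEN_TARGETS = {50: 2.0, 95: 4.0}


def meets_targets(samples, targets):
    """True when every targeted percentile is under its limit."""
    return all(percentile(samples, pct) < limit for pct, limit in targets.items())
```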
Section 06 / Phased Delivery
Three phases plus a paid discovery sprint.
Phase 1 is the production foundation: everything else builds on it. Phases 1 and 2 overlap by ~2 weeks once the /chat API is stable.
Phase 01
Production foundation.
9 weeks · 2 dev + TL @ 50% · $39,000
Python FastAPI Gateway: auth/SSO bridge, conversation logger, signal parser, rate limiting, CP API bridge · 1.5 wks
pgvector setup, embed all Category A KB (8 products), hybrid search (dense + BM25) · 1 wk
Account-specific pricing surfaced in agent responses · 0.5 wk
UAT + bug fix cycle · 1 wk
Outcome
iOS app live with voice, generative UI order flow, lead capture, and upselling. UAT milestone.
Phase 03
Scale & personalisation.
6 weeks · 2 dev + TL @ 50% · $24,000
Conversational catalog discovery: natural-language search across the full 3,000+ SKU catalog ("find me a 4.5mm fixture compatible with this case"). Builds on the SKU-lookup ordering already shipped in Phase 1. · 2 wks
Mobile catalog search UI (iOS) · 1.5 wks
Personalised recommendations from order_logs (purchase history) · 1.5 wks
Full SKU coverage, personalised recommendations, analytics for the sales team. Full go-live.
Total timeline / 22 weeks build, 23 weeks E2E
Discovery sprint (1 week) runs before the build clock starts and resolves the open items in Section 8. CP API documentation, SSO scope, and idempotency questions are answered before Phase 1 begins.
Phase 1 (9 wks) and Phase 2 (9 wks) overlap by ~2 weeks once the /chat API endpoint is stable. Phase 3 (6 wks) begins immediately after Phase 2 UAT sign-off.
Compression to ~18 weeks is feasible by adding a fourth developer in Phase 1, subject to discovery-sprint findings on CP API and SSO complexity.
Section 07 / Investment
Investment & pricing.
A fixed-price proposal. The price absorbs the discovery-phase risk on the open items in Section 8 (CP API documentation, SSO status, idempotency). No change orders on items resolved in the discovery sprint.
Pricing is based on our standard rates: $40/hr for developers and $60/hr for technical leads, 40 billable hours per engineer per week. Dev hours reflect a 25% efficiency gain from agentic tooling (Claude Code, Cursor). All team members carry US-hours availability.
7.1 Phase schedule
00
Discovery Sprint
1 week · 1 dev (40h) + 1 TL (20h)
$3,000
01
Production Foundation
9 weeks · 2 dev + TL @ 50%
$39,000
02
Mobile, Voice & Sales Intelligence
9 weeks · 3 dev + TL @ 50%
$48,000
03
Scale & Personalisation
6 weeks · 2 dev + TL @ 50%
$24,000
Total
Fixed price.
22 weeks build · 23 weeks end-to-end
$114,000
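The totals above follow directly from the phase schedule. The short check below reproduces them, using only figures from this proposal; the variable names are illustrative.

```python
PHASES = [
    # (name, weeks, fixed price in USD)
    ("Discovery Sprint", 1, 3_000),
    ("Production Foundation", 9, 39_000),
    ("Mobile, Voice & Sales Intelligence", 9, 48_000),
    ("Scale & Personalisation", 6, 24_000),
]
OVERLAP_WEEKS = 2  # Phases 1 and 2 overlap by ~2 weeks

total_price = sum(price for _, _, price in PHASES)

# Build clock covers Phases 1 to 3 only, less the Phase 1/2 overlap.
build_weeks = sum(weeks for _, weeks, _ in PHASES[1:]) - OVERLAP_WEEKS

# End-to-end adds the discovery sprint that runs before the build clock.
end_to_end_weeks = build_weeks + PHASES[0][1]
```

This gives $114,000, 22 build weeks (9 + 9 + 6 − 2), and 23 weeks end-to-end.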
7.2 Investment by capability
Same total, viewed by what each capability delivers. Useful when prioritising features or evaluating ROI per surface.
Capability
What it delivers
Investment
Discovery Sprint
Open items resolved · architecture locked · evaluation set agreed · per-phase SOW finalized
$3,000
Logged-in Buyer Core
FastAPI gateway · LangGraph orchestration · pgvector RAG with hybrid search · hallucination guardrails · React web chat · order state machine · eval framework · account-specific pricing
LLM, STT, TTS, and embedding inference costs are paid directly by TruAbutment via your own API keys (OpenAI or Azure for LLM and embeddings, ElevenLabs for TTS). This keeps data ownership, rate limits, and any enterprise-rate negotiations on your side. SoluteLabs configures the keys; we do not mark up inference.
Indicative monthly inference
$500 to $1,500 / month depending on conversation volume (assumes GPT-4o-mini for orchestration routing, GPT-4o for expert agents, Whisper STT, ElevenLabs streaming TTS at moderate voice usage).
A per-conversation cost model is delivered with Phase 1 so you can forecast spend as adoption grows. Final monthly figure is firmed up post-discovery once volume assumptions are validated.
7.4 Post-launch retainer
After go-live, a monthly retainer covers RAG iteration (chunking, retrieval, re-ranking improvements based on real query telemetry), prompt iteration, KB content support, eval set expansion, monitoring, and minor feature work. Tier can be adjusted month-to-month with 30 days' notice. Unused hours roll forward one month.
Tier
Monthly
Coverage
Light
$2,500
1 dev × 10h/week · KB support, monitoring, minor bug fixes
2 dev × 20h/week + TL × 10h/week · active feature development, multi-quarter product roadmap
How we hold this fixed price
Discovery sprint absorbs the open items: CP API documentation, SSO scope, and idempotency questions are resolved in week 1, before the build clock starts. The fixed price absorbs what an indicative range would have left ambiguous, so you do not see scope-creep change orders mid-build.
TL at 50% allocation: the technical lead provides architecture oversight, code review, and client coordination, not full-time coding. Billed at 20h/week.
Agentic tooling efficiency: we use Claude Code, Cursor, and AI-assisted scaffolding throughout. We pass that efficiency back as a 25% reduction on dev hours, baked into the fixed price.
Milestone-based payment: each phase has clear deliverables and acceptance criteria before the next begins. Payment is per-phase against acceptance, not lump sum upfront.
Section 08 / Open Items
Open items before SOW.
These items need to be resolved before we produce a fixed-price Statement of Work. We propose resolving them in a paid 1-week discovery sprint.
Item
Why it matters
Owner
CP API documentation
Determines discovery buffer. Undocumented APIs add 1 to 2 weeks per phase.
TruAbutment
CP Order API idempotency
If the API does not dedup on retry, we add a reconciliation layer at the gateway. Affects Phase 1 scope.
TruAbutment
SSO status
Account-specific pricing depends on SSO. Built, in progress, or to be built?
TruAbutment
Existing regression eval set
Quality gate is meaningless without one. If none exists, we build it in week 1 with the product expert.
Joint
KB content owner
The highest-leverage role on your side. Who from product or marketing owns KB during the build?
TruAbutment
Lead CRM target
Google Sheets, simple DB, or an existing tool? Affects Phase 2 integration scope.
TruAbutment
Section 09 / Engagement
Engagement model.
Three steps from this proposal to Phase 1 kickoff. Discovery sprint runs first to lock the items in Section 8, then a fixed-price SOW per phase, then build.
01
Discovery Sprint
1 week · paid ($3,000). Structured workshop with your product and technical team. We resolve all six open items, lock the architecture, agree the evaluation set, and produce a detailed SOW with weeks, deliverables, and acceptance criteria per phase.
02
Statement of Work
Milestone-based pricing per phase. Each phase has a clear deliverable and acceptance criteria before the next begins. In/out scope explicit. No lump sum.
03
Build
Phase 1 starts immediately after SOW signature. Weekly standups, fortnightly demos. US-hours overlap guaranteed for all team members on this engagement.
Section 10 / Why SoluteLabs
Why SoluteLabs.
US-hours overlap
Every engineer, the technical lead, and the project lead on this engagement work US business hours. Live standups, fortnightly demos, and Slack response within your working day. No 12-hour async lag, no "we'll catch up tomorrow." This is a non-negotiable for our team composition on this project.
AI agent engineering
Production RAG pipelines, LangGraph orchestration, hallucination guardrails. This is our primary practice, not a side capability.
Python & AI ecosystem
Full Python stack: FastAPI gateway, LangGraph orchestration, LangChain RAG pipeline, LangSmith observability. We own the entire agent surface end-to-end.
ElevenLabs Solutions Partner
Voice pipeline delivered without integration overhead. We have shipped text and voice AI agents in healthcare contexts.
React & React Native
React TypeScript web chat plus React Native iOS-first mobile. No second codebase. Android available with minimal delta.
Health tech experience
We work with clients in the health tech sector and understand the precision and trust requirements of clinical workflows.
12 years · US entity
40-person team. Founded 2014, US entity in Delaware. Long-running engagements with B2B SaaS and enterprise teams.
Section 11 / Next Steps
Next steps.
Review this proposal and flag any corrections or additions, particularly on the architecture and scope sections.
Confirm the open items list in Section 8: identify owners and any items already resolved.
Sign off on the discovery sprint engagement. We can start within 5 business days.
Introduce us to the CP API owner so we can begin documentation review during discovery.
Ready to move when you are
Full Python stack end-to-end. FastAPI gateway, LangGraph orchestration, LangChain RAG, LangSmith observability. React TypeScript web chat. React Native iOS app.
Every architectural decision will be documented and owned by your team. No black boxes.
Phase 1 ends with a production-deployed, evaluated system. Not a second prototype.