TwinCore built a custom AI agent platform for a Florida B2B research company. The platform replaced manual company research, competitor monitoring, lead enrichment, and report generation with supervised agent workflows, field-level confidence scoring, audit trails, and structured exports to CRM.

Country
USA (Florida)
Project duration
12 months, PoC running internally by week 4; core platform in production use by week 12
Team
  • 3 full-stack developers
Outcome
Manual research and monitoring workflows replaced by supervised, structured agent runs with full audit trail; 14 reusable agent types across 6 workflow categories

About the client

The client is a Florida-based company of roughly 20–50 employees serving B2B sales and marketing teams that need lead lists, competitor reports, and market scans on a recurring schedule. Their internal workflows were largely manual: a small research team switched between spreadsheets, websites, documents, and search tools to complete tasks that repeated every week. They had already tried Make.com plus OpenAI Assistants and it stalled at single-purpose flows that broke on edge cases.

“We can see what every agent is doing, fix it when it goes wrong, and pull the results straight into our CRM. That was the part missing from every off-the-shelf tool we tried.”

Sam S.
Sam S.
Florida-based B2B company (name withheld at client request)
14

reusable AI agent types deployed across 6 workflow categories

2–3 days

company-research cycle instead 10–15 working days

5.5 hours/week

median reclaimed across 4 researchers, measured via weekly task log audit at month 6

Field-level

confidence scoring and full per-run audit trail shipped in week 12, eliminating downstream CRM imports of unflagged low-confidence records

Custom AI Agent Platform for Business Workflow Automation | Case Study

What client received

  • Custom AI Agent Platform

    Problem: The client ran multiple different AI workflows with no unified system. Each task required a separate tool or manual process.

    Result: TwinCore built a platform where users create, configure, launch, and manage different AI agent types from one interface.

    Recurring research and monitoring workflows moved into the platform during the first months of production use, replacing the prior mix of spreadsheets, manual web sessions, and one-off tools.
  • Reusable Workflows Library: 14 agent types

    Problem: Research, scraping, analysis, and content tasks each require different logic, inputs, and outputs. No single template covers them all.

    Result: The platform ships with 14 agent types. Confirmed templates include AI Company Analysis, AI Real Estate Scraper, AI Research Assistant, Market Analysis, Competitor Research, Keyword Analysis, and Blog Idea Generator. Additional agents cover lead enrichment, document Q&A, scheduled site monitoring, and content briefing workflows.

    New agent types are added in days, not weeks: a config file, a class implementing the agent contract, and prompt templates. No platform rebuild.
  • AI-assisted and manual configuration

    Problem: Non-technical users need a fast way to create agents; technical users need control over sources, rules, and run behavior.

    Result: Two creation modes are available: Describe with AI (plain English input) and Configure manually (category, sources, parameters).

    Business users now create agents without engineering involvement; technical users keep full control over sources and run behavior.
  • Real-time agent execution monitoring

    Problem: AI workflows have no value if users cannot see what the agent is doing or why a run failed.

    Result: The platform tracks run status, progress, crawled pages, results found, logs, warnings, errors, and retries, visible in real time.

    100% of runs tracked with full audit trail and retry history.
  • Structured results and export

    Problem: Business teams need usable output they can act on, not raw text or unstructured AI responses.

    Result: Agents return structured results with named fields, confidence scores, per-record statuses, and CSV / Excel export.

    Output feeds directly into sales, marketing, operations, or reporting workflows, no manual cleanup.

Project Goal

The goal was to replace fragmented manual workflows with a single, controllable AI automation platform. The client needed business users to create new agent types, configure agent behavior, choose between AI-assisted and manual setup, upload files, select an LLM, start and monitor agent runs, review structured results, and export processed data. The core objective: move AI from a chat interface to repeatable business workflow automation.

Before vs after

Workflow Before After
Company research 10–15 working days across two analysts, manual web and spreadsheet 2–3 days of supervised agent runs with structured export
Competitor monitoring Weekly manual checklist across a fixed set of sources Scheduled runs, structured diff, alerts on change
Lead enrichment Throughput capped by manual lookup speed per researcher Throughput is review-bound, not lookup-bound — reviewers approve agent-extracted records instead of gathering them
Data quality and consistency Mixed formats, manual cleanup before reporting Named fields, confidence scores, ready for CRM import
Adding a new workflow New tool, new training, new spreadsheet template New agent type added in config + class, no platform rebuild

Delivery timeline

A single-agent PoC running internally shipped in week 4. The core platform (Workflows Library, run monitoring, exports, multi-LLM adapter, RAG over uploaded documents) went into production use by week 12 with 6 agent types. The remaining months added 8 more agent types, Hangfire-based job queuing, role-based access, and deployment automation in parallel with daily client use, reaching 14 agent types by month 12.

Who This Is For

The platform fits any team that runs the same structured knowledge task on a repeating schedule: researching companies, enriching leads, monitoring competitors, extracting data from websites, or generating reports from collected inputs. The data sources are stable, the report structure is stable; only the input list changes.

Triggers that bring teams to a custom agent platform:

  • B2B SaaS adding AI-powered features without hiring a dedicated ML team
  • Startups where lead research and enrichment have been manual from day one
  • Sales, real estate, and ops teams running weekly research on a fixed set of sources
  • Market research and consulting firms producing recurring reports from web sources and uploaded datasets

If the workflow runs more than once a week and follows a defined pattern, an AI agent handles most of it.

Previous slide
    Next slide

    Why a Custom Agent Platform — Not Zapier, n8n, or a Simple LLM Call

    A chatbot answers questions. A custom AI agent runs a loop: plan → use tools → check results → decide next step → retry or finish. Zapier and Make connect APIs but have no concept of an agent deciding what to do next based on partial results, retrying a failed step with a different strategy, or scoring extracted data before proceeding. OpenAI Assistants API gives you a single-agent thread, with no multi-agent orchestration, no custom tool routing, and no run observability beyond what OpenAI exposes.

    What TwinCore built instead: each agent type runs a graph orchestrated by LangGraph (an open-source framework for defining AI agents as execution graphs with explicit nodes, edges, and state). Tool use runs through function calling with per-agent schemas. Web crawling runs through Playwright (a headless-browser automation library) with configurable depth, domain filters, and rate limiting. Confidence scoring happens at the extraction layer, so low-confidence records get flagged rather than silently passed through. The entire run is stored: every intermediate step, every tool call, every error.

    Previous slide
      Next slide

      Custom AI Agent Platform: Solution Architecture

      TwinCore built the platform around six core modules covering the full agent lifecycle.

      • Dashboard — aggregate view: total agents, active and completed runs, results found, today’s activity, recent events, errors, and warnings
      • Workflows Library — browsable catalog of 14+ reusable agent types grouped by 6 categories: research, extraction, analysis, content generation, monitoring, document processing
      • Agent Creation Wizard — dual-mode setup: Describe with AI for plain-English input; Configure manually for category, sources, and run parameters
      • Launch Agent — run configuration screen: run name, description, file upload, LLM selection, retry settings, additional parameters
      • Run Monitoring — live execution view with status (Running / Completed / Idle / Error / Stopped), progress, crawled pages, results count, logs, warnings, and failed steps
      • Results and Export — structured results table with fields, confidence scores, per-record statuses, and export to CSV / Excel

      Infrastructure: LangGraph orchestrates the agent execution graph. Each step is a node with defined inputs, outputs, and transition conditions; retry and fallback paths are explicit edges, not try/catch blocks. Hangfire (a .NET background job library) handles background job execution, distributed queuing, and scheduled runs with configurable retry policies per agent type. Elastic Vector Search powers RAG (Retrieval-Augmented Generation, the pattern of grounding LLM answers in retrieved documents) for agents that reason over uploaded documents or knowledge bases. Playwright handles web crawling with per-agent configuration: URL seeds, crawl depth, domain allowlists, rate limits, and JavaScript rendering. The platform runs on Azure with Docker containers, CI/CD through Azure DevOps, schema migrations via EF Core, and an OpenAPI 3 contract for the public REST API. Role-based access control separates agent management permissions from run data visibility.

      Production observability runs on two layers. Application traces, metrics, and exceptions ship to Azure Application Insights. LLM-specific traces — prompts, completions, token usage, cost per run, and per-agent latency — go to Langfuse (an open-source LLM observability platform), so every agent run is replayable with full prompt history when something goes wrong in production.

      Previous slide
        Next slide

        Performance characteristics

        • Crawling is parallelized with configurable concurrency per agent; throughput scales with worker count rather than agent code changes
        • RAG retrieval is served by Elastic Vector Search and scales with corpus size up to the indexed limit
        • Single-agent run time is bounded by tool-call count and LLM choice, not by orchestration overhead; runs are visible in real time and can be aborted mid-flight

        One tradeoff we made (LangGraph over CrewAI and OpenAI Assistants)

        LangGraph over CrewAI and OpenAI Assistants. CrewAI is faster to spin up for demo agents, but the multi-agent abstraction hides the execution graph and makes per-step retry policies awkward. OpenAI Assistants ties the platform to one vendor and gives shallow observability. LangGraph cost us more setup time in weeks 1–4 but paid back from week 12 onward: every retry, branch, and failure is inspectable from a single run log, and adding a new agent type is a graph definition, not a framework gymnastics exercise.

        One failure mode we hit and fixed (field-level hallucinations)

        In the first weeks of production runs, agents occasionally returned hallucinated field values for company research, especially for companies with thin web footprints. Response-level confidence was not enough: an agent could be confident overall and wrong on one field. Fix: confidence scoring was moved to the field level, each extracted value gets a separate score from the reasoning step, and any field below threshold gets flagged in the output table with a low-confidence status. Reviewers approve or reject per field, not per record. Bad records stopped reaching downstream CRM imports once field flagging shipped.

        Technologies used

        .NET
        ASP.NET Core
        Entity Framework Core
        React
        TypeScript
        OpenAI GPT-4o
        Anthropic Claude
        Google Gemini
        LangChain
        LangGraph
        Langfuse
        RAG
        Elastic Vector Search
        Playwright
        Hangfire
        OpenAPI 3
        Swagger
        REST API
        Excel / CSV processing
        SQL Server
        PostgreSQL
        Docker
        Azure
        Azure DevOps
        Azure Application Insights
        Role-based access control

        Results

        • 14 AI agent types built and deployed as reusable, configurable templates
        • Six workflow categories automated: research, extraction, analysis, content generation, monitoring, document processing
        • Structured agent output replaced generic AI tooling for B2B lead research and company analysis
        • Uploaded Excel and CSV business data feeds directly into agent runs without manual re-entry
        • Researchers reclaimed a median of 5.5 hours/week (across 4 researchers, weekly task log audit at month 6) previously spent on manual lookups [estimate]
        • Every run tracked with status, progress, logs, warnings, and retry history
        • Every agent run returns clean fields, confidence scores, and per-record statuses ready for downstream use
        Previous slide
          Next slide

          Adoption inside the client team

          • The platform has been in continuous production use since the core release in week 12
          • Teams that previously assigned analysts to weekly market and competitor scans now review agent output rather than gather it manually
          • New agent types are requested by team leads directly, without an engineering ticket

          Why TwinCore

          The client evaluated two paths before hiring TwinCore: hire an in-house ML team, or string together Make.com and OpenAI Assistants. The in-house path required hiring ML, backend, and PM headcount and would have run well into a year of ramp-up before first production agent. The no-code path was already in flight and had stalled at single-purpose flows that broke on edge cases.

          TwinCore was a fit on three points: prior delivery of LangGraph-orchestrated agent systems, RAG infrastructure experience with Elastic Vector Search, and a small senior team (3 full-stack engineers) that could ship a production PoC in the first month rather than running a 6-month discovery.

          Build vs buy

          Path Time to first production agent Year–1 risk
          In-house ML team 6–9 months of hiring and ramp-up Hiring market, attrition, framework choice still ahead
          Make.com + OpenAI Assistants Weeks per flow Breaks at edge cases, no observability, vendor lock
          TwinCore custom platform Internal PoC in week 4, production core in week 12 Defined scope, fixed team, codebase owned by the client

          If your team runs the same research, extraction, or reporting workflow 5+ times a week, book a 30-minute architecture call. We will sketch a build plan, name the agent types that fit your data, and give a delivery and cost ballpark on the call. No proposal cycle, no slide deck.

          Scroll to top