Custom AI Agent Platform for Business Workflow Automation

TwinCore built a custom AI agent platform for a Florida B2B research company. The platform replaced manual company research, competitor monitoring, lead enrichment, and report generation with supervised agent workflows, field-level confidence scoring, audit trails, and structured exports to CRM.

Industry

B2B market research

Sales intelligence

AI Agent

Country

USA (Florida)

Project duration

12 months; PoC running internally by week 4, core platform in production use by week 12

Team

3 full-stack developers

Outcome

Manual research and monitoring workflows replaced by supervised, structured agent runs with full audit trail; 14 reusable agent types across 6 workflow categories

About the client

The client is a Florida-based company of roughly 20–50 employees serving B2B sales and marketing teams that need lead lists, competitor reports, and market scans on a recurring schedule. Their internal workflows were largely manual: a small research team switched between spreadsheets, websites, documents, and search tools to complete tasks that repeated every week. They had already tried Make.com plus OpenAI Assistants, but that setup stalled at single-purpose flows that broke on edge cases.

“We can see what every agent is doing, fix it when it goes wrong, and pull the results straight into our CRM. That was the part missing from every off-the-shelf tool we tried.”

Sam S.

Florida-based B2B company (name withheld at client request)


Custom AI Agent Platform

Problem: The client ran multiple different AI workflows with no unified system. Each task required a separate tool or manual process.

Result: TwinCore built a platform where users create, configure, launch, and manage different AI agent types from one interface.

Recurring research and monitoring workflows moved into the platform during the first months of production use, replacing the prior mix of spreadsheets, manual web sessions, and one-off tools.

Reusable Workflows Library: 14 agent types

Problem: Research, scraping, analysis, and content tasks each require different logic, inputs, and outputs. No single template covers them all.

Result: The platform ships with 14 agent types. Confirmed templates include AI Company Analysis, AI Real Estate Scraper, AI Research Assistant, Market Analysis, Competitor Research, Keyword Analysis, and Blog Idea Generator. Additional agents cover lead enrichment, document Q&A, scheduled site monitoring, and content briefing workflows.

New agent types are added in days, not weeks: a config file, a class implementing the agent contract, and prompt templates. No platform rebuild.
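The "config file plus class plus prompt templates" pattern can be illustrated with a minimal sketch. This is not the platform's actual code: `AgentConfig`, the `Agent` protocol, the `register` decorator, and `KeywordAnalysisAgent` are hypothetical names, and the agent body is a placeholder where a real agent would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AgentConfig:
    """Hypothetical per-agent config, e.g. loaded from a config file."""
    agent_type: str
    category: str
    prompt_templates: dict[str, str] = field(default_factory=dict)


class Agent(Protocol):
    """Hypothetical agent contract: every agent type exposes run()."""

    def run(self, inputs: dict) -> list[dict]: ...


# Registry mapping an agent type name to the class that implements it.
AGENT_REGISTRY: dict[str, type] = {}


def register(agent_type: str):
    """Class decorator that adds an agent class to the registry."""
    def wrap(cls):
        AGENT_REGISTRY[agent_type] = cls
        return cls
    return wrap


@register("keyword_analysis")
class KeywordAnalysisAgent:
    def __init__(self, config: AgentConfig):
        self.config = config

    def run(self, inputs: dict) -> list[dict]:
        # Placeholder logic: a real agent would send the prompt to an LLM.
        template = self.config.prompt_templates["extract"]
        return [
            {"keyword": kw, "prompt": template.format(keyword=kw)}
            for kw in inputs["keywords"]
        ]


def create_agent(config: AgentConfig) -> Agent:
    """Look up the implementing class by type name and instantiate it."""
    return AGENT_REGISTRY[config.agent_type](config)


config = AgentConfig(
    agent_type="keyword_analysis",
    category="analysis",
    prompt_templates={"extract": "Analyze search intent for: {keyword}"},
)
results = create_agent(config).run({"keywords": ["crm export", "lead scoring"]})
```

In a layout like this, adding an agent type touches only the new class and its config entry; the registry and factory stay unchanged.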

AI-assisted and manual configuration

Problem: Non-technical users need a fast way to create agents; technical users need control over sources, rules, and run behavior.

Result: Two creation modes are available: Describe with AI (plain English input) and Configure manually (category, sources, parameters).

Business users now create agents without engineering involvement; technical users keep full control over sources and run behavior.

Real-time agent execution monitoring

Problem: AI workflows have no value if users cannot see what the agent is doing or why a run failed.

Result: The platform tracks run status, progress, crawled pages, results found, logs, warnings, errors, and retries, visible in real time.

100% of runs tracked with full audit trail and retry history.

Structured results and export

Problem: Business teams need usable output they can act on, not raw text or unstructured AI responses.

Result: Agents return structured results with named fields, confidence scores, per-record statuses, and CSV / Excel export.

Output feeds directly into sales, marketing, operations, or reporting workflows, with no manual cleanup.
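As a rough illustration of "named fields, confidence scores, per-record statuses, and CSV export", the sketch below uses only the Python standard library. The field names (`company`, `website`, `employee_count`), the status values, and the sample rows are assumptions made for the example, not the platform's real schema or data.

```python
import csv
import io
from dataclasses import asdict, dataclass


@dataclass
class ResultRecord:
    """Hypothetical structured result row; field names are illustrative."""
    company: str
    website: str
    employee_count: str
    confidence: float
    status: str


FIELDNAMES = ["company", "website", "employee_count", "confidence", "status"]


def export_csv(records: list[ResultRecord]) -> str:
    """Serialize records to CSV text with a fixed, named column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
records = [
    ResultRecord("Example Co", "https://example.com", "20-50", 0.92, "approved"),
    ResultRecord("Sample LLC", "https://example.org", "unknown", 0.41, "needs_review"),
]
csv_text = export_csv(records)
```

Because every row shares the same named columns, the export can be mapped to CRM import fields once rather than cleaned up per report.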

Project Goal

The goal was to replace fragmented manual workflows with a single, controllable AI automation platform. The client needed business users to create new agent types, configure agent behavior, choose between AI-assisted and manual setup, upload files, select an LLM, start and monitor agent runs, review structured results, and export processed data. The core objective: move AI from a chat interface to repeatable business workflow automation.

Before vs after

Workflow	Before	After
Company research	10–15 working days across two analysts, manual web and spreadsheet	2–3 days of supervised agent runs with structured export
Competitor monitoring	Weekly manual checklist across a fixed set of sources	Scheduled runs, structured diff, alerts on change
Lead enrichment	Throughput capped by manual lookup speed per researcher	Throughput is review-bound, not lookup-bound — reviewers approve agent-extracted records instead of gathering them
Data quality and consistency	Mixed formats, manual cleanup before reporting	Named fields, confidence scores, ready for CRM import
Adding a new workflow	New tool, new training, new spreadsheet template	New agent type added in config + class, no platform rebuild
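The "structured diff, alerts on change" step in the competitor monitoring row can be sketched as a comparison of two snapshots keyed by field name. This is a simplified assumption about how such a diff might look; the snapshot keys and values below are invented for the example.

```python
def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, dict]:
    """Compare two snapshots of monitored fields and group the differences."""
    added = {key: current[key] for key in current.keys() - previous.keys()}
    removed = {key: previous[key] for key in previous.keys() - current.keys()}
    changed = {
        key: {"before": previous[key], "after": current[key]}
        for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


def needs_alert(diff: dict[str, dict]) -> bool:
    """Alert when any category of change is non-empty."""
    return any(diff[kind] for kind in ("added", "removed", "changed"))


# Made-up snapshots from two scheduled runs, for illustration only.
last_week = {"pricing_page_title": "Plans", "starter_price": "$49", "careers_link": "/jobs"}
this_week = {"pricing_page_title": "Plans", "starter_price": "$59", "new_product": "Insights"}

diff = diff_snapshots(last_week, this_week)
alert = needs_alert(diff)
```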

Delivery timeline

A single-agent PoC was running internally by week 4. The core platform (Workflows Library, run monitoring, exports, multi-LLM adapter, RAG over uploaded documents) went into production use by week 12 with 6 agent types. The remaining months added 8 more agent types, Hangfire-based job queuing, role-based access, and deployment automation in parallel with daily client use, reaching 14 agent types by month 12.

Who This Is For

The platform fits any team that runs the same structured knowledge task on a repeating schedule: researching companies, enriching leads, monitoring competitors, extracting data from websites, or generating reports from collected inputs. The data sources are stable, the report structure is stable; only the input list changes.

Triggers that bring teams to a custom agent platform:

B2B SaaS adding AI-powered features without hiring a dedicated ML team
Startups where lead research and enrichment have been manual from day one
Sales, real estate, and ops teams running weekly research on a fixed set of sources
Market research and consulting firms producing recurring reports from web sources and uploaded datasets

If the workflow runs more than once a week and follows a defined pattern, an AI agent can handle most of it.

Why a Custom Agent Platform — Not Zapier, n8n, or a Simple LLM Call

A chatbot answers questions. A custom AI agent runs a loop: plan → use tools → check results → decide next step → retry or finish. Zapier and Make connect APIs but have no concept of an agent deciding what to do next based on partial results, retrying a failed step with a different strategy, or scoring extracted data before proceeding. The OpenAI Assistants API gives you a single-agent thread, with no multi-agent orchestration, no custom tool routing, and no run observability beyond what OpenAI exposes.

What TwinCore built instead: each agent type runs a graph orchestrated by LangGraph (an open-source framework for defining AI agents as execution graphs with explicit nodes, edges, and state). Tool use runs through function calling with per-agent schemas. Web crawling runs through Playwright (a headless-browser automation library) with configurable depth, domain filters, and rate limiting. Confidence scoring happens at the extraction layer, so low-confidence records get flagged rather than silently passed through. The entire run is stored: every intermediate step, every tool call, every error.
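The plan, tool use, check, and retry loop described above can be illustrated as a tiny execution graph. The sketch is plain Python and deliberately does not use the LangGraph API. The node names, `MAX_ATTEMPTS`, and the `flaky_lookup` tool are hypothetical and exist only to exercise the retry edge.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumed retry budget


def flaky_lookup(query: str, attempt: int) -> Optional[str]:
    """Hypothetical tool: returns nothing on the first attempt, then succeeds."""
    return None if attempt == 1 else f"result for {query!r}"


def plan(state: dict) -> str:
    state["query"] = f"{state['company']} headquarters"
    return "use_tool"


def use_tool(state: dict) -> str:
    state["attempts"] += 1
    state["result"] = flaky_lookup(state["query"], state["attempts"])
    return "check"


def check(state: dict) -> str:
    if state["result"] is not None:
        return "finish"
    # Retry is an explicit edge back to use_tool until the budget runs out.
    return "use_tool" if state["attempts"] < MAX_ATTEMPTS else "fail"


# Each node takes the shared state and returns the name of the next node.
NODES: dict[str, Callable[[dict], str]] = {
    "plan": plan,
    "use_tool": use_tool,
    "check": check,
}


def run_graph(company: str) -> dict:
    state = {"company": company, "attempts": 0, "result": None, "trace": []}
    current = "plan"
    while current in NODES:
        state["trace"].append(current)  # every intermediate step is recorded
        current = NODES[current](state)
    state["trace"].append(current)  # terminal node: "finish" or "fail"
    state["status"] = "Completed" if current == "finish" else "Error"
    return state


state = run_graph("Example Co")
```

The recorded `trace` is what makes a run inspectable afterwards: it shows which nodes ran, in what order, and where the retry happened.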

Custom AI Agent Platform: Solution Architecture

TwinCore built the platform around six core modules covering the full agent lifecycle.

Dashboard: aggregate view of total agents, active and completed runs, results found, today’s activity, recent events, errors, and warnings
Workflows Library: browsable catalog of 14 reusable agent types grouped into 6 categories (research, extraction, analysis, content generation, monitoring, document processing)
Agent Creation Wizard: dual-mode setup, with Describe with AI for plain-English input and Configure manually for category, sources, and run parameters
Launch Agent: run configuration screen covering run name, description, file upload, LLM selection, retry settings, and additional parameters
Run Monitoring: live execution view with status (Running / Completed / Idle / Error / Stopped), progress, crawled pages, results count, logs, warnings, and failed steps
Results and Export: structured results table with fields, confidence scores, per-record statuses, and export to CSV / Excel
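The run statuses listed for Run Monitoring lend themselves to a small state machine. The sketch below is an assumption about how such tracking could work: the five status names come from the list above, while the allowed transitions, the `AgentRun` class, and the sample notes are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    IDLE = "Idle"
    RUNNING = "Running"
    COMPLETED = "Completed"
    ERROR = "Error"
    STOPPED = "Stopped"


# Assumed transition rules; the real platform may allow different moves.
ALLOWED = {
    RunStatus.IDLE: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.COMPLETED, RunStatus.ERROR, RunStatus.STOPPED},
    RunStatus.ERROR: {RunStatus.RUNNING},  # a retry re-enters Running
    RunStatus.COMPLETED: set(),
    RunStatus.STOPPED: set(),
}


@dataclass
class AgentRun:
    name: str
    status: RunStatus = RunStatus.IDLE
    retries: int = 0
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def transition(self, new_status: RunStatus, note: str = "") -> None:
        """Move to a new status, rejecting moves the rules do not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} not allowed")
        if self.status is RunStatus.ERROR and new_status is RunStatus.RUNNING:
            self.retries += 1
        # The history doubles as a simple audit trail of status changes.
        self.history.append((self.status.value, new_status.value, note))
        self.status = new_status


run = AgentRun("weekly competitor scan")
run.transition(RunStatus.RUNNING, "started")
run.transition(RunStatus.ERROR, "page load timed out")
run.transition(RunStatus.RUNNING, "retry")
run.transition(RunStatus.COMPLETED, "done")
```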

Infrastructure: LangGraph orchestrates the agent execution graph. Each step is a node with defined inputs, outputs, and transition conditions; retry and fallback paths are explicit edges, not try/catch blocks. Hangfire (a .NET background job library) handles background job execution, distributed queuing, and scheduled runs with configurable retry policies per agent type. Elastic Vector Search powers RAG (Retrieval-Augmented Generation, the pattern of grounding LLM answers in retrieved documents) for agents that reason over uploaded documents or knowledge bases. Playwright handles web crawling with per-agent configuration: URL seeds, crawl depth, domain allowlists, rate limits, and JavaScript rendering. The platform runs on Azure with Docker containers, CI/CD through Azure DevOps, schema migrations via EF Core, and an OpenAPI 3 contract for the public REST API. Role-based access control separates agent management permissions from run data visibility.
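The per-agent crawl settings named above (URL seeds, crawl depth, domain allowlists, rate limits, JavaScript rendering) can be pictured as a small config object with a URL filter. This sketch does not call Playwright; `CrawlConfig`, its field names, and the subdomain-matching rule are assumptions for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlConfig:
    """Hypothetical per-agent crawl settings."""
    seeds: tuple[str, ...]
    max_depth: int
    allowed_domains: frozenset[str]
    requests_per_second: float
    render_javascript: bool = True

    @property
    def min_delay_seconds(self) -> float:
        """Minimum pause between requests implied by the rate limit."""
        return 1.0 / self.requests_per_second

    def allows(self, url: str, depth: int) -> bool:
        """True if the URL is within the depth limit and on an allowed domain."""
        if depth > self.max_depth:
            return False
        host = urlparse(url).hostname or ""
        # Assumption: subdomains of an allowed domain are also allowed.
        return any(
            host == domain or host.endswith("." + domain)
            for domain in self.allowed_domains
        )


config = CrawlConfig(
    seeds=("https://example.com/",),
    max_depth=2,
    allowed_domains=frozenset({"example.com"}),
    requests_per_second=2.0,
)
```

A crawler would check `config.allows(url, depth)` before enqueueing each discovered link and sleep for `config.min_delay_seconds` between requests.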

Production observability runs on two layers. Application traces, metrics, and exceptions ship to Azure Application Insights. LLM-specific traces — prompts, completions, token usage, cost per run, and per-agent latency — go to Langfuse (an open-source LLM observability platform), so every agent run is replayable with full prompt history when something goes wrong in production.
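To make the LLM-specific layer concrete, the sketch below aggregates per-call trace records into per-run totals for call count, token usage, and latency. It is not the Langfuse API: the `LLMCall` record shape and the sample numbers are hypothetical, and cost is left out because it depends on provider pricing not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical trace record for one LLM call inside an agent run."""
    run_id: str
    agent_type: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float


def summarize_by_run(calls: list[LLMCall]) -> dict[str, dict]:
    """Total call count, tokens, and latency for each run."""
    summary: dict[str, dict] = defaultdict(
        lambda: {"calls": 0, "tokens": 0, "latency_ms": 0.0}
    )
    for call in calls:
        entry = summary[call.run_id]
        entry["calls"] += 1
        entry["tokens"] += call.prompt_tokens + call.completion_tokens
        entry["latency_ms"] += call.latency_ms
    return dict(summary)


# Made-up trace records, for illustration only.
calls = [
    LLMCall("run-1", "company_analysis", 100, 20, 300.0),
    LLMCall("run-1", "company_analysis", 200, 50, 500.0),
    LLMCall("run-2", "keyword_analysis", 80, 10, 150.0),
]
summary = summarize_by_run(calls)
```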

Performance characteristics

Crawling is parallelized with configurable concurrency per agent; throughput scales with worker count rather than agent code changes
RAG retrieval is served by Elastic Vector Search and scales with corpus size up to the indexed limit
Single-agent run time is bounded by tool-call count and LLM choice, not by orchestration overhead; runs are visible in real time and can be aborted mid-flight
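Configurable crawl concurrency, the first point above, is commonly implemented with a semaphore that caps in-flight requests. The sketch below uses `asyncio` from the standard library; `fetch_page` is a stand-in that sleeps instead of loading a real page, and the URLs are placeholders.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Stand-in for a real page fetch; sleeps instead of hitting the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl_all(urls: list[str], concurrency: int) -> tuple[list[str], int]:
    """Fetch all URLs with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous fetches observed

    async def worker(url: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_page(url)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input URLs.
    pages = await asyncio.gather(*(worker(url) for url in urls))
    return list(pages), peak


urls = [f"https://example.com/page/{i}" for i in range(10)]
pages, peak = asyncio.run(crawl_all(urls, concurrency=3))
```

Raising `concurrency` changes throughput without touching the worker code, which is the property the first bullet describes.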

One tradeoff we made (LangGraph over CrewAI and OpenAI Assistants)

We chose LangGraph over CrewAI and OpenAI Assistants. CrewAI is faster to spin up for demo agents, but the multi-agent abstraction hides the execution graph and makes per-step retry policies awkward. OpenAI Assistants ties the platform to one vendor and gives shallow observability. LangGraph cost us more setup time in weeks 1–4 but paid back from week 12 onward: every retry, branch, and failure is inspectable from a single run log, and adding a new agent type is a graph definition, not a framework gymnastics exercise.

One failure mode we hit and fixed (field-level hallucinations)

In the first weeks of production runs, agents occasionally returned hallucinated field values for company research, especially for companies with thin web footprints. Response-level confidence was not enough: an agent could be confident overall and wrong on one field. The fix was to move confidence scoring to the field level. Each extracted value now gets a separate score from the reasoning step, and any field below the threshold is flagged in the output table with a low-confidence status. Reviewers approve or reject per field, not per record. Bad records stopped reaching downstream CRM imports once field flagging shipped.
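A minimal sketch of why record-level confidence can hide a bad field. The threshold of 0.7, the field names, the scores, and the use of a simple average as the record-level score are all assumptions chosen for the example, not the platform's actual values or method.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # hypothetical threshold


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float

    @property
    def status(self) -> str:
        """Per-field status used by reviewers."""
        return "ok" if self.confidence >= CONFIDENCE_THRESHOLD else "low_confidence"


def record_confidence(fields: list[ExtractedField]) -> float:
    """Record-level score, here assumed to be the mean of field scores."""
    return sum(f.confidence for f in fields) / len(fields)


def flagged_fields(fields: list[ExtractedField]) -> list[str]:
    """Names of fields that fall below the threshold."""
    return [f.name for f in fields if f.status == "low_confidence"]


# Made-up extraction for a company with a thin web footprint.
fields = [
    ExtractedField("company_name", "Example Co", 0.95),
    ExtractedField("website", "https://example.com", 0.90),
    ExtractedField("employee_count", "200", 0.40),
]
overall = record_confidence(fields)
flagged = flagged_fields(fields)
```

Here the record averages 0.75 and would pass a record-level check, while `employee_count` is still flagged for review on its own.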

Technologies used

.NET

ASP.NET Core

Entity Framework Core

React

TypeScript

OpenAI GPT-4o

Anthropic Claude

Google Gemini

LangChain

LangGraph

Langfuse

RAG

Elastic Vector Search

Playwright

Hangfire

OpenAPI 3

Swagger

REST API

Excel / CSV processing

SQL Server

PostgreSQL

Docker

Azure

Azure DevOps

Azure Application Insights

Role-based access control

Results

14 AI agent types built and deployed as reusable, configurable templates
Six workflow categories automated: research, extraction, analysis, content generation, monitoring, document processing
Structured agent output replaced generic AI tooling for B2B lead research and company analysis
Uploaded Excel and CSV business data feeds directly into agent runs without manual re-entry
Researchers reclaimed an estimated median of 5.5 hours/week previously spent on manual lookups (across 4 researchers, weekly task log audit at month 6)
Every run tracked with status, progress, logs, warnings, and retry history
Every agent run returns clean fields, confidence scores, and per-record statuses ready for downstream use

Adoption inside the client team

The platform has been in continuous production use since the core release in week 12
Teams that previously assigned analysts to weekly market and competitor scans now review agent output rather than gather it manually
New agent types are requested by team leads directly, without an engineering ticket

Why TwinCore

The client evaluated two paths before hiring TwinCore: hiring an in-house ML team, or stringing together Make.com and OpenAI Assistants. The in-house path required hiring ML, backend, and PM headcount and would have run well into a year of ramp-up before the first production agent. The no-code path was already in flight and had stalled at single-purpose flows that broke on edge cases.

TwinCore was a fit on three points: prior delivery of LangGraph-orchestrated agent systems, RAG infrastructure experience with Elastic Vector Search, and a small senior team (3 full-stack engineers) that could ship a production PoC in the first month rather than running a 6-month discovery.

Build vs buy

Path	Time to first production agent	Year-1 risk
In-house ML team	6–9 months of hiring and ramp-up	Hiring market, attrition, framework choice still ahead
Make.com + OpenAI Assistants	Weeks per flow	Breaks at edge cases, no observability, vendor lock
TwinCore custom platform	Internal PoC in week 4, production core in week 12	Defined scope, fixed team, codebase owned by the client

Want the same?

Book a 30-min architecture call

If your team runs the same research, extraction, or reporting workflow 5+ times a week, book a 30-minute architecture call. We will sketch a build plan, name the agent types that fit your data, and give a delivery and cost ballpark on the call. No proposal cycle, no slide deck.