Why we replaced LangChain with a 200-line orchestrator
LangChain is powerful. It's also a cognitive tax. Here's why we stripped it out of three production systems and what we built instead.
S
Super AdminMarch 15, 2026
EngineeringLangChainorchestration
LangChain is powerful. It's also a cognitive tax. Here's why we stripped it out of three production systems and what we built instead.
The moment we decided
It was sprint 6 of a production copilot for a mid-size bank. The LangChain version was working. It was also 2,400 lines of abstracted magic that none of our engineers could fully read in their head at once. When a retrieval regression appeared, it took 4 hours to find the culprit across three nested chain objects.
Want to run this playbook with us?
A 30-minute scoping call. We listen, ask three questions, tell you if we can help.
That was the moment. We rewrote the orchestration layer in a Saturday afternoon.
What we kept from LangChain
Document loaders: they're good and cover almost every format we encounter.
Text splitters: battle-tested chunking implementations are worth keeping.
Callback handlers: the structured event system is genuinely useful for logging.
What we replaced with 200 lines
The core of our orchestrator is a single async function: retrieve_and_generate(query, context_config). It does three things: retrieves K documents from the vector store using hybrid search, builds a prompt from a Jinja2 template, and calls the LLM with structured output parsing.
The retrieval step
We use pgvector with cosine similarity for semantic search, combined with a tsvector full-text search for keyword recall. The results are merged with RRF (Reciprocal Rank Fusion) and the top 6 chunks go to the prompt. No reranking model — the latency cost wasn't worth it at our scale.
The prompt template
A single Jinja2 template, versioned in Git, with slots for system context, retrieved chunks, conversation history, and the user query. Every change is diffable. Every engineer understands it.
"The best abstraction is the one you can delete." — A principle we now apply to every dependency we add.
What we gained
Debugging time: 4 hours → 15 minutes for the same class of retrieval regression.
Onboarding: new engineers understand the full pipeline in < 1 hour.
Cost: removed two transitive dependencies that were pulling in 400MB of wheels.
Latency: p95 dropped from 2.4s to 1.7s (we're not sure why — possibly caching behavior).
When you should keep LangChain
LangChain is the right call for: rapid prototyping (the abstractions are perfect for day-1 exploration), teams who don't need to own the orchestration layer long-term, and use cases that need tool-calling agents with complex state management.
If you're in production, handling > 10k requests/day, and the pipeline is the core of your product — consider owning it.
Why we replaced LangChain with a 200-line orchestrator — Sainskerta Blog · Sainskerta