When we started AutoCom in January, the brief was simple: our D2C customers were drowning in manual work — pulling orders from Shopify, copying addresses into courier portals, sending WhatsApp updates by hand. They’d built their operations into a full-time job instead of a business.
The plan was to give ourselves six weeks to ship something real. Not an MVP with feature flags and “coming soon” placeholders — something that could take a real order, pick a carrier, generate a label, and notify the customer, all without a human touching it.
We hit that deadline. Here’s how.
The boring architecture won
My first instinct was to reach for the fancy stack. Event-sourced, CQRS, Kafka, the whole thing. I sketched it out on a whiteboard for about forty minutes before Sam quietly pointed out that we had six weeks, three engineers, and no existing infrastructure.
We threw the whiteboard diagram in the bin and went with:
- Laravel 11 for the HTTP layer and queue workers
- Postgres 16 for everything stateful — orders, events, idempotency keys, audit log
- Redis for queues and rate limiting
- A single state machine per order
No Kafka. No Elasticsearch. No microservices. One Postgres database, where any bottleneck is an EXPLAIN ANALYZE away. One deploy target. One place to look when something breaks.
It turns out most of the hard problems in e-commerce automation aren’t distributed systems problems. They’re sequencing problems. Did we already create an AWB for this order? Did we already send the customer the “shipped” WhatsApp? A state machine and a few well-placed unique constraints get you most of the way there.
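To make that concrete, here is a sketch of how a unique constraint doubles as an idempotency check. The table and column names are illustrative, not our production schema:

```sql
-- One row per attempted side effect. A replayed webhook or a re-run
-- worker hits the unique constraint instead of creating a second AWB
-- or sending a second "shipped" message.
CREATE TABLE idempotency_keys (
    order_id   bigint      NOT NULL,
    action     text        NOT NULL,  -- e.g. 'create_awb', 'notify_shipped'
    created_at timestamptz NOT NULL DEFAULT now(),
    UNIQUE (order_id, action)
);
```

A worker does `INSERT ... ON CONFLICT DO NOTHING` and only performs the side effect if its insert actually landed; the second run of the same job silently no-ops.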
The state machine
Each order moves through a fixed set of states:
received → validated → assigned → labelled → dispatched → delivered
Each transition is a database row. We never update state in place — we insert an event and a trigger updates a materialised orders_current row. That gives us three things for free:
- Audit log — every state change is a permanent record with the user, the reason, the inputs
- Idempotency — replay a webhook, re-run a worker, nothing duplicates
- Debuggability — “why did this order fail?” is a `SELECT * FROM order_events WHERE order_id = ? ORDER BY id` away
The event table is the only write target. Everything else is a read model. That’s the entire design.
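The append-only design fits in a screenful of SQL. This is a sketch, not the production schema — the names are ours for illustration:

```sql
-- The only write target: one immutable row per state transition.
CREATE TABLE order_events (
    id         bigserial PRIMARY KEY,
    order_id   bigint      NOT NULL,
    state      text        NOT NULL,   -- received, validated, assigned, ...
    actor      text        NOT NULL,   -- the user or worker that caused it
    reason     text,
    payload    jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- The read model: one row per order, maintained by the trigger below.
CREATE TABLE orders_current (
    order_id   bigint PRIMARY KEY,
    state      text        NOT NULL,
    updated_at timestamptz NOT NULL
);

CREATE OR REPLACE FUNCTION apply_order_event() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_current (order_id, state, updated_at)
    VALUES (NEW.order_id, NEW.state, NEW.created_at)
    ON CONFLICT (order_id)
    DO UPDATE SET state = EXCLUDED.state, updated_at = EXCLUDED.updated_at;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_events_apply
    AFTER INSERT ON order_events
    FOR EACH ROW EXECUTE FUNCTION apply_order_event();
```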
The three dead-ends
Not everything worked on the first try.
Dead-end #1: trying to be carrier-agnostic from day one. We started with an elaborate CarrierInterface and sub-classes for Delhivery, Shiprocket, and BlueDart. Two weeks in we realised we didn’t actually know what the shared interface should look like — each API had wildly different pickup mechanics, different label formats, different failure modes. We ripped it out, wrote Delhivery directly, and only pulled out the interface after Shiprocket was working end-to-end. Generalising before you have two concrete cases almost always produces the wrong abstraction.
Dead-end #2: WhatsApp Business templates. Meta’s template approval is a business-hours activity run by humans with opinions. We lost three days waiting on approvals for message variants we didn’t end up using. The fix: draft every template you think you’ll need in week one, submit them in parallel, and cache approval status in the database so you can fail fast on pre-flight checks rather than at send time.
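The approval cache itself is tiny — something along these lines (an illustrative sketch, not our exact schema):

```sql
-- Cache Meta's template approval status so a send can fail fast
-- at pre-flight instead of erroring at send time.
CREATE TABLE whatsapp_templates (
    name       text PRIMARY KEY,
    status     text NOT NULL DEFAULT 'pending',  -- pending | approved | rejected
    checked_at timestamptz
);
```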
Dead-end #3: assuming the carrier APIs were reliable. They are not. We now treat every carrier call as a transient failure by default — exponential backoff, a circuit breaker per carrier, and a dead-letter queue that a human reviews every morning. About 1.8% of label requests fail on the first try. Almost all of them succeed on the second.
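The production version of this lives in Laravel queue jobs, but the policy is language-agnostic. Here is a compact Python sketch — the class, thresholds, and the injected `call` function are all illustrative, not a real carrier SDK:

```python
import time

class CircuitOpen(Exception):
    """Raised when a carrier's circuit breaker is open."""

class CarrierClient:
    """Wraps one carrier's API with exponential backoff and a simple
    consecutive-failure circuit breaker. `call` is a stand-in for the
    actual HTTP request to the carrier."""

    def __init__(self, call, max_attempts=4, base_delay=1.0, failure_threshold=5):
        self.call = call
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def request(self, payload, sleep=time.sleep):
        # Circuit open: don't hammer an unhealthy carrier.
        if self.consecutive_failures >= self.failure_threshold:
            raise CircuitOpen("carrier unhealthy; route to dead-letter queue")
        last_error = None
        for attempt in range(self.max_attempts):
            try:
                result = self.call(payload)
                self.consecutive_failures = 0  # success closes the breaker
                return result
            except Exception as exc:
                last_error = exc
                self.consecutive_failures += 1
                # Exponential backoff: 1s, 2s, 4s, ... before each retry.
                if attempt < self.max_attempts - 1:
                    sleep(self.base_delay * (2 ** attempt))
        # Every attempt failed: surface the error so the job lands in
        # the dead-letter queue for the morning review.
        raise last_error
```

One client per carrier gives each its own breaker, so a Delhivery outage doesn't stop Shiprocket labels from going out.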
What we’d do differently
Two things, looking back:
- Write the retry policy before the happy path. Every external integration needs a retry strategy. If you bolt it on later, you’ll end up with five slightly different retry loops and no coherent story for idempotency.
- Test with real data from day one. Our fixtures were too clean. Real merchant data has emoji in addresses, phone numbers with spaces, pincode typos, and dates in seven different formats. We lost a weekend to “it works on staging” because staging didn’t know what the real world looked like.
Six weeks in, AutoCom processed its first 47 real orders for a pilot customer in Kochi. The state machine has held up. The boring database has held up. The Laravel monolith has held up.
Sometimes the best architecture is the one you already know how to operate at 3 AM.