I Paid 300K a Month to a Developer Who Was Secretly Feeding My Tasks to ChatGPT

A team lead discovers that a mid-level developer has been secretly outsourcing all programming tasks to ChatGPT instead of writing code himself.

I run a small product team of 12 people building a B2B logistics product. Go, React, PostgreSQL, everything on Kubernetes. The domain looks boring from the outside, but on the inside it's hell: every carrier has its own API, and every API reads like it was written on a Friday evening. CDEK's tariff_code field is a string in one endpoint and a number in another — I still don't understand why, and nobody there understands either. I asked.

So, in the fall of 2025, one of our mid-level developers left. Typical story — he found +80K at some fintech, they promised him a "flat structure and decision-making freedom" (spoiler: three months later he messaged me saying he wanted to come back, but we'd already hired a replacement). The replacement, by the way, wasn't easy to find. We did about twenty interviews, maybe more — I lost count after fifteen.

The market is like this: half of the candidates can't write a tree traversal (I'm serious — I was starting to wonder if I was asking the wrong questions, but no), three lied about their experience so poorly I felt embarrassed for them, and one asked if he could work from Bali on the condition that standups are at 9 AM. He can't — we have on-call.

Anyway, a candidate appeared. I'll call him Dima.

The Interview

Dima did fine. Not brilliantly. Not one of those people who quotes the Go spec from memory and draws perfect architectures on the whiteboard (I'm actually wary of those types — they usually suffer in real projects because the world isn't perfect). He explained how he'd design a notification service and drew a reasonable diagram. He asked the right clarifying questions — what's more important, at-least-once or exactly-once, what are the latency requirements. He stumbled a bit on goroutine patterns, but I stumble on them too, honestly — I reread Sameer Ajmani's post every time I need to do something complex with channels.

300K gross, he agreed, came on board. First week he was onboarding, reading docs, setting up the local environment, asking questions in Slack. Reasonable questions. I exhaled.

And then.

The First Task

Integration with a regional carrier's API. A medium task: 40-page PDF documentation (half in broken English, the other half wasn't even documentation but marketing slides they forgot to remove), write a client, tests, plug it into the pipeline. Usually 5–7 business days.

Dima did it in two.

I was surprised. I went into the PR.

The code worked. Tests were green. Linter was silent. But I had this... you know that feeling when something is off, but you can't articulate what? I couldn't put my finger on it for two days, and then it hit me in pieces.

First — comments. Our style is: we only write a comment when the code does something non-obvious, and we explain why, not what. Dima slapped a comment on every function, all identical: // functionName handles the processing of X and returns Y. I told him — that's not how we do it. He removed them. From the NEXT PR. They stayed in this one. Minor thing, fine.

Second — naming. This is harder to explain. Variables were named fine, but... too fine? See, real programmers have naming habits. Some always write resp, some res, some apiResp. And it's consistent — it's a reflex. In Dima's code, one file had client, another apiClient, a third transportClient. Slightly different each time. Not a bug, but an absence of habit. Although... maybe I was nitpicking. Maybe the guy just writes that way, without an established style. I decided at the time that I was nitpicking.

Third — error handling. The Go classic, if err != nil. In our project, error messages are short, like "unmarshal transport resp", because the stack trace and wrap are right there and long strings only get in the way when you need to grep later. Dima wrote full sentences: "failed to unmarshal transport response: unexpected format". Also not a bug. But after two months, people usually pick up the project's style, and he wasn't picking it up.
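The difference between the two styles is easier to show than to describe. A minimal sketch — the struct, field, and payload are made up for illustration, but the convention is the real one: a short message plus %w, because the wrapped error already carries the details.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// transportResp is an illustrative payload, not our real API model.
type transportResp struct {
	TariffCode string `json:"tariff_code"`
}

// parse shows the short-message convention: "unmarshal transport resp" is
// enough to grep for, and %w keeps the original error attached, so a full
// English sentence like "failed to unmarshal transport response: unexpected
// format" only adds noise.
func parse(data []byte) (*transportResp, error) {
	var r transportResp
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, fmt.Errorf("unmarshal transport resp: %w", err)
	}
	return &r, nil
}

func main() {
	_, err := parse([]byte(`{`))
	fmt.Println(err)
}
```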

I gave him feedback, showed him the conventions.

Dima said "got it, I'll fix it."

In the next PR he repeated all the same things.

The Tariff Refactoring

(A digression here. Tariff calculation in logistics is a whole separate level of hell. We have a three-screen-long nested if-statement there, because tariffs depend on the delivery type, region, weight, volumetric weight (that's length × width × height in centimeters divided by 5000, for those who don't know — and that 5000 divisor has been hardcoded since 2019, because "we'll move it to config later" — you know how "later" works), and also on whether the client has a contractual discount or a promo code or both.)

This refactoring is what I gave to Dima. Break the if-statements into strategies. The task requires understanding the domain more than writing code.

Dima asked where the documentation was. I said — Confluence, a Slack thread from March, a recording of a call with the product manager. Normal situation — our documentation is like everyone else's, meaning it exists in fragments, and the rest is oral tradition and git blame.

Four days later — a PR, 1,200 lines. Decomposed into files, Strategy pattern, interfaces, tests. Beautiful.

On the third file, I stopped.

He handled the "contractual discount + promo code at the same time" case incorrectly. Our logic works like this: contractual discount applies to the base price, promo code applies to the price after the discount, but not below the minimum margin. Dima applied both discounts in parallel and took the minimum. The tests passed — because he wrote the tests himself, and the tests verified his logic, not ours.
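To make the ordering concrete, here is a minimal sketch of both versions. The function names, the percent-based inputs, and the numbers are all illustrative, not our actual tariff engine:

```go
package main

import "fmt"

// finalPrice is the correct order: the contractual discount applies to the
// base price, the promo code applies to the already-discounted price, and
// the result never drops below the minimum margin.
func finalPrice(base, contractPct, promoPct, minMargin float64) float64 {
	p := base * (1 - contractPct) // contractual discount on the base price
	p = p * (1 - promoPct)        // promo applies after the discount
	if p < minMargin {
		p = minMargin // floor: never sell below minimum margin
	}
	return p
}

// wrongParallel is the version from the PR: both discounts applied to the
// base price independently, taking whichever result is smaller. It looks
// plausible and passes tests written against itself.
func wrongParallel(base, contractPct, promoPct float64) float64 {
	a := base * (1 - contractPct)
	b := base * (1 - promoPct)
	if a < b {
		return a
	}
	return b
}

func main() {
	fmt.Printf("%.2f\n", finalPrice(1000, 0.10, 0.20, 500)) // 1000 -> 900 -> 720
	fmt.Printf("%.2f\n", wrongParallel(1000, 0.10, 0.20))   // min(900, 800) = 800
}
```

The giveaway is the margin floor: in the parallel version there is nowhere to even put it, which is exactly the kind of thing you only notice if you understand why the rule exists.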

I wrote: the discount order is wrong, look at how it works now. Twenty minutes later, a new commit. Now the promo code was first. Still wrong. I wrote in more detail. Third commit — correct.

That's when something stung. Not the mistake — everyone makes mistakes. It was that three times in a row, three different implementations, and none of them showed that he understood the principle. He didn't ask "why this order?" He didn't clarify "where is the minimum margin defined?" He just produced a different implementation. As if he... well, regenerated it with a new prompt.

Although, it's possible I was already biased by then. I don't know. Maybe he was just too shy to ask. That happens with newcomers — they're afraid of looking stupid.

The Test

Fine. I did something I'm ashamed of.

Well — not exactly ashamed. I'd do it again. But there's a residual aftertaste.

I wrote Dima a task with a trap. I asked him to add a new delivery type and in the description wrote that the cost calculation "uses an adapted Bellman-Ford algorithm." This is nonsense. Bellman-Ford is about shortest paths in graphs — it has absolutely nothing to do with cost calculation. Any backend developer with a decent CS background would ask about it.

Dima didn't ask.

Two days later, a PR arrived with a function called BellmanFordPricing. Inside — a linear calculation by zones, nothing to do with the Bellman-Ford algorithm. The comment read: // implements an adapted Bellman-Ford algorithm for zone-based pricing. I opened ChatGPT, pasted my task description, and asked it to write an implementation. Same approach. Same function name.

Well.

The Conversation (And I'm Not Sure I Was Right)

I called a one-on-one meeting.

I asked directly: are you feeding tasks into ChatGPT?

He paused. Said that Copilot suggests things too, and everyone uses it.

I said it's not about suggestions, it's about copy-pasting entire tasks.

He said — well, I use it as a tool. The results are there. Everyone uses it.

And then came a conversation that's hard for me to retell objectively, because by that point I was already on edge. But I'll try.

His arguments: the code works, tests are green, what difference does it make how exactly it was written. Seniors at FAANG companies publicly write about Copilot. Finding a mid-level dev for 300K on the market who writes everything by hand — well, good luck.

My arguments: he doesn't understand what his code does. When I ask "why this structure" — he hesitates. He can't explain the trade-offs. He can't say what will happen under 10x load. In two months, he hasn't gotten any better at understanding our domain. Usually, a mid-level dev starts noticing things after a couple of months — "maybe we should use an event here?", "we're hitting the database twice, we could optimize." Dima kept asking the same questions.

And then there's debugging. When his service went down on staging, he spent an hour and a half poking around and couldn't find the cause. I found it in ten minutes — a race condition in goroutines, a classic. But for that you need to understand how goroutines work, not just know the word "goroutine."
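For anyone who hasn't hit this class of bug: the textbook version fits in a few lines. This is an illustrative counter, not the actual staging incident — but the shape of the bug is the same:

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines with no
// synchronization. counter++ is a read-modify-write, so increments get
// lost; `go run -race` flags it immediately.
func racyCount(n int) int {
	var wg sync.WaitGroup
	counter := 0
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			counter++ // data race: unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return counter // usually < n, sometimes exactly n — that's what makes it nasty
}

// safeCount is the fix: a mutex serializes the increment.
func safeCount(n int) int {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	counter := 0
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter // always n
}

func main() {
	fmt.Println("racy:", racyCount(1000))
	fmt.Println("safe:", safeCount(1000))
}
```

The cruelty of a race is that the racy version often returns the right answer on a quiet laptop and only loses increments under load — which is why "the tests are green" proves nothing here.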

Then again, I catch myself here cherry-picking arguments in my favor. Because there's another side, and it gnaws at me.

What I'm Not Sure About

The code WORKED. That's a fact. Tasks were getting CLOSED. Also a fact.

If I hadn't gone in and examined the code carefully, I wouldn't have noticed. If not for the Bellman-Ford trap, I might never have caught it — possibly not until the first serious incident. And maybe there would never have been a serious incident, who knows. Maybe he would have worked for a year, closing tasks, collecting his salary, and everything would have been fine. I don't know. That's not a rhetorical device — I genuinely don't know.

Here's something else that eats at me. The line between "I use an LLM as a tool" and "I'm a shell for an LLM" is blurry. I sometimes ask Claude to write a test. Or boilerplate. Or "what's that function in the standard library called, I forgot." How is that fundamentally different from what Dima was doing? Scale? Percentage? Where's the threshold?

I don't know. Seriously.

(By the way, looking ahead: we later changed our hiring process, and it helped, but not as much as I'd hoped. I'll tell you below.)

How It Ended with Dima

We parted ways. No scandal. I explained that we need someone who understands the code. He didn't argue — I think he was expecting this conversation. He worked his two-week notice period and handed off his tasks.

I feel sorry for him. He's not stupid and he's not a fraud. He found himself in a situation where a tool allowed him to do work he couldn't handle on his own. For a while, it worked.

What We Changed in Hiring

Briefly, because this is the boring part, but it might be useful for someone.

In the live-coding section, I now give "find the bug" instead of "write this." Twenty lines of Go with a race condition, or a goroutine leak, or incorrect context handling. It's harder to prepare for this via an LLM because you need to read code, not generate it.
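For flavor, here is the kind of snippet I mean — a hypothetical "find the bug" exercise with a goroutine leak. The names and timings are invented for the exercise:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// fetch simulates a slow backend call. The bug: result is unbuffered, so if
// the caller times out and never receives, the sender goroutine blocks on
// `result <- "ok"` forever — a goroutine leak.
func fetch(delay time.Duration) <-chan string {
	result := make(chan string) // the fix is make(chan string, 1)
	go func() {
		time.Sleep(delay) // pretend network call
		result <- "ok"    // blocks forever once the caller has moved on
	}()
	return result
}

// leakedAfter runs n fetches that all time out, then reports how many extra
// goroutines are still alive.
func leakedAfter(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		select {
		case <-fetch(50 * time.Millisecond):
		case <-time.After(time.Millisecond): // caller gives up first
		}
	}
	time.Sleep(100 * time.Millisecond) // sleeps finish; the sends still block
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaked goroutines:", leakedAfter(100))
}
```

A candidate who reads code rather than generates it spots the unbuffered channel in a minute or two. The ones who don't tend to start rewriting the function from scratch instead.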

Also during the probation period — one task on a call, sharing screen, no preparation. Not timed. I don't need speed — I need to see the process: how they read code, how they form hypotheses, how they search (searching is allowed and encouraged), how they debug.

Did it help? Well... seems like yes. We screened out one candidate at this stage — he got very nervous when I asked him to share his screen. But the sample size is small: we hired two people after the new process, both seem fine, though not enough time has passed to judge.

Off-topic, But It's Been Weighing on Me

I wrote all this and I'm thinking — the problem isn't Dima. Dima is a symptom.

We (well, the entire industry, but us included) have spent years building our hiring process around "WHAT can you do." Write a function. Design a system. Solve a problem. And if a person produces the right result, we hire them. We didn't care HOW they got to the result, because the only way to produce a correct result used to be — knowing how to do it.

Now there's a second way. And our process isn't ready for it.

But maybe that's not a bad thing? Maybe in five years, "being able to prompt properly" will be a legitimate skill, and we'll hire not developers but LLM operators, paying them the same 300K, and everything will be fine?

I don't know. Probably not, because when production goes down, you need someone who understands what's happening, not someone who can describe the problem to a chatbot. But maybe I'm just getting old and grumpy.

My team lead in 2018 probably grumbled the same way when I was googling answers on StackOverflow instead of reading man pages.

UPD: after rereading, I noticed that my "Bellman-Ford test" is, to put it mildly, a questionable method. If Dima had come to me and said "I see you wrote Bellman-Ford, but that's a graph algorithm — is this a mistake in the spec?" and I had replied "no, you just don't understand, do it as written" — that would have been manipulation. I was prepared for that outcome and would have admitted the error in the spec. But he didn't ask. That's the whole point.