AgentX Day 0001


Hello World, We Have a Problem

Welcome to 2026!

It is a brave new world - companies are giving employees access to artificial intelligence, along with free training, to lift their burdens and increase their productivity. We are learning about custom agents, custom prompts, tools, MCP (the Model Context Protocol), and the importance of starting new conversations so that each one gets a fresh context without the overhead of noisy tokens.

Boy those tokens are getting expensive!

Did your company downsize so they could continue to afford tokens?

Did you hear the latest model can handle two orders of magnitude more tokens? It can do so much more than the old model! And it takes so many more tokens! But don’t forget to start a new conversation so the agent doesn’t lose focus.

I hear the price of tokens is going up again…

Why Do We Fall Down, Bruce?

Large Language Models (LLMs) are here to stay. But they are also the latest productivity hype. Killer app? Sure - but for most people it is a subscription service that is very good at delivering 80% of what you need in 20% of the time. Getting that last 20% can be really tricky.

I learned about custom prompts by accident - I was using a coding agent and found that I was retyping the same baseline requirements over and over again. I had to - the agent would drift off on an unpredictable tangent and I kept needing to bring it back to square one. At one point, out of sheer frustration, I asked the agent how I could ensure coding consistency and minimize hallucinations. The answer came back: write it all down. Require the agent to read the files each time it does something. Track progress in lists with checkboxes… The hole opened up before me and, like Bruce Wayne (or Alice, if you prefer), I fell right in.

Before I had heard of spec-driven development, custom agents, custom prompts, or custom commands, I was implementing my own framework. I let the agent write 80% of it and fooled around with the remaining 20% to get it mostly working. My productivity started to go up. Then it hit a plateau. I had accidentally allowed an ambiguous branch of code to come into existence. By not stating explicitly that “I want to enhance this class” meant “update the class and do not build a new enhanced_class definition”, I inadvertently allowed the agent to put my latest ideas in one place or the other, depending on how I prompted it or on which version it traced a problem back to. It took days to identify and fix the problem. It took even longer to figure out how to prevent it from happening again. And everything slowed down further once the system started to compress the context.

Once your session’s context gets corrupted, there is little you can do - sometimes it is just cheaper to kill the session, revert some code, re-write your prompts, and start again. Just when we think we are in the vibe, we fall down.

Getting Back Up

I found that I had generated too much code to solve my problems by hand. I knew in great detail what it was supposed to do - but I had relinquished a certain level of control to my agent. I used business terminology for code units and processes and let the agent come up with class and method names. I had very high code coverage, but had neglected to instruct the agent to describe each use case in Gherkin in the test comments. To solve my problems, I had to use my agent. It worked, eventually. In the process, I discovered a neat trick.

Agent - heal yourself!

I asked the agent to review the prompts I had written, the code it had generated, and the fix it had needed to implement. What could I have done better? What could it have done better? What would have made the prompt more effective and less error prone? Update the prompt accordingly. Try again.

I found that after several iterations, prompts became more thorough. Explicit examples of what to do and what not to do littered them. I became less concerned about “one shot” and “two shot” prompts, and more focused on prompt stability and reusability. My pattern became highly iterative, and my goal was “good boilerplate.” But I found that the right boilerplate was highly specialized each time I wanted to do something. And I learned that you always want to end your session before your context fills up.

Not just because compressing the session took so long, but because compression meant losing details that the agent deemed unimportant. Summarization is, by its very nature, the removal of as much detail as possible while retaining enough to keep the conversation moving forward. The problem was that what the agent found important too often disagreed with what I found important.

So two forces waged open war in my vibe coding sessions: retaining what I, the user, knew to be critical context, and retaining what the agent computed to be critical context. I looked into how context window management was implemented so I could better manage my efforts. Two approaches were in use. The older approach was simply to throw away the earliest prompt/response turns until there was enough space for the new prompt. The older material, the thinking went, was important when the conversation started, but was now less important than wherever the conversation had moved on to. This was not a very good solution. The more modern approach was to summarize the context into salient facts. Summarization was a familiar task from the early days of machine learning: “summarize War and Peace in 30 words or less” was an interesting way to show off a model’s capabilities.

But it was slow. And sometimes the gist and direction of the conversation matter a lot less than the deep details of some of the key milestones along the way. The value of context content is not homogeneous. Some areas are dense with relevance, while other large stretches are useless at best and detrimental at worst.
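The older, truncation-based strategy is simple enough to sketch. This is a minimal illustration under stated assumptions, not how any particular product implements it: the `Turn` type, the `fit_context` function, and the word-counting `count_tokens` stand-in are all hypothetical, and a real system would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: a real system would use the model's tokenizer.
    # Whitespace-separated words are counted here purely for illustration.
    return len(text.split())


def fit_context(history: list[Turn], new_prompt: Turn, budget: int) -> list[Turn]:
    """Drop the earliest turns until the history plus the new prompt fits the budget."""
    needed = count_tokens(new_prompt.text)
    if needed > budget:
        raise ValueError("the new prompt alone exceeds the context budget")
    kept = list(history)  # copy so the caller's history is not mutated
    total = sum(count_tokens(t.text) for t in kept) + needed
    # The oldest turns are assumed to be the least relevant, so they go first.
    while kept and total > budget:
        dropped = kept.pop(0)
        total -= count_tokens(dropped.text)
    return kept + [new_prompt]


history = [
    Turn("user", "I want to enhance this class"),
    Turn("assistant", "Created enhanced_class with the new logic"),
    Turn("user", "Update the original class instead"),
]
result = fit_context(history, Turn("user", "Now add tests for it"), budget=12)
print([t.text for t in result])
# ['Update the original class instead', 'Now add tests for it']
```

The very first instruction is the first thing to go, whether or not it was the one that mattered most. That is the weakness of any strategy that treats age as a proxy for relevance.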

Honestly, I thought, context curation should be a human endeavor.

Introducing AgentX