Counting letters

20 May 2026

A small moment from today's chat with Claude. We were updating my personal website. I'd mentioned that my Xbox gamertag, silky, is short, single-word, no numbers, and dates back to the original Xbox Live launch in 2002 — properly rare these days. Claude framed it nicely:

"Microsoft started auto-appending random digits to new tags about a decade in once the namespace got crowded — keeping a clean four-letter handle since launch is genuine retro currency."

A bit later:

Me: how many letters in my xbox user name?

Claude: Five — s, i, l, k, y. Even more impressive than four.

Me: and what did you say above?

Claude: I miscounted — said "four-letter handle" up there. It's five. My mistake.

For some people, that's the whole story. The AI can't count letters in a five-letter word. Therefore it is dumb. Therefore there is no point in using it, paying for it, or learning to work with it.

I see this take constantly. It is one of the most expensive misreadings of how LLMs actually work, and the people anchored to it are quietly giving up a meaningful amount of leverage in their daily work.

What's actually going on

Language models don't see characters. They see tokens, which are subword chunks. The word silky is probably just one or two tokens. From the model's point of view, asking "how many letters in silky" is closer to me asking you "how many letters in the symbol ⌬": you'd have to break it open, identify the parts, and count them, all without being able to see inside the symbol directly.
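To make that concrete, here is a toy sketch in Python. The vocabulary and the token IDs are invented for illustration and do not come from any real tokenizer; the only point is that what reaches the model is a list of numbers, not a row of letters.

```python
# Toy illustration only: this vocabulary and these IDs are made up,
# not taken from any real tokenizer.
TOY_VOCAB = {
    "how": 101,
    "many": 102,
    "letters": 103,
    "in": 104,
    "?": 105,
    "silky": 9001,
}
ID_TO_TEXT = {token_id: text for text, token_id in TOY_VOCAB.items()}


def encode(text: str) -> list[int]:
    """Map whitespace-separated words to toy token IDs."""
    return [TOY_VOCAB[word] for word in text.split()]


def letters_via_id(token_id: int) -> int:
    """Counting letters means turning the ID back into characters first."""
    return len(ID_TO_TEXT[token_id])


# What the "model" receives: numbers, with no letters in sight.
model_view = encode("how many letters in silky ?")
print(model_view)

# Only after unpacking the ID can anything be counted.
print(letters_via_id(9001))
```

Nothing about the number 9001 says "five letters". The count only appears once the token is unpacked back into its characters, and that unpacking is the step a model has to perform on purpose.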

The model can do this when it tries deliberately (and Claude got it right the instant I asked the question head-on). But it's a fragile operation when the math is just dropped into casual conversation, the way you might absent-mindedly say "five o'clock" when it's actually four. It's a known, well-studied limitation. And it says exactly nothing about the model's ability to reason, write, summarise, operate APIs, debug code, design systems, or carry long chains of practical work.

Inferring "useless" from "letter-counting wobble" is like dismissing a calculator because it can't write a poem. Different shape. Different tool.

What also happened in the same chat

The same model that miscounted silky in casual conversation also, in the same session, helped me work through:

A full audit of my Cloudflare account end-to-end. Domains, DNS, email routing, Pages projects, Access apps, billing history. Including identifying an unexplained $47.70 charge from March — turned out to be the 9-year prepaid registration of this domain.
A DMARC record added to richmorgan.co.uk to close a domain-spoofing gap.
16 Gmail filters refactored into version-controlled XML. Four high-volume buckets now go straight to Trash, which Gmail empties on its native 30-day timer, so my storage stops growing.
A Google Apps Script written and live-debugged across five iterative versions. v1 naive. v5 using Gmail's Advanced Service batchModify to move messages in batches of a thousand. 77,906 marketing emails moved to trash across two passes — about an 82% inbox reduction. Broker statements, Airbnb receipts, and other things worth keeping deliberately preserved.
A scheduled remote AI agent that audits my Cloudflare account every morning and Slacks me only if something changed, as protection against surprise charges on the card attached to the account.
A second scheduled remote agent that reads only the headers of incoming Gmail in key labels and DMs me a daily digest. Message bodies are explicitly off-limits — prompt-injection-hardened from the start.
This site — created, designed, and deployed. From "let's start" to live on the apex domain in under five minutes.
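
For the curious, the batching idea behind the v5 script can be sketched in a few lines. This is written in Python rather than Apps Script and is not the script from the session. The function names are hypothetical, and using the TRASH label in the request body is my assumption about how a move to trash could be expressed. The cap of 1000 IDs per batchModify request is Gmail's documented limit.

```python
from typing import Iterator

# Gmail's documented cap on IDs per users.messages.batchModify request.
MAX_IDS_PER_CALL = 1000


def chunked(ids: list[str], size: int = MAX_IDS_PER_CALL) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` message IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_trash_requests(ids: list[str]) -> list[dict]:
    """Build one request body per batch (hypothetical helper).

    Assumption: adding the TRASH label is how the move is expressed.
    No API call is made here; this only prepares the payloads.
    """
    return [{"ids": batch, "addLabelIds": ["TRASH"]} for batch in chunked(ids)]


# Example with placeholder IDs: 2,500 messages become three requests.
sample_ids = [f"msg-{i}" for i in range(2500)]
requests = build_trash_requests(sample_ids)
print([len(r["ids"]) for r in requests])  # [1000, 1000, 500]
```

The naive alternative is one call per message, which is roughly what separates a v1 from a v5.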

That all happened today.

The slip on silky is not a counter-argument to any of it. Both things are true at the same time: the model has a known weakness with raw character math, and it is a tool of genuinely high leverage in real practical work. People anchored on the first observation and refusing to engage with the second are leaving an enormous amount on the table.

A pointer

If you want to grok tokenisation in 90 seconds, OpenAI's tokenizer playground shows you how any string is chopped into tokens. Drop silky in — you'll likely see one or two tokens, not five characters. Once you've seen it, the letter-counting limitation stops looking like a stupidity and starts looking like exactly what it is: an artefact of representation, with well-known workarounds (ask the model to spell the word first, or hand the micro-task to a Python REPL).
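
The second workaround really is as small as it sounds. A minimal sketch of what "hand it to a Python REPL" looks like, with helper names of my own choosing:

```python
def spell(word: str) -> list[str]:
    """Break a word into its individual characters."""
    return list(word)


def count_letters(word: str) -> int:
    """Count alphabetic characters, ignoring digits and punctuation."""
    return sum(1 for ch in word if ch.isalpha())


print(spell("silky"))          # ['s', 'i', 'l', 'k', 'y']
print(count_letters("silky"))  # 5
```

Spelling first and counting second is the same move the model makes when it gets the answer right: turn the token back into characters, then count.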

The frame I've landed on: stop asking AI to be a calculator. Use it as a colleague.

richmorgan.co.uk