Jonathan's Blog

Anthropic Builds Good Applications

A lot of AI labs suck at building applications around their AI models. They spend ungodly amounts of time and money training models and then fail at the last-mile delivery to the consumer. This is especially indefensible considering how much of a difference the product portion of AI deployment (the harness) can make.

Anthropic’s models aren’t that great, and they’re expensive as hell, but Anthropic builds great applications. Anthropic’s system prompts are huge behemoths that actually have thought, time, and testing put into them, and with good reason: system prompts are important! Claude Code is just a great product. Anthropic puts more effort into building good products and seems to be winning on that front.

DeepMind with Gemini seems to be at the opposite end of the spectrum. They also train solid models, usually slightly behind Anthropic but still strong. However, they usually fall flat when it comes to actually making applications. Gemini CLI doesn’t seem to have good design or defaults, and Gemini has a wimpy system prompt with terrible product design.

The actual product that’s built around the model, the harness, has a huge impact on the overall user experience. It feels like most AI labs could get a lot more mileage out of their models by creating better-functioning products and putting more time into their system prompts rather than trying to eke out another percent on a math benchmark.

