Akshay ๐Ÿš€

Hermes Mixture of Agents (MoA) explained.

Every agent commits to a single model, and every model has blind spots the others would have caught.

The usual workaround is to run the same prompt through a few models by hand and reconcile the answers. It works, but it lives outside the agent, so the tools, the memory, and the session are gone the moment that detour starts.

Hermes Agent by Nous Research just shipped Mixture of Agents, which folds that whole process back inside the agent.

The unit you work with is a preset. Think of it as a recipe that names a few models to consult and one model to write the final answer, saved under a label you can reuse.

So a preset might list GPT-5.5 and DeepSeek as the models to consult, with Opus as the one that replies. You set it up once, give it a name, and pick it later like any other model.

The models you consult run first and quietly hand their analysis to the one writing the answer. That final model is the one that actually replies and makes the tool calls, now informed by several perspectives instead of one.

Here is the part that makes it click. The preset shows up as a model, not as a framework to wire together.

So everything that already works in Hermes keeps working. Tool calls, follow-up iterations, memory, and the same session context behave exactly as they do with a single model, because to the agent loop it is a single model.

The models can come from anywhere. One preset can mix OpenAI, Anthropic, DeepSeek, and Google, and it is not capped at two.

A few things follow from that design.

โ†’ It composes a model instead of choosing one. Several models covering each other's blind spots can beat the strongest one on its own.

โ†’ It stays cheap to run. The models you consult see a stripped-down view of the conversation, so the extra calls stay light and the main context keeps its cache.

โ†’ It reaches past any single frontier model. Combining the providers already on hand assembles a composite that can outscore the best one available alone.

โ†’ It is a dial, not a default. It turns on for the hard ten percent of tasks where a second opinion matters, and stays off for routine work where speed wins.

Nous reports the effect on its own benchmark. A preset running Opus-4.8 over a GPT-5.5 reference scored higher than either model alone, by roughly six points and eight to eleven percent.

The lesson is not that one model has to win. It is that the best answer rarely comes from a single model, and the agent should make blending them as easy as picking one.

That said, if you're looking to set up Hermes, I wrote a full deep dive covering the Hermes agent's architecture, memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents.

The article is quoted below.

You can also watch my YouTube crash course on the Hermes agent: https://youtube.com/watch?v=bNp6YcKBLgY

284