1 · Current stack rank (October 2025)
| Rank | Model | Why we reach for it |
|---|---|---|
| 1 | Claude Sonnet 4.5 | Recommended daily driver. Excellent balance of quality, speed, and cost for most development tasks. Current CLI default. |
| 2 | GPT-5 Codex | Fast iteration loops with strong coding performance. Great for implementation-heavy work at lower cost than Sonnet. |
| 3 | Claude Haiku 4.5 | Fast and cost-effective for routine tasks, quick iterations, and high-volume automation. Best for speed-sensitive workflows. |
| 4 | Droid Core (GLM-4.6) | Open-source model with 0.25× token multiplier. Lightning-fast and budget-friendly for automation, bulk edits, and air-gapped environments. |
| 5 | GPT-5 | Strong generalist from OpenAI. Choose when you prefer OpenAI ergonomics or need specific GPT features. |
| 6 | Claude Opus 4.1 | Highest capability for extremely complex work. Use when you need maximum reasoning power for critical architecture decisions or tough problems. |
We ship model updates regularly. When a new release overtakes the list above,
we update this page and the CLI defaults.
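To make the 0.25× token multiplier on Droid Core concrete, here is a small back-of-the-envelope sketch. It assumes the multiplier scales the tokens counted against your usage; treat the numbers as illustrative, not a pricing quote.

```python
# Illustrative only: assumes the 0.25x multiplier scales the tokens counted
# against your usage, and that a 1.0x model counts tokens at face value.
def billed_tokens(raw_tokens: int, multiplier: float) -> float:
    """Tokens counted against usage after applying a model's multiplier."""
    return raw_tokens * multiplier

job_tokens = 2_000_000  # e.g. a bulk-edit run that consumes 2M tokens

print(billed_tokens(job_tokens, 1.0))   # hypothetical 1.0x model -> 2,000,000 counted
print(billed_tokens(job_tokens, 0.25))  # Droid Core at 0.25x     -> 500,000 counted
```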
2 · Match the model to the job
| Scenario | Recommended model |
|---|---|
| Deep planning, architecture reviews, ambiguous product specs | Start with Sonnet 4.5 for strong reasoning at practical cost. Use GPT-5 Codex for faster iteration or Haiku 4.5 for lighter tasks. |
| Full-feature development, large refactors | Sonnet 4.5 is the recommended daily driver. Try GPT-5 Codex when you want faster loops or Droid Core for high-volume work. |
| Repeatable edits, summarization, boilerplate generation | Haiku 4.5 or Droid Core for speed and cost savings. GPT-5 or Sonnet 4.5 when you need higher quality. |
| CI/CD or automation loops | Favor Haiku 4.5 or Droid Core for predictable throughput at low cost. Use Sonnet 4.5 or Codex for complex automation. |
| High-volume automation, frequent quick turns | Haiku 4.5 for speedy feedback loops. Droid Core when cost is critical or you need air-gapped deployment. |
Claude Opus 4.1 remains available for extremely complex architecture decisions or critical work where you need maximum reasoning capability. Most tasks don’t require Opus-level power—start with Sonnet 4.5 and escalate only if needed.
Switch at any time with /model or by toggling in the settings panel (Shift+Tab → Settings).
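If you script model selection for automation, a small routing table keeps these recommendations in one place. The helper below is hypothetical, not part of the CLI; the task labels and returned display names mirror the table above and would need to be mapped to whatever identifiers your setup expects.

```python
# Hypothetical routing helper mirroring the guidance above. The task labels,
# function name, and returned display names are illustrative assumptions.
def pick_model(task: str) -> str:
    routing = {
        "planning": "Claude Sonnet 4.5",      # deep planning, architecture reviews
        "feature": "Claude Sonnet 4.5",       # full-feature development, refactors
        "boilerplate": "Claude Haiku 4.5",    # repeatable edits, summarization
        "ci": "Claude Haiku 4.5",             # CI/CD and automation loops
        "bulk": "Droid Core (GLM-4.6)",       # high-volume or air-gapped work
    }
    return routing.get(task, "Claude Sonnet 4.5")  # Sonnet 4.5 as the default

print(pick_model("ci"))  # -> Claude Haiku 4.5
```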
3 · Switching models mid-session
- Use /model (or Shift+Tab → Settings → Model) to swap without losing your chat history.
- If you change providers (e.g. Anthropic to OpenAI), the CLI converts the session transcript between Anthropic and OpenAI formats. The translation is lossy, since provider-specific metadata is dropped, but we have not seen accuracy regressions in practice; a simplified sketch of this kind of conversion follows this list.
- For the best context continuity, switch models at natural milestones: after a commit, once a PR lands, or when you abandon a failed approach and reset the plan.
- If you flip back and forth rapidly, expect the assistant to spend a turn re-grounding itself; consider summarizing recent progress when you switch.
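To give a feel for what that provider conversion involves, here is a deliberately simplified sketch, not Factory's actual implementation. It keeps text content and drops everything provider-specific, which is exactly what makes the translation lossy.

```python
# Simplified, hypothetical sketch of converting an Anthropic-style transcript
# into OpenAI chat-completions format. Real transcripts carry tool calls and
# other provider-specific metadata that a conversion like this would drop.
def anthropic_to_openai(system: str, messages: list[dict]) -> list[dict]:
    converted = [{"role": "system", "content": system}]
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):  # Anthropic content blocks
            text = "".join(
                block["text"] for block in content if block.get("type") == "text"
            )  # non-text blocks (e.g. tool_use) are dropped -> lossy
        else:
            text = content
        converted.append({"role": msg["role"], "content": text})
    return converted

transcript = [
    {"role": "user", "content": [{"type": "text", "text": "Refactor the parser."}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Plan: ..."}]},
]
print(anthropic_to_openai("You are a coding agent.", transcript))
```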
4 · Reasoning effort settings
- Anthropic models (Opus/Sonnet/Haiku) show modest gains between Low and High.
- GPT models respond much more to higher reasoning effort—bumping GPT-5 or GPT-5 Codex to High can materially improve planning and debugging.
- Reasoning effort increases latency and cost, so start Low for simple work and escalate when you need more depth.
Change reasoning effort from /model → Reasoning effort, or via the settings menu.
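For context, here is a hedged sketch of the provider-side knobs that an effort setting roughly corresponds to when you use your own keys. The exact mapping the CLI uses is an assumption here, and the model ids are placeholders; the parameters shown are the public OpenAI and Anthropic controls for reasoning depth.

```python
# Hedged illustration of provider-side reasoning controls; not Factory's plumbing.
# Assumes BYOK credentials and placeholder model ids.
from openai import OpenAI
from anthropic import Anthropic

prompt = "Plan the migration in explicit steps."

# OpenAI reasoning models take an explicit effort level.
openai_resp = OpenAI().chat.completions.create(
    model="gpt-5",                              # placeholder id
    reasoning_effort="high",                    # "low" | "medium" | "high"
    messages=[{"role": "user", "content": prompt}],
)

# Anthropic models express the same idea as an extended-thinking token budget.
anthropic_resp = Anthropic().messages.create(
    model="claude-sonnet-4-5",                  # placeholder id
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": prompt}],
)
```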
5 · Bring Your Own Keys (BYOK)
Factory ships with managed Anthropic and OpenAI access. If you prefer to run against your own accounts, BYOK is opt-in; see Bring Your Own Keys for setup steps, supported providers, and billing notes.
Open-source models
Droid Core (GLM-4.6) is an open-source alternative available in the CLI. It’s useful for:
- Air-gapped environments where external API calls aren’t allowed
- Cost-sensitive projects needing unlimited local inference
- Privacy requirements where code cannot leave your infrastructure
- Experimentation with open-source model capabilities
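If you serve GLM-4.6 (or another open-source model) yourself behind an OpenAI-compatible endpoint, the usual client-side pattern looks like the sketch below. The endpoint URL and model name are assumptions about your own inference server, not CLI settings; see Bring Your Own Keys for how the CLI consumes custom providers.

```python
# Generic OpenAI-compatible client pointed at a locally hosted open-source model.
# The base_url and model name are assumptions about your own inference server.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = local.chat.completions.create(
    model="glm-4.6",  # whatever name your server registers the model under
    messages=[{"role": "user", "content": "Summarize the diff in two sentences."}],
)
print(resp.choices[0].message.content)
```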
6 · Keep notes on what works
- Track high-impact workflows (e.g., spec generation vs. quick edits) and which combinations of model + reasoning effort feel best.
- Ping the community or your Factory contact when you notice a model regression so we can benchmark and update this guidance quickly.