mlx-code
A composable coding agent for Mac.
mlx-code bundles an MLX inference server, a terminal harness, multi-protocol API support, git worktree isolation, and composable multi-agent primitives all in one Python package. Run it offline, pipe it into shell scripts, or swap in any external API.
[System] Booting local MLX server on Apple Silicon...
[System] Workspace snapshotted to isolated git worktree.
> Fix the token counter bug and write tests to verify it.
↳ Calling tool [Grep] with "count_tokens" in "./"...
↳ Calling tool [Read] on "mlx_code/utils.py"...
[Agent] Found the issue. Delegating test creation to a sub-agent to keep my context clean.
↳ Calling tool [Agent] with prompt "Write pytest cases for count_tokens..."
[Sub-Agent] ↳ Calling tool [Write] on "tests/test_utils.py"...
[Sub-Agent] ↳ Calling tool [Bash] with "pytest tests/test_utils.py"...
[Sub-Agent] ↳ Sub-task complete. Returning failure logs to main agent.
[Agent] Tests failed as expected. Applying the fix now.
↳ Calling tool [Edit] on "mlx_code/utils.py"...
[Agent] Fix applied. Tests pass. Changes committed to your local timeline.
Features
Git Worktree Isolation
Every session gets a fresh git worktree. After every tool roundtrip, the agent creates a commit containing file changes and the full conversation state. Your git history becomes a step-by-step timeline you can roll back at any time.
Context Decay Mitigation
As sessions get longer, LLM performance drops. mlx-code solves this with an Agent tool, allowing the main agent to spawn sub-agents, delegate heavy sub-tasks, and return only the synthesized result to keep the main context pristine.
Swappable Backends
Run entirely locally via the built-in MLX server, or point the harness at a remote provider like Claude, Gemini, or DeepSeek. You can even run the harness in a sandboxed VM while the LLM runs safely on the host.
Composable by Design
Every component is modular, allowing you to import exactly what you need straight into your existing workflows. Mix, match, and wire them together to power anything from background scheduled jobs to complex multi-agent pipelines.
Quick Start
pip install mlx-code
mlc
This starts the local MLX inference server and drops you straight into the built-in REPL harness.
Usage & Examples
Basic Agent Interaction
Spin up an agent to analyze code, run tools, or connect to external providers.
Command Line
# Point to an external API
mlc-run --api deepseek --model deepseek-v4-pro
# Restrict tools and define a custom persona
mlc --tools Read Write Bash --system "You are a concise engineer."
# Load skills recursively from a directory (scans for SKILL.md files)
mlc --skill ./my-skills
# Server and environment controls
mlc --leash none # Start the inference server only (no REPL)
mlc --leash claude # Route a different harness through the local server
mlc --work /path/to/repo # Set the git worktree root (default is cwd)
mlc --cache /path/to/custom_cache # Set a custom directory for the KV-cache
Python API
import asyncio
from mlx_code.repl import Agent
async def main():
agent = Agent(system="You are a concise technical writer.")
await agent.run("Summarise all *.py files changed in the last 7 days. Save to digest.md.")
asyncio.run(main())
Git Worktree
Because mlx-code stores the full conversation as JSON in every commit message, you can restore both the workspace state and the agent's memory from any checkpoint if the agent ever goes off the rails.
Command Line
# Browse branches and commits across sessions/agents
mlc-git
# Resume a session from a specific commit hash
mlc --resume abc1234
Multi-Agent Pipelines
String multiple agents together. Pass the output of one agent as the context for another to form robust, self-critiquing pipelines.
Command Line
Compose agents naturally using shell pipes and subshells.
# Critique a generated solution
echo "Here's the solution you proposed: <excerpt>$(mlc -p "write code for a chrome extension to play youtube x5 speed")</excerpt> Now argue against it. What are the edge cases this doesn't handle? What assumptions did you make that might not hold in a production system? What would you change if you knew this code would be read by a senior engineer in a security audit?" | mlc
# Pipeline across different models and local/remote environments
echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000
Python API
import asyncio
from mlx_code.gits import resume_worktree
from mlx_code.repl import Agent, repl
async def main():
# Restore the worktree and the memory from a specific hash
gwt, messages = resume_worktree(".", "abc1234")
agent = Agent(ctx={"gwt": gwt})
agent.messages = messages
await repl(agent)
asyncio.run(main())
Python API
Build sequential or parallel workflows natively using Python's asyncio.
import asyncio
from mlx_code.repl import Agent
async def main():
researcher = Agent(system="You are a research assistant.")
await researcher.run("Research PBFT consensus. Save a structured summary to kb/draft.md.")
reviewer = Agent(system="You are a critical reviewer.")
await reviewer.run(
"Read kb/draft.md. Write a one-paragraph critique to kb/critique.md. Use only information in that file."
)
asyncio.run(main())
import asyncio
from mlx_code.repl import Agent
async def main():
topics = ["history", "algorithms", "industry_usage"]
agents = [Agent() for _ in topics]
# Spawn workers concurrently
await asyncio.gather(*[
a.run(f"Research the {t} of Byzantine Fault Tolerance. Save to kb/{t}.md.")
for a, t in zip(agents, topics)
])
# Synthesize results
reducer = Agent()
await reducer.run("Read all files in kb/. Synthesise into final_report.md.")
asyncio.run(main())
Adding New Tools
Extend agent capabilities with your own custom tools. Just subclass Tool, define a Pydantic schema, and pass the class at instantiation.
from mlx_code.tools import Tool
from mlx_code.repl import Agent
from pydantic import BaseModel, Field
class QueryParams(BaseModel):
query: str = Field(description="SQL query to run")
class LiveDBTool(Tool):
name = "QueryDB"
description = "Execute a query against the dev database"
parameters = QueryParams
async def execute(self, params: QueryParams, signal=None) -> dict:
result = run_query(params.query) # your logic here
return {"content": [{"type": "text", "text": result}], "is_error": False}
# Instantiate the agent with your custom tool
agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])