mlx-code

A composable coding agent for Mac.

mlx-code bundles an MLX inference server, a terminal harness, multi-protocol API support, git worktree isolation, and composable multi-agent primitives all in one Python package. Run it offline, pipe it into shell scripts, or swap in any external API.

user@mac:~$ mlc --model mlx-community/Qwen3.6-27B-OptiQ-4bit
[System] Booting local MLX server on Apple Silicon...
[System] Workspace snapshotted to isolated git worktree.
> Fix the token counter bug and write tests to verify it.
↳ Calling tool [Grep] with "count_tokens" in "./"...
↳ Calling tool [Read] on "mlx_code/utils.py"...
[Agent] Found the issue. Delegating test creation to a sub-agent to keep my context clean.
↳ Calling tool [Agent] with prompt "Write pytest cases for count_tokens..."
    [Sub-Agent] ↳ Calling tool [Write] on "tests/test_utils.py"...
    [Sub-Agent] ↳ Calling tool [Bash] with "pytest tests/test_utils.py"...
    [Sub-Agent] ↳ Sub-task complete. Returning failure logs to main agent.
[Agent] Tests failed as expected. Applying the fix now.
↳ Calling tool [Edit] on "mlx_code/utils.py"...
[Agent] Fix applied. Tests pass. Changes committed to your local timeline.

Features

Git Worktree Isolation

Every session gets a fresh git worktree. After every tool roundtrip, the agent creates a commit containing file changes and the full conversation state. Your git history becomes a step-by-step timeline you can roll back at any time.

Context Decay Mitigation

As sessions get longer, LLM performance drops. mlx-code solves this with an Agent tool, allowing the main agent to spawn sub-agents, delegate heavy sub-tasks, and return only the synthesized result to keep the main context pristine.

Swappable Backends

Run entirely locally via the built-in MLX server, or point the harness at a remote provider like Claude, Gemini, or DeepSeek. You can even run the harness in a sandboxed VM while the LLM runs safely on the host.

Composable by Design

Every component is modular, allowing you to import exactly what you need straight into your existing workflows. Mix, match, and wire them together to power anything from background scheduled jobs to complex multi-agent pipelines.

Quick Start

pip install mlx-code
mlc

This starts the local MLX inference server and drops you straight into the built-in REPL harness.

Usage & Examples

Basic Agent Interaction

Spin up an agent to analyze code, run tools, or connect to external providers.

Command Line

# Point to an external API
mlc-run --api deepseek --model deepseek-v4-pro

# Restrict tools and define a custom persona
mlc --tools Read Write Bash --system "You are a concise engineer."

# Load skills recursively from a directory (scans for SKILL.md files)
mlc --skill ./my-skills

# Server and environment controls
mlc --leash none                      # Start the inference server only (no REPL)
mlc --leash claude                    # Route a different harness through the local server
mlc --work /path/to/repo              # Set the git worktree root (default is cwd)
mlc --cache /path/to/custom_cache     # Set a custom directory for the KV-cache

Python API

import asyncio
from mlx_code.repl import Agent

async def main():
    agent = Agent(system="You are a concise technical writer.")
    await agent.run("Summarise all *.py files changed in the last 7 days. Save to digest.md.")

asyncio.run(main())

Git Worktree

Because mlx-code stores the full conversation as JSON in every commit message, you can restore both the workspace state and the agent's memory from any checkpoint if the agent ever goes off the rails.

Command Line

# Browse branches and commits across sessions/agents
mlc-git 

# Resume a session from a specific commit hash
mlc --resume abc1234

Multi-Agent Pipelines

String multiple agents together. Pass the output of one agent as the context for another to form robust, self-critiquing pipelines.

Command Line

Compose agents naturally using shell pipes and subshells.

# Critique a generated solution
echo "Here's the solution you proposed: <excerpt>$(mlc -p "write code for a chrome extension to play youtube x5 speed")</excerpt> Now argue against it. What are the edge cases this doesn't handle? What assumptions did you make that might not hold in a production system? What would you change if you knew this code would be read by a senior engineer in a security audit?" | mlc

# Pipeline across different models and local/remote environments
echo "explain lsp.py" | mlc-run -a deepseek | cat - PLAN.md | mlc-run --url http://localhost:9000

Python API

import asyncio
from mlx_code.gits import resume_worktree
from mlx_code.repl import Agent, repl

async def main():
    # Restore the worktree and the memory from a specific hash
    gwt, messages = resume_worktree(".", "abc1234")
    
    agent = Agent(ctx={"gwt": gwt})
    agent.messages = messages
    
    await repl(agent)

asyncio.run(main())

Python API

Build sequential or parallel workflows natively using Python's asyncio.

import asyncio
from mlx_code.repl import Agent

async def main():
    researcher = Agent(system="You are a research assistant.")
    await researcher.run("Research PBFT consensus. Save a structured summary to kb/draft.md.")

    reviewer = Agent(system="You are a critical reviewer.")
    await reviewer.run(
        "Read kb/draft.md. Write a one-paragraph critique to kb/critique.md. Use only information in that file."
    )

asyncio.run(main())

import asyncio
from mlx_code.repl import Agent

async def main():
    topics = ["history", "algorithms", "industry_usage"]
    agents = [Agent() for _ in topics]
    
    # Spawn workers concurrently
    await asyncio.gather(*[
        a.run(f"Research the {t} of Byzantine Fault Tolerance. Save to kb/{t}.md.")
        for a, t in zip(agents, topics)
    ])
    
    # Synthesize results
    reducer = Agent()
    await reducer.run("Read all files in kb/. Synthesise into final_report.md.")

asyncio.run(main())

Adding New Tools

Extend agent capabilities with your own custom tools. Just subclass Tool, define a Pydantic schema, and pass the class at instantiation.

from mlx_code.tools import Tool
from mlx_code.repl import Agent
from pydantic import BaseModel, Field

class QueryParams(BaseModel):
    query: str = Field(description="SQL query to run")

class LiveDBTool(Tool):
    name = "QueryDB"
    description = "Execute a query against the dev database"
    parameters = QueryParams

    async def execute(self, params: QueryParams, signal=None) -> dict:
        result = run_query(params.query)   # your logic here
        return {"content": [{"type": "text", "text": result}], "is_error": False}

# Instantiate the agent with your custom tool
agent = Agent(extra_tool_classes=[LiveDBTool], tool_names=["QueryDB"])