MakoraGenerate, your favorite web-based GPU kernel generation tool, is now available as a CLI! Read the documentation at docs.makora.com. Installing it is as simple as running pip install makora.

You can still visit generate.makora.com, paste in a PyTorch operation, pick a target GPU, and get back an optimized kernel.

Today we're announcing the MakoraGenerate CLI - a command-line interface to generate, evaluate, and profile GPU kernels autonomously on remote hardware. This is ideal for integration into CI/CD systems and building end-to-end automated systems.

Two Interfaces, One Source of Truth

The coolest part? The CLI and the Web GUI are now perfectly in sync. You no longer have to choose between a visual dashboard and a command-line tool. If you kick off a massive evolutionary search via the CLI while grabbing coffee, you can open generate.makora.com on your laptop or phone and watch the performance graphs climb in real time.

Cross-Platform Inspection: View kernel code, profiling logs, and performance metrics on whichever platform is most convenient at the moment.
Unified Session Management: Start a generation session in the cloud and pull the results locally with a single command.

Asynchronous mode: kick off a long-running agent

With the generate command, you can kick off a fully automated kernel generation job that runs for up to 24 hours, exploring different optimization paths using evolutionary search:

makora generate problem.py --language cuda --device H100

Synchronous mode: use individual tools

Automatically format a PyTorch operation into Makora's problem format

Running makora check problem.py --device H100 will tell you whether your file adheres to the correct format. If it doesn't, no worries! Just append --fix and MakoraGenerate will automatically modify your code to fix the format issue.

makora check problem.py --device H100 --fix

Generate an individual optimized kernel

Using makora generate will kick off a job that uses evolutionary search and ultimately writes dozens of kernels, searching through the optimization space and applying different known techniques. If you just want a single kernel, you can invoke makora expert-generate in the same way. It will make its best attempt at a single kernel and stop there.

makora expert-generate problem.py --device H100

Evaluate correctness against the reference implementation

Running makora evaluate checks a kernel solution against the reference implementation using the default absolute tolerance (ATOL) and relative tolerance (RTOL) of 0.001.

makora evaluate problem.py solution.py --device H100
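
To make the tolerances concrete, here is a minimal sketch of an element-wise closeness check. It assumes the comparison follows the convention used by torch.allclose and numpy.allclose, where a value passes if |actual - expected| <= atol + rtol * |expected|. The article does not say how MakoraGenerate combines the two tolerances, so the formula and the helper name within_tolerance are illustrative assumptions, not the tool's implementation.

```python
import math

ATOL = 1e-3  # default absolute tolerance quoted in the article
RTOL = 1e-3  # default relative tolerance quoted in the article


def within_tolerance(actual, expected, atol=ATOL, rtol=RTOL):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|.

    Assumption: this mirrors the allclose convention from NumPy/PyTorch;
    MakoraGenerate's exact comparison rule is not documented here.
    """
    if len(actual) != len(expected):
        return False
    return all(
        math.isfinite(a) and math.isfinite(e) and abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


# Hypothetical outputs from a reference op and two candidate kernels.
reference = [1.0, 100.0, -0.5]
close_enough = [1.0015, 100.05, -0.5005]  # each error is inside its bound
too_far = [1.0030, 100.0, -0.5]           # first error exceeds 0.001 + 0.001 * 1.0

print(within_tolerance(close_enough, reference))
print(within_tolerance(too_far, reference))
```

Under this convention the allowed error grows with the magnitude of the expected value, which is why an error of 0.05 on a value near 100 passes while an error of 0.003 on a value near 1 does not.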

Profile performance on real remote hardware

One of the most powerful tools in the MakoraGenerate CLI is the profiler. It has access to nsight-compute and rocprof-compute profiling on NVIDIA and AMD GPUs, respectively. You get detailed results, readable by a human or an agent, along with analysis and suggestions for potential optimization opportunities. Running the profiler is simple:

makora profile problem.py solution.py --device H100

Every command returns structured, parseable output. No HTML to scrape, no screenshots to interpret - just clean results that an agent can reason about and act on.

This generate-evaluate-profile loop is exactly how agents think. Each step produces structured feedback that informs the next. The CLI makes the entire loop programmable.
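
As a rough illustration of making the loop programmable, the sketch below chains the three synchronous commands shown above from Python. Several things here are assumptions rather than documented behavior: that the generated kernel has been saved as solution.py, that a non-zero exit code signals failure, and the helper names build_loop_commands and run_loop. The article does not specify the output format, so the sketch only captures stdout as text and leaves parsing to the caller.

```python
import shlex
import subprocess


def build_loop_commands(problem, solution, device):
    """Return argv lists for one generate-evaluate-profile pass.

    Only subcommands and flags that appear in the article are used.
    """
    return [
        ["makora", "expert-generate", problem, "--device", device],
        ["makora", "evaluate", problem, solution, "--device", device],
        ["makora", "profile", problem, solution, "--device", device],
    ]


def run_loop(problem, solution, device, runner=subprocess.run):
    """Run each step in order and collect (step, exit code, stdout).

    Assumption: a non-zero exit code means the step failed, so the loop
    stops there instead of profiling a kernel that did not pass evaluation.
    `runner` is injectable so the flow can be exercised without the CLI.
    """
    outputs = []
    for argv in build_loop_commands(problem, solution, device):
        result = runner(argv, capture_output=True, text=True)
        outputs.append((argv[1], result.returncode, result.stdout))
        if result.returncode != 0:
            break
    return outputs


# Dry run: print the commands that one pass would execute.
for argv in build_loop_commands("problem.py", "solution.py", "H100"):
    print(shlex.join(argv))
```

Calling run_loop("problem.py", "solution.py", "H100") with the makora CLI installed would execute the steps for real; an agent could then feed each step's captured output into its next decision.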