So now we have our graph all set up. We did our forward passes through the model, so now what? Do we run it? We could! But it wouldn’t be very fast. Right now our graph is full of primops, the simplest set of primitive operations in luminal. One of the key tenets of luminal is a small primop set, which makes it easy to add new backends and write compilers. But another consequence of a small primop set is that even simple operations usually expand into several primops, so even small neural networks can end up with hundreds or thousands of them, which are slow to run directly.

So it’s time to compile the graph! We use a loose definition of a compiler: compilers are structs that implement the Compiler trait, which specifies a single function that takes the graph and is free to transform it.
For example, there is no subtraction primop, so a - b is written in primops as add(a, mul(b, -1)). We can have a compiler that looks for that pattern of nodes and directly replaces it with a single Subtract operation. We’ll look at how to do this in the Writing Compilers section.
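To make the pattern-replacement idea concrete, here is a self-contained toy in plain Rust. It does not use luminal: the `Op` enum, `Graph` struct, `Compiler` trait, and `SubtractionCompiler` below are hypothetical stand-ins for illustration, and luminal's real graph types and trait signature differ. The sketch builds a - b from primops as add(a, mul(b, -1)), then runs a compiler that swaps that pattern for a single `Sub` node and checks the result is unchanged.

```rust
use std::collections::HashMap;

/// Toy node type (hypothetical, NOT luminal's). Operands are indices of other nodes.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Input(&'static str),
    Const(f32),
    Add(usize, usize),
    Mul(usize, usize),
    /// Not a primop: only introduced by a compiler as a fused operation.
    Sub(usize, usize),
}

#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<Op>,
}

impl Graph {
    /// Appends a node and returns its index.
    fn push(&mut self, op: Op) -> usize {
        self.nodes.push(op);
        self.nodes.len() - 1
    }

    /// Recursively evaluates node `id` given named input values.
    fn eval(&self, id: usize, inputs: &HashMap<&str, f32>) -> f32 {
        match self.nodes[id] {
            Op::Input(name) => inputs[name],
            Op::Const(c) => c,
            Op::Add(a, b) => self.eval(a, inputs) + self.eval(b, inputs),
            Op::Mul(a, b) => self.eval(a, inputs) * self.eval(b, inputs),
            Op::Sub(a, b) => self.eval(a, inputs) - self.eval(b, inputs),
        }
    }
}

/// Hypothetical stand-in for a compiler trait: one function that rewrites the graph in place.
trait Compiler {
    fn compile(&self, graph: &mut Graph);
}

/// Replaces add(a, mul(b, -1)) with sub(a, b).
struct SubtractionCompiler;

impl Compiler for SubtractionCompiler {
    fn compile(&self, graph: &mut Graph) {
        for i in 0..graph.nodes.len() {
            if let Op::Add(a, m) = graph.nodes[i] {
                if let Op::Mul(b, c) = graph.nodes[m] {
                    // Only matches when the -1 constant is the Mul's second operand.
                    if graph.nodes[c] == Op::Const(-1.0) {
                        graph.nodes[i] = Op::Sub(a, b);
                    }
                }
            }
        }
    }
}

fn main() {
    // a - b expressed with primops: add(a, mul(b, -1)).
    let mut g = Graph::default();
    let a = g.push(Op::Input("a"));
    let b = g.push(Op::Input("b"));
    let neg_one = g.push(Op::Const(-1.0));
    let neg_b = g.push(Op::Mul(b, neg_one));
    let out = g.push(Op::Add(a, neg_b));

    let inputs = HashMap::from([("a", 5.0), ("b", 3.0)]);
    let before = g.eval(out, &inputs);
    SubtractionCompiler.compile(&mut g);
    let after = g.eval(out, &inputs);

    assert_eq!(g.nodes[out], Op::Sub(a, b));
    assert_eq!(before, after);
    println!("{:?}", g.nodes[out]);
}
```

Note that this rewrite leaves the `Mul` and `Const` nodes behind, unreferenced; a fuller pipeline would follow it with a pass that prunes dead nodes.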
All you need to know for now is that we can use this compiler on the graph by doing:
Luminal provides several compilers, including:
- GenericCompiler - A handful of hardware-agnostic optimizations, such as common subexpression elimination (CSE), to be run before any hardware-specific compilers.
- CudaCompiler<T> - The full stack of CUDA compilers, which converts a graph into a CUDA-specialized graph with T as the datatype (either f32 or f16). Imported from luminal_cuda.
- MetalCompiler<T> - Same as CudaCompiler<T>, but targeting Metal. Imported from luminal_metal.
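CSE is a good example of the kind of hardware-agnostic pass GenericCompiler runs. The following is a minimal sketch of the general technique, not luminal's implementation: it assumes a hypothetical toy node list (`Op`, `cse`) stored in topological order, remaps each node's operands to their already-deduplicated versions, and uses a hash map to spot nodes that have become identical.

```rust
use std::collections::HashMap;

/// Hypothetical toy node; operands are indices of earlier nodes (topological order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Op {
    Input(u32),
    Const(i64),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Common subexpression elimination over a topologically ordered node list.
/// Returns the deduplicated nodes and a map from old index to new index.
fn cse(nodes: &[Op]) -> (Vec<Op>, Vec<usize>) {
    let mut out: Vec<Op> = Vec::new();
    let mut seen: HashMap<Op, usize> = HashMap::new();
    let mut remap: Vec<usize> = Vec::with_capacity(nodes.len());
    for &op in nodes {
        // Point operands at deduplicated nodes so equal subtrees compare equal.
        let op = match op {
            Op::Add(a, b) => Op::Add(remap[a], remap[b]),
            Op::Mul(a, b) => Op::Mul(remap[a], remap[b]),
            leaf => leaf,
        };
        // Reuse an identical existing node, or emit this one.
        let id = *seen.entry(op).or_insert_with(|| {
            out.push(op);
            out.len() - 1
        });
        remap.push(id);
    }
    (out, remap)
}

fn main() {
    // (x * 2) + (x * 2), with both the constant and the product duplicated.
    let nodes = [
        Op::Input(0),
        Op::Const(2),
        Op::Mul(0, 1),
        Op::Const(2),
        Op::Mul(0, 3),
        Op::Add(2, 4),
    ];
    let (deduped, remap) = cse(&nodes);
    println!("{} -> {} nodes", nodes.len(), deduped.len()); // 6 -> 4 nodes
    println!("{:?}", deduped);
    println!("{:?}", remap);
}
```

Merging the duplicate constant is what lets the second `Mul` be recognized as a duplicate too, which is why the operands are remapped before hashing. This sketch does not canonicalize commutative operands, so add(a, b) and add(b, a) stay distinct.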

