A new approach to ML
Luminal builds everything out of a small set of primitive operations. When you write `a - b`, for instance, `add(a, mul(b, -1))` gets written to the graph. Or when you do `a.matmul(b)`, what actually gets put on the graph is `sum_reduce(mul(reshape(a), reshape(b)))`.
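To make the decomposition concrete, here is a minimal Rust sketch of a graph that records only primitive ops. It is written in the spirit of Luminal but is not its actual API; `PrimOp`, `Graph`, and the `sub` and `matmul` helpers are illustrative names.

```rust
// A minimal sketch (not Luminal's real API) of how high-level ops can
// desugar into a handful of primitive graph nodes.

#[derive(Debug)]
enum PrimOp {
    Add(usize, usize), // elementwise a + b
    Mul(usize, usize), // elementwise a * b
    Constant(f32),     // scalar constant, broadcast as needed
    Reshape(usize),    // view a node with a new (broadcasted) shape
    SumReduce(usize),  // sum along an axis
    Input,             // graph input placeholder
}

#[derive(Default)]
struct Graph {
    nodes: Vec<PrimOp>,
}

impl Graph {
    // Append a node and return its id.
    fn push(&mut self, op: PrimOp) -> usize {
        self.nodes.push(op);
        self.nodes.len() - 1
    }

    // `a - b` desugars to add(a, mul(b, -1)): no dedicated sub op needed.
    fn sub(&mut self, a: usize, b: usize) -> usize {
        let neg_one = self.push(PrimOp::Constant(-1.0));
        let neg_b = self.push(PrimOp::Mul(b, neg_one));
        self.push(PrimOp::Add(a, neg_b))
    }

    // `a.matmul(b)` desugars to sum_reduce(mul(reshape(a), reshape(b))):
    // broadcast both operands to a common shape, multiply elementwise,
    // then sum out the shared axis.
    fn matmul(&mut self, a: usize, b: usize) -> usize {
        let a_view = self.push(PrimOp::Reshape(a));
        let b_view = self.push(PrimOp::Reshape(b));
        let prod = self.push(PrimOp::Mul(a_view, b_view));
        self.push(PrimOp::SumReduce(prod))
    }
}

fn main() {
    let mut g = Graph::default();
    let a = g.push(PrimOp::Input);
    let b = g.push(PrimOp::Input);
    g.sub(a, b);    // records Constant, Mul, Add
    g.matmul(a, b); // records Reshape, Reshape, Mul, SumReduce
    println!("{:?}", g.nodes);
}
```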
Once the graph is built, iterative compiler passes modify it, replacing primitive ops with more efficient ones depending on the device it's running on. On NVIDIA cards, for instance, efficient CUDA kernels are generated on the fly to replace these ops, and specialized cuBLAS kernels are swapped in for supported operations.
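A pass over such a graph can be as simple as a pattern match plus an in-place rewrite. The sketch below, again hypothetical rather than Luminal's real compiler code, finds the primitive matmul pattern from the example above and substitutes a single fused node; `FusedMatmul` stands in for the kind of specialized kernel (a cuBLAS GEMM, say) a backend would actually swap in.

```rust
// A toy compiler pass over a primitive-op graph: it pattern-matches the
// sequence sum_reduce(mul(reshape(a), reshape(b))) and rewrites the root
// into one fused node. `FusedMatmul` is an illustrative stand-in for a
// backend-specific kernel, not a real Luminal op.

#[derive(Debug, Clone, Copy)]
enum PrimOp {
    Mul(usize, usize),
    Reshape(usize),
    SumReduce(usize),
    Input,
    // Backend-specific op a pass may substitute for a matched subgraph.
    FusedMatmul(usize, usize),
}

fn fuse_matmuls(nodes: &mut Vec<PrimOp>) {
    for id in 0..nodes.len() {
        // Match the root of the pattern: sum_reduce(mul(reshape, reshape)).
        let PrimOp::SumReduce(prod) = nodes[id] else { continue };
        let PrimOp::Mul(lhs, rhs) = nodes[prod] else { continue };
        if let (PrimOp::Reshape(a), PrimOp::Reshape(b)) = (nodes[lhs], nodes[rhs]) {
            // Rewrite the root in place; the now-dead inner nodes would be
            // removed by a later dead-code-elimination pass.
            nodes[id] = PrimOp::FusedMatmul(a, b);
        }
    }
}

fn main() {
    // Hand-built graph for a.matmul(b): ids 0 and 1 are the inputs.
    let mut nodes = vec![
        PrimOp::Input,        // 0: a
        PrimOp::Input,        // 1: b
        PrimOp::Reshape(0),   // 2
        PrimOp::Reshape(1),   // 3
        PrimOp::Mul(2, 3),    // 4
        PrimOp::SumReduce(4), // 5
    ];
    fuse_matmuls(&mut nodes);
    println!("{:?}", nodes[5]); // FusedMatmul(0, 1)
}
```

Because the rewrite happens on the graph rather than in user code, the same model definition can target different backends: each device's passes decide which subgraphs to fuse and which kernels to emit.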
This approach leads to a simple library, and performance is limited only by the creativity of the compiler programmer, not the model programmer.
Luminal has a number of other neat features; check out the repo here.