Compiling deep learning kernels to locality-aware dataflow