T05 : Parallel Processing and Scheduling
92 A Parallel Scan Algorithm in the Tensor Core Unit Model
Anastasios Zouzias and William McColl
178 Improved Algorithms for Monotone Moldable Job Scheduling using Compression and Convolution
Kilian Grage, Klaus Jansen and Felix Ohnesorge
56 TrainBF: High-Performance DNN Training Engine using BFloat16 on AI Accelerators
Zhen Xie, Siddhisanket Raskar, Murali Emani and Venkatram Vishwanath