Theory and algorithms for data-centric machine learning