Finding needles in a haystack: molecular similarity and machine learning for drug discovery applications