Teaching Video Comprehension to AI, One Million Moments at a Time