As they develop comprehension skills, American Sign Language (ASL) learners often view challenging ASL videos, which may contain unfamiliar signs. Current dictionary tools require students to isolate a single sign they do not understand and input a search query by selecting its linguistic properties or by performing the sign into a webcam. Students may struggle to extract and re-create an unfamiliar sign, and they must leave the video-watching task to use an external dictionary tool. We investigate a technology that enables users, in the moment, i.e., while they are viewing a video, to select a span of one or more signs they do not understand and view dictionary results. We interviewed 14 ASL learners about their challenges in understanding ASL videos and their workarounds for unfamiliar vocabulary. We then conducted a comparative study and an in-depth analysis with 15 ASL learners to investigate the benefits of using video sub-spans for searching and to examine their interactions with a Wizard-of-Oz prototype during a video-comprehension task. Our findings revealed benefits of our tool in terms of the quality of the video translations produced and the perceived workload of producing them. Our in-depth analysis also revealed benefits of an integrated search tool and of using span selection to constrain video playback. These findings inform future designers of such systems, computer vision researchers working on the underlying sign-matching technologies, and sign language educators.