Video ID: 2K-ihnb5kho
Category: speech_dominant
When the woman is seen cradling a pair of tiny slippers in her hands while lying in bed, what specific object does she subsequently hold up to her face in a gesture of mischievous delight?
Annotation
As the woman grips a vintage rotary telephone receiver tightly against her ear, what specific object does she subsequently hold up to her face in a gesture of mischievous delight?
Misleading Information
Category: person_appearance
Description: The premise is swapped from 'holding slippers' (which occurs at [70s-80s] and leads to the slipper being held up) to 'holding the phone' (which occurs at [60s-70s]). The question therefore tests whether the model can distinguish between two similar actions (clutching an object) that happen in different contexts and lead to different subsequent behaviors. A model that matches only on 'woman holding object' might associate the question with the wrong timestamp or action sequence.
Annotation
Immediately following the moment when a sharp, digitally clipped sound interrupts the music and a male voice with a French accent speaks partially obscured words, what happens to the audio track?
Annotation
Once a harsh, loud synthetic buzzer shatters the intimate mood after a lush orchestral arrangement, what happens to the audio track?
Misleading Information
Category: sound_type
Description: The video contains two distinct abrupt endings: one at [50s-60s] caused by a 'digitally clipped sound' and a male voice, and another at [80s-90s] caused by a 'synthetic buzzer'. By swapping these specific sound descriptions, the question forces the model to verify the exact nature of the interruption and the immediate aftermath (abrupt cut vs. other potential outcomes), rather than just recognizing 'music stops'.