Video ID: CIsjrEN58Yg
Category: scene_dominant
When Indiana Jones whirls around to face a full skeleton bound in frayed ropes, hanging motionless against the moss-slick stone wall, what immediate physical reaction does he exhibit before stumbling backward?
Annotation
As the explorer leans in close to examine the weathered human skull embedded in the moss-caked stone with gnarled roots snaking across it, what specific action does he perform with his hand before the scene shifts?
Misleading Information
Category: object_location
Description: The model must distinguish between two distinct visual moments: one involving a full skeletal figure (where Indy stumbles back) and another involving a skull (where Indy brushes webs). A lazy model might conflate the actions associated with examining bones in general.
Annotation
Following the moment when Indy's voice cracks into a choked whisper declaring 'It's not just a thing...' and the dry creak of the skeleton's head lurching sideways echoes, what happens immediately after this sequence of sounds?
Annotation
After the hushed whisper cuts through the silence with the calm statement 'It's just a thing' while the light catches on a tangled web in which a small object is caught, what auditory event follows?
Misleading Information
Category: speech_speaker
Description: The model must differentiate between two spoken lines ('It's just a thing' vs. 'It's not just a thing') and their respective tonal qualities (calm/weary vs. choked/cracking), then map the correct subsequent sound event (stillness vs. mechanical click) to each line.