Ready annotator_a
Video ID: 2K-ihnb5kho
Category: speech_dominant
Question types: Standard Vision, Misleading Vision, Standard Audio, Misleading Audio
When the woman is seen cradling a pair of tiny slippers in her hands while lying in bed, what specific object does she subsequently hold up to her face in a gesture of mischievous delight?
A. She holds up a glass orb from the side table.
B. She holds up the floral-patterned blanket to her face.
C. She holds up one of the tiny slippers again. ✓ Correct
D. She holds up a ceramic figurine.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [70s-80s]
Modality: vision
Category: temporal

As the woman grips a vintage rotary telephone receiver tightly against her ear, what specific object does she subsequently hold up to her face in a gesture of mischievous delight?
A. She holds up a glass orb from the side table.
B. She holds up the floral-patterned blanket to her face.
C. She holds up one of the tiny slippers again.
D. She holds up a ceramic figurine.
E. The visual detail in the question is incorrect ✓ Correct
F. The audio detail in the question is incorrect
Answer timestamp: [70s-80s]
Modality: vision
Category: temporal
Misleading Information
Category: person_appearance
Description: The premise is swapped from 'holding slippers' (which occurs at [70s-80s] and leads to the slipper being held up) to 'holding the phone' (which occurs at [60s-70s]). The question tests whether the model can distinguish two similar actions (clutching an object) that happen in different contexts and lead to different subsequent behaviors. A model that matches only on 'woman holding object' may attach the premise to the wrong timestamp or action sequence.

Immediately following the moment when a sharp, digitally clipped sound interrupts the music and a male voice with a French accent speaks partially obscured words, what happens to the audio track?
A. The music fades out gradually into silence.
B. The female vocalist begins singing a new verse immediately.
C. The recording ends abruptly with no fade. ✓ Correct
D. A second female vocalist joins in harmony.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [50s-60s]
Modality: audio
Category: temporal

Once a harsh, loud synthetic buzzer shatters the intimate mood after a lush orchestral arrangement, what happens to the audio track?
A. The music fades out gradually into silence.
B. The female vocalist begins singing a new verse immediately.
C. The recording ends abruptly with no fade.
D. A second female vocalist joins in harmony.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect ✓ Correct
Answer timestamp: [50s-60s]
Modality: audio
Category: temporal
Misleading Information
Category: sound_type
Description: The video contains two distinct abrupt endings: one at [50s-60s] caused by a 'digitally clipped sound' and a male voice, and another at [80s-90s] caused by a 'synthetic buzzer'. By swapping these specific sound descriptions, the question forces the model to verify the exact nature of the interruption and the immediate aftermath (abrupt cut vs. other potential outcomes), rather than just recognizing 'music stops'.
