Instructions

Task Info: I'm trying to teach an AI to understand the world through images. You can think of the AI as a five year old child: it can pick out some of the objects and people in the image, and knows basic attributes (such as color or size), but knows little else. That's where you come in!

To help the AI learn, you'll be given an image from a movie clip, a list of people and objects detected by our AI, and captions of previous/current/next clips. Please:

  • Ask three questions about the image, like a child would. The best questions tend to be curiosity-driven and relevant to what's interesting in the image: for instance, what people are doing, why and how they are doing it, what might have happened before or might happen next, and what might happen if something in the image (hypothetically) changed.
  • Provide one reasonable answer for each question. Feel free to use the captions as inspiration here. In general, I'd like it if your question+answer pair involves making a likely inference about the situation. Ideally, the answer shouldn't be obvious without seeing the image, and it shouldn't be a story you made up.
  • Write a short rationale justifying your answer. For this, imagine that you're explaining why your answer is true to a child. Please base your rationale around:
    • specific lower-level observations in the image that let you infer your answer, and
    • general rules about how the world works.
  • Mark the likelihood that you think your answer is true, based on the image.
Guidelines:
  • Please use the AI's detected objects and people in questions, answers, and reasons, as this helps the AI learn about what it found. Please format the detections as they are given, in brackets, e.g., "What is [person1] doing?"
    • Feel free to use pronouns if it's obvious after you mention a detection.
    • If the AI produces a wrong detection, please ignore it unless it's only the name that is incorrect. For instance, if a bagel is labeled as [donut1] then that's fine! You can ask about the bagel, just call it [donut1].
  • Please avoid asking counting questions ("How many X are there?"), questions about color or size, simple verification questions ("Is [person1] wearing pants?"), questions about who the characters/actors are, and questions that have a one-word answer. Our AI can already answer these questions.
  • Please try to ask about people's physical and mental actions, and how they relate to the objects in the world. These tend to be more interesting than questions involving the characters' physical attributes (what they are wearing, standing/sitting on, etc).
  • For rating likelihood, please ignore the captions. For instance, if your answer is that "Afterwards, [person1] starts dancing" and that's not obvious from the image, you should rate Unlikely (<25% chance) even if the answer is implied by the next caption.
FAQ/Less important guidelines: (feel free to skip)
  • You can use the buttons to explore/hide the detections.
  • It takes me about 40 seconds per question+answer+reason+likelihood, writing around 6 words per question, around 6 words per answer, and around 10 words per reason. No worries if you take more or less time, but there's no need to write a lot unless you want to!
  • Try not to stress too hard about likelihood; it's admittedly somewhat subjective. That said, I encourage you to not ask only questions with Likely answers. Unlikely answers are fine too, just try to avoid making up a story that can't be reasonably inferred from the image.
Examples
Detections: [person1] [person2] [car1] [car2] [car3] [car4]
  • Past caption: SOMEONE looks at her, Embarrassed, self-conscious, His habits making him appear unworthy.
  • This caption: It forces him to half-whisper something he hasn't at all said to himself.
  • Next caption: N/A
Question | Answer | Reason | Likelihood
What is [person1] feeling right now? | Currently, [person1] feels embarrassed. | [person1] is looking down at the ground instead of at [person2]. | Likely (>75% chance)
What would happen if [person1] walked up to [person2] and tried to kiss her? | [person2] wouldn't like it and would back away or slap him. | [person2] looks angry at [person1], and she's standing far away from him. | Possible (25% to 75% chance)
Where were [person1] and [person2] previously? | They likely came from one of the houses. | They currently are wearing slippers, which people usually only wear inside. | Unlikely (<25% chance)
Detections: [person1] [person2] [person3] [person4] [person5] [person6] [person7] [person8] [person9] [diningtable1] [handbag1] [orange1] [tie1]
  • Past caption: Slowly, his eyes open and tears stream from them, rolling down his cheeks.
  • This caption: He speaks while holding the orange.
  • Next caption: SOMEONE turns to the woman behind him and the game resumes.
Question | Answer | Reason | Likelihood
What is [person3] doing to [person1]? | He is helping her put on her coat. | His hands are on her coat, and she is clutching [handbag1], ready to leave. | Likely (>75% chance)
Why is [person2] holding the orange under his chin? | He is playing a party game. | A popular party game is to pass an orange around without using one's hands. | Possible (25% to 75% chance)
Why is [person2] crying? | He is struggling to keep the orange under his chin. | In the party game, a player loses if they drop the orange. | Unlikely (<25% chance)
Detections: [person1] [person2] [person3] [person4] [person5] [person6] [chair1] [bed1] [tie1] [bottle1] [book1] [cup1] [sportsball1]
  • Past caption: The men gather around a pool table.
  • This caption: Stung, SOMEONE lies across the table and sets them up himself.
  • Next caption: They toss their money on the table, and SOMEONE shoots, but his shot is too hard and his ball leaps over the side of the table.
Question | Answer | Reason | Likelihood
What is [person6] doing? | [person6] is setting up a pool shot. | [person6] is lying on the pool table with a pool cue. | Likely (>75% chance)
Why are [book1] and all the money sitting on the table? | [person6] is determined to win, so he slammed all the money he has left on the table for a bet. | [person6] has an angry look on his face, and [book1] seems to be his wallet as it is near his pants. | Possible (25% to 75% chance)
Why are [person1], [person4], [person4], and [person5] watching [person6]? | [person6] is making a difficult shot, and people have bet money. | Pool is entertaining to watch, and people are especially intrigued when there is betting and drama in the game. | Likely (>75% chance)
  • Past caption: The sub moves forward stirring up clouds of sediment as the second sub follows close behind.
  • This caption: A brown-haired man films himself with a camcorder as he stares out the window at the wreck.
  • Next caption: The man smiles at a bearded crew mate.
 
 
 
Q1
A1
R1
Q2
A2
R2
Q3
A3
R3
Optional feedback?

If anything about the HIT was unclear, please comment below. I'd like to make my HITs easier for future workers, so I really appreciate feedback!