Task Info: I'm trying to teach an AI to understand the world through images. You can think of the AI as a five year old child: it can pick out some of the objects and people in the image, and knows basic attributes (such as color or size), but knows little else. That's where you come in!
To help the AI learn, you'll be given an image from a movie clip, a list of people and objects detected by our AI, and captions of previous/current/next clips. Please:
- Ask three questions about the image, like a child would. The best questions tend to be curiosity driven and relevant to what's interesting in the image: for instance, about what people are doing, why and how people are doing those actions, what might happen next/previously, and what might happen if something in the image (hypothetically) changed.
- Provide one reasonable answer for each question. Feel free to use the captions as inspiration here. In general, I'd like it if your question+answer pair involves making a likely inference about the situation. Ideally, the answer shouldn't be obvious without seeing the image, and it shouldn't be a story you made up.
- Write a short rationale justifying your answer. For this, imagine that you're explaining why your answer is true to a child. Please base your rationale around:
  - specific lower-level observations in the image that let you infer your answer, and
  - general rules about how the world works.
- Mark the likelihood that you think your answer is true, based on the image.
Guidelines:
- Please use the AI's detected objects and people in questions, answers, and reasons, as this helps the AI learn about what it found. Please format the detections as they are given, in brackets, e.g., "What is [person1] doing?"
- Feel free to use pronouns if it's obvious after you mention a detection.
- If the AI produces a wrong detection, please ignore it unless it's only the name that is incorrect. For instance, if a bagel is labeled as [donut1] then that's fine! You can ask about the bagel, just call it [donut1].
- Please avoid asking counting questions ("How many X are there?"), questions about color or size, simple verification questions ("Is [person1] wearing pants?"), questions about who the characters/actors are, and questions that have a one-word answer. Our AI can already answer these questions.
- Please try to ask about people's physical and mental actions, and how they relate to the objects in the world. These tend to be more interesting than questions involving the characters' physical attributes (what they are wearing, standing/sitting on, etc).
- For rating likelihood, please ignore the captions. For instance, if your answer is that "Afterwards, [person1] starts dancing" and that's not obvious from the image, you should rate it Unlikely (<25% chance) even if the answer is implied by the next caption.
FAQ/Less important guidelines: (feel free to skip)
- You can use the buttons to explore/hide the detections.
- It takes me about 40 seconds per question+answer+reason+likelihood, writing around 6 words per question, around 6 words per answer, and around 10 words per reason. No worries if you take more or less time, but there's no need to write a lot unless you want to!
- Try not to stress too much about likelihood; it's admittedly somewhat subjective. That said, I encourage you not to ask only questions with Likely answers. Unlikely answers are fine too; just try to avoid making up a story that can't be reasonably inferred from the image.