Motivation for the judging guidelines
The AAAI Scavenger Hunt challenges robots to locate and
obtain objects within the conference's natural setting:
a hotel or convention center. To encourage a wide variety
of AI researchers to field a system, the 2006 scavenger hunt
will scaffold its challenge through five different facets of
intelligent interaction with the physical environment:
A) spatial reasoning
B) object recognition
C) language and communication
D) human awareness
E) task awareness and planning
In each of these five facets, there are several different levels
of competence that a robotic system might demonstrate. For
ease of reference, these levels are labeled in
decreasing order of ability:
1) Concierge
2) Gofer
3) Scavenger
4) Wanderer
5) Slacker
This leads to a matrix of possible capabilities that
scavenger hunt systems might demonstrate.
Some participants might focus on achieving a high level of
competence in only one facet; others might put more effort
into integrating these components. Both types of entry
are welcome, and we will work with all participants to
make the scavenger hunt a motivating and rewarding venue.
Note that this framework merely wraps the previous
scavenger hunt with some additional directions that teams
might pursue. Any entry designed with the prior rules in mind
can still participate fully and without change.
Here are the original point values for reasoning
and handling the objects listed on the
Scavenger Hunt Item Page.
Capability | Points | Initial description |
---|---|---|
Human Awareness | ||
Slacker | 0 pts | No awareness of agents outside the system itself, i.e., humans are annoyingly shaped walls that may or may not correspond to the floorplan. (What floorplan?!) |
Wanderer | 5 pts | At this level, the system would show the ability to handle disinterested humans robustly, e.g., it will pursue its task goals even while people walk by the system at ordinary speed, pause to look at it, etc. People will not try to interfere with the system or its functioning at this level. Also, they will be relatively sparse in time and space: perhaps at most 2 people, no more often than once per minute, and for at most 15 seconds per "interaction." |
Scavenger | 10 pts | Here, a system will not only act robustly toward disinterested humans, but will succeed (some or most of the time) in recognizing and expressing the fact that a human is "interacting." At this level, the system should also become an active agent in this exchange, though not necessarily in a typical human-robot interaction way. For example, the robot might ask the person to move (but not ask a wall to move). |
Gofer | 20 pts | would demonstrate the ability to identify people in the environment and to solicit their help, if they are willing to give it. Characterizing this level of human awareness is the notion that people are a _resource_ and not just an annoyance :) |
Concierge | 80 pts | would actively seek out people and engage with them in order to accomplish a specific part of a spatial task, e.g., getting directions or asking for more information on an object. |
Object Recognition | ||
Slacker | 0 pts | No object recognition capabilities demonstrated |
Wanderer | 5 pts | The ability to distinguish up to 5 preselected objects in known poses, based on 1-2 features of their "appearance" to the sensor in question. For visual sensors, these features could be color, texture, color composition, or shape, among others. For direct range sensors, this would mean identifying particular categories of depth patterns. (1 point per object) |
Scavenger | 10 pts | The ability to distinguish up to 10 preselected objects in unconstrained (or minimally constrained) poses, using 3 or more facets of the objects' "appearance" to the sensor(s) used. |
Gofer | 20 pts | The ability to identify objects within a set of five well-defined categories, e.g., "a pillow," "a chair," "a newspaper," or "a bottle of champagne." |
Concierge | 80 pts | The ability to identify objects within a much larger and less well-defined set of categories, or specific instances of the previous categories. For example, "today's USA Today," "some shampoo," "tickets to Spamalot." |
Spatial Reasoning | ||
Slacker | 0 pts | No spatial reasoning capabilities demonstrated |
Wanderer | 5 pts | demonstrating the ability to use a human-provided map of environmental landmarks and obstacles. Reasoning about the robot's location and objects' locations within the map would be demonstrated. |
Scavenger | 10 pts | demonstrating the ability to build a map of environmental features, assuming a static environment. Reasoning about the robot's location and objects' locations within this constructed map would be demonstrated. |
Gofer | 20 pts | would show an ability to recognize and reason about a limited (and a priori known) set of possible changes to the environment during task performance, e.g., doors being open/closed |
Concierge | 80 pts | would recognize, represent, and reason about any realistic changes to the environment during task performance, e.g., adding/removing furniture and movement of the objects to be retrieved |
Task Awareness and Planning | ||
Slacker | 0 pts | Tasks? What tasks? |
Wanderer | 5 pts | Here the system seeks to locate, find, record, and perhaps manipulate objects in the environment with an awareness and external expression of what subtask it is currently performing. Awareness, but not planning, is required at this competence level. |
Scavenger | 10 pts | At this level, a system will not only be aware of (and able to express) the current subgoal(s) it is trying to achieve, but will also demonstrate the ability to change its approach in the face of changing circumstances. In addition, the system should explicitly indicate that it is going to attempt a different strategy and then articulate that new strategy for achieving its objectives. This level of competence would include spatial replanning in the case of an unexpected obstacle, sensor-based replanning if object uncertainty is too high, identifying and asking a human for help if the system becomes lost or is otherwise unable to achieve a goal, etc. |
Gofer | 20 pts | would demonstrate the ability to consider many different possible plans of action and then choose and execute the most suitable one. This reasoning should be available to onlookers, along with the system's notion of its current level of success and anticipated success in accomplishing the (sub)task. |
Concierge | 80 pts | should show the ability to explain what was tried and why it failed or succeeded. Tasks that fail will provide fuller tests of sophisticated systems, and such tasks will always be available! |
Language and Communication | ||
Slacker | 0 pts | No language-based communication capabilities demonstrated: the system uses menu/GUI inputs alone |
Wanderer | 5 pts | shows the ability to parse and respond to short phrases entered by humans (who are not the system's designers) using a very restricted vocabulary (20-30 concepts/terms). Only one language modality (visual, i.e., reading printed signs; audio; or keyboard input) needs to be implemented at this level; presumably, this would be keyboard input. The language inputs may be requested by the system, e.g., to obtain help, or may be initiated by a user, e.g., to provide direction or a goal. In either case, the system should demonstrably and appropriately change its behavior in response to the input received. In addition, the system should have a mechanism by which it indicates when a language interaction has _not_ been understood. |
Scavenger | 10 pts | demonstrating the ability to handle a less restricted vocabulary/grammar in receiving language-based inputs, e.g., full-sentence inputs drawing on more than 100 terms, entered by AI conference attendees with a minimum of prior introduction to the system's limitations. At this level, the system would also respond constructively to inputs it did not understand, e.g., suggesting alternatives that might be intended or providing tips on how humans might interact more fluidly. |
Gofer | 20 pts | would show the ability to handle at least two language modalities (presumably adding visual or audio inputs) at the Scavenger level. In addition, the system should be able to handle short interactions from non-experts (not just AI researchers who did not build the system) and would have a vocabulary of more than 2000 terms. |
Concierge | 80 pts | would recognize the use of visual, text-based, and audio language in humans' efforts to direct the system (human-initiated) and would be able to use visual and audio language effectively in order to seek help with navigation and other spatial-reasoning tasks. |
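
The guidelines list point values per level but do not spell out how points from different facets combine. The Python sketch below assumes, purely for illustration, that a team earns the points for the level it demonstrates in each facet and that facet scores are summed; `POINTS`, `FACETS`, and `total_score` are hypothetical names, not part of the official rules. It also applies the "1 point per object" rule from the Wanderer level of object recognition.

```python
# Hypothetical scoring sketch. ASSUMPTION: per-facet points are summed, and a
# facet that is not listed counts as Slacker (0 pts). Not the official rules.

POINTS = {"Slacker": 0, "Wanderer": 5, "Scavenger": 10, "Gofer": 20, "Concierge": 80}

FACETS = (
    "spatial reasoning",
    "object recognition",
    "language and communication",
    "human awareness",
    "task awareness and planning",
)


def total_score(levels: dict, wanderer_objects: int = 5) -> int:
    """Sum the points for the level demonstrated in each facet.

    levels: maps a facet name to a level name, e.g. {"human awareness": "Gofer"}.
    wanderer_objects: objects distinguished at the Wanderer level of object
        recognition, scored at 1 point per object, up to 5.
    """
    total = 0
    for facet, level in levels.items():
        if facet not in FACETS:
            raise ValueError(f"unknown facet: {facet!r}")
        if level not in POINTS:
            raise ValueError(f"unknown level: {level!r}")
        if facet == "object recognition" and level == "Wanderer":
            # 1 point per object, clamped to the 0..5 range.
            total += max(0, min(wanderer_objects, 5))
        else:
            total += POINTS[level]
    return total


# A team at Scavenger in spatial reasoning (10), Wanderer in object recognition
# with 3 objects (3), and Gofer in human awareness (20) would score 33.
print(total_score(
    {"spatial reasoning": "Scavenger",
     "object recognition": "Wanderer",
     "human awareness": "Gofer"},
    wanderer_objects=3,
))
```

Under this assumption, a system reaching Concierge in all five facets would score 5 × 80 = 400 points, while a team focusing on a single facet simply leaves the others at 0.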