Scavenger Hunt Judging Guidelines Proposal

2006 Scavenger Hunt Judging Guidelines Proposal

Motivation for the judging guidelines

  The AAAI Scavenger Hunt challenges robots to locate and
obtain objects within the conference's natural setting -
a hotel or convention center. In order to encourage a wide variety
of AI researchers to field a system, the 2006 scavenger hunt
will scaffold its challenge through five different facets of
intelligent interaction with the physical environment:

  A) spatial reasoning
  B) object recognition
  C) language and communication
  D) human awareness
  E) task awareness and planning

  In each of these five facets, there are several different levels
of competence that a robotic system might demonstrate. For
ease of reference, these levels are labeled in
decreasing order of ability:

  1) Concierge
  2) Gofer
  3) Scavenger
  4) Wanderer
  5) Slacker

  This leads to a matrix of possible capabilities that
scavenger hunt systems might demonstrate.

  Some participants might focus on achieving a high level of
competence in only one facet; others might put more effort
into integrating these components. Both types of entry
are welcome, and we will work with all participants to
make the scavenger hunt a motivating and rewarding venue.

  Note that this framework merely wraps the previous
scavenger hunt with some additional directions that teams
might pursue. Any entry designed with the prior rules in mind
can still participate fully and without change.

  Here are the original point values for reasoning about
and handling the objects listed on the Scavenger Hunt Item Page:

locate an object and approach it (5 points each)
identify objects based on color and shape (10 points each)
indicate the relative positions of the objects found (~20 points)
find and move an object to a designated location (30 points)
create a map (of some sort), including the objects found (30 points)

In addition, these values may be increased or decreased by the judges
depending on the following factors:

the robustness of the system in handling these tasks despite small
(+25%), medium (+50%), or large (+100-200%) perturbations in its state

the repeatability of success at these tasks: full (100%), mostly (75%),
occasionally (50%), or rarely (25%)

the integration of the system's component parts, with deeper integration
receiving larger percentages of the full points. For example, an off-the-shelf mapping
package that displays the robot's gathered data but otherwise does not affect its
behavior or reasoning will score lower on this criterion than a mapping system
whose output facilitates further exploration or other behavioral adjustments.

the accuracy and utility of the system and its reasoning,
so that (continuing with the mapping example) simply presenting the raw odometry log
as a map would not score as large a percentage of the full score as one that
accurately merged identical environmental features.

the presentation of the system and its reasoning components
is an important factor, too. Well-presented systems will convey both
their strengths and weaknesses, the important design choices made, and
where additional work on the system would go in the future.
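
As a rough illustration of the arithmetic, the sketch below applies the repeatability fraction and a robustness bonus to one of the base values listed above. The guidelines do not say how the judges combine these factors, so the multiplicative rule, the function name adjusted_score, and the dictionary keys are all assumptions made for this example. The integration, accuracy, and presentation factors are left out because the guidelines give no numbers for them.

```python
# Base point values quoted from the list above ("~20" is approximate in the source).
BASE_POINTS = {
    "locate_and_approach": 5,
    "identify_by_color_and_shape": 10,
    "relative_positions": 20,
    "find_and_move": 30,
    "create_map": 30,
}

# Repeatability fractions quoted from the guidelines.
REPEATABILITY = {"full": 1.00, "mostly": 0.75, "occasionally": 0.50, "rarely": 0.25}

# Robustness bonuses for small and medium perturbations. The "large" case is a
# range (+100-200%) in the source, so callers pass a specific value for it.
ROBUSTNESS_BONUS = {"none": 0.0, "small": 0.25, "medium": 0.50}


def adjusted_score(task, repeatability, robustness_bonus=0.0):
    """Assumed rule (not from the guidelines): base * repeatability * (1 + bonus)."""
    base = BASE_POINTS[task]
    return base * REPEATABILITY[repeatability] * (1.0 + robustness_bonus)


# A map built successfully most of the time, robust to medium perturbations.
print(adjusted_score("create_map", "mostly", ROBUSTNESS_BONUS["medium"]))  # 33.75
```

Under this assumed rule, a 30-point map that succeeds "mostly" and tolerates medium perturbations would come out at 33.75 points; the judges remain free to weigh the factors differently.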

  The table below is meant as a guide for how the judges
will try to compare systems that take very different approaches to
the spatial reasoning tasks. It is also intended
to provide some additional structure, e.g., to distinguish
"human awareness" here from the separate human/robot interaction
event at AAAI. An advantage of the AAAI robot competition is that
all participants are welcome to tailor the venue to help motivate
their own particular projects. Thus, these guidelines will change to
meet participants' needs and interests.

  Disclaimers: these facets are not mutually exclusive. (Who would
claim AI was perfectly modular?) Also, one might say that a
"concierge"-level system in all five -- or even just one --
spatial reasoning category goes improbably far toward "solving
the AI problem." Whether or not this is true, it is perhaps not
a bad thing to keep such lofty goals in mind, even if they are still distant.

  Additional Disclaimers: there is no way to BOTH
encourage a wide variety of approaches to spatial-reasoning tasks AND
to create a completely objective means of judging very different
entries. The judges reserve the right to present multiple
first-place awards, if the conditions warrant. Technical awards for
particular behavioral strengths or specializations will also be considered.

Capability	Points	Initial description
Human Awareness
Slacker	0 pts	No awareness of agents outside the system itself, i.e., humans are annoyingly-shaped walls that may or may not correspond to the floorplan. (What floorplan?!)
Wanderer	5 pts	At this level, the system would show the ability to handle disinterested humans robustly, e.g., it will pursue its task goals even with people walking by the system at ordinary speed, pausing to look at it, etc. People will not try to interfere with the system or its functioning at this level. Also, they will be relatively sparse in time and space: perhaps at most 2 people, at most once every minute, and for at most 15 seconds per "interaction."
Scavenger	10 pts	Here, a system will not only act robustly to disinterested humans, but will succeed (some/most of the time) in realizing and expressing the fact that a human is "interacting." At this level, the system should also become an active agent in this exchange - not (necessarily) in a typical human/robot interaction way. For example, the robot might ask the person to move (but not ask a wall to move).
Gofer	20 pts	would demonstrate the ability to identify people in the environment and to solicit their help, if they are willing to give it. Characterizing this level of human awareness is the notion that people are a _resource_ and not just an annoyance :)
Concierge	80 pts	would actively seek out people and engage with them in order to accomplish a specific part of a spatial task, e.g., getting directions or asking for more information on an object.
Object Recognition
Slacker	0 pts	No object recognition capabilities demonstrated
Wanderer	5 pts	The ability to distinguish up to 5 preselected objects in known poses, based on 1-2 features of their "appearance" to the sensor in question. For visual sensors these sensor features could be color, texture, color composition, or shape, among others. For direct range-sensors, this would be identifying particular categories of depth patterns. (1 point per object)
Scavenger	10 pts	The ability to distinguish up to 10 preselected objects in unconstrained (or minimally constrained) poses, using 3 or more facets of the objects' "appearance" to the sensor(s) used.
Gofer	20 pts	The ability to identify objects within a set of five well-defined categories, e.g., "a pillow," "a chair," "a newspaper," or "a bottle of champagne."
Concierge	80 pts	The ability to identify objects within a much larger and less well-defined set of categories, or specific instances of the previous categories. For example, "today's USA Today," "some shampoo," "tickets to Spamalot."
Spatial Reasoning
Slacker	0 pts	No spatial reasoning capabilities demonstrated
Wanderer	5 pts	demonstrating the ability to use a human-provided map of environmental landmarks and obstacles. Reasoning about the robot's location and objects' locations within the map would be demonstrated.
Scavenger	10 pts	demonstrating the ability to build a map of environmental features, assuming a static environment. Reasoning about the robot's location and objects' locations within this constructed map would be demonstrated.
Gofer	20 pts	would show an ability to recognize and reason about a limited (and a priori known) set of possible changes to the environment during task performance, e.g., doors being open/closed
Concierge	80 pts	would recognize, represent, and reason about any realistic changes to the environment during task performance, e.g., adding/removing furniture and movement of the objects to be retrieved
Task Awareness and Planning
Slacker	0 pts	Tasks? What tasks?
Wanderer	5 pts	Here the system seeks to locate, find, record, perhaps manipulate objects in the environment with an awareness and external expression of what subtask it is currently performing. Awareness, but not planning, is required at this competence level.
Scavenger	10 pts	At this level, a system will not only be aware of (and able to express) the current subgoal(s) it is trying to achieve, but it also needs to demonstrate the ability to change its approach in the face of changing circumstances. In addition, the system should explicitly indicate that it is going to attempt a different strategy and then articulate that new strategy for achieving objectives. This level of competence would include spatial replanning in the case of an unexpected obstacle, sensor-based replanning if object uncertainty is too high, identifying and asking a human for help if the system becomes lost or otherwise unable to achieve a goal, etc.
Gofer	20 pts	would demonstrate the ability to consider many different possible plans of action and then choose and execute the most suitable one. This reasoning should be available to onlookers, along with the system's notion of its current level of success and anticipated success in accomplishing the (sub)task.
Concierge	80 pts	should show the ability to explain what was tried and why it failed or succeeded. Tasks that fail will provide fuller tests of sophisticated systems - and such tasks will always be available!
Language and Communication
Slacker	0 pts	No language-based communication capabilities demonstrated: the system uses menu/GUI inputs alone
Wanderer	5 pts	shows the ability to parse and respond to short phrases entered by humans (who are not the system's designers) using a very restricted vocabulary (20-30 concepts/terms). Only one language modality (visual, i.e., printed-sign reading, audio, or keyboard input) needs to be implemented at this level. Presumably, this would be keyboard input. The language inputs may be requested by the system, e.g., to obtain help, or may be initiated by a user, e.g., to provide direction or a goal. However, the system should demonstrably and appropriately change its behavior in response to the input received. In addition, the system should have a mechanism by which it indicates when a language-interaction has _not_ been understood.
Scavenger	10 pts	demonstrating the ability to handle a less restricted vocabulary/grammar in receiving language-based inputs, e.g., full-sentence inputs that can handle >100 terms that could be entered by AI conference attendees with a minimum of prior introduction to the system's limitations. At this level, the system would also respond constructively to inputs it did not understand, e.g., suggesting alternatives that might be intended or providing tips on how humans might interact more fluidly.
Gofer	20 pts	would show the ability to handle at least two language modalities (presumably adding visual or audio inputs) at the scavenger level. In addition, the system should be able to handle short interactions created by non-experts (not just AI researchers who did not build the system) and would have a vocabulary greater than 2000 terms.
Concierge	80 pts	would recognize the use of visual, text-based, and audio language in humans' efforts to direct the system (human-initiated) and would be able to use visual and audio language effectively in order to seek help in its navigating and other spatial-reasoning tasks.
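
The table above can be read as a facet-by-level matrix. The sketch below encodes the point values from the table and totals them for an entry that declares one level per facet. Summing across facets is an assumption made for illustration (the guidelines do not state how facet scores are aggregated), and the names LEVEL_POINTS, FACETS, and facet_total are hypothetical.

```python
# Point values per competence level, as given in the table above.
LEVEL_POINTS = {
    "Slacker": 0,
    "Wanderer": 5,
    "Scavenger": 10,
    "Gofer": 20,
    "Concierge": 80,
}

# The five facets, in the order they appear in the table.
FACETS = (
    "Human Awareness",
    "Object Recognition",
    "Spatial Reasoning",
    "Task Awareness and Planning",
    "Language and Communication",
)


def facet_total(levels):
    """Sum points over all facets (assumed aggregation, not from the guidelines).

    `levels` maps a facet name to a level name; facets that are not
    mentioned are treated as "Slacker" (0 points).
    """
    unknown = set(levels) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    return sum(LEVEL_POINTS[levels.get(facet, "Slacker")] for facet in FACETS)


# A specialist entry versus an entry integrating all five facets.
specialist = {"Object Recognition": "Concierge"}
integrator = {facet: "Scavenger" for facet in FACETS}
print(facet_total(specialist), facet_total(integrator))  # 80 50
```

Under this assumed total, a single Concierge-level facet (80) outscores Scavenger-level competence in all five (50), which is one way to see how steeply the table rewards the top level.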