Assigning textual tags to images is an important task because tags underpin applications like image search. When you search for an image of a “cat,” modern search engines can only identify an image as containing a cat if the tag “cat” is associated with it.
Having people tag images by hand is an onerous task. Shenoy and Tan of Microsoft Research developed a way to tag images automatically by reading people’s brain scans while they look at images. The people did not even have to specifically think about trying to tag the image; they merely had to passively observe it.
The technique uses an electroencephalograph (EEG): a cap of electrodes placed at standard locations on the scalp, each of which measures brain activity in its local area.
This image shows the layout of the electrodes.
The brain reacts differently when a person views different kinds of stimuli. For example, the following diagram shows the average brain response when the user views a picture of a face and a picture of a non-face (i.e. anything else). Red areas show high activity and blue areas show low activity. Each line on the graph corresponds to the activity levels recorded by a single electrode.
As can be seen, the brain response is not static but varies over time. However, the graphs show that the brain’s changing responses over time are predictable based on what kind of stimulus is shown.
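Because the response unfolds over time at every electrode, an EEG recording of one image viewing is really a channels-by-time matrix. The paper’s exact feature extraction is not described here, but a common, minimal approach is to concatenate each electrode’s time course into a single feature vector for the classifier. The electrode and sample counts below are illustrative assumptions, not values from the study:

```python
import numpy as np

# Hypothetical single-image EEG epoch: 32 electrodes, 100 time samples each.
n_electrodes, n_samples = 32, 100
epoch = np.random.default_rng(2).normal(size=(n_electrodes, n_samples))

# Flatten channels x time into one vector, so the time-varying shape of
# every electrode's response becomes part of the feature representation.
features = epoch.reshape(-1)
```

This preserves exactly the information the graphs above display: which electrodes were active, and when.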
Image Labeling through Mind Reading
Shenoy and Tan then used a machine learning algorithm called Regularized Linear Discriminant Analysis (RLDA) to develop an image tagging system. The researchers recruited several test users and presented each with a series of images, taking an EEG reading of the user’s brain upon presentation of each image. Users were not required to think about tagging the image or about what kind of image it was. As is common in psychological experiments, they were given a distractor task: a task that ensures they are paying attention to the images but does not specifically relate to the experiment. In this case, they were asked to remember the images so they could identify them later in a post-experiment test. The RLDA algorithm could then take as input the associated pairs – image and EEG scan – and learn to recognize what kinds of EEGs were associated with what kinds of images. This learning is often termed the building of a learned model, which represents everything that the artificial intelligence knows.
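To make the idea concrete, here is a minimal sketch of a two-class regularized linear discriminant, implemented from scratch. This is not the authors’ actual pipeline: the feature dimensions, shrinkage weight, and toy data are all illustrative assumptions. The “regularized” part is the shrinkage of the pooled covariance toward the identity, which keeps it invertible when EEG feature vectors are long relative to the number of training examples:

```python
import numpy as np

def train_rlda(X, y, reg=0.1):
    """Fit a two-class regularized linear discriminant.

    X   : (n_samples, n_features) EEG feature vectors (hypothetical features)
    y   : (n_samples,) labels in {0, 1}, e.g. 0 = non-face, 1 = face
    reg : weight of shrinkage toward the identity (the regularization)
    """
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Pooled within-class covariance, shrunk toward a scaled identity.
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = Xc.T @ Xc / len(X)
    d = cov.shape[0]
    cov = (1 - reg) * cov + reg * (np.trace(cov) / d) * np.eye(d)
    w = np.linalg.solve(cov, mu1 - mu0)   # discriminant direction
    b = -w @ (mu0 + mu1) / 2              # threshold midway between class means
    return w, b

def predict(w, b, X):
    """Tag as class 1 ("face") when the discriminant score is positive."""
    return (X @ w + b > 0).astype(int)

# Toy data standing in for EEG features: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
faces = rng.normal(loc=1.0, size=(100, 20))
others = rng.normal(loc=-1.0, size=(100, 20))
X = np.vstack([faces, others])
y = np.array([1] * 100 + [0] * 100)

w, b = train_rlda(X, y)
accuracy = (predict(w, b, X) == y).mean()
```

Training produces the learned model (here just the pair `w, b`); tagging a new image is a single dot product against the EEG reading taken while the user views it.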
The system can then be used to automatically tag images. A user wearing an EEG is shown an image whose tags are unknown, and the system uses the learned model to predict from the EEG what the appropriate tag for the image would be (e.g. this is a face, or this is not a face).
The following are some of the images used in the study.
The users only had to view the images for short periods of time for the system to work, which means that many images could be labeled quickly by presenting them in rapid succession. The researchers experimented with different viewing times and found no difference in performance among 500 ms, 750 ms, and a full second.
The following graph shows how accurate the system was at assigning the correct tag. The vertical axis shows “classification accuracy,” which is the percent of time the system assigned the correct tag. The different lines show the accuracy on different kinds of tagging tasks, e.g. tagging faces versus inanimate objects, faces versus animals, etc. The horizontal axis shows “number of presentations,” which is how many times a user was shown each image (EEG readings are noisy so multiple EEG readings for the same image made the system more reliable).
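The benefit of multiple presentations comes from simple noise averaging: each EEG reading is the image-evoked response plus noise, and averaging n readings of the same image shrinks the noise roughly like 1/sqrt(n). The sketch below illustrates this with a simulated response; the electrode count and noise level are assumptions, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(1)
true_response = rng.normal(size=64)   # hypothetical per-electrode evoked response
noise_scale = 2.0                     # single-reading noise swamps the signal

def averaged_reading(n_presentations):
    """Average n noisy EEG readings of the same image."""
    noise = rng.normal(scale=noise_scale, size=(n_presentations, 64))
    return (true_response + noise).mean(axis=0)

# Distance from the averaged reading to the true response shrinks with n.
err = {n: np.linalg.norm(averaged_reading(n) - true_response)
       for n in (1, 4, 16)}
```

A cleaner averaged reading gives the classifier a better estimate of the evoked response, which is why accuracy in the graph rises with the number of presentations.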
The new method invented by Shenoy and Tan is certainly not as accurate as current methods of image labeling, and it is also not as discriminative: most of the results reported in the graph above are for discriminations between only two kinds of images, for example pictures of faces versus pictures of animals. Only one of the experiments used the system to discriminate among three kinds of images (the results of which are labeled “3-class” in the graph).
Companies can get image labeling done simply by hiring employees to do it. An even cheaper way is to use Amazon’s Mechanical Turk, which pays people a few cents to do simple tasks online; image labeling is a common job there. Google also has its own novel method, a game called Google Image Labeler, which tries to make image labeling fun by making the task multiplayer and awarding points when a player provides the same label as another player. Google Image Labeler was based on the original ESP Game created by Prof. Luis von Ahn of Carnegie Mellon.
However, the mind reading approach has the advantage that it does not require any work at all from the user. The user merely has to passively observe the image, for as little as 500 ms. One can imagine a system that tags images by reading your mind as you surf the web. If Google Image Search needed to tag an image, it could just pop it up in a window for 500 ms and read your thoughts to get the tag.
Work remains to be done in getting the system to discriminate among more kinds of images and in making it more accurate. Challenges are posed by the coarseness of the data an EEG gathers (after all, it reads brain activity through the skull, from electrodes on the scalp) and by scientists’ currently limited ability to interpret what brain scans mean.