Google Image Search on a photo from 10×8: Fork and Bowl After André Kertész.
Currently about 200 billion digital images are held across just three social sites: Facebook, Flickr and Instagram. As the digital image explosion on the internet continues unabated, the race to classify, store and make images accessible gathers pace. For many years, text-based image retrieval systems have been used to catalogue digital images through associated keywords. While this is adequate for many users, especially individuals cataloguing photos of family and friends, researchers believe the approach will not support the full potential of new developments in technology. Researchers refer to the process as the “categorisation of the visual world”, and they admit that the categorisation of objects and scenes, a fundamental human ability, is an important yet elusive goal for computer vision research.
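The keyword-based cataloguing described above is, at its core, an inverted index: each keyword points to the set of images tagged with it, and a query intersects those sets. A minimal sketch (the filenames and tags here are made up for illustration):

```python
# Toy keyword-based image retrieval: an inverted index from tag to
# image names, queried by intersecting the sets for each query term.

def build_index(catalogue):
    """Map each keyword to the set of image names tagged with it."""
    index = {}
    for image, keywords in catalogue.items():
        for word in keywords:
            index.setdefault(word.lower(), set()).add(image)
    return index

def search(index, query):
    """Return images tagged with every word in the query."""
    terms = [t.lower() for t in query.split()]
    results = [index.get(t, set()) for t in terms]
    return set.intersection(*results) if results else set()

# Hypothetical family-album catalogue.
catalogue = {
    "beach_2012.jpg": ["beach", "family", "summer"],
    "dinner.jpg": ["family", "kitchen"],
    "sunset.jpg": ["beach", "sunset"],
}
index = build_index(catalogue)
print(search(index, "beach family"))  # → {'beach_2012.jpg'}
```

The obvious limitation, and the reason researchers look beyond it, is that the system only knows what a human bothered to type: an untagged image is invisible.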
For the past couple of years, Google and several other search operators, such as TinEye and Digimarc, have offered image searches which, as Google suggests, are good for identifying “places, art and even mysterious creatures”. These searches are most useful if you have an image of a place, an artwork or perhaps a strange insect that you have snapped on your smartphone and would like to identify. They also work well if you are looking for websites that may be displaying copies of your own images: you simply upload the image and the search returns matches.
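One common technique behind this kind of “find copies of my image” search is perceptual hashing: each image is reduced to a tiny fingerprint so that near-identical images produce near-identical hashes. A toy average-hash sketch, working on an already-downscaled greyscale grid (real systems resize and greyscale full images first; the pixel values below are invented):

```python
# Average hash: 1 bit per pixel, set if the pixel is brighter than
# the image's mean. Re-encoded copies of an image keep almost the
# same bit pattern; unrelated images differ in many bits.

def average_hash(pixels):
    """Return a bit tuple: 1 where a pixel exceeds the mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(h1, h2):
    """Count differing bits; a small distance suggests a match."""
    return sum(a != b for a, b in zip(h1, h2))

original = [[10, 200], [220, 30]]
copy = [[12, 198], [219, 33]]    # same image after slight re-encoding
other = [[200, 10], [30, 220]]   # a different image

h_orig = average_hash(original)
print(hamming(h_orig, average_hash(copy)))   # → 0 (a match)
print(hamming(h_orig, average_hash(other)))  # → 4 (not a match)
```

In practice a search provider indexes billions of such fingerprints and returns images whose hashes fall within a small Hamming distance of yours.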
Matching images as described above is one step forward, but what about the computer actually reading the image? Search-based image annotation is one method used by the computer vision and machine learning communities. ImageNet is currently the world’s largest visual database, and one that many technologists are using as they teach machines to see like humans. The classification process involves WordNet, a lexical database of English, and an army of crowd workers hired through Amazon’s Mechanical Turk service. These workers look at the images being stored in ImageNet and classify each one according to the object it depicts. So begins the process of teaching a machine to see.
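The resulting data structure is simple to picture: each WordNet noun synset has an identifier and a gloss, and workers attach images to the synset naming the depicted object. A minimal sketch (the synset IDs, glosses and filenames below are illustrative, not real ImageNet entries):

```python
# ImageNet-style organisation: images hang off WordNet noun synsets.
# Everything here is a made-up example of the structure, not real data.

synsets = {
    "n0000001": {"lemma": "fork", "gloss": "cutlery used for eating"},
    "n0000002": {"lemma": "bowl", "gloss": "a round dish for food"},
}

labels = {}  # synset ID -> list of images a worker assigned to it

def label_image(image, synset_id):
    """Record a worker's judgement that an image depicts a synset."""
    if synset_id not in synsets:
        raise ValueError(f"unknown synset {synset_id}")
    labels.setdefault(synset_id, []).append(image)

label_image("photo_041.jpg", "n0000001")
label_image("photo_042.jpg", "n0000001")
print(synsets["n0000001"]["lemma"], "->", labels["n0000001"])
```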
ImageNet was built by computer scientists at Princeton University in the USA and currently holds more than 14 million images. In a recent New York Times article, John Markoff, referring to the development of ImageNet, suggests that “while the Internet has given rise to a mountainous digital haystack of imagery, it also offers a path to clarity”. Or does it? The ImageNet system may work for objects represented by the many nouns in the WordNet database, but what about the long tail? What about searching for an image that represents what I am trying to express in words, or an image an artist has created that balances on the edge of the digital haystack?
According to a Microsoft Research Asia group, there is still quite a bit of work to do on content-based image retrieval systems. In an article published a couple of years ago, the group suggested that “The challenges include how to construct a representative visual vocabulary, how to efficiently find relevant images for a long query, and how to compute ‘PageRank’ for images for cache design and quality improvement.” The Microsoft researchers say that each image has a “salient object”; the difficulty, they say, is in distinguishing the salient object from the background on appearance alone. All of this may work well when we are talking about the image as an illustration of the story, but what happens if the image is The Story? Is there a danger here that simplifying the reading of an image could lead us down the wrong track? As the oft-quoted László Moholy-Nagy said, “the illiterate of the future will be the person ignorant of the use of the camera as well as the pen.” As our communication becomes more visual, our technology must rise to the challenge. We need to teach our machines to use a camera as well as a pen if we want to see The Story and not just the illustration.
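For the curious, the “PageRank for images” the Microsoft researchers mention can be sketched concretely. One interpretation is to link visually similar images into a graph and rank each image by the rank of its neighbours, via the standard power-iteration PageRank. The graph below is a hypothetical example, not real search data:

```python
# Minimal power-iteration PageRank over an image-similarity graph:
# an edge means "visually similar to". Images that many others point
# at accumulate rank, so they surface as representative results.

def pagerank(graph, damping=0.85, iterations=50):
    """graph: node -> list of nodes it links to. Returns node -> rank."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, out in graph.items():
            if not out:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

# Hypothetical similarity links between four images.
graph = {
    "img_a": ["img_b", "img_c"],
    "img_b": ["img_a"],
    "img_c": ["img_a"],
    "img_d": ["img_a"],  # nothing links back to img_d
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # → img_a: everything points at it
```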