Image Retrieval - Research Paper Example

Summary
This paper, 'Image Retrieval', tells us that advances in database management systems have enabled us to store and manage multimedia information. Multimedia information is important because more digital media is generated now than at any other time, and there is strong demand for extracting the right information…

INTRODUCTION

In recent years, advances in database management systems have enabled us to store and manage multimedia information. Multimedia information is important because more digital media is generated now than at any other time, and there is great demand for extracting the right information from the very large pile of data on the internet. Although database management systems grow more advanced every day, the question remains whether current information retrieval techniques can fully satisfy the needs of information seekers. How should we retrieve images? A variety of techniques are being investigated in the area of information retrieval, and the aim of this paper is to focus on the most popular and effective ones for extracting images from multimedia databases. In general, we will discuss the two main research areas, textual features and content based features, but more specifically we will discuss the technologies and advances in employing textual features.

IMAGE RETRIEVAL

The question of how to handle the fast-growing volume of multimedia information is answered by multimedia databases. The aim is to retrieve the images that most likely match a user's query. Users generally search for images in databases using keywords and features like size, shape, location, etc. For retrieving the right information from such a database using queries, two main approaches exist. [1] The first is to describe an image using a set of keywords, bearing in mind that manually labelling all multimedia information is a hard task. The other area of research, which is at the centre of researchers' attention, is Content Based Image Retrieval (CBIR). [2] For both approaches, however, there is a question of efficiency. In information extraction there is an issue called the semantic gap, which refers to the difference between the high-level interpretations users have and the low-level features (text, texture, etc.) stored in the database. In other words, humans distinguish and perceive information in images at a much higher and more complex level than the image features in the database. For example, context and emotion are perceived by humans but not by such image retrieval algorithms.

TEXT BASED IMAGE RETRIEVAL

To label images with keywords, we can do it either manually or automatically, using computer algorithms trained on the textual information of images and on image features. [3] Even in the automatic approach, however, we need to manually provide a set of training data, for example different categories of images. In addition to these constraints, when we describe images using a set of keywords, conflicts can arise due to some well-known issues of natural language, including synonymy (different words expressing the same thing) and polysemy (one word used in different contexts). [4] The authors of [4] suggest not using text-based features alone, since this is inefficient due to the natural language constraints above; they recommend combining text features and visual features. There are also some recent works that try to resolve these natural language constraints. [5] The approach in [5] used Latent Semantic Analysis (LSA) to build joint feature vectors from the textual and visual information of the images.
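To make the joint-feature idea concrete, here is a minimal sketch, not the actual method of [5]: textual and visual feature vectors are concatenated and projected into a latent space with truncated SVD, which is the standard LSA computation. The captions, toy colour features and dimensions are all hypothetical.

```python
# Minimal LSA sketch over joint text + visual features (illustrative only).
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

captions = ["sunset over the sea", "red sunset sky", "a cat on the sofa"]  # hypothetical
visual = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]])  # toy colour-histogram features

text = CountVectorizer().fit_transform(captions).toarray().astype(float)
joint = np.hstack([normalize(text), normalize(visual)])  # one joint vector per image

lsa = TruncatedSVD(n_components=2, random_state=0)
latent = lsa.fit_transform(joint)  # images embedded in a shared latent space

# Similarity in the latent space: the two sunset images should land close together.
sims = normalize(latent) @ normalize(latent).T
print(np.round(sims, 2))
```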
The reason researchers mainly focus on the content based side of the research might be the problems that appear with labelling the data and, of course, the lack of accuracy caused by general natural language processing techniques. Natural language processing is a complex task, and the ability to fully process human language has not yet been achieved.

CONTENT BASED IMAGE RETRIEVAL (CBIR)

CBIR is a popular area of study in image retrieval these days. In this approach we index images by their visual content. In other words, we use features like shape, texture and colour to distinguish images and classify them into appropriate categories. CBIR systems retrieve images based on these basic visual features. Colour, for instance, is the first visual effect humans perceive when they look at an image, so it seems a very helpful visual feature: using colours, we can differentiate between objects in an image. Another important feature used in content based image retrieval is texture. Using texture, we can find the orientation of objects in photos or detect a similar image when it has merely been rotated. There is also a group of research works that use a combined model of visual and conceptual features and achieve higher accuracy, for instance by classifying on the colour feature (a single visual feature) and then asking users to specify further characteristics like shape, texture, etc. [6] Another group of works focuses on semantic based image retrieval, trying to close the semantic gap in image processing. [4], [7] The approach in [4] is based on the Support Vector Machines (SVM) framework. To bridge the semantic gap, it uses WordNet, an English lexical database of words with their synonyms and antonyms. A drawback of this technique is that it must generate a feature vector for each image, but overall it performs better than traditional techniques based on low-level features. This research also suggests using content based image retrieval jointly with text based approaches to achieve higher accuracy.

TEXT BASED IMAGE RETRIEVAL PROCESS

Text-based retrieval of images started in the 1970s. Although it is an old technique, built on text extraction methods dating from the 1960s, many approaches still use it in some way, whether as purely text based retrieval of images or as a combination of text based and content based retrieval. Recent research has shown that a combination of text based and visual based retrieval (hybrid image retrieval) achieves better results. In text based image retrieval, as discussed in the previous section, the main concern is labelling each image with a set of descriptive keywords. Even ignoring the fact that it would be very time consuming and expensive to label the images in a massive multimedia database, many key problems remain with the annotations themselves. Many things about an image are hard to describe, for instance feelings, situations and emotions. It is hard to label an image 'scary' because it depends on the definition of scary, which is not clear. Different people perceive different things from an image, for example regarding what is in the image and what the image is about. As discussed earlier, it is hard to completely annotate all the multimedia content in a database, and there are also the problems of synonyms, hyponyms, hypernyms, etc. There is also the issue of spelling errors; all of these latter issues are inherited from text processing, text mining and natural language processing. Basic text based image retrieval starts with Boolean keyword search, but advanced techniques use a vector space. In a vector space, each word is assigned a dimension weighted by its frequency, which reflects its importance among the other search keywords. Using a vector space, a ranking system can be implemented: images can be ranked according to their relevance to the searched keywords, as sketched below. Different techniques can be used in image retrieval, for example bag-of-words or n-grams; we can remove stop words, perform stemming and spell correction, and use WordNet, an English dictionary with the definitions of each word, their links and the hierarchy they belong to. Using semantic approaches, a number of issues with text-based image retrieval can be resolved; however, to make the semantic gap smaller, research has to move toward intelligent image retrieval, which is a challenging problem requiring much further work.
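As a concrete illustration of the vector space ranking just described, here is a minimal sketch of TF-IDF weighting and cosine-similarity ranking over hypothetical image annotations; it is a generic example, not a system from any of the cited papers.

```python
# Vector-space ranking of images by their textual annotations (illustrative sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

annotations = {  # hypothetical image IDs and their keyword annotations
    "img1": "golden gate bridge at sunset",
    "img2": "suspension bridge over the bay",
    "img3": "cat sleeping on a red sofa",
}

vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal built in
doc_matrix = vectorizer.fit_transform(annotations.values())

query_vec = vectorizer.transform(["bridge sunset"])
scores = cosine_similarity(query_vec, doc_matrix).ravel()

# Rank images by their relevance to the query keywords, as described above.
for img_id, score in sorted(zip(annotations, scores), key=lambda p: -p[1]):
    print(f"{img_id}: {score:.3f}")
```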
COLLABORATIVE TAGGING

Collaborative tagging is an important recent supporting technology for distributed cognition. Thanks to the World Wide Web, users can upload multimedia content, including images, and associate tags with it. Tagging is considered a new technology that came into use around 2004 and 2005 (notably on Flickr). Since collaborative tagging involves humans directly in labelling the images, it can be considered a helpful solution for text-based image retrieval systems. One important advantage of tags is that they introduce a new sort of taxonomy of keywords and classifiers. This greatly helps users browse retrieved images that are closely related to their search queries, by providing the right metadata. For a website like Flickr in particular, this can be considered an advantage over search engines like Google, whose image retrieval depends heavily on the content of the surrounding web page.

IMAGE TAGGING

Having images tagged one by one, and in the right way, can significantly improve the performance of text based image retrieval techniques. Tagging can be based on: 1- the global image; 2- segmented regions within the image. Different applications support different levels of tagging. In some applications, users are given both options of tagging granularity and can decide how extensively they want to tag the images.

GENERATING AND FORMING TAGS

Collaborative tagging provides a cheap alternative by letting users add tags to images online. However, there is a downside to this freedom, and that is reliability: such systems differ from those in which only experts annotate the images. Considering this, we could provide an environment or model in which users collaborate with professionals in labelling the images; for example, once a user assigns a main tag to an image, the system could suggest a set of related tags.

CLUSTERING THE TAGS

Collaborative tagging gives users the chance to assign tags to images, but not every user assigns the same tags to the exact same image. Most of the time these tags are related and can be clustered into groups. We can think of tag clustering as a technique for identifying a set of tags that have semantic concepts in common, as the sketch below illustrates.
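A minimal sketch of this idea, assuming toy tag data: tags are clustered by the similarity of their co-occurrence profiles across images. Production systems would use richer similarity measures, but the mechanism is the same.

```python
# Cluster tags that co-occur on the same images (illustrative sketch).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical tag sets assigned by different users to a small image collection.
image_tags = [
    {"beach", "sea", "sand"},
    {"sea", "ocean", "waves"},
    {"city", "skyline", "night"},
    {"night", "lights", "city"},
]
tags = sorted(set().union(*image_tags))
# Binary incidence matrix: rows = tags, columns = images.
incidence = np.array([[t in s for s in image_tags] for t in tags], dtype=float)

# Hierarchical clustering on cosine distance between tag co-occurrence profiles.
labels = fcluster(
    linkage(incidence, method="average", metric="cosine"), t=2, criterion="maxclust"
)
for cluster in sorted(set(labels)):
    print(cluster, [t for t, c in zip(tags, labels) if c == cluster])
```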
MOVING TOWARD COLLABORATIVE SYSTEMS

Earlier, in the background section, we mentioned that text based image retrieval is not the only focus of research, since not all images in multimedia databases are tagged and the technique inherits the problems of natural language processing, including latent semantics, synonyms and antonyms, etc. Collaborative tagging, adopted by many well-known online applications including Flickr, has shifted the research toward new areas, including knowledge propagation, emergent semantics, ontologies and neighbour voting. In this paper, we aim to introduce these techniques and compare their effectiveness.

KNOWLEDGE PROPAGATION

An important issue in using a collaborative framework is that not all images in multimedia databases are annotated, or, if they are, the annotation may not have been done properly. Knowledge propagation techniques propagate keywords from annotated subsets to the images that are not annotated. The main structure of this model is based on discovering the correlation between the keywords of the annotated images and their visual features; once the correlation is learned, a classifier assigns keywords to the unannotated images. Works in knowledge propagation are mainly model based or classifier based. [6][7] Model based approaches use the relation (co-occurrence) between keywords and areas in the images; co-occurrence can be estimated using techniques like Latent Semantic Analysis. In the classifier based approach, a small set of annotated images is used for training, with one keyword per image, and the classifier then predicts the keywords for the unannotated images. Different supervised machine learning techniques are used, including SVMs (Support Vector Machines) and Bayes classifiers; a simplified sketch of classifier based propagation follows below.

[8] is one of the most recent works on knowledge propagation in collaborative tagging for image retrieval systems. In their approach, they segment images into regions, and each region is then represented by a feature vector. They also use a weight measure to account for the importance of specific regions. A subset of the total images in the multimedia database is tagged and used by the classifier for the training process. Estimating the importance weights of the regions is done via a Genetic Algorithm (GA), which selects the salient regions from the selected images. They also use a one-class support vector machine (OCSVM) to calculate the density distribution of the images. After the weighting process is finished for the regions, another classifier, a variable-length radial basis function (VLRBF) network, is trained on the feature vectors of the regions and their weights. Once these keyword classifiers are trained, they are used to propagate keywords to the images that are not annotated; an overview of the entire model can be found in [8].
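The sketch below illustrates only the general classifier based propagation idea, not the GA/OCSVM/VLRBF pipeline of [8]: a plain SVM is trained on the visual features of a small annotated subset (one keyword per image) and then predicts keywords for the unannotated images. All features and keywords are hypothetical.

```python
# Classifier-based knowledge propagation sketch: learn keyword/feature
# correlations from an annotated subset, then tag the unannotated images.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical visual feature vectors (e.g. colour/texture descriptors).
annotated_feats = rng.normal(size=(40, 8))
annotated_keywords = rng.choice(["beach", "forest", "city", "snow"], size=40)
unannotated_feats = rng.normal(size=(10, 8))

# One keyword per training image, as in the simple classifier-based setting.
clf = SVC(kernel="rbf").fit(annotated_feats, annotated_keywords)

# Propagate: assign each unannotated image its predicted keyword.
propagated = clf.predict(unannotated_feats)
print(propagated)
```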
[8] measures the performance of the proposed knowledge propagation algorithm on a corpus of 10,000 images called Corel Gallery. The database contains 100 different categories of images, and each category shares the same semantics. The visual features used are colour and texture. 100 keywords are predefined, one describing each of the 100 image categories. Next, a small percentage of images from each category is selected for the training process. The GA helps select the salient regions in the images of the training set, then the OCSVM determines the density of the salient regions selected by the GA. A VLRBF classifier is applied afterwards, and knowledge is propagated to the images not in the training set. Their experiments compare the performance of their algorithm with a model-based approach called CMRM [9] and an SVM-based approach [10]. For measuring accuracy, they define Strict Annotation Accuracy (SSA) and Relaxed Annotation Accuracy (RAA). In SSA, an image is considered correctly annotated if the keyword with the largest weight in its auto-annotation matches the ground-truth annotation. In RAA, an image is considered correctly annotated if any keyword in its auto-annotation coincides with the ground-truth annotation. The proportion of images used for training is 10% or 20%. From the reported results, VLRBF outperforms the CMRM method: when only 10% of the images are used, the technique in [8] performs better than CMRM and is almost equal to the SVM approach, and when 20% are used it outperforms both the CMRM and SVM based approaches. As we observed, knowledge propagation approaches, and specifically the classifier based model proposed in [8], address the incomplete annotation problem of text based image retrieval systems, and of collaborative tagging systems in particular.
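The two accuracy measures translate directly into code. The sketch below assumes hypothetical auto-annotations given as keyword-to-weight mappings, with a single ground-truth keyword per image.

```python
# Strict vs relaxed annotation accuracy, as defined above (illustrative sketch).

def strict_accuracy(auto_annotations, ground_truth):
    # Correct only if the top-weighted keyword matches the ground truth.
    hits = sum(
        max(ann, key=ann.get) == truth
        for ann, truth in zip(auto_annotations, ground_truth)
    )
    return hits / len(ground_truth)

def relaxed_accuracy(auto_annotations, ground_truth):
    # Correct if the ground-truth keyword appears anywhere in the annotation.
    hits = sum(truth in ann for ann, truth in zip(auto_annotations, ground_truth))
    return hits / len(ground_truth)

# Hypothetical auto-annotations (keyword -> weight) and ground-truth labels.
anns = [{"beach": 0.7, "sea": 0.3}, {"city": 0.4, "night": 0.6}]
truths = ["beach", "city"]
print(strict_accuracy(anns, truths))   # 0.5: 'night' outweighs 'city' in image 2
print(relaxed_accuracy(anns, truths))  # 1.0: 'city' still appears in the annotation
```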
EMERGENT SEMANTICS

Similar to the approach described above, augmenting navigation for collaborative tagging with emergent semantics utilises the tags provided by users on collaborative tagging sites, together with classification techniques and image analysis approaches, for better image retrieval. The aim is to find new relations between data elements. This approach proposes a navigation map system in which the relations between users, tags and data elements are described, and it also tries to address the issues caused by collaborative tagging systems by combining social tagging and data analysis. It differs from the earlier approaches described above, which focus on detecting objects in the images and linking them to tags. The authors propose a new interface in which data can be explored either by tags or by visual features. Users start by searching for images with a tag; matching images are shown in a suggestion display area, and the user can add images to the search collection. This process continues until the system has suggested enough images related to the searched tag. In an example visualisation, a search begins with the tag 'fall'; the interface shows two spheres, a classifier sphere and a tag sphere, and the relations between them indicate which words a user could search for to describe a fall-themed image. For data gathering and evaluation, they use a collection of 3000 images uploaded by 12 random Flickr users, with colour and texture as the visual features. The search for similar images starts with the user entering one or more search tags, and the system suggests some initial images related to them. The overall process continues by suggesting further related tags to the user until the results are refined enough and the user is happy with the outcome. The advantage of the system is that it suggests tags that are closely related to the images but, depending on the users and the amount of information stored in the database, are not often searched.

NEIGHBOUR VOTING

The intuition behind neighbour voting is that amateur tagging (in other words, the tagging of images by internet users, for example on Flickr) is uncontrolled, ambiguous and personalised. One major problem in collaborative tagging, therefore, is measuring how relevant the user-specified tags are to the images they describe. Neighbour voting is one of the proposed techniques [14] that learn the relevance of tags by accumulating votes from neighbour images. Collaborative tags can be ambiguous and highly personalised; for an image to be found by a search query, common sense says the object or content must be recognisable by most users. An example in [14] is submitting the query 'airplane' to Flickr: some of the returned pictures, such as a view of an airplane interior on its own, are subjective, because the airplane concept in them cannot easily be recognised by ordinary users. Apart from that, individual tags are mostly used once per image, which means relevant and irrelevant tags cannot be distinguished by their frequencies. One of the problems in collaborative tagging, therefore, is the irrelevance of some images' keywords. There are methods that use machine learning techniques to find the relation between low-level visual features and the high-level semantic concepts of the keywords. [11][12][13] The disadvantage of these techniques is that 'compared to a potentially unlimited vocabulary existed in social networks, currently only a limited number of visual concepts can be effectively modelled using small-scale clusters.' [14] If different people label visually similar images with the same tags, those tags reflect the objective aspects of the images. The intuition behind this approach, therefore, is that we can infer the relevance of a tag to an image from the relevance of that tag to neighbour images. Each tag is assigned a relevance value estimated by adding up the neighbour votes; in the example in [14], the voting value for the word 'bridge' is 5, which indicates the relevance of the image to 'bridge'.
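A minimal sketch of the neighbour voting intuition, not the exact estimator of [14]: a tag's relevance to an image is the number of visually similar neighbour images that also carry that tag. The features, vocabulary and neighbourhood size are all hypothetical.

```python
# Neighbour-voting sketch: estimate tag relevance from visually similar images.
import numpy as np

def tag_relevance(query_idx, feats, tag_sets, k=10):
    # Find the k visually nearest neighbours (Euclidean distance, for simplicity),
    # skipping the query image itself at rank 0.
    dists = np.linalg.norm(feats - feats[query_idx], axis=1)
    neighbours = np.argsort(dists)[1 : k + 1]
    # Each tag on the query image collects one vote per neighbour sharing it.
    return {t: sum(t in tag_sets[i] for i in neighbours) for t in tag_sets[query_idx]}

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))            # hypothetical visual features
vocab = ["bridge", "water", "me", "holiday"]  # mix of objective/subjective tags
tag_sets = [set(rng.choice(vocab, size=2, replace=False)) for _ in range(100)]

votes = tag_relevance(0, feats, tag_sets, k=10)
print(votes)  # higher counts suggest more objective, image-relevant tags
```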
This tag relevance voting system has been applied to one million tagged images from Flickr, containing 227,658 unique tags. [14] To measure accuracy, the authors had to establish their own ground truth, because no other research had been done on the data. The idea is to use 10 queries consisting of 8 objective and 2 subjective concepts, with 1000 examples selected for each concept. For the experiments, they index the original image tags in a baseline system and the same tags, weighted by the learned tag relevance, in a new system, and then compare the two. The experiments first run queries with single words and then with multiple keywords. The results for single-word queries are very encouraging: the neighbour voting system outperforms the baseline, and the same holds for multi-word queries. Comparing the accuracy of the baseline with the neighbour voting system, on average 13% higher accuracy is achieved using neighbour voting. The advantages of using a relevance tagging system are reliability, scalability and flexibility.

ONTOLOGIES

Another area of research proposes a combination of collaborative tagging and semantic web technologies. This approach [15] combines low-level features with high-level descriptions created by content creators and a set of tags created by end users. The intuition behind this research is that although collaborative tagging and ontologies are different things, they are complementary; the hypothesis is that combining the two will improve the information retrieval process. A description has three levels. The first is the low-level descriptions, which are automatically extracted from the resources, for example the resolution, the date of creation or the dominant colour; these form the core ontologies. The high-level descriptions are those provided by the creator of the image, and they can be linked to domain-specific ontology concepts. The problem with the high-level descriptions is that they are more specialised than the level of knowledge an end user has. In this approach, a set of tags can therefore be assigned to the images by end users in a dynamic way.
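A toy sketch of how the three description levels could sit together in one retrievable resource; the class, field names and matching logic are assumptions for illustration, not the actual model of [15].

```python
# Toy model combining low-level features, creator descriptions, and user tags.
from dataclasses import dataclass, field

@dataclass
class ImageResource:
    low_level: dict                 # auto-extracted: resolution, date, dominant colour
    creator_terms: set              # high-level, linkable to domain ontology concepts
    user_tags: set = field(default_factory=set)  # dynamic end-user tags

    def matches(self, query: str) -> bool:
        q = query.lower()
        return (
            q in (str(v).lower() for v in self.low_level.values())
            or q in (t.lower() for t in self.creator_terms)
            or q in (t.lower() for t in self.user_tags)
        )

img = ImageResource(
    low_level={"dominant_colour": "orange", "resolution": "1024x768"},
    creator_terms={"Sunset", "Seascape"},  # specialised creator vocabulary
)
img.user_tags |= {"beach", "holiday"}      # end users add tags dynamically

# All three description levels contribute to retrieval: True True True
print(img.matches("sunset"), img.matches("holiday"), img.matches("orange"))
```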
REFERENCES

[1] Swets, D.L., Weng, J.J.: Efficient Content-Based Image Retrieval using Automatic Feature. IEEE 1995.
[2] Uchihashi, S., Kanade, T.: Content-Free Image Retrieval Based on Relations Exploited from User Feedbacks. IEEE 2005.
[3] Zhang, C., Chai, J.Y., Jin, R.: User Term Feedback in Interactive Text-based Image Retrieval. SIGIR '05, August 15–19, 2005, Salvador, Brazil.
[4] Marin, F., Nozha, B., Michel, C.: Semantic interactive image retrieval combining visual and conceptual content description. Multimedia Systems 13, 309–322 (2008).
[5] Zhao, R., Grosky, W.I.: Narrowing the semantic gap—improved text based web document retrieval using visual features. IEEE Trans. Multimedia 4(2), 189–200 (2002).
[6] Mori, Y., Takahashi, H., Oka, R.: Image-to-word transformation based on dividing and vector quantizing images with words. Proc. Int. Workshop on Multimedia Intelligent Storage and Retrieval Management (1999).
[7] Goh, K., Chang, E.: Using one-class and two-class SVMs for multiclass image annotation. IEEE Transactions on Knowledge and Data Engineering 17(10), 1333–1346 (2005). doi:10.1109/TKDE.2005.170.
[8] Yap, K.H., Wu, K., Zhu, C.: Knowledge Propagation in Collaborative Tagging for Image Retrieval. Journal of Signal Processing Systems 59(2), 163–175. Springer (2010).
[9] Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. Proc. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 119–126 (2003).
[10] Chang, E., Kingshy, G., Sychay, G., Wu, G.: CBSA: Content-based soft annotation for multimodal image retrieval using Bayes point machines. IEEE Transactions on Circuits and Systems for Video Technology 13(1), 26–38 (2003). doi:10.1109/TCSVT.2002.808079.
[11] Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M., Jordan, M.I.: Matching words and pictures. JMLR, pp. 1107–1135 (2003).
[12] Jin, Y., Khan, L., Wang, L., Awad, M.: Image annotations by combining multiple evidence & WordNet. ACM Multimedia, pp. 706–715 (2005).
[13] Li, J., Wang, J.Z.: Real-time computerized annotation of pictures. TPAMI 30(6), 985–1002 (2008).
[14] Li, X., Snoek, C.G.M., Worring, M.: Learning tag relevance by neighbor voting for social image retrieval. Proc. 1st ACM International Conference on Multimedia Information Retrieval, pp. 180–187 (2008).
[15] Emilio, J., Gayo, L., De Pablos, P.O., Manuel, J., Lovelle, C.: Combining Collaborative Tagging and Ontologies in Image Retrieval Systems. Citeseer.
