Multimedia Data Fusion Report Example | Topics and Well Written Essays

MULTIMEDIA DATA FUSION

Multimedia Data Fusion:
Multimedia data fusion is the process of combining different features of multimedia in order to analyze specific media tasks. The process is also known as multimodal fusion. To understand the data well, this multimodal data must undergo multimedia analysis. The most common examples of multimedia analysis are semantic concept detection, audio-visual speaker detection, human tracking, and event detection. In such cases, the multimedia data used can be either sensory or non-sensory. Examples of sensory data are audio, video, and RFID, while non-sensory data include online resources such as databases and WWW resources (Xie & Guan, 2013).

The aim of fusion is to improve quality: multimedia analysis fuses the available modalities so that the output is more accurate and the decision-making process is more reliable. A good example is the use of audio features together with visual features and text input when analyzing a sporting event captured on video. Note, however, that fusion increases cost and makes the analysis system more complex (Zhou, Leung & Yao, 2013).

Fig 1: Multimedia fusion frame. Source: http://www.indiedb.com/engines/multimedia-fusion/images/multimedia-fusion-frame-editor

Several characteristics of the media must also be kept in mind:
i) the media vary in format and rate, so a video can be captured at a rate different from the audio and the two can have different frame rates;
ii) the media streams have different processing times, and the chosen fusion strategy has to account for that;
iii) the modalities may be correlated or independent, and they vary in their confidence level for accomplishing the task; and
iv) the fusion process must take into account the cost of capturing and processing the media.
The Level of Multimodal Fusion:
Multimodal fusion is defined as the combination of several multimedia sources and their features to improve analysis performance. The levels of multimodal fusion fall into three classes: feature level (early fusion), decision level (late fusion), and a combination of the two, referred to as hybrid fusion.

1. Feature level multimodal fusion
Also called early fusion, this approach extracts suitable features from the input data, combines them, and forwards the result to a single analysis unit (AU) that carries out the analysis. Each media stream has distinct features with varying properties. A good example is feature fusion that combines multimodal features such as skin color and motion cues. The combined features are then resolved into a single semantic-level decision (Klein, 2004).

Fig 2: Analysis unit. Source: (Atrey, Hossain, El Saddik & Kankanhalli, 2010)

Several features exist and can be combined to create the desired outcome. Examples are:
i) visual features, which can be based on color, texture, and shape;
ii) text features, which can be extracted from ASR, OCR, video closed-caption text, and production metadata;
iii) audio features, which are normally generated from the FFT or MFCC, together with features such as ZCR, LPC, volume standard deviation, non-silence ratio, and pitch;
iv) motion features, which are frequently represented as kinetic energy and measure pixel variation within a shot, motion direction and magnitude histograms, optical flow, and motion patterns in specific directions; and
v) metadata, which supplement the data during the production process, for example the time stamp, name, image (video) source, and shot location (Maybury, 2012).

2. Decision level multimodal fusion
Sometimes called late fusion, this approach has its analysis units first produce local decisions D1 to Dn, each obtained from an individual feature F1 to Fn. A decision fusion (DF) unit then combines the local decisions into a fused decision vector, which may be analyzed further to obtain the final decision D for the task (Atrey, Hossain, El Saddik & Kankanhalli, 2010).

Fig 3: Decision level multimodal fusion. Source: (Atrey, Hossain, El Saddik & Kankanhalli, 2010)

3. Hybrid level multimodal fusion
Hybrid level multimodal fusion aims to obtain the advantages of both feature level and decision level fusion. Some features are first fused by a feature fusion (FF) unit, and the resulting vector is analyzed by an AU. At the same time, other individual features are analyzed by separate AUs, and their decisions are fused by a DF unit (Sharma & Kaur, 2013). The decisions obtained from all these stages are then fused once more to produce the final decision.

Fig 4: Hybrid level multimodal fusion. Source: (Atrey, Hossain, El Saddik & Kankanhalli, 2010)

Technical Review in Fusion Text and Image for Mining in Social Media
The increased use of social media such as Twitter, Facebook, and Instagram has increased the volume of data that must be analyzed and mined. Social network content is multimedia, chiefly images and text. To extract this information from Twitter, text mining techniques are used to detect the content of messages automatically. Once tweets have been collected, they are filtered to keep only the English ones, although some Spanish and Dutch tweets still remain at this stage (He, Zha & Li, 2013).
The remaining tweets are then processed in four steps:
i) tokenizing, which converts each string into tokens split on whitespace and removes punctuation;
ii) stop-word filtering, which eliminates common words that carry little meaning;
iii) stemming, which reduces words to their stems by removing suffixes and prefixes; and
iv) indexing, which uses TF-IDF to weight features by how often each word occurs in a single tweet compared with how many tweets in the collection contain it (Sun, Wang, Cheng & Fu, 2014).

To make image mining accurate, feature vectors that describe color and texture can be used, including the histogram of oriented gradients (HOG) and the grey-level co-occurrence matrix (GLCM). HOG descriptors are used in computer vision and image processing for object detection; they rely on the fact that the appearance and shape of an object can be characterized by its intensity gradients. GLCM is important for texture description and is mostly applied to measuring surface textures. Fusion of text and image is then achieved simply by combining the image features and the text features (Kompatsiaris & Hobson, 2008). Note that in this fusion method, when the text mining score falls below the threshold, the text result cannot be relied on, so the tweet is classified using the image alone, and vice versa (Alqhtani, Luo & Regan, 2015).

The Internet increases the demand for digital multimedia information. The most common information is images and text, at times both together. Fusing features in multimedia therefore takes one of two paths, late fusion or early fusion. Late fusion works on multiple features separately and applies the fusion strategy to the different candidate results, though correlation between the original features may cause this strategy to underperform. Early fusion improves similarity evaluation by mapping the different features into a unified space.
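The two paths can be contrasted in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the default threshold of 0.5, and the choice to average the scores when both modalities pass are hypothetical and not taken from the cited work. Early fusion is shown as plain concatenation into one vector, and late fusion applies the threshold fallback described above, in which a modality whose score is below the threshold is ignored.

```python
from typing import List, Sequence, Tuple


def early_fusion(text_features: Sequence[float],
                 image_features: Sequence[float]) -> List[float]:
    """Early fusion: place both feature vectors in one unified vector."""
    return list(text_features) + list(image_features)


def late_fusion(text_score: float,
                image_score: float,
                threshold: float = 0.5) -> Tuple[float, str]:
    """Late fusion of two per-modality scores with a threshold fallback.

    Returns the fused score and the modality (or modalities) it is based on.
    Scores are assumed to lie in [0, 1].
    """
    text_ok = text_score >= threshold
    image_ok = image_score >= threshold
    if text_ok and not image_ok:
        return text_score, "text"      # image score too weak: text only
    if image_ok and not text_ok:
        return image_score, "image"    # text score too weak: image only
    # Assumption: when both (or neither) pass, average the two scores.
    return (text_score + image_score) / 2.0, "both"
```

For instance, `late_fusion(0.25, 0.75)` falls back to the image score alone, while `early_fusion` leaves the decision to whatever single classifier consumes the joined vector.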
Early fusion, however, is costly because its unified feature space is built from global statistical information. Creating a large database with this technique is expensive given the diversity of web content in social media such as Twitter (Liu & Qin, 2014). Neither early nor late fusion has been able to satisfy the requirements of social media multimedia data fusion, which has led to the adoption of other methods considered more reliable, such as the feature interaction graph (FIG).

A social media database, defined as a set of multimodal multimedia objects, can be represented as $D = \{O_i \mid i = 1, 2, \ldots, |D|\}$. The media content can be classified into textual features, visual content features, and user features. Textual features deal with text; visual content features deal with images, especially color, texture, edges, and visual words; and user features deal with images of people in particular (Naaman, 2010). A multimodal object can therefore be represented as $O = [T, V, U]$.

Multimodal fusion takes forms that range from the signal level to the semantic level. Signal-level fusion is applicable to audio-visual techniques, especially speech recognition, and to robotics, where image data may be combined with input from other sensors. Semantic fusion, by contrast, integrates the meaning of different multimodal inputs (Naaman, 2010).

Dempster-Shafer Data Fusion Theory
Dempster-Shafer theory focuses on belief, unlike Bayes theory, which focuses on probability. Dempster-Shafer evidence theory offers an alternative to traditional probability theory for the mathematical representation of uncertainty. Its first advantage is its ability to deal with a lack of information (ignorance) and with missing details in the data. Its second advantage is its ability to deal with unions of classes.
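Both advantages can be made concrete with a small mass assignment. The sketch below is illustrative only: the frame of discernment (`concert`, `protest`, `sports`), the numbers, and the helper names are hypothetical. It assumes the standard requirements on a mass function, namely that no mass goes to the empty set and that the masses sum to 1. Mass placed on a set of several classes expresses a union of classes, and mass placed on the whole frame expresses ignorance.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

# Hypothetical frame of discernment: the kinds of event a tweet may report.
FRAME = frozenset({"concert", "protest", "sports"})


def is_valid_mass(m: Mass, frame: FrozenSet[str], tol: float = 1e-9) -> bool:
    """Check that every focal set is a non-empty subset of the frame,
    that no mass is negative, and that the masses sum to 1."""
    for focal, value in m.items():
        if not focal or not focal <= frame or value < 0:
            return False
    return abs(sum(m.values()) - 1.0) <= tol


def ignorance(m: Mass, frame: FrozenSet[str]) -> float:
    """Mass left on the whole frame, i.e. committed to no class at all."""
    return m.get(frame, 0.0)


evidence: Mass = {
    frozenset({"concert"}): 0.5,             # a single class
    frozenset({"protest", "sports"}): 0.25,  # a union of classes
    FRAME: 0.25,                             # ignorance
}
```

A Bayesian model would have to split the last two masses among individual classes; here they can stay undivided until more evidence arrives.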
Dempster-Shafer theory is a mathematical theory of evidence, normally used when observations from varying sources are combined to give a degree of belief that takes all the presented evidence into account (Li, Luo & Jin, 2010). Because the theory depends on masses, its first requirement is that masses be assigned meaningfully to the various states. The theory also requires whatever prior information is available at the time, and the masses should be assigned so that they reflect the knowledge of the system. It is better and safer to remain undecided than to choose the wrong target and act in a way that may have severe consequences (Zhang, Liu & Zhang, 2014).

The principle of operation is that the degree of belief for a given question can be obtained from subjective probabilities for another, related question. Dempster's rule is then used to combine such degrees of belief when they are based on independent items of evidence (Li, Zheng, Xuan & Jiang, 2014). Dempster's rule of combination is sometimes regarded as roughly similar to Bayes' rule, although it does not require priors and conditionals to be specified. Bayesian theory relies on symmetry and error arguments to assign probabilities to random variables. Dempster-Shafer theory instead allows one to specify a degree of ignorance in such situations, rather than being forced to supply probabilities that add up to unity. The aim of the theory is to decompose the evidence so that probability judgments are made separately on each component of the evidence and are then combined by Dempster's rule. The rule thus combines parallel belief functions, which may be unrelated, into a pooled belief function that is the orthogonal sum of the two.
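A minimal sketch of Dempster's rule of combination follows, assuming each source supplies a mass function over subsets of the same frame. In its standard form, the products of masses are accumulated on the intersections of the focal sets, the mass that falls on empty intersections is treated as conflict K, and the result is normalized by 1 − K. The function name, the two "sensors", and their numbers are hypothetical.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule."""
    pooled: Mass = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            common = set_a & set_b
            if common:
                # Agreeing evidence accumulates on the intersection.
                pooled[common] = pooled.get(common, 0.0) + mass_a * mass_b
            else:
                # Contradictory evidence counts toward the conflict K.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    scale = 1.0 - conflict
    return {focal: value / scale for focal, value in pooled.items()}


# Hypothetical example: two sources report on whether a target is fast.
fast = frozenset({"fast"})
slow = frozenset({"slow"})
either = fast | slow                   # the whole frame, i.e. ignorance

sensor_a: Mass = {fast: 0.5, either: 0.5}
sensor_b: Mass = {slow: 0.5, either: 0.5}
fused = combine(sensor_a, sensor_b)
```

Here a quarter of the joint mass is conflict (one source says fast, the other slow), so the remaining three quarters are rescaled and `fused` assigns one third each to `fast`, `slow`, and `either`.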
Dempster-Shafer theory contains two notions that are absent from Bayes theory: support and plausibility. The support for a target being "fast" is defined as the total mass of all states that imply the "fast" state. Support is another name for belief, abbreviated Bel; it measures how strongly the evidence favors a proposition p and ranges from 0 (no evidence) to 1 (certainty) (Rottensteiner, Trinder, Clode & Kubik, 2005). Plausibility is defined as $\mathrm{Pl}(p) = 1 - \mathrm{Bel}(\neg p)$; it also ranges from 0 to 1 and measures how much room the evidence for $\neg p$ leaves for belief in p.

\[ \mathrm{spt}(A) = \sum_{B \subseteq A} m(B) \tag{1} \]

Support is a loose lower limit on the uncertainty, while plausibility is a loose upper limit. The plausibility of the "fast" state is the total mass of all states that do not contradict it (Li, Zheng, Xuan & Jiang, 2014).

\[ \mathrm{pls}(A) = \sum_{A \cap B \neq \emptyset} m(B) \tag{2} \]

It follows that belief ≤ plausibility, that is, $\mathrm{spt}(A) \le \mathrm{pls}(A)$.

Data fusion is a relatively new field, and most of its methods are still regarded as unreliable. Dempster-Shafer theory, though also relatively new, is more reliable than the other data fusion methods.

Dempster-Shafer theory calculations: [equations not preserved], where A is the number of power sets in the equation and |A − B| is the difference of the two sets (Paksoy & Göktürk, 2011).

Event Detection in Twitter
Twitter has created a platform where people share, among other things, real-life events as they happen. Because most tweets are meaningless, however, there is a need for a mechanism that detects important shared events in near real time (Mao, 2012). Events that are tweeted about, such as concerts, disasters, sports events, public celebrations, or protests, should be detected directly by such software.
These events can be presented online as text data, image data, or both, so the technology should be able to tell the cases apart and detect all of them (Atefeh & Khreich, 2013). According to Mungro (2014), the systems meant to detect online events have never been perfected, which leads them to raise alarms when none is necessary. He points out that although four new detection methods were added to the original method, Netevmon, not all of them are worth relying on. He also points out, however, that Dempster-Shafer is a promising fusion method that might need only slight improvements before it can be trusted to raise the right alarms at the right time.

A newer method, based on noisy hydrophone data sequences, is used to detect and extract images from Twitter events. The method depends on a very large orientation and a robust reconstruction that relies on mutual information (MI) to measure the images. The resulting spectrogram image map provides segments of the various acoustic events (Zhou & Chen, 2013).

Fig 5: Image data. Source: (Huang, 2012)
Fig 6: Block diagram of text data. Source: (Huang, 2012)

References
Alqhtani, S. M., Luo, S., & Regan, B. (2015). Fusing text and image for event detection in Twitter. The International Journal of Multimedia & Its Applications, 7(1), 27-35. doi:10.5121/ijma.2015.7103
Atefeh, F., & Khreich, W. (2013). A survey of techniques for event detection in Twitter. Computational Intelligence, 31(1), 132-164. doi:10.1111/coin.12017
Atrey, P., Hossain, A., El Saddik, A., & Kankanhalli, M. (2010). Multimodal fusion for multimedia analysis: A survey. Multimedia Systems, 16(6), 345-379. doi:10.1007/s00530-010-0182-0
He, W., Zha, S., & Li, L. (2013). Social media competitive analysis and text mining: A case study in the pizza industry. International Journal of Information Management, 33(3), 464-472. doi:10.1016/j.ijinfomgt.2013.01.001
Huang, R. (2012). Active media technology. Heidelberg: Springer.
Klein, L. (2004). Sensor and data fusion. Bellingham, Wash.: SPIE Press.
Kompatsiaris, Y., & Hobson, P. (2008). Semantic multimedia and ontologies. London: Springer.
Li, J., Luo, S., & Jin, J. (2010). Sensor data fusion for accurate cloud presence prediction using Dempster-Shafer evidence theory. Sensors, 10(10), 9384-9396. doi:10.3390/s101009384
Li, J., Zheng, C., Xuan, H., & Jiang, Y. (2014). Data fusion in environment monitoring systems with extended Dempster-Shafer theory. Applied Mechanics and Materials, 543-547, 1074-1077. doi:10.4028/www.scientific.net/amm.543-547.1074
Liu, T., & Qin, H. (2014). Detecting and tagging users' social circles in social media. Multimedia Systems. doi:10.1007/s00530-014-0435-4
Mao, J. (2012). Multimodal data fusion as a predictor of missing information in social networks.
Maybury, M. (2012). Multimedia information extraction. Hoboken, N.J.: Wiley.
Mungro, M. (2014). Rating the significance of the detected network events. Department of Computer Science, Hamilton, New Zealand.
Naaman, M. (2010). Social multimedia: Highlighting opportunities for search and mining of multimedia data in social media applications. Multimedia Tools and Applications, 56(1), 9-34. doi:10.1007/s11042-010-0538-7
Paksoy, A., & Göktürk, M. (2011). Information fusion with Dempster-Shafer evidence theory for software defect prediction. Procedia Computer Science, 3, 600-605. doi:10.1016/j.procs.2010.12.100
Rottensteiner, F., Trinder, J., Clode, S., & Kubik, K. (2005). Using the Dempster-Shafer method for the fusion of LIDAR data and multi-spectral images for building detection. Information Fusion, 6(4), 283-300. doi:10.1016/j.inffus.2004.06.004
Sharma, P., & Kaur, M. (2013). Multimodal classification using feature level fusion and SVM. International Journal of Computer Applications, 76(4), 26-32. doi:10.5120/13236-0670
Sun, J., Wang, G., Cheng, X., & Fu, Y. (2014). Mining affective text to improve social media item recommendation. Information Processing & Management. doi:10.1016/j.ipm.2014.09.002
Xie, Z., & Guan, L. (2013). Multimodal information fusion of audiovisual emotion recognition using novel information theoretic tools. International Journal of Multimedia Data Engineering and Management, 4(4), 1-14. doi:10.4018/ijmdem.2013100101
Zhang, Z., Liu, T., & Zhang, W. (2014). Novel paradigm for constructing masses in Dempster-Shafer evidence theory for wireless sensor network's multisource data fusion. Sensors, 14(4), 7049-7065. doi:10.3390/s140407049
Zhou, S., Leung, H., & Yao, F. (2013). Multimedia data fusion. Mathematical Problems in Engineering, 2013, 1-3. doi:10.1155/2013/586259
Zhou, X., & Chen, L. (2013). Event detection over Twitter social media streams. The VLDB Journal, 23(3), 381-400. doi:10.1007/s00778-013-0320-3
