Event Detection and Prediction Based on Social Multimedia Data Literature review Example | Topics and Well Written Essays

Data Mining

3rd March, 2014

Literature Review

Twitter and other social media platforms have changed drastically in the recent past, with millions of users turning to them to chat, exchange ideas and share stories. As a result, these platforms have become a rich source for mining news, events and information. However, because of the sheer volume of information, data mining on Twitter is a complicated venture that requires considerable skill and knowledge of appropriate mining methods. Twitter and other social media sites carry heavy, bursty traffic (Memon, 2010). For instance, Twitter receives over 80 million tweets a day, which amounts to billions of tweets per month. Event detection and prediction therefore require complex algorithms that process text and images through keyword matching (Russell, 2011). The requisite skills include feature extraction and knowledge of the algorithms that can be deployed to mine text and image data. Several data mining tools and algorithms have been developed specifically to analyse such data, and researchers have devised techniques for mining social media using different types of algorithms (Liu, 2010). For instance, mining tools such as RDataMining let us target specific keywords to mine events and other data from Twitter streams. Several techniques can be used to mine text and multimedia data in social media channels (Liu, 2012). One important application of data mining in social media is event detection on Twitter and similar channels. Event detection and prediction using different data mining techniques and algorithms is common and growing within the social media sphere (Ting, 2012).
Techniques such as mining geo-tagged tweets and geo-tagged tweet photos have been used in countries such as Japan and Singapore to identify events such as typhoons and floods. These techniques have been successful in finding information about such events. They match keywords within bursts of Twitter streams to identify events (Zafarani, 2014), and the matching tweets are then grouped into databases where they are analysed. The mining process involves searching for keywords, focusing on words whose frequency rises sharply, because these signal events. The event keywords are then unified, and the geo-tagged photos that correspond to the keywords are clustered and grouped together. Each photo is matched against the detected events and shown on a map. In event detection and prediction it is important to account for variable factors such as different languages and locations. The proposed tool will address the gap in tasks that require rapid event detection and prediction in Twitter and social media circles (Stefanidis, 2013, pp. 321-327). This project examines data mining for event prediction and detection using two extraction algorithms, one extracting the text stream and the other the photo stream from Twitter. The lack of an accurate event detection and prediction method for social media and Twitter mining is a problem that needs a solution (Marseken, 2011). Combining the Term Frequency-Inverse Document Frequency (TF-IDF) method and the Scale-Invariant Feature Transform (SIFT) algorithm might therefore yield better results in event prediction. Both algorithms have been effective in mining and obtaining information about different events.
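The burst-then-group process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' method. Tweets are assumed to arrive already split into time windows. The "burst" rule (latest count above mean + k × standard deviation of earlier windows, with k = 2.0) is an illustrative choice. The helper names `bursty_keywords` and `group_photos` and the photo fields `caption`, `lat` and `lon` are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def bursty_keywords(windows, keywords, k=2.0):
    """Return the keywords whose tweet count bursts in the latest window.

    windows: time windows ordered oldest first, each a list of tweet strings.
    A keyword is flagged when its count in the last window exceeds
    mean + k * population std of its counts in the earlier windows.
    The threshold rule and k=2.0 are illustrative assumptions.
    """
    if len(windows) < 2:
        raise ValueError("need at least one history window plus the latest window")
    counts = []
    for window in windows:
        c = Counter()
        for tweet in window:
            tokens = set(tweet.lower().split())  # naive whitespace tokenizer
            c.update(kw for kw in keywords if kw in tokens)
        counts.append(c)
    history, latest = counts[:-1], counts[-1]
    bursts = []
    for kw in keywords:
        past = [c[kw] for c in history]
        threshold = mean(past) + k * pstdev(past)
        if latest[kw] > threshold:
            bursts.append(kw)
    return bursts


def group_photos(photos, event_keywords):
    """Group geo-tagged photos by the event keywords found in their captions.

    photos: dicts with hypothetical keys "caption", "lat" and "lon".
    Returns {keyword: [(lat, lon), ...]}, ready to plot on a map.
    """
    groups = defaultdict(list)
    for photo in photos:
        tokens = set(photo["caption"].lower().split())
        for kw in event_keywords:
            if kw in tokens:
                groups[kw].append((photo["lat"], photo["lon"]))
    return dict(groups)


windows = [
    ["nice weather today", "coffee time"],
    ["typhoon warning issued", "coffee again"],
    ["typhoon hits coast", "typhoon flooding roads", "stay safe typhoon"],
]
events = bursty_keywords(windows, ["typhoon", "flood", "coffee"])  # ["typhoon"]
photos = [
    {"caption": "Typhoon damage downtown", "lat": 35.68, "lon": 139.69},
    {"caption": "Lunch with friends", "lat": 1.35, "lon": 103.82},
]
clusters = group_photos(photos, events)  # {"typhoon": [(35.68, 139.69)]}
```

The keyword "flood" is not flagged here because exact token matching does not link "flooding" to "flood". A practical system would need stemming, punctuation handling and support for the different languages and locations mentioned above.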
The TF-IDF algorithm is effective for extracting text from Twitter streams (Ting, 2012). It uses a numerical statistic to reflect how important a word is to a particular tweet within a collection of tweets. Its advantage for information retrieval is that the TF-IDF value increases proportionally with the number of times a keyword appears in a tweet. That value is offset by the number of tweets in the collection that contain the keyword, so words that are common everywhere receive low weights (Liu, 2010). Text mining on Twitter is a complicated process that requires powerful tools and algorithms to mine, retrieve and store keywords. Given the huge number of tweets, TF-IDF serves as a weighting scheme whose central use is scoring and ranking tweets by their relevance to a user query. The TF-IDF algorithm combines term frequency and inverse document frequency (Bramer, 2007). On Twitter, the term frequency counts how often each specified keyword occurs, with the counts summed as the keywords are found. The inverse document frequency is computed over all the tweets in the collection. It measures how much information the keyword provides, that is, whether the keyword is common or rare across all tweets (Bramer, 2007). TF-IDF is therefore effective at surfacing the distinctive terms used in event detection and prediction (Wang, 2010, pp. 89-95). The Scale-Invariant Feature Transform (SIFT) is an algorithm that will be deployed to extract features from images posted on Twitter. SIFT has been used in the past to recognise objects in images drawn from databases or the internet.
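The TF-IDF weighting described above can be illustrated with a small, self-contained sketch. It uses one common textbook form: the weight of term t in tweet d is tf(t, d) × log(N / df(t)). Here tf is the term's share of the tweet's tokens, N is the number of tweets and df(t) is the number of tweets containing t. Libraries often use smoothed variants, and the function name `tf_idf` and the whitespace tokenizer are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(tweets):
    """Return one {term: weight} dict per tweet.

    tf(t, d) = occurrences of t in tweet d / number of tokens in d
    idf(t)   = log(N / df(t)), N = number of tweets, df(t) = tweets containing t
    """
    docs = [tweet.lower().split() for tweet in tweets]  # naive tokenizer
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per tweet
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


tweets = ["flood in the city", "the city is quiet", "the flood is rising"]
weights = tf_idf(tweets)
# "the" occurs in every tweet, so idf = log(3/3) = 0 and its weight is 0.0.
# "in" occurs only in the first tweet, so it outweighs "flood" (in two tweets).
```

The example shows the behaviour the paragraph relies on: a word that appears in every tweet contributes nothing, while rarer words are up-weighted. It also shows why stop-word removal is usually applied first, since a rare function word such as "in" can otherwise score highly.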
SIFT has been used to extract key photos and scenes from bursts of information on Twitter and other social media sites (Wang, 2010, p. 96). The SIFT algorithm relies on feature generation, which transforms an image into a set of feature vectors (Marseken, 2011). These vectors are invariant to image translation, scaling and rotation, and partially invariant to illumination changes. For accurate extraction, SIFT uses feature matching and indexing, cluster identification and outlier detection, and these mechanisms support the implementation of image detection and matching. Feature matching and indexing uses SIFT keys to find matches between a new image detected in the Twitter stream and previously identified images (Schmidt, 2012, p. A31). Each candidate match is evaluated by the ratio of the distance to the closest neighbour to the distance to the second-closest neighbour. Matches with a distance ratio greater than 0.8 are rejected. This eliminates about 90% of false matches while discarding fewer than 5% of correct ones, which makes the method robust for object identification. Another mechanism used in photo detection is cluster identification with the Hough transform, which finds clusters of keys that agree on the same object and pose in the target image (Kirsch, 2010, pp. 88-90). For instance, a cluster of features from one image is compared with other images to look for a match. These algorithms will be enhanced by colour histograms, which store the colour features of an image (Zafarani, 2014). A colour histogram represents an image by the distribution of its pixels over a colour space. That space may be RGB, rg chromaticity or any other colour space of any dimension. The histogram is produced by first discretizing the colours in the image into a number of bins and then counting the number of pixels that fall into each bin (Stefanidis, 2013).
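Two steps from the paragraph above can be sketched in pure Python: the distance-ratio test used when matching SIFT keys, and a binned colour histogram. This is a minimal illustration, not a full SIFT pipeline. It assumes descriptor vectors have already been computed by some SIFT implementation (not shown). The brute-force nearest-neighbour search and the 4-bins-per-channel quantisation are illustrative choices. Histogram intersection, used here to compare histograms, is a common measure that the text does not name. The function names are hypothetical, and Hough-transform clustering and outlier detection are omitted.

```python
import math
from collections import Counter


def ratio_test_matches(query, database, ratio=0.8):
    """Match descriptors with a nearest-neighbour distance-ratio test.

    For each query descriptor, find the nearest (d1) and second-nearest (d2)
    database descriptors by Euclidean distance and keep the match only when
    d1 / d2 is below `ratio`; ambiguous matches are discarded.
    Returns a list of (query_index, database_index) pairs.
    """
    if len(database) < 2:
        return []
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


def colour_histogram(pixels, bins_per_channel=4):
    """Normalised RGB colour histogram: {(r_bin, g_bin, b_bin): fraction}.

    Each 0-255 channel is discretized into `bins_per_channel` equal bins,
    then the pixels falling into each 3-D bin are counted.
    """
    width = 256 / bins_per_channel
    counts = Counter(tuple(int(ch // width) for ch in px) for px in pixels)
    return {b: c / len(pixels) for b, c in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of per-bin minima of two normalised histograms."""
    return sum(min(v, h2.get(b, 0.0)) for b, v in h1.items())


# Toy 2-D descriptors; real SIFT descriptors are 128-dimensional.
query_desc = [[0.0, 0.0], [5.0, 5.0]]
db_desc = [[0.1, 0.0], [3.0, 3.0], [5.0, 5.5], [5.5, 5.0]]
matches = ratio_test_matches(query_desc, db_desc)  # [(0, 0)]
# The second query is equidistant from two database vectors, so it is rejected.

red_image = [(250, 10, 10)] * 3 + [(10, 10, 250)]
other_image = [(240, 20, 20)] * 2 + [(20, 240, 20)] * 2
similarity = histogram_intersection(
    colour_histogram(red_image), colour_histogram(other_image)
)  # 0.5
```

The ratio test discards a match when the best candidate is not clearly better than the runner-up. This is what makes it effective at removing false matches in cluttered social media photos.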
These algorithms will be deployed for event detection and prediction because combining them is expected to outperform text mining alone. Combining text and multimedia mining is more accurate in pinpointing events in social media and Twitter streams, because supplementing text mining with image mining strengthens event detection and prediction (Memon, 2010). A tool that combines these algorithms should return fewer, more relevant results for a given set of keywords and images.

References

Bramer, M. (2007). Principles of Data Mining. Manchester: Jones & Bartlett Learning.
Kirsch, S. (2010). “Sustainable Mining.” Dialectical Anthropology, 34(1), pp. 87-93.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. Sydney: Jones & Bartlett Learning.
Liu, H. & Tang, L. (2010). Community Detection and Mining in Social Media. Chicago, IL: John Wiley and Sons.
Marseken, S. (2011). Object Recognition and Categorization: Scale-Invariant Feature Transform. Savannah, GA: Lippincott Williams & Wilkins.
Memon, N., Xu, J. & Hicks, D. (2010). Data Mining for Social Network Data. New York, NY: Routledge.
Russell, M. (2011). Mining the Social Web: Analyzing Data from Facebook, Twitter and Social Media. London: Palgrave.
Schmidt, C. (2012). “TRENDING NOW: Using Social Media to Predict and Track Disasters.” Environmental Health Perspectives, 120(1), pp. A30-A33.
Stefanidis, A., Crooks, A. & Radzikowski, J. (2013). “Harvesting ambient geospatial information from social media feeds.” GeoJournal, 78(2), pp. 319-338.
Ting, I. & Hong, T. (2012). Social Network Mining, Analysis, and Research Trends: A Phenomenal Analysis. Boston, MA: Cengage Learning.
Wang, Z. & Shawe-Taylor, J. (2010). “A kernel regression framework for SMT.” Machine Translation, 24(2), pp. 87-102.
Zafarani, R., Abbasi, M. & Liu, H. (2014). Social Media Mining: An Introduction. Lowell, MA: John Wiley and Sons.
