Android Malware Detection Using Machine Learning or Other Techniques

In an attempt to build a permission-based model, they conducted an experiment using 2,000 Android applications: half were malware applications from over 49 different malware families, and the other half were benign apps. The decrypted manifest file of each application was passed through an APK analyzer, and a permission detector matched its contents against 131 standard Android permissions (Spreitzenbarth, Schreck, Echtler, Arp, and Hoffmann, 2015). Each time a permission was detected, its count was incremented and stored. The stored total for each permission is then used by the feature selection function to rank and select the most relevant features for the permission-based Bayesian classifier (Arp et al., 2014).
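The counting and ranking steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not say which ranking function was used, so the frequency-difference score here is an assumption, and the function names and sample data are hypothetical.

```python
from collections import Counter

def count_permissions(apps, known_permissions):
    """Count how many apps request each known permission."""
    counts = Counter()
    for requested in apps:
        # Only permissions from the standard list are counted.
        for perm in set(requested) & known_permissions:
            counts[perm] += 1
    return counts

def rank_permissions(malware_apps, benign_apps, known_permissions, top_k=3):
    """Rank permissions by how differently malware and benign apps use them.

    Assumption: the score is the absolute difference in request frequency
    between the two classes; the original study may rank differently.
    """
    mal = count_permissions(malware_apps, known_permissions)
    ben = count_permissions(benign_apps, known_permissions)

    def score(perm):
        return abs(mal[perm] / len(malware_apps) - ben[perm] / len(benign_apps))

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(known_permissions, key=lambda p: (-score(p), p))
    return ranked[:top_k]

# Hypothetical toy data: each app is the list of permissions in its manifest.
known = {"SEND_SMS", "INTERNET", "READ_CONTACTS"}
malware = [["SEND_SMS", "INTERNET"], ["SEND_SMS", "READ_CONTACTS"]]
benign = [["INTERNET"], ["INTERNET", "CAMERA"]]
top_features = rank_permissions(malware, benign, known, top_k=2)
```

The selected permissions would then serve as binary features for the classifier; with real data, both classes would contain about 1,000 apps each.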

Intents

In Android, intents are the entities that describe the operation the phone is to perform. Because intents are handled by the system as declared in the Android manifest file, they are a point at which data can be retrieved maliciously. In DroidMat (Wu et al., 2012), the Android APIs, intents, and permissions were drawn from the manifest files and analyzed with a machine learning algorithm, which showed that DREBIN detected the largest amount of malware from the APKs (Arp et al., 2014). According to Gardiner and Nagaraja (2016), large organizations have been compromised by malware, and classified information such as contacts, business plans, and manufacturing designs has been obtained (Narayanan, Chandramohan, Chen, Liu, and Saminathan, 2016).

Classifier-based methods

These methods use previously observed samples to assign labels to new samples. They are a form of supervised learning because each data point in the data set carries a known label, which makes data collection straightforward at any point (Cen, Gates, Si and Li, 2015).

Communication pattern detection

This approach uses a boosted decision tree to identify hosts in two steps: the system first identifies the host’s network and then passes the traffic through all the classifiers. If any classifier outputs a result above the given threshold, the host is viewed as running an application. A random forest classifier then merges the two results using the original application and obtains the true positives (TP) (Gardiner and Nagaraja, 2016).

Application characterization

This consists of API-related permissions, intents, and command-related features extracted from the application and fed into the Eigenspace analysis system. The features obtained from Dalvik executable (DEX) files include calls that identify the subscriber and external commands that are executed. Such data could be intercepted or extracted through system activity and used by an application holding the relevant intent permissions to obtain user data (Chen et al., 2016).

Code hiding

A newer form of malware poses another security problem: developers hide malicious code inside otherwise innocent-looking files. For instance, the GingerMaster malware hides a Bash script in innocent files, such as install.png; the application later uses an algorithm to extract the malicious code and execute it (Apriville et al., 2014).

Payloads

Hiding malicious code in the payload of an application is a new trend: the malicious code is concealed in the main app's resources. This trick lures the user into downloading and installing the hidden application; once installed, the application loads the Bash files, which are then run on the user's phone (Rasthofer, Arzt, and Bodden, 2014).

Features on Resources

Certificates

Certificates can contain malicious scripts that are downloaded together with the given applications. The certificates may prompt the user and gain access to critical information without the user being aware of it, putting users at risk because their private content can be downloaded without their knowledge (Mariconti et al., 2016).

Incognito applications

Detecting an incognito application requires extracting both syntactic and resource-centric features. Such apps are usually stored in the form of APK and DEX files (Vigna et al., 2014), and their code can harm users by accessing their information.

Approaches to Malware Detection

According to Gu et al. (2008), BotMiner is a system that detects infected hosts without prior knowledge. The system separates hosts and groups those with similar characteristics, and it monitors signatures produced by the Snort IDS. Antonakakis et al. (2012) describe a system for identifying previously unseen domain generation algorithms (DGAs) that exploits the fact that DGAs produce a large number of non-existent domain (NXDomain) responses (Suarez-Tangil et al., 2017). The system first clusters these responses and passes them to a filtering step that uses a multiclass variant of the alternating decision tree (ADT) classifier. The system then produces a hidden Markov model (HMM) for every DGA, which can be used to evaluate each domain. Tam et al. (2017) suggest using dynamic analysis to detect malware, with input scripts defined in advance and executed while the device is running. However, knowledgeable adversaries can sidestep this and still trigger malicious behaviors. Dynamic analysis can only detect malicious activity if the malicious code is running during the analysis (Yu, Huang, and Yian, 2014).
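The observation that DGA activity shows up as a large share of NXDomain responses can be illustrated with a small sketch. It only computes a per-host NXDomain ratio and does not reproduce the clustering, ADT filtering, or HMM stages of the system described above. The log format, the threshold, and the minimum query count are all assumptions made for illustration.

```python
from collections import Counter

def suspicious_hosts(dns_log, threshold=0.5, min_queries=3):
    """Return hosts whose share of NXDOMAIN responses exceeds `threshold`.

    `dns_log` is assumed to be an iterable of (host, domain, rcode) tuples,
    where rcode is a string such as "NOERROR" or "NXDOMAIN".
    Hosts with fewer than `min_queries` queries are ignored as too noisy.
    """
    totals, failures = Counter(), Counter()
    for host, _domain, rcode in dns_log:
        totals[host] += 1
        if rcode == "NXDOMAIN":
            failures[host] += 1
    return sorted(
        host
        for host in totals
        if totals[host] >= min_queries and failures[host] / totals[host] > threshold
    )

# Hypothetical log entries.
log = [
    ("10.0.0.1", "qzkxw.example", "NXDOMAIN"),
    ("10.0.0.1", "pfjqa.example", "NXDOMAIN"),
    ("10.0.0.1", "hhtrm.example", "NXDOMAIN"),
    ("10.0.0.1", "www.example", "NOERROR"),
    ("10.0.0.2", "www.example", "NOERROR"),
    ("10.0.0.2", "mail.example", "NOERROR"),
    ("10.0.0.2", "docs.example", "NOERROR"),
    ("10.0.0.3", "typo.example", "NXDOMAIN"),
]
flagged = suspicious_hosts(log)
```

A ratio like this is only a coarse first filter; benign causes such as typos or misconfigured software also produce NXDomain responses, which is why the described system adds clustering and classification on top.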

Graph-based detection

Graph-based detection is important because it can represent communication patterns as a statistical component of an algorithm. Graph analysis was introduced to detect botnets by discovering anomalies. The graph-based model is preferred because it takes into account how users connect and covers a larger area, owing to the sudden growth of edges between neighboring nodes. The method is also independent of protocol semantics and static packets. Its weakness is that it does not keep track of a host's history, which makes it challenging to handle an infected host.

Yu, Huang, and Yian (2014) used a dynamic traffic graph to categorize network flows generated by point-to-point networks. Node-centered and edge-centered metrics (dynamic and static) were used in conjunction with the largest-connected-size as a graph-level metric. Gardiner and Nagaraja (2016) give the example of BotGrep, a proposal based on a data mining methodology that discovers point-to-point graphs by finding expander graphs with a random walk. The purpose of the random walk is to extract all the point-to-point networks after clustering. The system can locate point-to-point networks in ISP-level data with a true-positive (TP) rate of 98% and a false-positive (FP) rate of 0.4%. BotGrep uses the same operations but adds a communication graph; it applies the Laplace-Beltrami operator to reduce the data dimensionality and then applies the random walk to find point-to-point networks.

Invernizzi et al. (2014) use the binary approach to detect the download stage of malware infections in a large-scale network. The information collected from HTTP traffic is used to create a neighborhood graph, where nodes stand for IP addresses, domain names, FQDNs, URLs, paths, filenames, and downloaded files (represented by hashes). A graph is generated for each host that is suspected of having malware.

Yu, Huang, and Yian (2016) incorporate malicious domain detection into the graph. An undirected graph is generated in which the nodes are hosts and domains, and an edge connects each host to a domain that it accesses.
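As a rough illustration of a graph-level metric on such a host-domain graph, the sketch below builds an undirected graph from (host, domain) edges and computes the size of its largest connected component. The edge list and function name are hypothetical, and the cited systems combine many more metrics than this one.

```python
from collections import defaultdict, deque

def largest_component_size(edges):
    """Size of the largest connected component of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        best = max(best, size)
    return best

# Hypothetical edges: each pair is (host, domain accessed by that host).
edges = [("h1", "d1.example"), ("h2", "d1.example"), ("h3", "d2.example")]
size = largest_component_size(edges)
```

A sudden jump in this value between time windows would be one simple signal of new edges forming between previously unrelated nodes.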

Using ground truth instantiated from blacklists and whitelists, belief propagation is applied until the values stabilize. If a domain's final belief value is above the threshold, the domain is labeled as malicious; this more advanced method detects malware within 16 minutes of the first operation. Yu, Huang, and Yian (2014) also apply graph techniques to identify suspicious domains. They focus in particular on failed lookups, where a domain's query is unresolved, which could indicate bot activity such as fast-flux service networks (FFSNs). The aim of the work is to collect clusters of domains that exhibit similar behavior, which can then be analyzed further as inputs for identifying particular attacks (Yu, Huang, and Yian, 2014).
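The seed-then-propagate idea can be sketched with a much simpler stand-in for belief propagation: scores are repeatedly averaged over neighbors while seeded nodes stay fixed. This is plain iterative label propagation, not the belief propagation used in the cited work, and the seed values, iteration count, threshold, and domain names are assumptions.

```python
from collections import defaultdict

def propagate_scores(edges, blacklist, whitelist, iterations=20):
    """Spread maliciousness scores over an undirected host-domain graph.

    Blacklisted nodes are pinned to 1.0 and whitelisted nodes to 0.0;
    every other node starts at 0.5 and takes the mean of its neighbors.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seeds = {n: 1.0 for n in blacklist if n in neighbors}
    seeds.update({n: 0.0 for n in whitelist if n in neighbors})

    scores = {n: seeds.get(n, 0.5) for n in neighbors}
    for _ in range(iterations):
        # Synchronous update: all new scores are computed from the old ones.
        scores = {
            n: seeds[n] if n in seeds
            else sum(scores[m] for m in nbrs) / len(nbrs)
            for n, nbrs in neighbors.items()
        }
    return scores

def label_malicious(scores, domains, threshold=0.9):
    """Domains whose final score is above the threshold."""
    return sorted(d for d in domains if scores.get(d, 0.0) > threshold)

# Hypothetical graph: h1 visits a blacklisted domain, h2 a whitelisted one.
edges = [
    ("h1", "bad.example"), ("h1", "unknown.example"),
    ("h2", "good.example"), ("h2", "other.example"),
]
scores = propagate_scores(edges, {"bad.example"}, {"good.example"})
flagged = label_malicious(scores, ["unknown.example", "other.example"])
```

In this toy graph the unlabeled domain that shares a host with the blacklisted one drifts toward 1.0, while the one that shares a host with the whitelisted domain drifts toward 0.0.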

Research Gap

Data security is at high risk from cyber-attacks, and hacking methods are multiplying at a much higher rate than the methods for preventing attacks on data (Vigna et al., 2014). One of the major technology gaps is malware application profiling. Research is needed on profiling applications, that is, on simulating realistic user inputs together with malicious activities, so that malware can be detected more easily. Crowdroid collects behavioral data directly from users through crowdsourcing; the data is then evaluated with a clustering algorithm (Narayanan, Chandramohan, Chen, Liu, and Saminathan, 2016).

A significant research gap is the study of the dynamic collection and detection of malware on mobile devices using machine learning algorithms. Several approaches have been proposed, though with no experimental grounds for comparison. It is therefore very difficult to determine which algorithm to use when developing new techniques, or how to validate it. An experiment demonstrating the effectiveness of various machine learning algorithms is thus needed in order to arrive at a good way of detecting malware. The most commonly used algorithms currently include the multilayer perceptron and random forest, but unfortunately they give poor results (Vigna et al., 2014). This poses a challenge for discovering machine-invariant hardware measurements.

It is necessary to undertake an experiment and carry out a detailed study of efficient machine learning classifiers, which is a challenge at present. The major issue is the absence of research that uses several samples of mobile malware to analyze well-established machine learning classification techniques (Narayanan, Chandramohan, Chen, Liu, and Saminathan, 2016). A well-organized study is needed to understand the efficiency of the present algorithms and to obtain reliable empirical results from larger, established experiments.

Most existing techniques are static and can only detect a potential attack. Only a few techniques use dynamic analysis, which can help detect and prevent an attack at runtime. All the techniques researched, including the machine learning approaches, have drawbacks that limit their precision. A good example is FlowDroid: it is oblivious to multi-threading and cannot effectively resolve reflective calls. A further gap arises because most current studies rely on ‘in the lab’ validation contexts (Vigna et al., 2014). These studies cover the machine learning techniques that have been proposed as the best way to deal with malware detection on Android devices, and they are meant to provide valid data about behavior in the real world, or ‘in the wild.’ However, they do not provide conclusive data, since performance differs between the lab setting and the real-world setting. None of the techniques takes into account the environment with which mobile devices interact. TaintDroid monitors the leaking of confidential information from mobile phones (Narayanan, Chandramohan, Chen, Liu, and Saminathan, 2016), but it cannot determine the destination to which this information is being leaked. Such considerations are very important for Android mobile phones.

Possible Future Research Direction

Surveys of existing research on mobile security have identified several fruitful directions for future studies, including the development of dynamic solutions to prevent cross-process privilege escalation attacks that involve user manipulations and intermediate network services. For example, a malicious mobile app can take advantage of user manipulations by showing a UI that is overlaid on top of the victim's (Yerima, Sezer, and McWilliams, 2014). A user may then touch particular buttons that trigger the delivery of touch events to the malicious app via IPC. The malicious app forwards those events to the victim app by signaling to the event dispatch mechanism that its process cannot handle the events. As a result, the events are forwarded to the UI elements of the victim app that the malicious application wants to tamper with and manipulate. Likewise, in a network channel attack, a malicious app uses device-specific data to send a network service a message that appears to originate from another process on the device. The network service believes that the message comes from the other on-device process and sends a response to that process. When the benign on-device process receives the message, it triggers an action that is desired by the malicious app (Yerima, Sezer, and McWilliams, 2014). These kinds of sophisticated cross-process attacks are not thoroughly addressed by current studies.

Another promising research direction is addressing mobile security while also taking into account the environment with which a mobile device interacts. This work is particularly relevant as such deployments proliferate, e.g., in the context of home automation. A few critical research questions that should be addressed include:

When controlling equipment at home, how can the user's actions be protected so that no malicious app takes over the controls without the user’s intent?

When checking the status of controls at home, what rules and mechanisms can ensure that the data presented to the user is genuine and not presented by malware?

Another example that requires robust mobile security solutions is a mobile device’s interactions with its environment, e.g., when a mobile phone is paired with a vehicle and also senses data from the driver’s pacemaker. If the driver starts feeling unwell, the pacemaker sends this data to the mobile phone, which in turn directs the vehicle to pull over, open the doors, and dial a medical emergency service. It may be possible, however, for a malicious driver behind to take control of the car in front and direct it in order to commit a robbery (Yu, Huang, and Yian, 2014). Security considerations become particularly important in such scenarios, in which a malicious process on a mobile device might not just steal private data or inject malicious information but can actually physically affect user safety or security. The directions outlined earlier can be combined to develop security solutions that consider the environment in which devices operate and interact, and that monitor IPC flows to detect and thwart attacks dynamically (Yerima, Sezer, and McWilliams, 2014).

The major future research direction should be developing and evaluating:

Dynamic security tools to prevent cross-process privilege escalation attacks involving user manipulations and intermediate network services.

Security solutions for mobile devices as they interact with their environment.

Defensive tools that address Android platform fragmentation.

Conclusion

This study shows that Android security can be breached in many ways by exploiting small loopholes. Security measures therefore ought to be taken, such as using advanced system options like Droid screen, which is an effective way of addressing malware on Android devices. The Eigenspace approach shows a high malware detection rate, with accuracy of up to 96.4% and an error rate as low as 3.6%, and is therefore the proposed approach for Android malware analysis. The Markov chain approach (MaMaDroid) also proved popular for detecting malware, as it can operate at different granularities. It effectively detected unknown malware produced at almost the same time as the training samples (F-measure 99%) and maintains a good detection of approximately 87% p.a.
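To make the Markov chain idea concrete, the sketch below turns a sequence of abstracted API calls into transition probabilities, which could then serve as a feature vector for a classifier. It is a simplified illustration only: it assumes the calls have already been abstracted to package names and arrive as a flat sequence, which may differ from how MaMaDroid derives its chains, and the sample calls are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(calls):
    """Estimate Markov chain transition probabilities from a call sequence.

    Each state is an abstracted API call (here, a package name); the
    probability of moving from state s to state t is the share of
    transitions out of s that lead to t.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

# Hypothetical abstracted call sequence for one app.
calls = ["android.telephony", "java.net", "android.telephony", "java.io"]
chain = transition_probabilities(calls)
```

Changing the abstraction level (for example, from package names to broader families) changes the number of states, which is one way to read the "different granularities" mentioned above.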
