3260 papers • 126 benchmarks • 313 datasets
Malware Classification is the process of assigning a malware sample to a specific malware family. Malware within a family shares similar properties that can be used to create signatures for detection and classification. Signatures can be categorized as static or dynamic based on how they are extracted. A static signature can be based on a byte-code sequence, binary assembly instruction, or an imported Dynamic Link Library (DLL). Dynamic signatures can be based on file system activities, terminal commands, network communications, or function and system call sequences. Source: Behavioral Malware Classification using Convolutional Recurrent Neural Networks
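As a rough illustration of a static signature built from byte sequences, the sketch below counts overlapping byte n-grams and compares samples by the overlap of their n-gram sets. The helper names (`byte_ngrams`, `jaccard`) and the toy byte strings are assumptions made for this example; they are not taken from the cited paper, and real systems use far richer features.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a raw binary blob."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity between the sets of n-grams seen in two samples."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Toy byte strings standing in for samples (hypothetical, not real malware).
sample_a = bytes.fromhex("558bec83ec10535657")
sample_b = bytes.fromhex("558bec83ec20535657")  # differs from sample_a by one byte
sample_c = bytes.fromhex("9090909090cc")        # unrelated byte pattern

sig_a, sig_b, sig_c = (byte_ngrams(s) for s in (sample_a, sample_b, sample_c))

print(jaccard(sig_a, sig_b))  # near-identical samples share most n-grams
print(jaccard(sig_a, sig_c))  # unrelated samples share few or none
```

Samples from the same family would be expected to score closer to each other than to unrelated samples, which is the property a signature relies on.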
This paradigm is presented and discussed in the present paper, with emphasis on the phases related to the extraction and selection of a set of novel features for the effective representation of malware samples.
Unlike other compression-based distance metrics known to us, the new Burrows-Wheeler Markov Distance works by embedding sequences into a fixed-length feature vector. This allows it to provide significantly improved clustering performance on larger malware corpora, where prior methods were weak.
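The idea of embedding a sequence into a fixed-length vector can be sketched loosely as follows: apply a Burrows-Wheeler transform, count first-order byte transitions in the transformed output, and compare the resulting vectors. This is an illustration under assumptions, not the authors' algorithm. The naive rotation-sort transform, the normalization, the Euclidean distance, and all function names are choices made for this example, and the paper's exact definitions may differ.

```python
import math
from typing import List


def bwt(data: bytes) -> bytes:
    """Naive Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    if not data:
        return b""
    n = len(data)
    doubled = data + data
    # Sort rotation start positions by the rotation they produce.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    # The transform is the last byte of each sorted rotation.
    return bytes(doubled[i + n - 1] for i in order)


def markov_embedding(data: bytes, alphabet: int = 256) -> List[float]:
    """Fixed-length vector of normalized first-order transition counts."""
    transformed = bwt(data)
    counts = [0] * (alphabet * alphabet)
    for a, b in zip(transformed, transformed[1:]):
        counts[a * alphabet + b] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def distance(x: bytes, y: bytes) -> float:
    """Euclidean distance between two embeddings (an assumed choice)."""
    ex, ey = markov_embedding(x), markov_embedding(y)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ex, ey)))


print(bwt(b"banana"))                   # b'nnbaaa'
print(distance(b"banana", b"bandana"))  # small but non-zero
```

Because every input maps to a vector of the same length regardless of its size, standard vector-space clustering and indexing methods can be applied directly, which pairwise compression distances do not allow.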
In this representation, instances from the same class are close to each other while instances from different classes are farther apart, resulting in a statistically significant improvement over other approaches on three datasets from two different domains.
This paper proposes the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers in a model-agnostic fashion, and demonstrates effective attacks against a diverse set of machine learning models.
Malware detection and classification is a challenging problem and an active area of research. A particular challenge is how best to treat and preprocess malicious executables before feeding them to machine learning algorithms. Novel approaches in the literature treat an executable as a sequence of bytes or as a sequence of assembly language instructions. However, those approaches do not take the hierarchical structure of programs into consideration. An executable exhibits various levels of spatial correlation. Adjacent code instructions are usually spatially correlated, but not always: function calls and jump instructions transfer control of the program to a different point in the instruction stream. These discontinuities persist when the binary is treated as a sequence of byte values. In addition, functions may be arranged in any order as long as addresses are correctly reorganized. To address these issues, we propose a Hierarchical Convolutional Network (HCN) for malware classification. It has two levels of convolutional blocks, applied at the mnemonic level and at the function level, enabling us to extract n-gram-like features from both levels when constructing the malware representation. We validate our HCN method on the dataset released for the Microsoft Malware Classification Challenge, outperforming almost every deep learning method in the literature.
Exploring a better-performing DL-SVM model is the next step toward engineering an intelligent anti-malware system that utilizes the power of deep learning models.
This paper proposes a file-agnostic deep learning approach for categorizing malware. It exploits two facts: most variants are generated using common obfuscation techniques, and compression and encryption algorithms retain some properties present in the original code.
Specialized models that can handle extremely long sequences while successfully performing malware detection in an efficient way are presented, including an implementation of the Convoluted Partitioning of Long Sequences approach.
This work presents a deep learning-based malware classification approach that requires no expert domain knowledge and relies on a purely data-driven approach for complex pattern and feature identification.
This paper proposes four easy-to-extract and small-scale features, including sizes and permissions of Windows PE sections, content complexity, and import libraries, to classify malware families, and uses automatic machine learning to search for the best model and hyper-parameters for each feature and their combinations.
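To make the notion of small, easy-to-extract features concrete, the sketch below computes a size and a Shannon entropy value for each section of a file. Using entropy as the measure of content complexity is an assumption made for this example, since the summary above does not define the term. The section names and contents are hypothetical placeholders, not output from a real PE parser.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Hypothetical section contents standing in for parsed PE sections.
sections = {
    ".text": bytes(range(256)) * 4,  # every byte value equally likely
    ".data": b"\x00" * 1024,         # constant padding
}

# One (size, entropy) pair per section, usable as a small feature vector.
features = {
    name: (len(blob), shannon_entropy(blob)) for name, blob in sections.items()
}

for name, (size, entropy) in features.items():
    print(f"{name}: size={size} entropy={entropy:.2f}")
```

High entropy in a section often indicates packed or encrypted content, which is one reason measures of this kind tend to be informative for family classification.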