This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
The last decade’s growing interest in deep learning was triggered by the proven capacity of neural networks in computer vision tasks. If you train a neural network with enough labeled photos of cats and dogs, it will be able to find recurring patterns in each category and classify unseen images with decent accuracy.
What else can you do with an image classifier?
In 2019, a group of cybersecurity researchers wondered if they could treat security threat detection as an image classification problem. Their intuition proved to be well-placed, and they were able to create a machine learning model that could detect malware based on images created from the content of application files. A year later, the same technique was used to develop a machine learning system that detects phishing websites.
The combination of binary visualization and machine learning is a powerful technique that can provide new solutions to old problems. It is showing promise in cybersecurity, but it could also be applied to other domains.
Detecting malware with deep learning
The traditional way to detect malware is to search files for known signatures of malicious payloads. Malware detectors maintain a database of virus definitions which include opcode sequences or code snippets, and they search new files for the presence of these signatures. Unfortunately, malware developers can easily circumvent such detection methods using different techniques such as obfuscating their code or using polymorphism techniques to mutate their code at runtime.
Dynamic analysis tools try to detect malicious behavior during runtime, but they are slow and require the setup of a sandbox environment to test suspicious programs.
In recent years, researchers have also tried a range of machine learning techniques to detect malware. These ML models have managed to make progress on some of the challenges of malware detection, including code obfuscation. But they present new challenges, including the need to learn too many features and a virtual environment to analyze the target samples.
Binary visualization can…