Malpedia: A Collaborative Effort to Inventorize the Malware Landscape
For more than a decade, a perpetual influx of new malware samples has been observed. Static analysis remains one of the most important methods for analyzing this flood effectively. It would therefore be highly desirable to have an open, freely accessible, curated, and cleanly labeled corpus of unpacked malware samples for research on static analysis methods.
In this paper, we introduce Malpedia, a collaboration platform for curating a malware corpus. Additionally, we provide a baseline for a cleanly labeled malware corpus consisting of 1,792 samples from 607 families. This corpus offers researchers many possibilities, including serving as a testbed for evaluating detection and analysis methods, as a reference for quality assurance of classification, and as a source of context for new malware. To ensure the quality of our corpus, we adapt the requirements by Rossow et al., derive specific requirements for the context of static malware analysis, and evaluate our corpus against them.
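One way a cleanly labeled corpus could serve as ground truth for quality assurance of classification is sketched below: a tool's predicted family labels are compared against the curated labels, and accuracy is reported per family. The `Sample` record, its field names, the `per_family_accuracy` helper, and the toy family names are hypothetical illustrations, not Malpedia's actual data format or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sha256: str  # hash identifying a corpus sample (hypothetical field)
    family: str  # curated ground-truth family label (hypothetical field)


def per_family_accuracy(corpus, predictions):
    """Return {family: share of that family's samples whose predicted label matches}.

    `predictions` maps sha256 -> label produced by the classifier under test;
    samples the classifier did not label at all count as misses.
    """
    totals = Counter()
    hits = Counter()
    for sample in corpus:
        totals[sample.family] += 1
        if predictions.get(sample.sha256) == sample.family:
            hits[sample.family] += 1
    return {family: hits[family] / totals[family] for family in totals}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    corpus = [Sample("a1", "family_a"), Sample("a2", "family_a"), Sample("b1", "family_b")]
    predictions = {"a1": "family_a", "a2": "family_b", "b1": "family_b"}
    print(per_family_accuracy(corpus, predictions))  # {'family_a': 0.5, 'family_b': 1.0}
```

Reporting accuracy per family rather than as one overall figure keeps a few large families from masking poor results on rare ones.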
Based on our corpus, we show that looking beyond packers dramatically reduces the size a corpus needs to be representative: after unpacking, the number of distinct malware families and versions is orders of magnitude smaller than the number of unique packed samples. Additionally, we perform a comprehensive study of the Windows malware in the corpus, scrutinizing its structural features. This analysis illustrates that Malpedia offers a wealth of information that is readily available for in-depth investigations.
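The counting argument behind the representativeness claim can be illustrated with a small sketch, assuming each observed file is already annotated with the family and version identified after unpacking: many distinct packed files (each with its own hash) collapse onto a few distinct (family, version) identities. The `Observation` record, its fields, both helper functions, and the toy data are hypothetical and only demonstrate the arithmetic; they are not taken from Malpedia.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    packed_sha256: str  # hash of the file as distributed, i.e. packed (hypothetical field)
    family: str         # family identified after unpacking (hypothetical field)
    version: str        # family version identified after unpacking (hypothetical field)


def group_by_unpacked_identity(observations):
    """Map each (family, version) pair to the set of distinct packed hashes seen for it."""
    groups = defaultdict(set)
    for obs in observations:
        groups[(obs.family, obs.version)].add(obs.packed_sha256)
    return dict(groups)


def size_comparison(observations):
    """Return (#unique packed samples, #distinct family/version pairs after unpacking)."""
    unique_packed = {obs.packed_sha256 for obs in observations}
    return len(unique_packed), len(group_by_unpacked_identity(observations))


if __name__ == "__main__":
    # Toy data, invented for illustration: one payload repacked five times, another twice.
    toy = [Observation(f"p{i}", "family_a", "1.0") for i in range(5)]
    toy += [Observation(f"q{i}", "family_b", "2.3") for i in range(2)]
    print(size_comparison(toy))  # (7, 2)
```

With real data, the ratio between the two numbers indicates how strongly repacking inflates a raw sample count compared with the underlying variety of code.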
This work is licensed under a Creative Commons Attribution 4.0 International License.