YARA-Signator: Automated Generation of Code-based YARA Rules
Effective detection and identification signatures are an important component of the malware analysis toolkit. Creating such signatures is still largely a manual task that requires considerable experience and knowledge on the part of analysts. In this paper, we present YARA-Signator, an approach for the automated generation of code-based YARA rules. The method isolates instruction n-grams that appear frequently within a malware family but are not found in any other family.
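The filtering idea described above can be sketched in Python. This is a minimal illustration under assumed inputs, not YARA-Signator's implementation. Each sample is assumed to be already disassembled into a list of instructions given as hex byte strings. The helper names (`ngrams`, `family_unique_ngrams`, `to_yara_rule`), the `min_coverage`, `limit` and `min_matches` parameters, and the family names in the example are all hypothetical. The sketch does not wildcard addresses or relocatable operands, and it does not perform any further selection that a production rule generator would need.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input: a sample is a list of instructions, each given as
# its raw bytes in hex (e.g. "8bec"); disassembly happens elsewhere.
Sample = List[str]
NGram = Tuple[str, ...]


def ngrams(instructions: Sample, n: int) -> Set[NGram]:
    """Distinct n-grams of consecutive instructions in one sample."""
    return {tuple(instructions[i:i + n]) for i in range(len(instructions) - n + 1)}


def family_unique_ngrams(
    families: Dict[str, List[Sample]], n: int = 4, min_coverage: float = 0.5
) -> Dict[str, List[NGram]]:
    """Per family, keep n-grams found in at least `min_coverage` of its
    samples and in no sample of any other family."""
    sample_counts: Dict[str, Counter] = {}
    owners: Dict[NGram, Set[str]] = defaultdict(set)
    for family, samples in families.items():
        counts: Counter = Counter()
        for sample in samples:
            for gram in ngrams(sample, n):
                counts[gram] += 1          # samples of this family containing gram
                owners[gram].add(family)   # families in which gram occurs
        sample_counts[family] = counts

    result: Dict[str, List[NGram]] = {}
    for family, counts in sample_counts.items():
        total = len(families[family])
        candidates = [
            gram for gram, count in counts.items()
            if len(owners[gram]) == 1 and count / total >= min_coverage
        ]
        # Most widespread n-grams first; ties broken deterministically.
        candidates.sort(key=lambda gram: (-counts[gram], gram))
        result[family] = candidates
    return result


def to_yara_rule(family: str, grams: List[NGram], limit: int = 10, min_matches: int = 1) -> str:
    """Render up to `limit` n-grams as hex strings of a YARA rule."""
    selected = grams[:limit]
    if not selected:
        raise ValueError(f"no candidate n-grams for {family}")
    name = re.sub(r"\W", "_", family)  # e.g. "win.alpha" -> "win_alpha"
    lines = [f"rule {name}", "{", "    strings:"]
    for i, gram in enumerate(selected):
        hex_str = "".join(gram).upper()
        pairs = " ".join(hex_str[j:j + 2] for j in range(0, len(hex_str), 2))
        lines.append(f"        $s{i} = {{ {pairs} }}")
    lines += ["    condition:", f"        {min(min_matches, len(selected))} of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    families = {
        "win.alpha": [["55", "8bec", "83ec10", "53", "56", "57"],
                      ["55", "8bec", "83ec10", "53", "56", "c3"]],
        "win.beta": [["55", "8bec", "83ec10", "33c0", "c3"]],
    }
    unique = family_unique_ngrams(families, n=4, min_coverage=1.0)
    print(to_yara_rule("win.alpha", unique["win.alpha"]))
```

Running the example prints a rule containing the two 4-grams that occur in both `win.alpha` samples. The 4-grams of `win.beta` are never candidates for `win.alpha` because they belong to another family.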
Applying YARA-Signator to the Malpedia data set, we show that on average 51.85% of the instruction n-grams of length 4 and higher occur only in their respective family. The rules the system produces from this data set achieve an overall F1 score of 0.983 and cause very few false positives in a sanity check against a large goodware data set. YARA-Signator is available as open source, and a periodically updated reference rule set is provided free of charge through Malpedia.
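The two figures reported above can be made concrete with a short Python sketch. It assumes that "on average" means an unweighted mean over families, and it uses the standard definition of F1 in terms of true positive, false positive and false negative counts of rule matches on labeled samples. Neither the exact aggregation nor the underlying counts are given in this section, and the function names `average_unique_share` and `f1_score` are hypothetical. The toy values below are illustrative only and are not results from the paper.

```python
from collections import Counter
from typing import Dict, Set, Tuple

NGram = Tuple[str, ...]


def average_unique_share(family_ngrams: Dict[str, Set[NGram]]) -> float:
    """Mean over families of the share of each family's distinct n-grams
    that occur in no other family (assumed unweighted aggregation)."""
    family_count: Counter = Counter()
    for grams in family_ngrams.values():
        family_count.update(grams)  # sets, so each family counts once per n-gram
    shares = [
        sum(1 for gram in grams if family_count[gram] == 1) / len(grams)
        for grams in family_ngrams.values()
        if grams
    ]
    return sum(shares) / len(shares) if shares else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    toy = {"a": {("x",), ("y",)}, "b": {("y",), ("z",), ("w",)}}
    print(average_unique_share(toy))
    print(f1_score(tp=90, fp=10, fn=10))
```

In the toy data, family `a` has a unique share of 1/2 and family `b` of 2/3, so the mean is 7/12. Writing F1 as 2TP / (2TP + FP + FN) avoids a division by zero when precision or recall is undefined.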
Copyright (c) 2020 Felix Bilstein, Daniel Plohmann
This work is licensed under a Creative Commons Attribution 4.0 International License.