Yara Studies: A Deep Dive into Scanning Performance
You probably know this scenario: you have spent a while analyzing new samples, which was not easy, but you are finally done. You have also created a neat Yara rule to match the samples, and you are ready to send it off and move on to your next task (or lunch). But oops: Yara warns that the rule is slowing down scanning. Or a colleague comments that they do not like a particular part of the rule and want to be sure it is effective.
While working with Yara, I consulted with many analysts about this problem. They knew what they wanted to detect, but Yara did not always help them write their rules more effectively. Drawing on my experience with the algorithms used in Yara, I worked with them to find solutions that improve scanning speed and limit potential hurdles in future use.
This paper presents five studies. Each describes a problem, explains why Yara does not like the first solution, and offers tips on what can be improved. Note that no sensitive information is disclosed in this paper. All studies were anonymized: the general problem is the same, but no specific malware family is mentioned, nor can one be traced.
Copyright (c) 2023 Dominika Regéciová
This work is licensed under a Creative Commons Attribution 4.0 International License.