Hunting malware with Yara

A beginner’s guide to one of the most widely used tools in the malware analysis industry.

Andrea
InfoSec Write-ups

Yara logo
Hi guys!

In recent years we have witnessed, as the data show, a sharp increase in malware attacks, which cause billions of dollars of damage every year.

Ransomware damage graph

I have recently become interested in this world, and today I want to show you one of the essential tools for malware hunting: Yara.

Yara works from rules that we write: each rule describes a set of patterns, and Yara reports every file that matches them.
Don’t worry if you are confused; I was quite lost too when I first started using it.
Let’s get started!

You can install Yara on Debian-based Linux distributions with “sudo apt install yara” (the package name is lowercase) and on Windows by downloading the zip file from the following link.

Releases · VirusTotal/yara · GitHub

Creating a rule file for Yara is very easy: it is plain text, so you can simply create a txt file and change the extension to .yar.

The first thing to do is to declare our rule with the rule keyword and give it a name of our choice.

Next comes the optional meta section, where we can enter information such as description, author, date and much more.

Note: Yara has a very large set of features. I will only cover the most important ones, but at the end of the article I will leave you the link to the documentation.

Remember that text strings must always be placed between double quotes.

Now comes the strings section.
These are the patterns Yara searches for in order to identify a file.

For the uninitiated, strings are the sequences of readable text embedded in a binary file.

Take this C++ file as an example.

Once compiled and executed, it will print ‘Hello’ on the console.

If we run the strings command on the compiled binary on Linux, we can see the string “Hello”.

Note: You will certainly have noticed all the other strange strings. Don’t worry, they are normal; they are added at compile time.

Let’s pretend our file is malware. We can add the string “Hello” to the rule, since it allows us to identify our file.

Note: String identifiers must start with $ when they are declared.
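If you are curious what the strings command is doing, here is a rough Python sketch of the idea: it scans the raw bytes and keeps every run of printable ASCII characters that is at least 4 characters long (the default minimum in GNU strings). The function name and the sample bytes are made up for illustration; the sample is a stand-in, not a real compiled binary.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Stand-in for a compiled binary: header bytes, our literal, and a string
# of the kind the toolchain adds at compile time.
sample = b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8 + b"Hello\x00" + b"\x01\x02GLIBC_2.34\x00"

found = extract_strings(sample)
print(found)
```

Note that “ELF” from the header does not show up in the result: it is only three printable characters long, so it falls under the minimum length.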

Now let’s do something a little more difficult.

Most file formats begin with a fixed sequence of bytes, known as magic bytes, that identifies the file type.

List of file signatures — Wikipedia

For example, a Linux executable (ELF) starts with 7f 45 4c 46, that is the byte 0x7f followed by the letters “ELF”. In a typical 64-bit little-endian binary the next three bytes are 02 01 01.

Note: Windows executables start with 4D 5A (“MZ”).

We can add this sequence of hexadecimal bytes to the rule to avoid false positives.

In fact, if we left out the magic bytes, any txt file (or any other kind of file) on the filesystem that contained the string “Hello” would be reported, which we do not want.

Note: Hexadecimal strings must be put between { }.
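To make the false-positive point concrete, here is a small plain-Python sketch (not Yara itself) of the check we are building: the data must start with the ELF magic bytes and also contain the text “Hello”. The function name and both byte samples are hypothetical and only serve the example.

```python
# The four bytes every ELF file starts with: 0x7f followed by "ELF".
ELF_MAGIC = bytes.fromhex("7f 45 4c 46")

def looks_like_our_sample(data: bytes) -> bool:
    """True only for data that starts with the ELF magic and contains "Hello"."""
    return data.startswith(ELF_MAGIC) and b"Hello" in data

# Not a real executable, just the magic bytes followed by our string.
fake_elf = ELF_MAGIC + b"\x02\x01\x01" + b"\x00" * 9 + b"Hello\x00"
# An innocent text file that happens to contain the same word.
text_note = b"Hello, this is just a text file.\n"

print(looks_like_our_sample(fake_elf))   # magic bytes and string both present
print(looks_like_our_sample(text_note))  # string present, magic bytes missing
```

Without the magic-byte check, both samples would match on “Hello” alone.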

The last section is the condition section.

Here, we specify the conditions a file must satisfy for the rule to match.

There are several keywords.

And: True when both conditions are true.

$first_string and $second_string

Identifies the file when both strings are present.

Or: True when at least one of the conditions is true.

$first_string or $second_string

Identifies the file when at least one of the two strings is present.

Any of them: True when at least one of the declared strings is present. Its counterpart, all of them, is true only when all strings are present.

Let’s look at an example.

We declare a meaningless string and call it casual_string. Using the or operator, we would still be able to locate the file, since the string “Hello” is present.
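If it helps, the condition keywords behave much like Python’s own boolean tools. The sketch below is plain Python, not Yara: each variable stands in for “this string was found in the file”, and the content I gave to casual_string is made up, since any text that is not in the file would do.

```python
def evaluate_conditions(data: bytes) -> dict:
    """Evaluate a few Yara-style conditions over two string matches."""
    hello = b"Hello" in data             # plays the role of $hello
    casual = b"qwertyuiop12345" in data  # plays the role of $casual_string (made-up content)
    found = [hello, casual]
    return {
        "$hello and $casual_string": hello and casual,
        "$hello or $casual_string": hello or casual,
        "any of them": any(found),  # at least one string matched
        "all of them": all(found),  # every string matched
    }

# Stand-in for our compiled file: it contains "Hello" but not the meaningless string.
sample = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + b"Hello\x00"

results = evaluate_conditions(sample)
for condition, matched in results.items():
    print(f"{condition}: {matched}")
```

Only the or and any of them conditions come out true here, which is why the file is still found even though casual_string never appears in it.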

Another very useful keyword is “at”, which checks whether a string is present at the specified offset.

The offset is the distance in bytes from the beginning of the file. Since the magic bytes are at the very beginning, they have an offset of 0.

Now let’s check that everything works correctly. We run Yara, passing it the rule file and the target to scan, and as we can see, it reports our file.
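Here is a tiny Python sketch of what an offset check means, again as an illustration rather than Yara’s actual implementation. The helper name at_offset and the sample bytes are hypothetical.

```python
def at_offset(data: bytes, pattern: bytes, offset: int) -> bool:
    """True when pattern starts exactly at the given byte offset in data."""
    return data[offset:offset + len(pattern)] == pattern

magic = bytes.fromhex("7f454c46")  # ELF magic bytes
# 4 magic bytes + 3 header bytes + 9 padding bytes, then our string.
data = magic + b"\x02\x01\x01" + b"\x00" * 9 + b"Hello\x00"

hello_offset = data.find(b"Hello")

print(at_offset(data, magic, 0))                 # magic bytes sit at the start
print(at_offset(data, b"Hello", 0))              # "Hello" is in the file, but not at offset 0
print(at_offset(data, b"Hello", hello_offset))   # it is found where it really starts
```

The difference from a plain “is it in the file” check is that the position matters: the same bytes appearing later in the file do not count.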
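If you want to reproduce this run, the sketch below puts the pieces from this walkthrough together. It writes a complete rule to hello.yar in a temporary directory, creates a fake sample next to it, and runs the yara command only if the binary is installed. The rule name hello_sample, the file names and the sample bytes are my own placeholders, not the ones from the original example; the sample is not a real executable, just the ELF magic bytes followed by the text.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

RULE = '''rule hello_sample
{
    meta:
        description = "Toy rule: ELF file that contains the text Hello"
        author = "Your name"

    strings:
        $hello = "Hello"
        $magic = { 7F 45 4C 46 }

    condition:
        $magic at 0 and $hello
}
'''

def build_demo(workdir: Path) -> list:
    """Write the rule and a fake sample into workdir; return the yara command."""
    rule_path = workdir / "hello.yar"
    rule_path.write_text(RULE)
    # Fake sample: ELF magic bytes, a few header bytes, padding, then the string.
    (workdir / "sample.bin").write_bytes(b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + b"Hello\x00")
    # Usage is: yara [options] RULES_FILE TARGET; -r scans directories recursively.
    return ["yara", "-r", str(rule_path), str(workdir)]

with tempfile.TemporaryDirectory() as tmp:
    command = build_demo(Path(tmp))
    rule_text = (Path(tmp) / "hello.yar").read_text()
    if shutil.which("yara"):  # only scan when the yara binary is available
        print(subprocess.run(command, capture_output=True, text=True).stdout)
    else:
        print("yara not found; would run:", " ".join(command))
```

When Yara is installed, the scan should report the rule name next to sample.bin. The rule file itself also contains the word Hello, but it does not start with the magic bytes, so it should not be reported.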

As mentioned earlier, Yara is a very advanced tool, and it would probably take a whole book to describe all the features it contains. If you are interested, here is the link to the official documentation.

Welcome to YARA’s documentation! — yara 4.2.0 documentation

I particularly recommend looking at the modules because they are very useful.

Bonus: yarGen

As time goes on, malware is becoming more complex and unpredictable, and manually writing Yara rules can be a very difficult task.

For this, yarGen comes to our rescue!

GitHub — Neo23x0/yarGen: yarGen is a generator for YARA rules

Once installed, I recommend that you use the following command.

python3 yarGen.py -a "Your name" --excludegood -z 2 -m 'YourDirectory' -o Yourfile.yar
  • -a: Sets the author’s name in the rule.
  • --excludegood: yarGen might put non-malicious strings inside the rule; this flag avoids that.
  • -z: yarGen gives each string a score; the higher it is, the more likely the string is traceable to something malicious. To avoid false positives we set the minimum score to 2.
  • -m: Specifies the directory where the file is located.
  • -o: Specifies the output file.

Note: yarGen works on directories and not on files, so make sure the file is the only one in the directory to avoid surprises. Also remember that a tool does not completely replace a human, so it is good to learn how to write rules manually as well.
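One way to follow that advice is to copy the sample into a fresh, empty directory before pointing yarGen at it. The Python sketch below does that and then builds the same command shown above without running it. The helper name, the sample file name and its bytes are hypothetical, and the flags are simply the ones from the command in this article.

```python
import shutil
import tempfile
from pathlib import Path

def build_yargen_command(sample: Path, author: str, output: str) -> tuple:
    """Copy sample into a fresh directory and build the yarGen command for it."""
    workdir = Path(tempfile.mkdtemp(prefix="yargen_"))
    shutil.copy2(sample, workdir / sample.name)  # the sample is now alone in workdir
    command = [
        "python3", "yarGen.py",
        "-a", author,
        "--excludegood",
        "-z", "2",
        "-m", str(workdir),
        "-o", output,
    ]
    return workdir, command

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "suspicious.bin"  # hypothetical sample name
    sample.write_bytes(b"\x7fELF" + b"Hello")
    workdir, command = build_yargen_command(sample, "Your name", "Yourfile.yar")
    isolated_files = [path.name for path in workdir.iterdir()]
    shutil.rmtree(workdir)  # clean up the demo directory

print(isolated_files)
print(" ".join(command))
```

The command is only printed here; you would run it from the directory where yarGen.py is installed.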

Conclusion

I hope this article has been helpful and has given you a better understanding of the potential this tool offers.

I plan to bring more articles on malware analysis because I believe it is a field that is still given too little importance.

See you in the next article, bye guys!