Using Generative AI to Predict Cyberattacks

Honeypots and LLMs, oh my!

Kelvin Winborne
InfoSec Write-ups


Image created with Microsoft Copilot

Given the nature of my job, it’s easy to fall into the mental trap of seeing everything as doom and gloom. I work in cybersecurity, so if you show me some new technology, my mind immediately thinks about how hackers could potentially abuse it. Immediately. Generative AI is no different.

However, it can honestly be a boon to cyber defenders, more so than threat actors, per top US cyber officials.

I’m writing this to show just that. I wanted to see how well AI could be used to produce actionable insights that help defenders address future attacks against their organization.

Planning & Direction

To reproduce this, you’ll need:

  • Two sets of IDS or honeypot logs at different times (I used T-Pot)
  • Any generative AI tool (I used Claude 3 Opus)

For simplicity’s sake (read: size limits, explained later in this post), here are the things we want to predict in the next series of attacks:

  • source attacker IPs
  • destination port numbers
  • known attackers

To test the accuracy of the predictions, we’ll essentially be following the cyber threat intelligence lifecycle:

  1. Planning & Direction — This section.
  2. Collection — Take a data export from our honeypot early after deployment.
  3. Processing — Upload it to our AI solution.
  4. Analysis & Production — Let it analyze then generate a report on the findings.
  5. Dissemination & Feedback — This blog post.

Technically, this section is called Planning & Direction, but in this case it’s less “clearly defined intel requirements provided by key stakeholders” and more “caffeine-fueled late-nighter.”

Also note that the cyber threat intelligence lifecycle changes depending on the vendor, analyst, intel requirements, etc. It could be five phases, six phases, combine multiple phases, keep them separate… it goes on. What really matters is that defined and distinct steps are applied to the production of an intel product to confidently derive actionable conclusions.

For this blog, one of the most important things to consider is that we’ll be testing the analysis produced by the AI tool against data that already exists. Essentially we’ll be asking it to predict something that already happened.

The definition of success is that the AI tool’s analysis closely matches the raw data.

Collection

The first dataset can be collected any time after starting the IDS/honeypot, while the second can be collected at any arbitrary time after the first.

For my logs, I installed T-Pot on an Azure VM:

Just follow the setup instructions and open up the portals:

Looks cool but honestly isn’t useful or valuable…

In reality, the Elastic logs themselves provided the required data. For this exercise I honestly didn’t need a lot. I just needed enough data to potentially predict some future activity at a later timeframe.

So now I have two sets of logs from two separate time windows:

  • March 28, 2024 @ 2:22:14–2:25:55AM UTC
  • March 28, 2024 @ 9:13:56–9:17:31AM UTC
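For anyone reproducing this, slicing an export into the two time windows is a one-liner filter. A minimal sketch, assuming a pandas environment; the column names here are hypothetical stand-ins for whatever your Elastic/T-Pot export actually uses:

```python
import pandas as pd

# Toy stand-in for an Elastic/T-Pot export; column names are hypothetical.
logs = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-03-28 02:23:00",  # inside the first window
        "2024-03-28 05:00:00",  # between windows
        "2024-03-28 09:15:00",  # inside the second window
    ]),
    "src_ip": ["79.110.62.185", "10.0.0.5", "79.110.62.185"],
})

def window(df, start, end):
    """Return rows whose timestamp falls within [start, end], inclusive."""
    mask = (df["timestamp"] >= start) & (df["timestamp"] <= end)
    return df.loc[mask]

first = window(logs, "2024-03-28 02:22:14", "2024-03-28 02:25:55")
second = window(logs, "2024-03-28 09:13:56", "2024-03-28 09:17:31")
```

With a real export you’d read the file in (e.g. `pd.read_csv(..., parse_dates=["timestamp"])`) instead of building the frame by hand.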

So, the first set of data:

Processing

Unfortunately, Claude couldn’t process my *.xlsx files in their entirety:

This forced me to be very selective with the data I provided. Instead, I asked it to process four columns of interest and wait until I was done uploading them before producing the output I wanted:
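If you hit the same upload limit, one workaround is to extract just the four columns before uploading. This is a sketch under the assumption of a pandas environment; the filenames are hypothetical. Note that `pd.read_excel`’s `usecols` parameter accepts Excel column letters directly, and for CSV exports you can translate letters to positional indices:

```python
def col_index(letters: str) -> int:
    """Convert an Excel column label like 'MQ' to a 0-based index."""
    idx = 0
    for ch in letters:
        idx = idx * 26 + (ord(ch.upper()) - ord("A") + 1)
    return idx - 1

# The four columns referenced in this post:
wanted = {c: col_index(c) for c in ("A", "BK", "JK", "MQ")}
# {'A': 0, 'BK': 62, 'JK': 270, 'MQ': 354}

# With an .xlsx export, pandas can take the letters directly:
#   subset = pd.read_excel("tpot_export.xlsx", usecols="A,BK,JK,MQ")
#   subset.to_csv("tpot_subset.csv", index=False)
```

The resulting four-column file is a fraction of the original spreadsheet’s size, which is usually enough to clear a chat upload cap.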

Then:

On to the next phase…

Analysis & Production

With the very limited data I could give Claude, I’m relegated to a very simple intel product to request. I ask it to process the data with the following prompt:

Thank you. I’m a cybersecurity professional looking to take some data from a honeypot that I deployed, get it analyzed, and predict the next series of attacks. Please see the attached tables. The rows for each table correspond to each other, and were only uploaded in this format because the full spreadsheet was incapable of being processed by your file upload option.

In column A, the timestamps are approximately between 2:22 and 2:26AM UTC on March 28, 2024. In column MQ, I’d like for you to note the source IP addresses and predict the likelihood that they will attack my honeypot again at approximately between 9:13 and 9:18AM UTC.

In column BK, note the destination port. I’d like for you to predict the number of times SSH default port 22 will be attacked on my honeypot again between 9:13 and 9:18AM UTC.

Also, note column JK for IP reputation. Predict the number of times that my honeypot will be attacked again by a known attacker or mass scanner between 9:13 and 9:18AM UTC.

Type all of this in a small report that could potentially be delivered to an incident response team to assist with any proactive sweeping or hunting exercises.

Here is Claude’s response:

I redacted the first two IPs since one is my personal VPN’s IP and the other is its own internal address.

And the rest:

Overall, this is basic stuff, but it’s essentially giving me exactly what I asked it for, so I can’t be surprised.

Now to test against the data that I already know exists, several hours later:

Source IP Address

According to Claude:

The following source IP addresses have been identified as potential threats, with a high likelihood of attacking our honeypot again between 9:13 AM and 9:18 AM UTC

In the second set of data, the source IPs were found in column PB, and when applying filters I found:

The two 35.x.x.x addresses were not found within the logs, but the use of estimative language in its analysis was prudent. Technically, Claude was right in that both my VPN’s IP and the internal IP were heavily “attacking” the honeypot in addition to the 79.110.62[.]185 address. This can be forgiven in light of the limited data I was able to give Claude.
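Scoring this kind of prediction boils down to set membership against the later window. A quick sketch — the IPs below are illustrative placeholders (two of the real addresses were redacted above), not actual log data:

```python
# Illustrative placeholder IPs; two of the post's real IPs were redacted.
predicted = {"35.0.0.1", "35.0.0.2", "198.51.100.7",
             "203.0.113.10", "79.110.62.185"}
observed = {"79.110.62.185", "203.0.113.10", "198.51.100.7", "192.0.2.44"}

hits = predicted & observed    # predicted IPs that actually returned
misses = predicted - observed  # predicted IPs that never showed up
print(f"{len(hits)} of {len(predicted)} predicted IPs returned")  # 3 of 5
```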

Verdict: Claude went 3 for 5.

Destination Port

According to Claude:

Based on the data analyzed, we predict that the SSH default port 22 will be attacked approximately 15–20 times on our honeypot between 9:13 AM and 9:18 AM UTC.

So let me check:

Not too far off.

So when it gave an actual number value, it still wasn’t egregiously incorrect. There were 10 “attacks” as opposed to the 15–20 predicted.
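The port check is the same filter-and-count exercise against the second window. A sketch with toy data; `dest_port` is a hypothetical column name:

```python
import pandas as pd

# Toy second-window data; "dest_port" is a hypothetical column name.
second = pd.DataFrame({"dest_port": [22, 22, 443, 22, 8080, 22, 22, 53]})

# Count rows targeting SSH's default port.
ssh_hits = int((second["dest_port"] == 22).sum())
print(f"Port 22 was hit {ssh_hits} times")  # 5 in this toy data
```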

Verdict: If we use 15, the prediction’s lower bound, as the bar for being considered correct, the observed count of 10 got 2/3 of the way there.

IP Reputation

Lastly, according to Claude:

The data indicates that our honeypot will likely be attacked by known attackers or mass scanners approximately 25–30 times between 9:13 AM and 9:18 AM UTC.

When I checked the filters in the raw data:

Okay, this was way off…

The actual number, 164, is over 500% higher than the predicted 25–30. This is actually a very interesting miss considering that the first dataset saw 250 instances of known attackers or mass scanners in the honeypot’s logs:
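For transparency, here is the arithmetic behind the “over 500%” figure, measured against the lower bound of Claude’s predicted range:

```python
actual = 164
low, high = 25, 30  # Claude's predicted range

# How far above the prediction's lower bound the actual count landed.
pct_over = (actual - low) / low * 100
print(f"{pct_over:.0f}% above the lower bound")  # 556% above the lower bound
```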

Since Claude was specifically asked to predict future attacks, I didn’t type any follow-up prompts explaining where it was wrong, because I wasn’t supposed to actually know.

Verdict: ???

Conclusion

I was unfortunately forced to give Claude 3 Opus a very limited dataset to analyze, but I’d imagine that with a larger upload size limit I could potentially get it to correlate far more data points and produce even more detailed analyses. I liked that it used estimative language in its analysis, where appropriate. It wasn’t perfect, but it was in the ballpark for 2 of the 3 requested predictions.

I’m puzzled by the completely off prediction for the number of known attackers and mass scanners that would attack the honeypot, but with some follow-up prompts and more datasets, its accuracy could probably be improved.

Would I expect this to replace a CTI analyst’s job? Nope. Not even close. Could it be a powerful tool to assist in the production of intel products? 100% yes.
