How I Discovered Thousands of Open Databases on AWS

Published in InfoSec Write-ups · 10 min read · Jan 23, 2022

My journey finding and reporting databases holding sensitive data about Fortune 500 companies, hospitals, crypto platforms, startups in the middle of due diligence, and more.

Table Of Contents

Overview
Background
The Hypothesis
Port Scanning
Analyzing and Visualizing the Data
Examples of Data I Found
Conclusion

Overview

It is easy to find misconfigured assets on cloud services by scanning the CIDR blocks (IP ranges) of their managed services, because the cloud providers themselves publish these ranges.

An email from one of the companies I reported.

In just one day, I found thousands of ElasticSearch databases and Kibana dashboards exposing sensitive information, most probably by mistake:

Sensitive information about customers: emails, addresses, current occupations, salaries, private wallet addresses, locations, bank accounts, and more.
Production logs written by Kubernetes clusters, from application logs to kernel and system logs.
Logs collected in one place from all the nodes, pods, and applications running on top of them, open to the world. I just got there first.
Some of the databases had already been hit by ransomware.

A company I found compromised provides services to these companies (image taken from its website).

Background

There must be plenty of assets out there, listening beyond their intended scope, waiting to be discovered.
Published CIDR blocks make it easier for attackers to find these assets, spread all types of malware, or get their hands on sensitive data of real companies.

DevOps engineers, developers, and IT practitioners often get one of the following wrong:

Binding the socket to the wrong network interface.
For example, listening on 0.0.0.0, which accepts connections on all network interfaces, instead of only on the internal network interface's IP address (172.x.x.x).
A misconfigured security group for the cluster (allowing all TCP and all UDP traffic from broad CIDR blocks).
Sometimes, the security group is changed by others who are not aware of the consequences.
The default VPC or subnet is used, its settings are inherited, and a public IPv4 address is assigned silently.
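The security group and subnet mistakes above can be checked for in your own account. Below is a minimal, read-only sketch, assuming you have already fetched the data (for example `describe_security_groups()["SecurityGroups"]` and `describe_subnets()["Subnets"]` from boto3's EC2 client) and pass in lists of dictionaries shaped like the EC2 API responses (`IpPermissions`, `IpRanges`/`CidrIp`, `Ipv6Ranges`/`CidrIpv6`, `MapPublicIpOnLaunch`). The helper names and sample IDs are hypothetical.

```python
"""Read-only audit: world-open security group rules and public-IP subnets.

Inputs are lists shaped like the EC2 API responses, e.g.
describe_security_groups()["SecurityGroups"] and describe_subnets()["Subnets"].
"""
from typing import Any, Dict, List, Tuple

# CIDRs that mean "anyone on the Internet".
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(groups: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (group_id, protocol, port_range) for each ingress rule open to the world."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if not cidrs & WORLD_CIDRS:
                continue  # rule is limited to narrower ranges
            proto = perm.get("IpProtocol", "?")
            # IpProtocol "-1" means all protocols, and the port fields are absent.
            ports = "all" if proto == "-1" else f"{perm.get('FromPort')}-{perm.get('ToPort')}"
            findings.append((group.get("GroupId", "?"), proto, ports))
    return findings


def public_ip_subnets(subnets: List[Dict[str, Any]]) -> List[str]:
    """Return IDs of subnets that silently give new instances a public IPv4 address."""
    return [s["SubnetId"] for s in subnets if s.get("MapPublicIpOnLaunch")]


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    groups = [{
        "GroupId": "sg-example",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 9200, "ToPort": 9200,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
        ],
    }]
    subnets = [{"SubnetId": "subnet-default", "MapPublicIpOnLaunch": True},
               {"SubnetId": "subnet-private", "MapPublicIpOnLaunch": False}]
    print(open_ingress_rules(groups))  # [('sg-example', 'tcp', '9200-9200')]
    print(public_ip_subnets(subnets))  # ['subnet-default']
```

Keeping the fetch out of the functions lets the logic run offline against saved JSON, and nothing in the account is modified.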

The Hypothesis

I hypothesized that I could easily find misconfigured assets, mostly the result of human error, if I scanned specific CIDR blocks from within the cloud provider (from an instance / VPS).

The key is to scan smartly: leverage the known network infrastructure (the published CIDR blocks of the services we want to scan) to find live servers within reach.

If you search a bit, you can find the relevant CIDR blocks of every cloud provider.
Let’s say I’m an IT technician or a security engineer who needs to allow incoming connections from a specific cloud service, like AWS’s Elastic Container Service (ECS). This can be done by adding the service’s CIDR blocks to the Security Group rules, allowing connections from those ranges.

The major cloud providers publish a list of their services along with the CIDR blocks (IP address ranges) of each service.

AWS IP address ranges

Amazon Web Services (AWS) publishes its current IP address ranges in JSON format. To view the current ranges, download…

docs.aws.amazon.com

Get the CIDR blocks of ElasticSearch Service (ES) from AWS
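The published file lives at https://ip-ranges.amazonaws.com/ip-ranges.json; each entry in its `prefixes` (IPv4) and `ipv6_prefixes` (IPv6) lists carries `region` and `service` tags. A minimal sketch of filtering it, assuming you have already downloaded the file. The exact service tag from the screenshot isn't reproduced here, so the example filters on `EC2`, and the sample prefixes are made-up documentation ranges, not real AWS data. The same lookup is what a defender would use to allow-list a service in a Security Group.

```python
"""Filter AWS's published ip-ranges.json by service and, optionally, region."""
import ipaddress
import json
from typing import List, Optional

# Published location of the file (download it separately).
IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def load_ranges(path: str) -> dict:
    """Load a locally saved copy of ip-ranges.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cidrs_for_service(doc: dict, service: str, region: Optional[str] = None) -> List[str]:
    """Return the IPv4 and IPv6 prefixes tagged with `service` (and `region`, if given)."""
    result = []
    for key, field in (("prefixes", "ip_prefix"), ("ipv6_prefixes", "ipv6_prefix")):
        for entry in doc.get(key, []):
            if entry.get("service") != service:
                continue
            if region is not None and entry.get("region") != region:
                continue
            result.append(entry[field])
    return list(dict.fromkeys(result))  # de-duplicate, keep file order


def ip_in_ranges(ip: str, cidrs: List[str]) -> bool:
    """True if `ip` (e.g. one of your own public IPs) falls inside any of the blocks."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions is simply False in the ipaddress module.
    return any(addr in ipaddress.ip_network(c) for c in cidrs)


if __name__ == "__main__":
    # Made-up sample using documentation ranges, not real AWS data.
    sample = {
        "prefixes": [
            {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
            {"ip_prefix": "203.0.113.0/24", "region": "eu-west-1", "service": "EC2"},
            {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "AMAZON"},
        ],
        "ipv6_prefixes": [
            {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1", "service": "EC2"},
        ],
    }
    ec2_us = cidrs_for_service(sample, "EC2", region="us-east-1")
    print(ec2_us)  # ['198.51.100.0/24', '2001:db8::/32']
    print(ip_in_ranges("198.51.100.7", ec2_us))  # True
```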

Some CIDR blocks were reachable only from within the cloud provider's network. All you have to do is start a VPS / instance with internet connectivity inside the cloud provider you want to scan.

What does one need to find them?

A basic understanding of networks, the IP stack and routing, and cloud infrastructure.
A lightweight port scanner (like MasScan or Nmap).
A list of CIDR blocks to scan (for managed services such as Kubernetes or ElasticSearch), along with the ports most likely to be open on the instances within those ranges.
A tool to visualize all the data we collect (like ElasticSearch + Kibana).

Port Scanning — Collecting the data about the assets

I used MasScan to scan for open ports on the CIDR blocks I selected.
MasScan is a TCP port scanner that spews SYN packets asynchronously.
Under the right circumstances, it can scan the entire Internet in under 5 minutes.

The input is the CIDR blocks (e.g. 50.60.0.0/16 or 118.23.1.0/24) and the ports we would like to scan (9200, 5600, 80, 443, etc.).
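Before launching a sweep, it helps to know its size. A small sketch, assuming IPv4-only (or IPv6-only) input, that multiplies the number of addresses in the blocks by the number of ports: with the two example blocks and four ports above, that is (65,536 + 256) × 4 = 263,168 probes. The 10,000 packets-per-second figure is only an illustrative value for MasScan's `--rate` option, not the author's setting.

```python
"""Estimate how many (IP, port) probes a sweep sends and a lower bound on its duration."""
import ipaddress
from typing import Iterable


def probe_count(cidrs: Iterable[str], ports: Iterable[int]) -> int:
    """Addresses across all CIDR blocks times the number of distinct ports."""
    # Collapse overlapping blocks so no address is counted twice (one IP version only).
    networks = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    addresses = sum(n.num_addresses for n in networks)
    return addresses * len(set(ports))


def scan_seconds(probes: int, rate: int) -> float:
    """Lower bound on duration at `rate` packets per second (ignores retries and waits)."""
    return probes / rate


if __name__ == "__main__":
    total = probe_count(["50.60.0.0/16", "118.23.1.0/24"], [9200, 5600, 80, 443])
    print(total)  # 263168
    print(round(scan_seconds(total, 10_000), 1))  # 26.3
```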

I ran an ELK stack on my instance using Docker images.
I started MasScan on the same machine to scan the CIDR blocks, streamed MasScan’s output (response logs) to ElasticSearch using Logstash, and visualized everything in Kibana.
During the scan, the TCP responses were logged and indexed in ElasticSearch.
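The pipeline above used Logstash between MasScan and ElasticSearch. As a Logstash-free illustration of the same idea (a swapped-in technique, not the author's setup), this sketch parses MasScan's list output (`-oL`), assumed here to be lines of `open tcp <port> <ip> <unix-timestamp>`, into documents and builds a body for ElasticSearch's `_bulk` API. The index name `masscan-results` is hypothetical, and sending the body (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
"""Turn MasScan list output (-oL) into an ElasticSearch _bulk request body.

Assumed line format: "open tcp <port> <ip> <unix-timestamp>"; comment lines start with "#".
"""
import json
from datetime import datetime, timezone
from typing import Dict, List


def parse_masscan_list(text: str) -> List[Dict]:
    """Parse 'open' records, ignoring comments, banner lines, and malformed lines."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0] != "open":
            continue
        _status, proto, port, ip, ts = parts
        records.append({
            "ip": ip,
            "port": int(port),
            "proto": proto,
            "@timestamp": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        })
    return records


def to_bulk_body(records: List[Dict], index: str = "masscan-results") -> str:
    """Newline-delimited JSON for _bulk: an action line, then the document, per record."""
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


if __name__ == "__main__":
    # Made-up sample using documentation addresses.
    sample = "#masscan\nopen tcp 9200 198.51.100.10 1642896000\nopen tcp 443 198.51.100.11 1642896001\n# end\n"
    recs = parse_masscan_list(sample)
    print(len(recs))  # 2
    print(to_bulk_body(recs), end="")
```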

I let it run for a while, and within a few hours I had 337K+ IP and port combinations in hand.
Many of them were open.
This is how my dashboard looked:

337K open ports in AWS’s ElasticSearch service CIDR blocks (customers’ clusters), in a few hours.

Analyzing And Visualizing The Data

The photos are censored.
I have reported the incidents to the involved parties as well as AWS.
I got their permission to proceed with this post.
Most of the parties addressed the issue within a day or two.
Some of them have ignored the reports to this day.

Thanks to the pipeline I created, I had real-time logs and could start looking at the services immediately while the scan kept finding more.

I exported the assets that were up from Kibana to a CSV file using Kibana’s export button. Then I loaded it with pandas (Python).
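The load step can be sketched with the standard library; with pandas, the equivalent is `pd.read_csv(...)` followed by `groupby("ip")["port"]`. The column names `ip` and `port` are assumptions about how the exported CSV is laid out, and the sample rows use documentation addresses.

```python
"""Load a Kibana CSV export and group open ports by IP (stdlib stand-in for pandas)."""
import csv
import io
from collections import defaultdict
from typing import Dict, List


def ports_by_ip(csv_text: str, ip_col: str = "ip", port_col: str = "port") -> Dict[str, List[int]]:
    """Map each IP to its sorted, de-duplicated open ports. Column names are assumptions."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[ip_col]].add(int(row[port_col]))
    # IPs are sorted as strings, which is enough for a stable listing.
    return {ip: sorted(ports) for ip, ports in sorted(grouped.items())}


if __name__ == "__main__":
    sample = "ip,port\n198.51.100.10,9200\n198.51.100.10,5600\n198.51.100.11,443\n198.51.100.10,9200\n"
    print(ports_by_ip(sample))  # {'198.51.100.10': [5600, 9200], '198.51.100.11': [443]}
```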

For each IP, I fired an HTTP HEAD request; the response headers gave me a fingerprint of the asset.

I eliminated the responses that required authentication.

Then I printed the title of each web page from the HTTP response.

We can do the same for the ElasticSearch port, or any other service. MasScan had already found these IPs, so I assumed they were up.

Printing the titles of the HTML documents returned by the endpoints I found. This can also be done with Nmap’s http-title script.

Getting the title of the servers that I found relevant
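A HEAD response carries headers only, so a page title has to come from a GET response body. A minimal sketch of the filtering and title extraction, assuming the responses were already fetched; treating 401/403 or a `WWW-Authenticate` header as "authentication required" is a simplifying assumption, and the helper names are hypothetical.

```python
"""Skip endpoints that demand authentication and extract <title> from the rest."""
from html.parser import HTMLParser
from typing import Dict, Optional

AUTH_STATUSES = {401, 403}


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> Optional[str]:
    """Return the stripped text of the first <title>, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def requires_auth(status: int, headers: Dict[str, str]) -> bool:
    """Heuristic: 401/403 or a WWW-Authenticate header means authentication is required."""
    return status in AUTH_STATUSES or "www-authenticate" in {k.lower() for k in headers}


if __name__ == "__main__":
    print(page_title("<html><head><title> Kibana </title></head></html>"))  # Kibana
    print(requires_auth(401, {}))  # True
    print(requires_auth(200, {"Content-Type": "text/html"}))  # False
```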

So now I had a list of all the assets, along with their web page titles.
I explored the addresses by entering them in the browser’s address bar.

Then I explored the assets carefully, one at a time.
I checked the following endpoints:

/_aliases
/<index>/_search

The ElasticSearch REST API is very convenient: it is easy to get field metadata, document counts, and everything else you want to know about the ES cluster.

ElasticSearch Index summary via /_aliases REST call
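`/_aliases` returns a JSON object keyed by index name, each holding an `aliases` object, so a one-glance summary is easy to build; document counts and field metadata come from other endpoints such as `/_cat/indices` and `/<index>/_mapping`. A small sketch for auditing what your own cluster would reveal; the sample response is made up, and skipping dot-prefixed (system) indices is a choice made here, not ElasticSearch behavior.

```python
"""Summarize a GET /_aliases response for a quick audit of your own cluster."""
import json
from typing import Dict, List


def summarize_aliases(response: Dict[str, dict], include_hidden: bool = False) -> Dict[str, List[str]]:
    """Map index name -> sorted alias names; dot-prefixed system indices are skipped by default."""
    summary = {}
    for index, info in sorted(response.items()):
        if index.startswith(".") and not include_hidden:
            continue
        summary[index] = sorted(info.get("aliases", {}))
    return summary


if __name__ == "__main__":
    # Made-up response body in the /_aliases shape.
    body = ('{"customers": {"aliases": {"crm": {}}}, ".kibana_1": {"aliases": {".kibana": {}}},'
            ' "logs-2022.01.23": {"aliases": {}}}')
    print(summarize_aliases(json.loads(body)))  # {'customers': ['crm'], 'logs-2022.01.23': []}
```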

Examples Of Data I Found

Kubernetes logs for entire production clusters (gathered through log collectors)

K8s cluster logs, streamed via Fluentd, with live URLs and kernel logs.

Successful SSH login using the private key

SSH daemon logs from the cluster’s machines, in real time.

Full cloud visibility: instance type, AMI, and account ID, visible to the world thanks to misconfigured K8s and ElasticSearch clusters.

Private Dashboards

Some companies streamed their Jira tasks and issues into ElasticSearch dashboards, including customer data, code examples, account names, etc. This Kibana dashboard contained everything about the company’s R&D.
The company was in the middle (or at the peak) of its due-diligence process, and it announced that it had raised quite a lot of money just a few weeks after I reported the incident. The leak could have been fatal for them if someone else had found it first.
I reported it to their VP of R&D, and they fixed the issue the same day.

Hospitals / Medical supply — Vaccination information of individuals.

A vaccination date along with an email address

Crypto Trading Platform

Banking
Bank transfer along with full name and bank account details.

Live car fleet metrics (about each car):
- IMEI (unique cellular identifier)
- Location (coordinates)
- gsm_strength
- fuel status
- error codes
- battery status

Is it real?

Sometimes I had trouble telling whether a database was real or a honeypot.
So, I googled whatever I found in the database.
Usually, this led me to the real website, and I knew whom to report to.

The document in ElasticSearch:

The same document on the live production website:

Servers Already Hit by Ransomware

If your cluster is open to the world, an attacker needs no 0-day to run ransomware against it: without authentication, anyone can access everything.

As you can see, some of the assets I discovered had already been “hacked” (not really hacked, but wiped) by ransomware.
A tool like elasticsearch-dump can be used to back up (and restore) the database. After backing it up, the attackers deleted everything from the cluster, leaving this message: “All your data is a backed up. You must pay 0.16BTC to …”

Security and Privacy Policies

As you can imagine, many of the organizations I found were GDPR compliant. This means they are sensitive to data breaches and have a DPO (Data Protection Officer) who is supposed to actively look for and handle this kind of incident.

An example of a privacy policy I read (to find the DPO)

“Contact us if you wish to know how your data is stored”.

Some DPOs did not reply, and some DPO mailboxes rejected my emails with permission errors (they were not allowed to receive email from outside the organization).

For some companies, the mail servers rejected all email coming from outside the company.
What a mess: they could not be reached by email at all.

I also sent emails to sister companies and other subsidiaries.
I tried to contact them through their website contact forms.
No response whatsoever. I had to start conversations through CEOs, CTOs, and VPs.

As for the companies that ignored me, their assets and data are still exposed to this day, and they don’t seem to care.

Conclusion

Publishing CIDR blocks per service is an inherent design issue when it comes to port scanning. We need these lists for many reasons, but at the same time they pose a huge risk for cloud providers by making their customers easy to scan.

Misconfiguration happens all the time, and it is here to stay, quietly opening holes in companies’ security.
We often sin by using the default VPC subnet when launching instances; as a result, many instances are automatically assigned a public IP address.
The problem starts with a lack of visibility.
As far as I know, nobody inside AWS is currently actively searching for misconfigured databases or managed services. It is outside their scope.

I reported the companies I found, up to a certain limit (there were too many), and I could not reach them all on my own.
In an email conversation, AWS stated that it is the customer’s responsibility to configure and secure its assets, and that AWS does not actively search for this kind of misconfiguration. That makes sense, yet I found these assets surprisingly easily, and AWS could address the problem without much effort.
I did it on my own, pretty effectively, in my spare time.

Learn more about AWS’s shared responsibility model.

For now, all we can do is assume that misconfiguration is always possible, regardless of company size, and that somebody is always watching from the network. If you open a service to the world, at least put proper authentication and authorization in front of it.
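In the spirit of that advice, here is a small sketch of a pre-deployment check for an ElasticSearch node: it warns when `network.host` or `http.host` binds to every interface (`0.0.0.0`, `::`, or `_global_`) and `xpack.security.enabled` is off. It assumes the settings are already loaded into a flat dictionary (for example via `yaml.safe_load` on a flat `elasticsearch.yml`). Treating a missing `xpack.security.enabled` as off mirrors pre-8.0 defaults and is an assumption, since 8.x releases enable security by default; the function name is hypothetical.

```python
"""Warn when an ElasticSearch node listens on every interface without security enabled."""
from typing import Any, Dict, List

# Bind values that expose the node beyond the internal network.
ALL_INTERFACES = {"0.0.0.0", "::", "_global_"}


def _as_bool(value: Any) -> bool:
    """Accept YAML booleans or the strings 'true'/'false'."""
    return value is True or str(value).strip().lower() == "true"


def exposure_warnings(settings: Dict[str, Any], security_default: bool = False) -> List[str]:
    """Return human-readable warnings for a flat dict of elasticsearch.yml settings."""
    warnings = []
    hosts: List[Any] = []
    for key in ("network.host", "http.host"):
        value = settings.get(key)
        if value is not None:
            hosts.extend(value if isinstance(value, list) else [value])
    wide = sorted({str(h) for h in hosts if str(h) in ALL_INTERFACES})
    if wide:
        warnings.append(f"binds to all/global interfaces: {', '.join(wide)}")
    # Missing key falls back to `security_default` (assumed False, as in pre-8.0 releases).
    security_on = _as_bool(settings.get("xpack.security.enabled", security_default))
    if not security_on:
        warnings.append("xpack.security.enabled is off: no authentication")
    if wide and not security_on:
        warnings.append("listening on all interfaces with no authentication")
    return warnings


if __name__ == "__main__":
    print(exposure_warnings({"network.host": "0.0.0.0", "xpack.security.enabled": False}))
```

Binding to `_site_` (site-local addresses) with security on produces no warnings, which matches the internal-interface setup recommended earlier.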

By the way, I am doing this in my spare time.
I also really love coffee!

Avi is making the internet safer in his spare time

I'm a business-oriented engineer, who loves security and AI, with deep security insights. I like to pwn cloud…

www.buymeacoffee.com

Thank you for reading this far.

If you like my work, or if you offer a bug-bounty program, let me know.
You are welcome to contact me or comment if you have any questions.

Check out my previous releases:

POC For Google Phishing In 10 Minutes: ɢoogletranslate.com

Back in 2016, I ran into a post about someone buying ɢoogle.com. It was used for phishing purposes (notice the first…

infosecwriteups.com

Facebook Knows What You Eat: Discover The Entire Data Facebook Collects About You, Step By Step.

A story of how I explored https://facebook.com/dyi programmatically.

medium.com

Identify Website Users By Client Port Scanning — Using WebAssembly And Go

Websites tend to scan the open ports of their users, from the browser, to identify new/returning users better. Can…