TryHackMe — Content Discovery

Adithya Thatipalli
Published in InfoSec Write-ups
May 1, 2022

In this article, we will cover another TryHackMe challenge, “Content Discovery”.

This room teaches us how to identify hidden content on web servers and use it to look for further vulnerabilities. Let’s dive in…

Content on a website comes in many types, such as configuration files, images and media. There are three main ways to discover it: manually, with automated tools, and through OSINT (open-source intelligence).

1. Manual

The first way to find hidden content is to look at the robots.txt file. This file tells search engine crawlers which parts of the site they may and may not visit. Because it names the areas the owner wants kept out of search results, it gives us a convenient list of locations worth checking.

On our machine, we open robots.txt and see which paths are allowed and which are disallowed.

The file shows one directory that crawlers are not allowed to visit.
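As a rough illustration of reading robots.txt, here is a minimal Python sketch that pulls out the Disallow entries. The sample file contents and the path /hidden-area/ are made up for the example and are not the room's actual values.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every path named in a Disallow rule, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # Keep only non-empty Disallow rules (field names are case-insensitive).
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, not the room's real robots.txt.
sample = """User-agent: *
Allow: /
Disallow: /hidden-area/   # example path
"""

print(disallowed_paths(sample))
```

Each path returned this way is a candidate to visit by hand in the browser.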

The next method is to use the favicon database.

The favicon is a small icon displayed in the browser’s address bar or tab used for branding a website.

When a framework is used to build a website, a favicon that ships with the installation is sometimes left in place. If the website developer doesn’t replace it with a custom one, it can give us a clue about which framework is in use.

OWASP has a database of common framework icons that we can check the target’s favicon against: https://wiki.owasp.org/index.php/OWASP_favicon_database

In this challenge, we will find out which favicon the creator used. We view the source code of the static page to locate the favicon, then compare its hash with the OWASP database.

Now we use the curl command to download the favicon and compute its MD5 hash.

We can use either a Linux terminal or PowerShell to get the hash value.
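The same hash-and-compare idea can be sketched in Python. This is only an illustration: the icon bytes and the table entry "ExampleFramework" are invented stand-ins, and a real lookup would use the hashes listed in the OWASP database.

```python
import hashlib
from typing import Optional


def favicon_md5(data: bytes) -> str:
    """Return the MD5 hex digest of the raw favicon bytes."""
    return hashlib.md5(data).hexdigest()


def identify_framework(data: bytes, known_hashes: dict) -> Optional[str]:
    """Look the favicon's hash up in a hash-to-framework table."""
    return known_hashes.get(favicon_md5(data))


# Stand-in bytes and a made-up table entry, for illustration only.
fake_icon = b"not a real icon"
table = {favicon_md5(fake_icon): "ExampleFramework"}

print(identify_framework(fake_icon, table))   # match found in the table
print(identify_framework(b"other", table))    # no match, so None
```

In practice the bytes would come from downloading the site's favicon file, which is what the curl step does.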

Using the Sitemap XML File

A sitemap is an XML file that lists every page the website owner wants to be visible on a search engine. Crawlers rely on it to find content, so it can also point us to pages that are hard to reach by browsing the site.

In this challenge, we need to open the sitemap file and identify the secret path that the owner didn’t expect us to explore.
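To show what reading a sitemap looks like, here is a small Python sketch that lists the URLs in a sitemap document. It assumes the file uses the standard sitemaps.org namespace; the example.com URLs are placeholders, not the room's answer.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list:
    """Return the text of every <loc> element found under <url> entries."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Placeholder sitemap, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/secret-path/</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```

Any URL in the list that is not linked from the site's visible pages is worth a closer look.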

Using HTTP Headers

When we make a request to a web server, it responds with headers that can contain useful information about the web app, such as the server software and version or the framework in use. Making the request with curl and displaying the response headers (for example with the -v or -I option) is another way to gather this information.
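The sketch below shows one way to pick the revealing headers out of a raw response. The header values (nginx/1.18.0, PHP/7.4.3) are invented for the example, and the choice of Server and X-Powered-By as the "interesting" headers is an assumption rather than something the room prescribes.

```python
def interesting_headers(raw_headers: str) -> dict:
    """Extract headers that commonly disclose server or framework details."""
    wanted = {"server", "x-powered-by"}
    found = {}
    # The first line is the status line (e.g. "HTTP/1.1 200 OK"), so skip it.
    for line in raw_headers.splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in wanted:
            found[name.strip()] = value.strip()
    return found


# Made-up response headers, for illustration only.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx/1.18.0\r\n"
    "X-Powered-By: PHP/7.4.3\r\n"
    "Content-Type: text/html\r\n"
)

print(interesting_headers(sample))
```

Version strings found this way can then be checked against known vulnerabilities.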

Using Framework Stack

Every website is built on some kind of technology stack. The page source of the site often reveals which framework was used and links to the framework’s own website. Once we know the framework and its version number, we can look for known vulnerabilities to exploit.
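Framework clues in the page source are often left in HTML comments. As a hedged sketch, assuming the clue sits in a comment, the following Python collects every comment from a page; "ExampleFramework v1.2" is a made-up name, not the framework from the room.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment it encounters."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def html_comments(page_source: str) -> list:
    """Return all HTML comments in the page source, in document order."""
    collector = CommentCollector()
    collector.feed(page_source)
    collector.close()
    return collector.comments


# Invented page source, for illustration only.
sample = "<html><body><!-- Built with ExampleFramework v1.2 --><p>Hi</p></body></html>"

print(html_comments(sample))
```

The same information is visible by hand with the browser's "View Page Source" option; the script just saves scrolling on large pages.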

In this challenge, we will access the website and then navigate to the framework’s website to find some critical information.

We found the framework’s source page, so let’s navigate to it and get the flag to clear this challenge.

By following the instructions given on the framework page, we got the flag.

Using OSINT - Google Dorking

Google dorks are a way to use the search engine more effectively by narrowing down the results. Special operators let us filter results, for example by file type (filetype:), by a particular site (site:), or by a specific keyword in the URL or title (inurl:, intitle:).
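As a small illustration of combining these filters, the Python helper below assembles a query string from the site:, filetype: and inurl: operators. The helper name build_dork and the example values are hypothetical; the resulting string is simply pasted into the search box.

```python
def build_dork(keyword: str = "", site: str = "", filetype: str = "", inurl: str = "") -> str:
    """Combine the given filters into a single Google search query."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if keyword:
        # Quote the keyword so it is searched as an exact phrase.
        parts.append(f'"{keyword}"')
    return " ".join(parts)


# Example values are placeholders.
print(build_dork(site="example.com", filetype="pdf", keyword="admin"))
```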

Using OSINT — Wappalyzer

Wappalyzer is an online tool and browser extension that identifies the frameworks, languages and other technologies used on a particular website.

Using OSINT — Wayback Machine

The Wayback Machine is a service of archive.org (the Internet Archive), a non-profit, community-built site that stores historical snapshots of websites, files and much more. It lets us see how a website has changed over time, which can reveal old pages or content that may still be reachable.

Using GitHub

Git is a version control system that tracks changes to files in a project. Working in a team is easier because you can see what each team member is editing and what changes they made to files. When users have finished making their changes, they commit them with a message and then push them back to a central location (repository) for the other users to then pull those changes to their local machines.

GitHub is a hosted version of Git on the internet. You can use GitHub’s search feature to look for company names or website names to try to locate repositories belonging to your target. Once discovered, you may have access to source code, passwords or other content that you hadn’t yet found.

Using S3 Buckets

S3 Buckets are a storage service provided by Amazon AWS, allowing people to save files and even static website content in the cloud, accessible over HTTP and HTTPS. The owner of the files can set access permissions to make files public, private or even writable. Sometimes these access permissions are set incorrectly and inadvertently allow access to files that shouldn’t be available to the public.

The format of the S3 buckets is http(s)://{name}.s3.amazonaws.com where {name} is decided by the owner, such as tryhackme-assets.s3.amazonaws.com. S3 buckets can be discovered in many ways, such as finding the URLs in the website’s page source, GitHub repositories, or even automating the process.

One common automation method is by using the company name followed by common terms such as {name}-assets, {name}-www, {name}-public, {name}-private, etc.
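The naming pattern above is easy to automate. This Python sketch only generates candidate URLs from a company name and the common terms listed in the text; the function name candidate_bucket_urls is hypothetical, and the sketch does not check whether any bucket actually exists.

```python
# Common terms taken from the examples in the text.
COMMON_SUFFIXES = ["assets", "www", "public", "private"]


def candidate_bucket_urls(name: str, suffixes=COMMON_SUFFIXES) -> list:
    """Build possible S3 bucket URLs in the form {name}-{suffix}.s3.amazonaws.com."""
    return [f"https://{name}-{suffix}.s3.amazonaws.com" for suffix in suffixes]


for url in candidate_bucket_urls("tryhackme"):
    print(url)
```

Each candidate can then be requested to see whether it resolves and what its permissions allow.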

Using Automated Discovery

Automated discovery is the process of using tools to discover content rather than doing it manually. Although there are many different content discovery tools available, each with its own features and flaws, we’re going to cover three that are preinstalled on our AttackBox: ffuf, dirb and gobuster.
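Tools of this kind work from a wordlist: they request each word as a path on the target and report the ones that do not come back as "not found". The Python sketch below shows only that core loop and is not how ffuf, dirb or gobuster are implemented. The fetch_status parameter, the fake_fetch stand-in and the target.example host are assumptions made so the example runs without a network.

```python
from typing import Callable, Iterable


def discover(base_url: str, words: Iterable[str], fetch_status: Callable[[str], int]) -> dict:
    """Try each word as a path and keep URLs whose status code is not 404."""
    found = {}
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank wordlist lines
        url = f"{base_url.rstrip('/')}/{word}"
        status = fetch_status(url)
        if status != 404:
            found[url] = status
    return found


def fake_fetch(url: str) -> int:
    """Stand-in for a real HTTP request; returns invented status codes."""
    known = {
        "http://target.example/admin": 200,
        "http://target.example/private": 403,
    }
    return known.get(url, 404)


print(discover("http://target.example", ["admin", "images", "private"], fake_fetch))
```

Swapping fake_fetch for a function that performs a real HTTP request would turn this into a very basic scanner, though the dedicated tools are far faster and more configurable.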

Thanks for reading this :)

We will connect in the next article.
