Bad Bots: The Unseen Cyber Threat and the Fight to Secure the Internet

Published in InfoSec Write-ups · 10 min read · Nov 18, 2024

Introduction

The internet ecosystem, vital for global communication, commerce, and innovation, is increasingly polluted by “bad bots” — malicious automated programs that conduct various nefarious activities. From data scraping and credential stuffing to orchestrating denial-of-service attacks, bots have grown both in volume and sophistication.

By 2023, bots accounted for 47% of all internet traffic, with bad bots alone responsible for roughly 30% of all traffic. That share represents a major threat to businesses globally, with an estimated $40 billion lost annually to bot-related activity. During key periods, such as holiday shopping seasons, bad bot traffic can surge beyond that 30% baseline, making it harder for website owners to distinguish legitimate users from malicious traffic.

Interesting facts further highlight the gravity of the situation:

A one-second page load delay caused by bot-induced server overload can reduce conversion rates by 7%.
50% of login attempts globally are linked to bots attempting credential stuffing.
Bots often exploit older mobile app versions, which may lack updated defenses, during the transition to newer versions.

The race between bot evolution and countermeasures continues to shape cybersecurity strategies.

The Scope of the Bot Problem

Bad bots make up nearly 30% of all internet traffic, a startling figure that highlights how widespread the problem is. This bot-driven traffic costs businesses over $40 billion annually. Bots increasingly target every part of the web, from scraping content and pricing data to running sophisticated fraud schemes such as credential stuffing and account takeovers.

The First Generation: Basic Automation

Characteristics

The first-generation bots, emerging in the early 2000s, were simplistic and relied on basic scripts written in languages like Python, PHP, or Perl. They targeted HTTP endpoints using direct, rule-based logic. Their operations were straightforward but effective for early threats.

Examples of Malicious Activities

Web Scraping: Automated tools like curl and BeautifulSoup extracted product pricing, proprietary data, or entire web pages.
Spamming: Bots automated form submissions to post advertisements or send phishing messages.
Credential Stuffing: Exploited breached credentials to access user accounts via repetitive HTTP POST requests.

How Organizations Responded

Static IP Blocking: Teams analyzed server logs for repeated requests and blocked suspicious IPs via firewalls like iptables. Limitation: bots routed through proxy servers quickly bypassed IP blocks.
Rate-Limiting and Throttling: Controlled traffic flow using tools like Nginx modules.
CAPTCHAs: Early CAPTCHAs challenged users with text or image recognition tasks. However, these slowed legitimate users and were ineffective against advanced automation.
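
To make the log-analysis step behind static IP blocking concrete, here is a minimal Python sketch. It assumes access logs in Common Log Format with an IPv4 client address at the start of each line; the function name `suspicious_ips` and the threshold are illustrative choices, not part of any particular tool.

```python
import re
from collections import Counter

# Matches an IPv4 client address at the start of a Common Log Format line.
LOG_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}) ")

def suspicious_ips(log_lines, threshold=100):
    """Return {ip: count} for IPs that appear in more than `threshold` lines."""
    counts = Counter()
    for line in log_lines:
        match = LOG_IP.match(line)
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}

# Tiny made-up sample: one address hammering /login, one ordinary visitor.
sample = (
    ['203.0.113.7 - - [18/Nov/2024:10:00:00 +0000] "POST /login HTTP/1.1" 401 0'] * 5
    + ['198.51.100.2 - - [18/Nov/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 512']
)
flagged = suspicious_ips(sample, threshold=3)
print(flagged)
```

In a real deployment the flagged addresses would be handed to a firewall such as iptables. As noted above, a bot that rotates through proxies never accumulates enough requests on one address to cross the threshold.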

The Second Generation: Stealth and Mimicry

As first-generation defenses matured, bots evolved. Second-generation bots were stealthier and designed to mimic human behavior.

Advancements

IP Rotation: Used proxy pools like ProxyMesh to evade static IP blocking.
Headless Browsers: Tools such as Puppeteer and Selenium could execute JavaScript, bypassing detection based on client-side scripts.
Behavioral Mimicry: Generated randomized mouse movements and typing delays to resemble human interactions.

Challenges and Countermeasures

Credential Stuffing at Scale: Sophisticated frameworks like Sentry MBA exploited login portals.
Behavioral Analytics: Machine learning models analyzed anomalies in interaction patterns to flag bots.
Dynamic CAPTCHAs: Risk-based challenges were introduced but were increasingly bypassed using CAPTCHA-solving services like 2Captcha.
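
As a rough illustration of the kind of signal behavioral analytics looks at, the sketch below measures how regular the gaps between input events are. It is a simple statistical heuristic standing in for the machine learning models mentioned above, not a reproduction of any vendor's method, and the `min_cv` threshold is an assumption chosen for the example.

```python
from statistics import mean, pstdev

def timing_variation(timestamps):
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # not enough data to judge
    return pstdev(gaps) / mean(gaps)

def looks_automated(timestamps, min_cv=0.1):
    """Flag sessions whose event timing is suspiciously uniform."""
    cv = timing_variation(timestamps)
    return cv is not None and cv < min_cv

# Event times in seconds: a fixed-interval script versus irregular typing.
scripted = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
human = [0.0, 0.18, 0.25, 0.61, 0.70, 1.15]

print(looks_automated(scripted), looks_automated(human))
```

A check this simple only catches naive automation. Second-generation bots that randomize their delays would pass it, which is why production systems combine many such features.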

The Third Generation: AI-Driven Bots

Third-generation bots integrated machine learning for adaptability and decision-making, pushing bot sophistication to unprecedented levels.

Capabilities

Machine Learning Mimicry: Bots trained on datasets to replicate human interactions, including navigation paths and click sequences.
API Exploitation: Targeted GraphQL APIs to extract data, bypassing web GUIs entirely.
Autonomy: Bots self-optimized their attack strategies using feedback from failed attempts.

Defensive Challenges

Even with tools like Akamai Bot Manager and Cloudflare Bot Management, bots' ability to evade JavaScript challenges, combined with high traffic volumes, made detection increasingly resource-intensive.

How Bots Make Life Difficult for Website Owners

Bots complicate operations for website owners in various ways:

1. Performance Degradation:

· High traffic from bots causes slower page load times or outright server crashes

· Overloaded systems during peak periods, such as holiday sales, result in lost revenue

2. Analytics Distortion:

· Bots inflate metrics like page views, making it difficult to assess genuine user engagement

· Marketing campaigns based on skewed data may lead to poor ROI

3. Fraudulent Activities:

· Credential stuffing, inventory hoarding, and fake account creation drain organizational resources

· Carding bots test stolen credit cards, causing financial loss and reputational damage

4. Escalating Costs:

· Detecting and mitigating bots requires expensive tools, increasing operational overhead

Key Mechanisms in Bot Detection

Modern anti-bot solutions employ a multi-layered approach to differentiate bots from legitimate users:

1. Behavioral Analysis: Solutions monitor and analyze traffic patterns, user actions, and session flows. For example, bots often bypass genuine workflows (like navigating through a website) and directly request endpoints or URLs. Machine learning models are trained on these behaviors to identify anomalies.
2. Device and Browser Fingerprinting: Unique attributes of the client device and browser, such as screen resolution, installed plugins, and HTTP headers, are recorded. Suspicious patterns, like identical fingerprints across multiple requests, are flagged.
3. JavaScript Challenges: Many bots fail to execute JavaScript properly. Anti-bot tools use embedded scripts to evaluate client behavior, detecting headless browsers or automated scripts.
4. Behavioral Biometrics: This includes monitoring fine-grained user activities like mouse movements, typing speed, and touch gestures (for mobile devices). Human interactions have organic patterns that are challenging for bots to mimic.
5. Dynamic Challenges:

CAPTCHA and Crypto Challenges: Bots are challenged with tasks that require human-like reasoning or expend computational resources, increasing the cost of attacks. In response, CAPTCHA-solving-as-a-service has emerged as an underground market, where bots outsource CAPTCHA solving to human solvers or advanced AI.
Private Access Tokens: Emerging techniques, such as those supported by Cloudflare, replace traditional CAPTCHAs with cryptographic tokens that vouch for a legitimate client based on hardware and software attestation.

6. API and Mobile-Specific Features:

Anti-bot systems evaluate mobile-specific attributes, such as the tilt and posture of the device during interactions, gyroscope data, and app-specific behaviors, to detect automation.
For mobile apps, tools like Akamai’s Bot Manager assess app integrity, session anomalies, and usage patterns to flag bot activity.
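
The fingerprinting idea in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the attribute names in `FINGERPRINT_FIELDS`, the request dictionaries, and the rule "flag a fingerprint seen from more than `max_ips` distinct addresses" are all hypothetical. Commercial products collect far more signals and weigh them statistically.

```python
import hashlib
from collections import defaultdict

# Hypothetical attribute set; real products collect many more signals.
FINGERPRINT_FIELDS = ("user_agent", "accept_language", "screen", "plugins")

def fingerprint(attrs):
    """Hash a fixed set of client attributes into a short identifier."""
    canonical = "|".join(str(attrs.get(f, "")) for f in FINGERPRINT_FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def shared_fingerprints(seen_requests, max_ips=3):
    """Return fingerprints observed from more than `max_ips` distinct IPs."""
    ips_by_fp = defaultdict(set)
    for req in seen_requests:
        ips_by_fp[fingerprint(req["attrs"])].add(req["ip"])
    return {fp: ips for fp, ips in ips_by_fp.items() if len(ips) > max_ips}

# One identical client profile arriving from five addresses, plus one
# distinct visitor.
bot_attrs = {"user_agent": "Mozilla/5.0", "accept_language": "en-US",
             "screen": "1920x1080", "plugins": ""}
seen_requests = [{"ip": f"203.0.113.{i}", "attrs": bot_attrs} for i in range(5)]
seen_requests.append({"ip": "198.51.100.9",
                      "attrs": {**bot_attrs, "screen": "1366x768", "plugins": "pdf"}})

flagged = shared_fingerprints(seen_requests)
print(flagged)
```

Keying on the fingerprint rather than the IP address is what lets this kind of check survive proxy rotation. Common device configurations can still collide, so a match is a signal to weigh, not proof of automation.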

Challenges of Older Mobile App Versions

When mobile applications do not enforce mandatory updates, older versions remain susceptible to bot attacks. These outdated apps may lack the latest security patches, like anti-bot scripts or enhanced validation mechanisms. Organizations face a balancing act:

Security Risks: Old versions can be exploited by bots for account takeovers, API abuse, or credential stuffing.
User Experience: Forcing updates too aggressively might alienate users or cause retention issues.

Solutions like Akamai and F5 Distributed Cloud Bot Defense address this by applying centralized bot management across all endpoints, including APIs and mobile app traffic, irrespective of app version.
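
One way to soften the trade-off between security and user experience is a tiered, server-side version policy. The sketch below is an illustration only, not how Akamai or F5 work: the version thresholds, the action names, and the idea of reading the version from a client-supplied header are all assumptions. Because a client-reported version can be spoofed, it should be treated as one weak signal among many.

```python
def parse_version(text):
    """Turn '3.12' or '3.12.0' into a comparable tuple such as (3, 12, 0)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def version_policy(app_version, min_supported="3.0.0", recommended="3.5.0"):
    """Return 'block', 'step_up', or 'allow' for a reported app version."""
    try:
        version = parse_version(app_version)
    except (ValueError, AttributeError):
        return "block"  # missing or malformed version value
    if version < parse_version(min_supported):
        return "block"  # too old to carry current defenses
    if version < parse_version(recommended):
        return "step_up"  # allow, but require extra verification
    return "allow"

for reported in ("2.9.1", "3.2.0", "3.10.0", None):
    print(reported, "->", version_policy(reported))
```

Only very old versions are cut off outright. Versions in the middle band keep working but face additional checks, which avoids forcing an update on every user at once.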

Commercial Bot Management Solutions and Detection Techniques

Akamai Bot Manager leverages behavioral analysis, device fingerprinting, and machine learning to detect bots effectively. It excels in granular bot categorization and dynamic mitigation strategies.
Cloudflare Bot Management integrates seamlessly with its extensive global edge network. It uses JavaScript challenges, machine learning, and fingerprinting but is also highly reliant on its CDN infrastructure.
Imperva Advanced Bot Protection combines client-side integrations for bot detection with server-side behavioral analysis and reputation-based filtering.
DataDome emphasizes real-time traffic analysis through machine learning and AI-powered detection, with strong support for mobile and API traffic.
PerimeterX Bot Defender (PerimeterX is now part of HUMAN Security) is known for advanced user behavior analytics and uses contextual signals to differentiate between humans and bots effectively.

Deployment Models:

Akamai and Cloudflare are primarily edge-based solutions, benefiting from their CDN capabilities to offer high-speed processing and minimal latency.
Imperva and PerimeterX provide flexible deployment options, including cloud-based, on-premises, or hybrid solutions.
DataDome supports easy SaaS integration, targeting modern application architectures.

Focus Areas:

Akamai and Cloudflare often appeal to enterprises already using their CDN services, offering robust bot protection as an added layer.
Imperva and DataDome are popular in e-commerce and financial services for mitigating sophisticated threats like scraping and credential stuffing.
PerimeterX is a strong choice for applications requiring detailed behavioral analytics.

Challenges and CAPTCHA Management:

CAPTCHA implementations vary; some solutions, like Akamai and Imperva, allow integration with multiple CAPTCHA providers or leverage their proprietary ones.
DataDome prioritizes minimizing user friction with adaptive challenges, and PerimeterX uses “invisible challenges” to reduce disruptions for legitimate users.

Mobile and API Support:

Solutions like DataDome and PerimeterX have advanced support for securing mobile apps and APIs, with capabilities to detect anomalies in API calls or suspicious patterns of mobile app usage.

Reporting and Insights:

DataDome and Akamai are lauded for detailed dashboards and actionable analytics, helping security teams gain insights into bot traffic.
Cloudflare provides streamlined integration with its broader suite but may lack the depth of reporting compared to specialized solutions like PerimeterX.

Pricing and Scalability:

Cloudflare offers competitive pricing with its bundling of bot protection into other services, making it appealing for SMEs.
Akamai, Imperva, and PerimeterX are typically more enterprise-focused, with pricing scaling based on usage and customization.

When a bot is discovered, modern bot mitigation solutions offer a range of flexible response actions. These actions help organizations tailor their defenses based on the bot’s threat level, the application’s sensitivity, and the desired user experience.

Possible Actions When Bots Are Identified:

1. Deny Request (Hard Block)

Description: The system outright denies the bot’s request by returning an HTTP 403 (Forbidden) status code.
Use Case: Effective against known malicious bots or high-confidence detections of harmful activity (e.g., credential stuffing).
Pros: Direct and clear-cut; stops malicious activity immediately.
Cons: Determined bots may retry from a different IP or leverage proxy pools to evade blocking.

2. Drop Request (Silent Block)

Description: The server silently drops the request without responding, effectively causing a timeout for the bot.
Use Case: Useful for bots that retry aggressively, as it wastes their resources by forcing them to wait.
Pros: Reduces server load by not processing a response; confuses bots.
Cons: May increase retry rates, leading to potential resource exhaustion on the server.

3. Tarpitting

Description: The server intentionally delays its response to the bot, slowing down its operations.
Use Case: Designed to reduce the effectiveness of high-frequency attacks like scraping or brute-force attempts.
Pros: Wastes bot resources and reduces the number of requests they can send in a given time.
Cons: May still consume server resources for extended periods, especially during high traffic.
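
A tarpit can be sketched with Python's `asyncio`. This is a minimal illustration, assuming an asynchronous request handler and a `suspected_bot` flag supplied by some upstream detector; both are hypothetical stand-ins, not part of any framework.

```python
import asyncio
import time

async def tarpit(handler, suspected_bot, delay_seconds=5.0):
    """Run `handler`, but stall first if the client is a suspected bot."""
    if suspected_bot:
        # asyncio.sleep yields to the event loop, so other clients are
        # still served while this one waits.
        await asyncio.sleep(delay_seconds)
    return await handler()

async def hello():
    """Stand-in for a real request handler."""
    return "200 OK"

async def demo():
    start = time.monotonic()
    fast = await tarpit(hello, suspected_bot=False)
    fast_elapsed = time.monotonic() - start

    start = time.monotonic()
    slow = await tarpit(hello, suspected_bot=True, delay_seconds=0.2)
    slow_elapsed = time.monotonic() - start
    return fast, fast_elapsed, slow, slow_elapsed

fast, fast_elapsed, slow, slow_elapsed = asyncio.run(demo())
print(f"normal: {fast_elapsed:.3f}s, tarpitted: {slow_elapsed:.3f}s")
```

The non-blocking sleep keeps CPU cost low, but each delayed request still holds a connection open for the duration of the delay, which is the resource cost listed under Cons.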

4. Challenge with CAPTCHA

Description: The system presents a CAPTCHA (e.g., Google reCAPTCHA or hCaptcha) to verify human users.
Use Case: Best suited for medium-confidence detections or scenarios where false positives need to be avoided.
Pros: Reduces the chance of blocking legitimate users; highly effective against simple bots.
Cons: Advanced bots use CAPTCHA-solving services (e.g., 2Captcha) or AI-based solvers to bypass challenges.

5. Serve Static Page

Description: The bot is redirected to a generic static page, often displaying minimal content or error messages.
Use Case: Used for bots scraping dynamic data, such as product pricing or proprietary content.
Pros: Prevents sensitive data from being exposed while conserving backend resources.
Cons: Bots may still access non-critical static content unless actively redirected or monitored.

6. Redirect to Honeypots

Description: Malicious requests are redirected to deceptive endpoints designed to monitor or waste bot resources.
Use Case: Ideal for gathering intelligence on bot behaviors or pre-emptively flagging malicious IPs.
Pros: Offers insights into attack vectors; can waste bot resources without affecting legitimate users.
Cons: Requires careful setup to ensure legitimate users are not inadvertently redirected.

7. Throttling or Rate Limiting

Description: Restricts the frequency of requests from a suspected bot by slowing down or limiting its traffic.
Use Case: Best for scenarios where low-confidence detection is likely, or during high traffic events.
Pros: Reduces server strain and prevents abuse without outright blocking traffic.
Cons: May impact legitimate high-frequency users, such as API consumers.
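
A common way to implement this kind of throttling is a token bucket, sketched below. The class and parameter names are illustrative, and the injectable `clock` argument exists only so the example can be exercised without real waiting.

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Drive the bucket with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=1, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(5)]       # five requests at t = 0
fake_now[0] = 2.0                                # two seconds later
after_wait = [bucket.allow() for _ in range(3)]  # two tokens have refilled
print(burst, after_wait)
```

In practice there would be one bucket per client key, such as an IP address, API key, or device fingerprint. Legitimate high-frequency API consumers can be given a larger `capacity` and `rate` to limit the impact described under Cons.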

8. Alert and Monitor

Description: Suspicious activity is logged and flagged for review without taking immediate action against the bot.
Use Case: Useful for low-confidence detections or when tracking bot behavior is prioritized over blocking.
Pros: Provides actionable data without disrupting user experience.
Cons: Does not mitigate immediate threats.

9. Behavioral Adaptation

Description: Some solutions dynamically modify page content or functionality to confuse bots, such as changing form field names or injecting random delays in page loading.
Use Case: Effective against bots using hardcoded or predictable scripts.
Pros: Adds complexity to automated attacks, forcing bot creators to adapt constantly.
Cons: May introduce additional latency or complexity for legitimate users.
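
The form-field renaming mentioned above could look something like the following sketch, which derives per-session aliases with an HMAC. The secret, the field list, and the function names are hypothetical; this shows the general technique rather than any product's implementation.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key

def field_alias(session_id, field):
    """Derive a per-session alias for a real form field name."""
    msg = f"{session_id}:{field}".encode("utf-8")
    return "f_" + hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:12]

def decode_form(session_id, submitted, fields=("username", "password")):
    """Map aliased field names back to real ones; ignore unknown keys."""
    aliases = {field_alias(session_id, f): f for f in fields}
    return {aliases[k]: v for k, v in submitted.items() if k in aliases}

# A browser that rendered the page submits the aliased names.
genuine = {field_alias("abc", "username"): "alice",
           field_alias("abc", "password"): "s3cret"}
# A script with hardcoded field names submits the original ones.
hardcoded = {"username": "alice", "password": "s3cret"}

print(decode_form("abc", genuine))
print(decode_form("abc", hardcoded))
```

Because the aliases change with every session, a script that hardcodes field names submits nothing the server recognizes. A bot that parses the rendered form on each visit can still adapt, so this raises the attacker's cost rather than stopping the attack.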

10. Mobile-Specific Actions

In mobile apps, bot defense actions often include additional layers:

Verification of Device Integrity: Detect if an app is running on an emulator or a jailbroken/rooted device.
User Interaction Validation: Bots are detected based on anomalies in touch gestures or accelerometer data.
Session Token Revocation: Suspicious sessions are terminated, and reauthentication is required.
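
The use cases above tie each action to a detection confidence level: hard blocks for high confidence, CAPTCHAs for medium, and throttling or monitoring for low. A minimal sketch of such a policy follows. The 0.0 to 1.0 score, the thresholds, and the sensitivity adjustment are all assumptions made for illustration; real products expose their own scoring and rule engines.

```python
def choose_action(bot_score, sensitive_endpoint=False):
    """Map a 0.0-1.0 bot-likelihood score to a response action."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0.0 and 1.0")
    # Sensitive endpoints (e.g. login, checkout) escalate earlier.
    shift = 0.15 if sensitive_endpoint else 0.0
    score = min(1.0, bot_score + shift)
    if score >= 0.9:
        return "deny"        # high confidence: hard block
    if score >= 0.6:
        return "captcha"     # medium confidence: challenge
    if score >= 0.3:
        return "rate_limit"  # low confidence: throttle
    return "monitor"         # very low confidence: log only

for score in (0.95, 0.7, 0.4, 0.1):
    print(score, "->", choose_action(score))
print(0.5, "on a login endpoint ->", choose_action(0.5, sensitive_endpoint=True))
```

Keeping the mapping in one place makes it easy to tune thresholds as false-positive data comes in, without touching the detection logic itself.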

Open Source Solutions

Tools and Frameworks

1. ModSecurity:

Open-source WAF that provides bot detection and blocking rules.
Implementation Example: Use the OWASP CRS (Core Rule Set) to block bot-like patterns in HTTP requests.

2. Fail2Ban:

Although primarily designed to prevent brute-force attacks, Fail2Ban can block bots by analyzing server logs for repeated malicious activity.

3. BotSentry:

Open-source framework focusing on real-time bot traffic analysis and mitigation.

Advantages and Limitations

Pros: Cost-effective and flexible.
Cons: High technical expertise required for configuration and maintenance.

Conclusion

The evolution of bad bots from simple scripts to sophisticated AI-driven systems illustrates the dynamic and relentless nature of cyber threats. Modern bots pose significant challenges, targeting vulnerabilities across web applications, APIs, and even mobile apps that do not enforce updates. Their ability to mimic human behavior, solve CAPTCHAs through third-party services, and exploit system loopholes makes them a formidable adversary.

Organizations must adopt multi-layered, adaptive bot management strategies to keep pace. While commercial solutions like Akamai Bot Manager, Cloudflare Bot Management, and Imperva Advanced Bot Protection offer robust defenses, they are not without limitations. Open-source alternatives and custom-built solutions provide flexibility and cost-effectiveness but demand significant expertise.

Additionally, website owners face challenges beyond technical defenses, including maintaining user experience while combating bots, dealing with fluctuating bot traffic, and addressing gaps in mobile app updates. Eye-opening statistics — such as bots accounting for nearly half of all internet traffic — highlight the urgent need for comprehensive mitigation measures.

Ultimately, defending against bots is not a one-time effort but an ongoing process requiring innovation, vigilance, and strategic investment. By staying informed and leveraging advanced technologies, businesses can protect their assets, users, and reputations in an increasingly automated and adversarial digital landscape.