Published on November 15, 2024 • Updated on November 21, 2024

Investigating the "||clean.gg^" Ad Blocking Rule on Yahoo.com

By Roman K. Fox

Introduction

On the Yahoo.com landing page, we observed a significant amount of blocked content. Our investigation revealed that the blocked requests were going to clean.gg and matched the ||clean.gg^ rule. This blog post digs into why this rule, which comes from EasyList, fires on Yahoo.com, and what that implies more broadly.

What Is the `||clean.gg^` Rule?

The ||clean.gg^ rule is part of the EasyList Adblock List under the ad servers category. It blocks all requests to clean.gg and any of its subdomains.
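
To make the matching behavior concrete, here is a minimal sketch of how a domain-anchored rule can be checked against a request URL. The function name matchesDomainRule is hypothetical, and the sketch only handles the plain ||domain^ form; real filter engines also deal with wildcards, rule options, and exception rules.

```javascript
// Simplified matcher for a domain-anchored rule such as "||clean.gg^".
// Assumption: only the "||domain^" form is handled in this sketch.
function matchesDomainRule(rule, requestUrl) {
  if (!rule.startsWith("||") || !rule.endsWith("^")) {
    throw new Error("only ||domain^ rules are supported in this sketch");
  }
  // Strip the leading "||" and the trailing "^" separator marker.
  const domain = rule.slice(2, -1).toLowerCase();
  const host = new URL(requestUrl).hostname.toLowerCase();
  // Match the domain itself or any subdomain, but not look-alike hosts.
  return host === domain || host.endsWith("." + domain);
}

console.log(matchesDomainRule("||clean.gg^", "https://i.clean.gg/1a"));
console.log(matchesDomainRule("||clean.gg^", "https://notclean.gg/1a"));
```

The subdomain check uses a leading dot so that a host like notclean.gg is not treated as a match.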

The clean.gg domain, based on its registration details, is associated with clean.io, which redirects to HumanSecurity. This cybersecurity company focuses on detecting fraudulent activity and mitigating bot traffic. Understanding why requests to clean.gg are triggered and what data they transmit was a challenging process due to the obfuscated nature of the scripts involved.

Tracking Ecosystem on Yahoo.com

I'll describe things in the order I investigated them, working backwards from the blocked request: first the request carrying tracking data to clean.gg, then d1ccw66oyq8ex2.js, which sends that request, and finally benij-2.2.4.js, which injects d1ccw66oyq8ex2.js.

The Role of `d1ccw66oyq8ex2.js`

The activation of the ||clean.gg^ rule was traced back to a dynamically loaded JavaScript file:
https://s.yimg.com/aaq/f10d509c/d1ccw66oyq8ex2.js.

The d1ccw66oyq8ex2.js script contains obfuscated code that collects information and sends an event via an XHR POST request to https://i.clean.gg/1a. This is the payload:

{
  "action_name": "Events",
  "f": "pageview",
  "jstimestamp": 1731658138504,
  "elapsed_time": 6,
  "action_group_id": "2126027a-64a1-ad9f-a5a7-9a3a9308c7d9",
  "version": "4.28.7-aq-b13022d0_47bf_45ee_bcad_8343b3f1e86b_p_45",
  "topLocation": "https://www.yahoo.com/",
  "referrer": "https://www.yahoo.com/",
  "jstzoffset": -120,
  "custom_fields": [
    {
      "apiVersionSuffix": "b13022d0_47bf_45ee_bcad_8343b3f1e86b_p_45"
    }
  ],
  "s": "https://s.yimg.com/aaq/f10d509c/d1ccw66oyq8ex2.js",
  "bk": 5,
  "ia": 0,
  "nl": "en-GB",
  "ls": {
    "rb": 0,
    "fs": 982,
    "fd": 888
  },
  "details": "init | <EF>3",
  "cid": "2126027a-64a1-ad9f-a5a7-9a3a9308c7d9:TP",
  "tml": 6524
}
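
Two of the fields above, jstimestamp and jstzoffset, can be decoded with a few lines of JavaScript. This is a sketch under assumptions: that jstimestamp is Unix time in milliseconds and that jstzoffset follows the convention of Date.prototype.getTimezoneOffset() (minutes, negative when the client is ahead of UTC). The helper name describeTiming is hypothetical.

```javascript
// Subset of the captured payload shown above.
const payload = {
  jstimestamp: 1731658138504,
  jstzoffset: -120,
};

// Assumption: jstimestamp is epoch milliseconds; jstzoffset mirrors
// Date.prototype.getTimezoneOffset() (minutes, negative = ahead of UTC).
function describeTiming({ jstimestamp, jstzoffset }) {
  const utc = new Date(jstimestamp).toISOString();
  const utcOffsetHours = -jstzoffset / 60;
  // Shift the instant by the offset to get the client's wall-clock time.
  const localWallClock = new Date(jstimestamp - jstzoffset * 60000)
    .toISOString()
    .replace("Z", "");
  return { utc, utcOffsetHours, localWallClock };
}

console.log(describeTiming(payload));
```

If those assumptions hold, the event was recorded on November 15, 2024 at 08:08:58 UTC by a client running two hours ahead of UTC, which shows that the payload carries a rough timezone signal in addition to the page URL.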

By the way, this is what the script obfuscation looks like:

function ag(f3, f4) {
    var f5 = a9();
    a7(f5, d(483), f4),
    f5[d(549)] = !0x1,
    f5[d(620)](d(216), d(718)),
    f5[d(42)](f3);
}
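
The d(...) calls above follow a common obfuscation pattern in which every property name and string literal is replaced by a lookup into a string table. The sketch below illustrates that general pattern only. The table contents, the offset, and the indices are all hypothetical, and it does not claim to know what d(483), d(549), or the other indices resolve to in the real script.

```javascript
// Hypothetical string table; the real script's table and indices are unknown.
const TABLE = ["open", "POST", "send", "setRequestHeader", "Content-Type", "text/plain"];
const OFFSET = 100; // hypothetical offset that makes the indices less obvious

// Lookup helper playing the role of d(...) in the obfuscated code.
function d(index) {
  return TABLE[index - OFFSET];
}

// Stand-in request object that records calls, so this runs outside a browser.
function makeRecorder() {
  const calls = [];
  const target = {
    open: (...args) => calls.push(["open", ...args]),
    setRequestHeader: (...args) => calls.push(["setRequestHeader", ...args]),
    send: (...args) => calls.push(["send", ...args]),
  };
  return { target, calls };
}

// Obfuscated style: every method name and literal goes through d(...).
function sendObfuscated(req, url, body) {
  req[d(100)](d(101), url);
  req[d(103)](d(104), d(105));
  req[d(102)](body);
}

// Equivalent readable form of the same three calls.
function sendReadable(req, url, body) {
  req.open("POST", url);
  req.setRequestHeader("Content-Type", "text/plain");
  req.send(body);
}

const a = makeRecorder();
const b = makeRecorder();
sendObfuscated(a.target, "https://example.test/collect", "{}");
sendReadable(b.target, "https://example.test/collect", "{}");
console.log(JSON.stringify(a.calls) === JSON.stringify(b.calls));
```

Recovering the readable form of a real script usually means locating its string table and lookup function first, then substituting the resolved strings back into calls like these.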

The d1ccw66oyq8ex2.js file is actually dynamically injected by another script, benij-2.2.4.js, using the following code:

function Yi({tagSrc: e, id: t, WIN: n, async: i=!1, onloadCb: s, onerrorCb: o}) {
    const r = n.document.createElement(U);
    i && (r.async = !0),
    s && "function" == typeof s && (r.onload = s),
    o && "function" == typeof o && (r.onerror = o),
    r.type = Y,
    r.src = e,
    r.id = t,
    n.document.head.append(r)
}
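
For readability, here is a de-minified sketch of the same helper. The name injectScript is mine, and the values of the constants U and Y are not visible in the excerpt; the sketch assumes they are "script" and "text/javascript", which is the usual choice for this kind of loader but is not confirmed by the source.

```javascript
// Readable rewrite of the minified Yi helper shown above.
// Assumptions: U is "script" and Y is "text/javascript"; neither constant
// appears in the original excerpt.
const SCRIPT_TAG = "script";           // assumed value of U
const SCRIPT_TYPE = "text/javascript"; // assumed value of Y

function injectScript({ tagSrc, id, WIN, async = false, onloadCb, onerrorCb }) {
  const el = WIN.document.createElement(SCRIPT_TAG);
  if (async) el.async = true;
  // Callbacks are attached only when they are actually functions.
  if (typeof onloadCb === "function") el.onload = onloadCb;
  if (typeof onerrorCb === "function") el.onerror = onerrorCb;
  el.type = SCRIPT_TYPE;
  el.src = tagSrc;
  el.id = id;
  // Appending to <head> is what triggers the browser to fetch and run the script.
  WIN.document.head.append(el);
}
```

In a browser, the window object would be passed as WIN. Taking the window as a parameter also lets the same helper inject into a different frame's document.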

It also contains some logic tied to the current page, but I didn't dig into it.

Connection to HumanSecurity

The blocked request to https://i.clean.gg/1a appears to send user interaction data to Clean.gg. Key payload fields include:

topLocation: Indicates the user’s current page (e.g., https://www.yahoo.com/).
referrer: Tracks the referring URL, helping Clean.gg map navigation paths.
custom_fields: Includes metadata like the API version for real-time telemetry.

Clean.io, the registrant of the clean.gg domain, now redirects to HumanSecurity, a company specializing in:

Bot Mitigation: Blocking non-human traffic.
Fraud Prevention: Identifying fraudulent ad impressions.
Advanced Threat Detection: Analyzing 2,500+ signals in real time.

I've checked my logs and found clean.gg integrated into many popular websites, including:

aol.com
engadget.com
iphonelife.com
rivals.com
techcrunch.com
yahoo.com
sports.yahoo.com
finance.yahoo.com

Conclusion

Clean.gg’s integration with popular websites raises questions about how much data is being collected. While HumanSecurity provides legitimate services like fraud detection and bot mitigation, aggregating user data across multiple high-traffic sites, often without explicit user awareness, introduces significant privacy concerns.

Ad-blocking tools hurt revenue for websites, but they also protect users from unwanted tracking and data collection. Finding a balance between privacy and revenue generation is a challenge for publishers and ad tech companies alike.