Brainstorming on Blocking Bad Ads

I’m just jotting down some notes about using the Google Safe Browsing API to prevent a site from serving malicious/bad ads.

Problem Defined

  • Ads are put on a site via JavaScript by calls as simple as “getad(‘adposition1’)”. The JavaScript is executed in the client’s browser after the page is served.
  • Those calls don’t touch any of our servers; they go straight from the client to the Google/Glam/Whatever Ad Server. So we don’t see the ads before they appear on the customer’s screen.
  • The ads being served may be malicious:
    • Any ad that is served can link to a site that has been infected. We will want to block this.
    • Any ad that is served can “take over” the page and redirect the page to a site that may or may not have malware. We want to block ALL take over attempts.
    • There may be other types of ads that we wish to block. Potentially we might wish to block specific ads on specific sites (e.g. ads with sexual connotations on sites with a pre-teen audience). This may be beyond the initial scope and/or incur unwanted execution expenses.
  • Serving a malicious ad can get a site listed as “infected” even though your server has had nothing to do with ANY of the ad content.

Obstacles

  • Any extra calls WILL slow the page load process.
  • Each page load MUST call the ad serving script again.
  • If an ad can be identified as bad, some other type of content must be served in that position to ensure page integrity.
  • The request for ad content HAS to come from the customer side because many ads are geo-specific and the customer’s IP determines what ad shows at what time.
  • We don’t want to set up a system where the page itself can submit a site as “bad”, because anyone could sniff that request and seed our blacklist with bad data.
  • The first getad() call could return more JavaScript, which must, in turn, be processed by the browser to produce the final ad. Potentially, several layers of JS could exist before the real ad is served. (e.g. 2 layers of indirection before ad: Google Ad Manager JS —serves—> Glam Ad embedded JS call —serves—> JS call to 3rd Party Ad Server —serves—> Ad). This pattern is real and happens often.
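
To make the last obstacle concrete, here is a rough sketch of walking those layers of indirection with a depth limit, so that every URL along the way could later be checked. Everything in it is an assumption for illustration: fetchLayer() is a hypothetical function standing in for "load this script and see what it produces", the { type, next, html } shape is made up, and a real version would be asynchronous and run in the browser.

```javascript
// Follow the chain of ad scripts until a real ad shows up or we give up.
// fetchLayer(url) is hypothetical: it returns either
//   { type: 'script', next: '<url of the next ad script>' }  (another layer of JS)
//   { type: 'ad', html: '<final ad markup>' }                (the real ad)
function resolveAdChain(fetchLayer, firstUrl, maxDepth = 5) {
  const visited = [];
  let url = firstUrl;
  while (visited.length < maxDepth) {
    visited.push(url);
    const layer = fetchLayer(url);
    if (layer.type === 'ad') {
      // Every URL in `visited` is a candidate for a blacklist check.
      return { visited, html: layer.html };
    }
    url = layer.next;
  }
  // Too many layers: the caller should serve a placeholder instead.
  return { visited, html: null };
}
```

With the chain from the example above (ad manager, embedded call, 3rd party server), visited would end up holding three URLs before the ad itself is returned. The depth limit is there only so that a misbehaving chain cannot loop forever.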

Possible solutions

  • Status Quo: As problem sites are reported to us, determine which ad is bad, report it to the ad server & hope they fix it before Google sees it and lists the site as a dangerous site in its toolbar and in Chrome.
    • Unless you are “lucky”, you don’t get served the bad ad yourself.
    • Once you do get the bad ad, it is hard to determine the initial JS that caused the problem.
  • Embed every JS ad call in its own iframe
    • Will block takeovers, provided the iframe is sandboxed so that it cannot navigate the top-level page
    • May or may not prevent Google from listing the site, probably not.
    • Will break ads that are contextually based
  • Check the ad entirely on the client side via a black list: GSB API (http://code.google.com/apis/safebrowsing/) or PhishTank (http://data.phishtank.com/data/online-valid.xml)
    • This Good/Bad check could be done with a single API call.
    • Calls to external servers depend on the health/bandwidth of those servers.
    • This could also be done by downloading the blacklist and checking against the local copy: http://code.google.com/p/jgooglesafebrowsing/wiki/Quick_Start_Guide
    • Downloading the blacklist would cost time, and the local copy would have to be updated periodically.
  • Implement a hybrid solution where a call is made to our servers to see if an ad is good or bad. (Server side base code: http://lampsecurity.org/php-google-safe-browsing-api )
    1. The ad call is processed in a JS eval. (The result will have to be checked for nested JS calls.)
    2. An MD5 of the ad is sent to the server. The result is Good/Bad/Unknown. (Pass the URL?)
    3. If the result is Good, the ad is served and the process exits.
    4. If the result is Bad, either go back to step 1 or serve a placeholder/known good ad & exit.
    5. If the result is Unknown, send the JS to the server for verification. The server processes the code and returns a Good/Bad result.
    6. If that result is Good, the ad is served and the process exits.
    7. If that result is Bad, either go back to step 1 or serve a placeholder/known good ad & exit.
  • Other solutions?
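
The hybrid flow above could look roughly like this. It is only a sketch under several assumptions: getAd(), lookupVerdict() and verifyOnServer() are hypothetical callbacks standing in for the ad call and the two requests to our server, the 'good'/'bad'/'unknown' strings are made up, and fingerprint() is a tiny FNV-1a hash used as a stand-in for MD5 so the example runs without any library. In a real page the server calls would be asynchronous.

```javascript
// Stand-in fingerprint (FNV-1a, 32-bit). The notes call for MD5; this is
// NOT MD5, just a dependency-free placeholder for illustration.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical known-good content to show when no safe ad is found.
const PLACEHOLDER = '<div class="ad-placeholder"></div>';

// getAd():               returns the ad code for this position
// lookupVerdict(fp):     returns 'good', 'bad' or 'unknown' for a fingerprint
// verifyOnServer(code):  returns 'good' or 'bad' after the server inspects it
function serveAd(getAd, lookupVerdict, verifyOnServer, maxAttempts = 2) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const adCode = getAd();
    let verdict = lookupVerdict(fingerprint(adCode));
    if (verdict === 'unknown') {
      // Unknown fingerprint: send the full code for verification.
      verdict = verifyOnServer(adCode);
    }
    if (verdict === 'good') {
      return adCode; // serve the ad and exit
    }
    // Bad ad: loop around and request another one.
  }
  return PLACEHOLDER; // ran out of attempts, keep the page layout intact
}
```

Here "go to step 1" is handled by the loop and the placeholder is the fallback once maxAttempts is used up, so both options from the list are covered. Sending only the fingerprint first keeps the common case (an ad we have already seen) down to one small request.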

Reading

Anyway, I had this going through my head and wanted to get it all written out, so I have a place to check back on this tomorrow…