Hacking Legacy Sites for Fun and (Non)profit
April 27, 2022
Audience
This post is written for an audience of software engineers and assumes general Internet experience. Some definitions are included below to give context to readers without a background in developing software.
Definitions
- GDPR (General Data Protection Regulation): A European Union law focusing on data protection and privacy. California has a similar one called the CCPA (California Consumer Privacy Act). There is no comprehensive federal data privacy law in the USA.
- Cookie banner: Those annoying cookie notifications you get on every new site you visit asking you to choose how closely you want the website to track your behavior.
- Google Analytics: Google's analytical platform for tracking user behavior. Used by a mind-boggling number of sites.
- API (Application Programming Interface): Enables applications to exchange data with each other using a documented interface. A major revolution in computer science that enabled the software industry to grow so quickly.
- JSON (JavaScript Object Notation): A standardized format for representing JavaScript-style data as human-readable text.
- regex (Regular Expression): An esoteric way of searching through text using patterns. For example, this regular expression was written by Satan himself to match email addresses: (?:[a-z0-9!#$%&'+/=?_`{|}~-]+(?:.[a-z0-9!#$%&'+/=?_`{|}~-]+)|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\[\x01-\x09\x0b\x0c\x0e-\x7f])")@(?:(?:a-z0-9?.)+a-z0-9?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\[\x01-\x09\x0b\x0c\x0e-\x7f])+)])
Recently at work I had to fix a few legacy websites with broken cookie banners after we did a major GDPR compliance effort across all the publicly accessible websites. These sites were initially created 14 years ago and haven't been updated for many years. It's a technological wonder that they're still up and running, but they're still there!
Unfortunately, their old age makes delivering updates difficult. And thanks to some technology choices that broke the modern cookie banner code, there were some updates that needed delivering.
Thankfully, those sites already had Google Analytics. Besides being able to track your every move on a website, Google Analytics has the handy feature of remotely delivering code snippets! That's actually how the cookie banner software is delivered to these old sites in the first place. So instead of trying to figure out how to resurrect extremely old deployment infrastructure, I decided to first try to hack together a solution to fix the broken cookie banner software and patch the website via Google Analytics.
That effort turned into the hackiest code I've ever written. It's ugly, nonsensical without the context of the problem at hand, and uses browser APIs I hardly knew existed.
But it works!
And that was the key point. We have no plans to actively return to those legacy sites and provide new updates. All that mattered was that we were compliant with GDPR. If we were actively maintaining those sites or had major rework for them on the horizon, I wouldn't have turned to my hacky solution. I showed what I wrote to a couple of good friends and they were rightly horrified at what I had done.
But again, it works!
So let's take a look at the code.
First up, I added a forEach method to the JavaScript String prototype.
...
Yeah. It's that bad.
The good news is that since forEach on a String makes no sense, the site doesn't already try to do that somewhere, so there are no conflicts!
But when we look at the actual implementation, it gets worse.
Theoretically, in a sane world, forEach on a String might be a method that loops through each character in a string and lets you do something with it. That would make a bit of sense and can already be done in JavaScript, just not using forEach.
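For the curious, here is a minimal sketch of the sane, per-character version using only built-in JavaScript. The sample string is made up for illustration and has nothing to do with the legacy sites.

```javascript
// Hypothetical example string, not taken from the legacy sites.
const word = 'cookie';

// Strings are iterable, so for...of visits one character at a time.
const seen = [];
for (const ch of word) {
  seen.push(ch);
}

// Array.from copies the characters into a real array, which does have forEach.
const upper = [];
Array.from(word).forEach(function (ch) {
  upper.push(ch.toUpperCase());
});

console.log(seen.join('-'));
console.log(upper.join(''));
```

Either approach leaves String.prototype alone, which is the whole point.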
But that's not what I did. I discovered that the cookie banner broke because we had a String instead of a JSON object. But Strings can be turned into JSON!
"So", I thought, "what if I turn the String into the JSON object the code expected, then do the forEach stuff that was supposed to happen anyway on my newly created object!"
Turns out, that actually worked 🤣
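// Patch: the banner code calls forEach on what turned out to be a JSON string.
String.prototype.forEach = function(originalForEachFunction) {
  // Parse the string into the array the code expected, then delegate to its forEach.
  var stringToJSON = JSON.parse(this);
  stringToJSON.forEach(originalForEachFunction);
};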
However, the journey wasn't over. While that fixed the error I was seeing and got the cookie banner to appear, I noticed there was an error when accepting any cookies! Apparently, the cookie banner would make a server call to record what preferences were selected.
I dug into the code and discovered that the network call was failing because the same String I turned into a JSON object earlier was still a string later when it should be an object! That's because the code above didn't actually modify the string at all.
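To see why, here is a small sketch that repeats the patch and runs it against a made-up value. The variable name and the array contents are assumptions for illustration, not the real banner data. JSON.parse returns a brand new value, and JavaScript strings are immutable, so the original variable never changes.

```javascript
// Same patch as above, repeated so this snippet runs on its own.
String.prototype.forEach = function (callback) {
  JSON.parse(this).forEach(callback);
};

// Hypothetical stand-in for the value the banner code received.
const preferences = '["analytics","marketing"]';

const visited = [];
preferences.forEach(function (item) {
  visited.push(item);
});

console.log(visited);            // the parsed items were visited
console.log(typeof preferences); // the original is still a string
```

The loop works, but anything that reads the variable afterwards still gets the raw string.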
At this point I thought I hit an impasse. There was no obvious way for me to insert myself into the code like I did earlier with my vomit-inducing String.forEach hack.
I let my brain stew on it for a while. That evening, I listened to a new episode of Darknet Diaries, a phenomenal podcast that tells stories about the dark side of the internet, mainly focusing on hackers and computer security. It's one of my favorite podcasts, and it reminded me that I should think like a hacker regarding my cookie banner program.
And what would a hacker do?
Intercept every single network call, look for the data they're interested in, and modify it as needed!
While I typically don't work during the evenings, this problem and idea were burning a hole in my head and I had to try it out immediately.
So there I sat, the faint glow of the computer lighting up my face in the dark room, digging up browser API documentation on how to peek at every network call being made. That night of hacking led me to create this monstrosity, which involves XMLHttpRequest, regex replacements, and lots of null checks (I modified the code to simplify what's going on and to provide some minor obfuscation, so imagine something even worse):
var originalSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function(data) {
  // Only touch requests that carry the broken field.
  if (data && data.brokenField) {
    // Strip the escaped quotes and the quotes wrapped around the array, then parse it.
    data.brokenField = data.brokenField.replace(/\\\"/g, '"').replace(/\"\[/g, '[').replace(/\]\"/g, ']');
    data.brokenField = JSON.parse(data.brokenField);
  }
  originalSend.call(this, data);
};
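If you want to poke at the idea outside a browser, here is a rough sketch of the same interception pattern. Everything named here is hypothetical: FakeRequest stands in for XMLHttpRequest, patchSend and repairField are helper names invented for this example, and the sample payload is only a guess at what a double-encoded value might look like. It is not the production code.

```javascript
// Undo the double-encoding: unescape quotes, then drop the quotes wrapped around the array.
function repairField(value) {
  const unwrapped = value
    .replace(/\\"/g, '"')
    .replace(/"\[/g, '[')
    .replace(/\]"/g, ']');
  return JSON.parse(unwrapped);
}

// Hypothetical stand-in for XMLHttpRequest so the sketch runs outside a browser.
function FakeRequest() {
  this.sent = null;
}
FakeRequest.prototype.send = function (data) {
  this.sent = data;
};

// Wrap send() on any request-like constructor and repair one named field.
function patchSend(RequestType, fieldName) {
  const originalSend = RequestType.prototype.send;
  RequestType.prototype.send = function (data) {
    if (data && typeof data[fieldName] === 'string') {
      data[fieldName] = repairField(data[fieldName]);
    }
    return originalSend.call(this, data);
  };
}

patchSend(FakeRequest, 'brokenField');

// Assumed sample payload: an array that was stringified and then escaped again.
const request = new FakeRequest();
request.send({ brokenField: '"[\\"analytics\\",\\"marketing\\"]"' });
console.log(request.sent.brokenField);
```

Keeping a reference to the original send and calling it with the same `this` is what lets every other request pass through untouched.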
It's horrible and I hate that my brain came up with it, but it works!
The best part about it is that I never would've been able to come up with such a bonkers idea earlier in my career. I'm at a point where I feel extremely comfortable with web development technologies, meaning I now understand what is available to me and how I can bend the rules. That kind of mastery feels incredibly good once you're there and the feeling of getting something working in a non-traditional manner is the heart of the hacker spirit. Makes me think I would've had a solid career as a white hat hacker in another life!
Anyway, I hope you hated that code as much as I did. The hack has been humming away in production for a few weeks now and works flawlessly.
And before you ask, yes, I heavily documented what is going on with the hack in several places so that people won't be confused when they find my monster a few years down the road.
Until the next hack,
/Lane