The End of the Anonymous Hacker?

Originally published on November 28, 2018 by Patrick Gebhardt
Last updated on January 23, 2024 • 5 minute read

We recently looked at the impact Artificial Intelligence could have on our future. Beyond those general effects, AI will of course also affect each individual profession, and that includes illegal hacking, if you count it as a profession. Just like law-abiding programmers, hackers usually leave their own distinctive mark in the programs and code they write. And, surprise: AI can pick up on these clues and expose a hacker faster than they would like.

Anyone who regularly uses programming platforms knows how chaotic things can get; it is often hard to keep track. On GitHub, for example, many different programmers contribute suggestions, ideas and code. The more developers work on a project, the harder it becomes to determine, after the fact, who wrote a given piece of code. Consistent documentation is the exception. Machine learning is an obvious tool for tackling this problem, but its success depends crucially on choosing a feature set that actually captures a programming style. Newer AI approaches, however, have found a simple and highly effective way to solve this problem.

Two Scientists and a Key Paper

The approach builds on the work of two computer scientists. Rachel Greenstadt, professor of computer science at Drexel University, and Aylin Caliskan, professor at George Washington University, published a groundbreaking paper in 2017. It showed that even small code excerpts can be enough to tell programmers apart. The reason is that every developer writes code in a distinctive way. Like the tiny differences between two fingerprints, this style is hard to spot but just as well suited for identification.

In another paper, Aylin Caliskan showed exactly how coders can be de-anonymized using "Code Stylometry". This process examines a programmer's compiled binary code. The researchers translated the binary back into C++, the language it was originally written in, while preserving the elements of the programmer's unique style. The paper describes the procedure in detail.
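To make the idea of a "feature set that represents a programming style" a little more concrete, here is a toy sketch in Python. It is an illustration under our own assumptions, not the researchers' method: the features (tab indentation, blank lines, line length, brace placement, keyword frequencies), the function names `style_features`, `train` and `attribute`, and the nearest-centroid classifier are all hypothetical choices, far simpler than what the published work uses. It also works on source code rather than on decompiled binaries.

```python
import math
import re

# Hypothetical, hand-picked style features; real stylometry uses far richer sets.
KEYWORDS = ("for", "while", "if", "else", "return")


def style_features(source: str) -> list[float]:
    """Map a code snippet to a small vector of layout and lexical features."""
    lines = source.splitlines() or [""]
    n = len(lines)
    indented = [line for line in lines if line[:1] in (" ", "\t")]
    # Share of indented lines that use tabs rather than spaces.
    tab_ratio = sum(line.startswith("\t") for line in indented) / max(len(indented), 1)
    # Share of blank lines.
    empty_ratio = sum(not line.strip() for line in lines) / n
    # Average line length, scaled down so it is on a similar range to the ratios.
    avg_len = sum(len(line) for line in lines) / n / 80.0
    # Share of "{" placed alone on its own line (Allman style) vs. at line end.
    braces = source.count("{")
    own_line = sum(line.strip() == "{" for line in lines)
    allman_ratio = own_line / max(braces, 1)
    # Relative frequency of a few keywords among all identifier-like tokens.
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    total = max(len(tokens), 1)
    kw_freqs = [tokens.count(k) / total for k in KEYWORDS]
    return [tab_ratio, empty_ratio, avg_len, allman_ratio, *kw_freqs]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of several feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(samples: dict[str, list[str]]) -> dict[str, list[float]]:
    """Build one average style profile per known author."""
    return {
        author: centroid([style_features(s) for s in snippets])
        for author, snippets in samples.items()
    }


def attribute(model: dict[str, list[float]], snippet: str) -> str:
    """Guess the author whose profile is closest (Euclidean distance)."""
    x = style_features(snippet)
    return min(model, key=lambda author: math.dist(x, model[author]))


# Usage with two made-up authors: "alice" uses tabs and braces on their own line,
# "bob" uses spaces and braces at the end of the line.
alice = ["int main()\n{\n\tfor (int i = 0; i < n; i++)\n\t{\n\t\tsum += i;\n\t}\n\treturn sum;\n}\n"]
bob = ["int main() {\n  int i = 0;\n  while (i < n) {\n    if (i % 2) sum += i;\n    i++;\n  }\n  return sum;\n}\n"]
model = train({"alice": alice, "bob": bob})
unknown = "void f()\n{\n\tfor (int j = 0; j < m; j++)\n\t{\n\t\tx += j;\n\t}\n}\n"
print(attribute(model, unknown))  # alice
```

Layout features like these vanish as soon as someone runs an automatic code formatter, so on their own they would be easy to evade; the sketch only shows the general pattern of turning code into numbers and comparing them against known authors.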

Security vs. Privacy

As is so often the case with AI systems, de-anonymizing code snippets brings both advantages and risks. After all, anonymity protects not only criminals but everyone. With Code Stylometry, hackers can now be traced much more easily. This would help companies protect themselves against such attacks or respond to them. Malware developers could be identified and prosecuted. But such methods would also endanger anonymity on programming platforms: even if you switched accounts while working on code, an AI like this could still track you down. And as with other explosive topics, such as CCTV or machine learning, we find ourselves caught between the two poles of security and privacy, and we have to decide. This may well be the biggest conflict on the Internet today. Please leave us your opinion in the comments!