I worked on something like this back in 2016; I'm not sure how much things have changed since then. I used dynamic binary instrumentation to deal with the field encryption. Basically, I manually mapped the executable into executable memory on Linux (as if it were a shared library) and began execution at the packet switch. Before executing a block of code, I disassembled it up to the next conditional branch and modified it according to some heuristics to remove the at-rest encryption. The original block wasn't executed in place, since the modified code might not fit into the original block size, so new blocks were mmap'd for this. malloc/free were hooked and replaced with wrappers over glibc's malloc/free, but with bookkeeping so that the memory could be freed after the packet switch had executed. atexit was just replaced with a no-op.
That all just dealt with the encryption, but there were also randomized packet IDs and field orders. Those problems were dealt with using manually written heuristics based on the packet IDs which were actually interesting. Packet handlers with references to text strings (even hashed ones), etc. were a gold mine here because they made static detection of packet IDs simple. If there was no text string, many of the offsets could be auto-detected just by parsing a replay and running small snippets to determine which offsets actually "made sense" for the field being searched for. For example, if there was a gold gain packet, the amount of gold gained shouldn't be outside an expected range, or else the offset likely doesn't correspond to that field.
Once all of the high-volume code blocks had been instrumented, replays could be parsed in 2-3 seconds (along with generating the desired data aggregations). This is all from memory, so it's possible there could be a minor mistake or two.
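To illustrate the "made sense" range check for auto-detecting field offsets, here is a minimal sketch in Python. It is not the original tooling: the function name `plausible_offsets`, the sample payloads, the little-endian 32-bit field encoding, and the 1 to 1000 gold range are all assumptions made up for the example.

```python
import struct

def plausible_offsets(payloads, lo, hi, fmt="<I"):
    """Return every offset at which ALL sample payloads decode to a value in [lo, hi].

    payloads: already-decrypted packet bodies of one packet type (hypothetical data).
    fmt: struct format of the field being searched for (assumed uint32, little-endian).
    """
    if not payloads:
        return []
    width = struct.calcsize(fmt)
    shortest = min(len(p) for p in payloads)
    candidates = []
    for off in range(shortest - width + 1):
        values = (struct.unpack_from(fmt, p, off)[0] for p in payloads)
        if all(lo <= v <= hi for v in values):
            candidates.append(off)
    return candidates

# Made-up "gold gain" payloads: 2 unrelated bytes, the gold amount, 2 unrelated bytes.
samples = [
    bytes([0xA7, 0x13]) + struct.pack("<I", 21) + bytes([0xFF, 0xEE]),
    bytes([0x5C, 0x90]) + struct.pack("<I", 300) + bytes([0xDD, 0xCC]),
    bytes([0xF1, 0x42]) + struct.pack("<I", 45) + bytes([0xBB, 0xAA]),
]

# A single gold gain is assumed to fall between 1 and 1000.
gold_offsets = plausible_offsets(samples, 1, 1000)
print(gold_offsets)
```

The more samples a replay provides, the fewer offsets survive by coincidence, which is presumably why parsing a whole replay worked well for this.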
I've always heard that "security through obscurity" is discouraged because, well, there's no stopping someone from digging in and figuring it out. However, in this case it seems somewhat successful, in that the author was not able to decrypt the packets directly.
The article says that "while it might seem feasible to reimplement these functions in Python without running the client, several factors make this approach impractical" and then lists some reasons like the lookup tables changing, chunk layouts getting shuffled, etc.
Is that all it takes to thwart decrypting the packets? Even though, presumably, you have access to all those lookup tables and chunk layouts somewhere in the client? Is it just too much effort to piece together how it works? I'd be curious to hear more specifics on how exactly Riot was able to make reverse engineering this so impractical.
There are hundreds or thousands of generated one-way decryption schemes for fields. However, it's not impossible to regenerate the decryption in another language with some effort.
Example:
A packet could be decrypted like this (the actual decryption takes more steps than this)
We observe that each operation is one of ADD_CONST, XOR, SUB_CONST, or LOOKUPTABLE, and that the lookup tables in the client are each ~256 bytes long.
We could extract these operations and generate a really long script in Python.
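As a rough sketch of what such a generated script might boil down to, the snippet below chains the four operation types over a single byte. Everything specific here is an assumption for illustration: the operation order, the constants, the table contents, and the idea that each operation acts on one byte modulo 256. The real per-field chains are longer and come from the client.

```python
def make_decoder(ops, tables):
    """Build a per-byte decoder from a list of (operation, argument) pairs."""
    def decode(byte):
        for name, arg in ops:
            if name == "ADD_CONST":
                byte = (byte + arg) & 0xFF
            elif name == "SUB_CONST":
                byte = (byte - arg) & 0xFF
            elif name == "XOR":
                byte ^= arg
            elif name == "LOOKUPTABLE":
                byte = tables[arg][byte]  # arg selects which 256-byte table to use
            else:
                raise ValueError(f"unknown operation: {name}")
        return byte
    return decode

# Hypothetical 256-byte lookup table (a permutation, since 7 is coprime with 256).
tables = [bytes((i * 7 + 3) % 256 for i in range(256))]

# Hypothetical operation chain for one field.
ops = [("ADD_CONST", 0x10), ("XOR", 0x5A), ("LOOKUPTABLE", 0), ("SUB_CONST", 0x01)]

decode = make_decoder(ops, tables)
decoded_field = bytes(decode(b) for b in bytes([0x20, 0x21, 0x22]))
print(decoded_field.hex())
```

Any operation the table above does not know about raises an error instead of silently producing garbage, which makes a change in the scheme show up immediately.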
Why didn't I approach it this way?
1) It's really fragile. League is an actively updated game and the decryption mechanism may change in the future. If the decryption adds another operation like MUL_CONST or DIV_CONST, I would need to account for that on my end. This is unlike the reverse engineering efforts for dead games/servers where the packets do not change.
2) I don't need to know how the decryption mechanism works. Building a game server would make understanding the packet decryption necessary, but I only need to observe game state.
As for understanding how it works, I have not put enough time/effort to give an answer. :)
I dug into LoL more than a decade ago with a few mates to back an API/bot/site. We parsed the keyframe and chunk formats within a week of the spectator launch to automate timer callouts for jungle camps through fog of war (possible because the observer delay was less than the respawn time) and to implement so-called « auto-shoutcasting » for matches, when we were maybe 11 years old :)
There are a number of difficulties these days (I've not played in years, but I work in the industry and do not touch these because of the legal risk, particularly of REing competitor code):
Kernel anticheat is now a requirement, and packman before that. This article was written during covid, so it predates vanguard but includes packman.
Legality: RE is « forbidden » and bannable, so you risk losing an account in which you have spent tens of thousands of hours or more, and breaking protections (DRM, auth handshake) is illegal in the USA.
The entire obfuscation format and flow changes with every patch, so you have to repeat the work for every hotfix and patch (and it isn't just a new XOR key). The reimplementations would probably need to be redone every week or two, which is annoying; this is likely one of the most tedious bits.
> Legality: RE is « forbidden » and bannable, so you risk losing an account in which you have spent tens of thousands of hours or more, and breaking protections (DRM, auth handshake) is illegal in the USA.
You are confused. Reverse engineering is perfectly legal in the USA, but that of course is irrelevant when it comes to losing your account (which isn't yours to begin with).
I would have assumed that the changes make it too impractical to maintain.
Semi-related but the game Vindictus/Mabinogi Heroes (a Source engine MMO) changed the game archive format multiple times (and probably continues to change it every so often) because people would eventually reverse-engineer the format, dump the files, then use them in Garry's Mod or the like.
This is really cool, and it is exactly what I was looking for. To give some context, I worked on some data science-inspired studies [1] about LoL, and the future research direction is to provide a formal model of the games and analyze them through it. While I had a little success getting aggregated data from websites such as uol.gg, the granularity is not fine enough to do very interesting analysis.
The World of Warships community has gone through similar steps, but the encryption is much more straightforward. Some of the packets are pickled Python, some are just binary blobs, so there are some undocumented packets but for the most part people have done a decent job of figuring it out and building tooling around it such as the minimap renderer: https://github.com/WoWs-Builder-Team/minimap_renderer
There’s an odd, unspoken, and somewhat understood agreement between the developer (Wargaming) and the community though: the community actively reverse engineers the game to document the packets, and WG kind of looks the other way (except when they recently threatened me with a perma ban :) ). They even use the tooling the community creates for official tournaments.
In this article the author mentions Riot partnering with external companies to provide richer data sets and analytics. Do they use these tools/data sets for tournaments as well? Is it known at all how these partnerships are structured?
I'm not very well versed in RE, but I know that competitive games like this spend a lot of effort on preventing you from attaching debuggers, hooking, and decompiling.
Bypassing this is not mentioned at all in the article. Is this because these protections are trivial to bypass for experienced people, or because the author wants to hide their method from the dev?
I did something similar with a friend for some time for another game.
As it went, our data was used to prove things to the developer that they would have loved to keep hush-hush, which led to a cat-and-mouse game with the data and their open and... not so open APIs. In the end, we stopped playing the game and stopped our efforts at it.
Fun times.
Getting data by directly processing the packets instead of using the (buggy, slow) replay system is a great idea. There's a lot of interesting data in the middle of LoL gamestate that is missing in summary overviews that only consider the final state of the game.
I remember doing this 10+ years ago now for a site called probuilds. I left LoL shortly after this. Cool to see that the packets haven’t changed much (based on my memory).
Shortly after I released this for TSM, Riot came out with the API.
I've been working on something similar [1], but I took a different approach: I statically extract all decryption stubs using an IDA script I wrote, then emulate them using Unicorn. I'm also interested in your implementation details. Do you have your code on GitHub or somewhere else?
I see comments like this a lot, actually, and I'm curious: if the client is manipulating the intended style and layout of the site, do you really think it's the responsibility of the website owner?
This isn't the case of a browser plugin modifying the styles. The blog framework or whatever detects what your browser/system preference is and respects it. So if you've got your browser/OS set to "dark mode", the page renders in "dark mode". Except the author used transparent images with dark lines, so they are invisible.
The site automatically displays in dark mode if the browser says it’s using dark mode.
So this isn’t something the user is doing to manipulate the style and layout: their browser is saying “hey, fyi, this user’s local system biases to dark mode” and the site is choosing to respond by styling in a way that breaks diagram visibility.
The blog has a toggle for dark mode, and some of its images are black text on a transparent background. When dark mode is toggled on, the text is effectively invisible, so in this case it seems to be an oversight of the blog.