
Short answer: not where it counts.

My work focuses on recognizing known functions in obfuscated binaries, but here are some papers on deobfuscation you might want to check out, even if they don't necessarily use ML for deobfuscation or decompilation.

My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low-hanging fruit, and it's what I worked on most, but adoption is slow. On the other hand, "hard" obfuscations like virtualized functions or programs which embed JIT compilers to obfuscate at runtime... as far as I know, those are still unsolved problems.
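For anyone unfamiliar with control flow flattening, here is a minimal hand-written sketch of what the transformation does. The function names and the choice of Euclid's algorithm are mine, purely for illustration; real obfuscators work on compiled code or an IR, not on Python source.

```python
def gcd_plain(a: int, b: int) -> int:
    """Euclid's algorithm with ordinary structured control flow."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """The same algorithm after a hand-applied flattening pass.

    Each basic block becomes one case of a single dispatcher loop, and a
    state variable selects the next block, so the original loop shape no
    longer shows up in the control flow graph.
    """
    state = 0
    while True:
        if state == 0:      # loop test
            state = 1 if b != 0 else 2
        elif state == 1:    # loop body
            a, b = b, a % b
            state = 0
        elif state == 2:    # exit block
            return a
```

Both functions compute the same result; only the shape of the control flow differs, which is why I count this among the more static, "easy" transformations.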

This is a good overview of the subject, but it's pretty old and doesn't cover "hard" obfuscations: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1566145

https://www.jinyier.me/papers/DATE19_Obf.pdf uses deobfuscation for RTL logic (FPGA/ASIC domain) with SAT solvers. It might be useful as a point of view from a fairly different domain.

https://advising.cs.arizona.edu/~debray/Publications/generic... uses "semantics-preserving transformations" to shed obfuscation. I think this approach is the way to go, especially when combined with dynamic/symbolic analysis to mitigate virtualization/JIT-style transformations.

I'll mention this one as a cautionary tale: https://dl.acm.org/doi/pdf/10.1145/2886012 has some good general info but glosses over the machine learning approach. It considers Hex-Rays' FLIRT to be "machine learning", but FLIRT just hashes signatures, can be spoofed (e.g. https://siliconpr0n.org/uv/issues_with_flirt_aware_malware.p...), and is useless against obfuscation.
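To show why hash-style signatures are so brittle, here is a toy stand-in. To be clear, this is not FLIRT's actual signature format: the SHA-256 hash, the 32-byte prefix, and the names `signature`, `known`, and `lookup` are all assumptions I made up for the sketch.

```python
import hashlib
from typing import Optional


def signature(code: bytes, prefix_len: int = 32) -> str:
    """Hash the leading bytes of a function body (toy scheme)."""
    return hashlib.sha256(code[:prefix_len]).hexdigest()


# x86-64: push rbp; mov rbp, rsp; mov eax, edi; pop rbp; ret
original = bytes.fromhex("554889e589f85dc3")

# Same function with a single NOP (0x90) inserted after the prologue.
padded = bytes.fromhex("554889e59089f85dc3")

# Hypothetical signature database mapping hashes to function names.
known = {signature(original): "identity"}


def lookup(code: bytes) -> Optional[str]:
    """Return the known name for this code, or None if nothing matches."""
    return known.get(signature(code))
```

`lookup(original)` finds the function, while `lookup(padded)` returns `None` even though the two behave identically. One inserted byte is enough to break an exact match, and real obfuscation changes far more than that.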

Eventually I think SBOM tools like Black Duck[1], and supply-chain frameworks like SLSA[2], will incorporate ML to more accurately determine what dependencies a piece of software actually has.

[1]: https://www.synopsys.com/software-integrity/software-composi...

[2]: https://slsa.dev/



Very cool - thank you very much!

> My take is that ML can soundly defeat the "easy" and more static obfuscation types (encodings, control flow flattening, splitting functions). It's low-hanging fruit, and it's what I worked on most, but adoption is slow.

If I wanted to implement my own toy Hex-Rays-like decompiler using a few of these techniques to decompile x86-64 binaries, is there any high-quality, up-to-date paper/resource you would recommend?

Or do you think that "A Generic Approach to Automatic Deobfuscation of Executable Code" paper is a good enough start?

Also, what do you think about https://tigress.wtf/ ?


"A Generic Approach" seems like a good starting point for a classical approach: building a set of reusable components and heuristics to recognize idioms, etc.

Might also be worth considering an approach integrating LLMs for summarizing code. Maybe you could fine-tune a pretrained model that already "understands" source code to associate sources with generated code? If going this route I would still probably use a disassembler to preprocess, and maybe also extract basic blocks to use as my "target" domain for fine-tuning.
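The basic-block extraction step could look roughly like this. It's a sketch that assumes a disassembler has already produced a linear instruction list; the `Insn` type, the `BRANCHES` set, and the sample listing are hypothetical, and a real x86-64 pipeline would need many more branch mnemonics plus handling for calls and indirect jumps.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    addr: int
    mnemonic: str
    target: Optional[int] = None  # branch target address, if known


# Deliberately tiny set of block-ending instructions (an assumption).
BRANCHES = {"jmp", "je", "jne", "ret"}


def basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Split a linear listing into basic blocks using the leader rule."""
    if not insns:
        return []
    # Leaders: the first instruction, every branch target, and every
    # instruction that directly follows a branch.
    leaders = {insns[0].addr}
    for i, insn in enumerate(insns):
        if insn.mnemonic in BRANCHES:
            if insn.target is not None:
                leaders.add(insn.target)
            if i + 1 < len(insns):
                leaders.add(insns[i + 1].addr)
    blocks: List[List[Insn]] = []
    for insn in insns:
        if insn.addr in leaders:
            blocks.append([])  # a leader starts a new block
        blocks[-1].append(insn)
    return blocks


# Toy if/else shape, with addresses simplified to small integers.
listing = [
    Insn(0, "cmp"),
    Insn(1, "je", 4),
    Insn(2, "mov"),
    Insn(3, "jmp", 5),
    Insn(4, "mov"),
    Insn(5, "ret"),
]
```

On the toy listing this gives four blocks: the compare and conditional jump, the two branch arms, and the shared return. Each block could then be serialized as one "target" sample for fine-tuning.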

As for Tigress, I used it extensively and found it to be really great most of the time. There are some limitations to be aware of: it only works with C code, and you have to turn your multi-file projects into a single file with a main() function. Also, its C parser (CIL) has some limitations (e.g. doesn't recognize the word static in "struct foo x[static 1]") so you might need to translate your C code first. I translated manually because it was a really rare issue for the code I started with. I also had mixed results using Virtualize and JIT. Sometimes they would emit invalid code, so I ended up just throwing out that data.

In my view, the up-and-coming Tigress challenger is obfuscator-llvm. I think it is very promising for future work because it inherently supports more languages than just C. But currently obfuscator-llvm is much more limited (~3 transformations compared to Tigress's ~48). So if you're using C, today I would pick Tigress.


Thanks again! :)



