Hacker News

I kinda want something which just treats XML as a dumb tree definition language... give me elements with attributes as string key/value pairs, and children as an array of elements. And have a serialiser in there as well; it shouldn't hurt.

Basically something that behaves like your typical JSON parser and serialiser, but for XML.

To my knowledge, this is what TinyXML2 does, and I've used TinyXML2 for this before to great effect.
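For illustration, Python's standard-library `xml.etree.ElementTree` gives roughly this "dumb tree" model: elements with a tag, a string-keyed attribute dict, and children you iterate like a list, plus a serialiser. (A sketch of the idea, not TinyXML2 itself, which is C++.)

```python
import xml.etree.ElementTree as ET

# Parse a document into a plain tree: each element has a tag,
# a dict of string attributes, and an ordered list of children.
doc = ET.fromstring('<root version="1"><item id="a"/><item id="b"/></root>')

assert doc.tag == "root"
assert doc.attrib == {"version": "1"}
assert [child.attrib["id"] for child in doc] == ["a", "b"]

# And the serialiser comes along for free.
round_tripped = ET.tostring(doc, encoding="unicode")
```

TinyXML2's API is structured along the same lines (elements, attribute accessors, child traversal, a printer), just in C++.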



That's what you call a DOM parser. The problem with them is that, since they deserialize all the elements into objects, bigger XML files tend to eat up all of your RAM. And this is where SAX2 parsers come into play: instead of building a tree, you register event callbacks that process the data as it streams past.
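The callback style looks roughly like this with Python's `xml.sax` (a minimal sketch; the `ItemCounter` handler is made up for illustration):

```python
import xml.sax

class ItemCounter(xml.sax.ContentHandler):
    """Callbacks fire per parse event; the full tree is never
    held in memory, so huge documents stay cheap to process."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1

handler = ItemCounter()
xml.sax.parseString(b"<root><item/><item/><item/></root>", handler)
```

The trade-off is that your code has to reconstruct whatever context it needs (e.g. the current element path) by hand, since the parser only hands you a stream of events.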


The solution is simple: don't have XML files that are many gigabytes in size.


A lot of telco stuff dumps multiple GB of XML hourly. Per BTS. We were processing a few TB of XML files on one server daily.

It's doable, just use the right tools and hacks :)

Processing schema-less or broken schema stuff is always hilarious.

Good times.


Lol I love the upbeat tone here. Helps me deal with my PTSD after working with XML files.


Depending on the XML structure and the server's RAM, it can already happen as you approach 80-100 MB file sizes. And to be fair, in the enterprise context you are quite often not in a position to decide how big the export from another system is. But yes, back in 2010 we built preprocessing systems that checked XMLs and split them up into smaller chunks if they exceeded a certain size.
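A splitter like that can be sketched with `ElementTree.iterparse`, which streams the document and lets you discard each record once it's handled, so memory stays flat. The `record_tag` and `chunk_size` names are hypothetical, assuming the export is a flat sequence of repeated record elements:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def split_records(stream, record_tag, chunk_size):
    """Stream a large XML file and yield lists of serialized records,
    clearing each element after use to keep memory usage bounded."""
    chunk = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == record_tag:
            chunk.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # drop the subtree we just consumed
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk

# Five records split into chunks of two -> three chunks.
data = BytesIO(b"<dump>" + b"<rec><v>1</v></rec>" * 5 + b"</dump>")
chunks = list(split_records(data, "rec", 2))
```

Each chunk can then be wrapped in a fresh root element and handed to a normal DOM parser downstream.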


Tell that to Wikimedia; I've used libxml's SAX parser in the past to parse 80GB+ XML dumps.


Some formats just are this big, and they're historical formats you can't change.



