Bank-switching like the KimKlone does would be possible to support via a cartridge on the C64. Alternatively, though I don't think any of them have been pushed to that size, the CBM REUs did memory transfers fast enough that they were actually faster than using the CPU (because they could use every available bus cycle for the actual data transfer).
So you would likely need custom/new hardware, but in terms of how, it's a solved problem.
That's all right, this is an event driven parser so at no point does it actually need a full representation of the document in addressable memory to parse it.
As surprising as it sounds, this might actually have a practical use --- the 6502, being such a tiny core, ends up in a lot of obscure MCUs, often in high-volume, low-cost applications (IoT gadgets, toys, etc.), and given the seeming unwillingness of a certain subset of developers to use more efficient binary protocols, I wouldn't be surprised if someone does end up needing to get a 6502 to parse JSON.
The 8051 and Z80 fill similar roles today --- when even the cheapest ARM core is too expensive.
I have a few STM32 projects where I need bidirectional communication between the microcontroller and the PC. The microcontroller has on the order of 10-20 kB of RAM total and around 100 kB of Flash.
After trying several other solutions I ended up with JSON over USB serial.
I tried several of the binary JSON-like variants, but they were all a PITA to port or ended up eating a lot of space, either in RAM or Flash or both. For this project I wasn't too concerned about message size or performance as such, as long as it could be read or generated in small portions.
I ended up adapting two JSON libraries, one for parsing and one for generating. The parser was SAX-like, so it worked just fine for incremental parsing. It worked quite well, and in total was considerably smaller in both RAM and Flash than the alternatives.
Sure but I wasn't after fast parsing, I was after a somewhat flexible data exchange format that didn't take a lot of space (both data and code), and ideally was easy to debug.
A custom binary protocol would certainly be smaller but would take a lot more time to develop and debug.
Enter a JSON wrapper format for specifying assembly applications, complete with a NodeJS transpiler toolchain a la Babel.
Seriously though, it's been a while since I've had to work with assembly or IDA but JSON is great as an interoperable data format. Sure, it's not maximally efficient, but what needs to be?
As it happens, I'm coding on the 6502 for fun (demo maker :-)). And of course, I want to build my own super cool tools to help my development. However, I always come to the same conclusion: developing tools makes sense only if other people will use them (i.e. the effort I put in must be amortized).
Now, if you say the 6502 is still a thing, then I may have some incentive to complete these tools...
> The 8051 and Z80 fill similar roles today --- when even the cheapest ARM core is too expensive.
This is a bit of a reach.
There is no world in which these and JSON appropriately intersect. If you're parsing JSON then you're probably dealing with enough need for sanitization that you have already fucked up if you're at the point of an 8-bit MCU --- how did the JSON get there in the first place?
The 6502 used to be a general-purpose CPU, so for some retro hacking stuff that may be different.
> JSON65 supports incremental parsing, so you can freely feed it any sized chunks of input, and you don't need to have the whole file in memory at once.
It boggles the mind that we have a high quality JSON parser that supports this on such a constrained system, where many of the high-level language libraries that run on massively faster hardware don't.
Because it is slower? Many performant JSON parsers optimize parsing by batching, so parsing isn't constrained by the (possibly much slower) I/O bottleneck. Also, JSON65 itself has a string size limit of 255 bytes.
Is the 6502 the peak of hobbyists writing assembly? I think it fills a fantastic niche of being well-documented while also not being completely terrible to write by hand.
I just spent the day writing 6502 assembly for my homebrew computer. I finished writing an in-place command line parser for a ROM monitor.
The 6502 has a lot going for it -- very small instruction set, simple instruction timing, etc.
But it's also a challenge because it's only got 3 registers, a fixed stack location, the zeropage, and some other strange properties. It's taken a little while to figure out how to use it effectively.
Personally I prefer the Z80 --- it has the advantage of many more registers, addressing modes more suited to HLLs, and a stack that is not limited to a fixed 256-byte portion of the address space. Of course it also needs more than twice the number of transistors and thus die area, which may be why the 6502 is still so widespread.
The Z80's instruction set is nicer, but the Z80 was also half the speed per clock cycle, less responsive to interrupts, and more expensive. Hence the 6502 got slapped into a lot more home computers and the like.
Actually the 6502 designers really intended it more for control systems / embedded.
Although the Z80 has lower IPC, it had roughly 2x the clock speed, so performance was about the same, with the Z80 slightly faster for most applications.
It would be rare however for there to be a direct "bake off" between the two CPUs -- any given design organization was either an 8080/Z80 shop or a 6800/6502/6809 shop. There weren't sufficiently great differences in price/performance between the two to make it worthwhile changing.
You could get a BBC Model B (6502 2MHz), add the 3MHz 6502 board, add the 6MHz Z80B board, and do a "bake off" between all three (obvs. scaling for clock speed.)
Then add the 8MHz ARM1 board and watch it smoke the rest into dust, I suppose.
The 6502 was also in the right place at the right time in the wave of the great "homebrew computer club" era. There were so many DIY kits built around the 6502 and its clones. Some of the earliest mass-produced (and mass-consumed) "home computers" were built around those chips too. Not to mention how many early game consoles were also built on them or their clones/offspring.
At least a little of the hobbyist interest in writing 6502 assembly comes from nostalgia for early Apple, early Nintendo, early Commodore, etc.
Also, yeah, as someone whose undergrad included a MOS 6502 "microcontroller" lab and a Motorola 68k "microprocessor" lab course, I can tell you from hands-on experience that the 6502 really was in a fantastic place for writing clean, easy-to-debug assembly. That was a big time-crunch factor in dealing with breadboarded machines, where debugging the hardware was an equal or greater challenge. Given the choice between the two assembly languages, I know which one I'd pick up again.
There's something incredibly comfortable about being able to view the instruction set, along with affected flags and supported addressing modes, in its entirety on something about the size of an index card.
RISC-V has its own ISA summary card, AIUI. Plot twist: there are no address modes (since it's a load-store architecture, all loads and stores are explicit) and no flags to keep track of (since compare-and-branch instructions are used instead of flags, to make higher-performance implementations easier).
I don't even know how x86 experts got there. I'm constantly learning about all sorts of strange instructions that look like random strings and perform seemingly odd calculations.
Intel has kept their ISAs roughly consistent with each other since 1972. Of course, in doing this, they have created a behemoth. You can't make something smaller when your goal is eternal (in)consistency.
You could probably get the entire (original, ARM1-3ish) ARM instruction set onto an index card with a little creative freedom. Possibly even a single side?
edit: Forgot to explicitly clarify that it was ARM
I do admit, given a choice I would rather program the 6809. I loved the 6502, and it was the first processor I learned assembly on, but I still think the 6809 is better.
If you asked me to write a JSON parser in 6502 assembly, I would put the fairly short and unambiguous JSON grammar into lex and yacc to get a parser in C, then compile that to 6502 assembly. Then maybe glance through the assembly code, although I doubt very much that I know anything about assembly that the compiler doesn't.
Of course the author is a highly skilled person who also did this for his enjoyment. But am I missing some way in which what he did would be more than incrementally better than what I would do? Aren't these tools pretty much as good as the best humans at solving this problem?
There are some assembly languages that are pretty straightforward to write code in, and if you're organized it is not too onerous a task.
That said, the 6502 is probably one of the LEAST featureful assembly languages I've ever used. I remember finding out it didn't have a plain ADD instruction --- you clear the carry, then add with carry.
Processors like the 6502 aren't considered very C-friendly for a variety of reasons, but mostly because the register size isn't big enough for the most common data types like int or char *. You often end up with terrible ASM.
The "Library organization" section [1] explains that the json65.s file is the core of the library and the only code necessary to build the library. The additional C code provides a handy tree structure and a callback to pass to the parsing engine.
However, because SAX is a callback-oriented parsing method, you can design your own data structure and write your own callback functions and do without this tree structure.
The additional C code provides similar nice-to-haves, such as string pool interning, a function to print out the aforementioned tree structure, and a wrapper function to parse json from a file.
It's not misleading. It's a JSON parser written in 6502 assembly language. The C stuff is just things you may or may not want to do with a JSON parser (json65-file.c to parse the content of a given FILE, json65-tree.c to deserialize JSON into a tree data structure).
I would be careful about using this without validating and normalizing the Unicode prior to passing it to this parser. Not saying it's a bad design, but big flashing security warnings should be used so that people are aware that higher-order Unicode can be (and usually is) normalized prior to JSON parsing (so that, e.g., producers that use full-width quotation marks can have their JSON parsed correctly rather than not have the double quotes recognized), and this consumer can only handle fully normalized and validated UTF-8.
I always assume the worst! There will be some auto engineer trying to backport some functionality and will be googling "JSON Assembly" and will port this code.
It won't and should not be used in mission-critical applications, but just for extra fun, one may still want it to be as correct as possible. Right now I'm writing a cryptography library in PDP-11 assembly while investigating tools for its formal verification. Cryptoline looks like a promising candidate.
Never underestimate the longevity of systems, since new systems tend to be built on top of old systems.
Though I do not know any, I'm pretty sure in 2021 there *are* mission/security-critical applications running on 6502s in a number of places in the world.
The 6502 is one of the few processors validated for embedded medical devices. From https://www.westerndesigncenter.com/
".. the 65xx microprocessors protect millions of lives annually within embedded heart defibrillation and pacing systems. "
Can't get more mission-critical than that I suppose..
Its core, implemented in 6502 assembly, is a streaming parser and emits a series of events to signal the boundaries of objects, arrays, and atomic values. For example `[1, {"a": true}]` will emit something like `abegin; number 1; obegin; okey "a"; true; oend; aend; end`. There is a supplementary C library that parses this into a recursive structure (you will then typically do a linear search for keys).
You could use the cc65 compiler to compile C to 6502 assembly, although even C generally uses a (relatively) obscene amount of memory on an 8-bit system.
For comparison, here is a single-file JSON parser I wrote for resource constrained systems, but which more targets things like 32-bit Cortex M systems rather than 8-bit: