
It already handles long lists without recursion:

    case Lisp_Cons:
      {
        register struct Lisp_Cons *ptr = XCONS (obj);
        if (CONS_MARKED_P (ptr))
          break;
        CHECK_ALLOCATED_AND_LIVE (live_cons_p);
        CONS_MARK (ptr);
        /* If the cdr is nil, avoid recursion for the car.  */
        if (EQ (ptr->u.s.u.cdr, Qnil))
          {
            obj = ptr->u.s.car;
            cdr_count = 0;
            goto loop;
          }
        mark_object (ptr->u.s.car);
        obj = ptr->u.s.u.cdr;
        cdr_count++;
        if (cdr_count == mark_object_loop_halt)
          emacs_abort ();
        goto loop;
      }
The "goto loop" chases the cdr pointer iteratively. So with regard to list structure, this will only blow the stack if you have deep car recursion. When conses are used to represent nested lists, you don't get that much depth on the CAR side.

I'm just wondering what is the exact scenario, involving what structure.

Of course there can be other kinds of linked objects. E.g. a huge graph structure, where the GC happens to traverse a very long path through the graph before bottoming out somewhere.

BTW, "[i]f the cdr is nil, avoid recursion for the car." Initial reaction: good frickin' idea and I'm stealing it immediately. Though, wait, not sure what it buys you in practical terms; who the heck builds a deep structure of conses where the linkage is through car and the cdr-s are nil.

A better test might be whether the cdr is any atom at all. Though obviously some kinds of atoms are structures with pointers that can have depth behind them, many kinds of atoms are shallow. If a cons has any atom whatsoever as its cdr, then maybe we should recurse on the cdr and tail-loop on the car.
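A rough sketch of that variant against the quoted mark_object case, replacing the EQ (ptr->u.s.u.cdr, Qnil) test (CONSP is the real Emacs predicate; the rest just reuses the field names from the code above, untested):

    if (!CONSP (ptr->u.s.u.cdr))
      {
        /* The cdr is an atom: mark it recursively (most atoms are
           shallow), then tail-loop on the car, which may be deep.  */
        mark_object (ptr->u.s.u.cdr);
        obj = ptr->u.s.car;
        cdr_count = 0;
        goto loop;
      }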

Analysis required.



You can crash Emacs, for example, with the recipe from #2099:

    $ emacs -Q --eval "(let (v) (while t (setq v (cons v v))))"
In response to how such deep structures can arise: they may arise, for example, when testing Emacs with randomly generated code snippets. For testing edge cases, it is highly desirable that Emacs also run robustly in such situations.


Indeed, Emacs should be robust. I recently ran some fuzz-testing of Emacs and found that evaluating a similar recursive example would crash it:

https://blog.steve.fi/i_ve_never_been_more_proud.html

My case was resolved (a file with a few thousand "`" characters), but despite running more fuzzing I couldn't get any other interesting results.


I use AFL (American Fuzzy Lop) on TXR from time to time.

And, guess what, there was also a problem with nested backquotes. Backquote is spelled ^ (caret) in TXR Lisp, so the cases that were triggering the degenerate behavior had ^^^^^^^ rather than ```````...

http://www.kylheku.com/cgit/txr/commit/?id=ab98634ea89927220...


Small world. This is my bug report, which meandered a little:

https://lists.gnu.org/archive/html/bug-gnu-emacs/2017-07/msg...


That loop doesn't release any of the objects that it creates; the pointer v always hangs on to every cons cell that was created. So there is nothing for the garbage collector to do here other than to confirm that everything is reachable and not actually reclaim anything. If we disable garbage collection, it will hit OOM. Die here, die there.

BTW: (setq v (cons v v)) -> (push v v)


Whatever it does, I expect Emacs to not crash.

If it cannot allocate more memory, I expect it to throw a Lisp exception that tells me so, not to crash the whole process.

I greatly appreciate Daniel's work to increase the robustness of Emacs, because it means that there will be fewer situations where a process crash leads to lost work for users. In addition, increasing the robustness of Emacs in such cases will also greatly broaden the set of test cases that can be run reliably, and hence help to detect and correct even more issues.


Without question, if cons cannot allocate, it should throw some exception; the code, with its best intentions, should be there. That way, someone who can tune their operating system, or who runs on an MMU-less system, can take advantage of it. The rest of the people won't actually see it, though; their system will thrash, and either it or they will kill the offending process before it has a chance to reach that exception throw. That's if they don't run out of patience and just reboot the machine.


> I'm just wondering what is the exact scenario, involving what structure.

> Was: Re: bug#30626

    (seq-doseq (_ (stream-range 1 1000000)) nil)


https://debbugs.gnu.org/cgi/bugreport.cgi?bug=+bug%2330626

Looks like an XY problem here: let's fix the garbage collector (X) because of some badly implemented lazy processing (Y), which is where the real issue is.

I don't believe anyone has a filesystem that can't be traversed by a lazy data structure in Lisp, with a recursive GC, in 8 megs of stack. (Unless, say, the traversal is being fooled by cycles created by directory symlinks.)


One could argue that. One could also argue that it should be possible to collect any garbage successfully created.


Nope; worse is better. It has to be recognized that a garbage collection scheme which can satisfy this idealistic requirement has some disadvantages relative to one which doesn't.

Should it also be possible to successfully read any well-formed S-expression that we can create in a text stream? Oops, don't use recursion in the reader, then.


It doesn't have disadvantages apart from the code to manually manage an explicit stack. The actual space used will be reduced.

And yes, it should be possible to successfully read any well-formed sexp that can be created in a text stream and held in memory. That's why quality parsers use an explicit stack as well.

Recursion is a useful development tool in some cases, but when inherently iterative algorithms fail because their implementations use it, those implementations are buggy.
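To illustrate, here is a minimal sketch of such marking for a cons-like graph, with a hypothetical node type and names rather than Emacs's actual Lisp_Object machinery: chase cdrs in a loop, push cars on a heap-allocated stack that grows as needed.

    #include <stdlib.h>

    /* Hypothetical cons cell; NULL car/cdr stands for an atom. */
    struct node {
        int marked;
        struct node *car, *cdr;
    };

    /* Mark everything reachable from OBJ without C recursion:
       iterate down cdr chains, push cars on an explicit stack. */
    static void mark_iterative (struct node *obj)
    {
        size_t cap = 64, top = 0;
        struct node **stack = malloc (cap * sizeof *stack);
        if (!stack)
            abort ();
        stack[top++] = obj;

        while (top > 0) {
            struct node *p = stack[--top];
            while (p && !p->marked) {
                p->marked = 1;
                if (p->car) {
                    if (top == cap) {
                        struct node **ns =
                            realloc (stack, 2 * cap * sizeof *stack);
                        if (!ns)
                            abort ();
                        stack = ns;
                        cap *= 2;
                    }
                    stack[top++] = p->car;
                }
                p = p->cdr;   /* the cdr chain costs no stack at all */
            }
        }
        free (stack);
    }

The worst case here is one pointer per cons whose car needs a visit, instead of a whole C stack frame per level of recursion, which is the space reduction claimed above.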


I rewrote the JSC parser to be mostly iterative many years ago. For a “real” language making an iterative parser that is readable is incredibly challenging, especially in something like JS, where an expression can actually contain a statement. The current state is “as iterative as is sanely understandable” and handles all real (and some insane) programs fine. I suspect the same is true for other production hand-written parsers.

Your alternative is table-based parsers, which can only reasonably be implemented with generators. The problem is that table-based parsers are slower than hand-rolled recursive descent parsers by a fairly large margin, and general-purpose generators (bison, yacc, etc.) tend to have difficulty with the quirky syntax of large languages. Note there have been a number of papers over the years demonstrating 2x performance improvements in yacc/bison from switching indexed state transitions to computed goto, but even that leaves them slower than hand-rolled parsers.


> for a “real” language making an iterative parser that is readable is incredibly challenging.

Not to mention if it is defined in terms of a character-classifying readtable that dispatches functions and can be patched by the user program (as in Common Lisp).


You can hold an S-exp in memory such that you don't have anywhere near enough memory left to scan it, even with your condensed explicit stack.

> That's why quality parsers use an explicit stack as well.

Observation: that rules out the entire description of the ANSI CL reader from being quality, haha.



