
It already handles long lists without recursion:

    case Lisp_Cons:
      {
        register struct Lisp_Cons *ptr = XCONS (obj);
        if (CONS_MARKED_P (ptr))
          break;
        CHECK_ALLOCATED_AND_LIVE (live_cons_p);
        CONS_MARK (ptr);
        /* If the cdr is nil, avoid recursion for the car.  */
        if (EQ (ptr->u.s.u.cdr, Qnil))
          {
            obj = ptr->u.s.car;
            cdr_count = 0;
            goto loop;
          }
        mark_object (ptr->u.s.car);
        obj = ptr->u.s.u.cdr;
        cdr_count++;
        if (cdr_count == mark_object_loop_halt)
          emacs_abort ();
        goto loop;
      }
The "goto loop" chases the cdr pointer iteratively. So with regard to list structure, this will only blow the stack if you have deep car recursion. When conses are used to represent nested lists, you don't get that much depth on the CAR side.

I'm just wondering what is the exact scenario, involving what structure.

Of course there can be other kinds of linked objects. E.g. a huge graph structure, where the GC happens to traverse a very long path through the graph before bottoming out somewhere.

BTW, "[i]f the cdr is nil, avoid recursion for the car." Initial reaction: good frickin' idea and I'm stealing it immediately. Though, wait, not sure what it buys you in practical terms; who the heck builds a deep structure of conses where the linkage is through car and the cdr-s are nil.

A better test might be whether the cdr is any atom at all. Though obviously some kinds of atoms are structures with pointers that can have depth behind them, many kinds of atoms are shallow. If a cons has any atom whatsoever as its cdr, then maybe we should recurse on the cdr and tail-loop on the car.
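A rough sketch of that variant against the quoted mark_object case, replacing the EQ (ptr->u.s.u.cdr, Qnil) test (CONSP is the real Emacs predicate; the rest just reuses the field names from the code above, untested):

    if (!CONSP (ptr->u.s.u.cdr))
      {
        /* The cdr is an atom: mark it recursively (most atoms are
           shallow), then tail-loop on the car, which may be deep.  */
        mark_object (ptr->u.s.u.cdr);
        obj = ptr->u.s.car;
        cdr_count = 0;
        goto loop;
      }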

Analysis required.



You can crash Emacs, for example, with the recipe from #2099:

    $ emacs -Q --eval "(let (v) (while t (setq v (cons v v))))"
In response to how such deep structures can arise: they may arise, for example, when testing Emacs with randomly generated code snippets. For testing edge cases, it is highly desirable that Emacs also run robustly in such situations.


Indeed, Emacs should be robust. I recently ran some fuzz-testing of Emacs and found that evaluating a similar recursive example would crash it:

https://blog.steve.fi/i_ve_never_been_more_proud.html

My case was resolved (a file with a few thousand "`" characters), but despite running more fuzzing I couldn't get any other interesting results.


I use AFL (American Fuzzy Lop) on TXR from time to time.

And, guess what, there was also a problem with nested backquotes. Backquote is spelled ^ (caret) in TXR Lisp, so the cases that were triggering the degenerate behavior had ^^^^^^^ rather than ```````...

http://www.kylheku.com/cgit/txr/commit/?id=ab98634ea89927220...


Small world. This is my bug report, which meandered a little:

https://lists.gnu.org/archive/html/bug-gnu-emacs/2017-07/msg...


That loop doesn't release any of the objects that it creates; the pointer v always hangs on to every cons cell that was created. So there is nothing for the garbage collector to do here other than to confirm that everything is reachable and not actually reclaim anything. If we disable garbage collection, it will hit OOM. Die here, die there.

BTW: (setq v (cons v v)) -> (push v v)


Whatever it does, I expect Emacs to not crash.

If it cannot allocate more memory, I expect it to throw a Lisp exception that tells me so, not to crash the whole process.

I greatly appreciate Daniel's work to increase the robustness of Emacs, because it means that there will be fewer situations where a process crash leads to lost work for users. In addition, increasing the robustness of Emacs in such cases will also greatly broaden the set of test cases that can be run reliably, and hence help to detect and correct even more issues.


Without question, if cons cannot allocate, it should throw some exception; the code, with its best intentions, should be there. That way, someone who can tune their operating system, or who runs on an MMU-less system, can take advantage of it. The rest of the people won't actually see it, though; their system will thrash, and either it or they will kill the offending process before it has a chance to reach that exception throw. That's if they don't run out of patience and just reboot the machine.


> I'm just wondering what is the exact scenario, involving what structure.

> Was: Re: bug#30626

    (seq-doseq (_ (stream-range 1 1000000)) nil)


https://debbugs.gnu.org/cgi/bugreport.cgi?bug=+bug%2330626

Looks like an XY problem here: let's fix the garbage collector (X) because of some badly implemented lazy processing (Y), which is where the real issue is.

I don't believe anyone has a filesystem that can't be traversed by a lazy data structure in Lisp, with a recursive GC, in 8 megs of stack. (Unless, say, the traversal is being fooled by cycles created by directory symlinks.)


One could argue that. One could also argue that it should be possible to collect any garbage successfully created.


Nope; worse is better. It has to be recognized that a garbage collection scheme which can satisfy this idealistic requirement has some disadvantages relative to one which doesn't.

Should it also be possible to successfully read any well-formed S-expression that we can create in a text stream? Oops, don't use recursion in the reader, then.


It doesn't have disadvantages apart from the code to manually manage an explicit stack. The actual space used will be reduced.

And yes, it should be possible to successfully read any well-formed sexp that can be created in a text stream and held in memory. That's why quality parsers use an explicit stack as well.

Recursion is a useful development tool in some cases, but when inherently iterative algorithms fail because their implementations use it, those implementations are buggy.
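To illustrate, here is a minimal sketch of such marking for a cons-like graph, with a hypothetical node type and names rather than Emacs's actual Lisp_Object machinery: chase cdrs in a loop, push cars on a heap-allocated stack that grows as needed.

    #include <stdlib.h>

    /* Hypothetical cons cell; NULL car/cdr stands for an atom. */
    struct node {
        int marked;
        struct node *car, *cdr;
    };

    /* Mark everything reachable from OBJ without C recursion:
       iterate down cdr chains, push cars on an explicit stack. */
    static void mark_iterative (struct node *obj)
    {
        size_t cap = 64, top = 0;
        struct node **stack = malloc (cap * sizeof *stack);
        if (!stack)
            abort ();
        stack[top++] = obj;

        while (top > 0) {
            struct node *p = stack[--top];
            while (p && !p->marked) {
                p->marked = 1;
                if (p->car) {
                    if (top == cap) {
                        struct node **ns =
                            realloc (stack, 2 * cap * sizeof *stack);
                        if (!ns)
                            abort ();
                        stack = ns;
                        cap *= 2;
                    }
                    stack[top++] = p->car;
                }
                p = p->cdr;   /* the cdr chain costs no stack at all */
            }
        }
        free (stack);
    }

The worst case here is one pointer per cons whose car needs a visit, instead of a whole C stack frame per level of recursion, which is the space reduction claimed above.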


I rewrote the JSC parser to be mostly iterative many years ago. For a “real” language making an iterative parser that is readable is incredibly challenging, especially in something like JS, where an expression can actually contain a statement. The current state is “as iterative as is sanely understandable” and handles all real (and some insane) programs fine. I suspect the same is true for other production hand-written parsers.

Your alternative is table-based parsers, which can only reasonably be implemented with generators. The problem is that table-based parsers are slower than hand-rolled recursive descent parsers by a fairly large margin, and general-purpose generators (bison, yacc, etc.) tend to have difficulty with the quirky syntax of large languages. Note there have been a number of papers over the years demonstrating 2x performance improvements in yacc/bison from switching indexed state transitions to computed goto, but even that leaves them slower than hand-rolled parsers.


> for a “real” language making an iterative parser that is readable is incredibly challenging.

Not to mention if it is defined in terms of a character-classifying readtable that dispatches functions and can be patched by the user program (as in Common Lisp).


You can hold an S-exp in memory such that you don't have anywhere near enough memory left to scan it, even with your condensed explicit stack.

> That's why quality parsers use an explicit stack as well.

Observation: that rules out the entire description of the ANSI CL reader from being quality, haha.



