
You can crash Emacs for example with the recipe from #2099:

   $ emacs -Q --eval "(let (v) (while t (setq v (cons v v))))"
In response to how such deep structures can arise: they may occur, for example, when testing Emacs with randomly generated code snippets. For testing edge cases, it is highly desirable that Emacs also runs robustly in such situations.


Indeed, Emacs should be robust. I recently ran some fuzz-testing of Emacs and found that evaluating a similar recursive example would crash it:

https://blog.steve.fi/i_ve_never_been_more_proud.html

My case was resolved (a file with a few thousand "`" characters), but despite running more fuzzing I couldn't get any other interesting results.


I use AFL (American Fuzzy Lop) on TXR from time to time.

And, guess what, there was also a problem with nested backquotes. Backquote is spelled ^ (caret) in TXR Lisp, so the cases that were triggering the degenerate behavior had ^^^^^^^ rather than ```````...

http://www.kylheku.com/cgit/txr/commit/?id=ab98634ea89927220...


Small world. This is my bug-report, which meandered a little:

https://lists.gnu.org/archive/html/bug-gnu-emacs/2017-07/msg...


That loop doesn't release any of the objects that it creates; the pointer v always hangs on to every cons cell that was created. So there is nothing for the garbage collector to do here other than to confirm that everything is reachable and not actually reclaim anything. If we disable garbage collection, it will hit OOM. Die here, die there.

BTW: (setq v (cons v v)) -> (push v v)


Whatever it does, I expect Emacs to not crash.

If it cannot allocate more memory, I expect it to throw a Lisp exception that tells me so, not to crash the whole process.

I greatly appreciate Daniel's work to increase the robustness of Emacs, because it means that there will be fewer situations where a process crash leads to lost work for users. In addition, increasing the robustness of Emacs in such cases will also greatly broaden the set of test cases that can be run reliably, and hence help to detect and correct even more issues.


Without question, if cons cannot allocate, it should throw some exception; the code, with its best intentions, should be there. That way, someone who can tune their operating system or run on an MMU-less system can take advantage of it. The rest of the people won't actually benefit from it, though; their system will thrash, and either it or they will kill the offending process before it has a chance to reach that exception throw. That's if they don't run out of patience and just reboot the machine.


These are very good points! I would generalize "functional" to "declarative", though:

Logic programming languages like Prolog and Mercury are also much more amenable to parallelization than C-like languages.

In fact, different Prolog clauses could in principle be executed in parallel without changing the declarative meaning of the program, at least as long as you stay in the so-called pure subset of the language which imposes certain restrictions on the code.
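
As a small, invented illustration of what purity buys here: the clauses of a pure predicate are independent logical alternatives, so an implementation could in principle try them in parallel (OR-parallelism) without changing the set of solutions. A sketch:

    % Pure code: no side effects, no cuts, no dependence on clause order.
    % The clauses below can be tried in any order, or concurrently, and
    % the set of solutions stays the same.
    tree_member(X, node(X, _, _)).
    tree_member(X, node(_, Left, _))  :- tree_member(X, Left).
    tree_member(X, node(_, _, Right)) :- tree_member(X, Right).

    % ?- tree_member(X, node(a, node(b, leaf, leaf), node(c, leaf, leaf))).
    % X = a ; X = b ; X = c.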


Prolog has definite clause grammars (DCGs), which are very similar to monads.

You can think of a DCG as giving you two implicit arguments, which you can use to express concatenations and in fact arbitrary relations between states.

There are also ways to access these implicit arguments, which are similar to the corresponding constructs in Haskell.

DCGs are a very important feature of Prolog. They are frequently used for parsing and state transformation tasks. Like monads in Haskell, DCGs often make working with immutable data much more convenient, because you can make the more uninteresting parts of your code implicit. Other logic programming languages like Mercury also support DCGs.
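
To make this concrete, here is a small example DCG (the names digits//1 and digit//1 are invented for this illustration) that describes a sequence of decimal digits; the two implicit list arguments are threaded automatically by the `-->` translation:

    % Each rule has two hidden arguments that pass the remaining input
    % from one goal to the next, much like a state monad threads state.
    digits([D|Ds]) --> digit(D), digits(Ds).
    digits([D])    --> digit(D).

    digit(D) --> [D], { 0'0 =< D, D =< 0'9 }.

    % Example query, with the input given as a list of character codes:
    % ?- phrase(digits(Ds), [0'1,0'2,0'3]).
    % Ds = [49, 50, 51].      % the character codes of '1', '2' and '3'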


It never occurred to me that there were similarities between DCGs and monads, but that's an interesting argument. Thank you for this!


In my experience, that is an overgeneralization, though definitely a tempting one that is frequently encountered.

That being said, I find that Prolog code is often more readable than Lisp code. An important reason for this is that Prolog supports prefix, infix and postfix operators that can be defined as part of the concrete syntax. On the level of the abstract syntax tree, all terms still conform to the same inductive definition of Prolog terms, so this is only a notational convenience.
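
For example (a made-up operator, purely for illustration), declaring an infix operator changes only how terms are written and read; the underlying term is still an ordinary compound term:

    % After this directive, a ===> b is read as the compound term '===>'(a, b).
    :- op(700, xfx, ===>).

    % ?- write_canonical(a ===> b).
    % ===>(a,b)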


To distinguish homoiconic languages from others (where you may also "operate on the parse tree just the same as you can on regular program data"), a bit more qualification is needed though.

For example, reasoning about such data structures (lists in Lisp, terms in Prolog, bytes in assembly code, etc.) is typically very convenient in homoiconic languages. In fact, I think one could rightfully regard this ability to reason about a program's abstract syntax tree via built-in language mechanisms as a key advantage of homoiconic languages.


In my opinion, this article puts too much emphasis on reading, and too little emphasis on actually reasoning about programs in homoiconic languages like Prolog and Lisp, and due to this imbalance the conclusion is not sufficiently justified.

It is true: Being able to "read without parsing" is definitely nice.

But that is only a subset of the advantages that a homoiconic language gives you. An at least equally important advantage is that programs in homoiconic languages are typically very easy to reason about by built-in mechanisms in that language.

For example, Prolog programs are readily represented as Prolog terms, and can be easily reasoned about by built-in mechanisms such as unification.
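
As a small, made-up example, a clause is itself a Prolog term, and unification alone suffices to take it apart:

    % A clause is a term with functor (:-)/2, so plain unification
    % can decompose it into head and body.
    clause_head_body((Head :- Body), Head, Body).

    % ?- clause_head_body((grandparent(X, Z) :- parent(X, Y), parent(Y, Z)),
    %                     Head, Body).
    % Head = grandparent(X, Z),
    % Body = (parent(X, Y), parent(Y, Z)).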

Since I regard it as a key advantage of homoiconic languages that their abstract syntax is completely uniform and can typically be easily reasoned about within such languages, I disagree with the main point that the article is trying to make.

One interesting fact about homoiconicity is that extremely low-level languages (like assembly code) and extremely high-level languages (like Prolog) are homoiconic, yet there is a large gap "in the middle", where there are many languages (like Java, C, Python etc.) that lack this property.


Agreed; I've written Python code that processes Python code (both using ast and redbaron), and the result was quite opaque, unlike my metaprograms in Prolog.

In fact, the macropy project[1] offers the "read" step (by abusing the import system), and while using macros this way is pretty cool, I don't think the implementation of the macros is very nice.

[1] https://github.com/lihaoyi/macropy


> very easy to reason [sic] about by built-in mechanisms in that language.

Languages allow you to express your reasoning, but they don't do the reasoning by themselves. Also, there is no conclusive evidence that homoiconic languages have simpler semantics, especially of the denotational kind.


I think the meaning is that fewer things are implemented by bringing in outside capabilities. In Lisp, everything can be expressed in terms of other Lisp code. Typically directly so. In other languages that are not as macro-friendly, most keywords get the explanation of "this is how the computer will act" and then explanations of new behaviors.


> In Lisp, everything can be expressed in terms of other Lisp code.

Um, and how exactly are primitive forms defined?

> most keywords get the explanation of "this is how the computer will act" and then explanations of new behaviors.

Have you heard of Hoare logic? The meaning of ordinary ALGOL-style imperative programs can be given in terms of relating preconditions to postconditions. Suppose that you have the Hoare triples:

    {P} foo {Q}
    {Q} bar {R}
Then you can derive the Hoare triple:

    {P} foo; bar {R}
Note that `Q` is not mentioned in the derived triple at all. Hence, any implementation is free to translate the program

    foo; bar
into something that doesn't have `Q` as an intermediate state.


You seem to be arguing past me. If you are just upset that I said everything instead of most things, yeah, obviously some things are defined elsewhere. To that end, how things are actually implemented takes a trip to assembly. (I mainly blame that I'm writing on my phone, often while on the bus.)

My point was that you don't typically see C constructs explained in terms of other C constructs. This is quite common in Lisp: Lisp constructs are explained in terms of other Lisp constructs, in large part because there are so few constructs.

Your showing me that you can explain things using another logic is kind of my point. It is awesome that you can do this. I recommend the skill. It is still not showing C or Java or Haskell or whatever in terms of themselves.

Note that I think you actually can do this, in large part. It is not typically done, though.


> To that end, how things are actually implemented takes a trip to assembly.

Don't confuse “defined” with “implemented”. This is the entire point to having an axiomatic semantics!

> My point was that you don't typically see c constructs explained in terms of other c constructs.

Languages can't be entirely defined in terms of themselves. At some point you need to drop down to something else. If most of Lisp is defined in terms of other Lisp constructs, that is perfectly fine, but, for my purposes, i.e., proving things about programs, there are two mutually exclusive possibilities:

(0) The semantics of Lisp is the semantics of its primitive forms, and derived forms are just Lisp's standard library.

(1) So-called “derived” forms have an abstract semantics of their own right, and their implementation in terms of so-called “primitive” forms is, well, an implementation detail.

So, my answer to “most of Lisp is defined in terms of Lisp itself” is “that's cute, but how mathematically elegant is the part of Lisp that is not defined in terms of itself?”


I said explained. Not defined. Not implemented. Definitely not proved. Just explained. There is typically exposition, as well.

So, by all means, keep arguing points I'm not making. I was offering what I suspect the parent post meant by it being easier to reason using the mechanics of the language. Nothing more.


> I was offering what I suspect the parent post meant by it being easier to reason using the mechanics of the language.

And it still doesn't make sense. “Reasoning about programs” is making inferences about their meaning, i.e., deriving judgments from prior judgments. How exactly do homoiconic languages make it any easier to make inferences about the meaning of programs, given that homoiconicity is largely a property of how concrete and abstract syntaxes are related to each other? (Not that homoiconicity makes things more difficult either. It is just completely unrelated to reasoning about programs.)


For one particular example where homoiconicity makes reasoning about programs easier, consider an important reasoning method called abstract interpretation:

https://en.wikipedia.org/wiki/Abstract_interpretation

Using abstract interpretation, you can derive interesting program properties. The uniformity and simplicity of Prolog code, as well as its built-in language constructs like unification and backtracking, make it especially easy to write abstract interpreters for Prolog.
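
To give a flavour of this, here is a minimal sketch (with an invented object program) of the vanilla meta-interpreter from which such abstract interpreters are typically derived; an abstract interpreter is obtained by making it compute over an abstract domain instead of concrete terms:

    % Example object program, declared dynamic so that clause/2 can
    % inspect it portably:
    :- dynamic(nat/1).
    nat(zero).
    nat(s(X)) :- nat(X).

    % Vanilla meta-interpreter: Prolog code interpreted as Prolog terms,
    % using only unification, backtracking and clause/2.
    mi(true).
    mi((A,B)) :- mi(A), mi(B).
    mi(Head)  :-
            Head \= true,
            Head \= (_,_),
            clause(Head, Body),
            mi(Body).

    % ?- mi(nat(s(s(zero)))).
    % true.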

Here is a paper that applies this idea to derive several interesting facts about programs and their meaning:

Michael Codish and Harald Søndergaard, Meta-circular Abstract Interpretation in Prolog (2002) https://link.springer.com/chapter/10.1007%2F3-540-36377-7_6

Abstract interpretation is also applicable to other programming languages. However, it is much easier to apply to homoiconic languages like Prolog.


Abstract interpretation is entirely about the semantics of programs and has nothing to do with their surface syntax.


Yes, indeed!

Please note that what makes this reasoning method so easily applicable in this case is the uniformity of the abstract syntax, not of the surface syntax, which is also called the concrete syntax.

Homoiconicity is a relation between the concrete and abstract syntax tree (AST) of programs and the language's built-in data structures.


No idea about you, but when I reason about programs, I spend very little time manipulating syntax. Most of the time is spent manipulating semantic objects, like predicates on the program state, whose representation is independent from even the abstract syntax of a programming language.


I learned a lot of math from other math. Even better if it was in the same symbology that I was already used to.

Specifically, learning algebraic manipulation is typically taught by showing the basic math that you are abstracting over. Multiplication is often taught in terms of addition.

Are there deeper understandings? Of course! I am again just saying that I see the appeal of this method and suspected that was the point that sparked confusion.


> One interesting fact about homoiconicity is that extremely low-level languages (like assembly code) and extremely high-level languages (like Prolog) are homoiconic, yet there is a large gap "in the middle", where there are many languages (like Java, C, Python etc.) that lack this property.

There are languages like PicoLisp and Guile that try to fill that gap a little :)


I'd like to see more examples of this.



Have you not heard of Clojure?

https://clojure.org


Julia is homoiconic but has much more complex syntax than Lisp. I personally find Julia macros harder to think about than Lisp or Scheme macros, partly for that reason.


Please note that the creators of Julia no longer call it homoiconic [1].

The fact that you can access the AST in a language is not sufficient to make it homoiconic. There are several programming languages like Julia that let you access the AST yet are not (conventionally) considered homoiconic.

[1] https://stackoverflow.com/questions/31733766/in-what-sense-a...


Oh, I didn't realize that. Thanks!


Yes, very much so.

In addition, some of the automated translators I have seen also apply automated refactoring: they tend to merge similar sections of code and minimize the delta (similar to diff) that is factored out. This can create a maintenance problem, especially in a system like the IRS's, where the code that performs calculations for different years or other periods (legislation, etc.) is presumably similar but not identical, yet must be retained as fully as possible to reliably perform calculations that are subject to earlier regulations.

Such portions of code should be kept distinct, but an automated conversion may conflate them, and it may require additional and error-prone fiddling to enforce the separation.


Although it may appear a bit surprising at first, assembly language is homoiconic in the sense that you can easily reason about program code within the language itself: compiled code is just a sequence of bytes, and bytes are exactly what assembly language lets you manipulate and reason about easily.

This is similar to high-level languages like Lisp and Prolog, whose code is readily represented by built-in data structures in the respective language.

A fun fact about this is that extremely low-level (like assembly) and extremely high-level (like Prolog) languages are homoiconic, but there is a large gap "in the middle" (like Java, C), where there are many languages that lack this property.


I agree, and recommend Biomake:

https://github.com/evoldoers/biomake

It uses the declarative programming language Prolog to write flexible rules.

A logic programming language like Prolog is very well suited for expressing rule-based dependencies.
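
To see why (this is generic Prolog, not Biomake's actual rule syntax, and the targets are invented), dependencies are naturally stated as facts and rules:

    % Dependencies as facts:
    depends_on(app,    [parser, lexer]).
    depends_on(parser, [lexer]).
    depends_on(lexer,  []).

    % A target can be built once all of its dependencies can be built:
    buildable(Target) :-
            depends_on(Target, Deps),
            maplist(buildable, Deps).

    % ?- buildable(app).
    % true.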


Very nice, thank you for sharing!

I have one comment on the naming convention. Consider has_type/3 from the post. For example:

    has_type(_, true, bool).
A better name for this would be:

    term_type(_, true, bool).
or even:

    context_term_type(_, true, bool).
This makes clear what each of the arguments means, and does not limit the reading to only one direction of use. Note that we can also query this in other directions, and the predicate still makes sense!
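
For example, with a few hypothetical clauses that follow this naming:

    context_term_type(_, true,  bool).
    context_term_type(_, false, bool).

    % The same relation answers questions in several directions:
    %
    % ?- context_term_type(_, true, Type).    % What is the type of true?
    % Type = bool.
    %
    % ?- context_term_type(_, Term, bool).    % Which terms have type bool?
    % Term = true ;
    % Term = false.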

