At a high level, compilers just translate one programming language into another. A key part of this translation is the Abstract Syntax Tree (AST), which represents the program as a tree of computation, independent of the surface syntax of the language. Once you have the AST, you can step through it and translate the tree into another language such as Java bytecode, assembly, CIL, etc.
When compiling, the AST sits in the middle of the whole process. The first half deals with parsing your language into the AST; the second half deals with transforming the AST into the target language (including possible optimizations of the tree).
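To make that second half concrete, here is a minimal sketch (the node and instruction names are hypothetical) of walking an AST for `1 + 2 * 3` and emitting stack-machine style "bytecode" from it:

```python
# Minimal sketch (hypothetical node and instruction names): once 1 + 2 * 3
# has been parsed into an AST, emitting stack-machine "bytecode" is a tree walk.
ast = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))

def emit(node, code):
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    else:                          # binary operators: "add", "mul"
        emit(node[1], code)        # left operand first
        emit(node[2], code)        # then right operand
        code.append(("ADD",) if kind == "add" else ("MUL",))
    return code

print(emit(ast, []))
# [('PUSH', 1), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
```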
Lisp is effectively the raw AST, which is where its power comes from. The use of parentheses is the cleanest way to directly represent a raw tree that you can interact with. This means that you can use Lisp to cut the language design process in half from either direction:
On the one hand, if you are worried about parsing your language, you can simply transform it into Lisp, which is virtually identical to creating the AST, and then use a Lisp interpreter/compiler for the second half.
On the other hand, if you're interested in writing the interpreter/compiler for a language and don't want to stress about parsing, you can write it for Lisp and not have to worry about parsing a complex language. If you follow the Norvig code, you can extend that example to a language without parentheses by writing a parser for it.
Even if you want to do both, it's not a terrible idea to prototype both halves using Lisp and then do the minimal work to pull out the Lisp code and replace it with some other implementation of the AST.
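Here is a compressed sketch in the spirit of Norvig's lis.py (not his exact code) showing both halves: the parser is only a few lines because the parentheses already spell out the tree, and the evaluator works directly on the resulting nested lists.

```python
# Sketch in the spirit of Norvig's lis.py: parse parenthesized text into
# nested lists (the "raw AST") and evaluate those lists directly.
import operator as op

def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)                       # discard the closing ')'
        return lst
    try:
        return int(tok)                     # a number literal
    except ValueError:
        return tok                          # otherwise a symbol

ENV = {"+": op.add, "-": op.sub, "*": op.mul}

def evaluate(x, env=ENV):
    if isinstance(x, str):                  # symbol lookup
        return env[x]
    if not isinstance(x, list):             # literal number
        return x
    fn, *args = [evaluate(e, env) for e in x]
    return fn(*args)

ast = read(tokenize("(+ 1 (* 2 3))"))       # ['+', 1, ['*', 2, 3]]
print(evaluate(ast))                        # 7
```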
> Given the current state of machine learning, could an AST be generated to accommodate a given language?
No machine learning required. It is certainly possible to automatically generate an AST based on a grammar.
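Parser generators do exactly this: you feed them a grammar and get back a tree. A sketch using the Lark library (any parser generator would do; the toy expression grammar is purely illustrative):

```python
# Sketch using the Lark parser generator (pip install lark); the toy
# expression grammar below is illustrative, not tied to any real language.
from lark import Lark

grammar = r"""
    ?expr: expr "+" term     -> add
         | term
    ?term: term "*" factor   -> mul
         | factor
    ?factor: NUMBER          -> num
           | "(" expr ")"

    %import common.NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(grammar, start="expr")
tree = parser.parse("1 + 2 * (3 + 4)")
print(tree.pretty())   # a tree built purely from the grammar, no hand-written parser
```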
The more interesting question is whether you can automatically generate a grammar based on samples of the language - https://en.wikipedia.org/wiki/Grammar_induction - although it is unclear why you'd bother trying for a programming language (as opposed to natural language).
> And more importantly, can the form of an AST cause a difference in performance of the overall program?
If you are compiling to machine code, or to bytecode for some VM, or transpiling to another source language (e.g. JavaScript): not really. The form of an AST could make a difference to compilation speed or the ease of implementing later stages of the compiler, but I can't see how it would directly make any difference to the performance of the compiled program.
However, if you are doing tree-based interpretation: Yes, differences in choice of AST can make a significant difference to runtime performance.
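As a toy illustration of that last point, here is a sketch (hypothetical node names) where folding constant sub-trees before interpretation shrinks the tree the interpreter has to walk on every evaluation:

```python
# Sketch: the same expression as a full tree and with constant sub-trees
# folded ahead of time; the folded tree is cheaper to walk repeatedly.
import time

def eval_node(node, env):
    kind = node[0]
    if kind == "num": return node[1]
    if kind == "var": return env[node[1]]
    if kind == "add": return eval_node(node[1], env) + eval_node(node[2], env)
    if kind == "mul": return eval_node(node[1], env) * eval_node(node[2], env)
    raise ValueError(kind)

def fold(node):
    """Collapse constant sub-trees so the interpreter visits fewer nodes."""
    if node[0] in ("add", "mul"):
        l, r = fold(node[1]), fold(node[2])
        if l[0] == "num" and r[0] == "num":
            val = l[1] + r[1] if node[0] == "add" else l[1] * r[1]
            return ("num", val)
        return (node[0], l, r)
    return node

# x * (2 + 3 * 4) -- the right-hand operand is entirely constant
ast = ("mul", ("var", "x"), ("add", ("num", 2), ("mul", ("num", 3), ("num", 4))))
folded = fold(ast)                          # ('mul', ('var', 'x'), ('num', 14))

for tree in (ast, folded):
    start = time.perf_counter()
    for x in range(100_000):
        eval_node(tree, {"x": x})
    print(tree, round(time.perf_counter() - start, 3))
```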