Starting in about the 11th century, philosophers in India of the Navya-Nyāya school adopted a similar artificial dialect of Sanskrit, to make their arguments precise. Of course there was no computer implementation backing the formalization, but the goal was similar: to have a language that remains a subset of natural language and (somewhat) intelligible to a typical educated person, yet is a formal language with unambiguous meanings. A good description of this language is in a couple of papers by Ganeri [1].
For example (taken from [2]), instead of saying the straightforward “Caitra goes to the village”, if you wished to be really precise you might say “There is an activity which leads to a connection-activity which has as Agent no one other than Caitra, specified by singularity, [which] is taking place in the present and which has as Object something not different from ‘village’.” (Sounds a bit less unnatural and also a bit less confusing in Sanskrit.)
> [Attempto Controlled English] can serve as knowledge representation…
It is not clear to me when this project started, but there was a 1985 article pointing out the Navya-Nyāya/“Shastric Sanskrit” language as an example of something that is both (somewhat) a natural language and (somewhat) usable for knowledge representation [2]. In its own way that article became, in popular culture in some circles in India, the source of various unfounded memes about Sanskrit being good for computers, or more absurd claims (https://news.ycombinator.com/item?id=14295285).
I believe Latin was still being used for theological discourse in Europe well into the 1700s for that same reason: precision. Something nice about a dead language is that the meanings of the words can't change out from under you.
Latin is still perfectly capable of every kind of ambiguity that other natural languages are. For example, I once wrote the sentence
Quisque aliquid habet quod occultet
for a t-shirt.
While the intended meaning is 'everybody has something to hide', in a different context one could imagine that the subject of "occultet" is someone else previously referred to. For example, if we had just been talking about Moxie Marlinspike, we could conceivably read this sentence as 'everybody has something for him [Moxie] to hide'. (Like, all of us users out here have got different things that Moxie can help each of us to protect.)
There's also a famous joke "malo malo malo malo malo" ('I prefer (being) a bad man in a bad situation to (being) an apple in an apple tree'). I'm sure we can proliferate examples of ambiguous Latin to match every other natural language.
A cool disambiguation feature in Latin is the distinction between the possessive pronouns "eius" and "suus", where "suus" is used when referring to possessions of the grammatical subject of the sentence and "eius" when referring to someone else's possessions. While English can specify the former ("his/her/its own"), it doesn't have a straightforward way to show that the possessor is not the subject of the sentence.
You can see the contrast between eius and suus in the text of the Magnificat
where "ancillae suae" ('his handmaiden') occurs in a sentence whose grammatical subject is God, but "nomen eius" ('his name') in a sentence whose grammatical subject is the name. And sure enough, there is an actual disambiguation between the subject of a sentence and someone else later on:
Suscepit Israel, puerum suum, recordatus misericordiae suae, sicut locutus est ad patres nostros, Abraham et semini eius in saecula.
He [God] has taken up Israel, his [God's] servant, remembering his [God's] mercy, as he [God] said to our ancestors, Abraham and his [Abraham's] seed forever.
In this case "his mercy" and "his seed" refer to God's mercy but Abraham's seed, but there is no referential ambiguity about that in the Latin because one is "misericordiae suae" and the other is "semini eius".
That's a great point and very interesting comment. I imagine one of the reasons for this formalized (or if you don't like it, over-precise) language to arise and take hold in the Indian setting may have been the fact that Sanskrit too is capable of an amazing amount of ambiguity, in multiple ways:
1. Creative re-interpretation/re-analysis: A lot of commentators have re-interpreted existing verses to derive different meanings, and this is very possible to do. To pick a simple example, there is Kālidāsa's verse “…jagataḥ pitarau vande pārvatīparameśvarau” which clearly is a prayer to Pārvatī and Parameśvara (Shiva). But pārvatī-parameśvarau could be re-analyzed as pārvatīpa-rameśvarau (Parvati's and Ramā (Lakshmi)'s husbands), i.e. a prayer to Shiva and Vishnu. Similarly there are Shiva-para and Vishnu-para interpretations of various verses/works, people have written spiritual commentaries on love poetry, etc. So if something straightforward you say can be interpreted to have any meaning whatsoever (exaggerating a bit) by a sufficiently clever commentator, you better be careful :)
2. Happening naturally: Poets have used this too: what appears to be the same word being used in different settings. For example, here's a prayer that millions of people recite, to Ganesha: “agajānana-padmārkaṃ gajānanam aharniśam / anekadantam bhaktānām ekadantam upāsmahe” — here the first line has the well-known word “gajānana” (the one with an elephant face), but it also starts with “agajānana” which appears to be the opposite (not an elephant face?). Actually the simple compound “agajānana-padmārkaṃ” turns out to be made from: a-ga=mountain (that which does not move), thus agajā=daughter of the mountain (Pārvatī), agajā-ānana=Pārvatī's face, agajānana-padma=the lotus of Pārvatī's face, and the whole word agajānana-padma-arka=the sun to the lotus that is the face of Pārvatī (the sun of course being what makes a lotus bloom), thus it's a simple adjective describing Ganesha (namely that he makes his mother's face bloom with joy). And this is a perfectly straightforward usage of language that most educated readers will simply understand and find unremarkable, not a trick. At most a pleasing coincidence that the same syllables repeat (known as “yamaka”). In the second half, “ekadantam” refers to Ganesha having one tusk, but the “anekadantam” that it starts with is not the opposite of that but simply “anekadam taṃ” (him, who gives many things).
3. Used intentionally: At the extreme, poets have used ambiguity in the above way and also using puns (śleṣa, words with multiple meanings like kara=hand/doer/tax), to compose poems that have multiple meanings, including entire continuous works of poetry that tell two stories at once (each stanza being interpretable in two ways, and in one instance even up to six ways). There's a book about this called Extreme Poetry (https://cup.columbia.edu/book/extreme-poetry/9780231151603 — unfortunately for the lover of literature, this is a product of modern academia so heavy on theory and light on examples, but worth a look nevertheless). There's even a verse that consists of the syllable yā repeated 32 times (yāyāyāyā...) (https://www.scribd.com/document/6591853/The-Wonder-That-is-S...) which is not just “yeah, yeah” but is intended by the author to mean something. :-)
> straightforward usage of language that most educated readers will simply understand and find unremarkable, not a trick
Do you think those readers would recognize the specific words, or that they would successfully parse the words in context at first glance using their language ability?
> There's even a verse that consists of the syllable yā repeated 32 times
Wow! It seems like this tradition or at least possibility is shared between Sanskrit and Chinese.
In general, a lot of the wordplay techniques and genres that are mentioned in the Extreme Poetry book you linked to are also practiced in similar forms in modern English (and to some extent French due to the Oulipo), but many of them were only invented or popularized during the 20th century, so I imagine some of these Sanskrit wordplay traditions are dramatically older.
I meant that they would successfully parse the words in context (and recognize the specific words and meanings), using their language ability. To use an example from an interview with a modern master of Sanskrit constrained writing (http://www.indictoday.com/interviews/citrakavya-the-wonder-p...), in an English sentence like “She is his panicky Hispanic friend”, the sound “hispanic” may recur when spoken (depending on the speaker's accent I guess) but most listeners would successfully parse it anyway. (I think there is some “lookahead” in the way we parse things... even if “She is Hispanic” and “She is his panicky…” begin with the same set of sounds, when listening to the latter sentence the listener would quickly backtrack and latch on to the correct understanding, which includes unconsciously re-analyzing the earlier words.)
The difference I was suggesting is that such instances are either rare or awkward in English. But in Sanskrit they are common (helped by poets' enormous skill over centuries, honed in a highly language-focused tradition) even in popular works, and feel natural/elegant.
When it comes to ambiguity in languages, linguistics has a distinction/designation of low-context vs. high-context languages, which can also be related to the concept of entropy. Latin is often used as an example of a low-context language because so much is explicit in the surface representation. Japanese is often used as an example of a high-context language, meaning that a good deal of context is needed to properly interpret a sentence. This is loosely analogous to Java (low-context) vs. a Groovy DSL (high-context).
People would usually mention that "amor Dei" ("love of God") could mean love toward God, or love that God has toward someone else.
Similarly you could have "odium brassicae" ("hatred of cabbage") which could mean a person's attitude toward cabbage, or perhaps a cabbage's attitude toward something or someone else.
Latin obviously has other ways to be more specific ("odium, quod brassica in te fert" 'hatred that cabbage bears against you' or something) but if you just use a genitive to express an attitudinal relation, it's going to be grammatically ambiguous which direction that attitude flows!
4) The "Vitelli" in Latin would be the vocative of the name "Vitellius" rather than "Vitellio".
Cool sentence -- I had never heard of it before!
The Latin etymological equivalent of the Italian sentence would be "illi vitelli de illis Romanis sunt belli", which uses "de" in a way that's unidiomatic for ancient Latin (where it only means something like "from", not "of" in the sense of "belonging to").
Indeed, Ecclesial Latin is still the official language of the Catholic Church, inasmuch as documents like papal encyclicals are published in Latin first before translation to other languages.
However, I don't think the people using Latin for these purposes really think that logical precision is an important reason to do so. The Vatican Latinist I briefly studied with, Reginald Foster, doesn't seem to think that way -- he regularly made fun of people who acted as if "Latin came down from heaven in a gold box".
You might say that the use of Latin has a benefit of precision for the Catholic Church because there are some familiar theological and ecclesiological terms available whose meaning should be clear and which can be matched up with similar vocabulary in older Christian texts. Still, these texts are not using some kind of formal logic, and they don't do a more careful job of avoiding ambiguities than lawyers drafting legislation or contracts in modern languages do.
Then these texts must be really horrible to read and no less bug-ridden than any reasonably sized code-base, except that any bug is swiftly explained away as a feature by the commanding authority. Whether "bear arms" or "it fell like scales from his eyes", this stuff doesn't age well, which is pretty much the point.
To be fair, though: the latter bit is from Greek; but why not tomatoes, huh?
It would be great to use that kind of language in software development, I think it would avoid a lot of bugs and discussions between QAs and developers.
This is the primary benefit for heavily structuring the sentences. It effectively turns into a programming language with its own form of definitions and statements. My question is: Can we make it Turing-complete by means of recursive sentences? Maybe by using the sentences that redefine proper nouns?
Has anyone attempto'd to combine this with state machines in a way that you could specify rules in an English-like way, with a computer checking that you've resolved all ambiguities?
For example, a system that starts and stops based on observing parts of another system, but also allowing a user to input explicit start/stop commands would require additional rules:
The system is stopped if the subsystem is in state A.
The system is started if the subsystem is not in state A.
The system is stopped if the user sets the stop flag.
The system is started if the user unsets the stop flag.
Error: ambiguous state if subsystem not in state A and user sets the stop flag.
There's some real value here in allowing non-programmers to work through all the edge cases of a system, while simultaneously adding tools to convert standard English to ACE (eg: identifying ambiguous English and asking them to rewrite pronouns or split sentences up).
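The checking idea sketched above can be made mechanical: encode each rule as a condition plus a target state, enumerate every combination of inputs, and flag any combination where two rules fire with different targets. A minimal Python sketch of that idea (names like `subsystem_in_A` are hypothetical, and this is not any real ACE tooling):

```python
from itertools import product

# Each rule pairs a condition over the inputs with a target state.
# These encode the four English rules from the example above.
rules = [
    (lambda s: s["subsystem_in_A"], "stopped"),
    (lambda s: not s["subsystem_in_A"], "started"),
    (lambda s: s["stop_flag"], "stopped"),
    (lambda s: not s["stop_flag"], "started"),
]

def find_ambiguities(rules, inputs=("subsystem_in_A", "stop_flag")):
    """Enumerate all input combinations; return those where the
    fired rules disagree on the resulting state."""
    conflicts = []
    for values in product([False, True], repeat=len(inputs)):
        scenario = dict(zip(inputs, values))
        outcomes = {target for cond, target in rules if cond(scenario)}
        if len(outcomes) > 1:  # two rules fired with different targets
            conflicts.append(scenario)
    return conflicts

print(find_ambiguities(rules))
```

Interestingly, on these four rules the checker flags not only the conflict named in the example (subsystem not in state A, stop flag set) but also the symmetric one (subsystem in state A, stop flag unset) -- exactly the kind of overlooked edge case such a tool would surface for a non-programmer to resolve.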
Have you seen Cucumber? https://cucumber.io/docs It's a system that attempts to enable specifications for a software product to be written in a way that can be parsed by an automated testing framework.
In theory, you carefully say what you want the software to do in given scenarios, human developers understand the spec as-is, and automated tests can read and execute the spec as-is as a test.
I have no professional experience with it because all the docs make it look like something I would run from: lots of in-code scaffolding and talking like a computer just to get the same job done. But maybe you'll like it.
FWIW, the "specifications can be tested" is nothing magical. It binds statements of the Cucumber spec to snippets of code using regular expressions. (The effect is about the same as titles given to blocks of code in Ruby's RSpec).
The "it's not code" part of Cucumber is that the test document is something that a non-programmer won't freak out at manipulating. (But a developer would still have to make sure the code snippets bind to the statements, so...).
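The regex-binding mechanism described above is small enough to sketch by hand. This is a toy illustration of the idea, not Cucumber's (or behave's) actual API -- all names here are made up:

```python
import re

# Registry of (compiled pattern, bound function) pairs.
steps = []

def step(pattern):
    """Decorator that binds an English-like statement pattern to code."""
    def register(fn):
        steps.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"the user deposits (\d+) dollars")
def deposit(state, amount):
    state["balance"] = state.get("balance", 0) + int(amount)

@step(r"the balance is (\d+) dollars")
def check_balance(state, expected):
    assert state["balance"] == int(expected)

def run(spec):
    """Run a plain-English spec: each line executes its first matching step."""
    state = {}
    for line in spec.strip().splitlines():
        for pattern, fn in steps:
            m = pattern.search(line)
            if m:
                fn(state, *m.groups())
                break
    return state

run("""
When the user deposits 40 dollars
Then the balance is 40 dollars
""")
```

The spec text reads like English and a non-programmer can edit the numbers, but as the parent comment notes, a developer still has to keep the regexes and code snippets bound to whatever statements the spec actually uses.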
I have found that test suites solve this problem about 98% of the time and are easier to teach people to debug.
That last 2% is pretty damned awkward though. When you have 2 preconditions your tests go cartesian, you have to pick a dominant one. It's always a matter of one sucking less than the other, but that situation isn't stable. It tends to flip as bugs are identified or requirements shift.
Thing is, if it's 3 concerns, and definitely by 4, you're probably due for a refactor anyway, instead of reaching for Cucumber or a similar tool.
This reminds me of when my team started code reviewing feature documentation. The text was going to have to be read hundreds of times and understanding it would be critical for QA to function properly. A common problem we ran into was sentences that were hard to understand without reading them multiple times due to small connective parts of speech being omitted. It intuitively felt wrong, but it was not immediately obvious why. After researching it a little every case seemed to boil down to resolving ambiguity. English is filled with ambiguous words that can be multiple parts of speech. These seemingly unnecessary pieces of syntax are actually really efficient at hinting what part of speech will come next so that the context is immediately understood.
Wikipedia articles like this bug me. It is a well written introduction to the project, but why is that on Wikipedia? Put that on your own website! I wish more people would follow Wikipedia's very clean and easy style.
This is of course why the article can't follow Wikipedia's citation rules. And also all the references just link to articles by one group, so the article can't give you any real context. Is this a serious, notable work? Or is someone just boosting their search rankings with a Wikipedia link?
It seems like the latter, though if so, it's surprising to me that it hasn't been removed. Wikipedia's editors are pretty brutal about the notability threshold.
Only for things where editors exist who feel confident enough (sometimes wrongly) to judge their notability. You can completely make up pages describing concepts in obscure fields, as long as you can generate a citation or three - bonus points if the papers cited are unintelligible to all but a few dozen people in the world, and don't even mention the concept.
This sounds kind of like the simplified English used on aeroplanes [1] except more formal and less for humans to write. That leads to signs like "no step" instead of "don't stand on this".
Finally I understand why all those years of Google searches for documentation about ACE (ADAPTIVE Communication Environment) [1] have been so painful. To make matters worse, the versioning was very close: Attempto is at version 6.7 while the programming ACE is at 6.5.
A friend of mine, who worked with Verilog a lot, was expressing some anger about extension .v on the Internet (Github?) mostly related to Coq, not Verilog. Oh well...
As I'm getting older I feel the need to make the code as readable as possible [1], to the point where non-programmers would be able to read it, especially in complex business-rules environments (e.g. finance, insurance, health, etc.).
The quote "the limits of my language are the limits of my world" [2] shows when we try expressing these business rules in high-level programming languages like C, C#, JavaScript. There are (significant) parts of my code where I wish to not be limited by the syntax of the language.
DSLs help, but usually take a disproportionate effort to implement.
Would be great to see more natural language embedded in our day-to-day programming languages.
Can you define new nouns and verbs? AKA objects and functions? Imagine being able to have the specification, documentation, and actual code implementation all be the same thing!
It's not. Take this sentence: "ACE construction rules require that each noun be introduced by a determiner (a, every, no, some, at least 5, ...)."
The sentence does not follow the rule it describes, since "rules" does not have a determiner. It might be worth a try, but I suspect ACE would make the article awkward and harder to read.
If fact databases were written in this language, wouldn't it make NLP much easier? To allow machines to more easily query fact databases and infer new facts?
I was replying especially to the "easily query fact databases " and less about the general post about ACE. I could imagine though that you could take ACE syntax and transform it into Datalog/Datomic queries quite easily.
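As a toy illustration of that idea (nothing below is real ACE, Datalog, or Datomic syntax -- the sentence pattern and function names are invented for this sketch), a controlled sentence shape can be parsed into triples and then queried Datalog-style, with unbound arguments acting as variables:

```python
import re

# Facts are stored as (subject, verb, object) triples.
FACTS = set()

# Controlled fragment: "Subject verb [a|the] Object."
PATTERN = re.compile(r"^(\w+) (\w+) (?:a |the )?(\w+)\.$")

def tell(sentence):
    """Parse a controlled-English sentence into a fact triple."""
    m = PATTERN.match(sentence)
    if not m:
        raise ValueError("sentence outside the controlled fragment")
    FACTS.add(m.groups())

def ask(subject=None, verb=None, obj=None):
    """Datalog-style query: a None argument acts as a free variable."""
    return [f for f in FACTS
            if (subject is None or f[0] == subject)
            and (verb is None or f[1] == verb)
            and (obj is None or f[2] == obj)]

tell("John owns a car.")
tell("Mary owns a bike.")
print(ask(verb="owns"))  # matches both facts
```

Because the input language is controlled, there is no NLP guesswork in `tell`: a sentence either fits the grammar and yields an unambiguous fact, or it is rejected -- which is exactly the trade ACE makes.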
This seems similar to Applescript. Someone wants to make a precise language that looks like English and can be read and written by non-programmers. Ultimately you get the reading part, sort of, but writing is harder. You end up with very subtle and precise rules in the grammar which are not as easily inferred correctly as with many common programming languages. Just writing sentences that you think should be correct won't work. You need to fully understand the programming language. So you're back at requiring programmers to work with it.
It might be great to have tool-assisted writing, where you write naturally and the computer says “do you mean: [pedantic unambiguous version]”.
The main risk would be people blindly accepting corrections that actually change the meaning -- just like any other autocorrection system. I'm not sure how to mitigate that.
Actually, legalese comes to mind. Therein, hereto... I mean the same purported goal: to disambiguate and precisely specify. Because of the complexity of the result, it is hard to say whether that goal is achieved or not :)
[1]: http://www.columbia.edu/itc/mealac/pollock/sks/papers/Ganeri... (can't find a link to the second paper now)
[2]: https://www.aaai.org/ojs/index.php/aimagazine/article/view/4...