An agent-based parser for a structural IDE

04 Dec 2012 ideas
I had been thinking about the difference between text editing and structured editing, and it dawned on me that the latter will never win simply because editing structurally is not "natural" and is not forgiving of "mistakes". Sadly, text editing is forgiving by default; structured editing is unforgiving by design.

We have a grammar, so we use it. But do grammars have to be policemen all the time? After all, we humans created grammar; and I have to think that we created it by convention to make sense of the otherwise chaotic barrage of communicative sounds we invented. So if natural language grammar is intended to be "guidelines and lighthouses to the islands of information in the oceans of noise", why shouldn't computer language grammar be the same?

The problem, of course, is that such a grammar would be non-deterministic. How would you specify a terminal symbol with any certainty when any other symbol could intervene and would have to be treated as noise in a very context-sensitive way?

I wonder if there's another way; and the rest of this post will record my thoughts to date on this other way.

Imagine each symbol of the language's alphabet as a cell waiting for user input. Imagine each non-terminal in the language's grammar also as a similar cell, only it waits for symbols to activate. The moment a symbol appears in the input (this could be the next symbol read from a file or the next key pressed by the user), the symbol cell grabs it and announces that it's now "active". This triggers "interested" non-terminals to claim that symbol as part of themselves, although the ownership is shared - until all symbols of a particular non-terminal are satisfied, at which point it claims all of them to itself and announces that it is active. This triggers the next level of non-terminals to wake up, and so forth.
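To make that concrete, here is a rough sketch of how the cells might look, in Python. Everything in it (the Board, SymbolCell and RuleCell names, the hard-coded toy grammar, the claiming logic) is invented for illustration; treat it as one possible shape for the idea, not a design.

```python
# A rough, purely illustrative sketch of the "cells as agents" idea.

class Board:
    """Shared blackboard: cells announce activations, interested rules listen."""
    def __init__(self):
        self.listeners = []              # rule cells waiting for activations

    def subscribe(self, rule):
        self.listeners.append(rule)

    def announce(self, cell):
        # Give every interested rule a chance to (co-)claim the active cell.
        for rule in list(self.listeners):
            rule.consider(cell)


class SymbolCell:
    """A terminal: waits for one piece of input, then announces it's active."""
    def __init__(self, name, board):
        self.name = name
        self.board = board
        self.active = False
        self.owners = []                 # rules sharing a claim on this cell

    def feed(self, token):
        if token == self.name and not self.active:
            self.active = True
            self.board.announce(self)
            return True
        return False                     # not mine: someone else's, or noise


class RuleCell:
    """A non-terminal: wakes up as the symbols it needs become active."""
    def __init__(self, name, wanted, board):
        self.name = name
        self.wanted = list(wanted)       # names still needed, in order
        self.claimed = []                # cells it has (co-)claimed so far
        self.owners = []                 # higher-level rules claiming this rule
        self.active = False
        self.board = board
        board.subscribe(self)

    def consider(self, cell):
        # Claim the next wanted symbol; ownership is shared until complete.
        if not self.active and self.wanted and cell.name == self.wanted[0]:
            cell.owners.append(self)
            self.claimed.append(cell)
            self.wanted.pop(0)
            if not self.wanted:          # every part satisfied: rule activates
                self.active = True
                self.board.announce(self)


# Tiny demo "grammar":  assign -> IDENT '=' NUMBER
board = Board()
symbols = [SymbolCell(n, board) for n in ("IDENT", "=", "NUMBER")]
assign = RuleCell("assign", ["IDENT", "=", "NUMBER"], board)

for token in ["IDENT", "=", "NUMBER"]:   # pretend these are key presses
    any(cell.feed(token) for cell in symbols)   # at most one cell grabs each token

print(assign.active)                     # True: the non-terminal woke up bottom-up
```

The demo at the bottom just feeds three "key presses" through the symbol cells and checks that the assign rule woke up on its own, with no top-down parse driving it.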

This sounds pretty close to how agent-based systems work, hence the term in the title. If I were Microsoft, I'd probably call it ActiveParsing™.

The key advantage of such parsing would be that ambiguous input could be handled by the system. Also, input that doesn't match any production rule in the grammar would still be allowed into the system, which means that erroneous user input is tolerated, not berated.
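Continuing the same hypothetical sketch, "tolerated, not berated" might look like this: tokens that no cell claims are simply treated as noise, and the rule still completes around them.

```python
# Reusing the classes from the sketch above: unclaimed tokens are just noise.
board = Board()
symbols = [SymbolCell(n, board) for n in ("IDENT", "=", "NUMBER")]
assign = RuleCell("assign", ["IDENT", "=", "NUMBER"], board)

for token in ["IDENT", "???", "=", "oops", "NUMBER"]:    # noisy key presses
    if not any(cell.feed(token) for cell in symbols):
        print("noise, ignored:", token)                  # tolerated, not berated

print(assign.active)        # True: the rule completed around the noise
```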

In addition, such a system would be incremental from the ground up: although it is backed by a grammar, input is not constrained to "start at the top and end at the bottom of the grammar". It is also inherently event-driven (see design notes below) and therefore should be able to react to editing actions better. This ties in well with "natural" structural editing.
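And for the event-driven angle, again only a guess within the same hypothetical sketch: a deletion is just another event, answered by retracting the cell's activation and putting the rules that had claimed it back into a waiting state.

```python
# Building on the sketch above (hypothetical): a "symbol was deleted" event.
def on_delete(cell):
    """The user deleted the input backing `cell`: retract its activation."""
    cell.active = False
    for rule in cell.owners:              # every rule that had (co-)claimed it
        rule.active = False               # the rule is no longer satisfied
        rule.claimed.remove(cell)
        rule.wanted.insert(0, cell.name)  # it waits for that symbol once more
        # (keeping `wanted` correctly ordered for mid-stream deletions, and
        # propagating the retraction further up the grammar, are glossed over)
    cell.owners = []
```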

Finally, such a system would allow for hybrid languages that embed other languages within themselves, e.g. HTML+CSS+JS.

I cannot imagine that this hasn't been thought of before, as always. My Goog-fu not being up to snuff, however, I was not able to ferret out anything concrete. Of course, I did find a lot of references in the NLP literature to agent-based systems, automata-based parsers and the like, but to prove their value they usually devolved into statistics or deep math that I couldn't comprehend.

Design Notes

Update
Found it! A paper on Object Oriented Parallel Parsing for Context-free Grammars. I couldn't find a date on it, but its references to Symbolic Lisp machines date it to at least the 90s, if not the 80s. Google search terms: "parallel parsing computer language"