AnySyn: A new way to edit


#1

I previously posted a proposal here for a new language, csyn. Due to discussion there it has morphed into a package for Atom that allows one to edit a javascript file using an alternate syntax, like coffeescript for example. Here is the readme …

AnySyn

Atom editor: edit javascript source files with custom syntax

AnySyn is an Atom editor plugin that allows you to edit javascript source files with an alternate syntax. For example, the word function could appear as -> while editing, but when the file is saved the valid javascript function keyword would be written.

The syntax conversion is lossless and one can switch to the new syntax and back to javascript with only changed code differing in format. This solves the problem of format changes messing up the git diffs.

The first alternate syntax supported is very similar to coffeescript. When combined with ES6 you get a “language” that is similar to coffeescript but is real javascript with simple syntax substitutions. This allows one to work on someone else’s javascript file using the “coffeescript” syntax and the file owner would only see javascript.

AnySyn is so simple that it could almost be written with nothing but regex replacements. However, significant-whitespace support, such as that used in Python and Coffeescript, is more complex and requires using the AST.
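To give a feel for the bookkeeping that significant whitespace needs, here is a naive, line-based sketch that turns indented blocks back into braces. It is not the AST approach described above and the function name indentToBraces is made up for illustration. It assumes spaces-only indentation and ignores strings, comments, and multi-line expressions, which is exactly where a real implementation would need the AST.

```javascript
// Naive sketch: every line indented deeper than the previous one is assumed
// to open a block that belongs to that previous line.
function indentToBraces(source) {
  const out = [];
  const stack = [0]; // indentation widths of the currently open blocks
  for (const line of source.split('\n')) {
    if (line.trim() === '') continue; // blank lines are dropped in this sketch
    const indent = line.length - line.trimStart().length;
    if (indent > stack[stack.length - 1] && out.length > 0) {
      out[out.length - 1] += ' {'; // the previous line opens a block
      stack.push(indent);
    }
    while (indent < stack[stack.length - 1]) {
      stack.pop(); // close every block deeper than this line
      out.push(' '.repeat(stack[stack.length - 1]) + '}');
    }
    out.push(line);
  }
  while (stack.length > 1) { // close whatever is still open at end of file
    stack.pop();
    out.push(' '.repeat(stack[stack.length - 1]) + '}');
  }
  return out.join('\n');
}

console.log(indentToBraces('if (x == 0)\n  for (;;)\n    y += 10\nz = 1'));
```

The sketch only adds braces. It does not restore parens or semicolons, and it cannot tell an indented continuation line from a new block.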

In the beginning the syntax will be specified by writing code. It will convert JS to the AST, generate the source with the new syntax, and then when saving it will do the opposite. Note that a new grammar will need to be written for Atom to match the new syntax.

Motivation

I have used coffeescript exclusively for four or five years and loved it. When I originally looked at changing from coffeescript to ES6 I thought I could never use it because it still uses the C syntax with all the noise. Then it occurred to me that something like AnySyn could fix that. You write the code mentally as real javascript but with easier writing and reading.

Coffeescript-like syntax Features

This is a wish-list for the first supported syntax. Some features may not be included and I assume more will be added. Note that each feature is optional via settings. E.g., if you don’t like using <- for return then you can turn off that feature.

  • Significant whitespace, no more ugly pyramid of braces
  • Parens usually not needed in for, if, or function call
  • Skinny and fat arrows with almost the same semantics as coffeescript
  • No need for empty parens before function arrows
  • @var replaced with this.var
  • <- replaced with return
  • #var replaced with let var
  • x@y replaced with x.get(y) (map access)
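The simple substitutions in this list, each switchable through settings, could be sketched as a small rule table. The rule names, the settings shape, and the convert function are assumptions made for illustration, not part of any existing package. The regexes know nothing about strings or comments, and the round trip is only lossless when the javascript uses single spaces after let.

```javascript
// Hypothetical rule table: one entry per optional feature.
const RULES = [
  { name: 'atThis',                              // @var <-> this.var
    toJs: [/(?<![\w$])@(?=[A-Za-z_$])/g, 'this.'], // skips x@y map access
    fromJs: [/\bthis\./g, '@'] },
  { name: 'arrowReturn',                         // <- <-> return
    toJs: [/<-(?=\s|$)/g, 'return'],
    fromJs: [/\breturn\b/g, '<-'] },
  { name: 'hashLet',                             // #var <-> let var
    toJs: [/#(?=[A-Za-z_$])/g, 'let '],
    fromJs: [/\blet /g, '#'] },
];

// direction is 'toJs' or 'fromJs'; a feature is on unless settings disables it.
function convert(text, direction, settings = {}) {
  return RULES
    .filter(rule => settings[rule.name] !== false)
    .reduce((acc, rule) => acc.replace(rule[direction][0], rule[direction][1]), text);
}

console.log(convert('#total = @count', 'toJs'));
console.log(convert('<- @name', 'toJs', { arrowReturn: false }));
console.log(convert('return this.name', 'fromJs'));
```

Turning a feature off simply removes its rule from both directions, so text such as <- is then left alone on save.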

What AnySyn doesn’t do

AnySyn makes no changes to ES6 just to be more compatible with coffeescript. AnySyn is only to reduce ES6 noisiness. For example these are not supported.

str = `AnySyn doesn't change #{this} to ${that}`
# this non-comment doesn't become // this comment

Atom integration out of the gate

AnySyn will be supported by Atom the same as a first-class language. When a javascript file is loaded it is automatically parsed to an AST and then the editor buffer receives the source with the new syntax. It will have highlighting customized for the new syntax. Flipping the buffer between AnySyn and JS will be supported with one quick command.

Status

Just a specification at this point. There is nothing more than this readme.

Why switch from Coffeescript

Many coffeescript users like me are converting to ES6. For a quick writeup comparing the two see this

Here is my personal list of reasons for changing to ES6.

  • Improved debugging: Even with source maps coffeescript is harder to debug.
    • You can’t hover over a variable like @var to see the value
    • You can’t evaluate coffeescript in the console
    • Stepping can be confusing because of line mismatch. I sometimes have to step many times to get past one line of coffeescript.
  • Larger community: Coffeescript has divided the community. I can finally publish code without people bitching they can’t read it.
  • Advanced features: While some coffeescript features are lost, like all code being expressions, there are as many, if not more, features gained from ES6, like iterators.

Examples of the “CoffeeScript” syntax

These examples are mostly taken from here.

//--- JS ---
let square = x => x * x;
let add = (a, b) => a + b;
let pi = () => 3.1415;

//--- AnySyn ---
#square = x => x * x   // `#` changed to let
#add = (a, b) => a + b // no semicolons
#pi = => 3.1415        // no empty parens

//--- JS ---
var square = function(x) { return x * x; };
var pi = function() { return 3.1415; };

//--- AnySyn ---
var square = (x) -> <- x * x  // anonymous function() becomes () ->
var pi = -> <- 3.1415         // return becomes <-

//--- JS ---
if (x == 0) {
  for (let i = 0; i < 10; i++) {
    y += 10;
  }
}

//--- AnySyn ---
if x == 0                      // parens optional
  for #i = 0; i < 10; i++  // whitespace significant
    y += 10

//--- JS ---
function helloWorld (a = 'hello', b = 'world') {
  try {
    console.log(a);
  } catch(e) {
    console.log(
      b
    );
  }
}

//--- AnySyn ---
-> helloWorld (a = 'hello', b = 'world')  // -> changed to function
  try
   console.log a  // call doesn't need parens
  catch(e)
   console.log(   // left paren needed for multi-line params
     b

//--- JS ---
class Parrot extends Bird {
  constructor(name) {
    super(name);
    this.name = 'Polly';
  }
  get name() { 
    return this.name;
  }
}

//--- AnySyn ---
class Parrot extends Bird
  constructor name      // no parens needed
    super name
    @name = 'Polly'     // @ changed to this.
  get name
    <- @name

//--- JS ---
function* range(start, end, step) {
  while (start < end) {
    yield start;
    start += step;
  }
}

//--- AnySyn ---
->* range (start, end, step)  // ->* means generator function
  while start < end
    yield start
    start += step

//--- JS ---
map.set(key, value)
map.get(key)

//--- AnySyn ---
map@key = value  //  @  replaces get and set for maps
map@key

Ideas for the future

I know it is a bit premature but here are ideas that have been tossed around …

  • Adding support for icons. The lambda symbol could be used instead of =>.
  • Some wild kind of editing directly on the AST instead of text.
  • Adding a syntax specification language that programmatically creates new syntaxes and grammars.
  • Supporting other languages than javascript

License

AnySyn is copyright Mark Hahn via the MIT license.


#2

I would go as far as saying that AnySyn should not specify a syntax, instead, it should provide the possibility to translate features that are commonly referred to as verbose into something completely user-specified.
While this might make specifying an atom grammar for it difficult, a mechanism similar to the one used in semantic highlighting could be used (not sure though).


#3

Wow, I read the other thread too, and I really like the idea that’s developed! I’ve seen discussion about a similar idea earlier. If I remember correctly, the idea was some transform layer between the text buffer and the display buffer of the text editor. It would also allow for certain coding styles (e.g. if(a) return; vs. if (a) return;, note the space after the if) independent of the format that’s saved, which I think is really cool, and maybe also possible with AnySyn.

I’ll be enjoying my vacation until the end of august, but after that I’d love to help on this :smile:


#4

Yeah, but what you see in the TextBuffer is what you’re editing! If I understood correctly, the content would be re-compiled on the fly when saving.


#5

I believe this is the idea you’re thinking about:


#6

From what I understood, the TextBuffer has the text as it is in the file on the hard drive and the DisplayBuffer is the representation of it as we see it on screen. It is the DisplayBuffer that we’re editing, and those edits are sent right to the TextBuffer. If there can be a layer between those that performs bi-directional transformations, the text we edit in the DisplayBuffer could look radically different from what’s in the TextBuffer, which is what will be saved. That layer does the same thing as AnySyn, only it does it while editing, instead of on save. It will need to be deeply integrated into Atom itself though.

Transforming while editing performs a lot of transforms on small pieces of text, while transform on save would have to transform the entire file on each save, but I don’t know the details about string manipulation in javascript to tell which would be more performant.

If it is the TextBuffer we’re editing, though, and those edits just update the DisplayBuffer, providing this layer would be much more complicated.

But still, either way the coding style thing I mentioned in my previous post would still be possible and it’s a feature I would love :smile:


#7

Yep that’s the one :smile:


#8

I still think that storing the parsed AST would be the best solution, though. As of now, the flow would be:

  • Load the file from disk
  • Parse it and make an internal AST
  • Generate the “personally linted” source
  • On save, re-generate the AST and save as JS source

You can see there’s an over-abundance of parsing. In my opinion this should be avoided.

Some might remember the Macintosh File System, where files consisted of two components, data and metadata (the data fork and the resource fork). This would have proved incredibly useful for our purposes… unfortunately it is not something we can rely on across today’s filesystems and tools.
But all is not lost! I would store the generated JS and AST in the same file, with the AST occupying the initial part of the file under the form of comment. This way, one can avoid parsing on load.
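A minimal sketch of that idea, assuming the AST is serialized as JSON into a single leading line comment. The marker //@anysyn-ast and both function names are hypothetical, not an existing convention.

```javascript
const AST_PREFIX = '//@anysyn-ast '; // hypothetical marker

// Prepend the AST to the generated JS as a one-line comment.
function embedAst(jsSource, ast) {
  // JSON.stringify without an indent argument emits no raw newlines; U+2028
  // and U+2029 would also end a line comment, so escape them as well.
  const json = JSON.stringify(ast)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return AST_PREFIX + json + '\n' + jsSource;
}

// Split a file back into its stored AST (or null) and the plain JS source.
function extractAst(fileText) {
  if (!fileText.startsWith(AST_PREFIX)) return { ast: null, source: fileText };
  const newline = fileText.indexOf('\n');
  const end = newline === -1 ? fileText.length : newline;
  return {
    ast: JSON.parse(fileText.slice(AST_PREFIX.length, end)),
    source: fileText.slice(end + 1),
  };
}

const file = embedAst('let x = 1;\n', { type: 'Program', body: [] });
console.log(extractAst(file).ast.type);
```

One thing this sketch does not address: if the JS part is edited in another editor, the stored AST goes stale, so a real version would probably need some check (for example a hash of the source) before trusting it.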


#9

There is one important thing to consider when thinking about how the text is loaded, edited, and saved. We have to make sure that the output text exactly matches the input except where edits are made. If we don’t, then spurious Git diffs will occur and clog up the repo. Repo owners hate this.

White-space conversion can modify potentially large amounts of text. The following is a small example, but it could be a problem over the whole file.

// text in
if (a) {b=1; c=2
  d=3
}

// converted
if a
  b = 1
  c = 2
  d = 3

//edited
if a
  b = 1
  c = 2
  d = 9

// naive recoding from AST
if (a) {
  b = 1;
  c = 2;
  d = 9;
}

// desired output
if (a) {b=1; c=2
  d=9
}

There are two ways to fix this.

  • Get everyone on the project to agree to a specific beautifier and enforce running this on all saves. The Go community does this.

  • There is a second much more complex way …

Compare the old AST and the new to find the smallest lexical unit that has changed. The original AST would have line and char position info so we can replace only the small lexical unit in the original. This is a tricky algorithm but I believe it is possible to develop it to work on all test cases.
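A heavily simplified sketch of that second approach, using the small example above. It assumes the edit keeps the number of nodes the same and that each node is a flat record with its canonical text and its character range in the original file. Both are assumptions for illustration; real AST diffing is considerably harder. Parsers such as esprima can attach ranges to nodes.

```javascript
// Each old node is { code, range: [start, end] }, sorted by start position and
// non-overlapping. New nodes only need `code`.
function patchSource(original, oldNodes, newNodes) {
  if (oldNodes.length !== newNodes.length) {
    throw new Error('sketch only handles edits that keep the node count');
  }
  let result = original;
  // Walk backwards so earlier ranges stay valid after each splice.
  for (let i = oldNodes.length - 1; i >= 0; i--) {
    if (oldNodes[i].code === newNodes[i].code) continue;
    const [start, end] = oldNodes[i].range;
    result = result.slice(0, start) + newNodes[i].code + result.slice(end);
  }
  return result;
}

const original = 'if (a) {b=1; c=2\n  d=3\n}';
const rangeOf = (text) => {
  const i = original.indexOf(text);
  return [i, i + text.length];
};
// Only the three literals are tracked here, to keep the example short.
const oldNodes = ['1', '2', '3'].map(code => ({ code, range: rangeOf(code) }));
const newNodes = [{ code: '1' }, { code: '2' }, { code: '9' }];

console.log(patchSource(original, oldNodes, newNodes));
// prints the "desired output" form: only the 3 is replaced by 9
```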

Something that would be impossible is to make sure the newly generated code follows a unique style standard that the repo owner demands. This may kill this idea in some use cases. I can see people bitching about the output produced.


#10

I forgot all about that other thread. It duplicates this idea a lot. I think the difference is that I’m proposing a specific solution. My solution may suck but it is best to get something going. It can be improved later.

After reading that thread and this I think maybe the beautifier solution is the most practical one, at least for now. On new projects the code style could be clean from the beginning with clean diffs. Any pull requests that aren’t beautified could be rejected.

Changing to this standard on an old project would require one commit with a boatload of garbage diffs but it would be clean from then on.

Also, I really like the idea of using a beautifier as the Go community does.


#11

Many users would hate this. They just want to be able to check out a file into their Vim and start editing. It would limit who would join a project. This would be worse than using coffeescript in terms of people bitching.


#12

I see, that’s a very reasonable concern. Such features are definitely worth revisiting in the future, in my opinion, but it’s best to get something going :smile:


#13

Yes, some kind of plugin scheme that allows any syntax makes sense and should be supported from the beginning.

However, making it easy to add a syntax is a big technical challenge. Each syntax would need to be specified in some DSL, and then some solution for highlighting would be needed. This development would be as big as or bigger than AnySyn itself.

AnySyn can host a syntax provider service with the interface set at a high level. The first “coffeescript” plugin provided would be developed with unsophisticated hard-wired code and hook in at that level. Later, a new project could develop the tools to make this easier. I personally will be happy with the first one since my goal is to get the best of the coffeescript and ES6 worlds. I’m selfish that way.

I think one advantage of AnySyn as a project is that it is focused on this one short-term goal. If you start a project with many lofty goals you are likely to end up with nothing.


#14

Completely true but, for example, I find <- instead of return horrible :wink:. I can already see someone who thinks “But I want to write r3turn instead!”

I believe we do not need something extremely sophisticated. Something like what I implemented in my package selection-counter might suffice actually.

You have the return statement normally expressed as: <- {expr} and I change it into r3turn {expr}

Let’s take functions:
In js they are (for example, not to be taken to the letter):

{name} = function({args}) {
    {stmts}
}

I change it into:

{name} ({args}) ->
    {stmts}

And so on…
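A rough sketch of how such placeholder templates could drive a rewrite. It handles single-line templates only, so the multi-line function body with {stmts} is out of scope here. The function names compileTemplate and rewrite are made up for illustration.

```javascript
// Turn a template such as '<- {expr}' into a RegExp with one named group per
// placeholder. Leading indentation is captured separately so it is preserved.
function compileTemplate(template) {
  const pattern = template
    .split(/(\{\w+\})/)
    .map(part => {
      const m = /^\{(\w+)\}$/.exec(part);
      // Placeholders become named groups; everything else is matched literally.
      return m ? `(?<${m[1]}>.*)` : part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return new RegExp(`^([ \\t]*)${pattern}$`, 'gm');
}

// Rewrite every line matching fromTemplate into the shape of toTemplate.
function rewrite(text, fromTemplate, toTemplate) {
  return text.replace(compileTemplate(fromTemplate), (...args) => {
    const groups = args[args.length - 1]; // named groups arrive as the last argument
    const indent = args[1];
    return indent + toTemplate.replace(/\{(\w+)\}/g, (_, name) => groups[name]);
  });
}

console.log(rewrite('  <- x * x', '<- {expr}', 'r3turn {expr}'));
console.log(rewrite('square = function(x) {',
                    '{name} = function({args}) {',
                    '{name} ({args}) ->'));
```

Swapping the two template arguments gives the reverse direction, which is what saving back to the original form would need.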


#15

Yes I completely understand. Many would not like my personal syntax.

But this is not AnySyn’s problem. If it is developed as a framework only, with no specific syntax in mind, then you and everyone else are free to create new packages that work with the AnySyn provider. Everyone would need to watch the AnySyn development to keep it honest and not allow biases to creep in.

Here is a possible spec for how AnySyn would work and the interface it provided.

  • It would be a normal package that, when enabled, makes the conversion process invisible. An opened javascript file would simply appear in the new syntax, as if it were stored that way.

  • A config setting could allow specific files, maybe a regex on the path, to be linked to specific syntax plugins. It would be much like how a grammar is linked to files. Actually it could be an Atom grammar, right? If so the highlighting could be implemented the standard Atom way without any interaction with AnySyn.

  • When a file is opened that matches the pattern it would call the plugin.

  • The plugin would provide two simple calls.

    • One call would accept an AST and return the text with new syntax. This would use some code generator, probably escodegen.
    • The second call would do the opposite. It would take the text and return an AST, probably using esprima or acorn.

  • AnySyn would create a texteditor, read the file, convert the javascript to an AST, call the plugin to get the converted text, and place it in the buffer.

  • AnySyn would trap any save and do the reverse.

  • AnySyn would also provide an Atom command to toggle the buffer between the standard javascript and the new syntax. This would be especially useful in debugging. It would obviously do this using the same two plugin calls.

I think that is all AnySyn would do. As you can see it is a pretty trivial package. I think it could be written in a few days. I wouldn’t need any help. It would have a simple test plugin that does something trivial like implementing only the straight text substitutions.
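The spec above could be sketched roughly as follows. Everything here is a stand-in. The "AST" is just a list of lines so the example runs without esprima, acorn, or escodegen, and the names astToText, textToAst, pluginsByPath, and AnySynBuffer are invented for illustration. No Atom APIs are used.

```javascript
// Stand-ins for a real parser and code generator.
const parseJs = (js) => ({ type: 'Program', lines: js.split('\n') });
const generateJs = (ast) => ast.lines.join('\n');

// Hypothetical test plugin with the two calls: AST -> text and text -> AST.
// It implements a single straight text substitution.
const arrowReturnPlugin = {
  astToText: (ast) =>
    ast.lines.map(l => l.replace(/\breturn\b/g, '<-')).join('\n'),
  textToAst: (text) => ({
    type: 'Program',
    lines: text.split('\n').map(l => l.replace(/<-/g, 'return')),
  }),
};

// Hypothetical config: path patterns linked to syntax plugins.
const pluginsByPath = [{ pattern: /\.js$/, plugin: arrowReturnPlugin }];

function pluginFor(path) {
  const entry = pluginsByPath.find(e => e.pattern.test(path));
  return entry ? entry.plugin : null;
}

class AnySynBuffer {
  constructor(path, fileText) {
    this.plugin = pluginFor(path);
    this.showingJs = this.plugin === null;
    // On open: JS -> AST -> alternate syntax, if a plugin matches the path.
    this.text = this.plugin ? this.plugin.astToText(parseJs(fileText)) : fileText;
  }
  // What gets written to disk is always plain javascript.
  textToSave() {
    return this.showingJs ? this.text : generateJs(this.plugin.textToAst(this.text));
  }
  // Flip the buffer between the alternate syntax and javascript.
  toggle() {
    if (!this.plugin) return;
    this.text = this.showingJs
      ? this.plugin.astToText(parseJs(this.text))
      : generateJs(this.plugin.textToAst(this.text));
    this.showingJs = !this.showingJs;
  }
}

const buffer = new AnySynBuffer('demo.js', 'function f() {\n  return 1;\n}');
console.log(buffer.text);         // shown with <- in place of return
console.log(buffer.textToSave()); // saved as the original javascript
```

The host never needs to know what a plugin does. It only calls the two conversion functions on open, on save, and on toggle.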

The contributions by others could be done by helping me with my coffeescript plugin or creating their own. Individuals could fork any plugin to create their personal syntax. They could publish them also.

This forking would be so easy that I think there would not be a need for the config settings we talked about to turn individual features on and off. I could be wrong. The plugin could support these settings itself if desired.

Comments on this idea would be appreciated. If everyone agrees I’ll turn this post into a spec to add to AnySyn’s readme and then start working on the package.

Edit: Added the steps where AnySyn converts javascript to/from an AST.


#16

I don’t think so. Textmate grammars are made of simple pattern matching with regexes, they are not literally “grammars”…


#17

I meant that the cataloging of grammars and what files they are associated with could be handled by Atom’s existing code. The real grammar file would only provide the highlighting. Does this make sense? I have never looked at a grammar file or whatever it is called. I have no clue in this area.

In any case AnySyn has to tell Atom what highlighting to use. I don’t know how this would happen.


#18

I’m not sure either. Anyway, Atom’s grammar files only contain a name, some associated file extensions and a series of patterns to match to scopes. It has to be said that they are JSON files, so one could theoretically insert fields in them without breaking backwards compatibility.


#19

Say you were creating an AnySyn plugin for javascript, would it work if you just copy the base javascript grammar file, and replace function with (->|function) in the regexes? Or would this be too naive?


#20

Even though I’ve never seen a grammar file I would think that would work.

Even though white-space is a big change I don’t think that changes any highlighting. And I think everything else will be simple substitution.
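One detail to watch with the (->|function) substitution, assuming the base grammar anchors the keyword with \b word boundaries (an assumption; the real grammar file was not checked): \b never matches between a space and ->, because neither character is a word character. The check below uses JavaScript's own RegExp as a stand-in for the grammar's regex engine, and both patterns are simplified illustrations.

```javascript
// Simplified stand-ins for a keyword pattern in a grammar file.
const naive = /\b(->|function)\b/;    // function swapped for (->|function)
const adjusted = /(->|\bfunction\b)/; // boundaries kept only around the word

console.log(naive.test('var f = function () {}')); // true
console.log(naive.test('var f = (x) -> x'));       // false: no \b next to ->
console.log(adjusted.test('var f = (x) -> x'));    // true
console.log(adjusted.test('functional'));          // false: still a whole-word match
```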