Decoupling Display Format From Storage Format


#1

One thing has long bugged me about coding with other people (which is simply a reality for the vast majority of folks these days) … all the holy wars over brace style, tabs versus spaces, trailing whitespace being a no-no, line length … ad nauseam. I think I have an idea that will allow us to do away with this in one fell swoop …

The format in which I view and edit my code does not have to be a direct one-to-one representation of the code that gets stored on disk. For example, I might enter:

def foo( a , b , c=1 )
    bar a, b, c
end

And what might get stored on disk is:

def foo(a, b, c = 1)
  bar(a, b, c)
end

Obviously, the editor would need to do a bit more work and would probably need a full parser for every language it offers this feature for. But wouldn’t it be worth it to finally put an end to debates that have raged on for decades?

I would expect to have to define the following:

  1. The display style (tab width, brace style, etc. … it would vary widely depending on the language)
  2. The storage style

The display style would probably be the same across all projects while the storage style would be project-specific. But this way, all my code could look the way I want it to … and I wouldn’t be putting anyone off because on their machine it would look just the way they want it to. And if we built enough smarts into it, the formatting engine could minimize the impact of edits to make diffs clearer besides!

This would also make the whole maximum line length debate go away (about time!) because the editor could be smart enough about the language being edited to wrap it on a small screen even if the line was far too long to fit :smile: For example:

puts "Wow this is a really, really, really, really, really, really, really, really, really, really, really, really, really, really, really, really, really, long line of text!"

Could be rewritten for display into:

puts "Wow this is a really, really, really, really, really, really, " \
  "really, really, really, really, really, really, really, really, " \
  "really, really, really, long line of text!"

Thoughts? Crazy? Or crazy enough it just might work?!?
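
The display-only wrapping of the long string above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a real editor feature: `wrap_literal`, its `width` and `indent` parameters are hypothetical names, and the text is assumed to start with a non-space character and to contain no escapes or interpolation.

```ruby
# Hypothetical helper: split the text of one long string literal into several
# adjacent literals joined by backslash continuations, for display only.
# Assumes plain text with no escapes or interpolation.
def wrap_literal(prefix, text, width: 72, indent: "  ")
  chunks = []
  current = +""
  # Keep trailing whitespace attached to each word so that joining the
  # chunks reproduces the original text exactly.
  text.scan(/\S+\s*/) do |word|
    lead = chunks.empty? ? prefix : indent
    # The extra 4 columns cover the two quotes and the trailing " \".
    if !current.empty? && lead.length + current.length + word.length + 4 > width
      chunks << current
      current = +""
    end
    current << word
  end
  chunks << current unless current.empty?

  lines = chunks.each_with_index.map do |chunk, i|
    (i.zero? ? prefix : indent) + chunk.inspect
  end
  lines.join(" \\\n")
end

text = "Wow this is a " + "really, " * 17 + "long line of text!"
puts wrap_literal("puts ", text)
```

Because only the chunk boundaries change, concatenating the displayed literals gives back the stored string unchanged.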


#2

I think that one of the major problems with this approach is going from the storage format to the display format. For some transformations, it would be simple, like indentation. Other transformations, like the string breaking you suggested, sound more complicated.

Eventually you get to a place where you’re having to make things up in order to convert back into the storage format, because the display -> storage transformation is lossy.

I like this idea, I’m just not sure if it is feasible to make it as powerful as I would like.


#3

Well, it isn’t like we’re destroying the storage format. If the user doesn’t change anything in the file, then the editor just throws away the display format when the user is done viewing it. Also, just because we’re formatting something for display doesn’t mean that we have to actually edit the display format as if it is the actual text and then convert back to storage format. Let’s go back to my really long line example …

Let’s say that we go to column 92, signified by the pipe character here:

puts "Wow this is a really, really, really, really, really, really, " \
  "really, really, really,|really, really, really, really, really, " \
  "really, really, really, long line of text!"

And we add some text:

puts "Wow this is a really, really, really, really, really, really, " \
  "really, really, really, (wow, is this ever going to end?) really, " \
  "really, really, really, really, really, really, really, long " \
  "line of text!"

Because the editor knows that whatever cursor location in the display version corresponds to line x column 92 in the storage version, it knows to insert (wow, is this ever going to end?) at that location in the storage version.
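
A minimal sketch of that position mapping, assuming the editor records for each display line where the stored text starts. `Segment`, `build_segments`, `to_storage_offset` and `insert_at` are hypothetical names, and all rows, columns and offsets are zero-based here, so the thread's column 92 appears as offset 91.

```ruby
# One display line: `text_start` is the display column where stored text
# begins (after indentation and the opening quote), `storage_offset` is the
# storage offset of that first character, `length` is how many stored
# characters the line shows.
Segment = Struct.new(:text_start, :storage_offset, :length)

# Build one Segment per display chunk, tracking the running storage offset.
def build_segments(chunks, first_start:, rest_start:, storage_start:)
  offset = storage_start
  chunks.each_with_index.map do |chunk, i|
    seg = Segment.new(i.zero? ? first_start : rest_start, offset, chunk.length)
    offset += chunk.length
    seg
  end
end

# Translate a display position (row, col) into a storage offset.
def to_storage_offset(segments, row, col)
  seg = segments.fetch(row)
  within = (col - seg.text_start).clamp(0, seg.length)
  seg.storage_offset + within
end

# Apply an insertion made at a display position to the stored text only.
def insert_at(storage, segments, row, col, text)
  storage.dup.insert(to_storage_offset(segments, row, col), text)
end

chunks = [
  "Wow this is a " + "really, " * 6,
  "really, " * 8,
  "really, " * 3 + "long line of text!"
]
storage = %(puts "#{chunks.join}")
# Display text starts at column 6 on the first line (after `puts "`) and at
# column 3 on continuation lines (two spaces plus the opening quote).
segments = build_segments(chunks, first_start: 6, rest_start: 3, storage_start: 6)
puts to_storage_offset(segments, 1, 26) # zero-based 91, i.e. column 92
```

The display text is never parsed back; the edit is applied to the stored line and the display is regenerated from it.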

And yes, performance is probably a concern … but if you think about it, WYSIWYG word processors have been doing exactly this kind of thing for decades. This is the same thing as using a style sheet to say that your body text looks like this and has a line and a half margin after each paragraph. And then you move to a location on the screen, type something and the word processor inserts only the text and formatting that you typed in the right location in the storage format. So I doubt the performance problems are insurmountable …


#4

To be clear, I wasn’t suggesting that there would be a performance problem.

“Because the editor knows that whatever cursor location in the display version corresponds to line x column 92 in the storage version …”

That’s a good point. It’s kind of like source maps in other things. I’d be excited for something like this to work, but I also think it would be difficult to pull off.


#5

I agree. It will take some thought and planning, but I believe there is ample precedent.


#6

Just my 2 cents regarding implementation: It actually wouldn’t be necessary to maintain some sort of source map or to correlate new changes to old values. An alternative would be to have a “JavaScript Beautifier” built in. Upon opening a file, it would beautify it with your settings, then on save it would beautify it with the global (project) settings.
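
A minimal sketch of that open/save flow, reduced to indentation so it stays small. Everything here is an assumption for illustration: `Style`, `restyle`, `on_open`, `on_save` and the two constants are hypothetical, and each line's leading whitespace is assumed to be a whole number of indent units.

```ruby
# Hypothetical style settings: only the indentation unit is modeled.
Style = Struct.new(:indent)

PROJECT_STYLE = Style.new("  ")   # what is stored on disk
USER_STYLE    = Style.new("    ") # what this user prefers to see

# Re-indent `source` from one style to another by swapping indent units.
def restyle(source, from:, to:)
  unit = Regexp.escape(from.indent)
  source.gsub(/^(?:#{unit})+/) do |leading|
    to.indent * (leading.length / from.indent.length)
  end
end

# "Beautify" with the user's settings when a file is opened.
def on_open(stored)
  restyle(stored, from: PROJECT_STYLE, to: USER_STYLE)
end

# "Beautify" with the project's settings when the buffer is saved.
def on_save(buffer)
  restyle(buffer, from: USER_STYLE, to: PROJECT_STYLE)
end

stored = "def foo(a, b, c = 1)\n  bar(a, b, c)\nend\n"
puts on_open(stored)
```

A real beautifier would need a parser for the language; the point of the sketch is only that the same function runs in both directions with different settings.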


#7

This is a good point. You can take it even further and say that whenever the editor saves a file, it will uglify it (to save even more space). Then, when you pull up the original copy, Atom will beautify it. Granted, you will need to put several layers of caching in place to keep from continuously regenerating files. I still think it would be a neat feature.


#8

The downside to that would be that uglifying would not preserve change histories in a VCS.


#9

That’s true. Perhaps there is another method of compression that would work. Or have it beautified, saved in whatever VCS, compressed with uglify, then saved to disk? This is starting to sound too complex to work.


#10

Yes, uglifying might have its place … but I think that would be the same place we use it now, compressing JavaScript and such for fast download, not as the default storage format. If we just keep to “translating between styles”, I think this is workable … whether at the basic level that @HMUDesign mentions or the more advanced version I envision.


#11

I’ll have to agree with you there. I think it’s quite plausible; however, I think you’ll have to define “proper” whitespace separately for each programming language, because which whitespace is syntactically required differs from language to language. E.g., Java writes its else-if like this: else if (x == y) {}. PHP also accepts a single keyword: elseif ($x == $y) {}. So Java MUST have a space between else and if for it to be valid syntax.
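
To illustrate required versus optional whitespace, here is a rough sketch that joins tokens with no spaces except where two word-like tokens would otherwise fuse into one. `join_tokens` and its single rule are assumptions for illustration; real languages need more rules (for example around adjacent operators).

```ruby
# Join tokens without whitespace, except where the left token ends with a
# word character and the right token starts with one (e.g. `else` + `if`),
# because removing that space would change the token stream.
def join_tokens(tokens)
  tokens.each_cons(2).reduce(tokens.first.to_s) do |out, (left, right)|
    needs_space = left.match?(/\w\z/) && right.match?(/\A\w/)
    out + (needs_space ? " " : "") + right
  end
end

puts join_tokens(%w[else if ( x == y ) { }])   # prints else if(x==y){}
puts join_tokens(%w[elseif ( $x == $y ) { }])  # prints elseif($x==$y){}
```

Every other space in the two statements is presentational, which is the part a display style could be free to change.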


#12

To add my 2 cents, I just opened a related discussion.

My discussion is for just indentation, and things can always move on from there. Specifically, I think indentation is an easy one to start with.

As for minifying/uglifying and VCS - maybe if this is a gradual approach, and things are minimal to start off with, VCS (like Git) would join the movement and begin to understand the storage format.

I’m sure there are many things common across languages that could be formulated into a standard “encoding” - indentation is already one… alignments could be another (elastic tabstops)… then all you need is braces and function/if brackets spacing and you’ve nearly covered the majority of styling differences.

As mentioned in my discussion noted above - I want to push for indentation first - treating it as an encoding style distinct from coding style. THAT alone will save me a huge amount of time and I’m sure a lot of other people too - as for me that’s the major sticking point.

Also - regarding storage format - Golang already has a gofmt tool that renders code in their “standard” format - such a thing could become the storage format for said language - and the actual language maintainers would choose the storage style. If you think about it… they’ll be writing the compiler that works with the stored format - so it could be in their interests.


#13

I’ve been thinking along the same lines. In most programming languages, some text characters are syntactically relevant while others are ignored and used only for presentation to the developer.

I envision a world where the presentational layer and the syntactic layer are not wrapped up in the same set of characters (where some characters even perform double-duty depending on context).


#14

Sorry for reviving this, but I stumbled into it and I think the idea is awesome :rocket:

A big part of my day job is developing and maintaining the front-end for a pretty large and, in some parts, old codebase. The current team is pretty much on the same page, but we’re also dealing with some pretty crappy stylesheets created by people no longer at the company.
But even within the same team, the formatting that works well during creation is not always the formatting that works well during maintenance.

The idea we had was to have an editor maintaining a certain well defined format on disk, but being able to switch representation in the editor.

Really, code is a user interface. Syntax highlighting, autocomplete, indentation, indent-guides, bracket highlighting, linters, etc. are all crutches to help prevent mistakes and improve efficiency of a very clumsy and dated interface. I love writing code, but it’s simply not necessary to have the editor display exactly what’s on disk.

Atom is awesome. It’s even innovative in many ways. We even have quite innovative languages. But the act of writing code exactly as it’s stored on disk is a bit ancient and worn by now.


#15

If you really want editor independence then make the file format the AST in JSON. There have even been editors that edit the AST directly. You would have to provide a Git hook to translate it into some kind of line-formatted text to diff it.
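
As a rough illustration of a layout-independent AST, the sketch below uses Ruby's standard `Ripper.sexp` and `JSON.generate`. The helpers `strip_positions` and `ast_json` are hypothetical, and treating every two-integer array as a `[line, column]` pair is an assumption that holds for this example only.

```ruby
require "ripper"
require "json"

# Drop the [line, column] pairs that Ripper attaches to tokens, since they
# describe layout rather than structure.
def strip_positions(node)
  return node unless node.is_a?(Array)
  return nil if node.size == 2 && node.all? { |item| item.is_a?(Integer) }
  node.map { |child| strip_positions(child) }
end

# Serialize the syntax tree of `source` as JSON, without position data.
def ast_json(source)
  JSON.generate(strip_positions(Ripper.sexp(source)))
end

spaced  = "def foo( a , b , c=1 )\n    bar(a, b, c)\nend\n"
compact = "def foo(a, b, c = 1)\n  bar(a, b, c)\nend\n"
puts ast_json(spaced) == ast_json(compact)
```

Ripper still distinguishes some stylistic choices, such as optional parentheses around call arguments, so this sketch only neutralizes whitespace differences.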


#16

This would be the holy grail, and would be one of those features that could let Atom move from “a cool editor” to “the best editor”.


#17

Sure, the promise seems awesome, but I’m rather sceptical about how an implementation could deal with that efficiently. The first step would probably be to define a spec that is agnostic enough to fit every known language, and this part in itself is already out of reach for me, but as the saying goes: they did not know it was impossible, so they did it.


#18

This is the one and only reason that people love Lisp. (All the other “reasons” are just overloads of this reason.)


#19

So as I understand it, the limiting factor here is that Atom shows the text file exactly as it is, and there would need to be a translation layer?

So suppose you have the functions s2v and v2s that would convert saved files to view and vice versa. There would be some lossiness, in that an outside change to the saved file could be changed by the v2s(s2v()) roundtrip. That is not a problem IMHO, as long as repeated roundtrips result in a stable saved file.
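
A small sketch of that stability requirement, reusing the names s2v and v2s from above. The concrete rules (leading tabs shown as four spaces, trailing whitespace dropped on save) and the helpers `round_trip` and `stable?` are assumptions chosen only to keep the example short.

```ruby
# s2v: saved -> view, showing each leading tab as four spaces.
def s2v(saved)
  saved.gsub(/^\t+/) { |tabs| "    " * tabs.length }
end

# v2s: view -> saved, storing leading four-space groups as tabs and
# dropping trailing whitespace.
def v2s(view)
  view.gsub(/^(?: {4})+/) { |spaces| "\t" * (spaces.length / 4) }
      .gsub(/[ \t]+$/, "")
end

def round_trip(saved)
  v2s(s2v(saved))
end

# The first round trip may rewrite an externally edited file, but a second
# one must not change it again.
def stable?(saved)
  once = round_trip(saved)
  round_trip(once) == once
end

outside_edit = "def foo\n    bar   \nend\n" # space-indented, trailing spaces
puts round_trip(outside_edit) == outside_edit # false: the round trip is lossy
puts stable?(outside_edit)                    # true: it settles after one pass
```

A check like `stable?` could run in a test suite for each translation pair to catch transformations that keep rewriting the saved file.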

Furthermore, there could be cases where the saved file cannot be transformed by s2v, and then the user could assist to convert the saved file to a transformable one.

So basically if you have the translation functions, and a memory buffer for the transformed version, this could already be implemented, no?

I’m specifically thinking of a coffeescript-lite which would add CS-like significant whitespace to ES6, that seems like it would be fun. However, one of the problems of CS is that sometimes you need to look at the compiled source just to see if it does what you think it does. With a repeated v2s(s2v()) cycle, the optional braces could get inserted but could be rendered less visible, so that they don’t cause as much visual noise but do provide certainty regarding semantics.

A possible problem would be cycle speed, and that could be improved by keeping the AST of both representations alive and converting changes to patches of that tree. However, that is not trivial.


#20

I’m joining this discussion because I think that this is a really neat feature, which can improve coding health a lot… Here are my 2 cents :slightly_smiling:

The most approachable method I can imagine is using some kind of bidirectional code transformation. So, let’s say:

  • Atom loads a file -> our engine applies some transformations with user settings.
  • Atom saves a file -> our engine applies the same transformations with project settings.

Transformations could be dynamic per project/language (like plugins). Ex: indentation, function spacing, braces, preprocessors, even naming conventions! Then in my project we decide to use the Java indentation and the Java braces plugin. Get the point?

The tough part is: these transformations must (?) rely on a language specific parser. So, we must be able to parse (inside Atom) the language.

The good point about this problem: with a language specific parser you can build a lot of editing capabilities (better highlight, good autocomplete, cool refactors, etc).

About the performance concern… Of course the first implementation will be slow, but remember Atom: at first it was deadly slow and now it is a lot faster! So, performance shouldn’t stop us from trying!

We could try to implement this for JavaScript… We have plenty of tools to make this task approachable: parsers, transformers built on parsers, etc.