Add support for indentNextLinePattern as in TextMate for more advanced indentation


#1

In some cases, one may want to indent only the line following a line of code. In C, this could be:

if (true)
  printf("foo\n");

and in Fortran (where the continuation character & is used):

if (.true.) &
  write (*,*) "foo"

The current indent support using increaseIndentPattern and decreaseIndentPattern does not seem sufficient for this. TextMate adds an indentNextLinePattern (http://manual.macromates.com/en/appendix), and it seems as if Sublime Text has bracketIndentNextLinePattern (http://www.sublimetext.com/forum/viewtopic.php?f=3&t=4375#p44557) to allow for more flexible behaviour.

It would also be useful if the amount that the follow-on line is indented could be set independently of other indentation.
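A minimal sketch of the requested behaviour, in JavaScript. The setting names `indentNextLinePattern` and `indentNextLineAmount` are hypothetical (the latter stands in for the independent amount asked for above), indentation is assumed to be 2-space soft tabs, and only one line of look-back is used, so chained continuations are not handled:

```javascript
// Hypothetical rules for a Fortran-like grammar; these keys are illustrative only.
const fortranRules = {
  indentNextLinePattern: /&\s*$/, // a trailing & continues the statement
  indentNextLineAmount: 1,        // extra levels applied to the continuation line only
};

// Indentation level of a line, assuming 2-space soft tabs.
function levelOf(line) {
  return Math.floor(line.match(/^ */)[0].length / 2);
}

// Suggested level for the line typed after `row`.
function suggestNextLevel(lines, row, rules) {
  const current = lines[row];
  if (rules.indentNextLinePattern.test(current)) {
    // The next line is a one-off continuation: indent it by the extra amount.
    return levelOf(current) + rules.indentNextLineAmount;
  }
  if (row > 0 && rules.indentNextLinePattern.test(lines[row - 1])) {
    // `current` was the continuation, so return to the level of the line before it.
    return levelOf(lines[row - 1]);
  }
  return levelOf(current);
}

const fortran = ["if (.true.) &", '  write (*,*) "foo"'];
console.log(suggestNextLevel(fortran, 0, fortranRules)); // 1
console.log(suggestNextLevel(fortran, 1, fortranRules)); // 0
```

The C example would use a different pattern, for instance `/^\s*if\b.*\)\s*$/`, with the same function.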

I am also looking into the overrides package, but have not been able to get it to do what I want yet.


#2

I tried to find where auto-indenting is handled in Atom by running `grep -r "increaseIndentPattern" ./`, and the only thing that popped up was the package for converting TextMate grammar packages. Does anyone know where Atom handles auto-indenting?


#3

In some other programming languages, such as SAS, the closing statement is optional, so it is sometimes necessary to unindent twice (by two levels at once). For example:

%macro some_macro_function();    
  proc sort data = some_data; 
    by some_variable;
  run;
%mend;

But also this (since the closing statement “run” is optional):

%macro some_macro_function();    
  proc sort data = some_data; 
    by some_variable;
%mend;

I once wrote a custom indentation function in Vim to support this. However, it is impossible to do so in Atom.
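For comparison, the kind of custom function mentioned above can be sketched in JavaScript (which Atom packages can be written in) with a stack of open blocks, which is what lets `%mend` unindent by two levels at once. The keyword regexes are simplified assumptions about SAS syntax, not a complete grammar:

```javascript
// Simplified SAS keyword tests (assumptions, not a full grammar).
const OPEN_MACRO = /^\s*%macro\b/i;
const CLOSE_MACRO = /^\s*%mend\b/i;
const OPEN_STEP = /^\s*(proc|data)\s+\w/i;
const CLOSE_STEP = /^\s*(run|quit)\s*;/i;

// Returns the indentation level of every line, tracking open blocks on a stack.
function indentLevels(lines) {
  const stack = [];
  const levels = [];
  const top = () => stack[stack.length - 1];
  for (const line of lines) {
    if (CLOSE_MACRO.test(line)) {
      // %mend also ends any step left open inside the macro,
      // so it can unindent by more than one level.
      while (stack.length > 0) {
        if (stack.pop() === "macro") break;
      }
      levels.push(stack.length);
    } else if (CLOSE_STEP.test(line)) {
      // run;/quit; sits at the level of the step it closes.
      if (top() === "step") stack.pop();
      levels.push(stack.length);
    } else if (OPEN_STEP.test(line)) {
      // A new step boundary implicitly ends the previous step.
      if (top() === "step") stack.pop();
      levels.push(stack.length);
      stack.push("step");
    } else if (OPEN_MACRO.test(line)) {
      levels.push(stack.length);
      stack.push("macro");
    } else {
      levels.push(stack.length);
    }
  }
  return levels;
}

const withRun = ["%macro some_macro_function();", "proc sort data = some_data;", "by some_variable;", "run;", "%mend;"];
const withoutRun = ["%macro some_macro_function();", "proc sort data = some_data;", "by some_variable;", "%mend;"];
console.log(indentLevels(withRun).join(" ")); // 0 1 2 1 0
console.log(indentLevels(withoutRun).join(" ")); // 0 1 2 0
```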


#4

It is obviously possible to write a package to do this in Atom. What constraints are you assuming to make you say this?


#5

Sorry for the wording; what I meant is that there is no default way to implement such indentation. I am very new to writing packages for Atom, although I am willing to learn and experiment. However, I found that documentation on how to write a language package is lacking. I had to look into other existing language packages myself to get a clue for how to write a package for a new language, not to mention writing an entirely new package for indentation.


#7

I too need functionality such as this. For work I use a BASIC language that does not indent properly with the current limitations; example below.

This is how a case statement in UniBasic should look:

BEGIN CASE
	CASE VAR1 = 1
		READ BLAH FROM F.BLAH,VAR1 THEN
			MORE STUFF
		END
	CASE VAR1 = 2
		READ BLAH FROM F.OTHER.FILE,VAR1 THEN
			DO OTHER MORE STUFF
		END
	CASE 1 * this is the default case
		READ BLAH FROM F.LAST.RESORT,VAR1 THEN
			DO LAST RESORT TYPE STUFF
		END
END CASE

With the grammar I have written, this auto-indents as follows. I have researched this to no avail.

BEGIN CASE
CASE VAR1 = 1
	READ BLAH FROM F.BLAH,VAR1 THEN
		MORE STUFF
	END
CASE VAR1 = 2
	READ BLAH FROM F.OTHER.FILE,VAR1 THEN
		DO OTHER MORE STUFF
	END
CASE 1 * this is the default case
	READ BLAH FROM F.LAST.RESORT,VAR1 THEN
		DO LAST RESORT TYPE STUFF
	END
END CASE

#8

Your use case is supported by increaseIndentPattern and decreaseIndentPattern. You just have to add patterns that match BEGIN CASE and END CASE.


#9

I have set up patterns for BEGIN CASE and END CASE, but it does not seem to work as needed. I have researched this thoroughly and tried many variations of the patterns. I do not want to hijack this thread; would you be able to PM me to assist, if you are certain my case can be done with current features?


#10

I suppose I’ll resurrect this thread because this has been a major annoyance with how I program JavaScript and Atom’s auto-indenting, and I think everyone using Atom could seriously benefit from this feature.

Atom handles its auto-indentation here:

Sublime Text also implements three extra indent patterns:

  • bracketIndentNextLinePattern
    Indents only the next line if this line matches
  • disableIndentNextLinePattern
    Disables bracketIndentNextLinePattern
  • unIndentedLinePattern
    The auto-indenter totally ignores these lines when calculating bracketIndentNextLinePattern (used for comments generally)

See this post for more information (note: I quote most of the problems mentioned in this post later on):

Before you go any further, understand that I started this post thinking I could solve all indentation issues with some added regex rules, which is what most of the post covers. But this is simply not possible, so if you want to see my final solutions, go ahead and jump to the bottom.

A possible (complicated) solution

This post actually details a lot of issues, and I personally don’t agree with the naming scheme of ST’s extra indentation patterns, so I propose that the following be added to Atom (unless one of the alternate solutions I mention at the bottom is more favorable):

  • indentPatterns
    An object with numeric indices indicating by how much to indent (positive index) or unindent (negative index) starting on the next line (note that all regexes will be tested and summed, so if -1 and 2 both match a line, then the total indentation will be 1)
    Has a similar result to the current increaseIndentPattern but allowing indenting/unindenting by multiple levels
  • indentThisLinePatterns
    Similar to indentPatterns, where the numeric indices dictate how much to indent/unindent by, but in this case, the indentation only affects the current line
    Has a similar result to the current decreaseIndentPattern but allowing indenting/unindenting by multiple levels
  • indentFromThisLinePatterns
    Also similar to indentPatterns, and starts the indent/unindent from this line on
    Has a similar effect to the current decreaseIndentPattern but allowing indenting/unindenting by multiple levels
  • indentNextLinePatterns (instead of bracketIndentNextLinePattern)
    Again similar to indentPatterns, where the numeric indices dictate how much to indent/unindent by, but only affects the next line (skipping blank lines)
  • deferIndentPattern
    If this line was going to be indented/unindented by the previous line, then defer that indentation to the next line, keeping this line in-line with the previous line
  • extendIndentPattern
    If this line was going to be indented/unindented by itself or the previous line, then copy that indentation to the next line
  • ignoreIndentPattern
    If this line was going to be indented/unindented by the previous line, then simply ignore it, thereby effectively cancelling the indentation effect of the previous line
  • ignoreLineIndentPattern
    Like ignoreIndentPattern but for indentNextLinePatterns effects only
  • indentUntilPatterns (I don’t like this solution much personally)
    Similarly indexed to indentPatterns, but each indent-level number indexes an array of objects which fit the following schema.
    Note that the line matching the end regex must also be at the indentation level the rule predicts, based on the indentation of the begin-matched line and on indentIncludingBegin and indentIncludingEnd. If any line between the begin-matched and end-matched lines has less indentation than the rule predicts, then the rule fails and does not match. See the example in problem #2.
    • begin: (string) the regex to start the indentation at
    • end: (string) the regex which finishes the indentation
    • indentIncludingBegin: (bool, default false) indicates if the begin-matched line should be indented from this rule
    • indentIncludingEnd: (bool, default true) indicates if the end-matched line should be indented from this rule
  • extendIndentUntilPatterns (this is starting to get really, really complicated…)
    Just like indentUntilPatterns, except that all matched lines will extend their indent pattern like extendIndentPattern; has the following schema.
    • begin: (string) the regex to start continuing the indentation from
    • end: (string) the regex to stop continuing the indentation from
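A minimal sketch of how the first three proposed keys could be evaluated, applied to the UniBasic CASE example from earlier in the thread. The rule values are assumptions, and the order of application (indentFromThisLinePatterns, then indentThisLinePatterns, then indentPatterns) is one reading of the descriptions above, not a settled design:

```javascript
// Hypothetical rules for the UniBasic CASE example, written with the proposed keys.
// Each key is an indentation delta; every matching regex adds its delta.
const unibasicRules = {
  indentPatterns: { 2: [/^\s*BEGIN CASE\b/i], 1: [/\bTHEN\s*$/i] },
  indentThisLinePatterns: { "-1": [/^\s*CASE\b/i] },
  indentFromThisLinePatterns: { "-1": [/^\s*END\s*$/i], "-2": [/^\s*END CASE\b/i] },
};

// Sum the deltas of all regexes in `patterns` that match `line`.
function delta(patterns, line) {
  let total = 0;
  for (const [amount, regexes] of Object.entries(patterns)) {
    for (const re of regexes) {
      if (re.test(line)) total += Number(amount);
    }
  }
  return total;
}

// Re-indent a whole buffer, returning one level per line.
function reindent(lines, rules) {
  let base = 0; // level carried forward to the following lines
  return lines.map((line) => {
    base = Math.max(0, base + delta(rules.indentFromThisLinePatterns, line)); // this line on
    const level = Math.max(0, base + delta(rules.indentThisLinePatterns, line)); // this line only
    base = Math.max(0, base + delta(rules.indentPatterns, line)); // next line on
    return level;
  });
}

const unindented = [
  "BEGIN CASE",
  "CASE VAR1 = 1", "READ BLAH FROM F.BLAH,VAR1 THEN", "MORE STUFF", "END",
  "CASE VAR1 = 2", "READ BLAH FROM F.OTHER.FILE,VAR1 THEN", "DO OTHER MORE STUFF", "END",
  "CASE 1 * this is the default case", "READ BLAH FROM F.LAST.RESORT,VAR1 THEN",
  "DO LAST RESORT TYPE STUFF", "END",
  "END CASE",
];
console.log(reindent(unindented, unibasicRules).join(" "));
// 0 1 2 3 2 1 2 3 2 1 2 3 2 0
```

Summing every matching regex is what gives the "-1 and 2 both match, total 1" behaviour described for indentPatterns.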

How we can try to fix Sublime Text’s issues for Atom

The rest of this post is pretty much dedicated to showing how my proposal would solve the issues Sublime Text has had, including explanations of how the auto-indenters would theoretically parse and indent the included code.

1 - Sometimes the next-line indentation should continue across several lines

[quote]bracketIndentNextLinePattern seems to be cumulative. So if one was to want incomplete statements to be indented, statements spanning 3+ lines get indented too far and are not restored to the original level after a ;

a.b()
  .c()
    .d()
      .e();
    f();
// instead of
a.b()
  .c()
  .d()
  .e();
f();

[/quote]
This can actually be fixed pretty easily using features Sublime Text had already implemented. However, the article seemed to ignore languages like JavaScript, where the semicolon is mostly optional, but indentThisLinePatterns exists to solve this too. Also, the over-indentation of the first f(); is a bug: Sublime Text only searched backwards recursively for unIndentedLinePattern instead of also searching for bracketIndentNextLinePattern. As long as we implement recursive back-searching through indentNextLinePatterns, indentThisLinePatterns, deferIndentPattern, and extendIndentPattern, we can avoid this bug entirely.

// How the ST article coded the regex

a.b()                           // missing semicolon -> indent next
  .c()                          // missing semicolon -> indent next
    .d()                        // missing semicolon -> indent next
      .e();                     // existing semicolon -> default behavior (consume 'indent next')
    f();                        // existing semicolon -> default behavior (consume 'indent next')
//  ^ note: this line's over-indentation is just a bug in ST


// How the ST article could have coded the regex

a.b()                           // no starting dot & missing semicolon -> indent next
  .c()                          // starts with dot & missing semicolon -> defer indentation
  .d()                          // starts with dot & missing semicolon -> defer indentation
  .e();                         // existing semicolon -> default behavior (consume 'indent next')
f();                            // existing semicolon -> default behavior


// Why this new regex coding fails for JavaScript

// The following is valid JavaScript code
a.b()
  .c()
  .d()
  .e()
f();

// Yet the new regex coding would produce the following indentation:
a.b()                           // no starting dot & missing semicolon -> indent next
  .c()                          // starts with dot & missing semicolon -> defer indentation
  .d()                          // starts with dot & missing semicolon -> defer indentation
  .e()                          // no starting dot & missing semicolon -> indent next
    f();                        // existing semicolon -> default behavior (consume 'indent next')


// How indentThisLinePatterns can solve this issue

a.b()                           // no starting dot -> default behavior
  .c()                          // starts with dot -> indent this
  .d()                          // starts with dot -> indent this
  .e()                          // starts with dot -> indent this
f();                            // no starting dot -> default behavior
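The last variant works because indentThisLinePatterns carries no state from one line to the next, so a missing semicolon on `.e()` cannot leak into `f();`. A minimal sketch, assuming a hypothetical rule that indents any line starting with `.` by one level:

```javascript
// Hypothetical rule: a line starting with "." gets one extra level, for itself only.
const indentThisLinePatterns = { 1: [/^\s*\./] };

function chainLevels(lines, base = 0) {
  return lines.map((line) => {
    let extra = 0;
    for (const [amount, regexes] of Object.entries(indentThisLinePatterns)) {
      for (const re of regexes) if (re.test(line)) extra += Number(amount);
    }
    // `base` is never modified, so nothing carries over to the next line.
    return base + extra;
  });
}

console.log(chainLevels(["a.b()", ".c()", ".d()", ".e()", "f();"]).join(" ")); // 0 1 1 1 0
```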

2 - Sometimes lines have to be indented by multiple levels

[quote]Currently, it is only possible to automatically adjust indentation one level at a time, which affects switch statements https://forum.sublimetext.com/t/configure-auto-indent-with-multiple-scopes-per-line/22408
[/quote]
This is where replacing increaseIndentPattern and decreaseIndentPattern with just indentPatterns comes in handy, because it allows a line that matches a single regex to be indented or unindented by multiple levels.

However, implementing multiple indentation levels does not fix switch statements. Examples below:

// Switch statement using case, default, return, and break statements for
// indentation, which also fails
switch(expr) {                  // opening bracket -> indent 1
  case 'foo':                   // case statement -> indent 1
    aFunction();                // default behavior
    
    case 'bar':                 // case statement -> indent 1
      return;                   // return statement -> indent -1
    
    default:                    // default statement -> indent 1
      anotherFunction();        // default behavior
      break;                    // break statement -> indent -1
  }                             // closing bracket -> indent from this -1
// This version almost works, except when case statements are chained together,
// and it also makes the following example fail

// Example of normal function (starting at 2 indents)
    function foo() {            // opening bracket -> indent 1
      while(true)               // while statement w/o bracket -> indent next 1
        break;                  // default behavior

      return 'bar';             // default behavior
    }                           // closing bracket -> indent from this -1

// What would happen if break and return unindented (starting at 2 indents)
    function foo() {            // opening bracket -> indent 1
      while(true)               // while statement w/o bracket -> indent next 1
        break;                  // break statement -> indent -1

    return 'bar';               // return statement -> indent -1
}                               // closing bracket -> indent from this -1
// This obviously doesn't work, because break and return statements have to be in
// the right context to unindent properly, hence why I propose the following:


// Switch statement using case, default, break, and return statements for
// indentation, but with indentUntilPatterns
switch(expr) {                  // opening bracket -> indent 1
  case 'foo':                   // case statement -> indent 1 until break or return statement
    aFunction();                // default behavior
    break;                      // break statement -> indent -1 because case's 'until'
    
  case 'bar':                   // case statement -> indent 1 until ...
    return;                     // return statement -> indent -1 because case's 'until'

  default:                      // default statement -> indent 1 until ...
    anotherFunction();          // default behavior
    break;                      // break statement -> indent -1 because case's 'until'
}                               // closing bracket -> indent from this -1
// Note that each case/default statement only indents until it finds a return/
// break statement at the correct indentation level, meaning that breaks/returns
// embedded in an if statement or whatnot won't undo the indentation. Also note
// that a recursive back-search is made for every break or return statement. My
// example here still has issues if there isn't a break or return statement at 
// the end of the branch, like so:

// Switch statement like above, but broken without the break statement
switch(expr) {                  // opening bracket -> indent 1
  case 'foo':                   // case statement -> indent 1 until break or return statement
    aFunction();                // default behavior
    break;                      // break statement -> indent -1 because case's 'until'
    
  case 'bar':                   // case statement -> indent 1 until ...
    return;                     // return statement -> indent -1 because case's 'until'

  default:                      // default statement -> indent 1 until ...
    anotherFunction();          // default behavior
  }                             // closing bracket -> indent from this -1
//^ In this case there isn't anything to tell the auto-indenter that the
//  branch's indentation should be cancelled out by the closing bracket, or if
//  you included closing brackets in the 'until' clause of the case statements,
//  then every time you type a '}' there could be a huge overhead searching back
//  for the line that matches the first part of the clause, and there's issues
//  with determining when to stop searching

Really, as far as I can tell, there isn’t a way to make switch statements work perfectly without something like VS Code’s language servers or a very interesting indentation implementation based on syntax highlighting, because the auto-indenter essentially needs to understand the basics of whichever language it’s working on.

3 - That one Sublime Text bug again

[quote]a combination of the above two points, the same applies to multiple if statements without braces:

if (true)
  if (false)
    cool();
  this_should_be_one_level_backwards();

[/quote]
Again, this is just a bug with ST and could be fixed by recursively back-searching through any lines which temporarily affect the indentation.
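A minimal sketch of that recursive back-search, assuming 2-space soft tabs and a hypothetical indent-next-line regex for braceless `if`/`for`/`while` headers and a bare `else` (it does not handle `else if` or comments):

```javascript
// Hypothetical indent-next-line rule: a braceless if/for/while header or a bare else.
const indentNextLine = /^\s*(if|for|while)\b.*\)\s*$|^\s*else\s*$/;

// Indentation level of a line, assuming 2-space soft tabs.
const levelOf = (line) => Math.floor(line.match(/^ */)[0].length / 2);

// Level a line would have without temporary next-line indents: keep walking
// back while the previous line is itself an indent-next-line header.
function baseLevel(lines, row) {
  if (row > 0 && indentNextLine.test(lines[row - 1])) return baseLevel(lines, row - 1);
  return levelOf(lines[row]);
}

// Suggested level for a new line typed after `row`.
function suggestNextLevel(lines, row) {
  if (indentNextLine.test(lines[row])) return levelOf(lines[row]) + 1;
  return baseLevel(lines, row);
}

const nested = ["if (true)", "  if (false)", "    cool();"];
console.log(suggestNextLevel(nested, 2)); // 0, not 1
```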

4, 5, and 6 - Sublime Text bugs that don’t affect Atom (I think)

7 - Sometimes multiline structures have to continue the next-line indentation from the previous line

[quote]Because the scope selectors operate at EOL [end of line], and the regular expressions only have the one line of context to match against, it is not possible to reliably skip block comments that don’t cover the new line character. i.e. using a <scope>comment</scope> selector and matching <key>unIndentedLinePattern</key><string>.</string> would be a generic solution that would prevent each language from needing to override this regex pattern (probably in the past, single line comments didn’t scope the \n character, so this technique couldn’t be used), but still causes e.g.

if (true)
  /* test
  example */

Enter to lose the indentation from bracketIndentNextLinePattern after the comment because one can’t guarantee that */\s*$ ended a comment
[/quote]
However, when given the tools, one can check previous lines for the characters that denote the start of a comment, as with extendIndentUntilPatterns, but this has a variety of issues. It’s not worth going through a ton of examples, but basically, if the comment’s start and end lines are at different indentation levels than predicted, then the auto-indenter won’t know to end the next-line indentation. Alternatively, if we scrapped the indent-level checking (at the cost of a possibly very large overhead, since the entire file might be scanned just for a single line), then the auto-indenter would have to understand how strings work, including multiline strings. It’s a huge mess.
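To illustrate the string problem: a naive back-search that looks for the nearest `/*` or `*/` in the preceding text, with no knowledge of strings or line comments, is easily fooled. This is a hypothetical helper, not something Atom or Sublime Text does:

```javascript
// Hypothetical naive check: does line `row` end a block comment? It only looks
// for the nearest "/*" or "*/" in the preceding text and knows nothing about
// strings or line comments.
function naiveEndsComment(lines, row) {
  if (!/\*\/\s*$/.test(lines[row])) return false;
  const text = lines.slice(0, row + 1).join("\n");
  const close = text.lastIndexOf("*/");
  const open = text.lastIndexOf("/*", close - 1);
  const earlierClose = text.lastIndexOf("*/", close - 1);
  return open > earlierClose; // an unmatched "/*" precedes the final "*/"
}

// Correct: this really is the end of a block comment.
const real = ["if (true)", "  /* test", "  example */"];
console.log(naiveEndsComment(real, 2)); // true

// Wrong: the "/*" is inside a string and the "*/" is inside a line comment.
const fooled = ["if (true)", '  log("/*");', "  x = 1; // */"];
console.log(naiveEndsComment(fooled, 2)); // true
```

Getting the second case right requires tokenizing strings and line comments, i.e. knowing the language's syntax.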

8 - Multiline if statements

[quote]related to some of the items mentioned already is batch reindenting multi-line if statements. Unless the syntax definition has a unique meta scope on the if, it’s hard to use regexes that would handle this correctly based on a single line of context - see http://stackoverflow.com/questions/41571959/sublime-text-3-indentation-for-multi-line-statements-in-php

    if (VeryLongThingThatTakesUpALotOfRoom ||
        OtherQuiteLongThingSoINeedTwoLines) {
      statement1();
      statement2();
    }

reindents to

    if (VeryLongThingThatTakesUpALotOfRoom ||
      OtherQuiteLongThingSoINeedTwoLines {
    statement1();
  statement2();
}

[/quote]
This one actually is possible to solve with the solution I’ve proposed above, and the example below is an even more complicated version. But again, it doesn’t work too well when multiline strings are involved, or, in this case, when part of the fragmented if statement neither begins nor ends with an operator. Looking at the example, you should be able to figure out why (I’m getting lazy with my examples).

if (VeryLongThingThatTakesUpALotOfRoom        // begin of fragmented if statement -> indent next 2
    || OtherQuiteLongThingSoINeedTwoLines ||  // continuation of expression -> extend indent
    YetAnotherLongThingToGetThreeLines ||     // continuation of expression -> extend indent
    AndAnotherSoICanMakeYetAnotherMess) {     // end of fragmented if statement -> indent -1
  statement1();                               // default behavior
  statement2();                               // default behavior
}                                             // closing bracket -> indent from this -1

9 - If implemented properly, not an issue we need to worry about

10 - Not something relevant to Atom I don’t think

11 - Technically relevant to Atom, but not the auto-indenter

12 - Using nonconforming soft tab sizes

I’m not even going to quote this one. All that needs to happen to make sure this bug never exists in Atom is to treat incomplete tabs (like 1 space when it should be 4) as full tabs AND correct their spacing to conform. The exception is indentation inside a comment or string, which brings back the earlier problem of needing to know the full syntax of the language. Sublime Text doesn’t seem to have done the second part (correcting the spacing).
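A minimal sketch of that normalization for a single line, assuming a tab length of 4, rounding a partial soft tab up to a full level, and ignoring the comment/string caveat:

```javascript
// Convert leading whitespace to whole indentation levels, rounding a partial
// soft tab up to a full level, then rewrite it with consistent spaces.
// Comments and strings are not special-cased here.
function normalizeIndent(line, tabLength = 4) {
  const [leading] = line.match(/^[ \t]*/);
  let columns = 0;
  for (const ch of leading) {
    // A hard tab advances to the next tab stop; a space advances one column.
    columns = ch === "\t" ? (Math.floor(columns / tabLength) + 1) * tabLength : columns + 1;
  }
  const levels = Math.ceil(columns / tabLength);
  return " ".repeat(levels * tabLength) + line.slice(leading.length);
}

console.log(JSON.stringify(normalizeIndent(" foo")));     // "    foo"
console.log(JSON.stringify(normalizeIndent("     foo"))); // "        foo"
```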

Alternate Solutions

Frankly, I just don’t think that completely reinventing the regex auto-indent system is a good idea, because relying on just single-line contexts is not going to cover all scenarios. However, for all of the following alternate solutions, I believe we should keep backwards-compatibility at least for a while.

If we still wanted to use the regex system, we could use a few of the rules I have suggested above (definitely excluding the last two if at all possible) in combination with the grammar scopes. This means that in the grammar files we would make the patterns mark things like switch branches with a meta scope name, which could then have overriding or compounding indentation rules (there would probably have to be a setting or something for override vs compound). These meta scopes are already supported by Atom, but I think more specific scope selectors override less specific ones right now, just like in CSS.
Also, right now, the scoped settings come from the first token of the line (like switch, for example) instead of specifically where the cursor is. Do you think this should be changed or stay as is?

Or for another solution, what if we open up Atom’s suggestedIndentForTokenizedLineAtBufferRow (see the top of the post) to Atom language packages through a service like the ones Linter provides? This would allow languages to implement their own fixes for the unsolved problems above, like switch statements (#2) and multiline comments extending a one-line indent (#7).

Maybe we should add a text-edit notification to the service so that the language can cache an AST-like representation of the code? But this has the potential to get complicated with embedded languages, like the JavaScript or CSS sections of an HTML document.
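Atom packages can already exchange objects through the `providedServices` and `consumedServices` fields of `package.json`. Below is a minimal, framework-free sketch of the registry a base package might keep for such a service; the provider shape (`grammarScopes`, `suggestIndent`, and the optional `didChangeText` edit notification) is hypothetical:

```javascript
// Hypothetical registry kept by a base package; none of these names are Atom APIs.
class IndentProviderRegistry {
  constructor() {
    this.providers = [];
  }

  // provider: { grammarScopes: string[],
  //             suggestIndent(lines, row): number | undefined,
  //             didChangeText?(lines): void }
  // Returns a function that unregisters the provider.
  add(provider) {
    this.providers.push(provider);
    return () => {
      this.providers = this.providers.filter((p) => p !== provider);
    };
  }

  // Text-edit notification, so providers can refresh a cached AST-like model.
  notifyChange(lines) {
    for (const p of this.providers) {
      if (typeof p.didChangeText === "function") p.didChangeText(lines);
    }
  }

  // Ask providers registered for `scope` (e.g. the scope at the cursor, which
  // may be an embedded language); undefined means "use the regex rules".
  suggestIndent(scope, lines, row) {
    for (const p of this.providers) {
      if (!p.grammarScopes.includes(scope)) continue;
      const level = p.suggestIndent(lines, row);
      if (level !== undefined) return level;
    }
    return undefined;
  }
}

const registry = new IndentProviderRegistry();
registry.add({
  grammarScopes: ["source.js"],
  // Toy rule: indent one level after a line ending in "{".
  suggestIndent: (lines, row) => (row > 0 && /\{\s*$/.test(lines[row - 1]) ? 1 : 0),
});
console.log(registry.suggestIndent("source.js", ["function f() {", ""], 1)); // 1
console.log(registry.suggestIndent("source.css", ["a {", ""], 1)); // undefined
```

Selecting the provider by the scope at the cursor, rather than by the file's grammar, is one way to handle the embedded-language case.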

Finally,

I really want to add these features to Atom and its languages, but before I even try to add any pull requests I’d really like some serious feedback on my proposals. So then, what are your thoughts on all of this?


#11

I am glad you resurrected this; I never found a solution for my scenario. I will provide feedback as requested soon, and would love to participate in the development if you would like help.


#12

Bump… :slight_smile:

Also, I will totally accept some help, especially because I’ll be starting a programming job in two weeks (wohooo! self-taught ftw) and I won’t have as much free time anymore.


#13

I’m going to go ahead and expose suggestedIndentForTokenizedLineAtBufferRow (but with a shorter name) through a service in a package I’m calling autoindent-plus. My goal is to create a base package like linter so that other packages can hook into it and easily apply their own indentation rules. In fact, there could be different packages for different languages and coding styles. At the moment I’m going to start with JavaScript and PHP, because I use them the most right now, but I want to eventually add more.

Also, with the idea of different coding styles, would it make sense to do an optional ‘first-line match’ kinda thing like how language packages can auto-select the correct language (this seems a bit wrong because it would be adding arbitrary and unconventional extensions to an already standardized convention)? Or else, should the user only have global settings for coding style per language? Regardless, the global settings should exist.
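One way to combine both options, sketched with hypothetical names: a style's `firstLineMatch` regex (borrowing the idea from grammar first-line matching, with an invented `indent-style:` marker) is tried first, and the user's global per-language setting is the fallback:

```javascript
// Hypothetical style descriptors; the "indent-style:" marker is invented for illustration.
const styles = [
  { name: "allman", firstLineMatch: /indent-style:\s*allman\b/ },
  { name: "k&r", firstLineMatch: /indent-style:\s*k&r/ },
];

// Hypothetical global per-language defaults, which exist regardless.
const globalStyle = { javascript: "k&r", php: "allman" };

// A first-line match wins; otherwise fall back to the global setting.
function pickStyle(language, firstLine) {
  const match = styles.find((s) => s.firstLineMatch.test(firstLine));
  return match ? match.name : globalStyle[language];
}

console.log(pickStyle("javascript", "// indent-style: allman")); // allman
console.log(pickStyle("javascript", "'use strict';")); // k&r
```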