Sequential match rules in a language grammar


#1

I’ve been trying to figure out how to write language grammars for Atom, and it’s… very confusing.

I found the topic “Sequential Includes in Grammars”, but it doesn’t seem to answer my question, so here goes:

Say I have this:

[
     value;
     test;
] {
     fudge;
     cake;
}

A [] section can only come before a {} section, and the contents of both can span multiple lines.

However, it seems like everything you can express in TextMate-style grammars is either unordered (any one of several alternatives) or nested (what I think of as the z direction).
By that I mean you can do something like this:

any one of the following things:
     value
     test
     cake
     fudge

And you can also say:

any one of the following things:
     {
         begin: '['
         end: ']'
         contents can be any one of the following things:
              value
              test
     }
     {
         begin: '{'
         end: '}'
         contents can be any one of the following things:
              cake
              fudge
     }

But it doesn’t seem like you can do this:

these things **in this order**:
     {
         begin: '['
         end: ']'
         contents can be any one of the following things:
              value
              test
     }
     {
         begin: '{'
         end: '}'
         contents can be any one of the following things:
              cake
              fudge
     }

Unless I’m missing something?

The reason I need begin/end match rules, rather than one giant regex that matches the whole thing, is that a regex is only ever given the current line.
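To make that limitation concrete, here is a small Python illustration (it uses Python’s re, not the Oniguruma engine Atom’s grammars use). WHOLE is a made-up stand-in for the “giant regex”: it finds the construct when it is given the whole text, but finds nothing when it is run on one line at a time, which is how a single match rule gets applied.

```python
import re

SOURCE = "\n".join([
    "[",
    "     value;",
    "     test;",
    "] {",
    "     fudge;",
    "     cake;",
    "}",
])

# Hypothetical "giant regex" for the whole [...] {...} construct.
# The negated classes [^\]]* and [^}]* also match newlines.
WHOLE = re.compile(r"\[[^\]]*\]\s*\{[^}]*\}")

# Given the whole text at once, the pattern finds the construct.
whole_text_match = WHOLE.search(SOURCE) is not None

# Given one line at a time, no single line contains the full construct.
per_line_matches = [line for line in SOURCE.split("\n") if WHOLE.search(line)]

print(whole_text_match)  # True
print(per_line_matches)  # []
```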


#2

To write a rule that highlights a multi-line section like the one you gave

[
     value;
     test;
] {
     fudge;
     cake;
}

consider the following modification to the rules you gave:

{
  'begin': '(?=\\s*\\[)'
  'end': '(?=\\n)'
  'patterns':[
    {
      'comment': 'matches a [] section as long as it is the first match'
      'begin': '\\G\\s*\\['
      'end': '\\]'
      'patterns':[
        # rules for matching things inside of this [] section
      ]
    }
    {
      'comment': 'matches a {} section immediately following a [] section'
      'begin': '(?<=\\])\\s*{'
      'end': '}'
      'patterns':[
        # rules for matching things inside of this {} section
      ]
    }
  ]
}

This additional rule creates a wrapper around your []{} code that lets the {} section match only if it immediately follows the closing ] (optionally after whitespace) on the same line. If a {} section starts on the line after the closing ], the outer “wrapper” will already have closed at the end of the ] line, because its end pattern (?=\n) matches there.
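The wrapper’s behavior can be modeled with a small rule stack. The Python below is a rough sketch, not Atom’s actual tokenizer: it uses Python’s re instead of Oniguruma, and because re has no \G, the bracket rule’s leading \G is approximated by a hypothetical anchored flag that only lets the rule match where the previous match on the same line ended. The Rule class, the rule names, and the (event, rule, line) tuples are likewise made up for illustration. At each position the sketch takes the earliest match among the top rule’s end pattern and that rule’s own sub-patterns (letting the end pattern win ties); rules lower in the stack are not consulted at all.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    begin: str
    end: Optional[str]                   # None only for the root
    anchored: bool = False               # hypothetical stand-in for a leading \G
    patterns: List["Rule"] = field(default_factory=list)


bracket = Rule("bracket", r"\s*\[", r"\]", anchored=True)  # \G\s*\[ ... \]
brace = Rule("brace", r"(?<=\])\s*\{", r"\}")              # (?<=\])\s*{ ... }
wrapper = Rule("wrapper", r"(?=\s*\[)", r"(?=\n)", patterns=[bracket, brace])
root = Rule("root", "", None, patterns=[wrapper])


def tokenize(text: str, grammar: Rule) -> List[Tuple[str, str, int]]:
    """Return (event, rule name, line number) each time a rule opens or closes."""
    events = []
    stack = [grammar]
    for lineno, raw in enumerate(text.split("\n"), start=1):
        line = raw + "\n"        # keep the newline that the (?=\n) end relies on
        pos = 0
        anchor = None            # where the previous match on this line ended
        while pos < len(line):
            top = stack[-1]
            # Only the top rule's end and its own sub-patterns are candidates;
            # an outer rule's end (e.g. the wrapper's) is never tried here.
            candidates = []
            if top.end is not None:
                candidates.append(("close", top, re.compile(top.end)))
            for child in top.patterns:
                candidates.append(("open", child, re.compile(child.begin)))
            best = None
            for kind, rule, rx in candidates:
                if kind == "open" and rule.anchored:
                    m = rx.match(line, pos) if pos == anchor else None
                else:
                    m = rx.search(line, pos)
                # Earliest match wins; on a tie the end pattern (listed first) is kept.
                if m and (best is None or m.start() < best[2].start()):
                    best = (kind, rule, m)
            if best is None:
                break            # nothing else can match on this line
            kind, rule, m = best
            if kind == "open":
                stack.append(rule)
            else:
                stack.pop()
            events.append((kind, rule.name, lineno))
            pos = anchor = m.end()
    return events


SAME_LINE = "[\n     value;\n     test;\n] {\n     fudge;\n     cake;\n}"
NEXT_LINE = "[\n     value;\n]\n{\n     cake;\n}"

for label, text in (("same line", SAME_LINE), ("next line", NEXT_LINE)):
    print(label, tokenize(text, root))
```

Run on the example from the question, the wrapper’s (?=\n) end is never tried on lines 1 to 3 because the [] rule sits on top of the stack; the {} rule opens on line 4 and the wrapper closes on line 7. With the { moved to its own line (NEXT_LINE), the wrapper closes at the end of the ] line and the {} rule never opens.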


#3

It took me a while to understand how this works, but I think I get it now. I’m assuming the inner patterns keep going until they reach their end, and only then is the outer pattern allowed to complete? Previously I assumed that the outer pattern would complete first and the inner patterns would then be matched against its contents, but this makes way more sense.


#4

That’s correct. An inner pattern must always finish before the outer pattern can. Once an inner begin/end rule has started, only its own end pattern and its own sub-patterns are tried, so the outer rule’s end cannot match until the inner rule has closed. You can use this, along with similar tricks, to make fairly complex multi-line rules.