Grammar: Understanding captures vs. name


#1

This might just be hacky code, but I have two questions (also posted here: https://github.com/olegbl/language-typescript/issues/9).

Is captures even required?

  {
    'captures':
      '1':
        'name': 'storage.type.variable.ts'
    'comment': 'Match this.'
    'match': '\\b(this)\\.'
    'name': ''
  }

As I understand it, name is just a shortcut for captures (https://atom.io/docs/latest/creating-a-package#language-grammars), so you could write:

  {
    'comment': 'Match this.'
    'match': '\\b(this)\\.'
    'name': 'storage.type.variable.ts'
  }

Is there any reason to use captures here?

Using captures with name

For the following case:

  {
    'captures':
      '1':
        'name': 'keyword.operator.ts'
      '2':
        'name': 'variable.parameter.function.ts'
    'comment': 'Match stuff like: module name {...}'
    'match': '\\b(module)\\s*(\\s*[a-zA-Z0-9_?.$][\\w?.$]*)\\s*'
    'name': 'meta.function.ts'
  }

I think the first group becomes keyword.operator.ts, the second group becomes variable.parameter.function.ts, and the whole match becomes meta.function.ts. Is this correct?


#2

That is almost correct. To understand the difference, a bit of RegExp knowledge is necessary:

Expressions between parentheses () in a RegExp are called capture groups, and the captures hash in a grammar refers to those groups by number ('1', '2', ...).
The global name field refers to the whole string matched by the RegExp.
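To make the mapping concrete, here is a small sketch in Python using the standard re module (an assumption: Atom's grammars run on the Oniguruma engine, but for the patterns in this thread the syntax and behavior agree). The hypothetical helper scopes_for applies a rule to one line of text and lists which scope each field would assign: name gets group 0, the whole match, and each key of captures gets the numbered group with the same index. It is a simplified model, not Atom's actual tokenizer; real scopes nest rather than form a flat list.

```python
import re


def scopes_for(rule, text):
    """Hypothetical helper: list (scope, text) pairs for the first match of a
    single-line grammar rule. Simplified model, not Atom's real tokenizer."""
    m = re.search(rule["match"], text)
    if m is None:
        return []
    pairs = []
    # 'name' applies to group 0, i.e. the whole match (skipped when empty).
    if rule.get("name"):
        pairs.append((rule["name"], m.group(0)))
    # Each 'captures' key applies to the capture group with that number.
    for index in sorted(rule.get("captures", {}), key=int):
        group_text = m.group(int(index))
        if group_text is not None:  # groups that did not participate get no scope
            pairs.append((rule["captures"][index]["name"], group_text))
    return pairs


# The 'module' rule from the question, with CSON's doubled backslashes undone.
module_rule = {
    "captures": {
        "1": {"name": "keyword.operator.ts"},
        "2": {"name": "variable.parameter.function.ts"},
    },
    "match": r"\b(module)\s*(\s*[a-zA-Z0-9_?.$][\w?.$]*)\s*",
    "name": "meta.function.ts",
}

for scope, matched in scopes_for(module_rule, "module foo {"):
    print(f"{scope:32} {matched!r}")
```

Run on module foo {, it reports meta.function.ts for 'module foo ' (the trailing \s* pulls in the space), keyword.operator.ts for 'module' and variable.parameter.function.ts for 'foo': the captures sit inside the region covered by name.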

With that in mind, let's look at the following rule from the language-coffee-script grammar:

{
    'captures':
      '1':
        'name': 'variable.parameter.function.coffee'
      '2':
        'name': 'storage.type.function.coffee'
    'comment': 'match stuff like: a -> … '
    'match': '(\\([^()]*?\\))?\\s*([=-]>)'
    'name': 'meta.inline.function.coffee'
}

This rule will match both -> and (arg) ->.
Both are valid inline function declarations in CoffeeScript, but only the second one declares arguments.
We want to render the function arguments and the function arrow differently, but we also want the whole expression to be identified as a CoffeeScript function, so we use capture groups to isolate the arguments from the arrow.
With the name field we say "The whole expression, whether there are arguments or not, is a function", and with the captures field we say "But we want the arguments and the arrow, captured in groups, to be colored differently".
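Matching both forms with one rule relies on the argument group being optional (the ? after the first group). A quick check with Python's re module, used here as a stand-in for Atom's regex engine, shows what each group holds: when there are no arguments, group 1 simply does not participate (re reports None), so only the arrow gets a capture scope while name still covers the whole match.

```python
import re

# Inline-function pattern with an optional argument group, CSON escaping undone.
INLINE_FUNCTION = re.compile(r"(\([^()]*?\))?\s*([=-]>)")

with_args = INLINE_FUNCTION.search("square = (x) -> x * x")
no_args = INLINE_FUNCTION.search("noop = ->")

for label, m in [("with args", with_args), ("no args", no_args)]:
    # group(0) -> 'name' scope, group(1) -> capture '1', group(2) -> capture '2'
    print(label, repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))
```

In the no-arguments case the \s* also pulls the preceding space into the whole match, so the name scope starts one character before the arrow.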

We could go further and use this rule instead:

{
    'captures':
      '1':
        'name': 'variable.parameter.function.coffee'
      '3':
        'name': 'storage.type.function.coffee'
      '4':
        'name': 'storage.type.function.bound.coffee'
    'comment': 'match stuff like: a -> … '
    'match': '(\\([^()]*?\\))?\\s*((->)|(=>))'
    'name': 'meta.inline.function.coffee'
}

Here I replaced the ([=-]>) part with ((->)|(=>)).
That way I get simple functions (->) and bound functions (=>) in two different capture groups (#3 and #4).
Capture group #2 wraps both #3 and #4; it is only needed so that the | alternation applies to -> and =>. It doesn't need its own token, so it doesn't appear in the captures field.
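Capture groups are numbered by the position of their opening parenthesis, which is why the outer ((->)|(=>)) is #2 and the two alternatives are #3 and #4. A small check with Python's re module (again a stand-in for Atom's engine, with the argument group assumed optional) shows that only one of #3 and #4 participates in any given match, so each arrow type receives its own scope, while #2, having no captures entry, receives none.

```python
import re

# Second version of the rule, argument group optional, CSON escaping undone.
ARROWS = re.compile(r"(\([^()]*?\))?\s*((->)|(=>))")

for source in ["(a) -> a", "(a) => a"]:
    m = ARROWS.search(source)
    # Numbered by opening parenthesis: 1 = args, 2 = either arrow,
    # 3 = simple arrow '->', 4 = bound arrow '=>'.
    print(source, [m.group(i) for i in range(1, 5)])
```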

Now, coming back to your first example: the rule tries to match the this keyword when it is used to access one of its properties.
The \b(this)\. expression will match this. but neither athis. nor a this that isn't followed by a dot. But the author probably doesn't want the dot to be grouped and colored together with this, so the whole expression is left with an empty name and only the capture group containing this gets a name.
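Checking the original pattern with Python's re module (standing in for Atom's engine) makes the trade-off visible: the whole match, which the name field would color, includes the dot, whereas group 1 holds only this. That is why the rule leaves name empty and scopes only capture 1.

```python
import re

# The rule from the question, CSON escaping undone.
THIS_DOT = re.compile(r"\b(this)\.")

m = THIS_DOT.search("this.value")
print(repr(m.group(0)), repr(m.group(1)))  # whole match vs. capture group 1

# \b rejects 'this' inside a longer word; \. rejects 'this' without a dot.
print(THIS_DOT.search("athis.value"), THIS_DOT.search("this value"))
```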

A better way to write this rule would be to use a positive lookahead ((?=<expr>)), so that the dot is not included in the match while its presence is still tested:

{
  'comment': 'Match this.'
  'match': '\\bthis(?=\\.)'
  'name': 'storage.type.variable.ts'
}

Here the positive lookahead ensures that the rule only matches this when it is followed by a dot, without including the dot in the match. Since the dot isn't part of the match, we can use the global name field and drop the captures field.
The \b part tests for a word boundary, not an actual character, so it consumes nothing and doesn't need to be isolated from this.
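The same check for the lookahead version (Python's re as a stand-in for Atom's engine, which also supports (?=...)) confirms that the match is exactly this: the dot is tested but not consumed, \b adds no characters, and since the pattern has no capture groups, the name field is the only way to give it a scope.

```python
import re

# Lookahead version of the rule, CSON escaping undone.
THIS_BEFORE_DOT = re.compile(r"\bthis(?=\.)")

m = THIS_BEFORE_DOT.search("this.value")
# The dot is checked by (?=\.) but is not part of the match; there are no groups.
print(repr(m.group(0)), m.span(), THIS_BEFORE_DOT.groups)

for source in ["athis.value", "this value"]:
    print(source, THIS_BEFORE_DOT.search(source))
```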

I hope this little demonstration helps you understand the difference between the name and captures fields in a grammar rule.