Grammar syntax hierarchy?


#1

Hi guys,

I am writing a grammar package for a custom log file.
I would like to detect lines with a certain format such as:

  • 123 CCC {BBB} => I want to colorize 123
  • 123 WWWW-SSSS => I don't want to colorize 123

I am using this pattern:

  {
    match: '\\s*([0-9]*)\\s*(?:[A-Z]*)\\s*{(?:[A-Z]*)}'
    name: 'start.line.mypckge'
  }

It seems Atom doesn't ignore the non-capturing groups. Why does it apply the name to the whole match (group 0) instead of only group 1?

I tried:

{
    match: '\\s*([0-9]*)\\s*(?:[A-Z]*)\\s*{(?:[A-Z]*)}'
    captures:
      1:
        name: 'start.line.mypckge'
}

But my other rules, which colorize CCC, {BBB}, and WWWW-SSSS, still don't work.

I don't understand the grammar package's mechanism very well.

Thank you,


#2

You need captures for groups 2 and 3 (which means making those groups capturing) in order to highlight the rest of that segment. Once Atom has matched a segment of text with one rule, it stops looking for other matches inside that segment. If you want to apply a separate pattern inside your current one, you have to use nested patterns inside the current pattern. However, that's not necessary in this case, where you can just use three captures entries.
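A minimal sketch of the three-captures approach, assuming the two non-capturing groups are turned into capturing ones. The scope names for groups 2 and 3 are hypothetical placeholders, not anything Atom requires:

```coffee
{
  # Same shape as the original rule, but all three parts are now capturing
  # groups, so each one can receive its own scope through `captures`.
  match: '\\s*([0-9]*)\\s*([A-Z]*)\\s*(\\{[A-Z]*\\})'
  captures:
    '1':
      name: 'start.line.mypckge'
    '2':
      name: 'keyword.other.mypckge'    # hypothetical scope for CCC
    '3':
      name: 'string.other.mypckge'     # hypothetical scope for {BBB}
}
```

Because this rule still consumes the whole `123 CCC {BBB}`, no other top-level rule gets a chance to match inside that text.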


#3

There are no groups 2 and 3, only group 1. The last two groups are both non-capturing.

Even if I made them groups 2 and 3, my other pattern, which matches CCC {BBB} as a unit, still wouldn't work, because the two parts would already have been separated…

I don't see why Atom doesn't respect the behavior of non-capturing groups. There seems to be no way to reuse text that was consumed by a match, even when that text lies outside the capturing group.


#4

If CCC {BBB} is supposed to be a single token, then you should combine those parts of the regex into a single capturing group and just use 1 and 2.
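Following that suggestion, the rule might look roughly like this. It is a sketch only, and `meta.block.mypckge` is a hypothetical scope name:

```coffee
{
  # Group 1: the leading number. Group 2: "CCC {BBB}" as a single token.
  match: '\\s*([0-9]*)\\s*([A-Z]*\\s*\\{[A-Z]*\\})'
  captures:
    '1':
      name: 'start.line.mypckge'
    '2':
      name: 'meta.block.mypckge'    # hypothetical scope for "CCC {BBB}"
}
```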


#5

Using your pattern to match 123 CCC {BBB}, your pattern will: see the 123, which correctly matches [0-9]*, then move to CCC, which is matched by [A-Z]*, and finally move to {BBB}, which is matched by {[A-Z]*}. So the whole expression is matched by your pattern, but only 123 is given a scope. Since the entire expression has been matched, Atom continues tokenizing after the end of the whole match, not after the end of the last capture group. Atom is respecting non-capturing groups perfectly in this case (unless I am misunderstanding something). Non-capturing groups do not allow for reuse; they still consume the text they match. The only difference is that they are not assigned a group number, so you cannot give them a specific scope the way you can with ([0-9]*).

If you want Atom to find 123 CCC {BBB} but stop its matching at 123, then you need to use lookaheads.
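A minimal sketch of the lookahead approach, assuming both rules sit in the grammar's top-level `patterns` list; `meta.block.mypckge` is a hypothetical scope name. The first rule consumes only the digits, and the `(?=...)` lookahead checks that `CCC {BBB}` follows without consuming it, so the second rule can still match that text. A line like `123 WWWW-SSSS` has no `{...}` part, so the lookahead fails and its number stays uncolored.

```coffee
patterns: [
  {
    # Matches only the number; the lookahead requires "CCC {BBB}" to follow
    # but does not consume it.
    match: '[0-9]+(?=\\s*[A-Z]+\\s*\\{[A-Z]*\\})'
    name: 'start.line.mypckge'
  }
  {
    # Tokenizing resumes right after the number, so this rule can still
    # match "CCC {BBB}".
    match: '[A-Z]+\\s*\\{[A-Z]*\\}'
    name: 'meta.block.mypckge'    # hypothetical scope
  }
]
```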


#6

Ah, I see. I rarely apply several regexes to the same text, so I didn't know the regex engine behaved like that.

Using a lookahead is a good idea. I think I tried that a year ago and it wasn't supported (or maybe it was a lookbehind or a negative lookahead/lookbehind).

Thank you, DamnedScholar and Wliu!


#7

If you find yourself wanting more complexity, you can look into nested patterns. Here’s an example from a small language package I wrote.
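Independently of that package, here is a hypothetical sketch of nested patterns for the log format in this thread: a `begin`/`end` rule whose own `patterns` list is only tried between the line number and the closing brace. Scope names other than `start.line.mypckge` are made up for illustration.

```coffee
{
  # The region starts at a line number that is followed by "CCC {BBB}";
  # the lookahead guarantees the closing brace exists on the same line.
  begin: '^\\s*([0-9]+)(?=\\s*[A-Z]+\\s*\\{[A-Z]*\\})'
  beginCaptures:
    '1':
      name: 'start.line.mypckge'
  # The region ends at the closing brace.
  end: '\\}'
  name: 'meta.line.mypckge'          # hypothetical scope for the whole region
  # These patterns are only tried inside the region.
  patterns: [
    {
      match: '[A-Z]+'
      name: 'keyword.other.mypckge'  # hypothetical scope for CCC and BBB
    }
  ]
}
```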