How do I define a grammar for a group repeated by a quantifier?


#1

I’m trying to address area/language-latex#14. I want to highlight citations like the following, where the number of [prefixN][suffixN]{keyN} groups is arbitrary:

  \autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}
  \autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}
  \autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}[prefix3][suffix3]{key3}

I tried the following grammar (I haven’t added scopes to all the components yet):

  {
    'match': '(\\\\[aA]utocites)(?:(\\()([^)]*)(\\))){0,2}(?:(?:(\\[)([^\\]]*)(\\])){0,2}(\\{)([^}]*)(\\}))+'
    'captures':
      '1':
        'name': 'keyword.control.autocites.latex'
      '9':
        'name': 'constant.other.reference.latex'
    'name': 'meta.reference.latex'
  }

I think this match pattern catches the syntax properly, as can be seen here, but the highlighting of the part repeated by the + quantifier is not what I expected (and the {0,2} parts probably won’t be either): only the last keyN is colored in each case.
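The behaviour can be reproduced outside Atom. The sketch below uses Python's `re` module as a stand-in for Atom's Oniguruma engine (an assumption, though both keep only the last repetition of a quantified capture group) and applies the pattern above, with CSON's doubled backslashes undone, to the three example lines.

```python
import re

# The 'match' pattern from the grammar above, with CSON's doubled
# backslashes undone. Group 1 is \autocites, group 9 is the key.
AUTOCITES = re.compile(
    r"(\\[aA]utocites)(?:(\()([^)]*)(\))){0,2}"
    r"(?:(?:(\[)([^\]]*)(\])){0,2}(\{)([^}]*)(\}))+"
)

LINES = [
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}",
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}",
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}[prefix3][suffix3]{key3}",
]

for line in LINES:
    m = AUTOCITES.match(line)
    # The whole line is matched, but each quantified group only keeps the
    # text from its last repetition: group 3 (the {0,2} parentheses),
    # group 6 (the {0,2} brackets) and group 9 (the key inside '+').
    print(m.group(3), m.group(6), m.group(9))

# Prints:
# multisuffix suffix1 key1
# multisuffix suffix2 key2
# multisuffix suffix3 key3
```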

Is there any room to improve my use of regex, or is this a limitation of Atom?


#2

You can improve your use of regex by using less of it. Let the grammar engine pick up some of the work.

  {
    begin: '(\\\\[aA]utocites)'
    end: '$'
    beginCaptures:
      1:
        name: 'keyword.control.autocites.latex'
    name: 'meta.reference.latex'
    patterns: [
      {
        match: '(?:(\\()([^)]*)(\\))){0,2}(?:(?:(\\[)([^\\]]*)(\\])){0,2}(\\{)([^}]*)(\\}))+?'
        captures:
          8:
            name: 'constant.other.reference.latex'
        name: 'meta.reference.latex'
      }
    ]
  }

By using a begin/end rule to mark off the region from \autocites to the end of the line ($), you dedicate the rest of the line to looking for the child patterns. I think you should probably break the child pattern apart further for the sake of readability, but this should work. Also, the + at the very end is greedy by default, which meant that it was matching the whole line as a single match. If you make it +?, as I did in my example (and in Regex101), it matches the smallest amount of text it can, so each [prefix][suffix]{key} group becomes a separate match.
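To see the difference between the greedy and lazy quantifier, here is a rough model of the rule in Python, again with `re` standing in for Oniguruma. The `child_keys` helper is hypothetical and simplified: it imitates the begin/end rule by searching for `\autocites`, treating the rest of the line as the region, and applying the child pattern repeatedly. It is not how Atom's grammar engine is actually implemented.

```python
import re

BEGIN = re.compile(r"\\[aA]utocites")

# Child pattern from the grammar above without its final quantifier;
# group 8 is the citation key, as in the '8' capture.
BODY = (
    r"(?:(\()([^)]*)(\))){0,2}"
    r"(?:(?:(\[)([^\]]*)(\])){0,2}(\{)([^}]*)(\}))"
)
CHILD_GREEDY = re.compile(BODY + "+")
CHILD_LAZY = re.compile(BODY + "+?")


def child_keys(line, child):
    """Hypothetical helper: find the begin match, treat the rest of the
    line as the region up to '$', and collect the key of each child match."""
    begin = BEGIN.search(line)
    if begin is None:
        return []
    region = line[begin.end():]
    return [m.group(8) for m in child.finditer(region)]


line = r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}[prefix3][suffix3]{key3}"
print(child_keys(line, CHILD_GREEDY))  # ['key3']
print(child_keys(line, CHILD_LAZY))    # ['key1', 'key2', 'key3']
```

With the greedy `+`, one match covers all three groups and only `key3` survives in group 8; with `+?`, each group is its own match and carries its own captures.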


#3

Thanks for your suggestion; it seems to work well. I suppose the whole child pattern should also be lazy to properly highlight lines like

  \autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1} some other text (dummy)(dummy)[dummy][dummy]{dummy}

which is unlikely to happen in actual documents, though.
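A quick check of that line in a simplified Python model of the begin/end rule (`re` standing in for Oniguruma; the region is assumed to run from the end of `\autocites` to the end of the line) shows that the trailing `{dummy}` group is reported as a second key. This holds both for the child pattern as written and for a variant whose `{0,2}` quantifiers are also lazy (`{0,2}?`).

```python
import re

BEGIN = re.compile(r"\\[aA]utocites")

# Child pattern from the grammar, and a variant with lazy {0,2}? as well.
# Group 8 is the citation key in both.
CHILD = re.compile(
    r"(?:(\()([^)]*)(\))){0,2}(?:(?:(\[)([^\]]*)(\])){0,2}(\{)([^}]*)(\}))+?"
)
ALL_LAZY = re.compile(
    r"(?:(\()([^)]*)(\))){0,2}?(?:(?:(\[)([^\]]*)(\])){0,2}?(\{)([^}]*)(\}))+?"
)

line = (r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}"
        r" some other text (dummy)(dummy)[dummy][dummy]{dummy}")

# Region from the end of \autocites to the end of the line ('$').
region = line[BEGIN.search(line).end():]
for child in (CHILD, ALL_LAZY):
    print([m.group(8) for m in child.finditer(region)])  # ['key1', 'dummy']
```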

I still don’t see why my regex only highlights the last [*][*]{*} part. I’ll ask somewhere else if this is more a question about regex itself than about this editor.


#4

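Look at the Regex101 example I linked to. With the lazy +? you get six matches, one for each {key} group in the three example lines, and each match gets its own captures. A single match, by contrast, only keeps the last repetition of a group repeated by a quantifier, which is why your original regex only highlights the last [*][*]{*} part.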
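Assuming the Regex101 test string is the three example lines from the first post, the count can be reproduced in Python, with `re` standing in for the regex flavour used there (an assumption):

```python
import re

# Lazy child pattern from the grammar above (CSON backslashes undone);
# group 8 is the key.
CHILD = re.compile(
    r"(?:(\()([^)]*)(\))){0,2}(?:(?:(\[)([^\]]*)(\])){0,2}(\{)([^}]*)(\}))+?"
)

TEXT = "\n".join([
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}",
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}",
    r"\autocites(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}[prefix3][suffix3]{key3}",
])

matches = list(CHILD.finditer(TEXT))
print(len(matches))                   # 6
print([m.group(8) for m in matches])  # ['key1', 'key1', 'key2', 'key1', 'key2', 'key3']
```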

A grammar is easiest to read and modify if you do as little as possible in each regex and give the engine multiple patterns to match against.
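As a rough illustration of splitting the work across several patterns, the Python sketch below replaces the single child pattern with three small rules and a loop that, like a TextMate-style grammar engine, repeatedly picks the rule whose match starts earliest, with ties going to the rule listed first. The labels other than `constant.other.reference.latex` are hypothetical, and the loop is a simplified model, not Atom's actual tokenizer.

```python
import re

# Hypothetical split of the child pattern into three small rules. Only
# 'constant.other.reference.latex' is a scope name used in the thread;
# the other two labels are made up for illustration.
RULES = [
    ("paren-argument", re.compile(r"\(([^)]*)\)")),
    ("bracket-argument", re.compile(r"\[([^\]]*)\]")),
    ("constant.other.reference.latex", re.compile(r"\{([^}]*)\}")),
]


def tokenize(region):
    """Simplified model of a patterns list: at each step, use the rule whose
    match starts earliest, with ties going to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(region):
        best = None
        for label, rx in RULES:
            m = rx.search(region, pos)
            if m and (best is None or m.start() < best[1].start()):
                best = (label, m)
        if best is None:
            break  # nothing else to highlight in this region
        label, m = best
        tokens.append((label, m.group(1)))
        pos = m.end()
    return tokens


region = "(multiprefix)(multisuffix)[prefix1][suffix1]{key1}[prefix2][suffix2]{key2}"
for label, text in tokenize(region):
    print(label, text)
```

Each rule is trivial to read on its own, and every key gets its own match. One trade-off: the small rules no longer enforce that the parenthesised arguments come before the bracketed ones, which the single pattern did.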