Grammar word boundaries and Non Word Characters



Hi,
I’m trying to define a language package for jBASE. In this language it is common practice to separate the parts of a variable name with dots, e.g. Y.TEST.AND.DEBUG. This causes problems when matching the logical operators (AND, OR, NOT).
Ideally I would like to use regex word boundaries:

{
    'comment': 'keyword operators that evaluate to True or False'
    'name': 'keyword.operator.logical.jbase'
    'match': '\\b(AND|OR|NOT)\\b'
}

Unfortunately, it doesn’t work as expected: the AND inside Y.TEST.AND.DEBUG is still matched as an operator, even with Non Word Characters overridden:

'.source.jbc':
  'editor':
    'nonWordCharacters': '/\\()"\':,;<>~!@#$%^&*|+=[]{}`?-' # Default with dot excluded
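A quick way to see what happens is to run the same rule outside the editor through Python's `re` module. This is only an approximation: Atom grammars use the Oniguruma engine, but for plain ASCII text both define `\b` as a position between a `\w` character and a non-`\w` character (or the start/end of the text), and neither consults editor settings. The `operators` helper is hypothetical, written only for this check.

```python
import re

# The rule from the grammar above as a Python regex.
# Assumption: for ASCII input, Python's \b behaves like Oniguruma's \b
# (a boundary between a \w character and a non-\w character).
LOGICAL = re.compile(r"\b(AND|OR|NOT)\b")


def operators(line):
    """Return (start, keyword) for every logical operator the rule matches."""
    return [(m.start(1), m.group(1)) for m in LOGICAL.finditer(line)]


# A real operator between two conditions is found, as intended.
print(operators("IF A AND B THEN"))        # [(5, 'AND')]

# "." is not a \w character, so \b also matches on both sides of
# "AND" inside the variable name; editor settings never enter into it.
print(operators("Y.TEST.AND.DEBUG = 1"))   # [(7, 'AND')]
```

Because the dot is outside `\w`, there is a word boundary on both sides of `AND` in the dotted name, which produces exactly the unwanted highlight.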

Am I doing something wrong, i.e. does Non Word Characters actually affect \b?
If not, is there an elegant way to achieve my goal?

Here are the tests I made on this site:

  • how it works at the moment:
  • with the \w and \W implied by \b replaced by [a-zA-Z0-9_.] and [^a-zA-Z0-9_.]:
  • a solution that seems a bit hard-coded:
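The idea from the second test can be expressed as a pair of lookarounds that behave like a `\b` with a custom word class. Below is a sketch in Python's `re`, assuming jBASE name characters are `[A-Za-z0-9_.]` (the class from the bullet above). `NAME_CHAR` and `operators` are illustrative names, not part of any grammar API.

```python
import re

# Assumption: jBASE name characters are letters, digits, underscore and dot.
NAME_CHAR = r"[A-Za-z0-9_.]"

# A "\b with a custom word class": the keyword may not be preceded or
# followed by a name character. Lookarounds only inspect the neighbours,
# so the match covers the keyword alone and also works at line start/end.
LOGICAL = re.compile(rf"(?<!{NAME_CHAR})(AND|OR|NOT)(?!{NAME_CHAR})")


def operators(line):
    """Return (start, keyword) for every logical operator the pattern matches."""
    return [(m.start(1), m.group(1)) for m in LOGICAL.finditer(line)]


print(operators("IF A AND B THEN"))        # [(5, 'AND')]
print(operators("Y.TEST.AND.DEBUG = 1"))   # []
print(operators("NOT Y.OR.FLAG"))          # [(0, 'NOT')]
```

The pattern contains no backslashes, so it should carry over to the grammar unchanged as `'match': '(?<![A-Za-z0-9_.])(AND|OR|NOT)(?![A-Za-z0-9_.])'`. Oniguruma supports fixed-width lookbehind, but this form has not been tested in Atom here.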

@Update:
After some digging:

  • Non Word Characters does not affect regex matching at all, but it is quite useful for e.g. splitting text into words in the editor
  • Using sub-expressions (a non-capturing group and a negative lookahead) seems to do the job quite well. I came up with a solution like this:
    (?:[^.])\\b(AND|OR|NOT)(?![.])\\b
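One caveat with the pattern above: `(?:[^.])` consumes the character before the keyword. That character becomes part of the match (and, with a plain `name`, part of the `keyword.operator.logical.jbase` scope), and a keyword at the very start of a line cannot match because there is nothing before it to consume. A negative lookbehind performs the same "not preceded by a dot" check without consuming anything. The comparison below uses Python's `re` as a stand-in for Oniguruma; `LOOKBEHIND` is a suggested variant, not something from the original post.

```python
import re

# The pattern from the update as a Python raw string (CSON's "\\b" is the regex \b).
CONSUMING = re.compile(r"(?:[^.])\b(AND|OR|NOT)(?![.])\b")

# Suggested variant: the "not preceded by a dot" check is a negative
# lookbehind, so the preceding character is inspected but not consumed.
LOOKBEHIND = re.compile(r"(?<![.])\b(AND|OR|NOT)\b(?![.])")


def matches(pattern, line):
    """Return (whole match, keyword) pairs for every match in the line."""
    return [(m.group(0), m.group(1)) for m in pattern.finditer(line)]


# Mid-line operator: the consuming version pulls the space into the match.
print(matches(CONSUMING, "IF A AND B"))    # [(' AND', 'AND')]
print(matches(LOOKBEHIND, "IF A AND B"))   # [('AND', 'AND')]

# Operator at the start of a line: nothing to consume, so no match.
print(matches(CONSUMING, "NOT FLAG"))      # []
print(matches(LOOKBEHIND, "NOT FLAG"))     # [('NOT', 'NOT')]

# Dotted variable name: both versions correctly skip it.
print(matches(CONSUMING, "Y.TEST.AND.DEBUG"))   # []
print(matches(LOOKBEHIND, "Y.TEST.AND.DEBUG"))  # []
```

In the grammar file this would presumably be written as `'match': '(?<![.])\\b(AND|OR|NOT)\\b(?![.])'` (untested in Atom).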