Custom Grammar using TextMate

Hello - I’m a total rookie when it comes to programming, but I’m doing a bit of scripting in a fairly obscure language, so I’m trying to create my own grammar so that the code is highlighted properly.
I’ve looked through a few examples, and most of it is gibberish to me because of my lack of programming knowledge, though the basics seem rather simple.

In all the examples I found, there are always some weird characters showing up all over the place.
For example:

  {
    contentName: 'entity.name.section'
    begin: '/*'
    end: '*/'
  }

In my eyes, /* this text would be selected, and therefore differently styled */.
However, it turns out the proper or at least working way to do this is as follows:

  {
    contentName: 'entity.name.section'
    begin: '/\\*'
    end: '\\*/'
  }

What’s with these double backslashes in the middle of the string?
I’ve also seen double backslashes followed by the letter b - what’s the point of this?

  {
    'match': '\\b([1-9]+[0-9]*|0)'
    'name': 'constant.numeric.integer.decimal'
  }

Is there somewhere I can read up on this and expand my knowledge?
Thank you.

It’s an awkward side effect of how parsers work (if you think that’s bad, definitely don’t look at MUSHcode). * is commonly used as a special character, so some software doesn’t accept it as a literal * unless it’s “escaped” first. Compare this to a character like t, which is always literal unless you escape it. That escape sequence is \t, which stands for a tab character and is the only practical way to represent a tab in code, since a tab just appears to the user as a variable number of spaces.

* is special in glob-style matching (*.txt matches every text file in a shell command like ls *.txt) and in regular expressions, where it means “zero or more of the previous item”. The begin and end values in a grammar are regular expressions, so you have to use \* to get your literal asterisk. However, \ is also a special character inside a quoted string, so you have to escape that as well whenever the text gets parsed more than once. So \\* turns into \* the first time it gets read (as a string) and then into * the second time (as a regular expression).
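
If it helps to see the two rounds of parsing happen, here is a rough sketch. It uses Python and its built-in re module purely as a stand-in, which is an assumption on my part: the grammar file is not Python and the editor uses its own regex engine, but string escaping and \* behave the same way here. The variable names are made up for the illustration.

```python
import re

# The patterns exactly as they are typed in the grammar file.
begin_source = '/\\*'
end_source = '\\*/'

# Round 1: the string parser turns each \\ into one backslash,
# so begin_source now holds three characters: / \ *
string_length = len(begin_source)

# Round 2: the regex engine reads \* as a literal asterisk.
begin = re.compile(begin_source)
end = re.compile(end_source)

text = 'a = 1 /* note */ b = 2'
start = begin.search(text)
stop = end.search(text, start.end())
comment_body = text[start.end():stop.start()]  # ' note '

# Without the escape, * is a quantifier. '/*' means "zero or more
# slashes", which happily matches nothing at all.
matches_empty = re.compile('/*').match('abc').group(0)  # ''

# '*/' is not even a valid pattern: there is nothing for * to repeat.
try:
    re.compile('*/')
    end_unescaped_ok = True
except re.error:
    end_unescaped_ok = False
```

So the unescaped version from the question either matches the wrong thing or is rejected outright, which is why it never highlighted the comment.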

This rule applies to snippets as well.

\b is a regular expression escape sequence that indicates a word boundary. There are a bunch of tools and sites online that tell you all about regex syntax, but this one is my favorite right now since it has all of the info plus an interactive prototyping tool.
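
Here is a small sketch of what the word boundary does in the integer pattern from the question. Again, Python's re module is only a stand-in (my assumption, the grammar itself runs on the editor's regex engine), and the sample strings are invented for the example.

```python
import re

# The pattern exactly as it is typed in the grammar file.
source = '\\b([1-9]+[0-9]*|0)'

# With a single backslash, the string parser would produce a
# backspace character before the regex engine ever saw it.
single_backslash_b = '\b'  # same as '\x08'

integer = re.compile(source)

# \b requires a switch between "word" and "non-word" characters,
# so standalone numbers match.
found = integer.findall('x = 42 and y = 0')  # ['42', '0']

# Digits glued to the end of an identifier do not match, because
# there is no boundary between 'c' and '1'.
inside_word = integer.search('abc123')  # None

# Drop the \b and the same digits are picked up.
no_boundary = re.search('([1-9]+[0-9]*|0)', 'abc123').group(0)  # '123'
```

In a grammar that is usually what you want: the 123 in a name like abc123 should not be colored as a number.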

Thank you so much for the excellent explanation. Huge help!

No problem. :)