Need some help understanding language-c grammar


#1

Can anyone explain the following line to me? This is from the ‘language-c’ package.

'begin': '(?x)\n    \t\t\t\t(?:  ^                                 # begin-of-line\n    \t\t\t\t  |  (?: (?<!else|new|=) )             #  or word + space before name\n    \t\t\t\t)\n    \t\t\t\t((?:[A-Za-z_][A-Za-z0-9_]*::)*+~[A-Za-z_][A-Za-z0-9_]*) # actual name\n    \t\t\t\t \\s*(\\()                           # start bracket or end-of-line\n    \t\t\t'

NOTE: This is what it actually looks like when opening the grammar in Atom.


#2

I just deleted all the whitespace and \n\t characters. Makes it a lot cleaner to work with :smile:. I’m assuming that when language-c got converted from TextMate, all the actual newlines, comments, and tabs got shoved onto that one line :confused:.

At any rate, the regex matches patterns like ~hello( or hi::~hello(.
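
To see those two matches in action, here is a minimal Python sketch. It assumes a simplified version of the name pattern: the possessive `*+` is relaxed to a plain `*`, and the leading begin-of-line/lookbehind alternation is left out. The helper name `find_name` is made up for illustration and is not part of language-c.

```python
import re

# Simplified name pattern (assumption: possessive `*+` relaxed to `*`,
# leading alternation dropped): optional `scope::` prefixes, a tilde,
# an identifier, optional whitespace, then an opening parenthesis.
NAME = re.compile(r"((?:[A-Za-z_][A-Za-z0-9_]*::)*~[A-Za-z_][A-Za-z0-9_]*)\s*(\()")


def find_name(line):
    """Return the captured name, or None when the line has no match."""
    m = NAME.search(line)
    return m.group(1) if m else None


print(find_name("~hello("))      # ~hello
print(find_name("hi::~hello("))  # hi::~hello
print(find_name("hello("))       # None (no tilde, so no match)
```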


#3

Yeah, I did the same; I just didn’t understand what each regex was doing. Plus, the comments are really crappy (they should show an example instead of broken English). Anyhow, thanks @Wliu!


#4

@Wliu is correct. You can find these artifacts in pretty much all of the grammars. The (?x) at the start is the key clue: it turns on extended mode, which makes the regex engine ignore whitespace in the pattern and treat anything after a # as a comment. Prior to converting from TextMate, the regex @Zooce linked would have looked like this:

'begin': '(?x)
    (?:  ^                                 # begin-of-line
      |  (?: (?<!else|new|=) )             #  or word + space before name
    )
    ((?:[A-Za-z_][A-Za-z0-9_]*::)*+~[A-Za-z_][A-Za-z0-9_]*) # actual name
    \\s*(\\()                              # start bracket or end-of-line
    '
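
As a rough illustration of what (?x) buys you, the sketch below compiles the same shortened pattern twice in Python, once compact and once spread over several lines with comments. Python's `re` module is only a stand-in here for the engine Atom uses, and the names `COMPACT`, `VERBOSE`, and `groups` are hypothetical.

```python
import re

# Compact form: everything on one line, no comments.
COMPACT = re.compile(r"(~[A-Za-z_][A-Za-z0-9_]*)\s*(\()")

# Extended form: (?x) makes the engine skip the whitespace and the
# `#` comments, so this compiles to the same pattern as COMPACT.
VERBOSE = re.compile(r"""(?x)
    (~[A-Za-z_][A-Za-z0-9_]*)   # tilde followed by an identifier
    \s*(\()                     # optional whitespace, then an opening parenthesis
""")


def groups(pattern, text):
    """Return the captured groups, or None when nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None


sample = "~hello ("
print(groups(COMPACT, sample) == groups(VERBOSE, sample))  # True
```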

#5

@tomedunn, I’ve definitely noticed an abundant use of (?x). I was wondering, what is the (?<!else|new|=) for? When the comment says “or word + space before name”, I don’t understand the meaning of that.


#6

@Zooce, that is a negative lookbehind. The first part of the regex

    (?:  ^                                 # begin-of-line
      |  (?: (?<!else|new|=) )             #  or word + space before name
    )

is saying: match at the start of a line, or at any position that is NOT immediately preceded by else, new, or =. As a simple example, the regex (?<!else)(if) would match the word if but not the if in elseif, since that one is preceded by the word else. I hope that makes some sense.
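
The (?<!else)(if) example can be tried out with a short Python sketch. The helper name `if_positions` is made up for illustration, and Python's `re` module stands in for the engine Atom uses.

```python
import re

# Negative lookbehind: match "if" only when the four characters
# immediately before it are not "else".
PATTERN = re.compile(r"(?<!else)(if)")


def if_positions(text):
    """Return the start index of every "if" the pattern accepts."""
    return [m.start(1) for m in PATTERN.finditer(text)]


print(if_positions("if (a) {} elseif (b) {}"))  # [0]  (the if in elseif is rejected)
print(if_positions("else if"))                  # [5]  (the space breaks the lookbehind)
```

Note that the lookbehind only checks the text directly before the match, which is why "else if" with a space still matches.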


#7

Wow, thank you very much @tomedunn! That was very helpful!