I thought regular expressions were greedy, but they aren't


#1

I am trying to search for a line starting with 'log.' followed by a line not starting with 'r'.
This is the regex I’m using: log\.[^\n]+\s+[^r]

  • [^\n]+ matches everything until newline
  • \s+ matches newline and indentation

Surprisingly, [^r] matches the last whitespace character before return, meaning \s+ did not consume all of the whitespace. I also tried putting a character from the first line in place of [^r]: [^\n]+ shrinks to get a successful match.
I don’t remember this happening before. Is it my fault, or have Atom’s regular expressions become non-greedy? If the latter is true, how can I make them greedy?
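
The behaviour described above can be reproduced outside the editor. This is only a sketch in Python's re module, which is an assumption on my part: Atom's search runs on a different engine, but greedy quantifiers backtrack the same way in both. The sample text is hypothetical.

```python
import re

# Hypothetical sample: an indented log call followed by an indented return.
text = "    log.debug('x')\n    return value\n"

# The pattern from the question, unchanged.
pattern = re.compile(r"log\.[^\n]+\s+[^r]")
match = pattern.search(text)

# \s+ first takes the newline plus all four spaces, then gives one space
# back so that [^r] can match it instead of failing on the "r" of return.
matched = match.group()
rest = text[match.end():]

print(repr(matched))  # the match ends with the last space of the indentation
print(repr(rest))     # what follows the match starts with "return"
```

So the match stops right in front of return, with [^r] sitting on the final space of the indentation.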


#2

works for me

Edit:

matches the last whitespace before return

If by “return” you mean the keyword “return” in your document, it seems like you want to match [r] instead of [^r], since “return” starts with an “r”.


#3

Thank you for teaching me how to do multi-line matches with regexes. Very nifty.

You misunderstood greediness. It just means that it tries the longest match first, not that it tries the longest match only.

Say you have this line:

abc abc abc

Say you search for a.*c. This regex will match the whole line due to the greediness of *. (If you use the non-greedy variant a.*?c then it will match just the first abc.)

But now say you search for a.*b. This regexp matches the whole line minus the last character; it does not fail to match. (Your reasoning for why it shouldn’t match at all is that .* has already gobbled everything, and when we then look for a b at that spot, there isn’t one.) Instead, if .* has gobbled “too much”, the engine backs off and tries gobbling less until it finds a successful match.
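
Here is a minimal sketch of the three searches on that sample line. It uses Python's re module, which is my choice for illustration and not the engine the editor uses; the greedy and non-greedy behaviour shown is common to backtracking engines.

```python
import re

line = "abc abc abc"

# Greedy: .* grabs as much as it can while still letting the final c match.
greedy = re.search(r"a.*c", line).group()

# Non-greedy: .*? grabs as little as it can.
lazy = re.search(r"a.*?c", line).group()

# Greedy with backtracking: .* first takes the whole rest of the line,
# then gives back one character so that b can match.
backtracked = re.search(r"a.*b", line).group()

print(repr(greedy))       # 'abc abc abc'
print(repr(lazy))         # 'abc'
print(repr(backtracked))  # 'abc abc ab'
```

The third result is the important one: the match is still as long as possible, just not longer than what lets the whole pattern succeed.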

You can change your regexp to be more precise and say that the “non-r” should be at the beginning of a line:

log\.[^\n]+\s+^[^r]

Note how I added a second caret to indicate that it should match at the beginning of a line. It’s confusing that ^ has two different meanings outside versus inside square brackets.
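
A rough check of the caret version, again sketched in Python. Two assumptions: the sample lines are made up and not indented, and the re.MULTILINE flag stands in for the editor treating ^ as "beginning of a line".

```python
import re

# re.MULTILINE makes ^ match at the start of every line, not just the string.
caret = re.compile(r"log\.[^\n]+\s+^[^r]", re.MULTILINE)

# Hypothetical, unindented samples.
followed_by_return = "log.debug('x')\nreturn value\n"
followed_by_other = "log.debug('x')\nvalue += 1\n"

# The next line starts with r, and ^ leaves \s+ nothing to give back.
no_match = caret.search(followed_by_return)

# The next line starts with something else, so the pattern matches.
hit = caret.search(followed_by_other)

print(no_match)            # None
print(repr(hit.group()))   # "log.debug('x')\nv"
```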


#4

@laszlokorte, he is seeing unexpected matches if the word “return” appears in the line. See screenshot:


#5

Thank you very much. The caret did not help (the line is indented), but I was able to fix it by forcing a newline (to avoid matches in the first line) and adding whitespace to the negated class: log\.[^\n]+\n\s+[^r\s]
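
For anyone who wants to verify this, here is a small sketch comparing the caret variant with the final pattern on indented lines. It is written in Python rather than run in the editor, the samples are hypothetical, and re.MULTILINE is an assumption standing in for how the editor treats ^.

```python
import re

caret = re.compile(r"log\.[^\n]+\s+^[^r]", re.MULTILINE)
fixed = re.compile(r"log\.[^\n]+\n\s+[^r\s]")

# Hypothetical indented samples.
indented_return = "    log.debug('x')\n    return value\n"
indented_other = "    log.debug('x')\n    value += 1\n"

# Caret variant: \s+ shrinks to just the newline, ^ then matches at the
# start of the next line, and [^r] accepts the first space of the indent.
caret_false_hit = caret.search(indented_return)

# Final pattern: [^r\s] cannot land on whitespace, so it has to sit on the
# first real character of the next line, which is r here.
fixed_on_return = fixed.search(indented_return)
fixed_on_other = fixed.search(indented_other)

print(repr(caret_false_hit.group()))  # "log.debug('x')\n "
print(fixed_on_return)                # None
print(repr(fixed_on_other.group()))   # "log.debug('x')\n    v"
```

Note that \n\s+ requires at least one whitespace character after the newline, so this pattern only finds log lines whose following line is indented (or preceded by a blank line).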