Regex search full project


#1

I want to find all files in my project which contain the word protected more than once. I tried something like protected * protected, but this did not return any results. Is there documentation of the regex search in Atom anywhere?

The search documentation does not mention regex at all:
https://atom.io/docs/latest/using-atom-find-and-replace


#2

Currently find-in-project works line by line. So if your two protected are on different lines, a single regex will not find them.

Also, while * lets you repeat something, you have to specify what to repeat, for example .*
As written, it will repeat a space.

The number of results is shown next to each file. Do you have a lot of files where it occurs only once?
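
To make the difference concrete, here is a small sketch using Python's re module. This is an assumption for illustration only: it is not Atom's search engine, and the sample strings are made up, but the handling of * and . shown here is standard regex behaviour.

```python
import re

# Two occurrences on the same line, separated by more than spaces.
line = "protected int a; protected int b;"

# "protected * protected": the * applies only to the space before it,
# so nothing but spaces is allowed between the two words.
spaces_only = re.search(r"protected * protected", line)

# "protected.*protected": .* allows any run of characters on the same line.
any_chars = re.search(r"protected.*protected", line)

# Occurrences on two different lines: by default . does not match a newline,
# so the pattern cannot span the line break.
two_lines = "protected int a;\nprotected int b;"
across_lines = re.search(r"protected.*protected", two_lines)

print(spaces_only)            # None
print(any_chars is not None)  # True
print(across_lines)           # None
```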


#3

Thanks. I tried a lot more variants, including protected.*protected
I have about 1000 files, and 5-10 of them contain that word twice (but shouldn’t, so I want to find them).
By now I have solved my problem by searching for the single word and scrolling through the results, but I would still be interested in learning more about regex search in Atom.


#4

This should search for at least two instances of the word protected.

([\s\S]*?protected){2,}

Use it with a tool like grep, or another editor, that supports multi-line regex search across multiple files.

You can break it down like this:
. matches any character, but stops at a newline.
\s matches a whitespace character.
\S matches a non-whitespace character.
Together, [\s\S] covers every character, newlines included.

.* repeats the preceding item zero or more times, as many times as possible.
The problem then is that it’ll ‘eat’ your protected query too.
.*? makes the glob lazy. It’ll eat as little as it can, checking after each character: are we at protected yet?

() around that groups this pattern.
{2,} means repeat at least twice.
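
If no editor at hand supports multi-line search, the same pattern can be run from a short script. The following is only a sketch in Python. The helper names has_repeated_word and files_with_repeated_word are made up for this example, and it assumes the files are UTF-8 text.

```python
import re
from pathlib import Path

# Pattern from the post: lazily skip anything (newlines included) up to
# "protected", and require that group at least twice.
TWICE = re.compile(r"([\s\S]*?protected){2,}")


def has_repeated_word(text):
    """Return True if text contains "protected" at least twice."""
    # match() anchors at the start of the text; that is enough here because
    # the pattern itself begins by skipping arbitrary characters.
    return TWICE.match(text) is not None


def files_with_repeated_word(root):
    """Return sorted paths under root whose content matches TWICE."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if has_repeated_word(text):
            hits.append(path)
    return sorted(hits)
```

Calling files_with_repeated_word(".") from the project root would list the candidate files. On very large files the lazy group can be slow, so treat this as a starting point.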


#5

OK, thanks, that helped, especially the \s\S part.
I did not understand the part about making the glob lazy.

I changed it to (^protected[\s\S]*){2,} because it will always be at the beginning of the line. Then it works with grep on Linux and in Notepad++ on Windows, and it found the same files I found manually.
It did not work in Atom and it did not work with the grep that is available in the git Bash that comes with GitExtensions.
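
One thing worth checking when tools disagree is how each engine treats ^. The sketch below uses Python's re module purely as an illustration (it is an assumption, not what Atom or Git Bash grep do internally, and the sample strings are made up). In Python, ^ matches at the start of every line only when the re.MULTILINE flag is set.

```python
import re

# Variant from the post, with ^ anchoring "protected" to the start of a line.
# re.MULTILINE makes ^ match after every newline, not only at position 0.
LINE_START_TWICE = re.compile(r"(^protected[\s\S]*){2,}", re.MULTILINE)

twice = "protected int a;\nint x;\nprotected int b;\n"
once = "protected int a;\nint unprotected;\n"

with_flag_twice = bool(LINE_START_TWICE.search(twice))
with_flag_once = bool(LINE_START_TWICE.search(once))

# Without the flag, ^ only matches at the very start of the text, so the
# second "protected" can never satisfy the anchor.
NO_FLAG = re.compile(r"(^protected[\s\S]*){2,}")
no_flag_twice = bool(NO_FLAG.search(twice))

print(with_flag_twice)  # True
print(with_flag_once)   # False
print(no_flag_twice)    # False
```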


#6

Consider the regex .*protected
against abc_protected

The first step is .*, which catches as many characters as possible.
It eats a, then b, then c, _, p, r, o, t, e, c, t, e, d.
The cursor is at abc_protected|

The second step is protected.
Because the cursor is at abc_protected|, it cannot find it.
It then backtracks:
place cursor at abc_protecte|d search for protected
place cursor at abc_protect|ed search for protected
place cursor at abc_protec|ted search for protected
...
place cursor at abc_|protected search for protected

So .* is called a greedy repetition. It’ll match everything it can before moving on.
The rest of the pattern then works through the backtracking mechanism.

In contrast, .*? is a lazy repetition. It’ll try to match as little as possible before checking what comes next.
place cursor at |abc_protected search for protected
place cursor at a|bc_protected search for protected
place cursor at ab|c_protected search for protected
place cursor at abc|_protected search for protected
place cursor at abc_|protected search for protected
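
The two traces above end at the same place because the sample has only one protected. With two occurrences the difference becomes visible. A minimal sketch in Python's re module (the sample string is made up for illustration):

```python
import re

text = "abc_protected_xyz_protected_end"

# Greedy: .* runs to the end of the text, then backtracks until
# "protected" fits, so it stops at the LAST occurrence.
greedy = re.match(r".*protected", text)

# Lazy: .*? grows one character at a time, so it stops at the
# FIRST occurrence.
lazy = re.match(r".*?protected", text)

print(greedy.group())  # abc_protected_xyz_protected
print(lazy.group())    # abc_protected
```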

If there’s a ton of stuff after the ‘protected’ keyword, backtracking will be slower than necessary.
Sometimes backtracking also will not work, so it’s not just about performance.

Lastly, I’ll add that if you only want to find two occurrences of protected and stop there, {2,2} or {2} will be faster than {2,}.
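
A quick way to see why the bounded form does less work is to look at where each match ends. This is a sketch in Python's re module with a made-up sample string. It shows how far each pattern reads, not actual timings.

```python
import re

# Three occurrences; the words end at offsets 9, 21 and 33.
text = "protected a protected b protected c"

# {2,} keeps repeating the group for as long as it can.
at_least_two = re.match(r"([\s\S]*?protected){2,}", text)

# {2} stops as soon as the second occurrence has been found.
exactly_two = re.match(r"([\s\S]*?protected){2}", text)

print(at_least_two.end())  # 33
print(exactly_two.end())   # 21
```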


#7

Basically .* is like saying: to go to the grocery store, drive until you cannot drive anymore, then go back one intersection at a time.

If you think left-to-right, .*? is what you want.
Unless the repeated part cannot overlap with the end condition, in which case greedy is fine.
For example, a simple string regex can be "[^"]*" (that one doesn’t support escaping).
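
Here is how the three ways of writing a string regex behave on a line with two strings. Again this is only a sketch in Python's re module, with sample input made up for illustration.

```python
import re

line = 'say "hello" and "bye"'

# [^"]* cannot run past a closing quote, so greedy is safe here.
negated = re.findall(r'"[^"]*"', line)

# Greedy .* runs to the last quote on the line and swallows both strings.
greedy = re.findall(r'".*"', line)

# Lazy .*? stops at the first closing quote.
lazy = re.findall(r'".*?"', line)

print(negated)  # ['"hello"', '"bye"']
print(greedy)   # ['"hello" and "bye"']
print(lazy)     # ['"hello"', '"bye"']

# No escaping support: a backslash-escaped quote still ends the match early.
escaped = re.findall(r'"[^"]*"', r'x = "a \" b"')
```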


#8

Thank you for the explanation.