What will it take to implement a decent RegExp engine for find/replace in Atom?


#1

Atom already uses Oniguruma where it needs decent RegExp support. So it’s REALLY strange that it’s still using JavaScript’s built-in, extremely limited RegExp engine for Find/Replace, a feature where human beings constantly enter new regular expressions to find pieces of code. Surely we should have a decent RegExp engine for this purpose.

This is driving me crazy. I love Atom. But the JS RegExp engine is so poor.

If you’d like better RegExp support for find/replace, please +1 the issue on GitHub above.


#2

Please don’t do this, but rather add a :+1: reaction to the main post so that we don’t get spammed with notifications :slight_smile:.


#3

Correction: Atom uses Oniguruma where it needs it for compatibility with TextMate grammars.

Personally, I am :-1: on the maintainer team doing any work along these lines in Atom unless the community can give a list of:

  1. Regular expression features that are missing from the JS regular expression engine
  2. A realistic stack ranking of the priority of those features

It’s easy to say “the one that we have isn’t good enough”. It is very hard to describe what would be good enough. The last thing I want to happen is for the Atom maintainer team to do a ton of work integrating some new regular expression engine only to find out that it also doesn’t have some feature that some group considers “necessary”.


Side Note: The most common feature request I hear about is lookbehind assertions, which are coming to the V8 regular expression engine.


#4

I specifically want look-behind assertions. They would have made a complex search-and-replace I needed to do much easier. If we just wait for V8 to ship them, how long would we have to wait?
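In the meantime, a common workaround for a replace that wants a look-behind is to capture the prefix and put it back in the replacement. Below is a minimal sketch of that idea. The function name `replaceAfter` and its parameters are made up for illustration, and it is not equivalent to a real look-behind: the prefix is consumed by the match, so overlapping matches behave differently.

```typescript
// Emulates text.replace(/(?<=PREFIX)TARGET/g, replacement) without lookbehind.
// `prefixSource` and `targetSource` are regular expression sources, not literals.
function replaceAfter(
  text: string,
  prefixSource: string,
  targetSource: string,
  replacement: string
): string {
  // Group 1 is always the prefix, because it is the outermost first group.
  const re = new RegExp(`(${prefixSource})${targetSource}`, "g");
  // A callback is used so that "$" in `replacement` is not treated specially.
  return text.replace(re, (_match, prefix: string) => prefix + replacement);
}

// Replace "bar" only where it directly follows "foo".
const result = replaceAfter("foobar bar foobar", "foo", "bar", "baz");
console.log(result);
```

In the find/replace panel the same trick would be a pattern like (foo)bar with a replacement of $1baz, assuming the panel expands $1 the way JavaScript's replace does.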

Other desiderata:

  • Must not provide a mechanism whereby a carelessly typed RE locks up the GUI. One solution to this is to make sure that the RE can be searched for in linear time (which I think means “no lookbehind”). A better solution is to do the searching on an interruptible worker thread. There was already a feedback issue submitted for this.

  • Must be fast - particularly in the common case of searching a (say) 20,000 line file for a fixed string. The current engine is “fast enough”.
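To make the fixed-string point concrete: whichever engine is used, a plain-text search does not have to go through it at all. Here is a rough sketch, assuming hypothetical helper names `findAllLiteral` and `escapeRegExp`. I have no idea whether Atom does anything like this internally.

```typescript
// Find all non-overlapping occurrences of a fixed string without any regex engine.
function findAllLiteral(haystack: string, needle: string): number[] {
  const offsets: number[] = [];
  if (needle.length === 0) return offsets; // avoid an infinite loop on ""
  let from = 0;
  while (true) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) break;
    offsets.push(index);
    from = index + needle.length; // skip past this match, like a global regex would
  }
  return offsets;
}

// If the search must go through a regex engine anyway, escape the
// metacharacters so the fixed string is matched literally.
function escapeRegExp(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(findAllLiteral("abcabcabc", "abc"));
console.log(escapeRegExp("a.b"));
```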

Related issue:

Find in project is too slow to be useful. A “Find in project” for a six character string with 10,000 matches took me about 14 seconds. Ripgrep took 0.14 seconds (both cases with a pre-heated disk cache). I’d be happy with a second or so of overhead to allow for searching in Atom buffers rather than saved files, and for formatting the output. Perhaps there could be a way for find-in-project to use an external tool, rather than an inline implementation? (This does mean having a different RE language for “find in project” than for “find” - which is icky.)


#5

If you want lookbehind you don’t have to wait. It’s there behind a flag in V8 right now. Just need to flip the flag: --harmony_regexp_lookbehind

There are hybrid regexp engines that will use a guaranteed linear algorithm unless you use features like lookbehind, back-references, etc. where they can switch to a non-guaranteed algorithm. Essentially you have two different regexp engines, and they are certain to have subtle incompatibilities. The V8 regexp engine actually checks for interruptions as it runs and should be interruptible, but I haven’t looked at this code for a long time. There may be some way to make this more useful.
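To illustrate the dispatch step of such a hybrid, here is a small sketch that scans a pattern's source and decides which of two engines would have to handle it. Everything here is an assumption for illustration: the name `chooseEngine`, the two labels, and the choice to treat lookahead as non-linear along with lookbehind and back-references. It is deliberately conservative and is not how any particular engine decides.

```typescript
type Engine = "linear" | "backtracking";

// Heuristic scan of a JavaScript-style pattern source.
function chooseEngine(source: string): Engine {
  let inClass = false; // true while inside a [...] character class
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      const next = source[i + 1] ?? "";
      // Back-references: \1..\9 or \k<name>. Inside a class they are not back-references.
      if (!inClass && (/[1-9]/.test(next) || next === "k")) return "backtracking";
      i++; // skip the escaped character
    } else if (inClass) {
      if (ch === "]") inClass = false;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "(" && source[i + 1] === "?") {
      // Lookaround groups: (?=  (?!  (?<=  (?<!
      const rest = source.slice(i + 2, i + 4);
      if (rest.startsWith("=") || rest.startsWith("!") || rest === "<=" || rest === "<!") {
        return "backtracking";
      }
    }
  }
  return "linear";
}

console.log(chooseEngine("foo.*bar"));
console.log(chooseEngine("(?<=foo)bar"));
```

The awkward part is exactly the one mentioned above: once a pattern tips over to the second engine, every subtle difference between the two becomes visible to the user.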

I’d be interested in seeing a profile for the find-in-project issue. V8’s regexp engine is pretty fast so I suspect the bottleneck is not the regexp engine itself.


#6

Two questions:

  • How can I flip the flag?
  • How do I obtain the profile?

#7

https://electron.atom.io/docs/api/chrome-command-line-switches/
See especially the entry “--js-flags”
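Once the flag is set, one quick way to check that it took effect is to ask the engine whether it accepts a lookbehind pattern. A minimal sketch follows. The helper names `engineAccepts` and `supportsLookbehind` are just for illustration.

```typescript
// Returns true if the running regexp engine can compile the given pattern source.
function engineAccepts(source: string): boolean {
  try {
    // Built from a string so that an unsupported feature throws at run time
    // instead of breaking the parse of this whole file.
    new RegExp(source);
    return true;
  } catch {
    return false;
  }
}

function supportsLookbehind(): boolean {
  return engineAccepts("(?<=a)b");
}

console.log(supportsLookbehind());
```

With the type annotations removed, this can be pasted into the developer tools console.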

Don’t know how to profile Atom. I’m a regexp engineer, not an Atom engineer :slight_smile:


#8

Also, how do I turn on V8 flags such as --log-timer-events?


#9

Thanks, Erik. Got lookbehinds working, at least. Slow but useful in a single file.


#10

I fixed the performance.