Can I Use Grammar To Determine A Word Regular Expression?


#1

In implementing autocomplete+ vNext, I’d like to find a way to build a regular expression to identify words in a file in a way that is grammar-aware.

I tried an approach similar to cursor.wordRegExp, but it failed pretty badly because it doesn’t account for things like the - in a CSS property (e.g. text-align resolves to text and align as words, but not text-align).

It feels like this should be something that is squarely in the wheelhouse of Grammar, but aside from implementing some pretty crazy logic after tokenizing the body of text - I’m not sure what the right solution is.

This would drastically improve the quality of FuzzyProvider in autocomplete+. It currently uses a really basic regexp: /\b\w*[a-zA-Z_-]+\w*\b/g. Aside from being a little too generic, it’s also not Unicode-aware, which is bad.
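
To make the Unicode problem concrete, here is a minimal sketch comparing the regexp quoted above with one possible Unicode-aware variant. The names ASCII_WORD, UNICODE_WORD and findWords are made up for this example, and the variant is an assumption rather than anything autocomplete+ ships: it swaps \w for Unicode property escapes (which need the u flag) and drops \b, because \b only recognizes ASCII word characters.

```javascript
// The regexp quoted in the post: \w and \b only know ASCII word characters.
const ASCII_WORD = /\b\w*[a-zA-Z_-]+\w*\b/g;

// Hypothetical Unicode-aware variant (an assumption, not the plugin's code):
// \p{L} matches any letter and \p{N} any number; both require the "u" flag.
// \b is dropped because it is not Unicode-aware, so greedy matching is what
// delimits each word here.
const UNICODE_WORD = /[\p{L}\p{N}_]*[\p{L}_-]+[\p{L}\p{N}_]*/gu;

// Return every match of a global regexp in text, or an empty array.
function findWords(text, wordRegExp) {
  return text.match(wordRegExp) || [];
}

const sample = "naïve text-align";
console.log(findWords(sample, ASCII_WORD));   // "naïve" is split at the ï
console.log(findWords(sample, UNICODE_WORD)); // "naïve" stays in one piece
```

Without \b, the variant will also pick up a leading hyphen (for example -foo), so it is not a drop-in replacement. It only illustrates where the ASCII assumption bites.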

/cc @nathansobo @ProbablyCorey @kevinsawicki @thedaniel @maxbrunsfeld (by the way, now would be a great time to review https://github.com/atom-community/autocomplete-plus/pull/186 - it will be shipping in the next 24 hours pending no major objections).


#2

What about the idea to just make use of the parsing results from syntax highlighting? If syntax highlighting said this is a function name, then autocomplete could make use of that and offer it for completion.
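
A rough sketch of that idea, assuming the tokenizer yields one array of tokens per line and that each token looks like { value, scopes }. The function name collectSymbols, the sample tokens and the scope names are all illustrative, not taken from any particular grammar:

```javascript
// Collect unique token values whose scopes match one of the wanted prefixes.
// Assumed token shape: { value: string, scopes: string[] }.
function collectSymbols(tokenLines, wantedScopePrefixes) {
  const seen = new Set();
  for (const line of tokenLines) {
    for (const token of line) {
      // A scope matches if it equals a prefix or extends it with ".suffix".
      const wanted = token.scopes.some((scope) =>
        wantedScopePrefixes.some(
          (prefix) => scope === prefix || scope.startsWith(prefix + ".")
        )
      );
      const word = token.value.trim();
      if (wanted && word.length > 0) {
        seen.add(word);
      }
    }
  }
  return [...seen];
}

// Hand-written sample data standing in for real tokenizer output.
const tokenLines = [
  [
    { value: "function", scopes: ["source.js", "storage.type.function.js"] },
    { value: " ", scopes: ["source.js"] },
    { value: "render", scopes: ["source.js", "entity.name.function.js"] },
  ],
  [
    { value: "text-align", scopes: ["source.css", "support.type.property-name.css"] },
    { value: ":", scopes: ["source.css", "punctuation.separator.key-value.css"] },
  ],
];

console.log(
  collectSymbols(tokenLines, ["entity.name.function", "support.type.property-name"])
);
```

Because the grammar has already decided where each token starts and ends, text-align arrives as a single word without any hyphen handling in a regexp. Which scopes are worth offering would still have to be chosen per grammar.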


#3

Interesting thought. I wonder how much noise would have to be filtered out on a grammar-by-grammar basis. I’ll go digging in the syntax highlighting code to figure it out.


#4

@kgrossjo, unfortunately, that may be a bit drastic with large files. The current simple regex can already take a significant amount of time to execute on a large file. Some language syntaxes are so involved that parsing a large file could take 100x as long as extracting the words with the current expression.


#5

Isn’t the (current) file already parsed anyway?

