Can I Use Grammar To Determine A Word Regular Expression?


In implementing autocomplete+ vNext, I’d like to find a way to build a regular expression to identify words in a file in a way that is grammar-aware.

I’ve tried an approach similar to cursor.wordRegExp, but it failed pretty badly because it doesn’t take into account things like the - in a CSS property (e.g. text-align would resolve text and align as words, but not text-align).

It feels like this should be squarely in the wheelhouse of Grammar, but aside from implementing some pretty crazy logic after tokenizing the body of text, I’m not sure what the right solution is.

This would drastically improve the quality of FuzzyProvider in autocomplete+. Currently it uses the really basic regexp /\b\w*[a-zA-Z_-]+\w*\b/g. Aside from being a little too generic, it’s also not Unicode-aware, which is bad.
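To make the Unicode problem concrete, here is a minimal sketch that runs the regexp quoted above next to a Unicode-aware variant. The variant, the helper name findWords, and the sample string are my own assumptions for illustration, not anything that ships in autocomplete+. It relies on the u flag and \p{…} property escapes, so it needs a reasonably recent JavaScript engine.

```javascript
// The regexp quoted above. In JavaScript, \w and \b only know ASCII
// word characters, so any non-ASCII letter splits a word.
const currentWordRegex = /\b\w*[a-zA-Z_-]+\w*\b/g;

// Hypothetical alternative (an assumption, not from autocomplete+):
// any run of letters, digits, "_" or "-" that contains at least one letter.
const unicodeWordRegex = /[\p{L}\p{N}_-]*\p{L}[\p{L}\p{N}_-]*/gu;

// Return every match of `regex` in `text` (empty array when none).
function findWords(text, regex) {
  return text.match(regex) || [];
}

const sample = "text-align: center; naïve";

console.log(findWords(sample, currentWordRegex));
// → [ 'text-align', 'center', 'na', 've' ]

console.log(findWords(sample, unicodeWordRegex));
// → [ 'text-align', 'center', 'naïve' ]
```

Note that the variant is still not grammar-aware: it accepts hyphens inside words in every language, which suits CSS but would also glue a-b together in a language where - is an operator.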

/cc @nathansobo @ProbablyCorey @kevinsawicki @thedaniel @maxbrunsfeld (by the way, now would be a great time to review - it will be shipping in the next 24 hours pending no major objections).


What about the idea to just make use of the parsing results from syntax highlighting? If syntax highlighting said this is a function name, then autocomplete could make use of that and offer it for completion.
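A rough sketch of what that could look like, assuming the tokenizer hands back lines of tokens shaped like { value, scopes } with TextMate-style scope names. The function name collectWordsFromTokens, the scope prefixes, and the sample tokens are all hypothetical and written by hand here; nothing below calls a real Atom API.

```javascript
// Collect completion candidates from already-tokenized lines.
// Assumed token shape: { value: string, scopes: string[] }.
// A token is kept when any of its scopes starts with a wanted prefix.
function collectWordsFromTokens(tokenizedLines, wantedScopePrefixes) {
  const words = new Set(); // de-duplicates while keeping first-seen order
  for (const line of tokenizedLines) {
    for (const token of line) {
      const wanted = token.scopes.some((scope) =>
        wantedScopePrefixes.some((prefix) => scope.startsWith(prefix))
      );
      const value = token.value.trim();
      if (wanted && value.length > 0) {
        words.add(value);
      }
    }
  }
  return [...words];
}

// Hand-written sample data imitating one tokenized CSS line.
const sampleTokens = [
  [
    { value: "text-align", scopes: ["source.css", "support.type.property-name.css"] },
    { value: ":", scopes: ["source.css", "punctuation.separator.key-value.css"] },
    { value: " center", scopes: ["source.css", "support.constant.property-value.css"] },
    { value: ";", scopes: ["source.css", "punctuation.terminator.rule.css"] },
  ],
];

console.log(collectWordsFromTokens(sampleTokens, ["support.type.property-name"]));
// → [ 'text-align' ]
```

The list of wanted prefixes is where the per-grammar tuning would live, since each grammar decides which scopes it assigns to names worth completing.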


Interesting thought. I wonder how much noise would have to be filtered out on a grammar-by-grammar basis. I’ll go digging in the syntax highlighting code to figure it out.


@kgrossjo, unfortunately, that may be a bit drastic with large files. The current simple regex can already take a significant amount of time to execute on a large file, and some language syntaxes are so involved that parsing a large file could take 100x as long as extracting the words with the current expression.


Isn’t the (current) file already parsed anyway?
