Access to tokenized contents of TextEditor


Hi everybody,

What I would like to do is create a plugin (package) that adds autocomplete functionality to package.json files. I started out with autocomplete-plus, which adds great features out of the box, but there is one thing I can’t seem to figure out: how do I get access to the tokenized contents of a file? As far as I understand, something must already have done the tokenization, since there is syntax highlighting, and in the autocomplete-plus plugin I get access to the token names before the cursor. Re-parsing the file therefore seems wasteful and unnecessary.

So what would be a good way to do this? I have access to a TextEditor instance, and I can also get the appropriate Grammar from the editor’s registry.
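
For context, here is a minimal sketch of the kind of autocomplete-plus provider described above. It assumes the provider shape documented by autocomplete-plus (a `selector` plus a `getSuggestions` function that receives `editor` and `prefix`); the `TOP_LEVEL_KEYS` list is a hypothetical, incomplete sample, and the provider only looks at the typed prefix rather than at any tokens.

```javascript
const path = require('path');

// Hypothetical, non-exhaustive sample of top-level package.json fields.
const TOP_LEVEL_KEYS = [
  'name', 'version', 'description', 'main',
  'scripts', 'dependencies', 'devDependencies',
];

const provider = {
  // Only offer suggestions in buffers that use the JSON grammar.
  selector: '.source.json',

  getSuggestions({ editor, prefix }) {
    // Restrict the provider to files that are actually named package.json.
    const filePath = editor.getPath();
    if (!filePath || path.basename(filePath) !== 'package.json') return [];

    // Plain prefix match; no knowledge of where in the document the cursor is.
    return TOP_LEVEL_KEYS
      .filter((key) => key.startsWith(prefix))
      .map((key) => ({ text: key, type: 'property' }));
  },
};
```

A provider like this cannot tell whether the cursor is at the top level or inside "dependencies", which is exactly why access to the tokens would help.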



There isn’t any supported method of doing what you’re asking in the Atom API. The parts of Atom that represent the tokenized contents of the file are undocumented and subject to change at any time without notice.

With that said, there are examples here on the board of people doing the kinds of things you’re talking about. You may want to search for dynamic grammar.


Cool, thanks, I’ll have a look at the dynamic grammars then!


You may also want to look at the symbol provider of autocomplete-plus.

And this part of the symbol store looks similar to what you are asking for.

And with the following warning for completeness :stuck_out_tongue:


Wow, thanks, this looks very good for a first experiment! :slightly_smiling:


Just a quick follow-up: since editor.displayBuffer.tokenizedBuffer.tokenizedLines contained the tokenized form of the buffered text, and not of the text returned by TextEditor#getText(), I decided to use an external library called tokenizer2 (available on npm). This is FAR from optimal, because the file gets tokenized every time the proposals show up, on top of the (most likely much more efficient) default Atom tokenization that is required for syntax highlighting, etc. I have looked at other packages, like atom-autocomplete-xml (which uses regexes to get the necessary info) and autocomplete-css (which only works with scope descriptors), but none of them rely on extracting semantic information about the text that would need tokenization.

I got the package into a working state. If you take a look at this, there is a gif demonstrating what I’m doing:

So is there really no stable way to access the “fresh” token array? I’m OK with anything (like observing stuff, or a callback), but reparsing the file looks really wasteful, since it will be done anyways by Atom.
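
To make the kind of semantic information I mean concrete, here is a rough sketch, not what the package actually does: a lightweight scan of the text before the cursor that reports which object keys enclose it. The function name `enclosingKeys` and the character-offset argument are made up for illustration, and the scan assumes reasonably well-formed JSON up to the cursor.

```javascript
// Returns the keys of the objects that enclose the cursor, outermost first.
// `offset` is the number of characters before the cursor.
function enclosingKeys(text, offset) {
  const stack = [];      // one entry per open '{' or '[': the key it belongs to, or null
  let lastString = null; // most recently completed string literal (escapes left raw)
  let pendingKey = null; // string literal that was followed by ':'
  let i = 0;

  while (i < offset && i < text.length) {
    const ch = text[i];

    if (ch === '"') {
      // Find the closing quote, skipping backslash-escaped characters.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === '\\' ? 2 : 1;
      }
      // Closing quote at or after the cursor (or missing): the cursor is inside this string.
      if (j >= offset) break;
      lastString = text.slice(i + 1, j);
      i = j + 1;
      continue;
    }

    if (ch === ':') {
      pendingKey = lastString;
    } else if (ch === '{' || ch === '[') {
      stack.push(pendingKey);
      pendingKey = null;
    } else if (ch === '}' || ch === ']') {
      stack.pop();
      pendingKey = null;
    } else if (ch === ',') {
      pendingKey = null;
    }
    i++;
  }

  // Drop containers without a key (the root object, array elements).
  return stack.filter((key) => key !== null);
}
```

With something like this, a cursor inside the "dependencies" object yields `['dependencies']`, so the proposals could be package names instead of top-level fields. It still rescans the text on every request, which is the part I would like to avoid.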


I’m not sure what you are trying to do.
If you just want to suggest entries from what already has been typed, symbol provider is already doing this.

If you want to inspect the code in a more specific way, pretty much all the autocomplete-* packages do that.
Some, like python and go, use an external tool that monitors files; others, like autocomplete-html, monitor the current buffer.
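
As a rough illustration of the "monitor the current buffer" approach, the sketch below caches the result of whatever tokenizer you use and only marks it stale when the buffer changes, so suggestions requested between edits reuse the previous result. It assumes `TextEditor#onDidChange` and `TextEditor#getText` from the public API; the `TokenCache` class and the `tokenize` argument are hypothetical names, not part of Atom or of any existing package.

```javascript
class TokenCache {
  // `tokenize` is any function from the buffer text to a token list (assumed, user-supplied).
  constructor(editor, tokenize) {
    this.editor = editor;
    this.tokenize = tokenize;
    this.tokens = null; // null means the cache is stale

    // The callback runs synchronously on every change, so it only flips a flag;
    // the expensive work is deferred until somebody asks for the tokens.
    this.subscription = editor.onDidChange(() => {
      this.tokens = null;
    });
  }

  // Re-tokenizes at most once per change, however often suggestions are requested.
  get() {
    if (this.tokens === null) {
      this.tokens = this.tokenize(this.editor.getText());
    }
    return this.tokens;
  }

  // Stop listening, e.g. when the editor is destroyed or the package is deactivated.
  dispose() {
    this.subscription.dispose();
  }
}
```

A provider would keep one cache per editor and call `get()` from `getSuggestions`. This does not give access to Atom's own tokens, but it limits the duplicate work to one pass per edit.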