Tokenizing process


#1

From a language package installed in Atom, I’m trying to find the pattern object(s) that correspond to a specific scope.

Looking into some language grammars, I can tell that there isn’t a standard way for a language to define patterns, so I assumed that these rules are parsed while tokenizing the text buffer. I found some classes that deal with tokens and first-mate, but I can’t find where the actual parsing takes place.

Could somebody explain (or point me to docs on) the workflow that the text buffer goes through to tokenize a language?

Thanks.


#2

The tokenization process is not documented and isn’t a supported feature in the Atom API. Any explanation of the process will become invalid sooner or later. With that said, you can start here to do some code spelunking and see how it is currently implemented:


#3

I’m more interested in how a language package gets decoded and stored than in the entire tokenizing process.

Now I have a more specific question:

In the grammar.registry, why does idsByScope point to negative indexes, and what’s the easiest way I can reach those scope patterns?
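
To make "reaching the scope patterns" concrete: one way that does not depend on Atom’s internals is to walk the grammar definition itself. Below is a minimal sketch in Python, assuming the package’s grammar file has already been loaded into a plain dict (for example from a JSON grammar, or a CSON one converted to JSON) and follows the usual TextMate-style layout with `patterns`, `repository`, and capture maps. The names `iter_rules`, `find_rules_for_scope`, and `sample_grammar` are hypothetical and are not part of Atom or first-mate.

```python
from typing import Any, Dict, Iterator, List, Tuple

Rule = Dict[str, Any]

# Keys whose values map capture numbers to small rule objects
# that may themselves carry a scope name.
CAPTURE_KEYS = ("captures", "beginCaptures", "endCaptures")


def iter_rules(node: Rule, path: str) -> Iterator[Tuple[str, Rule]]:
    """Yield (path, rule) for a rule and for every rule nested inside it."""
    yield path, node
    for index, child in enumerate(node.get("patterns", [])):
        yield from iter_rules(child, f"{path}.patterns[{index}]")
    for key in CAPTURE_KEYS:
        for number, capture in node.get(key, {}).items():
            yield from iter_rules(capture, f"{path}.{key}[{number}]")
    for name, entry in node.get("repository", {}).items():
        yield from iter_rules(entry, f"{path}.repository.{name}")


def find_rules_for_scope(grammar: Rule, scope: str) -> List[Tuple[str, Rule]]:
    """Return every rule whose `name` or `contentName` equals `scope` exactly."""
    return [
        (path, rule)
        for path, rule in iter_rules(grammar, "grammar")
        if scope in (rule.get("name"), rule.get("contentName"))
    ]


# Made-up grammar used only to exercise the helpers above.
sample_grammar: Rule = {
    "scopeName": "source.example",
    "patterns": [
        {"include": "#strings"},
        {"match": "\\b(if|else)\\b", "name": "keyword.control.example"},
    ],
    "repository": {
        "strings": {
            "begin": "\"",
            "end": "\"",
            "name": "string.quoted.double.example",
            "beginCaptures": {
                "0": {"name": "punctuation.definition.string.begin.example"}
            },
        }
    },
}

for path, rule in find_rules_for_scope(sample_grammar, "string.quoted.double.example"):
    print(path, rule)
```

This only does exact matches on the scope string and does not resolve `include` references or `$1`-style substitutions in names, so it is a starting point for exploring a grammar file rather than a description of how Atom stores or looks up scopes.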


#4

The tokenization process is not documented and isn’t a supported feature in the Atom API. To be honest, I don’t know the answers to your questions and I’m not really comfortable giving a guided tour to implementation details that we don’t want people depending on. You may want to look at the work some others have done in working with the low-level grammar engine. This topic has links to most if not all of them: