Tree-sitter: Look ahead and only parse special strings


Hello there! :slight_smile:

I am preparing to create an Atom package that highlights Adobe ZStrings in Lua files for Adobe Lightroom plugins. That’s a very niche thing that probably no other Atom user will ever need, but I wanted a small project for getting familiar with Atom package development anyway.

Here is an example of an Adobe ZString. It always starts with "$$$/.

I already managed to create a working syntax highlighter for ZStrings using Atom’s old TextMate-like parser, and now I am struggling to carry that success over to tree-sitter. I know this is a better fit for the TextMate parser anyway, but I want to do it with tree-sitter for the sake of learning and modernity.

This is my current tree-sitter grammar:

module.exports = grammar({
  name: "zstring",
  rules: {
    source: $ => repeat(choice(
      prec.left(2, $._junk),
      prec.left(1, $.zString)
    zString: $ => seq(
       optional(seq($.zStringRoot, $.zStringSeparator)),
       repeat(seq($.zStringFolder, $.zStringSeparator)),
    zStringStart: $ => seq(
    zStringEnd: $ => $.zStringQuote,
    zStringQuote: $ => '"',
    zStringPrefix: $ => "$$$",
    zStringSeparator: $ => "/",
    zStringRoot: $ => prec(1, /[A-Za-z0-9]+/),
    zStringFolder: $ => /[A-Za-z0-9]+/,
    zStringKey: $ => /[A-Za-z0-9]+/,
    zStringEquals: $ => /[\s]*=[\s]*/,
    zStringDefault: $ => repeat1(choice(
      token.immediate(prec(1, /[^"\\\n]+/)),
    stringEscape: $ => token.immediate(seq( // Based on
    _junk: $ => /./

This is my testing file:

abc "$$$" def

hupe9hupe9rhp "$$$/LightroomPluginName/Meta/PluginName=Plugin Title" 8hpu

9ohj9iof gz8uo

This is the parsing output using tree-sitter parse from tree-sitter-cli:

I have already tried dozens of variations, and I can’t stop tree-sitter from assuming that the first string is a ZString too. I want tree-sitter to treat the first string as $._junk and just move on, but it currently tries to interpret the string using the $.zString rule. So how can I tell tree-sitter to look ahead and check whether the whole string, up to its closing quote, is a ZString, and otherwise reset the parser position, treat the string as $._junk, and continue parsing? I had some success playing around with token and token.immediate, but I can’t use those functions, because every part of my $.zString rule is another named rule.


Can you provide more examples, and say how each one should be parsed, please? I’m not familiar with this language. Also,

  • Does "$$$/ always identify the start of a zString?
  • Is boringString an actual thing, or an attempt to fix the problem?
  • Must " characters be balanced?


Good morning, Aerijo (well, in case we have similar timezones)!

  • Yes, source contents that don’t begin with "$$$/ can be safely treated as $._junk.

This is the general ZString format:

"$$$/Root/Folder/Key=Default Value"


"$$$/MyExamplePlugin/UserInterface/OptionsWindow/Button/Apply=Apply changes"
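To make the format above concrete, here is a minimal sketch in plain JavaScript (not tree-sitter) that splits a ZString into its parts. The names ZSTRING_PATTERN and parseZString are hypothetical. The sketch assumes that path segments contain only the characters [A-Za-z0-9], as in the grammar rules, and that the default value runs up to the next double quote. Escape sequences inside the default value are not handled.

```javascript
// Hypothetical helper for illustration; not part of any Adobe or tree-sitter API.
// Assumption: path segments are [A-Za-z0-9]+ and the default value
// contains no double quote or newline (escapes are ignored here).
const ZSTRING_PATTERN =
  /^"\$\$\$\/((?:[A-Za-z0-9]+\/)*)([A-Za-z0-9]+)\s*=\s*([^"\n]*)"$/;

function parseZString(text) {
  const match = ZSTRING_PATTERN.exec(text);
  if (match === null) {
    return null; // not a ZString in the assumed format
  }
  const [, path, key, defaultValue] = match;
  // "Root/Folder/" -> ["Root", "Folder"]; the first segment is treated as the root.
  const segments = path.split("/").filter(Boolean);
  return {
    root: segments.length > 0 ? segments[0] : null,
    folders: segments.slice(1),
    key,
    defaultValue,
  };
}

console.log(
  parseZString(
    '"$$$/MyExamplePlugin/UserInterface/OptionsWindow/Button/Apply=Apply changes"'
  )
);
console.log(parseZString('"$$$"')); // null, because the path and key are missing
```

For the longer example, the root is MyExamplePlugin, the folders are UserInterface, OptionsWindow and Button, the key is Apply, and the default value is "Apply changes".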

  • In one of my earlier versions (repo for context) I wanted to find all strings and categorize them as either $.zString or $.boringString. That approach was stricter and more Lua-aware, but I discarded $.string and $.boringString once I understood that a dumber parser is enough for a simple case like mine (simply highlighting ZStrings in my Atom IDE).

  • I want the parser to be forgiving and not care too much about valid Lua, so something like this should parse without problems.

    <<< "$$$/i-am-just-here-to-confuse-you"$$$/R/A=B" >>>

Thanks for your quick response! I will push the latest state to the mentioned repository. :slight_smile:

  • In this test you have " and $ in the path text, but in the definitions you declare only word characters are valid. Which should it be?

  • In your example, it seems to have 3 sections (i-am..., R, A=B), but you’ve declared only 2 should be expected.

  • I tried it myself, but it wouldn’t accept it as a key when in the final segment. I’m assuming the final segment is the only one that can contain a =, so I made that a rule in an external scanner.

Also, I’d try changing _junk to choice(/[^"]+/, /"/) .
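To show what that _junk change is meant to achieve, here is a rough simulation in plain JavaScript. It is not tree-sitter's generated lexer. It only mimics the idea that, at each position, the longest matching token wins, so a quote that does not start a complete ZString falls back to junk. The function name tokenize and the constant names are hypothetical, and the ZString pattern is a simplification that treats the whole ZString as a single token, assumes word-character path segments, and ignores escapes.

```javascript
// Sketch only: a longest-match tokenizer built from sticky regexes.
// Assumption: the whole ZString is one token; the real grammar uses named sub-rules.
const ZSTRING_TOKEN =
  /"\$\$\$\/(?:[A-Za-z0-9]+\/)*[A-Za-z0-9]+\s*=\s*[^"\n]*"/y;
const JUNK_RUN = /[^"]+/y; // first alternative of the suggested _junk
const JUNK_QUOTE = /"/y;   // second alternative: a lone double quote

const CANDIDATES = [
  ["zString", ZSTRING_TOKEN],
  ["junk", JUNK_RUN],
  ["junk", JUNK_QUOTE],
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let best = null;
    for (const [type, pattern] of CANDIDATES) {
      pattern.lastIndex = pos; // sticky flag: match only at this position
      const m = pattern.exec(source);
      if (m !== null && (best === null || m[0].length > best.text.length)) {
        best = { type, text: m[0] };
      }
    }
    // One of the two junk patterns always matches, so best is never null here.
    tokens.push(best);
    pos += best.text.length;
  }
  return tokens;
}

console.log(tokenize('abc "$$$" def'));
console.log(
  tokenize('hupe9hupe9rhp "$$$/LightroomPluginName/Meta/PluginName=Plugin Title" 8hpu')
);
```

With the two lines from the testing file, the first yields only junk tokens, while the second yields a junk run, one zString token, and a trailing junk run.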


Wow, thank you so much for your help! I love seeing how committed you are to the tree-sitter project, considering that it gets far too little attention. I mean, even big players like Facebook profit from the Atom infrastructure, while its new parsing engine doesn’t even get enough love to have complete documentation.

I already merged your pull request (Thank you! :astonished:) into the repository and I will answer there regarding the “3 quotes” test.