Issue? with quoted strings in grammars


#1

I am building a language support grammar.

I have an issue with the patterns (sic) for handling quoted strings.

The standard way of handling this is to use something like:

{
    'begin': '\"'
    'end': '\"'
    'contentName': 'string.quoted.double.lo'
    'patterns': [
        { 'include': '#escaped-char' }
    ]
}

The problem with this is that the backslash is not handled properly. You can see this in most grammar packages. Try the string:

"foo\"more"

This will typically not be handled properly.

One solution I have hacked up (hacked being the operative word) looks like:

{
    'begin': '\"'
    'end': '[^\\\\]?\"'
    'name': 'string.quoted.double.lo'
    'patterns': [
        { 'include': '#escaped-char' }
    ]
}

Note the shift from contentName to name. If you don’t do that, the last character in the string will not be marked as part of the string.

N.B.: To my sensibility, this is actually a BUG in the system.


#2

I tried your "foo\"more" example in C, CoffeeScript, Java, JavaScript, and Shell, and each language highlights it like I’d expect. The blue is an escape character.

I’m not sure where you’re getting your original code from, but the patterns used in almost all of the core language packages that deal with strings generally look something like this:

{
  'begin': '"'
  'beginCaptures':
    '0':
      'name': 'punctuation.definition.string.begin.js'
  'end': '"'
  'endCaptures':
    '0':
      'name': 'punctuation.definition.string.end.js'
  'name': 'string.quoted.double.js'
  'patterns': [
    {
      'include': '#string_escapes'
    }
    {
      'match': '[^"]*[^\\n\\r"\\\\]$'
      'name': 'invalid.illegal.string.js'
    }
  ]
}

So yes, your second snippet is pretty close. (Also note that you don’t need to escape the " when using single quotes. '"' and "'" are valid CoffeeScript.)
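
As a rough sketch of why a plain '"' end pattern can still cope with "foo\"more": as far as I understand the matching, the rule that matches at the earliest position wins. An escape rule matches \" starting at the backslash, one character before the end pattern could match the quote, so the escaped quote is consumed as part of the string. The .lo scope names and the simplified escape regex below are placeholders for illustration, not taken from any core package.

```cson
# Sketch only: scope names and the escape regex are illustrative.
{
  'begin': '"'
  'end': '"'
  'name': 'string.quoted.double.lo'
  'patterns': [
    {
      # Matches a backslash plus the following character, so \" is
      # consumed here before the 'end' pattern ever sees the quote.
      'match': '\\\\.'
      'name': 'constant.character.escape.lo'
    }
  ]
}
```

If the escape rule never matches (for example because its include points at a rule that does not exist), the first " after the opening quote ends the string, escaped or not.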


#3

I elided a bunch of context from my original question.

My string notation supports interpolation; for example:

"fred \(a+3) is 4"

is a legal string expression. I wanted to mimic what I was able to achieve with Sublime: have the inner interpolation be highlighted as though it is outside a string.

So far, anything I try in Atom fails to work for some edge case or other. In fact, my ‘solution’ does not work :frowning:

This is what I currently have:

  {
    'begin': '"'
    'end': '[^\\\\]"'
    'name': 'string.quoted.double.lo'
    'patterns': [
      { 'include' : '#escaped-char'}
      {
        'begin':'\\\\\\('
        'end':'\\)'
        'name':'string.interpolated.lo'
        'patterns' : [
          { 'include':'$self' }
        ]
      }
    ]
  }

where the escaped char looks like:

  {
    'repository' : [
      'escaped-char' :
        'name' : 'constant.character.escape'
        'matches' : '\\\\([^u\(]|u[0-9a-zA-F]+;)'
    ]
  }

This does not work for empty strings, or for the case where the interpolation is at the end of the string:

"fred = \(four)"

I think the issue is that the ‘end’ pattern is tried too early. If you just have

'begin': '"'
'end': '"'

then " terminates the string.

Help would be appreciated :slight_smile:


#4

Fixed!

My repository was wrong.
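
I did not post the corrected version at the time, so the following is only a sketch of what a working repository might look like, not the exact fix. It assumes the repository is a keyed object rather than a list and that the key is 'match' rather than 'matches', as in the core language packages. The 'interpolation' and 'string' rule names, the meta.interpolation.lo scope, and the a-fA-F hex range are guesses for illustration.

```cson
# Sketch only: rule names and scopes below are hypothetical.
'repository':
  'escaped-char':
    'name': 'constant.character.escape.lo'
    # A backslash followed by any character except u or (,
    # or by u, hex digits, and a closing semicolon.
    'match': '\\\\([^u(]|u[0-9a-fA-F]+;)'
  'interpolation':
    # Starts at \( and runs to the matching ).
    'begin': '\\\\\\('
    'end': '\\)'
    'name': 'meta.interpolation.lo'
    'patterns': [
      { 'include': '$self' }
    ]
  'string':
    'begin': '"'
    # Plain quote: escapes are consumed by the rules below first.
    'end': '"'
    'name': 'string.quoted.double.lo'
    'patterns': [
      { 'include': '#escaped-char' }
      { 'include': '#interpolation' }
    ]
```

With the escape rule actually matching, the end pattern can go back to a plain quote, so both "" and "fred = \(four)" should close at the final quote.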

I knew there was a reason I did not like untyped languages.