Efficient way to sequentially redecorate entire buffer


#1

I’m attempting to highlight DNA codons, which are groups of three letters (A, G, C, or T) aligned from the beginning of the sequence. For example:

  • raw sequence: ACTAGACAT...
  • split by codon: ACT|AGA|CAT|...
  • insert G at beginning: GAC|TAG|ACA|T... (note the shift here)
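The shift in the example above is easy to reproduce, because splitting a sequence into codons is just slicing every three letters from the start. This is a minimal Python sketch; the helper name split_codons is hypothetical.

```python
def split_codons(seq: str) -> list[str]:
    """Split a sequence into codons of three letters, aligned from index 0.

    A trailing group of fewer than three letters is kept as-is.
    """
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


raw = "ACTAGACAT"
print("|".join(split_codons(raw)))        # ACT|AGA|CAT
print("|".join(split_codons("G" + raw)))  # GAC|TAG|ACA|T
```

Inserting a single letter at the front changes every codon after it, which is why the highlighting has to be recomputed.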

Since it’s important to make sure codons are lined up correctly, I’d like to highlight them in alternating colors (likely shades of gray), depending on whether they’re an “odd” or “even” codon (first odd, second even, and so on). My first idea was an onDidStopChanging callback that would destroy all markers, re-mark the entire buffer, and then redecorate the markers sequentially. For buffers of any real size, that would obviously be slow and not make for an enjoyable editing experience.
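For reference, these are the ranges a full re-marking pass would have to recompute on every change. This is a minimal Python sketch, not Atom’s marker API: CodonRange and codon_ranges are hypothetical names, offsets are character offsets into one sequence with no line breaks, and, following the numbering above, the first codon is “odd”. A trailing group of fewer than three letters is skipped because it is not a complete codon.

```python
from typing import NamedTuple


class CodonRange(NamedTuple):
    start: int   # offset of the first letter (inclusive)
    end: int     # offset just past the last letter (exclusive)
    parity: str  # "odd" for the 1st, 3rd, 5th... codon, "even" otherwise


def codon_ranges(seq: str) -> list[CodonRange]:
    """Return one range per complete codon, alternating odd/even from the start."""
    ranges = []
    complete = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for index, start in enumerate(range(0, complete, 3)):
        parity = "odd" if index % 2 == 0 else "even"
        ranges.append(CodonRange(start, start + 3, parity))
    return ranges


for r in codon_ranges("GACTAGACAT"):
    print(r.start, r.end, r.parity)
# 0 3 odd
# 3 6 even
# 6 9 odd
```

In a package, each range would become a marker with a decoration class chosen by parity. The pass walks the whole sequence, so its cost grows with buffer size, which is the slowdown described above.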

My second idea was to write some sort of grammar for this, but I’m not certain a regex can do this exact task: working out whether a given group of three letters is an “even” or “odd” codon and highlighting it accordingly (regexes are powerful, but I’m not sure where I’d start).

Is there an easier way to accomplish this with markers? Since the codon boundaries shift whenever text is inserted, I don’t think the same markers can be kept, but I’d really like to know a more performant approach.
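How much actually shifts can be checked directly: codons that end before the edit point keep their text, while everything from the edited codon onward can change. This minimal Python sketch finds the first codon whose text differs between two versions of a sequence; both helper names are hypothetical.

```python
def split_codons(seq: str) -> list[str]:
    """Split a sequence into three-letter codons aligned from index 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def first_changed_codon(before: str, after: str) -> int:
    """Index of the first codon whose text differs between two versions."""
    old, new = split_codons(before), split_codons(after)
    for index, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return index
    # No difference in the shared prefix: the first extra codon (if any) differs.
    return min(len(old), len(new))


# Insert "G" at offset 4: ACT|AGA|CAT becomes ACT|AGG|ACA|T.
print(first_changed_codon("ACTAGACAT", "ACTAGGACAT"))  # 1
```

A marker-based version could therefore keep the markers before the edited codon and re-mark only the tail, although an edit near the start of a long sequence would still touch almost every marker.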


#2

Just noodling about this, I’m pretty sure a grammar can do it. Essentially, a line consists of codon pairs, possibly followed by one leftover codon and/or one or two spare letters at the end. Each codon pair consists of an odd and an even codon. And it just so happens that the definitions of odd and even codons are identical … they’re just colored differently.
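The line structure described above is just arithmetic on the line length. A minimal Python sketch, assuming the line contains only A, C, G and T (the function name is hypothetical):

```python
def line_structure(line: str) -> tuple[int, int, int]:
    """Split a line's length into (codon pairs, leftover codon, spare letters).

    Assumes the line contains only the letters A, C, G and T.
    """
    pairs, rest = divmod(len(line), 6)  # each pair is an odd + an even codon
    single, spare = divmod(rest, 3)     # at most one codon is left over
    return pairs, single, spare


print(line_structure("GACTAGACAT"))  # (1, 1, 1): GACTAG | ACA | T
```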

I haven’t tried creating an Atom grammar from scratch, only modifying existing ones … but I’m pretty sure a grammar would be the most performant solution. Using markers and decorations means you’d essentially be replicating what the grammar engine already does, and probably not as well.


#3

You should definitely be able to accomplish this with a grammar. I created a quick proof of concept here: https://github.com/postcasio/language-dna-codons

I created one pattern that matches two codons and gives them the even/odd scopes, then another pattern that matches only a single codon, which is always scoped as even.

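```coffee
'scopeName': 'source.dna-codons'
'name': 'DNA Codons Grammar'
'fileTypes': [
  'dna'
]
'patterns': [
  # Two codons at a time: the first of each pair is scoped
  # codon.even, the second codon.odd.
  {
    'match': '([ACGT]{3})([ACGT]{3})'
    'captures':
      '1':
        'name': 'codon.even'
      '2':
        'name': 'codon.odd'
  }
  # A single codon left over after the last pair. It is the first
  # codon of an incomplete pair, so it is also codon.even.
  {
    'match': '([ACGT]{3})'
    'name': 'codon.even'
  }
]
```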
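To see how the two patterns divide up a line, here is a rough Python emulation. It is a sketch under assumptions, not Atom’s tokenizer: joining the patterns into one alternation mimics the rule that the earliest match wins and ties go to the pattern listed first, and it works on one line at a time, as match rules do. The names GRAMMAR, SCOPES and tokenize are hypothetical.

```python
import re

# The two grammar patterns joined into one alternation. At each position
# Python tries the alternatives left to right, so the pair pattern wins ties.
GRAMMAR = re.compile(r"([ACGT]{3})([ACGT]{3})|([ACGT]{3})")

# Scope for each capture group: groups 1 and 2 come from the pair pattern,
# group 3 from the single-codon pattern.
SCOPES = {1: "codon.even", 2: "codon.odd", 3: "codon.even"}


def tokenize(line: str) -> list[tuple[str, str]]:
    """Return (codon, scope) for every codon the grammar would scope on one line."""
    tokens = []
    for match in GRAMMAR.finditer(line):
        for group, scope in SCOPES.items():
            if match.group(group) is not None:
                tokens.append((match.group(group), scope))
    return tokens


print(tokenize("GACTAGACAT"))
# [('GAC', 'codon.even'), ('TAG', 'codon.odd'), ('ACA', 'codon.even')]
```

The trailing T is left unscoped. Because matching restarts on every line, the even/odd alternation also restarts at the start of each line.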

Here it is in action: [screenshot]


#4

Wow, thank you so much! I’m not that familiar with regex, so I didn’t realize you could do that groups-of-three jazz. I’ll be playing with it a little, although it looks very complete as-is. Does the second pattern match a trailing codon that isn’t caught by the first pattern?


#5

Yep, exactly. The first pattern matches as many codon pairs as possible, and if a single codon is left over after them, the second pattern matches it instead.


#6

Awesome! I’m doing this as part of a much bigger package for DNA editing, so being able to drop this in as a grammar, without any special logic, is really helpful.


#7

I’ve also just realised you don’t even need the second pattern if you make the second codon optional:

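```coffee
'scopeName': 'source.dna-codons'
'name': 'DNA Codons Grammar'
'fileTypes': [
  'dna'
]
'patterns': [
  # One pattern is enough: the trailing "?" makes the second codon
  # optional, so a leftover single codon is still matched as codon.even.
  {
    'match': '([ACGT]{3})([ACGT]{3})?'
    'captures':
      '1':
        'name': 'codon.even'
      '2':
        'name': 'codon.odd'
  }
]
```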
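One way to convince yourself the two grammars agree is to emulate both in Python and compare their output line by line. The usual caveats apply: this is a sketch, not Atom’s tokenizer, the sample lines are made up, and tokenize is a hypothetical helper. Because ? is greedy, the single pattern takes a second codon whenever one is available, which is exactly when the pair pattern would have won.

```python
import re

# Original grammar: pair pattern first, single-codon pattern second.
TWO_PATTERNS = re.compile(r"([ACGT]{3})([ACGT]{3})|([ACGT]{3})")
TWO_SCOPES = {1: "codon.even", 2: "codon.odd", 3: "codon.even"}

# Revised grammar: one pattern with an optional second codon.
ONE_PATTERN = re.compile(r"([ACGT]{3})([ACGT]{3})?")
ONE_SCOPES = {1: "codon.even", 2: "codon.odd"}


def tokenize(pattern: re.Pattern, scopes: dict[int, str], line: str) -> list[tuple[str, str]]:
    """Return (codon, scope) for every captured codon on one line."""
    tokens = []
    for match in pattern.finditer(line):
        for group, scope in scopes.items():
            if match.group(group) is not None:
                tokens.append((match.group(group), scope))
    return tokens


# Both grammars scope every sample line identically.
for line in ["ACTAGACAT", "GACTAGACAT", "ACTAGA", "AC", "ACGNTTTGCA"]:
    assert tokenize(ONE_PATTERN, ONE_SCOPES, line) == tokenize(TWO_PATTERNS, TWO_SCOPES, line)
```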