Highlighting syntax for various languages in .html files

Hey, guys,

I work on several projects that include EJS or Statamic’s templating within .html files. I have a language plugin for each, but the problem is that the EJS plugin only appears to check for .ejs files, so my .html files with EJS fall back to using Antlers, which is strange since they use different syntax. Anyway, other than manually switching the language every time, is there a better way to tell Atom to use Antlers or EJS based on the actual syntax, not simply the file type?

Thanks in advance!

Is it possible for a regular expression to tell the difference between EJS, Antlers, and vanilla HTML?

I thought of it as: is it possible to tell the difference between {{ and <%? I mean, the grammars can already tell the difference between vanilla HTML and a templating language once triggered and traversing a particular file. My impression is that they’re not being triggered very efficiently.

If you can write a heuristic that can reliably tell the difference between plain HTML, Antlers and EJS, then you could conceivably add some code to your init.coffee that would do this. The edge cases for this kind of thing are pretty hard though. For example, is the following an HTML, Antlers or EJS file?


By the nature of HTML templating languages, it could conceivably be any one of the three. Another example:

    <p><%= (new Date()).toString() %> - Here are some {{ antlers }}</p>

At least we know that this one is either EJS or Antlers.

I also recently wrote about the difficulty of writing a reliable heuristic for determining if a chunk of text was Markdown or not.

People generally don’t like it when computers try to guess what they want and get it wrong. So it is often better to do the simple, dumb thing than to try to do the smart thing and guess wrong even a fraction of the time. This is why Atom checks the file extension and looks at the first line of the file (mostly for shebang lines) to determine the grammar to use.

With that said, I would love it if someone ported Linguist over to Atom for these kinds of things. Even still, it would just use the file extension the vast majority of the time.

I wouldn’t expect a program to be able to determine which grammar to use based on your example, but would this ever occur in the real world? Personally, I would never mix two templating languages in a single file. Ideally, for me, Atom would determine which grammar to use based on some signature syntax.

For example, if the file only contains <> tags, use vanilla HTML; if it contains <> tags and/or <% %> or <%= %> tags, use EJS. Similarly, if the file only contains <> tags, use vanilla HTML; if it contains <> tags and/or {{ }} tags, use Statamic Antlers. Each plugin could add its own condition. To me, this’d be a safer assumption than purely checking the file extension and would solve my issue.
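
To make the signature-syntax idea concrete, here is a rough sketch of that rule as a plain function. Everything in it (the `TemplateGrammar` type, the `detectTemplateGrammar` name, the two regular expressions) is made up for illustration and is not part of Atom or either grammar package. It also adds one case the rule above doesn't cover: a file containing both kinds of delimiter is reported as ambiguous rather than guessed at.

```typescript
// Hypothetical helper: none of these names come from Atom or the grammar packages.
type TemplateGrammar = "html" | "ejs" | "antlers" | "ambiguous";

// Any EJS-style tag: <% ... %>, <%= ... %>, <%- ... %>
const EJS_TAG = /<%[\s\S]*?%>/;
// Any double-curly tag: {{ ... }}
const ANTLERS_TAG = /\{\{[\s\S]*?\}\}/;

function detectTemplateGrammar(text: string): TemplateGrammar {
  const hasEjs = EJS_TAG.test(text);
  const hasAntlers = ANTLERS_TAG.test(text);

  // Both delimiter styles present: refuse to guess.
  if (hasEjs && hasAntlers) return "ambiguous";
  if (hasEjs) return "ejs";
  if (hasAntlers) return "antlers";
  // No template delimiters at all: treat as plain HTML.
  return "html";
}
```

One caveat with this approach: {{ }} is also used by Handlebars, Mustache and a few front-end frameworks, so a match only suggests Antlers. It doesn't prove it.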

That’s just my suggestion, and I don’t know the ins and outs of Atom, but I figured we were past a scenario where all my .html files are marked as Statamic files.

If the editor were mostly used for XML-like languages, this might work. But it’s general-purpose software that has to work equally well for all languages, as part of its design philosophy. At one point, someone decided to match against file extension and the first line of the file, because that catches the vast majority of cases without introducing extra complication.

Now, if you want to build an init.coffee script (or even a full package) that switches for you, that’s perfectly doable as long as a couple of if statements and regular expressions can clearly and reliably tell the difference.
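
As a rough illustration of what such a script might do, here is a sketch written against small stand-in interfaces rather than Atom itself. The interfaces mirror the editor and grammar-registry methods as I understand them (`getPath`, `getText`, `setGrammar`, `grammarForScopeName`), but treat that as an assumption and check the API docs. The two scope names are guesses as well; look up the real ones in each grammar package.

```typescript
// Stand-in types for the few editor/registry methods this sketch needs.
// They are assumptions modelled on Atom's API, not imported from it.
interface Grammar { scopeName: string; }
interface EditorLike {
  getPath(): string | undefined;
  getText(): string;
  setGrammar(grammar: Grammar): void;
}
interface GrammarRegistryLike {
  grammarForScopeName(scopeName: string): Grammar | undefined;
}

// Assumed scope names; the installed packages may use different ones.
const SCOPES = { ejs: "text.html.ejs", antlers: "text.html.antlers" };

// Returns a scope name only when exactly one delimiter style is present.
function pickScope(text: string): string | undefined {
  const hasEjs = /<%[\s\S]*?%>/.test(text);
  const hasAntlers = /\{\{[\s\S]*?\}\}/.test(text);
  if (hasEjs === hasAntlers) return undefined; // neither or both: don't guess
  return hasEjs ? SCOPES.ejs : SCOPES.antlers;
}

// Switches the grammar of an .html editor; returns true if it changed anything.
function applyDetectedGrammar(editor: EditorLike, grammars: GrammarRegistryLike): boolean {
  const path = editor.getPath();
  if (!path || !/\.html$/i.test(path)) return false;

  const scope = pickScope(editor.getText());
  if (!scope) return false;

  const grammar = grammars.grammarForScopeName(scope);
  if (!grammar) return false; // grammar package not installed

  editor.setGrammar(grammar);
  return true;
}
```

In an init script you would presumably call this for each editor as it opens, passing the editor and Atom's grammar registry. Files with no template tags, or with both kinds, are left alone so the normal extension-based choice still applies.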