Can Atom interpret my #!/bin/bash shebang?


#1

Oftentimes, files are named at liberty, and with complete disregard for any “extension” conventions.

I now have a bash script which is curiously named 00_generic_test.sh.template, and in context, it is a decent name for its purpose. Importantly, it’s version-controlled and referred to a lot from other code, so renaming the script is not worthwhile.

The pain with Atom is that it thinks this file is JSON, despite the proper shebang line and executable permissions on the file.

… This actually pounces on me every time I open the file, throwing a bunch of “invalid syntax” markers at me, highlighted, of course, with the bright red background of the “One Dark” theme.


Yes, I found this petty “customization” point (??), but basically all it can do is add even more misdirected [file extension → language] mappings. Can an editor do better than this?

If I were to write an Atom package which, let’s settle for simplicity, would detect the #!/bin/bash shebang line and set the buffer language accordingly, which Atom APIs could I rely on to accomplish that?

Is atom.workspace.eachEditor() going to cut it, performance-wise? What if many more content signatures get added?

Are there any previous efforts I didn’t find (I didn’t find any) in the direction of, roughly, bringing GitHub’s linguist into Atom?


#2

Atom can recognize shebangs, yes, and the core package language-shellscript will be active for files where it recognizes the language based on the shebang. It sounds like the file extension rule overrides the first-line rule when determining what Atom should guess, though.

You can get a list of available grammars from the atom.grammars global. So you would watch with atom.workspace.observeTextEditors(), check that editor.lineTextForScreenRow(0) matches a shebang, select the grammar you want via atom.grammars.grammarForScopeName('source.shell'), and pass it to editor.setGrammar().


#3

This is to start the conversation… not to give a solution.

Hi.

When defining a language / grammar, there is an attribute that can be set, called firstLineMatch.
It is defined as follows (quoted from a template by @DamnedScholar):

A regular expression that is matched against the first line of the document when Atom is trying to decide if the grammar is appropriate. Useful for shell scripts, mostly.

Executing atom.grammars.grammarForScopeName("source.cpp").firstLineRegex from Atom’s console reveals an object which includes: (?i)-\*-[^*]*(Mode:\s*)?C\+\+(\s*;.*?)?\s*-\*

That makes me believe that the kind of action you seek is possible. More might be possible via the grammar API: https://atom.io/docs/api/v1.19.5/GrammarRegistry
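To make the firstLineMatch idea concrete, here is a minimal sketch in plain JavaScript that runs outside Atom. The pattern body is the one quoted above; the helper name looksLikeCppModeline is made up for illustration. JavaScript regular expressions have no inline (?i) group, so the case-insensitivity is expressed with the i flag instead.

```javascript
// Pattern body copied from the console output quoted above.
// The leading (?i) is replaced by the JavaScript `i` flag.
const cppFirstLine = /-\*-[^*]*(Mode:\s*)?C\+\+(\s*;.*?)?\s*-\*/i;

// Hypothetical helper: does this first line carry an Emacs-style C++ modeline?
function looksLikeCppModeline(firstLine) {
  return cppFirstLine.test(firstLine);
}

console.log(looksLikeCppModeline("// -*- Mode: C++ -*-")); // true
console.log(looksLikeCppModeline("// -*- c++ -*-"));       // true
console.log(looksLikeCppModeline("#!/bin/bash"));          // false
```

A shell grammar would carry a comparable pattern aimed at shebang lines rather than modelines.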

I do not know, though, how to use this information to keep JSON from being selected.
atom.grammars.grammarForScopeName("source.json") -> null

Regards.


#4

And this is the step I didn’t think of that you would do if you wanted to support a wide array of shebangs, if you’re the sort of person who uses multiple shells and Perl all the time.
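For anyone who does juggle several interpreters, a lookup table is probably easier to maintain than one regex per language. The following is a rough sketch in plain JavaScript; the function name scopeForShebang and the table contents are assumptions for illustration, and the scope names should be checked against the grammars actually installed.

```javascript
// Hypothetical table: interpreter name -> grammar scope name.
// Verify the scope names against your installed language packages.
const SCOPE_BY_INTERPRETER = new Map([
  ["sh", "source.shell"],
  ["bash", "source.shell"],
  ["zsh", "source.shell"],
  ["perl", "source.perl"],
  ["python", "source.python"],
  ["python3", "source.python"],
  ["ruby", "source.ruby"],
  ["node", "source.js"],
]);

// Returns a scope name for a shebang line, or null if nothing is recognised.
function scopeForShebang(firstLine) {
  const match = /^#!\s*(\S+)(?:\s+(\S+))?/.exec(firstLine || "");
  if (!match) return null;
  let interpreter = match[1].split("/").pop();
  // "#!/usr/bin/env bash": the real interpreter is the first argument.
  if (interpreter === "env" && match[2]) interpreter = match[2];
  return SCOPE_BY_INTERPRETER.get(interpreter) || null;
}

console.log(scopeForShebang("#!/bin/bash"));         // source.shell
console.log(scopeForShebang("#!/usr/bin/env perl")); // source.perl
console.log(scopeForShebang('{ "key": 1 }'));        // null
```

Inside Atom, the returned scope name would then go through atom.grammars.grammarForScopeName() and editor.setGrammar(), as described in #2.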


#5

Please have a look if there is anything that looks helpful here:

@Wliu: you would be the right person to tag into this conversation … right?


#6

…and I would imagine that this call will play a role somewhere:
atom.grammars.getGrammarScore()


#7

Currently, file extensions take priority over firstline matches such as shebangs (Ingramz knows more than me about that, but it doesn’t look like he has an account on Discuss). I’m guessing that if you had an observeTextEditors hook and checked if the text on line one (https://atom.io/docs/api/v1.19.6/TextEditor#instance-getTextInBufferRange) matched the shebang, you could then set the grammar.


#8

@Wliu: I have followed your directions (thank you!) to test it out.
I have come up with some code, but…
a) There is a complication in setting the grammar… an object reference is needed.
b) There is a problem (see below) - my suggestion does not work all the time.

@ulidtko: Try to add the following in init.coffee ->

atom.workspace.observeTextEditors (_editor) ->
  _firstLine = _editor.lineTextForBufferRow(0) # read 1st line
  # console.log _firstLine
  if (_firstLine.search(/#!.*bin.*bash/) > -1) # search for bang line
    # console.log "This is BASH!"
    _editor.setGrammar atom.grammars.grammarForScopeName('source.AWL')

NOTE 1 -

Replace the source.AWL with whatever language / grammar represents the BASH.

NOTE 2 -

This will not work for a file that is reopened when Atom restarts; it works only for a file that is opened while Atom is already running. On restart, the observeTextEditors callback fires long before the grammar is loaded. I have no idea how to counter this.

UPDATE 1 -

A slight change in the code:

atom.workspace.observeActiveTextEditor (_editor) ->
  return unless _editor? # no active text editor (e.g. settings tab in focus)
  _firstLine = _editor.lineTextForBufferRow(0) # read 1st line
  # console.log _firstLine
  if (_firstLine.search(/#!.*bin.*bash/) > -1) # search for bang line
    # console.log "This is BASH!"
    _editor.setGrammar atom.grammars.grammarForScopeName('source.AWL')

NOTE 3 -

Changing observeTextEditors -> observeActiveTextEditor ensures that the grammar is forced onto the editor as soon as the editor gains focus [AFAIK]. If Atom is restarted while the particular file is open and in focus, this will not work. If the file receives focus later, it will work. The idea is borrowed from a commit to the grammar-selector package [LINK].

NOTE 4 -

Other helpful hints might be available from the package projects:


I hope this soothes the pain a little.
Cheers.


#9

Maybe try waiting until https://atom.io/docs/api/v1.19.7/PackageManager#instance-onDidActivateInitialPackages has been called.


#11

Hello.

Perhaps this proposal is worth looking at:

# ++++++++++
# init.coffee
# ++++++++++

doCheck = (_editor) ->
  _firstLine = _editor.lineTextForBufferRow(0)
  #console.log _firstLine
  if (_firstLine.search(/#!.*bin.*bash/) > -1)
    # console.log "This is BASH!"
    _editor.setGrammar atom.grammars.grammarForScopeName('source.AWL')
    
atom.packages.onDidActivateInitialPackages ->
  for _editor in atom.workspace.getTextEditors()
    doCheck _editor

atom.workspace.observeTextEditors (_editor) ->
  return unless atom.packages.initialPackagesActivated
  doCheck _editor

NOTE 1 -

Replace the source.AWL with whatever language / grammar represents the BASH.

NOTE 2 -

This seems to work at start-up by cycling through all the editors.
The flag does turn true, but this occurs long before Atom has started up completely.
Thereafter, editors that are newly added to Atom are inspected as needed.

NOTE 3 -

At first I used atom.packages.hasLoadedInitialPackages() which did not have a good result.
The atom.packages.initialPackagesActivated works as expected - it becomes true only when all packages are loaded at start-up. (see CORRECTION NOTE 1)

NOTE 4 -

Tested in Atom V1.19.5 on Windows 7 SP1 (see CORRECTION NOTE 2)
with file named 00_generic_test.sh.template
where shebang line #!/bin/bash is the first line.

Hope this is of value.
- Dan Padric



CORRECTION NOTE 1 -> Applies to NOTE 3
It is possible that the initial test was in error.
atom.packages.hasLoadedInitialPackages() was retested in Atom V1.19.7.
hasLoadedInitialPackages does work and it is part of the documented API.
IMHO: The suggestion by @Wliu below is still the better option .


CORRECTION NOTE 2 -> Applies to NOTE 4
A typing error: The code segment was tested with Atom V1.19.4, not V1.19.5 as previously stated.


#12

@danPadric I will again point your attention to the fact that you’re using undocumented APIs. In this case I believe you can get around using .initialPackagesActivated by setting a global boolean, initialPackagesActivated, to false at first. Then, when all initial packages have been activated, set it to true. For example:

initialPackagesActivated = false

doCheck = (_editor) ->
  _firstLine = _editor.lineTextForBufferRow(0)
  #console.log _firstLine
  if (_firstLine.search(/#!.*bin.*bash/) > -1)
    # console.log "This is BASH!"
    _editor.setGrammar atom.grammars.grammarForScopeName('source.AWL')
    
atom.packages.onDidActivateInitialPackages ->
  initialPackagesActivated = true
  for _editor in atom.workspace.getTextEditors()
    doCheck _editor

atom.workspace.observeTextEditors (_editor) ->
  # Editors seen before activation are handled by the loop above
  doCheck _editor if initialPackagesActivated

#13

Hi @danPadric!

@ulidtko: Try to add the following in init.coffee ->

Yeah, I could’ve totally written that myself (with 'source.shell' of course), but thanks for the snippet anyway!

However, as I read the thread further, I see that it gets surprisingly tricky.


@Wliu

Currently, file extensions take priority over firstline matches such as shebangs

Why is that? Reads like a bug to me.

I thought there’s no content-type detection support at all in Atom; turns out there is, but it loses to filename extensions! What the hell…


#14

I agree with your take on this.


#15

There are downsides to both approaches; for example, back when first line matches did take priority the majority of PHP files were incorrectly being tokenized as plain HTML due to their first line being <!DOCTYPE html> or <html>.

If you can think of a solution that addresses both use cases, I would love to hear it :).


#16

Thank you, I understand the situation better.

Was that an invite for brain-storming or a go-away-you-pest? :stuck_out_tongue_winking_eye:

I have some thoughts that need articulating - but first 2 questions ->

  • How is the situation currently handled if there is a conflict in the file extension?

  • What is the idea behind atom.grammars.getGrammarScore()?


Initial thoughts ->

  • The “fuzzy find” and priority concepts used by the autocomplete / snippets system might be usable in this case.

  • A system that uses a “result reached first” concept is going to have an issue. The key would be in “allowing” that there will be conflicts and giving scores according to what matches. If scores are the same - then the user might be asked to choose one.


#17

Always an invite for brainstorming!

I assume your first question means that there are two language packages that register the same extension. Community packages beat core packages, but when there’s two community packages, I believe that’s when the firstLineMatch is checked. If that doesn’t solve the problem? :man_shrugging: Probably whichever package was loaded first, but I’d have to dig into the code to find out.

Can you elaborate on your second question?

Regarding thoughts: I agree that there is a problem with the “result reached first” model. What do you mean by “according to what matches”?


#18

Allow me to take up your first question ->

Referencing
https://github.com/atom/atom/blob/master/src/grammar-registry.coffee#L44

The question is - how does the mechanism work?
My assessment: The internal mechanism of getGrammarScore seems to be to give a “weight” to each definition. Add +0.25 for the correct path and add +0.125 for the content match.
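To illustrate why a path match would beat a shebang under such a scheme, here is a toy model in plain JavaScript. It is not Atom's implementation: the weights are simply the ones from my reading above (which may not match the actual source), and the grammar objects, their fileTypes lists and the function name toyGrammarScore are invented for the example.

```javascript
// Toy model only. Weights are taken from the assessment above,
// not verified against Atom's grammar-registry source.
const PATH_WEIGHT = 0.25;
const CONTENT_WEIGHT = 0.125;

function toyGrammarScore(grammar, fileName, firstLine) {
  let score = 0;
  if (grammar.fileTypes.some((ext) => fileName.endsWith("." + ext))) {
    score += PATH_WEIGHT;
  }
  if (grammar.firstLineMatch && grammar.firstLineMatch.test(firstLine)) {
    score += CONTENT_WEIGHT;
  }
  return score;
}

// Hypothetical grammar descriptions; the "template" file type is an assumption.
const json = { scopeName: "source.json", fileTypes: ["json", "template"], firstLineMatch: null };
const shell = { scopeName: "source.shell", fileTypes: ["sh", "bash"], firstLineMatch: /^#!.*\b(ba)?sh\b/ };

const fileName = "00_generic_test.sh.template";
console.log(toyGrammarScore(json, fileName, "#!/bin/bash"));  // 0.25
console.log(toyGrammarScore(shell, fileName, "#!/bin/bash")); // 0.125
```

In this model the JSON grammar wins on the path alone, even though only the shell grammar matches the file's content.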

Added to this could be a weight value to prioritize a match. It could be community vs built-in, but also could be something available to the user per user configuration. This combines with the user being able to add custom extensions in the configuration… which in this topic would have given a path conflict.

HOWEVER, for me it will still be okay if the user is asked which grammar to use when two or more scores are the same. A list of the conflicting selections can be shown to him. Again, a priority weight could be assigned when the user chooses to do so. Otherwise he will get to select at each conflict.

(to be continued)


#19

This is partially handled in my previous reply. The design should embrace the possibility of having a conflicting result. An adjustable weight setting can “learn” from the user what he expects. When the user selects which of the conflicting choices to use, he has the option of assigning a custom weight. This can be done “blindly” (for example, +0.05 is added) or by having him choose a number (user 10 -> 1, which translates to system 0.1 -> 0.01).


To me the first line content test is the most important. If there is only one grammar that matches, then there is no reason to test further. If there is no match or conflicts, then the weight score setting comes in place.

Again => if there is only one extension match, then that is the selection to use. If there is more than one match or no match, then the weight score setting is used.

It would be nice if a file name and path match could be included, in the same way as the first line match works. It would be most natural for me to put Python code snippets into myCode/python/, for example. This piece could be user-defined. If there is a single match, then do not use the weights; simply go with the match.

You could even say that if there is a single match on some attribute test, the score is doubled / higher.

If no match is found on either content or extension, then go to the default.

If there are multiple matches, then look at the priority settings.

If the weight score is still equal (non zero) - ask the user… as described above.
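One possible reading of these rules, written out as a plain JavaScript sketch. Everything here is hypothetical: the candidate shape, the function name chooseGrammar and the "text.plain" default are assumptions for illustration, not an existing Atom API.

```javascript
// Hypothetical candidate shape:
//   { scopeName: string, firstLine: boolean, extension: boolean, weight: number }
// Returns { scope, askUser }: askUser lists the tied grammars when the user must decide.
function chooseGrammar(candidates, defaultScope = "text.plain") {
  // A single first-line match settles it.
  const byFirstLine = candidates.filter((c) => c.firstLine);
  if (byFirstLine.length === 1) return { scope: byFirstLine[0].scopeName, askUser: [] };

  // Otherwise a single extension match settles it.
  const byExtension = candidates.filter((c) => c.extension);
  if (byExtension.length === 1) return { scope: byExtension[0].scopeName, askUser: [] };

  // Nothing matched at all: fall back to the default.
  const matched = candidates.filter((c) => c.firstLine || c.extension);
  if (matched.length === 0) return { scope: defaultScope, askUser: [] };

  // Several matches: compare the (user-adjustable) weights.
  const best = Math.max(...matched.map((c) => c.weight));
  const top = matched.filter((c) => c.weight === best);
  if (top.length === 1) return { scope: top[0].scopeName, askUser: [] };

  // Still tied: hand the conflicting grammars to the user.
  return { scope: null, askUser: top.map((c) => c.scopeName) };
}

const result = chooseGrammar([
  { scopeName: "source.json", firstLine: false, extension: true, weight: 0 },
  { scopeName: "source.shell", firstLine: true, extension: false, weight: 0 },
]);
console.log(result.scope); // source.shell
```

With the file from the first post, the shell grammar is the only first-line match, so it is chosen before the extension is even considered.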

I hope this is something to chew on and start a debate.
Regards.
- Dan Padric

@moderators : Could we perhaps split this off the original topic? Perhaps with a title “Brain storming: Suggestions for improving grammar assignment.”


#20

Hello,

The thoughts I have put down earlier are still rough. They aim to change the way the current grammar identification works.

As a coding proposal, a matrix weight-scoring system can be used instead of a single weight-scoring number as we have now.

Perhaps the previous notes need a rewrite to organise everything more logically. Before doing a rewrite and detailing the matrix scoring code concept - another train of thought first.


APPROACH 2

Let us assume that the current method of identifying the grammar is working pretty darn well. It is only some user specific cases that the current identification system is having trouble with.

The proposal is to construct a two step process for identifying a grammar. STEP 2 is the current method… unchanged (for the moment).

STEP 1 relies on a definition from the user or a community package. The identification tests that will run are not part of the standard grammar definition. Think of it as a community package that runs before the standard grammar identification. BUT if STEP 1 identifies the grammar, do not use STEP 2 (standard grammar identification).

The user defined (by configuration) tests would be:

  1. First line or last line text. (example: shebang)
  2. File definition -> Path (storage location)
  3. File definition -> File name
  4. File definition -> File extension
  5. File definition -> File size
  6. File definition -> Meta information

One or more of the tests need to be defined before this is used as an identification method. Each configured test (1-6) can be a list… it is an OR-list.

All the tests in a set need to pass before the grammar is marked. The results of tests 1-6 are ANDed.

Example:
The definition might look like…

"Grammar" := 'source.AWL',
"Test1"   := ['^Fun.*'],
"Test4"   := ['awl','AWL'],

Tests 1 and 4 are defined, so it is a legal grammar identification definition. Tests 2, 3, 5 and 6 are set to a default of true.

if 
    Test1() and Test2() and 
    Test3() and Test4() and 
    Test5() and Test6() 
then
    setGrammar := "Grammar"
    doStandardGrammarIdentification := false
else
    doStandardGrammarIdentification := true

What makes Approach 2 nice, is that it can be integrated into the system without needing to break what is already there. It can even be left dormant - if nothing is defined, the method is simply skipped over.
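As a rough illustration of STEP 1, here is a plain JavaScript sketch of the rule matching. The rule shape, the field names (firstLine, pathPrefix, name, extension) and the function stepOne are all invented for this example; tests 5 and 6 (size and meta information) are left out for brevity.

```javascript
// Hypothetical rule shape: each test is an OR-list, undefined tests default to true.
const rules = [
  { grammar: "source.AWL", firstLine: [/^Fun.*/], extension: ["awl", "AWL"] },
  { grammar: "source.shell", firstLine: [/^#!.*\bbash\b/] },
];

// An undefined test passes; a defined test passes if any entry matches (OR).
function anyOf(list, predicate) {
  return list === undefined || list.some(predicate);
}

// file: { path: string, firstLine: string }
function stepOne(rules, file) {
  const name = file.path.split("/").pop();
  const ext = name.includes(".") ? name.split(".").pop() : "";
  for (const rule of rules) {
    // At least one test must be defined for the rule to be legal.
    const tests = ["firstLine", "pathPrefix", "name", "extension"];
    if (!tests.some((key) => rule[key] !== undefined)) continue;
    // All defined tests must pass (AND).
    const passed =
      anyOf(rule.firstLine, (re) => re.test(file.firstLine)) &&
      anyOf(rule.pathPrefix, (prefix) => file.path.startsWith(prefix)) &&
      anyOf(rule.name, (n) => n === name) &&
      anyOf(rule.extension, (e) => e === ext);
    if (passed) return { grammar: rule.grammar, doStandardGrammarIdentification: false };
  }
  // Nothing identified: fall through to STEP 2, the existing mechanism.
  return { grammar: null, doStandardGrammarIdentification: true };
}

console.log(stepOne(rules, { path: "/work/00_generic_test.sh.template", firstLine: "#!/bin/bash" }).grammar);
// source.shell
```

With an empty rules list the function always returns doStandardGrammarIdentification: true, which corresponds to the "left dormant" case.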

What are the thoughts on this approach?
- Dan Padric


#21

@moderators : Could we perhaps split this off the original topic? Perhaps with a title “Brain storming: Suggestions for improving grammar assignment.”

I don’t mind at all, you’re welcome to hijack this very thread! :smiley: