Ask Atom to save file in utf8 not us-ascii


#1

Hello,

How can I ask Atom to save files in UTF-8 and not US-ASCII?

Running file -I myAtom.js returns:

myAtom.js: text/x-c++; charset=us-ascii

How can I save my documents in UTF-8?


#2

UTF-8 is a superset of US-ASCII. Perhaps your file didn’t have any non-ASCII characters in it, and so it was classified as ASCII.
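
To illustrate the point, here is a minimal Python sketch (the sample strings are made up for the demonstration). Pure-ASCII text produces exactly the same bytes whether it is encoded as US-ASCII or as UTF-8, so a tool that only looks at the bytes has nothing to tell the two apart:

```python
# Sample strings are hypothetical, chosen only for illustration.
ascii_only = "Testing"
with_non_ascii = "Testing \u02a4"  # \u02a4 is the character ʤ

# Pure-ASCII text encodes to identical bytes in both encodings.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")
print(same_bytes)  # True

# A non-ASCII character takes more than one byte in UTF-8 ...
utf8_bytes = with_non_ascii.encode("utf-8")
print(len(with_non_ascii), len(utf8_bytes))  # 9 10

# ... and cannot be represented in US-ASCII at all.
try:
    with_non_ascii.encode("ascii")
    encodable_as_ascii = True
except UnicodeEncodeError:
    encodable_as_ascii = False
print(encodable_as_ascii)  # False
```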


#3

The problem is I can’t open my files with iMacros on Firefox. It seems to be an encoding issue.


#4

Is there a BOM at the beginning?
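
One way to check, as a minimal Python sketch (the function names are made up for illustration): read the first three bytes of the file and compare them with the UTF-8 BOM, which is the byte sequence EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Check only the first three bytes of the file at `path`."""
    with open(path, "rb") as handle:
        return has_utf8_bom(handle.read(3))


print(has_utf8_bom(b"\xef\xbb\xbfTesting"))  # True
print(has_utf8_bom(b"Testing"))              # False
```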


#5

To my knowledge, right now Atom only supports UTF-8 and not other encodings. And yes, @kgrossjo is correct that US-ASCII and UTF-8 are indistinguishable without a BOM (so long as the file contains only the 128 ASCII characters, code points 0 through 127), so that explains why file -I returned what it did if you didn’t have any special characters in the file.


#6

There is no BOM at the beginning of the file. (These characters do not appear at the beginning of the file.)

Files created with iMacro have the following encoding with file -I:
hello.js: text/x-c++; charset=utf-8

While files created with Atom get the following result:
hello.js: text/x-c++; charset=us-ascii

I think the problem is coming from there.


#7

In order to test the theory, I just created a file called test.md with Atom. When I first saved it, all it contained was Testing. Then I ran file -I:

$ file -I test.md
test.md: text/plain; charset=us-ascii

Then I added the character ʤ (for a total file contents of Testing ʤ), which lies outside the ASCII range and therefore needs a multi-byte sequence in UTF-8. Then I ran file -I again:

$ file -I test.md
test.md: text/plain; charset=utf-8

So, Atom is at least capable of saving in UTF-8. And file -I is working as we surmised, returning US-ASCII when the file contains only the 128 ASCII characters and UTF-8 when characters outside that range are present.

I then repeated the experiment by creating a file test2.md by using the following command:

$ echo "Testing" > test2.md

I then used OS X’s TextEdit to add the ʤ character. Here are the two results, before and after the edit, as in the previous test:

$ file -I test2.md
test2.md: text/plain; charset=us-ascii
$ file -I test2.md
test2.md: text/plain; charset=utf-8

So Atom’s behavior is consistent with other tools on the platform.
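
For anyone who wants to reproduce the classification without the file command, here is a rough Python sketch of the heuristic we surmised. It is not how file is actually implemented; the function name guess_charset and the "unknown" label are assumptions made for illustration only.

```python
def guess_charset(data: bytes) -> str:
    """Rough guess in the spirit of `file -I`; NOT its real algorithm."""
    # Every byte below 0x80 means the content is plain US-ASCII.
    if all(byte < 0x80 for byte in data):
        return "us-ascii"
    # Otherwise, see whether the bytes form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"  # hypothetical label for anything else
    return "utf-8"


print(guess_charset("Testing".encode("utf-8")))         # us-ascii
print(guess_charset("Testing \u02a4".encode("utf-8")))  # utf-8
```

The same bytes that Atom or TextEdit write for Testing are reported as us-ascii, and adding ʤ flips the result to utf-8, which matches both experiments.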


#8

Thank you, so this is really a bug coming from iMacros. Thank you very much.

I have opened another thread here to try to solve the problem.


#9

You’re welcome. I’m sorry that I couldn’t fix it for you.


#10

I’ve solved the problem.


#11

Thanks for circling back with the link!


#12

Just for the record: UTF-8 with a BOM doesn’t make any sense, because BOM stands for “Byte Order Mark” and there is no byte order in UTF-8 that needs to be marked. So these iMacros folks should fix their product, I should say.

That said, there are editors out there that support “UTF-8 with BOM” as one of the possible file encodings, so with those it’s easier to cater to such broken software.
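
If switching editors is not an option, the signature can also be added with a few lines of script. A minimal Python sketch, assuming the file content is already available as a string (the helper name and the sample content are made up for illustration); Python’s "utf-8-sig" codec prepends the bytes EF BB BF when writing:

```python
import codecs
import os
import tempfile


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write `text` as UTF-8 preceded by the three-byte signature."""
    # newline="" keeps line endings exactly as given in `text`.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(text)


# Demonstration in a temporary directory; the content is hypothetical.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "hello.js")
    write_utf8_with_bom(path, "var greeting = 'hello';\n")
    with open(path, "rb") as handle:
        raw = handle.read()

print(raw.startswith(codecs.BOM_UTF8))  # True
```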


#13

That assertion is misstated. A more correct statement would be: “The sequence of bytes used to signal UTF-8 is not technically a ‘byte order mark’ (BOM), because UTF-8 has no byte order.” That statement is true and I fully agree.

Your original statement assumed that “byte order” is the only purpose of the so-called “UTF-8 BOM”, which would more correctly be called a “UTF-8 signature”. In fact this byte sequence has great value in distinguishing between ASCII, UTF-8, ISO-8859-1, Windows CP-1252, etc. If you come across a text file stored on Windows without a BOM/signature, what encoding is it? You don’t have a clue. A UTF-8 signature unambiguously indicates which charset is used by the file.
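
To make the ambiguity concrete, here is a small Python sketch (the sample word is made up for illustration). The same bytes decode without error under three different encodings, and only one result is the intended text. With the signature in front, a reader that knows about it can pick UTF-8 and strip the marker:

```python
import codecs

# "café" encoded as UTF-8; \u00e9 is the character é.
raw = "caf\u00e9".encode("utf-8")

# Without a signature, all three decodings succeed, but two give mojibake.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("iso-8859-1")
as_cp1252 = raw.decode("cp1252")
print(as_utf8, as_latin1, as_cp1252)

# With the signature, the "utf-8-sig" codec recognizes and removes it.
signed = codecs.BOM_UTF8 + raw
from_signed = signed.decode("utf-8-sig")
print(from_signed == as_utf8)  # True

# The plain "utf-8" codec keeps the marker as the character U+FEFF.
kept_marker = signed.decode("utf-8").startswith("\ufeff")
print(kept_marker)  # True
```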

Some people like the UTF-8 signature. Others don’t. Fine. But it’s valid and allowed; software that uses it is not “broken”. In fact, tools that don’t support a UTF-8 signature when one is present, or that strip it out without asking instead of maintaining it (do we know any tools like that?), are the tools that are “broken”.