UTF-8 problems?


#1

Hi, has anyone experienced strange character encoding problems working in Atom?

99% of the time I am programming, and so am pretty much just using ASCII. However, everything is saved in UTF-8, as is usual nowadays and is Atom’s default. During the last couple of days I needed to update a bunch of strings/locales files (de, es, fr, it, ja, pt-br, zh-hans, and zh-hant) and now see that several garbled characters have been introduced into lines which were previously good and which I had not edited. (I can tell this from the SVN history. For example, in a de file I edited, I can see the line I changed, but in a line I did not edit I also see an ö converted to �.)

If I do a Find in Project for � (not scientific, I’m sure, but I’m not sure what else I can do) and restrict it just to the locales directory, I see a lot of results from files that previously were good. I can’t find any way to copy-and-paste from the Find in Project results pane, so here is a screenshot:

Has anyone else seen similar problems? It seems to me this is pretty serious, but I cannot fully discount the problem being caused by something other than Atom either. (Perhaps the new strings I was copying-and-pasting in were in some kind of funky encoding and that was what caused the problem, for example.)
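
One way to check the files on disk without relying on any editor's rendering is to look at the raw bytes. What follows is a minimal Python sketch, not anything Atom provides. The names `scan_file` and `scan_tree` and the `locales` path are assumptions made for illustration. It flags files whose bytes contain an encoded U+FFFD (the � character) or that are not valid UTF-8 at all.

```python
import os

# U+FFFD REPLACEMENT CHARACTER as it appears on disk in a UTF-8 file.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"


def scan_file(data):
    """Return a list of problem descriptions for the raw bytes of one file."""
    problems = []
    count = data.count(REPLACEMENT_UTF8)
    if count:
        problems.append(f"contains U+FFFD x{count}")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte offset {err.start}")
    return problems


def scan_tree(root):
    """Walk `root` and map each problematic file path to its problems."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .svn (assumed not to be of interest).
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                problems = scan_file(handle.read())
            if problems:
                results[path] = problems
    return results
```

Calling `scan_tree("locales")` and printing the result gives a list that can be pasted as text and compared against what the Find in Project pane reports.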


#2

Crucial to diagnosing encoding issues (which this may be) is being able to see how the text of the file is encoded. I use the hex package. You might want to use it to check out what’s going on.
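
When reading a hex view, it helps to know which byte sequences to look for. The following is a small Python sketch (the helper name `hex_bytes` is made up for illustration) that prints how an ö looks in UTF-8 and in Windows-1252, and how the � replacement character looks once it has actually been written to a UTF-8 file.

```python
def hex_bytes(text, encoding):
    """Return the bytes of `text` in `encoding` as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


samples = [
    ("o-umlaut, UTF-8", hex_bytes("\u00f6", "utf-8")),        # C3 B6
    ("o-umlaut, Windows-1252", hex_bytes("\u00f6", "cp1252")),  # F6
    ("U+FFFD, UTF-8", hex_bytes("\ufffd", "utf-8")),          # EF BF BD
]

for label, value in samples:
    print(f"{label:<24}{value}")
```

If the hex view shows `EF BF BD` where an ö used to be, the original bytes have already been replaced in the file. A lone `F6` instead suggests the character is still there but stored in a single-byte encoding.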


#3

Actually, I am now certain the problem is coming from Atom. The Find in Project results do not match the actual contents of the files if I check in Sublime Text 2. So it would appear that Atom is introducing character encoding problems on reading the files. The actual number of problem files, according to Sublime Text 2, is much smaller, corresponding more closely to the files I edited in the last couple of days.


#4

Actually, the encoding seems broken. When I open a file that is encoded with “Windows 1252”, the file opens as “UTF-8”. As always, I change the encoding manually to “Windows 1252”, and this used to work.

Now, when you open a second file and try to change the encoding, it doesn’t work: you can’t change to any encoding. It seems like encoding selection breaks once you open a second file.


#5

Well I have been working with files that are UTF-8 only. So even if Atom does get confused with files in different encodings it really shouldn’t have done with the files I was working with.


#6

I’ve installed the hex package, but I’m not really sure what to do with it to try and diagnose the problem.


#7

What I would do is look at the bytes in the file. Can you match up the bytes in the file with a specific encoding? Is it UTF-8? You can look at this issue to see the process I went through for an encoding issue:
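
Separately from that issue, the "match up the bytes with an encoding" step can be sketched in Python as below. This is an illustration under assumptions: the function name `candidate_decodings` and the short candidate list are made up, and a successful decode does not prove the encoding is the right one (Windows-1252 accepts almost any byte sequence), so the decoded text still needs to be read by a human.

```python
CANDIDATES = ("ascii", "utf-8", "cp1252")  # assumed shortlist, extend as needed


def candidate_decodings(data, candidates=CANDIDATES):
    """Map each encoding that decodes `data` without error to the decoded text."""
    results = {}
    for encoding in candidates:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; rule it out.
            continue
    return results


# Bytes of a line as copied from a hex view (example values).
line = bytes.fromhex("47 72 C3 B6 73 73 65")
for encoding, text in candidate_decodings(line).items():
    print(f"{encoding:<8}{text!r}")
```

Here the bytes are valid in both UTF-8 and Windows-1252, but only the UTF-8 reading produces a plausible word, which is the kind of judgment the hex view is meant to support.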


#8

I really feel that I need to file a bug report for this, as I am now pretty certain that it was Atom which caused the problem. Frustratingly, however, I cannot reproduce the problem with any reliability and have been unable to generate a test case. I’m not sure what to do at this stage. Any advice?

(This is, IMHO, a blocker for a 1.0 release. In this particular case we were lucky, and our QA process spotted the problems quite quickly, but I can easily imagine a situation where this would have gone unnoticed and been deployed live or shipped.)


#9

Please do file a bug on this and include as much information as possible. If you can include before and after versions of the files or appropriate sections of files without violating IP or other concerns, please do. And one more request, if you can post a link to the bug here so that interested folks can easily subscribe … I would appreciate it.
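
For picking out the relevant sections, a before/after comparison can be narrowed down to just the lines that gained a � character. The following Python sketch is one hedged way to do that. The function name `newly_garbled_lines` is hypothetical, and it assumes both versions are meant to be UTF-8.

```python
def newly_garbled_lines(before, after):
    """Return (line_number, line) pairs from `after` that contain U+FFFD
    and do not appear verbatim in `before`.

    Both arguments are raw bytes. Invalid UTF-8 is decoded with
    errors="replace", so stray bytes also show up as U+FFFD.
    """
    old_lines = set(before.decode("utf-8", errors="replace").splitlines())
    new_lines = after.decode("utf-8", errors="replace").splitlines()
    found = []
    for number, line in enumerate(new_lines, start=1):
        if "\ufffd" in line and line not in old_lines:
            found.append((number, line))
    return found


before = "greeting=sch\u00f6n\nstatus=gut\n".encode("utf-8")
after = "greeting=sch\ufffdn\nstatus=sehr gut\n".encode("utf-8")
for number, line in newly_garbled_lines(before, after):
    print(number, line)
```

The `before` bytes could come from the previous revision (for example via `svn cat -r <revision> <file>`), and the `after` bytes from the working copy. The edited `status` line is not reported because it contains no �.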


#10

Done.