File encoding puzzel


#1

I have this code at the start of my module:

  activate: (state) ->
    p = "/tmp/encodiningtext.txt"
    text = "<li>ê</li>"
    console.log text
    fs.writeFileSync(p, text, { encoding: 'utf8' })
    console.log fs.readFileSync(p, 'utf8')

But the input/output don’t match, this is what I see in Atom’s Javascript console:

<li>ê</li>
<li>�</li>

I’m sure it’s some sort of encoding issue, but I don’t know where the problem is. What makes this even weirder to me is this second case:

  activate: (state) ->
    p = "/tmp/encodiningtext.txt"
    text = "<li>�ê</li>"
    console.log text
    fs.writeFileSync(p, text, { encoding: 'utf8' })
    console.log fs.readFileSync(p, 'utf8')

When I do that the input/output match as I would have expected:

<li>�ê</li>
<li>�ê</li>

Does this make sense to anyone?

Thanks for any tips,
Jesse


#2

It’s not being stored in UTF-8. I executed the following code from the JavaScript Console in Atom:

var path = "/tmp/encoding-test.txt"
var text = "ê"
fs.writeFileSync(path, text, { encoding: "utf8" })

I then opened the file in Atom and viewed it with the hex package:

If this was stored in UTF-8, the single character ê should be the double c3 aa:

http://utf8-chartable.de/

This might be due to the shift to IO.js in v0.177.0. Would you mind opening a bug on Atom Core? Please also post a link to this topic there and a link to the bug you open here.

Thanks!


#3

Ok, will do.

It does seem likely to be a difference in IO.js. I just tested this code in node.js and it seemed to work as expected:

Jesses-Mac-Pro:~ jesse$ node --version
v0.12.0
Jesses-Mac-Pro:~ jesse$ node
> var p = "/tmp/encodiningtext.txt";
undefined
> var text = "<li>ê</li>";
undefined
> console.log(text)
<li>ê</li>
undefined
> fs.writeFileSync(p, text, { encoding: 'utf8' });
undefined
> console.log(fs.readFileSync(p, 'utf8'));
<li>ê</li>
undefined
> 

#4

Actually I just tried in IO.js and got the same results as when running directly in node. So maybe the problem is somewhere in the integration between IO.js and Chrome? Anyway issue now posed in Atom issues.

Jesses-Mac-Pro:~ jesse$ iojs --version
v1.1.0
Jesses-Mac-Pro:~ jesse$ iojs
> var p = "/tmp/encodiningtext.txt";
undefined
> var text = "<li>ê</li>";
undefined
> console.log(text)
<li>ê</li>
undefined
> fs.writeFileSync(p, text, { encoding: 'utf8' });
undefined
> console.log(fs.readFileSync(p, 'utf8'));
<li>ê</li>
undefined
> 

#5

I reported a bug a little while back regarding strange encoding problems. I never could boil it down to a test case.


#6

From above it seems there might be some issue with the underlying fs library that needs to get addressed.

I could very well be wrong, but I also think there might be a timing issue in TextBuffer/File that can mask the underlying problem somehow. Here’s a case to try in the latest 0.178.0 release:

  1. Create a new file with content:
<li>ê</li>
  1. Save it, and it should save fine.

  2. But now open the debugger and put a breakpoint at the start of:

pathwatcher/lib/File.prototype.writeFileSync
  1. Now try to save the file again. You should hit the breakpoint. You don’t need to do anything, just continue, but notice that now the file is in the wrong encoding and displays in Atom as:
<li>�</li>

So assuming it’s a timing problem that the pause at the breakpoint triggers, it’s also possible that the problem might show up in other cases when the sync file write is slow for some other reason. Maybe?

@leedohm Does that seem a reasonable explanation to you? If so I can file a bug… just not sure if the problem is in pathwatcher.File or in the interaction between pathwatcher.File and TextBuffer.

For reference I’m also on OS X. Version 10.10.2.

Jesse


#7

Filing a separate bug isn’t the right thing to do, one of them will just have to be closed as duplicate. But you can always add more information to the bug you already have open.