Non-Latin scripts


#1

I wonder whether any thought has been given to supporting non-Latin scripts. Awareness of grapheme clusters is one key part of this. I don’t have an invite yet, but I looked at the autoflow package, and it didn’t seem to pay any attention to this.

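JavaScript's built-in string handling is rather weak here (strings are sequences of UTF-16 code units, not user-perceived characters), so it's not going to work without thought and effort.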
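
To make the grapheme cluster point concrete, here is a minimal sketch of the gap between what `String.prototype.length` reports and what a user would count as characters. It assumes a runtime that provides `Intl.Segmenter` (for example Node.js 16+ or a recent browser); the `graphemes` helper is a hypothetical name for illustration, not something taken from the editor or the autoflow package.

```javascript
// Split text into user-perceived characters (extended grapheme clusters).
// Assumption: Intl.Segmenter is available in this runtime.
function graphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é".
const decomposed = "e\u0301";

console.log(decomposed.length);            // 2 (UTF-16 code units)
console.log(graphemes(decomposed).length); // 1 (grapheme cluster)
```

Cursor movement, backspace and selection would need to step over whole clusters like this one rather than over individual code units.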


#2

Only one data point, but it seems to support Japanese just fine, as far as I’ve been able to test.


#3

With Japanese, the issues could be:

  • line wrapping (allowed not just at spaces)
  • word selection
  • characters outside the BMP
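
Two of the items above can be sketched in a few lines. This is only an illustration under assumptions: `Intl.Segmenter` must be available, `wordAt` is a hypothetical helper, and nothing here reflects how the editor actually implements word selection. For Japanese text the word boundaries come from the runtime's dictionary data, so no particular result is claimed for it.

```javascript
// Characters outside the BMP occupy two UTF-16 code units (a surrogate pair).
const rare = "\u{20BB7}"; // a CJK ideograph outside the BMP

console.log(rare.length);                      // 2 (code units)
console.log([...rare].length);                 // 1 (the string iterator walks code points)
console.log(rare.codePointAt(0).toString(16)); // "20bb7"

// Hypothetical helper: return the word under a code-unit offset, or null
// when the offset falls on whitespace/punctuation or outside the text.
// Assumption: Intl.Segmenter is available in this runtime.
function wordAt(text, offset, locale = "ja") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const hit = segmenter.segment(text).containing(offset);
  return hit && hit.isWordLike ? hit.segment : null;
}

console.log(wordAt("hello world", 7, "en")); // "world"
```

Splitting on spaces would treat a whole Japanese sentence as one word, which is why a segmenter (or something equivalent) is needed for double-click selection.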

I was thinking more of Arabic, Indic and South-East Asian scripts, where the character-to-glyph mapping is not 1:1.

With Arabic, there’s also bidi, of course.


#4

Here is a tweet from someone reporting that Shift-JIS text gets corrupted.

I just tested ISO-2022-JP and Shift-JIS as well, and characters are corrupted in both. I do not have an editor that can save in EUC-JP, so I have not been able to test that.
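
One plausible way this kind of corruption arises is that the file's bytes are decoded as UTF-8 regardless of their real encoding. That is an assumption on my part; nothing in this thread confirms it is what the editor does. A minimal sketch using the standard `TextDecoder`, with a hypothetical `hasReplacementChars` helper:

```javascript
// "日本" encoded as Shift-JIS: 0x93 0xFA (日), 0x96 0x7B (本).
const shiftJisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b]);

// Assumption: the bytes are read as UTF-8. The first three bytes are not
// valid UTF-8, so each becomes U+FFFD; 0x7B happens to be ASCII "{".
const misread = new TextDecoder("utf-8").decode(shiftJisBytes);

// Hypothetical helper: flag text that contains U+FFFD REPLACEMENT CHARACTER.
function hasReplacementChars(text) {
  return text.includes("\uFFFD");
}

console.log(JSON.stringify(misread));      // "\ufffd\ufffd\ufffd{" shown as three replacement characters and "{"
console.log(hasReplacementChars(misread)); // true
```

Once the replacement characters are saved back to disk the original bytes are gone, so detecting the encoding before decoding matters more than repairing the text afterwards.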