No support for UTF-8 BIDI


#1

Why is this always missing in text editors? Is it hard to implement?

I can think of very few editors that support BiDi at all, whether through hacks or fully.

I would really love it if GitHub took this into account: for me it is what makes or breaks a text editor, because without it I can't use the editor for client work.

This is also the exact reason I never bought Sublime Text, even though I liked it a lot.


#2

For those not “in the know”, BiDi stands for bi-directional text.

Here is a very good introduction: http://www.iamcal.com/understanding-bidirectional-text/


#3

Just reading through about half of the document that @execjosh linked, I’d say that yes, that would be hard to implement :laughing: I’ve done some basic text layout stuff for GUIs in simple games and that was reasonably simple. Thinking about what including Unicode BIDI would have taken in that makes me glad that I never had to do it :smile:

That said … I hope the Atom team supports this even though I doubt I’ll ever have to use it.


#4

But Atom is built on a browser engine, and browsers already support BiDi. Can't they leverage that support?


#5

For layout, you would assume so yes … but what about cursor movement? Selection highlighting? Take for example the following: You have a string of text that is a mixture of strong LTR and RTL text, like the quotation example in the document linked. To simplify, I’ll just use Ls for LTR characters and Rs for RTL characters.

LL LLLL "RR!" LL LLL

Let’s say my cursor is to the left of the first LTR character. I hold down SHIFT and press the right arrow nine times. All good, right? I should have everything up to and including the first double quote selected. Now I press the right arrow one more time. Should the leftmost RTL character be added to the selection, or should the rightmost RTL character be added instead? My assumption is that the right way to do this would be to add the rightmost RTL character, then with successive keypresses the leftmost, then the exclamation point, and the rest as normal.
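To make the example concrete, here is a minimal Python sketch of how the display (visual) order of such a line differs from the stored (logical) order. It is a heavily simplified subset of the Unicode Bidirectional Algorithm (UAX #9), assuming an LTR paragraph with no explicit embeddings or isolates, no combining marks, and no special handling of numbers (digits are simply treated as neutrals, which real UAX #9 does not do). Strong L and R/AL characters keep their direction, a run of neutrals becomes RTL only when the strong characters on both sides are RTL, and every RTL run is then reversed. The function names are hypothetical.

```python
import unicodedata


def resolve_directions(text: str) -> list[str]:
    """Return 'L' or 'R' for each character of an LTR paragraph.

    Simplified subset of UAX #9: only bidi classes L, R and AL count as
    strong; every other class is handled as a neutral.
    """
    strong = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            strong.append("L")
        elif cls in ("R", "AL"):
            strong.append("R")
        else:
            strong.append(None)  # neutral: spaces, punctuation, (here) digits

    resolved = []
    for i, d in enumerate(strong):
        if d is not None:
            resolved.append(d)
            continue
        # Nearest strong direction on each side; paragraph edges count as L.
        before = next((s for s in reversed(strong[:i]) if s), "L")
        after = next((s for s in strong[i + 1:] if s), "L")
        # Neutrals between two RTL characters become RTL; otherwise they
        # take the paragraph direction, which is LTR in this sketch.
        resolved.append("R" if before == after == "R" else "L")
    return resolved


def visual_indices(text: str) -> list[int]:
    """Logical character indices listed in left-to-right display order."""
    dirs = resolve_directions(text)
    order = []
    i = 0
    while i < len(text):
        if dirs[i] == "L":
            order.append(i)
            i += 1
        else:
            j = i
            while j < len(text) and dirs[j] == "R":
                j += 1
            order.extend(range(j - 1, i - 1, -1))  # RTL run is drawn reversed
            i = j
    return order


def visual_string(text: str) -> str:
    """The characters of text in the order they appear on screen."""
    return "".join(text[i] for i in visual_indices(text))
```

For the line ab "שלום!" cd, visual_indices returns [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11, 12]: the Hebrew word is stored at indices 4 to 7 but drawn right to left. After the opening quote (index 3), an editor that moves visually would select index 7 next (the leftmost Hebrew letter on screen), while one that moves logically would select index 4 (the rightmost), which is exactly the ambiguity described above.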

It’s special cases like these that I believe make supporting BiDi difficult.


#6

Yes, but enabling direction: rtl (perhaps with an on/off toggle, as Notepad++ does) could be a simple CSS change, and it does fix the problem of where the cursor naturally belongs.

I am not sure it would help in Atom directly, though: I was unable to select or move my cursor over any RTL text I had written, and I had to delete the entire line in order to edit it.

Again, what I’m saying is that the browser already knows how to handle this; you can see that in the reply box of this Discourse forum, which works perfectly. I understand that implementing BiDi from scratch is super difficult; it just seems odd that this would be necessary here.

I hope I’m not coming off as someone demanding a feature right now. I just think this affects a great many developers in the EMEA region, who either have to pass on great tools or break their workflow by using multiple editors on the same project at the same time, and I’m a bit sad that no modern tool seems to consider implementing it.


#7

You’re coming off as someone passionate about wanting a very capable editor :smile: Don’t worry about it.

Maybe it is much simpler than I think it is. I’m just speculating here, not arguing against the feature. I still hope it gets implemented eventually.


#8

It doesn’t have to be all or nothing: a very naive implementation would probably do for many use cases.

Of course, in the end I hope Atom becomes very good at this, clearly marking each RTL block in an LTR buffer and vice versa, with nice tools and so on, but I think even a little support is far better than nothing.


#9

I am also a RTL user and would very much like to see some support implemented.

However, I acknowledge that dealing with BiDi is a PiTA because of text selection and cursor movement issues, which usually make the editor nearly unusable.

I actually like the way SublimeText deals with RTL - It doesn’t.
Instead it writes RTL characters from left to right which is much more predictable, less awkward and overall easier to deal with. It is less readable but trust me, you get used to reading from left to right and after you get the hang of it it’s actually a blessing :smile:

Atom, however, seems to handle RTL very badly. I’m not sure what’s going on there, but the cursor is broken and that makes it unusable.

Implementing what ST does would be much easier than BiDi, IMO, and would solve the problem right away until BiDi support becomes an option.


#10

I work with bidirectional text day and night. The problem you are describing is a layout problem, not a text selection problem. Basically, direction-neutral symbols that are used in both RTL and LTR languages (such as !, ", and ', but not the typographic opening and closing quotation marks used in word processors) become ambiguous when they appear between RTL and LTR text. BiDi layout algorithms use preferential attachment rules to resolve this, and in some rare cases no perfect solution exists.

Similarly, direction-aware symbols that are used in both RTL and LTR languages have a different kind of issue. Take the parentheses ( and ): if one considers them opening and closing parentheses respectively, then a multilingual Unicode font cannot meaningfully accommodate both LTR and RTL languages, because an opening parenthesis will look like a closing one in RTL languages. Open/close or start/end are direction-aware words whose meaning flips with the text flow of the language in question; left/right, on the other hand, are fixed, so if those characters were defined as left and right parentheses, their usage in any language would be unambiguous. Another approach would be to introduce two sets of opening and closing parentheses in the Unicode character map, one pair for RTL and the other for LTR.
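For reference, Unicode already approaches the parenthesis problem through mirroring: U+0028 and U+0029 are named LEFT PARENTHESIS and RIGHT PARENTHESIS but carry the Bidi_Mirrored property, and rule L4 of UAX #9 tells renderers to draw the mirrored glyph when such a character resolves into a right-to-left run. Python's unicodedata.mirrored exposes that property. The actual glyph pairs live in BidiMirroring.txt, which unicodedata does not expose, so the small table below is a hand-written assumption covering only ASCII brackets, and glyph_for_display is a hypothetical helper.

```python
import unicodedata

# Hand-written subset of BidiMirroring.txt (assumption: ASCII brackets only).
MIRROR_GLYPHS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def glyph_for_display(ch: str, rtl_run: bool) -> str:
    """Return the character whose glyph should be drawn for ch.

    Characters with the Bidi_Mirrored property that end up in a
    right-to-left run are drawn as their mirror image (UAX #9, rule L4);
    everything else is drawn unchanged.
    """
    if rtl_run and unicodedata.mirrored(ch):
        return MIRROR_GLYPHS.get(ch, ch)
    return ch
```

So the stored character stays the same "opening" parenthesis in both directions; only the glyph chosen at render time changes.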

Having said that, those problems are related to layout and our browsers already do a great job of dealing with bidirectional text, even in rich text editors, if appropriate “direction” and “unicode-bidi” CSS properties are set. https://developer.mozilla.org/en-US/docs/Web/CSS/unicode-bidi

Cursor movement and text highlighting just need to follow the logical order of the character stream and let the layout engine work out where those characters appear on screen.


#11

It may be a blessing for you, but certainly not for everyone. As much as I like ST, I hate that it can’t render my RTL text properly. Yes, there are people who think ignoring the problem is a feasible solution. But when a solution already exists, especially on the Web, there is no reason not to have it from day one in a 21st-century, Web-technology-centric text editor.


#12

It is hard to implement. You can have a look at emacs-bidi. There are so many things to consider, much more than UAX #9.


#13

A compromise solution would be to show all text LTR (perhaps with an option to switch completely to RTL) in exactly the order it is stored in the file, so that we can see the character stream. I would also suggest turning off all character joining, that is, showing each character separately (using the glyph it would have if written in isolation).
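As a rough illustration of the "show the raw character stream" idea, here is a minimal Python sketch that lists a line's characters in stored order together with their code points and Unicode bidi classes, which is the information a flat view makes visible. dump_stream is a hypothetical helper, not an existing Atom feature.

```python
import unicodedata


def dump_stream(line: str) -> list[str]:
    """Describe each character of line in logical (stored) order."""
    rows = []
    for i, ch in enumerate(line):
        cls = unicodedata.bidirectional(ch)  # e.g. 'L', 'R', 'AL', 'ON', 'WS'
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"{i:3d}  U+{ord(ch):04X}  {cls:<3}  {name}")
    return rows


# Usage: an LTR letter, a space, a quote, a Hebrew letter and punctuation.
for row in dump_stream('a "\u05e9!'):
    print(row)
```

Each row shows one stored character, so the order on screen is always the order in the file regardless of direction.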

At the same time, something like the markdown-preview package could show the rendered text, with BiDi handled properly, in a separate pane. That way one could edit the character stream easily while looking at the result, much like LaTeX separates content from presentation.


#14

+1. A hotkey to toggle flat LTR / flat RTL, while not perfect, is a very low-hanging way to get something usable. (Implementation might be as simple as <bdo dir=rtl> + some mirrored CSS?)

AFAIK that’s all Vim offers, to this day:

  • While not as “nice” to read, a flat view of the stream is occasionally much easier to “debug” than with bidi.
    It’s also easier to position the cursor, especially by clicking with the mouse: bidi creates many spots where a click is ambiguous.
    Even keyboard positioning in a flat stream is slightly easier than bidi with logical movement, and much easier than bidi with visual movement.

    • [If/when Atom gets around to implementing full bidi, I hope it’ll disregard per-platform conventions and stick to logical arrow movement. Stepping through a line in logical order is the main and only way to “debug” confused bidi text! Editors with visual movement ruin that; the only remaining hope is the little-known trick that Shift+arrows selection is typically logical.]
  • The next step in usability, still simpler to implement than full bidi, could be a mode that decides per line between flat LTR and flat RTL based on a heuristic, e.g. the first strongly directional character. It’d (mostly) avoid the need to toggle the view manually.
    It does start to introduce some selection complications and arrow movement anomalies (keep Right pressed, assuming it moves logically forward in LTR lines and logically backward in RTL lines, and you might get stuck in a loop). It also makes indentation structure harder to see, another reason flat LTR/RTL modes are handy in a code editor.

  • Bidi requires knowing the correct paragraph/line base direction. Bidi with the wrong base direction is very confusing and unhelpful. Therefore a good heuristic is important (gedit has a great one), and a manual override is a must.
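The "first strongly directional character" heuristic is also what UAX #9 itself uses to pick a paragraph's base direction (rules P2 and P3), and it is what CSS unicode-bidi: plaintext applies per paragraph. Below is a minimal Python sketch of it with the manual override and a fallback for lines that contain no strong characters. base_direction and its parameters are hypothetical names, and unlike P2 it does not skip text inside directional isolates.

```python
import unicodedata
from typing import Optional


def base_direction(line: str, override: Optional[str] = None, default: str = "ltr") -> str:
    """Pick 'ltr' or 'rtl' as the base direction of one line.

    A manual override always wins. Otherwise the first strong character
    decides (L -> 'ltr', R or AL -> 'rtl'); lines without any strong
    character (blank lines, pure punctuation or digits) get default.
    """
    if override in ("ltr", "rtl"):
        return override
    for ch in line:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default
```

A line such as "  = 42;" has no strong characters at all, so an editor would need some policy for it; here it simply falls back to default.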

Disclaimer: I don’t edit bidi sources much, but when I did I was happy with Emacs’ bidi. I do have a hotkey to toggle bidi off, but despite all the theory above, I don’t remember myself using it much…


#15

Exactly, @cben. (Disclaimer: I am new to Atom and even newer to hacking it.) My research led to <bdo dir=rtl> as well. There is also the CSS property unicode-bidi, which has an experimental value, plaintext, that should work too and would be easy to add to styles.less, but it is only supported in Chrome 48 and up.

Any suggestions on how we can add the <bdo dir=rtl> tag around text in the editor in real time? I am guessing the DOM will need to be modified, with the modification triggered by some editor event as one types, preferably every time one starts a new line.
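Purely as a sketch of the string side of this idea: each line could be HTML-escaped and wrapped in a bdo element whose dir comes from the line's first strong character, giving the per-line flat LTR/RTL view discussed earlier in the thread. How this would hook into Atom's rendering and change events is not covered here, and first_strong_dir and render_flat_line are hypothetical names. Note that bdo only accepts dir="ltr" or dir="rtl" (not "auto"), so the direction has to be decided before the markup is generated.

```python
import html
import unicodedata


def first_strong_dir(line: str, default: str = "ltr") -> str:
    """'rtl' if the first strong character is R/AL, 'ltr' if it is L."""
    for ch in line:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default


def render_flat_line(line: str, direction: str = "auto") -> str:
    """Wrap one editor line in <bdo> so the browser skips bidi reordering.

    direction is 'ltr', 'rtl', or 'auto' (decide from the first strong
    character). The line text is escaped so code such as 'a < b' stays text.
    """
    if direction == "auto":
        direction = first_strong_dir(line)
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unsupported direction: {direction!r}")
    return f'<bdo dir="{direction}">{html.escape(line)}</bdo>'
```

Passing direction explicitly gives the global flat LTR / flat RTL toggle; leaving it as "auto" gives the per-line variant.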

Making Atom configurable/customizable in its approach to bidi will go a long way in driving adoption.


#16

Let’s say my cursor is to the left of the first LTR character. I hold down SHIFT and press the right arrow nine times. All good right? I should have everything up to and including the first double-quote selected. Now I press the right arrow one more time. Should the leftmost RTL be added to the selection?

Personally, I’d expect the left arrow to always move the cursor to the left, regardless of text direction. These days I have to edit i18n files with a mix of Hebrew and English on the same lines. Atom, as of version 1.7.2, behaves totally unexpectedly when I simply move the cursor along a line of mixed LTR/RTL text. Sometimes the left arrow moves the cursor to the left regardless of RTL (that’s what I expect and prefer), but often the cursor simply disappears in some parts of the string and reappears somewhere later as I keep pressing the left arrow. In fact, it pretty much makes Atom unusable for editing our i18n files, and I have to switch back to Sublime Text 3, which works as expected and has no problems with the cursor in mixed-direction strings.

Not sure how I could help, though. I’m not yet ready to dive into Atom’s sources. But if there’s a need for test files, or someone is willing to make a plugin and needs help with testing, I’ll be all over it.


#17

Does anyone know of any progress on this? The problem is well-understood, browsers support RTL and bidi text well these days, so it shouldn’t be impossible to fix. I don’t mind doing the coding if I can get some input into how to contribute.


#18

Hi,
I installed Atom on Ubuntu and imported an RTL-language project, and all of my RTL words got garbled.
How can I repair these files?
Website design
Wall of Kindness