# Implementation

In the following examples we assume:

* `mod-n` refers to the classical interpretation of tab size, with a tabstop at every n-th column (common settings are `mod-2`, `mod-4`, `mod-8`)
* `spaces-n` refers to the elastic tabstop setting using at least n spaces for padding (`spaces-0`, `spaces-1`, `spaces-2` etc.)
* All measurements and spacing-related values refer to monospaced characters, which are easier to handle. Ideally we would use proportional fonts, which are measured relative to the line-height, a certain character, or in pixels.
* Keep in mind that all tabs are stored as `\t` characters internally(!) and the interpretation as indentation tabs or elastic tabstops is up to the text editor

| tab type        | setting    | whitespace |
|-----------------|------------|------------|
| indentation     | `mod-4`    | `→`        |
| elastic tabstop | `spaces-2` | `↔`        |

Also see the [reference implementation 'Elastic Notepad' by Nick Gravgaard in the Scala programming language](https://github.com/nickgravgaard/ElasticNotepad/blob/master/app/src/elasticTabstops.scala)

## Elastic tabstops basics

### Introduction

For a basic implementation we have to consider the following edge cases.

Let's start with a simple example where one word is just over 8 characters, so we hit the next tabstop in most settings. This is how it looks right now in most text editors (assuming `tabsize=4`):

```
            ╲╱
aaaaaaaaa→  ccccccccc→  eee
bbbbb→  ddddd→  fff
        ╱╲ // note how the words have different alignments
```

The word `aaaaaaaaa` is over 8 characters so the next tabstop is used, whereas `bbbbb` is too short, so the next word doesn't align with the word above.
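To make the classic `mod-n` behaviour concrete, here is a minimal Python sketch that computes the column at which each cell of a line starts. The function name `cell_start_columns` is made up for this illustration, and monospaced characters are assumed.

```python
def cell_start_columns(line: str, tabsize: int = 4) -> list[int]:
    """Return the visual start column of every tab-separated cell (classic mod-n tabs)."""
    col = 0
    starts = [0]  # the first cell always starts at column 0
    for ch in line:
        if ch == "\t":
            # a classic tab jumps to the next multiple of tabsize
            col += tabsize - col % tabsize
            starts.append(col)
        else:
            col += 1  # monospaced assumption: every character is one column wide
    return starts


print(cell_start_columns("aaaaaaaaa\tccccccccc\teee"))  # [0, 12, 24]
print(cell_start_columns("bbbbb\tddddd\tfff"))          # [0, 8, 16]
```

The second cells start at columns 12 and 8, which is exactly the misalignment shown in the example above.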
With elastic tabstops we want all "cells" to be aligned together in "column blocks" which together form an "alignment block": ``` aaaaaaaaa↔ ccccccccc↔ eee bbbbb↔ ddddd↔ fff ╱╲ ╱╲ // see how the words are aligned ``` If the first cells are empty however we have to be careful not to confuse them with indentation tabs: ``` aaaaaaaaa↔ ccccccccc↔ eee ↔ ddddd↔ fff ↔ ii↔ kkkk hhhhhh↔ jjjjjjjj↔ lll ``` This becomes more difficult if we actually mix indentation with empty elastic tabstops: ``` → aaaaaaaaa↔ ccccccccc↔ eee → ↔ ddddd↔ fff // this blank line starts a new alignment block → ↔ ii↔ kkkk → hhhhhh↔ jjjjjjjj↔ lll ``` If we fill the blank line with tabs it means both blocks should be fused together: ``` → aaaaaaaaa↔ ccccccccc↔ eee → ↔ ddddd↔ fff → ↔ ↔ // this line connects the sections → ↔ ii↔ kkkk → hhhhhh↔ jjjjjjjj↔ lll ``` If we want elastic tabstops and indentations to allow different settings (like `mod-4` vs `spaces-2`) we have to treat them differently: ``` aaaaaaaaa → b↔ d↔ f → ↔ e↔ g ``` *Note how the indentation is 4 characters wide whereas the elastic tabstops are only 3 characters wide.* ### Basic algorithm Let's say we have the following code and want to align with elastic tabstops. :::{literalinclude} ../../tests/simple_whitespaces.txt ::: **1. 
Split each line by tabs to get *cells*.** ``` [ [ "01" ] , [ "" , "456789A123" ] , [ "" , "" , "// rectangle" ] , [ "" , "" , "aa" , "ddddd" , "gggggggg", "jjj" ] , [ "" , "" , "bbbbbbb" , "e" , "hhhhhh" , "kkk" ] , [ "" , "" , "cccccccccccc" , "fffffffff", "iiii" , "lll" ] , [ "" ] , [ "" , "" , "// triangle bottom left" ] , [ "" , "" , "aa" ] , [ "" , "" , "bbbbbbb" , "e" ] , [ "" , "" , "cccccccccccc" , "fffffffff", "iiii" ] , [ "" ] , [ "" , "" , "// triangle bottom right" ] , [ "" , "" , "" , "" , "gggggggg" ] , [ "" , "" , "" , "e" , "hhhhhh" ] , [ "" , "" , "cccccccccccc" , "fffffffff", "iiii" ] , [ "" ] , [ "" , "" , "// diagonal down" ] , [ "" , "" , "aa" , "ddddd" ] , [ "" , "" , "bbbbbbb" , "" , "hhhhhh" ] , [ "" , "" , "cccccccccccc" , "" , "" , "lll" ] , [ "" , "45" ] , [ "0123456789A12" ] , [ "" ] ] ``` **2. For each *cell* calculate the visual width** * For monospaced fonts this is equal to the string length * For proportional fonts this must be calculated based on font metrics (not covered here for now). ``` [ [ 2 ] 01 , [ 0, 10 ] → 456789A123 , [ 0, 0, 12 ] → → // rectangle , [ 0, 0, 2, 5, 8, 3 ] → → aa↔ ddddd↔ gggggggg↔ jjj , [ 0, 0, 7, 1, 6, 3 ] → → bbbbbbb↔ e↔ hhhhhh↔ kkk , [ 0, 0, 12, 9, 4, 3 ] → → cccccccccccc↔ fffffffff↔ iiii↔ lll , [ 0 ] , [ 0, 0, 23 ] → → // triangle bottom left , [ 0, 0, 2 ] → → aa , [ 0, 0, 7, 1 ] → → bbbbbbb↔ e , [ 0, 0, 12, 9, 4 ] → → cccccccccccc↔ fffffffff↔ iiii , [ 0 ] , [ 0, 0, 24 ] → → // triangle bottom right , [ 0, 0, 0, 0, 8 ] → → ↔ ↔ gggggggg , [ 0, 0, 0, 1, 6 ] → → ↔ e↔ hhhhhh , [ 0, 0, 12, 9, 4 ] → → cccccccccccc fffffffff↔ iiii , [ 0 ] , [ 0, 0, 16 ] → → // diagonal down , [ 0, 0, 2, 5 ] → → aa↔ ddddd , [ 0, 0, 7, 0, 6 ] → → bbbbbbb↔ ↔ hhhhhh , [ 0, 0, 12, 0, 0, 3 ] → → cccccccccccc↔ ↔ ↔ lll , [ 0, 2 ] → 45 , [ 13 ] 0123456789A12 , [ 0 ] ] ``` **3. 
Remove the last cell of every line because it has no influence on the alignment of the elastic tabstop before it.**

```
[ [ ] ~~01~~
, [ 0 ] → ~~456789A123~~
, [ 0, 0 ] → → ~~// rectangle~~
, [ 0, 0, 2, 5, 8 ] → → aa↔ ddddd↔ gggggggg↔ ~~jjj~~
, [ 0, 0, 7, 1, 6 ] → → bbbbbbb↔ e↔ hhhhhh↔ ~~kkk~~
, [ 0, 0, 12, 9, 4 ] → → cccccccccccc↔ fffffffff↔ iiii↔ ~~lll~~
, [ ]
, [ 0, 0 ] → → ~~// triangle bottom left~~
, [ 0, 0 ] → → ~~aa~~
, [ 0, 0, 7 ] → → bbbbbbb↔ ~~e~~
, [ 0, 0, 12, 9 ] → → cccccccccccc↔ fffffffff↔ ~~iiii~~
, [ ]
, [ 0, 0 ] → → ~~// triangle bottom right~~
, [ 0, 0, 0, 0 ] → → ↔ ↔ ~~gggggggg~~
, [ 0, 0, 0, 1 ] → → ↔ e↔ ~~hhhhhh~~
, [ 0, 0, 12, 9 ] → → cccccccccccc↔ fffffffff↔ ~~iiii~~
, [ ]
, [ 0, 0 ] → → ~~// diagonal down~~
, [ 0, 0, 2 ] → → aa↔ ~~ddddd~~
, [ 0, 0, 7, 0 ] → → bbbbbbb↔ ↔ ~~hhhhhh~~
, [ 0, 0, 12, 0, 0 ] → → cccccccccccc↔ ↔ ↔ ~~lll~~
, [ 0 ] → ~~45~~
, [ ] ~~0123456789A12~~
, [ ]
]
```

Removing the last cell also prevents indentation tabs from being aligned with the first line of a column. For example, if we kept the cell `456789A123` (length=10), all the indentation tabs below would assume an incorrect width of 10 and it would look like this:

```
01
→ 456789A123
→ → // rectangle
→ → aa↔ ddddd↔ gggggggg↔ jjj
→ → bbbbbbb↔ e↔ hhhhhh↔ kkk
→ → cccccccccccc↔ fffffffff↔ iiii↔ lll
```

*This is not what we want!*

**4. Set every *column* value to the max value of the column by searching the max value of each run of values >0 (zeros interrupt a run).**

Note how indentation tabs are runs of all zeros which have no value >0 in the column before, whereas empty elastic tabstops have values >0 before them (note that `diagonal down` below has some trailing zeros which are still used to align the words `hhhhhh` and `lll`).
```
[ [ ] ~~01~~
, [ 0 ] → ~~456789A123~~
, [ 0, 0 ] → → ~~// rectangle~~
, [ 0, 0, 12, 9, 8 ] → → aa↔ ddddd↔ gggggggg↔ ~~jjj~~
, [ 0, 0, 12, 9, 8 ] → → bbbbbbb↔ e↔ hhhhhh↔ ~~kkk~~
, [ 0, 0, 12, 9, 8 ] → → cccccccccccc↔ fffffffff↔ iiii↔ ~~lll~~
, [ ]
, [ 0, 0 ] → → ~~// triangle bottom left~~
, [ 0, 0 ] → → ~~aa~~
, [ 0, 0, 12 ] → → bbbbbbb↔ ~~e~~
, [ 0, 0, 12, 9 ] → → cccccccccccc↔ fffffffff↔ ~~iiii~~
, [ ]
, [ 0, 0 ] → → ~~// triangle bottom right~~
, [ 0, 0, 12, 9 ] → → ↔ ↔ ~~gggggggg~~
, [ 0, 0, 12, 9 ] → → ↔ e↔ ~~hhhhhh~~
, [ 0, 0, 12, 9 ] → → cccccccccccc↔ fffffffff↔ ~~iiii~~
, [ ]
, [ 0, 0 ] → → ~~// diagonal down~~
, [ 0, 0, 12 ] → → aa↔ ~~ddddd~~
, [ 0, 0, 12, 0 ] → → bbbbbbb↔ ↔ ~~hhhhhh~~
, [ 0, 0, 12, 0, 0 ] → → cccccccccccc↔ ↔ ↔ ~~lll~~
, [ 0 ] → ~~45~~
, [ ] ~~0123456789A12~~
, [ ]
]
```

**5. Visually extend each *column block* to the widest *cell*.**

Note that so far we have only calculated the relative widths, not the absolute tabstops. Changing the visual representation can be more or less difficult depending on how rendering is implemented in the text editor. Some options include:

* set `display: inline-block; width: x` of the cell containing an elastic tabstop
* set `letter-spacing` of the tab character of an elastic tabstop
* set `position: absolute; left: x` of the cell content before an elastic tabstop
* introduce fake or virtual spaces

Because most text editors don't have good support for flexible text rendering, most implementations resort to inserting spaces for alignment, which *Nick Gravgaard* dubbed `Elastic Tabstops Lite`.

:::{literalinclude} ../../tests/simple_spaces-2.txt
:::

Note that *Elastic Tabstops™* is a trademark of *Nick Gravgaard* and must not be used for incorrect implementations which resort to inserting spaces for alignment. He [can](https://github.com/SublimeText/ElasticTabstops/issues/4) and [will](https://github.com/ajaxorg/ace/pull/1152) come after you!
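Steps 1 to 4 of the basic algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `column_widths` is hypothetical, characters are monospaced, and a run is read as a maximal vertical sequence of lines that all have a cell in the given column. That reading reproduces the widths listed for step 4, but other interpretations of the run rule are possible.

```python
def column_widths(text: str) -> list[list[int]]:
    """Per line, the width of every cell except the last, widened to its column block."""
    # Steps 1-3: split by tabs, measure each cell, drop the last cell of every line.
    rows = [[len(cell) for cell in line.split("\t")[:-1]] for line in text.split("\n")]
    result = [row[:] for row in rows]
    max_cols = max((len(row) for row in rows), default=0)

    # Step 4: per column, find vertical runs and set every cell to the run's maximum.
    for col in range(max_cols):
        start = None
        for i in range(len(rows) + 1):  # one extra iteration closes a run at the end
            has_cell = i < len(rows) and col < len(rows[i])
            if has_cell and start is None:
                start = i
            elif not has_cell and start is not None:
                widest = max(rows[j][col] for j in range(start, i))
                for j in range(start, i):
                    result[j][col] = widest
                start = None
    return result


rectangle = (
    "\t\taa\tddddd\tgggggggg\tjjj\n"
    "\t\tbbbbbbb\te\thhhhhh\tkkk\n"
    "\t\tcccccccccccc\tfffffffff\tiiii\tlll"
)
for row in column_widths(rectangle):
    print(row)  # [0, 0, 12, 9, 8] for each of the three lines
```

The padding from the `spaces-n` setting and the actual rendering (step 5) are deliberately left out of this sketch.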
### Issues with monospaced fonts

#### Wacky cursor movement and vertical selection

The editor will probably still use mod-n character positions to handle cursor movement when moving up or down a line. With elastic tabstops the position may be off. Depending on your solution for alignment you have to consider:

* `charIdx`: the tab's position within the text string `['\t', '\t', 'a', 'a', 'a', '\t', 'b', 'b', '\t', 'c', '\t', 'd', 'd', 'd', 'd', '\t', 'f', 'f', 'f', 'f']`
* `tabsize`: the visual tab size used by the text editor
* `visPosMod`: the actual tab position according to the `mod-n` setting for all tabs (because they are all still interpreted the same way by the text editor)
* `visPosEts`: the visual position according to elastic tabstops (plus the `mod-n` setting for indentation tabs).
* `widthEts`: the desired padding and width for elastic tabstops

```
|   |   |   |   |   |   |   |   |  // all tabstops mod-4
→   →   aaa→bb→ c→  dddd→   ffff   // text displayed with visible whitespaces
4   4      1  2  3      4          // and the visual tab size used by the text editor for mod-4
0123456789A123456789B123456789C12  // column position
→   →   aaa→bb→ c→  dddd→   ffff   // internal string using actual \t characters
0   1   2345678 9A  12345   6789   // and character's positions in the string
```

If you change the width of a tab or introduce additional characters, the index positions and visual positions change for the text editor, and this screws up all kinds of features which rely on them, like vertical cursor movement and vertical selection. To address this issue you have to convert between these coordinate systems.

What the editor sees:

1. Using visual column position

```
01230123012301230123
0123456789A123456789B
         ↓ Editor thinks we are at visPosMod=9
→   aa→ ccc
→   bbbbbbb→ddd
         ↑ and tries to move us to visPosMod=9 again
```

Thus we need to calculate how far away the cursor is from the last tabstop's visual column position (`visPosEts`) and add the same offset to the tabstop in the next line.

2. 
Using character index in string ``` 0123456789A123456789B ↓ Editor thinks we are at charIdx=5 →aa→ccc →bbbbbbb→ddd ↑ and tries to move us to charIdx=6 ``` Thus we need to calculate how far away the cursor is from the last tabstop character index in string (`charIdx`) and add the same offset to the tabstop in the next line. What Elastic Tabstops sees: ``` 0123456789A123456789B ↓ actually we are at visPosEts=15.. → aa↔ ccc → bbbbbbbb↔ ddd ↑ ..and thus actually need to move to visPosMod=13 or charIdx=10 ``` ## Spacing The spacing of an elastic tabstop can be defined by 3 values: * padding: the amount of padding which is added to every cell no matter what. * minimum width: the minimum width a cell is expanded (after applying the padding). * modulo: expand to next modulos column (0=disabled) to emulate classic tab behaviour Some example configurations and how they compare: ``` mod-2 = { pad: 1, min: 0, mod: 2 } mod-4 = { pad: 1, min: 0, mod: 4 } mod-8 = { pad: 1, min: 0, mod: 8 } spaces-0 = { pad: 0, min: 0, mod: 0 } spaces-1 = { pad: 1, min: 0, mod: 0 } spaces-2 = { pad: 2, min: 0, mod: 0 } spaces-4 = { pad: 4, min: 0, mod: 0 } reference = { pad: 2, min: 4, mod: 0 } ``` The formula to calculate the spacing: ``` width' = max(cellWidth_MAX + pad, min) - cellWidth width = mod == 0 ? 
width' : width' + (visualPos_ETS + width') % mod ``` without tabs: ``` foo(a, b, c) foobar(a, b, c) foobarbaz(a, b, c) public int myint = 123 protected float myfloat = 0.5 private string mystr = "abc" ``` with classic tabs (`tabsize=4`): ``` foo→(a, b, c) foobar→ (a, b, c) foobarbaz→ (a, b, c) public→ int→myint→ = 123 protected→ float→ myfloat→= 0.5 private→string→ mystr→ = "abc" ``` *Some cells are aligned but it's too small to actually work and using tabs within a line doesn't really make sense.* with classic tabs (`tabsize=8`): ``` foo→ (a, b, c) foobar→ (a, b, c) foobarbaz→ (a, b, c) public→ int→myint→ = 123 protected→ float→ myfloat→= 0.5 private→string→ mystr→ = "abc" ``` *With a larger tabsize more cells get aligned but it's still not enough to align everything. Even higher tabsizes are required to perfectly align everything automatically but the loose spacing gets out of hand.* with elastic tabstops (`spaces-2`): ``` foo↔ (a, b, c) foobar↔ (a, b, c) foobarbaz↔ (a, b, c) public↔ int↔ myint↔ = 123 protected↔ float↔ myfloat↔ = 0.5 private↔ string↔ mystr↔ = "abc" ``` *With elastic tabstops and padding of 2 spaces we get good and loose alignment but the function parameters are very far away.* with elastic tabstops (`spaces-1`): ``` foo↔ (a, b, c) foobar↔ (a, b, c) foobarbaz↔(a, b, c) public↔ int↔ myint↔ = 123 protected↔float↔ myfloat↔= 0.5 private↔ string↔mystr↔ = "abc" ``` *This is a balanced option but it is hard tell the difference if a space or tab is used for strings which are very close.* with elastic tabstops (`spaces-0`): ``` foo↔ (a, b, c) foobar↔ (a, b, c) foobarbaz(a, b, c) ╱╲ ╲╱ ╲╱ elastic tabstops hidden in here public↔ int↔ myint↔ = 123 protectedfloat↔myfloat= 0.5 private↔ stringmystr↔ = "abc" ╱╲ ╱╲ elastic tabstops hidden in here ``` *While for functions this supports the code style "no space before opening brace", tabular data seems to have no gap at all and now there is no space before equal sign.* with elastic tabstops (`spaces-0`) plus 
additional alignment spaces:

```
foo↔ (a, b, c)
foobar↔ (a, b, c)
foobarbaz(a, b, c)

public↔ ·int↔ ·myint↔ ·= 123
protected·float↔·myfloat·= ··0.5
private↔ ·string·mystr↔ ·= "abc"
```

*You can also add additional padding with spaces if you really want to use the `spaces-0` option but this requires more manual work.*

## Indentlevel aware elastic tabstops

The official version of elastic tabstops (by the Gravgaard committee) defines elastic tabstops solely by the tab number. However, this can lead to the following undesired alignment (see [original discussion](https://github.com/nick-gravgaard/ElasticNotepad/issues/4)):

```
def foo(x, y):
→ if x > y:
→ → temp↔ = x↔ # store temp
→ → x↔ = y↔ # swap
→ → y↔ = temp↔ # restore temp
→ start↔ = 0↔ # set start
→ end↔ = 12345↔ # set end
```

Thus it would make sense to define an alignment block by the indentation level.

```
def foo(x, y):
→ if x > y:
→ → temp↔ = x↔ # store temp
→ → x↔ = y↔ # swap
→ → y↔ = temp↔ # restore temp
→ start↔ = 0↔ # set start
→ end↔ = 12345↔ # set end
```

You have to be careful with this decision because it changes where a user will insert elastic tabstops, and it will mess up the alignment in other implementations.

Another good reason to use this feature is transitioning. While your text editor may support elastic tabstops and everything looks good, in all other tools it probably won't.
in a text viewer with no elastic tabstop support (`tabsize=4`) ``` def foo(x, y): → if x > y: → → temp→ = x→# store temp → → x→ = y→# swap → → y→ = temp→# restore temp → start→ = 0→# set start → end→= 12345→# set end ``` Now you may be tempted to replace the elastic tabstops in the file with alignment spaces and revert them back in the viewer: ``` def foo(x, y): → if x > y: → → temp··= x······# store temp → → x·····= y······# swap → → y·····= temp···# restore temp → start·····= 0······# set start → end·······= 12345··# set end ``` This only works under the following assumptions: * there can be no multiple spaces in strings(!) or the implementation needs language aware features for quotes: `print('Hello , World!')` * there were no manual alignment spaces inserted by the user (e.g. to align numbers or comments) * padding is larger than 2 (at least for saving in the file) The best solution for this is to insert tabs after alignment spaces to denote "spaces-aligned elastic tabstops". in a text viewer without elastic tabstop support (`tabsize=4`): ``` def foo(x, y): → if x > y: → → temp··→ = x······→ # store temp → → x·····→ = y······→ # swap → → y·····→ = temp···→ # restore temp → start·····→ = 0······→ # set start → end·······→ = 12345··→ # set end ``` For the sake of the argument let's assume we actually wanted the equalsigns to line up like this. In a text viewer with no elastic tabstop support and `tabsize=6` it will look like this: ``` def foo(x, y): → if x > y: → → temp··→ = x······→ # store temp → → x·····→ = y······→ # swap → → y·····→ = temp···→ # restore temp → start···→ = 0······→ # set start → end·····→ = 12345··→ # set end ``` A `tabsize=6` is exotic but the general problem here is that the indent tabs change the visual position of any following tabs and under certain circumstances will change the way it was aligned in the elastic tabstops situation. One way to solve this is to force spaces for indent tabs. 
However this forces an indentation style (in the file) onto the user and this is undesired. With "indentaware spaces aligned elastic tabstops" the users have the option: ``` def foo(x, y): → if x > y: → → temp··= x·····# store temp → → x·····= y·····# swap → → y·····= temp··# restore temp → start··= 0······# set start → end····= 12345··# set end ``` For the sake of the argument lets assume we wanted the equalsigns to line up like this now. using tabs before alignment spaces in a text viewer with no elastic tabstop support (`tabsize=4`): ``` def foo(x, y): → if x > y: → → temp··→ = x·····→ # store temp → → x·····→ = y·····→ # swap → → y·····→ = temp··→ # restore temp → start··→= 0······→ # set start → end····→= 12345··→ # set end ``` using tabs before alignment spaces in a text viewer with no elastic tabstop support (`tabsize=6`): ``` def foo(x, y): → if x > y: → → temp··→ = x·····→ # store temp → → x·····→ = y·····→ # swap → → y·····→ = temp··→ # restore temp → start··→ = 0······→ # set start → end····→ = 12345··→ # set end ``` Now because the elastic tabstops were aligned with the indentation level in the first place their relative position won't change. Another way how this can be solved is to use special inline-blockcomment markers `/**/` to denote "spaces aligned elastic tabstops" but this again only works under the following assumption: * requires language aware features for `inline blockcomments` * only works in programming languages which support inline-blockcomments (which in the case of Python, does not!). 
* only works if the code is not nested within another blockcomment

And it looks super strange:

```
def foo(x, y):
→ if x > y:
→ → temp··/**/= x······/**/# store temp
→ → x·····/**/= y······/**/# swap
→ → y·····/**/= temp···/**/# restore temp
→ start·····/**/= 0······/**/# set start
→ end·······/**/= 12345··/**/# set end
```

### Algorithm

```
[ cell widths level
, [ ] 0 ~~def foo(x, y):~~
, [ 0 ] 1 → ~~if x > y:~~
, [ 0, 0, 4, 3 ] 2 → → temp↔ = x↔ ~~# store temp~~
, [ 0, 0, 1, 3 ] 2 → → x↔ = y↔ ~~# swap~~
, [ 0, 0, 1, 6 ] 2 → → y↔ = temp↔ ~~# restore tem~~
, [ 0, 5, 3 ] 1 → start↔ = 0↔ ~~# set start~~
, [ 0, 3, 7 ] 1 → end↔ = 12345↔ ~~# set end~~
, ]
```

TODO!

## Right-aligning numbers

### Simple number handling

```
↔ 123
↔ -123
↔ 3.14
↔ .5
↔ +0.1e-10
↔ 1e10
↔ abc
↔ -.1
↔ 30th
```

can be aligned to:

```
↔ 123
↔ -123
↔ 3.14
↔ .5
↔ +0.1e-10
↔ abc
↔ -.1
↔ 30th
```

Note that a leading `+` or `-` is considered part of the integer part if and only if the cell is a valid number. `.5` and `-.1` are tricky cases: the first is a valid number with `""` as its integer part (as opposed to a string which is not a number) and the second only has `-` as its integer part (with no adjacent digits). They are both valid shorthand floats. For simple number alignment you can treat any suffix as the decimal part (`1e-10`, `3.14e+10`, `0.5f`, `30th`).

Regex for simple integer part detection: `/^(?:[+-]?\d+|[+-]?(?=\.\d))/` (the `^` anchor has to apply to both alternatives).

Or just align everything to the right if a single valid number was detected in the column:

```
↔ 123
↔ -123
↔ 3.14
↔ .5
↔ +0.1e-10
↔ abc
↔ -.1
↔ 30th
```

### Scientific notation

```
↔ 123
↔ 3.14
↔ 1e+3
↔ 1e-3
↔ 1.2e+3
↔ 1.2345e+3
```

The handling of scientific notation is ambiguous: `1e3` is an integer, `1e-3` is a float, `1.2e3` is an integer, `1.2345e3` is a float. Should we always align on the `.` (which might not be there), on the `e`, or depending on the type?
```
align on first non-integer (simple):
↔ 123
↔ 3.14
↔ 1e+3
↔ 1e-3
↔ 1.2e+3
↔ 1.2345e+3

align on `.` (simple):
↔ 123
↔ 3.14
↔ 1e+3
↔ 1e-3
↔ 1.2e+3
↔ 1.2345e+3

align on `e` as integer (always assume integer):
↔ 123
↔ 3.14
↔ 1e+3
↔ 1e-3
↔ 1.2e+3
↔ 1.2345e+3

align on `e` as `.` (always assume float):
↔ 123
↔ 3.14
↔ 1e+3
↔ 1e-3
↔ 1.2e+3
↔ 1.2345e+3

align by type (requires parsing and is ambiguous):
↔ 123
↔ 3.14
↔ 1e+3 // but why align on 'e' and not as a whole int?
↔ 1e-3 // but why align on 'e' and not as a decimal only?
↔ 1.2e+3 // but why align on 'e' and not as a whole int?
↔ 1.2345e+3 // but why align on '.' and not 'e'?

align on expanded value (requires parsing and is ambiguous):
↔ 123
↔ 3.14
↔ 1e+3 // 1000 4 ints 0 decimals
↔ 1e-3 // 0.001 1 ints 4 decimals
↔ 1.2e+3 // 1200 4 ints 0 decimals
↔ 1.2345e+3 // 1234.5 4 ints 2 decimals
```

### Binary notations

Most programming languages also allow numbers to be written in other bases (hex, octal, binary). While these are integers too and alignment makes sense within the same type (e.g. all hex numbers), it's arguable what the point of aligning mixed number types is, as their digits share no relationship.

```
↔ 123
↔ 0xC0C0
↔ 0xDEADBEEF
↔ 0x is not hex
↔ 0o777
↔ 0o1445
↔ 0123
↔ 0b1010
↔ 0b11001100
↔ 3.14
```

can be aligned to:

```
↔ 123
↔ 0xC0C0
↔ 0xDEADBEEF
↔ 0x is not hex
↔ 0o777
↔ 0o1445
↔ 0123
↔ 0b1010
↔ 0b11001100
↔ 3.14
```

Regex for integer part detection for most number formats in programming languages: `^(?:[+-]?(?=0)0[xX][0-9a-fA-F]+)|^(?:[+-]?(?=0)0[oO]?[0-7]+)|^(?:[+-]?(?=0)0[bB][01]+)|^(?:[+-]?\d+)|^(?:[+-]?(?=\.\d))`

Note that some languages also support digit separators (or numeric separators) like `12'345'678'901'234LL` or `12_345_678_901_234LL`. These can also be used for binary formats. They don't even have to separate on fixed intervals (`0xCAFE_C0_C0`).
Consecutive separators are usually not allowed, and separators have to occur between digits (not allowed: `0xCA__FE` or `0x_F_F` or `0xF_F_`), but numbers can be prefixed with zeros (`0x0FF`).

### Special numbers

Some programming languages support special numbers like:

* not-a-number: `NaN`
* infinity: `Infinity`, `inf`, `INF`
* complex numbers with imaginary units: `i`, `j`, `ij`
* special binary notation: `0x1.0p+0`, `0x1.FFFFFFFFFFFFFp+1023`

Because there are too many possible notations to enumerate, these should be either ignored or handled with language-aware features.

### Optionality

Note that right-aligning numbers (unlike indentlevel aware elastic tabstops) won't change how nearby blocks behave, so it's up to the editor and user if and how they want numbers to align. If number alignment is disabled the column will become shorter but the next elastic tabstop will still be aligned. This will only become relevant if alignment spaces are inserted into the actual file.

```
aaa↔ 123↔ ccc
bbbbb↔ 3.14↔ ddd
```

```
aaa↔ 123↔ ccc
bbbbb↔ 3.14↔ ddd
```

Although in some cases you might prefer to set the tabstop before the number instead of after it:

without right-aligned numbers:

```
mat = [
→ [ -1↔ , 0↔ , 1↔ ],
→ [ 0↔ , 1↔ , 1↔ ],
→ [ 123↔ , 234↔ , 456↔ ]
]
```

with right-aligned numbers:

```
mat = [
→ [↔ -1,↔ 0,↔ 1],
→ [↔ 0,↔ 1,↔ 1],
→ [↔ 123,↔ 234,↔ 456]
]
```

### Suffixed numbers

Another issue which needs to be handled is how suffixed numbers should be treated:

* Should the whole cell be treated as a number or a string? Only if it parses to a valid number (`123f`)?
* Should the suffix be treated as the decimal part?
* Should the integer, decimal and string be separated altogether?
* Should it be handled differently depending on whether there is whitespace before the suffix?
``` Alfa↔ increased↔ 123x Bravo↔ increased↔ 3.14x Charlie↔ increased↔ 42x Alfa↔ increased↔ 123 times Bravo↔ increased↔ 3.14 times Charlie↔ increased↔ 42 times string after decimal (surprising): Alfa↔ increased↔ 123···x Bravo↔ increased↔ 3.14x Charlie↔ increased↔ 42···x string after decimal (surprising): Alfa↔ increased↔ 123··· times Bravo↔ increased↔ 3.14 times Charlie↔ increased↔ 42··· times ``` Note that in some programming languages `123u`, `123f`, `123L` are valid numbers which denote the type (unsigned int, float, long etc.). ### String alignment Another option is where strings within columns of numbers should align to. This is also the reason why floats with empty integer parts (`.5`) and non-numbers (`abc`) should be differentiated: ``` all to right: ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 to left (non-number): ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 with int (non-number) ↔ 12345 ↔ abc ↔ abcdefgh // overflow to right ↔ 0.2345 as int: ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 to center (half-int, half-float): ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 with decimal (non-number): ↔ 12345 ↔ abc ↔ abcdefgh // overflow to left ↔ 0.2345 as decimal (non-number): ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 to right (non-number): ↔ 12345 ↔ abc ↔ abcdefgh ↔ 0.2345 ``` The handling of "to center" is ambigious for monospaced fonts. If the string has an odd length (like `abc`) it's open to the implementation to decide if a larger part (`ab` or `bc`) should go to the integer side or decimal side or if it should be biased towards the longer side once the max width has been calculated as to not grow the column unnecessarily. 
``` ↔ 12 ↔ abcde // always bias to integer side (grows left) ↔ 3.14 ↔ 12 ↔ abcde // bias to (longer) decimal side ↔ 3.14 ↔ 123 ↔ abcde // always bias to decimal side (grows right) ↔ 3.1 ↔ 123 ↔ abcde // bias to (longer) integer side ↔ 3.1 ``` One situation where string alignment is relevant are config files and jsons: ``` width↔ =↔ 3 height↔ =↔ 23 depth↔ =↔ 123 name↔ =↔ "Alfa" street↔ =↔ "Mainstreet" city↔ =↔ "Metropolis" distance↔ =↔ 0.2 angle↔ =↔ 0.234 offset↔ =↔ 0.2345 ``` *Best not to mix them in the first place and add some blanks.* A situation where you want to use "strings as ints" is if semantically similar function calls can have mixed int and strings: ``` foo↔ (↔ 1,↔ "bb",↔ 3) foobar↔ (↔ "a",↔ 234,↔ 345) foobarbaz↔ (↔ 12345,↔ 23456,↔ "ccccc") ``` Otherwise you would have to introduce a lot of extra tabs which causes loose spacing just to keep vertical selectability: ``` foo↔ (↔ 1↔ ,↔"bb"↔ ,↔ 3↔ ) foobar↔ (↔ "a"↔ ,↔ 234↔ ,↔ 345↔ ) foobarbaz↔ (↔ 12345↔ ,↔ 23456↔ ,↔ "ccccc"↔ ) ``` ### Alignment groups Another possible option is that strings are considered alignment breakers and split the column into alignment groups such that numbers are only aligned within their group: ``` width↔ = 3 height↔ = 23 depth↔ = 123 name↔ = "Alfa" distance↔ = 0.2 angle↔ = 0.234 offset↔ = 0.2345 ``` This changes "string alignment" behaviour because there is no longer a longest integer/decimal/string for the whole column. 
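The alignment-group idea can be sketched in Python like this. It is a minimal illustration, not a reference implementation: the names `split_number` and `align_groups` are hypothetical, the integer part is detected with a regrouped variant of the simple regex from "Simple number handling", any non-number cell (including an empty one) is assumed to break the group, and the result is padded with plain spaces only to make the effect visible.

```python
import re

# Integer part: optional sign plus digits, or just an optional sign before ".<digit>".
INT_PART = re.compile(r"[+-]?\d+|[+-]?(?=\.\d)")


def split_number(cell: str):
    """Return (integer part, rest) for number-like cells, or None for strings."""
    match = INT_PART.match(cell)  # match() only matches at the start of the cell
    if match is None:
        return None
    return cell[:match.end()], cell[match.end():]


def align_groups(cells: list[str]) -> list[str]:
    """Right-align integer parts, but only within runs of consecutive numbers."""
    aligned, group = [], []

    def flush():
        # Align the numbers collected so far on their longest integer part.
        if group:
            int_width = max(len(int_part) for int_part, _ in group)
            for int_part, rest in group:
                aligned.append(int_part.rjust(int_width) + rest)
            group.clear()

    for cell in cells:
        parts = split_number(cell)
        if parts is None:
            flush()               # a string ends the current alignment group
            aligned.append(cell)  # strings are left untouched in this sketch
        else:
            group.append(parts)
    flush()
    return aligned


for cell in align_groups(["3", "23", "123", '"Alfa"', "0.2", "0.234", "0.2345"]):
    print(cell)
```

Because each group is measured on its own, the short integer parts in the second group do not inherit the width of `123` from the first group.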
### Algorithm If we look closely at integer and floats we see the cell width has to expand to the longest integer part (including signs) plus the longest decimal part (including dot) or longest string: ``` ↔ 5↔ // digit ↔ 345↔ // int ↔ 12345↔ // long ↔ 45th↔ // int string ↔ abc↔ // string ↔ abcdefgh↔ // longstring ↔ .2↔ // shorthand ↔ 0.234↔ // float ↔ 0.23456↔ // double max width=7 ``` ``` int decimal /---\/----\ ↔ 5↔ // digit ↔ 345↔ // int ↔ 12345↔ // long ↔ 45th↔ // int string ↔ abc↔ // string ↔ abcdefgh↔ // longstring ↔ .2↔ // shorthand ↔ 0.234↔ // float ↔ 0.23456↔ // double \-----/ str longest integer part=5 longest decimal part=6 longest string part=8 max width=max(integer+decimal,string)=11 ``` Note that the longest string is longer than either the integer or decimal part. This becomes relevant if you want to treat string alignment differently. So the whole column needs to be 11 characters wide and some numbers influence the width of the previous(!) elastic tabstop. unaligned: ``` aaa↔ 5↔ // digit b↔ 345↔ // int cccccc↔ 12345↔ // long dddd↔ 45th↔ // int string eee↔ .2↔ // shorthand ff↔ 0.234↔ // float ggggg↔ 0.23456↔ // double ``` without right-aligning numbers: ``` /----\ /-----\ /-----------\ aaa↔ 5↔ // digit b↔ 345↔ // int cccccc↔ 12345↔ // long dddd↔ 45th↔ // int string eee↔ .2↔ // shorthand ff↔ 0.234↔ // float ggggg↔ 0.23456↔ // double ``` with right-aligned numbers: ``` /----\ /---------\ /-----------\ aaa↔ ····5↔ // digit b↔ ··345↔ // int cccccc↔ 12345↔ // long dddd↔ ···45th↔ // int string eee↔ ·····.2↔ // shorthand ff↔ ····0.234↔ // float ggggg↔ ····0.23456↔ // double ``` Note the extra spacing we have to back-propagate. ## Converting from elastic tabstops to alignment spaces Let's assume we already use elastic tabstops in a text file and want to convert it to spaces which align the columns. 
This is pretty easy because we already calculated the virtual character width with the algorithm above and the spaces just have to be inserted into the text file. The reverse process however requires special treatment. ## Converting from alignment spaces to elastic tabstops Let's assume we have a text which uses a lot of spaces for alignment and we want to convert it to elastic tabstops: :::{literalinclude} ../../tests/simple_spaces-2_whitespaces.txt ::: **1. For each line find the start and end index position of consecutive spaces (i.e. more than one space).** In the following diagram are two arrays which are displayed next to each other vertically because we need them separately. ``` line start indices end indices 0 [ [ ] [ [ ] 01 1 , [ ] , [ ] →456789A123 2 , [ ] , [ ] →→//·rectangle 3 , [ 4, 21, 35 ] , [ 16, 27, 37 ] →→aa············ddddd······gggggggg··jjj 4 , [ 9, 17, 33 ] , [ 16, 27, 37 ] →→bbbbbbb·······e··········hhhhhh····kkk 5 , [ 14, 25, 31 ] , [ 16, 27, 37 ] →→cccccccccccc··fffffffff··iiii······lll 6 , [ ] , [ ] 7 , [ ] , [ ] →→//·triangle·bottom·left 8 , [ ] , [ ] →→aa 9 , [ 9 ] , [ 16 ] →→bbbbbbb·······e 10 , [ 14, 25 ] , [ 16, 27 ] →→cccccccccccc··fffffffff··iiii 11 , [ ] , [ ] 12 , [ ] , [ ] →→//·triangle·bottom·right 13 , [ 2 ] , [ 27 ] →→·························gggggggg 14 , [ 2, 17 ] , [ 16, 27 ] →→··············e··········hhhhhh 15 , [ 14, 25 ] , [ 16, 27 ] →→cccccccccccc··fffffffff··iiii 16 , [ ] , [ ] 17 , [ ] , [ ] →→//·diagonal·down 18 , [ 4 ] , [ 16 ] →→aa············ddddd 19 , [ 9 ] , [ 18 ] →→bbbbbbb·········hhhhhh 20 , [ 14 ] , [ 20 ] →→cccccccccccc······lll 21 , [ ] , [ ] →45 22 , [ ] , [ ] 0123456789A12 23 , [ ] , [ ] ] ] ``` Note that `rectangle` and `triangle bottom left` are trivial and only require one tab whereas `triangle bottom right` and `diagonal down` require multiple tabs for consecutive spaces. **2. Find an uninterrupted run of end indices over the lines (i.e. vertically), get all unique values and sort them. 
Iterate over these values and prepend the end indices with every value that is smaller than the end index.** For `triangle bottom right` we get `[27, 16, 16]` (unique and sorted is `[16, 27]`) and `[27, 27]` (unique and sorted is `[27]`). The first run covers the lines 13-15, check against each end index (`16 < 27` so prepend with `16`). The second run covers 14-15 but `27 = 27` so don't do anything. For `diagonal down` we get `[16, 18, 20]` (unique and sorted is also `[16, 18, 20]`). The run covers the lines 18-20, check against each end index: `[16, 18, 20] is >= 16` so do nothing, `16 < 18` so prepend with `16`, `[16, 18] < 20` so prepend with `16` and `18`. ``` line start indices end indices 0 [ [ ] [ [ ] 01 1 , [ ] , [ ] →456789A123 2 , [ ] , [ ] →→//·rectangle 3 , [ 4, 21, 35 ] , [ 16, 27, 37 ] →→aa············ddddd······gggggggg··jjj 4 , [ 9, 17, 33 ] , [ 16, 27, 37 ] →→bbbbbbb·······e··········hhhhhh····kkk 5 , [ 14, 25, 31 ] , [ 16, 27, 37 ] →→cccccccccccc··fffffffff··iiii······lll 6 , [ ] , [ ] 7 , [ ] , [ ] →→//·triangle·bottom·left 8 , [ ] , [ ] →→aa 9 , [ 9 ] , [ 16 ] →→bbbbbbb·······e 10 , [ 14, 25 ] , [ 16, 27 ] →→cccccccccccc··fffffffff··iiii 11 , [ ] , [ ] 12 , [ ] , [ ] →→//·triangle·bottom·right 13 , [ 2 ] , [ 16, 27 ] →→·························gggggggg 14 , [ 2, 17 ] , [ 16, 27 ] →→··············e··········hhhhhh 15 , [ 14, 25 ] , [ 16, 27 ] →→cccccccccccc··fffffffff··iiii 16 , [ ] , [ ] 17 , [ ] , [ ] →→//·diagonal·down 18 , [ 4 ] , [ 16 ] →→aa············ddddd 19 , [ 9 ] , [ 16, 18 ] →→bbbbbbb·········hhhhhh 20 , [ 14 ] , [ 16, 18, 20 ] →→cccccccccccc······lll 21 , [ ] , [ ] →45 22 , [ ] , [ ] 0123456789A12 23 , [ ] , [ ] ] ] ``` **3. For each line, check if the tabposition is within start and end position. 
Replace the spaces with a tab.**

:::{literalinclude} ../../tests/simple_whitespaces.txt
:::

## Language-aware elastic tabstops

If the elastic tabstops implementation is also language-aware, it enables additional features which can be used only for viewing files.

**Different spacing settings**

Code styles have different rules for spaces before and after certain language features (keyword, assignment, function header, function call etc.). We can use this information to apply different spacing settings. Let's say we have the following code style:

* one space before assignment (`·=`)
* no space before function call (`foo()`)
* comments on mod-4 (`    // comment`)
* no space before, one space after parameter separator (`foo(a, b)`)

```
myint↔   x↔    = foo↔     ("Alfa"↔  ,↔    1, a, b, c)↔  // some
myfloat↔ yyy↔  = foobar↔  ("Bravo"↔ ,↔  123, a, b, c)↔  // comment
mystr↔   zzzzz↔= foobarbaz("Charlie",↔12345, a, b, c)↔  // here
```

## Special cases

### Tabs-only line handling

If a line consists only of tabs, the user's intention is ambiguous. Should the tabs be treated purely as indentation tabs (variant A), or as a mix of indentation tabs and elastic tabstops that fuses the line with the next block (variant B)? Note that some editors automatically trim trailing whitespace on save, depending on the user settings, so in most cases these tabs will be removed anyway.

```
→ aaaaa↔ 111↔ 222
→ ↔ 111↔ 222
→ ↔ ↔
→ bbb↔ 111↔ 222
→ ↔ 111↔ 222
```

variant A - interrupted column block:

```
    aaaaa  111  222
           111  222
            ↵
    bbb  111  222
         111  222
```

variant B - continuous column block:

```
    aaaaa  111  222
           111  222
                ↵
    bbb    111  222
           111  222
```

*(When editing this document some editors automatically clean trailing whitespace. To prevent accidental corruption some line ends are displayed as `↵` here.)*

Variant A is arguably better because it allows more fine-grained control while still allowing column blocks to be interrupted with blank lines.
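The difference between the two variants can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any reference implementation: the helper names `elastic_tab_count` and `line_runs` are hypothetical, and the indentation depth is assumed to be known and passed in as `indent_depth`, because telling indentation tabs apart from leading empty cells is a separate problem.

```python
def elastic_tab_count(line: str, indent_depth: int, variant: str) -> int:
    """Count the tabs on `line` that act as elastic tabstops.

    `indent_depth` (assumed to be known) is the number of leading tabs
    that are indentation; `variant` is "A" or "B" as described above.
    """
    tabs_only = line != "" and line.strip("\t") == ""
    if tabs_only and variant == "A":
        return 0  # variant A: every tab on a tabs-only line is indentation
    leading = len(line) - len(line.lstrip("\t"))
    # Leading tabs up to the indentation depth do not count as elastic tabstops.
    return line.count("\t") - min(leading, indent_depth)


def line_runs(lines: list[str], indent_depth: int, variant: str) -> list[list[int]]:
    """Group the indices of consecutive lines that contain elastic tabstops."""
    runs, current = [], []
    for index, line in enumerate(lines):
        if elastic_tab_count(line, indent_depth, variant) > 0:
            current.append(index)
        elif current:
            # A line without elastic tabstops interrupts the column blocks.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


sample = [
    "\taaaaa\t111\t222",
    "\t\t111\t222",
    "\t\t\t",
    "\tbbb\t111\t222",
    "\t\t111\t222",
]
print(line_runs(sample, indent_depth=1, variant="A"))  # [[0, 1], [3, 4]]
print(line_runs(sample, indent_depth=1, variant="B"))  # [[0, 1, 2, 3, 4]]
```

With variant A the tabs-only line splits the sample into two independent runs, while variant B keeps all five lines in one run, so the first column is sized for `aaaaa` throughout.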
For testing (contains actual `\t` characters):

```
aaaaa 111 222
111 222
\t\t\t
bbb 111 222
111 222
```

*(When editing this document some editors automatically clean trailing whitespace. To prevent accidental corruption some tabs are stored as `\t` here. Please convert them manually.)*

### Tabs within strings

A programming language may allow an actual tab character in a string instead of the `\t` escape sequence. This essentially creates an alignment within an alignment. The user may or may not want this tab character to be aligned, and telling the two cases apart would require a language-aware feature for `quotes`. However, if the language allows actual tab characters within strings and the developer wants to use them for alignment within strings (example 1), treating every tab as an elastic tabstop only forces the developer to use `\t` escaping where the tab is not intended for alignment anyway. In this case a language-aware feature is not required.

**Example 1 - alignment within is desired**

```
→ print('Names:')
→ print('Alfa↔ : 0')↔ /* try making↔ */
→ print('Bravo↔ : 1')↔ /* this comment↔ */
→ print('Charlie↔ : 2')↔ /* a bit longer↔ */
```

wrong rendering:

```
print('Names:')
print('Alfa : 0') /* try making */
print('Bravo : 1') /* this comment */
print('Charlie : 2') /* a bit longer */
```

for testing (contains actual `\t` characters):

```
print('Names:')
print('Alfa : 0') /* try making */
print('Bravo : 1') /* this comment */
print('Charlie : 2') /* a bit longer */
```

**Example 2 - alignment within is not desired**

```
→ print('Alignment characters')
→ print('Underscore: "_"')↔ /* try making↔ */
→ print('Space: " "')↔ /* this comment↔ */
→ print('Tab: "→ "')↔ /* a bit longer↔ */
```

expected rendering:

```
print('Alignment characters')
print('Underscore: "_"') /* try making */
print('Space: " "') /* this comment */
print('Tab: " "') /* a bit longer */
```

wrong rendering:

```
print('Alignment characters')
print('Underscore: "_"') /* try making */
print('Space: " "') /* this comment */
print('Tab: " "') /* a bit longer */
```

for testing (contains actual `\t` characters):

```
foo('Underscore: "_"') /* try making */
bar('Space: " "') /* this comment */
baz('Tab: " "') /* a bit longer */
```

## Optimizations

* Don't calculate the last cell of a line because it's not used for alignment
* Track changes and only update changed lines

:::{todo}
Add more optimization tips
:::

## Other implementations

* [Elastic Notepad - elasticTabstops.scala](https://github.com/nickgravgaard/ElasticNotepad/blob/master/app/src/elasticTabstops.scala)
* [Online demo on Scala Scastie](https://scastie.scala-lang.org/KI0z14XATkq1I6Bqa4ZRIA)
* [Notepad++ - ColumnsPlusPlus - ColumnsPlusPlus.cpp](https://github.com/Coises/ColumnsPlusPlus/blob/master/src/ColumnsPlusPlus.cpp)
* [Visual Studio - Always Aligned - ElasticTabstopsConverter.cs](https://github.com/nickgravgaard/AlwaysAlignedVS/blob/master/AlwaysAlignedVS/ElasticTabstopsConverter.cs)
* [Scintilla - Elastic Tabstops - ElasticTabstops.cpp](https://github.com/nickgravgaard/ElasticTabstopsForScintilla/blob/master/ElasticTabstops.cpp)

**Descriptions**

* [observablehq.com@shaunlebron/elastic-tabstops](https://observablehq.com/@shaunlebron/elastic-tabstops)

**Elastic Tabstops Lite**

* [ACE - Elastic Tabstops Lite Extension - elastic_tabstops_lite.js](https://github.com/ajaxorg/ace/blob/master/src/ext/elastic_tabstops_lite.js)
* [Visual Studio Code - Elastic Tabstops Lite - extension.ts](https://github.com/isral/elastic_tabstops_lite.vsce/blob/master/src/extension.ts)