Jake Wharton recently caused me to go down yet another silly
optimization rabbit hole when he nonchalantly linked to
a piece of code
used to count the number of digits in a `Long`

during a Slack conversation about Kotlin’s lack
of ternary operator. This of course triggered folks like Madis Pink and
me to want to optimize it…

# Counting digits Link to heading

The simplest way to count the number of digits would be to compute `log10(n).toInt() + 1`

, where `n`

is our input number. Unfortunately logarithmic functions like
Kotlin’s log10
are only defined for floating point numbers. If our input is an `Int`

or a `Long`

, we could first
convert to `Double`

and then call `log10`

, but not all `Long`

values can be stored in a `Double`

(any value above 2^53), and we would have to special case `0`

. We must therefore find a different
solution^{1}.

Important NoteIn the rest of this article, we’ll assume the input number is

positiveto keep the code easier to understand. You can see how Jake handled negative values in the original source code. There might be better ways of doing it in the various solutions presented below, but I’ll leave it as an exercise to the reader.

Let’s start with the most straightforward solution:

```
1fun simpleCount(v: Long): Int {
2 if (v < 10L) return 1
3 if (v < 100L) return 2
4 if (v < 1000L) return 3
5 if (v < 10000L) return 4
6 if (v < 100000L) return 5
7 if (v < 1000000L) return 6
8 if (v < 10000000L) return 7
9 if (v < 100000000L) return 8
10 if (v < 1000000000L) return 9
11 if (v < 10000000000L) return 10
12 if (v < 100000000000L) return 11
13 if (v < 1000000000000L) return 12
14 if (v < 10000000000000L) return 13
15 if (v < 100000000000000L) return 14
16 if (v < 1000000000000000L) return 15
17 if (v < 10000000000000000L) return 16
18 if (v < 100000000000000000L) return 17
19 if (v < 1000000000000000000L) return 18
20 return 19
21}
```

I like this solution because of how obvious it is. What I don’t like about it is the large number of branches and instructions it generates. I will spare you the full arm64 assembly dump, just know that it yields 111 instructions and 32 branches. While early returns will likely spare you the full execution of the generate code, this won’t be great for the instruction cache nor the branch predictor.

On a Pixel 6, this solution takes 63,236 ns to execute over a large array of random inputs.

# Using a binary search Link to heading

Jake’s solution is similar to the one we just saw, but re-arranges the tests to optimize for likely values, and effectively performs a binary search.

```
1fun binarySearchCount(v: Long): Int {
2 return if (v < 100000000L) {
3 if (v < 10000L) {
4 if (v < 100L) {
5 if (v < 10L) {
6 1
7 } else {
8 2
9 }
10 } else if (v < 1000L) {
11 3
12 } else {
13 4
14 }
15 } else if (v < 1000000L) {
16 if (v < 100000L) {
17 5
18 } else {
19 6
20 }
21 } else if (v < 10000000L) {
22 7
23 } else {
24 8
25 }
26 } else if (v < 1000000000000L) {
27 if (v < 10000000000L) {
28 if (v < 1000000000L) {
29 9
30 } else {
31 10
32 }
33 } else if (v < 100000000000L) {
34 11
35 } else {
36 12
37 }
38 } else if (v < 1000000000000000L) {
39 if (v < 10000000000000L) {
40 13
41 } else if (v < 100000000000000L) {
42 14
43 } else {
44 15
45 }
46 } else if (v < 100000000000000000L) {
47 if (v < 10000000000000000L) {
48 16
49 } else {
50 17
51 }
52 } else if (v < 1000000000000000000L) {
53 18
54 } else {
55 19
56 }
57}
```

This new solution has a similar number of instructions as the previous one (108 vs 111), but only a quarter of the branches, for a total of 8 branches, down from 32. In addition, you are guaranteed to never execute all the branches.

With the smarter way to go throught the branches, the binary search takes 27,034 ns to execute, or only ~40% of the original execution time. This is already a great improvement.

# Educated guess Link to heading

What makes this problem “difficult” is that we are trying to work in base 10 while computers work
in base 2. We can however use this to our advantage to *estimate* how many digits our number has.
To do this we need to count the number of *leading zeroes* in the binary representation of the
number. Using this formula:

```
1val bitCount = 64 - n.countLeadingZeroBits()
```

We know how many bits are required to encode the number. Knowing this we can use a lookup table to map this number of bits to a possible number of digits. For instance if the input is 237, the formula above tells us we need 8 bits to encode the value. From this we need to map to the closest power of 10. This can be done either via another lookup table (like I did oringally) or by using Madis’ solution:

```
1val powerOfTen = bitCount * 3 / 10
```

With our example of 237, the value returned is 2 because the closest power of ten is 100, or 10^2.
If the input is less than the matching power of ten, we can return `powerOfTen`

as the number of
digits, and if the input is greater we need to adjust the result and add 1:

```
1private val PowersOfTen = longArrayOf(
2 0,
3 10,
4 100,
5 1000,
6 10000,
7 100000,
8 1000000,
9 10000000,
10 100000000,
11 1000000000,
12 10000000000,
13 100000000000,
14 1000000000000,
15 10000000000000,
16 100000000000000,
17 1000000000000000,
18 10000000000000000,
19 100000000000000000,
20 1000000000000000000
21)
22
23fun log2Count(v: Long): Int {
24 val guess = ((64 - v.countLeadingZeroBits()) * 3) / 10
25 return guess + (if (v >= PowersOfTen[guess]) 1 else 0)
26}
```

The expensive integer division can be removed using another trick from Madis (which is similar to what I have seen compilers do automatically):

```
1fun log2Count(v: Long): Int {
2 val guess = ((64 - v.countLeadingZeroBits()) * 10) ushr 5
3 return guess + (if (v >= PowersOfTen[guess]) 1 else 0)
4}
```

With this solution we end up with only 24 instructions and 2 branches (1 branch always executed for the array bounds check, the other one in case we fail the check, which can’t happen in our case):

```
1str x0, [sp, #-32]!
2str lr, [sp, #24]
3mov w3, #0x40
4mov w2, #0xa
5clz x4, x1
6sub w3, w3, w4
7mul w2, w3, w2
8lsr w2, w2, #5
9ldr w3, [x0]
10ldr w0, [x3, #224]
11ldr w3, [x0, #8]
12cmp w2, w3
13b.hs #+0x24
14add w0, w0, #0x10 (16)
15ldr x0, [x0, x2, lsl #3]
16cmp x1, x0
17cset w0, ge
18add w0, w2, w0
19ldr lr, [sp, #24]
20add sp, sp, #0x20 (32)
21ret
22mov x0, x2
23mov x1, x3
24bl #+0x144
```

This solution runs in 16,396 ns on a Pixel 6, taking only only 60% of the time taken by the binary search, and only 25% of the naive implementation.

# Conservative approximation Link to heading

If you don’t need to know the *exact* number of digits, but a close approximation that’s guaranteed
to be always greater or equal to the real answer (to size a buffer for instance), you can drop the
entire lookup table, which greatly simplifies the generated arm64 assembly.

```
1fun conservativeLog2Count(v: Long): Int {
2 val guess = ((64 - v.countLeadingZeroBits()) * 10) ushr 5
3 return guess + 1
4}
```

The resulting aarch64 code has only 8 instructions and no branches:

```
1mov w3, #0x40
2mov w2, #0x4d1
3mov w0, #0x1
4clz x1, x1
5sub w1, w3, w1
6mul w1, w1, w2
7add w0, w0, w1, lsr #12
8ret
```

This approximation saves us even more time, running in 14,853 ns.

# Recap Link to heading

Here is the final comparison between the various implementations:

Implementation | Instructions/branches | Time | Speed |
---|---|---|---|

Branches | 111 / 32 | `63,236 ns` | `1.0x` |

Binary search | 108 / 8 | `27,034 ns` | `2.3x` |

log2 | 24 / 2 | `16,396 ns` | `3.8x` |

log2 conservative | 8 / 0 | `14,853 ns` | `4.2x` |

Frustratingly — but unsurprisingly — a C++ implementation of
`log2Count`

produces 12 instructions instead of Kotlin’s 24, and no branches. This is of course due
to the fact that C++ doesn’t do any bounds check for us on the array. I haven’t tried to benchmark
this solution using JNI and `@CriticalNative`

but I doubt the savings will be greater than the cost
of the JNI overhead.

We could use the

`log10`

technique for any value lower than 2^53 and otherwise use one of the solutions that follow, but after benchmarking it’s not faster than any of the other solutions except the most naive one. ↩︎