The Kotlin standard library is a wonderful set of APIs, but it sometimes hides… interesting surprises. So today let's look at the innocent-looking `minOf()` and `maxOf()` functions.

These functions let you perform a min/max on a series of values, instead of just two with the more common `min()` and `max()` functions. Both `min()` and `max()` are straightforward and simply delegate — on Android and the JVM — to `Math.min()`/`Math.max()` respectively, which you can assume to be implemented with intrinsics (and are on Android).
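As a quick illustration of the difference between the two families of functions (the values here are my own, not from the benchmarks below):

```kotlin
import kotlin.math.max
import kotlin.math.min

fun main() {
    // min()/max() compare exactly two values and delegate
    // to Math.min()/Math.max() on Android and the JVM
    println(min(0.25f, 0.75f))         // prints 0.25
    println(max(0.25f, 0.75f))         // prints 0.75

    // minOf()/maxOf() accept three or more values
    println(minOf(0.5f, 0.25f, 0.75f)) // prints 0.25
    println(maxOf(0.5f, 0.25f, 0.75f)) // prints 0.75
}
```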

Now consider the following piece of code that loops over an array of colors to find the minimum of the red & green channels across all colors:

```
var minColor = Float.MAX_VALUE
for (color in colors) {
    minColor = minOf(minColor, color.red, color.green)
}
```

Let’s benchmark it on a Pixel 6, using 9,999 colors:

```
minOf(3) 86,815 ns 0 allocs
```

Seems reasonable, but we forgot to check the blue channel, so we’ll fix our code:

```
var minColor = Float.MAX_VALUE
for (color in colors) {
    minColor = minOf(minColor, color.red, color.green, color.blue)
}
```

And the benchmark now yields:

```
minOf(4) 256,048 ns 9999 allocs
```

The new benchmark is 2 times slower (total execution time is 3 times the previous total execution time). Does this make sense? To find the minimum of 3 values, we need to invoke `min()` twice:

```
fun minOf(a: Float, b: Float, c: Float) = min(a, min(b, c))
```

And to find the minimum of 4 values, we need three `min()` calls:

```
fun minOf(a: Float, b: Float, c: Float, d: Float) = min(a, min(b, min(c, d)))
```

We would therefore reasonably expect the first benchmark to take ~2/3 of the time of the second benchmark, not 1/3. The number of allocations reported by the benchmark gives us a hint about what's happening: with 9,999 reported allocations, we are doing one allocation per color that we test. Looking at the source code of the standard library, the answer becomes obvious:

```
public actual fun minOf(a: Float, vararg other: Float): Float {
    var min = a
    for (e in other) min = minOf(min, e)
    return min
}
```

The `minOf()` function offers specializations for 2 and 3 parameters, but falls back to a `vararg` variant starting with 4 parameters. The problem with `vararg` parameters is that they are equivalent to passing an array that must be allocated at the call site. Here is the relevant DEX bytecode on Android:

```
const/4 v0, #int 3 // #3
new-array v1, v0, [F // type@0037
const/4 v2, #int 0 // #0
aput v4, v1, v2
const/4 v4, #int 1 // #1
aput v5, v1, v4
const/4 v4, #int 2 // #2
aput v6, v1, v4
if-ge v2, v0, 0017 // +000b
aget v4, v1, v2
invoke-static {v3, v4}, Ljava/lang/Math;.min:(FF)F // method@0008
move-result v3
add-int/lit8 v2, v2, #int 1 // #01
goto 000c // -000a
```

The ART^{1} AOT^{2} compiler is however able to optimize this allocation away (the fact that the allocations show up in a benchmark without speed compilation means the JIT kept the array, though).

If you need to min/max 4 values in a tight loop, an easy fix to benefit from optimized code in interpreted, JIT, and AOT modes is to use a 4-parameter specialization:

```
inline fun fastMinOf(a: Float, b: Float, c: Float, d: Float): Float {
    return minOf(a, minOf(b, minOf(c, d)))
}
```

This implementation produces exactly what we expect in DEX:

```
invoke-static {v2, v3}, Ljava/lang/Math;.min:(FF)F // method@0008
move-result v2
invoke-static {v1, v2}, Ljava/lang/Math;.min:(FF)F // method@0008
move-result v1
invoke-static {v0, v1}, Ljava/lang/Math;.min:(FF)F // method@0008
move-result v0
```

And of course the aarch64 assembly matches what the AOT compiler was already doing.

Here is how this new version compares in the same benchmarks as before:

```
minOf(3) 86,815 ns 0 allocs
minOf(4) 256,048 ns 9999 allocs
fastMinOf(4) 114,490 ns 0 allocs
```

All the allocations are gone, and we are now closer to the expected 2/3 ratio between `minOf(3)` and `fastMinOf(4)`.
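The same trick applies to the max case. Here is a sketch using a hypothetical `fastMaxOf()` (named by analogy with `fastMinOf()` above; not part of the standard library):

```kotlin
// Hypothetical 4-parameter specialization for the max case: chaining
// the 2- and 3-parameter maxOf() specializations means no vararg
// array is allocated at the call site.
inline fun fastMaxOf(a: Float, b: Float, c: Float, d: Float): Float {
    return maxOf(a, maxOf(b, maxOf(c, d)))
}

fun main() {
    println(fastMaxOf(0.1f, 0.9f, 0.4f, 0.2f)) // prints 0.9
}
```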

Will this impact your app? Not unless you invoke `minOf()`/`maxOf()` with 4 values in performance-critical loops. For Jetpack Compose, optimizing the interpreted and JIT use cases is also important since it helps improve the debug workflows and reduces noise in other measurements.