david wong

Hey ! I'm David, a security consultant at Cryptography Services, the crypto team of NCC Group . This is my blog about cryptography and security and other related topics that I find interesting.

SIMD instructions in Go last month

One awesome feature of Go is cross-compilation. One limitation is that we can only choose to build for some pre-defined architectures and OS, but we can't build per CPU-model. In the previous post I was talking about C programs, where the user actually chooses the CPU model when calling the Make. Go could probably have something like that but it wouldn't be gooy. One solution is to build for every CPU models anyway, and decide later what is good to be used. So one assembly code for SSE2, one code for AVX, one code for AVX-512.

Note that we do not need to use SSE3/SSE4 (or AVX2) as the interesting functions are contained in SSE2 (respectively AVX) which will have more support and be contained in greater versions of SSE (respectively AVX) anyway.

The official Blake2 implementation in Go actually uses SIMD instructions. Looking at it is a good way to see how SIMD coding works in Go.

In _amd64.go, they use the builtin init() function to figure out at runtime what is supported by the host architecture:

func init() {
    useAVX2 = supportsAVX2()
    useAVX = supportsAVX()
    useSSE4 = supportsSSE4()
}

Which are calls to assembly functions detecting what is supported either via:

  1. a CPUID call directly for SSE4.
  2. calls to Golang's runtime library for AVX and AVX2.

In the second solution, the runtime variables seems to be undocumented and only available since go1.7, they are probably filled via cpuid calls as well. Surprisingly, the internal/cpu package already has all the necessary functions to detect flavors of SIMD. See an example of use in the bytes package.

And that's it! Blake2's hashBlocks() function then dynamically decides which function to use at runtime:

func hashBlocks(h *[8]uint64, c *[2]uint64, flag uint64, blocks []byte) {
    if useAVX2 {
        hashBlocksAVX2(h, c, flag, blocks)
    } else if useAVX {
        hashBlocksAVX(h, c, flag, blocks)
    } else if useSSE4 {
        hashBlocksSSE4(h, c, flag, blocks)
    } else {
        hashBlocksGeneric(h, c, flag, blocks)
    }
}

Because Go does not have intrisic functions for SIMD, these are implemented directly in assembly. You can look at the code in the relevant _amd64.s file. Now it's kind of tricky because Go has invented its own assembly language (based on Plan9) and you have to find out things the hard way. Instructions like VINSERTI128 and VPSHUFD are the SIMD instructions. MMX registers are M0...M7, SSE registers are X0...X15, AVX registers are Y0, ..., Y15. MOVDQA is called MOVO (or MOVOA) and MOVDQU is called MOVOU. Things like that.

As for AVX-512, Go probably still doesn't have instructions for that. So you'll need to write the raw opcodes yourself using BYTE (like here) and as explained here.

Well done! You've reached the end of my post. Now you can leave me a comment :)