A Go implementation of Poly1305 that makes sense
Still, after reverse-engineering what the implementations were doing, I grew convinced that cryptography code could be perfectly understandable if only we commented it. I set out to prove this, and the code below is the Poly1305 implementation that came out. The amd64 assembly implementation in golang.org/x/crypto/poly1305 is only 30-60% faster than this code, which provides some timid hope for my dream of reducing the assembly in the Go crypto standard libraries year over year.
Source: blog.filippo.io