This was briefly mentioned in #16, but as ARM devices become more popular, it would be great to have an accelerated implementation for them as well.