First, if the assembly code is competently commented (in particular, with a clear register map), porting it is rather straightforward: you follow the algorithm's main sequence and translate the instructions one by one, adding the occasional vcombine where two D vectors become a Q vector. Your activity will mostly consist of finding the correct name for the intrinsic func