profile was calling the vlong run-time support functions directly, which is counter-productive when the compiler does those operations itself and the functions might not even exist in the library. i've used A = A+B rather than A += B to ensure the generated code is the same (ie, calls to _addv and _subv) on those platforms that still use them. (those functions are handled specially for profiling purposes in vlop.s on some platforms.) it's probably time to put somewhat more intelligent vlong support into the portable part of cc, but this change will allow alpha, amd64 and others to cope until i have a chance to look at that (eg, for powerpc).
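
a minimal sketch of the source-level form described above, not the actual profile code. the Acct structure and the acctenter/acctleave functions are hypothetical names invented for illustration, and vlong is typedef'd to long long here only so the fragment compiles outside Plan 9 C. whether a given compiler emits calls to _addv and _subv for these expressions depends on the platform.

```c
/* stand-in for the Plan 9 vlong type (assumption: 64-bit long long) */
typedef long long vlong;

/* hypothetical accumulator; not the real profiling data structure */
typedef struct Acct Acct;
struct Acct {
	vlong total;	/* accumulated ticks */
	vlong start;	/* tick count at entry */
};

void
acctenter(Acct *a, vlong now)
{
	a->start = now;
}

void
acctleave(Acct *a, vlong now)
{
	vlong d;

	/* plain vlong arithmetic; no direct calls to run-time support */
	d = now - a->start;
	/* written as A = A+B rather than A += B, as in the change */
	a->total = a->total + d;
}
```

the arithmetic is left entirely to the compiler, so it works whether vlong operations are done inline or through library routines.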