Actually, this can be simplified even further if we implement the 2-bit counter ourselves using boolean logic:
std::vector<int64_t> input;
// The bits of a and b implement 64 parallel 2-bit counters that track,
// for each bit position i, how many inputs have bit i set. Bit i of a is
// bit 0 and bit i of b is bit 1 of the counter for position i.
int64_t a = 0, b = 0;
// After this loop finishes, a holds the number that appears once and
// b the number that appears twice (each counter wraps modulo 4).
for (int64_t n_i : input) {
    b ^= a & n_i;  // carry into bit 1 where bit 0 is already set
    a ^= n_i;      // toggle bit 0
}
I think this is more efficient and more elegant: each input element costs just one AND and two XORs, with no branches.
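To make the snippet easier to try out, here is a minimal sketch that wraps the same loop in a function. The name find_once_and_twice and the std::pair return type are my own choices for illustration, not part of the code above. It also assumes that every value other than the two of interest occurs a multiple of four times, since a 2-bit counter wraps back to zero after four increments.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// Returns {number appearing once, number appearing twice}.
// Assumption: every other value in input occurs a multiple of four times,
// because each per-bit counter wraps modulo 4.
std::pair<std::int64_t, std::int64_t> find_once_and_twice(
    const std::vector<std::int64_t>& input) {
    std::int64_t a = 0;  // bit 0 of each of the 64 counters
    std::int64_t b = 0;  // bit 1 of each of the 64 counters
    for (std::int64_t n_i : input) {
        b ^= a & n_i;  // carry into bit 1 where bit 0 is set and n_i has a 1
        a ^= n_i;      // toggle bit 0 wherever n_i has a 1
    }
    return {a, b};
}
```

For instance, calling find_once_and_twice({5, 9, 9, 7, 7, 7, 7}) returns the pair {5, 9}: the four copies of 7 cancel out completely, bit 0 of a records the odd count contributed by 5, and bit 1 of b records the pair of 9s. The order of the input does not matter, because incrementing a counter modulo 4 is commutative.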