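cudaMemcpyAsync enqueues a memory copy between the host (CPU) and the device (GPU) on a CUDA stream and returns control to the host thread without waiting for the copy to complete. For the copy to overlap with kernel execution, it should be issued on a non-default stream, the host buffer should be page-locked (pinned) memory, and the kernel it overlaps with must run in a different stream; with pageable host memory the transfer is staged and may not be fully asynchronous. When these conditions hold, the CPU can continue with other work and the GPU can compute while data is in flight, which can improve performance by hiding memory transfer latency behind useful work.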
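
The overlap described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the kernel `scale`, the `CHECK` macro, the array size, and the choice of four streams are hypothetical and chosen only for the example. The data is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued on its own stream, so that a copy in one stream may overlap with a kernel in another.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;          // total elements (assumed size)
    const int nStreams = 4;         // assumed number of streams
    const int chunk = n / nStreams; // elements per stream
    const size_t chunkBytes = chunk * sizeof(float);

    float *h_data = nullptr, *d_data = nullptr;
    // Pinned host memory, so the copies can proceed asynchronously.
    CHECK(cudaMallocHost(&h_data, n * sizeof(float)));
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    // Issue copy-in, kernel, copy-out per chunk, each on its own stream.
    // Operations within one stream run in order; different streams may overlap.
    for (int s = 0; s < nStreams; ++s) {
        const int offset = s * chunk;
        CHECK(cudaMemcpyAsync(d_data + offset, h_data + offset, chunkBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_data + offset, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_data + offset, d_data + offset, chunkBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    // The host thread was not blocked by the calls above; wait for all work here.
    CHECK(cudaDeviceSynchronize());

    // Every element started at 1.0f and was doubled once.
    bool ok = (h_data[0] == 2.0f) && (h_data[n - 1] == 2.0f);
    printf("%s\n", ok ? "ok" : "mismatch");

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return ok ? 0 : 1;
}
```

Whether the copies and kernels actually overlap depends on the device (for example, on how many copy engines it has) and on the sizes involved, so a profiler timeline is the usual way to confirm the effect rather than assuming it.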