One site used the analogy that it's easier to move large boxes than a bunch of sand. But the problem with that analogy is that I imagine multiple bits of data have to be read one by one regardless. I don't imagine that the read/write head can "move" multiple bits simultaneously.
Yes, they can. That's known as data bus width. A 64-bit CPU can work with 64 bits of data simultaneously. And SSDs are designed to transfer data 4096 bytes at a time from their memory blocks to the computer's faster RAM, where the CPU can work with it more quickly.
File "allocation units" (and their predecessor, sector "clusters") exist in part to reduce the number of entries in the indexes, such as the file allocation table (FAT) or the NTFS master file table (MFT), that point to the locations of each file's data blocks.
A smaller cluster size means a larger index to address the entirety of the partition's data storage space, and more lookups in the index to pull together the pieces of a given file, and potentially more file fragmentation (a particular problem for spinning disc platters). All of these translate to less efficiency.
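To put rough numbers on the index-size point, here is a minimal sketch. It assumes a simplified model where the index holds exactly one entry per cluster (roughly how FAT works; NTFS is more involved), and the 100 GiB partition size and the function name `index_entries` are just illustrative choices of mine.

```python
def index_entries(partition_bytes: int, cluster_bytes: int) -> int:
    """Clusters (and so index entries, in this simplified model) needed to cover a partition."""
    # Ceiling division: a partial cluster at the end still needs an entry.
    return -(-partition_bytes // cluster_bytes)


GIB = 1024 ** 3
PARTITION = 100 * GIB  # hypothetical partition size, for illustration only

for cluster in (512, 4096, 65536):
    entries = index_entries(PARTITION, cluster)
    print(f"{cluster:>6} B clusters -> {entries:,} entries")
```

Going from 4KB clusters down to 512-byte clusters multiplies the number of entries by eight for the same partition.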
A larger cluster size means less fragmentation and fewer index lookups, but also means more wasted space ("slack space"). For example, say you have a 9KB file. With 2KB clusters that file will be split across 5 clusters, with 1KB of the 5th cluster being unused. (It also can't be reused for another file because there are no FAT/MFT index entries pointing to partial clusters, only whole clusters -- that's the whole point of clusters.) In contrast, with 4KB clusters that 9KB file will be split across 3 clusters, so less potential fragmentation but 3KB of the last cluster is wasted instead of 1KB.
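The arithmetic in the 9KB example can be checked with a few lines of code. This is only a sketch of the calculation, not anything the file system itself runs, and the helper name `cluster_usage` is made up for the illustration.

```python
def cluster_usage(file_bytes: int, cluster_bytes: int) -> tuple[int, int]:
    """Return (clusters needed, slack bytes wasted in the last cluster)."""
    # Round up: a file always occupies whole clusters.
    clusters = -(-file_bytes // cluster_bytes)
    slack = clusters * cluster_bytes - file_bytes
    return clusters, slack


KB = 1024

for cluster_kb in (2, 4):
    clusters, slack = cluster_usage(9 * KB, cluster_kb * KB)
    print(f"{cluster_kb}KB clusters: {clusters} clusters, {slack // KB}KB slack")
```

For the 9KB file this gives 5 clusters with 1KB of slack at 2KB, and 3 clusters with 3KB of slack at 4KB, matching the figures above.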
Given the typical size of files on an average system, 4KB clusters were chosen as a practical tradeoff between wasted space and index overhead.
For an analogy, let's say you're sending your buddy to the store for party supplies. It's easier to tell him to pick up three cases of beer than to tell him to get 72 bottles. He doesn't have to count bottles; he just grabs three units from the stack on the store's floor. It's more efficient that way.
For backward compatibility, solid-state drives used the same file systems right from the outset, though the term "cluster" was modernized to "allocation unit". This also turned out to be a fortuitous fit, because the flash memory in SSDs is read and written in fixed-size chunks, typically 4KB or larger, so allocation units smaller than that would not help an SSD anyway.
Incidentally, today's large spinning disk drives (known as "AF" or "Advanced Format") have also switched from 512-byte sectors to 4KB sectors natively, so the same principle applies.
BTW, this is also a good place to mention SSD "alignment". Unlike magnetic media (which can directly overwrite individual sectors), SSDs can only write 4KB blocks of memory at a time, and because they cannot overwrite in place, they must erase a block before writing to it.
You don't want your file system to line up its file allocation units so that one a.u. spans the end of one SSD memory block and the beginning of the next. If you had to change the data in that one a.u., the SSD would have to read two memory blocks, keep the beginning of one block and the end of the other, stitch in the new data, erase both SSD blocks, and then write the two stitched blocks back to the SSD media. That's massively inefficient.
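Here is a small sketch of why the starting offset matters. It assumes the simplified model used above (4KB allocation units sitting on 4KB SSD blocks) and counts how many blocks a single a.u. touches. The function name `blocks_touched` is hypothetical, and the two offsets are just illustrative: sector 63 is the starting position old partitioning tools commonly used, and 1 MiB is a typical aligned start.

```python
def blocks_touched(start_byte: int, length: int, block_bytes: int = 4096) -> int:
    """Count the device blocks overlapped by a write of `length` bytes at `start_byte`."""
    first = start_byte // block_bytes
    last = (start_byte + length - 1) // block_bytes
    return last - first + 1


AU = 4096                 # allocation unit size in bytes
legacy_start = 63 * 512   # partition starting at sector 63 (misaligned)
aligned_start = 1024 ** 2 # partition starting at 1 MiB (aligned)

print(blocks_touched(legacy_start, AU))   # first a.u. straddles two blocks
print(blocks_touched(aligned_start, AU))  # first a.u. sits inside one block
```

With the misaligned start, every allocation unit on the partition straddles two blocks, so every small write costs double the work.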
Fortunately, modern OSes are smart enough to make sure they align their allocation units with the SSD's memory blocks when creating the file system.
Should I, from now on, format my SSDs with the smallest available allocation unit size even if most of the files I intend to put on the drive are movies?
SSDs must read/write data in blocks of 4KB at a time, so there is no point trying to use allocation units smaller than that. You'll just hamper the drive's operation and make it hugely inefficient. The NTFS file system defaults to 4KB allocation units, so leave it alone.