I've now also figured out a solution to my first problem, though I still don't have much of an idea what really causes it. In drw_flush, after writing out the screen buffer, do this:
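    push 0                  ; xoffset = yoffset = 0 (one qword covers both dwords)
    sub rsp, 16             ; 4 more dwords below them: xres, yres, xres_virtual, yres_virtual (don't care)
    mov eax, 16             ; sys_ioctl
    mov edi, [drw_fbfd]     ; framebuffer file descriptor
    mov esi, 0x4606         ; FBIOPAN_DISPLAY
    mov rdx, rsp            ; ptr to structure on stack
    syscall                 ; clobbers rax, rcx, r11
    add rsp, 24             ; drop the 24 bytes again
    ret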
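From what little documentation there is, this ioctl takes a pointer to an fb_var_screeninfo structure, but only pays attention to the x/y offset fields. Those sit at byte offsets 16 and 20, which is why I allocate 24 bytes. One caveat: the full structure is larger (160 bytes in the kernel headers I've looked at), and as far as I can tell the kernel copies the whole thing in and, on success, back out again. So 24 bytes is only "enough" as long as whatever sits above it on the stack is harmless to read and rewrite. With the offsets set to zero, it now works perfectly on my machine.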
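For anyone who would rather not rely on the short buffer, here is a more conservative sketch of the same idea. It reserves room for the whole structure, has the kernel fill it in with FBIOGET_VSCREENINFO (0x4600), zeroes only the two offset fields, and then pans. The label drw_pan_reset is a made-up name, and the 160-byte size and the field offsets 16/20 are assumptions taken from the fb_var_screeninfo layout in linux/fb.h, so check them against your own headers. drw_fbfd is the same variable as in the code above. I have not tested this variant on other hardware.

```nasm
; Hypothetical helper: reset the framebuffer pan offsets to (0, 0).
; Assumes drw_fbfd holds an open framebuffer file descriptor.
drw_pan_reset:
    sub rsp, 160            ; assumed sizeof(struct fb_var_screeninfo)
    mov eax, 16             ; sys_ioctl
    mov edi, [drw_fbfd]
    mov esi, 0x4600         ; FBIOGET_VSCREENINFO: fill in the current settings
    mov rdx, rsp            ; ptr to structure on stack
    syscall
    test rax, rax           ; negative value means -errno
    js .done                ; give up quietly if the query failed
    mov qword [rsp+16], 0   ; xoffset (offset 16) and yoffset (offset 20)
    mov eax, 16             ; sys_ioctl
    mov edi, [drw_fbfd]
    mov esi, 0x4606         ; FBIOPAN_DISPLAY
    mov rdx, rsp
    syscall
.done:
    add rsp, 160            ; drop the structure
    ret
```

Filling the structure first means fields the pan code may also look at (vmode, for instance) hold the driver's real values rather than whatever happened to be on the stack.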