I was looking at a video that TwistedGamingTV posted examining input latency on his Arcade1Up KI machine, and on closer inspection the latency appears substantial. I'm using this particular snippet because we can clearly see both the moment the button is fully pressed and the moment movement first appears on screen:
https://youtu.be/0PWe4_PxYOY
As you can see, from the moment the button is fully pressed, we can count 5 frames of video before any movement appears on screen. Since the video is encoded at 30fps on YouTube and the emulator runs at 60fps, those 5 video frames correspond to 10 frames of "in engine" input latency, which works out to roughly 167ms (10 × 1000/60). For comparison, Nintendo's SNES Classic shows only 5 frames (~83ms) of input latency. It's double!
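To spell out the arithmetic, here's the conversion written out (these are just my own frame counts from the clip above, nothing measured inside the machine itself):

```c
#include <stdio.h>

int main(void) {
    /* My frame counts from the video, counted at YouTube's 30fps. */
    const double video_frames = 5.0;                           /* press-to-movement */
    const double emu_frames   = video_frames * (60.0 / 30.0);  /* = 10 frames at 60fps */
    const double latency_ms   = emu_frames * (1000.0 / 60.0);  /* ~166.7 ms */
    const double snes_ms      = 5.0 * (1000.0 / 60.0);         /* ~83.3 ms on the SNES Classic */

    printf("Arcade1Up KI: %.0f frames ~= %.1f ms\n", emu_frames, latency_ms);
    printf("SNES Classic: 5 frames ~= %.1f ms\n", snes_ms);
    return 0;
}
```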

I'm curious whether there's anything you guys could do to reduce this. Generally it comes down to a variety of factors, but the biggest culprit is usually the VSync method combined with where in the loop the control polling happens.

What method of VSync are you using? Is it triple buffering? While that's a good way to keep the framerate smooth, it adds a lot of input latency: completed frames sit in a queue behind the one currently on screen, so the frame you see can reflect input from two or three frames ago rather than the current one. The fact that Nintendo achieves just 5 frames of input latency on the SNES Classic, which is also software emulation running on a Linux kernel, means there *IS*, technically, a way to get lower input latency while keeping VSync and avoiding screen tearing. I've put a sketch of what I mean below.
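For what it's worth, here's a minimal sketch of the loop ordering that usually keeps latency down. This is my own example written against SDL2, not a guess at your actual code, and `emulate_one_frame()` is a hypothetical stub standing in for the emulator core:

```c
#include <SDL2/SDL.h>

/* Hypothetical stand-in for the emulator core stepping one 60Hz frame. */
static void emulate_one_frame(const Uint8 *keys) { (void)keys; }

int main(void) {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window *win = SDL_CreateWindow("example", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, 640, 480, 0);
    /* Plain double buffering with vsync: SDL_RenderPresent blocks until
     * the flip, so at most one finished frame is ever waiting. */
    SDL_Renderer *ren = SDL_CreateRenderer(
        win, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);

    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e))          /* drain window events */
            if (e.type == SDL_QUIT) running = 0;

        /* Poll the controls as LATE as possible, right before emulating,
         * so this frame reflects the freshest possible input. */
        const Uint8 *keys = SDL_GetKeyboardState(NULL);
        emulate_one_frame(keys);

        SDL_RenderClear(ren);
        /* ...blit the emulated frame here... */
        SDL_RenderPresent(ren);            /* waits on vsync, no deep queue */
    }

    SDL_DestroyRenderer(ren);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
```

The two things that matter here are polling the controls immediately before the core runs, and presenting through a path that blocks on the flip instead of queuing finished frames behind it.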
I trust you guys took care to minimize this while developing the emulator, but can I implore you to take a second look and see whether anything in the rendering loop was missed or could be tweaked to bring this closer to 5 frames?
Many thanks!
