Closed Bug 856427 Opened 11 years ago Closed 11 years ago

Add vsync support on Windows

Categories: Core :: Graphics, defect
Platform: x86_64 Windows 7
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla24

People

(Reporter: avih, Assigned: avih)

Attachments

(9 files, 9 obsolete files)

Bas hooked up a Windows API at the widget level (so it's associated with a window and hence a monitor). The API is relatively generic and could probably be supported on other platforms as well, either by utilizing existing APIs or by rolling our own by sampling at buffer swap.

The changes to the refresh driver, which uses this API, are generic and should work on all platforms.

Open issues with this patch (for bigger picture - see bug 689418 comment 24):
- No frame limit on high-refresh-rate monitors.
- Not sure how this patch will behave with concurrent animations in two windows, each on a different monitor, since the refresh driver uses a single static high-res timer.

https://tbpl.mozilla.org/?tree=Try&rev=d6bb269f5d02
This is Windows-specific, but possibly easy to implement/support on other platforms.
Attachment #731657 - Flags: review?
This should be generic enough to work on all/most platforms.
Attachment #731658 - Flags: review?(vladimir)
So, I'm not a huge fan of this patch, because I doubt it will actually hit vsync; it will just get closer than what we have now, at some complexity cost.  Instead, I think what we want is something like this:

- Move our Present() call to a separate thread and call it with present interval 1 (thus synchronizing it with vsync)

- After Present() returns, queue up an event on the main thread to run rAF callbacks -- these will just do layers updates and the like

Note that we could still end up doing more than one layers transaction even while we're waiting; I'm not sure what our options are for those.  I would think we'd apply the updates to the layers and then render everything the next time we actually render our layer tree.
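As a rough sketch of this flow (not the actual patch; the swap chain type, the loop, and PostRefreshTickToMainThread() are all assumptions for illustration):

#include <dxgi.h>

void PostRefreshTickToMainThread();  // assumed: queues a main-thread event

// Hypothetical present-thread loop: block on vsync inside Present(), then
// ask the main thread to run the refresh driver / rAF callbacks.
void PresentThreadLoop(IDXGISwapChain* aSwapChain, volatile bool& aRunning)
{
  while (aRunning) {
    // SyncInterval = 1: wait for the next vertical blank before flipping.
    aSwapChain->Present(1, 0);
    PostRefreshTickToMainThread();
  }
}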
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #3)
> So, I'm not a huge fan of this patch, because I doubt it will actually hit
> vsync; it will just get closer than what we have now, at some complexity
> cost.  Instead, I think what we want is something like this:
> 
> - Move our Present() call to a separate thread and call it with present
> interval 1 (thus synchronizing it with vsync)
> 
> - After Present() returns, queue up an event on the main thread to run rAF
> callbacks -- these will just do layers updates and the like
> 
> Note that we could still end up doing more than one layers transaction even
> while we're waiting; I'm not sure what our options are for those.  I would
> think we'd apply the updates to the layers and then render everything the
> next time we actually render our layer tree.

I agree that that's an easier approach, and in most cases a better one that's preferable in the future. But it would mean a lot of additional changes. For one, you'd increase the input latency for other interactions. That might actually be OK, since the maximum input latency would still be 20 ms in full-frame-rate situations.

There are other downsides, though: if a scene is -barely- presentable at 50 fps, you're now immediately dropping to 25 fps. I think in the end this is the right approach, with some logic to deal with the downsides, but we need to keep the complexity in mind.

For example, practically no FPS gamers I know play with vsync on, for these reasons, and since we want to support games here as well, it's something to think about.
The testcase is cross-browser, but on Opera the tearing section isn't animating.

Also, I've added phase-shift control to the build for testing (defaults to +10% on top of rounding the delay up), using layers.frame_rate.vsync.phasePercentage. The patch is the last one in the try build:

Try: https://tbpl.mozilla.org/?tree=Try&rev=2b6b37641a3b

Win32 build (I _think_ that's the path; it should be available once the build completes, soon): http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ahalachmi@mozilla.com-2b6b37641a3b/try-win32/

It seems that to get a consistent phase WRT the actual vblank signal, we need the delay to land after the vblank signal; but since we have integer-ms timers, and they're not always accurate, we should round the delay up (rather than floor it, as it used to be) and then add some margin (10% by default, configurable).

This seems to produce very consistent vsync animation.
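For illustration, the rounding described above might look like this (a sketch only; names are made up, and I'm assuming the 10% is taken as a fraction of the refresh interval, which matches the ~2ms figure mentioned later):

#include <cstdint>

// Turn a measured time-until-next-vblank into a timer delay in ms:
// round up instead of flooring, then shift a fraction of a refresh
// (10% by default) past the vblank itself.
int64_t DelayAfterVblankMs(int64_t aUsUntilVblank, int64_t aIntervalUs,
                           int aPhasePercent = 10)
{
  int64_t delayUs = ((aUsUntilVblank + 999) / 1000) * 1000;  // ceil to whole ms
  return (delayUs + aIntervalUs * aPhasePercent / 100) / 1000;
}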

One issue I noticed is that on my secondary monitor it's less smooth than on the primary monitor. It might be an issue with the window API hook, or my usage of it, or just some other issue (Chrome and IE are also less smooth on the secondary monitor).
With a focus on games, we'll need to support non-integer-divided framerates (like 57 fps). However, we want it to be smooth at full speed. That said, some people don't want to put up with tearing, so I should think this would need to be toggleable?

IIRC, we want to use something like: http://www.opengl.org/registry/specs/EXT/wgl_swap_control_tear.txt

I believe we want to vsync properly at 60fps for 60Hz, but if we take too long, we'd prefer to risk tearing and start hitting 59/58/57fps.

If we can estimate well when the next vblank is, we should be able to tell whether we're going to land inside our 16.67ms window, and vsync; otherwise, we'd rather do a non-vsync'd swap instead (see the sketch below).
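A sketch of that decision, assuming we have an estimate of the next vblank and of how long rendering will take (all names hypothetical):

// Return true if we expect to finish inside the current refresh window, in
// which case a vsync'd swap is safe; otherwise prefer an immediate
// (tearing) swap so the rate degrades to 59/58/57fps instead of halving.
bool ShouldVsyncThisFrame(double aNowMs, double aNextVblankMs,
                          double aExpectedRenderMs)
{
  return aNowMs + aExpectedRenderMs < aNextVblankMs;
}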

To expand upon a tangent, we will need to do all this properly even after a user moves a window from one monitor to another. (Probably just use the info from the last 'owner' monitor when the window straddles screens)
The current refresh driver code will end up dividing down to the nearest multiple of 60fps that we can actually hit.  It's effectively doing forced vsync.

I do agree that we should have a toggle for people who want totally uncapped frame rates, but let's get the smooth/no-tearing path done first.  Once we have that, completely uncapping is trivial (and we can be smart about it -- e.g. uncap if we drop below 60fps, as Jeff says).

Avi, does the suggestion I made above in comment #3 make sense?  I'd really rather try to do something like that than the patches that you have here, because the patches are really just guessing.. and we're never going to get accurate timings.  We can probably hit it within 2-3ms (which is enough to actually get correct vsync), but that means 2-3ms sort of wasted per frame.

On a related note, I'm probably going to get a nvidia FCAT setup in the Toronto office (http://www.anandtech.com/show/6862/fcat-the-evolution-of-frame-interval-benchmarking-part-1) so that we can measure all this stuff with precise accuracy, and as a bonus, can capture smooth videos of Firefox :p
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #7)
> because the patches are really just guessing.. and we're never going to get
> accurate timings.  We can probably hit it within 2-3ms (which is enough to
> actually get correct vsync), but that means 2-3ms sort of wasted per frame.

Indeed, sort of wasted. It's not as bad as on OS X, since we can still do GC/CC during that time, but we do lose some potential processing time for the rAF callbacks. It's still quite nice though; you should try it ;)

> Avi, does the suggestion I made above in comment #3 make sense?  I'd really
> rather try to do something like that than the patches that you have here,

I also think your suggestion is better, with the exception of Bas' comment about it making it harder to run uncapped, thus hurting input latency. The other side of it is that the working patch is enough to play with and notice some issues (such as the phase one), and changing to your approach requires, I think, a bigger change; I also don't know if there should be some relation to OMTC.


> On a related note, I'm probably going to get a nvidia FCAT setup in the
> Toronto office
> (http://www.anandtech.com/show/6862/fcat-the-evolution-of-frame-interval-
> benchmarking-part-1) so that we can measure all this stuff with precise
> accuracy, and as a bonus, can capture smooth videos of Firefox :p

Cool! :)
Everyone,
I posted some comments about the new VSYNC changes at:
https://bugzilla.mozilla.org/show_bug.cgi?id=689418

The work that is being done appears to be on a good track.  That said, I would strongly advise not to decimate.  e.g. Run 85fps@85Hz, 100fps@100Hz, 120fps@120Hz, 144fps@144Hz.  The exact equality between fps and Hz is _extremely_ important for maximized fluidity.
Currently, Chrome has an impressive amount of VSYNC fluidity for rAF; I'd love to see Firefox at least match (or exceed) the competition.

Performance limits (e.g. too much rAF processing, a slow computer, etc.) shouldn't cause a sudden drop when one CPU cycle too many occurs (e.g. a sudden drop to 30fps @ 60Hz) but should instead degrade gracefully (e.g. 59fps, 58fps, 57fps, etc.) as performance gradually constrains the framerate.

I own two 144 Hz LCD computer monitors -- an ASUS VG278H and a BenQ XL2411T.
To one of you Toronto Mozilla Firefox developers, I offer the loan of one of these if it will help testing of >60Hz support. It supports refresh rates in fractional Hz increments (0.001 Hz) from ~50Hz all the way to 144Hz, so this can be a good testbed. Alternatively, the ASUS VG248QE 144 Hz monitor is available from Canada Computers for only $279 Canadian; it has a 4.5-star rating on Amazon.com and a 5-egg rating on NewEgg USA.

I've given early access to a bleeding-edge precision motion testing benchmark site to both Avi and Vladimir (it's explained in the posts of https://bugzilla.mozilla.org/show_bug.cgi?id=689418).
If others need early beta-tester access to the Blur Busters Motion Tests, contact me at mark@blurbusters.com -- this is going to launch later this spring.

As a reference, I've done hundreds of hours of testing (see bug 689418 for info).
Tested in 1Hz increments:
Chrome -- repaints at fps=Hz up to 240 Hz, then hard-framelimits to 250fps
IE10 -- repaints at fps=Hz up to 105 Hz, then halves framerates (e.g. 55fps@110Hz)
P.S. Full-screen repaints at 1920x1080, 120 times per second, use less than 5% of CPU on my system, so it's quite easily doable on many modern graphics cards. The graphics cards capable of outputting 120 Hz are all highly GPU-efficient.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #7)
> The current refresh driver code will end up dividing down to the nearest
> multiple of 60fps that we can actually hit.  It's effectively doing forced
> vsync.
> 
> I do agree that we should have a toggle for people who want totally uncapped
> frame rates, but let's get the smooth/no-tearing path done first.  Once we
> have that, completely uncapping is trivial (and we can be smart about it --
> e.g. uncap if we drop below 60fps, as Jeff says).
> 
> Avi, does the suggestion I made above in comment #3 make sense?  I'd really
> rather try to do something like that than the patches that you have here,
> because the patches are really just guessing.. and we're never going to get
> accurate timings.  We can probably hit it within 2-3ms (which is enough to
> actually get correct vsync), but that means 2-3ms sort of wasted per frame.
> 
> On a related note, I'm probably going to get a nvidia FCAT setup in the
> Toronto office
> (http://www.anandtech.com/show/6862/fcat-the-evolution-of-frame-interval-
> benchmarking-part-1) so that we can measure all this stuff with precise
> accuracy, and as a bonus, can capture smooth videos of Firefox :p

On Windows I'm not too eager to move the Present call to a different thread at this point, especially since that introduces a certain amount of indeterminism: sometimes the D3D command buffer will then flush on the main thread and sometimes on this alternate thread. It also requires us to deal with a present call sometimes not being executed in time for the first frame, causing a 'flicker' on first draw. We're of course facing this problem with OMTC as well, but there we've still got some time to address it :-).

What we -can- do is have a thread that sits and waits on IDXGIOutput::WaitForVBlank (http://msdn.microsoft.com/en-us/library/windows/desktop/bb174559%28v=vs.85%29.aspx) and fires the event from there. Interestingly enough, that has the added advantage of working with software rendering. This is an alternate approach I suggested to Avi. It's a little less flexible than the approach in the patches, though it might serve us well.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #7)
> Avi, does the suggestion I made above in comment #3 make sense?  I'd really
> rather try to do something like that than the patches that you have here,
> because the patches are really just guessing.. and we're never going to get
> accurate timings.  We can probably hit it within 2-3ms (which is enough to
> actually get correct vsync), but that means 2-3ms sort of wasted per frame.

Yep, 2-3ms should be easy, once the rendering chain is good.
I have found different browsers have different sensitivity to these delays.
For animations in rAF (which I find is a good test case of the whole HTML5 rendering chain, from JavaScript all the way to the display, including the browser's own rendering engine), Chrome is able to hit correct VSYNC even if I insert an artificial delay (an intentional busy loop) lasting up to ~80-90% of a refresh cycle (e.g. a 14ms delay at 60Hz); animations still play smoothly even with an extreme delay. Safari (at least the versions I tested) appeared to be more sensitive (only a few ms of delay and it starts to stutter or drop frames). It might have been driver-specific behavior, or the implementation of VSYNC flipping.

I've found Windows XP has fairly inaccurate timers, while Windows Vista, 7 and 8 have far more accurate ones (sub-millisecond on an idle system). Technically, you could also query Direct3D RasterStatus.ScanLine to get a very accurate prediction of when the next blanking interval will be (sub-millisecond accuracy).
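For reference, polling that value on Direct3D 9 might look roughly like this (assuming an already-created IDirect3DDevice9; error handling reduced to a sentinel):

#include <d3d9.h>

// Returns the scanline the raster is currently on, or -1 on failure.
// While in the vertical blank, status.InVBlank is TRUE and ScanLine is 0.
int CurrentScanLine(IDirect3DDevice9* aDevice)
{
  D3DRASTER_STATUS status;
  if (FAILED(aDevice->GetRasterStatus(0 /* swap chain index */, &status))) {
    return -1;
  }
  return status.InVBlank ? 0 : static_cast<int>(status.ScanLine);
}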

Ideally, you want to program it so that on an idle modern i7 system you get very accurate flip timing, right at the sweet spot where you'll hit correct vsync even with nearly a full frame of delay. But that may be challenging at first; I'd aim for half-frame-delay robustness (no stutters until more than 8ms of delay at 60Hz, or more than 4ms at 120Hz). It's when you start running multiple windows, background apps, and multiple tabs that vsync accuracy starts to degrade; running 3 simultaneous high-framerate animation windows can naturally start to make things stutter a little, for example.

Here's a Stack Overflow question I answered about sub-millisecond accuracy tests in web browsers:
http://stackoverflow.com/questions/6233927/microsecond-timing-in-javascript
This was work that I did before High Resolution Time was introduced! (thank goodness)

I used the early High Resolution Time (microseconds) spec, http://www.w3.org/TR/hr-time/, and found that the timing of rAF calls was fairly variable, but on an idle system it settled to 0.1ms accuracy on my quad-core i7 system and stayed within +/- 1ms even for full-screen (1920x1080) animations running at 120 frames per second. There were no stutters even over a period of one minute, provided I didn't do anything drastic (such as open a new webpage in a new window). So such vsync accuracy is certainly doable. On a slow Mac (Core 2 Duo) the accuracy varied more, ~2ms, still good enough for perfect 60fps vsync pageflips.
(In reply to Mark Rejhon from comment #9)
> The work that is being done appears to be on a good track.  That said, I
> would strongly advise not to decimate.  e.g. Run 85fps@85Hz, 100fps@100Hz,
> 120fps@120Hz, 144fps@144Hz.  The exact equality between fps and Hz is
> _extremely_ important for maximized fluidity.

Rest assured, we also prefer not to force decimation on our users. However, sometimes animation at 60Hz is better than frequent flipping between 60Hz and 120Hz, and possibly also better than non-vsynced 85Hz, if that's the fastest we can handle for a specific system/content combination. Still, disabling vsync should definitely be an option, and hopefully we can degrade gracefully from vsynced 120Hz to non-vsynced 85Hz (or other rates).

Another issue to consider is that animating (and especially scrolling, at least today) at 120Hz WILL consume much more CPU and power than the same animation at 60Hz. While this _might_ not be an issue on mains power (for some it still will be, though), it would definitely be an issue on battery.

(In reply to Bas Schouten (:bas.schouten) from comment #12)
> What we -can- do is have a thread that sits and waits on
> http://msdn.microsoft.com/en-us/library/windows/desktop/bb174559%28v=vs.
> 85%29.aspx and fire the event from there. Interestingly enough that will
> have the added advantage of working with software rendering. This is an
> alternate approach I suggested to avi. It's a little less flexible than the
> approach in the patches though but it might serve us well.

Indeed. One aspect we should consider is that we would need different threads (and/or timers) for the different monitors on which we have windows. The current refresh driver implementation uses a single global high-res timer, and with Bas' patch I'm still sort of using it as a single source of vsync input, which will probably result in incorrect behavior when two windows are animating on two different monitors.

We could decide that we only care about one (the main?) window/monitor for waiting on vsync, and trigger the refresh driver globally from it, but this would leave the implementation lacking, with vsync behavior appearing somewhat arbitrary per window.
(In reply to Avi Halachmi (:avih) from comment #14)
> Another issue to consider is that animating (and especially scrolling, at
> least today), at 120hz WILL consume much more CPU resources and power usage
> than the same animation at 60hz. While this _might_ not be an issue when
> using the mains power (for some it still will though), it would definitely
> be an issue when using the battery.

False.

1. Mobile systems do not support 120 Hz, with a few exceptions (such as Toshiba Qosmio gaming laptops) purchased by people who want the full 120 fps.

2. Full-screen repaints at 1920x1080, 120 times per second, use less than 5% of CPU on my system. If you want, I have a Kill-A-Watt power meter, and I can do some benchmarking of 60fps and 120fps for simple browser scrolling.

3. It's quite easily doable on many modern graphics cards. 

4. The graphics cards capable of outputting 120 Hz are all highly GPU-efficient.

> (In reply to Bas Schouten (:bas.schouten) from comment #12)
> > What we -can- do is have a thread that sits and waits on
> > http://msdn.microsoft.com/en-us/library/windows/desktop/bb174559%28v=vs.
> > 85%29.aspx and fire the event from there. Interestingly enough that will
> > have the added advantage of working with software rendering. This is an
> > alternate approach I suggested to avi. It's a little less flexible than the
> > approach in the patches though but it might serve us well.
> 
> Indeed. One aspect we should consider is that we would need different
> threads (and/or timers) for different monitors on which we have windows. The
> current refresh driver implementation is of a single global high-res timer,
> and with bas' patch I'm sort of still using it as a single source of vsync
> input, which will probably result in incorrect behavior when having 2
> animating windows on 2 different monitors.
> 
> We could decide that we only care about one/the-main(?) window/monitor to
> wait for vsync and trigger the refresh driver globally from it, but this
> would leave the implementation lacking and possibly appearing somewhat
> arbitrarily for vsync per window.
[oops, apologies for the unnecessary quote.  Wish I could edit my posts.]

> Indeed. One aspect we should consider is that we would need different
> threads (and/or timers) for different monitors on which we have windows. The
> current refresh driver implementation is of a single global high-res timer,
> and with bas' patch I'm sort of still using it as a single source of vsync
> input, which will probably result in incorrect behavior when having 2
> animating windows on 2 different monitors.

Good point, an area that needs thought.
Chrome does it on a per-window basis; I've written about it in:
https://bugzilla.mozilla.org/show_bug.cgi?id=689418
As part of my hundreds of hours of testing over the last three months, I've done different-refresh multimonitor tests too, which showed:

- Windows that fit on one monitor automatically run at the Hz of that screen.
- Windows that overlap multiple monitors automatically choose one monitor as the master Hz. Animations only look fluid on one monitor. The lower of the two Hz is automatically chosen, so a window that spreads between a monitor set to 60Hz and a monitor set to 85Hz will refresh animations at 60fps.
(In reply to Mark Rejhon from comment #15)
> 1. Mobile systems do not support 120 Hz with a few exceptions

OK.

> 2. Full screen repaints at 1920x1080, 120 times per second, uses less than
> 5% of CPU on my system.  If you want, I have a Kill-A-Watt power meter, and
> I can do some benchmarking of 60fps and 120fps for simple browser scrolling.

It's not just the graphics card. Firefox itself uses CPU resources per paint, depending on the content, composition type, etc. Also, it's not like GPUs don't use power; rather, they're usually even less efficient than modern CPUs WRT power consumption. You could try a power meter with different refresh rates of Firefox, but CPU usage will vary greatly between different types of content.

Also, did you try this newer build with phase control? Any comments on it? :

(In reply to Avi Halachmi (:avih) from comment #5)
> Also, I've added phase-shift control to the build for testing (defaults to
> +10% and then round the delay up), using
> layers.frame_rate.vsync.phasePercentage. Win32 build:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ahalachmi@mozilla.com-2b6b37641a3b/try-win32/
(In reply to Mark Rejhon from comment #13)
> Technically, you could also query Direct3D
> RasterStatus.ScanLine to get a very accurate prediction of when the next
> blanking interval will be (sub-millisecond accuracy).

At least on Direct3D 9, this number is not guaranteed to be accurate - it may well be, depending on the monitor, GPU and drivers, but relying on it as a predictor of when to Present might well cause tearing. Trying to Present *during* VBlank may also cause issues when an OS compositor is present: if Firefox's Present call races with the OS compositor's composition event, this can cause a very irregular frame rate.

Regarding the compositor (on Windows Vista and up: the Desktop Window Manager), it would be *safest* to try to always present right after a composition event (e.g. right after DwmFlush() [1] returns). This does incur a loss in responsiveness of up to 20ms (on a 50Hz laptop screen) compared to the best case, however.

For a bit of background:

In my experiments trying to present in time for the DWM, I used Sleep(2), Sleep(1), Sleep(0) and SwitchToThread() to try to minimize the CPU usage associated with polling for the correct moment - something Firefox hopefully won't have to deal with too much. I was able to come within a few ms of the ideal with the DWM enabled by measuring the OS composition interval and the durations of the aforementioned calls, then presenting 5-6 standard deviations before the predicted composition event.

This got quite complicated, however, and I found that on my laptop composition was often completely irregular. Disabling the DWM and polling for VBlank manually worked better, but even then I had to 'insert' VBlank events (and present) if one did not come within the expected time, as it became clear that my laptop's GPU would often simply fail to set the RasterStatus.InVBlank flag. This worked fairly well, but I wouldn't call it anything other than a hacky workaround (FWIW, I don't expect Firefox to handle the situation on my laptop - it's pretty crazy and will probably become less and less relevant).
(In reply to Avi Halachmi (:avih) from comment #17)
>> 2. Full screen repaints at 1920x1080, 120 times per second, uses less than
>> 5% of CPU on my system. 

Question: Does your browser window paint refreshing also cover scrolling?

I've found simple block-copy scrolling operations use very little extra CPU and GPU on systems that are able to output 120Hz (they tend to have Radeons and GeForces), so if your repaint logic is also connected to scrolling, you want to keep scrolling at the same framerate as the Hz, which looks very good during smooth (momentum) scrolls. Theoretically, the 'revealed' content at the top/bottom of the window during scrolling could be refreshed or filled in at a lower framerate than the scrolling itself; I don't know if Mozilla's architecture decouples the scrolling framerate from the screen repaint framerate.

Ideally, though, you want everything to run at 120fps@120Hz for all subsystems (without requiring users to jump through too many configuration hoops) on a 120Hz monitor, since it really does make a noticeable difference in animations. I realize there are competing pressures from W3C standardization, battery consumption (even though nearly no battery-powered systems run at >60Hz), and so on, but the 120Hz use case is still rare enough that making it run fps=Hz (at least on the primary monitor, for now) is the best rule of thumb.


> modern CPUs WRT power consumption. You could try a power meter with
> different refresh rates of Firefox, but CPU usage will vary greatly between
> different types of content.

True; if animation code that does a lot of operations runs twice as often, that will have an impact. However, scrolling logic (block copies) uses very little CPU/GPU at 60fps vs 120fps on modern GPUs, and the content revealed during scrolling can run at a lower framerate than the block-copy operation of the scroll itself.


> Also, did you try this newer build with phase control? any comments on it? :

I'll try it this weekend. I'm away from Toronto this week, so I'm away from my high-refresh-rate equipment. I'll definitely play with the phase-shift control, see how it behaves, and post my results. Also, I've emailed you an invite to my Blur Busters Motion Tests, so you can test as well.
(In reply to Emanuel Hoogeveen from comment #18)
> In my experiments trying to present in time for the DWM I used Sleep(2),
> Sleep(1), Sleep(0) and SwitchToThread() to try and minimize CPU usage
> associated with polling for the correct moment

Is Firefox able to take advantage of a module that attempts to keep accurate track of VSYNC based on all available information?
-- Historical VSYNC timing information (the research you did), including accounting for missed VBlanks ("snap-to" the nearest correct value, based on historical values)
-- Any other VSYNC related information (discovered today and tomorrow)
-- Queries of RasterStatus.ScanLine to improve accuracy (if applicable)
-- Cross platform (different VSYNC monitoring methodologies for different platforms)
-- Detecting what the perfect 'phase' for a page flip is (where possible/supported)

This module would then predict when the next VSYNC will occur (on modern systems, possibly highly accurately). That would theoretically allow you to do an accurate Sleep followed by an immediate page flip; or the module could provide callbacks signaling that a page flip is now needed. It would be inaccurate on some systems (within a millisecond or a few) while highly accurate on others (less than ~0.1ms on idle i7 systems with GeForce/Radeon GPUs).
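A toy version of the snap-to prediction described above (assuming timestamps from a high-resolution monotonic clock; drift in the measured interval is ignored):

#include <cmath>

// Predict the next vblank from the last observed one, skipping over any
// vblanks we failed to observe ("snap-to" the nearest future multiple).
double PredictNextVblankMs(double aLastVblankMs, double aIntervalMs,
                           double aNowMs)
{
  double periods = std::ceil((aNowMs - aLastVblankMs) / aIntervalMs);
  if (periods < 1.0) {
    periods = 1.0;  // still inside the current interval
  }
  return aLastVblankMs + periods * aIntervalMs;
}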
Avi,

The April 1st nightly build is no April Fools' joke -- it's the first version of Mozilla that actually passes 120fps Blur Busters tests. I didn't even have to adjust the phasing (at this time). There are still bugs (obviously), such as a freeze that occurs when I try to resize the window, and others (e.g. when trying to change settings), but VSYNC is working much better beginning with the April 1st Mozilla nightly build.

I see some framedrops, but only once every few seconds.  Nothing that further debugging/tweaking in the future couldn't fix; now that I hear Mozilla has two 120Hz monitors in-house for testing...

Tests of "rare" test cases:
-- GOOD: Mozilla correctly changes the repaint rate (and rAF rate) if I change the refresh rate of the monitor a sole Mozilla window is on. So 120fps@120Hz, then 96fps@96Hz, then 60fps@60Hz, then back to 120fps@120Hz; those test cases pass. Not that people switch resolutions often while the browser is running, but I tested this anyway.
-- NEEDS ADJUSTMENT: I notice that visible repaints switch between 120fps and 60fps whenever I move the window between a 120Hz monitor and a 60Hz monitor (but rAF continues operating at 120Hz, meaning every other frame is not displayed when the window overlaps the 120Hz and 60Hz monitors). On Chrome, rAF automatically changes rate.

So far, so good -- keep up the good work.
P.S. I am not sure what Mozilla's planned policy is for requestAnimationFrame running at fps=Hz at refresh rates beyond 60Hz, but I would argue that fps=Hz operation is what should probably become the industry standard. So far, the behavior of rAF running at fps=Hz is consistent across Chrome (to 240fps), Safari (60, 75Hz), and IE10 (to 105Hz), so Mozilla should ideally follow suit.

One could argue that some HTML5 games assume 60fps. However, that's a bad assumption: some laptops throttle down to 50Hz for battery savings, while 120Hz LCDs have been hitting the market over the last few years. HTML5 games can rate-adapt easily by counting the number of rAF calls per second and adjusting animations on the fly, though future JavaScript exposure could help (refresh rate, frame rate control, etc.).

Later this year, Blur Busters will spearhead a small W3C proposal for an industry standard for rAF behavior on >60Hz monitors, but if someone gets a head start -- contact me at mark@blurbusters.com.
Blocks: 689418
(In reply to Bas Schouten (:bas.schouten) from comment #12)
>  We're of course facing this
> problem with OMTC as well, but there we've got some time to address it still
> :-).

Hopefully not much time :-). We really, really need OMTC; in fact I think it's higher priority than vsync. So I think vsync should build on OMTC.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #24)
> (In reply to Bas Schouten (:bas.schouten) from comment #12)
> >  We're of course facing this
> > problem with OMTC as well, but there we've got some time to address it still
> > :-).
> 
> Hopefully not much time :-). We really, really need OMTC; in fact I think
> it's higher priority than vsync. So I think vsync should build on OMTC.

I agree, hopefully not much time ;-). But even a first implementation of OMTC would not block on vsync; doing that is a more complicated story with more consequences. I'm also not sure OMTC should come first here: on the 50Hz machines I've tested on, at least, vsync is currently of bigger consequence to the smoothness of animations than OMTC.

Having thought about this, I still think the timing approach of the current method is better. I don't see why it needs to be significantly less accurate, and it has several advantages, at least on Windows:
- It more easily allows a more flexible approach - for example, sending the refresh event with a certain phase shift from the actual vblank if we wanted to.
- It seems like it might allow a more consistent approach between the VBlank on/off code paths.
- It makes it easier to get the data that's correct for the window (rather than having a thread per monitor, then figuring out the windows associated with that monitor, etc.).
I don't actually think that the timing approach is much better, though maybe it helps a little in some cases. I set up basically an FCAT setup here, given that we already had the Eideticker recording infrastructure. I added some code to the D3D layers backend to draw a color strip on the left side of every frame, so that we can distinguish frames like FCAT does.

Also, key -- on Windows 7 you need to disable Aero (go back to Windows Basic). Having Aero/DWM enabled forces vsync on for display presentation, so we can't actually get a good idea of what Firefox is rendering with it on.

Here are two recordings of Firefox, one of the current nightly and one of a nightly with the patches from here applied. These are compressed using the Lagarith codec (install from http://lags.leetcode.net/codec.html), and for sanity you'll want to open them in VirtualDub (http://www.virtualdub.org/) so that you can easily step forward frame by frame. The source was 1920x1080 60Hz; I cropped the right side of the window and resized it down by 2x for size.

Pre-patch: https://www.dropbox.com/s/yi79h5p0r1x02z4/without-patch-compressed.avi
Post-patch: https://www.dropbox.com/s/5v9c0namr90gigz/with-patch-compressed.avi

The frame color order: http://people.mozilla.com/~vladimir/misc/colorbars.html

Each color on the left bar is a different frame that we rendered; if we ever have more than one color there, that's non-vsync'd rendering, and we have tears. We have lots of colors there both pre- and post-patch, and even captured some very visible tears in the tab strip (when the frame switch happened to land there). There are also some skipped frames, where we seem to have rendered a frame that never made it to the screen. It doesn't seem to get better with the patch than without.
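For illustration, the frame-marker trick can be as simple as the following sketch (written against plain D3D9 for self-containedness; the real code lives in the D3D layers backend, and the actual palette is the one in colorbars.html, not this made-up four-color one):

#include <cstdint>
#include <d3d9.h>

// Tint a thin strip on the left edge of the frame with a color derived from
// the frame index, so a captured video can tell rendered frames apart.
static const D3DCOLOR kFrameColors[] = {
  D3DCOLOR_XRGB(255, 255, 255), D3DCOLOR_XRGB(255, 0, 0),
  D3DCOLOR_XRGB(0, 255, 0),     D3DCOLOR_XRGB(0, 0, 255),
};

void DrawFrameMarker(IDirect3DDevice9* aDevice, uint64_t aFrameIndex,
                     LONG aFrameHeight)
{
  D3DRECT strip = { 0, 0, 16, aFrameHeight };  // 16px wide, full height
  aDevice->Clear(1, &strip, D3DCLEAR_TARGET,
                 kFrameColors[aFrameIndex % 4], 1.0f, 0);
}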
P.S. Vladimir, I'm creating a new Blur Busters Motion Test benchmark that measures frametimes between rAF calls with microsecond precision (in browsers that support performance.now() high-resolution timer reads). This might become an additional useful tool.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #26)
> I don't actually think that the approach with the timing is much better,
> though maybe it helps a little in some cases.  I set up basically a FCAT
> setup here, given that we had the Eideticker recording infrastructure
> already.  I added some code to the D3D layers backend to draw a color strip
> on the left side of every frame, so that we could distinguish frames like
> FCAT does.
> 
> Also, key -- on Windows 7, you need to disable Aero (go back to Windows
> Basic).  Having Aero/DWM enabled forces vsync on for display presentation,
> so we can't actually get a good idea of what Firefox is actually rendering
> with it on.
> 
> Here are two recordings of Firefox, one current nightly, and one nightly
> with the patches from here applied.  These are comprssed using the Lagarith
> codec (install from http://lags.leetcode.net/codec.html) and for sanity
> you'll want to open them in VirtualDub (http://www.virtualdub.org/) so that
> you can easily step forward frame by frame.  Source was 1920x1080 60Hz; I
> cropped the right side of the window and resized it down by 2x for size.
> 
> Pre-patch:
> https://www.dropbox.com/s/yi79h5p0r1x02z4/without-patch-compressed.avi
> Post-patch:
> https://www.dropbox.com/s/5v9c0namr90gigz/with-patch-compressed.avi
> 
> The frame color order:
> http://people.mozilla.com/~vladimir/misc/colorbars.html
> 
> Each color on the left bar is a different frame that we rendered; if we ever
> have more than one color there, that's non-vsync'd-rendering, and we have
> tears.  We have lots of colors there both pre and post patch, and even
> captured some very visible tears in the tab strip (when the frame switch
> happened to have there).  There are also some skipped frames, where we
> seemed to have rendered a frame that never made it to the screen.  It
> doesn't seem to get better with the patch than without.

I only took a real quick look, but those captures seem to have been taken with the DWM disabled, in which case, sadly, the patches don't work.
Well, yes; as I said above, not disabling the DWM makes it very hard to tell whether we're properly hitting vsync. I'll do some recordings with DWM enabled, which will give us an idea of whether we're skipping/doubling frames or not.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #29)
> Well, yes, as I said above not disabling DWM makes it very hard to actually
> tell if we're properly hitting vsync or not.  I'll do some recordings with
> DWM enabled, which will give us an idea if we're skipping/doubling frames or
> not.

If you do, please also apply the phase control (top patch here: https://tbpl.mozilla.org/?tree=Try&rev=2b6b37641a3b ), as I haven't attached it to this bug. It delays the callbacks by about 2ms by default instead of rounding the delay downwards. It appears to help quite a bit.
Aside: Although scrolling looks smooth and VSYNC'd, rAF seems to be broken now: I just tested the April 6 nightly, and rAF stopped synchronizing to VSYNC. So 75Hz, 85Hz, 100Hz and 120Hz look much worse than Chrome and Internet Explorer.
(In reply to Mark Rejhon from comment #31)
> Aside: Although the scrolling looks smooth and VSYNC'd, the rAF seems to be
> broken now: I just tested the April 6 nightly; rAF stopped synchronizing to
> VSYNC.  So 75Hz, 85Hz, 100Hz and 120Hz looks much more awful than Chrome and
> Internet Explorer.

The April 1st build you tested was a try build with some experimental patches. It's not part of the official nightly releases. Official nightlies should behave the same as before (i.e. 60Hz by default, not vsynced), unless you've noticed some other regression in this regard.
Avi, aha -- so the rAF algorithm is still being refined before being accepted.  No worries, I understand these things take time.  

Meanwhile, a comment about Mozilla's test case:
https://bug856427.bugzilla.mozilla.org/attachment.cgi?id=732003
You should have a toggle to enable/disable updating of the HTML outside the Canvas2D. This is because Chrome plays animations perfectly (zero stutters, even for minutes!) if you disable HTML updates outside the canvas. So benchmarking both with and without HTML updates outside the Canvas2D is very important, because many HTML5 games do not update HTML outside the Canvas2D element.
(Note: Perfect zero-stutter animation being idle system with good GPU, only Chrome browser window running, no background CPU/GPU tasks)
Assignee: nobody → avihpit
Status: NEW → ASSIGNED
I had some thoughts about how to handle this in the presence of an OS compositor, so I'll put them here. Let me know if this is creating too much noise.

Firefox is a different beast from what I'm used to when looking at how to handle VSync, because we can decide when to generate a new frame to present (instead of being bound to a consistent internal framerate). It occurred to me that in the presence of a compositor, this changes the problem from one of exact timing to one of a series of judgment calls.

Let's look at this in terms of Windows Vista and up, with the DWM compositor, as that's what I'm familiar with. With a compositor present, it's unwise to wait for the actual VBlank, because trying to present during it can make you race with the compositor - it's better to work *with* the compositor than against it. The DWM API has a useful function, DwmFlush() [1], which returns after the next composition event has happened. You can therefore present what you have ahead of time and call DwmFlush() to ensure the present has been picked up - if it turns out you were too late, the call will just return a composition interval later.

For smooth animation, that means you want to avoid presenting too late. This can be as simple (with a bit of hand waving) as measuring the average composition interval, finding out how much time has passed since the last composition using a high precision timer, then calling DwmFlush() to see if you were too late (which can be inferred from how long it took to return) and increasing the safety margin if you were.

Now, in the case of Firefox, we can (in theory) kick off a paint whenever we like. Once the safety margin is taken into account, all we have to decide is whether to kick off another paint, based on how long paints have been taking recently (again with some hand waving). So if the average paint takes 3ms, our safety margin is 2ms and each composition interval is 20ms (50Hz), you would check how much time has passed since the last compositor event, and if it's less than 15ms you would kick off another paint (if you know the variance in paint times, you can include some measure of that in the safety margin). That way you might be painting excessively (which could be solved with a Sleep() call if paint times are very low), but you'll always have the most recent data available for presenting.

A few details:
1) You could kick off a paint event before calling DwmFlush() if paint times are long enough that you need the time of your safety margin.
2) Though my situation was a bit different, I did work with a safety margin, and it usually needed about 6 standard deviations of the paint time, composition interval, etc. to work consistently.
3) If paint times are very variable, or if you're on a system where composition events are haphazard (like my old laptop), your safety margin might grow to be larger than the composition interval if left unchecked. If this happens you probably can't ensure smooth animation anyway, so it's probably best to cap the safety margin to something reasonable.
4) Say you've kicked off another paint close to the safety margin. For added safety you could call into the VSync thread every 1ms or so to make sure your most recent paint gets presented even if the one you kicked off takes longer than you expected.
5) Of course, none of this will work if the compositor is turned off for whatever reason. Even on Windows 7, a fair amount of people turn it off for various reasons. On Windows 8 I think it's always enabled.

So, basic control flow would look like this:
0a) Measure the refresh interval (a good indication can be obtained from DwmGetCompositionTimingInfo() [2], but this won't be entirely accurate).
0b) Initialize the safety margin to something reasonable.
0c) Use DwmFlush() to make sure you're at the start of a composition interval
1) Kick off a paint and record how long it takes.
2) If more than [safety margin] time is left before the next composition event, go to step 3a), otherwise go to step 3b)
3a) Present and use DwmFlush(). If DwmFlush() takes more than a composition interval to return, increase the safety margin by some constant amount.
3b) Present and use DwmFlush(). If DwmFlush() takes less than a composition interval to return, decrease the safety margin by some constant amount.
4) Kick off a paint and record how long it takes. If another paint could (statistically) be finished before less than [safety margin] time is left, kick off more paints until this is no longer the case. Then go to step 2). (A rough sketch of steps 2-3 follows the references below.)

Let me know what you think. The main difficulty is probably to get a simple but fast structure for gathering statistics (preferably using integers to avoid fp errors accumulating over time), and tweaking the way the safety margin is adjusted to avoid it growing/shrinking too fast/slow and taking care of its various edge cases.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/dd389405.aspx
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/aa969514.aspx
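A compressed sketch of steps 2-3 above (Present() and NowMs() are assumed helpers, not real APIs; the statistics gathering is reduced to a single running margin):

#include <dwmapi.h>   // DwmFlush(); link with dwmapi.lib

void Present();   // assumed: hands the finished frame to the DWM
double NowMs();   // assumed: QPC-backed monotonic clock, in milliseconds

// Present, wait for the composition pass, and adapt the safety margin
// based on whether the present was picked up in time.
void PresentAndAdjustMargin(double aCompositionIntervalMs,
                            double& aSafetyMarginMs)
{
  Present();
  double before = NowMs();
  DwmFlush();                      // returns after the next composition event
  double waited = NowMs() - before;

  if (waited > aCompositionIntervalMs) {
    aSafetyMarginMs += 0.5;        // missed the pickup: present earlier
  } else if (aSafetyMarginMs > 1.0) {
    aSafetyMarginMs -= 0.1;        // made it comfortably: creep back down
  }
}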
(In reply to Emanuel Hoogeveen from comment #35)
> I had some thoughts about how to handle this in the presense of an OS
> For smooth animation, that means you want to avoid presenting too late. This
> can be as simple (with a bit of hand waving) as measuring the average
> composition interval, finding out how much time has passed since the last
> composition using a high precision timer, then calling DwmFlush() to see if
> you were too late (which can be inferred from how long it took to return)
> and increasing the safety margin if you were.

Interesting idea for the algorithm. One concern is race conditions -- if it kept floating the safety margin, it would stutter all the time as it tried to 'hunt' for the ideal point.

-- You'd want outlier rejection algorithms to reject a certain percentage of unexpectedly-timed DwmFlush calls, in case something strange happened and something else blocked (other than DwmFlush).
-- You'd want something that stays stable, with a safety margin so CPU variances don't cause too many early/late flips; staying at least 25%-50% off the ideal point leaves room for system variances. One good stutter test I've found is a VSYNC-synchronized requestAnimationFrame() (once Mozilla officially adopts that) with random intentional delays of less than a frame length (e.g. on a fast GeForce/AMD system, Chrome keeps animations perfectly smooth even with random delays up to 80% of a frame length).
-- You'd want a window in which no incrementing/decrementing occurs, so that it tends to stabilize somewhere with a sufficient safety margin off the ideal point.
(In reply to Emanuel Hoogeveen from comment #35)
> I had some thoughts about how to handle this in the presense of an OS
> compositor, so I'll put them here. Let me know if this is creating too much
> noise.

A quick reply from reading this; I should add the disclaimer that I still need to analyze the proposal in more detail.

On a quick glance the approach has one fundamental problem which may or may not actually be a big issue depending on the workloads we care about:

Unlike things like video games and video players, Firefox does not invalidate the full screen on every frame. Which parts are invalidated and redrawn depends on many factors, so frame timings can vary greatly. For example, at 60Hz you might have a small animation going on smoothly at 60 fps and a big animated GIF at 10 fps; your frame timings might look like:

8 2 2 2 2 8 2 2 2 2 8

etc., where each frame in which you redraw the GIF takes considerably longer.

When there's user interaction the situation becomes more complex: the user might trigger buttons lighting up and something on the page changing color at random points while an animation is going on, and this might happen quite frequently, so you could end up with even more erratically varying timings. It's something we'll have to consider carefully.
So, I've also implemented a proof of concept for the WaitForVBlank-in-a-thread approach.

Tested mainly on a win7-64 laptop with optimus (HD4000/GT650m), i7-3630qm.
- The VGA port (and main laptop display) are via the HD4000.
- HDMI port via the GT650m.

Also tested using Win8-32 on Acer Iconia W510 (Atom z2670, PowerVR SGX 54 IGPU).
- Micro HDMI port for secondary display.


Here are my results/summary of the approaches so far (with current implementation):

GetVBlankInfo (implementation: attachment Part 1 - by bas):

Pros:
+ Always returns info regardless of where the window is (main/VGA/HDMI display).
+ Only overhead is the API calls.

Cons:
- Always returns info for the main display, regardless of where the window is.
- Need to aim the timer 2-3ms after the next vblank to hit the correct phase.
  - Reduces "useable" rAF callback duration by that amount.
- Stddev: 4-5% (of 16.67ms)


WaitForVBlank in a thread, post events to main thread (implementation @bottom):

Pros:
+ When it works, it's reliable and works well.
+ (single GPU) works well also on secondary monitor(s?) - refresh rate adapts.
+ stddev 1-2% (better timing than with GetVBlankInfo).

Cons:
- Fails when window is moved to the HDMI (NVidia) display (and works again when returned).
- Requires D3D10/D2D (so doesn't work on the Atom tablet). Works when forcing D3D10+D2D.
- Current implementation seems to block main thread while the thread is in WaitForVBlank.
  - Implemented a guesstimated PR_Sleep for ~80% of the interval -> doesn't block the main thread.
    - Main thread is still blocked for the rest (20%).
- Complexity/thread (compared to the GetVBlankInfo approach).


My thoughts:
- If we want to choose one approach on Windows, IMO GetVBlankInfo is better.
- Main-thread blocking is probably a sacrifice not worth making. If we can fix it and are OK with the extra complexity:
  - Maybe we could try WaitForVBlank, Fallback to GetVBlankInfo, Fallback to (current) timers (that's how my code behaves now).

---------- Called from a thread: -----------
> // Blocks until the next vblank of the output that currently contains our
> // swap chain's window. Fails if there's no swap chain or a DXGI call fails.
> nsresult LayerManagerD3D10::WaitForVBlank() {
>   RefPtr<IDXGIOutput> o;
>   if (mSwapChain) {
>     HRESULT hr = mSwapChain->GetContainingOutput(byRef(o));
>     if (SUCCEEDED(hr)) {
>       hr = o->WaitForVBlank();
>       if (SUCCEEDED(hr)) {
>         return NS_OK;
>       }
>       printf("WFVB-E-0x%x ", hr);
>     } else {
>       printf("GCO-E-0x%x ", hr);
>     }
>   } else {
>     printf("!mSC ");
>   }
>   return NS_ERROR_FAILURE;
> }
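For context, a hypothetical driver loop for the method above, matching the ~80% PR_Sleep workaround noted in the cons (PR_Sleep, NS_DispatchToMainThread and NS_SUCCEEDED are real NSPR/XPCOM calls; the rest of the names are illustrative):

// Runs on a dedicated vsync thread; aRefreshInterval is one refresh period
// in PR_Sleep ticks.
void VsyncThreadLoop(LayerManagerD3D10* aManager,
                     PRIntervalTime aRefreshInterval,
                     nsIRunnable* aNotifyVsyncRunnable,
                     const bool& aShuttingDown)
{
  while (!aShuttingDown) {
    // Sleep through ~80% of the interval so we don't sit in WaitForVBlank
    // the whole time (which was observed to block the main thread).
    PR_Sleep(aRefreshInterval * 8 / 10);
    if (NS_SUCCEEDED(aManager->WaitForVBlank())) {
      NS_DispatchToMainThread(aNotifyVsyncRunnable);  // fire the tick
    }
  }
}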
Can you attach your patch? Main-thread blocking is definitely not good, and I'm a little surprised it's happening (though it depends on how/where the wait is done; I guess if we end up waiting to present, then I can see it). But that gets better in an OMTC world, which is the world we'll be in very soon...
This is the same patch that was included in the build from comment 5 (but wasn't attached to this bug).
Enough for tests, but not "production quality".

By default the vsync thread runs for 20 seconds, then dies (and vsync falls back to GetVBlankInfo, if available, otherwise to timers).

It can be made to never die by setting layers.frame_rate.vsync.threadDieSeconds to 0.
One unanswered question is eliminating the default cap of 60fps. I do implore Mozilla to consider making 120fps easily accessible to JavaScript.

One big reason is the new Epic Citadel demo, http://www.unrealengine.com/html5/ which works at 1920x1080 at 120 frames per second, but only if I follow the instructions in the FAQ: http://www.unrealengine.com/html5_faq/
_____________

Q: I’m using Nightly Firefox, but when I benchmark, I am locked at 60 frames per second. Is there any way to unlock frame rate?

A: There is an option in Firefox that will unlock the frame rate, allowing Epic Citadel to run faster than 60 frames per second.

To set this option:
- In a new browser tab, type “about:config”
- In the Search box, type “frame”
- Double-click on layout.frame_rate
- Enter 500 (this will be the new max frame rate)
- Click OK and restart the browser for the change to take effect!
_____________

This is one significant reason for Mozilla to seriously consider not limiting layout.frame_rate to 60 by default. Perhaps make it 0, and have that represent "match the refresh rate of the highest-Hz monitor connected". If people find that video games play at higher frame rates in Chrome, they may be more likely to try the competition than to go through a series of manual steps.
So, does a value of -1 default to 60fps? Because my browser has that setting set to -1, not 60 or 0.
(In reply to Franpa_999 from comment #44)
> So, does a value of -1 default to 60fps? because my browser has that setting
> set to -1 and not 60 or 0.

Currently, a value of 0 or lower would translate directly to 60, and it's -1 by default.

If you apply the patches (parts 1-2-3), then 0 or lower would translate to "use vsync if available, else use 60".
I'm not familiar with compiling stuff so I might just wait for it to land in a stable release.
A test-only build (separate from Nightly) would be appreciated, as I'd love to do my motion tests again.  One idea is to have a test-only configuration setting that forces the WaitForVBlank method, versus GetVBlankInfo method.  Or temporary magic cookies in layout.frame_rate (e.g. -1 for WaitForVBlank, and 0 for GetVBlankInfo).

This patch combines parts 2+3 (adds phase delay) with slight cleanup, and also obsoletes part 4 (WIP using WaitForVBlank).

I'm not happy with this approach:
> +      if (sRegularRateTimer && mgr->IsWidgetLayerManager()) {
> +        sRegularRateTimer->setVsyncWidget(mPresContext->GetRootWidget());
> +      }

The vsync API was implemented on nsIWidget, and according to Bas that's probably the best place for it. So this call is used to set the widget through which the API will be accessed.

However, if we have more than one window, it will switch constantly. Currently it seems we always get vblank info for the main monitor, so it doesn't matter; but we'd prefer that each window get info for its own monitor, in which case we would get a higher rate than we want.

Any advice on this would be highly welcome.
Attachment #731658 - Attachment is obsolete: true
Attachment #745489 - Attachment is obsolete: true
Attachment #745490 - Attachment is obsolete: true
Attachment #731658 - Flags: review?(vladimir)
Attachment #748775 - Flags: review?(vladimir)
Yes, that's the best method -- have each window adapt to the refresh rate of its monitor.  That said, there are several use cases to consider:

- All windows entirely on one monitor (clean scenario for separate/shared refresh rates)
- Windows on multiple monitors, fitting within each screen (clean scenario for separate refresh rates)
- Window stretched over multiple screens (no clean scenario: choose compromise such as main monitor Hz, or the lowest Hz, or the highest Hz)

You will just have to choose among various algorithms for the non-clean cases:
- Choose the Hz of main monitor.
- Choose the Hz of the lowest-numbered monitor that a window is on.
- Choose the highest Hz a window is visible on.
- Choose the lowest Hz a window is visible on (Chrome seems to choose this when a window is stretched across screens).
That said, if you continue to use a shared refresh-rate system (reliable synchronization to only one monitor for all windows), consider these use cases:
- 120Hz as main, 60Hz as secondary
- 60Hz as main, 120Hz as secondary

As long as animations (rAF, WebGL, etc) can be allowed to run fully synchronized at 120Hz if Mozilla is maximized on just one monitor (the 120Hz one), it will, in my opinion, pass the VSYNC benchmark of the Blur Busters Motion Tests.
Comment on attachment 731657 [details] [diff] [review]
Part 1 (by bas): add vsync API via widget

Review of attachment 731657 [details] [diff] [review]:
-----------------------------------------------------------------

This all looks fine.  I don't like the idea of calling more QPC/QPF in here, but can't do much about it.
Attachment #731657 - Flags: review? → review+
Comment on attachment 748775 [details] [diff] [review]
Part 2 v4: Use the vsync API from the refresh driver

Review of attachment 748775 [details] [diff] [review]:
-----------------------------------------------------------------

Just needs two changes; looks fine otherwise.

::: layout/base/nsRefreshDriver.cpp
@@ +290,5 @@
> +      // round the delay up (by adding 1, since we later floor) and add a little (10% by default).
> +      // Note that newTarget doesn't change (and is the next vblank) as a reference when we're back.
> +      static const double kDefaultPhaseShiftPercent = 10;
> +      static const double phaseShift = 0.01 *
> +        (Preferences::GetInt("layout.frame_rate.vsync.phasePercentage", kDefaultPhaseShiftPercent) % 100);

Should use a cached pref here for this, or just read it once at startup; not once per frame.
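
For instance, something like this, using the Preferences var-cache mechanism (the exact helper name here is an assumption from memory, so double-check it):

  // Read the pref once and keep it updated via the var cache,
  // instead of calling Preferences::GetInt on every frame.
  static int32_t sPhaseShiftPercent = 10;
  static bool sPhasePrefCached = false;
  if (!sPhasePrefCached) {
    sPhasePrefCached = true;
    Preferences::AddIntVarCache(
      &sPhaseShiftPercent, "layout.frame_rate.vsync.phasePercentage", 10);
  }
  double phaseShift = 0.01 * (sPhaseShiftPercent % 100);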

@@ +532,4 @@
>  {
>    int32_t rate = Preferences::GetInt("layout.frame_rate", -1);
>    if (rate <= 0) {
> +    // outIsDefault indicates to the caller that we should only use the return value as fallback to vsync

Let's just call this variable *preferVsync or something and be explicit about what it does?  Here and in the PreciseRefreshDriverTimer constructor.
https://tbpl.mozilla.org/?tree=Try&rev=1941bc0213d6

I'm getting a lot of Mochitest failures, probably for a similar reason to bug 820167 (turned out to be an early reference to LayerManager), but now it's probably an early |mPresContext->GetRootWidget()|.

It's rebased to a relatively clean revision ( https://tbpl.mozilla.org/?rev=81dd97739fa1 ), so I definitely can't explain the B2G failures, since the part I consider to be the issue is within #ifndef MOZ_WIDGET_GONK

This patch will work equally well if we delay calling GetRootWidget() until things have settled down. Any idea how to check that, other than ugly hardcoded timestamp checks?

I'll push a trybuild with 10s delay before the patch kicks in, just to try to isolate the issue.
I wrote a test application to detect smoothness/amount of stuttering via Emscripten. The application is a _very_ GPU-lightweight emscripten-based WebGL application that uses requestAnimationFrame to drive animation. The result was that on my Windows 7 laptop both latest Firefox stable and Firefox nightly from about a week ago demonstrate stuttering issues.

I posted a demo and an issue about this to emscripten issue tracker, see here for the full story: https://github.com/kripken/emscripten/issues/1170

The extremely peculiar thing is that on all the other platforms I've tested, I can detect stuttering in the frame delta times that the application receives, but on Firefox the delta times are good (under 16.666...ms), yet visually the drawing is still not smooth. Most weird! It makes me think some kind of threading is in place that has an effect here.

I tried with layout.frame_rate=-1 and layout.frame_rate=500, but neither had an effect. On another Windows 8 computer, the animation was smooth, so this does not seem like a consistent issue. On my Mac Mini OSX Firefox, the animation was not smooth either. Can you reproduce?
(In reply to Avi Halachmi (:avih) from comment #53)
> https://tbpl.mozilla.org/?tree=Try&rev=1941bc0213d6
> I'm getting a lot of Mochitest failures, probably for similar reason to bug
> 820167 (turned out to be early reference to LayerManager), but now it's
> probably an early |mPresContext->GetRootWidget()|.

The try servers were unstable during that push. I've repushed an identical try which looks much better:
https://tbpl.mozilla.org/?tree=Try&rev=5d1e584d665e
But it still has a few issues (some of them non-Windows).

This limits the trigger to windows only, and still got the windows issues:
https://tbpl.mozilla.org/?tree=Try&rev=615d1cc313dd

For reference, this keeps all the code but disables the trigger, and it looks clean:
https://tbpl.mozilla.org/?tree=Try&rev=8cb0e4747568

> This patch will work equally well if we'll delay calling GetRootWidget()
> until things are settled down. Any idea how to check that other than ugly
> hardcoded timestamps checks?

This is also windows only, and doesn't trigger before first paint:
https://tbpl.mozilla.org/?tree=Try&rev=d77e1afd8037

It's better, but still consistently fails Cipc, Ripc and Mn, which are intermittent oranges that become permanent with this patch.

FWIW, this is "the trigger" around which all the above patches circulate. I still feel the same about it:
(In reply to Avi Halachmi (:avih) from comment #48)
> I'm not happy with this approach:
> > +      if (sRegularRateTimer && mgr->IsWidgetLayerManager()) {
> > +        sRegularRateTimer->setVsyncWidget(mPresContext->GetRootWidget());
> > +      }
Accessing the vsync API statically instead of via a widget. This bypasses the part which I don't like ("the trigger"). If this proves good, it'll need some rearrangement of the API access code, since the refresh driver is probably not the best place for it.

It always uses the desktop window to get vsync info, which was happening anyway with the original patch, and seems to behave identically as far as vsync timing goes.

https://tbpl.mozilla.org/?tree=Try&rev=9564af44bdde
(In reply to Avi Halachmi (:avih) from comment #56)
> Accessing the vsync API statically instead of via a widget. This bypasses
> the part which I don't like ("the trigger"). If this proves good, it'll need
> some rearrangement of the API access code, since the refresh driver is
> probably not the best place for it.
> 
> It always uses the desktop window to get vsync info, which was happening
> anyway with the original patch, and seems to behave identically as far as
> vsync timing goes.
> 
> https://tbpl.mozilla.org/?tree=Try&rev=9564af44bdde

Try looks good to me. Where should the API reside?
Flags: needinfo?(vladimir)
From looking at that, I think you should just create a new RefreshDriverTimer subclass -- e.g. WindowsDWMVSyncRefreshDriverTimer or similar.

Then you construct it only if the version of Windows is appropriate; otherwise use the previous Precise one.  I like that idea the best so far, because then we can have multiple RefreshDriverTimer impls for different approaches/platforms instead of shoehorning everything in with #ifdefs.  Any shared code can be put into a base class as needed.
Flags: needinfo?(vladimir)
I'm much happier with this patch compared to the original approach of API access via a widget object (and especially obtaining that object), though this code still adds two #ifdefs.

Also, this patch modifies the default refresh rate from 60 to 59.94, since supposedly that's the most common rate out there. Though it's close enough, it's not strictly related to this patch, so I can move it to a new bug if that's preferred.
Attachment #731657 - Attachment is obsolete: true
Attachment #752392 - Flags: review?(vladimir)
Actually, 59.94 only applies to analog NTSC TVs (it was a hack they came up with for NTSC color).  See Wikipedia for an explanation of why 59.94 appeared when they switched from NTSC black-and-white to NTSC color, back in the 1950s.
http://en.wikipedia.org/wiki/NTSC

HDTVs switched back to 60Hz.  Most LCD controllers now refresh closer to 60.000Hz than 59.940Hz, so I suggest undoing this default refresh rate change.
...That said, fractional refresh rates should be configurable, so make the preference a float rather than an integer.  That way, the hard-core 59.94 users can set that, if they must.
Once this is implemented maybe we could gather statistics about common refresh rates via telemetry, if it matters. My two PC monitors and my laptop all use 59.94Hz - they're hardly brand new, but they definitely don't predate HDTVs.
Well, that's because old habits die hard.  Plus, when you connect laptops and computers to TVs (something that happens more and more often) and surf Mozilla on an HDTV, you're typically getting 60Hz.

The analog relic, 59.94, has got to go.  It shouldn't be the default, IMHO.  Remove it.
(In reply to Avi Halachmi (:avih) from comment #59)
> Created attachment 752392 [details] [diff] [review]
> Part 3 v2: move the vsync API to a derived refresh driver class

https://tbpl.mozilla.org/?tree=Try&rev=61a2d36b01a2


(In reply to Mark Rejhon from comment #60)
> Actually, 59.94 ... fractional refresh rates should be configurable ...

http://msdn.microsoft.com/en-us/library/windows/desktop/ee417025%28v=vs.85%29.aspx
"... Often, developers choose 60 Hz as the refresh rate, not knowing that the enumerated refresh rate from the monitor is approximately 60,000 / 1,001 Hz ..."

The most common display these days is a laptop display, and I believe those are still 59.94 most of the time (mine is). Also, other than this one, I believe most refresh rates are not fractional.

Regardless, the default refresh rate value, and whether or not we should allow fractional config, is a different discussion, and this patch's change to 59.94 is certainly not mandatory. Let's keep this bug focused on vsync.
So just to be clear here, is the default refresh rate anything more than a value used during initialization? Since the intent is VSync, I would imagine that the actual refresh rate as reported by the device or compositor is what it actually ends up using, unless the reported refresh rate is somehow invalid (e.g. 0). Is the refresh rate reported by DwmGetCompositionTimingInfo used, or something else? Is it a floating point value?
The default refresh rate is the one which is used when DwmGetCompositionTimingInfo is not available (if applying this patch, and everywhere by default before this patch).

When DwmGetCompositionTimingInfo is available, it's accurate as far as I tested, and provides fractional ms intervals and timing. However, that only happens on Windows Vista or later, and only when DWM is enabled, i.e. not when using classic theme etc.
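
For reference, a minimal sketch of querying the interval through that API (illustrative only; the real code in the patch is more involved):

  #include <windows.h>
  #include <dwmapi.h>  // link against dwmapi.lib

  // Returns false when the DWM is unavailable (XP, classic theme, etc).
  static bool GetDwmRefreshIntervalMs(double* aOutMs)
  {
    DWM_TIMING_INFO timing;
    ZeroMemory(&timing, sizeof(timing));
    timing.cbSize = sizeof(timing);
    // A null HWND queries timing for the desktop composition as a whole.
    if (FAILED(DwmGetCompositionTimingInfo(nullptr, &timing)) ||
        timing.rateRefresh.uiNumerator == 0) {
      return false;
    }
    // rateRefresh is a ratio, e.g. 60000/1001 for 59.94Hz displays.
    *aOutMs = 1000.0 * timing.rateRefresh.uiDenominator /
              timing.rateRefresh.uiNumerator;
    return true;
  }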
Right, that makes sense. It seems you can also get the refresh rate from DXGI through IDXGISwapChain::GetDesc (in the BufferDesc member of the returned DXGI_SWAP_CHAIN_DESC object). I don't know whether that works without the DWM active... I would hope so, but the documentation doesn't say. If it works, that would certainly be a better fallback for 75Hz monitors and the like.
Comment on attachment 752392 [details] [diff] [review]
Part 3 v2: move the vsync API to a derived refresh driver class

Review of attachment 752392 [details] [diff] [review]:
-----------------------------------------------------------------

Please don't put up patches for review that are built on top of other unreviewed patches!  Flatten everything together into one patch that could be ready for checkin, even if you keep them separate in your queue for further development/fixups.  (e.g. generate it via "hg diff -r qbase" or similar)

Overall, this approach is good, but it should be cleaner still, see below.

::: layout/base/nsRefreshDriver.cpp
@@ +269,5 @@
>      TimeDuration rateDuration = mRateDuration;
>  
>      TimeStamp lastVblank;
>      TimeDuration vblankInterval;
> +    if (mPreferHwTiming && NS_OK == GetVBlankInfo(lastVblank, vblankInterval)) {

I would get rid of mPreferHwTiming and GetVBlankInfo on this base class; just refactor it such that there's a virtual method that can be overridden.  ScheduleNextTick is already such a method, so you should be able to implement it in the Windows VSync driver timer class.

@@ +639,4 @@
>      sRegularRateTimer = new PreciseRefreshDriverTimer(rate, isDefault);
> +#else
> +    sRegularRateTimer = new PreciseRefreshDriverTimerWindowsDwmVsync(rate, isDefault);
> +#endif

Only construct the PreciseRefreshDriverTimerWindowsDwmVsync if we can actually get the DWM API that we need -- otherwise construct PreciseRefreshDriverTimer.  For example, something like...

#ifdef XP_WIN
    if (WindowsVSyncRefreshDriverTimer::Usable()) {
      sRegularRateTimer = new WindowsVSyncRefreshDriverTimer();
    }
#endif

    if (!sRegularRateTimer)
      sRegularRateTimer = new PreciseRefreshDriverTimer();

The Usable() call can load the library and check for the symbol, and stash them in static member variables.

This way, you don't need a bunch of ifs in both this and in PreciseRefreshDriver -- we just construct exactly what we're going to use.
Attachment #752392 - Flags: review?(vladimir) → review-
Attached patch V6 - Address comment #68 (obsolete) — Splinter Review
Not sure if using a static DLL pointer and adding another full ScheduleNextTick is the cleanest, but I don't feel too strongly about it, so I fully followed comment #68.
Attachment #748775 - Attachment is obsolete: true
Attachment #752392 - Attachment is obsolete: true
Attachment #748775 - Flags: review?(vladimir)
Attachment #752774 - Flags: review?(vladimir)
Comment on attachment 752774 [details] [diff] [review]
V6 - Address comment #68

Review of attachment 752774 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, so I'll r+ it, but please fix the issues below before checking it in.

::: layout/base/nsRefreshDriver.cpp
@@ +359,5 @@
> +  {
> +    if (!sDwmGetCompositionTimingInfoPtr) {
> +      // The API may not exist on XP.
> +      return NS_ERROR_NO_INTERFACE;
> +    }

No need to test for this -- I'd put in a MOZ_ASSERT(sDwmGetC...) for debug builds, but we'll never create this refresh driver if the API isn't available.

@@ +395,5 @@
> +        NS_OK != GetVBlankInfo(lastVblank, vblankInterval) ||
> +        vblankInterval < kMinSaneInterval ||
> +        vblankInterval > kMaxSaneInterval ||
> +        (aNowTime - lastVblank) > kMaxSaneInterval ||
> +        (aNowTime - lastVblank) < kNegativeMaxSaneInterval) {

Man, I wish we had some kind of "between" primitive so that you could write this as

|| !between(vblankInterval, kMinSaneInterval, kMaxSaneInterval)
|| !between(aNowTime - lastVblank, kNegativeMaxSaneInterval, kMaxSaneInterval)

not sure where you could add one, though.  But either way, for my sanity, make the order of < and > match up -- in the first pair you do < min || > max, in the second pair you > max || < min. :)
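
(For the record, a trivial helper along these lines would do; where it should live in the tree is the open question:)

  // Hypothetical helper, sketched for illustration only.
  template <typename T>
  static bool between(const T& aValue, const T& aMin, const T& aMax)
  {
    return aMin <= aValue && aValue <= aMax;
  }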

@@ +605,5 @@
>  
>  /* static */ void
>  nsRefreshDriver::Shutdown()
>  {
> +  // XXXavih: This cleanup is never reached.

It's called in the Shutdown method in nsLayoutStatics, so if it's not being called, the problem is there, not here.  But having this not be called is OK.

@@ +683,5 @@
> +  if (!sRegularRateTimer) {
> +    bool isDefault = true;
> +    double rate = GetRegularTimerInterval(&isDefault);
> +#ifdef XP_WIN
> +    PreciseRefreshDriverTimerWindowsDwmVsync::LoadDll();

We already have nsRefreshDriver::InitStatics and nsRefreshDriver::Shutdown -- call LoadDll/UnloadDll there.
Attachment #752774 - Flags: review?(vladimir) → review+
Doesn't apply cleanly to the latest inbound, so I'm holding off on the push until inbound updates from m-c.
Attachment #752774 - Attachment is obsolete: true
Attachment #755337 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/fcb0d844e1c4
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Great!  Does this mean 120Hz-capable rAF is now going to hit Nightly?
I'd like to test this with my motion tests.
Just installed firefox-24.0a1.en-US.win32.installer.exe (May 29th trybuild).  Needed to figure out how to download setup executables from the try build links.

The good news is that it mostly passes the Blur Busters Motion Tests.  It stutters more often than Chrome and IE10 do, but it passes the tests.  Good start!  Exciting to finally see it arrive in Nightlies, if what I am reading is correct!

One quirk I've noticed (compared to other browsers): it's much harder to detect stutters.  In Chrome and IE, I can reliably detect whether or not a stutter occurred simply by timing consecutive rAF calls and seeing if the interval exceeds a threshold.  FireFox will now often stutter without the JavaScript being able to detect it.  For example, if a rAF call takes too long and a stutter occurs, the next rAF will still be called "on time", so I'm unable to detect that it stuttered.  rAF is called exactly 120 times a second, but it is stuttering 30-60x per second.

Eventually, it would be lovely to have a W3C standardization of rAF extensions for high-refresh-rate monitors -- more accurate stutter reporting and vsync capability detection -- so I don't have to use hacky useragent detection techniques to decide whether a browser supports VSYNC or not.  When someone wants to work on this with me, please contact me at mark@blurbusters.com so that we can do a collaboration.
(Note: Tests were on a 120Hz monitor.)
UPDATE: The stuttering problem also occurs in Epic Citadel 5.  It runs at 120fps@120Hz on my 120Hz monitor, but it stutters as if the pageflips aren't done accurately enough during heavy-load animations.  The frames are being rendered on time and fast enough, but they aren't being flipped at the correct times.

It is, however, currently working nearly perfectly during light-load animations.
Nightly download from nightly.mozilla.org
Tests done on the TestUFO Motion Tests (Email me for login/password).

Windows 7 -- PASS VSYNC test
Windows 8 -- PASS VSYNC test
Mac -- PASS VSYNC test

Not as reliable as Chrome (yet), some unexplained stutters, but otherwise PASS.
Windows XP is untested (at this time).

I've now modified the useragent detection to green-light FireFox Version 24 as a VALID browser for Blur Busters Motion Tests.  We dislike useragent detection, but it's necessary till W3C standardizes vsync capabilities detection (which I intend to be a part of).
I have downloaded Aurora, and it's still passing tests.  Good.

That said, the stuttering problem is still happening in Epic Citadel 5 (Unreal Engine), as well as in situations where I add more processing (consuming ~5% of CPU time) to each requestAnimationFrame call.  The API still runs at 120fps and Task Manager shows 10%, but there's a lot of stuttering even though it's running at 120fps.

I think there's a very *minor* bug.

Test case:
1. Run Epic Citadel HTML5 on a 120Hz computer monitor.
2. Find high-contrast vertical borders (preferably a dark tree in front of a bright sky).
3. Strafe left/right.

You'll see the stutters; it mysteriously runs less smoothly than it should, as if the page flips are not happening at the right times.
I have (quietly) launched the Blur Busters Motion Tests, finally.
TestUFO.com: Blur Busters Motion Tests - http://www.testufo.com

- Test: Vertical Scrolling Text - http://www.testufo.com/#test=framerates-text
- Test: Moving Photo - http://www.testufo.com/#test=photo
- Test: 30fps vs 60fps vs 120fps - http://www.testufo.com/#test=framerates
Flash applications still stutter a fair bit with latest Firefox 24 nightly and Adobe Flash Player plugin while Aero is enabled. Unsure if this is a Flash issue or not though.

Just download a copy of Bubble Tanks 2 so that you can play it without it being in a tiny box, then play it. It stutters quite a lot. Same for YouTube videos.
That's not Flash.  Those animations are 100% HTML5.
www.testufo.com even works on iPads, if you don't believe me.
This is the world's first VSYNC-accurate motion test website, ever launched on the Internet.
I wasn't talking about your website.
Ooops, my apologies.  Your reply came only 2 minutes right after mine.
No vsync is present when using an Aero Basic theme. I tried googling to see if there was an option I had to enable in the browser but couldn't find anything.
The VSync support added in this bug is based around Windows' compositor, I believe, so it won't work on Aero Basic. Adding VSync support for Windows XP and people running with the compositor disabled will require either using blocking Present calls or building in machinery to poll for VBlank, so that will probably be easier after OMTC is finished. At the moment, the focus is on improving VSync support (where it's available) for complex pages.
Thank you for the clarification. What about the stutter when using Flash applications while Aero is active?
I mean is work still being done to improve it?
I think Flash handles its own painting, but I'm not really sure.
Franpa -- Yes, Avih mentioned that he is working to improve the synchronization.  Chrome's and IE10's synchronization is much stronger than FireFox 24+'s at the moment, but I'm pretty happy to see FireFox finally VSYNC.

P.S. Opera 15 now passes the VSYNC test; even at 120Hz.
P.P.S. VSYNC works in Google Chrome on Windows XP and Aero Basic, as well as on Macs.

NOTE: Eventually we need a W3C discussion/standard for the ability of webpages to turn VSYNC on/off, as well as to configure a preferred requestAnimationFrame rate (from HTML5 games that really want to stay at 60fps, to things like VSYNC OFF competition gaming).  That said, that still won't bypass the window manager's compositing, so VSYNC OFF will still effectively behave as triple buffering.  Which W3C mailing list should I start such discussions in?  I want to participate in them.
Here's a different, useful test:
http://www.testufo.com/#test=animation-time-graph

This graphs the deviations in the timing of requestAnimationFrame() ... It recalibrates the average (center) once every second, to center the graph data.

I notice if you maximize the window and then change the pixel thickness to 1 pixel thick, and switch to 120Hz, the graph scrolling starts to stutter:
http://www.testufo.com/#test=animation-time-graph&scale=5&ppf=1 
But it doesn't show up as increased inaccuracies in the graph, so there appear to be other inaccuracies at play beyond this.
(In reply to Mark Rejhon from comment #93)
> I notice if you maximize the window and then change the pixel thickness to 1
> pixel thick, and switch to 120Hz, the graph scrolling starts to stutter
> But it doesn't show up as increased inaccuracies, so there appears to be
> other inaccuracies at play other than this.

While we trigger iterations relatively accurately (and these are the intervals between rAF callbacks - which your graph displays), by the time it's presented to the OS we've managed to accumulate inaccuracies which are also affected, as you found out, by window size.

Bug 894128 will hopefully track the quest for 100% smoothness on windows.
Depends on: 894128
I just noticed an interesting behavior.  Scrolling is sometimes smoother (but with lots of pauses) during the first 10 seconds after launching FireFox and immediately loading the page from cache.  Afterwards, scrolling starts to "miss a bunch of vsyncs".

Likewise for animations (on 120Hz monitor, maximized window):
http://www.testufo.com/#test=mprt&size=8
This plays smoothly at first (many vsyncs caught), then starts to miss a lot of vsyncs.

Watch how the animation looks over a 20 second time period.
- For the first 5 seconds it plays smoothly in several 1/2-second stretches, with a lot of long freezes
- After 15 seconds, it settles into a cycle of approximately 50% caught VSYNCs, 50% missed VSYNCs, so it starts to 'jitter'.
NOTE: You may have to tune the intensity of the animation (e.g. increase the size= parameter to lower CPU utilization per animation frame).  Do this until it's just at the point where it easily runs at full framerate=Hz animation (after a few seconds).  Then copy the URL.  Quit FireFox 24, restart, immediately paste the URL, and observe the animation.  It stays fluid and then becomes less fluid, as if the timing of the vsync page flips is "hunting" and then settling into a suboptimal moment.
Benchmarks of requestAnimationFrame between FireFox and other browsers.  I retested in the latest FireFox 25 and 26, with no improvement yet.  You will observe that FireFox is very sensitive to stutters long before the framerate degrades, which shows that VSYNCs are being missed.

The strange holes of fluidity in the chart (the ones marked with "!!!") are probably due to consistently missed VSYNCs while always animating via the "next" VSYNC, so that it ends up looking correct.

Is rAF() being scheduled a full frame in advance of VSYNC?  That would be the ideal situation for fluidity.
Mark, the current scheduling works as follows:

- Only layout flushes are scheduled (that's the refresh driver's job), and if a screen update is required, the rest follows as soon as possible (composition, paint, etc), possibly using events (i.e. not directly from the scheduled layout flush).

Also, right now, D3D presentation doesn't use double buffering, but rather copies its buffer to the OS whenever it happens to run.

- vsync scheduling at the refresh driver works as follows:

1. Find the next (future) vsync signal timestamp, and schedule the next tick to that timestamp + 10% of the interval (typically 10% = 1.7ms), to try to always hit the correct side of vsync. That 10% value can be overridden with a hidden pref: layout.frame_rate.vsync.phasePercentage (the value is read modulo 100, so e.g. 324 would result in a 24% aim past the next vsync). The pref requires a restart to take effect.

For non-vsync scheduling, it's quite similar to the above, except for that 10% factor.

Also note that the actual delay which we use for the scheduling is rounded to integer ms resolution. So if the next vsync is in 3ms, and we try to aim +10% = 4.7ms, it will actually set the delay to 5ms.
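
In sketch form, the delay computation is roughly the following (illustrative names of mine, not the actual nsRefreshDriver code):

  #include <cmath>

  // Delay until slightly past the next vsync, rounded up to the
  // integer-ms resolution the timer actually supports.
  double ComputeVsyncTickDelayMs(double aMsSinceLastVblank,
                                 double aVblankIntervalMs,
                                 double aPhaseShift) // 0.10 for the 10% aim
  {
    double msToNextVblank =
      aVblankIntervalMs - std::fmod(aMsSinceLastVblank, aVblankIntervalMs);
    // e.g. next vsync in 3ms, +10% of 16.7ms = 4.7ms, floored + 1 -> 5ms.
    return std::floor(msToNextVblank + aPhaseShift * aVblankIntervalMs) + 1.0;
  }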

Also note that the delay resolution depends on the system timer resolution (set by the win32 API timeBeginPeriod). Firefox doesn't default to high-resolution timer mode, but rather sets high-res for 90s following different triggers. Right now one of those triggers happens to be tab animation (opening or closing a tab).

However, since this is a system-wide resolution, if any other application on your system has set it to high resolution (e.g. some video players during playback), then Firefox will get high-resolution timeouts even if it hasn't asked for them explicitly.
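
For reference, these are the win32 calls in question (sketch only; as noted, the resolution change is system-wide, which is why other apps affect Firefox too):

  #include <windows.h>
  #include <mmsystem.h>  // link against winmm.lib

  static void RunWithHighResTimers()
  {
    timeBeginPeriod(1); // request ~1ms system timer resolution
    // ... timeouts scheduled here get the finer granularity ...
    timeEndPeriod(1);   // must pair with the matching timeBeginPeriod
  }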

2. _after_ scheduling its next tick, the refresh driver starts flushing layouts synchronously. This may be quick, but may also be long, even longer than 16.7ms in some cases. If it took too long (such that the next schedule is already due by the time the current flush cycle finishes), then typically the next iteration would happen rather quickly.

The behavior of 2 above is suboptimal IMO, since I think it would be better to schedule the next tick _after_ finishing the current flush cycle, such that if that cycle took too long, it would be taken into account implicitly in the scheduling calculation. I've posted a patch which does that on bug 880036 (and which also allows adaptive/full vsync behavior), but right now we don't plan to land it, since that bug is still under investigation, and the patch doesn't touch the deeper issue (paint starvation by design) but rather works around it.

However, while it might not be the right fix for the issue of 880036, it might still help with vsync timing, though this side effect wasn't tested.

So for now, you could play with the percentage pref, and see how much of a difference it makes.

I also hope to post a try build which will allow to play with all the timing factors of which I'm aware, with live-prefs (i.e. none would require restart). These would be:
- vsync schedule phase percentage.
- scheduling before/after the flush cycle.
- debug prints to console of missed frames (bug 900785).
- debug prints to console of when Firefox sets high/low resolution timers (however, as mentioned above, these would be meaningless if high-res is set by other apps).
The undocumented layers.frame_rate.vsync.phasePercentage setting still exists?
about:config -> Create -> New Integer -> play with 0 to 100?
Correction.

Your comment #5 says
layers.frame_rate.vsync.phasePercentage

Source code says:
layout.frame_rate.vsync.phasePercentage
The code always wins ;)

It's likely that I copied it from the code for comment 5, so it probably was correct for that experimental build. But since the existing pref for frame_rate was layout, I probably fixed it by the time the code landed.
Also note that 100 % 100 is 0, so you'd want values 0..99.
FireFox-specific requestAnimationFrame tests at various phasePercentage settings.  Notice the VERY INTERESTING pattern.

phasePercentage above about 80 began behaving buggily, reducing FireFox framerates, so the testing was not reliable.  My prediction is that the best fluidity would theoretically occur at a negative phase percentage of about 10%, but for the time being, it appears that the best new default for phasePercentage is 0; it appears to make animations slightly more robust, at least until the rendering pipeline is refined.

This attachment is the PNG version of the FireFox-specific rAF() fluidity tests.
This attachment is the Excel version of the FireFox-specific rAF() fluidity tests.
Fixed version of FireFox_rAF_Benchmarks_phasePercentage.png
Attachment #788592 - Attachment is obsolete: true
Fixed version of FireFox_rAF_Benchmarks_phasePercentage.xlsx
Attachment #788593 - Attachment is obsolete: true
Avi, there's another interesting effect I discovered during tests.  Please try this:
http://www.testufo.com/#test=animation-time-graph&scale=5&measure=rendering&ppf=4&easteregg=1&busywait=0
Wait for the graph to fill up.

Now move the mouse around very fast on the graph; the time spent inside rAF() seems to fall dramatically.  It looks like the JavaScript occasionally takes half the time to execute and sometimes double, causing sustained periods of spikes.  Why does this happen?  Is that the CPU being throttled up/down, priority being raised/lowered, or high/low precision timers being toggled randomly?   This sort of stuff may be wreaking havoc on other things, like scrolling fluidity (increasing average deviations).

Either way, independently of this, I think the best new default for phasePercentage is 0 (for beta 24, aurora 25, and nightly 26), because in the current state of the FireFox rendering pipeline, phasePercentage 0 works noticeably better than phasePercentage 10.
Great data, Mark!

I think that the most interesting column is the 60hz/75% one. It shows that we could do vsync over a 12+ ms rAF delay range [3-14]... if we could aim consistently. I think that this column also shows that on your system, the overhead on the Firefox side is ~4ms per frame (I was hoping for less, TBH...). This 4ms value also roughly matches the 120hz columns' data as far as I can tell.

So, I think that theoretically, if we could aim at the next vsync minus slightly less than the Firefox processing overhead, we'd get the maximum benefit (i.e. would be able to handle the longest rAF range starting from 0 without missing vsyncs).

The firefox overhead is probably the duration starting at the beginning of the refresh driver tick, and ending at the compositor's PostPresent.

We could probably measure this for the previous cycle, and I'd even imagine that if we assume the next cycle will be similar to the previous one, we'd only make a big "error" on load-level transitions. Not sure how this would behave in practice, as I've never looked at the Firefox-overhead distribution over time (i.e. if it varies a lot, this extrapolation system might do more harm than good).

roc, bas: any thoughts on the above? Also, on OS X we don't do any of this trickery, yet it's very smooth. Maybe the fact that we do double buffering there mitigates the timing issues, since it only has to make sure the next frame is buffered rather than having to present it with relatively accurate timing?
Flags: needinfo?(roc)
Flags: needinfo?(bas)
(In reply to Avi Halachmi (:avih) from comment #109)
> ... Also, on OS X ...

I feel there's a gap in the logic that (non-blocking?) double buffering will fix, though I can't quite put my finger on it. Maybe it also needs, like on OS X, to block if we try to push a 2nd frame within a single vsync interval (i.e. when the swap hasn't happened yet on the oldest frame in the buffers) for this to work properly?
In a non-blocking setup, the worst case scenario happens when you get 60fps on average, but the frames arrive at irregular intervals. Then if you have one refresh interval in which two new frames arrive, you present both of them and the first is skipped - but then during the next refresh interval *no* new frames arrive so you end up duplicating a frame. In essence you want to avoid consuming new frames too greedily, so you can either keep a double (ring) buffer (so you'll always be a frame behind unless an entire refresh interval passes without any new frames coming in), or switch to some sort of (possibly threaded) blocking setup to ensure you never push more than one frame per interval. Or some more complicated hybrid to reduce latency.
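
A toy illustration of that non-greedy consumption (not Firefox code; Frame and PresentFrame are stand-ins):

  #include <deque>

  struct Frame { /* frame contents elided */ };
  static std::deque<Frame> sPending;  // producer pushes finished frames here

  static void PresentFrame(const Frame&) { /* hypothetical swap/present */ }

  static void OnRefreshInterval()  // called once per vblank
  {
    if (!sPending.empty()) {
      PresentFrame(sPending.front());
      sPending.pop_front();  // consume at most one frame per interval
    }
    // else: the previously presented frame simply stays on screen.
  }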
I am running a quad-core i7 setup with a GeForce GTX 680 (runs 120fps in HTML5 Unreal).  A double buffer is very beneficial for fluidity, but adds 1 frame of input lag.  Gamers often dislike input lag, so it is a trade-off, especially since Canvas2D is used for HTML5 and WebGL games.  The Windows desktop manager itself already acts as a kind of buffer layer, ideally.

Can you explain why this is happening:
http://www.testufo.com/#test=animation-time-graph&scale=5&measure=rendering&ppf=4&easteregg=1&busywait=0
Wait for the graph to fill up.
Now move the mouse around very fast on the graph; the time spent inside rAF() seems to fall dramatically.  rAF() uses less CPU if I'm moving the mouse?
I see rAF() overhead fluctuate massively (deviation 2-3ms), something that does not happen in Chrome or Opera (deviation <1ms). 

The fluctuations are definitively identified as another source of stutter; sometimes animations run smoother when I'm moving the mouse (or not moving it), because that extra time puts it into the YELLOW/RED territory or vice versa.

So rAF() execution time inconsistency is part of the problem.
Photo attached.  There's another strange behaviour I've discovered within FireFox in addition to the above.

Another good test is using a camera with a 1/2-second exposure pointed at http://www.testufo.com/#test=frameskipping (lower the monitor brightness as much as you can, due to the long exposure).  This is a graph that tells you exactly which frame got skipped.  I see some strange patterns where the exact same frame is being skipped, such as exactly every 28th frame, or exactly every 30th frame.

To understand how I determined this, go read the instructions at the top of http://www.testufo.com/#test=frameskipping ... For FireFox, you want a really, really long exposure (1/2 second or 1 second) so you can capture many refreshes.

To do this type of frame-skipping analysis (this test pattern was not originally created for browser debugging, but proved amazingly relevant):

Configure the test
1. Obtain a camera with adjustable exposure. 
2. Lower brightness of monitor, since you're taking a long exposure of your computer monitor.
3. Choose a long exposure, preferably 1 second.
4. View http://www.testufo.com/#test=frameskipping  (Read the instructions at the top to understand how this test works)
5. Configure it to a larger number of grid squares.
6. Wait for the test to say "VALID".
7. Take a photo.
8. Analyze the photo for frame skipping.
9. If you see no frame skipping, enlarge the window & grid until you do.  If you still get no frame skipping, adjust phasePercentage until you just barely do.
10. You will now observe frame skipping occurring exactly every X frames.

I am curious that during "borderline situations" (where it is nearly perfect but teetering on occasionally missed VSYNCs), FireFox has a tendency to skip exactly every "Xth" frame.  This is interesting behaviour I thought I'd add.
(In reply to Avi Halachmi (:avih) from comment #109)
> I think that the most interesting column is the 60hz/75% one. It shows that
> we could do vsync on 12+ ms rAF delay range [3-14]... if we could aim
> consistently. I think that this column also shows that on your system, the
> overhead at the Firefox side is ~4ms per frame (I was hoping for less,
> TBH...). This 4ms value also roughly matches the 120hz columns data as far
> as I can tell.
> 
> So, I think that theoretically, if we could aim at the next vsync minus
> slightly less than the Firefox processing overhead, we'd get the maximum
> benefit (i.e. would be able to handle the longest rAF range starting from 0
> without missing vsyncs).
> 
> The firefox overhead is probably the duration starting at the beginning of
> the refresh driver tick, and ending at the compositor's PostPresent.
> 
> We could probably measure this for the previous cycle, and I'd even imagine
> that if we assume that the next cycle would be similar to the previous one,
> we'd only make a big "error" on load levels transitions. Not sure how this
> would behave in practice, as I've never looked at the firefox-overhead
> distribution over time (i.e. if it's varying a lot, this extrapolation
> system might do more harm than good).
> 
> roc, bas: any thoughts on the above?

My main thought is that we need to be actually detecting the vsync event and using that to trigger the next rAF instead of using a timer :-).
Flags: needinfo?(roc)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #114)
> My main thought is that we need to be actually detecting the vsync event and
> using that to trigger the next rAF instead of using a timer :-).


Yes.  I think we need to focus on this instead of trying to tweak the current heuristics.  Let's make a plan for how to do this.  I think there are a few prereqs we need to get in place, and we need to figure out if we need to do Present on a different thread and all that stuff.
Assuming you can trigger rAF from another thread and you don't mind triggering it right after VBlank, an easy way to do this with the DWM is to have a thread that simply calls DwmFlush() [1] in a loop and sets an event whenever it returns. Obviously that still only works if the compositor is enabled, but the current system has the same limitation.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/dd389405.aspx
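
Roughly along these lines (a sketch of the idea only, not actual Firefox code; sVsyncEvent is a stand-in assumed to be created elsewhere with CreateEvent):

  #include <windows.h>
  #include <dwmapi.h>  // link against dwmapi.lib

  static HANDLE sVsyncEvent;  // assumed created elsewhere via CreateEvent

  static DWORD WINAPI VsyncLoop(LPVOID)
  {
    for (;;) {
      // Blocks until the DWM completes its next composition pass, which
      // tracks vblank while desktop composition is enabled.
      if (FAILED(DwmFlush())) {
        break;  // composition unavailable; fall back to timer scheduling
      }
      SetEvent(sVsyncEvent);  // wake the main thread to run rAF callbacks
    }
    return 0;
  }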
That... seems too easy!  Having a thread calling DwmFlush in a loop while rAF is active, and injecting a rAF event into the main thread seems really straightforward.  We should take this to a new bug for that.
One possible complication is that the DWM may be completely broken (I've seen this sporadically on my laptop), in which case it can return instantly several times in a row. So it might be good to check that a reasonable minimum time has passed before triggering another event (say more than half a refresh interval for the reported refresh rate).
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #117)
> That... seems too easy!  Having a thread calling DwmFlush in a loop while
> rAF is active, and injecting a rAF event into the main thread seems really
> straightforward.  We should take this to a new bug for that.

Executing rAF immediately at Vblank is a good idea (or execute rAF right after compositing, if you do a compositing step before Vblank).  I think that's what some other browsers do, depending on what the rAF-to-display pipeline is (buffer layers, compositing, etc.).
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #117)
> That... seems too easy!  Having a thread calling DwmFlush in a loop while
> rAF is active, and injecting a rAF event into the main thread seems really
> straightforward.  We should take this to a new bug for that.

/me thinks about calling DwmFlush either right before presentation or right after it.
I've relocated DwmFlush() calls in a past application and it improved framerate synchronization noticeably (one DwmFlush per refresh is enough, I believe).    Though historically, I discovered it was reliable to use the return from a Direct3D Present() as a timing reference for a VSYNC, so I removed all references to DwmFlush in my old software.  (But this info may not be applicable here, if you're not using double buffering and you're manually flipping.)

Emanuel -- is it possible that nothing changed and the system is still in the vertical blanking interval, when DwmFlush instantly returns?  (It takes about a millisecond or so to exit VBlank)

I anxiously await any trybuilds with various flags to try new things (including possibly an enable/disable of DwmFlush).
DwmFlush is supposed to return after the next composition event - so unless it's compositing multiple times in a single VBlank (which I guess it might be, if things are particularly messed up) it shouldn't return until during or after the next VBlank.
We can't call DwmFlush on the main thread. That will kill us.

We can call it on the OMTC thread, but the catch is that there's only one OMTC thread running one compositor per window. So we have to be clever.
Even as we figure out all these settings -- 

Meanwhile, it appears that the recommended default phasePercentage should now become 0, until further notice -- ideally in the public FireFox 24 beta, too.  As it stands, FireFox is still struggling (relative to other browsers) to maintain synchronization with VSYNC, so every little bit helps -- with the current rendering architecture it appears to give an extra millisecond of rAF() headroom before things start to stutter (that's a huge amount of headroom when a 120Hz frame is only 8 milliseconds long).
I think comments on new changes should move to bug 689418.
Mark, the current layout.frame_rate.vsync.phasePercentage pref apparently already supports -99 .. +99 (modulus 100 keeps sign). The actual delay used is still never negative.

I'll post the prints+live prefs try build soon.
Adds by default:
- Prints -/+ to the console on detected missed/duplicate frames (based on bug 900785).
- Prints info on timer resolution changes by Firefox (though meaningless if other apps set it to hi-res). The timeBegin[/end]Period prints are the actual changes.

Unmodified existing prefs - turned live (now don't require restart):

- layout.frame_rate [integer: defaults to -1]:
  -1 : Auto (vsync intervals if available, 60hz otherwise).
   0 : ASAP (iterates as fast as possible).
  >0 : Any manually chosen rate.

- layout.frame_rate.vsync.phasePercentage [integer: defaults to 10]
  - Only effective when vsync is used. Accepts -99 .. 99 as a percentage of the interval to add to/subtract from the delay. Obviously, if the final delay is <0, then 0 is used.


New prefs (all live):

- layout.frame_rate.debug.hires-timer-grace-ms [integer: defaults to 90000]
  - The amount of ms to wait after high-res timers are no longer required, before dropping back to low-res timers.
  - Without this patch, it's hardcoded to 90s. I think we can live with 1s.

- layout.frame_rate.debug.sched-after-tick [bool: defaults to false]
  - If true, will calculate and schedule the next refresh driver tick at the end of the tick, rather than at the beginning.
  - Note: somewhat buggy (when set to true) - Firefox screen updates may stop, e.g. after a JS alert box. Click outside Firefox and then click Firefox again to force a redraw.

- layout.frame_rate.debug.delay-override [integer, defaults to 0]
  - if >0, will override/set the timer delay in ms regardless of the value of layout.frame_rate.
  - Note that while it sounds similar to layout.frame_rate>0, it's not. The reason is that with a manual frame rate (e.g. 60), it tries to stick to an imaginary vsync signal beating at whatever the frequency is (and with vsync, to an actual vsync signal). So it may use a shorter delay if the next (virtual) vblank happens soon. But this delay override ignores all other timing aspects (rate, phasePercentage, etc) and just uses a fixed delay. Might be useful with DwmFlush (see next).

- layout.frame_rate.debug.dwm-flush.bits [integer, defaults to 0]
  - bit-wise value where each bit triggers DwmFlush in a different place:
    1: At the beginning of a refresh driver tick.
    2: At the end of a refresh driver tick.
    4: Before calling Present.
    8: After calling Present.

  Note that blocking the main thread on DwmFlush is a pretty bad idea, especially if it blocks for most of the cycle. It _might_ be useful if we know there's not a lot of time to wait till the Flush - so it's just used for slight alignment. See next pref.

- layout.frame_rate.debug.dwm-flush.min-percentage [integer, defaults to 0]
  - Only call DwmFlush if we're past that percentage of vsync interval (and dwm-flush.bits matches).
  - E.g. if 0 - will always call DwmFlush.
  - E.g. if 80 (and assuming the monitor is 60hz) - will call DwmFlush only if the next vsync is within less than 20% of 16.67ms.

- layout.frame_rate.debug.dwm-flush.print [bool: defaults to false]
  - If set to true, prints to the console the amount of time DwmFlush waited, per flush call.
  - If set to true, prints "." to the console if DwmFlush was skipped due to dwm-flush.min-percentage.

Notes:
- Not all missed frames are detected, especially when using DwmFlush in various places.
- To open a console with Firefox, run Firefox.exe -console
- There's an overhead to reading prefs a few times per cycle, reading timestamps, and printing to the console (especially when it scrolls). Hopefully the overheads are not too big, and by default it doesn't spam the console. Console overhead can be avoided by running without a console.
- Theoretically all combinations are possible, though I haven't tested pref combinations extensively.

- Hopefully, playing with some of these prefs will point us either at low-hanging fruit, or at places where vsync timing matters more.


Of the combinations I tried, the following one is particularly interesting for two reasons: it's much simpler than the current implementation on m-c, and in my tests it performed better (smoother) on the content-related scenarios I tested:

- Fixed delay of just less than the interval (layout.frame_rate.debug.delay-override=15 or 16 for 60hz monitors).
- Together with layout.frame_rate.debug.dwm-flush.bits=4 (DwmFlush() before present).


Knock yourself off.

Try build should be available soon at https://tbpl.mozilla.org/?tree=Try&rev=45edf0a9c037
(In reply to Avi Halachmi (:avih) from comment #127)
>
> Knock yourself off.

Out!

And the try build is ready.
Thanks Avi!
This will take me several hours, so it may take several days before I get to the analysis (since this is free-time stuff).

Meanwhile, there's a related development:
I have done testing of 120fps <VIDEO>, with some surprising success:
http://www.blurbusters.com/faq/real-time-120fps-video/ 
IE10+, Chrome, and FF24+ are able to play real-time 120fps H.264 video files successfully today on 120Hz computer monitors.  There are some microstutter issues sometimes, but pause-then-play seems to fix them. 

As an aside: The rAF() counting technique can actually be a means of detecting a 120Hz refresh rate of web visitors (as shown by www.testufo.com), which means JavaScript code can present realtime 120fps videos only to 120Hz visitors.  Apparently the videos play fine at 120fps in IE10, too.

I just wish the IE10+ 105fps rAF() framecap didn't exist (shame, shame, Microsoft!).  I'm going to post a message on public-web-perf@w3.org soon recommending against framecaps being built into browsers, since Microsoft's is now the only web browser that does not support 120fps@120Hz, after successful 120fps rAF tests with Chrome, Opera 15+, FF24+, and Safari 6+.  If you want to be involved in my attempt to amend a w3c standard towards 120Hz-friendliness, make sure you subscribe to http://lists.w3.org/Archives/Public/public-web-perf/ so you can participate in this discussion.
Adding a comment here from a reference to this thread from comment #52 (by Avi Halachmi) in #894128 -- please make sure that any new vsync implementation in FF is stress tested and validated against http://www.duckware.com/test/chrome/jerky3.html
Flags: needinfo?(bas)