What makes an NTP server a good server, and when is it 'good enough'?

In another thread @stevesommars linked to an interesting paper of his that explored some aspects of long-term monitoring:

One thing I recall noticing is that occasionally some servers gave very incorrect time, off by years.

But I feel like you could, with enough data, probably make a statement like, “The 90th-percentile accuracy of pool servers was within 250 ms” or something. (To be clear, that statistic is a completely made-up example.)
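Given per-server offset samples, a statement like that comes down to a percentile of absolute offsets. Here is a minimal sketch, assuming one signed offset in seconds per line on stdin. The `percentile_abs` name and the sample values are made up for illustration, and nearest-rank is only one of several common percentile definitions:

```shell
# percentile_abs P: read one signed offset (seconds) per line on stdin and
# print the nearest-rank P-th percentile of the absolute values.
# P is assumed to be an integer between 1 and 100.
percentile_abs() {
  awk '{ v = ($1 < 0) ? -$1 : $1; printf "%.9f\n", v }' \
    | sort -n \
    | awk -v p="$1" '
        { vals[NR] = $1 }
        END {
          if (NR == 0) exit 1
          rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
          if (rank < 1) rank = 1
          print vals[rank]
        }'
}

# Usage with made-up offsets (NOT real pool data):
printf '%s\n' 0.012 -0.250 0.003 0.090 -0.045 0.001 0.300 0.020 -0.007 0.150 \
  | percentile_abs 90
```

Taking absolute values first means a server that is 250 ms fast and one that is 250 ms slow count the same, which matches how "within 250 ms" would normally be read.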

I wonder if the pool monitoring data used for graphs could give some answers here?

Dave’s comment about root dispersion as a reasonable benchmark, and my side project to get all the stuff I manage into Ansible, led me to grab this data:

% ansible all -m ansible.builtin.shell -a "sudo chronyc tracking | grep dispersion"
nyc-2g | CHANGED | rc=0 >>
Root dispersion : 0.004316320 seconds
mia-2g | CHANGED | rc=0 >>
Root dispersion : 0.001929239 seconds
las-2g | CHANGED | rc=0 >>
Root dispersion : 0.001757243 seconds
lux-2g | CHANGED | rc=0 >>
Root dispersion : 0.000888808 seconds
mumbai-1g | CHANGED | rc=0 >>
Root dispersion : 0.001670093 seconds
korea-1g | CHANGED | rc=0 >>
Root dispersion : 0.000842893 seconds
singapore-ls | CHANGED | rc=0 >>
Root dispersion : 0.000610336 seconds
capetown | CHANGED | rc=0 >>
Root dispersion : 0.001687264 seconds
malaysia-ptp | CHANGED | rc=0 >>
Root dispersion : 0.000001237 seconds
sao-paulo | CHANGED | rc=0 >>
Root dispersion : 0.001140296 seconds
bangalore-do | CHANGED | rc=0 >>
Root dispersion : 0.002006466 seconds
singapore-do | CHANGED | rc=0 >>
Root dispersion : 0.001366011 seconds
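To compare hosts at a glance, output like the above can be reduced to one line per host, in milliseconds. This is a rough sketch that assumes the exact two-line layout shown here (a `host | CHANGED | rc=0 >>` header followed by a `Root dispersion` line); `dispersion_ms` is a hypothetical helper name, not part of Ansible or chrony:

```shell
# dispersion_ms: read the ad-hoc Ansible output shown above on stdin and
# print "host<TAB>milliseconds", largest root dispersion first.
dispersion_ms() {
  awk '
    / \| rc=[0-9]+ >>/ { host = $1; next }      # header line: remember the host
    /^Root dispersion/ {                        # value line: seconds -> ms
      printf "%s\t%.3f\n", host, $(NF - 1) * 1000
    }' | sort -t "$(printf '\t')" -k2,2nr
}

# Usage (same command as above, piped through the helper):
# ansible all -m ansible.builtin.shell -a "sudo chronyc tracking | grep dispersion" | dispersion_ms
```

Sorting by the numeric column puts the hosts with the largest dispersion at the top, which makes an outlier easy to spot.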

With the exception of malaysia-ptp, these are all stratum 2+ servers getting time over the Internet. That said, they’re all VMs inside data centers where I’ve taken some care to configure them with several good, nearby NTP sources. Someone on a laptop using Wi-Fi behind a cable modem would probably see worse numbers.