NTP synchronization is an important part of all network configurations. Nuage Networks VCS, VNS, and VSS deployments are no exception. MetroÆ will configure Nuage component VMs for NTP sync during installation and upgrade. MetroÆ health checks also verify that the Nuage components have achieved NTP sync before declaring the job complete.
Sometimes, elements outside of the scope of MetroÆ can have negative effects on the ability of Nuage components to achieve NTP sync. Our team experienced this recently in our automated testbed.
In the lab, MetroÆ is thoroughly tested with a suite of tests configured and controlled by a combination of a Jenkins CI/CD server (https://jenkins.io/) and the Robot test framework (http://robotframework.org/). We noticed that, over time, the duration of our KVM-based installation and upgrade tests was gradually increasing. Checking the test logs, we discovered that NTP sync achievement for the VSCs on KVM was taking 10, 15, sometimes 20 minutes! In fact, we were seeing some tests fail because NTP sync was not achieved within the 30-minute timeout period.
As we scratched our heads, puzzled by this lengthy NTP sync behavior, we saw an online conversation between some Nuage deployment experts. They had observed that when they had VSC NTP sync failures, there was a large clock skew between the VSC and the hypervisor hosting it. When we checked, we found that the KVM hypervisors themselves were *not* configured for NTP synchronization. There was, indeed, a growing clock skew. As the time difference grew, NTP sync times went up.
Solution: We modified the KVM hypervisors to sync to the same NTP server that we use for the VSCs. Now sync is achieved on the first try!
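If you want to check whether a hypervisor's clock has drifted from your NTP server, one option is to measure the offset directly. The following is a minimal sketch in Python, not part of MetroÆ: it sends a single SNTP client request and applies the standard NTP offset formula to the reply. The function names are our own illustration, and the sketch assumes the server answers plain NTP over UDP port 123.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def ntp_offset(t1, t2, t3, t4):
    """Standard NTP clock offset, in seconds.

    t1: client send time, t2: server receive time,
    t3: server transmit time, t4: client receive time.
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def parse_ntp_timestamp(packet, start):
    """Convert the 64-bit NTP timestamp at byte offset `start` to Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[start:start + 8])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def query_offset(server, timeout=5.0):
    """Send one SNTP request to `server` and return the measured clock offset."""
    # First byte 0x23: leap indicator 0, version 4, mode 3 (client).
    request = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t4 = time.time()
    t2 = parse_ntp_timestamp(packet, 32)  # server receive timestamp
    t3 = parse_ntp_timestamp(packet, 40)  # server transmit timestamp
    return ntp_offset(t1, t2, t3, t4)
```

Running `query_offset("<your NTP server>")` on the hypervisor returns the offset in seconds. An offset that keeps growing between runs suggests the host is not being synchronized, which is the condition we found on our KVM hypervisors.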