Fixed a major problem with the computers at the office today. We currently have our users' authentication handled by LDAP and their home directories stored on an NFS server. This is a typical, tried-and-tested setup in many places.
Things were working fine for a while, but for the past week or so we have been experiencing mysterious problems. If only one person was logged in, everything was fine. But the moment another person logged into another machine, the NFS clients would hang and the server would show about 75% disk write activity.
Searching around turned up several remedies, none of which worked, and many of them applied to NFS3 only. However, after reading the NFS4 part of the man page, we discovered that the problem lay with the client address. Quoting the man page for nfs on the clientaddr option:
Specifies a single IPv4 address (in dotted-quad form), or a non-link-local IPv6 address, that the NFS client advertises to allow servers to perform NFS version 4 callback requests against files on this mount point. If the server is unable to establish callback connections to clients, performance may degrade, or accesses to files may temporarily hang.
If this option is not specified, the mount(8) command attempts to discover an appropriate callback address automatically. The automatic discovery process is not perfect, however. In the presence of multiple client network interfaces, special routing policies, or atypical network topologies, the exact address to use for callbacks may be nontrivial to determine.
We do not yet know what changed recently. Previously, the NFS4 clients reported the correct client IP address to the server, but when we checked the current machines, they were all reporting an address of 0.0.0.0.
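For anyone wanting to run the same check on their own clients, here is a minimal sketch in Python. It assumes a Linux client, where /proc/mounts lists the clientaddr value among the options of each NFS4 mount. The function name find_bad_clientaddr is made up for this example and is not part of any NFS tool.

```python
def find_bad_clientaddr(mounts_text):
    """Return mount points of NFS mounts that advertise clientaddr=0.0.0.0.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    bad = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not an NFS mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Turn "rw,vers=4.0,clientaddr=1.2.3.4" into a dict of options.
        options = dict(
            opt.split("=", 1) if "=" in opt else (opt, "")
            for opt in fields[3].split(",")
        )
        if options.get("clientaddr") == "0.0.0.0":
            bad.append(fields[1])
    return bad
```

On a client, pass it the contents of /proc/mounts, for example find_bad_clientaddr(open("/proc/mounts").read()). Any mount point it returns is advertising an unusable callback address.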
Adding each machine's correct static IP address to /etc/fstab with the clientaddr option solved the problem. Things are now back to normal.
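For illustration, an entry of this kind might look roughly like the one below. The server name, export path, and IP address are placeholders, not our real values, and the other mount options are only an assumption. The clientaddr value has to be the client's own address, so it differs on every machine.

```
# /etc/fstab on one client (hypothetical names and addresses)
nfs.example.com:/home  /home  nfs4  rw,clientaddr=192.168.1.23  0  0
```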
Update: It turns out that it’s related to this bug report.