Detecting MTU sizing issues with OpenVPN on DD-WRT
Choosing the right DD-WRT firmware version for a router has never been an easy task. After the much anticipated v24 stable release came out (v24 SP1, around 9 years ago), the community continued to push for new features, support for more models and improved performance. But when all you had was 54 Mbps (802.11g), wireless tuning was crucial. If a build came out with the promise of increased wireless speeds, everyone would upgrade immediately. New beta builds started popping up more frequently, but the chance of picking a random one that would brick your device was quite high. The community was testing builds fresh out of the development trunk, so months (or years?) later, and after numerous requests, the Recommended Router Database was born to guide users to the best firmware for their router.
The DD-WRT firmware became so popular that vendors like Linksys and Buffalo created partnerships to launch products tailored for DD-WRT. While flashing and general support got much better with time, getting wireless radios to work properly didn’t. Eventually the Recommended Router Database stopped being updated and the official wiki suffered the same fate. Unless you had time to scour through hundreds of forum posts, experimentation was your only hope. But others felt the same pain and started documenting their results. Steve Jenkins did so for his Cisco E4200, a router that I continue to own and operate for fun projects and experiments.
It has been a while since I last updated a DD-WRT based router. I’ve moved all my gear to Ubiquiti UniFi and I couldn’t be more satisfied with it, but I still rely on an old Netgear WNDRv3700 for a site-to-site VPN between my house and my parents'. This is great for backups and sharing resources, like printers or media storage. The radios had been very stable, and I remember it took me a while to find the settings that made that happen, but after the KRACK vulnerability was published I decided to try a firmware upgrade and see where it would take me.
To avoid issues with older configuration values, I did a reset to factory defaults. By trial and error, last year I was forced to stop at 11-14-2016-r30880: after that revision OpenVPN was upgraded to the 2.4.x branch and stopped working correctly because the OS lacks support for --mtu-disc. I’ve since learned that defining the socket option could probably have helped me work around the issue.
This time I’ve flashed build 11-16-2017-r33772. I was very happy to see it boot! OpenVPN was remarkably up-to-date (2.4.4), the latest at the time of writing, up from 2.3.5 with r30880, and apparently a connection was being established correctly to the OpenVPN server - I was impressed. But not for long…
The VPN connection was very slow and would freeze constantly. The WNDRv3700 router is a capable device, but I can max out its CPU (at 99.9% usage) if a lot of VPN bandwidth is consumed. In this case, regular browsing over VPN shared services would trigger the same behavior. I wondered what could be causing this.
After a lot of digging and configuration changes back and forth, I started suspecting that the MTU (the Maximum Transmission Unit) was incorrectly set. The basic rule is that a larger MTU means less per-packet overhead and better throughput, but any packet bigger than what a link along the path can carry has to be fragmented or gets dropped. Could this be it?
I turned to the ping command on macOS to test whether I could reach a host behind the VPN with packets close to the standard MTU of 1500 bytes. The IP and ICMP headers take 28 of those bytes (the 18 bytes of Ethernet header and trailer are not counted in the MTU), so an unfragmented message of up to 1472 bytes should get through. ping can be configured to send one message (-c 1) with a size of 1470 bytes (-s 1470) and with the Don’t Fragment bit set (-D) so that routers in between don’t attempt to fragment it.
To my surprise, when I tried the following command, it failed:
ping -D -v -s 1470 -c 1 192.168.1.8
ping: sendto: Message too long
That was unexpected! Even after lowering the message size to 1450 bytes, it still didn’t work, so I tried lowering it in bigger increments. In the end, the maximum size I was able to send was 290 bytes, roughly a fifth of what a healthy connection should carry.
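The manual lowering described above could be automated. What follows is only a minimal sketch of that idea, not the exact steps I ran: a binary search for the largest payload that still gets a reply with Don’t Fragment set. The function names (probe, find_max_payload, implied_mtu) are made up for illustration, -D is the macOS/BSD spelling of the flag (Linux ping uses -M do instead), and the 28-byte figure assumes plain IPv4 and ICMP headers without options.

```shell
#!/bin/sh
# probe SIZE HOST
# Succeeds if a single ping with SIZE payload bytes and Don't Fragment set gets a reply.
# -D is the macOS/BSD flag; on Linux, replace it with "-M do".
probe() {
    ping -D -c 1 -s "$1" "$2" >/dev/null 2>&1
}

# find_max_payload PROBE HOST LOW HIGH
# Binary search for the largest size in [LOW, HIGH] that PROBE accepts.
# Assumes that if one size works, every smaller size works too.
# Prints -1 when no size in the range gets through.
find_max_payload() {
    probe_cmd=$1
    host=$2
    lo=$3
    hi=$4
    best=-1
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$probe_cmd" "$mid" "$host"; then
            best=$mid          # mid fits, look for something larger
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))  # mid is too big, look lower
        fi
    done
    echo "$best"
}

# implied_mtu PAYLOAD
# Adds the 20-byte IPv4 header and 8-byte ICMP header back onto the payload.
implied_mtu() {
    echo $(( $1 + 28 ))
}

# Example (hits the network, so it is left commented out):
# max=$(find_max_payload probe 192.168.1.8 0 1472)
# echo "largest payload: $max, implied MTU: $(implied_mtu "$max")"
```

Passing the probe in as a parameter keeps the search itself independent of ping, so it can be tried against a stand-in function first. With the numbers from this post, a search over 0 to 1472 should settle on 290, which implied_mtu turns into 318 bytes.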
I haven’t yet found the root cause of the MTU sizing issue, which forced me to revert to r30880 in the meantime, but I got closer to it.