Proxmox 5x “e1000 driver hang” fix

Problem:

A while back (possibly after an update) I started seeing Proxmox servers at our datacenter go down hard: no response on the network, and only a hard reset would bring them back. This happened up to several times a day, seemingly at random!

After investigating the logs, these lines stood out to me, especially the last one (“Detected Hardware Unit Hang”):

...
Aug  9 22:10:36 vm6 kernel: [613016.212489] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Aug  9 22:10:36 vm6 kernel: [613016.212596] vmbr0: port 1(enp0s31f6) entered disabled state
Aug  9 22:10:40 vm6 kernel: [613020.202972] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug  9 22:10:40 vm6 kernel: [613020.203105] vmbr0: port 1(enp0s31f6) entered blocking state
Aug  9 22:10:40 vm6 kernel: [613020.203183] vmbr0: port 1(enp0s31f6) entered forwarding state
Aug  9 22:10:42 vm6 kernel: [613022.133940] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:0
...
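
To get a feel for how often a host is hitting this, the kernel log can be searched for the two telltale messages shown above. This is only a minimal sketch: it assumes the log is piped in on stdin, and the function name count_e1000e_hangs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count e1000e hang/reset messages in a kernel log read from stdin.
count_e1000e_hangs() {
    # grep -c prints the number of matching lines; it exits 1 when there are none,
    # so "|| true" keeps the function usable under "set -e".
    grep -c -E 'e1000e.*(Detected Hardware Unit Hang|Reset adapter unexpectedly)' || true
}

# Usage on the affected host:
#   journalctl -k | count_e1000e_hangs
#   dmesg | count_e1000e_hangs
```

A count that keeps growing between checks suggests the adapter is still resetting.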

The issue seems to be that, after a kernel/driver update, the interface goes into a perpetual hang state and crashes.

Solution:

The solution I have found to work across multiple different servers is to use “ethtool” to turn off some offload features so the hang does not occur.

Disable the following:

GSO (generic-segmentation-offload)
GRO (generic-receive-offload)
TSO (tcp-segmentation-offload)
TX (tx-checksumming)
RX (rx-checksumming)

ethtool -K <INTERFACE> gso off gro off tso off tx off rx off
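
To confirm the settings took effect, ethtool's lowercase -k option lists the current offload state (uppercase -K changes it). The sketch below filters that listing down to the five features named above. The helper name offloads_still_on is hypothetical, and it assumes the usual "feature-name: on/off" output format.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the names of the five offload features from this post that are still "on".
offloads_still_on() {
    awk -F': ' '
        # Only unindented top-level feature lines match; a value such as "on [fixed]" still counts as on.
        $1 ~ /^(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload)$/ &&
        $2 ~ /^on/ { print $1 }
    '
}

# Usage on the host (prints nothing when all five are off):
#   ethtool -k enp0s31f6 | offloads_still_on
```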

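and also disable PCIe power saving (ASPM, Active State Power Management). Note that this is a kernel boot parameter, not an ethtool option or a shell command, so it has to go on the kernel command line and only takes effect after a reboot: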

pcie_aspm=off
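
On a GRUB-booted Debian/Proxmox host, the usual place for this parameter is the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by update-grub and a reboot. That is an assumption about the setup, since the post does not say which bootloader is in use. Whichever way it is set, the running kernel's command line can be checked afterwards. A small sketch, with has_kernel_param as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line (first argument)
# contains the exact parameter given as the second argument.
has_kernel_param() {
    cmdline=$1
    param=$2
    # Unquoted on purpose: split the command line into whitespace-separated words.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Usage on the host, after changing the boot configuration and rebooting:
#   has_kernel_param "$(cat /proc/cmdline)" pcie_aspm=off && echo "pcie_aspm=off is active"
```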

Create a script to automate this for us

Since this has to be done for the WAN-facing interfaces (not loopback etc.), and again every time such an interface comes up, we can use the scripts in /etc/network/if-up.d/ to do the job 🙂

Now run the following command to create a new script file:

nano /etc/network/if-up.d/hangfix-ifup && chmod +x /etc/network/if-up.d/hangfix-ifup

Insert this into the file:

#!/bin/sh -e

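if [ "$IFACE" = "YOUR-INTERFACE-NAME-HERE" ]; then
	# Turn off the offload features listed above for this interface.
	/sbin/ethtool -K "$IFACE" gso off gro off tso off tx off rx off
	# pcie_aspm=off is a kernel boot parameter; written here it would only
	# set an unused shell variable, so it belongs on the kernel command line.
fi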

exit 0
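
If more than one physical NIC is affected, the single comparison in the script can be swapped for a case statement. The following is a sketch only: needs_hangfix is a hypothetical function name, and eno1 is a placeholder interface name that does not come from the servers described here.

```shell
#!/bin/sh
# Hypothetical variant: decide whether an interface should get the offload fix.
# The interface names below are placeholders; replace them with your own.
needs_hangfix() {
    case "$1" in
        enp0s31f6|eno1) return 0 ;;
        *)              return 1 ;;
    esac
}

# Inside /etc/network/if-up.d/hangfix-ifup this would be used as:
#   if needs_hangfix "$IFACE"; then
#       /sbin/ethtool -K "$IFACE" gso off gro off tso off tx off rx off
#   fi
```

Because ifupdown passes the interface name in the IFACE environment variable, the finished script can also be tried by hand with something like IFACE=enp0s31f6 /etc/network/if-up.d/hangfix-ifup before waiting for the next link event.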

2 Comments

Niels says:

June 23, 2020 at 9:32 AM

Hi,

Thanks for the solution!

Maybe easier is to put the command in the post-up in /etc/network/interfaces.

Like this:
iface enp0s31f6 inet manual
post-up ethtool -K enp0s31f6 tso off gso off

- Niels says:
  
  June 24, 2020 at 11:49 AM
  
  Hi Niels,
  (Great name btw)
  
  Thank you so much for your comment, and I am glad it could help you.
  That is not a bad idea at all, and much simpler, have you tried this with good results?
  Also if the interface re-establishes, etc?
  
  Kind regards
  Niels
