OPNsense Gateway Packet Loss

Has anyone else noticed an oddity where OPNsense shows significant packet loss going through the InvisaGig? I’m not seeing this from other devices, and it seems exclusive to dpinger in OPNsense talking through the InvisaGig.

To be clear, I do NOT notice any packet loss or latency from computers that go through OPNsense → InvisaGig → internet.

I have a monitor IP set up in OPNsense pointing at 208.67.222.222 (I can change this to any other address and see the same issue), and these are my average results:

RTT: 19558.3ms
RTTd: 1403.4ms
Loss: 23.0%

If I ping from my computer going through the same path, I get 23ms to the same IP.
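
For reference, the LAN-side check is nothing fancier than something like this rough Python sketch, which just wraps the system ping against the same monitor IP and parses its summary so the numbers can be lined up against what dpinger reports (the summary format differs slightly between Linux and FreeBSD):

```python
#!/usr/bin/env python3
# Rough sketch: ping the gateway's monitor IP from a LAN client and
# summarize average RTT and loss for comparison with dpinger's numbers.
import re
import subprocess

MONITOR_IP = "208.67.222.222"  # same monitor IP configured on the gateway
COUNT = 20

out = subprocess.run(
    ["ping", "-c", str(COUNT), MONITOR_IP],
    capture_output=True, text=True,
).stdout

# Parse the "x% packet loss" and "min/avg/max" summary lines.
loss = re.search(r"([\d.]+)% packet loss", out)
avg = re.search(r"= [\d.]+/([\d.]+)/", out)

print("loss:", loss.group(1) + "%" if loss else "n/a")
print("avg RTT:", avg.group(1) + " ms" if avg else "n/a")
```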

Normally I wouldn’t care too much about this; however, this is how I detect failover, and it’s the only way to keep the interface up while dpinger appears to be acting up. I’ve fully factory reset my InvisaGig just to make sure it wasn’t something left over from going through the beta chain, but the issue persists.

I would just chalk this up to dpinger in OPNsense, as I’ve seen others report oddities with it; however, every other device I’ve hooked up works without fault, so I figured I’d start here.

I’m running v1.0.12 and don’t recall seeing this issue in the betas. It has been happening for quite a while, but I’m only now starting to dig into it. I’ll be on vacation for the next week, so I won’t be able to investigate until the week after, but I wanted to see if anyone had ideas about what’s going on.

-Josh

Howdy!

Given that your LAN clients cannot replicate the issue, I would agree that dpinger itself might be the problem. Over the past six months or so I have seen other threads from OPNsense users reporting similar behavior, like this one: Dpinger make a mess on latest release

Beyond a possible dpinger bug, it may also be helpful to reference our pfSense Optimal Configuration tutorial, as most of its recommendations can be applied under OPNsense as well.

Specifically, if you are using the default Bridge Mode (IP Passthrough), I would definitely recommend statically assigning the OPNsense WAN MAC address in the IG configuration, which is covered in the tutorial here. This may help if dpinger’s problem has its source at layer 2 (ARP cache, etc.).

The pfSense guide also has a section on how to check whether Flow Control is enabled on your WAN NIC and how to disable it, as it has been known to increase latency in some cases.
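
If you want a quick look at the current flow control state from the OPNsense shell, a rough sketch along these lines may help (Python wrapping sysctl; the exact OID name depends on your NIC driver, for example dev.igb.N.fc on some Intel NICs, so this just scans the whole tree rather than assuming one):

```python
#!/usr/bin/env python3
# Rough sketch: list NIC sysctls that look flow-control related on a
# FreeBSD-based firewall such as OPNsense.
import subprocess

out = subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout

for line in out.splitlines():
    name = line.split(":", 1)[0].strip()
    if name.endswith(".fc") or "flow_control" in name:
        print(line.strip())
```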

The larger issue observed over the last couple of years is that carrier deprioritization algorithms, especially on T-Mobile, are increasingly dropping ICMP and UDP traffic at seemingly random intervals. I am not sure why they do this, but it may be to mitigate disruptive DDoS activity from bad actors. Regardless, it is quite annoying for those who rely on failover monitoring that requires reliable ICMP replies. One suggestion would be to increase the payload size of the ICMP packets that dpinger sends: by default it sends a zero-byte payload, which has historically been more likely to be dropped. More info can be found in the OP and first reply to this thread:
https://www.reddit.com/r/opnsense/comments/10b9nlz/gateway_monitoring_without_icmp/
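
If you want to test that theory before changing anything in dpinger, a rough sketch like the one below may help. It just wraps the system ping (the -s flag sets the ICMP data length on both Linux and FreeBSD) and counts replies at several payload sizes; the monitor IP and sizes are only examples:

```python
#!/usr/bin/env python3
# Illustrative check of the payload-size theory: send a batch of pings at
# several ICMP data lengths and count the replies. If tiny/zero payloads are
# being dropped upstream while larger ones survive, loss should fall off as
# the size grows.
import re
import subprocess

MONITOR_IP = "208.67.222.222"
SIZES = [0, 1, 8, 32, 56]   # bytes of ICMP data; 56 is the classic default
COUNT = 10

for size in SIZES:
    out = subprocess.run(
        ["ping", "-c", str(COUNT), "-s", str(size), MONITOR_IP],
        capture_output=True, text=True,
    ).stdout
    # Matches "10 received" (Linux) and "10 packets received" (FreeBSD).
    m = re.search(r"(\d+)(?: packets)? received", out)
    received = int(m.group(1)) if m else 0
    print(f"payload {size:>3} bytes: {received}/{COUNT} replies")
```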

I hope this information is helpful to you :slight_smile:


I stumbled across this solution for dpinger on pfSense with T-Mobile a few months ago. The symptom was that pings from dpinger were failing, but pings from the pfSense command line to the same address were fine. Changing the dpinger payload size to 56 for my gateways has helped a lot.


Fantastic! Thank you for confirming this worked for you and providing the specific value you settled on. Hopefully it will help the OP and others as well :slight_smile:

Just got back from vacation last night, so I had time to tinker this morning. I found that demyers’ suggestion of setting the payload size to 56 did in fact resolve the issue (now I’ll play around with finding the lowest working payload size). I do have T-Mobile, so this is apparently related to something they’re doing on their end. The default payload size in OPNsense is 1, not 0 as in pfSense, but apparently 1 wasn’t a large enough payload to keep T-Mobile from mucking things up. This must be something they implemented in my area within the past two months, since it’s a newer issue.
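
If it’s useful to anyone else, this is roughly how I plan to hunt for the lowest working payload size. It’s just a quick Python sketch wrapping the system ping (sizes and counts are arbitrary), and since deprioritization-related drops can be intermittent, it’s worth running a few times:

```python
#!/usr/bin/env python3
# Rough sketch for the "lowest working payload size" experiment: walk the
# ICMP data length upward and stop at the first size that loses nothing
# over a small batch.
import re
import subprocess

MONITOR_IP = "208.67.222.222"
COUNT = 20

def lost_at(size: int) -> int:
    """Return how many of COUNT pings went unanswered at this payload size."""
    out = subprocess.run(
        ["ping", "-c", str(COUNT), "-s", str(size), MONITOR_IP],
        capture_output=True, text=True,
    ).stdout
    m = re.search(r"(\d+)(?: packets)? received", out)
    return COUNT - (int(m.group(1)) if m else 0)

for size in (1, 4, 8, 16, 32, 56):
    lost = lost_at(size)
    print(f"payload {size:>2}: {lost}/{COUNT} lost")
    if lost == 0:
        print(f"first size with zero loss this run: {size}")
        break
```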

Thanks for the advice all!
