Connection to UDM Pro (Router) Drops But Direct Laptop Connection Still Works

After getting my invisagig and loving it for a few days, I’m starting to see stability issues while attached to the WAN2 port of my Unifi Dream Machine Pro. Looking for advice.

For almost 2 years now I’ve used a TMobile hotspot with an ethernet port (Inseego M3000) as the WAN2 for my UDMP. Other than a mediocre signal and not being able to select the bands I wanted, this worked flawlessly. My hope in buying the invisagig was to solve those 2 problems and it does so VERY well!

My WAN2 is set up for failover, but I have several clients routed to always and only use WAN2; this has been my setup for years. However, 4 times now in just the week I’ve had my invisagig, those clients have lost (almost) all connection to the internet after working perfectly for hours or a day or two. It’s like a switch is flipped. It’s happened in the middle of the day and the middle of the night.

I say “almost” all because pings still work sporadically, and while I can’t ever seem to load an HTTP website (even icanhazip.com) in this state and speed tests fail with no connection, speedtest-cli will show a download of 0.6mbps.

The interesting part is that if I unplug the ethernet cable going into WAN2 and put it into my laptop with WIFI off, I get a full connection to the internet as if everything is working properly without rebooting the invisagig or anything. If I plug it back into WAN2, I get nothing out of the clients using that connection.

Things I’ve noticed:

  • This seems to happen around the modem changing bands
  • The modem still says it can access the internet fine
  • It happens on both SA and NSA
  • The laptop when plugged in directly gets an IPv4 and IPv6 address, the UDMP only gets an IPv4 address.

Things I’ve tried:

  • IPPT on and off, it seems to happen less with IPPT OFF, but that might just be from my small sample size of events and not relevant. I currently have it OFF and just landed in this state again after 56 hours of perfect uptime during which no network changes were made.
  • Watchdog on and off. Typically, watchdog doesn’t see that the connection is down and doesn’t do anything. But I’ve also seen this once after a watchdog reboot. The modem noticed the connection dropped, rebooted and said connected, but then I still had no internet on WAN2.
  • Rebooting the invisagig. This seems to have no effect 80%-90% of the time, then suddenly on one of the reboots it starts working again. I thought unplugging, waiting a few seconds, and replugging the modem had a higher success rate, so I installed a smart plug on the modem’s power supply, only to have a physical hard reset not work 5 times in a row, so I ditched this idea.

I can’t seem to figure out what actually solves the problem. Reboots seem to solve the problem eventually but sometimes it takes 10 of them in a row! This, and the fact it seems to happen around the time the modem changes bands, makes me point my finger at the invisagig. But then plugging the laptop in gets me a connection instantly which makes me want to point my finger at something on the UDMP side.

Looking for any and all advice and happy to provide any additional data I can to get to the bottom of this!

Additional info:

I’ve rebooted the UDMP several times to no avail. I have NOT rebooted the invisagig at all since yesterday because I wanted to see if it “fixed itself” overnight, but it did not.

I just did further testing on a machine that is forced over WAN2 and found something really odd. It appears HTTP is working OK (latency seems high, though), but HTTPS to the same servers fails with an unexpected EOF error.

root@test:~$ curl http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
root@test:~$ curl https://google.com
curl: (35) error:0A000126:SSL routines::unexpected eof while reading
root@test:~$ curl https://google.com
curl: (35) error:0A000126:SSL routines::unexpected eof while reading
root@test:~$ curl https://icanhazip.com
curl: (35) error:0A000126:SSL routines::unexpected eof while reading
root@test:~$ curl http://icanhazip.com
172.59.XXX.XXX
root@test:~$ curl http://icanhazip.com
172.59.XXX.XXX
root@test:~$ curl http://icanhazip.com
172.59.XXX.XXX
root@test:~$ curl http://example.com
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
root@test:~$ curl https://example.com
curl: (35) error:0A000126:SSL routines::unexpected eof while reading

curl verbose:

root@test:~$ curl -v http://google.com
*   Trying 74.125.138.138:80...
* Connected to google.com (74.125.138.138) port 80 (#0)
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Content-Security-Policy-Report-Only: object-src 'none';base-uri 'self';script-src 'nonce-iuscjF1TkNLp8-YcbNH28A' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/other-hp
< Date: Wed, 11 Dec 2024 13:19:21 GMT
< Expires: Fri, 10 Jan 2025 13:19:21 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 0
< X-Frame-Options: SAMEORIGIN
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact


root@test:~$ curl -v https://google.com
*   Trying 74.125.138.102:443...
* Connected to google.com (74.125.138.102) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.0 (OUT), TLS header, Unknown (21):
* TLSv1.3 (OUT), TLS alert, decode error (562):
* error:0A000126:SSL routines::unexpected eof while reading
* Closing connection 0
curl: (35) error:0A000126:SSL routines::unexpected eof while reading

The exact same command when this machine is routed over WAN1:

root@test:~$ curl -v https://google.com
*   Trying 142.250.105.139:443...
* Connected to google.com (142.250.105.139) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.google.com
*  start date: Nov  4 08:37:47 2024 GMT
*  expire date: Jan 27 08:37:46 2025 GMT
*  subjectAltName: host "google.com" matched cert's "google.com"
*  issuer: C=US; O=Google Trust Services; CN=WR2
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x56550edbfeb0)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET / HTTP/2
> Host: google.com
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< content-security-policy-report-only: object-src 'none';base-uri 'self';script-src 'nonce--eaaPBdTJL3RhWK4rsxKgQ' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/other-hp
< date: Wed, 11 Dec 2024 13:26:11 GMT
< expires: Fri, 10 Jan 2025 13:26:11 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
<
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact

This is getting stranger and stranger. Does the watchdog attempt connection to any HTTPS servers? If not, could that be added?

Howdy! Thanks for all the detail on the issue you’re experiencing. Hopefully we can help you make some sense of the issue and ultimately resolve the root cause :slight_smile:

There are certainly a lot of variables here but if you could start by sharing a screenshot of your Modem Info screen (the logged out version with your sensitive information redacted) and confirming what specific plan your SIM card is provisioned with (T-Mobile Home Internet, T-Mobile Business Internet, Magenta Business Tablet, etc.) this will help us greatly. Since you are using UBNT I would also recommend ensuring you have the latest UniFi updates installed as recently the Network Application in particular has had some known bugs with dual-WAN configurations.

Based on your description it seems like you are using a tablet data plan but may not have the InvisaGig configured for such a plan. The smoking gun that almost always points to this is the 0.6Mbps download speed you are seeing when running a speedtest. This is frequently the behavior observed when an incorrect Carrier Profile is selected. Under the latest IG software (v1.0.12), if you are using a T-Mobile tablet plan, you would select a Carrier Profile of ‘T-Mobile - Generic Hotspot’.

Another cause for this behavior can be that the maximum number of entries is being reached in your router’s connection tracking and/or firewall tables. On pfSense we recommend increasing the latter, but I’m not sure if the UniFi platform has a documented way of accomplishing this under their standard user interface. However, given the very specific 0.6Mbps download speed you are seeing, I have a stronger suspicion that an incorrect Carrier Profile is more likely than this possibility.
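
A minimal sketch for checking the connection-tracking theory, assuming shell access to a Linux-based router: the kernel exposes the current and maximum number of tracked connections as two files under /proc/sys/net/netfilter/. Whether those files are readable on a UDM Pro is an assumption, and the function names and the 90% warning threshold are only illustrative.

```python
from pathlib import Path

# Standard Linux netfilter counters; their presence on a given router is an assumption.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Return (current, maximum, fraction_used) read from the two counter files."""
    current = int(Path(count_path).read_text().strip())
    maximum = int(Path(max_path).read_text().strip())
    fraction = current / maximum if maximum else 0.0
    return current, maximum, fraction


def describe(current, maximum, fraction, warn_at=0.9):
    """Format a one-line summary; warn_at is an arbitrary illustrative threshold."""
    status = "NEAR LIMIT" if fraction >= warn_at else "ok"
    return f"conntrack {current}/{maximum} ({fraction:.0%}) {status}"


if __name__ == "__main__":
    try:
        print(describe(*conntrack_usage()))
    except (OSError, ValueError) as exc:
        # The files are missing or unreadable on systems without conntrack loaded.
        print(f"conntrack counters not readable here: {exc}")
```

If the count sits close to the maximum while WAN2 is in the degraded state, the table limit becomes a more plausible suspect; if it is far below, this theory can probably be set aside.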

The curl success of ‘http://google.com’ is likely due to the tiny size of the page (it only contains a very small amount of HTML required for the 301 redirect to ‘http://www.google.com’ and subsequent redirects to the local, native HTTPS CDN host for your region). This small amount of data would also be why curling http://icanhazip.com succeeds. The curl failure of ‘https://google.com’ is likely due to the TLS handshake, which exchanges more data because the server’s certificate(s) are sent as part of it.

Hi Ryan,
Here is my Modem Info Screen

My SIM was obtained through Calyx. The SIM came installed in the Inseego M3000 I mentioned that is branded with TMobile splash screens and such. It is an unlimited data plan paid yearly. I don’t think this SIM would be provisioned with a tablet plan, but I could be wrong here.

All Unifi updates installed, always kept up to date.

Just to make sure I was clear originally, note that the 0.6mbps was not from the speedtest website or any of the mobile apps. All of those failed to connect 100% of the time while in this state, the speedtest website won’t even load because the HTTPS handshake can’t even complete. Only the speedtest-cli command from a linux machine would get 0.6mbps in this state (several times it wouldn’t even connect either).

That’s interesting about the firewall’s connection tracking. However, is there a reason I would have immediately started seeing this daily with the InvisaGig but never saw it across nearly 2 years of using the Inseego M3000 in the same setup?

I was thinking nearly the same with regard to page size, but I wanted to test that specifically by finding a large HTTP file download, which I haven’t gotten to do yet because everything is working again now (more on that below). I’ll do that the next time I get in this state. I was really leaning more towards it having something to do with the multiple round-trip connections in the handshake process, vs a single connection for a standard HTTP download, or something odd having to do with IPv4 over IPv6 and packet size.
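
A rough sketch of how that large-download test could be run, using only the Python standard library. The function name and defaults are made up for illustration, and the URL has to be supplied by whoever runs it (any large file served over plain HTTP); nothing here comes from the InvisaGig or UniFi tooling.

```python
import http.client
import urllib.request


def probe_download(url, limit=5_000_000, timeout=10, chunk_size=16384):
    """Fetch up to `limit` bytes and report how far the transfer got.

    Returns (bytes_received, error). error is None when the read finished
    cleanly (EOF or limit reached), otherwise a string describing the failure.
    """
    received = 0
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            while received < limit:
                chunk = response.read(min(chunk_size, limit - received))
                if not chunk:
                    break  # server finished sending
                received += len(chunk)
    except (OSError, http.client.HTTPException) as exc:
        # URLError, timeouts and connection resets are all OSError subclasses.
        return received, repr(exc)
    return received, None
```

Calling `probe_download(url)` in the degraded state and again when healthy would give two numbers to compare. If small pages load but larger transfers stall after roughly the same number of bytes each time, that might lean towards a packet-size problem rather than the handshake round trips, though it would not prove it.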

But why do all these issues disappear when plugged directly into a laptop (without rebooting the InvisaGig) if it was something to do with the carrier profile, or multiple connections, or IPv4 over IPv6?

Does the InvisaGig watchdog try any HTTPS connections in its connectivity check or is it just using ICMP? I think this would be valuable to add, if not.

As you can see in the above screenshot, IPPT is now back on. Earlier today I got tired of WAN2 being down and wanted to try something to see if I could gather more info. I turned on IPPT, the InvisaGig rebooted as required and everything was back to perfect with full speeds for my area of 300-400mbps down and 20-30mbps up. So while I’m glad it’s up, I can’t gather any more info to debug at the moment. I have added my own HTTPS connection watchdog on a machine that is routed over WAN2 however, so hopefully I will be alerted immediately the next time this happens.
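
A minimal sketch of what such an HTTPS watchdog could look like, not the exact script in use here. It checks plain HTTP and a full TLS handshake separately so the "HTTP works, HTTPS fails" state seen in the curl output can be told apart from a total outage. The function names and state labels are invented for illustration.

```python
import http.client
import socket
import ssl
import urllib.request


def tls_handshake_ok(host, port=443, timeout=5):
    """Return True if a TCP connect and a complete TLS handshake both succeed."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError and timeouts are OSError subclasses
        return False


def http_ok(url, timeout=5):
    """Return True if a plain HTTP GET completes with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        return False


def classify(http_up, https_up):
    """Map the two probe results to a state label (labels are illustrative)."""
    if http_up and https_up:
        return "healthy"
    if http_up:
        return "degraded"  # HTTP loads but the TLS handshake fails
    if https_up:
        return "http-only-failure"
    return "down"
```

Run from a machine that is forced over WAN2, something like `classify(http_ok("http://example.com"), tls_handshake_ok("example.com"))` on a timer would flag the degraded state as soon as it starts.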

Here is the regular HTTP response time over WAN2 (with yellow bars as failures) loading http://example.com, I bet you can tell when I rebooted the InvisaGig to turn on IPPT :sweat_smile::

Thank you for the additional info! You are correct that Calyx should not be a tablet plan, but the IG should be using the IMEI of the original Calyx device to avoid potential issues. While the IG is being used, the Calyx device cannot be used on the T-Mobile network.

You were clear on the speedtest-cli point, I understand that this is where the 0.6Mbps result was coming from :slight_smile:

Unless your Calyx plan includes a static IP you shouldn’t necessarily need IPPT enabled and it should not make a large speed difference but if you do enable it I would suggest setting the MAC address manually and not allowing the IG to automatically detect it. You should ensure the proper MAC address from your WAN2 is the one used during the IPPT configuration on the IG. If this is not done, you may have connection issues as MAC auto detection for IPPT does not work properly with network interfaces that come off of a switch like the UDM SE and Pro devices.

The WatchDog uses diverse methods to check connectivity but as you have seen when plugging it into a laptop the connection is likely not actually down but the connected device appears to be having issues passing data. This is to say when the UDM Pro is not passing data, the modem itself and thus the WatchDog may not be seeing an issue internal to the InvisaGig.

Is there anything else that has changed recently (updates installed, configuration updates, etc.) since this behavior started?

IMEI was copied over on first boot.

Calyx does not include a static IP to my knowledge. I’ve only been turning IPPT on and off because sometimes, when multiple regular reboots do not bring me out of this degraded state, toggling IPPT and doing the required reboot does get things working again.

I have not specified the MAC manually, but I can do so. Note that this degraded state happens regardless of whether IPPT is on or off, so I don’t think this would be the silver bullet, as I wouldn’t think it would matter when IPPT is off. I’ll do it though.

Nothing in my config has changed; this behavior started essentially within the first 24 hours of switching from the M3000 to the InvisaGig. The first and second times, I just did a few reboots until it started working again, thinking it was something odd caused by me enabling and disabling so many bands during initial testing. But it has kept happening every 24-48 hours since, so I’ve started pulling the telemetry from the InvisaGig and logging it so I can compare timestamps with when my own watchdogs see things start failing. This is why I noted it seemed to happen around the modem changing bands, but there is only 1 data point there, the latest outage (and it has done plenty of other band changes without issue).
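
A sketch of how that timestamp comparison could be automated once more outages are logged. It assumes the telemetry has already been reduced to time-ordered (timestamp, band) pairs; the actual InvisaGig telemetry format is not shown in this thread, so that shape, the function names and the 5-minute window are all assumptions.

```python
from datetime import datetime, timedelta


def band_changes(samples):
    """Given time-ordered (timestamp, band) samples, return [(timestamp, old, new)]."""
    changes = []
    for (_, prev_band), (ts, band) in zip(samples, samples[1:]):
        if band != prev_band:
            changes.append((ts, prev_band, band))
    return changes


def changes_near(changes, outage_start, window=timedelta(minutes=5)):
    """Return the band changes within `window` before or after an outage start."""
    return [c for c in changes if abs(c[0] - outage_start) <= window]


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    t0 = datetime(2024, 12, 11, 13, 0)
    log = [(t0, "n41"), (t0 + timedelta(minutes=2), "n71")]
    print(changes_near(band_changes(log), t0 + timedelta(minutes=4)))
```

With several outages logged, counting how many have a band change inside the window versus how many band changes happen with no outage would show whether the single data point so far is a pattern or a coincidence.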

In addition to ensuring a static MAC is set for IPPT, please share a screenshot of the WatchDog action log the next time this issue occurs (I know so far you said WatchDog hadn’t shown an action, but I just want to see if it does at any point). Thank you!

One additional observation: I see you have limited the Network Mode to 5G SA only and that the current n41 band has a very weak connection. I would be curious to see if disabling 5G SA and allowing only NSA makes any difference.