Connectivity Troubleshooting

Agents not connecting? Routes not showing up? Step through these diagnostics to find the problem.

Quick checks:

# Is the agent healthy?
curl http://localhost:8080/healthz | jq '{peers: .peer_count, routes: .route_count}'

# Can you reach the peer?
nc -zv peer-address 4433

# Are certificates valid?
muti-metroo cert info ./certs/agent.crt

Diagnostic Tools

CLI Commands

Use the built-in CLI commands for quick diagnostics:

# Check overall agent status
muti-metroo status

# List connected peers with state and RTT
muti-metroo peers

# View routing table with hop counts
muti-metroo routes

# Test connectivity to a listener before deployment
muti-metroo probe server.example.com:4433
muti-metroo probe --transport h2 server.example.com:443

# Ping through the mesh to test exit connectivity
muti-metroo ping abc123 8.8.8.8

HTTP API

For scripting and monitoring:

# Basic health check
curl http://localhost:8080/health

# Detailed status with counts
curl http://localhost:8080/healthz | jq

# Expected output:
{
  "status": "healthy",
  "running": true,
  "peer_count": 2,
  "stream_count": 10,
  "route_count": 5
}

# List all known agents
curl http://localhost:8080/agents | jq

# Trigger route refresh
curl -X POST http://localhost:8080/routes/advertise

Peer Connection Issues

Can't Connect to Peer

Step 1: Check network reachability

# TCP (for HTTP/2, WebSocket)
nc -zv peer-address 4433
telnet peer-address 4433

# UDP (for QUIC)
nc -zvu peer-address 4433

Step 2: Check DNS resolution

dig peer-hostname
nslookup peer-hostname

Step 3: Check firewall

# On peer host
sudo iptables -L -n | grep 4433
sudo ufw status

# Try from another host on same network
curl http://peer-address:8080/health

Step 4: Check TLS

# Test TLS connection
openssl s_client -connect peer-address:4433 -CAfile ca.crt

# Verify certificate
muti-metroo cert info ./certs/agent.crt
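
Steps 1 and 4 lend themselves to scripting, so that a check stops at the first failing layer. The following is a minimal sketch, not part of muti-metroo: `run_step`, `tls_ok`, and `check_peer` are hypothetical helper names, and `ca.crt` is assumed to sit in the current directory as in the command above.

```shell
#!/bin/sh
# run_step NAME COMMAND [ARGS...]
# Runs COMMAND quietly and prints "PASS NAME" or "FAIL NAME".
run_step() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# tls_ok HOST PORT: succeeds only if the handshake completes and the
# certificate chain verifies against ca.crt (assumed path).
tls_ok() {
  openssl s_client -connect "$1:$2" -CAfile ca.crt -verify_return_error </dev/null
}

# check_peer HOST PORT: TCP reachability first, then TLS.
check_peer() {
  run_step "tcp" nc -z -w 5 "$1" "$2" &&
  run_step "tls" tls_ok "$1" "$2"
}

# Example (hypothetical peer address):
# check_peer peer-address 4433
```

A `FAIL tcp` result points back to Steps 1 to 3 (network, DNS, firewall); a `FAIL tls` result after `PASS tcp` points to the certificate or CA file.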

Peer Disconnects Frequently

Check keepalive settings:

connections:
  idle_threshold: 30s    # Send keepalive after 30s idle
  timeout: 90s           # Disconnect after 90s no response

If the network is slow, increase both values:

connections:
  idle_threshold: 60s
  timeout: 180s

Check logs for disconnect reasons:

journalctl -u muti-metroo | grep -i "disconnect\|timeout"
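
Grouping the matching log lines by hour can show whether disconnects are periodic, which often hints at a NAT or firewall idle timer. The sketch below assumes the default journalctl line format (`Mon DD HH:MM:SS host unit[pid]: message`); `disconnects_per_hour` is a hypothetical helper name, and the exact log messages the agent emits are not assumed beyond containing "disconnect" or "timeout".

```shell
#!/bin/sh
# disconnects_per_hour: reads journalctl lines on stdin and prints
# "COUNT Mon DD HH:00" for every hour that has matching lines,
# in order of first appearance.
disconnects_per_hour() {
  awk 'tolower($0) ~ /disconnect|timeout/ {
    key = $1 " " $2 " " substr($3, 1, 2) ":00"
    if (!(key in count)) order[++n] = key
    count[key]++
  }
  END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }'
}

# Example:
# journalctl -u muti-metroo | disconnects_per_hour
```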

Slow Reconnection

Tune reconnection backoff:

connections:
  reconnect:
    initial_delay: 500ms  # Start faster
    max_delay: 30s        # Cap sooner
    multiplier: 1.5       # Slower backoff
    jitter: 0.3           # More randomization
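
To see how these settings interact, the following sketch prints the delay before each retry. It assumes the delay starts at `initial_delay`, is multiplied by `multiplier` after every failed attempt, and is capped at `max_delay`; jitter is left out so the output is deterministic. `backoff_schedule` is a hypothetical helper, and the agent's actual algorithm may differ.

```shell
#!/bin/sh
# backoff_schedule INITIAL_SECONDS MAX_SECONDS MULTIPLIER ATTEMPTS
# Prints one delay (in seconds) per retry attempt.
# Assumption: delay = min(initial * multiplier^n, max); jitter is ignored.
backoff_schedule() {
  awk -v d="$1" -v max="$2" -v m="$3" -v n="$4" 'BEGIN {
    for (i = 1; i <= n; i++) {
      if (d > max) d = max          # apply the cap before printing
      printf "attempt %d: %.2fs\n", i, d
      d = d * m                     # grow the delay for the next attempt
    }
  }'
}

# Values from the config above: 500ms initial, 30s cap, 1.5x multiplier.
backoff_schedule 0.5 30 1.5 12
```

Under these assumptions the delay reaches the 30s cap at the twelfth attempt. A smaller multiplier means more retries in the first few seconds after a drop, at the cost of more connection attempts against a peer that is still down.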

Transport-Specific Issues

QUIC Not Working

QUIC uses UDP, which may be blocked or throttled.

Test UDP connectivity:

# From client
echo "test" | nc -u -w 1 peer-address 4433

# On server, check if receiving
tcpdump -i any udp port 4433

Common issues:

Corporate firewalls block UDP
NAT devices time out UDP mappings quickly
Some ISPs throttle UDP

Solution: Fall back to HTTP/2 or WebSocket:

peers:
  - id: "..."
    transport: h2    # Instead of quic
    address: "peer-address:443"

HTTP/2 Not Working

Test HTTP/2:

curl -v --http2 https://peer-address:8443/mesh

Check TLS:

openssl s_client -connect peer-address:8443 -alpn h2

WebSocket Through Proxy

Test proxy connectivity:

# Test CONNECT through proxy
curl -v --proxy http://proxy:8080 https://peer-address:443/

# Check if proxy allows WebSocket upgrade
curl -v --http1.1 --proxy http://proxy:8080 \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  https://peer-address:443/mesh

Configure proxy authentication:

peers:
  - transport: ws
    address: "wss://peer-address:443/mesh"
    proxy: "http://proxy:8080"
    proxy_auth:
      username: "${PROXY_USER}"
      password: "${PROXY_PASS}"

Routing Issues

No Route Found

Error: no route to 10.0.0.5

Step 1: Check if route should exist

# On exit agent
grep -A5 "exit:" /etc/muti-metroo/config.yaml

Step 2: Check route propagation

# On ingress agent - using CLI
muti-metroo routes

# Or using HTTP API
curl http://localhost:8080/healthz | jq '.route_count'

Step 3: Check peer connectivity

Routes propagate through peers. If a peer is disconnected, the routes learned through it are lost.

# Using CLI
muti-metroo peers

# Or using HTTP API
curl http://localhost:8080/healthz | jq '.peer_count'

Step 4: Trigger route advertisement

curl -X POST http://exit-agent:8080/routes/advertise

Step 5: Wait for propagation

Routes take time to propagate (up to advertise_interval).
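
Rather than re-running the check by hand, a short polling loop can wait for the route to appear. This is a sketch under stated assumptions: `wait_until` and `has_routes` are hypothetical helpers, the agent's HTTP API is assumed to listen on localhost:8080 as in the earlier examples, and the 120s limit is an arbitrary value that should exceed your `advertise_interval`.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT elapses.
# Returns 0 on success, 1 on timeout.
wait_until() {
  timeout_s=$1 interval_s=$2
  shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    elapsed=$(( elapsed + interval_s ))
    [ "$elapsed" -gt "$timeout_s" ] && return 1
    sleep "$interval_s"
  done
  return 0
}

# has_routes: succeeds once /healthz reports at least one route.
has_routes() {
  count=$(curl -fsS http://localhost:8080/healthz | jq -r '.route_count')
  [ "${count:-0}" -gt 0 ]
}

# Example: poll every 5s for up to 120s, then show the routing table.
# wait_until 120 5 has_routes && muti-metroo routes
```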

Route Expired

Routes expire after route_ttl without refresh.

# Check route TTL
grep route_ttl config.yaml

# If the exit agent stays disconnected longer than route_ttl, its routes expire.
# Reconnect the exit agent, then trigger advertisement.

Wrong Route Selected

Routes are selected by:

Longest prefix match
Lowest metric (hop count) if tied
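
The two rules can be illustrated with a small IPv4-only sketch. It is not the agent's implementation: `ip_to_int` and `select_route` are hypothetical helpers, and the input format (`CIDR METRIC NEXT_HOP`, one route per line) is invented for the example.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# select_route DEST_IP
# Reads candidate routes from stdin, one per line: "CIDR METRIC NEXT_HOP".
# Prints the winner: longest matching prefix, then lowest metric on a tie.
select_route() {
  dest=$(ip_to_int "$1")
  best="" best_len=-1 best_metric=0
  while read -r cidr metric hop; do
    len=${cidr#*/}
    net=$(ip_to_int "${cidr%/*}")
    # Build the netmask; a /0 prefix matches every destination.
    if [ "$len" -eq 0 ]; then
      mask=0
    else
      mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    fi
    # Skip routes whose network does not contain the destination.
    [ $(( dest & mask )) -eq $(( net & mask )) ] || continue
    if [ "$len" -gt "$best_len" ] ||
       { [ "$len" -eq "$best_len" ] && [ "$metric" -lt "$best_metric" ]; }; then
      best="$cidr $metric $hop" best_len=$len best_metric=$metric
    fi
  done
  [ -n "$best" ] && echo "$best"
}

printf '%s\n' \
  '0.0.0.0/0 1 agent-a' \
  '10.0.0.0/8 3 agent-b' \
  '10.0.0.0/24 4 agent-c' \
  '10.0.0.0/24 2 agent-d' | select_route 10.0.0.5
# prints "10.0.0.0/24 2 agent-d"
```

Note that the default route through agent-a has the lowest metric but still loses: prefix length is compared first, and the metric only breaks the tie between the two /24 routes.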

Debug route selection:

# Enable debug logging
muti-metroo run --log-level debug

# Look for route lookup logs
grep "route lookup" logs

Stream Issues

Streams Not Opening

Error: stream open timeout

Causes:

Network latency too high
Too many hops
Exit agent overloaded

Solutions:

Increase timeout:
```
limits:
  stream_open_timeout: 60s
```
Check that each hop is responsive
Reduce hop count if possible

Streams Dying

Check logs for stream issues:

journalctl -u muti-metroo | grep -i "stream"

Common causes:

Idle timeout
Buffer exhaustion
Network issues

Network Diagnostics

Capture Traffic

# QUIC (UDP)
tcpdump -i any udp port 4433 -w capture.pcap

# HTTP/2, WebSocket (TCP)
tcpdump -i any tcp port 443 -w capture.pcap

Monitor Connections

# Watch connection states
watch -n 1 'netstat -an | grep 4433'

# Count connections
netstat -an | grep 4433 | wc -l
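
A plain `grep 4433` also matches unrelated ports such as 44330 and mixes all connection states into one number. The sketch below counts TCP connections per state for an exact port. `states_for_port` is a hypothetical helper, and it assumes the Linux `netstat -an` column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
#!/bin/sh
# states_for_port PORT: reads `netstat -an` output on stdin and prints
# "STATE COUNT" for TCP sockets where either endpoint uses exactly PORT.
states_for_port() {
  awk -v port="$1" '
    BEGIN { re = "[:.]" port "$" }   # port must end the address field
    $1 ~ /^tcp/ && ($4 ~ re || $5 ~ re) { n[$6]++ }
    END { for (s in n) print s, n[s] }
  ' | sort
}

# Example:
# netstat -an | states_for_port 4433
```

Many sockets in `TIME_WAIT` or `SYN_SENT` relative to `ESTABLISHED` usually indicate connections that are being torn down and retried rather than held open.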

Latency Testing

# Measure round-trip time
ping peer-address

# Measure TCP latency
hping3 -S -p 443 peer-address

# Time a stream open
time curl -x socks5://localhost:1080 https://example.com -o /dev/null
