Neutronic Security

Or: digging into Openstack Neutron's packet-filtering bowels

The other day I had to track down some weird behaviour with Neutron security groups in one of our clouds, and thought I'd share my notes on debugging those, and on Neutron networking in general 1.

Compute node neutronics

Recall that on a compute node, the virtual networking with Openvswitch and GRE tunneling looks something like this:

┌─────────────────────┐                         
│                     │                         
│     VM instance     │                         
│                     │                         
└───┬────────────┬────┘                         
    │    tapX    │                              
    └────────────┘                              
           │                                    
           │                                    
 ┌───────────────────┐                          
 │                   │                          
 │    qbrX, linux    │                          
 │                   │                          
 └──┬────────────┬───┘                          
    │    qvbX    │                              
    └────────────┘                              
           │                                    
           │                                    
    ┌────────────┐                              
    │    qvoX    │                              
 ┌──┴────────────┴───┐                          
 │                   │                          
 │    br-int, ovs    │                          
 │                   │                          
 └──┬────────────┬───┘                          
    │ patch-tun  │                              
    └────────────┘                              
           │                                    
           │                                    
    ┌────────────┐                              
    │ patch-int  │                              
 ┌──┴────────────┴───┐                          
 │                   │                          
 │    br-tun, ovs    │─GRE tunnel ─ ─ ─ ─ ─ ─ ─ 
 │                   │                          
 └───────────────────┘                          
VM instance, tap
Packets originate in the instance and are placed onto the tap device by KVM. There are potentially several taps on a compute host, each suffixed with an id derived from the port's uuid (here marked "X")
qbr
A Linux bridge connecting that one tap device to the Openvswitch integration bridge. These bridges exist solely to enable filtering.
qvb, qvo
Virtual ethernet links funneling packets from "B"ridge into "O"penvswitch
br-int
The Openvswitch integration bridge. One per host, collects traffic from several instances
patch-tun, patch-int
Openvswitch internal devices linking integration bridge and the (GRE) tunnel to the outside
br-tun
Openvswitch bridge that holds tunnel devices towards the network node (outside the compute host)

The picture changes a bit when using VLAN or VXLAN instead of GRE, but the part about qbr and filtering largely stays the same.

Tracing a packet

Let's trace what happens when a packet travels from an instance to the outside. My example instance has an internal ip address of 192.168.20.3, and I have added an (external) floating ip address of 10.0.8.203. First, this is what the network device looks like from the instance's point of view:

root@vm1:~# ip a s ens2
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:a9:87:cd brd ff:ff:ff:ff:ff:ff
    inet 192.168.20.3/24 brd 192.168.20.255 scope global ens2
       valid_lft forever preferred_lft forever

On the compute node carrying that instance, we can see a tap device (tapc42f8a9b-62 below). The tap device in turn corresponds to a Neutron port; the tap device's name is derived from the port uuid.

Inspecting the tap at the compute node 2:

root@juju-1337b9-11:~# ip a s tapc42f8a9b-62
17: tapc42f8a9b-62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1458 qdisc pfifo_fast master qbrc42f8a9b-62 state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:a9:87:cd brd ff:ff:ff:ff:ff:ff
...

Note that the macaddr of the link in the instance and that of the tap device differ only in the first octet: libvirt sets the host-side tap to an fe:... address so the bridge won't adopt the guest's address. The fa:16:3e:... address seen in the instance is a property of the Neutron port.

You could get at the Neutron port by searching for the device id (ie. the instance uuid) with neutron port-list. The tap device name is "tap" with the first 11 chars of the Neutron port uuid appended:

neutron port-show $( neutron port-list --device-id b01e8def-bf68-49a3-bfb3-6db967f1c696 -f value -c id )
+-----------------------+-------------------------------------------------------------------------------------+
| Field                 | Value                                                                               |
+-----------------------+-------------------------------------------------------------------------------------+
...
| device_id             | b01e8def-bf68-49a3-bfb3-6db967f1c696                                                |
...
| fixed_ips             | {"subnet_id": "c339d715-ae52-40f2-aa08-ddfc777aa25d", "ip_address": "192.168.20.3"} |
| id                    | c42f8a9b-6282-4c22-b307-58dd51c6c43f                                                |
| mac_address           | fa:16:3e:a9:87:cd                                                                   |
| name                  |                                                                                     |
| network_id            | a58d7130-08fb-410e-8c48-f836e0e67123                                                |
| security_groups       | 48d00602-8edc-4404-a969-02189fa4abe7                                                |
...
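All the per-port device names follow the same pattern, which can be captured in a couple of lines -- a sketch, with port_device being a made-up helper name, not anything from Neutron:

```python
# Sketch of Neutron's per-port device naming convention as seen above:
# each device name is its prefix plus the first 11 chars of the port uuid.
def port_device(prefix, port_uuid):
    return prefix + port_uuid[:11]

port = "c42f8a9b-6282-4c22-b307-58dd51c6c43f"

for prefix in ("tap", "qbr", "qvb", "qvo"):
    print(port_device(prefix, port))
# tapc42f8a9b-62, qbrc42f8a9b-62, qvbc42f8a9b-62, qvoc42f8a9b-62
```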

When running a ping 8.8.8.8 from within the instance, the icmp packets show up right away at the instance's tap device. Running tcpdump against the tap device on the compute node:

root@juju-1337b9-11:~# tcpdump -ni tapc42f8a9b-62 icmp
...
20:09:49.148467 IP 192.168.20.3 > 8.8.8.8: ICMP echo request, id 3821, seq 16, length 64
20:09:49.184318 IP 8.8.8.8 > 192.168.20.3: ICMP echo reply, id 3821, seq 16, length 64

Further above, master qbrc42f8a9b-62 was specified on the tap device. This means the tap is attached to the respective qbr bridge. The qbr bridge carries the same first 11 chars of the port uuid as a suffix.

Inspecting the qbr bridge on the compute node reveals it bridging the tap device and a device prefixed qvb:

root@juju-1337b9-11:~# brctl show qbrc42f8a9b-62
bridge name     bridge id               STP enabled     interfaces
qbrc42f8a9b-62          8000.867d6cc0ddb3       no              qvbc42f8a9b-62
                                                        tapc42f8a9b-62

The other interface on the qbr bridge is one half of the veth pair that leads into the strange world of Openvswitch, the second half being the corresponding qvo device 3. Note the master ovs-system bit on the qvo device; this means it has been grabbed by the magic Openvswitch datapath:

root@juju-1337b9-11:~# ip a s qvoc42f8a9b-62
15: qvoc42f8a9b-62@qvbc42f8a9b-62: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1458 qdisc noqueue master ovs-system state UP group default qlen 1000
    link/ether f6:ab:31:6e:11:bd brd ff:ff:ff:ff:ff:ff

Tracing packets from tap device to qbr and qvb/qvo is pretty straightforward, as these are regular Linux network devices. Just attach tcpdump to the respective interface, eg.

root@juju-1337b9-11:~# tcpdump -ni qvoc42f8a9b-62 icmp
...
20:51:26.444041 IP 192.168.20.3 > 8.8.8.8: ICMP echo request, id 9517, seq 20, length 64
20:51:26.479578 IP 8.8.8.8 > 192.168.20.3: ICMP echo reply, id 9517, seq 20, length 64

Tracing packets in Openvswitch is a bit more involved 4, but I won't go there for now - instead let's look at security groups and filtering.

Filtering

Notice how above the qbr bridge just connects the instance's tap device and the Openvswitch integration bridge? One could ask: why not connect the tap directly to br-int? The reason is filtering: security groups are implemented via iptables, but iptables rules can't be applied to tap devices plugged directly into Openvswitch. The qbr bridges are also called "firewall bridges" for that reason.

Let's look at how that filtering manifests itself on the compute node. Neutron sets up a whole slew of iptables rules and chains; they can be inspected with standard netfilter tooling, eg. iptables -S to display the filter table. Among others, the neutron chains below can be seen -- note we have one instance with one port running, corresponding to the tap tapc42f8a9b-62 we've encountered above:

neutron-filter-top
Top of the funnel
neutron-openvswi-FORWARD
Generic forwarding rules
neutron-openvswi-ic42f8a9b-6
Input Chain, going to the tap device
neutron-openvswi-oc42f8a9b-6
Output Chain, coming from the tap device
neutron-openvswi-sg-chain
Generic chain for security groups
neutron-openvswi-sg-fallback
Fallback chain - packets get dropped at the end of it
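Going by the chain names above, the per-port chains embed only the first 10 characters of the port uuid (iptables limits chain name length to 28 characters, which presumably also explains the truncated "openvswi"). A sketch -- port_chain is a made-up helper name:

```python
# Per-port chain names as seen above: a fixed prefix, a direction marker
# ("i" for input towards the port, "o" for output from it), and the
# first 10 chars of the port uuid -- 28 characters in total.
def port_chain(direction, port_uuid):
    return "neutron-openvswi-" + direction + port_uuid[:10]

port = "c42f8a9b-6282-4c22-b307-58dd51c6c43f"

print(port_chain("i", port))  # neutron-openvswi-ic42f8a9b-6
print(port_chain("o", port))  # neutron-openvswi-oc42f8a9b-6
```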

As an example, I'll add rules to the default security group - rules that will be applied to all instances using it. In this case I'm going to allow ICMP and SSH traffic globally:

neutron security-group-rule-create --protocol icmp default
neutron security-group-rule-create --protocol tcp --port-range-min 22 --port-range-max 22 default

I didn't specify a direction, so the rules apply to ingress traffic 5. This results in the rules below being added:

# iptables -S 
...
-A neutron-openvswi-ic42f8a9b-6 -p icmp -j RETURN
-A neutron-openvswi-ic42f8a9b-6 -p tcp -m tcp --dport 22 -j RETURN

Which means that packets passing through the ic42f8a9b-6 chain will be returned and accepted, instead of dropped (which is the default). The instance can now be accessed via ssh through its floating ip (and also be pinged). When doing so, one can see traffic coming in on the instance's tap device. At this point the destination ip has already been translated from the externally accessible floating ip to the instance ip via DNAT; this happens before traffic hits the compute node.

When sshing towards the instance one could eg. see packets like this on the tap device. They are coming from the client (10.0.8.1) and travelling to the internal ip of the instance (hitting port 22, the ssh port):

tcpdump -ni tapc42f8a9b-62 port 22
...
21:54:29.102770 IP 10.0.8.1.45170 > 192.168.20.3.22: Flags [S], seq 3651151370, win 26880, options [mss 8960,sackOK,TS val 703840903 ecr 0,nop,wscale 7], length 0
21:54:29.103277 IP 192.168.20.3.22 > 10.0.8.1.45170: Flags [S.], seq 4183538765, ack 3651151371, win 28120, options [mss 1418,sackOK,TS val 22768408 ecr 703840903,nop,wscale 7], length 0

Let's have a look at another example, and follow a packet through the individual chains. For example, I could add a rule that accepts packets towards port 8080, but only from a specific source ip range.

neutron security-group-rule-create --protocol tcp --port-range-min 8080 --port-range-max 8080 --remote-ip-prefix 10.1.0.0/23 default

This would be reflected in an input chain with this rule added:

-A neutron-openvswi-ic42f8a9b-6 -s 10.1.0.0/23 -p tcp -m tcp --dport 8080 -j RETURN

Those rules are a bit quiet by default -- they just pass matching packets on and silently drop the rest. If packets go missing there, one can add logging rules to narrow down where they are getting dropped.

There are several options for adding logging to iptables rules. The oldschool method is the LOG target, which emits logging information via the kernel's logging mechanisms - typically the messages will show up in dmesg and /var/log/kern.log or similar. Another option is to use ulogd, the userspace logging daemon, in conjunction with the NFLOG target. I'll touch upon this further below.

As an example, let's assume I'd like to track packets towards that port 8080 from above. From my workstation I fire up netcat towards port 8080 on the floating ip. I can see SYNs coming in through the qvb interface when doing tcpdump -ni qvbc42f8a9b-62 port 8080, but, what's that? On the tap I see nothing when tracing with tcpdump -ni tapc42f8a9b-62 port 8080, and consequently no packets arrive in my instance.

I'd expect the packet to go via the FORWARD chain to neutron-openvswi-FORWARD, on to neutron-openvswi-sg-chain, and then to the neutron-openvswi-ic42f8a9b-6 chain (the input chain for the instance's port). Finally, if nothing in the input chain matches, the packet would go to neutron-openvswi-sg-fallback and get dropped there.

I am going to add some logging rules to track where my packets go missing. In general, logging rules look something like the below, if one were to use the LOG target:

iptables -I <num> <chain> <filtering> -j LOG --log-prefix "<some marker>"

<num> is the index at which the rule gets inserted into the chain. If you leave out <num>, the rule is inserted at the top of the chain - a useful default. The log prefix purely adds some marker text, clarifying which logging rule just triggered. Add <filtering> to restrict what's being logged; this is especially useful if a lot of traffic passes through.

Let's add the logging rules as below, one at the top of every chain the packet is expected to traverse. I'm filtering on proto tcp and destination port 8080, and adding a marker text of "dbg: <abbr chainname> " so I know which chain spits out a log message 6.

iptables -I FORWARD -p tcp --dport 8080 -j LOG --log-prefix "dbg: FORWARD "
iptables -I neutron-openvswi-FORWARD -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-FORWARD "
iptables -I neutron-openvswi-sg-chain -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-sg-chain "
iptables -I neutron-openvswi-ic42f8a9b-6 -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-ic42f8a9b-6 "
iptables -I neutron-openvswi-sg-fallback -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-sg-fallback "
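Typing out five near-identical rules (and their cleanup twins) gets tedious; they could also be generated. A sketch -- the helper name and its parameters are mine, while the chain names and markers are the ones used above; swapping in NFLOG and --nflog-prefix yields the ulogd variant discussed further below:

```python
# Generate matching insert (-I) and delete (-D) iptables logging rules
# for the (chain, abbreviated marker) pairs used in this walkthrough.
CHAINS = [
    ("FORWARD", "FORWARD"),
    ("neutron-openvswi-FORWARD", "n-o-FORWARD"),
    ("neutron-openvswi-sg-chain", "n-o-sg-chain"),
    ("neutron-openvswi-ic42f8a9b-6", "n-o-ic42f8a9b-6"),
    ("neutron-openvswi-sg-fallback", "n-o-sg-fallback"),
]

def logging_rules(op, dport, target="LOG", prefix_opt="--log-prefix"):
    return [
        f'iptables -{op} {chain} -p tcp --dport {dport} '
        f'-j {target} {prefix_opt} "dbg: {abbrev} "'
        for chain, abbrev in CHAINS
    ]

for line in logging_rules("I", 8080):  # use op="D" for the cleanup run
    print(line)
```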

With the rules in place it's time for some action. I'm opening a shell on my workstation and trying to reach port 8080 on my instance via netcat. Recall that the external floating ip of that instance is 10.0.8.203, which gets translated to the instance's internal ip address, 192.168.20.3:

nc -v 10.0.8.203 8080

This results in these log lines being emitted (I'm eliding timestamps and the host name):

... dbg: FORWARD IN=qbrc42f8a9b-62 OUT=qbrc42f8a9b-62 MAC=fa:16:3e:a9:87:cd:fa:16:3e:98:e9:b2:08:00 SRC=10.0.8.1 DST=192.168.20.3 LEN=60 TOS=00 PREC=0x00 TTL=63 ID=3259 DF PROTO=TCP SPT=54654 DPT=8080 SEQ=3737303776 ACK=0 WINDOW=26880 SYN URGP=0 MARK=0
... dbg: n-o-FORWARD IN=qbrc42f8a9b-62 OUT=qbrc42f8a9b-62 MAC=fa:16:3e:a9:87:cd:fa:16:3e:98:e9:b2:08:00 SRC=10.0.8.1 DST=192.168.20.3 LEN=60 TOS=00 PREC=0x00 TTL=63 ID=3259 DF PROTO=TCP SPT=54654 DPT=8080 SEQ=3737303776 ACK=0 WINDOW=26880 SYN URGP=0 MARK=0
... dbg: n-o-sg-chain IN=qbrc42f8a9b-62 OUT=qbrc42f8a9b-62 MAC=fa:16:3e:a9:87:cd:fa:16:3e:98:e9:b2:08:00 SRC=10.0.8.1 DST=192.168.20.3 LEN=60 TOS=00 PREC=0x00 TTL=63 ID=3259 DF PROTO=TCP SPT=54654 DPT=8080 SEQ=3737303776 ACK=0 WINDOW=26880 SYN URGP=0 MARK=0
... dbg: n-o-ic42f8a9b-6 IN=qbrc42f8a9b-62 OUT=qbrc42f8a9b-62 MAC=fa:16:3e:a9:87:cd:fa:16:3e:98:e9:b2:08:00 SRC=10.0.8.1 DST=192.168.20.3 LEN=60 TOS=00 PREC=0x00 TTL=63 ID=3259 DF PROTO=TCP SPT=54654 DPT=8080 SEQ=3737303776 ACK=0 WINDOW=26880 SYN URGP=0 MARK=0
... dbg: n-o-sg-fallback IN=qbrc42f8a9b-62 OUT=qbrc42f8a9b-62 MAC=fa:16:3e:a9:87:cd:fa:16:3e:98:e9:b2:08:00 SRC=10.0.8.1 DST=192.168.20.3 LEN=60 TOS=00 PREC=0x00 TTL=63 ID=3259 DF PROTO=TCP SPT=54654 DPT=8080 SEQ=3737303776 ACK=0 WINDOW=26880 SYN URGP=0 MARK=0

Above one can see a SYN packet from netcat going through the chains. The packet is addressed to the instance's internal ip (DST=192.168.20.3) and has the expected destination port (DPT=8080). It passes through FORWARD, neutron-openvswi-FORWARD and neutron-openvswi-sg-chain into the instance-specific 7 neutron-openvswi-ic42f8a9b-6 chain. We would have wanted it to be accepted there and passed on to the instance, but alas, the packet falls through to the neutron-openvswi-sg-fallback chain and gets dropped.
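On a busier host those log lines drown in the rest of kern.log; to see just the traversal order, one could grep out the dbg: markers. A quick sketch -- chain_trail is a made-up name, and the sample lines are shortened versions of the output above:

```python
import re

# Pull the "dbg: <marker>" tags out of kern.log-style lines, giving the
# ordered list of chains a packet traversed.
def chain_trail(log_lines):
    trail = []
    for line in log_lines:
        m = re.search(r"dbg: (\S+)", line)
        if m:
            trail.append(m.group(1))
    return trail

lines = [
    "... dbg: FORWARD IN=qbrc42f8a9b-62 SRC=10.0.8.1 DPT=8080 ...",
    "... dbg: n-o-sg-fallback IN=qbrc42f8a9b-62 SRC=10.0.8.1 DPT=8080 ...",
]
print(chain_trail(lines))  # ['FORWARD', 'n-o-sg-fallback']
```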

Comparing the rule we imposed for traffic to port 8080 with the source address shown above, it should be clear why: I am filtering for a source range of 10.1.0.0/23, but in this example my SYN packet arrives with SRC=10.0.8.1 (an ip of my workstation), well outside the allowed source range.
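The range mismatch is quick to double-check, eg. with Python's ipaddress module (the addresses below are the ones from this example):

```python
import ipaddress

allowed = ipaddress.ip_network("10.1.0.0/23")  # source range from the sg rule
src = ipaddress.ip_address("10.0.8.1")         # SRC= from the log lines

print(src in allowed)                               # False -> dropped
print(ipaddress.ip_address("10.1.1.7") in allowed)  # True  -> would pass
```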

I'll add a special rule just for my workstation source address:

neutron security-group-rule-create --protocol tcp --port-range-min 8080 --port-range-max 8080 --remote-ip-prefix 10.0.8.1/32 default   

The resulting rule on the compute node:

-A neutron-openvswi-ic42f8a9b-6 -s 10.0.8.1/32 -p tcp -m tcp --dport 8080 -j RETURN

Retrying the netcat run from before, I can now see packets for port 8080 passing through the tap device. The logging output is as before, except the packets no longer hit the fallback chain - they are accepted before getting there.

Add more logging rules to trace more complex scenarios. If you are filtering on egress (by default no egress filtering is installed), the logging can of course also be applied to traffic coming out of the instance.

To clean up those logging rules, one would issue the corresponding -D (or --delete) commands:

iptables -D FORWARD -p tcp --dport 8080 -j LOG --log-prefix "dbg: FORWARD "
iptables -D neutron-openvswi-FORWARD -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-FORWARD "
iptables -D neutron-openvswi-sg-chain -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-sg-chain "
iptables -D neutron-openvswi-ic42f8a9b-6 -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-ic42f8a9b-6 "
iptables -D neutron-openvswi-sg-fallback -p tcp --dport 8080 -j LOG --log-prefix "dbg: n-o-sg-fallback "

Logging via ulogd and NFLOG

Compared to the classical built-in LOG target, the ulogd/NFLOG method offers a lot more options in terms of where to log, what to log, output formats etc. For debugging this doesn't matter too much, though. However, if your compute host is actually a container (very popular here with testing and development clouds), this is pretty much the only option; LOG target rules won't produce any output from inside a container.

To use the NFLOG logging method, you need to install the ulogd logging daemon and configure it for receiving log messages from iptables.

On my Ubuntu Xenial install this boils down to

apt-get install ulogd

And it comes preconfigured to write messages it receives from iptables to the file /var/log/ulog/syslogemu.log

The actual logging rules look very similar to the LOG rules; the basic form is

iptables -I <num> <chain> <filtering> -j NFLOG --nflog-prefix "<some marker>"

With that, the example rules from above become:

# Set up example rules:
iptables -I FORWARD -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: FORWARD "
iptables -I neutron-openvswi-FORWARD -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-FORWARD "
iptables -I neutron-openvswi-sg-chain -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-sg-chain "
iptables -I neutron-openvswi-ic42f8a9b-6 -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-ic42f8a9b-6 "
iptables -I neutron-openvswi-sg-fallback -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-sg-fallback "

# And for cleanup use:
iptables -D FORWARD -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: FORWARD "
iptables -D neutron-openvswi-FORWARD -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-FORWARD "
iptables -D neutron-openvswi-sg-chain -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-sg-chain "
iptables -D neutron-openvswi-ic42f8a9b-6 -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-ic42f8a9b-6 "
iptables -D neutron-openvswi-sg-fallback -p tcp --dport 8080 -j NFLOG --nflog-prefix "dbg: n-o-sg-fallback "

That concludes the walkthrough. I often find Neutron utterly mystifying; the fact that security groups end up expressed as plain iptables rules, however, is a bit comforting to me.

Footnotes


  1. The configuration I'm using here is ML2 with Openvswitch and GRE tunnels, with Neutron security groups enabled - a fairly common configuration, I believe. Security groups work similarly in other ML2-type configurations; note however that eg. with VLAN-based networking, br-tun and the related patch interfaces are replaced by their VLAN equivalents

  2. I'm using an Openstack-on-LXD test environment as described here -- very handy for local testing and exploration

  3. Note that all the 'q' prefixes stem from the fact that Neutron was previously known as Quantum, until a tape vendor firm complained about the name similarity

  4. But see the excellent Openstack Network troubleshooting guide for some pointers on tracing ovs

  5. Neutron will add two rules to allow all ipv4/ipv6 egress traffic by default

  6. There's a limit on the length of the log prefix text, therefore the need to abbreviate

  7. Actually port-specific, but we only have one port in this instance (no pun intended)