Cursed Docker networking - using MacVLAN to pretend your containers are VMs

Hello there.

This being my first post, I thought I'd share the cursed, unusual networking arrangement I use for the services running on my homelab, including the instances of Ghost and Commento powering this blog.

Be warned: this is likely on the extreme opposite end of what's considered "best practice", and done purely out of personal preference and because I can. Should you choose to replicate it, do so at your own risk.

The problem

Under normal circumstances, Docker creates an isolated network for each container, allowing outbound access but requiring ports to be mapped between the container and the host before they can be reached from outside (very much like NAT with port forwarding). Communication between containers on different networks also requires extra configuration.

But what if, for example, you needed to run two separate DNS server containers? They can't share the same port on the host, and listening for DNS on a port other than 53 is a whole other can of worms. You could also want your service to have an IP address different from the host's address, or to make it easier for containers to see and talk to each other.
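To make the port clash concrete, here's a hedged sketch; my-dns-image is a placeholder for whatever DNS server image you'd actually run, and the error output is abbreviated:

$ docker run -d --name dns1 -p 53:53/udp my-dns-image
$ docker run -d --name dns2 -p 53:53/udp my-dns-image
docker: Error response from daemon: ... Bind for 0.0.0.0:53 failed: port is already allocated.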

MacVLAN

By using the macvlan network driver, Docker can give containers a virtual network interface that's linked to the host's real network, without the need to create a classic Linux bridge interface. This virtual NIC has its own MAC address, IP address, and its own individual set of ports, behaving much like a VM running under a hypervisor would.

Network isolation

In my particular case, for a little extra security I don't want the containers to have full access to the rest of my home LAN, so they will sit in their own separate VLAN, which will allow me to filter traffic on the router.

The first step is to create the VLAN interface on the host. I'll not cover this process in detail, as it varies a lot depending on the network management tool you are using on the host, as well as your network equipment and topology; but once done, you should have a new interface with the same base name as the LAN interface, plus a dot and the VLAN number.

For example, if your network interface is eth0, then the VLAN interface will be eth0.10 for VLAN 10.
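For reference, on a host managed with plain iproute2, the temporary (non-persistent) version of this would be something like the below, assuming eth0 and VLAN 10; your network tool of choice will have its own way of making it permanent:

$ sudo ip link add link eth0 name eth0.10 type vlan id 10
$ sudo ip link set eth0.10 up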

You can also skip this entirely and have the containers exist alongside your home network (directly linked to eth0 in the example above), but I would advise against it.

Handling IP address assignments

So far we know that each container will have its own IP address, and that containers will have their own separate network, but who decides which container gets which address?

Although it is possible to use an external DHCP server with Docker, I want to avoid relying on external components as much as possible, and use just the functionality that's already included in Docker by default.

In this case, I'll let Docker itself handle address assignment, using the address space defined at the moment the macvlan network is created. The router will not provide DHCP, DHCPv6 or RA addressing on this VLAN.

Enough talk, I want to copy/paste potentially dangerous commands!

If all went well in the creation of the VLAN interface, we're now ready to create the network to which Docker will connect the containers. To do so, run:

$ docker network create -d macvlan -o macvlan_mode=bridge -o parent=eth0.10 --subnet=192.168.10.0/24 --gateway=192.168.10.1 docker_vlan

Optionally, if you also want to support IPv6, include this in the command, before the last parameter:

--ipv6 --subnet=xxxx:xxxx:xxxx:xxxx::/64 --gateway=xxxx:xxxx:xxxx:xxxx::1
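Put together, the full dual-stack command looks like this (keeping the xxxx placeholders for your actual IPv6 prefix):

$ docker network create -d macvlan -o macvlan_mode=bridge -o parent=eth0.10 --subnet=192.168.10.0/24 --gateway=192.168.10.1 --ipv6 --subnet=xxxx:xxxx:xxxx:xxxx::/64 --gateway=xxxx:xxxx:xxxx:xxxx::1 docker_vlan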

Here's a breakdown of what each part of the command above does:

  • docker network create: creates a new network in Docker.
  • -d macvlan: the driver used by the network, which is macvlan in our case.
  • -o macvlan_mode=bridge: sets the driver to bridge mode (this is the default mode).
  • -o parent=eth0.10: the interface to which the macvlan network will be attached. In this example, the interface for VLAN 10.
  • --subnet=192.168.10.0/24: the subnet associated with this network. Docker will assign addresses from this CIDR to the containers connected to this network, in incremental order starting from the first available address.
  • --gateway=192.168.10.1: the gateway for the network, which is the address of the router interface for this VLAN.
  • --ipv6: enables IPv6 support on this network.
  • --subnet=xxxx:xxxx:xxxx:xxxx::/64: same as the above, but for IPv6.
  • --gateway=xxxx:xxxx:xxxx:xxxx::1: same as the above, but for IPv6.
  • docker_vlan: the name of the network we're creating.
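
You can confirm the network exists and double-check its settings with:

$ docker network inspect docker_vlan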

Using the new network

With the last step complete, we should have a network in Docker with the name docker_vlan. This network becomes part of your Docker config and will persist across reboots.

To use it, simply pass the argument --network=docker_vlan when executing a container from the CLI.
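For example, to spin up a throwaway container and see which address Docker handed it (alpine is just a conveniently small image):

$ docker run --rm --network=docker_vlan alpine ip addr show eth0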

If using docker-compose, you must specify the network as external. This is an example adapted from the recommended compose file for Gitea:

---
version: "3"

networks:
  docker_vlan:
    external: true

services:
  gitea:
    image: gitea/gitea:latest
    container_name: gitea
    hostname: gitea.docker # this is optional, but it will make sense in the next section
    restart: always
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro

    networks:
      docker_vlan:
        ipv4_address: 192.168.10.101            # optional, in case you want to have an IPv4 static address
        ipv6_address: xxxx:xxxx:xxxx:xxxx::101  # optional, in case you want to have an IPv6 static address

Note that we have eliminated the ports element. All ports exposed by the container are automatically accessible on the IP assigned, like on a VM or physical machine. Access rules can be defined in the router/gateway, like in a more traditional setup (but that is of course out of scope for this post).
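
As a quick sanity check, from another machine on your network (and assuming your router's filtering allows it), Gitea's web interface should answer directly on its default port, 3000:

$ curl http://192.168.10.101:3000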

What about DNS?

1. Name resolution inside the containers

DNS resolution does not differ from a standard setup and will still be handled by the Docker daemon, meaning the DNS servers configured on the host will be used. I believe this can be changed, but I see little point in doing it.
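If you do want to change it, the usual Docker options still apply; for example, overriding the upstream resolver for a single container with the standard --dns flag:

$ docker run --rm --network=docker_vlan --dns=1.1.1.1 alpine nslookup example.com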

2. Resolving the containers' names from your network

Now that our containers have their own addresses, we will want to access them somehow, but using a memorable name instead of an IP address.

This is possible thanks to the dns-proxy-server project. It provides a DNS server which will respond to queries for the hostname of the containers currently running (which is why hostname is set in the example above).

This is the compose file, adapted to work with our VLAN. Note that I'm using the domain .docker for this network, and it will become apparent why below.

---
version: "3"

networks:
  docker_vlan:
    external: true

services:
  dns-proxy-server:
    image: defreitas/dns-proxy-server
    container_name: dns-proxy-server
    restart: always
    hostname: dns-proxy-server.docker
    networks:
      docker_vlan:
        ipv4_address: 192.168.10.254
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Then on your router, configure a conditional forwarder rule for the domain .docker to resolve through this container. For a router running dnsmasq for example, this would be a config line containing server=/docker/192.168.10.254.
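
Once the forwarder is in place (or by querying the container directly), you can test resolution from any machine on the LAN:

$ dig @192.168.10.254 gitea.docker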

Fixing host <-> container communication

One big security feature of MacVLAN is that it blocks communication between the host's interface and its "child" MacV interfaces, so you will notice that your containers are not able to communicate directly with the host, even though they are on the same network. The same is true in the opposite direction: the host will not be able to reach any of the containers, which can be a problem if you're running, for example, Pi-hole as a container (the host wouldn't be able to resolve any names).

The workaround for this is weird but quite simple. On the host, create a new MacVLAN interface attached to the same parent interface (eth0.10 in our examples). From here you have two options:

  • Remove the IP address / disable DHCP on the "real" network interface and set it on the MacV interface instead. All of your host's traffic will now go through the MacV interface, and the containers will be accessible as normal.
  • Leave the original interface as is and set a new IP address on the MacV interface, like 192.168.10.253 (remember: Docker assigns addresses on this network incrementally, so picking one near the end will avoid/delay conflicts). Then add a route telling the host to use the MacV link when talking to the containers' subnet (see the sketch after this list). On the containers' side, if you need to reach the host, do it via the address of the MacV interface.
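
Here's a minimal, non-persistent sketch of the second option using iproute2; macvlan-shim is just a name I picked, and the addresses match the examples above. Depending on whether eth0.10 itself holds an address in this subnet, you may prefer per-container /32 routes over routing the whole /24:

$ sudo ip link add macvlan-shim link eth0.10 type macvlan mode bridge
$ sudo ip addr add 192.168.10.253/32 dev macvlan-shim
$ sudo ip link set macvlan-shim up
$ sudo ip route add 192.168.10.0/24 dev macvlan-shim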

Is this a good idea? Should I do it too?

No.

I have learned new things and had a ton of fun on the journey that led me to this setup, but it obviously introduces a lot of extra complexity; more than what I'd consider reasonable if you just want your containers to work. So, again: do it at your own risk.

Thanks for reading!

antsu
