Why Can't Rails Use UNIX Sockets With Containerized Postgres?

I wasn't paying attention while setting up a work Rails project recently (my level of Rails proficiency is firmly winging it) and I bumped into this:

# bundle manages ruby dependencies, rake is a ruby build tool
# this command is saying: hey, please run the task that sets up my database
$ bundle exec rake db:setup
connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such file or directory

I spend a good amount of time blowing up databases, sometimes on purpose but mostly by accident, so for my development workflow I tend to run instances of Postgres in containers.

Most projects run just fine: they dial localhost on port 5432 and the good times roll. In this instance, however, I was ignoring this project's README.md, which advocated a local install of the wonderful Postgres.app. Postgres runs as a server, which can be a little fussy in terms of setup, and Postgres.app simplifies the whole process into a standard macOS application.

Still: what was going on here? socket? /tmp/.s.PGSQL.5432? No such file? What is this word soup? Aren't we supposed to be connecting to a database? I had a perfectly good Postgres container primed and waiting on port 5432! No, I have not read the project's README!

Let's unwind a little. Rails conventionally stores settings relating to the database in config/database.yml, so that was a natural first destination. The section for host was commented out, and featured this context: Connect on a TCP socket. Omitted by default since the client uses a domain socket that doesn't need configuration.

The most straightforward thing to do for my workflow would be to uncomment that host line and set it to connect over TCP to localhost:5432. That's actually the only thing you can really do in this situation (read: when using macOS, and without some arcane trickery) but if you're anything like me you're ready to take an excruciatingly long (and fun, I hope!) journey to figure out why that is.
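To make those two options concrete, here's a rough Ruby sketch of how a Postgres client decides where to dial. connection_target is a made-up helper for illustration, not part of Rails or the pg gem. It assumes the client library's default socket directory is /tmp (that default is chosen at build time, so yours may differ) and mirrors the convention that a missing host, or a host starting with a slash, means a UNIX socket file named .s.PGSQL.<port>.

```ruby
# Hypothetical helper: models where a Postgres client will try to connect.
# Assumption: the client library's default socket directory is /tmp.
DEFAULT_SOCKET_DIR = "/tmp"

def connection_target(config)
  host = config["host"]
  port = config.fetch("port", 5432)

  if host.nil? || host.empty?
    # No host configured: fall back to a UNIX domain socket.
    "unix:#{File.join(DEFAULT_SOCKET_DIR, ".s.PGSQL.#{port}")}"
  elsif host.start_with?("/")
    # A host beginning with a slash is treated as a socket directory.
    "unix:#{File.join(host, ".s.PGSQL.#{port}")}"
  else
    # Anything else is a hostname or IP address reached over TCP.
    "tcp:#{host}:#{port}"
  end
end

puts connection_target({})                              # host commented out
puts connection_target({ "host" => "localhost" })       # host uncommented
puts connection_target({ "host" => "/run/postgresql" }) # explicit socket directory
```

With the host line commented out, the sketch lands on the same /tmp/.s.PGSQL.5432 path that showed up in the error message.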

Socket to Me

So, cool, let's dig deeper: we have a rake task that wants to set up a Rails application, and it needs to speak to something external to itself - in this instance, a database - in order to go about its business. If you want to sound smart, you'll of course call this something like interprocess communication.

Perhaps the most famous method of interprocess communication is the socket. A socket, in this context, is usually going to refer to sending data through the network, out into the magical world of the internet, and back again.

This is most commonly done using two nifty protocols: TCP and IP. That's what I was assuming our rake command would be using to reach out to Postgres - that's what we'd get if we connected to localhost:5432.

One super interesting detail of all this is that our applications aren't actually allowed to talk over a socket without getting permission: sitting between our socket and Ruby (or Python, JavaScript or any programming language) is going to be the operating system. And it's the operating system that does a lot of the heavy lifting.

macOS is a POSIX-compatible operating system, with POSIX being a set of standards for compatibility between operating systems. Linux is generally another one. POSIX operating systems use the Berkeley Sockets API for a standardised method of managing network connections.

So, we have to ask the OS:

Application code can't make sockets by itself, it needs to make a system call to the operating system

So here we have us, the user, executing code written in Ruby, which provides its own abstraction of the underlying Berkeley Sockets API. The most common Ruby interpreter is written in C - you'll almost certainly know if you're not using it - and its Socket class itself calls out, via POSIX C libraries, to the operating system kernel (the socket calls are just a few of the many possible system calls). Finally, the operating system establishes and maintains the socket connection. So, and I find this wonderful, our little unassuming chain of events ripples all the way through into the protected core of the operating system.

On macOS, we can use a program called dtruss (warning: setting up dtruss to work is a bit of a faff) to take a peek and see what system calls our applications are making. Here it is running for our bundle exec rake db:setup task.

	PID/THRD  SYSCALL(args) 		 = return

[...]

23078/0x3057d:  socket(0x1, 0x1, 0x0)		 = 10 0
23078/0x3057d:  fcntl(0xA, 0x3, 0x2)		 = 2 0
23078/0x3057d:  fcntl(0xA, 0x4, 0x6)		 = 0 0
23078/0x3057d:  fcntl(0xA, 0x2, 0x1)		 = 0 0
23078/0x3057d:  setsockopt(0xA, 0xFFFF, 0x1022)		 = 0 0
23078/0x3057d:  connect(0xA, 0x7F9B69ACE950, 0x6A)		 = -1 Err#2
23078/0x3057d:  close(0xA)		 = 0 0

Let's dive in and explore a couple of these calls: socket and connect.

socket is defined with the signature of int socket(int domain, int type, int protocol); you can probably look it up yourself on your machine with man 2 socket. We're calling it with a domain argument of 1, a type argument of 1 and a protocol of 0 (the dtruss output shows our numbers in hexadecimal format).

We can map those int values to constants in sys/socket.h (on my machine that file is located at /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/socket.h, but your mileage may vary) and see that the first two values correspond to AF_UNIX and SOCK_STREAM. The last value is set to 0, which asks the kernel to pick the default protocol for that domain and type; if we look 0 up in /etc/protocols, we see that it represents a generic catch-all internet protocol entry (ip).
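If digging through header files isn't your idea of a good time, Ruby exposes the same constants on its Socket class. This is a small sketch that decodes the hexadecimal arguments from the dtruss line; describe_socket_call is a name I've made up, and the numeric values are platform-specific (they happen to be 1 and 1 on both macOS and Linux).

```ruby
require "socket"

# Hypothetical helper: decode the arguments dtruss printed for
# socket(0x1, 0x1, 0x0) by comparing them with Ruby's Socket constants.
def describe_socket_call(domain_hex, type_hex, protocol_hex)
  domain   = domain_hex.hex    # String#hex understands the 0x prefix
  type     = type_hex.hex
  protocol = protocol_hex.hex

  {
    unix_domain:      domain == Socket::AF_UNIX,
    stream:           type == Socket::SOCK_STREAM,
    default_protocol: protocol.zero?  # 0 means "kernel, you choose"
  }
end

p describe_socket_call("0x1", "0x1", "0x0")
puts "AF_UNIX=#{Socket::AF_UNIX} AF_INET=#{Socket::AF_INET} SOCK_STREAM=#{Socket::SOCK_STREAM}"
```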

The return value from socket is shown as 10 0 - we're actually tracking two values, which feels a little cool and unfamiliar. The first is the return value from the syscall: 10. This is a reference to the file descriptor of the new socket, which we'll get to in a minute. The second return value is 0, which represents the global errno variable used by the kernel to add more context to system call failures. 0 means all good, which is great.

So we've now confirmed our expectation that the macOS kernel creates and manages a socket for our Ruby application. We can also see that we asked the operating system for a UNIX socket (AF_UNIX). These are also known as domain sockets, which is what database.yml mentioned. We said that sockets usually travel over the internet, but these UNIX sockets are slightly different: they send messages between processes directly via the operating system kernel itself. These bytes aren't even making it to our networking hardware.

Compare that to TCP/IP over AF_INET:

A side-by-side look at TCP/IP and UNIX socket stacks.

Regardless of their type, sockets lie dormant after being created, waiting for a connect call. The first argument we send to connect is an integer 10, which is the reference to the file descriptor - apologies, I know, I said we'd get to those and we haven't yet - of the socket we got earlier. The next argument is the memory address, 0x7F9B69ACE950, which points to a struct (of type sockaddr_un) with a member field (sun_path) containing the path to the /tmp/.s.PGSQL.5432 file. If we were initializing that struct directly in C, it might look like this:

#include <sys/socket.h>  /* AF_UNIX */
#include <sys/un.h>      /* struct sockaddr_un */

struct sockaddr_un my_cool_socket = {
    .sun_family = AF_UNIX,
    .sun_path = "/tmp/.s.PGSQL.5432"
};

The final argument to connect is the length, in bytes, of the aforementioned structure. Our file does not exist, so the connect call returns a -1 (the universal sign of failure) and it sets that global errno variable used by the kernel to 2.

What does 2 mean here? We can take a peep on macOS with man errno, which gives us this: 2 ENOENT No such file or directory. A component of a specified pathname did not exist, or the pathname was an empty string.

Sounds about right! Our program is trying to connect to the UNIX socket file as a client, and the lack of that file's existence means that there's no corresponding server to chat to; the Postgres we have running in a container is only exposing TCP/IP to our macOS system.
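We can reproduce the same failure without Rails or Postgres anywhere in sight. This is a minimal sketch using Ruby's low-level Socket API, which maps fairly directly onto the socket, connect and close calls in the dtruss output. try_unix_connect is a made-up helper, and the path lives in a temporary directory as a stand-in for /tmp/.s.PGSQL.5432.

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: try to connect to a UNIX socket path and return
# :connected, or the errno reported for the failed system call.
def try_unix_connect(path)
  sock = Socket.new(:UNIX, :STREAM, 0)         # socket(AF_UNIX, SOCK_STREAM, 0)
  sock.connect(Socket.pack_sockaddr_un(path))  # connect(fd, &sockaddr_un, len)
  :connected
rescue SystemCallError => e
  e.errno
ensure
  sock&.close                                  # close(fd)
end

Dir.mktmpdir do |dir|
  missing = File.join(dir, ".s.PGSQL.5432")    # nothing is listening here
  puts "connect failed with errno #{try_unix_connect(missing)}"
  puts "ENOENT is #{Errno::ENOENT::Errno}"
end
```

Ruby wraps each errno in its own exception class, so the 2 from dtruss surfaces as Errno::ENOENT.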

File Under Something

It is a common refrain to say that almost everything in UNIX is a file. The sockets we create with the socket syscall are files, our desired /tmp/.s.PGSQL.5432 UNIX domain socket path is a file and, you know, files we normally associate with being files are also files. The UNIX file is itself this kind of wonderful abstraction that allows us to (mostly) avoid the ins-and-outs of, say, how these files are actually stored as bytes.

lsof conjures up a beautiful list of all open files on the system. It's huge:

# wc -l counts the number of lines
$ sudo lsof | wc -l
   15413

In its default state, lsof looks at every file being used by every process. Processes are a pretty powerful abstraction used by the operating system to make computers work, and at a high level each process on the system represents a running program.

As part of running each process, the operating system kernel keeps track of the files allocated to it. Each and every process starts with three file descriptors - 0, 1 and 2 - which represent stdin, stdout and stderr. Right off the bat, this shows us that files don't have to correspond with our traditional mental model of a file.

File descriptors are like a ticket for the process to use; the operating system sits between the process and the files, and read and write commands have to go through it (via some more operating system calls). Processes go up to the operating system and ask to, say, read x number of bytes from the file represented by file descriptor y. The operating system then goes off, runs a series of checks, and then may return the result of that request if everything is tickety-boo.
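Ruby will happily show us these tickets. A small sketch, with descriptor_report being a name I've invented: it collects the descriptor numbers for the three standard streams, a regular temporary file and a freshly created UNIX socket. The exact numbers handed out for the last two depend on what else the process already has open.

```ruby
require "socket"
require "tempfile"

# Hypothetical helper: every open file, stream or socket is just a small
# integer as far as the process is concerned.
def descriptor_report
  report = {
    stdin:  STDIN.fileno,
    stdout: STDOUT.fileno,
    stderr: STDERR.fileno
  }

  # A plain old file on disk gets the next free descriptor.
  Tempfile.create("fd-demo") { |file| report[:regular_file] = file.fileno }

  # So does a socket, even though it never touches the disk.
  sock = Socket.new(:UNIX, :STREAM, 0)
  report[:unix_socket] = sock.fileno
  sock.close

  report
end

p descriptor_report
```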

Each process also gets its own process id (commonly called a pid) and we can look all those up with the ps command:

# -A displays processes belonging to all users, including those that aren't being used from the current terminal
$ ps -A | wc -l
     507

At this point, I've installed and started the aforementioned Postgres.app so we have a working reference point for what we should be seeing. With a Postgres server up and running locally, we can run ps and then we can grep through that:

$ ps -A | grep postgres
 7847 ??         0:54.81 /Applications/Postgres.app/Contents/Versions/14/bin/postgres -D /Users/martin/Library/Application Support/Postgres/var-14 -p 5432
 7849 ??         0:00.38 postgres: checkpointer
 7850 ??         0:06.02 postgres: background writer
 7851 ??         0:05.06 postgres: walwriter
 7852 ??         0:41.56 postgres: autovacuum launcher
 7853 ??         3:05.86 postgres: stats collector
 7854 ??         0:00.46 postgres: logical replication launcher
16014 ttys002    0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox postgres

Postgres runs a few processes, but at a guess the /bin/postgres one is probably the one we're looking for. So that's got a pid of 7847. We can feed that pid back to lsof:

# -P stops converting port numbers to port names (e.g. port 80 to http), -p filters by a PID
$ lsof -Pp 7847
COMMAND   PID   USER   FD    TYPE             DEVICE SIZE/OFF                NODE NAME
[...snip]
postgres  7847  martin 10u   unix             0xb49e813effdd307b  0t0        /tmp/.s.PGSQL.5432

If we peek into the pid of Postgres.app, we can indeed see a path for our /tmp/.s.PGSQL.5432 file. We can see in our TYPE column that it is, as we explored, a UNIX domain socket.

Our file descriptor integer for /tmp/.s.PGSQL.5432 associated with this process is marked as 10 (lsof suffixes it with a u, which means it's open for reading and writing).

/tmp/.s.PGSQL.5432 is a file, so it also conforms with standard UNIX file permissions:

# -l displays 'long mode': file mode, number of links, owner name, group name, number of bytes in the file, abbreviated month, day-of-month file was last modified, hour file last modified, minute file last modified, and the pathname.
$ ls -l /tmp/.s.PGSQL.5432
srwxrwxrwx  1 martin  wheel  0 10 Aug 21:40 /tmp/.s.PGSQL.5432

The srwxrwxrwx section is the file mode - the first character s shows that this is a socket, and the rwxrwxrwx shows that the socket can be read, written and executed for the owner, group and other users on the system; it's wide open. If it showed, say, srwx------ (700 in octal notation) then only the owner (in this example, martin) could interact with it.
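The same information is available from Ruby via File.stat. Here's a sketch that creates a throwaway socket in a temporary directory (so it doesn't trample on a real Postgres), locks it down to the owner, and reports its type and mode. socket_file_info is a hypothetical helper name.

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: describe a path roughly the way `ls -l` would.
def socket_file_info(path)
  stat = File.stat(path)
  {
    type:   stat.ftype,                        # "socket", "file", "directory"...
    socket: stat.socket?,
    mode:   format("%o", stat.mode & 0o777)    # permission bits in octal
  }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.sock")
  server = UNIXServer.new(path)   # binding creates the socket file

  File.chmod(0o700, path)         # owner-only, like srwx------
  p socket_file_info(path)

  server.close
end
```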

One thing that always blows my mind while simultaneously making perfect sense is that the file /tmp/.s.PGSQL.5432 here isn't actually hitting the disk to read and write bytes. Writing to the disk is much slower than keeping information in kernel memory. And, because you're not going through extra layers of networking, communication over UNIX sockets should technically be faster than using a localhost TCP/IP connection.

In the context of day-to-day local development, you will almost certainly not need that extra speed, most of the time...

... but just how much faster, though?

$ time client -network "tcp" -address "localhost:8080" -input "./war-and-peace.txt" -loop 10
5.73s user 9.54s system 66% cpu 22.862 total

$ time client -network "unix" -address "./sock.sock" -input "./war-and-peace.txt" -loop 10
1.55s user 2.82s system 73% cpu 5.955 total

$ time client -network "tcp" -address "localhost:8080" -input "./war-and-peace.txt" -loop 10 -users 10
49.94s user 120.26s system 194% cpu 1:27.39 total

$ time client -network "unix" -address "./sock.sock" -input "./war-and-peace.txt" -loop 10 -users 10
55.63s user 145.38s system 403% cpu 49.802 total

Note that I'm no expert in benchmarking, so this should not be considered especially authoritative. Still, this little client executable dials out to an echo server which beams back whatever it receives. The client is set to send and receive each line of the Project Gutenberg edition of Leo Tolstoy's War and Peace, because culture is important.
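The client and server used above aren't reproduced here, but the shape of the experiment is easy to sketch. This is not the benchmark I ran: it's a much smaller Ruby stand-in with an echo server per transport, a couple of thousand made-up lines instead of War and Peace, and helper names (start_echo_server, echo_round_trip) of my own invention. Treat any timings it prints as a toy.

```ruby
require "socket"
require "tmpdir"
require "benchmark"

# Accept connections on a listening socket and echo every line back.
def start_echo_server(server)
  Thread.new do
    loop do
      client = server.accept
      Thread.new(client) do |conn|
        while (line = conn.gets)
          conn.write(line)
        end
        conn.close
      end
    end
  rescue IOError, SystemCallError
    # The listening socket was closed: stop accepting.
  end
end

# Send each line, wait for the echo, and count the lines that came back intact.
def echo_round_trip(conn, lines)
  lines.count do |line|
    conn.write(line)
    conn.gets == line
  end
end

LINES = Array.new(2_000) { |i| "line #{i} of a much shorter book\n" }

Dir.mktmpdir do |dir|
  unix_server = UNIXServer.new(File.join(dir, "echo.sock"))
  tcp_server  = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick one
  [unix_server, tcp_server].each { |server| start_echo_server(server) }

  unix_time = Benchmark.realtime do
    UNIXSocket.open(unix_server.path) { |conn| echo_round_trip(conn, LINES) }
  end
  tcp_time = Benchmark.realtime do
    TCPSocket.open("127.0.0.1", tcp_server.addr[1]) { |conn| echo_round_trip(conn, LINES) }
  end

  puts format("unix: %.4fs  tcp: %.4fs", unix_time, tcp_time)
  [unix_server, tcp_server].each(&:close)
end
```

The only thing that changes between the two runs is the kind of socket; the echo loop and the client code are identical, which is rather the point of the Berkeley Sockets API.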

On my machine, looping through the book ten times with a single user took almost 74% less time on the UNIX domain socket, and having ten concurrent users loop through the book ten times apiece took 43% less time.
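For the curious, those percentages come from the "total" column of the time output and measure how much wall-clock time the UNIX socket run saved. A quick sketch of the arithmetic, with to_seconds and percent_less_time as helper names I've made up:

```ruby
# Convert the "total" column from `time` ("22.862" or "1:27.39") into seconds.
def to_seconds(total)
  total.split(":").map(&:to_f).reduce(0.0) { |acc, part| acc * 60 + part }
end

# Percentage of wall-clock time saved by the faster run.
def percent_less_time(slow, fast)
  slow_s = to_seconds(slow)
  fast_s = to_seconds(fast)
  ((slow_s - fast_s) / slow_s * 100).round(1)
end

puts percent_less_time("22.862", "5.955")    # single user: tcp vs unix
puts percent_less_time("1:27.39", "49.802")  # ten users: tcp vs unix
```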

Dock Around The Clock

So now we've been on a journey. We know that our Rails application, by default, is configured to use UNIX sockets to connect to Postgres for local development. We're also well over 2000 words into a blog post on something that could (and, again: should) have been fixed just fine by switching to TCP with a one-line change.

Now, we'll remove Postgres.app from this system and bring Docker back into the mix. We'll set up an ephemeral Postgres container:

$ docker run \
  --rm \
  --name my-cool-postgres-container \
  -e POSTGRES_PASSWORD=cool \
  -p 5432:5432 \
  -d \
  postgres:14

And, from inside that container, ls out the /tmp directory to look for the .s.PGSQL.5432 file:

$ docker exec -it my-cool-postgres-container /bin/bash
root@8759862222c5:/tmp# ls -al /tmp
total 8
drwxrwxrwt 1 root root 4096 Jun 23 14:27 .
drwxr-xr-x 1 root root 4096 Aug 16 06:17 ..

Oh. It's not there. Let's find it:

# 2>/dev/null redirects all errors to /dev/null, which is like a black hole
postgres@8759862222c5:/$ find / -name ".s.PGSQL.5432" 2>/dev/null
/run/postgresql/.s.PGSQL.5432

It is, perhaps counter-intuitively, in a different place inside the container. How confusing!

Running to the Postgres documentation shows that the configuration value we might be interested in is called unix_socket_directories, and the SHOW SQL command can be used to see the current setting.

$ docker exec -it my-cool-postgres-container psql -U postgres -c "SHOW unix_socket_directories"
 unix_socket_directories
-------------------------
 /var/run/postgresql
(1 row)

Another path! Now we're even more confused.

So, we've got the Postgres running inside the container saying it will save its UNIX sockets into /var/run/postgresql, but our find command spots it in /run/postgresql. And outside of the container, the Postgres running on macOS pops the socket in /tmp.

The documentation hints at what might be going on: The default value is normally /tmp, but that can be changed at build time. So the Postgres.app installation on macOS is presumably popping it in the default spot.

Now, let's head back inside the container. Instead of macOS, we're now running Linux. The typical convention on Linux points to /run as the home for UNIX sockets. But Postgres also configures itself to use /var/run, which (these days) has the same purpose as /run but is maintained for legacy reasons. The Filesystem Hierarchy Standard mentions that /var/run can be symlinked to /run, so let's take a look:

postgres@8759862222c5:/$ ls -al /var/run
lrwxrwxrwx 1 root root 4 Jun 22 00:00 /var/run -> /run

So, that (mostly) explains that!
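Here's the same idea in miniature. The sketch below builds a tiny fake filesystem in a temporary directory, with a var/run symlink pointing at run, and then resolves a path through it with File.realpath. The directory layout is a stand-in for the container's, and resolve is a made-up helper.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: follow any symlinks in a path, as the kernel does
# when a program opens it.
def resolve(path)
  File.realpath(path)
end

Dir.mktmpdir do |root|
  root = File.realpath(root)   # macOS temp directories sit behind a symlink themselves
  FileUtils.mkdir_p(File.join(root, "run", "postgresql"))
  FileUtils.mkdir_p(File.join(root, "var"))
  File.symlink("../run", File.join(root, "var", "run"))  # stand-in for /var/run -> /run

  configured = File.join(root, "var", "run", "postgresql")
  puts File.symlink?(File.join(root, "var", "run"))
  puts resolve(configured).delete_prefix(root)
end
```

Postgres asks for the /var/run flavour of the path, find reports the /run flavour, and both end up at the same directory.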

Now, we know where the socket file exists inside the container - so we can use a Docker bind mount to expose it to our macOS filesystem.

$ docker run \
  --rm \
  --name my-cool-postgres-container \
  -v /tmp/postgresql:/run/postgresql \
  -e POSTGRES_PASSWORD=cool \
  -d \
  postgres:14

We can see the file from the macOS host system, which is pretty cool.

But we still won't be able to connect to it. Womp womp.

Hitting Dock Bottom

We've established a couple of really important things:

UNIX sockets enable two separate processes to communicate via the operating system kernel
Socket files provide a namespace for both processes (the server and the client) to find each other

You may have intuitively felt this coming, but here it is: even with the file exposed from our container to our macOS host machine, a process on the host (rake) and a process inside a container (postgres) cannot speak to each other over a UNIX socket. We've already mentioned that the actual file isn't used to send the data and is instead managed by the kernel. And on macOS there are actually two different operating system kernels in play.

Docker on macOS quietly sets up its own virtual machine. It's deftly integrated into Docker Desktop, and it's almost completely invisible to the end user - you have to really go poking for it to see a trace in your system, but each of your containers is communicating with a (Linux) operating system kernel that is separate from the host system's.

What happens, then, is that the Postgres installation creates a UNIX socket which it registers with the Linux kernel inside this Linux VM.

You can expose the resulting .s.PGSQL.5432 file to the macOS kernel, but it will shrug it off as it has no recollection of that file. It doesn't mean anything to macOS - it hasn't received any of those vital system calls: as far as the macOS kernel is concerned it's just an empty file.
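We can get a feel for this on a single machine. The sketch below is only an analogy, not a reproduction of the Docker setup: it creates a socket file, connects to it, then closes the listener while leaving the file behind. The path is still there, but the kernel no longer has anything registered against it. probe is a made-up helper, and the exact error macOS reports for a bind-mounted socket from the VM may well differ from what this prints.

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: connect to a UNIX socket path and report either
# :connected or the class of error the kernel handed back.
def probe(path)
  UNIXSocket.new(path).close
  :connected
rescue SystemCallError => e
  e.class
end

Dir.mktmpdir do |dir|
  path = File.join(dir, ".s.PGSQL.5432")

  server = UNIXServer.new(path)
  p probe(path)            # a listener is registered with this kernel

  server.close             # the listener goes away, the file stays behind
  puts File.exist?(path)
  p probe(path)            # the name exists, but nobody is home
end
```

The file is just a name; without a kernel that knows about a listening socket behind it, the name alone gets you nowhere.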

Warning: Everything After This Point Is Awful

Now, there is an incredibly convoluted way we can get this communication up and running that will almost entirely defeat any real point of wanting to use a UNIX socket between a macOS host and a container. So, obviously, let's do it. But, you know, never actually do this.

We can use socat, which will let us proxy between TCP and UNIX domain sockets.

We're going to create another 'sidecar' container that will have access to the socket file that Postgres uses, and then create a TCP proxy that our macOS kernel can communicate with. Yup. That's frightfully awful, but also awesome.

Let's set it up. We'll create a Dockerfile, which installs socat and copies a bash script into an Alpine container. And then we'll make a compose.yml file to let docker compose do the heavy lifting. I won't go through the code line-by-line here, but you can dig into the associated GitHub repo if you'd like to see it in action or give it a whirl on your machine.

Our compose.yml will have two services, db and proxy. They'll share a volume so that the proxy service can interact with the UNIX domain socket that db sets up, and as they're both inside Docker they'll be within the virtual machine and therefore can chat to the same Linux kernel. The proxy service will then forward the UNIX socket to a TCP port (5433) which we'll expose to the macOS host.

Now, on the macOS host machine we need to create another socat connection which listens for connections to the UNIX socket at /tmp/.s.PGSQL.5432 and then proxies them over to the port we're sharing with our proxy container:

$ socat -d -d UNIX-LISTEN:/tmp/.s.PGSQL.5432,unlink-early,fork TCP:localhost:5433
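If you're wondering what socat is actually doing for us here, the core of it can be sketched in a few dozen lines of Ruby. This is emphatically not a replacement for socat (there's no unlink-early handling, no logging and no error reporting) and every helper name in it is invented. It listens on a UNIX socket, opens a TCP connection for each client, and shovels bytes in both directions. A local TCP echo server stands in for the port exposed by the proxy container.

```ruby
require "socket"
require "tmpdir"

# Copy bytes one way until the reader hits EOF or either side drops.
def pump(from, to)
  loop { to.write(from.readpartial(16 * 1024)) }
rescue IOError, SystemCallError
  # EOF or a dropped connection: stop forwarding in this direction.
ensure
  to.close_write rescue nil   # tell the other side no more data is coming
end

# Forward traffic in both directions between two connected sockets.
def proxy_connection(client, upstream)
  [Thread.new { pump(client, upstream) },
   Thread.new { pump(upstream, client) }].each(&:join)
ensure
  client.close
  upstream.close
end

# Listen on a UNIX socket and forward each connection to a TCP address.
def start_unix_to_tcp_proxy(socket_path, tcp_host, tcp_port)
  listener = UNIXServer.new(socket_path)
  Thread.new do
    loop do
      client = listener.accept
      Thread.new(client) { |c| proxy_connection(c, TCPSocket.new(tcp_host, tcp_port)) }
    end
  rescue IOError, SystemCallError
    # Listener closed: stop accepting.
  end
  listener
end

Dir.mktmpdir do |dir|
  # Stand-in for the TCP port shared by the proxy container (5433 above).
  upstream = TCPServer.new("127.0.0.1", 0)
  Thread.new do
    conn = upstream.accept
    while (line = conn.gets)
      conn.write(line)
    end
    conn.close
  end

  path = File.join(dir, ".s.PGSQL.5432")
  listener = start_unix_to_tcp_proxy(path, "127.0.0.1", upstream.addr[1])

  UNIXSocket.open(path) do |sock|
    sock.write("hello through the proxy\n")
    puts sock.gets
  end

  listener.close
  upstream.close
end
```

Every byte now crosses a UNIX socket, a userspace copy and a TCP connection, which is a decent hint about where the performance is about to go.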

But doesn't all this extra weight, going through TCP and back out again, ultimately give us extra overhead, make it less performant, and make all of this a largely pointless exercise? Oh, definitely!

... but just how much more pointless, though?

$ time client -network "unix" -address "/tmp/server.sock" -input "./war-and-peace.txt" -loop 1
1.20s user 1.75s system 3% cpu 1:25.29 total

$ time build/client -network "unix" -address "/tmp/server.sock" -input "./war-and-peace.txt"
11.05s user 15.93s system 3% cpu 13:23.49 total

Wow. We've gone from 0:49 to 13:23.49, a ~1500% increase.

Note that having to communicate through the Docker networking (and, by extension, the HyperKit VM) adds some overhead even without the socket proxying.

This is by no means a performant solution, then, but it was extremely fun to poke around the system, realise how deeply intertwined the operating system is with networking and ultimately to try and tease Docker into a non-standard configuration.

And, even better, if we go back to that original problem...

$ bundle exec rake db:setup
Created database 'backend_development'
Created database 'backend_test'

$ echo $?
0

Amazing! But I think just this one time I'll abandon the Rails doctrine and go with the one-line configuration over convention. Also: Rails is pretty cool! When I finally stopped going down this silly rabbit hole I had a really fun time using it for the aforementioned work project.