Understanding D3 Selections

On this page, D3 is available globally. Explore in your browser's development tools as you read: try d3.select("table")

D3 is a mature library with a rich history that's famously a bit scary. It's no wonder LLMs love it for data visualisation.

To draw an SVG with three circles based off some data, the convention is this:

const circleData = [
  { x: 100, y: 100, r: 50 },
  { x: 400, y: 100, r: 50 },
  { x: 700, y: 100, r: 50 }
];

const svg = d3.select("svg");

svg.selectAll("circle")
  .data(circleData)
  .join("circle")
  .attr("cx", d => d.x)
  .attr("cy", d => d.y)
  .attr("r",  d => d.r)
  .attr("fill", "#FBFBFF")

When I started learning D3, this API boiled my brain. If you've never used D3 before, read the code again and see how it resonates with you. If you have used D3 before, try to remember whether you stumbled on some of its aspects.

In my case, I really struggled with a few things:

Why do I have to selectAll an element I know doesn't exist in order to make new elements?
Why do I explicitly have to state I'm appending a circle in the join method when I've already mentioned I'm selecting circles in selectAll? Isn't that an unnecessary duplication?
D3 has select and selectAll, with the former grabbing the first matching DOM element and the latter selecting all matching elements. The join method is used for saying what you want to happen when data is bound (the circleData above). While the library won't explicitly error out, you will never see any example using join with select, only selectAll. Why? Shouldn't it be possible to select an element and join one item?

All of these relate to D3's data join, which is both its secret sauce and its trickiest quality. It's worth taking the time to understand what's going on and what I believe D3 is trying to achieve.

Why D3?

To understand the data join, let's start by thinking about why we'd want to use D3 in the first place. The Web APIs already let us create, select and manipulate elements in the DOM, with methods such as document.querySelector and document.querySelectorAll. Why not just use those?

⚠️ D3's remit is much, much broader than manipulating the DOM, but in this article when we say D3 we are going to be solely focused on d3-selection.

Consider the bar chart below. It's an <svg> populated with a few <rect> elements, with each bar showing the ⚽ goals scored by Jean-Philippe Mateta, Ismaïla Sarr, Eberechi Eze and Daniel Muñoz in the 2024/25 Premier League season. You can interact with the chart to alternate between versions created by D3 and the standard Web APIs.

🔎 Explore the raw dataset by inspecting the cpfcGoalscorers202425 object from your browser's development tools

You'll see two identical bar charts, both taking roughly 30 lines of code to create. The primary difference comes from associating the cpfcGoalscorers202425 data with the chart.

With the standard Web API, we iterate over the players, creating an SVG <rect> for each of them and appending that to a parent <g> element:

const g = document.createElementNS("http://www.w3.org/2000/svg", "g");

cpfcGoalscorers202425.forEach((player) => {
  const bar = document.createElementNS("http://www.w3.org/2000/svg", "rect");
  bar.setAttribute("fill", player.color);
  bar.setAttribute("x", player.x);
  bar.setAttribute("y", y(player.goals));
  bar.setAttribute("width", barWidth);
  bar.setAttribute("height", height(player.goals));

  g.appendChild(bar);
});

The y and height functions here are defined elsewhere, and map the values to the SVG coordinate space. We'll hand-wave those away as an implementation detail.
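For a sense of what those helpers might look like, here is a minimal sketch. The chartHeight and maxGoals values are assumptions made up for illustration, not the numbers used by the chart on this page. The one thing that matters is that SVG's y axis grows downwards, so a taller bar needs a smaller y.

```javascript
// Hypothetical dimensions: not taken from the chart on this page.
const chartHeight = 300; // height of the drawing area in SVG units
const maxGoals = 20;     // the value that would fill the full height

// Height of a bar in SVG units, proportional to the number of goals.
const height = (goals) => (goals / maxGoals) * chartHeight;

// SVG's origin is the top-left corner, so the top edge of a bar sits at
// the chart height minus the bar height.
const y = (goals) => chartHeight - height(goals);
```

In a real chart you would more likely reach for a scale from d3-scale, but the arithmetic underneath is the same idea.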

Let's compare this with D3. We now have its famous data and join methods, and an API which opts for a more declarative style. You first bind data to your selection, and then join the data to elements:

svg
  .append("g")
  .selectAll("rect")
  .data(cpfcGoalscorers202425)
  .join("rect")
  .attr("fill", (d) => d.color)
  .attr("x", (d) => d.x)
  .attr("y", (d) => y(d.goals))
  .attr("width", barWidth)
  .attr("height", (d) => height(d.goals))

On such a basic chart, there's not too much difference between the two. But the D3 API allows you to also consider a couple of extra states for your data:

What happens when new data has to enter the DOM, or old and unneeded DOM elements need to exit?
If you've got a lot of complex data, how do you link your source data to your DOM elements?

Let's go back to our goalscorers chart. We can turn it into an animated chart which shows cumulative total goals for Crystal Palace players across the 38 matches in the 2024/25 season:

This feels like a more complex arrangement but, as far as the DOM is concerned, this is still essentially the same as what we had before: an <svg> with a bunch of <rect> elements. The primary difference is the data - there's more of it, it changes, and we're using it in a more complex way. To achieve this with the vanilla Web APIs, we'd need to start taking on more complicated-sounding work. Chiefly:

Identifying when data has changed
Knowing which DOM element corresponds to which datum
What to do when new data turns up

Surprise: D3 does all of those things under the hood. In this chart, the code that looks after generating bars is very similar to how it looked before.

bars
  .data(data, (d) => d.player)
  .join(
    (enter) =>
      enter
        .append("rect")
        .attr("fill", (d) => colour(d.player))
        .attr("width", 54)
        .attr("x", (d, i) => x(i)),
    (update) => update
  )
  .attr("y", (d) => y(d.goals))
  .attr("height", (d) => height(d.goals));

The data method now takes an accessor function to provide a key, which lets D3 associate DOM elements to specific bits of data, and the join method now takes two functions: enter and update. The former creates a new <rect> element in the DOM when a player scores their first goal and joins the dataset, and the latter is the identity function. A third function can be provided for exit, but this isn't used here.

The y and height attributes are then set on the merged outputs of the enter and update selections, which is the selection that the join method returns. The x, y, colour and height functions are defined elsewhere, but these will return the relevant values for each attribute.

We have now stumbled upon the most sacred D3 concept: the magical data join. You'll often see it visualised as a Venn diagram:

Which is a nice representation for the following:

Data not bound to elements are sent to the enter function
Data already bound to elements are sent to the update function
Elements not bound to data are sent to the exit function
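
Those three rules can be sketched in plain JavaScript. This is a simplified model rather than D3's actual implementation: the dataJoin function, the plain objects standing in for DOM elements and the goal counts are all made up for illustration. It matches data to elements by key, in the same spirit as the accessor passed to the data method above.

```javascript
// Simplified model of a keyed data join; not D3's real implementation.
// `elements` are plain objects standing in for DOM nodes, each carrying
// the datum it was previously bound to.
function dataJoin(elements, data, key) {
  const elementsByKey = new Map(elements.map((el) => [key(el.datum), el]));
  const dataKeys = new Set(data.map(key));

  // Data with no matching element: these need new elements.
  const enter = data.filter((d) => !elementsByKey.has(key(d)));

  // Elements whose key is still in the data: rebind the fresh datum.
  const update = data
    .filter((d) => elementsByKey.has(key(d)))
    .map((d) => ({ ...elementsByKey.get(key(d)), datum: d }));

  // Elements whose key has gone from the data: candidates for removal.
  const exit = elements.filter((el) => !dataKeys.has(key(el.datum)));

  return { enter, update, exit };
}

// Hypothetical example with made-up values: two bars are on screen,
// then new data arrives.
const onScreen = [
  { tag: "rect", datum: { player: "Mateta", goals: 1 } },
  { tag: "rect", datum: { player: "Eze", goals: 1 } },
];
const nextData = [
  { player: "Mateta", goals: 2 },
  { player: "Sarr", goals: 1 },
];

const result = dataJoin(onScreen, nextData, (d) => d.player);
console.log(result.enter.map((d) => d.player));         // ["Sarr"]
console.log(result.update.map((el) => el.datum.goals)); // [2]
console.log(result.exit.map((el) => el.datum.player));  // ["Eze"]
```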

With this in mind - and, honestly, it's a lot - we can start addressing those original pain points.

Why `selectAll` elements that don't exist?

That's the data join at work! Let's revisit our original code:

svg.selectAll("circle")
  .data(circleData)
  .join("circle")
  .attr("cx", d => d.x)
  .attr("cy", d => d.y)
  .attr("r",  d => d.r)
  .attr("fill", "#FBFBFF")

We can now analyse what's going on with a bit more D3-specific panache:

selectAll("circle") returns an empty selection, as there were no <circle> elements nested inside the <svg> container.
The data method binds the array of data to the selection. Three new selections are created under the hood, each representing the enter, update and exit states. These selections can be empty. The update selection is returned; the selection remains aware of the enter and exit states for later use.
The join method operates on the enter, update and exit selections, passing them as parameters to the associated callback functions.
selectAll("circle") returned an empty selection, so the update and exit selections are empty. The enter selection callback operates on the supplied data, appending three <circle> elements to the <svg> container. The attr function accepts either a constant value or a function - if the latter is provided, D3 calls it for each element, passing that element's datum as the first parameter.
The join method returns a new selection resulting from merging the enter and update selections together.
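
To make those steps concrete, here is roughly what join("circle") does when written out by hand with selection.enter, selection.exit and selection.merge. The joinCircles name is made up for this sketch, and it assumes svg is a D3 selection such as d3.select("svg"). Treat it as an approximation of the shorthand rather than a copy of D3's internals.

```javascript
// Longhand sketch of `.join("circle")`. `svg` is assumed to be a D3
// selection, e.g. d3.select("svg"); `joinCircles` is a hypothetical name.
function joinCircles(svg, circleData) {
  // data() returns the update selection, which remembers enter and exit.
  const update = svg.selectAll("circle").data(circleData);

  // Enter: one new <circle> for each datum without an element.
  const enter = update.enter().append("circle");

  // Exit: remove elements that no longer have a datum.
  update.exit().remove();

  // Merge enter and update so the attributes are set on every circle.
  return enter
    .merge(update)
    .attr("cx", (d) => d.x)
    .attr("cy", (d) => d.y)
    .attr("r", (d) => d.r)
    .attr("fill", "#FBFBFF");
}
```

On this page you could try it from the development tools with joinCircles(d3.select("svg"), circleData).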

This might all feel a bit overblown - and, dare I say it, clunky - in the case of displaying static data once in the DOM. But D3's elegance comes from how it generalises extremely well across cases where there are multiple elements moving through the enter, update and exit states, which is often the case when veering into animated or interactive charts.

Why `append` a circle after selecting circles?

That's also the data join at work! The selectAll grabs the initial selection, but the append on the enter selection specifically describes what to do when there is data that's not bound to an element.

You could append a <rect> instead, of course. What would happen if you did?

Returning to the case of static data: nothing weird would happen. You'd get a nice little <rect> sitting inside your SVG container. But if you wanted to make something dynamic, well, when you re-ran the selectAll you would once again receive an empty selection. And once again the bound data would go through the enter state. And another beautiful <rect> would end up, perhaps unexpectedly, inside the SVG.
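
Here is a plain JavaScript model of that behaviour, with an array of tag names standing in for the children of the SVG. The render function is a made-up simplification of selectAll, data and join that binds by index, as D3 does when no key is given. It is not D3 code, and it leaves out the exit step to stay short.

```javascript
// Toy model: `container` is an array of tag names standing in for the
// children of an <svg>. Not D3 code; `render` is a hypothetical helper.
function render(container, data, selectTag, appendTag) {
  // selectAll(selectTag): only children with a matching tag are selected.
  const selected = container.filter((tag) => tag === selectTag);

  // Binding by index: data beyond the selected elements has to enter.
  const enterCount = Math.max(0, data.length - selected.length);
  for (let i = 0; i < enterCount; i++) {
    container.push(appendTag);
  }
  return container;
}

const data = [{ x: 100, y: 100, r: 50 }];

// Matching tags: the second render finds the existing circle and reuses it.
const matching = [];
render(matching, data, "circle", "circle");
render(matching, data, "circle", "circle");
console.log(matching); // ["circle"]

// Mismatched tags: selecting circles always comes back empty, so every
// render appends another <rect>.
const mismatched = [];
render(mismatched, data, "circle", "rect");
render(mismatched, data, "circle", "rect");
console.log(mismatched); // ["rect", "rect"]
```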

Press the button to see that happening:

Why can't you `select` an element and `join` one item?

Technically, you can if you're willing to fudge it. Sort of. But you shouldn't, because it's not right.

It mostly comes down to parents.

But first: semantics. The word data is an English language hot mess, and is usually used in both its singular and plural forms; the singular would be datum. Within the realm of D3 we should very much consider it plural. So it wouldn't really make sense to try and bind multiple bits of data to a singular selection. In language terms, this is a closed case.

But... why stop there?

Within D3, data (plural!) is bound to a selection. A selection is an array of arrays of DOM elements.

select and selectAll both return the same core Selection object, which contains fields for _groups and _parents. Imagine something like this:

export class Selection {
  constructor(groups, parents) {
    this._groups = groups;   // an array of groups, each an array of DOM elements
    this._parents = parents; // one parent element per group
  }

  // ... a bunch of methods
}

But select and selectAll are also a little sneaky. There are top-level selection functions, d3.select and d3.selectAll, which query the entire document and return a selection with one group, containing either the first matching element or all matching elements.

A selection also has its own select and selectAll methods, however, which allow for nested selections. These return new selections limited to descendants of the original selection. With selectAll, each element of the old selection becomes the parent of a new group, and the elements of that group are the parent's matching descendants. With select, the existing groups are kept, and each element is replaced by its first matching descendant.

Let's illustrate that with this table:

One	Two	Three
Four	Five	Six
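
The same table can be modelled in plain JavaScript to show how groups and parents are arranged. Everything here is an assumption made for illustration: the el helper, the plain objects standing in for DOM elements, and the topSelectAll, nestedSelectAll and nestedSelect functions are simplified stand-ins for d3.selectAll, selection.selectAll and selection.select, not D3's own code.

```javascript
// Toy elements: plain objects instead of real DOM nodes.
const el = (tag, text = "", children = []) => ({ tag, text, children });

const table = el("table", "", [
  el("tr", "", [el("td", "One"), el("td", "Two"), el("td", "Three")]),
  el("tr", "", [el("td", "Four"), el("td", "Five"), el("td", "Six")]),
]);
const html = el("html", "", [table]);

// Every descendant of `node` with the given tag, in document order.
function descendants(node, tag) {
  return node.children.flatMap((child) => [
    ...(child.tag === tag ? [child] : []),
    ...descendants(child, tag),
  ]);
}

// Models d3.selectAll(tag): a single group whose parent is the root.
function topSelectAll(root, tag) {
  return { groups: [descendants(root, tag)], parents: [root] };
}

// Models selection.selectAll(tag): every old element becomes the parent
// of a new group holding its matching descendants.
function nestedSelectAll(selection, tag) {
  const elements = selection.groups.flat();
  return {
    groups: elements.map((node) => descendants(node, tag)),
    parents: elements,
  };
}

// Models selection.select(tag): groups and parents are kept, and each
// element is swapped for its first matching descendant (or null).
function nestedSelect(selection, tag) {
  return {
    groups: selection.groups.map((group) =>
      group.map((node) => descendants(node, tag)[0] || null)
    ),
    parents: selection.parents,
  };
}

const rows = topSelectAll(html, "tr");

const cells = nestedSelectAll(rows, "td");
console.log(cells.groups.map((g) => g.map((td) => td.text)));
// [["One", "Two", "Three"], ["Four", "Five", "Six"]]
console.log(cells.parents.map((p) => p.tag)); // ["tr", "tr"]

const firstCells = nestedSelect(rows, "td");
console.log(firstCells.groups.map((g) => g.map((td) => td.text)));
// [["One", "Four"]]
console.log(firstCells.parents.map((p) => p.tag)); // ["html"]
```

You can compare this with the real thing in the development tools: d3.selectAll("tr").selectAll("td") and d3.selectAll("tr").select("td") expose the same shapes through their _groups and _parents fields.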

This becomes important when we start to think about the data join. Specifically, the enter selection.

In D3's data join, the parent of a new element created by the .join() method is determined by the selection it's called on. The selection.select() method returns a new selection where the group's parent is inherited from the original selection.

Often, this is the <html> element, because d3.select, when given a selector string, sets the parent to the <html> element (document.documentElement).

Let's break for a second here to check our understanding:

const data = [{ x: 100, y: 100, r: 25 }];

d3.select("svg")
  .select("circle")
  .data(data)
  .join("circle")
  .attr("cx", d => d.x)
  .attr("cy", d => d.y)
  .attr("r",  d => d.r)
  .attr("data-id", "the-missing-circle")

Given our new knowledge of how D3 sets a parent, and this code, where will the <circle> end up?

🔎 This code has been run on this page. You can investigate the DOM in your browser's development tools and search for the-missing-circle if you want to double-check your thinking, or if you don't believe me.

The answer is the <html> element. We can break down why:

The initial d3.select("svg") creates a selection with a single group, with its parent set to the <html> element.
The .select("circle") creates a new, empty selection that inherits the <html> parent.
We bind the data to the selection, creating an enter selection as there is no corresponding <circle> element for the data.
When the join method invokes the enter callback, D3 appends a <circle> to the parent element: <html>.

In contrast, selection.selectAll() creates a new selection where each group's parent is the element from the original selection. When you perform a data join, D3 knows to append the new elements as children of these specific group parents (e.g., appending <td> elements inside a <tr>). While you could contort the D3 API enough that a select would have a parent where something would display inside the DOM, it would never end up being quite the right parent.
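
The difference between the two parents can be modelled in a few lines of plain JavaScript. As before, this is a sketch under assumptions rather than D3's code: makeEl, enterAppend and the two hand-written selection objects are made up to mirror what select and selectAll would return for an <svg> with no <circle> inside it.

```javascript
// Toy elements: plain objects instead of real DOM nodes.
const makeEl = (tag) => ({ tag, children: [] });
const html = makeEl("html");
const svg = makeEl("svg");
html.children.push(svg);

// Hand-written models of the two selections when the <svg> is empty:
// select keeps the parent of the original selection (<html>), while
// selectAll uses the selected <svg> itself as the parent of its group.
const viaSelect = { groups: [[null]], parents: [html] };
const viaSelectAll = { groups: [[]], parents: [svg] };

// Simplified enter step, binding by index: each datum without an element
// gets a new element appended to the parent of its group.
function enterAppend(selection, data, tag) {
  selection.groups.forEach((group, j) => {
    data.forEach((datum, i) => {
      if (!group[i]) {
        selection.parents[j].children.push({ tag, children: [], datum });
      }
    });
  });
}

const data = [{ x: 100, y: 100, r: 25 }];

enterAppend(viaSelect, data, "circle");
console.log(html.children.map((c) => c.tag)); // ["svg", "circle"]

enterAppend(viaSelectAll, data, "circle");
console.log(svg.children.map((c) => c.tag)); // ["circle"]
```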

Joining the D3 Enlightened

We've explored why D3 exists in the first place, and how the data join is central to its magic. By binding data across three states - enter, update and exit - we gain a powerful general API to work with data in the DOM.

We'll create one final visualisation to cement everything together. In the grid below, we bind data for randomly selected squares on an endlessly repeating timer. We use the data join to bring it to life in the DOM. Squares in the enter selection appear from yellow, those in the update selection pulse green and those in the exit selection shrink to red.

With the power of D3 comes quite a low-level focus. There are certainly easier ways to make attractive basic charts - one of the D3 team's other projects, Plot, does just that.

And yet, I'd argue that understanding D3 at a slightly lower level is a great way to appreciate it more thoroughly. If you don't come from a statistical background, I find it helps you appreciate that side of things, too.

Not to mention you'll be able to correct the LLM when it makes the occasional mistake with the chart you just asked it to make. Here's to investigating with data!