D3 the Hard FP Way

In theory, the idea here is similar to when I was learning React/Redux and diving into SQL selects. In practice, I think most of D3’s complexity isn’t exactly in a direction that is elucidated by writing down types for everything, so the title is a mere personal snowclone. I’m just writing things out to an arbitrary amount of detail until I understand them and can refer to what I wrote here later.

Background

D3 is “a JavaScript library for visualizing data”. It has a lot of sublibraries that interoperate well but could be used separately — for example, it has utilities for manipulating colors, time, and SVG paths. Of the various concepts, though, I think D3 selections are the most distinctive and fundamental, so they are the focus of this post.

At a high level, D3 selections feel like jQuery. You run some code and it goes into the DOM and adds, deletes, and mutates a bunch of elements. The docs even endorse monkeypatching d3.selection to add custom helpers. However, D3 has data binding and batch operations that make it easy to change the DOM in a way that resembles reconciliation in a framework like React.

Selections

API: Selecting Elements.

A D3 selection holds an array of arrays of nullable¹ DOM elements. The intermediate arrays are called groups. Additionally, each group in a selection is associated with a parent node. During basic D3 usage, you might only ever work with selections with a single group and ignore parent nodes.

When relevant, I will call the index of an element inside its group the “within-group index” and the index of a group among all groups in a selection the “across-group index”.

  • d3.selectAll(someCssSelector) creates a selection with one group, which holds all elements matching some CSS selector. You can also pass an actual iterable of DOM elements if you have one. When you pass a selector string, the group’s parent is the root element, document.documentElement (normally the html element).
  • d3.select(someCssSelector) is the same except it only selects the first matching element (if any). You can also pass an actual DOM element.
  • d3.create(tagName) creates a selection with a single new (detached) element.

Given a selection sel:

  • sel.select(someCssSelector) creates a new selection, replacing each (non-null) element with its first descendant matching the selector if one exists (or null if not). (It also copies the original element’s datum, if it has one, to that descendant; see the data section.) Group structure and parent nodes are preserved.
  • sel.selectAll(someCssSelector) creates a new selection with one group for each element in sel. Each new group contains all descendants of the original element matching the selector, and its parent node is that element. (Therefore, the group structure of the original selection is flattened and lost.) I believe this is the most likely way to first get a selection with multiple groups and/or nontrivial parents.
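To make the group/parent bookkeeping concrete, here is a toy model in plain JavaScript. It is not D3’s implementation: the “DOM” is a tree of hypothetical {tag, children} objects, a “selector” is just a tag name, and empty array slots stand in for null elements. It shows select keeping groups and parents, and selectAll producing one group per element with that element as the new parent.

```javascript
// Toy model only: "elements" are plain objects {tag, children}, and a
// "selector" is just a tag name rather than a full CSS selector.
function descendants(node) {
  const out = [];
  for (const child of node.children) {
    out.push(child, ...descendants(child)); // depth-first, i.e. document order
  }
  return out;
}

// Like sel.select(tag): same groups and parents; each non-null element is
// replaced by its first matching descendant, or left as an empty slot.
function select(sel, tag) {
  const groups = sel.groups.map(group => {
    const out = new Array(group.length); // empty slots play the role of null
    group.forEach((node, i) => { // forEach skips empty slots, i.e. nulls
      const match = descendants(node).find(d => d.tag === tag);
      if (match) out[i] = match;
    });
    return out;
  });
  return { groups, parents: sel.parents };
}

// Like sel.selectAll(tag): one new group per non-null element, and that
// element becomes the new group's parent. The old grouping is lost.
function selectAll(sel, tag) {
  const groups = [];
  const parents = [];
  for (const group of sel.groups) {
    group.forEach(node => {
      groups.push(descendants(node).filter(d => d.tag === tag));
      parents.push(node);
    });
  }
  return { groups, parents };
}

// Usage: a table with two rows of cells.
const td = text => ({ tag: "td", text, children: [] });
const tr = (...cells) => ({ tag: "tr", children: cells });
const table = { tag: "table", children: [tr(td("a"), td("b")), tr(td("c"))] };

const rows = { groups: [table.children], parents: [table] };
const firstCells = select(rows, "td");  // one group [td a, td c], parent: table
const allCells = selectAll(rows, "td"); // groups [td a, td b] and [td c], parents: the rows
```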

Some ways to get non-selection info out of a selection:

  • sel.nodes() returns all non-null elements as a single flat array (the group structure is lost). Iterating over a selection yields the same elements.
  • sel.node() returns the first non-null element.
  • sel.empty() tests for emptiness.
  • sel.size() counts non-null elements.
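A small sketch of what those four accessors compute, modeling a two-group selection as arrays with empty slots for nulls (as in footnote 1). The strings are stand-ins for DOM elements; nothing here calls D3.

```javascript
// Two groups; empty slots stand in for null elements.
const groups = [["a", , "b"], [, "c"]];

// Visit groups in order, skipping null entries, and flatten.
function* nonNull(groups) {
  for (const group of groups) {
    for (let i = 0; i < group.length; i++) {
      if (group[i]) yield group[i];
    }
  }
}

const nodes = [...nonNull(groups)];          // like sel.nodes()
const node = nodes.length ? nodes[0] : null; // like sel.node()
const size = nodes.length;                   // like sel.size()
const empty = size === 0;                    // like sel.empty()
```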

Modifying elements

API: Modifying elements; handling events.

Many methods on selections are just for-each operations that modify each element in each group in some way, and then return the same selection to allow you to chain calls. For example, sel.style("color", "red") will set the CSS color property of each element in that selection to red (and then return the same selection). Such methods include:

  • .attr (name and value)
  • .classed (class name and value = boolean, for whether to add or remove the class)
  • .style (name and value)
  • .property (name and value) (e.g. text fields’ value, checkboxes’ checked)
  • .text (value)
  • .html (value)

The above methods can also be used as getters if you don’t pass a value, but they just return the value for the first non-null element (rather than some kind of data structure of values with the same shape as the selection).
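A minimal sketch of that setter/getter convention, using a hypothetical MiniSelection class over plain objects instead of DOM elements: the setter writes to every non-null element and returns this, while the getter reads only the first non-null element.

```javascript
class MiniSelection {
  constructor(groups) {
    this.groups = groups; // array of arrays of {attrs} objects (or empty slots)
  }

  attr(name, value) {
    if (arguments.length < 2) {
      // Getter: only the first non-null element is consulted.
      for (const group of this.groups) {
        for (const el of group) {
          if (el) return el.attrs[name];
        }
      }
      return undefined;
    }
    // Setter: a for-each over every non-null element...
    for (const group of this.groups) {
      for (const el of group) {
        if (el) el.attrs[name] = value;
      }
    }
    return this; // ...that returns the same selection so calls can chain.
  }
}

const a = { attrs: {} };
const b = { attrs: { fill: "blue" } };
const sel = new MiniSelection([[, a], [b]]);
sel.attr("fill", "red").attr("r", 5);

b.attrs.r = 7;
sel.attr("r"); // 5: the getter only looks at `a`, the first non-null element
```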

Some more interesting methods modify the actual node structure of the DOM:

  • .append(tagName) appends an element of that tag name to each (non-null) element in the selection and returns a new selection with the added elements, in the same group structure and with the same data.
  • .remove() removes each element from the DOM.
  • .raise() resp. .lower() moves each element to the end resp. the beginning of its parent’s children. The names make sense when the elements are absolutely positioned and visually overlap: “raising” an element gives it a “higher z-index”.
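A sketch of what raise and lower do to sibling order, treating a parent’s children as a plain array (a toy stand-in for the DOM). In SVG, later siblings paint over earlier ones, which is why moving an element to the end reads as “raising” it.

```javascript
// Move `el` to the end of its parent's child list (painted last, so on top).
function raise(children, el) {
  const i = children.indexOf(el);
  if (i !== -1) children.push(children.splice(i, 1)[0]);
  return children;
}

// Move `el` to the start of its parent's child list (painted first, so underneath).
function lower(children, el) {
  const i = children.indexOf(el);
  if (i !== -1) children.unshift(children.splice(i, 1)[0]);
  return children;
}

const circles = ["red", "green", "blue"];
raise(circles, "red");  // ["green", "blue", "red"]
lower(circles, "blue"); // ["blue", "green", "red"]
```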

Adding event handlers is similar: .on(eventName, handler). The handler will be called with two arguments, the event itself and the element’s current datum (see below), and will have this set to the DOM element the listener was attached to (event.currentTarget, which can differ from event.target when the event bubbled up from a descendant).
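A sketch of that calling convention using the standard EventTarget and Event classes (available in browsers and recent Node.js) instead of a real DOM element. The on helper is hypothetical and only mimics the convention; it is not D3’s code.

```javascript
// Hypothetical helper: handler(event, datum), with `this` set to the element
// the listener is attached to. The datum is read when the event fires.
function on(el, type, handler) {
  el.addEventListener(type, event => handler.call(el, event, el.__data__));
}

const el = new EventTarget();
el.__data__ = { name: "point A" };

let seen = null;
on(el, "click", function (event, d) {
  // A regular function (not an arrow) so that `this` is the bound element.
  seen = { self: this, type: event.type, name: d.name };
});
el.dispatchEvent(new Event("click"));
```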

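sel.each(f) calls f once for each non-null DOM element, with the same arguments and this binding as the modifier functions described below (datum, within-group index, and group, with this being the element). Inside such a function, you could d3.select(this) to get back into selection mode, though for that f must be a regular function rather than an arrow function, since arrow functions ignore the this binding. Finally, sel.call(f, ...args) is just f(sel, ...args), except that it returns sel, so you can continue long method chains.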
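A toy sketch of .each and .call on a single group of plain objects (the makeSelection helper is hypothetical, not part of D3): each passes (datum, index, group) and binds this to the element, while call simply hands the selection itself to a function and returns it for chaining.

```javascript
function makeSelection(group) {
  return {
    group,
    each(f) {
      this.group.forEach((el, i, g) => {
        if (el) f.call(el, el.__data__, i, g); // `this` inside f is the element
      });
      return this;
    },
    call(f, ...args) {
      f(this, ...args); // exactly f(sel, ...args)
      return this;      // then keep chaining
    },
  };
}

const els = [{ __data__: 10 }, { __data__: 20 }];
const log = [];
const sel = makeSelection(els)
  .each(function (d, i) {
    // Must be a regular function: an arrow function would ignore the bound `this`.
    log.push(`${i}:${d}:${this === els[i]}`);
  })
  .call((s, label) => { s.label = label; }, "points");
```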

Binding data

API: Joining data. How Selections Work has helpful visualizations, though it’s old, so its API references have bit-rotted.

Every DOM element may have a __data__ property that D3 pays attention to. (Reading comprehension note for the rest of this post: “data” is the plural of “datum”, but each __data__ property is one datum. If I could selfishly redesign D3 to maximize how intuitive it is to me personally, I would have renamed the property to __datum__.) This is a plain ol’ JavaScript property with no significance in web standards! D3 actually just writes to or reads from element.__data__. You can do this yourself — literally grab an element in the browser inspector and look at element.__data__ — to debug your code, which is neat. D3 doesn’t store data anywhere else.

Generally, the above methods on selections also take a function that receives the datum as an argument. sel.style("color", d => d.color) is roughly like element.style.color = element.__data__.color; for each element. (The function also gets two more arguments, the within-group index and the entire group itself as an array, and is called with this being the current DOM element.)

A simple but less often useful way to set data is with sel.datum(value), which writes each element’s datum directly, with no matching step: every element gets the same value. value can also be a function, which gets called with the same arguments as the modifier functions above (current datum, within-group index, and entire group), and whose return value becomes that element’s datum. As a getter, sel.datum() gets the datum for the first non-null element.
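A sketch of how a function-valued argument is evaluated per element, and of datum as a setter, on one group of plain objects. The styleEach and setDatum helpers are hypothetical; D3 itself writes to element.style and element.__data__ on real DOM nodes.

```javascript
// Constants are used as-is; functions are called with
// (datum, within-group index, group) and `this` = the element.
function valueFor(el, value, i, group) {
  return typeof value === "function" ? value.call(el, el.__data__, i, group) : value;
}

// Roughly sel.style(name, value) for one group.
function styleEach(group, name, value) {
  group.forEach((el, i, g) => {
    if (el) el.style[name] = valueFor(el, value, i, g);
  });
}

// Roughly sel.datum(value) for one group: no matching, just a per-element write.
function setDatum(group, value) {
  group.forEach((el, i, g) => {
    if (el) el.__data__ = valueFor(el, value, i, g);
  });
}

const group = [{ style: {} }, { style: {} }];
setDatum(group, (d, i) => ({ color: i === 0 ? "red" : "blue" }));
styleEach(group, "color", d => d.color); // like element.style.color = element.__data__.color
styleEach(group, "opacity", 0.5);        // a constant goes to every element
```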

Usually, though, one gets data onto DOM elements with the sel.data(data, keyfunc) method. Before proceeding, I’ll mention that calling sel.data() with no arguments produces an array with all data in the selection, but the group structure is flattened.

The .data function

sel.data(data, keyfunc) works as follows. For each group in the selection:

  • Produce an array of data:

    • If data is an array, it’s just a copy of that.
    • If data is a function, it’s called (once per group) with three arguments: the group’s parent node’s datum, the across-group index, and an array of the selection’s parent nodes. The call will be made with this being the group’s parent element. This should return an array.
  • Attempt to match each DOM element in the group with an element of the data using keyfunc.

    • If keyfunc is passed, the basic idea is that element matches datum if String(keyfunc(element.__data__)) === String(keyfunc(datum)). (keyfunc is called with more arguments, starting with an index within the group or the data array, but look them up yourself.) However, if there are duplicate keys on either side, all appearances after the first don’t match anything.
    • If keyfunc is not passed, the elements and the data are just matched in the order they appear in their respective arrays. If one array is longer, the elements or data at the end of the longer one aren’t matched.
  • After matching, there may or may not be leftover, unmatched elements and/or data. Imagine a Venn diagram. Each element and datum falls into one of three categories:

    • The update selection comprises the (equally many) elements and data that match each other.
    • The exit selection comprises elements that didn’t match any data.
    • The enter selection comprises data that didn’t match any elements. This is not a normal “selection” as we’ve defined it; instead of DOM elements, it contains placeholder “elements” that are objects with a __data__ property and a reference to a position in the DOM (right before some other element). You can imagine them as empty React.Fragment elements at those locations.

      Just about the only useful thing you can do with an enter selection is to call .append(tagName) on it, which will insert elements at those positions, propagate the __data__ from the placeholder element, and leave you with a normal selection. Even this won’t work if the group’s parent node was not actually the parent node of the group’s constituent DOM elements.

    The ordering/position of elements in the update and enter selections is determined by the ordering/position of their corresponding data in data (so both may have nulls).
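A simplified model of the matching step for a single group, written in plain JavaScript over objects with a __data__ property. It is a sketch under stated assumptions, not D3’s source: the key function only receives the datum (D3 passes more arguments), enter placeholders are bare {__data__} objects without the insertion-point bookkeeping, and empty array slots stand in for null.

```javascript
// Match one group's elements (objects with __data__) against a data array.
// Returns three arrays; empty slots play the role of null.
function bindGroup(nodes, data, key) {
  const update = new Array(data.length);
  const enter = new Array(data.length);
  const exit = new Array(nodes.length);

  if (!key) {
    // No key: pair elements and data by position.
    for (let i = 0; i < data.length; i++) {
      if (nodes[i]) {
        nodes[i].__data__ = data[i];
        update[i] = nodes[i];
      } else {
        enter[i] = { __data__: data[i] }; // placeholder for a future element
      }
    }
    for (let i = data.length; i < nodes.length; i++) {
      if (nodes[i]) exit[i] = nodes[i];
    }
    return { update, enter, exit };
  }

  // Keyed: keys compare as strings; the first element with a key claims it.
  const byKey = new Map();
  nodes.forEach((node, i) => {
    if (!node) return;
    const k = String(key(node.__data__));
    if (byKey.has(k)) exit[i] = node; // duplicate element key: never matched
    else byKey.set(k, i);
  });

  data.forEach((d, i) => {
    const k = String(key(d));
    if (byKey.has(k)) {
      const j = byKey.get(k);
      byKey.delete(k); // a later datum with the same key will go to enter
      nodes[j].__data__ = d;
      update[i] = nodes[j];
    } else {
      enter[i] = { __data__: d };
    }
  });

  // Elements whose keys no datum claimed.
  for (const j of byKey.values()) exit[j] = nodes[j];
  return { update, enter, exit };
}

// Elements with data 1..4 rebound to [4, 0, 2], keyed by value.
const nodes = [1, 2, 3, 4].map(d => ({ name: `node${d}`, __data__: d }));
const { update, enter, exit } = bindGroup(nodes, [4, 0, 2], d => d);
```

The three arrays it returns line up with the update, enter, and exit selections listed in the worked example below.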

You can use the return value of .data directly as a normal selection (e.g. by chaining more methods onto it), and it’ll behave like the update selection. This might be all you need if you know that the elements and the data match perfectly.

Otherwise, you can call .enter() and .exit() on the return value of .data(...) to obtain the enter and exit selections. (These methods don’t do anything interesting on other selections.) In standard usage of .data(), your goal would likely be to insert and delete elements until the elements match the data, similar to React reconciliation. The most straightforward way to do this, in the easy case where the update selection elements are already in order, would be to call .enter().append(tagName) on the return value of .data(...) to create elements for the unmatched data, and .exit().remove() to delete the unmatched elements.

If you want to then keep chaining methods on the elements after insertion and modification, you can combine the update and enter selections with .merge. (This function does not work in general to combine selections. What it does is replace null elements in the first with elements in the exact same position, across and within groups, from the second.)
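The position-by-position behavior of .merge can be sketched on plain arrays (one array per group, empty slots for null). This mirrors the description above rather than reproducing D3’s source; the element names come from the worked example below.

```javascript
// For each group of the first selection, keep its non-null entries and fill
// its null slots from the same position in the matching group of the second.
// The group structure and lengths follow the first selection.
function mergeGroups(groups0, groups1) {
  return groups0.map((group0, j) => {
    const group1 = groups1[j] || [];
    const merged = new Array(group0.length);
    for (let i = 0; i < group0.length; i++) {
      const node = group0[i] || group1[i];
      if (node) merged[i] = node;
    }
    return merged;
  });
}

const update = ["node4", , "node2"]; // update group with a hole at index 1
const entered = new Array(3);
entered[1] = "node0";                // the newly appended enter element
const merged = mergeGroups([update], [entered]);
```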

If the update selection nodes are not already in order according to the data, and you want them to be, you’ll finally need to call .order() on the merged selection.
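Here is a sketch of what .order() does, modeled on the loop I believe D3 uses: walk each group backwards and move any element that is not already before its successor to just in front of it. The parent’s children are a plain array here instead of a DOM node list.

```javascript
// Reorder `children` (the parent's child list) so that the non-null elements
// of `group` appear in group order. Children not in the group are never moved.
function order(children, group) {
  let next = null;
  for (let i = group.length - 1; i >= 0; i--) {
    const node = group[i];
    if (!node) continue;
    if (next !== null && children.indexOf(node) > children.indexOf(next)) {
      children.splice(children.indexOf(node), 1);       // detach node...
      children.splice(children.indexOf(next), 0, node); // ...reinsert before next
    }
    next = node;
  }
  return children;
}

// DOM order after the enter and exit steps of the worked example below,
// reordered to match the merged selection.
const children = ["node0", "node2", "node4"];
order(children, ["node4", "node0", "node2"]);
```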

You can perform all the steps after .data() — insertion, deletion, modification, and reordering — in one function call with join. In full generality, join takes three arguments, which are functions to call on the enter selection, update selection, and exit selection.

One reason to write three separate handlers is to use D3 transitions to animate elements appearing/disappearing. See below or the selection.join notebook. However, in the super common pattern above, you can pass a tag name string as the first argument (e.g. .join("li")) and omit the second and third arguments; the enter selection then gets elements of that tag appended, and the exit selection is removed.
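As a rough mental model (not D3’s implementation), a keyed .join("li") on a single list behaves like the reconciliation function below: reuse elements whose keys survive, create elements for new keys, drop the rest, and leave everything in data order. The joinList and li names are hypothetical.

```javascript
// parent.children: array of objects with __data__. Returns the new children.
function joinList(parent, data, key, makeEl) {
  const old = new Map();
  for (const el of parent.children) {
    const k = String(key(el.__data__));
    if (!old.has(k)) old.set(k, el); // later duplicates are never reused
  }
  const next = data.map(d => {
    const k = String(key(d));
    const el = old.get(k);
    if (el) {
      old.delete(k); // update: reuse the element and rebind its datum
      el.__data__ = d;
      return el;
    }
    return makeEl(d); // enter: create a new element for this datum
  });
  // exit: anything not reused is dropped; "order": data order wins.
  parent.children = next;
  return next;
}

const li = d => ({ tag: "li", __data__: d });
const ul = { children: [1, 2, 3, 4].map(li) };
const [node2, node4] = [ul.children[1], ul.children[3]];
joinList(ul, [4, 0, 2], d => d, li);
```

This collapses enter.append, exit.remove, merge, and order into one pass, which is why the string form of join covers the common case.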

Some worked examples and pitfalls

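To create a <ul> with four <li>s with data 1, 2, 3, 4 and text equal to their data, you can take a selection of the <ul>, call .selectAll("li") on it, bind the data with .data([1, 2, 3, 4]), call .join("li"), and finish with .text(d => d).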

The selectAll doesn’t select anything and will produce a selection with a single empty group, but it is necessary to promote the ul to the parent node of that group. It doesn’t matter what selector you pass into selectAll, but passing the same tag name you’ll pass to .join later is idiomatic since it makes that part of the method chain idempotent. (With d3-jetpack, you can append many nodes and bind data in one step with .appendMany("li", [1, 2, 3, 4]).)

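Now, suppose sel is the above selection: a group with four nodes node1, node2, node3, node4 that have data 1, 2, 3, 4, respectively. If we rebind it to the data [4, 0, 2], keyed by the values themselves (a call like sel = sel.data([4, 0, 2], d => d)), then: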

  • The update selection will have nodes [node4, <empty>, node2].
  • The enter selection will have nodes [<empty>, (fake node 0), <empty>]. This is an implementation detail, but the fake node will reference node2, because 0 comes right before 2 in the new data; that is how it knows where a “real node 0” should be inserted. Calling .enter().append(...) would insert a node with datum 0 there.
  • The exit selection will have nodes [node1, <empty>, node3, <empty>].

If we then ran sel.enter().append("li"), the DOM would have five <li>s: node1, a new node0 with datum 0, node2, node3, node4.

If we then ran sel.exit().remove(), the DOM would have three <li>s: node0, node2, node4, with data 0, 2, 4. This matches the data we passed to sel.data, except that it’s in the wrong order.

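If we finally merged the update selection with the selection of newly appended elements (.merge) and called .order() on the result, the DOM elements would be rearranged in the order we want: node4, node0, node2. (Although node0 has datum 0, it won’t have any text; we never set it.)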

And again, those last three steps are together equivalent to sel.join("li"), so we probably wouldn’t think about enter and exit selections manually unless we wanted transitions or something fancier.

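Finally, the same pattern nests, which is how you build a table out of two-dimensional data: join the rows of the data to <tr> elements, then call .selectAll("td") on the resulting selection (one group per <tr>, with that <tr> as the parent) and pass .data a function like d => d, which receives each row’s datum and returns that row’s array of cells.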
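The two-level join can be modeled without a DOM: the second-level data argument is a function called once per row group with that row’s datum, as described in the .data section. The objects below are toy stand-ins for <table>, <tr>, and <td>; in D3 itself the same shape is commonly written along the lines of table.selectAll("tr").data(matrix).join("tr").selectAll("td").data(d => d).join("td").text(d => d).

```javascript
const matrix = [
  [1, 2, 3],
  [4, 5, 6],
];
const table = { tag: "table", children: [] };

// Level 1: one group (parent = table), one <tr> per row of the matrix.
table.children = matrix.map(row => ({ tag: "tr", __data__: row, children: [] }));

// Level 2: selectAll("td") makes one group per <tr>. A data *function* is
// called once per group with (parent datum, group index, parents) and
// `this` set to the parent, so d => d hands each row its own array.
const cellData = d => d;
table.children.forEach((tr, j, parents) => {
  const data = cellData.call(tr, tr.__data__, j, parents);
  tr.children = data.map(d => ({ tag: "td", __data__: d, text: String(d) }));
});
```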

d3-transition

d3-transition animates changes to the DOM, which is useful for writing interesting .join enter/update/exit handlers.

A d3.transition instance holds a D3 selection, information about the transition’s timing, and information about the transition “destination” (what attributes and such the transitioning elements should end up with), all of which are configurable. Once a transition instance is created, it will be scheduled to begin in the next frame, so you must finish configuring it immediately and synchronously. You can’t create or half-configure a transition and then start it later.

The simplest way to obtain a transition is to call .transition() on a selection.

For configuring timing, you can set an initial delay transition.delay(n) or total duration transition.duration(n) in milliseconds, as well as an ease function transition.ease(e), which will most likely be an ease function from d3-ease (possibly after further configuration itself). These methods can also take functions, which are called and used the same way as selections’ modification methods.

Simple methods for configuring the destination, .attr and .style, work just like the corresponding selection methods, though there are “tween” methods for finer control. Calling .remove() on a transition removes its elements when the transition ends.

Calling .transition() on another transition creates a transition scheduled to start when that one ends.
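A toy timing model, not d3-transition’s scheduler, showing how delay, duration, and easing combine, and how a chained transition starts where the previous one ends. The 250 ms default duration and the cubic in-out curve match D3’s defaults as far as I know; everything else is simplified (one shared clock, numbers only).

```javascript
// Cubic in-out easing: slow start, fast middle, slow end.
function cubicInOut(t) {
  return ((t *= 2) <= 1 ? t * t * t : (t -= 2) * t * t + 2) / 2;
}

// Eased progress in [0, 1] at `elapsed` ms after the transition was scheduled.
function progress(elapsed, { delay = 0, duration = 250, ease = cubicInOut } = {}) {
  const t = Math.min(1, Math.max(0, (elapsed - delay) / duration));
  return ease(t);
}

// Like d3.interpolateNumber: a at t = 0, b at t = 1.
const lerp = (a, b, t) => a * (1 - t) + b * t;

const first = { delay: 100, duration: 500 };
// A chained transition (first.transition()) starts when `first` ends.
const second = { delay: first.delay + first.duration, duration: 300, ease: t => t };

const xAt = ms => lerp(10, 30, progress(ms, first)); // e.g. an "x" attribute
```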

Other sublibraries, briefly

Since there are twenty-odd libraries, I decided to just describe a random subset of them that caught my eye.

A high-level description of D3’s API style: many D3 library functions return other functions that can be directly called, but that also may have attached setters. The setters mutate the original function-object and return it so that setters can be chained. Many setters also work as getters if called with no arguments.
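A minimal sketch of that style, using a hypothetical linearScale that imitates the shape of d3.scaleLinear (two-element domain and range only, no clamping or interpolator support): the returned value is a callable function with setter/getter methods attached.

```javascript
function linearScale(domain = [0, 1], range = [0, 1]) {
  // The scale itself is just a function of one value.
  function scale(x) {
    const [d0, d1] = domain;
    const [r0, r1] = range;
    return r0 + ((x - d0) / (d1 - d0)) * (r1 - r0); // extrapolates outside the domain
  }
  // Setters mutate the closure and return the same function, so they chain;
  // with no argument they act as getters.
  scale.domain = function (d) {
    if (!arguments.length) return domain.slice();
    domain = d.slice();
    return scale;
  };
  scale.range = function (r) {
    if (!arguments.length) return range.slice();
    range = r.slice();
    return scale;
  };
  // Map a range value back into the domain.
  scale.invert = function (y) {
    const [d0, d1] = domain;
    const [r0, r1] = range;
    return d0 + ((y - r0) / (r1 - r0)) * (d1 - d0);
  };
  return scale;
}

const s = linearScale([1, 2], [100, 200]);
const t = linearScale().domain([0, 10]).range([0, 100]);
```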

  • d3-array has a lot of simple array-manipulation utilities. You might find some of these in other utility libraries, but if you’re already using D3, might as well use its versions. Examples:
    • d3.min, d3.max, d3.extent (returns a two-element array [min, max]), d3.sum, d3.mean, all of which take an iterable and ignore undefined, null, and NaN.
    • d3.zip(...arrays)
    • d3.cross(...iterables) (Cartesian product)
    • d3.pairs(iterable) gives (overlapping) pairs of adjacent elements
    • d3.range(...) is basically Python’s range.
  • d3-color manipulates colors. Each D3 color object is tied to a specific color space or representation, so you have to explicitly ask D3 to convert between them. When coerced to strings, colors turn into rgb(...) strings you can use in CSS.
  • d3-interpolate interpolates between numbers, colors, and other data types. In general, d3.interpolate(foo, bar) is an “interpolator”, a function that’s foo at 0, bar at 1, and something in between for inputs in between (and sometimes even extrapolates below 0 or above 1). d3.interpolate itself might be called an “interpolator factory”; different color interpolator factories using different color spaces exist.
  • d3-scale works with scales. A scale is a JavaScript function that maps data values from some domain (e.g. numbers, time, temperature) to values used for some aspect of visualization (e.g. numbers, color, length/position), though scales also have a bunch of extra functions.

    • Linear scales are probably the simplest. Example: d3.scaleLinear([1, 2], [100, 200]) linearly maps the interval [1, 2] to the interval [100, 200], i.e. it multiplies by 100. (Linear scales are simple enough to extrapolate outside their domain.) You can also write this more explicitly as d3.scaleLinear().domain([1, 2]).range([100, 200]). You can specify more than two numbers in the domain and range to produce a piecewise linear scale. There are also many similar numeric scales like d3.scaleLog.

    • Sequential scales also have continuous domains and ranges, but are mainly used when you want to specify the range with an interpolator (such as a continuous color scheme from d3-scale-chromatic, see below) rather than an interpolator factory. Typical usage: d3.scaleSequential([lo, hi], d3.interpolateViridis). There are many other variations, like the log scale d3.scaleSequentialLog. Diverging scales are similar, but have domains specified with three elements instead (I’m guessing so you can use a diverging color scheme and specify that the middle of the domain is 0 but set the negative and positive endpoints independently).

    • Ordinal scales have discrete domains and ranges. You can even omit the domain and just let the ordinal scale memoize. Suppose ord is an ordinal scale like ord = d3.scaleOrdinal(["red", "green", "blue"]). Each time you call ord, if you’ve called ord with the same argument before, it’ll return the same result as before; otherwise it’ll return the next thing from its range, looping around if necessary.

    • Band scales and point scales have discrete domains but continuous ranges. Both split the range up evenly, but band scales split it into intervals and give you the intervals’ left endpoints (scale.bandwidth() gives their width), while point scales give you evenly spaced points that, with the default zero padding, map the first and last values of the domain to the endpoints of the range.

    • Quantile, quantize, and threshold scales have continuous domains but discrete ranges, kind of. This is technically inaccurate for quantile scales because you give a quantile scale the full dataset you care about as its domain. Quantize scales split the domain into equal intervals. For threshold scales you have to specify every interval/“breakpoint” that matters in the domain.

    Not all scales support every method below, but some common methods are:

    • scale.domain([lo, hi]) (setter or getter)
    • scale.range([lo, hi]) (setter or getter)
    • scale.invert(r) tries to map something from the range to the domain
    • scale.interpolate(d3.interpolateHcl) (e.g.) sets the interpolator factory
    • scale.nice() “extends the domain so that it starts and ends on nice round values”, so that if you labeled the start and end on a chart it would look nice.
  • d3-scale-chromatic predefines a bunch of color schemes, so you can rely on decades of data visualization wisdom about perceptual uniformity and aesthetics.

    • The categorical schemes, e.g. d3.schemeCategory10, are just arrays of colors, which you can pass into d3.scaleOrdinal.
    • The continuous schemes are interpolators that map floats in [0, 1] to colors, and can be used with d3.scaleSequential above: examples include d3.interpolateBlues, d3.interpolateRdBu, d3.interpolateRdYlGn, and d3.interpolateViridis.
    • There are also some schemes of schemes, such as d3.schemeBlues; each one is an array of arrays of colors and d3.schemeBlues[n] has length n, for n from 3 to 9 inclusive.
  • d3-format helps you format numbers. The format is loosely inspired by Python .format. d3.format is curried; it takes a format string and returns a function that takes a number and returns a formatted string. Even the default d3.format("") is useful because it uses the correct Unicode minus sign.
  • d3-drag lets you implement drag-and-drop behavior simply. d3.drag() creates a “drag behavior”, which can be configured and then called on a selection to attach its listeners to that selection. The most important configuration is dragBehavior.on("drag", handler); other events are "start" and "end". handler receives a fairly synthetic drag event, which has properly offset x and y coordinates if the “drag subject” is set up properly.

    Even a minimal drag example involves more magic than I expected: the default drag.subject works off datum properties specifically named x and y. (See this StackOverflow answer.)
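Going back to the discrete-domain scales in the d3-scale bullet above, here is a toy version of how ordinal, band, and point scales assign outputs, assuming zero padding and no rounding (D3’s real band and point scales add padding, alignment, and rounding options). The function names are hypothetical, and inputs outside the domain are not handled.

```javascript
// Ordinal with an implicit domain: each new input claims the next range
// value, wrapping around; repeated inputs get the same value as before.
function ordinalScale(range) {
  const index = new Map();
  return x => {
    if (!index.has(x)) index.set(x, index.size);
    return range[index.get(x) % range.length];
  };
}

// Band: split [r0, r1] into equal intervals, one per domain value,
// and return the left endpoint of each interval.
function bandScale(domain, [r0, r1]) {
  const step = (r1 - r0) / domain.length;
  const scale = x => r0 + step * domain.indexOf(x);
  scale.bandwidth = () => step; // width of each band
  return scale;
}

// Point: evenly spaced points, with the first and last domain values
// landing on the endpoints of the range (assumes at least two values).
function pointScale(domain, [r0, r1]) {
  const step = (r1 - r0) / (domain.length - 1);
  return x => r0 + step * domain.indexOf(x);
}

const color = ordinalScale(["red", "green", "blue"]);
const picks = ["a", "b", "a", "c", "d"].map(x => color(x));
const band = bandScale(["a", "b", "c", "d"], [0, 100]);
const point = pointScale(["a", "b", "c"], [0, 100]);
```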

Sandbox

While learning D3 I found it very useful to just open up a browser console, run random d3 commands, and see what happens. Here are some DOM elements with preloaded D3 __data__ and styles if you want to do the same.

A div#d3-sandbox-div containing ten divs with data 1 to 10.

An svg#d3-sandbox-svg with viewBox -300 -300 600 600, so both coordinates range over [−300, 300]. Its children are 12 g elements, each of which has data {i, x, y} and has been .style("transform", "translate(...px, ...px)")’ed to those x and y coordinates. Each g contains a circle and a text, and has a drag handler configured.


  1. Empirically, instead of actually holding null or undefined, the arrays have JavaScript’s special empty slots (holes) at those locations: going by length and indexing, an undefined appears to be there, but array methods like forEach skip those locations entirely (for...of and spread, by contrast, yield undefined for them). This is an internal representation you probably don’t care about, though. I’ll still call these elements null or non-null.
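A quick way to see that distinction in plain JavaScript: a slot that was never assigned reads as undefined, but the index is absent from the array, so forEach skips it while spread (like for...of) does not.

```javascript
const group = new Array(3); // three empty slots
group[0] = "a";
group[2] = "c";

const byIndex = [group[1], group.length]; // [undefined, 3]
const hasSlot1 = 1 in group;              // false: the slot is a hole
const visited = [];
group.forEach(x => visited.push(x));      // forEach skips the hole
const spread = [...group];                // spread and for...of do not
```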
