In theory, the idea here is similar to when I was learning React/Redux and diving into SQL selects. In practice, I think most of D3’s complexity isn’t exactly in a direction that is elucidated by writing down types for everything, so the title is a mere personal snowclone. I’m just writing things out to an arbitrary amount of detail until I understand them and can refer to what I wrote here later.
Background
D3 is “a JavaScript library for visualizing data”. It has a lot of sublibraries that interoperate well but could be used separately — for example, it has utilities for manipulating colors, time, and SVG paths. Of the various concepts, though, I think D3 selections are the most distinctive and fundamental, so they are the focus of this post.
At a high level, D3 selections feel like jQuery. You run some code and it goes into the DOM and adds, deletes, and mutates a bunch of elements. The docs even endorse monkeypatching d3.selection to add custom helpers. However, D3 has data binding and batch operations that make it easy to change the DOM in a way that resembles reconciliation in a framework like React.
Selections
API: Selecting Elements.
A D3 selection holds an array of arrays of nullable[1] DOM elements. The intermediate arrays are called groups. Additionally, each group in a selection is associated with a parent node. During basic D3 usage, you might only ever work with selections with a single group and ignore parent nodes.
When relevant, I will call the index of an element inside its group the “within-group index” and the index of a group among all groups in a selection the “across-group index”.
- d3.selectAll(someCssSelector) creates a selection with one group, which holds all elements matching some CSS selector. You can also pass an actual iterable of DOM elements if you have one. The group’s parent is the root element, probably the html element.
- d3.select(someCssSelector) is the same except it only selects the first matching element (if any). You can also pass an actual DOM element.
- d3.create(tagName) creates a selection with a single new (detached) element.
Given a selection sel:
- sel.select(someCssSelector) creates a new selection, replacing each (non-null) element with its first descendant matching the selector if one exists (or null if not). (It also copies the parent’s datum to that descendant; see the data section.) Group structure and parent nodes are preserved.
- sel.selectAll(someCssSelector) creates a new selection with one group for each element in sel. Each new group contains all descendants of the original element matching the selector, and its parent node is that element. (Therefore, the group structure of the original selection is flattened and lost.) I believe this is the most likely way to first get a selection with multiple groups and/or nontrivial parents.
Some ways to get non-selection info out of a selection:
- sel.nodes() returns all (non-null) elements as an array (flattening group structure). This is also what you get if you iterate over a selection.
- sel.node() returns the first non-null element.
- sel.empty() tests for emptiness.
- sel.size() counts non-null elements.
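To make the shape concrete, here is a toy model in plain JavaScript. It is not how D3 is implemented; the names toySelection, nodes, node, size, and empty are made up for illustration, plain objects stand in for DOM elements, and null stands in for a missing element.

```javascript
// Plain objects standing in for DOM elements (hypothetical stand-ins).
const a = { tag: "li", id: "a" };
const b = { tag: "li", id: "b" };
const c = { tag: "li", id: "c" };

// Two groups; groups[j] belongs to parents[j]. null marks a missing element.
const toySelection = {
  groups: [[a, null], [b, c]],
  parents: [{ tag: "ul", id: "first" }, { tag: "ul", id: "second" }],
};

// Like sel.nodes(): flatten the group structure and skip the nulls.
function nodes(sel) {
  return sel.groups.flat().filter((element) => element != null);
}

// Like sel.node(): the first non-null element, or null if there is none.
function node(sel) {
  const all = nodes(sel);
  return all.length > 0 ? all[0] : null;
}

// Like sel.size() and sel.empty().
function size(sel) {
  return nodes(sel).length;
}
function empty(sel) {
  return size(sel) === 0;
}

console.log(nodes(toySelection).map((element) => element.id)); // the ids a, b, c
console.log(size(toySelection), empty(toySelection)); // 3 false
```

The within-group index of c here is 1 and the across-group index of its group is also 1; flattening with nodes throws both away.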
Modifying elements
API: Modifying elements; handling events.
Many methods on selections are just for-each operations that modify each element in each group in some way, and then return the same selection to allow you to chain calls. For example, sel.style("color", "red") will set the CSS color property of each element in that selection to red (and then return the same selection). Such methods include:
- .attr (name and value)
- .classed (class name and value = boolean, for whether to add or remove the class)
- .style (name and value)
- .property (name and value) (e.g. text fields’ value, checkboxes’ checked)
- .text (value)
- .html (value)
The above methods can also be used as getters if you don’t pass a value, but they just return the value for the first non-null element (rather than some kind of data structure of values with the same shape as the selection).
Some more interesting methods modify the actual node structure of the DOM:
- .append(tagName) appends an element of that tag name to each (non-null) element in the selection and returns a new selection with the added elements, in the same group structure and with the same data.
- .remove() removes each element from the DOM.
- .raise() resp. .lower() moves each element to the end resp. the beginning of its parent’s children. The names make sense when the elements are absolutely positioned and visually overlap: “raising” an element gives it a “higher z-index”.
Adding event handlers is similar: .on(eventName, handler). The handler will be called with two arguments, the event itself and the element’s current datum (see below), and will have this set to the DOM element that received the event.
sel.each(f) calls f once for each DOM element, with this set to that element. Inside such a function, you could d3.select(this) to get back into selection mode (though the function can’t be an arrow function). Finally, sel.call(f, ...args) is just f(sel, ...args), except that it returns sel so you can continue long method chains.
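As a small sketch of how .call fits into a chain: the helper below is written in the shape .call expects, with the selection first and extra arguments after it. The names highlight and highlightAndLabel are hypothetical, and sel is assumed to be a D3 selection.

```javascript
// A helper in the shape sel.call expects: the selection comes first,
// followed by any extra arguments. The name `highlight` is hypothetical.
function highlight(sel, color) {
  sel.style("color", color).classed("highlighted", true);
}

// Applies the helper in the middle of a method chain.
// `sel` is assumed to be a D3 selection.
function highlightAndLabel(sel, label) {
  return sel
    .call(highlight, "red") // same effect as highlight(sel, "red")
    .text(label); // the chain continues on the same selection
}
```

Writing helpers this way keeps them reusable across selections without monkeypatching anything.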
Binding data
API: Joining data. How Selections Work has helpful visualizations, though it’s old, so its API references have bit-rotted.
Every DOM element may have a __data__ property that D3 pays attention to. (Reading comprehension note for the rest of this post: “data” is the plural of “datum”, but each __data__ property is one datum. If I could selfishly redesign D3 to maximize how intuitive it is to me personally, I would have renamed the property to __datum__.) This is a plain ol’ JavaScript property with no significance in web standards! D3 actually just writes to or reads from element.__data__. You can do this yourself (literally grab an element in the browser inspector and look at element.__data__) to debug your code, which is neat. D3 doesn’t store data anywhere else.
Generally, the above methods on selections also take a function that receives the datum as an argument. sel.style("color", d => d.color) is roughly like element.style.color = element.__data__.color; for each element. (The function also gets two more arguments, the within-group index and the entire group itself as an array, and is called with this being the current DOM element.)
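Here is a toy version of that loop, to show exactly what the callback receives. It is only a model: toyStyle is a made-up name, the elements are plain objects with style and __data__ properties, and the real .style does more (for example, it goes through the CSS style declaration rather than assigning to a plain object).

```javascript
// Toy version of sel.style(name, value) over an array of groups.
// Elements are plain objects with `style` and `__data__` properties.
function toyStyle(groups, name, value) {
  for (const group of groups) {
    group.forEach((element, i) => {
      if (element == null) return; // skip missing elements in the group
      element.style[name] =
        typeof value === "function"
          ? value.call(element, element.__data__, i, group) // (datum, index, group), this = element
          : value; // a constant is applied to every element as-is
    });
  }
  return groups;
}

const first = { style: {}, __data__: { color: "red" } };
const second = { style: {}, __data__: { color: "blue" } };
const groups = [[first, null, second]];

toyStyle(groups, "color", (d) => d.color); // each element gets its own datum's color
toyStyle(groups, "order", (d, i) => String(i)); // i is the within-group index
```

After this runs, second.style.order is "2", not "1", because the null in the middle still occupies a position in the group.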
A simple but less often useful way to set data is with sel.datum(value), which sets the datum for each element independently. value can also be a function, which gets called with the same arguments as the modifier functions above: (current) datum, within-group index, and entire group. As a getter, sel.datum() gets the datum for the first non-null element.
Usually, though, one gets data onto DOM elements with the sel.data(data, keyfunc) method. Before proceeding, I’ll mention that calling sel.data() with no arguments produces an array with all data in the selection, but the group structure is flattened.
The .data function
sel.data(data, keyfunc) works as follows. For each group in the selection:
- Produce an array of data:
  - If data is an array, it’s just a copy of that.
  - If data is a function, it’s called (once per group) with three arguments: the group’s parent node’s datum, the across-group index, and an array of the selection’s parent nodes. The call will be made with this being the group’s parent element. This should return an array.
- Attempt to match each DOM element in the group with an element of the data using keyfunc.
  - If keyfunc is passed, the basic idea is that element matches datum if String(keyfunc(element.__data__)) === String(keyfunc(datum)). (keyfunc is called with more arguments, starting with an index within the group or the data array, but look them up yourself.) However, if there are duplicate keys on either side, all appearances after the first don’t match anything.
  - If keyfunc is not passed, the elements and the data are just matched in the order they appear in their respective arrays. If one array is longer, the elements or data at the end of the longer one aren’t matched.
- After matching, there may or may not be leftover, unmatched elements and/or data. Imagine a Venn diagram. Each element and datum falls into one of three categories:
  - The update selection comprises the (equally many) elements and data that match each other.
  - The exit selection comprises elements that didn’t match any data.
  - The enter selection comprises data that didn’t match any elements. This is not a normal “selection” as we’ve defined it; instead of DOM elements, it contains placeholder “elements” that are objects with a __data__ property and a reference to a position in the DOM (right before some other element). They can be imagined as like empty React.Fragment elements at those locations.

    Just about the only useful thing you can do with an enter selection is to call .append(tagName) on it, which will insert elements at those positions, propagate the __data__ from the placeholder element, and leave you with a normal selection. Even this won’t work if the group’s parent node was not actually the parent node of the group’s constituent DOM elements.

  The ordering/position of elements in the update and enter selections is determined by the ordering/position of their corresponding data in data (so both may have nulls).
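The matching rules above can be sketched as a toy function over a single group. This is a simplified model written for this post, not D3’s source: toyBind is a made-up name, elements are plain objects with a __data__ property, enter placeholders are bare { __data__ } objects without any DOM position, and null is used where D3 would leave a slot empty.

```javascript
// Toy model of one group's data join; returns the three result arrays.
function toyBind(group, data, key) {
  const update = new Array(data.length).fill(null);
  const enter = new Array(data.length).fill(null);
  const exit = new Array(group.length).fill(null);

  if (key === undefined) {
    // No key function: pair elements and data by position.
    data.forEach((datum, i) => {
      if (group[i] != null) {
        group[i].__data__ = datum;
        update[i] = group[i];
      } else {
        enter[i] = { __data__: datum };
      }
    });
    // Elements beyond the end of the data are unmatched.
    for (let i = data.length; i < group.length; i++) exit[i] = group[i];
    return { update, enter, exit };
  }

  // Key function: index elements by stringified key; later duplicates exit.
  const byKey = new Map();
  group.forEach((element, i) => {
    if (element == null) return;
    const k = String(key(element.__data__));
    if (byKey.has(k)) exit[i] = element;
    else byKey.set(k, { element, i });
  });
  data.forEach((datum, i) => {
    const k = String(key(datum));
    const match = byKey.get(k);
    if (match) {
      match.element.__data__ = datum;
      update[i] = match.element;
      byKey.delete(k); // a later datum with the same key goes to enter
    } else {
      enter[i] = { __data__: datum };
    }
  });
  // Whatever was never claimed by a datum is left over for exit.
  for (const { element, i } of byKey.values()) exit[i] = element;
  return { update, enter, exit };
}

const nodeA = { id: "nodeA", __data__: "a" };
const nodeB = { id: "nodeB", __data__: "b" };
const nodeC = { id: "nodeC", __data__: "c" };
const result = toyBind([nodeA, nodeB, nodeC], ["c", "a", "d"], (d) => d);
// result.update is [nodeC, nodeA, null]
// result.enter is [null, null, { __data__: "d" }]
// result.exit is [null, nodeB, null]
```

Note that update and enter are indexed by position in the data, while exit is indexed by position in the original group.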
You can use the return value of .data directly as a normal selection (e.g. by chaining more methods onto it), and it’ll behave like the update selection. This might be all you need if you know that the elements and the data match perfectly.
Otherwise, you can call .enter() and .exit() on the return value of .data(...) to obtain the enter and exit selections. (These methods don’t do anything interesting on other selections.) In standard usage of .data(), your goal would likely be to insert and delete elements until the elements match the data, similar to React reconciliation. The most straightforward way to do this, in the easy case where the update selection elements are already in order, would be to append an element for each placeholder in the enter selection and remove every element in the exit selection.
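A minimal sketch of that easy case, wrapped in a helper so the assumptions are explicit: sel is assumed to be a D3 selection whose group has the right parent node, data is an array, and tagName is the tag of the elements to create. The helper name syncElements is hypothetical.

```javascript
// Insert and delete elements until the elements match the data.
// `sel` is assumed to be a D3 selection; nothing here reorders elements.
function syncElements(sel, data, tagName) {
  const update = sel.data(data); // behaves like the update selection
  update.enter().append(tagName); // create an element for each unmatched datum
  update.exit().remove(); // delete each element that matched no datum
  return update;
}
```

The returned selection still only covers the update elements; the newly appended ones live in the selection returned by .append.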
If you want to then keep chaining methods on the elements after insertion and modification, you can combine the update and enter selections with .merge. (This function does not work in general to combine selections. What it does is replace null elements in the first with elements in the exact same position, across and within groups, from the second.)
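That position-by-position behavior is easy to model on plain arrays. The sketch below is a toy, not D3’s code: toyMerge is a made-up name, groups are arrays, strings stand in for elements, and null marks a missing element.

```javascript
// Toy model of a.merge(b) on arrays of groups: keep a's element when it is
// present, otherwise take b's element at the same position (if any).
function toyMerge(groupsA, groupsB) {
  return groupsA.map((groupA, j) =>
    groupA.map((element, i) => {
      if (element != null) return element;
      const groupB = groupsB[j];
      return groupB != null && groupB[i] != null ? groupB[i] : null;
    })
  );
}

const update = [["u0", null, "u2"]]; // one group with a gap in the middle
const entered = [[null, "e1", null]]; // the same shape, filled where update is not
const merged = toyMerge(update, entered); // [["u0", "e1", "u2"]]
```

This is why merging works for update and enter selections, which have complementary gaps, but does nothing useful for two unrelated selections whose elements collide at the same positions.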
If the update selection nodes are not already in order according to the data, and you want them to be, you’ll finally need to call .order() on the merged selection.
You can perform all the steps after .data() (insertion, deletion, modification, and reordering) in one function call with join. In full generality, join takes three arguments, which are functions to call on the enter selection, update selection, and exit selection.
One reason to write three separate handlers is to use D3 transitions to animate elements appearing/disappearing. See below or the selection.join notebook. However, in the super common pattern above, you can pass a string as the first argument and not pass the second and third arguments.
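For comparison, here is a sketch of the three-function form spelled out with handlers that do what I understand the string shorthand to do by default. The helper name renderItems is hypothetical, and list is assumed to be a D3 selection containing a single ul.

```javascript
// Binds `data` to the <li> children of `list` and reconciles the DOM.
// `list` is assumed to be a D3 selection containing one <ul>.
function renderItems(list, data) {
  return list
    .selectAll("li")
    .data(data)
    .join(
      (enter) => enter.append("li"), // what passing "li" would do
      (update) => update, // leave matched elements alone
      (exit) => exit.remove() // delete unmatched elements
    )
    .text((d) => d); // runs on the merged enter + update elements
}
```

Swapping any of the three handlers for one that starts a transition is where this longer form earns its keep.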
Some worked examples and pitfalls
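To create a <ul> with four <li>’s with data 1, 2, 3, 4, and the same text, append a ul, then chain .selectAll("li"), .data([1, 2, 3, 4]), .join("li"), and .text(d => d) onto it.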
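Written out as a single chain, a sketch of this might look like the following. The wrapper function buildList and its root parameter are assumptions of mine so that the snippet does not depend on a particular page; root would be something like d3.select("body").

```javascript
// Builds a <ul> with four <li>s under `root`, each showing its own datum.
// `root` is assumed to be a D3 selection of a container element.
function buildList(root) {
  return root
    .append("ul")
    .selectAll("li") // selects nothing yet, but makes the ul the group's parent
    .data([1, 2, 3, 4])
    .join("li") // creates one li per datum
    .text((d) => d); // sets each li's text to its datum
}
```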
The selectAll doesn’t select anything and will produce a selection with a single empty group, but it is necessary to promote the ul to the parent node of that group. It doesn’t matter what selector you pass into selectAll, but passing the same tag name you’ll pass to .join later is idiomatic since it makes that part of the method chain idempotent. (With d3-jetpack, you can append many nodes and bind data in one step with .appendMany("li", [1, 2, 3, 4]).)
Now, suppose sel is the above selection: a group with four nodes node1, node2, node3, node4 that have data 1, 2, 3, 4, respectively. If we rebind it to the new data [4, 0, 2], keyed by the datum itself, e.g. with sel.data([4, 0, 2], d => d), then:
- The update selection will have nodes [node4, <empty>, node2].
- The enter selection will have nodes [<empty>, (fake node 0), <empty>]. This is an implementation detail, but the fake node will reference node2 because 0 was before 2 in the new data, so it knows where a “real node 0” should be inserted. Calling .enter().append(...) would insert a node with datum 0 there.
- The exit selection will have nodes [node1, <empty>, node3, <empty>].
If we then ran .enter().append("li") on that result, then the DOM would have five <li>s: node1, a new node0 with datum 0, node2, node3, node4.
If we then ran .exit().remove() on it as well, then the DOM would have three <li>s: node0, node2, node4, with data 0, 2, 4. This matches the data we passed to sel.data, except that it’s in the wrong order.
If we finally ran .order() on the merged update and enter selections, then the DOM elements would have been rearranged in the order we want: node4, node0, node2. (Although node0 has datum 0, it won’t have any text; we never set it.)
And again, the last three steps are equivalent to a single .join("li") on the result of sel.data(...), so we probably wouldn’t think about enter and exit selections manually unless we wanted transitions or something fancier.
Finally, a demo building a table out of two-dimensional data:
d3.selection()
.append("table")
.selectAll("tr")
.data([[1, 2], [3, 4]]) // first <tr> has datum [1, 2]; second <tr> has datum [3, 4]
.join("tr")
.selectAll("td")
.data(d => d) // called twice, once with d = [1, 2], once with d = [3, 4]
.join("td") // four <td>s in two groups
.text(d => d);
d3-transition
d3-transition animates changes to the DOM, which is useful for writing interesting .join enter/update/exit handlers.
A d3.transition instance holds a D3 selection, information about the transition’s timing, and information about the transition “destination” (what attributes and such the transitioning elements should end up with), all of which are configurable. Once a transition instance is created, it will be scheduled to begin in the next frame, so you must finish configuring it immediately and synchronously. You can’t create or half-configure a transition and then start it later.
The simplest way to obtain a transition is to call .transition() on a selection.
For configuring timing, you can set an initial delay transition.delay(n) or total duration transition.duration(n) in milliseconds, as well as an ease function transition.ease(e), which will most likely be an ease function from d3-ease (possibly after further configuration itself). These methods can also take functions, which are called and used the same way as selections’ modification methods.
The simple methods for configuring the destination, .attr and .style, work just like modifying a selection, though there are “tween” methods for finer control. Calling .remove() will remove the element at the end of a transition.
Calling .transition() on another transition creates a transition scheduled to start when that one ends.
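Putting the timing and destination methods together, a fade-out might be sketched like this. The helper name fadeOutAndRemove and the specific timings are made up for illustration, and sel is assumed to be a D3 selection on a page where d3-transition is loaded.

```javascript
// Fades every element in `sel` to transparent, then removes it.
// `sel` is assumed to be a D3 selection with d3-transition available.
function fadeOutAndRemove(sel) {
  return sel
    .transition() // scheduled right away, so configure it synchronously
    .delay(100) // wait 100 ms before starting
    .duration(500) // then animate over 500 ms
    .style("opacity", 0) // destination: fully transparent
    .remove(); // remove each element once its transition ends
}
```

A function like this can be passed as the exit handler to .join, e.g. exit => fadeOutAndRemove(exit), so that disappearing elements fade away instead of vanishing.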
Other sublibraries, briefly
Since there are twenty-odd libraries, I decided to just describe a random subset of them that caught my eye.
A high-level description of D3’s API style: many D3 library functions return other functions that can be directly called, but that also may have attached setters. The setters mutate the original function-object and return it so that setters can be chained. Many setters also work as getters if called with no arguments.
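To illustrate that style without pulling in D3 itself, here is a toy linear scale written the same way: a callable function with setters attached that double as getters. It is a sketch for this post, not D3’s implementation, and the name toyScaleLinear is made up.

```javascript
// Toy linear scale in the function-with-setters style (not D3's real code).
function toyScaleLinear(domain = [0, 1], range = [0, 1]) {
  // The returned object is itself callable: scale(x) maps domain to range.
  function scale(x) {
    const t = (x - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  }
  // Setter when called with an argument, getter when called with none.
  scale.domain = function (newDomain) {
    if (arguments.length === 0) return domain.slice();
    domain = Array.from(newDomain);
    return scale; // returning the function itself allows chaining
  };
  scale.range = function (newRange) {
    if (arguments.length === 0) return range.slice();
    range = Array.from(newRange);
    return scale;
  };
  return scale;
}

const toyScale = toyScaleLinear().domain([1, 2]).range([100, 200]);
const mapped = toyScale(1.5); // 150
const currentDomain = toyScale.domain(); // [1, 2]
```

The setters mutate the closure that the function reads from, which is why configuring the object after creation changes what later calls return.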
- d3-array has a lot of simple array-manipulation utilities. You might find some of these in other utility libraries, but if you’re already using D3, might as well use its versions. Examples:
  - d3.min, d3.max, d3.extent (returns a two-element array [min, max]), d3.sum, d3.mean, all of which take an iterable and ignore undefined, null, and NaN.
  - d3.zip(...arrays)
  - d3.cross(...iterables) (Cartesian product)
  - d3.pairs(iterable) gives (overlapping) pairs of adjacent elements
  - d3.range(...) is basically Python’s range.
- d3-color manipulates colors. Each D3 color object is tied to a specific color space or representation, so you have to explicitly ask D3 to convert between them. When coerced to strings, colors turn into rgb(...) strings you can use in CSS.
- d3-interpolate interpolates between numbers, colors, and other data types. In general, d3.interpolate(foo, bar) is an “interpolator”, a function that’s foo at 0, bar at 1, and interpolates between them in-between (and sometimes even extrapolates below 0 or above 1). d3.interpolate itself might be called an “interpolator factory”; different color interpolator factories using different spaces exist.
- d3-scale works with scales. A scale is a JavaScript function that maps data values from some domain (e.g. numbers, time, temperature) to values used for some aspect of visualization (e.g. numbers, color, length/position), though scales also have a bunch of extra functions.
  - Linear scales are probably the simplest. Example: d3.scaleLinear([1, 2], [100, 200]) linearly maps the interval [1, 2] to the interval [100, 200], i.e. it multiplies by 100. (Linear scales are simple enough to extrapolate outside their domain.) You can also write this more explicitly as d3.scaleLinear().domain([1, 2]).range([100, 200]). You can specify more than two numbers in the domain and range to produce a piecewise linear scale. There are also many similar numeric scales like d3.scaleLog.
  - Sequential scales also have continuous domains and ranges, but are mainly used when you want to specify the range with an interpolator (such as a continuous color scheme from d3-scale-chromatic, see below) rather than an interpolator factory. Typical usage: d3.scaleSequential([lo, hi], d3.interpolateViridis). There are many other variations, like the log scale d3.scaleSequentialLog. Diverging scales are similar, but have domains specified with three elements instead (I’m guessing so you can use a diverging color scheme and specify that the middle of the domain is 0 but set the negative and positive endpoints independently).
  - Ordinal scales have discrete domains and ranges. You can even omit the domain and just let the ordinal scale memoize. Suppose ord is an ordinal scale like ord = d3.scaleOrdinal(["red", "green", "blue"]). Each time you call ord, if you’ve called ord with the same argument before, it’ll return the same result as before; otherwise it’ll return the next thing from its range, looping around if necessary.
  - Band scales and point scales have discrete domains but continuous ranges. Both split the range up evenly, but band scales split it into intervals and give you intervals’ left endpoints, while point scales give you evenly spaced points that map the endpoints of the domain to the endpoints of the range.
  - Quantile, quantize, and threshold scales have continuous domains but discrete ranges, kind of. This is technically inaccurate for quantile scales because you give a quantile scale the full dataset you care about as its domain. Quantize scales split the domain into equal intervals. For threshold scales you have to specify every interval/“breakpoint” that matters in the domain.
  Not all scales support every method below, but some common methods are:
  - scale.domain([lo, hi]) (setter or getter)
  - scale.range([lo, hi]) (setter or getter)
  - scale.invert(r) tries to map something from the range to the domain
  - scale.interpolate(d3.interpolateHcl) (e.g.) sets the interpolator factory
  - scale.nice() “extends the domain so that it starts and ends on nice round values”, so that if you labeled the start and end on a chart it would look nice.
- d3-scale-chromatic predefines a bunch of color schemes, so you can rely on decades of data visualization wisdom about perceptual uniformity and aesthetics.
  - The categorical schemes, e.g. d3.schemeCategory10, are just arrays of colors, which you can pass into d3.scaleOrdinal.
  - The continuous schemes are interpolators that map floats in [0, 1] to colors, and can be used with d3.scaleSequential above: examples include d3.interpolateBlues, d3.interpolateRdBu, d3.interpolateRdYlGn, and d3.interpolateViridis.
  - There are also some schemes of schemes, such as d3.schemeBlues; each one is an array of arrays of colors and d3.schemeBlues[n] has length n, for n from 3 to 9 inclusive.
- d3-format helps you format numbers. The format is loosely inspired by Python .format. d3.format is curried; it takes a format string and returns a function that takes a number and returns a formatted string. Even the default d3.format("") is useful because it uses the correct Unicode minus sign.
- d3-drag lets you implement drag-and-drop behavior simply. d3.drag() creates a “drag behavior”, which can be configured and then called on a selection to attach its listeners to that selection. The most important configuration is dragBehavior.on("drag", handler); other events are "start" and "end". handler receives a fairly synthetic drag event, which has properly offset x and y coordinates if the “drag subject” is set up properly.

  Here is a minimal drag example, but there is more magic here than I expected: the default drag.subject works off datum attributes specifically named x and y. (See this StackOverflow answer.)

  let myDrag = d3.drag().on("drag", function (event, d) {
    d3.select(this).attr("cx", d.x = event.x).attr("cy", d.y = event.y);
  });
  d3.select("svg#d3-drag-demo")
    .append("circle")
    .data([{x: 100, y: 100}])
    .attr("cx", d => d.x)
    .attr("cy", d => d.y)
    .attr("r", 50)
    .attr("fill", "#c00")
    .call(myDrag); // same as myDrag(the entire preceding expression)
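Going back to the ordinal scales in the list above, the memoizing behavior is simple enough to model in a few lines. This is a toy sketch with a made-up name (toyScaleOrdinal), not D3’s implementation, and it only covers the case where the domain is omitted.

```javascript
// Toy ordinal scale that builds its domain as it sees new values.
function toyScaleOrdinal(range) {
  const indexByValue = new Map(); // remembers the order values were first seen
  return function scale(value) {
    if (!indexByValue.has(value)) indexByValue.set(value, indexByValue.size);
    // Wrap around when there are more distinct values than range entries.
    return range[indexByValue.get(value) % range.length];
  };
}

const ord = toyScaleOrdinal(["red", "green", "blue"]);
const colors = ["a", "b", "a", "c", "d"].map((value) => ord(value));
// colors is ["red", "green", "red", "blue", "red"]
```

The repeated "a" gets the same color both times, and the fourth distinct value "d" loops back around to the start of the range.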
Sandbox
While learning D3 I found it very useful to just open up a browser console, run random d3 commands, and see what happens. Here are some DOM elements with preloaded D3 __data__ and styles if you want to do the same.
A div#d3-sandbox-div with ten divs with data 1 to 10:
A svg#d3-sandbox-svg whose viewBox is -300 -300 600 600, so both coordinates’ ranges are [−300, 300]. Its children are 12 gs, each of which has data {i, x, y} and has been .style("transform", "translate(...px, ...px)")’ed to those x and y coordinates. Each g contains a circle and a text, and has a drag handler configured.
[1] Empirically, instead of actually having null or undefined, the array has the JavaScript specialty empty slots at those locations, such that there appears to be an undefined there if you go by length and indexing, but iteration skips those locations entirely. This is an internal representation you probably don’t care about though. I’ll still call these elements null or non-null.