EPISODE 1793

[INTRO]

[0:00:00] ANNOUNCER: Data visualization is increasingly important as organizations prioritize data-driven decision-making. Tools that transform complex datasets into intuitive, interpretable visualizations are arguably just as critical as the data itself. Robert Kosara is a data visualization developer at Observable, which is a platform for creating interactive data visualizations and which makes extensive use of the popular D3 JavaScript library.

Robert previously worked at companies including Salesforce and Tableau and has deep experience in data visualization and data visualization tools. He joins the show to talk about modern data visualization and his work at Observable. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.

[EPISODE]

[0:00:59] SF: Robert, welcome to the show.

[0:01:00] RK: Hi, thanks for having me.

[0:01:01] SF: Yes, absolutely. Thanks for being here. I worked in an information visualization lab, you know, once upon a time. So, it's going to be fun to, I think, revisit this field and maybe resurface some long-forgotten knowledge that I once had.

[0:01:14] RK: For sure.

[0:01:16] SF: Yes. So, I want to start off with a bit of background on you. You were once an academic, but you've now made the migration into industry. What motivated that transition and how has that experience been?

[0:01:28] RK: Sure. Yes. I mean, it's almost like ancient history at this point. I was a professor at UNC Charlotte many years ago until 2012. That's when I was doing a sabbatical at Tableau. Then, at that time, Tableau was just starting to think about doing more research, and so that's when I made the switch into industry. Then, I was in industrial research at Tableau for 10 years, and then about three years ago, I switched over to Observable, and I'm now doing developer relations and product education at Observable.

[0:01:59] SF: How would you think about, especially in Tableau, where you were a little bit on the research side? How is that experience of doing that in industry versus in like an academic setting different?

[0:02:10] RK: Well, I don't want to ramble too much about like all of the things that annoy me about academia, but it tends to be that in academia, there is a lot of work that you do as a professor that has to do with sort of like administration and getting grants, getting money essentially to pay your students. Then the students are the ones that do the actual research work to a large extent. Of course, under your guidance and you're sort of like managing them. But a lot of what you do is really administrative, much more than sort of content.

That's what really drew me to Tableau at that point because I was just doing my own work and I had my schedule open and I could just do the work rather than having to go to all kinds of meetings and having to teach, of course, as well. Teaching takes up a ton of work and a lot of time, even though I did enjoy it most of the time. It also was just a lot of work and time spent.

So, in the industry, you tend to just be able to focus on that. Also, of course, academic research is all about publishing papers. Industry research can be quite different. So, it depends on where you are, but some places are very focused on publishing. I think like Microsoft Research, I think. I don't know if that's still the case, but they certainly used to be very heavy on publishing. Then other places, like Tableau, actually, Tableau Research, was a mix. So, you wanted to have some sort of impact, and that could be publication, that could be getting your work into the product. That, of course, was what I was after at that point was to have more impact on real users and the real product rather than publishing something and then moving on to the next thing, and never really making it into people's hands, which is what happens with a lot of academic research.

[0:03:51] SF: Yes, I kind of got turned off from academics for a similar reason. I did a PhD, postdoc, and I think I had this thought originally where it was like, oh well, if I get to a place where I'm a professor, I'll be able to sort of choose whatever I want to work on. In some sense, maybe that's the case, but like really all the fun work ends up being done by students and it is a lot of administrative work. Like, you're sort of under-resourced from the standpoint of running a lab. It's like you're running a company in some aspects, but you're sort of the sole person in charge of making the funding happen, while also teaching, while also trying to get your students to graduate and do things that are important and things like that. So, there is this, I think, downside that isn't necessarily apparent until you sort of reach that stage.

[0:04:36] RK: That is absolutely true, yeah. What you were just basically saying at the end there, I think it's also a matter, of course, of kind of getting over this hump. It's like the startup part of that really takes a long time, and it's really hard to sort of like get over. Then of course, some people are really good at this. They have a pipeline of grant proposals, and they know that they have money coming in for this and that, and they keep this mill of papers and grant proposals running. I just wasn't that good at that. It's just something that you have to have a talent for to really be good at that. Then it can work very well, for sure. I don't want to say that academia is miserable, but some people manage it really well and are extremely productive. So, that's certainly true.

[0:05:15] SF: It seems like you still publish a fair amount. How do you balance making those kind of research contributions while also having a job that maybe is a less research forward job?

[0:05:25] RK: Well, I publish a little bit. It's been essentially a paper a year, and those are mostly opinion pieces rather than really research. It's tricky, because, and I do also still do some service, so there is reviewing for conferences and journals that I still do a little bit of, because it helps to be part of the community a bit, and also be aware of what's happening, because you see what's going to be published in six months or in a year, if you're one of the people reviewing that and being part of that. It's mostly my spare time right now. It's not actually really part of my job description right now.

[0:05:59] SF: Does that help with your credibility within your day job as an educator in the space?

[0:06:04] RK: I think so, yes, for sure. I think people appreciate seeing somebody who's still part of this community and of research and being out there, so like being visible as somebody who's doing that work still, that certainly helps with credibility. It's also, I think, this is a big reason why research groups like Tableau Research, like Microsoft Research publish because of its credibility. It shows that the company is trying to do something that's bigger than itself. It's not just making money and building products. It's about being part of a community and of a larger kind of society in some sense.

[0:06:36] SF: What originally sparked your interest in data visualization, information visualization, and data apps?

[0:06:41] RK: I've always been fascinated by graphics in general. I think those are visual. Especially once I saw that you could actually make visuals from data, I was like, "Well, that's the thing to do, because this is so obviously a good idea." Because we can use our visual abilities to look at numbers and even large amounts of numbers and see patterns and understand what's going on. It's just really fascinating just to kind of see that when you first kind of look at a scatterplot and you see the correlation, you see the outlier. You see things in there that are interesting. Especially if it's data that you care about that has some meaning for yourself or for your work, of course, that you can now actually use that to figure out what to do next, or how to change behavior, things like that.

Also, I guess, the other thing is communications. Really, I'm interested in how to tell people something about the world using visuals because it can be very impactful. But it can also be tricky, of course, to read, but there's always this trade-off. But I think as a way of communicating, visuals can be incredibly powerful.

[0:07:42] SF: How do you think about separating or the difference between, I guess, information visualization as a science versus, "Hey, I produced a nice graph." What is that depth difference, I guess, is what I'm looking for there? How are these things different?

[0:07:58] RK: That is a good question. There is actually, I think data visualization tends to have a very nice and tight coupling between research and practice. So, I guess to answer your question, the practical side is like, how do I make something? How do I make a chart, a data visualization, an interactive piece, some kind of data app out of my data or for a specific set of data, whatever that is. The research side is more about, like, what are the right ways to represent the data? What are good ways of interaction? What are things we know about the use of color, the use of different kinds of visual encodings? How can we also bring in some recommendation systems, other things like that, that help you with that?

So, there's a lot of work that's being done there. And of course, this kind of AI being this big topic that's becoming something that people are trying to do and figure out like what can you do there? But the recommendation systems have been part of data visualization for a long time going back to like the 1980s. There's a paper from 1985, that I was doing, and there might be even older ones. In data visualization, it's relatively easy to take ideas from research and put them into a product or even the other way around, because a lot of new things that get released also have publications associated with them. So, it's nice to see sort of like the thinking behind work that's being done on a new feature somewhere that you can then actually read about, like the thinking behind that and some of the mechanics behind that as well.

I think for data vis, it's relatively easy to make that connection across from both sides, both from the industry side, sort of like the product side, and from the research side. That's also, I think, why this is a fairly vibrant space where there is a lot of awareness and cross talk between both communities or both parts of this community.

[0:09:45] SF: Yes. So, like the visual might be sort of the, if I'm understanding correctly, anyway, the visual is like the output, whereas like sort of from a research perspective it's what was the thinking that goes into creating the visual to be something that's actually useful, tells a certain story, or unlocks someone's ability to explore data in a new way.

[0:10:07] RK: That is true, but I also think there's a lot of additional things that you have to be aware of and consider when you're building systems that you want to actually sell to people. For example, data access is a big problem in practice that you don't usually deal with in research, because you just have the data. It's usually not that much data, so usually, the research data sets tend to be relatively small. But in practice, in real work, there is - or I should say, in industry, I don't want to say real too much here. But in industry, as you're using something, you may be talking to a large database, or a data warehouse, or something that's way beyond what you could just load into your little app.

So, you have to think a lot more about how do you push down work into the database or how do you query the database correctly, or where do you even get the data from? What's the shape like? Do you have to clean it up first? Do you have to reshape your data? There are a lot of steps before you can even build a visual and then the visual can be the end result, but it can also be a step along the way, because you might be building something and it could be elaborate or it could be very simple to answer a quick question or to answer a first question, and then that leads you to, well, what's the next thing that I need to do? And then you kind of go along the way there.

So, in my experience, the visual often actually - well, it depends on the use case, of course. But very often, the visuals are sort of like stepping stones. They're not necessarily the end product, unless you're doing presentation of some kind, because you're talking to an executive, or you're presenting to a board or whatever. Then you're going to go through a lot of visuals first, but then you build the products that actually get shown to somebody else. But that's also, I guess, a difference in visualization between academia and industry where academia tends to emphasize the visual as the product more, because they're just less embedded in the larger questions of what's the overall work that you're trying to do this for, like what's your job, what's your task? Whereas in industry, there's much more focus on like, well, what's the next step? And how do I get from here to the next question or the next answer?
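As a rough illustration of the "push down work into the database" idea Robert mentions, here is a minimal Python sketch. The `sales` table, its columns, and both function names are hypothetical, and an in-memory SQLite database stands in for what would be a much larger warehouse in practice.

```python
import sqlite3

# Hypothetical in-memory table standing in for a large warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.5), ("west", 2.5), ("north", 4.0)],
)


def totals_client_side(conn):
    """Pull every row into the app, then aggregate there.

    Fine for a toy table, but the whole table crosses the wire.
    """
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Let the database aggregate; only one row per region comes back."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


client = totals_client_side(conn)
pushed = totals_pushed_down(conn)
```

Both functions produce the same per-region totals; the difference is only in how many rows leave the database before a chart can be drawn.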

[0:12:13] SF: Then given that, you could have something that like looks really good, is like very aesthetically pleasing, but maybe is like a really bad sort of user experience where it's just not that useful. How do you think about like balancing those two things, where something that's aesthetically pleasing can go a long way in terms of making someone feel a certain way, but you can also build it in such a way that's not particularly useful? So, how do you balance those two things?

[0:12:38] RK: Yes, that is the crucial question, because there's always this trade-off, especially if you're trying to get somebody's attention, then you might want to do something fancier or even something that's technically not correct. But it gets your attention, and that's really what it's about. If you just make bar charts and line charts, because those are the correct way to show whatever data it is, but nobody cares because they all look the same, well, you haven't really done the right thing to get people to do something about whatever it is that you're trying to get people's attention for.

So, that's a big tension there, where the question is often, well, this isn't the way you should do it, but in reality, you have to get people to actually pay attention to something, and for that you need to do something that's unusual, at least, and that has to stand out in some way, which doesn't necessarily excuse all kinds of bad charts. But I think it's too easy to have a very dogmatic view of data vis that is very disconnected from the reality of how people really use data. I think, I certainly, coming from academia first, my thinking, of course, used to be that, well, there are rules and ways of doing things. 

But seeing what people do, I think you just naturally change and see that the goal, what the outcome is, really justifies what you built to a large extent. I'm not excusing everything. But if it gets people's attention, if it shows them the right data and doesn't lie to them, I think it's perfectly reasonable. Oftentimes, it's better to have people do things that are a bit unusual and are a bit more fun perhaps than just sort of like insisting on the rules and just making some more bar charts that no one's going to look at.

[0:14:20] SF: Right. Yes. You almost need like some sort of like anti-pattern to maybe draw attention. And also, there's an aspect of like, if it's fun, then you'll actually use it versus if it feels like a chore. Even if it's like technically correct, it might not be very useful.

[0:14:33] RK: For sure.

[0:14:34] SF: What are some of the interesting, maybe non-obvious findings that have come out of information visualization research work?

[0:14:41] RK: Let me see if I can find a few quickly that I can think of. I'm not sure if this is really research so much as practical use. I'll think of a research example, but one thing that I like to cite as something that people do in practice, which also goes back to your earlier question about things that you're supposed to do or not, is that people use TreeMaps in ways that aren't really what they were intended for, but it's not wrong in any way. So, what TreeMaps are really for, and just to briefly explain what a TreeMap is, a TreeMap is a rectangular space that you subdivide so that it represents some number. The easiest example is a file system. You have files in some kind of hierarchy, and each level of your folders contains files, and those files have sizes, and you take those sizes and add them up together, and then each folder and each file becomes a rectangle in this representation. You can see each level and how big those files are.

So, this is actually a very useful little tool for that. That's what it was developed for originally, was to actually figure out what's taking up all the space on my hard disk? And it was built for deep trees. So, meaning trees that have lots of hierarchies, like a folder hierarchy that goes many, many levels deep. But it turns out it's actually really hard to read the depth on a TreeMap. There are different ways of doing that, and they're all not that great. It's really hard to know what level you're on. But the size comparison is very useful.

What people have started doing is they just make a TreeMap instead of a pie chart to show part-to-whole relationships. So, you have your departments, revenue numbers, and how they all add up to the total, and it's a TreeMap. People like those because there are a lot of prejudices against pie charts, but also, because they make big, nice rectangles that take up the space and you can easily compare them. So, you get a good view of that data. That's one thing that's not so much research, but it came out of the practical uses of a research thing for something that it wasn't really intended for, but that works extremely well. That's been kind of interesting.
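To make the file-system description of a TreeMap concrete, here is a small Python sketch of the simplest layout, usually called slice-and-dice. The tree structure, the names `total_size` and `treemap`, and the example sizes are all made up for illustration, and production tools typically use more refined layouts than this one.

```python
def total_size(node):
    """A file's size is its own; a folder's size is the sum of its children."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout.

    Splits the rectangle (x, y, w, h) among the children in proportion to
    their sizes, alternating between vertical and horizontal cuts per level.
    Returns a list of (name, depth, x, y, w, h) tuples.
    """
    if out is None:
        out = []
    out.append((node["name"], depth, x, y, w, h))
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.get("children", []):
        share = total_size(child) / total if total else 0.0
        if depth % 2 == 0:  # even levels: split along the x axis
            treemap(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:  # odd levels: split along the y axis
            treemap(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical folder tree with sizes in arbitrary units.
tree = {
    "name": "root",
    "children": [
        {"name": "docs", "children": [
            {"name": "a.txt", "size": 30},
            {"name": "b.txt", "size": 10},
        ]},
        {"name": "video.mp4", "size": 60},
    ],
}

rects = {name: (x, y, w, h) for name, _, x, y, w, h in treemap(tree, 0, 0, 100, 100)}
```

Each rectangle's area is proportional to the size it represents, which is why the size comparison reads well. The nesting depth is only implied by containment, which matches the point that depth is hard to read.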

[0:16:37] SF: Yes, it's probably the most practical use case of a TreeMap that I've seen. I did a bunch of work on TreeMaps in my PhD, and I think one of the challenges there is they come up a lot in academics and research, and I think they get proposed probably at least back then in industry, but then end up disappearing by the time you roll something to production or they choose some other option. I think it does come back to this. They're very effective at showing sort of volume of data, but then you try to jam in additional dimensions into it by using color and maybe other forms of visuals, and then you kind of lose the actual like affordance of the visual, whereas like the primary thing that gives you is a volume kind of like a pie chart, but like a hierarchical pie chart.

[0:17:20] RK: Yes. For sure. You could very much treat them essentially like what's called a waffle chart, which is like a rectangular pie chart essentially, and they can work very well depending on what you're using them for.

[0:17:32] SF: What are some of the most common missteps that you see when you see products in industry and you shake your head at it basically and you're like, "Oh, my God, I can't believe I'm seeing this again."

[0:17:40] RK: Yes, I think one thing that is very common that I've seen is people really overemphasize the glitziness of things and try to make things pretty before they think of how they actually fit together to be useful. I think that probably has to do with trying to impress people, because this is an amazing new thing and look at how amazing it is. I think this is especially true for animation. So, there's a lot of stuff that you can do that's very easy to do these days with animation on the web and everywhere. It can be used very well. So, I'm actually very much pro animation. There are some people who really hate it. I think it's very, very helpful, especially for transitions.

But it can also be overdone and there are things like when you have animated textures on things or just everything being animated, everything bouncing and having cute little things going on, it just becomes too much and that's something that I've seen in a few places that I think is a really big mistake. You have to really be very, very careful with these things, because they're very visually salient. So, animation is a lot like color. Color just draws your attention and you have to use it with care. And then it can be very useful, but it has to be used with a lot of care to not be totally distracting or overwhelming. The same is true for animation, actually, even more so because it always draws your attention. So, if something moves or it bounces, it's because it's used as an attention mechanism, to get your attention to something. But if stuff always bounces, it just gets distracting and overpowering and overwhelming.

[0:19:07] SF: Yes. If everything is attention grabbing, then basically nothing is attention grabbing, right? I mean, it's a little bit like if you're putting together PowerPoint slides. Animations and transitions can be effective tools, but if every single thing is animated or a transition, then it becomes very, very obnoxious.

[0:19:24] RK: That's a good comparison. Yes, because that's also where I've seen it, yes, for sure.

[0:19:26] SF: What are some of the trends and designs that have kind of come and gone in terms of in fashion, out of fashion?

[0:19:33] RK: So, I think there was a whole sort of school of doing things in the late nineties, early 2000s, perhaps, where there was lots of use of gradients. A lot of this has to do with technology, I think. It became easy to do gradients and shading and stuff like that. So, people did a lot of that. Now, we've kind of moved to the opposite. Everything's very kind of two dimensional, very, very much like just plain colors and nothing else. I think that, to me, is a little bit of an over-response because it makes everything kind of look the same and also takes the fun out of things.

I'm a big fan of R.J. Andrews and his work on looking at historical data visualizations. And those, of course, those are hand-drawn, like charts and maps and all kinds of things. So, they use like hatching and little patterns and things like that. Those can be very effective to get your attention to something. I keep wondering, what can we bring back that's a bit like that, that's a little bit more playful, a little bit more textural and sort of like tangible that isn't just like solid colors, and nothing's allowed on the chart other than two colors and your white background? And everything like ends up looking very, very similar.

[0:20:50] SF: Yes, I mean, I think it's almost like anything where we end up with these like overcorrections of something where we kind of do something, we do it to death, basically. Then, we go like the complete opposite direction, and then that becomes the thing, and then we do that to death. And then we go in another direction. So, it's kind of like fashion, they say, what's in style then goes out of style and then it's in style again.

[0:21:11] RK: Yes, definitely.

SF: What is an example of a problem that was solved through an information visualization technique that maybe wouldn't have been able to be solved otherwise?

[0:21:23] RK: So, one example that we had at Observable was, we make these charts of traffic. We look at our traffic on the Observable website and try to figure out, essentially, where does it come from? What is it for? And these are very dense plots. So, this is something that also we could talk about, perhaps, this distinction between trying to have everything be a summary plot versus showing essentially every single data point. But one of the charts that we have is very much every single data point from our logs to show essentially all the data for every day when a certain path was hit. They're color-coded. We actually have an example on our website for that, what it looks like.

The thing that you see when you do that, you have this very dense display of a very dense point cloud. It just seems random at first. But then you see some patterns, and sometimes you see interesting, kind of weird lines going through, or just darker clusters where there is a certain thing that's being hit more than others. The color here actually, is essentially the path. So, we have certain paths that we pick out and those different colors assigned to them.

So, we found very interesting patterns, some that we were actually looking for, because sometimes there was a lot of traffic and things were slowing down and we were like, "Well, what's causing this?" And we found people scraping the site. So, we were able to shut that down if they're causing too much trouble because they're just like hitting certain paths all the time that are expensive and slow. They're taking up a lot of resources.

But also, we found some other patterns that we just weren't aware of and we're like, "Oh, people do this. We didn't know that." For example, this is a kind of scraping thing, but people were looking at all of our profile images of our users and scraping those. We were like, "Well, why are you doing this?" And then in general, it's just traffic, the amount of traffic coming in. So, you see where that goes and that can help to balance out or to figure out where to put resources or where to maybe reroute things so that they're more efficient.

There are lots of little things you can find, just by looking at these patterns. In this case, they had to really be individual data points, really, it's hard to see this in a summary chart. But you get lots of information from these things that give you an overview of that kind of data. Another example is also, what do people search for? We have this search explorer that lets you see when people type into a search field, what do they type next? So, as they search, because there is a preview that's happening as the search is running, and so it shows us common search terms. Also, if you start searching this thing, well, where are you going? This could actually help build a recommendation system that would help you find the things faster. That's another thing where you have to really dig into all of the records and dig out those things that are common among them that are otherwise hard to find, because they're really weird, like sub-patterns in a huge morass of data. But you're picking out those things that are structured that you can find and pull out.
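The "what do they type next" idea can be sketched as a simple transition count. This is only an illustration in Python under assumed inputs: the session format (a list of successive queries per visitor), the function names, and the sample data are hypothetical and are not a description of how Observable's search explorer is built.

```python
from collections import Counter, defaultdict


def next_query_counts(sessions):
    """For each query, count which query followed it within the same session."""
    followers = defaultdict(Counter)
    for queries in sessions:
        # Pair each query with the one typed right after it.
        for current, following in zip(queries, queries[1:]):
            followers[current][following] += 1
    return followers


def suggest(followers, query, k=3):
    """Return up to k of the most common follow-up queries for `query`."""
    return [q for q, _ in followers.get(query, Counter()).most_common(k)]


# Made-up search sessions, one list of successive queries per visitor.
sessions = [
    ["bar", "bar chart", "stacked bar chart"],
    ["bar", "bar chart"],
    ["bar", "bar chart"],
    ["bar", "bar race"],
    ["map", "choropleth"],
]

followers = next_query_counts(sessions)
top_after_bar = suggest(followers, "bar", k=2)
```

Counts like these are the kind of structure that could feed a recommendation feature, although a real system would need to handle far messier logs.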

[0:24:12] SF: In terms of like using visuals to help people essentially recognize different patterns within their data, how does that compare or contrast to using some sort of like statistical method that I could use around pattern recognition that just gives me the answer?

[0:24:29] RK: Well, the question is, does it actually give you the answer, right? So, if it's more exploration of the data and more open-ended, then data visualization is clearly the better way to do it, because you don't exactly know what you're looking for. It's hard to know beforehand what pattern it is that you want to pull out. Especially, like my example with the traffic, it's like, well, we see some of these lines going through our clouds of points. So, we can ask, well, what is that? Or when there's suddenly a little cloud of one color among the other colors that usually isn't there, that's the kind of thing that you pick out usually very easily, that you wouldn't know what to look for.

So, if you can formulate your question in very precise terms, then you probably just want a query or a statistical method of some kind. But if the goal is to say, "Well, I don't know what's happening, I want to find out what the issue is, or I want to find out what the cause is, or I just want to see what's happening today. What are the new things jumping out at me that I don't know about?" So, it's like these unknown unknowns. That's what visualization is great at because you can see them and you can start then digging into those patterns that you didn't expect or that just jump out at you.

[0:25:38] SF: How do you make certain visualizations understandable or is that a challenge for the layperson? If we go back even to the TreeMap example, if I don't really understand what the size of these rectangles represents, or that I could click on or interact with it, then it might be hard for me to map that mentally to whatever things I can do with it and how it can help me problem solve. So, can you talk a little bit about the challenges there?

[0:26:01] RK: Oh, for sure, yes. So, this is a very common thing, especially in data journalism, but it's also true everywhere because if you're just showing people, I guess you can assume, I mean, hopefully that you can assume that people will understand the basics of bar charts and line charts. But of course, those are the safe charts, so you can assume that people will know those. If you're using something like a scatter plot or a TreeMap or a Sankey diagram, then it gets a lot more important to make sure that you even know that the people that you're showing this to know what that is, or that you provide enough context.

So, in data journalism, I think they do a really good job of what they call the annotation layer, where they add annotations and examples and guide you through it, and say, "Well, here, this is a big thing and this is twice as much as that thing and it's important because of that." This is both again in the data journalism case usually what you get is people walk you through the data as much as they walk you through the visualization and that can be a good way to kind of just teach you something along the way because you're interested in the content more than the vis, usually you're going to go along with that and sort of like pick that up.

That's been a big discussion for a long time about, well, can you use more complex, more unusual charts in a news graphic, for example, when you don't know if people will want to spend the time to read it or just find it confusing? So, it depends very much on the publication and on your audience, whether you expect them to do that. Also, I guess, on the kind of coverage, or the kind of news you're after. If it's breaking news, you probably don't want to do that because you want people to quickly pick it up and understand what's going on. If it's more of a complex feature piece where you want people to spend time and really dig deeper into it and understand what you're trying to tell them, then you're going to do something more complex, but you have to explain that.

So, in the business use case, since of course data journalism is interesting, but it's not everybody's work, of course. In the business use case, I think the important question is always who's your audience and are you going to be there to explain something to them? Or is this more of a presentation style thing? Or is this something you're sending to people and then you have to figure out if you're going to have some kind of explainer, or if you're going to use a simpler chart just to be safe?

[0:28:12] SF: Yes. Clearly, there's different categories and expectations. If I'm interacting with a tool like a Tableau or something from Palantir or one of these types of companies, there's probably an assumption around like, "Okay, this person's gone through some sort of training to understand how to use this and drive analytics to some extent." Because these are exploratory tools that I can use to come up with hypotheses to run further investigations and things like that, versus like, "Hey, I'm going to just send this to somebody, maybe I don't even know what the role is," in the sort of journalism case, or a particular business user. Then expectations about usability are going to be different.

[0:28:50] RK: Yes, for sure. That's where it's like, whether you're talking to your immediate team and the people who are working with those tools and who are versed in those visualizations and in the data too, versus people who are sort of the recipients of, or the consumers of, your work, who may or may not know either the visuals or the data. So, it can be important there also to add more context to what does this data actually mean? What are we looking at here? Make sure that people are clear on the context and what's being measured and what the relevant goals or KPIs are, for sure.

[0:29:26] SF: The fact that we have more compute, more memory available to us now to be able to render things quicker, deliver more data faster, has it unlocked certain types of visualizations that previously were just not possible?

[0:29:43] RK: Yes, for sure. So, there are examples, like I was just thinking of, I have this sort of like this mental image in front of me, but there's this chart of our network traffic or website traffic, I guess, on Observable, where it shows you a large number. I don't actually remember what the time horizon is, but it's like a couple of weeks or so, and you have essentially all the traffic being rendered onto that, which is a lot. You have a lot of points being drawn on there, like millions of points. This is all happening in the browser. So, it has to download that data set. This is like a parquet file that it downloads, which is a very efficient way of storing it. But still, we haven't had that for that long.

Then, now it's possible to just render that. In that case, I guess this is probably Canvas, but you're rendering in the browser. There's enough power, certainly, on your desktop machine to deal with several million records and just render those. It takes, I don't know, a couple seconds maybe to do it. So, it's very fast. And then, you know, you can still point to things, and it can query from the location what that record actually is and show you more information. So, it's not just that you can render an image, but you still have that data available, and you can still query it and filter and do all kinds of things just in the browser.

So, that's really extremely powerful. We tend to kind of underappreciate just how much power we have in our laptops, even in our phones. I mean, they're extremely powerful computers, and they can do a lot. I think this is still not quite fully utilized and fully understood, but there's a lot that we do there. Especially with interaction, like because we can have the data right here and even downloading a few million records is quite fast these days, given networks, but also just given storage formats like Parquet, and DuckDB. DuckDB is a very powerful way of doing this because it has its own format that's also very efficient, but then also you can run it in the browser, and it's just fast and efficient. You can just run queries and do it interactively, move a slider that runs a query and renders its output and it all is done in fractions of a second even over fairly large data sets. So yes, absolutely. There's a lot of power we have today.

[0:31:50] SF: Are there certain visuals that have become more mainstream over the last decade or so? If you think about graphs, for example, you have your standard pie charts, your bar charts, things that kids are learning at school and you can find in any spreadsheet type of software. But has there been a new introduction to that space that has gone mainstream that wasn't there before?

[0:32:15] RK: Well, nothing that I can think of right now that's very recent other than the TreeMap. So, the TreeMap certainly had a big impact there, but of course, the TreeMap itself is from the mid-90s, but I think it took a while for people to realize what it can be used for, other than the Steve trees, as I was saying earlier. But I'm not sure if there is a specific - well, I guess one thing that is much more common these days, because it's just much easier to do them, is maps. Maps used to be pretty difficult to do on dashboards and data apps. Now, it's really easy to do that, because they're very fast. It's kind of hard to imagine a time before, like, Google Maps. The way we used to do maps on the web, it used to be very slow, and they would have to be rendered somewhere else. Today, it's a bit like what I was saying earlier about being able to render those things quickly.

You can have a lot of map data, just base map data, and render that quickly in your browser, and then render data on top of that, and then be able to move that, or zoom in and out, and things like that, and it renders and just happens right there, and it's just seamless and fast. So, that's something that I think is more common now because it's so much faster. And because maps, for better or worse, are very popular, I think, as a representation, even though they're not necessarily the best representation for a lot of data, because I think a lot of people overestimate how important location is for a lot of data. But once you have location, it can be helpful to have that sort of as context. But yes, that's something that's become more popular, I think, because it's just so much easier today.

[0:33:45] SF: Yes. That's also a good example of something that without the growth in compute storage and high-speed internet probably wouldn't have been possible 20 years ago or something like that.

[0:33:55] RK: Yes.

[0:33:55] SF: I mean, anybody who lived through the MapQuest days and even the previous thing before even that existed, there was a real dark age for a lot of people there. So, we're living in the height of technology now. We have it much easier.

[0:34:09] RK: Yes. I think people who haven't lived through that can't imagine what that was like.

[0:34:13] SF: Yes, exactly. When you stopped to go to a library to look things up? So, you post a lot of your thoughts and writings on eagereyes. Can you talk a little bit about that site? Why did you start that?

[0:34:22] RK: I mean, honestly, it has been a little bit inactive or at least kind of dormant for a little while. But yes, eagereyes has been my website, my blog for a long time. I started that, I don't know, 18 years ago now. So, it's been a while. I don't remember exactly what year I even started. So, this was my way to sort of get word out about my research and my work and my thinking. It was just like, what's happening in my head about data vis, I want to talk about it. I think this was probably an early sort of way of trying to maybe rebel against academia and also trying to kind of go beyond it, because academic publishing takes a lot of time. It takes a long time to get things out from when you do them.

So, getting things out to just people who can read a blog, it takes seconds technically, or it's much, much faster. You can write a blog post in a couple hours versus a paper that takes weeks or months to write, and then it takes months, if not years, to go through review processes and publishing and whatever. So, it was just a way to get things out faster. Also, to tell people about my research who weren't in the academic research community. A lot of people who are interested in data vis, this is also to your question earlier, I think, about the relationship between what people expect from research versus what research does. A lot of people want to do work using data visualization that is informed by what's happening in research, but they don't know what's out there.

If I don't know that, well, I mean, how do I find that information? So, you're not going to read all those papers because some of them are actually hard to get access to, and it's a lot of effort to find the right papers. But if you follow a few people who blog, and there are academics who blog, and so I was one of the few I guess back then, it's still not that common. But you can now learn about what's happening, both what that person is doing, like what I was doing in particular, but also, I wrote about other people's work when I was at conferences or read a paper about something that I found interesting.

So, that was just my way to kind of get word out and say, "Hey, there's interesting work happening in this space, and here is some of it. Here's my work, and here is some other people's work. And here are also just random thoughts that I had around how to use pie charts and what else to do." So, that's what it's all about.

[0:36:38] SF: What's one of your most popular posts?

[0:36:40] RK: My most popular post, I wrote a review of Edward Tufte's course that was not very kind because I was just not very excited about what he was presenting and that got a lot of traffic and a lot of comments. So, that's my most commented on and perennially, my most popular post. But most of the other ones are about specific techniques, like I've written about pie charts, I've written about TreeMaps, and those are things that people find and seem to find useful, where I talk a bit about the background, how these things work, how to use them, things like that. It tends to be sort of like the bread and butter, or maybe not bread and butter, but the background posts that tell you a little bit about how something works and then the practical side of it as well that seem to do well.

[0:37:20] SF: What would you say is the biggest problem that people in the data visualization field are trying to solve now? Is there a class of open or unsolved problems that people are really keen to try to wrap their heads around?

[0:37:36] RK: There are a lot of questions that people are working on. I think one thing that's been a topic for the last couple of years is figuring out if there is a way to use this wave of Generative AI for data visualization in some way. Of course, also, how to use data vis to work with these AI tools and understanding models, explaining models, helping with decision-making based on something that some AI model or LLM produces.

So, if you can understand some of the background behind that, maybe that'll help you understand where it's coming from. But I think especially this question about, well, where is it all going? If perhaps it's possible for some AI model, like I was saying earlier, finding those patterns, right? Those patterns that are unexpected. Well, if there are ways to pull those out using some machine learning thing, perhaps, well, then that would be interesting to know, because then, I don't know, that would inform what data visualization is good for or not.

I think that there are a lot of people working, or at least some people working on that, especially because it's such a hot topic right now. But there's also, there's these ongoing questions about, well, how do you interact with a data visualization? What's the right way to dig deeper into something? Also, how to even just build something from scratch? If you're starting to dig into data, where do you start? How do you help people figure out where to look and what to build from there? That, of course, all kind of flows together with this whole AI discussion. Those are some of the topics that I can think of right now.

[0:39:11] SF: Do you think that is probably going to be a big area of focus in the space for the foreseeable future? What is Generative AI's impact on information visualization? How can you leverage it there? What are some of the new types of things that you might be able to do that you couldn't do previously?

[0:39:28] RK: I have to say that that's actually a part where I'm not super aware of what's happening right now. I know that there are a few people working on this, but I can't really speak to what specifically has come out because I'm a little bit behind on my reading. But I'm sure it will be a topic going forward, and especially once people are able to build their own and train their own models, which I don't know if anybody has done. But so far, I've seen people use like ChatGPT to help them write code of some sort to create visualizations, and that's also been incorporated into some products, I think.

But that aside, I think there's a bigger question of, well, if we had some kind of corpus of visualizations and what they're good for, what they show, can we then find a way to kind of query that? I don't know how much work has been done in that space. So, I don't think I can comment on that.

[0:40:21] SF: Yes, it'd be interesting if you could say, here's the data that I want to make available for analysis or something like that, and create more of an exploratory experience of what visualization makes the most sense to support the exploration of this data.

[0:40:38] RK: As I was saying earlier, these recommendation systems have been around for a while. So, this is not necessarily a new question, but it's certainly whether there are better ways of doing this, that I think is going to be the interesting thing to see and to watch, because a lot of this so far has been focused on the data structure, basically. What data types do I have? Are these categorical? Are these numerical? What are they called? So, are there things like date fields, for example, or time? Are they like currencies, things like that? Then you get a sense of like, what will you do with that, and which ones are the most likely to be important?

When you have a time or a date field, then that's very often important. You can draw conclusions from that. Then once you've picked a few things, or once the user has picked a few things that they are interested in, then you can say, "Well, there are certain rules that let you build charts." So, you can just build a bar chart because you have a categorical and a numerical dimension, or you have two numerical dimensions, then you maybe do a scatterplot or whatever.
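Editor's note: the type-based rules Robert describes here can be sketched roughly as follows. This is a minimal illustration, not how Tableau or any other product works. The names `Field`, `FieldKind`, `ChartType`, and `recommendChart` are hypothetical, and the line-chart rule for date fields is an added assumption; only the bar chart and scatterplot rules come from the conversation.

```typescript
// Hypothetical field kinds; a real tool would infer these from the data.
type FieldKind = "categorical" | "numerical" | "temporal";

interface Field {
  name: string;
  kind: FieldKind;
}

type ChartType = "bar" | "scatterplot" | "line" | "table";

// Illustrative rules only: map the kinds of two chosen fields to a chart type.
function recommendChart(a: Field, b: Field): ChartType {
  // Sort the two kinds so the rule does not depend on argument order.
  const kinds = [a.kind, b.kind].sort().join("+");
  switch (kinds) {
    case "categorical+numerical":
      return "bar"; // one category, one measure
    case "numerical+numerical":
      return "scatterplot"; // two measures
    case "numerical+temporal":
      return "line"; // assumption: a measure over time
    default:
      return "table"; // no rule matched; fall back to a plain table
  }
}
```

Rules like these only look at data types, so they can produce a valid chart without knowing whether it answers the user's actual question.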

But I think there is, and there are - this is not a new thing, really. This has been around for a good while. So, as I was mentioning earlier, there's the APT system from 1985 that was doing that, that was basically looking at data types and building charts based on that, or encoding things dependent on what they were. This has been incorporated into Tableau and all kinds of chart tools and builders. I think Google Sheets has a way to recommend charts, for example, that's relatively recent. I think like a few years old, but that is based on a similar idea, that it has a recommendation system that then builds charts for you.

But so far, they all tend to be sort of correct. They build charts that may be useful. But whether they're actually useful or not is a totally different question, because very often they just show you stuff where you're like, "Okay, well, I didn't actually ask for this. I didn't need this particular thing." But they don't know what's important to you and what actually helps you understand your data or answer your question. So, I think that's really where the question will be, can we guide these systems so that they produce more useful output than they have so far.

[0:42:47] SF: Well, awesome. Is there anything else you'd like to share?

[0:42:49] RK: Yes. I mean, we could talk a bit about maybe D3 or Observable's data vis tools, if that's of interest?

[0:42:57] SF: Yes, absolutely. I mean, I'm familiar with D3, but maybe just quickly give a little bit of background on that, and then what are some of the things that you're focused on as a company with the investment in D3?

[0:43:08] RK: So, D3 is a data visualization library that came out of Mike Bostock's work during his PhD, I think. It stands for Data-Driven Documents. Just as background, D3 became a very popular way of doing data visualization on the web because that was really hard, especially back then. When this came out in like 2011, people were still sort of using Flash. Actually, this was like the tail end of Flash, I think, and people were looking for something new and better because Flash wasn't working on their iPhones and it was just also going away at that point.

So, what D3 made possible was to essentially tie SVG elements. You build a data visualization in an SVG object within a website, tie those objects, these elements of an SVG, like your bar rectangles or your scatterplot dots, tie those to data values. And what that does is it means that when the data values change, your visuals change or can change if you do it right. You can have a chart that animates between filter states, for example, or that can morph between a bar chart and a scatter plot and a line chart, whatever. So, you can do lots of very complex data visualizations, but it's also a fairly complex library. Even building a bar chart is sort of a fair amount of work.
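Editor's note: the idea of tying visual elements to data values can be sketched without D3 itself. The code below is a simplified illustration, not D3's implementation or API. The types `Datum`, `Mark`, `JoinResult` and the function `joinMarks` are hypothetical names. It splits a new dataset against existing marks into the three groups that D3's data join distinguishes: entering data, updated marks, and exiting marks.

```typescript
// Hypothetical stand-ins: a data record and a drawn mark (e.g. one bar).
interface Datum {
  key: string;
  value: number;
}

interface Mark {
  key: string;
  height: number;
}

interface JoinResult {
  enter: Datum[]; // data with no existing mark: create new marks
  update: Mark[]; // existing marks whose data changed: restyle them
  exit: Mark[]; // marks whose data is gone: remove them
}

// Match existing marks to new data by key (a simplified sketch of a data join).
function joinMarks(marks: Mark[], data: Datum[]): JoinResult {
  const hasMark = (d: Datum) => marks.some((m) => m.key === d.key);
  const datumFor = (m: Mark) => data.filter((d) => d.key === m.key)[0];

  return {
    enter: data.filter((d) => !hasMark(d)),
    update: marks
      .filter((m) => datumFor(m) !== undefined)
      .map((m) => ({ key: m.key, height: datumFor(m).value })),
    exit: marks.filter((m) => datumFor(m) === undefined),
  };
}
```

Because each mark keeps its identity across data changes, a renderer can animate the updated marks from their old height to the new one instead of redrawing the whole chart.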

What we've been doing at Observable is trying to build tools that help you when it's not so much about building complex charts like you usually do with D3, like very fancy, very elaborate things, but simple charts. We have a new library that's called Observable Plot that's built around that. So, it's a much more direct mapping of data into visuals. Then, you can't build everything that D3 can do, but what you can build is much faster, much easier to build that way. It's a way to do exploration and analysis versus the bespoke and very cool stuff that you can do with D3. So, you have kind of different ways of working and building things that are still web-native and very much part of whatever web-based system you want to build.

[0:45:14] SF: Given that D3 has been around now for well over a decade, how has it had to evolve its approach to continue to stay relevant?

[0:45:23] RK: Well, Mike has done a lot of work. So, Mike Bostock, who built D3, he has done a lot of work to essentially keep it up to date with web standards and JavaScript standards. I think he changed the way the modules work. He broke it up into modules, so now you can just take the pieces you want. D3 has become this library of utilities that can do all kinds of things. Even if you don't use it for your visuals, you might be using it for your array operations, because it has lots of cool operations for that, like grouping and nesting and stuff like that. He's also built a lot more, and, well, he and a few other collaborators have built some more layout components, for example, or maps, too. Mapping is a big deal with D3. You can have any projection you want. You can have all kinds of really interesting interactions with maps that are all possible to build with D3 and relatively easy actually now.

Then layouts, like you can have like - and so this means things like TreeMaps and even pie charts, because building those from scratch, from the sectors of a circle, is kind of a pain. There are layout systems for that that will do all the stacking and whatever for you that build all kinds of different charts for you. You can more easily build things that are - where you don't have to do all the operations yourself. Of course, so I guess layouts and maps, but also force-directed layouts, so like graphs and things like that. Those are things that are less common in BI tools, but those are things that D3 is also really good at because you have to actually run a simulation to build, to have the layout be computed over time, and so it has ways of doing that.
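Editor's note: to show the kind of work a layout does for you, here is a minimal sketch of the arithmetic behind a pie layout. It is not D3's code. The `Arc` type and `pieAngles` function are hypothetical names, and the sketch keeps slices in input order and ignores padding between them.

```typescript
// One pie slice, with angles in radians measured from the start of the circle.
interface Arc {
  value: number;
  startAngle: number;
  endAngle: number;
}

// Turn raw values into consecutive slices that together span the full circle.
function pieAngles(values: number[]): Arc[] {
  const total = values.reduce((sum, v) => sum + v, 0);
  let angle = 0; // running end angle of the previous slice

  return values.map((value) => {
    const startAngle = angle;
    // Each slice gets a share of 2π proportional to its value.
    angle += total > 0 ? (value / total) * 2 * Math.PI : 0;
    return { value, startAngle, endAngle: angle };
  });
}
```

For example, `pieAngles([1, 1, 2])` yields three slices covering a quarter, a quarter, and half of the circle; drawing the actual sector shapes from those angles is a separate step.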

[0:46:59] SF: Yes. I mean, graphs are a common visual with workflow orchestration tools, so the no-code orchestration frameworks and stuff like that, that exist. Now, actually you're even seeing a lot of that stuff in the Gen AI space, with these sort of low-code agentic frameworks where you're stitching, basically, nodes and edges together.

[0:47:18] RK: Yes.

[0:47:20] SF: So, I wouldn't be surprised if there's significant growth and investment in those kinds of graphics as we continue to build more and more sort of Gen AI dev tools.

Well, Robert, thanks so much for being here. I really enjoyed this.

[0:47:32] RK: Thank you. It was great.

[0:47:34] SF: Cheers.

[END]