This semester I started a new graduate course at Northeastern on Visualization for Machine Learning. I am particularly excited about this course because it strongly connects with our research over the last few years. As usual, teaching is an excellent way to understand our research work better. We are about six weeks into the semester, and I am ready to share some details and observations.
The course targets graduate students, and for this reason, it is heavily based on paper reading. However, when I started designing the course, I wanted to teach something other than a usual seminar course where students read and present papers. First, most students find paper presentations quite dull. Second, I felt students had much more to learn than they could get from reading papers alone. For this reason, the course also includes hands-on coding labs and design exercises.
In addition, we will have a set of invited speakers who will talk about their research work in this area.
The biggest challenge I had to confront was to come up with an organization of the topics that made sense. My goal was to organize existing works into coherent categories and then highlight existing gaps. Above all, I hoped to create a structure that could work as a useful mental map of the field for myself and my students. After several iterations, I came up with a structure that I am pleased with. The existing literature is organized around three main categories: visualizing model data, visualizing model extrapolations and explanations, and visualizing learned representations. I like this structure because it organizes existing methods according to what type of information they use: the data, the extrapolations and explanations, and the structural components. These categories also map fairly well onto the distinction between what the model does and how the model does it.
The reading list includes two additional elements. First, I included a few papers that introduce the problems data visualization can help solve in machine learning (mostly papers that report on studies conducted with actual practitioners within companies). Second, I included a few introductory data visualization readings for students unfamiliar with the topic. Summarizing visualization research in one single week turned out to be quite a challenge, but I am happy with the results. Doing this work forced me to decide what is essential in visualization, and I think I have been able to distill a few relevant notions. But this is too long to describe here and deserves its own blog post!
If you want to take a look, you can find the current version of the reading list in this Google Doc file (note that I will keep editing it until the end of the semester). Eventually, I’d like to share it here with more comments and a nicer format.
For this course, I wanted to ensure that students would end the semester with practical coding skills for analyzing models with ML interpretability methods and data visualizations.
After several discussions, my Ph.D. student Daniel (who is creating a fantastic set of labs) and I decided to focus on Jupyter notebooks and the Altair library, and both choices turned out to be great. We chose to use Jupyter and Python because they are the industry standard, and you can’t really go wrong with them. We decided to use Altair because it provides reasonable defaults while being sufficiently expressive to generate the complex visualizations we sometimes need in this course. The most significant advantage we found with Altair is interactivity. Making charts interactive and creating multiple linked views turned out to be a real winner, and the students are having a blast!
Designing notebooks for ML visualization has been an exciting experience. In the beginning, we were worried that we would not be able to build the fancy visualizations that sometimes appear in visualization papers. But our experience has been that either these visualizations can be simplified and recreated with the Jupyter + Altair combo, or the fancy versions are unnecessary in the first place. This is an area where I am learning a lot myself, leading me to reconsider the utility of overly fancy visual representations.
As a visualization instructor, one of my strongest beliefs is that you can’t learn visualization by talking about visualization; you need to do it! For this reason, the course also includes design exercises that expose the students to the challenge of designing visualizations for machine learning problems. We ask the students to sketch solutions for machine learning problems at home, and then we have a whole class where they compare and discuss their solutions and try to improve their initial ideas. While I am still refining these exercises, I find them inspiring and fun, and I believe they are very instructive for our students.
I am pretty happy with how the course is evolving. Designing and teaching it has helped me understand this research area way better than I did. I have a much more defined mental model of what visualization can do for machine learning and where some relevant gaps exist.
There is a lot more that I would like to share. I wanted to create a website for the course, but I have not found the time to do it so far. Eventually, I plan to share all the material I have to help others develop a similar course. If you have any questions in the meantime, please do reach out to me.