
The Data Daily

Apple Researchers Introduce 'NEO' to Generalize Confusion Matrix Visualization and Enable Machine Learning Practitioners to Find Hidden Confusions


Machine learning is a complex, iterative design and development process aimed at creating a learned model that generalizes to new data inputs. Model assessment, which involves testing and analyzing a model’s performance on held-out test sets of data with known labels, is an important stage. Because of the magnitude of today’s machine learning applications, interactive data visualization has proven to be a helpful tool for assisting humans in comprehending model performance.

The confusion matrix is a tabular layout that compares the predicted class label against the actual class label for each class across all data examples. It is a common visualization for model evaluation, particularly for classification models. In a typical arrangement, the confusion matrix’s rows indicate actual class labels, while the columns represent predicted class labels (equivalently, these can be flipped via a matrix transpose).
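To make the layout concrete, here is a minimal sketch that builds a confusion matrix from two label lists, with rows as actual classes and columns as predicted classes. The function name, class names, and toy data are illustrative assumptions, not taken from the NEO paper.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

# Toy labels, invented for illustration only.
classes = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat", "bird"]

matrix = confusion_matrix(actual, predicted, classes)

# Transposing swaps the roles: rows become predicted, columns become actual.
transposed = [list(column) for column in zip(*matrix)]

for name, row in zip(classes, matrix):
    print(f"{name:>5}", row)
```

Each diagonal cell counts correct predictions for one class, and every other cell counts one specific confusion between a pair of classes.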

Many machine learning courses teach these visuals, which are then used in practice to show which pairs of classes a model confuses. To summarise, confusion matrices are the “go-to” graphic for evaluating classification models.

Despite their widespread use, traditional confusion matrices have a number of usability issues. Confusion matrices display a visual proxy for accuracy (i.e., the entries on the matrix’s diagonal), which has been found to be insufficient for many evaluations. Furthermore, the diagonal of a confusion matrix often contains far more instances (by orders of magnitude) than the off-diagonal entries, which visually obscures the confusions that matter most (i.e., the off-diagonal entries).
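The following sketch illustrates the hiding effect on a made-up matrix: overall accuracy looks excellent, yet one pair of classes accounts for most of the errors. Ignoring the diagonal and ranking the remaining cells is one simple way to surface them; this is an illustrative workaround, not a description of how NEO does it, and all names and numbers are assumptions.

```python
def top_confusions(matrix, classes, k=3):
    """Return the k largest off-diagonal cells as (actual, predicted, count)."""
    cells = [
        (classes[i], classes[j], matrix[i][j])
        for i in range(len(classes))
        for j in range(len(classes))
        if i != j and matrix[i][j] > 0
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]

# Invented counts: the diagonal is orders of magnitude larger than the rest.
classes = ["cat", "dog", "bird"]
matrix = [
    [9500, 40, 3],
    [12, 9800, 1],
    [7, 2, 9900],
]

correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / sum(map(sum, matrix))
worst = top_confusions(matrix, classes, k=2)

print(f"accuracy = {accuracy:.4f}")
print(worst)
```

With a color scale driven by the diagonal counts, the 40 cat-to-dog errors would be nearly invisible, even though they make up more than half of all mistakes in this example.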

As practitioners improve their model, the net effect is to shift off-diagonal instances onto the diagonal, which worsens the problem of hidden confusions. Paradoxically, the better optimized the model, the harder it is to find its remaining errors. Confusion matrices also suffer from scaling issues when a dataset has many classes, high class imbalance, a hierarchical structure, or multiple outputs.

Apple researchers conducted a formative research study with machine learning practitioners to better understand the limitations of confusion matrices. The team found that, in many machine learning applications, confusion matrices are challenging to use at scale, do not show other metrics that practitioners need (e.g., precision, recall), and are difficult to share. Furthermore, conventional confusion matrices support only flat, single-label data structures; hierarchical labels and multi-output labels, which are more complex but still common, are not supported.
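Precision and recall can be derived directly from the cells of a confusion matrix, which is why it is natural to show them alongside it. The sketch below uses the standard definitions and assumes rows are actual classes and columns are predicted classes; the function name and toy matrix are illustrative.

```python
def precision_recall(matrix, classes):
    """Per-class precision and recall from a rows=actual, cols=predicted matrix."""
    metrics = {}
    for k, name in enumerate(classes):
        true_positives = matrix[k][k]
        actual_total = sum(matrix[k])                    # row sum
        predicted_total = sum(row[k] for row in matrix)  # column sum
        metrics[name] = {
            "precision": true_positives / predicted_total if predicted_total else 0.0,
            "recall": true_positives / actual_total if actual_total else 0.0,
        }
    return metrics

# Toy matrix, invented for illustration only.
classes = ["cat", "dog", "bird"]
matrix = [
    [1, 1, 0],
    [0, 2, 0],
    [1, 0, 2],
]

metrics = precision_recall(matrix, classes)
for name, values in metrics.items():
    print(name, values)
```

Recall divides the diagonal cell by its row total, and precision divides it by its column total, so both metrics are already implicit in the matrix even though a plain heatmap does not display them.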

Based on findings from the formative research and a literature review, the team developed a confusion matrix algebra, which represents confusion matrices as probability distributions and offers a unified way to address the shortcomings of conventional confusion matrices. Building on this algebra, they designed and built NEO, a visual analytics system that allows practitioners to author and interact with confusion matrices in a variety of configurations and with increasingly complex label structures.
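One way to read "confusion matrices as probability distributions" is that the raw counts define a joint distribution over (actual, predicted) pairs, and that normalizing by rows or columns gives conditional distributions. The sketch below follows that reading; it is an assumption about the idea, not the paper's notation or NEO's code, and the function names are invented.

```python
def joint(matrix):
    """P(actual, predicted): every cell divided by the grand total."""
    total = sum(map(sum, matrix))
    return [[count / total for count in row] for row in matrix]

def given_actual(matrix):
    """P(predicted | actual): each row divided by its row sum."""
    normalized = []
    for row in matrix:
        row_sum = sum(row)
        normalized.append([count / row_sum if row_sum else 0.0 for count in row])
    return normalized

def given_predicted(matrix):
    """P(actual | predicted): each column divided by its column sum."""
    column_sums = [sum(column) for column in zip(*matrix)]
    return [
        [count / s if s else 0.0 for count, s in zip(row, column_sums)]
        for row in matrix
    ]

# Toy two-class counts, invented for illustration only.
matrix = [
    [8, 2],
    [1, 9],
]

print(joint(matrix))
print(given_actual(matrix))
print(given_predicted(matrix))
```

Under this reading, the diagonal of the row-normalized matrix is per-class recall and the diagonal of the column-normalized matrix is per-class precision, which ties the alternative normalizations back to familiar metrics.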

NEO’s design expands the confusion matrix by allowing users to see extra metrics for analytical context, interactively analyze model confusions through alternative normalization methods, see hierarchical and multi-output labels, and easily exchange confusion matrix settings with others. NEO uses a traditional confusion matrix as the basis of the display, keeping the recognizable format of confusion matrices.

The primary contributions of the paper are as follows:

• A formative research study, based on a survey of machine learning practitioners at Apple, of how confusion matrices and model evaluation visualizations are used in practice, including common problems and analysis tasks.

• A confusion matrix algebra that models confusion matrices as probability distributions and generalizes them.

• NEO, a visual analytics system for building and interacting with confusion matrices that supports hierarchical and multi-output labels. NEO also provides a specification (or “spec”) that lets users share specific visualizations with others. NEO is reactive: updating the spec updates the visualization, and interacting with the visualization updates the spec.

• Three model evaluation scenarios (object detection, large-scale image classification, and multi-output online toxicity detection) that show how NEO can help practitioners evaluate machine learning models across domains and modeling tasks.
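To give a feel for what hierarchical labels add, the sketch below collapses a leaf-level confusion matrix into a parent-level one by summing the cells whose actual and predicted classes share the same parents. The hierarchy, class names, counts, and the `collapse` helper are all hypothetical; this is not NEO's implementation.

```python
def collapse(matrix, classes, parent_of):
    """Aggregate a leaf-level confusion matrix up to the parent level."""
    parents = sorted({parent_of[name] for name in classes})
    index = {parent: i for i, parent in enumerate(parents)}
    collapsed = [[0] * len(parents) for _ in parents]
    for i, actual in enumerate(classes):
        for j, predicted in enumerate(classes):
            row = index[parent_of[actual]]
            column = index[parent_of[predicted]]
            collapsed[row][column] += matrix[i][j]
    return parents, collapsed

# Invented two-level hierarchy and counts, for illustration only.
classes = ["apple", "orange", "carrot", "potato"]
parent_of = {
    "apple": "fruit",
    "orange": "fruit",
    "carrot": "vegetable",
    "potato": "vegetable",
}
matrix = [
    [50, 5, 1, 0],
    [4, 40, 0, 2],
    [0, 1, 30, 6],
    [1, 0, 7, 45],
]

parents, collapsed = collapse(matrix, classes, parent_of)
print(parents)
print(collapsed)
```

Collapsing moves confusions between siblings (apple versus orange) onto the diagonal of the coarser matrix, so a practitioner can separate errors that cross category boundaries from errors that stay within a category.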

Confusion matrices have an expected and borderline “standardized” representation, whereas many machine learning visualizations don’t have one. Rather than redesigning the confusion matrix visualization, the key design goal for NEO was to take advantage of its familiarity and enhance its functionality with additional views and interactivity. NEO displays a standard confusion matrix in the simplest case, where a practitioner has a classification model with a dataset whose instances have no hierarchy and only one class label. Even in these circumstances, however, there is still room to improve model evaluation through interaction.

NEO is a modern web-based system built with Svelte, TypeScript, and D3. The specification is implemented in a portable JSON format so that confusion matrix configurations can be readily shared with other stakeholders. In terms of scalability, NEO is constrained by standard browser limits on SVG rendering (e.g., displaying tens of thousands of SVG elements). The researchers note that engineering efforts such as rendering with Canvas or WebGL would remove this constraint, but they argue that better interactions for configuring confusion matrices to compare relevant classes and submatrices are more helpful to practitioners than rendering the largest possible matrix.
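The article does not reproduce NEO's spec format, so the sketch below only illustrates the general idea of a shareable JSON configuration. Every field name here (`classes`, `normalization`, `collapsed`, `measures`) is a hypothetical placeholder, not NEO's actual schema.

```python
import json

# Hypothetical configuration; the field names are NOT NEO's real spec keys.
spec = {
    "classes": ["apple", "orange", "carrot", "potato"],
    "normalization": "row",
    "collapsed": ["fruit"],
    "measures": ["precision", "recall"],
}

# Serialize to portable text that can be sent to a colleague or stored in a repo.
text = json.dumps(spec, indent=2, sort_keys=True)

# The recipient parses the text back into the same configuration.
restored = json.loads(text)

print(text)
```

Because the configuration is plain JSON, it survives a round trip through text unchanged, which is the property that makes a particular view of a confusion matrix easy to share and reproduce.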

In summary, the paper generalizes the capabilities of confusion matrices while preserving their familiar representation. Drawing on formative research, the team developed an algebra that represents confusion matrices as probability distributions and can express more variations of confusion matrices, such as those for datasets with hierarchical and multi-output labels. The team used this algebra to create NEO, a visual analytics system that allows practitioners to author, interact with, and share confusion matrices in a variety of configurations. Finally, the researchers demonstrate NEO’s utility through three model evaluation scenarios that help practitioners better understand model performance and uncover hidden confusions.
