Translating networks: linking visualization to statistics

2-minute read, 15 with the table of correspondence.
All the pictures in this post are by Martin Grandjean. He’s good!
With Martin Grandjean, we have co-authored a short paper titled Translating networks for the Digital Humanities Conference 2019 in Utrecht. It seeks to bridge visualization with statistical measures of networks.
At least two fields are deeply concerned with measuring networks: social network analysis (the older one) and network science (since the end of the nineties). Network analysis is also sometimes presented as a field. These three overlap considerably anyway (and also have crucial differences, but that is another story). I just want to acknowledge that there is a huge literature on measuring networks. Networks, by Mark Newman, is an excellent and exhaustive source on that matter.
As I argued in my previous post, the problem with network maps is that we do not know what to trust. Exploratory Data Analysis provides a useful frame: the visualization helps build hypotheses, while statistics allow testing them and building evidence. With Martin, and more generally with Tommaso Venturini through our work on Visual Network Analysis (see also this post), we take this approach seriously. We want to operationalize it. This requires a grammar supporting the translation of a visual hypothesis into a statistical hypothesis.
The short paper we wrote with Martin is worth reading, and you might have come across it already. Here I want to highlight its appendix, because it might be the most immediately useful part. It consists of a table where different visual features are put into correspondence with statistical metrics. In addition, we discuss the usefulness of the translation, and its potential issues. It can be used as teaching material on how to interpret a visualization in an analytical context, and it can help navigate the different things to pay attention to in a network map, and/or the different applicable metrics and what they mean.
To encourage you to take a look, and for the sake of clarity, I reproduce the content of the table below.
Table of correspondence
This table of correspondence between network analysis concepts and interpretations or “translations” is a work in progress. Feel free to comment and share feedback, as we want to improve it, and we cannot do that alone. We plan to build a more complete version at some point.
Graph size (nodes)
How many nodes are there?
Visual analysis
Visual pattern: There are few or many nodes.
Hermeneutic criterion: counting the nodes.
Issue: Though it is easy to estimate the total number of nodes, visualization decisions (for example, setting node sizes on a large scale) can make nodes with few connections difficult to see.
Computational analysis
Topological pattern: Nodes count.
Computational criterion: Counting the nodes.
Issues: None. Note that in graph theory the count of nodes is referred to as “graph order” while the “graph size” refers to the count of edges.
Interpretative potential
Usefulness: Very basic information but useful when comparing networks.
Translation/intuition: The intuition of the size of a network is appropriate.
Graph size (edges)
How many edges are there?
Visual analysis
Visual pattern: There are few or many edges.
Hermeneutic criterion: counting the edges.
Issue: The total number is hard to estimate as soon as the graph is not a simple diagram anymore. The distribution of edges and their weight has an influence on the visual estimation. The difficulty of counting edges visually is a known issue, and probably impossible to overcome.
Computational analysis
Topological pattern: Edges count.
Computational criterion: Counting the edges.
Issues: None. Note that in graph theory the count of edges is referred to as “graph size” while the count of nodes is “graph order”.
Interpretative potential
Usefulness: Very basic information but useful when comparing networks.
Translation/intuition: Sometimes translated as size in natural language, but the number of edges is usually expressed in comparison to the number of nodes to indicate density or complexity, not for itself.
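To make the order/size terminology concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function name `graph_order_and_size` and the edge-list input format are assumptions made for illustration, and the graph is treated as undirected.

```python
def graph_order_and_size(edges, isolated_nodes=()):
    """Return (order, size) of an undirected graph given as an edge list.

    order = number of nodes, size = number of edges (graph theory vocabulary).
    `isolated_nodes` lists nodes that appear in no edge (hypothetical parameter).
    """
    nodes = set(isolated_nodes)
    unique_edges = set()
    for u, v in edges:
        nodes.update((u, v))
        # (u, v) and (v, u) are the same undirected edge
        unique_edges.add(frozenset((u, v)))
    return len(nodes), len(unique_edges)


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
order, size = graph_order_and_size(edges, isolated_nodes=["e"])
print(order, size)  # prints: 5 4
```

Isolated nodes have to be passed explicitly, which echoes the visual issue above: nodes with few or no connections are the ones that are easiest to miss.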
Density
How connected are the nodes overall?
Visual analysis
Visual pattern: The graph is more or less compact. If only certain parts are more compact, see “clusters” below.
Hermeneutic criteria: Accumulation of edges, cluttered groups of nodes, “hairball.” Easier to estimate when comparing graphs.
Issues: The fewer edges there are, the easier to estimate. High densities are difficult to distinguish because the overall appearance of a graph with 60% of the possible edges will look close to that of a graph with 100%. The visual aspect also depends on the layout algorithm used: some are more efficient at representing clusters.
Computational analysis
Topological pattern: Network density.
Computational criterion: Divide the actual number of edges by the number of all potential edges.
Issues: The formula of density slightly changes depending on the type of networks (directed or not, self loops allowed or not…).
Interpretative potential
Usefulness: A very important notion that allows comparing networks of different sizes if they are produced in the same way or on the basis of comparable data sets.
Translation/intuition: Density, complexity, completeness.
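The way the density formula changes with the type of network can be sketched as follows. This is an illustrative helper, not code from the paper: the name `density` and its flags are assumptions, and only the four simple cases (directed or not, self loops or not) are covered.

```python
def density(n_nodes, n_edges, directed=False, self_loops=False):
    """Actual number of edges divided by the number of potential edges."""
    if directed:
        # every ordered pair can carry an edge
        potential = n_nodes * n_nodes if self_loops else n_nodes * (n_nodes - 1)
    else:
        # every unordered pair can carry an edge
        if self_loops:
            potential = n_nodes * (n_nodes + 1) // 2
        else:
            potential = n_nodes * (n_nodes - 1) // 2
    if potential == 0:
        return 0.0  # convention chosen here for graphs with 0 or 1 node
    return n_edges / potential


print(density(4, 3))                 # prints: 0.5
print(density(4, 3, directed=True))  # prints: 0.25
```

The same counts of nodes and edges give a density twice as low once direction is taken into account, which is why densities are only comparable between networks of the same type.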
Diameter
How far are the most distant nodes?
Visual analysis
Visual pattern: The longest of the shortest paths in the graph, or the two most visually distant nodes.
Hermeneutic criteria: Following the series of edges from one node to another, trying to find the longest one. If impossible, taking the most visually distant nodes is acceptable.
Issues: Generally hard or impossible to see except on small networks, but quite easy to estimate by following a few paths that go from one side to the other, or by just looking at the most distant nodes in the same connected component (visual distance approximately correlates with geodesic distance).
Computational analysis
Computational criterion: Maximal geodesic distance of all the pairs of nodes.
Issues: Only relevant in a connected graph.
Interpretative potential
Usefulness: Can be used to describe how the density is distributed: complex networks are often characterized by a small diameter while high diameter is frequent in geographical networks.
Translation/intuition: Size, breadth, width.
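As a rough sketch of the computational criterion, the maximal geodesic distance can be obtained by running a breadth-first search from every node. The helper names below are hypothetical, the graph is assumed to be undirected and unweighted, and the function refuses disconnected graphs, in line with the issue noted above.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic distance (number of steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(adj):
    """Longest of the shortest paths; only defined for a connected graph."""
    longest = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj):
            raise ValueError("graph is not connected; diameter is undefined")
        longest = max(longest, max(dist.values()))
    return longest


chain = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
print(diameter(chain))  # prints: 3
```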
Average path length
On average, how close are nodes to each other?
Visual analysis
Visual pattern: The average distance between a pair of nodes.
Hermeneutic criterion: Following the edges between every pair of nodes.
Issues: Impossible to calculate visually since it is an average covering a very large number of values (already difficult to calculate). Loosely relates to density, which is easier to estimate.
Computational analysis
Topological pattern: Average path length.
Computational criterion: Average number of steps along the shortest paths for all possible pairs of nodes.
Issues: Since it is an average, this value does not allow conclusions to be drawn at the individual level if the graph is strongly clustered. The average path length is more complicated for a directed graph than for an undirected graph.
Interpretative potential
Usefulness: Could serve as a complement to diameter because the latter can be influenced by a few nodes that are very far from the main component of the graph. Can replace the diameter in case of unconnected graphs.
Translation/intuition: Can be used to describe the size, breadth or width of the network. But it can also be translated into an indicator of a small world situation.
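A minimal sketch of the average path length, assuming an undirected, unweighted graph. The function names are hypothetical. One design choice is an assumption worth stating loudly: pairs of nodes with no path between them are simply left out of the average, which is one common way to make the measure usable on unconnected graphs.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all pairs that are connected by a path."""
    total, pairs = 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


chain = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
print(round(average_path_length(chain), 3))  # prints: 1.667
```

On this four-node chain the diameter is 3 while the average path length is about 1.67, which illustrates how the average is less sensitive to the few most distant nodes.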
Connectedness
Is the graph a connected system where there is a path between every pair of nodes?
Visual analysis
Visual pattern: There must be only one component, not several groups of nodes disconnected from each other.
Hermeneutic criterion: Looking at empty areas (structural holes). Groups count as disconnected only if there are no edges between them.
Issues: Depending on the layout, it is possible that “islands” hide in dense groups of nodes, but with a properly set force directed layout, the risk is marginal. Possibly the easiest property to observe visually.
Computational analysis
Topological pattern: In a connected graph, the number of connected components must be one.
Computational criterion: There must be a path between each pair of nodes.
Issues: The notion of connectedness is more complicated for a directed graph than for an undirected graph.
Interpretative potential
Usefulness: The absence of edges between components is more remarkable if they contain many nodes. In many applied cases, connected graphs are artificially created by removing solitary nodes (frequent in messy extracted data).
Translation/intuition: The network is a continent, or, on the contrary, an archipelago.
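The continent-versus-archipelago question can be answered by listing connected components. The sketch below is an illustration under assumptions (undirected graph, hypothetical function name, nodes passed explicitly so that solitary nodes are not lost).

```python
def connected_components(nodes, edges):
    """Return the list of connected components, each as a set of nodes."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # depth-first exploration of everything reachable from `start`
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(nodes, edges)
print(len(components))       # prints: 3
print(len(components) == 1)  # prints: False (an archipelago, not a continent)
```

Here the solitary node "f" counts as a component of its own, which is exactly the kind of node that is often removed to obtain a connected graph.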
Clusters / communities
What are the groups where nodes are more connected to each other?
Visual analysis
Visual pattern: Clusters (uneven distribution of nodes)
Hermeneutic criteria: Looking for groups of nodes, as visually dense and separated as possible.
Issue: Force-directed placement algorithms are known to represent clustering very well, if properly set.
Computational analysis
Topological pattern: Max modularity (modularity of the partition with the highest modularity).
Computational criterion: Running a modularity clustering detection algorithm and looking at the obtained modularity.
Issues: Maximal modularity is too hard to compute, so we rely on an estimation. Modularity is a measure of a graph partitioning, so it is necessary to partition the graph first. Different algorithms exist (Louvain, Leiden…). Other metrics than modularity also exist, though they are less widely used.
Interpretative potential
Usefulness: Useful for exploration. It is tempting to take the result of a cluster calculation as a given. In some cases, it is interesting to compare these clusters with previously known groups (categories that do not depend on the structure obtained).
Translation/intuition: The term cluster has become part of the common language, but we also like to talk about groups, communities or hubs. This notion of community is very directly related to the way in which the social sciences and humanities use the metaphor of the “network”.
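Since modularity measures a given partition, a small sketch can show what is being scored. The code below only evaluates a partition that is supplied by hand; it does not search for the best one, which is what algorithms such as Louvain or Leiden estimate. The function name and input format are assumptions, and the graph is taken to be undirected, unweighted and without self loops.

```python
def modularity(edges, partition):
    """Modularity of a partition (dict: node -> community label).

    For each community: share of edges inside it, minus the share expected
    if edges were placed at random while keeping node degrees.
    """
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside
    degree_sum = {}  # community -> sum of the degrees of its members
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
everything_together = {node: 0 for node in by_triangle}

print(round(modularity(edges, by_triangle), 3))  # prints: 0.357
print(modularity(edges, everything_together))    # prints: 0.0
```

The partition that follows the two visually obvious groups scores higher than the trivial one, which is the sense in which modularity backs up the visual reading of clusters.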
Global or average clustering coefficient
General indication of the graph’s tendency to be organized into clusters
Visual analysis
Visual pattern: Clusters.
Hermeneutic criterion: Looking for triangles (groups of three nodes with three connections between them).
Issues: Triangles are easy to count visually in a small network, but the ratio between this result and the total number of potential triangles is impossible to calculate directly. Very difficult to count in dense graphs. Graphs with clusters and/or visually dense tend to have a higher clustering coefficient.
Computational analysis
Topological pattern: Number of closed triplets.
Computational criterion: Dividing the number of closed triplets by the number of triplets in the graph
Issues: The global clustering coefficient is obtained by dividing the number of closed triplets by the number of all triplets (open and closed). The average clustering coefficient is quite different but serves a relatively close purpose: it is the average of the local clustering coefficients of all the nodes.
Interpretative potential
Usefulness: A global measure that complements density well and, like the latter, is useful for comparing similar networks with each other.
Translation/intuition: Gives an idea of the entanglement / intricacy and the presence of a more localized density.
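The difference between the global and the average clustering coefficient is easier to see on a tiny example. The sketch below is illustrative only: function names are hypothetical, the graph is undirected without self loops, and nodes with fewer than two neighbors are given a local coefficient of 0 (a convention assumed here).

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triplet_counts(adj, node):
    """Return (closed, total) triplets centered on `node`."""
    neighbors = list(adj[node])
    k = len(neighbors)
    total = k * (k - 1) // 2  # pairs of neighbors = triplets centered here
    closed = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]  # the pair is itself connected
    )
    return closed, total


def global_clustering(adj):
    """Closed triplets divided by all triplets, over the whole graph."""
    closed = total = 0
    for node in adj:
        c, t = triplet_counts(adj, node)
        closed += c
        total += t
    return closed / total if total else 0.0


def average_clustering(adj):
    """Mean of the local clustering coefficients of all nodes."""
    local = []
    for node in adj:
        c, t = triplet_counts(adj, node)
        local.append(c / t if t else 0.0)
    return sum(local) / len(local)


# A triangle (a, b, c) with one extra node d attached to c
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(round(global_clustering(adj), 3))   # prints: 0.6
print(round(average_clustering(adj), 3))  # prints: 0.583
```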
Connectivity (degree)
How well connected is a node / how many links it has / how many neighbors
Visual analysis
Visual pattern: There are many links to the node.
Hermeneutic criterion: Counting the edges converging to that node.
Issue: In a dense image, it is not always obvious which edges converge to the node or just happen to pass through it visually. Sometimes, counting is also impractical.
Computational analysis
Topological pattern: Degree.
Computational criterion: Degree.
Issues: In a directed network, we also distinguish indegree (inbound links) and outdegree (outbound links). In that case, the degree is the sum of those and hence the number of links, not the number of neighbors.
Interpretative potential
Usefulness: The simplest form of centrality. In most cases, the degree shows information that is already known as part of the basic data and not dependent on the structure. This is why we often focus on the degree distribution.
Translation/intuition: The basic intuition of the number of neighbors. In directed networks, interpretation varies greatly between in- and out- degrees: indegree is often the primary way to look at a hierarchy of nodes, because being “cited” is often a good proxy for authority/notoriety.
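The distinction between number of links and number of neighbors in a directed network can be sketched like this. The function name `degree_table` and the output layout are assumptions made for the example.

```python
def degree_table(directed_edges):
    """For each node: indegree, outdegree, degree (their sum), and neighbor count."""
    table = {}

    def row(node):
        return table.setdefault(node, {"in": 0, "out": 0, "neighbors": set()})

    for source, target in directed_edges:
        row(source)["out"] += 1
        row(target)["in"] += 1
        row(source)["neighbors"].add(target)
        row(target)["neighbors"].add(source)

    return {
        node: {
            "indegree": r["in"],
            "outdegree": r["out"],
            "degree": r["in"] + r["out"],
            "neighbors": len(r["neighbors"]),
        }
        for node, r in table.items()
    }


# a and b cite each other, a cites c, d cites a
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
print(degree_table(edges)["a"])
# prints: {'indegree': 2, 'outdegree': 2, 'degree': 4, 'neighbors': 3}
```

Node "a" has a degree of 4 but only 3 neighbors, because the reciprocal link with "b" counts twice.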
Betweenness
Being a bridge, connecting otherwise separated groups of nodes. Removing that node would break many shortest paths.
Visual analysis
Visual pattern: A bridge between clusters.
Hermeneutic criterion: Looking for an edge appearing through a (mostly) empty area between large groups of nodes.
Issue: Many bridges look as expected: they connect over empty spaces. But sometimes, bridges are hidden in the complicated structure of the image. It is generally easier to see the bridging edges than the bridging nodes (however, most of the studies using betweenness centrality focus on bridging nodes).
Computational analysis
Topological pattern: Betweenness centrality.
Computational criterion: The score of betweenness centrality represents the number of shortest paths through a given node or edge.
Issues: Note that the undirected version of the algorithm is often used even for directed graphs. It is the most used metric for detecting bridges, but it does not exactly meet the intuition.
Interpretative potential
Usefulness: The definition of bridge implemented by betweenness centrality meets the intuitions of both a bridge and a center. Indeed both a bridge and the center of a star are things that, if removed, disconnect parts of the network. In that sense betweenness is also a “centrality”.
Translation/intuition: The notion of bridge (but also link, gateway, broker or key passage) is very often used when applying network analysis to social or circulation issues. In some cases, it can represent a form of social capital because it describes a structural position of power (or vulnerability).
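To illustrate how betweenness captures both a bridge and the center of a star, here is a compact sketch of node betweenness using Brandes' algorithm. It assumes an undirected, unweighted graph and returns raw (unnormalized) counts; the function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness_centrality(adj):
    """Number of shortest paths passing through each node (fractions if ties)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was visited from both of its endpoints
    return {v: b / 2 for v, b in score.items()}


# Two triangles joined by the edge c-d: c and d are the bridging nodes
bridge = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"),
                          ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
print(betweenness_centrality(bridge)["c"])  # prints: 6.0
print(betweenness_centrality(bridge)["a"])  # prints: 0.0

# A star: the hub is a center rather than a bridge, yet it also scores high
star = build_adjacency([("hub", "x"), ("hub", "y"), ("hub", "z")])
print(betweenness_centrality(star)["hub"])  # prints: 3.0
```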
Closeness
Being in the middle of the network.
Visual analysis
Visual pattern: The geographical center of the graph.
Hermeneutic criterion: Finding the barycenter (the center of the “land masses”) of the graph.
Issue: The visual estimation of centrality is considered acceptable, but it remains an evaluation. It is harder to find in very sparse graphs.
Computational analysis
Topological pattern: Closeness centrality.
Computational criterion: The score of closeness centrality is derived from the average geodesic distance to all the other nodes, usually as its inverse, so that nodes closer to the others score higher.
Issues: The undirected version is often used even for directed graphs.
Interpretative potential
Usefulness: There is no single implementation of centrality, but closeness centrality is the most aligned with the notion of a middle, a point that is in proximity to all the others. It looks at the structural distance to other nodes, and can be interpreted as such.
Translation/intuition: Excellent to describe the centre or the middle of a network, especially when the latter is described in topographical terms. Low values of this metric are very appropriate for the use of concepts which are the opposite of the center: the periphery, the margins, etc.
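A short sketch of closeness centrality, computed here as the inverse of the average geodesic distance so that the middle of the network gets the highest score and the periphery the lowest. This is one common convention, assumed for the example. The graph is undirected and unweighted, the function names are hypothetical, and unreachable nodes are ignored.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness_centrality(adj, node):
    """Inverse of the average distance from `node` to the nodes it can reach."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    reachable = len(dist) - 1
    total = sum(dist.values())
    return reachable / total if total else 0.0


# A chain a-b-c-d-e: c is in the middle, a is at the margin
chain = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
print(round(closeness_centrality(chain, "c"), 3))  # prints: 0.667
print(round(closeness_centrality(chain, "a"), 3))  # prints: 0.4
```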
Prestige (eigenvector)
Being connected to well connected nodes without necessarily having a large number of neighbors itself.
Visual analysis
Visual pattern: Proximity to well connected nodes (often inside a cluster).
Hermeneutic criteria: None, except if the size of the nodes is visually proportional to the degree centrality, which helps to identify nodes in the hubs’ surroundings.
Issue: This centrality is hard to see, though it correlates with other forms of centrality that point to well connected nodes at the center of the graph.
Computational analysis
Topological pattern: Eigenvector centrality or PageRank.
Computational criterion: This is a score computed recursively. It flows from each node to its neighbors (following the direction of edges in a directed graph).
Issues: None.
Interpretative potential
Usefulness: This form of centrality is particularly suited to directed networks, and can be related to the functioning of a search engine (the PageRank principle was used in the first Google search engine) or a system where information flows.
Translation/intuition: The iterative nature of this notion makes it difficult to translate (and difficult to use in some contexts). It is often conflated with the notions of prestige, authority, influence and, sometimes, power and elites. This measure distinguishes nodes that are “well” connected (and not just “a lot”). Relates to the notion of assortativity.
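The recursive flow of score from a node to its neighbors can be sketched with a basic PageRank iteration. This is a simplified illustration, not a reference implementation: the damping factor of 0.85 is the conventional default and is an assumption here, as are the function name, the fixed number of iterations, and the choice to spread the score of nodes without outgoing links evenly over all nodes.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """PageRank by repeated redistribution of scores.

    out_links: dict node -> list of nodes it points to
               (every node must appear as a key, even with an empty list).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing link is shared by everyone
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its score, in equal parts, along its outgoing links
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# a, b and c all point to hub; hub points to d only
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "d": []}
rank = pagerank(links)
print(rank["d"] > rank["a"])               # prints: True
print(round(sum(rank.values()), 6))        # prints: 1.0
```

Node "d" has a single incoming link, the same number of links as "a", but it scores higher because that link comes from the well connected "hub": it is “well” connected rather than connected “a lot”.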
Local clustering coefficient
Are the neighbours of a node also connected together?
Visual analysis
Visual pattern: Nodes are inside a cluster.
Hermeneutic criterion: Looking for nodes that have many edges to their cluster (where the other nodes are also connected together). Bridges have a low clustering coefficient.
Issues: Clustering coefficient is generally hard to see and visual interpretation is considered unreliable. Exceptions are small networks, nodes that have only a few neighbors that we see well, and nodes that are only connected to a very dense cluster.
Computational analysis
Topological pattern: Clustering coefficient (local).
Computational criterion: Calculate the density of the subgraph of neighbors (how close to complete the graph formed by the node's neighbors is).
Issues: None.
Interpretative potential
Usefulness: Meets a notion of redundancy in the local connections, comparable to centrality but at a very local scale. Tells if a node is in a clustered environment. Complex networks are often characterized by a high average clustering coefficient.
Translation/intuition: This local measure makes it possible to analyse relationships at the collective level: it can be translated as an indicator of participation in a group (or, on the contrary, loneliness, solitude). It opposes the notion of a bridge.
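The opposition between a clustered node and a bridge can be shown with a small sketch of the local clustering coefficient, computed as the density of the links among a node's neighbors. The function names are hypothetical; the graph is assumed undirected and without self loops, and nodes with fewer than two neighbors get 0 by convention.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Share of the possible links among the neighbors of `node` that exist."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: no pair of neighbors to connect
    possible = k * (k - 1) // 2
    existing = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return existing / possible


# Two triangles joined by the edge c-d
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"),
                       ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
print(local_clustering(adj, "a"))            # prints: 1.0
print(round(local_clustering(adj, "c"), 3))  # prints: 0.333
```

Node "a" sits fully inside its group, while the bridging node "c" has neighbors that mostly do not know each other.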
Shortest path
Two nodes are connected by a path
Visual analysis
Visual pattern: Presence of a path between the two nodes whose relationship is to be analyzed.
Hermeneutic criterion: Following the series of edges from one node to another to find the shortest.
Issue: Requires that we can follow the links in practice, which is possible only for small (undirected) networks and depends on the graphic settings. Finding a path can be difficult, and ensuring that this path is the shortest can be too difficult. However the visual distance is a loose approximation of the shortest path length.
Computational analysis
Computational criterion: Geodesic distance, algorithms for shortest path detection.
Issues: Can be computationally costly on big networks. Note that multiple shortest paths can exist.
Interpretative potential
Usefulness: Well suited to the use of the graph as a research interface to test the relation between pairs of nodes. Very close to the qualitative approach of the humanities, which often focus on a few individuals in the network.
Translation/intuition: Corresponds to the intuitive notion of distance in the graph structure. Note that this translation does not take into account the fact that nodes are not always aware of the steps between them and that the perceived distance is not always the shortest path.
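The fact that several shortest paths can exist between two nodes is easy to demonstrate. The sketch below uses a breadth-first search on an undirected, unweighted graph and returns every shortest path; the function names are hypothetical, and enumerating all paths this way is only reasonable on small networks.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_shortest_paths(adj, source, target):
    """Return every shortest path from source to target as a list of nodes."""
    dist = {source: 0}
    preds = {source: []}  # node -> previous nodes on some shortest path
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                preds[w] = [v]
                queue.append(w)
            elif dist[w] == dist[v] + 1:
                preds[w].append(v)  # another equally short way to reach w
    if target not in dist:
        return []  # no path at all between the two nodes

    def walk_back(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in walk_back(p)]

    return walk_back(target)


# A square a-b-c-d: two equally short routes from a to c
square = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
paths = sorted(all_shortest_paths(square, "a", "c"))
print(paths)               # prints: [['a', 'b', 'c'], ['a', 'd', 'c']]
print(len(paths[0]) - 1)   # prints: 2 (the geodesic distance)
```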
Cliques
Groups of nodes where all possible edges exist between them.
Visual analysis
Visual pattern: Very dense clusters.
Hermeneutic criterion: Looking for groups of nodes where each of them is connected to all the others.
Issue: Possible for small or sparse networks, especially if the focus is on cliques that are 4+ in size. Visually impossible for complex networks where cliques are very frequent.
Computational analysis