Graph Thinking:Connectivity:Connected Components
Connectivity is an important concept in graph thinking. From the math of connectivity in graph theory, we can identify communities/clusters of nodes/entities. One way of identifying communities in a graph is a connected component. In a connected component, each node in the component is reachable from any other node in the component. There can be weakly connected components and strongly connected components. Weak and strong refer to the direction of the edge/relationship, between the two nodes/entities. Weak connections are undirected edges and strong connections are directed edges.
Finding connected components in a graph database is an important first step graph analysis. Graph databases get too big and complex to visualize readily. Connected components in a complex system are hard to see. Understanding connected components helps us understand the underlying structure of the graph for further analysis. We might discover groups of entities that we didn’t know about before: a community cut off from the rest of the system, a vulnerability in our wide area network, or a special interest of expertise in a group of professionals.
A major challenge for enterprise databases is avoiding duplication of records. “The problem of detecting duplicates in a database can be described in terms of determining the connected components of an undirected graph.”1 Analyzing a bodies of published papers using connected components. “Citation graphs representing scientific papers contain valuable information about levels of scholarly activity and provide measures of academic productivity”2 To understand the influence and power of Multi-National Corporations “a complex network analysis is needed in order to uncover the structure of control and its implications.”3 The authors found a strongly connected component at the center, “in other words, this is a tightly-knit group of corporations that cumulatively hold the majority share of each other.”3
Connected components play a key role in fraud detection.4 A ring of fraudsters can be identified by a users sharing the same address or phone number and are opening a variety of credit accounts in a defined time period. In social networks, connected component searches can find groups of strongly connected people not previously identified. And connected component analysis can unlock patterns in the groups identified.
In the powerful realm of graph thinking, connectivity is an important concept to understand and apply. Modeling our data as a graph will enable the application of connected component searches to discover patterns in our data we cannot find in a relational model. Do you think about your data in terms of connectivity? Do you think about your data in terms of communities or cluster patterns? Do you want to? If you want to learn more about graph thinking please contact me and visit Graph Thinking Workshops
1-Monge, Alavaro, E., Elkan, Charles P., An efficient domain-independent algorithm for detecting approximately duplicate database records, 1997. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.8405
2-Yuan, An, Janssen, Jeannette, Milios, Evangelos E., Characterizing and Mining the Citation Graph of the Computer Science Literature, 2004. http://www.cs.toronto.edu/~yuana/research/publications/kais.pdf
3- Vitali, Stephania, Glattfelder, James B., Battiston, Stefano, The Network of Global Control, 2011. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0025995&type=printable
4-Scifo, Estelle, Hands-On Graph Analytics with Neo4j, Packt Publishing, 2020, page 308.