From social science to biology, graphlets have found numerous applications and were used as the building blocks of network analysis. In social science, graphlet analysis (typically known as k-subgraph census) is widely adopted in sociometric studies. Much of the work in this vein focused on analyzing triadic tendencies as important structural features of social networks (e.g., transitivity or triadic closure) as well as analyzing triadic configurations as the basis for various social network theories (e.g., social balance, strength of weak ties, stability of ties, or trust). In biology, graphlets were widely used for protein function prediction, network alignment, and phylogeny to name a few. More recently, there has been an increased interest in exploring the role of graphlet analysis in computer networking (e.g., for web spam detection, analysis of peer-to-peer protocols and Internet AS graphs), chemoinformatics, image segmentation, among others.
While graphlet counting and discovery have witnessed a tremendous success and impact in a variety of domains from social science to biology, there has yet to be a fast and efficient approach for computing the frequencies of these patterns. The main contribution of this work is a fast, efficient, and parallel framework and a family of algorithms for counting graphlets of size k-nodes that take only a fraction of the time to compute when compared with the current methods used. The proposed graphlet counting algorithm leverages a number of theoretical combinatorial arguments for different graphlets. For each edge, we count a few graphlets, and with these counts along with the combinatorial arguments, we obtain the exact counts of others in constant time. Furthermore, we show a number of important machine learning tasks that rely on this approach, including graph anomaly detection, as well as using graphlets as features for improving community detection, role discovery, graph classification, and relational learning.