Venusaur – One Stitched Toy Graph Neural Network

Shanzai CL

Published on: 2021-09-12

Abstract

Graph Convolutional Networks (GCNs) are a powerful deep learning method for graph-structured data. Recently, GCNs and their variants have shown excellent performance in various application fields on real-world datasets. Despite this success, most current GCN models are very shallow due to the problem of over-smoothing. Moreover, their flat architecture cannot characterize graphs hierarchically. In real applications, much graph information is organized in hierarchical levels, as in maps, concept graphs, and flowcharts; capturing this hierarchical information allows graphs to be represented more completely and efficiently. This paper studies the design and analysis of deep graph convolutional networks. We propose the VENUSAUR model, which uses three simple and effective techniques: initial residuals, identity mapping, and differentiable pooling, to realize a hierarchical representation of the graph. Each layer of the deep GNN learns a differentiable soft cluster assignment for the nodes, mapping the nodes to a set of clusters; these clusters then serve as the coarsened input to the next GNN layer. Our experiments show that the deep VENUSAUR model outperforms state-of-the-art methods on various semi-supervised and fully-supervised tasks.

Keywords

Graph convolutional network; Initial residuals; Identity mapping; Differentiable pooling

Introduction

In recent years, interest in developing graph neural networks has surged. A graph neural network (GNN) is a general deep learning architecture that operates on graph-structured data such as social networks or molecular structures. A GNN typically uses the underlying graph as its computational graph and learns neural network primitives that pass, transform, and aggregate node feature information over the graph to generate an embedding for each node. The generated node embeddings can be fed to any differentiable prediction layer, e.g., for node classification or link prediction, and the complete model can be trained end to end.

Recently, there have been several attempts to address the over-smoothing problem. JKNet (Xu et al., 2018) uses dense skip connections to combine the output of each layer and preserve the locality of node representations. DropEdge (Rong et al., 2020) suggests that randomly deleting some edges from the input graph can reduce the effect of over-smoothing. Experiments (Rong et al., 2020) show that both methods slow the degradation of performance as network depth increases. However, for semi-supervised tasks, state-of-the-art results are still achieved by shallow models, so the benefit of increasing network depth remains questionable.

On the other hand, several methods combine deep propagation with shallow neural networks. SGC (Wu et al., 2019) attempts to capture high-order information in the graph by applying the K-th power of the graph convolution matrix in a single neural network layer. PPNP and APPNP (Klicpera et al., 2019a) replace the power of the graph convolution matrix with a personalized PageRank matrix to address over-smoothing (a minimal sketch of this propagation scheme is given below). GDC (Klicpera et al., 2019b) extends APPNP by generalizing personalized PageRank (Page et al., 1999) to an arbitrary graph diffusion process. However, these methods combine adjacent features linearly in each layer and therefore lose the expressive power of deep nonlinear architectures; in this sense they are still shallow models. In short, how to design a GCN model that effectively prevents over-smoothing and achieves state-of-the-art results with a truly deep network structure remains an open question. Because of this challenge, it is even unclear whether network depth is a resource or a burden when designing a new graph neural network.

In this article, we extend the GCN to a deep model through three simple and effective modifications, giving a positive answer to this open question. In particular, we propose VENUSAUR, a deep graph convolutional network based on initial residuals, identity mapping, and differentiable pooling that resolves the over-smoothing problem. At each layer, the initial residual constructs a skip connection from the input layer, and the identity mapping adds the identity matrix to the weight matrix (a sketch of this layer is given below). However, because the GNN framework was originally designed by analogy with CNNs, its inherently flat structure cannot fully represent most graphs (networks), which are hierarchically organized. We therefore add differentiable pooling: within each layer, a graph representation algorithm produces vector representations of the nodes in the current graph; between layers, local pooling produces a coarsened version of the current graph.
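Before detailing our layer, we sketch the shallow propagation schemes discussed above. The following is a minimal NumPy illustration of APPNP-style propagation, not the original authors' code: P denotes the symmetrically normalized adjacency matrix with self-loops, H the predictions of a shallow neural network, and alpha the teleport probability. Note that the loop contains no trainable weights, which is why such models remain shallow in the sense discussed.

```python
import numpy as np

def appnp_propagate(P, H, alpha=0.1, K=10):
    """Approximate personalized-PageRank propagation (APPNP-style sketch).

    P     : normalized adjacency with self-loops (n x n)
    H     : predictions of a shallow neural network (n x c)
    alpha : teleport probability back to the original predictions
    K     : number of power-iteration steps
    """
    Z = H
    for _ in range(K):
        # Each step diffuses information one hop further while retaining
        # a fixed fraction of the untouched input predictions.
        Z = (1 - alpha) * (P @ Z) + alpha * H
    return Z
```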
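To make our per-layer update concrete, the following is a minimal NumPy sketch of a single VENUSAUR propagation layer combining the initial residual and identity mapping. It follows the common formulation in which the identity matrix and the weight matrix are mixed by a hyperparameter; the names alpha and beta and the plain-NumPy setting are illustrative, not a full implementation.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def venusaur_layer(H, H0, P, W, alpha=0.1, beta=0.5):
    """One deep layer with an initial residual and identity mapping.

    H     : current node representations (n x d)
    H0    : representations from the input layer (n x d)
    P     : normalized adjacency matrix (n x n)
    W     : learnable weight matrix (d x d)
    alpha : strength of the skip connection to the input layer
    beta  : how far the effective weights may deviate from the identity
    """
    # Initial residual: mix the smoothed features with the input-layer
    # features, so every layer keeps direct access to the original signal.
    support = (1 - alpha) * (P @ H) + alpha * H0
    # Identity mapping: blend the identity matrix into the weight matrix,
    # keeping a deep stack close to an identity function.
    W_eff = (1 - beta) * np.eye(W.shape[0]) + beta * W
    return np.maximum(support @ W_eff, 0.0)  # ReLU nonlinearity
```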
At each layer, the algorithm clusters the graph according to the node vectors produced by the GNN and maps nodes to a set of clusters, yielding a new, coarser graph. At the last layer, a single vector representation of the entire graph is obtained and used for graph classification tasks. The core idea of differentiable pooling is to obtain a deeper, hierarchical GNN model by providing a pooling operation that can distinguish hierarchical structure among the nodes of a graph. Moreover, differentiable pooling can be integrated with a variety of GNN models, which shows that the technique generalizes. Empirical studies show that these three surprisingly simple techniques prevent over-smoothing and allow performance to keep improving as the depth of the VENUSAUR network increases. In particular, the deep VENUSAUR model achieves new state-of-the-art results on various semi-supervised and fully-supervised tasks.
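A minimal NumPy sketch of this pooling step, following the standard differentiable-pooling equations, is given below. The assignment scores S_logits are assumed to come from a separate assignment GNN, which is omitted here; only the coarsening arithmetic is shown.

```python
import numpy as np

def softmax(X, axis=-1):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(X - X.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_pool(A, Z, S_logits):
    """One differentiable pooling step.

    A        : adjacency matrix of the current graph (n x n)
    Z        : node embeddings from the embedding GNN (n x d)
    S_logits : raw cluster scores from an assignment GNN (n x c)
    Returns the coarsened adjacency (c x c) and cluster features (c x d),
    which form the input graph for the next layer.
    """
    S = softmax(S_logits, axis=1)  # soft assignment of each node to c clusters
    X_coarse = S.T @ Z             # aggregate node features within clusters
    A_coarse = S.T @ A @ S         # cluster-level connectivity
    return A_coarse, X_coarse
```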