Bulbasaur: One Stitched Toy Graph Neural Network
Shanzai CL
Published on: 2021-08-12
Abstract
Graph neural networks have achieved significant success in graph representation learning. Graph convolution, which performs neighborhood aggregation, is one of the most important graph operations. However, one layer of these neighbor-aggregation methods only considers direct neighbors, and performance degrades when multiple layers are stacked to reach further neighbors. Several recent studies attribute this degradation to the over-smoothing problem: repeated propagation makes representations of different classes indistinguishable. In this work, we argue that the key factor affecting performance is the entanglement of representation transformation and propagation in current graph convolution operations. After decoupling these two operations, a deeper graph neural network can learn graph-level representations from larger receptive fields. Three simple and effective techniques are used in combination: differentiable pooling, initial residuals, and identity mapping, realizing a hierarchical representation of the graph. Each layer of the deep GNN learns a differentiable soft cluster assignment for the nodes, maps the nodes to a set of clusters, and feeds these clusters as a coarsened input to the next GNN layer. Based on our theoretical and empirical analysis, we propose a Deep Adaptive Initial Residual Identity Mapping Graph Neural Network (BULBASAUR) to adaptively merge information from large receptive fields.
Keywords
Graph convolutional network; Initial residuals; Identity mapping; Differentiable pooling; Graph representation learning
Introduction
Graphs, representing entities and their relationships, are ubiquitous in the real world, appearing as social networks, point clouds, traffic networks, knowledge graphs, and molecular structures. Recently, many studies have focused on developing deep learning approaches for graph data, leading to rapid development in the field of graph neural networks. Great successes have been achieved in many applications, such as node classification [7,9,11,12,21,30,31,33], graph classification [6,8,14,18,32,34,35,38], and link prediction [1,36,37]. Graph convolutions adopt a neighborhood aggregation (or message passing) scheme to learn node representations by considering node features and graph topology together; the most representative method is Graph Convolutional Networks (GCNs) [11]. GCN learns a representation for a node by iteratively aggregating the representations of its neighbors. However, a common challenge faced by GCN and most other graph convolutions is that one layer of graph convolution only considers immediate neighbors, and performance degrades greatly when multiple layers are applied to leverage large receptive fields. Several recent works attribute this performance degradation to the over-smoothing issue [3,15,33], which states that representations from different classes become inseparable due to repeated propagation. In this work, we study this performance deterioration systematically and develop new insights towards deeper graph neural networks.
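To make the neighborhood-aggregation scheme concrete, the following is a minimal sketch of a single GCN layer in the style of [11], written with dense numpy arrays for illustration (real implementations use sparse operations); the toy graph, features, and weights are our own assumptions, not data from the paper.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, symmetrically normalize the
    adjacency, aggregate neighbor features, transform, apply ReLU."""
    A_hat = A + np.eye(A.shape[0])             # adjacency with self-loops
    d = A_hat.sum(axis=1)                      # node degrees
    D_inv_sqrt = np.diag(d ** -0.5)            # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # aggregate, transform, ReLU

# toy 3-node path graph with one-hot features (hypothetical example)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.eye(3)
W = np.ones((3, 2))
out = gcn_layer(A, H, W)
print(out.shape)  # (3, 2)
```

Note that each application of `gcn_layer` only mixes information between immediate neighbors, which is why stacking many such layers is required to enlarge the receptive field.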
In particular, we propose a graph convolutional network (BULBASAUR) based on initial residuals, identity mapping, and differentiable pooling: a deep adaptive GCN model that addresses the over-smoothing problem. At each layer, an initial residual builds a skip connection from the input layer, and identity mapping adds the identity matrix to the weight matrix. However, the GNN framework was originally designed after CNNs, and its inherently flat structure cannot fully capture the representation of graphs, because most graphs (networks) have a hierarchical structure. We therefore add the technique of differentiable pooling. Inside each layer, a graph representation algorithm computes vector representations of the nodes in the current graph; between layers, local pooling produces a coarsened version of the current graph. At each layer, the algorithm clusters the graph according to the node vectors produced by the GNN and maps the nodes to a series of clusters, yielding a new, coarser graph. At the last layer, a single vector representation of the entire graph is obtained and used for graph classification tasks. The core idea of differentiable pooling is to obtain a deeper, more hierarchical GNN model by providing a pooling operation that can distinguish hierarchical structure among the nodes of a graph. Moreover, differentiable pooling can be integrated with a variety of GNN models, which demonstrates a certain degree of generality. Empirical studies show that these three surprisingly simple techniques prevent over-smoothing and that performance continuously improves as the depth of the BULBASAUR network increases. In particular, the deep BULBASAUR model achieves new state-of-the-art results on various semi-supervised and fully-supervised tasks.
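The three techniques above can be sketched together in a few lines of numpy. The layer update below follows the standard initial-residual / identity-mapping form (mixing coefficients `alpha` and `beta`, an assumed GCNII-style update, not necessarily the paper's exact implementation), and `diffpool` shows how a soft cluster assignment coarsens both the features and the adjacency for the next layer; all inputs in the usage example are hypothetical toy data.

```python
import numpy as np

def bulbasaur_layer(A_norm, H, H0, W, alpha, beta):
    """One deep-GCN layer: the initial residual mixes the input-layer
    representation H0 back in with weight alpha (the skip connection
    from the input layer), and identity mapping replaces W with
    (1-beta)*I + beta*W so the layer stays close to an identity map."""
    P = (1 - alpha) * (A_norm @ H) + alpha * H0     # initial residual
    I = np.eye(W.shape[0])
    return np.maximum(P @ ((1 - beta) * I + beta * W), 0.0)

def diffpool(A, Z, S_logits):
    """Differentiable pooling: a row-softmax of S_logits gives a soft
    assignment S of n nodes to k clusters; cluster features S^T Z and
    coarsened adjacency S^T A S form the next, smaller graph."""
    e = np.exp(S_logits - S_logits.max(axis=1, keepdims=True))
    S = e / e.sum(axis=1, keepdims=True)            # n x k soft assignment
    return S.T @ Z, S.T @ A @ S                     # k x d features, k x k adjacency

# toy 4-node cycle pooled to 2 clusters (hypothetical data)
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
H0 = np.eye(4)
H = bulbasaur_layer(A / 2.0, H0, H0, np.ones((4, 4)), alpha=0.1, beta=0.5)
X_c, A_c = diffpool(A, H, np.random.randn(4, 2))
print(X_c.shape, A_c.shape)  # (2, 4) (2, 2)
```

Because the softmax assignment is differentiable, the cluster structure is learned end-to-end together with the node representations, which is what allows the pooling to be stacked across layers into a hierarchy.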