TNNLS
Graph Reasoning with Supervised Contrastive Learning
for Legal Judgment Prediction


Abstract
Given the fact descriptions of legal cases, legal judgment prediction (LJP) aims to predict three judgment outcomes: the applicable law articles, the charges, and the term of penalty. Most existing studies consider the dependencies among tasks but neglect the prior dependencies among the labels of different tasks. How to make better use of the relational dependencies among tasks and labels therefore becomes a crucial issue. To this end, we recast the text classification problem as node classification and propose a framework based on graph reasoning and supervised contrastive learning (SCL), named GraSCL. Specifically, we first design a graph reasoning network to model the potential dependency structures and facilitate relational learning under various graph topologies. We then introduce SCL to the LJP task to further leverage the label relations on the graph. To accommodate the node classification setting, we extend the traditional SCL method to novel node-level variants, which allow the GraSCL framework to be trained efficiently even with small batches. Furthermore, recognizing the importance of hard negative samples in contrastive learning, we introduce a simple yet effective technique called online hard negative mining to enhance our SCL approach. This technique complements our SCL method and enables us to control the number and complexity of negative samples, further improving the model's performance. Finally, extensive experiments on two well-known benchmarks demonstrate the effectiveness and rationality of our proposed SCL approach compared with state-of-the-art competitors.
Relation Graph Variants

Figure. Illustration of the different relation graphs. Each panel contains two parts. The upper part is the topology graph of the three tasks, in which an arrow denotes a relation between tasks. The lower part is the corresponding adjacency matrix of the relation graph, assumed here to have six label nodes, two for each task. A colored cell represents an edge between two label nodes that allows information flow.
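The mapping from a task-level topology to a label-level adjacency matrix, as described in the caption, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_relation_adjacency`, the task ordering (law article = 0, charge = 1, term of penalty = 2), the two example topologies, the use of self-loops, and the convention that entry (i, j) = 1 means information may flow from node i to node j are all assumptions made for the example.

```python
import numpy as np

def build_relation_adjacency(task_edges, labels_per_task, self_loops=True):
    """Expand directed task-level edges into a label-level adjacency matrix.

    task_edges      : iterable of (src_task, dst_task) index pairs (hypothetical format)
    labels_per_task : number of label nodes owned by each task
    Returns an (n, n) 0/1 matrix where adj[i, j] = 1 means node i may pass
    information to node j (an assumed convention).
    """
    # offsets[t] is the index of the first label node of task t
    offsets = np.concatenate(([0], np.cumsum(labels_per_task)))
    n = int(offsets[-1])
    adj = np.zeros((n, n), dtype=np.int8)
    if self_loops:
        idx = np.arange(n)
        adj[idx, idx] = 1  # each label node keeps its own information
    for src, dst in task_edges:
        # connect every label node of the source task to every label node
        # of the destination task (one colored block in the matrix)
        adj[offsets[src]:offsets[src + 1], offsets[dst]:offsets[dst + 1]] = 1
    return adj

# Six label nodes, two per task, as in the figure caption.
labels_per_task = [2, 2, 2]

# Assumed example topology 1: a chain, law article -> charge -> term of penalty.
chain = build_relation_adjacency([(0, 1), (1, 2)], labels_per_task)

# Assumed example topology 2: every task connected to every other task.
all_pairs = [(s, d) for s in range(3) for d in range(3) if s != d]
full = build_relation_adjacency(all_pairs, labels_per_task)
```

Changing only `task_edges` yields a different mask over the same six label nodes, which is the sense in which one graph reasoning network can be run under several graph topologies.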
Supervised Contrastive Learning Methods
GraSCL-base vs. GraSCL-local vs. GraSCL-global
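The definitions of the three variants are not given in this excerpt, so the sketch below does not reproduce any of them. It only illustrates the two generic ingredients named in the abstract: a supervised contrastive loss over node embeddings, and a hard negative mining step that keeps the most similar negatives for each anchor. The function name `supcon_loss_hard_negatives`, the `temperature` and `num_hard_negatives` parameters, and the top-k selection rule are assumptions made for illustration.

```python
import numpy as np

def supcon_loss_hard_negatives(z, labels, temperature=0.1, num_hard_negatives=None):
    """Supervised contrastive loss with optional top-k hard negative selection.

    z      : (n, d) array of node embeddings with nonzero rows
    labels : length-n class labels; nodes sharing a label are positives
    num_hard_negatives : if set, each anchor keeps only its k most similar
                         negatives (an assumed mining rule)
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    n = len(labels)
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(n, dtype=bool)  # same label, excluding the anchor
    neg_mask = ~same
    losses = []
    for i in range(n):
        pos = sim[i, pos_mask[i]]
        neg = sim[i, neg_mask[i]]
        if pos.size == 0:
            continue  # an anchor without positives contributes nothing
        if num_hard_negatives is not None and neg.size > num_hard_negatives:
            neg = np.sort(neg)[-num_hard_negatives:]  # most similar = hardest
        logits = np.concatenate([pos, neg])
        m = logits.max()  # log-sum-exp shift for numerical stability
        log_denom = m + np.log(np.exp(logits - m).sum())
        # average of -log softmax probability over the anchor's positives
        losses.append(np.mean(log_denom - pos))
    return float(np.mean(losses)) if losses else 0.0

# Toy usage: four node embeddings, two classes.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_all = supcon_loss_hard_negatives(z, [0, 0, 1, 1])
loss_top1 = supcon_loss_hard_negatives(z, [0, 0, 1, 1], num_hard_negatives=1)
```

In this sketch, `num_hard_negatives` is the knob that controls how many negatives enter the denominator, and keeping the most similar ones biases the loss toward the negatives that are hardest to separate from the anchor.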



