Bruno Alano

A Cora Depth Study of Geometry and Spectral Optimization in GNNs

gnn optimization geometry research-notes

This post extends my earlier note on Muon and graph neural networks, but it asks a different question.

If Muon only helps plain deep GCNs, then it may simply be compensating for an architecture that collapses with depth. The more interesting test is whether optimizer-level spectral control still matters once the backbone itself already addresses oversmoothing and oversquashing.

To test that, I ran a focused depth study on Cora across three backbones:

  - plain GCN
  - GraphSAGE
  - GBN, a geometry-aware deep message-passing backbone

And I crossed them with three optimizers:

  - AdamW
  - Muon
  - AdaMuon
The result is straightforward: Muon helps plain deep GCNs, does not rescue deep GraphSAGE, and still gives a large lift on top of GBN. The architecture and the optimizer appear to be addressing different parts of the depth problem.
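Before the setup, a toy illustration of the failure mode at stake. This is a schematic numpy sketch of oversmoothing on a small synthetic ring graph (not Cora, and not the actual experiment): repeated GCN-style propagation with the symmetric-normalized adjacency collapses node features toward a rank-one matrix, erasing the per-node signal a classifier needs.

```python
import numpy as np

# Toy oversmoothing sketch (synthetic ring graph, NOT the Cora experiment):
# repeated propagation with A_hat = D^{-1/2} (A + I) D^{-1/2} collapses node
# features toward rank one, which is the plain-backbone depth failure.
n = 20
A = np.zeros((n, n))
for i in range(n):                       # ring graph: guaranteed connected
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
A[0, n // 2] = A[n // 2, 0] = 1.0        # one chord across the ring
np.fill_diagonal(A, 1.0)                 # self-loops (the +I in a GCN layer)

deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))  # symmetric normalization

def rank1_ratio(X):
    # Second-to-first singular value ratio: -> 0 as features collapse.
    s = np.linalg.svd(X, compute_uv=False)
    return float(s[1] / s[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 8))

ratios = []
for _ in range(128):                     # 128 propagation-only "layers"
    X = A_hat @ X
    ratios.append(rank1_ratio(X))

print(f"rank-1 ratio after   1 layer : {ratios[0]:.3f}")
print(f"rank-1 ratio after 128 layers: {ratios[-1]:.4f}")
```

Real GCN layers interleave weights and nonlinearities, so the collapse is slower in practice, but this propagation term is where the pressure comes from.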

Experimental Setup

This was intentionally a narrow follow-up study rather than a broad benchmark.

Completed runs:

  - GCN and GraphSAGE, crossed with all three optimizers, at the depths reported in the tables below
  - GBN with AdamW through depth 256, and with Muon through depth 128

Incomplete runs:

  - the GBN + AdaMuon arm, and GBN + Muon at depth 256, were interrupted before finishing

This should therefore be read as an interrupted but informative Cora depth study, not as a finished benchmark paper.
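For concreteness, the backbone x optimizer x depth cross can be written as a simple grid. The names and the depth list below are my reconstruction from the result tables, not the original launcher config:

```python
from itertools import product

# Hypothetical sweep grid: backbone/optimizer names and the depth list are
# reconstructed from the result tables, not taken from the original code.
BACKBONES = ["gcn", "graphsage", "gbn"]
OPTIMIZERS = ["adamw", "muon", "adamuon"]
DEPTHS = [2, 4, 8, 16, 32, 64, 128, 256]

runs = [
    {"backbone": b, "optimizer": o, "depth": d}
    for b, o, d in product(BACKBONES, OPTIMIZERS, DEPTHS)
]
print(len(runs), "configurations in the full cross")  # 3 * 3 * 8 = 72
```

Not every cell of this grid finished, which is why the tables below have uneven coverage.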

Results

In the sweep's accuracy plots, the 0.2 dashed line is there as a visual anchor. On Cora, once a model spends enough time near that band, it is usually no longer doing meaningful node classification.

The sweep reveals three distinct patterns, one for each backbone.

GCN: Muon Extends the Viable Depth Regime

The first result is the most familiar one. On plain GCN, Muon is strongest in the moderate-depth regime:

| Backbone | Depth | AdamW | Muon | AdaMuon |
|---|---|---|---|---|
| GCN | 4 | 0.7910 | 0.8123 | 0.8030 |
| GCN | 8 | 0.7280 | 0.7853 | 0.7450 |
| GCN | 16 | 0.5540 | 0.6077 | 0.5240 |

This lines up with the earlier 10-seed plain-GCN confirmation, where Muon reached 0.8023 ± 0.0099 at depth 8 versus AdamW at 0.7099 ± 0.0360, and the hybrid was the cleaner win at depth 16.

On a standard GCN, then, spectral optimization buys real depth headroom. It does not eliminate depth-induced failure, but it clearly delays it.
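To make "spectral optimization" concrete: the core of Muon's update, as I read it, is an approximate orthogonalization of each weight matrix's gradient via a few Newton-Schulz iterations, which flattens the singular-value spectrum of the step. Below is a self-contained numpy sketch using the commonly circulated quintic coefficients; it is my illustration, not the reference implementation.

```python
import numpy as np

# Muon-style update direction (illustrative sketch): approximately
# orthogonalize a gradient matrix G with a quintic Newton-Schulz iteration.
# The (a, b, c) coefficients are the widely circulated Muon values.
def newton_schulz_orthogonalize(G, steps=5):
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # scale so the iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:                        # iterate on the short side
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((16, 8))         # stand-in for a layer's gradient
O = newton_schulz_orthogonalize(G)

# After a few iterations the singular values cluster near 1: rare gradient
# directions get roughly the same step size as dominant ones.
s = np.linalg.svd(O, compute_uv=False)
print("singular values after orthogonalization:", np.round(s, 3))
```

That whitening of update directions is one plausible reading of why the deep-GCN spectrum stays healthier under Muon than under AdamW.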

GraphSAGE: Optimizer Choice Does Not Rescue a Failing Backbone

GraphSAGE is the negative control that makes the interpretation cleaner.

At shallow-to-moderate depth, Muon is still competitive. Beyond that, however, the backbone fails regardless of optimizer:

| Backbone | Depth | AdamW | Muon | AdaMuon |
|---|---|---|---|---|
| GraphSAGE | 16 | 0.3467 | 0.3150 | 0.3190 |
| GraphSAGE | 64 | 0.2040 | 0.1473 | 0.1473 |
| GraphSAGE | 128 | 0.2023 | 0.2023 | 0.2023 |

This is an important negative result. Optimizer-level spectral control is not a universal cure for deep GNN failure. When the backbone’s geometry is still poor enough, optimizer choice can delay collapse at best; it does not correct the underlying problem.

GBN Changes the Baseline Before the Optimizer Does

The most interesting part of the study begins with the GBN backbone.

GBN + AdamW already behaves very differently from plain GCN and GraphSAGE. It does not catastrophically collapse as depth increases:

| Backbone | Depth | AdamW |
|---|---|---|
| GBN | 32 | 0.5947 |
| GBN | 64 | 0.5753 |
| GBN | 128 | 0.5767 |
| GBN | 256 | 0.5827 |

That is exactly the baseline a geometry-aware deep message-passing model is supposed to create. The architecture removes catastrophic depth collapse under a standard optimizer.

GBN + Muon Still Wins

Once I crossed Muon with GBN, the result stayed strong through every completed depth:

| Backbone | Depth | AdamW | Muon |
|---|---|---|---|
| GBN | 2 | 0.5813 | 0.7003 |
| GBN | 4 | 0.6047 | 0.7720 |
| GBN | 8 | 0.6087 | 0.7837 |
| GBN | 16 | 0.5650 | 0.7867 |
| GBN | 32 | 0.5947 | 0.7793 |
| GBN | 64 | 0.5753 | 0.7863 |
| GBN | 128 | 0.5767 | 0.7780 |

This is the central claim of the post:

Muon is not just a band-aid for bad deep GNN architectures. In this Cora study, it still adds a large lift after a geometry-aware backbone has already stabilized depth.

That is a stronger result than “Muon beats AdamW on a deep GCN.” It suggests that the optimizer is not merely preventing collapse. It is improving the operating point of a backbone that was already designed to survive depth.

Interpretation

I do not want to overstate the mechanism. I did not rerun the full spectral and gradient diagnostic suite for the GBN sweep, so the interpretation here has to stay at the level of informed reading rather than direct mechanistic proof.

Still, the pattern is coherent:

  1. Plain GCN: Muon helps most when depth is high enough for spectral drift to become destructive, but not so high that the architecture is irrecoverable.
  2. Plain GraphSAGE: optimizer changes cannot overcome deep geometric failure on their own.
  3. GBN: geometry-aware message passing removes catastrophic depth collapse under AdamW and establishes a stable deep baseline.
  4. GBN + Muon: once that baseline exists, spectral optimization still seems to improve how the model uses depth.

The simplest reading is that the architecture and the optimizer are complementary, not redundant. Geometry determines whether deep training stays viable at all; the optimizer helps determine how good the resulting solution is.
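As a sketch of what rerunning the diagnostics could look like: per-layer spectral statistics of the weight matrices are cheap to compute, and under the reading above one would expect Muon-trained GBN layers to show a tighter singular-value spread than the AdamW-trained ones. The helper below is hypothetical (the name and the demo weights are mine, not from the original suite):

```python
import numpy as np

# Hypothetical spectral diagnostic (names are mine, not the original suite):
# per-layer spectral norm and condition number of weight matrices. Under the
# interpretation above, Muon-trained layers should show tighter conditioning.
def layer_spectral_stats(weights):
    """weights: list of 2-D weight matrices, one per layer."""
    stats = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)  # sorted descending
        stats.append({
            "spectral_norm": float(s[0]),
            "condition_number": float(s[0] / max(s[-1], 1e-12)),
        })
    return stats

# Demo on random (untrained) layers, scaled like a standard initialization.
rng = np.random.default_rng(1)
weights = [rng.standard_normal((64, 64)) / np.sqrt(64) for _ in range(4)]
stats = layer_spectral_stats(weights)
for i, st in enumerate(stats):
    print(f"layer {i}: ||W||_2 = {st['spectral_norm']:.3f}, "
          f"cond = {st['condition_number']:.1f}")
```

Tracking these two numbers over training, per layer and per optimizer, would be enough to turn the informed reading above into a direct mechanistic comparison.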

Scope and Next Steps

This is still a narrow result. The next steps are clear:

  1. Finish the interrupted runs, starting with the GBN + AdaMuon arm.
  2. Rerun the spectral and gradient diagnostic suite on the GBN sweep.
  3. Repeat the study across multiple seeds and on datasets beyond Cora.
That is the point where this turns from a strong research note into a more complete empirical study.

Bottom Line

In this Cora depth study, geometry-aware message passing fixed the catastrophic depth problem, but Muon still mattered. The architecture established stability; the optimizer improved the resulting operating point. Those are different gains, and on this evidence they stack.