@Cardyak
10-wide decode, 8x int ALUs, 4x LD, 2x ST will chew through LLVM, JS, and Python indirect-threaded code like God intended.
Nitpick: decode should be 12-16 wide to match the 64B max fetch burst, and 6x FADD/FMUL is a bit overkill; that’s gonna be dark silicon outside of benchmarking.
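
Rough arithmetic behind the 12-16 number, as a sketch. It assumes fixed-length 4-byte instructions (AArch64-style), which is my assumption and not something the diagram states; the helper names are made up for illustration.

```python
# Sketch: how many instructions one fetch burst can hold versus what decode can take.
# ASSUMPTION: fixed-length 4-byte instruction encoding. Variable-length ISAs differ.

def max_instructions_per_fetch(fetch_bytes: int, instr_bytes: int) -> int:
    """Upper bound on instructions delivered by a single fetch burst."""
    return fetch_bytes // instr_bytes


def decode_coverage(decode_width: int, fetch_bytes: int, instr_bytes: int) -> float:
    """Fraction of a full fetch burst that decode can consume in one cycle."""
    peak = max_instructions_per_fetch(fetch_bytes, instr_bytes)
    return min(decode_width, peak) / peak


FETCH_BYTES = 64  # max fetch burst from the post
INSTR_BYTES = 4   # hypothetical: fixed 4-byte instructions

if __name__ == "__main__":
    print("peak instructions per burst:",
          max_instructions_per_fetch(FETCH_BYTES, INSTR_BYTES))
    for width in (10, 12, 16):
        print(width, "wide decode covers",
              decode_coverage(width, FETCH_BYTES, INSTR_BYTES), "of a full burst")
```

Under that assumption a full 64B burst holds 16 instructions, so a 10-wide decoder leaves part of a peak burst waiting, while 16-wide would drain it in one cycle.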