
Spike Artifacts

We built a custom tree-sitter grammar for IBM Enterprise COBOL and published it under the MIT license at github.com/Spantree/tree-sitter-cobol-enterprise. The grammar was purpose-built to handle EXEC CICS and EXEC SQL blocks as typed AST nodes with named fields (DATASET, INTO, FROM, RESP, and others). Validated against the AWS CardDemo corpus, it parses 65 of 66 files cleanly, a 98.5% success rate.

The existing npm tree-sitter-cobol package produces ERROR nodes on 25 of the 44 CardDemo COBOL programs because it treats EXEC CICS and EXEC SQL blocks as opaque word sequences. Those ERROR nodes propagate downstream: any tool trying to reason about transaction behavior or database access patterns gets garbage. Our grammar parses those blocks as first-class AST nodes, which is what makes the analysis pipeline possible.

We ran a full analysis pipeline against the CardDemo corpus as a reference implementation. The pipeline inventoried 179 files, built a dependency graph across all 44 programs, identified 14 dead program candidates and 33 dead paragraphs (32% dead code rate), and produced complexity scores for all 44 programs. The most complex program in the corpus is COACTUPC: 4.1/5 complexity, 3,368 lines, 17 CICS commands, 51 GO TO statements. That one is flagged for human specialist review before any automated translation.

We also ran AWS Transform against the same CardDemo corpus as a benchmark. It produced 1,213 Java files. Those files require the AWS BluAge runtime to execute; they cannot be deployed to a standard JVM. AWS Transform produces output. This pipeline produces understanding: a knowledge graph, complexity scores, a dependency graph, and LLM documentation. The translated code comes after the understanding is built.
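The dead program candidates mentioned above come out of the dependency graph. The write-up does not say how the pipeline computes them, so the following is only a minimal sketch of one common approach: treat the graph as caller-to-callee edges and flag every program that cannot be reached from a known entry point. The function name, the program names, and the choice of entry point are all hypothetical and are not taken from CardDemo or from the pipeline itself.

```python
from collections import deque


def dead_program_candidates(calls, entry_points):
    """Return programs never reached from any entry point, sorted by name.

    calls maps a program name to the list of programs it invokes.
    """
    reachable = set(entry_points)
    queue = deque(entry_points)
    while queue:
        program = queue.popleft()
        for callee in calls.get(program, ()):
            if callee not in reachable:
                reachable.add(callee)
                queue.append(callee)
    # A program may appear only as a callee, so collect names from both sides.
    all_programs = set(calls) | {c for callees in calls.values() for c in callees}
    return sorted(all_programs - reachable)


# Hypothetical call graph: these names are illustrative only.
calls = {
    "MENU01": ["ACCT01", "CARD01"],
    "ACCT01": ["UTIL01"],
    "CARD01": [],
    "OLD01": ["UTIL01"],  # nothing invokes OLD01
    "OLD02": ["OLD01"],   # nothing invokes OLD02 either
}

print(dead_program_candidates(calls, entry_points=["MENU01"]))
```

The results are candidates, not confirmed dead code, because a static graph can miss entry points that are defined outside the COBOL source. That is one reason a check like this would feed a human review rather than an automatic delete.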

Open source repositories

The tree-sitter-cobol-enterprise repository is the custom tree-sitter grammar for IBM Enterprise COBOL, purpose-built to handle EXEC CICS and EXEC SQL blocks as typed AST nodes. Validated against the AWS CardDemo corpus, 65 of 66 files parse cleanly. MIT licensed, with attribution to the upstream tree-sitter-cobol grammar by Yutaro Sakamoto. Available at github.com/Spantree/tree-sitter-cobol-enterprise.
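To make the difference between an opaque word sequence and a typed node with named fields concrete, here is a small illustration. It does not use tree-sitter and it is not the grammar: it is a regex-based stand-in, and the function names, the dataset name, and the COBOL data names in the sample block are all made up for the example. The node and field names the real grammar emits are not shown here.

```python
import re

# One keyword followed by a parenthesised argument, e.g. DATASET('ACCTDAT').
OPTION = re.compile(r"([A-Z][A-Z0-9-]*)\s*\(\s*([^)]*?)\s*\)")


def opaque_words(block):
    """Roughly what an untyped parse offers: a flat list of tokens."""
    return block.split()


def named_fields(block):
    """Extract the command verb and keyword(argument) options from an EXEC CICS block."""
    text = " ".join(block.split())  # collapse line breaks and runs of spaces
    match = re.fullmatch(r"EXEC CICS (.*?) END-EXEC\.?", text)
    if match is None:
        raise ValueError("not an EXEC CICS block")
    body = match.group(1)
    options = {name: value for name, value in OPTION.findall(body)}
    # Whatever remains once the options are removed starts with the command verb.
    command = OPTION.sub("", body).split()[0]
    return {"command": command, "options": options}


# Illustrative block; the dataset and data names are hypothetical.
block = """
    EXEC CICS READ
         DATASET('ACCTDAT')
         INTO(ACCOUNT-RECORD)
         RIDFLD(ACCT-ID)
         RESP(WS-RESP)
    END-EXEC.
"""

print(opaque_words(block))
print(named_fields(block))
```

With the flat token list, a downstream tool has to rediscover which token names the file and which names the response field. With named fields, a question such as "which datasets does this program read?" becomes a lookup on DATASET.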

The cobol-carddemo-claude-analysis repository contains the full Phase 1 analysis outputs from running the comprehension pipeline against the CardDemo corpus: inventory across 179 files, complexity scores for all 44 programs, dead code analysis (14 unreachable programs, 32% dead code rate), and the dependency graph as a Mermaid diagram. These are the artifacts that Phase 1 produces for any estate. MIT licensed. Available at github.com/Spantree/cobol-carddemo-claude-analysis.
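For readers unfamiliar with the format, a dependency graph can be rendered as Mermaid text with very little code. The sketch below assumes the graph is held as a mapping from caller to callees. The function name and program names are hypothetical, and the output layout is an assumption rather than the format used in the repository.

```python
def to_mermaid(calls):
    """Render a caller -> callees mapping as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for caller in sorted(calls):
        callees = sorted(calls[caller])
        if not callees:
            # Emit isolated programs as bare nodes so they still appear.
            lines.append(f"    {caller}")
        for callee in callees:
            lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


# Hypothetical programs, for illustration only.
calls = {
    "MENU01": ["ACCT01", "CARD01"],
    "ACCT01": ["UTIL01"],
    "OLD01": [],
}

print(to_mermaid(calls))
```

Sorting callers and callees keeps the output stable between runs, which makes the diagram easier to diff when the analysis is repeated.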

The aws-mainframe-modernization-carddemo repository is a fork of the AWS CardDemo reference application with two migration branches: migration/java (targeting Java 21 + Spring Boot 3) and migration/typescript (targeting NestJS + Bun + TypeORM + React). The TypeScript branch contains 34 fully translated programs, 4 implementation stubs, and 4 human-specialist stubs, covering all 44 COBOL programs in the corpus. Each stub explains precisely why automated translation was not attempted and what specialist work is required. Both branches include a PROGRAM_MAP.md mapping every COBOL filename to its TypeScript or Java equivalent with semantic names, complexity scores, and migration status. Available at github.com/Spantree/aws-mainframe-modernization-carddemo.