# Reproduction Guide

Run the Experiment 5 benchmarks (and the earlier Experiment 1 and 4 runs) on your own Chipyard + Gemmini setup.

## Prerequisites

- [Chipyard](https://github.com/ucb-bar/chipyard) with Gemmini (tested with the I-BERT configuration `IBertGemminiRocketConfig`)
- RISC-V toolchain (`riscv64-unknown-elf-gcc`)
- Verilator (`VERILATOR_THREADS=16` recommended)
- Conda environment per Chipyard `env.sh`

**We do not ship Chipyard** (~14 GB). You need a local Chipyard clone.
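
A quick way to confirm the prerequisites before starting is a small check like the one below. It is a minimal sketch, not part of this repo: the function names `missing_tools` and `looks_like_chipyard` are hypothetical, and the test for a Chipyard clone (presence of `env.sh` and `generators/gemmini`) is an assumption based on the paths used in this guide.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not shipped with this repo.

# Print each command that is not on PATH, one per line.
# Returns non-zero if at least one command is missing.
missing_tools() {
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Return 0 if the directory looks like a set-up Chipyard clone.
# Assumption: such a clone contains env.sh and generators/gemmini.
looks_like_chipyard() {
  [ -f "$1/env.sh" ] && [ -d "$1/generators/gemmini" ]
}

# Example usage (commented out so the definitions can be sourced safely):
#   missing_tools riscv64-unknown-elf-gcc verilator || echo "install the tools listed above" >&2
#   looks_like_chipyard "$CHIPYARD" || echo "CHIPYARD does not point at a Chipyard clone" >&2
```

Run the tool check after sourcing `env.sh`, since the toolchain may only be on `PATH` inside the Chipyard environment.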

## Install Gemmini changes

Choose **one** method.

### Method A — Copy source files

```bash
REPO=/path/to/gemmini-online-attention-fusion
CHIPYARD=/path/to/chipyard

cp "$REPO"/hardware/gemmini/src/main/scala/gemmini/*.scala \
   "$CHIPYARD/generators/gemmini/src/main/scala/gemmini/"

cp "$REPO"/software/gemmini-rocc-tests/include/gemmini.h \
   "$CHIPYARD/generators/gemmini/software/gemmini-rocc-tests/include/"
```

### Method B — Apply patches

```bash
cd "$CHIPYARD/generators/gemmini"
git checkout master   # base @ 8c3f992 (ucb-bar/gemmini)
git apply /path/to/gemmini-online-attention-fusion/patches/gemmini_hw_all_changes.patch

cd software/gemmini-rocc-tests
git checkout dev
git apply /path/to/gemmini-online-attention-fusion/patches/gemmini_sw_all_changes.patch
```

See `patches/COMMITS.txt` for pinned commit SHAs.
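
Before applying the Method B patches, it can help to confirm that the Gemmini checkout sits on the expected base commit. The sketch below compares a full SHA against the short SHA `8c3f992` noted above; the helper name `sha_has_prefix` is hypothetical, and the commented usage assumes `$CHIPYARD` is set as in Method A.

```bash
#!/usr/bin/env bash
# Sketch only: sha_has_prefix is a hypothetical helper, not part of this repo.

# Return 0 if the full SHA in $1 starts with the non-empty short SHA in $2.
sha_has_prefix() {
  local full="$1" prefix="$2"
  [ -n "$prefix" ] || return 1
  case "$full" in
    "$prefix"*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Example usage (commented out; requires a real Gemmini checkout):
#   head_sha="$(git -C "$CHIPYARD/generators/gemmini" rev-parse HEAD)"
#   sha_has_prefix "$head_sha" 8c3f992 || echo "warning: unexpected base commit $head_sha" >&2
```

A mismatch does not necessarily mean the patch will fail, but it is a likely cause of `git apply` conflicts.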

**Important:** Use the main `generators/gemmini` checkout only. Do not use stale worktrees `gemmini-exp3` / `gemmini-exp4`.

## Rebuild the Verilator simulator (required after HW changes)

```bash
cd "$CHIPYARD/sims/verilator"
source "$CHIPYARD/env.sh"
VERILATOR_THREADS=16 make CONFIG=IBertGemminiRocketConfig
```

Output: `simulator-chipyard.harness-IBertGemminiRocketConfig`
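
To confirm the build produced the simulator before launching long runs, a check along these lines may be useful. It is a sketch under the assumption that the binary lands in `sims/verilator/` next to the Makefile; the helper names `sim_binary_path` and `sim_is_built` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the output directory
# (sims/verilator under the Chipyard root) is an assumption.

# Print the expected simulator path for a Chipyard root ($1) and an
# optional config name ($2, defaulting to the config used in this guide).
sim_binary_path() {
  local chipyard="$1" config="${2:-IBertGemminiRocketConfig}"
  printf '%s/sims/verilator/simulator-chipyard.harness-%s\n' "$chipyard" "$config"
}

# Return 0 only if the simulator file exists and is executable.
sim_is_built() {
  [ -x "$(sim_binary_path "$@")" ]
}

# Example usage (commented out):
#   sim_is_built "$CHIPYARD" || echo "simulator missing; rebuild as shown above" >&2
```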

## Run benchmarks

```bash
export CHIPYARD=/path/to/chipyard
export GEMMINI_TESTS="$CHIPYARD/generators/gemmini/software/gemmini-rocc-tests"

cd /path/to/gemmini-online-attention-fusion/scripts

# Experiment 1 — baseline vs fused (seq=128)
./exp1/run_official_attention_baseline.sh
./exp1/run_fused_attention_local_benchmark.sh

# Experiment 4 — OnlineAttention phase 4M
VERILATOR_THREADS=16 ./exp4/run_online_attn_bert_k128_m4m.sh

# Experiment 5 — reported final numbers
VERILATOR_THREADS=16 ./exp5/run_exp5_full_seq128.sh
VERILATOR_THREADS=16 ./exp5/run_exp5_full_seq256.sh
VERILATOR_THREADS=16 ./exp5/run_exp5_full_seq512.sh
```

Scripts symlink benchmarks from this repo into `gemmini-rocc-tests/bareMetalC/` via `sync_gemmini_benchmark.sh`.

### Optional logging

```bash
LOG_TO_REPO=1 VERILATOR_THREADS=16 ./exp5/run_exp5_full_seq128.sh
# writes to ../logs/ under repo root
```

### Long runs

Run seq=512 inside `tmux`, since it takes 10+ minutes. If the simulation times out, pass extra simulator flags through `EXTRA_SIM_FLAGS`:

```bash
EXTRA_SIM_FLAGS="+gemmini_timeout=100000000" VERILATOR_THREADS=16 ./exp5/run_exp5_full_seq512.sh
```
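
To queue the three Experiment 5 runs back to back (for example inside one `tmux` session), a small wrapper like the following could be used from the `scripts` directory. It is a sketch, not a script shipped with this repo: `exp5_script`, `run_exp5`, and the `DRY_RUN` variable are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Sketch only: exp5_script, run_exp5, and DRY_RUN are hypothetical.

# Print the Experiment 5 script path for a sequence length (128, 256, or 512).
exp5_script() {
  case "$1" in
    128|256|512) printf './exp5/run_exp5_full_seq%s.sh\n' "$1" ;;
    *) echo "unsupported sequence length: $1" >&2; return 1 ;;
  esac
}

# Run the given sequence lengths in order, stopping at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
run_exp5() {
  local seq script
  for seq in "$@"; do
    script="$(exp5_script "$seq")" || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "VERILATOR_THREADS=${VERILATOR_THREADS:-16} $script"
    else
      VERILATOR_THREADS="${VERILATOR_THREADS:-16}" "$script" || return 1
    fi
  done
}

# Example usage (commented out; run from the scripts directory):
#   DRY_RUN=1 run_exp5 128 256 512   # preview the commands
#   run_exp5 128 256 512             # execute them in order
```

Variables exported in the calling shell, such as `EXTRA_SIM_FLAGS` or `LOG_TO_REPO`, are inherited by each script.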

## Expected results (full sublayer cycles)

| Seq length | Exp 5 total (cycles) | Baseline (cycles) |
|-----|-------------|----------|
| 128 | ~1.63M | ~2.15M |
| 256 | ~3.27M | ~3.97M |
| 512 | ~7.43M | ~8.34M |

Small deviations from the reported numbers are normal; they can come from differences in Verilator thread count or Chipyard commit hash.
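
One way to decide whether a measured cycle count is close enough to the table is a relative-tolerance check. The sketch below uses `awk` for the arithmetic; the helper name `within_tolerance` is hypothetical, and the 5% threshold in the usage comment is an illustrative assumption, not a bound stated by the report.

```bash
#!/usr/bin/env bash
# Sketch only: within_tolerance is a hypothetical helper.

# Return 0 if measured ($1) is within pct ($3) percent of expected ($2).
within_tolerance() {
  awk -v m="$1" -v e="$2" -v p="$3" 'BEGIN {
    d = m - e
    if (d < 0) d = -d
    exit !(d <= e * p / 100)
  }'
}

# Example usage (commented out; 1630000 approximates the ~1.63M entry for seq=128):
#   within_tolerance "$measured_cycles" 1630000 5 || echo "seq=128 result is off by more than 5%" >&2
```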

## Troubleshooting

| Issue | Fix |
|-------|-----|
| ReservationStation timeout | `EXTRA_SIM_FLAGS="+gemmini_timeout=100000000"` |
| `git apply` conflicts | Try `git apply --3way` or use Method A (copy files) |
| Benchmark not in Makefile | Re-run the script; `sync_gemmini_benchmark.sh` adds the entry |
| Build fails after RTL edit | Full Verilator rebuild (see above) |
