islandflow/RESEARCH.md
dirtydishes 82861408e4 Track project docs
Remove docs from .gitignore and add AGENTS.md, PLAN.md, CODING_STYLE.md, and RESEARCH.md to the repository.
2025-12-29 22:10:26 -05:00

123 lines
2.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# RESEARCH.md — Signal Evaluation & Backtesting Discipline
This document defines **how research is conducted and interpreted** in this repository.
Its purpose is to **prevent self-deception**, not to slow exploration.
---
## Core Research Principles
- **No hindsight**
- **No intent claims**
- **No performance without context**
- **No conclusions without uncertainty**
All results are provisional unless explicitly validated.
---
## Fact vs Inference
- Raw market data = **fact**
- Classifiers, signals, and labels = **inference**
Inference must always:
- reference its evidence
- expose confidence
- be stored separately from raw data
---
## Labeling Rules
Every evaluation must specify:
- Time horizon (e.g. 5m, 15m, 60m)
- Metric (return, vol expansion, MAE/MFE, etc.)
- Directional vs volatility outcome
- Entry definition (event time, next tick, next candle)
No implicit labels. Ever.
---
## Backtesting Constraints
- No lookahead bias
- No survivorship bias
- Use only data available **at the event timestamp**
- OI snapshots must match the event date
Replay pipelines must mirror live pipelines exactly.
---
## Evaluation Metrics (minimum set)
At least one of:
- Precision / recall
- Hit rate vs baseline
- Forward return distribution
- Vol realized vs implied
- Calibration (probability vs outcome)
Single “win rate” numbers are insufficient.
---
## Regime Awareness
Results must be contextualized by:
- Market regime (trend / chop / high vol)
- Time of day
- DTE bucket
- Liquidity conditions
If a signal only works sometimes, thats still information.
---
## Threshold Tuning Rules
- Thresholds may be tuned **only** on a defined training window.
- Validation must occur on disjoint data.
- Never tune on the same period you report.
Document when thresholds change and why.
---
## Language Discipline
Allowed:
- “suggests”
- “consistent with”
- “correlated with”
- “higher likelihood”
Disallowed:
- “smart money”
- “institutional intent”
- “guaranteed”
- “predicts”
---
## Recording Results
For each classifier or hypothesis, record:
- What was tested
- What failed
- What partially worked
- What conditions mattered
Failed ideas are assets. Keep them.
---
## Final Reminder
This system is for **understanding behavior**, not for proving superiority.
If a result cannot survive replay, uncertainty, and explanation,
it does not belong in production logic.