# RESEARCH.md — Signal Evaluation & Backtesting Discipline This document defines **how research is conducted and interpreted** in this repository. Its purpose is to **prevent self-deception**, not to slow exploration. --- ## Core Research Principles - **No hindsight** - **No intent claims** - **No performance without context** - **No conclusions without uncertainty** All results are provisional unless explicitly validated. --- ## Fact vs Inference - Raw market data = **fact** - Classifiers, signals, and labels = **inference** Inference must always: - reference its evidence - expose confidence - be stored separately from raw data --- ## Labeling Rules Every evaluation must specify: - Time horizon (e.g. 5m, 15m, 60m) - Metric (return, vol expansion, MAE/MFE, etc.) - Directional vs volatility outcome - Entry definition (event time, next tick, next candle) No implicit labels. Ever. --- ## Backtesting Constraints - No lookahead bias - No survivorship bias - Use only data available **at the event timestamp** - OI snapshots must match the event date Replay pipelines must mirror live pipelines exactly. --- ## Evaluation Metrics (minimum set) At least one of: - Precision / recall - Hit rate vs baseline - Forward return distribution - Vol realized vs implied - Calibration (probability vs outcome) Single “win rate” numbers are insufficient. --- ## Regime Awareness Results must be contextualized by: - Market regime (trend / chop / high vol) - Time of day - DTE bucket - Liquidity conditions If a signal only works sometimes, that’s still information. --- ## Threshold Tuning Rules - Thresholds may be tuned **only** on a defined training window. - Validation must occur on disjoint data. - Never tune on the same period you report. Document when thresholds change and why. --- ## Language Discipline Allowed: - “suggests” - “consistent with” - “correlated with” - “higher likelihood” Disallowed: - “smart money” - “institutional intent” - “guaranteed” - “predicts” --- ## Recording Results For each classifier or hypothesis, record: - What was tested - What failed - What partially worked - What conditions mattered Failed ideas are assets. Keep them. --- ## Final Reminder This system is for **understanding behavior**, not for proving superiority. If a result cannot survive replay, uncertainty, and explanation, it does not belong in production logic.