islandflow/RESEARCH.md
dirtydishes 82861408e4 Track project docs
Remove docs from .gitignore and add AGENTS.md, PLAN.md, CODING_STYLE.md, and RESEARCH.md to the repository.
2025-12-29 22:10:26 -05:00

2.4 KiB
Raw Blame History

RESEARCH.md — Signal Evaluation & Backtesting Discipline

This document defines how research is conducted and interpreted in this repository.

Its purpose is to prevent self-deception, not to slow exploration.


Core Research Principles

  • No hindsight
  • No intent claims
  • No performance without context
  • No conclusions without uncertainty

All results are provisional unless explicitly validated.


Fact vs Inference

  • Raw market data = fact
  • Classifiers, signals, and labels = inference

Inference must always:

  • reference its evidence
  • expose confidence
  • be stored separately from raw data

Labeling Rules

Every evaluation must specify:

  • Time horizon (e.g. 5m, 15m, 60m)
  • Metric (return, vol expansion, MAE/MFE, etc.)
  • Directional vs volatility outcome
  • Entry definition (event time, next tick, next candle)

No implicit labels. Ever.


Backtesting Constraints

  • No lookahead bias
  • No survivorship bias
  • Use only data available at the event timestamp
  • OI snapshots must match the event date

Replay pipelines must mirror live pipelines exactly.


Evaluation Metrics (minimum set)

At least one of:

  • Precision / recall
  • Hit rate vs baseline
  • Forward return distribution
  • Vol realized vs implied
  • Calibration (probability vs outcome)

Single “win rate” numbers are insufficient.


Regime Awareness

Results must be contextualized by:

  • Market regime (trend / chop / high vol)
  • Time of day
  • DTE bucket
  • Liquidity conditions

If a signal only works sometimes, thats still information.


Threshold Tuning Rules

  • Thresholds may be tuned only on a defined training window.
  • Validation must occur on disjoint data.
  • Never tune on the same period you report.

Document when thresholds change and why.


Language Discipline

Allowed:

  • “suggests”
  • “consistent with”
  • “correlated with”
  • “higher likelihood”

Disallowed:

  • “smart money”
  • “institutional intent”
  • “guaranteed”
  • “predicts”

Recording Results

For each classifier or hypothesis, record:

  • What was tested
  • What failed
  • What partially worked
  • What conditions mattered

Failed ideas are assets. Keep them.


Final Reminder

This system is for understanding behavior, not for proving superiority.

If a result cannot survive replay, uncertainty, and explanation, it does not belong in production logic.