agentskills.codes


Install

mkdir -p .claude/skills/hft-ops && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13878" && unzip -o skill.zip -d .claude/skills/hft-ops && rm skill.zip

Installs to .claude/skills/hft-ops

About this skill

<!-- REVIEW-2026-04-17: unreferenced by rules/workflows/teams/agents. Confirm or delete. -->

name: hft-ops
description: Use when working on platform operations (session lifecycle, autonomy degradation, position flattening, margin monitoring, pre/post market checks, backup) or any code in ops/.

HFT Operations

Use this skill for ops/ (14 files), operational Makefile targets, and runtime lifecycle management.

Module Map (14 files)

Session Lifecycle

| File | Class | Purpose |
|---|---|---|
| session_governor.py (12KB) | SessionGovernor, TrackGate, SessionPhase | Wall-clock session FSM per product track |
| strategy_governor.py | StrategyGovernor | Per-strategy risk management |

Autonomy & Safety

| File | Class | Purpose |
|---|---|---|
| autonomy.py | AutonomyMode, AutonomyTransition | NORMAL/QUARANTINED/REDUCE_ONLY/HALT enum + transitions |
| autonomy_monitor.py (16KB) | AutonomyMonitor | Continuous health monitoring -> auto-degradation |
| position_flattener.py | PositionFlattener | Emergency position closure (120s deadline) |
| flatten_gate.py | FlattenGate | FORCE_FLAT phase filter (close-only orders) |
| manual_rearm.py | manual_rearm() | Operator recovery from REDUCE_ONLY/HALT |
| platform_degrade.py | PlatformDegradeManager | Coordinated multi-subsystem degradation |
| margin_monitor.py | MarginMonitor | Broker margin availability monitoring |

Ops Utilities

| File | Class | Purpose |
|---|---|---|
| backup.py | BackupManager | ClickHouse backup automation |
| config_snapshot.py | ConfigSnapshot | Boot config -> CH audit trail |
| daily_pnl_report.py | DailyPNLReport | Daily PnL summary |
| evidence.py | EvidenceCollector | Degradation diagnosis evidence |
| platform_inputs.py | PlatformInputs | Platform input parameters |

Session Phase FSM

INIT(0) -> PRE_OPEN(1) -> OPEN(2) -> CLOSE_ONLY(3) -> FORCE_FLAT(4) -> CLOSED(5)

Phase transitions are driven by the wall-clock schedule in config/base/session_governor.yaml:

tracks:
  - name: stock
    symbols: [2330, 2317]
    schedule:
      - {phase: pre_open, at: "08:30"}
      - {phase: open, at: "09:00"}
      - {phase: close_only, at: "13:25"}
      - {phase: force_flat, at: "13:29"}
      - {phase: closed, at: "13:30"}
  - name: futures_day
    symbols: [MXF, TX]
    schedule: [...]
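
As a rough illustration of how a wall-clock schedule like the one above can be resolved to a phase, here is a minimal Python sketch. The function name `phase_at` and the tuple layout are hypothetical, not taken from session_governor.py, and the sketch assumes a track stays in INIT until its first scheduled entry.

```python
from bisect import bisect_right
from datetime import time
from enum import IntEnum


class SessionPhase(IntEnum):
    INIT = 0
    PRE_OPEN = 1
    OPEN = 2
    CLOSE_ONLY = 3
    FORCE_FLAT = 4
    CLOSED = 5


# Schedule for the "stock" track, copied from the YAML above.
STOCK_SCHEDULE = [
    (time(8, 30), SessionPhase.PRE_OPEN),
    (time(9, 0), SessionPhase.OPEN),
    (time(13, 25), SessionPhase.CLOSE_ONLY),
    (time(13, 29), SessionPhase.FORCE_FLAT),
    (time(13, 30), SessionPhase.CLOSED),
]


def phase_at(schedule, now):
    """Return the phase whose start time is the latest one <= now."""
    starts = [start for start, _ in schedule]
    idx = bisect_right(starts, now)
    # Assumption: before the first scheduled entry the track is still in INIT.
    return SessionPhase.INIT if idx == 0 else schedule[idx - 1][1]
```

For example, `phase_at(STOCK_SCHEDULE, time(13, 26))` falls between the 13:25 and 13:29 entries and resolves to `SessionPhase.CLOSE_ONLY`.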

TrackGate provides O(1) per-symbol phase lookup for StrategyRunner, which gates intents by the current phase:

  • OPEN: normal trading
  • CLOSE_ONLY: close positions only
  • FORCE_FLAT: PositionFlattener active
  • CLOSED: all intents rejected
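
The lookup and gating described above might be sketched as follows. `TrackGateSketch` and its methods are hypothetical names, not the real TrackGate API. The sketch assumes that INIT and PRE_OPEN reject all intents (the list above does not say), that FORCE_FLAT admits close-only orders as FlattenGate's description suggests, and that unknown symbols resolve to CLOSED unless a default-open flag is set.

```python
from enum import IntEnum


class SessionPhase(IntEnum):
    INIT = 0
    PRE_OPEN = 1
    OPEN = 2
    CLOSE_ONLY = 3
    FORCE_FLAT = 4
    CLOSED = 5


class TrackGateSketch:
    """Hypothetical per-symbol phase gate backed by two dict lookups."""

    def __init__(self, tracks, default_open=False):
        # Build the symbol -> track index once so each lookup is O(1).
        self._symbol_to_track = {
            symbol: name for name, symbols in tracks.items() for symbol in symbols
        }
        self._phase = {name: SessionPhase.INIT for name in tracks}
        # Unknown symbols are treated as CLOSED unless default_open is set.
        self._default = SessionPhase.OPEN if default_open else SessionPhase.CLOSED

    def set_phase(self, track, phase):
        self._phase[track] = phase

    def phase_for(self, symbol):
        track = self._symbol_to_track.get(symbol)
        return self._default if track is None else self._phase[track]

    def allows(self, symbol, is_closing):
        phase = self.phase_for(symbol)
        if phase is SessionPhase.OPEN:
            return True
        if phase in (SessionPhase.CLOSE_ONLY, SessionPhase.FORCE_FLAT):
            return is_closing  # only position-reducing intents pass
        return False  # assumption: INIT, PRE_OPEN and CLOSED reject everything
```

Keeping the phase per track rather than per symbol means a schedule tick updates one dictionary entry, regardless of how many symbols the track holds.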

Autonomy Degradation

AutonomyMonitor checks (every 100ms-1s):
  CH write stale > 60s         -> PLATFORM_REDUCE_ONLY
  Feed gap > 50% of symbols    -> PLATFORM_REDUCE_ONLY
  Feed reconnect flapping      -> PLATFORM_REDUCE_ONLY
  Queue depth > 90% maxsize    -> PLATFORM_REDUCE_ONLY
  RSS memory > threshold       -> PLATFORM_REDUCE_ONLY
  PnL drawdown > limit         -> HALT
  Reconciliation drift         -> PLATFORM_REDUCE_ONLY

Transitions:
  NORMAL -> PLATFORM_REDUCE_ONLY (auto)
  PLATFORM_REDUCE_ONLY -> HALT (auto, on critical triggers)
  HALT -> NORMAL (MANUAL ONLY via manual_rearm())
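
The three transitions listed above can be encoded as a small lookup, shown here as a hedged sketch. The names `AUTO_TRANSITIONS`, `MANUAL_TRANSITIONS`, and `next_mode` are hypothetical, and the sketch covers only the transitions listed here, not the full enum in autonomy.py.

```python
from enum import Enum


class AutonomyMode(Enum):
    NORMAL = "NORMAL"
    PLATFORM_REDUCE_ONLY = "PLATFORM_REDUCE_ONLY"
    HALT = "HALT"


# Transitions the monitor may take on its own.
AUTO_TRANSITIONS = {
    (AutonomyMode.NORMAL, AutonomyMode.PLATFORM_REDUCE_ONLY),
    (AutonomyMode.PLATFORM_REDUCE_ONLY, AutonomyMode.HALT),
}

# Transitions that require an operator action.
MANUAL_TRANSITIONS = {
    (AutonomyMode.HALT, AutonomyMode.NORMAL),
}


def next_mode(current, target, *, manual=False):
    """Return target if the transition is permitted, otherwise raise ValueError."""
    if (current, target) in AUTO_TRANSITIONS:
        return target
    if manual and (current, target) in MANUAL_TRANSITIONS:
        return target
    raise ValueError(f"transition {current.name} -> {target.name} not permitted")
```

Calling `next_mode(AutonomyMode.HALT, AutonomyMode.NORMAL)` without `manual=True` raises, which mirrors the rule that HALT never recovers automatically.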

Reason codes (frozen set for metrics): broker_unavailable, clickhouse_unhealthy, feed_gap_majority, feed_reconnect_flapping, memory_pressure, persistence_failure, pnl_peak_drawdown, queue_depth_exceeded, reconciliation_drift, rss_unhealthy, wal_backlog_unhealthy
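
One way to enforce a frozen set like this is to validate every code before it is emitted as a metric label. The sketch below is an assumption about how such a guard could look; `REASON_CODES` and `validate_reason` are hypothetical names, while the codes themselves are copied from the list above.

```python
REASON_CODES = frozenset({
    "broker_unavailable",
    "clickhouse_unhealthy",
    "feed_gap_majority",
    "feed_reconnect_flapping",
    "memory_pressure",
    "persistence_failure",
    "pnl_peak_drawdown",
    "queue_depth_exceeded",
    "reconciliation_drift",
    "rss_unhealthy",
    "wal_backlog_unhealthy",
})


def validate_reason(code):
    """Return code unchanged if it is registered, otherwise raise ValueError."""
    if code not in REASON_CODES:
        raise ValueError(f"unknown autonomy reason code: {code!r}")
    return code
```

Rejecting unregistered codes at the call site keeps metric label cardinality bounded and makes a missing registration fail loudly rather than silently creating a new label.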

Pre/Post Market SOPs

Pre-Market

make pre-market-check    # Docker, ClickHouse, Redis, WAL, metrics

Checks: containers healthy, CH responsive, Redis ping, WAL backlog zero, Prometheus scraping.

Post-Market

make post-market-check   # WAL, recorder, ClickHouse records, PnL

Checks: WAL drained, recorder healthy, today's CH row count, PnL reconciled.

Operational Commands

# Health
make pre-market-check              # Pre-market gates
make post-market-check             # Post-market verification
make recorder-status               # WAL backlog + CH status
uv run hft check                   # Config validation

# Drills
make drill-ck-down                 # ClickHouse 30s outage (WAL fallback test)
make drill-wal-pressure            # Disk pressure circuit breaker
make drill-recon-mismatch          # Reconciliation mismatch
make rollback-drill                # Rollback procedure

# Maintenance
make canary-auto                   # One-shot canary gate
make wal-archive-cleanup           # Clean old WAL archives
make wal-dlq-status                # DLQ status

Key Environment Variables

| Variable | Default | Effect |
|---|---|---|
| HFT_STARTUP_RECON_ENABLED | 1 | Startup position recovery |
| HFT_STARTUP_RECON_QTY_THRESHOLD | 10 | Stock discrepancy auto-correct threshold |
| HFT_CHECKPOINT_ENABLED | 1 | Periodic position checkpoint |
| HFT_RECONNECT_HOURS | 08:30-13:35 | Auto-reconnect window |
| HFT_RECONNECT_HOURS_2 | | Secondary window (night session) |
| HFT_STORMGUARD_FEED_GAP_HALT_S | 30 | Feed gap -> HALT threshold |
| HFT_BACKUP_ENABLED | 0 | Automated daily CH backup |
| HFT_BACKUP_RETAIN_DAYS | 30 | Backup retention |
| HFT_TELEGRAM_ENABLED | 0 | Telegram notifications |
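
To illustrate how the two reconnect windows could be evaluated, here is a minimal sketch. It assumes the `HH:MM-HH:MM` format implied by the default value, and it assumes that a night-session window may wrap past midnight. The helper names are hypothetical, and the real parsing in the platform may differ.

```python
from datetime import time

# Defaults taken from the table above; the secondary window has no default.
DEFAULTS = {"HFT_RECONNECT_HOURS": "08:30-13:35", "HFT_RECONNECT_HOURS_2": ""}


def parse_window(value):
    """Parse 'HH:MM-HH:MM' into a (start, end) pair of datetime.time values."""
    start_s, end_s = value.split("-", 1)
    return time.fromisoformat(start_s.strip()), time.fromisoformat(end_s.strip())


def in_reconnect_window(now, env):
    """Return True if now falls inside either configured reconnect window."""
    for key, default in DEFAULTS.items():
        raw = env.get(key, default)
        if not raw:
            continue  # window not configured
        start, end = parse_window(raw)
        if start <= end:
            if start <= now <= end:
                return True
        elif now >= start or now <= end:
            # Assumption: start > end means the window wraps past midnight.
            return True
    return False
```

In production code `env` would typically be `os.environ`; passing a plain mapping keeps the sketch easy to test.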

Notification Flow

AutonomyMonitor detects degradation
  -> NotificationDispatcher
    -> Critical (HALT, daily_loss): bypass rate limit, immediate Telegram
    -> Normal (StormGuard, margin, position): rate-limited (10s/msg)
  -> Operator reviews -> manual_rearm() -> NORMAL
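
The critical-bypass behaviour in this flow can be sketched as below. `NotificationDispatcherSketch` is a hypothetical stand-in, not the real NotificationDispatcher. It assumes that rate-limited messages are dropped rather than queued, and that the 10s limit is global rather than per category.

```python
import time


class NotificationDispatcherSketch:
    """Hypothetical dispatcher: critical messages bypass a 10 s rate limit."""

    def __init__(self, send, min_interval_s=10.0, clock=time.monotonic):
        self._send = send              # callable that delivers one message
        self._min_interval_s = min_interval_s
        self._clock = clock            # injectable for testing
        self._last_normal = None       # time of the last non-critical send

    def dispatch(self, message, critical=False):
        """Return True if the message was sent, False if it was rate-limited."""
        if critical:
            self._send(message)        # HALT / daily_loss: always immediate
            return True
        now = self._clock()
        if (self._last_normal is not None
                and now - self._last_normal < self._min_interval_s):
            return False               # assumption: drop instead of queueing
        self._last_normal = now
        self._send(message)
        return True
```

Critical sends deliberately do not update the rate-limit timestamp, so a HALT alert never delays the next normal notification.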

Critical Rules

  1. HALT requires manual rearm — never auto-recover from HALT
  2. FORCE_FLAT has 120s deadline — timeout = fail, escalate
  3. SessionGovernor is wall-clock — not event-driven
  4. TrackGate unknown symbols -> CLOSED (safe default, unless HFT_TRACK_GATE_DEFAULT_OPEN=1)
  5. Autonomy reason codes are frozen — add new ones to the frozen set before using
  6. Remote deployment is manual — never auto-deploy (user feedback rule)
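
Rule 2 can be illustrated with a deadline loop. This is a sketch under stated assumptions, not PositionFlattener's actual logic: `flatten_with_deadline`, `is_flat`, and `escalate` are hypothetical names, and polling is just one possible way to detect that positions are closed.

```python
import time

FLATTEN_DEADLINE_S = 120.0  # deadline stated in rule 2


def flatten_with_deadline(is_flat, escalate, deadline_s=FLATTEN_DEADLINE_S,
                          poll_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll until is_flat() is True or the deadline passes.

    Returns True on success. On timeout, calls escalate(...) and returns
    False, so a timeout is treated as a failure and never as a silent pass.
    """
    start = clock()
    while clock() - start < deadline_s:
        if is_flat():
            return True
        sleep(poll_s)
    escalate("FORCE_FLAT deadline exceeded")
    return False
```

Injecting `clock` and `sleep` lets a test simulate the full 120 seconds instantly.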
