tabpfn-feature-encoder-runner

Name: tabpfn-feature-encoder-runner
Author: JO5HO4

Use when running, configuring, validating, or documenting the tabpfn-feature-encoder training repo, including conda setup, runner scripts, output artifacts, tests, and Git hygiene.

Install

mkdir -p .claude/skills/tabpfn-feature-encoder-runner && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13566" && unzip -o skill.zip -d .claude/skills/tabpfn-feature-encoder-runner && rm skill.zip

Installs to .claude/skills/tabpfn-feature-encoder-runner

About this skill

TabPFN Feature Encoder Runner

Repo Basics

Repo root: tabpfn-feature-encoder.
Main 12-class source residual config: configs/source_residual_mlp.yaml.
Particle GNN config: configs/source_gnn.yaml.
Particle transformer config: configs/source_transformer.yaml.
Full workflow launcher: bash scripts/run_full_workflow.sh.
Dispatcher spelling: bash scripts/run full workflow.
Source encoder launcher: bash scripts/run_source_encoder.sh.
Source transfer rerun: bash scripts/run_source_transfer.sh.
CP transfer rerun: bash scripts/run_cp_transfer.sh.
Open-data transfer rerun: bash scripts/run_gamgam_transfer.sh.
Context comparison plots: bash scripts/plot_context_comparison.sh.
Test runner: bash scripts/run_tests.sh.
Package CLI: tabpfn-encoder-train train --config configs/source_residual_mlp.yaml.
Output directory: set by the output_dir key in the config.

Environment

Use the existing conda env:

conda activate tabpfn
python -m pip install -e ".[train,atlas,plots]"

The runner falls back to conda run --no-capture-output -n tabpfn if the console script is not on PATH.
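
The fallback described above can be sketched as follows. This is a minimal illustration, not the runner's actual code: the helper name resolve_train_command is hypothetical, and placing the console script name after the conda run prefix is an assumption about how the fallback is completed.

```python
import shutil


def resolve_train_command(console_script="tabpfn-encoder-train",
                          env_name="tabpfn",
                          which=shutil.which):
    """Return the argv prefix for the training CLI (hypothetical helper).

    `which` is injectable so the lookup can be exercised without touching PATH.
    """
    if which(console_script) is not None:
        # The console script is on PATH: call it directly.
        return [console_script]
    # Otherwise go through conda, as the runner is described to do.
    # Assumption: the console script name follows the conda run prefix.
    return ["conda", "run", "--no-capture-output", "-n", env_name, console_script]


# Example: build the full command for the main config.
command = resolve_train_command() + [
    "train", "--config", "configs/source_residual_mlp.yaml",
]
```

The resulting list can be passed to subprocess.run; nothing is executed by the sketch itself.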

Runner Behavior

scripts/run_full_workflow.sh:

Runs configs/source_residual_mlp.yaml, configs/source_gnn.yaml, and configs/source_transformer.yaml by default.
For each config, trains the 12-class source encoder and then runs source-task, CP even/odd, and open-data transfer evaluations.
Runs configs in parallel by default when multiple GPUs are visible, with one config per GPU.
Streams per-config logs to the terminal and writes full logs to runs/workflow_logs/<timestamp>/.
Set TABPFN_WORKFLOW_STREAM_LOGS=0 to disable live log streaming.
Select GPUs with TABPFN_WORKFLOW_GPUS=0,1,2,3 bash scripts/run_full_workflow.sh.
Force sequential execution with TABPFN_WORKFLOW_PARALLEL=0 bash scripts/run_full_workflow.sh.
Runs context comparison plotting at the end unless TABPFN_WORKFLOW_PLOT=0 is set.
Accepts optional config paths to restrict the workflow: bash scripts/run_full_workflow.sh configs/source_residual_mlp.yaml.
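
To make the interaction of these switches concrete, here is a small sketch of how the environment variables and optional config arguments could combine into a run plan. The function plan_workflow is hypothetical, and two behaviors are assumptions rather than documented facts: GPUs are taken only from TABPFN_WORKFLOW_GPUS (the real script may also detect visible GPUs on its own), and configs wrap around the GPU list when there are more configs than GPUs.

```python
DEFAULT_CONFIGS = [
    "configs/source_residual_mlp.yaml",
    "configs/source_gnn.yaml",
    "configs/source_transformer.yaml",
]


def plan_workflow(argv_configs, env):
    """Return a run plan from CLI config paths and an environment mapping."""
    # Explicit config paths restrict the workflow; otherwise use the defaults.
    configs = list(argv_configs) or DEFAULT_CONFIGS
    # Assumption: GPUs come only from the comma-separated variable.
    gpus = [g.strip() for g in env.get("TABPFN_WORKFLOW_GPUS", "").split(",") if g.strip()]
    # Parallel unless explicitly disabled, and only with more than one GPU.
    parallel = env.get("TABPFN_WORKFLOW_PARALLEL", "1") != "0" and len(gpus) > 1
    if parallel:
        # One config per GPU; wrapping around is an assumption.
        assignment = [(cfg, gpus[i % len(gpus)]) for i, cfg in enumerate(configs)]
    else:
        # Sequential: every config uses the first GPU, or None if unspecified.
        first = gpus[0] if gpus else None
        assignment = [(cfg, first) for cfg in configs]
    return {
        "parallel": parallel,
        "assignment": assignment,
        "stream_logs": env.get("TABPFN_WORKFLOW_STREAM_LOGS", "1") != "0",
        "plot": env.get("TABPFN_WORKFLOW_PLOT", "1") != "0",
    }
```

Calling plan_workflow([], {"TABPFN_WORKFLOW_GPUS": "0,1,2,3"}) pairs the three default configs with GPUs 0, 1, and 2 under these assumptions.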

scripts/run_source_encoder.sh:

Sets TABPFN_MODEL_CACHE_DIR to $SCRATCH/tabpfn_model_cache unless already set.
Sets PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True unless already set.
Reuses an existing checkpoint from ~/.cache/tabpfn when available.
Accepts an optional config path: bash scripts/run_source_encoder.sh configs/other.yaml.
Runs the GNN with: bash scripts/run_source_encoder.sh configs/source_gnn.yaml.
Runs the transformer with: bash scripts/run_source_encoder.sh configs/source_transformer.yaml.
Reruns source-task transfer from a checkpoint with: bash scripts/run_source_transfer.sh.
Reruns CP even/odd transfer from a checkpoint with: bash scripts/run_cp_transfer.sh.
Reruns open-data transfer from a checkpoint with: bash scripts/run_gamgam_transfer.sh.
Plots encoder comparison PDFs with: bash scripts/plot_context_comparison.sh.
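
The "unless already set" defaults above can be mirrored in Python when launching the script from another process. This is a sketch only: apply_runner_defaults is a hypothetical helper, and treating an empty string as "already set" is an assumption that may differ from the shell script.

```python
def apply_runner_defaults(env):
    """Return a copy of `env` with the runner's documented defaults filled in."""
    resolved = dict(env)
    scratch = resolved.get("SCRATCH", "")
    # setdefault leaves user-provided values untouched.
    # Assumption: an empty string counts as "already set".
    resolved.setdefault("TABPFN_MODEL_CACHE_DIR", f"{scratch}/tabpfn_model_cache")
    resolved.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return resolved


example = apply_runner_defaults({"SCRATCH": "/scratch/user"})
```

The returned mapping can be passed as the env argument of subprocess.run together with ["bash", "scripts/run_source_encoder.sh"].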

Model Layout

Keep distinct encoder definitions in separate files under src/tabpfn_feature_encoder/models/.
Current modules: mlp.py, feature_gate.py, feature_mixer.py, gnn.py, transformer.py.
Keep model selection in models/factory.py.
Keep PyTorch import helpers in models/torch_utils.py.
Leave models/encoders.py as a compatibility re-export layer, not the place for new model logic.
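
One common way to keep model selection in a single factory module is a name-to-builder registry, sketched below. Every name here (ENCODER_BUILDERS, register_encoder, build_encoder, build_mlp) is hypothetical and the builder returns a plain dict as a stand-in; the actual contents of models/factory.py are not shown in this document.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a config name to an encoder builder.
ENCODER_BUILDERS: Dict[str, Callable[..., object]] = {}


def register_encoder(name: str):
    """Decorator that records a builder under `name`."""
    def decorator(builder):
        if name in ENCODER_BUILDERS:
            raise ValueError(f"Encoder {name!r} is already registered")
        ENCODER_BUILDERS[name] = builder
        return builder
    return decorator


@register_encoder("mlp")
def build_mlp(output_dim: int = 72, **kwargs):
    # Placeholder for a class that would live in its own module (e.g. mlp.py).
    return {"kind": "mlp", "output_dim": output_dim, **kwargs}


def build_encoder(name: str, **kwargs):
    """Look up a builder by name and call it with the given options."""
    try:
        builder = ENCODER_BUILDERS[name]
    except KeyError:
        available = sorted(ENCODER_BUILDERS)
        raise ValueError(f"Unknown encoder {name!r}; available: {available}") from None
    return builder(**kwargs)
```

With this layout, adding a new encoder means adding one module and one registration, while callers keep using build_encoder.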

Source Training Expectations

Training uses frozen TabPFN support/query episodes; only encoder weights are optimized.
For the 12-class source task, the trainer uses binary ECOC by default: encoder.tabpfn_max_classes: 2, encoder.many_class_redundancy: 4.
Default source configs use output_dim: 72, learning_rate: 0.0002, grad_clip_norm: 1.0, and validation_episodes: 8.
Source embeddings are detached before fitting the TabPFN support prompt by default (detach_support_gradients: true) so the encoder learns from query gradients without differentiating through prompt construction.
Validation is episodic and rotates through validation support/query contexts using the same 50/50 support/query split as training.
Epoch logs should include grad_norm_mean, grad_norm_max, and skipped_nonfinite_updates.
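
For readers unfamiliar with binary error-correcting output codes (ECOC), the sketch below shows the general idea of reducing a 12-class problem to a set of 2-class problems and decoding the results. It is a generic illustration, not the trainer's implementation: the functions are hypothetical, the codebook is random, and deriving the number of binary problems as redundancy times ceil(log2(n_classes)) is an assumption about how many_class_redundancy is used.

```python
import math
import random


def build_codebook(n_classes, redundancy, seed=0):
    """Return one binary codeword per class; each bit defines a 2-class task."""
    # Assumption: number of bits = redundancy * ceil(log2(n_classes)).
    n_bits = redundancy * math.ceil(math.log2(n_classes))
    rng = random.Random(seed)
    while True:
        columns = []
        for _ in range(n_bits):
            # Label half of the classes 1 so both labels appear in every task.
            ones = set(rng.sample(range(n_classes), n_classes // 2))
            columns.append([1 if c in ones else 0 for c in range(n_classes)])
        codebook = [tuple(col[c] for col in columns) for c in range(n_classes)]
        # Retry in the unlikely case that two classes share a codeword.
        if len(set(codebook)) == n_classes:
            return codebook


def decode(bit_probs, codebook):
    """Turn per-bit P(bit == 1) into class probabilities.

    Each codeword is scored by its log-likelihood under independent bits,
    and the scores are normalized with a softmax.
    """
    scores = []
    for word in codebook:
        scores.append(sum(
            math.log(max(p if bit else 1.0 - p, 1e-12))
            for p, bit in zip(bit_probs, word)
        ))
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


codebook = build_codebook(n_classes=12, redundancy=4)
# Simulate binary classifiers that are 90% confident in class 5's codeword.
probs = decode([0.9 if bit else 0.1 for bit in codebook[5]], codebook)
predicted = probs.index(max(probs))
```

The extra bits beyond the minimum needed to separate the classes are what let decoding tolerate a few wrong binary predictions.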

Validation Commands

Run the test suite before finalizing code changes:

bash scripts/run_tests.sh

Pass pytest selectors through for focused checks, for example:

bash scripts/run_tests.sh tests/test_config.py

Clean generated Python/cache files before committing:

find . -type d -name __pycache__ -prune -exec rm -rf {} +
find src -maxdepth 2 -type d -name '*.egg-info' -prune -exec rm -rf {} +

Artifacts

Training saves:

metrics.json
training_summary.json
epoch_metrics.csv
encoder_classifier.pkl
run_metadata.json
source_generalization/source_12_class_generalization_metrics.json
source_generalization/source_12_class_generalization_context_scan_metrics.csv
source_generalization/source_12_class_generalization_context_scan_roc_auc.png
source_generalization/source_12_class_generalization_baseline_proba.npy
source_generalization/source_12_class_generalization_frozen_encoder_proba.npy
cp_generalization/cp_even_odd_generalization_metrics.json
cp_generalization/cp_even_odd_generalization_context_scan_metrics.csv
cp_generalization/cp_even_odd_generalization_context_scan_roc_auc.png
cp_generalization/cp_even_odd_generalization_baseline_proba.npy
cp_generalization/cp_even_odd_generalization_frozen_encoder_proba.npy
open_data_generalization_metrics.json in transfer.output_dir
open_data_generalization_context_scan_metrics.csv in transfer.output_dir
open_data_generalization_context_scan_roc_auc.png in transfer.output_dir
open_data_generalization_baseline_proba.npy in transfer.output_dir
open_data_generalization_frozen_encoder_proba.npy in transfer.output_dir
context_scan_comparison/*_roc_auc_comparison.pdf
context_scan_comparison/*_accuracy_comparison.pdf

Terminal metrics print to three decimals; CSV/JSON keep full precision.
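
A quick post-run check can confirm that the core files exist and show how the three-decimal terminal formatting relates to the stored values. The helpers below are hypothetical and only cover the top-level files from the list above; the metric key roc_auc is an illustrative assumption, not a documented field name.

```python
from pathlib import Path

# Top-level files from the artifact list; transfer outputs are omitted here.
CORE_ARTIFACTS = [
    "metrics.json",
    "training_summary.json",
    "epoch_metrics.csv",
    "encoder_classifier.pkl",
    "run_metadata.json",
]


def missing_artifacts(output_dir, expected=CORE_ARTIFACTS):
    """Return the expected file names that are absent from `output_dir`."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]


def format_for_terminal(metrics):
    """Render float metrics to three decimals; leave other values unchanged."""
    return {
        key: f"{value:.3f}" if isinstance(value, float) else value
        for key, value in metrics.items()
    }


# Hypothetical metric key, used only to illustrate the rounding.
shown = format_for_terminal({"roc_auc": 0.912345, "epochs": 10})
```

The values written to CSV and JSON would stay at full precision; only the display copy is rounded.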
