AI infrastructure, tools, and open research.
Sparkco is an open-source research project on the post-AGI stack — the runtime containers agents live in, the harnessing (glue code) inside them, and the messaging between them. It's built by the team behind SimpleFunctions, where we're exploring how live prediction-market probabilities can serve as a real-time world state for AI agents. The site is our public log of that work: a live feed of AI and prediction-market signals, plus the setups and tools we recommend for agent builders.
We ship tools as CLIs first, not MCP servers: a CLI costs 0 context tokens to expose, is ~100% reliable, and composes with pipes.
Parametric memory: replacing the context window with weights.
Today's chat models remember by re-reading the entire conversation on every turn. Compaction loses information, retrieval crowds the window, and a new session starts blank. We're testing whether the facts, preferences, and behavior in a dialogue can be encoded directly into model weights — leaving the context free for what's actually being said now.
Want to collaborate? patrick@simplefunctions.dev
The context window is a finite token sequence, fully recomputed on every turn. Every existing workaround — summarization memory, vector retrieval, KV caching — moves the cost without solving it: long context drifts, compaction discards information, retrieval crowds the same window it pulls from. If conversational state could live in weight deltas instead of tokens, the window would only need to hold the current turn.
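A rough way to see the cost described above is to count how many tokens get re-read over a whole conversation. The sketch below assumes every turn adds the same number of tokens and that nothing is cached; both are simplifications, and the function names are ours for illustration.

```python
def tokens_reprocessed(turns: int, tokens_per_turn: int) -> int:
    """Total tokens read when every turn re-reads the whole conversation so far.

    Turn t reads t * tokens_per_turn tokens, so the total over n turns is
    tokens_per_turn * n * (n + 1) / 2, which grows quadratically in n.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_current_turn_only(turns: int, tokens_per_turn: int) -> int:
    """Total tokens read if the window only ever holds the current turn."""
    return tokens_per_turn * turns


if __name__ == "__main__":
    n, k = 100, 200  # hypothetical: 100 turns of 200 tokens each
    print(tokens_reprocessed(n, k), tokens_current_turn_only(n, k))
```

Under these assumptions a 100-turn conversation re-reads about fifty times more tokens than a design where state lives outside the window.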
- Test-time training. ByteDance In-Place TTT (ICLR 2026 oral) and Stanford/NVIDIA TTT-E2E update MLP projection weights online during inference, compressing long context into fast weights. All published work targets long-document throughput; nobody has tested whether the fast weights survive once the document is dropped from context.
- Hypernetwork → adapter. Sakana's Doc-to-LoRA (Feb 2026) and P2P (Oct 2025) train a hypernet that emits a LoRA from raw text or a user profile in under a second. Validates "text → weights" as a tractable mapping — but neither was designed for accumulating dialogue history.
- Dialogue-direct fine-tuning. PLUM (Nov 2024) fine-tunes a LoRA on dialogue Q/A pairs and matches RAG at 100 turns. MemLoRA trains memory management itself as a LoRA. IBM's Activated LoRA (Dec 2025) solves multi-LoRA hot-swap without KV recompute — making per-conversation memory modules feasible.
- Knowledge editing. ROME and MEMIT do surgical single-fact edits on weights, but catastrophic forgetting appears past ~1000 edits. Not a candidate at dialogue scale.
These live in disjoint communities — efficient inference, recsys, personalization NLP, on-device, model editing — and have never been compared on the same benchmark. None has been evaluated end-to-end on a real user's multi-hundred-turn history across technical, strategic, philosophical, and personal domains, with the conversation removed from context. Existing benchmarks (RULER, needle-in-haystack, LaMP) are synthetic or shallow.
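Several of the approaches above store state as a low-rank weight delta. The following is a minimal pure-Python sketch of that idea: a frozen base matrix W plus a per-conversation update B·A that can be swapped at inference time. The matrices, conversation IDs, and function names are illustrative assumptions, not taken from any of the cited systems.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, enough for a toy example."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A) without modifying the base matrix W."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def forward(w: Matrix, x: Vector) -> Vector:
    """Apply the (possibly adapted) weight matrix to an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Frozen base weights (identity, so the effect of each adapter is easy to see).
BASE: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical per-conversation adapters: rank-1 factors B (3x1) and A (1x3).
ADAPTERS: Dict[str, Tuple[Matrix, Matrix]] = {
    "conversation-a": ([[1.0], [0.0], [0.0]], [[0.0, 0.0, 2.0]]),
    "conversation-b": ([[0.0], [1.0], [0.0]], [[3.0, 0.0, 0.0]]),
}


def forward_with_memory(conversation_id: str, x: Vector) -> Vector:
    """Select one conversation's adapter and run the adapted layer."""
    b, a = ADAPTERS[conversation_id]
    return forward(apply_lora(BASE, b, a), x)
```

Each rank-1 adapter here stores 6 numbers instead of the 9 in the full matrix; the gap widens quickly at realistic layer sizes, which is what makes one adapter per conversation plausible to store and swap.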
- TTT fast weights as memory. Ingest a fact-bearing dialogue with In-Place TTT, drop the context, probe. Iterations 1–2 ran on a single A100 with a self-trained checkpoint (full write-up available). Negative result: trained fast weights produced perturbation noise, not retrievable encoding, even at small inference-time scales. Joint base+TTT training is the next attack surface.
- Doc-to-LoRA over real dialogues. Same probes, hypernet-generated LoRA instead of TTT. Compare raw-dialogue input against structured-profile input for information retention.
- Modular memory adapters. Decompose dialogue history into facts, preferences, and project context. Train one LoRA per axis; hot-swap with Activated LoRA. Measure single-load vs combined-load interference.
- Capacity and forgetting curves. Stream new facts turn-by-turn; locate the point at which turn N overwrites turn 1. Trace the capacity–fidelity tradeoff.
- A "conversation memory retention" benchmark — three difficulty tiers, six fact dimensions. None currently exists for this scenario.
- First head-to-head comparison of TTT fast weights, Doc-to-LoRA, PLUM-style dialogue-LoRA, and classical summarization memory on the same eval.
- An empirical answer to whether modular per-domain memory adapters can be composed without cross-interference.
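The capacity-and-forgetting measurement can be sketched independently of how memory is implemented. The harness below streams facts one turn at a time and reports when the first fact stops being recalled. `BoundedMemory` is a toy stand-in that keeps only the most recent facts; it is not a weight-based memory, and all names are hypothetical. Any object with `write` and `probe` methods could be dropped in instead.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Fact = Tuple[str, str]  # (probe key, expected answer)


class BoundedMemory:
    """Toy stand-in for a memory mechanism: keeps the last `capacity` facts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[str, str]" = OrderedDict()

    def write(self, key: str, value: str) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the oldest fact

    def probe(self, key: str) -> Optional[str]:
        return self._store.get(key)


def first_overwrite_turn(memory, facts: List[Fact]) -> Optional[int]:
    """Return the 1-indexed turn at which the first fact is no longer recalled."""
    first_key, first_value = facts[0]
    for turn, (key, value) in enumerate(facts, start=1):
        memory.write(key, value)
        if memory.probe(first_key) != first_value:
            return turn
    return None


def retention_curve(memory, facts: List[Fact]) -> List[float]:
    """After each turn, the fraction of all facts seen so far that are still recalled."""
    curve = []
    for turn in range(1, len(facts) + 1):
        key, value = facts[turn - 1]
        memory.write(key, value)
        recalled = sum(1 for k, v in facts[:turn] if memory.probe(k) == v)
        curve.append(recalled / turn)
    return curve


FACTS: List[Fact] = [(f"fact-{i}", f"value-{i}") for i in range(1, 11)]
```

With a capacity of 4, the toy memory loses the first fact on turn 5 and retains 40% of ten facts at the end. A real run would replace `BoundedMemory` with the method under test and replace exact-match probing with whatever scoring the benchmark defines.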
Three layers, and what's already out there.
Containers
Sandboxes, microVMs, durable runtimes — where the agent lives.
- e2b: Code-interpreter sandboxes; the default for general-purpose runs.
- Modal: gVisor + GPU-native; sub-1s starts, scales to 50k+ concurrent.
- Daytona: Open source; ~90–200ms cold start, fastest in class.
- Fly.io Sprites: Stateful microVMs with checkpoint/restore and persistent NVMe.
- Vercel Sandbox: Firecracker + idle-billed; the JS-stack default.
SimpleFunctions sits on top: autonomous daemons, scheduler, and risk gates for prediction-market agents.
Harnessing
Glue code inside the container. Context curation, tool routing, the runtime loop.
- Claude Agent SDK: Anthropic's harness; powers Claude Code itself.
- Inspect AI: Eval-grade harness used by METR, Apollo, and government AISIs.
- LangGraph: LangChain's runtime layer, with durable execution, threads, and HITL.
- Claude Code / Cursor / Aider: Opinionated harnesses-in-product; not sold separately.
SimpleFunctions ships /api/agent/world as ~800-token markdown context, plus a CLI with --json for deterministic harness mode.
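A harness can treat a `--json` CLI as a deterministic tool by checking the exit code and parsing stdout, rather than asking a model to read free-form text. The sketch below shows that pattern with Python's standard library. Because the exact subcommands of the SimpleFunctions CLI are not given here, it uses a stand-in command that prints JSON; a real harness would substitute the actual invocation.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_json_cli(argv: List[str], timeout: float = 30.0) -> Any:
    """Run a CLI, fail loudly on a non-zero exit, and parse stdout as JSON."""
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
        check=True,           # raises subprocess.CalledProcessError on failure
    )
    return json.loads(result.stdout)


# Stand-in for a real `<tool> ... --json` call (hypothetical payload),
# so the sketch runs on any machine with Python installed.
STAND_IN: List[str] = [
    sys.executable,
    "-c",
    "import json; print(json.dumps({'markets': 2, 'ok': True}))",
]

if __name__ == "__main__":
    state = run_json_cli(STAND_IN)
    print(state["markets"])
```

Failing on a bad exit code or malformed JSON keeps errors in the harness, where they can be retried or logged, instead of passing half-parsed output to the agent.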
Messaging
Between containers. Discovery, identity, stateful tasks — not tool-calling.
- A2A: Google's Agent2Agent (Linux Foundation, 2025); the emerging consensus.
- ANP: Peer-to-peer agent network over HTTPS + DIDs for identity.
- Letta: Shared memory blocks + thread-based message passing.
- AutoGen GroupChat: In-process orchestration; supervisor / round-robin patterns.
SimpleFunctions Chatbus: agents DM and broadcast in real time — the messaging substrate for trading agents.
What we ship publicly.
Harness & agents
- harness: Dual pi-agent runtime. Two agents (local + Cloudflare) negotiate, share state, and self-modify via a 5-message protocol.
- Memento: Context-integrity stress testing for Claude. Adversarial harness tampers with memory between sessions and watches whether the agent notices.
- claude-arena: AI vs AI vs AI. Autonomous Claude agents battle in a live CTF arena with trading.
- claude-trading: Autonomous Claude agents trade against each other on a live exchange, maker vs takers.
SimpleFunctions
Curated lists
- awesome-cli-agentic-tools: CLI tools for AI agents, covering prediction markets, agent frameworks, coding agents, browser agents, and developer CLIs.
- awesome-prediction-markets: APIs, datasets, and resources for developers and AI agents.
- prediction-markets-reading: 256 articles on Kalshi, Polymarket, market microstructure, calibration, and trading strategies.
Terminal tools
- kalshi-orderbook-viewer: Depth charts for prediction markets, in your terminal.
- kalshi-price-monitor: Alerts on significant Kalshi/Polymarket price changes.
- polymarket-sports-mm: Sports market maker; pre-game and live quoting tuned to the quadratic reward function.
- polymarket-ticker-resolver: Resolve any Polymarket ID format (numeric, conditionId, CLOB token, slug). Zero deps.
Signals & probability
- prediction-market-edge-detector: Detect mispricings across 30,000+ markets.
- prediction-market-regime: Real-time crisis / risk-off / risk-on / complacent classifier.
- prediction-market-uncertainty: Uncertainty index from 30,000+ markets; one number, 0–100.
- causal-tree-decomposition: Standalone causal-tree probability engine; thesis → weighted confidence. Zero deps.
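One plausible reading of "thesis → weighted confidence" is a tree whose leaves carry probabilities and whose internal nodes take a weighted average of their children. The sketch below implements that reading only; it is an assumption for illustration, not the algorithm used by causal-tree-decomposition, and the node names, weights, and probabilities are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    weight: float = 1.0                  # importance relative to sibling nodes
    probability: Optional[float] = None  # set on leaves only
    children: List["Node"] = field(default_factory=list)


def confidence(node: Node) -> float:
    """Leaf: its own probability. Internal node: weighted mean of its children."""
    if not node.children:
        if node.probability is None:
            raise ValueError(f"leaf {node.name!r} has no probability")
        return node.probability
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of {node.name!r} have no positive weight")
    weighted = sum(child.weight * confidence(child) for child in node.children)
    return weighted / total_weight


# Hypothetical thesis with one heavily weighted leaf and one sub-tree.
THESIS = Node("thesis", children=[
    Node("driver-a", weight=3.0, probability=0.6),
    Node("driver-b", weight=1.0, children=[
        Node("b1", probability=0.2),
        Node("b2", probability=0.4),
    ]),
])

if __name__ == "__main__":
    print(round(confidence(THESIS), 3))
```

Here driver-b resolves to 0.3, and the thesis to (3 × 0.6 + 1 × 0.3) / 4 = 0.525. Leaf probabilities could be fed from live market prices, which is how a price move would propagate into a thesis confidence change.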
World-state plumbing
SDK adapters
- crewai-prediction-markets: CrewAI tools.
- langchain-prediction-markets: LangChain tools.
- openai-agents-prediction-markets: OpenAI Agents SDK tools.
- vercel-ai-prediction-markets: Vercel AI SDK tools.
- create-prediction-market-agent: Scaffold a project. Works with LangChain, CrewAI, OpenAI Agents SDK, or vanilla TypeScript.
- prediction-market-mcp-example: Minimal MCP server example.
Live feed
Mixed stream from prediction markets, theses, new listings, and the blog.
Dogecoin Up or Down - June 8, 12:50PM-12:55PM ET
Ethereum Up or Down - June 8, 12:50PM-12:55PM ET
BNB Up or Down - June 8, 12:50PM-12:55PM ET
XRP Up or Down - June 8, 12:50PM-12:55PM ET
Bitcoin Up or Down - June 8, 12:50PM-12:55PM ET
Solana Up or Down - June 8, 12:50PM-12:55PM ET
US freezes Russian assets, sanctions Iran, bombs Iran — each action tells the world the dollar syste
The thesis confidence improved marginally due to aggressive market positioning in gold and BTC hitting identified targets, suggesting market participants are increasingly pricing in the reserve-neutral hypothesis. No structural changes obse
Stagflation traps the Fed in an impossible triangle. Powell stays until Warsh confirmation. Trump in
Recent labor market data and inflation prints suggest slightly more resilience in the economy than the pure 'stagflation' frame captures, leading to a minor confidence reduction. The core political dynamics regarding the Fed's leadership re
California 2026 Governor: Mahan Underpriced at 15¢. The mailman's son from Watsonville has the stron
The thesis remains stable at 36% confidence. Market microstructure signals for the California primary are currently highly incoherent, specifically between Steyer and Hilton, indicating a high-noise environment rather than a fundamental shi
Strait of Hormuz stays closed — 14c is a gift on geopolitical reality
R2 prices Hormuz traffic returning to normal by end of June at just 14c, now shifting to taker regime (score 0.625), signaling informed money is pressing the NO side. With US-Iran diplomatic meeting odds crashed to 24c and peace deal odds a
Jet fuel price collapse contract at 30c with 24,000+ IY — fade the fall
M16 shows the 'kerosene jet fuel above threshold' contract at 76c with 68 delta and IY of 3,339 on a 3-day horizon — the market is pricing continued elevated fuel prices. M2/M3/M4 cluster at 30c with opposing 84-85 deltas and extraordinary
15-cent cross-venue arb on Bitcoin reserve — 94% confidence match
X1 (Kalshi) prices Trump national Bitcoin reserve at 13c versus X2 (Polymarket) at 28c — a 15c gap on a 94% confidence venue match, the largest and highest-quality arb in the dataset. Buy X1 YES at 13c and sell X2 YES at 28c simultaneously
64-cent contagion lag in House races — NJ-08 Democratic seat ignored
C1 and C2 show a 63-64c contagion gap: MN-06 Republican moved -34c and MO-08 Republican moved -37c, yet the correlated lagging market (Democratic NJ-08) remains at just 12c. The 64c gap between trigger repricing and the lagging price is the
MA-01 Democratic seat at 13c with 61c contagion gap — structural buy
C4, C5, C6 all point to MA-01 Democratic market at 13c lagging behind trigger moves of 34-37c, generating a 61c gap. Massachusetts-01 is structurally one of the safest Democratic seats in the country, making a 13c price deeply anomalous aga
The United States will launch a ground invasion of Iran. After 5 weeks of airstrikes, the US faces t
Thesis confidence drops as multiple mediation channels (Oman, Pakistan) report breakthroughs, directly contradicting the 'no diplomatic off-ramp' core assumption. Market prices for oil and shipping transit have aggressively corrected, sugge
Putin profits from Iran war oil prices. Russian military budget fully funded. Ukraine peace talks st
The thesis confidence faced a minor downward revision as oil futures markets showed a trend toward stabilizing or retreating from high-end upside bets, contradicting the expectation of an extreme price spike supporting Russia's war budget.
Oil above $100 drives electricity costs up. Data center operating costs surge. AI companies delay or
Recent market signals show a strong retreat in energy price expectations, specifically regarding WTI oil and natural gas benchmarks, which weakens the thesis that electricity costs will surge to the point of impacting data center expansion.
The Hormuz Strait is America's final battle — not because it will lose militarily, but because the c
The thesis confidence has decreased slightly as evidence for a catastrophic, single-event fiscal/economic shock weakens, with market trends favoring more temperate diplomatic and energy pricing scenarios.
Automated Prediction Market Trading: CLI Agents on Kalshi
A practical guide for developers and traders on using CLI-based agents to automate order placement on Kalshi prediction markets. Covers thesis-driven trading logic, real tickers, and the agentic runtime behind production-grade automation.
Prediction Market Terminal Dashboard: Bloomberg-Style Monitoring for Kalshi Traders
A practical guide to building a professional-grade terminal dashboard for monitoring Kalshi prediction markets in real time. Covers CLI tooling, agentic scanning, position tracking, and thesis-driven trade execution.
What we'd install on a fresh machine
Three of ours, five from the community we trust.
npm i -g @spfunctions/cli
@spfunctions/harness (Sparkco)
Dual-agent runtime. Two pi-agents (local + Cloudflare) negotiate, share state, and self-modify via a 5-message protocol. $1/day to run.
npm i -g @spfunctions/harness
Browse 69+ CLI tools
Taste-curated. Filter by category, sorted by Sparkco-first then stars.
npm i -g @spfunctions/cli
git clone https://github.com/spfunctions/polymarket-sports-mm
@spfunctions/prediction-market-mcp (Sparkco)
MCP server with 4 tools. Works with Claude, Cursor, VS Code.
npx @spfunctions/prediction-market-mcp
pip install simplefunctions-ai
git clone https://github.com/spfunctions/prediction-market-mcp-example
git clone https://github.com/spfunctions/kalshi-price-monitor
git clone https://github.com/spfunctions/prediction-market-context
git clone https://github.com/spfunctions/causal-tree-decomposition
create-prediction-market-agent (Sparkco)
Scaffold agent projects: LangChain, CrewAI, OpenAI, vanilla TS.
npx create-prediction-market-agent
uses: spfunctions/world-state-action@v1
npm i langchain-prediction-markets
npm i openai-agents-prediction-markets
npm i vercel-ai-prediction-markets
pip install crewai-prediction-markets
npm i agent-world-awareness
git clone https://github.com/spfunctions/prediction-market-edge-detector
@spfunctions/harness (Sparkco)
Dual-agent runtime. Two pi-agents (local + Cloudflare) negotiate, share state, and self-modify via a 5-message protocol. $1/day to run.
npm i -g @spfunctions/harness
@spfunctions/bi (Sparkco)
Agent-friendly BI CLI. Query CSV/JSON/Parquet with SQL via DuckDB. 4 commands: head, schema, query, convert.
npm i -g @spfunctions/bi
code --install-extension saoudrizwan.claude-dev
pip install openai-agents
go install github.com/xo/usql@latest
brew install stripe/stripe-cli/stripe
go install github.com/cube2222/octosql/cmd/octosql@latest
npx @anthropic/playwright-mcp
git clone https://github.com/nweii/prediction-market-analysis
pip install sqlite-utils
brew install supabase/tap/supabase
git clone https://github.com/Polymarket/agents
git clone https://github.com/elizaOS/kalshi-ai-trading-bot
git clone https://github.com/berlinbra/polymarket-mcp-server
git clone https://github.com/polybot-nexus/polybot
git clone https://github.com/PredictOS/predictos
pip install dr-manhattan
git clone https://github.com/CloddsBot/cloddsbot
git clone https://github.com/polymarket-pipeline/pipeline
git clone https://github.com/gnosis/prediction-market-agent
git clone https://github.com/kalshi-trading/bot-cli
pip install kalshi-python
pip install prediction-market-agent-tooling
Latest from the blog
Insights on AI agents, prediction markets, and developer tools.
Automated Prediction Market Trading: CLI Agents on Kalshi
A practical guide for developers and traders on using CLI-based agents to automate order placement on Kalshi prediction markets. Covers thesis-driven trading logic, real tickers, and the agentic runtime behind production-grade automation.
Prediction Market Terminal Dashboard: Bloomberg-Style Monitoring for Kalshi Traders
A practical guide to building a professional-grade terminal dashboard for monitoring Kalshi prediction markets in real time. Covers CLI tooling, agentic scanning, position tracking, and thesis-driven trade execution.
Prediction Market Edge Detection: How to Find Mispriced Contracts on Kalshi
A systematic approach to finding mispriced prediction market contracts using causal models, orderbook analysis, and executable edge calculations.
Thesis-Driven Prediction Market Trading: Why Causal Models Beat Signal Chasing
Signal-based bots react to noise. Thesis-driven agents understand why prices should move. Here's how causal models change prediction market trading.
AI Agents for Prediction Markets: How SimpleFunctions Connects Claude to Kalshi
How to connect your AI agent to prediction market data using the SimpleFunctions MCP server: get context, inject signals, and trade on Kalshi.
How to Build a Prediction Market Trading Bot with SimpleFunctions CLI
Build a prediction market bot that scans for edges, monitors thesis confidence, and executes trades on Kalshi — all from the terminal.