Read once. Scaffold automatically. Query your whole corpus.
Replikit is the workspace where every paper you save becomes a typed Python scaffold, every notebook lives next to its source paper at a stable URL, and the whole project answers questions in your team's voice. Reading becomes a codebase your collaborators can build on — not a folder of PDFs nobody opens twice.
Reading is solo. Building is solo. The notes get lost. The next collaborator starts from zero.
Every paper you've tried to reproduce had the same shape: a tidy methods section, a clear equation, a dataset name, and somewhere between three weeks and never of guessing at the hyperparameters, the library versions, the preprocessing details the authors didn't think to write down.
The deeper problem isn't any single paper — it's that the work disappears between papers. The notebook you wrote last quarter is on a laptop you've since replaced. The Slack thread where you compared baselines is unsearchable. The next collaborator joins your project and you re-explain six months of reading from memory.
Replikit is the place that workflow lives. Papers, notebooks, methods docs, dataset hooks, and discussions in one project. Hosted, queryable, versioned. The corpus you build today is the substrate the next person on your team starts from.
- 36% of psychology studies replicated in the Reproducibility Project (Nosek et al., 2015)
- 70% of researchers report having tried and failed to reproduce another scientist's experiment (Nature, 2016 survey)
- 3wk average time spent reimplementing a single paper's methods (r/MachineLearning consensus)
Six layers. Paper in. Scaffold, docs, and a queryable project — out.
Replikit is document-aware and project-shaped. Every paper you save lands in a workspace alongside the notebook it generates, the docs that explain it, and the corpus it joins. The next collaborator inherits the whole project, not the conversation that built it.
- 01
Drop in an arXiv URL or a PDF
Paste an arXiv link, an OpenReview link, or upload the PDF directly. Replikit parses structure (methods, datasets, equations, hyperparameters) instead of treating it as raw text.
- arXiv
- OpenReview
- PDF upload
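As a rough illustration of this first step, the sketch below classifies a pasted reference as an arXiv link, an OpenReview link, or a local PDF. `PaperSource` and `classify_source` are hypothetical names, and the URL patterns are assumptions about common link shapes, not Replikit's actual intake code.

```python
import re
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class PaperSource:
    kind: str        # "arxiv", "openreview", or "pdf"
    identifier: str  # arXiv ID, OpenReview forum ID, or the file path


# New-style arXiv IDs such as 1706.03762 or 1706.03762v5 (assumption: old-style
# IDs like hep-th/9901001 are out of scope for this sketch).
_ARXIV_PATH = re.compile(r"^/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)(?:\.pdf)?$")


def classify_source(ref: str) -> PaperSource:
    """Classify a pasted link or an uploaded path (hypothetical helper)."""
    parsed = urlparse(ref)
    host = parsed.netloc.lower()
    if host in {"arxiv.org", "www.arxiv.org"}:
        match = _ARXIV_PATH.match(parsed.path)
        if match:
            return PaperSource("arxiv", match.group(1))
    if host in {"openreview.net", "www.openreview.net"}:
        forum_id = parse_qs(parsed.query).get("id", [""])[0]
        if forum_id:
            return PaperSource("openreview", forum_id)
    # Anything without a URL scheme that ends in .pdf is treated as an upload.
    if not parsed.scheme and ref.lower().endswith(".pdf"):
        return PaperSource("pdf", ref)
    raise ValueError(f"unrecognized paper source: {ref!r}")
```

Rejecting unknown inputs with an error, rather than guessing, keeps the later parsing steps from running on something that is not a paper.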
- 02
Extract — sections, equations, datasets, hyperparameters
A document-aware parser identifies the algorithm boundaries, pulls out LaTeX equations, identifies named datasets and benchmarks, and surfaces hyperparameters from the experimental setup.
- LaTeX-aware
- Dataset detection
- Hyperparameter extraction
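To make the hyperparameter part of this step concrete, here is a minimal sketch that pulls `name = number` assignments out of a line of text. It is a plain regular expression, not the document-aware parser described above, and `extract_hyperparameters` is a hypothetical name used only for illustration.

```python
import re
from typing import Dict, Union

# Matches assignments such as "d_k=64", "h = 8", or "lr = 3e-4".
_ASSIGNMENT = re.compile(
    r"\b([A-Za-z][A-Za-z0-9_]*)\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def extract_hyperparameters(text: str) -> Dict[str, Union[int, float]]:
    """Return every `name = number` pair found in the text (hypothetical helper)."""
    found: Dict[str, Union[int, float]] = {}
    for name, raw in _ASSIGNMENT.findall(text):
        # Keep whole numbers as int so values like head counts stay integral.
        is_integer_literal = re.fullmatch(r"-?\d+", raw) is not None
        found[name] = int(raw) if is_integer_literal else float(raw)
    return found


print(extract_hyperparameters("d_k=64, d_v=64, h=8 heads"))
# prints {'d_k': 64, 'd_v': 64, 'h': 8}
```

A pattern like this misses values stated in prose ("we use eight heads"), which is one reason a structure-aware parser is worth the extra work.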
- 03
Match against the knowledge graph
Replikit looks up the paper, its authors' prior work, known implementations of cited methods, and the canonical libraries the field uses. You inherit context a general-purpose chatbot doesn't have.
- Citation graph
- Known impls
- Library inference
- 04
Generate — a notebook scaffold, not a guess
Function stubs with full type signatures. Equations rendered inline. Suggested libraries imported. Hyperparameters wired up to argparse. Cells annotated with the section of the paper they implement.
- Jupyter
- Type signatures
- Cited cells
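For a sense of what "wired up to argparse" could mean in a generated cell, here is a small sketch. The flag names and `build_parser` are assumptions made for illustration; the defaults are the d_k, d_v, and h values shown in the demo panel on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Expose paper hyperparameters as command-line flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Scaled dot-product attention scaffold"
    )
    parser.add_argument("--d-k", type=int, default=64, help="query/key dimension d_k")
    parser.add_argument("--d-v", type=int, default=64, help="value dimension d_v")
    parser.add_argument("--heads", type=int, default=8, help="number of heads h")
    return parser


# With no flags, the parser falls back to the values recorded from the paper.
args = build_parser().parse_args([])
print(args.d_k, args.d_v, args.heads)  # prints 64 64 8
```

Keeping the paper's values as defaults means a bare run reproduces the stated setup, and any deviation shows up explicitly on the command line.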
- 05
Companion docs — methods.md, dataset.md, expected-results.md
Alongside the notebook, Replikit ships markdown that explains the methods in your own working notation, lists the datasets with auto-fetch commands, and records what numbers the paper reports — so you know when you're done.
- methods.md
- dataset.md
- expected-results.md
- 06
Export — Jupyter, Colab, or Hugging Face Spaces
Download the notebook, push to a Colab tab, or scaffold a Hugging Face Space. Citation metadata exports as BibTeX. Your reading list and your code stay in sync.
- Colab
- HF Spaces
- BibTeX
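The BibTeX export can be pictured with a few lines of string formatting. This is only a sketch: `to_bibtex` is a hypothetical helper, the citation key is invented, and the field set is one common way to cite an arXiv preprint rather than a description of Replikit's real output.

```python
from typing import Mapping


def to_bibtex(key: str, fields: Mapping[str, str], entry_type: str = "misc") -> str:
    """Render one BibTeX entry from a mapping of field names to values."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "vaswani2017attention",  # invented citation key
    {
        "title": "Attention Is All You Need",
        "author": "Vaswani, Ashish and others",
        "year": "2017",
        "eprint": "1706.03762",
        "archivePrefix": "arXiv",
    },
)
print(entry)
```

The sketch does not escape special characters such as `&` or `%`, which a real exporter would need to handle.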
A project. Three papers. One queryable corpus.
The demo below is a real project view: three papers, a generated notebook for each, and a Corpus Q&A tab that asks questions of the whole project at once. Switch papers to see their per-paper scaffolds; switch to Corpus Q&A to see what the workspace gives you that a chat session can't.
Transformer Replication Sprint
3.2.1 Scaled Dot-Product Attention
We call our particular attention "Scaled Dot-Product Attention" (Figure 2). The
input consists of queries and keys of dimension d_k, and values of dimension d_v.
We compute the dot products of the query with all keys, divide each by sqrt(d_k),
and apply a softmax function to obtain the weights on the values.
In practice, we compute the attention function on a set of queries simultaneously,
packed together into a matrix Q. The keys and values are also packed together
into matrices K and V. We compute the matrix of outputs as:
Attention(Q, K, V) = softmax(Q · K^T / sqrt(d_k)) · V    (1)
- Algorithm
scaled_dot_product_attention
- Inputs
Q (d_k), K (d_k), V (d_v), optional mask
- Equation
Attention(Q,K,V) = softmax(QKᵀ / √d_k) · V
- Hyperparameters
d_k=64, d_v=64, h=8 heads
- Datasets
WMT 2014 EN-DE (4.5M), WMT 2014 EN-FR (36M)
- Library
PyTorch ≥ 1.12 / JAX 0.4+
- Known impls
fairseq, x-transformers, tensor2tensor
- In your corpus
Cited by he-2015 (no), ba-2016 (no)
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledDotProductAttention(nn.Module):
    """
    Scaled Dot-Product Attention (Vaswani et al., 2017, §3.2.1).
    Equation (1): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    """

    def __init__(self, d_k: int, dropout: float = 0.0):
        super().__init__()
        self.scale = 1.0 / math.sqrt(d_k)
        self.dropout = nn.Dropout(dropout)

    def forward(
        self,
        q: torch.Tensor,  # (B, h, T_q, d_k)
        k: torch.Tensor,  # (B, h, T_k, d_k)
        v: torch.Tensor,  # (B, h, T_k, d_v)
        mask: torch.Tensor | None = None,
    ) -> torch.Tensor:  # (B, h, T_q, d_v)
        scores = (q @ k.transpose(-2, -1)) * self.scale
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v
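One way to sanity-check a scaffold like the module above is to evaluate Equation (1) by hand on tiny inputs. The sketch below does that for a single head using only the standard library; it ignores batching, masking, and dropout, and the function names are illustrative rather than part of the generated notebook.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Equation (1) for one head: softmax(Q K^T / sqrt(d_k)) V."""
    scale = 1.0 / math.sqrt(len(k[0]))  # d_k is the width of a key row
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


# A zero query scores every key equally, so the output is the mean of the values.
print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
# prints [[2.0, 3.0]]
```

Feeding the same small tensors through the PyTorch module with dropout at 0.0 should give matching numbers up to floating-point tolerance.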
Project view. Switch papers to see the scaffold for each one. Switch to Corpus Q&A to query the whole project.
A workspace, not a one-shot tool.
Each capability below is something the project itself gives you — accumulated across every paper you add, available to everyone you share the project with.
- 01 Hosted
Notebooks and docs at stable URLs
Every notebook, every methods.md, every expected-results.md lives at a permanent project URL. Share by link, not by zip file. Versioned by paper, by run, by edit.
- 02 Corpus
Query the whole project, not one chat thread
Ask "which of my saved papers use X?" and get an answer grounded in your actual reading list — with citations back to specific papers in your project, not free recall from the open internet.
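To show the shape of a grounded answer, here is a deliberately simple sketch: a substring search over stored passages that returns the paper keys and sections where a term appears. Plain substring matching is a stand-in chosen for brevity, not the retrieval method Replikit uses. `Passage` and `papers_mentioning` are hypothetical names, and the he-2015 and ba-2016 passages are placeholders, not text from those papers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Passage:
    paper_key: str  # project-local key such as "vaswani-2017"
    section: str
    text: str


def papers_mentioning(corpus: Iterable[Passage], term: str) -> Dict[str, List[str]]:
    """Map each paper that mentions the term to the sections where it appears."""
    needle = term.lower()
    hits: Dict[str, List[str]] = {}
    for passage in corpus:
        if needle in passage.text.lower():
            hits.setdefault(passage.paper_key, []).append(passage.section)
    return hits


corpus = [
    Passage("vaswani-2017", "3.2.1", "divide each by sqrt(d_k), and apply a softmax function"),
    Passage("he-2015", "placeholder", "placeholder passage with no matching term"),
    Passage("ba-2016", "placeholder", "placeholder passage with no matching term"),
]
print(papers_mentioning(corpus, "softmax"))
# prints {'vaswani-2017': ['3.2.1']}
```

The point of the return shape is that every answer carries a pointer back to a specific paper and section in the project.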
- 03 Project-shaped
Papers, notebooks, datasets — together
A Replikit project is a unit: the papers you've added, the scaffolds you've generated, the datasets the methods need, the discussion. Onboard a new collaborator by sharing the project, not by re-explaining six months of context.
- 04 Honest
Expected-results, so you know when you're done
Every scaffold ships with the numbers the paper reports. Run yours, compare to the table, see the delta. The corpus also tracks which papers you've reproduced, which you've abandoned, and why.
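The "run yours, compare to the table, see the delta" loop could look something like the sketch below. `compare_to_reported` is a hypothetical helper, the single shared tolerance is a simplifying assumption, and the numbers are illustrative only; they are not taken from any paper.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, float, Optional[float], Optional[float], str]


def compare_to_reported(
    reported: Dict[str, float],
    reproduced: Dict[str, float],
    tolerance: float,
) -> List[Row]:
    """Return (metric, reported, reproduced, delta, status) per reported metric."""
    rows: List[Row] = []
    for metric, target in reported.items():
        if metric not in reproduced:
            rows.append((metric, target, None, None, "not run"))
            continue
        # Round so float noise does not leak into the displayed delta.
        delta = round(reproduced[metric] - target, 4)
        status = "within tolerance" if abs(delta) <= tolerance else "off"
        rows.append((metric, target, reproduced[metric], delta, status))
    return rows


# Illustrative numbers only.
reported = {"bleu": 30.0, "perplexity": 5.0}
reproduced = {"bleu": 29.4}
for row in compare_to_reported(reported, reproduced, tolerance=0.5):
    print(row)
```

Reporting "not run" separately from "off" keeps an unfinished reproduction from looking like a failed one.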
Pricing by usage, not by team size.
Pricing locks in the day the product ships. Reserve the early-access tier now to lock in 50% off for life — see below.
Student
For graduate students and self-funded researchers.
$9/mo
- 5 papers / month
- .edu email verification
- Jupyter / Colab export
- Community support
Researcher (most popular)
For one researcher doing real work.
$29/mo
- Unlimited papers
- All output formats
- Citation manager sync (Zotero, BibTeX)
- Email support
Lab
For a research group sharing reproducibility infra.
$99/mo
- 5 seats
- Shared scaffold library
- Group reproducibility metrics
- Slack/Discord support
Institution
For a department or industrial R&D group.
$499/mo
- 50 seats
- SSO + admin dashboard
- Private knowledge graph
- Procurement-friendly contracts
Reserve a Researcher plan at $14/mo — 50% off, for life.
First 500 reservations only. No charge until launch — refunded if Replikit doesn't ship within 6 months.
What ships after v1.
Driven by the reproduction blockers that researchers on the waitlist actually report.
- Q1 after launch
Chemistry and biology PDF support
v1 targets ML / CS arXiv. Q1 expands to chemistry (RSC, Nature Chemistry) and structural biology — where reproducibility is a different problem with the same shape.
- Q1 after launch
Auto-fetch datasets from HF + Kaggle
When the paper names a benchmark Replikit recognizes, the scaffold ships with a working dataset loader — no manual download, no path-fiddling.
- Q2 after launch
Reproducibility scorecard
A pre-flight checklist — code availability, dataset availability, hyperparameter completeness, hardware specs — so you know up-front whether reproduction is plausible.
- Q3 after launch
Citation graph visualization
See what cites this paper, what it cites, and which of its dependencies already have Replikit scaffolds in your library. Find the next paper worth your week.
The earlier you join, the more you shape v1.
I'm Tom, a data engineer who has spent more weekends than I'd like reimplementing other people's papers from scratch. If you've ever filed a reproducibility issue in your own head instead of on GitHub, your input shapes Replikit.
Honest answers.
Are you claiming to reproduce papers automatically?
No, deliberately. Replikit produces a scaffold — typed function stubs, the equation in the docstring, the datasets wired up, the hyperparameters from the paper recorded in a config. The actual experiment, the comparison to reported numbers, the writeup — that's still yours. The claim is 70% of the boring infrastructure, not 100% of the science.
Why a workspace instead of a chat session?
Chat is great for one paper, one session, one person. Replikit is built for the opposite — many papers, over months, across collaborators. The scaffold is hosted at a stable URL (not in your chat history). The companion docs are linked to the source paper (not in another tab). The whole project is queryable (so your reading list itself becomes a knowledge base). And every artifact carries citations back to the paper it came from — so someone joining your project six months in can navigate the corpus, not interview you.
Will you train models on the papers I upload?
No. Your uploads stay in your tenant and are not added to any training corpus. The model is invoked with your extracted structure + a system prompt; the responses are not fed back. Business model is per-paper pricing, not data resale.
Do you handle paywalled papers?
If you upload a PDF that you've paid for or otherwise legitimately accessed, Replikit processes it. Replikit does not bypass paywalls and does not store the source PDF beyond the time required to extract structure.
What disciplines does v1 cover?
Machine learning and computer science (arXiv cs.* and stat.ML) are the v1 target — that's where the reproducibility crisis is loudest and the implementation patterns are most standardized. Chemistry, biology, and physics follow in the first quarter after launch.
What if the paper's methods section is genuinely vague?
Replikit's scaffold will be vague too — and it will say so. Sections of the notebook that depend on unspecified details are marked TODO with the specific question ("learning rate schedule not stated; defaulting to cosine; verify with authors"). The scaffold is honest about what it doesn't know.
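A minimal sketch of how unspecified details might be surfaced as TODO markers, assuming each extracted setting records whether the paper actually stated it. `Setting` and `todo_comments` are hypothetical names, and the `d_k` entry reuses the value from the demo panel.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Setting:
    name: str
    value: object
    stated_in_paper: bool
    note: str = ""


def todo_comments(settings: List[Setting]) -> List[str]:
    """Emit one TODO comment per setting the paper left unspecified."""
    return [
        f"# TODO({s.name}): {s.note}"
        for s in settings
        if not s.stated_in_paper
    ]


settings = [
    Setting("d_k", 64, stated_in_paper=True),
    Setting(
        "lr_schedule",
        "cosine",
        stated_in_paper=False,
        note="learning rate schedule not stated; defaulting to cosine; verify with authors",
    ),
]
print("\n".join(todo_comments(settings)))
```

Carrying the default and the open question together means the notebook still runs while making the guess visible.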
Can I edit and re-run?
Yes — the scaffold is yours. You can edit the notebook, swap libraries, plug in your data. Re-running Replikit on the same paper preserves your edits in cells marked USER and only updates the auto-generated cells.
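The re-run behavior can be pictured as a merge keyed on cell identity. The sketch below assumes each cell has a stable `id` and an `owner` of either `USER` or `AUTO`; that schema and `merge_cells` are illustrative assumptions, not Replikit's actual notebook format.

```python
from typing import Dict, List

Cell = Dict[str, str]  # keys: "id", "owner" ("USER" or "AUTO"), "source"


def merge_cells(existing: List[Cell], regenerated: List[Cell]) -> List[Cell]:
    """Keep USER cells from the existing notebook; refresh everything else."""
    user_cells = {cell["id"]: cell for cell in existing if cell["owner"] == "USER"}
    merged = []
    for cell in regenerated:
        # A USER cell with the same id wins over the freshly generated one.
        merged.append(user_cells.pop(cell["id"], cell))
    # USER cells with no regenerated counterpart are appended, never dropped.
    merged.extend(user_cells.values())
    return merged


existing = [
    {"id": "load", "owner": "AUTO", "source": "old loader"},
    {"id": "train", "owner": "USER", "source": "my training loop"},
    {"id": "plots", "owner": "USER", "source": "my plots"},
]
regenerated = [
    {"id": "load", "owner": "AUTO", "source": "new loader"},
    {"id": "train", "owner": "AUTO", "source": "generated stub"},
]
print([cell["source"] for cell in merge_cells(existing, regenerated)])
# prints ['new loader', 'my training loop', 'my plots']
```

Appending orphaned USER cells at the end is the simplest safe choice here; preserving their original position would need more bookkeeping than this sketch shows.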
When will it launch?
Depends on validation signal from this page. If the waitlist hits critical mass in the next two weeks, I start building full-time and ship v1 in 10–12 weeks (the document parser + knowledge graph are the long poles). If not, I respect your time and don't keep you on a list for vaporware.