We are founding a new field: Computational Personality Science.
A startup that will be acquired in five years builds a product. A startup that will define a category for the next twenty years founds an academic discipline. We are doing the second.
Computational Personality Science sits at the intersection of four traditions.
- Cognitive psychology: Kahneman’s dual-process theory, Tulving’s episodic/semantic distinction, Vygotsky’s cultural scaffolding. We borrow constructs and contribute longitudinal multimodal data.
- Psychometrics: the Schwartz values inventory, the Big Five, Russell’s circumplex, Martin’s humor styles. We operationalize them in a single Pydantic schema.
- Machine learning: foundation models, PEFT/LoRA, federated learning, stylometry, behavioral prompting. We add the structured intermediate that makes them auditable.
- Philosophy and ethics of technology: Ricœur’s narrative identity, Nissenbaum’s contextual integrity, Sofka’s grief-tech research. We translate them into architectural primitives.
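As one illustration of what "a single Pydantic schema" could look like, here is a minimal sketch. The class and field names, trait sets, and the [0, 1] ranges are our assumptions for this example, not the project's actual schema.

```python
# Illustrative sketch only: names, fields, and ranges are assumptions.
from pydantic import BaseModel, Field


class BigFive(BaseModel):
    # Each trait normalized to [0, 1]; validation rejects out-of-range scores.
    openness: float = Field(ge=0.0, le=1.0)
    conscientiousness: float = Field(ge=0.0, le=1.0)
    extraversion: float = Field(ge=0.0, le=1.0)
    agreeableness: float = Field(ge=0.0, le=1.0)
    neuroticism: float = Field(ge=0.0, le=1.0)


class HumorStyles(BaseModel):
    # Martin's four humor styles, same normalization.
    affiliative: float = Field(ge=0.0, le=1.0)
    self_enhancing: float = Field(ge=0.0, le=1.0)
    aggressive: float = Field(ge=0.0, le=1.0)
    self_defeating: float = Field(ge=0.0, le=1.0)


class PersonalityVector(BaseModel):
    # The single structured intermediate: one validated object per creator.
    big_five: BigFive
    humor: HumorStyles
    schwartz_values: dict[str, float]  # e.g. {"benevolence": 0.8, ...}
```

Putting every inventory behind one validated model is what makes the intermediate auditable: any downstream prompt or LoRA adapter consumes the same typed object.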
The Personality Test™
The Turing Test asks: can a machine be mistaken for a human? That question is no longer interesting. The new question — and the one this category requires — is: can the system be mistaken for a specific human?
The protocol
- The creator (a real person) answers 30 unseen questions in writing — the Personality Probe Set (PPS) — held out from training data.
- The capsule, built only from the creator’s training-set data, answers the same 30 questions independently.
- 3–5 evaluators per creator (typically family members) see paired (real, capsule) answers in random order.
- Each pair is rated on three 7-point Likert scales: authenticity, reasoning fidelity, linguistic signature.
- Forced-choice distinguishability: which answer was the AI? Random performance = 50%; trivial detection = 100%.
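The forced-choice metric reduces to a simple proportion. A minimal sketch (the function and variable names here are ours, not the harness's):

```python
def distinguishability(correct_picks: list[bool]) -> float:
    """Fraction of forced-choice trials in which an evaluator correctly
    identified the AI-generated answer.

    0.5 is chance performance (capsule indistinguishable from the creator);
    1.0 means the capsule is trivially detectable.
    """
    if not correct_picks:
        raise ValueError("no forced-choice trials recorded")
    return sum(correct_picks) / len(correct_picks)
```

For example, 3 evaluators each judging 30 question pairs yield 90 trials; 54 correct picks give D = 0.6, only slightly above chance.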
The harness in packages/personality/aitc_personality/eval.py refuses to run if any held-out content’s SHA-256 appears in the training corpus index — preventing accidental leakage.
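A leakage guard of this kind can be sketched as follows. This is an illustration of the idea, not the actual code in packages/personality/aitc_personality/eval.py; the function names and the whitespace normalization step are our assumptions.

```python
import hashlib


def sha256_text(text: str) -> str:
    # Normalize whitespace before hashing so trivial formatting
    # differences don't mask a verbatim leak.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def assert_no_leakage(held_out: list[str], training_index: set[str]) -> None:
    """Refuse to run the eval if any held-out item's SHA-256 appears
    in the training corpus index."""
    leaked = [item for item in held_out if sha256_text(item) in training_index]
    if leaked:
        raise RuntimeError(
            f"{len(leaked)} held-out item(s) found in training corpus; aborting eval"
        )
```

Hashing rather than string comparison keeps the check cheap even against a large corpus index, and means the index can be shipped without exposing the training text itself.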
Targets by sprint
| Sprint | Capability | Target D |
|---|---|---|
| Sprint 1 | PV + Behavioral Prompting on Claude/GPT-4o | ≤ 0.75 |
| Sprint 2 | + voice (ElevenLabs), + UI-side Authenticity Seal | ≤ 0.70 |
| Sprint 3 | + LoRA on open-source LLM, + biometric inputs | ≤ 0.65 |
| Sprint 4 | + 3D avatar (text-channel evaluation) | ≤ 0.60 |
| Funded MVP | Full system, 50-creator dataset | ≤ 0.58 + paper |
A distinguishability rate around 0.6 is only slightly above the 0.5 chance floor: even people who know the creator extremely well struggle to tell the capsule apart from the real person.
- Protocol v1.0 + 5-creator pilot results — 2026-Q3 — PLOS ONE / CHI 2026
- Open-source eval harness — 2026-Q3 — this repository
- Public leaderboard — 2026-Q4 — aitimecapsule.com/benchmark
- 50-creator funded study — 2027-Q2 — Science Advances target
Eleven manuscripts in active preparation, lead author Rodion Sorokin.
These are not abstracts. They are full manuscripts (DOCX/PDF, 20–50 pages each) authored over the second half of 2025 and Q1 2026, currently in submission to PLOS ONE, CHI 2026, JMIR, and the ACM Journal on Responsible Computing.
Four research tracks, four target departments at Columbia and beyond.
- Longitudinal Laboratory for Decision Science: the first opportunity to capture lifetime longitudinal data on how decision-making heuristics evolve. Funding: NSF, Sloan.
- Empirical validation of therapeutic potential in grief support and the study of cognitive aging. Funding: NIH, private mental-health foundations. IRB-gated throughout.
- Living case study for the Columbia Initiative on AI and Data Science for Society (AIDS4S): a joint AI governance case study. Funding: NSF, Knight, DARPA.
- Partnership with Columbia’s Oral History Archives to create the first digital avatars of historical figures. Funding: Mellon, NEH.
The category will have an academic discipline whether we name it or not. By naming it first, defining its methods first, publishing its benchmark first, and partnering with the most credible institutions first, we make ourselves the de facto reference architecture. This is a moat no startup capital can buy after the fact.