What this does
Upload your finalised typology and the full corpus. Documents are coded in batches, with each document scored independently against every concept in the typology. The document-term matrix (DTM) is exported automatically after each batch as a checkpoint. Review the DTM before proceeding to Phase 4 for threshold setting, binarisation, and cluster analysis. If the coding looks wrong, adjust the configuration or return to Phase 2 to revise the typology.
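A minimal Python sketch of the batch loop described above. It assumes the DTM is held as a mapping from document ID to per-concept scores and written to CSV with one row per document and one column per concept. The `score` callable stands in for the LLM call. It and the names `code_corpus`, `write_dtm`, and `checkpoint_path` are hypothetical, not part of this tool.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical scorer: takes document text and a concept name, returns a 0-3 score.
Scorer = Callable[[str, str], int]


def write_dtm(path: Path, dtm: Dict[str, Dict[str, int]], concepts: List[str]) -> None:
    """Write the DTM as CSV: one row per document, one column per concept."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["document", *concepts])
        for doc_id, scores in dtm.items():
            writer.writerow([doc_id, *(scores[c] for c in concepts)])


def code_corpus(
    docs: Dict[str, str],
    concepts: List[str],
    score: Scorer,
    batch_size: int,
    checkpoint_path: Path,
) -> Dict[str, Dict[str, int]]:
    """Score every document against every concept, checkpointing after each batch."""
    dtm: Dict[str, Dict[str, int]] = {}
    doc_ids = list(docs)
    for start in range(0, len(doc_ids), batch_size):
        for doc_id in doc_ids[start:start + batch_size]:
            # Each document is scored independently of the others.
            dtm[doc_id] = {c: score(docs[doc_id], c) for c in concepts}
        # Export a checkpoint after every batch, including a final partial batch.
        write_dtm(checkpoint_path, dtm, concepts)
    return dtm
```

Overwriting a single checkpoint file after every batch means an interrupted run loses at most one batch of work.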
Configuration
Coding mode
0 = Absent
1 = Mentioned: brief reference only
2 = Important: discussed and emphasised
3 = Central: core focus of document
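The coding scale can be represented as an ordinal enum so that out-of-range model responses are rejected rather than silently stored. This is a sketch that assumes the model returns the score as a bare integer string. `Salience` and `parse_score` are hypothetical names.

```python
from enum import IntEnum


class Salience(IntEnum):
    """Ordinal coding scale: how prominently a document features a concept."""
    ABSENT = 0      # not present
    MENTIONED = 1   # brief reference only
    IMPORTANT = 2   # discussed and emphasised
    CENTRAL = 3     # core focus of document


def parse_score(raw: str) -> Salience:
    """Convert a raw model response such as ' 2 ' into a validated score.

    Raises ValueError if the value is not an integer on the 0-3 scale.
    """
    return Salience(int(raw.strip()))
```

Because `Salience` is an `IntEnum`, validated scores still compare and sort as plain integers, which keeps later thresholding simple.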
Processing settings
Documents processed before each automatic DTM checkpoint export.
Characters extracted per document. Higher values give the model more context but use more tokens.
If the extracted text is shorter than this length, the PDF is sent directly to the API for vision-based coding.
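The two length settings interact roughly as follows for PDF inputs. This sketch makes two assumptions. First, the threshold is compared against the extracted text after stripping whitespace, so a scanned PDF with an empty text layer falls back to vision. Second, truncation keeps the first characters of the document. `prepare_input`, `CodingInput`, `max_chars`, and `min_chars` are hypothetical names for the two settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodingInput:
    mode: str             # "text" or "vision"
    text: Optional[str]   # truncated text in text mode, None in vision mode


def prepare_input(extracted: str, max_chars: int, min_chars: int) -> CodingInput:
    """Decide how a PDF is sent to the model.

    max_chars: characters extracted per document (more context, more tokens).
    min_chars: below this length the PDF itself is sent for vision-based coding.
    """
    text = extracted.strip()  # assumption: whitespace does not count toward length
    if len(text) < min_chars:
        return CodingInput(mode="vision", text=None)
    return CodingInput(mode="text", text=text[:max_chars])
```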
Typology CSV
Drop CSV here or click to browse
Finalised typology from Phase 2
Corpus documents
Select multiple files
Drop PDF or TXT files here, or click to browse and select all corpus documents at once
.pdf and .txt · select multiple files with Shift+click or Ctrl+click
Resume from checkpoint — optional
Drop a checkpoint DTM here or click to browse
Documents already in this file will be skipped
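Resuming amounts to reading the document identifiers out of the checkpoint and filtering them from the corpus. This sketch assumes the checkpoint is a CSV with a header row and the document identifier in the first column. `already_coded` and `remaining` are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Dict, Set


def already_coded(checkpoint: Path) -> Set[str]:
    """Return the document IDs present in a checkpoint DTM CSV."""
    with checkpoint.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return {row[0] for row in reader if row}


def remaining(docs: Dict[str, str], checkpoint: Path) -> Dict[str, str]:
    """Drop documents that already appear in the checkpoint."""
    done = already_coded(checkpoint)
    return {doc_id: text for doc_id, text in docs.items() if doc_id not in done}
```

Skipping by identifier only works if the IDs in the checkpoint match the IDs assigned to the uploaded files, for example the original filenames.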
Load typology and corpus to begin
Coding progress
0 / 0
✓ Checkpoint exported — DTM saved to your downloads folder
Document-Term Matrix
DTM ready
Cells highlighted in amber indicate low-confidence scores flagged by the LLM. Proceed to Phase 3.2 for threshold setting and binarisation.
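Binarisation reduces each 0-3 score to presence or absence. This minimal sketch assumes a single global threshold, where any score at or above it counts as present. The actual rule, for example per-concept thresholds, is set in the next phase. `binarise` is a hypothetical name.

```python
from typing import Dict


def binarise(dtm: Dict[str, Dict[str, int]], threshold: int) -> Dict[str, Dict[str, int]]:
    """Map each score to 1 if it meets the threshold, else 0."""
    return {
        doc_id: {concept: int(score >= threshold) for concept, score in scores.items()}
        for doc_id, scores in dtm.items()
    }
```

With a threshold of 2, for instance, "Mentioned" (1) scores drop out and only "Important" and "Central" codings count as present.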