
Extract Every Debt Tranche from Any Bankruptcy Filing

Upload a disclosure statement or plan of reorganization. TrancheLab parses the PDF, classifies sections, filters noise, and returns a structured capital table with confidence scores on every field. Minutes, not hours.

Read the Docs

Book a Demo

TrancheLab extraction pipeline: PDF inputs flowing through parsing, classification, extraction, confidence scoring, and deduplication to structured output

297 > ~40
Pages filtered before LLM

< 10 min
Average extraction time

0.0 to 1.0
Confidence score on every field

3 formats
Table, JSON, CSV output

How TrancheLab Extract Works

Step 1:

Upload your filing

Drop a disclosure statement or plan of reorganization. TrancheLab accepts any PDF, including scanned documents. No preprocessing required.

Docs: Supported Filing Types

Upload interface showing drag-and-drop zone with hertz_disclosure_statement_297p.pdf uploaded

Step 2:

PDF parsing with OCR fallback

TrancheLab runs a three-stage parsing chain: PyMuPDF for native text, pdfminer for layout-sensitive extraction, and Tesseract OCR as a fallback for scanned pages. Bad scans do not break the pipeline. Pages with fewer than 80 characters of extracted text automatically trigger OCR.

Docs: PDF Parsing Chain

Three-stage parsing chain: PyMuPDF to pdfminer to Tesseract OCR with decision points
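The fallback logic in this step can be sketched in a few lines of Python. This is a minimal illustration, not TrancheLab's implementation: the three backends are replaced by hypothetical stand-in functions, and it assumes the same 80-character threshold decides every handoff (the text above only states it for the OCR trigger).

```python
from typing import Callable, Sequence

MIN_CHARS = 80  # threshold quoted above: fewer characters than this moves to the next stage

# An extractor takes a page number and returns whatever text it could recover.
Extractor = Callable[[int], str]


def extract_page_text(
    page_number: int,
    extractors: Sequence[tuple[str, Extractor]],
    min_chars: int = MIN_CHARS,
) -> tuple[str, str]:
    """Try each stage in order and return (stage_name, text).

    A stage is accepted as soon as it yields at least `min_chars` characters.
    If every stage falls short, the longest result is returned, so a bad scan
    degrades the output instead of raising.
    """
    best_name, best_text = "none", ""
    for name, extract in extractors:
        try:
            text = extract(page_number).strip()
        except Exception:
            # One failing backend should not break the whole pipeline.
            continue
        if len(text) >= min_chars:
            return name, text
        if len(text) > len(best_text):
            best_name, best_text = name, text
    return best_name, best_text


# Hypothetical stand-ins for the three real backends.
def native_text(page_number: int) -> str:
    return ""  # a scanned page has no embedded text layer


def layout_text(page_number: int) -> str:
    return "Page 12"  # only a stray header was recovered


def ocr_text(page_number: int) -> str:
    return (
        "Class 3 First Lien Claims: $1,250.0 million outstanding "
        "under the Prepetition Credit Agreement."
    )


CHAIN = [("pymupdf", native_text), ("pdfminer", layout_text), ("tesseract", ocr_text)]
stage, text = extract_page_text(12, CHAIN)
print(stage, len(text))
```

Passing the backends in as callables keeps the decision logic testable without installing any PDF or OCR library.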

Step 3:

Section classification and pre-filter

Before any LLM call, a deterministic classifier scans every page and tags it: capital structure, classification of claims, recovery analysis, risk factors, legal boilerplate. Only relevant pages pass through. For example, a 297-page disclosure statement reduces to approximately 40 pages.

Docs: Section Classifier

297 pages filtered down to approximately 40 relevant pages by section classifier
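A deterministic pre-filter of this kind could be as simple as ordered keyword rules. The sketch below is an assumption about how such a classifier might look: the regular expressions, the label identifiers, and the choice of which labels count as relevant are all hypothetical.

```python
import re

# Hypothetical keyword rules. The first matching rule wins, so order matters.
RULES = [
    ("capital_structure",
     re.compile(r"capital structure|prepetition indebtedness|credit agreement", re.I)),
    ("classification_of_claims",
     re.compile(r"classification (and treatment )?of claims|class \d+", re.I)),
    ("recovery_analysis",
     re.compile(r"estimated recover|liquidation analysis|recovery analysis", re.I)),
    ("risk_factors", re.compile(r"risk factors", re.I)),
]

# Assumption: only these labels are forwarded to the extraction stage.
RELEVANT = {"capital_structure", "classification_of_claims", "recovery_analysis"}


def classify_page(text):
    """Return the label of the first rule that matches, else 'legal_boilerplate'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "legal_boilerplate"


def prefilter(pages):
    """Return (page_number, label) pairs for relevant pages, numbering from 1."""
    tagged = [(number, classify_page(text)) for number, text in enumerate(pages, start=1)]
    return [(number, label) for number, label in tagged if label in RELEVANT]


pages = [
    "NOTICE: This Disclosure Statement has not been approved by the Bankruptcy Court.",
    "Prepetition Capital Structure. The Debtors' funded debt consists of the following.",
    "Risk Factors. Holders of claims should carefully consider the following.",
    "Classification and Treatment of Claims. Class 3 consists of First Lien Claims.",
]
kept = prefilter(pages)
print(kept)
```

Because the rules are plain pattern matches, the same filing always yields the same set of pages, which is what makes the filter deterministic.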

Step 4:

Tranche extraction with confidence scoring

The extraction pipeline identifies every debt tranche and pulls face amounts, outstanding balances, interest rates, maturity dates, seniority rankings, and recovery estimates where disclosed. Every extracted value gets a deterministic confidence score from 0.0 to 1.0. If a value has a raw text excerpt backing it, confidence reflects match quality. If it does not, confidence is forced to 0.

Docs: Confidence Calibration

Extracted tranche table showing debt classes with amounts, rates, maturities, seniority, and confidence scores
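One way to picture the scoring rule is the sketch below. The page only states that confidence reflects match quality and is forced to 0 without an excerpt, so the specific metric here (share of the excerpt found as one contiguous run in the page text) is an assumption chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def _normalize(text):
    """Collapse whitespace and lowercase so line breaks do not affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def confidence(excerpt, page_text):
    """Score in [0.0, 1.0] for how well `excerpt` is backed by `page_text`."""
    if not excerpt or not excerpt.strip():
        return 0.0  # no backing excerpt: confidence is forced to 0
    needle, haystack = _normalize(excerpt), _normalize(page_text)
    if not haystack:
        return 0.0
    if needle in haystack:
        return 1.0  # verbatim match after normalization
    # Otherwise, score by the longest contiguous run shared with the page.
    matcher = SequenceMatcher(None, needle, haystack, autojunk=False)
    match = matcher.find_longest_match(0, len(needle), 0, len(haystack))
    return round(match.size / len(needle), 2)


page = "The Debtors issued the First Lien\nNotes due 2026 in the amount of $500 million."
exact = confidence("First  Lien Notes due 2026", page)
partial = confidence("First Lien Notes due 2027", page)
missing = confidence(None, page)
print(exact, partial, missing)
```

The function uses no randomness and no model call, so the same excerpt and page always produce the same score.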

Step 5:

Fuzzy deduplication

Levenshtein matching groups tranches that appear under different names across sections or plan amendments. 'First Lien Notes,' 'Existing First Lien Facility,' and 'Prepetition First Lien Credit Agreement' resolve to one entry instead of three. Amounts must be within 5% to merge. When both amounts are missing, name similarity must exceed 90%.

Docs: Deduplication Engine

Before and after deduplication: 8 entries with duplicates merged into 4 clean tranches via Levenshtein fuzzy matching
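The merge rule can be sketched as follows. The 5% amount tolerance and the 90% name threshold come from the description above; everything else is an assumption made for this example, including the `Tranche` type, the 0.80 name threshold used when both amounts are present, measuring the 5% against the larger amount, and the handling of entries where only one amount is known.

```python
from dataclasses import dataclass
from typing import Optional

AMOUNT_TOLERANCE = 0.05            # from the text: amounts must be within 5%
NAME_ONLY_THRESHOLD = 0.90         # from the text: required when both amounts are missing
NAME_WITH_AMOUNT_THRESHOLD = 0.80  # assumption: the text gives no figure for this case


@dataclass
class Tranche:  # hypothetical record type for this sketch
    name: str
    amount: Optional[float] = None


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Edit distance rescaled to 0.0-1.0, where 1.0 means identical names."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def should_merge(a, b):
    similarity = name_similarity(a.name, b.name)
    if a.amount is None or b.amount is None:
        # Assumption: if either amount is unknown, apply the strict name-only rule.
        return similarity > NAME_ONLY_THRESHOLD
    largest = max(abs(a.amount), abs(b.amount))
    close = abs(a.amount - b.amount) <= AMOUNT_TOLERANCE * largest
    return close and similarity >= NAME_WITH_AMOUNT_THRESHOLD


def dedupe(tranches):
    """Greedy grouping: each tranche joins the first group whose first member matches."""
    groups = []
    for tranche in tranches:
        for group in groups:
            if should_merge(group[0], tranche):
                group.append(tranche)
                break
        else:
            groups.append([tranche])
    return groups


entries = [
    Tranche("First Lien Term Loan", 1000.0),
    Tranche("First Lien Term Loans", 1020.0),
    Tranche("Second Lien Term Loan", 400.0),
    Tranche("Senior Unsecured Notes"),
    Tranche("Senior Unsecured Note"),
]
groups = dedupe(entries)
print([[t.name for t in g] for g in groups])
```

Greedy grouping against each group's first member is the simplest option; a production system might compare against every member or use clustering instead.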

Step 6:

Structured output

Results export as a sortable data table in the UI, downloadable JSON, or CSV. Every field links back to the source text excerpt from the filing. Click any row to see the exact sentence the value was extracted from, the page number, and the confidence breakdown.

Docs: API Reference

Structured output table with download options for JSON and CSV
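To make the output shape concrete, here is a small sketch of per-field records carrying a source excerpt, page number, and confidence, serialized with Python's standard `json` and `csv` modules. The field names and sample values are hypothetical and do not represent TrancheLab's actual schema; see the API Reference for the real format.

```python
import csv
import io
import json

# Hypothetical column layout for this sketch.
FIELDS = ["tranche", "field", "value", "confidence", "page", "excerpt"]

rows = [
    {
        "tranche": "First Lien Term Loan",
        "field": "outstanding_amount",
        "value": "1000.0",
        "confidence": 0.96,
        "page": 42,
        "excerpt": "approximately $1,000.0 million outstanding under the First Lien Term Loan",
    },
    {
        # No backing excerpt, so the confidence is 0.
        "tranche": "Senior Unsecured Notes",
        "field": "maturity_date",
        "value": "2026-10-15",
        "confidence": 0.0,
        "page": None,
        "excerpt": None,
    },
]


def to_json(records):
    """Serialize records as pretty-printed JSON; None becomes null."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize records as CSV text; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(rows)
csv_text = to_csv(rows)
print(csv_text)
```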

Extraction pipeline architecture

Your filing goes through a deterministic pre-filter before any LLM call. Extraction runs on the filtered pages only.

PDF Upload

Parse Chain

PyMuPDF > pdfminer > Tesseract

Section Classifier

deterministic

LLM Extraction

Groq / llama-3.3-70b

Only ~40 of 297 pages reach this stage

Confidence + Dedup

scored 0.0 to 1.0

Levenshtein fuzzy matching on tranche names

Structured Output

Table / JSON / CSV

Supported filing types

Disclosure statements and plans of reorganization are live today. More filing types are added based on demand.

Disclosure Statements

● LIVE

Plans of Reorganization

● LIVE

DIP Orders

● COMING SOON

RSA Exhibits

● BY REQUEST

First Day Declarations

● COMING SOON

Amended Plans

● COMING SOON

Bar Date Motions

● BY REQUEST

Cash Collateral Orders

● BY REQUEST

Liquidating Plans

● COMING SOON

TrancheLab vs. doing it yourself

Compare TrancheLab to manual analyst work and existing terminal subscriptions.

Capability	TrancheLab	Manual (Analyst)	Terminal Subscription
Time to structured output	< 10 minutes	4 to 6 hours	Varies (if available)
Confidence scoring	0.0 to 1.0 per field	Analyst judgment	Not offered
OCR for scanned filings	Automatic fallback	Manual retype	Depends on vendor
Deduplication across amendments	Automatic (Levenshtein)	Manual cross-reference	Not offered
Source text excerpts	Linked per value	Analyst notes	Sometimes
Cost	API call	Analyst hourly rate	$30K to $50K/year
Coverage	Any Chapter 11 filing	Any Chapter 11 filing	Curated universe only

FAQ

How does TrancheLab handle scanned or low-quality PDFs?

TrancheLab runs a three-stage parsing chain. It tries PyMuPDF first for native text extraction, falls back to pdfminer for layout-sensitive documents, and uses Tesseract OCR as a final fallback for scanned pages. You do not need to preprocess your files.

What does a confidence score of 0 mean?

A confidence score of 0 means TrancheLab could not find a raw text excerpt in the filing to back the extracted value. This can happen when a value is inferred from context rather than stated explicitly. Rather than guess, TrancheLab flags it.

How does deduplication work?

The deduplication engine uses Levenshtein fuzzy matching to group tranches that appear under slightly different names across sections or plan amendments. You get one clean entry per tranche, not three near-duplicates.

Can I compare two versions of a filing?

Yes. The diff engine accepts two filings and highlights changes in tranche definitions, recovery estimates, and creditor class treatments between versions.

Which filing types are supported?

Currently, disclosure statements and plans of reorganization are fully supported. DIP orders, amended plans, liquidating plans, and first day declarations are in development. RSA exhibits, cash collateral orders, and bar date motions are available by request.

Can I monitor a case for new filings?

Yes. Subscribe to a case via the API after extracting it. TrancheLab polls CourtListener every 6 hours for new docket activity on subscribed cases. When a new filing is detected, it is automatically extracted and diffed against the previous version. If changes are found, you receive an email with the changed tranches, field-level diffs, and a link to view the comparison.
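The field-level diffs mentioned in the answers above can be illustrated with a short sketch. It assumes each extraction has already been reduced to a mapping of tranche name to field values, with names reconciled by deduplication. The function name, data layout, and sample values are hypothetical.

```python
def diff_tranches(old, new):
    """Compare two extractions shaped as {tranche_name: {field: value}}.

    Returns a list of (tranche, change, before, after) tuples, where `change`
    is "added", "removed", or the name of the field whose value differs.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes.append((name, "added", None, new[name]))
        elif name not in new:
            changes.append((name, "removed", old[name], None))
        else:
            for field in sorted(set(old[name]) | set(new[name])):
                before, after = old[name].get(field), new[name].get(field)
                if before != after:
                    changes.append((name, field, before, after))
    return changes


original_plan = {
    "First Lien Term Loan": {"amount": 1000.0, "recovery": "100%"},
    "Senior Unsecured Notes": {"recovery": "35%"},
}
amended_plan = {
    "First Lien Term Loan": {"amount": 1000.0, "recovery": "100%"},
    "Senior Unsecured Notes": {"recovery": "22%"},
    "DIP Facility": {"amount": 250.0},
}
changes = diff_tranches(original_plan, amended_plan)
for change in changes:
    print(change)
```

An empty result means the two versions agree on every tranche and field, which is the case where no notification would need to be sent.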

Start Extracting Capital Structure Data

See how TrancheLab turns hundreds of pages into a structured tranche table with confidence scores, in minutes.

Book a Demo