// QUESTION CLASS

Produce the list, the roster, the dataset.

Datamining is how The Watch turns unstructured source material into structured data. When the question is who belongs to a category, what events happened in a period, or what dataset to build with which fields, Datamining is the engine that produces the answer. This is not a report in the narrative sense: the output is a table, a roster, a timeline, or a graph file, and it is consumed programmatically as often as it is read.

Datamining also feeds the other engines. Baseline reports rely on it to extract structural facts, Forecast runs rely on it for indicator data, and Decision targeting relies on it for node inventories. When another engine's output lists specific entities, dates, or numbers, Datamining produced that inventory.

// ANATOMY

A Datamining output is a dataset with a cover document. The dataset itself is the deliverable (a table, a graph, a timeline) in whatever format the customer consumes. The cover document explains how the dataset was built: what the schema was, what queries ran, what sources were scanned, what the quality gates rejected, and how confident the analyst is in each field of the schema. A reader who wants to use the dataset for downstream analysis needs to know what they can trust in it; the cover document is that instrument panel.

A Datamining output contains:

§ 01 · Schema · The field definitions, declared before the run
§ 02 · Query Pack · The specific queries that drove extraction
§ 03 · Source Corpus · What was scanned, classified by source type
§ 04 · Coverage Assessment · Percentage of the known universe captured; known gaps
§ 05 · Structured Output · The actual data, rendered as a table, graph, or timeline
§ 06 · Per-Field Reliability · Confidence rating for each column of the schema
§ 07 · Quality Gates · What was rejected and why
§ 08 · Refresh Plan · How the dataset stays current
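One way to picture the eight sections above is as a single object that travels with the data. The sketch below is illustrative only. It assumes a dataset is a list of row dicts, and every class, method, and value in it (`CoverDocument`, `DataminingOutput`, `coverage_pct`, `unrated_fields`, the sample rows) is hypothetical rather than The Watch's actual format. It also shows how the § 04 coverage figure and a § 06 completeness check could be derived from the other sections.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rejection:
    """One quality-gate rejection (§ 07). Hypothetical structure."""
    record_id: str
    gate: str
    reason: str


@dataclass
class CoverDocument:
    """Hypothetical cover document mirroring §§ 01-04 and 06-08."""
    schema: dict[str, str]                 # § 01 field name -> definition
    query_pack: list[str]                  # § 02 queries that drove extraction
    source_corpus: dict[str, int]          # § 03 source type -> items scanned
    known_universe: int                    # § 04 size of the known universe
    known_gaps: list[str]                  # § 04 gaps named explicitly
    field_reliability: dict[str, str]      # § 06 field -> HIGH / MODERATE / LOW
    rejections: list[Rejection] = field(default_factory=list)  # § 07
    refresh_plan: str = ""                 # § 08


@dataclass
class DataminingOutput:
    cover: CoverDocument
    rows: list[dict]                       # § 05 the structured output itself

    def coverage_pct(self) -> float:
        """Share of the known universe captured, as a percentage."""
        if self.cover.known_universe <= 0:
            return 0.0
        return 100.0 * len(self.rows) / self.cover.known_universe

    def unrated_fields(self) -> list[str]:
        """Schema fields that have no reliability rating yet."""
        return [f for f in self.cover.schema if f not in self.cover.field_reliability]


# Illustrative values only.
out = DataminingOutput(
    cover=CoverDocument(
        schema={"name": "Full name", "role": "Current role"},
        query_pack=["q1"],
        source_corpus={"registry": 12, "press": 40},
        known_universe=4,
        known_gaps=["Entities with no public registry entry"],
        field_reliability={"name": "HIGH"},
        rejections=[Rejection("r7", "dedup", "duplicate of r3")],
        refresh_plan="Re-run the query pack monthly",
    ),
    rows=[{"name": "A"}, {"name": "B"}, {"name": "C"}],
)
print(out.coverage_pct())    # 75.0
print(out.unrated_fields())  # ['role']
```

Here, an empty `unrated_fields()` result would be one simple precondition for releasing the cover document, because § 06 is meant to rate every column.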

// TRADECRAFT

Structured data inherits the problems of its sources. The cover document is how we show our work.

Schema first.

The fields are declared before the extraction runs. A Datamining product is only as rigorous as its schema, and an ill-defined field produces unusable data. The Intake step pressure-tests the schema against the source universe before the run: do the sources actually support these field definitions, at this coverage target, within this time bound? If not, the schema is revised before anything runs.
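The coverage part of that Intake pressure-test can be sketched as a fill-rate check on a small pilot extraction. This is a minimal sketch under stated assumptions: the pilot returns one dict per entity, a missing key or `None` counts as unfilled, and `FieldSpec`, `pressure_test`, and the 0.8 target are hypothetical names and values. The time-bound question is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """A declared schema field (hypothetical)."""
    name: str
    required: bool = True


def pressure_test(schema, pilot_records, coverage_target):
    """Return (field, fill_rate) for each required field below the target.

    pilot_records: dicts extracted from a sample of the source universe
    before the full run. A missing key or a None value counts as unfilled.
    """
    n = len(pilot_records)
    failing = []
    for spec in schema:
        if not spec.required:
            continue  # optional fields do not block the run
        filled = sum(1 for r in pilot_records if r.get(spec.name) is not None)
        rate = filled / n if n else 0.0
        if rate < coverage_target:
            failing.append((spec.name, rate))
    return failing


schema = [FieldSpec("name"), FieldSpec("birth_year"), FieldSpec("alias", required=False)]
pilot = [
    {"name": "A", "birth_year": 1970},
    {"name": "B"},
    {"name": "C", "birth_year": None},
    {"name": "D", "alias": "x"},
]
print(pressure_test(schema, pilot, coverage_target=0.8))  # [('birth_year', 0.25)]
```

A non-empty result means revising the schema first, for example by making `birth_year` optional or redefining it as a range, before the full run starts.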

Structured Extraction · Schema-first methodology

Per-cell citation.

Every cell of the output table carries the source it came from. Not per row, not per entity: per cell. This is how a downstream analyst knows they can use the data, because they can verify any individual datum against the specific source that produced it. Structured data without cell-level sourcing is not auditable data, and we don't produce it.
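A minimal sketch of what per-cell sourcing means as a data structure. It pairs each value with its own citation and includes an audit that lists any cell lacking one. `Cell`, `unsourced_cells`, `source_of`, and the document IDs are hypothetical, and they assume a citation can be reduced to a single identifier string.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Cell:
    """One datum plus the citation for that datum alone (hypothetical)."""
    value: Any
    source: str  # e.g. a document ID; empty means unsourced


def unsourced_cells(rows):
    """Return (row_index, field) for every cell without a citation."""
    missing = []
    for i, row in enumerate(rows):
        for field_name, cell in row.items():
            if not isinstance(cell, Cell) or not cell.source:
                missing.append((i, field_name))
    return missing


def source_of(rows, i, field_name):
    """Look up the citation behind one specific datum."""
    return rows[i][field_name].source


rows = [
    {"name": Cell("A. Example", "doc-001"), "role": Cell("Director", "doc-014")},
    {"name": Cell("B. Example", "doc-002"), "role": Cell("Treasurer", "")},
]
print(unsourced_cells(rows))          # [(1, 'role')]
print(source_of(rows, 0, "role"))     # doc-014
```

A row-level citation would have hidden the gap in row 1, because its `name` is sourced but its `role` is not.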

ICD 206 · Cell-level sourcing

Known gaps are part of the output.

A Datamining product names the gaps in its own coverage. Rows that could not be filled confidently are flagged, not omitted. A "complete-looking" dataset that quietly dropped the 3% of entities with thin sourcing is a less honest dataset than one that includes those rows with LOW-confidence flags. The customer can filter; they cannot unfilter what was never shown.
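A minimal sketch of "flag, don't omit". Every row receives a confidence label and none is dropped, so the customer decides what to filter out. Using the count of independent sources as the measure of thin sourcing is an assumption made only for illustration. The thresholds and the names `Row` and `flag_rows` are likewise hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    entity: str
    source_count: int            # assumed proxy for sourcing depth
    confidence: str = "UNRATED"


def flag_rows(rows, moderate_at=2, high_at=4):
    """Label every row HIGH / MODERATE / LOW; never drop one."""
    flagged = []
    for r in rows:
        if r.source_count >= high_at:
            level = "HIGH"
        elif r.source_count >= moderate_at:
            level = "MODERATE"
        else:
            level = "LOW"
        flagged.append(replace(r, confidence=level))
    return flagged


rows = [Row("alpha", 5), Row("bravo", 3), Row("charlie", 1)]
flagged = flag_rows(rows)
assert len(flagged) == len(rows)  # nothing silently omitted

# The filtering decision belongs to the customer, downstream.
customer_view = [r for r in flagged if r.confidence != "LOW"]
print([r.confidence for r in flagged])   # ['HIGH', 'MODERATE', 'LOW']
print([r.entity for r in customer_view]) # ['alpha', 'bravo']
```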

ICD 203 · Standard 2: Properly expresses uncertainties

Feeds all other engines. Often invoked as the first step before a Baseline or Forecast run.

// OTHER ENGINES

Situational · Monitors · Rolling updates on topics, actors, and regions. What's new, what changed, and what matters now.

Baseline · Profiles · Reference profiles and capability assessments for nation-states, organizations, networks, and systems.

Forecast · Forecasts · Estimative, scenario, warning, and travel-risk analysis. Tests futures, likelihoods, drivers, and indicators before the window closes.