SEC Insider Clusters

SEC Insiders is an automated pipeline for detecting clusters of insider buying and selling of securities. The system monitors Form 4 ownership disclosures and converts raw XML into structured transaction-level data, so that downstream analysis can separate discretionary activity from routine or pre-planned trades.

  • Status: Active development
  • Stack: TypeScript • Node.js • SQLite • SEC EDGAR APIs

System Outline

The system is organized as a three-stage pipeline with SQLite acting as both a durable queue and storage layer. That gives the project concurrent job processing, reliable retry behavior, and an auditable trail back to the original filing.

Ingestion

  • Regularly poll for SEC Form 4 submissions.
  • Enqueue jobs containing CIK, filing URL, and accession number.

Parsing

  • Normalize inconsistent XML structures into a stable schema.
  • Explode each filing into transaction-level rows with ownership metadata.

Analysis

  • Detect clusters of coordinated insider buying or selling.
  • Augment detections with market context and summary output.

Architecture

Ingestion Layer

1. Get data links:

  • Polls the SEC submissions feed via CIK and places the CIK, URL, and accession number into a job queue.
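As a rough illustration of this step, the sketch below turns an already-fetched submissions payload into queue jobs. It assumes the payload exposes filings.recent as parallel arrays (accessionNumber, form, primaryDocument) and that filing documents live under the EDGAR Archives path shown. The Form4Job shape and the buildForm4Jobs name are hypothetical, not taken from the project.

```typescript
// Fields this sketch reads from the "filings.recent" part of a submissions
// payload. The arrays are parallel: index i describes one filing.
interface RecentFilings {
  accessionNumber: string[];
  form: string[];
  primaryDocument: string[];
}

// Hypothetical job shape; the project's real queue columns may differ.
interface Form4Job {
  cik: string; // zero-padded to 10 digits
  accession: string; // e.g. "0001127602-25-020266"
  url: string;
}

function buildForm4Jobs(cik: string, recent: RecentFilings): Form4Job[] {
  const paddedCik = ("0000000000" + cik).slice(-10);
  const cikPath = String(Number(cik)); // archive paths drop leading zeros
  const jobs: Form4Job[] = [];

  recent.form.forEach((form, i) => {
    // Assumption: amendments (4/A) are ingested alongside original filings.
    if (form !== "4" && form !== "4/A") return;
    const accession = recent.accessionNumber[i];
    const doc = recent.primaryDocument[i];
    if (!accession || !doc) return;

    // The archive folder name is the accession number without dashes.
    const folder = accession.replace(/-/g, "");
    // Note: the primary document can point at a styled rendering rather than
    // the raw XML; resolving that is out of scope for this sketch.
    jobs.push({
      cik: paddedCik,
      accession,
      url: `https://www.sec.gov/Archives/edgar/data/${cikPath}/${folder}/${doc}`,
    });
  });

  return jobs;
}
```

Non-Form-4 entries in the feed are skipped, so only ownership filings reach the queue.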

2. Ingest the XML data:

  • Form 4 XML documents are batch downloaded from the URLs stored in job_queue.
  • Note: job_queue is a SQLite table that feeds the worker pool. Workers process jobs concurrently while throttling requests to avoid rate-limit violations.
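The claim-and-retry behavior of such a queue can be sketched with an in-memory stand-in for the SQLite table. The column names, the status values, and the maxAttempts limit are assumptions made for illustration. Request throttling is left out to keep the example short.

```typescript
type JobStatus = "pending" | "running" | "done" | "failed";

interface Job {
  id: number;
  cik: string;
  url: string;
  accession: string;
  status: JobStatus;
  attempts: number;
}

// In-memory stand-in for the SQLite-backed queue table (columns assumed).
class JobQueue {
  private jobs: Job[] = [];
  private nextId = 1;
  private maxAttempts: number;

  constructor(maxAttempts: number = 3) {
    this.maxAttempts = maxAttempts;
  }

  // Returns false for duplicates so re-polling does not re-queue a filing.
  enqueue(cik: string, url: string, accession: string): boolean {
    if (this.jobs.some((j) => j.accession === accession)) return false;
    this.jobs.push({ id: this.nextId++, cik, url, accession, status: "pending", attempts: 0 });
    return true;
  }

  // A worker claims the oldest pending job. Marking it "running" keeps
  // other workers from picking up the same row.
  claim(): Job | undefined {
    const job = this.jobs.find((j) => j.status === "pending");
    if (!job) return undefined;
    job.status = "running";
    job.attempts += 1;
    return job;
  }

  complete(id: number): void {
    const job = this.jobs.find((j) => j.id === id);
    if (job) job.status = "done";
  }

  // Failed jobs return to "pending" until the attempt limit is reached.
  fail(id: number): void {
    const job = this.jobs.find((j) => j.id === id);
    if (!job) return;
    job.status = job.attempts >= this.maxAttempts ? "failed" : "pending";
  }
}
```

Keeping the status and attempt count on the row itself is what makes retries and auditing straightforward: a job's state can be read from the row at any time.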

Parsing Layer

1. Normalize XML filings:

  • XML filings are constructed in disparate ways, under differing schema guidelines. Therefore, the first step is flattening and normalization to allow consistent downstream parsing.
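A minimal sketch of the kind of helpers this normalization might rely on, assuming the XML has already been converted to plain objects by some XML-to-object parser. Such parsers commonly return a bare object for a single child element and an array for repeated ones, and Form 4 fields are often wrapped in a nested value element. The helper names are hypothetical.

```typescript
// Normalize "one child or many" into an array so callers never branch.
function toArray<T>(node: T | T[] | undefined | null): T[] {
  if (node === undefined || node === null) return [];
  return (Array.isArray(node) ? node : [node]) as T[];
}

// Read a field that may be a bare scalar or wrapped as { value: ... }.
function unwrap(node: unknown): string | undefined {
  if (node === undefined || node === null) return undefined;
  if (typeof node === "object") {
    const inner = (node as { value?: unknown }).value;
    if (inner === undefined || inner === null) return undefined;
    return String(inner).trim();
  }
  return String(node).trim();
}

// Parse a numeric field, returning undefined for missing or malformed input.
function toNumber(node: unknown): number | undefined {
  const text = unwrap(node);
  if (text === undefined || text === "") return undefined;
  const parsed = Number(text);
  return Number.isFinite(parsed) ? parsed : undefined;
}
```

With these in place, a filing containing one transaction and a filing containing ten can go through the same code path.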

2. Exploding filings into transaction rows:

  • A single form may contain several events that may or may not be linked. When transactions are linked (e.g. an option exercise and the resulting share acquisition), they share an identifier called exercise_group_id, which ties them together even though they live on separate rows.
  • Each row has metadata such as: issuer CIK, owner name, officer/director/10% owner status, filing date, amendment indicator, 10b5-1 plan flag.
  • This row-level structure makes downstream filtering and analysis much simpler.
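The explode step can be sketched as copying filing-level metadata onto every transaction. The interfaces and field names below are illustrative assumptions, not the project's actual schema.

```typescript
// Filing-level metadata that is repeated on every row (names assumed).
interface FilingHeader {
  accession: string;
  issuerCik: string;
  ownerName: string;
  isOfficer: boolean;
  isDirector: boolean;
  isTenPercentOwner: boolean;
  filingDate: string; // ISO date, e.g. "2025-01-02"
  isAmendment: boolean;
  is10b51: boolean;
}

interface NormalizedTransaction {
  securityType: "derivative" | "non-derivative";
  securityTitle: string;
  acquiredDisposed: "A" | "D";
  shares: number;
  price?: number;
}

interface NormalizedFiling {
  header: FilingHeader;
  transactions: NormalizedTransaction[];
}

type TransactionRow = FilingHeader & NormalizedTransaction & { lineNumber: number };

// One output row per transaction, each carrying the full filing context.
function explodeFiling(filing: NormalizedFiling): TransactionRow[] {
  return filing.transactions.map((trx, index) => ({
    ...filing.header,
    ...trx,
    lineNumber: index + 1, // position within the filing, for traceability
  }));
}
```

Because every row carries the accession number, any downstream result can be traced back to the filing it came from.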

Analysis Layer - Identify Clusters of Insider Behavior

A cluster is generally defined as:

  • multiple insiders
  • at the same company
  • executing the same transaction type
  • within a rolling time window
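The four criteria above can be sketched as a grouped sliding-window scan. The 14-day window and the minimum of three distinct insiders are placeholder defaults chosen for illustration, and the greedy non-overlapping windowing is one possible approach rather than the project's actual algorithm.

```typescript
interface Trade {
  issuerCik: string;
  ownerName: string;
  side: "buy" | "sell";
  date: string; // ISO date, e.g. "2025-01-02"
}

interface Cluster {
  issuerCik: string;
  side: "buy" | "sell";
  start: string;
  end: string;
  insiders: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

function findClusters(trades: Trade[], windowDays: number = 14, minInsiders: number = 3): Cluster[] {
  // Group by company and transaction type.
  const groups = new Map<string, Trade[]>();
  trades.forEach((t) => {
    const key = `${t.issuerCik}|${t.side}`;
    const list = groups.get(key) || [];
    list.push(t);
    groups.set(key, list);
  });

  const clusters: Cluster[] = [];
  groups.forEach((list) => {
    list.sort((a, b) => a.date.localeCompare(b.date));
    let i = 0;
    while (i < list.length) {
      // Extend the window while trades fall within windowDays of the first.
      const startMs = Date.parse(list[i].date);
      let j = i;
      while (j + 1 < list.length && Date.parse(list[j + 1].date) - startMs <= windowDays * DAY_MS) {
        j++;
      }
      const windowTrades = list.slice(i, j + 1);
      const insiders = Array.from(new Set(windowTrades.map((t) => t.ownerName)));
      if (insiders.length >= minInsiders) {
        clusters.push({
          issuerCik: list[i].issuerCik,
          side: list[i].side,
          start: list[i].date,
          end: list[j].date,
          insiders,
        });
        i = j + 1; // continue after the cluster so windows do not overlap
      } else {
        i++;
      }
    }
  });
  return clusters;
}
```

Counting distinct owner names rather than raw trades prevents one insider with several fills from registering as a cluster.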

Signal quality is further improved by filtering clusters on the weighted average transaction price relative to the 20-day and 200-day moving averages. This gives a clearer read on how committed the insiders are.
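For reference, both quantities are simple to compute. The sketch below assumes the weighted average is share-weighted and that the moving average is a simple average over the most recent daily closes. The function names are hypothetical.

```typescript
interface Fill {
  shares: number;
  price: number;
}

// Share-weighted average price: sum(shares * price) / sum(shares).
function weightedAveragePrice(fills: Fill[]): number | undefined {
  let totalShares = 0;
  let totalValue = 0;
  for (const fill of fills) {
    totalShares += fill.shares;
    totalValue += fill.shares * fill.price;
  }
  return totalShares > 0 ? totalValue / totalShares : undefined;
}

// Simple moving average over the last `period` closes (oldest first).
function simpleMovingAverage(closes: number[], period: number): number | undefined {
  if (period <= 0 || closes.length < period) return undefined;
  const recent = closes.slice(-period);
  return recent.reduce((sum, close) => sum + close, 0) / period;
}
```

For example, 100 shares at 10 and 300 shares at 20 give a weighted average of 17.5, which is pulled toward the larger trade.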

The final output is broadcast as an image-card, which is enriched with:

  • time window in which the transactions occurred
  • total number of insiders and their titles
  • total shares traded
  • total transaction value
  • weighted average price
  • position relative to the 20-day and 200-day moving averages

Image-card for a Purchase Cluster

Example insider purchase activity image card

Image-card for a Sales Cluster

Example insider sales activity image card

Key Technical Decisions

Treat Filings as Collections of Events

Form 4 filings often combine different transactions (options exercise, share acquisition, swaps, etc.) within a single filing.

Instead of treating each filing as a single event, the system splits each filing into transaction-level rows. This allows downstream analysis to evaluate each transaction on its own.

Sample of a Form 4 filing

Above, we have a single filing. The parsing engine flattens this document into an easy-to-read table (below). The example shows how a derivative is tied to its non-derivative acquisition via exercise_group_id. We can also see that the insider immediately sold 93% of the shares acquired by exercising the option.

accession             cik         is_officer  security_type   security_title                             acquired_disposed  transaction_shares  conversion_exercise_price  is_option_exercise  is_from_exercise  is_exercise_related_sale  underlying_title  underlying_shares  sec_owned_post_trx
--------------------  ----------  ----------  --------------  -----------------------------------------  -----------------  ------------------  -------------------------  ------------------  ----------------  ------------------------  ----------------  -----------------  ------------------
0001127602-25-020266  0000066740  1           derivative      Non-qualified Stock Option (Right to Buy)  D                  6650                130.14                     1                   0                 0                         Common Stock      6650.0             0.0
0001127602-25-020266  0000066740  1           non-derivative  Common Stock                               A                  6650                130.14                     0                   1                 0                                           9065.149
0001127602-25-020266  0000066740  1           non-derivative  Common Stock                               D                  6165                150.1801                   0                   0                 1                                           2900.149

In this example the exercised derivative is tied to the non-derivative acquisition, and the follow-on sale remains visible as a separate event rather than being hidden in a single filing-level summary.
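The linkage and the 93% figure can be reproduced with a small sketch over the three rows in the table. The grouping rule used here (every exercise-flagged row in a filing shares one exercise_group_id) is a simplifying assumption. A filing with several separate exercises would need a finer key.

```typescript
interface ExerciseRow {
  accession: string;
  securityType: "derivative" | "non-derivative";
  acquiredDisposed: "A" | "D";
  shares: number;
  isOptionExercise: boolean;
  isFromExercise: boolean;
  isExerciseRelatedSale: boolean;
  exerciseGroupId?: string;
}

// Assumption: one exercise group per filing, keyed by accession number.
function assignExerciseGroups(rows: ExerciseRow[]): ExerciseRow[] {
  return rows.map((row) =>
    row.isOptionExercise || row.isFromExercise || row.isExerciseRelatedSale
      ? { ...row, exerciseGroupId: `${row.accession}:1` }
      : row
  );
}

// Share of exercised stock that was sold within the same group.
function fractionSold(rows: ExerciseRow[], groupId: string): number | undefined {
  let acquired = 0;
  let sold = 0;
  for (const row of rows) {
    if (row.exerciseGroupId !== groupId) continue;
    if (row.isFromExercise) acquired += row.shares;
    if (row.isExerciseRelatedSale) sold += row.shares;
  }
  return acquired > 0 ? sold / acquired : undefined;
}

const accession = "0001127602-25-020266";
const linkedRows = assignExerciseGroups([
  { accession, securityType: "derivative", acquiredDisposed: "D", shares: 6650, isOptionExercise: true, isFromExercise: false, isExerciseRelatedSale: false },
  { accession, securityType: "non-derivative", acquiredDisposed: "A", shares: 6650, isOptionExercise: false, isFromExercise: true, isExerciseRelatedSale: false },
  { accession, securityType: "non-derivative", acquiredDisposed: "D", shares: 6165, isOptionExercise: false, isFromExercise: false, isExerciseRelatedSale: true },
]);
const soldFraction = fractionSold(linkedRows, `${accession}:1`); // 6165 / 6650
```

Here 6,165 of the 6,650 exercised shares were sold, which rounds to 93%.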

Preserve Amendment History

When amendments are encountered, they are stored as explicit, separate entries rather than automatically overwriting the records they supersede. This preserves an auditable trail.
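One way to picture this is an append-only store with a derived "latest" view. The sketch below is an assumption-heavy illustration: the logical key (issuer, owner, period of report) and the rule that the most recently filed record wins are both hypothetical, and real amendments may restate only part of an original filing.

```typescript
interface FilingRecord {
  accession: string;
  issuerCik: string;
  ownerName: string;
  periodOfReport: string; // ISO date
  filedAt: string; // ISO date; compares correctly as a string
  isAmendment: boolean;
}

// Append-only: records are added, never updated or deleted.
function appendFiling(history: FilingRecord[], record: FilingRecord): FilingRecord[] {
  return history.concat([record]);
}

// Derived view: the most recently filed record per logical key.
// The full history stays untouched for auditing.
function latestView(history: FilingRecord[]): FilingRecord[] {
  const latest = new Map<string, FilingRecord>();
  history.forEach((record) => {
    const key = `${record.issuerCik}|${record.ownerName}|${record.periodOfReport}`;
    const existing = latest.get(key);
    if (!existing || record.filedAt > existing.filedAt) {
      latest.set(key, record);
    }
  });
  return Array.from(latest.values());
}
```

The original and the amendment both remain queryable, while analysis can choose to read only the derived view.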

Filter Toward Discretionary Activity

Not all insider activity is informative. In the Analysis Layer the system filters out noisy activity that obscures discretionary activity, including:

  • 10b5-1 planned trades
  • equity swaps
  • option exercises
  • exercise-related sales

To sharpen this filtering further, clusters are also screened by price context. For example, purchase clusters are retained only when their weighted average buy price is below both the 20-day and 200-day simple moving averages.
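Both filters reduce to simple predicates. The sketch below assumes each row already carries boolean flags for the four excluded categories and that the price figures have been computed elsewhere. The field and function names are hypothetical.

```typescript
interface FlaggedRow {
  is10b51: boolean;
  isEquitySwap: boolean;
  isOptionExercise: boolean;
  isExerciseRelatedSale: boolean;
}

// A row counts as discretionary only if none of the exclusion flags is set.
function isDiscretionary(row: FlaggedRow): boolean {
  return !(row.is10b51 || row.isEquitySwap || row.isOptionExercise || row.isExerciseRelatedSale);
}

interface PriceContext {
  weightedAvgPrice: number;
  sma20: number;
  sma200: number;
}

// Purchase clusters are kept only when insiders bought below both averages.
function keepPurchaseCluster(ctx: PriceContext): boolean {
  return ctx.weightedAvgPrice < ctx.sma20 && ctx.weightedAvgPrice < ctx.sma200;
}
```

Row-level filtering would run before clustering, so that a cluster is formed only from discretionary trades, and the price check would then apply to each surviving cluster.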

Cluster Tracking

Cluster tracking is a long-running effort to measure the predictive quality of insider purchase and sale clusters. The cluster_tracking table records the initial price, the current price, and the low and high prices since tracking began. At the moment, no output is derived from this data.
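Maintaining such a record amounts to a small update on each new price observation. The sketch below assumes one tracking record per cluster. The field names mirror the description above but are not taken from the actual table definition.

```typescript
interface ClusterTracking {
  clusterId: string;
  initialPrice: number;
  currentPrice: number;
  lowPrice: number; // lowest price seen since tracking began
  highPrice: number; // highest price seen since tracking began
}

// Start tracking: every price field begins at the first observed price.
function startTracking(clusterId: string, price: number): ClusterTracking {
  return { clusterId, initialPrice: price, currentPrice: price, lowPrice: price, highPrice: price };
}

// Apply a new observation without mutating the previous record.
function recordPrice(tracking: ClusterTracking, price: number): ClusterTracking {
  return {
    ...tracking,
    currentPrice: price,
    lowPrice: Math.min(tracking.lowPrice, price),
    highPrice: Math.max(tracking.highPrice, price),
  };
}
```

The initial price is never overwritten, so the record keeps the baseline needed for any later performance measurement.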