PYRAMYD

The Data Foundation

Last refresh · 9 days ago

The live product graph for enterprise software.

The PYRAMYD Product Graph is the most comprehensive, structured, and continuously refreshed knowledge base of enterprise software ever assembled.

252K+ Products · 2.6K Categories · 2.4M+ Reviews · 88 Node Types · 10 Field Groups · 163 Industries · 249 Countries · 4,329 Personas

The enrichment pipeline

Every node, 10 field groups, one canonical shape.

Each taxonomy node is enriched through the same 10 field groups in canonical tab order. Every cell carries its model, prompt hash, and citation set.

Product Category · CRM · Salesforce · 4.5 / 10

  • 01 Overview
  • 02 Demand
  • 03 Market
  • 04 Landscape
  • 05 Trends
  • 06 Operations
  • 07 Compliance
  • 08 Economics
  • 09 Capabilities
  • 10 Pulse

Every taxonomy node (category, industry, country, persona, product) carries the same 10 field groups in this canonical order. Same shape across taxonomies = one mental model for analysts, one API for connectors, one audit row per cell.
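
One way to picture the shared shape is a small data model. The sketch below is illustrative only: the class and attribute names (Cell, Node, prompt_hash, missing_groups) are assumptions for this example, not the PYRAMYD schema. Only the ten group names and their order come from the list above.

```python
from dataclasses import dataclass, field

# The 10 field groups in the canonical tab order listed above.
FIELD_GROUPS = (
    "overview", "demand", "market", "landscape", "trends",
    "operations", "compliance", "economics", "capabilities", "pulse",
)


@dataclass
class Cell:
    """One enriched field. Attribute names are illustrative assumptions."""
    value: str
    model: str
    prompt_hash: str
    citations: list[str] = field(default_factory=list)


@dataclass
class Node:
    """A taxonomy node: category, industry, country, persona, or product."""
    taxonomy: str
    name: str
    groups: dict[str, dict[str, Cell]] = field(default_factory=dict)

    def missing_groups(self) -> list[str]:
        """Field groups not yet filled, in canonical order."""
        return [g for g in FIELD_GROUPS if g not in self.groups]


# Hypothetical node with only the first group filled in.
crm = Node(taxonomy="category", name="CRM")
crm.groups["overview"] = {
    "summary": Cell("placeholder text", "example-model", "hash-0001",
                    ["https://example.com/source"]),
}
print(crm.missing_groups())
```

Because every taxonomy uses the same tuple of groups, the same missing_groups check would apply to a country or persona node without changes.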

Enrichment on demand

Enrich any taxonomy, row, column, group, or cell.

The same runner powers every scope. You pick what to refresh; PYRAMYD pays for most of it because every enrichment compounds back into the shared graph.

Taxonomy

1 of 5

All rows · all 10 groups · all cells. Held under an advisory lock so the run is atomic.

Subsidized

Row

2 of 5

One node (e.g. one category, one country) refreshed across every enrichable field group.

Subsidized

Column

3 of 5

One field across many rows · ship a new dimension to your whole taxonomy at once.

Subsidized

Group

4 of 5

One field group (e.g. demand, landscape, compliance) refreshed across the rows you pick.

Subsidized

Cell

5 of 5

One field on one row · the cheapest unit. Costs cents and runs in seconds.

Subsidized
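
The five scopes can be read as five ways of selecting (row, field) cells from the same grid. The following is a minimal sketch under that assumption; the toy rows, field names, and the expand_scope function are hypothetical and do not describe the actual runner.

```python
from itertools import product

# Hypothetical toy taxonomy: two rows, three fields in two field groups.
ROWS = ["CRM", "Help Desk"]
FIELDS_BY_GROUP = {
    "demand": ["search_volume", "buyer_intent"],
    "compliance": ["certifications"],
}
ALL_FIELDS = [f for fields in FIELDS_BY_GROUP.values() for f in fields]


def expand_scope(scope, rows=None, group=None, field=None):
    """Return the (row, field) cells that a run of the given scope touches."""
    if scope == "taxonomy":          # all rows, all groups, all cells
        return list(product(ROWS, ALL_FIELDS))
    if scope == "row":               # chosen node(s), every field
        return list(product(rows, ALL_FIELDS))
    if scope == "column":            # one field across many rows
        return list(product(rows or ROWS, [field]))
    if scope == "group":             # one field group across chosen rows
        return list(product(rows, FIELDS_BY_GROUP[group]))
    if scope == "cell":              # one field on one row
        return [(rows[0], field)]
    raise ValueError(f"unknown scope: {scope}")


print(len(expand_scope("taxonomy")))
print(expand_scope("cell", rows=["CRM"], field="certifications"))
```

Treating every scope as a cell selection is what would let one runner serve all five: the scope only changes how many cells are queued.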

Why we subsidize

Every enrichment you run
makes the graph richer for everyone.

When you enrich a row, PYRAMYD keeps a copy in the shared product graph. That means your $X of LLM + retrieval spend funds the next thousand customers asking the same question · so we charge you the marginal compute, not the sticker price.

  • Pay only the marginal LLM + embedding cost, not the full retrieval pipeline.
  • Every cell shipped is permanently citable · model, prompt hash, source URL, retrieval timestamp.
  • Re-running a stale row falls back to the cache if the verification gate still passes.
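
The cache fallback in the last point can be sketched as a simple guard. This is an assumption-heavy illustration: the real verification gate is not described on this page, so gate_passes below only checks that the four provenance items are present, and refresh_cell and fake_enrich are hypothetical names.

```python
from typing import Callable, Optional

# Provenance items named in the list above; the snake_case keys are assumed.
PROVENANCE_KEYS = ("model", "prompt_hash", "source_url", "retrieved_at")


def gate_passes(cell: dict) -> bool:
    """Stand-in verification gate: every provenance key must be present."""
    return all(cell.get(key) for key in PROVENANCE_KEYS)


def refresh_cell(cached: Optional[dict],
                 enrich: Callable[[], dict]) -> tuple[dict, bool]:
    """Return (cell, recomputed). Reuse the cache while the gate passes."""
    if cached is not None and gate_passes(cached):
        return cached, False
    return enrich(), True


def fake_enrich() -> dict:
    # Placeholder for a real LLM + retrieval run.
    return {"value": "fresh", "model": "example-model", "prompt_hash": "h2",
            "source_url": "https://example.com",
            "retrieved_at": "2026-05-28T08:22:17Z"}


good = {"value": "cached", "model": "example-model", "prompt_hash": "h1",
        "source_url": "https://example.com",
        "retrieved_at": "2026-05-21T08:00:00Z"}
bad = {"value": "cached", "model": "example-model"}  # provenance incomplete

print(refresh_cell(good, fake_enrich)[1])  # cache reused
print(refresh_cell(bad, fake_enrich)[1])   # recomputed
```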

What you pay

Marginal compute, not retail LLM pricing.

Cell · 1 field, 1 row

~3-8 LLM calls + 1 embedding

$0.02–$0.08

Row · 1 node, 10 groups

~120-180 LLM calls + verification

$0.70–$1.40

Taxonomy · 4,329 personas

Atomic run, advisory-locked

$3K–$6K

Ranges reflect frontier-model selection (Claude Opus, GPT-5) vs. budget-tier (Haiku, Mini) · same verification gate either way.
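
The taxonomy figure follows from the per-row figure: 4,329 persona rows at the quoted per-row cost. A short check, assuming the cost scales linearly with row count (the function name is illustrative):

```python
# Per-row cost range quoted above, in USD: (budget tier, frontier tier).
ROW_COST_USD = (0.70, 1.40)
PERSONA_ROWS = 4_329


def taxonomy_cost(rows: int, per_row=ROW_COST_USD) -> tuple[float, float]:
    """Estimate a full-taxonomy run as rows x per-row cost (linear assumption)."""
    low, high = per_row
    return round(rows * low, 2), round(rows * high, 2)


low, high = taxonomy_cost(PERSONA_ROWS)
print(f"${low:,.2f} to ${high:,.2f}")
```

This lands at roughly $3.0K to $6.1K, in line with the quoted range for the personas taxonomy.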

Cell-level provenance, live in the workspace

Click any field. See where it came from.

A real Chrome capture from the production graph · the provenance popover shows freshness, quality score, confidence, sources, raw payload, and full revision history for the field the user clicked.

Provenance popover open inside the PYRAMYD product graph Category 360 view, showing Freshness, Quality, Confidence, Sources, Raw, and History tabs for a field.

Live capture from the production graph · Battlecards module · Provenance popover open on a Strengths field.

Freshness

How recently the field was verified against its source. Within SLA in this example · refreshed 1 day ago.

Quality & Confidence

Quality is the underlying signal strength (65/100 here). Confidence is how sure the model is that the extraction is correct. Both are rendered as discrete scores, not heuristic stars.

Sources · Raw · History

Three tabs · Sources lists every URL fingerprinted into the field, Raw shows the model's captured payload byte-for-byte, History is the audit log of every previous value.

What's in the graph

Every vendor is a typed node with real fields.

A real Salesforce-shaped record from the production graph. Every field is typed, every foreign key is traversable, and every signal carries provenance.

Vendor node · production shape

{
  "id": "8a7c5f1e-...",
  "name": "Salesforce",
  "description": "Customer relationship...",
  "country": "United States",
  "industry": "Software · SaaS",
  "size": "Large Enterprise",
  "reviewCount": 18420,
  "productCount": 47,
  "categoryCount": 23,
  "dataQuality": {
    "verifiedAt": "2026-05-28T14:11:02Z",
    "confidence": 0.94,
    "sourcesCount": 12
  },
  "provenance": {
    "sourceUrl": "https://salesforce.com/about/",
    "retrievedAt": "2026-05-28T08:22:17Z",
    "refreshCadence": "weekly"
  }
}

Every field is queryable. Every foreign key resolves. Every signal links back to a source URL with a retrieval timestamp.
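
On the consumer side, a typed record like this can be modeled directly. The sketch below mirrors a subset of the fields shown in the sample; the TypedDict classes and the parse_ts helper are assumptions for illustration, not a published client library.

```python
import json
from datetime import datetime
from typing import TypedDict


class DataQuality(TypedDict):
    verifiedAt: str
    confidence: float
    sourcesCount: int


class Provenance(TypedDict):
    sourceUrl: str
    retrievedAt: str
    refreshCadence: str


class Vendor(TypedDict):
    name: str
    reviewCount: int
    dataQuality: DataQuality
    provenance: Provenance


# A trimmed copy of the sample record above.
raw = """
{
  "name": "Salesforce",
  "reviewCount": 18420,
  "dataQuality": {"verifiedAt": "2026-05-28T14:11:02Z",
                  "confidence": 0.94, "sourcesCount": 12},
  "provenance": {"sourceUrl": "https://salesforce.com/about/",
                 "retrievedAt": "2026-05-28T08:22:17Z",
                 "refreshCadence": "weekly"}
}
"""
vendor: Vendor = json.loads(raw)


def parse_ts(ts: str) -> datetime:
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Time between retrieval from the source and verification of the record.
lag = (parse_ts(vendor["dataQuality"]["verifiedAt"])
       - parse_ts(vendor["provenance"]["retrievedAt"]))
print(vendor["name"], lag)
```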

MCP server response · getVendor(slug="salesforce")

{
  "vendor": { "name": "Salesforce", "id": "..." },
  "products": [
    { "name": "Sales Cloud", "score": 4.4,
      "reviews": 9821, "category": "CRM" },
    { "name": "Service Cloud", "score": 4.3,
      "reviews": 4102, "category": "Help Desk" }
  ],
  "topCategories": ["CRM", "Marketing Auto",
    "Sales Engagement", "Field Service"],
  "competitors": [
    { "name": "HubSpot", "score": 4.5 },
    { "name": "Microsoft Dynamics", "score": 4.2 }
  ],
  "citations": [
    { "field": "reviewCount",
      "source": "g2.com/.../salesforce",
      "retrievedAt": "2026-05-28T..." }
  ]
}

Any agent · Claude, ChatGPT, GitHub Copilot, your internal LLM · can query the graph through the MCP server. Responses come back typed, cited, and traversable.
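
Once a response like the one above is in hand, walking it is plain dictionary traversal. The sketch below works on a trimmed copy of the sample payload and does not call the MCP server itself; the three helper functions are hypothetical.

```python
from collections import defaultdict

# Trimmed copy of the sample getVendor response shown above.
response = {
    "products": [
        {"name": "Sales Cloud", "score": 4.4, "reviews": 9821, "category": "CRM"},
        {"name": "Service Cloud", "score": 4.3, "reviews": 4102, "category": "Help Desk"},
    ],
    "competitors": [
        {"name": "HubSpot", "score": 4.5},
        {"name": "Microsoft Dynamics", "score": 4.2},
    ],
    "citations": [
        {"field": "reviewCount", "source": "g2.com/.../salesforce"},
    ],
}


def products_by_category(resp: dict) -> dict:
    """Group product names under their category."""
    grouped = defaultdict(list)
    for item in resp["products"]:
        grouped[item["category"]].append(item["name"])
    return dict(grouped)


def top_competitor(resp: dict) -> str:
    """Name of the highest-scored competitor."""
    return max(resp["competitors"], key=lambda c: c["score"])["name"]


def citations_for(resp: dict, field_name: str) -> list:
    """Sources cited for one field; empty if the field has no citation."""
    return [c["source"] for c in resp["citations"] if c["field"] == field_name]


print(products_by_category(response))
print(top_competitor(response))
print(citations_for(response, "reviewCount"))
```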

The Six Layers

Volume. Velocity. Variety. Governance. Network effects. AI specialization.

Each layer reinforces the others. Together they form a foundation that gets deeper every week · not narrower.

01

Volume

252K+ enterprise software products mapped across 2,606 live categories with 2.4M+ aggregated reviews · the broadest structured enterprise-software graph in production.
02

Velocity

1,000+ signal sources refreshed continuously on a per-source cadence. Multi-tier ETL learns when a vendor releases vs. when a category shifts and adjusts refresh rates accordingly.
03

Variety

88 universal node types · 200+ pre-transformed connectors · 183 bi-directional connectors. Every entity typed; every relationship traversable.
04

Governance

SOC 2 Type 2 · ISO 27001 / 42001 · GDPR · CCPA · EU AI Act Article 50 · all in progress. Every signal already carries provenance: source URL, retrieval timestamp, model + version, confidence.
05

Network Effects

Every customer's RFX responses, battle cards, and win/loss data privately enrich their tenant slice. Aggregated trends improve the public graph for everyone · without revealing any single customer's data.
06

AI Specialization

Inline embeddings on every node · multi-hop traversal · Graph RAG · 4× multi-hop accuracy vs. vector RAG. Vertical AI no horizontal copilot can match.

Internal estimate: 12–24 months and $1–3M of focused engineering + data acquisition spend to replicate from scratch.

Depth × Breadth

Not just rows. Every row, enriched across 10 dimensions.

Every taxonomy carries the same 10 field-group schema · overview, demand, market, landscape, trends, operations, compliance, economics, capabilities, pulse. Multiply row count × column width × 10 dimensions and the foundation is structured, not just big.

Node | Rows | Cols | Structured points | Refresh
Products | 251,835 | 288 | 72.5M | Weekly · per-source cadence
Reviews | 2,447,964 | 95 | 232.6M | Weekly · 200+ review sites
VPT features | 2,870,530 | 190 | 545.4M | Weekly · vendor changelogs
Companies | 251,835 | 273 | 68.8M | Monthly · funding + hiring
Categories | 2,606 | 154 | 401,324 | Weekly · LLM-enriched
Industries | 742 | 106 | 78,652 | Monthly · LLM-enriched
Countries | 1,738 | 90 | 156,420 | Quarterly · regulator-tracked
Personas | 4,329 | 76 | 329,004 | Quarterly · LLM-enriched
Total | ~5.8M | across 8 node types | ~620M structured data points | Rolling enrichment
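
Each per-node figure in the table is rows multiplied by columns. A short sketch of that arithmetic, using the row and column counts from the table (the variable names are illustrative):

```python
# (rows, cols) per node type, copied from the table above.
TABLE = {
    "Products": (251_835, 288),
    "Reviews": (2_447_964, 95),
    "VPT features": (2_870_530, 190),
    "Companies": (251_835, 273),
    "Categories": (2_606, 154),
    "Industries": (742, 106),
    "Countries": (1_738, 90),
    "Personas": (4_329, 76),
}

# Structured points per node type = rows x cols.
points = {node: rows * cols for node, (rows, cols) in TABLE.items()}
total_rows = sum(rows for rows, _ in TABLE.values())

for node, value in points.items():
    print(f"{node:<13} {value:>12,}")
print(f"{'Total rows':<13} {total_rows:>12,}")
```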

Enrichment coverage

100% of every taxonomy, fully enriched.

Every row in each taxonomy carries an LLM-grounded enrichment payload · google-search grounding, citations preserved, model + timestamp tracked per row. Pulled live from production.

Categories

100%

2,606 of 2,606 rows enriched

4 models · google-search grounded

last enriched: 2026-05-29

Industries

100%

163 of 163 rows enriched

3 models · google-search grounded

last enriched: 2026-02-14

Countries

100%

249 of 249 rows enriched

3 models · google-search grounded

last enriched: 2026-01-18

Personas

100%

4,329 of 4,329 rows enriched

2 models · google-search grounded

last enriched: 2026-05-15

Refresh cadence

Weekly refresh on every node. Real-time on demand.

The default cadence is weekly · categories, industries, countries, personas, products, and reviews are all re-enriched on a 7-day rolling cycle. When a customer needs an update sooner, any node can be re-enriched in real time from the workspace; token consumption applies.

Weekly

Default cadence · every node, every taxonomy

Every category, industry, country, persona, product, and connector-sourced signal re-enriches on a 7-day rolling cycle. Battle cards, dashboards, and APEX answers all carry the latest evidence without an analyst lifting a finger.

Real-time

User-triggered · any node, on demand

Need fresher data right now? Hit refresh on any node from the workspace and the enrichment job runs immediately. Token consumption applies per re-enriched row, billed against your workspace budget.

Continuous

Always-on signal ingestion

Review feeds, competitor website diffs, funding announcements, hiring signals, and press wires stream into the graph as they happen. They surface in CI Hub and Alerts within hours of the source event.

Why this matters

Every node carries a verified-at timestamp. When a review changes, a category shifts, or a competitor ships a price change, the battle card refreshes automatically on the weekly cycle · and any analyst can force an instant refresh on the rows that matter most this morning, with provenance preserved on every cited claim.
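
With a verified-at timestamp on every node, deciding which rows are due for the weekly cycle is a date comparison. A minimal sketch, assuming a strict 7-day window; the is_stale function and the sample dates are illustrative, not the platform's scheduler.

```python
from datetime import datetime, timedelta, timezone

WEEKLY = timedelta(days=7)  # default cadence described above


def is_stale(verified_at: datetime, now: datetime,
             cadence: timedelta = WEEKLY) -> bool:
    """True when the node was last verified longer ago than its cadence."""
    return now - verified_at > cadence


verified = datetime(2026, 5, 28, 14, 11, 2, tzinfo=timezone.utc)

# Three days later the node is still within the weekly window.
print(is_stale(verified, datetime(2026, 5, 31, tzinfo=timezone.utc)))
# Nine days later it is due for re-enrichment.
print(is_stale(verified, datetime(2026, 6, 6, 15, 0, tzinfo=timezone.utc)))
```

A per-node cadence (monthly, quarterly) would fit the same function by passing a different timedelta.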

Live Today

What's actually in the graph, right now.

These aren't roadmap numbers. Every figure here is queryable on the platform today.

01

252K+

Enterprise software products tracked across 2.6K categories

02

2.4M+

Aggregated software reviews from 200+ review sources

03

1K+

Live signal sources, refreshed on per-source cadence

04

88

Universal node types across the entire graph schema

05

200+

Pre-transformed connectors, 183 bi-directional

06

3.08×

Live query speedup vs. baseline (2,519ms → 818ms)
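
The speedup figure is the ratio of the two latencies quoted above. As a quick check:

```python
BASELINE_MS = 2_519  # baseline query latency quoted above
CURRENT_MS = 818     # current query latency quoted above

speedup = BASELINE_MS / CURRENT_MS
latency_cut = 1 - CURRENT_MS / BASELINE_MS

print(f"{speedup:.2f}x speedup, {latency_cut:.1%} lower latency")
```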

88 Universal Node Types

Every entity in enterprise software, typed and connected.

The graph schema is the foundation. Ten categories span people, entities, products, revenue, finance, operations, comms, content, data, and systems · and every edge between them is a query the graph can answer.

6 types

People

Contacts · roles · positions · interviews

8 types

Entities

Companies · teams · workspaces · segments · locations · countries · industries

6 types

Products

Products · categories · features · releases · reviews

8 types

Revenue

Deals · orders · pipelines · contracts · campaigns · cadences · battle cards

7 types

Finance

Transactions · postings · ledgers · periods · budgets · forecasts · filings

12 types

Operations

Ideas · requirements · issues · projects · roadmaps · cycles · objectives · capabilities · processes

5 types

Comms

Messages · communications · chats · channels · events

10 types

Content

Documents · articles · sheets · slides · notebooks · canvases · forms · files · folders · transcripts

14 types

Data

Datasets · catalogs · connectors · transformations · prompts · agents · runs · models · experiments · metrics · signals · dashboards

12 types

Systems

Repositories · branches · commits · credentials · settings · activities · devices · alerts · applications · policies · services
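
The ten per-category counts add up to the 88 node types in the heading. A short check using the counts listed above:

```python
# Node-type counts per schema category, as listed above.
TYPE_COUNTS = {
    "People": 6, "Entities": 8, "Products": 6, "Revenue": 8, "Finance": 7,
    "Operations": 12, "Comms": 5, "Content": 10, "Data": 14, "Systems": 12,
}

total_types = sum(TYPE_COUNTS.values())
print(f"{len(TYPE_COUNTS)} categories, {total_types} node types")
```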

Every Signal Cited

Provenance is not optional.

Every node and every signal carries the metadata regulators want and the metadata sales leaders need: source, time, model, confidence, verification status.

Source URL

Every signal links back to the real source · a press release, a review, a vendor changelog, a regulatory filing.

Retrieval Timestamp

When the signal was captured. When it was last re-verified. When the source itself was updated.

Model + Version

Which model wrote the enrichment. Prompt version. Token count. Cost. Confidence score per field.

Verification Status

Verified · Needs Review · Disputed. A 2-gate audit (completeness + content) before any row reaches APEX.
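
The three statuses and the two gates could be wired together as below. This is a sketch under stated assumptions: the page names the gates (completeness, content) but not their rules, so the required keys, the 0.8 confidence threshold, and the mapping from a failed gate to a status are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "Verified"
    NEEDS_REVIEW = "Needs Review"
    DISPUTED = "Disputed"


# Assumed provenance keys a complete row must carry.
REQUIRED = ("source_url", "retrieved_at", "model", "confidence")


def completeness_gate(row: dict) -> bool:
    """Gate 1: every required provenance key is present and non-empty."""
    return all(row.get(key) not in (None, "") for key in REQUIRED)


def content_gate(row: dict, min_confidence: float = 0.8) -> bool:
    """Gate 2 (stand-in): confidence meets an assumed threshold."""
    return row["confidence"] >= min_confidence


def audit(row: dict) -> Status:
    """Run both gates in order; the status mapping here is an assumption."""
    if not completeness_gate(row):
        return Status.NEEDS_REVIEW
    if not content_gate(row):
        return Status.DISPUTED
    return Status.VERIFIED


row = {"source_url": "https://salesforce.com/about/",
       "retrieved_at": "2026-05-28T08:22:17Z",
       "model": "example-model", "confidence": 0.94}
print(audit(row).value)
```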

When a customer asks "where did this come from?" they have a defensible, regulator-ready answer.

See the graph live in your category.

In 30 minutes we'll pull live data for your top 5 competitors, walk the graph, and show APEX answering a multi-hop question with every citation traceable to its source.