CRM Analytics

Performance analytics layer for the CRM module, providing KPI dashboards, sales funnel analysis, and lead intelligence from a continuously-synced materialized table.

Overview

The CRM Analytics system solves a core performance problem: the original CRM had to run complex UNION ALL queries joining five or more tables every time a manager opened the analytics tab, making response times unacceptably slow at scale.

The solution introduces a materialized table (crm_analytics_leads) that pre-denormalizes every prospect and client row into a flat structure. A cron-based incremental sync keeps this table current within ~2 minutes. On top of this layer, a snapshot cache (crm_analytics_snapshots) pre-computes the four most-used date presets so the common case returns immediately from a single row lookup.

The analytics dashboard then serves two modes:

Snapshot mode: for the four preset date ranges (last 30 days, this month, last 3 months, last 6 months) — returns the pre-computed JSON instantly.
Live compute mode: for any custom date range or when an interactive filter (salesman, tier, stage, product) is active — queries crm_analytics_leads directly. This is still fast because the materialized table is a flat denormalized index, not raw relational data.

Key Concepts

Concept	Description
Lead	A unified row in `crm_analytics_leads` representing either a prospect (`record_type = 'prospect'`) or a company in the CRM pipeline (`record_type = 'company'`)
Virtual stage	A computed classification that overrides the raw pipeline stage. Priority: `customer_lost` > `won` > `in_recharge` > pipeline `stage_name`. Every lead falls in exactly one bucket.
Tier	A shipment-volume band based on monthly shipments: `blank` (0), `T0` (1–10), `T1` (11–100), `T2` (101–350), `T3` (351–750), `T4` (751–1,500), `T5` (1,500+).
Snapshot	A pre-computed analytics payload stored as JSON in `crm_analytics_snapshots`, keyed by `MD5(startDate\|endDate\|localeIds)`.
Dirty detection	The sync strategy that finds records changed since the last watermark by checking `created_at` timestamps across all related tables.
Rolling refresh	An additional sync pass that re-fetches the 10,000 oldest rows (by `synced_at`) to catch changes that do not produce a new timestamp (e.g., salesman reassignment).
Cutoff date	`CRM_ANALYTICS_CUTOFF_DATE` (`2025-01-01`). Applied only during the cron's discovery phase to avoid mass-importing historical records. It is not applied when refreshing rows already in the table, nor by the cleanup step, nor by `upsertCrmAnalyticsLead`. This means a pre-2025 client that receives a new activity via the UI will be inserted and kept permanently in the materialized table.
Lead Score	A 0–10 quality score computed from 6 boolean criteria. Criteria differ for prospects vs clients.
closing_days	Days elapsed from first follow-up activity to the first note with `stage_name = 'won'`. Only set when both dates are known.

Data Flow

Analytics Request Flow

Materialized Table Sync Pipeline

The live upsert path (triggered from controllers when a note is saved) always uses bypassCutoff: true, so a pre-2025 client that receives a new activity is immediately inserted into crm_analytics_leads.

Snapshot Pre-computation

ETL: How Rows Enter `crm_analytics_leads`

The ETL is performed by two denormalizer functions that run inside the sync job. Both write to the same flat table via INSERT … ON DUPLICATE KEY UPDATE on (record_id, record_type).

There are two entry paths:

Path	When	Cutoff applied?
Cron — new IDs	A company/prospect not yet in the table is detected as dirty	Yes (`created_at >= 2025-01-01`)
Cron — existing IDs	A row already in the table is detected as dirty or picked by the rolling refresh	No (bypassed)
Live upsert (`upsertCrmAnalyticsLead`)	A controller saves or updates a follow-up note	No (bypassed)

This design means a client created before 2025 is never imported in bulk by the cron, but is inserted the moment a user registers a new activity in the UI — and stays in the table indefinitely because the cleanup step only removes records that are soft-deleted, have status = 0, or have no follow_up_id.

Companies (`record_type = 'company'`)

Source: companies c filtered by c.follow_up_id IS NOT NULL AND c.status != 0 AND cfs.source = 'company'.

Joined Table	Join Condition	Fields Extracted
`users u_sal`	`u_sal.id = c.salesman`	`executive_id`, `executive_name` (fallback `'N/A'`)
`locales l`	`l.id = c.locale_id`	`locale_rate` (fallback `1`)
`catalog_follow_up_statuses cfs`	`cfs.id = c.follow_up_id`	`stage_name`, `stage_color`
`catalog_follow_up_statuses cfs_p`	`cfs_p.id = cfs.parent_id`	parent stage wins: `IFNULL(cfs_p.name, cfs.name)`
`company_monthly_tiers cmt`	`cmt.company_id = c.id` GROUP	`tier_value = MAX(total_shipments)`
`follow_up_data fud`	`fud.company_id = c.id`	`account_value`, `monthly_shipments`, `content_id`
`catalog_suggestions_package_content cspc`	`cspc.id = fud.content_id`	`product_tag = translation_tag`
`follow_up_comments fuc_first`	`fuc_first.company_id = c.id` GROUP	`first_activity_at = MIN(created_at)`
`follow_up_comments fuc_won`	joined with `cfs_w.name = 'won'` GROUP	`won_at = MIN(created_at)`
`EXISTS(shipments s)`	`s.company_id = c.id AND s.canceled = 0`	`has_shipment` (0 or 1)
`EXISTS(payment_history ph)`	`ph.company_id = c.id`	`has_recharge` (0 or 1)

Prospects (`record_type = 'prospect'`)

Source: prospection_users pu filtered by pu.follow_up_id IS NOT NULL AND pu.company_id IS NULL AND pu.status != 0 AND cfs.source = 'company'.

Key differences from companies:

Field	Prospects source
`executive_id / name`	`administrators adm ON adm.user_id = pu.created_by` + `users a_user ON a_user.id = pu.created_by`
`tier_value`	`fud.monthly_shipments` (no `company_monthly_tiers`)
`first_activity_at / won_at`	`prospect_follow_up_comments` (column `status_id`, not `follow_up_status_id`)
`has_shipment / has_recharge`	Always `0` — prospects have no shipment or recharge history

Derived Columns (computed in SQL at fetch time)

Column	Formula
`is_lost`	`1` when `stage_name IN ('lost', 'not_interested', 'customer_lost')`, else `0`
`is_won`	`1` when `stage_name = 'won'`, else `0`
`closing_days`	`TIMESTAMPDIFF(DAY, first_activity_at, won_at)` — `NULL` when either date is missing
`age_days`	`TIMESTAMPDIFF(DAY, created_at, NOW())` — recomputed on every sync
`tier_label`	Computed in JS (`_tierLabel()`) from `tier_value`; must match the SQL CASE in the materialized script

Tier Label Mapping

`tier_value` range	`tier_label`
`0`	`blank`
`1 – 10`	`T0`
`11 – 100`	`T1`
`101 – 350`	`T2`
`351 – 750`	`T3`
`751 – 1,500`	`T4`
`> 1,500`	`T5`

This mapping lives in both _tierLabel() (Node.js sync job) and TIER_RANGES (frontend analytics.constants.js). Keeping them in sync is a deployment concern.

Analytics Computation

Query-Time Filters

All active filters are concatenated into a sharedFilters SQL string appended to every query inside computeCrmAnalytics().

Filter	Parameter	SQL generated
Date (current)	`startDate`, `endDate`	`cal.created_at BETWEEN '{start} 00:00:00' AND '{end} 23:59:59'`
Date (previous)	derived	Range of same length ending the day before `startDate`
Locale	`localeIds` (CSV)	`cal.locale_id IN (id1, id2, …)`
Executive	`salesman`	`cal.executive_id = {salesman}`
Tier	`tier` (CSV of 0–5)	`cal.tier_label IN ('T0', …)`
Stage	`stage`	Virtual-aware — see decision tree below
Product	`product`	`cal.product_tag = '{product}'`

The date filter on the avg closing days query is rewritten to cal.won_at instead of cal.created_at — this measures closes that happened in the period, not leads that were created in it.

Previous period: rangeDays = round((end - start) / day) + 1. Previous end = start - 1 day. Previous start = prevEnd - (rangeDays - 1) days. All arithmetic uses local timezone (no UTC conversion).

Virtual-Stage Decision Tree

Used identically in the funnel breakdown (query 5), outcome summary (query 6), the stage drilldown filter, and the by-executive-stage chart.

When stage filter = customer_lost → WHERE is_lost = 1. When stage filter = won → WHERE is_lost = 0 AND has_shipment = 1. When stage filter = in_recharge → WHERE is_lost = 0 AND has_shipment = 0 AND has_recharge = 1. When stage filter = any pipeline stage → WHERE stage_name = '{stage}' AND is_lost = 0 AND has_shipment = 0 AND has_recharge = 0.

This ensures that clicking a funnel row always returns exactly the same leads shown in that bucket.

13 Parallel Queries

All 13 queries run with Promise.all against crm_analytics_leads cal.

#	Name	Formula / SQL pattern
1	Current leads	`COUNT(*) AS total_leads, SUM(is_lost) AS total_lost` — current date filter
2	Previous leads	Same, previous date filter → used for KPI delta display
3	Current shipments	`SUM(monthly_shipments)`, `SUM(account_value / NULLIF(locale_rate, 0)) / SUM(monthly_shipments) AS avg_cost` (0 when denominator = 0)
4	Previous shipments	Same, previous date filter → delta
5	Sales funnel	`COUNT() GROUP BY virtual_stage` (CASE priority: `customer_lost > won > in_recharge > stage_name`), ordered by count DESC; `percentage = round(count / total 1000) / 10`
6	Outcome summary	`SUM(is_lost)`, `SUM(is_lost=0 AND has_shipment=1)`, `SUM(is_lost=0 AND has_shipment=0 AND has_recharge=1)`, `SUM(is_lost=0 AND has_shipment=0 AND has_recharge=0)`
7	Executive list	`DISTINCT executive_id, executive_name` — locale filter only (no interactive filters, used to populate the salesman dropdown)
8	Avg closing days	`AVG(closing_days) WHERE closing_days IS NOT NULL` — date filter on `won_at`
9	In-recharge count	`COUNT(*) WHERE has_recharge = 1 AND has_shipment = 0`
10	By executive stage	`GROUP BY executive_name`: `won = SUM(has_shipment)`, `lost = SUM(is_lost=1 AND has_shipment=0)`, `remaining = SUM(is_lost=0 AND has_shipment=0)`, `total = COUNT(*)`, ordered by `total DESC`
11	By tier	`COUNT(*) GROUP BY tier_label` — by default appends `AND is_won=0 AND is_lost=0` (active only), but the restriction is dropped when `stage IN ('won','customer_lost')` so the chart reflects the active filter; ordered `FIELD('blank','T0','T1','T2','T3','T4','T5')`
12	By product	`GROUP BY product_tag`: `lead_count = COUNT(*)`, `total_value = SUM(account_value / NULLIF(locale_rate, 0))`, ordered by `total_value DESC`
13	Active leads	`TOP 100 ORDER BY age_days DESC` — by default appends `AND is_won=0 AND is_lost=0` (active only); when `stage IN ('won','customer_lost')` the restriction is dropped so the table shows the leads that match the explicit stage filter. Returns `record_id, record_type, record_name, executive_name, tier_label, stage_name, stage_color, age_days, last_activity_at`

Post-Processing in JavaScript

After all 13 queries resolve:

Computed field	Formula
`advancement_rate`	`round((won + in_recharge) / (won + lost + in_recharge + remaining) * 1000) / 10` — 0 when denominator is 0
Money/rate rounding	2 decimals: `round(x * 100) / 100`
Percentage rounding	1 decimal: `round(x * 1000) / 10`
Fallbacks	`executive_name \|\| 'N/A'`, `tier_label \|\| 'blank'`, empty arrays when no rows

Lead Scoring

Lead scoring assigns a quality score of 0–10 based on how many signals are present. The same SQL expression (buildLeadScoreExpression) is usable in SELECT, WHERE, and HAVING clauses without duplication.

Criteria

#	Criterion	Applies to
1	Account is verified	Clients only
2	Has at least one shop in `shops`	Clients only
3	Has at least one row in `payment_history`	Clients only
4	`type = 'client'`	Clients only
5	`total_shipments > 0`	Clients only
6	Has row in `about_yourself_company_answers`	Clients only
1	Has `ecommerce_type_id` set	Prospects only
2	`fud_monthly_shipments > 0`	Prospects only
3	`fud_channel IS NOT NULL OR fud_account_value IS NOT NULL`	Prospects only

Formula

score = ROUND(criteria_met / 6 * 10, 2)

The denominator is always 6 (CRITERIA_COUNT). Prospects have only 3 criteria available, so their maximum score is 5.0. Clients have 6 criteria, reaching 10.0. This asymmetry is intentional — clients have richer signals available.

The JavaScript equivalent (computeScore(metCount)) returns the same value for pre-computed data.

Badge Thresholds (Frontend)

Score range	Bootstrap variant	Meaning
`≥ 8`	`success` (green)	High-quality lead
`≥ 5`	`warning` (yellow)	Medium quality
`≥ 3`	`info` (blue)	Low quality
`< 3`	`secondary` (grey)	Very low / insufficient data

Database

Tables

Table	Purpose
`crm_analytics_leads`	Flat materialized table with one row per prospect or company in the CRM pipeline. Primary source for all analytics queries.
`crm_analytics_snapshots`	Pre-computed analytics payloads for the four standard date presets, keyed by MD5 hash.
`crm_analytics_sync_log`	Watermark and metrics for each incremental sync run.

`crm_analytics_leads` Columns

Column	Type	Description
`record_id`	INT	Source entity ID (company or prospect)
`record_type`	ENUM	`'company'` or `'prospect'`
`record_name`	VARCHAR	Company name / prospect company name
`created_at`	DATETIME	Original creation date of the source entity
`executive_id`	INT	Assigned salesman/executive ID
`executive_name`	VARCHAR	Assigned salesman name
`locale_id`	INT	Locale (country)
`follow_up_id`	INT	Current follow-up status ID
`stage_name`	VARCHAR	Parent stage name (if parent exists) or current stage name
`stage_color`	VARCHAR	CSS class for stage badge
`is_lost`	TINYINT	`1` when stage is lost/not_interested/customer_lost
`is_won`	TINYINT	`1` when stage is `won`
`tier_value`	INT	Raw shipment count (for companies: max monthly; for prospects: declared monthly)
`tier_label`	VARCHAR	Derived tier bucket: `blank`, `T0`–`T5`
`has_shipment`	TINYINT	`1` if at least one non-cancelled shipment exists (companies only)
`has_recharge`	TINYINT	`1` if at least one payment exists (companies only)
`monthly_shipments`	INT	Declared monthly shipments from `follow_up_data`
`product_tag`	VARCHAR	Package content translation tag from `catalog_suggestions_package_content`
`account_value`	DECIMAL	Estimated account value in local currency
`locale_rate`	DECIMAL	Currency exchange rate to normalize `account_value` to USD
`first_activity_at`	DATETIME	Timestamp of the first follow-up note
`won_at`	DATETIME	Timestamp of the first `won`-stage note
`last_activity_at`	DATETIME	Most recent `IFNULL(scheduled_date, created_at)` from follow-up notes where the date is `<= NOW()`. Updated by the sync cron and live on every note insert/edit.
`closing_days`	INT	`TIMESTAMPDIFF(DAY, first_activity_at, won_at)` — `NULL` if not yet won
`age_days`	INT	Days since `created_at` — recomputed on every sync run
`synced_at`	DATETIME	Last time this row was written by the sync job

Entity Relationship

Key Decisions

Decision	Reasoning	Alternatives Considered
Materialized flat table (`crm_analytics_leads`) instead of querying source tables directly	The original `UNION ALL` queries joining 5+ tables took several seconds at scale. A flat table reduces analytics queries to simple `COUNT`/`SUM` with WHERE filters.	Indexed views (MySQL doesn't support them natively); query result caching at the app layer (harder to invalidate).
MD5 cache key includes only `start\|end\|localeIds` — interactive filters excluded	Adding salesman/tier/stage/product to the key would create an exponential number of cache entries (combinatorial explosion). These filters are only used by managers in specific investigative sessions and hit live compute instead.	Per-filter snapshot precomputation (too many combinations); no cache at all (acceptable only if the database could serve queries fast enough).
Mutually-exclusive virtual stages	The sales funnel and the outcome pie chart must account for every lead exactly once. Without virtual stages, a lead with `stage_name = 'won'` that also has `is_lost = 1` would appear in two buckets. The priority (`lost > won > in_recharge > pipeline`) makes each bucket deterministic.	Separate charts per stage type (less coherent dashboard); allow overlap (double-counts in aggregates).
Rolling refresh of 10,000 oldest rows	Some changes (e.g., salesman reassignment on `companies`) do not update any related-table `created_at` timestamp, so dirty detection misses them. The rolling pass ensures every row is eventually re-fetched.	Event-based triggers (MySQL triggers to write a change log — high write amplification); full table refresh on every sync (too slow at scale).
Watermark = sync start time − 2 minutes	If the sync job runs for, say, 90 seconds, any row inserted at second 45 would be missed by the next sync that starts reading from the job's end time. Subtracting 2 minutes ensures a safe overlap window at the cost of re-syncing a small number of rows.	Use job end time as watermark (misses concurrent writes); use `NOW()` at moment of log write (same problem).
Tier thresholds kept in both JS and SQL	The `_tierLabel()` JS function (sync job) must produce the same labels as the SQL CASE expression (analytics materialized script). Centralizing them in a shared constant is not practical across two runtimes. A code comment documents this coupling.	Single source of truth in a config file loaded by both (would require a build step for the SQL); compute `tier_label` only in SQL (removes the JS function, simpler if future tools only use the DB).
Prospects always have `has_shipment = 0` and `has_recharge = 0`	Prospects have no `company_id`, so they cannot have shipments or payment history. Hardcoding to 0 avoids a conditional sub-query and keeps the funnel logic uniform.	Null instead of 0 (requires IS NULL checks throughout the analytics queries).
Cutoff date applies only to new-record discovery, not to refresh or cleanup	The cron must not mass-import the full historical database on every run, so `CRM_ANALYTICS_CUTOFF_DATE` is applied when fetching IDs that are not yet in the materialized table. However, rows already present (e.g., a pre-2025 client inserted via live upsert) must be kept fresh and must not be silently deleted. Removing the cutoff from the cleanup `NOT EXISTS` guard achieves this without additional bookkeeping.	Apply cutoff everywhere (pre-2025 clients can never appear); apply cutoff nowhere (full historical scan on every cron run).

Dependencies

Internal: CRM module — source data for prospects and companies; catalog_follow_up_statuses catalog
Tables: companies, prospection_users, follow_up_data, follow_up_comments, prospect_follow_up_comments, catalog_follow_up_statuses, company_monthly_tiers, payment_history, shipments, catalog_suggestions_package_content, locales, shops, about_yourself_company_answers
Backend libraries: date.util.js (formatDateYMD, startOfMonth, subDays, subMonths)
Constants: backend/constants/crm.constants.js (CRM_ANALYTICS_BATCH_SIZE = 500, CRM_ANALYTICS_ROLLING_LIMIT = 10000), backend/constants/lead-scoring.constants.js (CRITERIA_COUNT = 6)

API Endpoints — GET /prospects/client-analytics and POST /crm/analytics/sync reference
Prediction Module — GET /prospects/client-prediction: 6-month historical baseline + 6-month forward forecast
UI Screens & Flows — Dashboard components, filters, charts, and drilldown pattern
CRM Module Overview — Parent module: prospects, pipeline, Kanban, and notes
User Guide — Analytics — End-user instructions for reading and using the analytics dashboard
User Guide — Prediction — End-user instructions for the Prediction view, including all formula explanations

UI Elements

Forms & Overlays

Tables

Analytics

Organization Settings

Payment Requests

Pricing & Costs

Additional Services & Charges

CRM Analytics

Overview

Key Concepts

Data Flow

Analytics Request Flow

Materialized Table Sync Pipeline

Snapshot Pre-computation

ETL: How Rows Enter `crm_analytics_leads`

Companies (`record_type = 'company'`)

Prospects (`record_type = 'prospect'`)

Derived Columns (computed in SQL at fetch time)

Tier Label Mapping

Analytics Computation

Query-Time Filters

Virtual-Stage Decision Tree

13 Parallel Queries

Post-Processing in JavaScript

Lead Scoring

Criteria

Formula

Badge Thresholds (Frontend)

Database

Tables

`crm_analytics_leads` Columns

Entity Relationship

Key Decisions

Dependencies

CRM Analytics ​

Overview ​

Key Concepts ​

Data Flow ​

Analytics Request Flow ​

Materialized Table Sync Pipeline ​

Snapshot Pre-computation ​

ETL: How Rows Enter crm_analytics_leads ​

Companies (record_type = 'company') ​

Prospects (record_type = 'prospect') ​

Derived Columns (computed in SQL at fetch time) ​

Tier Label Mapping ​

Analytics Computation ​

Query-Time Filters ​

Virtual-Stage Decision Tree ​

13 Parallel Queries ​

Post-Processing in JavaScript ​

Lead Scoring ​

Criteria ​

Formula ​

Badge Thresholds (Frontend) ​

Database ​

Tables ​

crm_analytics_leads Columns ​

Entity Relationship ​

Key Decisions ​

Dependencies ​

Related Documentation ​

CRM Analytics

Overview

Key Concepts

Data Flow

Analytics Request Flow

Materialized Table Sync Pipeline

Snapshot Pre-computation

ETL: How Rows Enter `crm_analytics_leads`

Companies (`record_type = 'company'`)

Prospects (`record_type = 'prospect'`)

Derived Columns (computed in SQL at fetch time)

Tier Label Mapping

Analytics Computation

Query-Time Filters

Virtual-Stage Decision Tree

13 Parallel Queries

Post-Processing in JavaScript

Lead Scoring

Criteria

Formula

Badge Thresholds (Frontend)

Database

Tables

`crm_analytics_leads` Columns

Entity Relationship

Key Decisions

Dependencies

Related Documentation