CRM Analytics
Performance analytics layer for the CRM module, providing KPI dashboards, sales funnel analysis, and lead intelligence from a continuously-synced materialized table.
Overview
The CRM Analytics system solves a core performance problem: the original CRM had to run complex UNION ALL queries joining five or more tables every time a manager opened the analytics tab, making response times unacceptably slow at scale.
The solution introduces a materialized table (crm_analytics_leads) that pre-denormalizes every prospect and client row into a flat structure. A cron-based incremental sync keeps this table current within ~2 minutes. On top of this layer, a snapshot cache (crm_analytics_snapshots) pre-computes the four most-used date presets so the common case returns immediately from a single row lookup.
The analytics dashboard then serves two modes:
- Snapshot mode: for the four preset date ranges (last 30 days, this month, last 3 months, last 6 months) — returns the pre-computed JSON instantly.
- Live compute mode: for any custom date range or when an interactive filter (salesman, tier, stage, product) is active — queries
crm_analytics_leadsdirectly. This is still fast because the materialized table is a flat denormalized index, not raw relational data.
Key Concepts
| Concept | Description |
|---|---|
| Lead | A unified row in crm_analytics_leads representing either a prospect (record_type = 'prospect') or a company in the CRM pipeline (record_type = 'company') |
| Virtual stage | A computed classification that overrides the raw pipeline stage. Priority: customer_lost > won > in_recharge > pipeline stage_name. Every lead falls in exactly one bucket. |
| Tier | A shipment-volume band based on monthly shipments: blank (0), T0 (1–10), T1 (11–100), T2 (101–350), T3 (351–750), T4 (751–1,500), T5 (1,500+). |
| Snapshot | A pre-computed analytics payload stored as JSON in crm_analytics_snapshots, keyed by MD5(startDate|endDate|localeIds). |
| Dirty detection | The sync strategy that finds records changed since the last watermark by checking created_at timestamps across all related tables. |
| Rolling refresh | An additional sync pass that re-fetches the 10,000 oldest rows (by synced_at) to catch changes that do not produce a new timestamp (e.g., salesman reassignment). |
| Cutoff date | CRM_ANALYTICS_CUTOFF_DATE (2025-01-01). Applied only during the cron's discovery phase to avoid mass-importing historical records. It is not applied when refreshing rows already in the table, nor by the cleanup step, nor by upsertCrmAnalyticsLead. This means a pre-2025 client that receives a new activity via the UI will be inserted and kept permanently in the materialized table. |
| Lead Score | A 0–10 quality score computed from 6 boolean criteria. Criteria differ for prospects vs clients. |
| closing_days | Days elapsed from first follow-up activity to the first note with stage_name = 'won'. Only set when both dates are known. |
Data Flow
Analytics Request Flow
Materialized Table Sync Pipeline
The live upsert path (triggered from controllers when a note is saved) always uses bypassCutoff: true, so a pre-2025 client that receives a new activity is immediately inserted into crm_analytics_leads.
Snapshot Pre-computation
ETL: How Rows Enter crm_analytics_leads
The ETL is performed by two denormalizer functions that run inside the sync job. Both write to the same flat table via INSERT … ON DUPLICATE KEY UPDATE on (record_id, record_type).
There are two entry paths:
| Path | When | Cutoff applied? |
|---|---|---|
| Cron — new IDs | A company/prospect not yet in the table is detected as dirty | Yes (created_at >= 2025-01-01) |
| Cron — existing IDs | A row already in the table is detected as dirty or picked by the rolling refresh | No (bypassed) |
Live upsert (upsertCrmAnalyticsLead) | A controller saves or updates a follow-up note | No (bypassed) |
This design means a client created before 2025 is never imported in bulk by the cron, but is inserted the moment a user registers a new activity in the UI — and stays in the table indefinitely because the cleanup step only removes records that are soft-deleted, have status = 0, or have no follow_up_id.
Companies (record_type = 'company')
Source: companies c filtered by c.follow_up_id IS NOT NULL AND c.status != 0 AND cfs.source = 'company'.
| Joined Table | Join Condition | Fields Extracted |
|---|---|---|
users u_sal | u_sal.id = c.salesman | executive_id, executive_name (fallback 'N/A') |
locales l | l.id = c.locale_id | locale_rate (fallback 1) |
catalog_follow_up_statuses cfs | cfs.id = c.follow_up_id | stage_name, stage_color |
catalog_follow_up_statuses cfs_p | cfs_p.id = cfs.parent_id | parent stage wins: IFNULL(cfs_p.name, cfs.name) |
company_monthly_tiers cmt | cmt.company_id = c.id GROUP | tier_value = MAX(total_shipments) |
follow_up_data fud | fud.company_id = c.id | account_value, monthly_shipments, content_id |
catalog_suggestions_package_content cspc | cspc.id = fud.content_id | product_tag = translation_tag |
follow_up_comments fuc_first | fuc_first.company_id = c.id GROUP | first_activity_at = MIN(created_at) |
follow_up_comments fuc_won | joined with cfs_w.name = 'won' GROUP | won_at = MIN(created_at) |
EXISTS(shipments s) | s.company_id = c.id AND s.canceled = 0 | has_shipment (0 or 1) |
EXISTS(payment_history ph) | ph.company_id = c.id | has_recharge (0 or 1) |
Prospects (record_type = 'prospect')
Source: prospection_users pu filtered by pu.follow_up_id IS NOT NULL AND pu.company_id IS NULL AND pu.status != 0 AND cfs.source = 'company'.
Key differences from companies:
| Field | Prospects source |
|---|---|
executive_id / name | administrators adm ON adm.user_id = pu.created_by + users a_user ON a_user.id = pu.created_by |
tier_value | fud.monthly_shipments (no company_monthly_tiers) |
first_activity_at / won_at | prospect_follow_up_comments (column status_id, not follow_up_status_id) |
has_shipment / has_recharge | Always 0 — prospects have no shipment or recharge history |
Derived Columns (computed in SQL at fetch time)
| Column | Formula |
|---|---|
is_lost | 1 when stage_name IN ('lost', 'not_interested', 'customer_lost'), else 0 |
is_won | 1 when stage_name = 'won', else 0 |
closing_days | TIMESTAMPDIFF(DAY, first_activity_at, won_at) — NULL when either date is missing |
age_days | TIMESTAMPDIFF(DAY, created_at, NOW()) — recomputed on every sync |
tier_label | Computed in JS (_tierLabel()) from tier_value; must match the SQL CASE in the materialized script |
Tier Label Mapping
tier_value range | tier_label |
|---|---|
0 | blank |
1 – 10 | T0 |
11 – 100 | T1 |
101 – 350 | T2 |
351 – 750 | T3 |
751 – 1,500 | T4 |
> 1,500 | T5 |
This mapping lives in both _tierLabel() (Node.js sync job) and TIER_RANGES (frontend analytics.constants.js). Keeping them in sync is a deployment concern.
Analytics Computation
Query-Time Filters
All active filters are concatenated into a sharedFilters SQL string appended to every query inside computeCrmAnalytics().
| Filter | Parameter | SQL generated |
|---|---|---|
| Date (current) | startDate, endDate | cal.created_at BETWEEN '{start} 00:00:00' AND '{end} 23:59:59' |
| Date (previous) | derived | Range of same length ending the day before startDate |
| Locale | localeIds (CSV) | cal.locale_id IN (id1, id2, …) |
| Executive | salesman | cal.executive_id = {salesman} |
| Tier | tier (CSV of 0–5) | cal.tier_label IN ('T0', …) |
| Stage | stage | Virtual-aware — see decision tree below |
| Product | product | cal.product_tag = '{product}' |
The date filter on the avg closing days query is rewritten to cal.won_at instead of cal.created_at — this measures closes that happened in the period, not leads that were created in it.
Previous period: rangeDays = round((end - start) / day) + 1. Previous end = start - 1 day. Previous start = prevEnd - (rangeDays - 1) days. All arithmetic uses local timezone (no UTC conversion).
Virtual-Stage Decision Tree
Used identically in the funnel breakdown (query 5), outcome summary (query 6), the stage drilldown filter, and the by-executive-stage chart.
When stage filter = customer_lost → WHERE is_lost = 1. When stage filter = won → WHERE is_lost = 0 AND has_shipment = 1. When stage filter = in_recharge → WHERE is_lost = 0 AND has_shipment = 0 AND has_recharge = 1. When stage filter = any pipeline stage → WHERE stage_name = '{stage}' AND is_lost = 0 AND has_shipment = 0 AND has_recharge = 0.
This ensures that clicking a funnel row always returns exactly the same leads shown in that bucket.
13 Parallel Queries
All 13 queries run with Promise.all against crm_analytics_leads cal.
| # | Name | Formula / SQL pattern |
|---|---|---|
| 1 | Current leads | COUNT(*) AS total_leads, SUM(is_lost) AS total_lost — current date filter |
| 2 | Previous leads | Same, previous date filter → used for KPI delta display |
| 3 | Current shipments | SUM(monthly_shipments), SUM(account_value / NULLIF(locale_rate, 0)) / SUM(monthly_shipments) AS avg_cost (0 when denominator = 0) |
| 4 | Previous shipments | Same, previous date filter → delta |
| 5 | Sales funnel | COUNT(*) GROUP BY virtual_stage (CASE priority: customer_lost > won > in_recharge > stage_name), ordered by count DESC; percentage = round(count / total * 1000) / 10 |
| 6 | Outcome summary | SUM(is_lost), SUM(is_lost=0 AND has_shipment=1), SUM(is_lost=0 AND has_shipment=0 AND has_recharge=1), SUM(is_lost=0 AND has_shipment=0 AND has_recharge=0) |
| 7 | Executive list | DISTINCT executive_id, executive_name — locale filter only (no interactive filters, used to populate the salesman dropdown) |
| 8 | Avg closing days | AVG(closing_days) WHERE closing_days IS NOT NULL — date filter on won_at |
| 9 | In-recharge count | COUNT(*) WHERE has_recharge = 1 AND has_shipment = 0 |
| 10 | By executive stage | GROUP BY executive_name: won = SUM(has_shipment), lost = SUM(is_lost=1 AND has_shipment=0), remaining = SUM(is_lost=0 AND has_shipment=0), total = COUNT(*), ordered by total DESC |
| 11 | By tier | COUNT(*) GROUP BY tier_label — by default appends AND is_won=0 AND is_lost=0 (active only), but the restriction is dropped when stage IN ('won','customer_lost') so the chart reflects the active filter; ordered FIELD('blank','T0','T1','T2','T3','T4','T5') |
| 12 | By product | GROUP BY product_tag: lead_count = COUNT(*), total_value = SUM(account_value / NULLIF(locale_rate, 0)), ordered by total_value DESC |
| 13 | Active leads | TOP 100 ORDER BY age_days DESC — by default appends AND is_won=0 AND is_lost=0 (active only); when stage IN ('won','customer_lost') the restriction is dropped so the table shows the leads that match the explicit stage filter. Returns record_id, record_type, record_name, executive_name, tier_label, stage_name, stage_color, age_days, last_activity_at |
Post-Processing in JavaScript
After all 13 queries resolve:
| Computed field | Formula |
|---|---|
advancement_rate | round((won + in_recharge) / (won + lost + in_recharge + remaining) * 1000) / 10 — 0 when denominator is 0 |
| Money/rate rounding | 2 decimals: round(x * 100) / 100 |
| Percentage rounding | 1 decimal: round(x * 1000) / 10 |
| Fallbacks | executive_name || 'N/A', tier_label || 'blank', empty arrays when no rows |
Lead Scoring
Lead scoring assigns a quality score of 0–10 based on how many signals are present. The same SQL expression (buildLeadScoreExpression) is usable in SELECT, WHERE, and HAVING clauses without duplication.
Criteria
| # | Criterion | Applies to |
|---|---|---|
| 1 | Account is verified | Clients only |
| 2 | Has at least one shop in shops | Clients only |
| 3 | Has at least one row in payment_history | Clients only |
| 4 | type = 'client' | Clients only |
| 5 | total_shipments > 0 | Clients only |
| 6 | Has row in about_yourself_company_answers | Clients only |
| 1 | Has ecommerce_type_id set | Prospects only |
| 2 | fud_monthly_shipments > 0 | Prospects only |
| 3 | fud_channel IS NOT NULL OR fud_account_value IS NOT NULL | Prospects only |
Formula
score = ROUND(criteria_met / 6 * 10, 2)The denominator is always 6 (CRITERIA_COUNT). Prospects have only 3 criteria available, so their maximum score is 5.0. Clients have 6 criteria, reaching 10.0. This asymmetry is intentional — clients have richer signals available.
The JavaScript equivalent (computeScore(metCount)) returns the same value for pre-computed data.
Badge Thresholds (Frontend)
| Score range | Bootstrap variant | Meaning |
|---|---|---|
≥ 8 | success (green) | High-quality lead |
≥ 5 | warning (yellow) | Medium quality |
≥ 3 | info (blue) | Low quality |
< 3 | secondary (grey) | Very low / insufficient data |
Database
Tables
| Table | Purpose |
|---|---|
crm_analytics_leads | Flat materialized table with one row per prospect or company in the CRM pipeline. Primary source for all analytics queries. |
crm_analytics_snapshots | Pre-computed analytics payloads for the four standard date presets, keyed by MD5 hash. |
crm_analytics_sync_log | Watermark and metrics for each incremental sync run. |
crm_analytics_leads Columns
| Column | Type | Description |
|---|---|---|
record_id | INT | Source entity ID (company or prospect) |
record_type | ENUM | 'company' or 'prospect' |
record_name | VARCHAR | Company name / prospect company name |
created_at | DATETIME | Original creation date of the source entity |
executive_id | INT | Assigned salesman/executive ID |
executive_name | VARCHAR | Assigned salesman name |
locale_id | INT | Locale (country) |
follow_up_id | INT | Current follow-up status ID |
stage_name | VARCHAR | Parent stage name (if parent exists) or current stage name |
stage_color | VARCHAR | CSS class for stage badge |
is_lost | TINYINT | 1 when stage is lost/not_interested/customer_lost |
is_won | TINYINT | 1 when stage is won |
tier_value | INT | Raw shipment count (for companies: max monthly; for prospects: declared monthly) |
tier_label | VARCHAR | Derived tier bucket: blank, T0–T5 |
has_shipment | TINYINT | 1 if at least one non-cancelled shipment exists (companies only) |
has_recharge | TINYINT | 1 if at least one payment exists (companies only) |
monthly_shipments | INT | Declared monthly shipments from follow_up_data |
product_tag | VARCHAR | Package content translation tag from catalog_suggestions_package_content |
account_value | DECIMAL | Estimated account value in local currency |
locale_rate | DECIMAL | Currency exchange rate to normalize account_value to USD |
first_activity_at | DATETIME | Timestamp of the first follow-up note |
won_at | DATETIME | Timestamp of the first won-stage note |
last_activity_at | DATETIME | Most recent IFNULL(scheduled_date, created_at) from follow-up notes where the date is <= NOW(). Updated by the sync cron and live on every note insert/edit. |
closing_days | INT | TIMESTAMPDIFF(DAY, first_activity_at, won_at) — NULL if not yet won |
age_days | INT | Days since created_at — recomputed on every sync run |
synced_at | DATETIME | Last time this row was written by the sync job |
Entity Relationship
Key Decisions
| Decision | Reasoning | Alternatives Considered |
|---|---|---|
Materialized flat table (crm_analytics_leads) instead of querying source tables directly | The original UNION ALL queries joining 5+ tables took several seconds at scale. A flat table reduces analytics queries to simple COUNT/SUM with WHERE filters. | Indexed views (MySQL doesn't support them natively); query result caching at the app layer (harder to invalidate). |
MD5 cache key includes only start|end|localeIds — interactive filters excluded | Adding salesman/tier/stage/product to the key would create an exponential number of cache entries (combinatorial explosion). These filters are only used by managers in specific investigative sessions and hit live compute instead. | Per-filter snapshot precomputation (too many combinations); no cache at all (acceptable only if the database could serve queries fast enough). |
| Mutually-exclusive virtual stages | The sales funnel and the outcome pie chart must account for every lead exactly once. Without virtual stages, a lead with stage_name = 'won' that also has is_lost = 1 would appear in two buckets. The priority (lost > won > in_recharge > pipeline) makes each bucket deterministic. | Separate charts per stage type (less coherent dashboard); allow overlap (double-counts in aggregates). |
| Rolling refresh of 10,000 oldest rows | Some changes (e.g., salesman reassignment on companies) do not update any related-table created_at timestamp, so dirty detection misses them. The rolling pass ensures every row is eventually re-fetched. | Event-based triggers (MySQL triggers to write a change log — high write amplification); full table refresh on every sync (too slow at scale). |
| Watermark = sync start time − 2 minutes | If the sync job runs for, say, 90 seconds, any row inserted at second 45 would be missed by the next sync that starts reading from the job's end time. Subtracting 2 minutes ensures a safe overlap window at the cost of re-syncing a small number of rows. | Use job end time as watermark (misses concurrent writes); use NOW() at moment of log write (same problem). |
| Tier thresholds kept in both JS and SQL | The _tierLabel() JS function (sync job) must produce the same labels as the SQL CASE expression (analytics materialized script). Centralizing them in a shared constant is not practical across two runtimes. A code comment documents this coupling. | Single source of truth in a config file loaded by both (would require a build step for the SQL); compute tier_label only in SQL (removes the JS function, simpler if future tools only use the DB). |
Prospects always have has_shipment = 0 and has_recharge = 0 | Prospects have no company_id, so they cannot have shipments or payment history. Hardcoding to 0 avoids a conditional sub-query and keeps the funnel logic uniform. | Null instead of 0 (requires IS NULL checks throughout the analytics queries). |
| Cutoff date applies only to new-record discovery, not to refresh or cleanup | The cron must not mass-import the full historical database on every run, so CRM_ANALYTICS_CUTOFF_DATE is applied when fetching IDs that are not yet in the materialized table. However, rows already present (e.g., a pre-2025 client inserted via live upsert) must be kept fresh and must not be silently deleted. Removing the cutoff from the cleanup NOT EXISTS guard achieves this without additional bookkeeping. | Apply cutoff everywhere (pre-2025 clients can never appear); apply cutoff nowhere (full historical scan on every cron run). |
Dependencies
- Internal: CRM module — source data for prospects and companies;
catalog_follow_up_statusescatalog - Tables:
companies,prospection_users,follow_up_data,follow_up_comments,prospect_follow_up_comments,catalog_follow_up_statuses,company_monthly_tiers,payment_history,shipments,catalog_suggestions_package_content,locales,shops,about_yourself_company_answers - Backend libraries:
date.util.js(formatDateYMD,startOfMonth,subDays,subMonths) - Constants:
backend/constants/crm.constants.js(CRM_ANALYTICS_BATCH_SIZE = 500,CRM_ANALYTICS_ROLLING_LIMIT = 10000),backend/constants/lead-scoring.constants.js(CRITERIA_COUNT = 6)
Related Documentation
- API Endpoints —
GET /prospects/client-analyticsandPOST /crm/analytics/syncreference - Prediction Module —
GET /prospects/client-prediction: 6-month historical baseline + 6-month forward forecast - UI Screens & Flows — Dashboard components, filters, charts, and drilldown pattern
- CRM Module Overview — Parent module: prospects, pipeline, Kanban, and notes
- User Guide — Analytics — End-user instructions for reading and using the analytics dashboard
- User Guide — Prediction — End-user instructions for the Prediction view, including all formula explanations
