Skip to content

CRM Analytics

Performance analytics layer for the CRM module, providing KPI dashboards, sales funnel analysis, and lead intelligence from a continuously-synced materialized table.

Overview

The CRM Analytics system solves a core performance problem: the original CRM had to run complex UNION ALL queries joining five or more tables every time a manager opened the analytics tab, making response times unacceptably slow at scale.

The solution introduces a materialized table (crm_analytics_leads) that pre-denormalizes every prospect and client row into a flat structure. A cron-based incremental sync keeps this table current within ~2 minutes. On top of this layer, a snapshot cache (crm_analytics_snapshots) pre-computes the four most-used date presets so the common case returns immediately from a single row lookup.

The analytics dashboard then serves two modes:

  • Snapshot mode: for the four preset date ranges (last 30 days, this month, last 3 months, last 6 months) — returns the pre-computed JSON instantly.
  • Live compute mode: for any custom date range or when an interactive filter (salesman, tier, stage, product) is active — queries crm_analytics_leads directly. This is still fast because the materialized table is a flat denormalized index, not raw relational data.

Key Concepts

ConceptDescription
LeadA unified row in crm_analytics_leads representing either a prospect (record_type = 'prospect') or a company in the CRM pipeline (record_type = 'company')
Virtual stageA computed classification that overrides the raw pipeline stage. Priority: customer_lost > won > in_recharge > pipeline stage_name. Every lead falls in exactly one bucket.
TierA shipment-volume band based on monthly shipments: blank (0), T0 (1–10), T1 (11–100), T2 (101–350), T3 (351–750), T4 (751–1,500), T5 (1,500+).
SnapshotA pre-computed analytics payload stored as JSON in crm_analytics_snapshots, keyed by MD5(startDate|endDate|localeIds).
Dirty detectionThe sync strategy that finds records changed since the last watermark by checking created_at timestamps across all related tables.
Rolling refreshAn additional sync pass that re-fetches the 10,000 oldest rows (by synced_at) to catch changes that do not produce a new timestamp (e.g., salesman reassignment).
Cutoff dateCRM_ANALYTICS_CUTOFF_DATE (2025-01-01). Applied only during the cron's discovery phase to avoid mass-importing historical records. It is not applied when refreshing rows already in the table, nor by the cleanup step, nor by upsertCrmAnalyticsLead. This means a pre-2025 client that receives a new activity via the UI will be inserted and kept permanently in the materialized table.
Lead ScoreA 0–10 quality score computed from 6 boolean criteria. Criteria differ for prospects vs clients.
closing_daysDays elapsed from first follow-up activity to the first note with stage_name = 'won'. Only set when both dates are known.

Data Flow

Analytics Request Flow

Materialized Table Sync Pipeline

The live upsert path (triggered from controllers when a note is saved) always uses bypassCutoff: true, so a pre-2025 client that receives a new activity is immediately inserted into crm_analytics_leads.

Snapshot Pre-computation

ETL: How Rows Enter crm_analytics_leads

The ETL is performed by two denormalizer functions that run inside the sync job. Both write to the same flat table via INSERT … ON DUPLICATE KEY UPDATE on (record_id, record_type).

There are two entry paths:

PathWhenCutoff applied?
Cron — new IDsA company/prospect not yet in the table is detected as dirtyYes (created_at >= 2025-01-01)
Cron — existing IDsA row already in the table is detected as dirty or picked by the rolling refreshNo (bypassed)
Live upsert (upsertCrmAnalyticsLead)A controller saves or updates a follow-up noteNo (bypassed)

This design means a client created before 2025 is never imported in bulk by the cron, but is inserted the moment a user registers a new activity in the UI — and stays in the table indefinitely because the cleanup step only removes records that are soft-deleted, have status = 0, or have no follow_up_id.

Companies (record_type = 'company')

Source: companies c filtered by c.follow_up_id IS NOT NULL AND c.status != 0 AND cfs.source = 'company'.

Joined TableJoin ConditionFields Extracted
users u_salu_sal.id = c.salesmanexecutive_id, executive_name (fallback 'N/A')
locales ll.id = c.locale_idlocale_rate (fallback 1)
catalog_follow_up_statuses cfscfs.id = c.follow_up_idstage_name, stage_color
catalog_follow_up_statuses cfs_pcfs_p.id = cfs.parent_idparent stage wins: IFNULL(cfs_p.name, cfs.name)
company_monthly_tiers cmtcmt.company_id = c.id GROUPtier_value = MAX(total_shipments)
follow_up_data fudfud.company_id = c.idaccount_value, monthly_shipments, content_id
catalog_suggestions_package_content cspccspc.id = fud.content_idproduct_tag = translation_tag
follow_up_comments fuc_firstfuc_first.company_id = c.id GROUPfirst_activity_at = MIN(created_at)
follow_up_comments fuc_wonjoined with cfs_w.name = 'won' GROUPwon_at = MIN(created_at)
EXISTS(shipments s)s.company_id = c.id AND s.canceled = 0has_shipment (0 or 1)
EXISTS(payment_history ph)ph.company_id = c.idhas_recharge (0 or 1)

Prospects (record_type = 'prospect')

Source: prospection_users pu filtered by pu.follow_up_id IS NOT NULL AND pu.company_id IS NULL AND pu.status != 0 AND cfs.source = 'company'.

Key differences from companies:

FieldProspects source
executive_id / nameadministrators adm ON adm.user_id = pu.created_by + users a_user ON a_user.id = pu.created_by
tier_valuefud.monthly_shipments (no company_monthly_tiers)
first_activity_at / won_atprospect_follow_up_comments (column status_id, not follow_up_status_id)
has_shipment / has_rechargeAlways 0 — prospects have no shipment or recharge history

Derived Columns (computed in SQL at fetch time)

ColumnFormula
is_lost1 when stage_name IN ('lost', 'not_interested', 'customer_lost'), else 0
is_won1 when stage_name = 'won', else 0
closing_daysTIMESTAMPDIFF(DAY, first_activity_at, won_at)NULL when either date is missing
age_daysTIMESTAMPDIFF(DAY, created_at, NOW()) — recomputed on every sync
tier_labelComputed in JS (_tierLabel()) from tier_value; must match the SQL CASE in the materialized script

Tier Label Mapping

tier_value rangetier_label
0blank
1 – 10T0
11 – 100T1
101 – 350T2
351 – 750T3
751 – 1,500T4
> 1,500T5

This mapping lives in both _tierLabel() (Node.js sync job) and TIER_RANGES (frontend analytics.constants.js). Keeping them in sync is a deployment concern.

Analytics Computation

Query-Time Filters

All active filters are concatenated into a sharedFilters SQL string appended to every query inside computeCrmAnalytics().

FilterParameterSQL generated
Date (current)startDate, endDatecal.created_at BETWEEN '{start} 00:00:00' AND '{end} 23:59:59'
Date (previous)derivedRange of same length ending the day before startDate
LocalelocaleIds (CSV)cal.locale_id IN (id1, id2, …)
Executivesalesmancal.executive_id = {salesman}
Tiertier (CSV of 0–5)cal.tier_label IN ('T0', …)
StagestageVirtual-aware — see decision tree below
Productproductcal.product_tag = '{product}'

The date filter on the avg closing days query is rewritten to cal.won_at instead of cal.created_at — this measures closes that happened in the period, not leads that were created in it.

Previous period: rangeDays = round((end - start) / day) + 1. Previous end = start - 1 day. Previous start = prevEnd - (rangeDays - 1) days. All arithmetic uses local timezone (no UTC conversion).

Virtual-Stage Decision Tree

Used identically in the funnel breakdown (query 5), outcome summary (query 6), the stage drilldown filter, and the by-executive-stage chart.

When stage filter = customer_lostWHERE is_lost = 1. When stage filter = wonWHERE is_lost = 0 AND has_shipment = 1. When stage filter = in_rechargeWHERE is_lost = 0 AND has_shipment = 0 AND has_recharge = 1. When stage filter = any pipeline stage → WHERE stage_name = '{stage}' AND is_lost = 0 AND has_shipment = 0 AND has_recharge = 0.

This ensures that clicking a funnel row always returns exactly the same leads shown in that bucket.

13 Parallel Queries

All 13 queries run with Promise.all against crm_analytics_leads cal.

#NameFormula / SQL pattern
1Current leadsCOUNT(*) AS total_leads, SUM(is_lost) AS total_lost — current date filter
2Previous leadsSame, previous date filter → used for KPI delta display
3Current shipmentsSUM(monthly_shipments), SUM(account_value / NULLIF(locale_rate, 0)) / SUM(monthly_shipments) AS avg_cost (0 when denominator = 0)
4Previous shipmentsSame, previous date filter → delta
5Sales funnelCOUNT(*) GROUP BY virtual_stage (CASE priority: customer_lost > won > in_recharge > stage_name), ordered by count DESC; percentage = round(count / total * 1000) / 10
6Outcome summarySUM(is_lost), SUM(is_lost=0 AND has_shipment=1), SUM(is_lost=0 AND has_shipment=0 AND has_recharge=1), SUM(is_lost=0 AND has_shipment=0 AND has_recharge=0)
7Executive listDISTINCT executive_id, executive_name — locale filter only (no interactive filters, used to populate the salesman dropdown)
8Avg closing daysAVG(closing_days) WHERE closing_days IS NOT NULL — date filter on won_at
9In-recharge countCOUNT(*) WHERE has_recharge = 1 AND has_shipment = 0
10By executive stageGROUP BY executive_name: won = SUM(has_shipment), lost = SUM(is_lost=1 AND has_shipment=0), remaining = SUM(is_lost=0 AND has_shipment=0), total = COUNT(*), ordered by total DESC
11By tierCOUNT(*) GROUP BY tier_label — by default appends AND is_won=0 AND is_lost=0 (active only), but the restriction is dropped when stage IN ('won','customer_lost') so the chart reflects the active filter; ordered FIELD('blank','T0','T1','T2','T3','T4','T5')
12By productGROUP BY product_tag: lead_count = COUNT(*), total_value = SUM(account_value / NULLIF(locale_rate, 0)), ordered by total_value DESC
13Active leadsTOP 100 ORDER BY age_days DESC — by default appends AND is_won=0 AND is_lost=0 (active only); when stage IN ('won','customer_lost') the restriction is dropped so the table shows the leads that match the explicit stage filter. Returns record_id, record_type, record_name, executive_name, tier_label, stage_name, stage_color, age_days, last_activity_at

Post-Processing in JavaScript

After all 13 queries resolve:

Computed fieldFormula
advancement_rateround((won + in_recharge) / (won + lost + in_recharge + remaining) * 1000) / 10 — 0 when denominator is 0
Money/rate rounding2 decimals: round(x * 100) / 100
Percentage rounding1 decimal: round(x * 1000) / 10
Fallbacksexecutive_name || 'N/A', tier_label || 'blank', empty arrays when no rows

Lead Scoring

Lead scoring assigns a quality score of 0–10 based on how many signals are present. The same SQL expression (buildLeadScoreExpression) is usable in SELECT, WHERE, and HAVING clauses without duplication.

Criteria

#CriterionApplies to
1Account is verifiedClients only
2Has at least one shop in shopsClients only
3Has at least one row in payment_historyClients only
4type = 'client'Clients only
5total_shipments > 0Clients only
6Has row in about_yourself_company_answersClients only
1Has ecommerce_type_id setProspects only
2fud_monthly_shipments > 0Prospects only
3fud_channel IS NOT NULL OR fud_account_value IS NOT NULLProspects only

Formula

score = ROUND(criteria_met / 6 * 10, 2)

The denominator is always 6 (CRITERIA_COUNT). Prospects have only 3 criteria available, so their maximum score is 5.0. Clients have 6 criteria, reaching 10.0. This asymmetry is intentional — clients have richer signals available.

The JavaScript equivalent (computeScore(metCount)) returns the same value for pre-computed data.

Badge Thresholds (Frontend)

Score rangeBootstrap variantMeaning
≥ 8success (green)High-quality lead
≥ 5warning (yellow)Medium quality
≥ 3info (blue)Low quality
< 3secondary (grey)Very low / insufficient data

Database

Tables

TablePurpose
crm_analytics_leadsFlat materialized table with one row per prospect or company in the CRM pipeline. Primary source for all analytics queries.
crm_analytics_snapshotsPre-computed analytics payloads for the four standard date presets, keyed by MD5 hash.
crm_analytics_sync_logWatermark and metrics for each incremental sync run.

crm_analytics_leads Columns

ColumnTypeDescription
record_idINTSource entity ID (company or prospect)
record_typeENUM'company' or 'prospect'
record_nameVARCHARCompany name / prospect company name
created_atDATETIMEOriginal creation date of the source entity
executive_idINTAssigned salesman/executive ID
executive_nameVARCHARAssigned salesman name
locale_idINTLocale (country)
follow_up_idINTCurrent follow-up status ID
stage_nameVARCHARParent stage name (if parent exists) or current stage name
stage_colorVARCHARCSS class for stage badge
is_lostTINYINT1 when stage is lost/not_interested/customer_lost
is_wonTINYINT1 when stage is won
tier_valueINTRaw shipment count (for companies: max monthly; for prospects: declared monthly)
tier_labelVARCHARDerived tier bucket: blank, T0T5
has_shipmentTINYINT1 if at least one non-cancelled shipment exists (companies only)
has_rechargeTINYINT1 if at least one payment exists (companies only)
monthly_shipmentsINTDeclared monthly shipments from follow_up_data
product_tagVARCHARPackage content translation tag from catalog_suggestions_package_content
account_valueDECIMALEstimated account value in local currency
locale_rateDECIMALCurrency exchange rate to normalize account_value to USD
first_activity_atDATETIMETimestamp of the first follow-up note
won_atDATETIMETimestamp of the first won-stage note
last_activity_atDATETIMEMost recent IFNULL(scheduled_date, created_at) from follow-up notes where the date is <= NOW(). Updated by the sync cron and live on every note insert/edit.
closing_daysINTTIMESTAMPDIFF(DAY, first_activity_at, won_at)NULL if not yet won
age_daysINTDays since created_at — recomputed on every sync run
synced_atDATETIMELast time this row was written by the sync job

Entity Relationship

Key Decisions

DecisionReasoningAlternatives Considered
Materialized flat table (crm_analytics_leads) instead of querying source tables directlyThe original UNION ALL queries joining 5+ tables took several seconds at scale. A flat table reduces analytics queries to simple COUNT/SUM with WHERE filters.Indexed views (MySQL doesn't support them natively); query result caching at the app layer (harder to invalidate).
MD5 cache key includes only start|end|localeIds — interactive filters excludedAdding salesman/tier/stage/product to the key would create an exponential number of cache entries (combinatorial explosion). These filters are only used by managers in specific investigative sessions and hit live compute instead.Per-filter snapshot precomputation (too many combinations); no cache at all (acceptable only if the database could serve queries fast enough).
Mutually-exclusive virtual stagesThe sales funnel and the outcome pie chart must account for every lead exactly once. Without virtual stages, a lead with stage_name = 'won' that also has is_lost = 1 would appear in two buckets. The priority (lost > won > in_recharge > pipeline) makes each bucket deterministic.Separate charts per stage type (less coherent dashboard); allow overlap (double-counts in aggregates).
Rolling refresh of 10,000 oldest rowsSome changes (e.g., salesman reassignment on companies) do not update any related-table created_at timestamp, so dirty detection misses them. The rolling pass ensures every row is eventually re-fetched.Event-based triggers (MySQL triggers to write a change log — high write amplification); full table refresh on every sync (too slow at scale).
Watermark = sync start time − 2 minutesIf the sync job runs for, say, 90 seconds, any row inserted at second 45 would be missed by the next sync that starts reading from the job's end time. Subtracting 2 minutes ensures a safe overlap window at the cost of re-syncing a small number of rows.Use job end time as watermark (misses concurrent writes); use NOW() at moment of log write (same problem).
Tier thresholds kept in both JS and SQLThe _tierLabel() JS function (sync job) must produce the same labels as the SQL CASE expression (analytics materialized script). Centralizing them in a shared constant is not practical across two runtimes. A code comment documents this coupling.Single source of truth in a config file loaded by both (would require a build step for the SQL); compute tier_label only in SQL (removes the JS function, simpler if future tools only use the DB).
Prospects always have has_shipment = 0 and has_recharge = 0Prospects have no company_id, so they cannot have shipments or payment history. Hardcoding to 0 avoids a conditional sub-query and keeps the funnel logic uniform.Null instead of 0 (requires IS NULL checks throughout the analytics queries).
Cutoff date applies only to new-record discovery, not to refresh or cleanupThe cron must not mass-import the full historical database on every run, so CRM_ANALYTICS_CUTOFF_DATE is applied when fetching IDs that are not yet in the materialized table. However, rows already present (e.g., a pre-2025 client inserted via live upsert) must be kept fresh and must not be silently deleted. Removing the cutoff from the cleanup NOT EXISTS guard achieves this without additional bookkeeping.Apply cutoff everywhere (pre-2025 clients can never appear); apply cutoff nowhere (full historical scan on every cron run).

Dependencies

  • Internal: CRM module — source data for prospects and companies; catalog_follow_up_statuses catalog
  • Tables: companies, prospection_users, follow_up_data, follow_up_comments, prospect_follow_up_comments, catalog_follow_up_statuses, company_monthly_tiers, payment_history, shipments, catalog_suggestions_package_content, locales, shops, about_yourself_company_answers
  • Backend libraries: date.util.js (formatDateYMD, startOfMonth, subDays, subMonths)
  • Constants: backend/constants/crm.constants.js (CRM_ANALYTICS_BATCH_SIZE = 500, CRM_ANALYTICS_ROLLING_LIMIT = 10000), backend/constants/lead-scoring.constants.js (CRITERIA_COUNT = 6)

Envia Admin