CRISP-DM

Also known as: Cross-Industry Standard Process for Data Mining

Cross-industry standard process for data mining and data science projects, structured in six iterative phases.

Updated: 2026-01-04

Definition

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a process model developed in the late 1990s by a consortium of European companies (SPSS, NCR, DaimlerChrysler, OHRA) and published as CRISP-DM 1.0 in 2000. It structures data mining and data science projects into six cyclical phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

Despite its age, CRISP-DM remains the most widely adopted framework: a 2014 KDnuggets poll reported 43% adoption, more than double any alternative. Its strength is being both industry-agnostic and tool-agnostic.

The six phases

1. Business Understanding (15-20% of time)

  • Define business objectives: what does the organization want to achieve?
  • Translate into analytics objectives: what question to answer with data?
  • Assess situation: available resources, constraints, risks
  • Define success criteria: measurable metrics

2. Data Understanding (20-25% of time)

  • Collect initial data: identify and access data sources
  • Describe data: volume, format, coverage, data dictionary
  • Explore data: descriptive statistics, visualizations, correlations
  • Verify data quality: completeness, accuracy, outliers
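
These quality checks can be sketched with the Python standard library alone; the `ages` sample and the 2-sigma outlier rule below are illustrative assumptions, not part of the framework:

```python
import statistics

# Hypothetical sample of a numeric column (e.g. customer age); None marks missing values.
ages = [34, 45, None, 29, 51, 38, None, 95, 41, 33]

present = [a for a in ages if a is not None]
completeness = len(present) / len(ages)      # share of non-missing records
mean_age = statistics.mean(present)
stdev_age = statistics.stdev(present)

# Flag values more than 2 sample standard deviations from the mean as outlier candidates.
outliers = [a for a in present if abs(a - mean_age) > 2 * stdev_age]

print(f"completeness={completeness:.0%} mean={mean_age:.2f} outliers={outliers}")
```

On this sample the check reports 80% completeness and flags 95 as the only outlier candidate.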

3. Data Preparation (50-70% of time)

  • Select data: choose relevant variables and records
  • Clean data: handle missing values, outliers, duplicates
  • Construct data: feature engineering, aggregations, derive new features
  • Integrate data: merge from different sources
  • Format data: transform for modeling tools (normalization, encoding)
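
A minimal sketch of the cleaning, formatting, and encoding steps on illustrative records (the column names and values are assumptions for the example):

```python
# Illustrative records with a missing value and a categorical column.
rows = [
    {"age": 34, "city": "Rome"},
    {"age": None, "city": "Milan"},
    {"age": 58, "city": "Rome"},
]

# Clean: impute missing ages with the mean of the observed values.
observed = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(observed) / len(observed)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# Format: min-max normalize age to [0, 1] for the modeling tool.
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age_norm"] = (r["age"] - lo) / (hi - lo)

# Construct: one-hot encode the categorical 'city' column.
for c in sorted({r["city"] for r in rows}):
    for r in rows:
        r[f"city_{c}"] = 1 if r["city"] == c else 0
```

In practice these steps are done with dedicated tooling (pandas, dbt, Spark); the sketch only makes the transformations concrete.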

4. Modeling (10-20% of time)

  • Select modeling technique: regression, classification, clustering, etc.
  • Design test plan: train/validation/test split, cross-validation
  • Build model: train algorithms with optimal parameters
  • Assess model: accuracy, precision, recall, F1, AUC, etc.
  • Iterate: return to data preparation if performance insufficient
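
The test plan and assessment can be sketched end to end; the nearest-centroid model and the synthetic 1-D data below are illustrative stand-ins for a real modeling technique:

```python
import random

random.seed(0)
# Synthetic 1-D binary classification data: two Gaussian clusters.
data = [(random.gauss(0, 1), 0) for _ in range(50)] + \
       [(random.gauss(3, 1), 1) for _ in range(50)]
random.shuffle(data)

# Test plan: 80/20 hold-out split.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Train" a nearest-centroid classifier: one centroid per class.
centroids = {}
for label in (0, 1):
    xs = [x for x, y in train if y == label]
    centroids[label] = sum(xs) / len(xs)

def predict(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Assess: accuracy on the held-out test set.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

A real project would swap in cross-validation and a proper algorithm, but the train/assess separation is the same.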

5. Evaluation (5-10% of time)

  • Evaluate results: does the model meet business success criteria?
  • Review process: identify skipped or to-be-reviewed steps
  • Determine next steps: deployment, new iterations, or project termination
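
Tying model metrics back to the business success criteria defined in phase 1 might look like this (the labels and thresholds are assumed for illustration):

```python
# Illustrative test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Business success criteria agreed with stakeholders (assumed values).
criteria = {"precision": 0.75, "recall": 0.70}
deploy = precision >= criteria["precision"] and recall >= criteria["recall"]
print(f"precision={precision:.2f} recall={recall:.2f} deploy={deploy}")
```

The point of the Evaluation phase is exactly this comparison: a model with good technical metrics still fails if it misses the agreed business criteria.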

6. Deployment (5-10% of time)

  • Plan deployment: how to put into production (batch, real-time, embedded)
  • Plan monitoring: how to monitor performance and data drift
  • Produce final report: document findings and recommendations
  • Review project: lessons learned for future projects
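
A minimal sketch of post-deployment drift monitoring, assuming a single numeric feature and a 3-sigma alert threshold (both the samples and the threshold are illustrative):

```python
import statistics

# Baseline feature distribution captured at training time vs. a production sample.
train_sample = [0.1, 0.3, 0.2, 0.4, 0.3, 0.2, 0.1, 0.3]
prod_sample  = [0.6, 0.8, 0.7, 0.9, 0.7, 0.8, 0.6, 0.7]

baseline_mean = statistics.mean(train_sample)
baseline_std = statistics.stdev(train_sample)
prod_mean = statistics.mean(prod_sample)

# Alert if the production mean drifts more than 3 baseline standard deviations.
z = abs(prod_mean - baseline_mean) / baseline_std
drift_alert = z > 3
print(f"z={z:.1f} drift_alert={drift_alert}")
```

Production systems typically use distribution-level tests (e.g. population stability index or Kolmogorov-Smirnov) rather than a single mean, but the monitoring loop is the same.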

Iterative nature

CRISP-DM is not a waterfall process. The arrows in the circular diagram indicate that you can return to previous phases:

  • Modeling reveals data quality issues → back to Data Preparation
  • Evaluation shows insufficient model → back to Modeling or Data Understanding
  • Deployment discovers edge cases → back to Data Preparation or Business Understanding

The outer cycle (from Deployment to Business Understanding) represents successive projects that refine the solution.

Modern adaptations

CRISP-ML(Q): a 2020 extension for production ML that adds a monitoring and maintenance phase and quality-assurance tasks throughout. It addresses ML-specific concerns such as model drift, retraining, and A/B testing.

Agile Data Science: integrates CRISP-DM with Agile sprints. Each sprint executes a mini CRISP-DM cycle, delivering an increment of value. Favored by teams adopting DataOps.

TDSP (Team Data Science Process) by Microsoft: more prescriptive version with templates, checklists, and Azure-specific tooling. Emphasis on collaboration and reproducibility.

Practical considerations

Data Preparation dominates: 50-70% of time goes into this phase. Underestimating this effort is a common cause of project delays. Investing in upfront data quality (data governance, catalogs) reduces this overhead.

Business Understanding is critical: projects that start from “we have data, let’s find insights” (data-first) fail more often than those that start from a business problem. CRISP-DM forces you to start from business understanding.

Deployment is often overlooked: many projects end as Jupyter notebooks or PowerPoint reports. CRISP-DM is a reminder that value is realized only through deployment and user adoption.

Skill gap: CRISP-DM requires both technical skills (modeling, data engineering) and business skills (domain knowledge, stakeholder management). Junior data scientists tend to over-focus on modeling.

Alternatives and comparisons

SEMMA (Sample, Explore, Modify, Model, Assess): SAS’s process model; more tool-specific, with less emphasis on business understanding.

KDD (Knowledge Discovery in Databases): academic predecessor of CRISP-DM, more theoretical and less practical.

Agile/Lean: complementary frameworks. CRISP-DM defines “what to do”, Agile defines “how to organize the team”. Many orgs combine CRISP-DM with sprints and retrospectives.

Common misconceptions

“CRISP-DM is waterfall”

No. The phases are iterative. You regularly return to previous phases when discovering new information. The circular diagram represents this cyclicality.

“CRISP-DM is obsolete, superseded by Agile”

False. CRISP-DM and Agile operate at different levels. CRISP-DM structures analytical workflow, Agile structures team and delivery. They complement each other.

“CRISP-DM ignores production and monitoring”

No. The Deployment phase explicitly includes monitoring and maintenance planning. Many projects neglect this phase, but the framework includes it.

“CRISP-DM is only for classical data mining, not for deep learning”

Not true. The principles (understand business, prepare data, model, evaluate, deploy) apply to any ML approach, including deep learning. CRISP-ML(Q) modernizes specific details.

Related terms

  • DataOps: methodology to accelerate CRISP-DM through automation
  • Agile Software Development: framework for organizing iterative sprints
  • LLM: modern approach built on the machine learning foundations that CRISP-DM guides
  • DevOps: parallel discipline for software deployment, related to CRISP-DM’s Deployment phase

Sources

  • Chapman, P. et al. (2000). CRISP-DM 1.0: Step-by-step data mining guide
  • Provost, F. & Fawcett, T. (2013). Data Science for Business
  • KDnuggets (2014). “Poll: What main methodology are you using for your analytics, data mining, or data science projects?”
  • Studer, S. et al. (2020). “Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology”