Standard Deviation

Plauti Context Analysis Types

Last published at: May 12th, 2026

Standard Deviation Analysis calculates the mean, standard deviation, and coefficient of variation for numeric fields.

 

With the Standard Deviation analysis, you analyze how spread out the values are in a numeric field. Standard Deviation analysis calculates:

  • Mean: the average value across all records with a value.
  • Standard Deviation (SD): how far the values typically deviate from the mean.
  • Coefficient of Variation (CV): standard deviation relative to the mean. (Standard Deviation / Mean)
    A higher CV indicates greater relative variability in the field's values, whereas with a lower CV, values are more tightly clustered around the mean.
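
As an illustration only, the three figures can be reproduced with Python's standard library. The `describe` helper and the sample values are hypothetical, and the sketch assumes the population form of the standard deviation; the analysis itself may use the sample form, which gives slightly larger values for small record counts.

```python
from statistics import fmean, pstdev

def describe(values):
    """Return (mean, sd, cv) for the records that have a value."""
    present = [v for v in values if v is not None]  # skip empty records
    if not present:
        raise ValueError("field has no values")
    mean = fmean(present)
    sd = pstdev(present)  # assumption: population SD, not sample SD
    cv = sd / mean if mean != 0 else 0.0  # a zero mean is reported as CV 0
    return mean, sd, cv

mean, sd, cv = describe([2, 4, 4, 4, 5, 5, 7, 9, None])
print(mean, sd, cv)  # 5.0 2.0 0.4
```

Here the standard deviation is 40% of the mean, so the CV is 0.4.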

Use this analysis to find out which fields have high variability and may need validation rules, range checks, or process changes. The Coefficient of Variation lets you compare different fields on the same scale, even if they have different means.

Standard Deviation Analysis focuses on spread and variability.
For “which values dominate this field?”, use Value Distribution.
For “how many records share the same value?”, use Duplicate Value Rate.

 

Detailed Job Results

The bar chart ranks fields by their Coefficient of Variation (CV), which represents the standard deviation relative to the mean. A higher CV indicates greater relative variability in the field's values. Fields with a mean of zero show a CV of zero. 

Key Insights

  • Value spread: Mean and Standard Deviation show whether values sit in a narrow band or spread across a wide range, and whether most records are close to the average or some are far off.
    This is helpful for questions like “Are opportunity amounts relatively similar, or all over the place?” or “Is this score field effectively differentiating records?”.
  • Relative variability: The CV normalizes spread by mean, allowing comparison across fields with different means.
    • A CV of 0.1 indicates that values vary slightly around the mean.
    • A CV of 1.0 indicates that the standard deviation equals the mean, i.e. very high variability.

This lets you compare different fields on the same scale, even if one has a mean of 10 and another a mean of 10,000.
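
A small sketch of why the CV is comparable across scales: two hypothetical fields with the same relative spread but means of 10 and 10,000 have very different standard deviations and the same CV. The `cv` helper is illustrative and assumes the population standard deviation.

```python
import math
from statistics import fmean, pstdev

def cv(values):
    """Coefficient of variation; a zero mean is reported as 0."""
    mean = fmean(values)
    return pstdev(values) / mean if mean != 0 else 0.0

small = [8, 9, 10, 11, 12]          # mean 10
large = [v * 1000 for v in small]   # mean 10,000, same relative spread

print(pstdev(small), pstdev(large))        # SDs differ by a factor of 1000
print(math.isclose(cv(small), cv(large)))  # True: the CVs match
```

Ranking fields by SD alone would put the second field far ahead of the first, while ranking by CV treats them as equally variable.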

  • Potential outliers: a very high standard deviation or CV can indicate a few extreme outliers (e.g. accidentally added zeros), mixed units (e.g. minutes vs. hours, full numbers vs. units of thousands), or other occasional data entry errors that skew the distribution.
  • Field quality:
    • Low variability can mean a stable process or overuse of the same default value (verify with the Value Distribution analysis).
    • High variability can mean real business diversity or a lack of validation.
  • Analytics readiness: Fields with reasonable spread are more informative for scoring and segmentation; extreme variability might require cleanup first.
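
To illustrate the outlier point above, the following sketch shows how a single data entry error can dominate the result. The values are made up: one amount is entered with two extra zeros. The `cv` helper is hypothetical and assumes the population standard deviation.

```python
from statistics import fmean, pstdev

def cv(values):
    """Coefficient of variation; a zero mean is reported as 0."""
    mean = fmean(values)
    return pstdev(values) / mean if mean != 0 else 0.0

clean = [100, 110, 95, 105, 90, 100, 105, 95]
with_typo = clean[:-1] + [9500]  # last value entered as 9500 instead of 95

print(round(cv(clean), 2))      # roughly 0.06: values sit close to the mean
print(round(cv(with_typo), 2))  # roughly 2.4: one record inflates the CV
```

A CV well above 1.0 on a field that is expected to be stable is therefore a reason to inspect the highest and lowest records before concluding that the data is genuinely diverse.
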

Scenarios and Actions

High SD or CV (CV around 1.0 or more)
  • Apply the Value Distribution analysis to identify extreme values
  • Inspect records at low and high ends
  • Check for data entry errors, mixed units, or incorrect imports
  • Correct or exclude obvious outliers
  • Add or tighten validation rules for min/max ranges and default values
  • Standardize units and scales
Very low variability
  • Apply the Value Distribution analysis to check if one value dominates
  • If one value dominates, reconsider field configuration or defaults
  • If low variability is expected, document the expected range for the field
Designing validation rules
  • Derive minimum and maximum bounds from the analysis (e.g., flag values beyond mean ± 3×SD)
  • Implement validation rules and warning messages
Preparing for reporting/routing/AI
  • Review Mean, SD, and CV before using the field in critical thresholds or models
  • Implement winsorizing* or bucketed transformations if variability is too high
  • Document field health in your data dictionary

* Winsorizing handles extreme outliers by capping values at a chosen threshold rather than removing them: each outlier is replaced with the minimum or maximum threshold value. This reduces the impact of extreme values on averages, models, and reports while preserving the record (unlike deletion), so you don't lose data points. It is especially useful for AI/scoring models, where extreme values can distort training.
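
The mean ± 3×SD bounds and the winsorizing step can be sketched as follows. This is a minimal example under stated assumptions, not a description of how the product implements anything: `bounds` and `winsorize` are hypothetical helpers, the population standard deviation is used, and the bounds are derived from a sample that has already been reviewed. Bounds computed from data that still contains extreme outliers are themselves inflated, and in small samples an outlier may not exceed 3×SD at all.

```python
from statistics import fmean, pstdev

def bounds(values, k=3.0):
    """Lower and upper bounds at mean ± k × SD."""
    mean, sd = fmean(values), pstdev(values)
    return mean - k * sd, mean + k * sd

def winsorize(values, lower, upper):
    """Cap values at the bounds instead of dropping the records."""
    return [min(max(v, lower), upper) for v in values]

reviewed = [100, 110, 95, 105, 90, 100, 105, 95]  # hypothetical cleaned sample
lower, upper = bounds(reviewed)                   # about 81.6 and 118.4

incoming = [98, 104, 9500, 3]
flagged = [v for v in incoming if not lower <= v <= upper]
capped = winsorize(incoming, lower, upper)

print(flagged)      # [9500, 3]
print(len(capped))  # 4: every record is kept, extremes are capped
```

The same bounds can serve as the min/max range of a validation rule, while the capped list is what you would feed into a report or model.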