Duplicate Value Rate measures the percentage of duplicate values in text and lookup fields.
Duplicate Value Rate measures what percentage of records share their value with at least one other record, analyzed per field. In the results, "# Different Values" is the count of distinct values found, and "# Records with Duplicate Value" is the number of records that share their value with at least one other record. Blank values are excluded, and values are trimmed of whitespace before comparison.
This analysis type helps to identify fields with high redundancy that may benefit from deduplication or normalization efforts.
For checking whether all values in a field are unique across all records, use Uniqueness Detection instead.
Configuration
In Library Analyses you might want to disable this analysis type for fields where it's normal and expected to have duplicate values, such as City, Country, Account Owner, etc.
Indicate whether the check for duplicate values should be case sensitive or not. When Case Sensitive is enabled, values aa and Aa will be considered two separate, unique values.
Set thresholds for what constitutes a good, warning-level, or critical duplicate value rate. For many fields that you apply a Duplicate Value Rate analysis to, you'll want to have as few duplicate values as possible. This would mean, for example, that a good duplicate value rate is 10% or less, the warning level is between 10% and 30%, and more than 30% duplicates is critical.
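As an illustration of how such thresholds partition the rate, here is a minimal Python sketch. The function name `classify_duplicate_rate` and its parameters are hypothetical and not part of the product; the defaults simply mirror the example values above (10% and 30%), and treating exactly 30% as warning level is an assumption based on "more than 30%" being critical.

```python
def classify_duplicate_rate(rate_pct, good_max=10.0, warning_max=30.0):
    """Map a duplicate value rate (in percent) to 'good', 'warning', or 'critical'.

    Hypothetical helper: the thresholds are the example values from the text,
    not fixed product defaults.
    """
    if not 0.0 <= rate_pct <= 100.0:
        raise ValueError("rate_pct must be between 0 and 100")
    if rate_pct <= good_max:
        return "good"       # 10% or less
    if rate_pct <= warning_max:
        return "warning"    # above 10%, up to and including 30%
    return "critical"       # more than 30%


for rate in (4.0, 10.0, 22.5, 30.0, 61.0):
    print(rate, classify_duplicate_rate(rate))
```

For a field where duplicates are normal, you would typically raise both thresholds or disable the analysis for that field, as described above.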

Detailed Job Results
# Records with Value is the number of records where this field contains a value.
# Different Values is the count of distinct values found. For example, in a set of six records with values [A, A, B, C, C, C], the number of different (distinct) values is three: A, B, and C.
# Records with Duplicate Value is the number of records that share a value with at least one other record for this field. Blank values are excluded, and values are trimmed of whitespace before comparison.
In the bar chart, this is displayed as "# Records with Duplicate".
Note that in a set of duplicate records, one is always considered the "master" record of which the others are duplicates. That "master" record is not counted, so a group of n records that share a value adds n - 1 to the count. For example, in a set of 26 records that all have the same value for a certain field, "# Records with Duplicate Value" will be 25, not 26.
% Duplicate Value Rate calculates the number of records with a duplicate value for this field as a percentage of all analyzed records that contain a value for this field.
In the bar chart, this is displayed as “% Duplicate Rate”.
Note that because the "master" record is not counted in "# Records with Duplicate Value", the "% Duplicate Value Rate" will never be 100%. For the 26 records in the example above, the rate is 25 / 26, or about 96.2%.
Case Sensitive indicates whether case sensitivity was enabled or not. Where this column displays a check mark, a distinction was made between lower and upper case when searching for duplicate values. In that case, Arnhem and arnhem would be considered two different values.
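To make the relationship between these columns concrete, the following Python sketch recomputes them for a single field. It is an approximation under stated assumptions, not the product's implementation: the function name `duplicate_value_stats` is hypothetical, blanks are taken to mean `None` or whitespace-only strings, case-insensitive comparison is approximated with `str.casefold()`, and the "master" rule is applied by counting every group of n identical values as n - 1 duplicates.

```python
from collections import Counter


def duplicate_value_stats(values, case_sensitive=False):
    """Recompute the result columns for one field (illustrative sketch only)."""
    # Trim whitespace and exclude blank values before comparison.
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    if not case_sensitive:
        # Assumption: case-insensitive matching behaves like casefold().
        cleaned = [v.casefold() for v in cleaned]

    counts = Counter(cleaned)
    records_with_value = len(cleaned)
    different_values = len(counts)
    # "Master" rule: one record per group of identical values is not counted.
    records_with_duplicate = sum(n - 1 for n in counts.values())
    rate = (
        100.0 * records_with_duplicate / records_with_value
        if records_with_value
        else 0.0
    )
    return {
        "records_with_value": records_with_value,
        "different_values": different_values,
        "records_with_duplicate_value": records_with_duplicate,
        "duplicate_value_rate_pct": rate,
    }


# 26 records with the same value: 25 duplicates, so the rate stays below 100%.
print(duplicate_value_stats(["X"] * 26))

# Case sensitivity changes whether Arnhem and arnhem count as one value.
print(duplicate_value_stats(["Arnhem", "arnhem"], case_sensitive=True))
print(duplicate_value_stats(["Arnhem", "arnhem"], case_sensitive=False))
```

Under these assumptions, the number of duplicate records always equals the number of records with a value minus the number of different values, which is why a field with at least one value can never reach a 100% rate.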
Key Insights
- Duplicate Value Rate: see whether many records share the same values, or whether most records have distinct values.
- Level of non-uniqueness: a field can be non-unique (it fails Uniqueness Detection) but still have a low Duplicate Value Rate, with only a few records holding duplicate values. Alternatively, it can be widely duplicated (high Duplicate Value Rate), which makes it clearly unsuitable as a key or matching field.
- Input behavior: a high Duplicate Value Rate can indicate:
  - Default or placeholder values being reused over and over
  - Integrations writing the same value for many records
  - Users reusing codes or values intended to be unique
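One way to investigate these causes outside the tool is to list the values that repeat most often in an exported field, since a single dominant value often points to a default or placeholder. The sketch below is a hypothetical helper (`top_repeated_values` is not a product feature) and assumes the field values are available as a plain list of strings, compared case-sensitively after trimming.

```python
from collections import Counter


def top_repeated_values(values, limit=5):
    """Return (value, count, share_pct) for the most frequent repeated values."""
    # Same cleaning as the analysis describes: trim whitespace, skip blanks.
    cleaned = [v.strip() for v in values if v is not None and v.strip()]
    total = len(cleaned)
    counts = Counter(cleaned)
    return [
        (value, n, 100.0 * n / total)
        for value, n in counts.most_common(limit)
        if n > 1  # values that occur once are not duplicates
    ]


sample = ["N/A", "N/A", "N/A", "X1", " N/A ", ""]
for value, count, share in top_repeated_values(sample):
    print(f"{value!r}: {count} records ({share:.1f}% of records with a value)")
```

A value that accounts for a large share of all records with a value is a candidate for review as a default, a placeholder, or an integration artifact.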
Recommended Actions
| Scenario | Actions |
|---|---|
| Field should be unique, but has a Duplicate Value Rate above 0% | |
| Field not intended to be unique, and Duplicate Value Rate above 0% | |
| Protecting reporting, routing and AI | |