How to Reduce False Positives in Dedupe Matching (Without Losing Real Duplicates)

April 23, 2026
Reading time: 8 minutes

You set up your deduplication job, hit run, and then... chaos. Records that are clearly different get flagged as duplicates, and your team spends the afternoon clicking "Not a duplicate" over and over again.

Sound familiar? It should. False positives are one of the most common complaints in data quality work, and they have a real cost. Too many of them, and people stop trusting the process. They ignore the alerts, skip the reviews, and your entire data quality effort quietly falls apart.

Here's the thing, though: false positives aren't a sign your deduplication tool is broken. They're almost always a sign that your matching configuration needs tuning. Here's exactly how to do that.

Start Tight, Not Loose

The most common mistake is starting with broad fuzzy matching across every field. It feels thorough, but it creates enormous noise. Records for "John Smith at Acme Corp" and "Jonathan Smithfield at Acme Construction" might score as probable duplicates when they clearly aren't the same person.

A better approach: start with exact-match filters on your most reliable fields. Email address is the gold standard. If two leads share the same email, they are almost certainly duplicates. Phone number is a solid second. Company domain works well for Account deduplication.

Once you have caught all the obvious matches, then introduce fuzzy logic to handle variations like typos and nickname differences. Build from certainty outward. Don't start fuzzy and try to tighten things up after the fact.
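
To make the "certainty outward" order concrete, here is a minimal Python sketch. It is not how Plauti Deduplicate works internally. The record layout, the find_pairs helper, and the 0.85 fuzzy cut-off are all assumptions for illustration. Exact email matches are settled first, and only the leftover pairs get a fuzzy name comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize_email(email):
    """Lowercase and trim so trivial formatting differences don't block an exact match."""
    return email.strip().lower() if email else ""

def find_pairs(records, fuzzy_threshold=0.85):
    """Return (exact_pairs, fuzzy_pairs) as lists of record-id tuples.

    Hypothetical helper: compares every pair, which is fine for a demo
    but far too slow for a real org.
    """
    exact_pairs, fuzzy_pairs = [], []
    for a, b in combinations(records, 2):
        email_a = normalize_email(a["email"])
        email_b = normalize_email(b["email"])
        # Pass 1: an identical, non-empty email is treated as a certain match.
        if email_a and email_a == email_b:
            exact_pairs.append((a["id"], b["id"]))
            continue
        # Pass 2: pairs not settled by the exact pass get a fuzzy name comparison.
        ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if ratio >= fuzzy_threshold:
            fuzzy_pairs.append((a["id"], b["id"]))
    return exact_pairs, fuzzy_pairs

records = [
    {"id": 1, "name": "John Smith", "email": "john@acme.com"},
    {"id": 2, "name": "J. Smith", "email": "JOHN@acme.com "},
    {"id": 3, "name": "Jon Smith", "email": ""},
    {"id": 4, "name": "Maria Lopez", "email": "maria@example.org"},
]
exact, fuzzy = find_pairs(records)
print(exact)  # pairs settled by the exact email pass
print(fuzzy)  # pairs surfaced only by the fuzzy name pass
```

Keeping the two result lists separate mirrors the advice above: the exact list is safe to act on, while the fuzzy list is where tuning and review effort belongs.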

Fix the Empty Field Rule First

This one is responsible for more false positives than anything else, and it is almost always overlooked.

By default, the Empty Field Rule in a Plauti Deduplicate scenario is set to Disregard. That sounds harmless, but what it actually means is this: if a field is empty, it gets ignored completely when calculating the matching score. So, if a contact has no phone number on file, that blank field doesn't lower their score at all.

The result? A record with just a first name and a postal code can match at 100% against another record with the same first name and postal code, even if they are completely different people.

The fix is simple. Go into your scenario settings and change the Empty Field Rule from Disregard to Score 0%. Now empty fields actively bring the score down. That single change will often cut your false positive count significantly on the very next job run.
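
A small worked example shows why this setting matters so much. The sketch below assumes the match score is a simple weighted average of per-field scores. That formula, the match_score function, and the field names are illustrative assumptions, not Plauti's documented calculation. The point is only how the two rules treat a blank field.

```python
def match_score(field_scores, weights, empty_rule="disregard"):
    """Weighted average of per-field scores on a 0-100 scale.

    field_scores maps field name -> score, or None when the field is
    empty on either record. The weighted-average formula is an
    assumption for illustration only.
    """
    total, weight_sum = 0.0, 0.0
    for field, weight in weights.items():
        score = field_scores.get(field)
        if score is None:
            if empty_rule == "disregard":
                continue  # empty field is left out of the calculation entirely
            score = 0.0   # "score_zero": empty field counts as a 0% match
        total += score * weight
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

weights = {"first_name": 1, "postal_code": 1, "last_name": 1, "email": 1}
# Both records share a first name and postal code; last name and email are blank.
pair = {"first_name": 100, "postal_code": 100, "last_name": None, "email": None}

disregard_score = match_score(pair, weights, "disregard")
score_zero_score = match_score(pair, weights, "score_zero")
print(disregard_score)   # 100.0
print(score_zero_score)  # 50.0
```

Under these assumptions, the same sparse pair drops from a perfect score to one that would fall below an 80% threshold, which is the effect the Score 0% setting is meant to have.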

Choose the Right Matching Method for Each Field

Not every field should use the same matching method. This is where a lot of configurations go wrong.

Email addresses should use Exact matching. If you apply fuzzy logic to an email field, you will start flagging records that share the same domain as potential duplicates. That's technically "similar," but it is not a meaningful match signal at all.

The same logic applies to external IDs, EINs, or any unique identifier your organization uses. Use Exact matching there. Assign those fields a high weight. If two records have different values in a field that is meant to be unique, they are almost certainly not the same record.

For names, fuzzy matching makes sense. A "First Name" field benefits from synonym matching so that Bob and Robert get caught. A "Last Name" field works well with a partial fuzzy approach to handle minor typos. Company names are trickier. "IBM" and "International Business Machines" are the same company, but a loose fuzzy match might also connect "IBM" to "IB Medical." The Company Name matching method in Plauti Deduplicate is specifically built for this kind of comparison and handles it far better than plain fuzzy text.

Match the algorithm to what the field actually contains. That single habit removes a large share of false positives before you even touch your threshold settings.

The Plauti Deduplicate matching methods guide covers which methods apply to which field types in detail.
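
To see why fuzzy logic misleads on email fields, consider this rough sketch. It uses Python's standard difflib as a stand-in for a generic fuzzy comparison, which is an assumption; Plauti's own fuzzy methods may score differently. The two addresses are made up and belong to different people at the same company.

```python
from difflib import SequenceMatcher

def exact_match(a, b):
    """Case-insensitive exact comparison, ignoring surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()

def fuzzy_similarity(a, b):
    """Generic character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

email_a = "j.smith@acme-corp.com"
email_b = "m.smythe@acme-corp.com"

is_exact = exact_match(email_a, email_b)
similarity = fuzzy_similarity(email_a, email_b)

print(is_exact)          # False: the addresses are different
print(similarity > 0.6)  # True: the shared domain dominates the comparison
```

The shared domain alone accounts for most of the characters in each address, so any character-level comparison will rate the pair as similar. Exact matching avoids that trap.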

Raise Your Threshold

If your scenario threshold is sitting at 60% or 70%, you are going to see a lot of noise. That range is broad enough to catch real duplicates, but also broad enough to pull in plenty of records that just happen to share a few common values.

Tightening your threshold to 80–85% is a good starting point for most scenarios. It requires a closer overall match before a pair gets surfaced for review. Combine this with the Empty Field Rule fix above, and you will see a meaningful drop in false positives right away.

For automated merging without manual review (Direct Processing), set your threshold higher still. 99–100% is recommended. You only want to auto-merge records when you are very confident. A wrong merge is much harder to undo than a missed duplicate.

Use Secondary Fields to Break Ties

A scenario built on two or three fields is fragile. Two records matching on Name and Record Type alone is not enough signal to call them duplicates with confidence.

Add more discriminating fields: postcode, email, phone, city, street, job title. The more concrete identifiers you include, the harder it becomes for two unrelated records to score above your threshold. When primary fields are close but not conclusive, secondary fields like State or Department can be the deciding factor.

Weight your most unique fields the highest. Email and phone should outweigh first name. Postcode should outweigh city. The hierarchy of your field weights is just as important as which fields you include.
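
Here is a quick illustration of how the weight hierarchy changes the outcome. As before, the weighted-average formula, the weighted_score function, and the specific weights are assumptions chosen for the example, not values taken from Plauti Deduplicate. The pair below shares a common name, city, and state but has different email addresses.

```python
def weighted_score(field_scores, weights):
    """Weighted average of per-field scores on a 0-100 scale (illustrative formula)."""
    total = sum(field_scores[field] * weight for field, weight in weights.items())
    return total / sum(weights.values())

# Two different people: same name, city, and state, but different emails.
pair = {"first_name": 100, "last_name": 100, "city": 100, "state": 100, "email": 0}

flat_weights = {"first_name": 1, "last_name": 1, "city": 1, "state": 1, "email": 1}
ranked_weights = {"first_name": 1, "last_name": 2, "city": 1, "state": 1, "email": 4}

flat = weighted_score(pair, flat_weights)
ranked = weighted_score(pair, ranked_weights)
print(flat)              # 80.0
print(round(ranked, 1))  # 55.6
```

With flat weights, the mismatched email is outvoted and the pair reaches an 80% threshold. Once email carries the most weight, the same pair scores well below it.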

Add a Frequent Words List

Some words show up in nearly every record of a certain type: legal entity suffixes like "Inc," "LLC," and "Ltd," and common industry terms like "Consulting," "Restaurant," or "Services." These words don't help distinguish one record from another, but if they're included in your matching logic, they push scores up and create false matches.

In Plauti Deduplicate, you can configure a Frequent Words list per field. Words on that list get excluded from the matching calculation. So with "Architects" on the list, "Willson Architects" compares against "Willson & Son" as "Willson" vs. "Willson & Son," which gives you a much more honest score.

One thing to verify: check that Activate Improved Frequent Words Handling is switched on. You'll find it in DC Setup > Settings > Critical Updates. Without it, frequent words may not be excluded correctly in all matching scenarios, even if you have already set up the list.
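
The effect of a Frequent Words list can be sketched in a few lines of Python. This is a simplified stand-in, not Plauti's implementation: the word list, the strip_frequent_words helper, and the use of difflib for similarity are all assumptions for the example.

```python
from difflib import SequenceMatcher

# Hypothetical frequent words list for a company name field.
FREQUENT_WORDS = {"inc", "llc", "ltd", "architects", "consulting", "services"}

def strip_frequent_words(name, frequent_words=FREQUENT_WORDS):
    """Lowercase the name and drop any token that is on the frequent words list."""
    tokens = [token.strip(".,").lower() for token in name.split()]
    return " ".join(t for t in tokens if t and t not in frequent_words)

def similarity(a, b):
    """Generic character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

name_a, name_b = "Apex Consulting LLC", "Summit Consulting LLC"

raw = similarity(name_a.lower(), name_b.lower())
cleaned = similarity(strip_frequent_words(name_a), strip_frequent_words(name_b))

print(strip_frequent_words("Willson Architects"))  # willson
print(raw >= 0.75)    # True: "Consulting LLC" inflates the raw score
print(cleaned < raw)  # True: only "apex" vs "summit" is compared
```

Two unrelated firms look alike as long as the shared filler words are counted. Once those are stripped, only the distinguishing part of each name is compared.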

Use Match Scores as a Gate, Not a Guarantee

Most deduplication tools give you a similarity score for each record pair. The temptation is to auto-merge everything above a certain threshold. That's where false positives turn into false merges, which is a much bigger problem.

A tiered workflow works far better:

  • Auto-merge only records that hit a very high confidence threshold on high-certainty fields, like an identical email address.
  • Queue for review everything in the middle range, so a human can confirm before any merge happens.
  • Auto-dismiss pairs that fall below your minimum threshold.

The records your team reviews should be genuinely ambiguous, not obvious non-duplicates. If your review queue is full of clearly different records, your lower threshold is too low.
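
The three tiers above can be sketched as a simple routing function. The route_pair name, the pair layout, and the default thresholds of 99 and 80 are assumptions for illustration (the numbers echo the ranges discussed in this article); adjust them to your own scenario.

```python
def route_pair(score, email_exact, auto_merge_at=99.0, review_at=80.0):
    """Return 'auto_merge', 'review', or 'dismiss' for one candidate pair."""
    # Auto-merge needs both a very high score and a high-certainty signal.
    if score >= auto_merge_at and email_exact:
        return "auto_merge"
    # Everything else at or above the review threshold goes to a human.
    if score >= review_at:
        return "review"
    # Anything below the minimum threshold is dropped.
    return "dismiss"

pairs = [
    {"ids": ("A1", "A2"), "score": 100.0, "email_exact": True},
    {"ids": ("B1", "B2"), "score": 99.5, "email_exact": False},
    {"ids": ("C1", "C2"), "score": 86.0, "email_exact": False},
    {"ids": ("D1", "D2"), "score": 62.0, "email_exact": False},
]

queues = {"auto_merge": [], "review": [], "dismiss": []}
for pair in pairs:
    queues[route_pair(pair["score"], pair["email_exact"])].append(pair["ids"])

print(queues["auto_merge"])  # [('A1', 'A2')]
print(queues["review"])      # [('B1', 'B2'), ('C1', 'C2')]
print(queues["dismiss"])     # [('D1', 'D2')]
```

Note that the second pair scores 99.5 but still goes to review, because the high score is not backed by an identical email. That is the "gate, not a guarantee" idea in practice.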

Let AI Handle the Gray Zone

Even with well-tuned scenarios, some record pairs will sit in that uncertain middle range. Not obvious duplicates, not obviously distinct either. That gray zone is where review time piles up fast.

AI Match Recommendation in Plauti Deduplicate is built specifically for this problem. After running a job, you can apply AI Match Recommendation to the results. It analyzes each duplicate pair across all fields in context and labels it as Duplicate, Not Duplicate, or Uncertain, with a confidence score and a reason.

Your team then focuses only on the pairs the AI flags as genuinely uncertain. Everything the AI calls clearly duplicate or clearly not a duplicate gets handled automatically. It doesn't replace human judgment. It focuses that judgment where it actually matters.

AI Match Recommendation is available in the Premium and PDM editions. When used alongside Auto Merge, it lets you safely lower your auto-merge threshold slightly (to around 97–98%) because the AI recommendation adds an extra layer of confidence.

When You Still Get False Positives, Use the Discard Feature

Even after tuning your scenario, some false positives will come through. That's normal.

When you come across a pair that clearly isn't a duplicate, mark it as False Positive in the DC Job results or DC Live. That specific pair won't appear in any future job results. Each discard is specific to that pair, so both records remain eligible to match with other records going forward.

If your matching criteria change later and you want to reconsider previously discarded pairs, you can remove them from the DC Discards tab. They'll re-enter the matching pool on the next job run.
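
Conceptually, a discard is just a remembered pair. The sketch below models that idea with an order-independent key so that (A, B) and (B, A) count as the same pair. It is a mental model only: the pair_key and filter_discarded helpers and the record IDs are hypothetical, and nothing here reflects how Plauti stores discards.

```python
def pair_key(id_a, id_b):
    """Order-independent key: (A, B) and (B, A) identify the same pair."""
    return frozenset((id_a, id_b))

def filter_discarded(candidate_pairs, discards):
    """Keep only candidate pairs that were not previously marked as false positives."""
    return [pair for pair in candidate_pairs if pair_key(*pair) not in discards]

# One pair was marked as a false positive in an earlier review.
discards = {pair_key("003A", "003B")}

candidates = [("003B", "003A"), ("003A", "003C")]
remaining = filter_discarded(candidates, discards)
print(remaining)  # [('003A', '003C')]

# Removing the discard lets the pair surface again on the next run.
discards.discard(pair_key("003A", "003B"))
restored = filter_discarded(candidates, discards)
print(restored)   # [('003B', '003A'), ('003A', '003C')]
```

Record 003A still matches 003C in the first run, which reflects the point above: a discard suppresses one specific pair, not the records in it.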

Review, Adjust, Repeat

Deduplication configuration is not a one-time project. Your data changes. Your processes change. What worked six months ago might generate more noise today.

Build a habit of reviewing your false positive rate after each job run and adjusting one variable at a time. Change one field's matching method, or raise the threshold by five points, then run the job again and compare. If you change too many things at once, you won't know what actually helped.
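
If you want a number to compare between runs, a simple false positive rate over the reviewed pairs is enough. The sketch below assumes you can export review outcomes as a list of true/false values. That export format and the false_positive_rate helper are hypothetical, and the sample data is made up.

```python
def false_positive_rate(reviewed_pairs):
    """Share of reviewed pairs that reviewers marked as not a duplicate."""
    if not reviewed_pairs:
        return 0.0
    false_positives = sum(1 for is_duplicate in reviewed_pairs if not is_duplicate)
    return false_positives / len(reviewed_pairs)

# One entry per reviewed pair: True = confirmed duplicate, False = "Not a duplicate".
run_before = [True, False, False, True, False, True, False, True, True, False]
run_after = [True, True, False, True, True, True, False, True]

before = false_positive_rate(run_before)
after = false_positive_rate(run_after)
print(f"{before:.0%} -> {after:.0%}")  # 50% -> 25%
```

Record the one setting you changed next to each pair of numbers. That way a drop or a rise in the rate can be traced back to a specific adjustment.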

The Plauti Deduplicate scenario setup guide is a useful reference to keep open while you're tuning. And if you're starting from scratch, doing a record analysis first (listing your objects, identifying your most discriminating fields, and planning your weight hierarchy before you build anything) saves a lot of back-and-forth later.

Quick Reference: Common Causes and Fixes

Problem | Likely cause | Fix
Records matching on a single field | Empty Field Rule set to Disregard | Change to Score 0%
Email domains flagged as duplicates | Fuzzy matching on email field | Switch to Exact matching
Common company names causing noise | No Frequent Words list | Add Frequent Words in Premium
Too many pairs in review queue | Threshold too low | Raise to 80–85%
Uncertain pairs taking too long to review | Manual review only | Enable AI Match Recommendation
Previously dismissed pairs reappearing | Discard not applied correctly | Mark as False Positive in DC Job/DC Live

Frequently Asked Questions (FAQ)

What is a false positive in deduplication?

A false positive is when two records are flagged as potential duplicates but are actually distinct records.

It happens when matching scores get inflated by configuration settings, broad field matching, or insufficient matching criteria.

Why does the Empty Field Rule cause false positives?

When the Empty Field Rule is set to Disregard, empty fields are excluded from the score calculation entirely. This means records with fewer filled fields can score very high on the fields they do share, even if they're clearly different records overall. Setting it to Score 0% ensures empty fields bring the match score down.

What matching method should I use for email addresses?

Always use Exact matching for email addresses. Fuzzy matching on email will flag records that share only a domain as potential duplicates, which is not a reliable signal of duplication.

What threshold should I set to reduce false positives?

For manual review jobs, 80–85% is a solid starting point. For automated merging (Direct Processing), 99–100% is recommended. With AI Match Recommendation enabled, you can safely lower auto-merge thresholds slightly to around 97–98%.

What is Plauti's AI Match Recommendation?

It is a feature in the Premium and PDM editions of Plauti Deduplicate that analyzes job results using AI and labels each pair as Duplicate, Not Duplicate, or Uncertain. It helps your team focus review time on genuinely ambiguous pairs rather than working through the entire result set manually.

Can I undo a False Positive discard?

Yes. Discarded pairs are stored in the DC Discards tab. Removing a pair from there allows it to appear again in future job results.
