Deduplicating 500,000+ Dynamics 365 Records at Scale

April 14, 2026
Reading time: 7 minutes

Five hundred thousand records spread across multiple business units is not an edge case. It's the reality for mid- to large-sized enterprises running Dynamics 365 as their system of record. And at that scale, the question of deduplication becomes less about whether you have duplicates and more about how bad it already is, and what you can actually do about it without grinding your system to a halt.

Across organizations using Plauti Deduplicate, a significant share of incoming records are flagged as duplicates, in many cases exceeding 40%. At 500,000 records, that could mean over 200,000 records you can't fully trust. And if your organization spans multiple business units with separate data entry points, the duplication rate compounds across every integration, import, and manual entry.

Why Dynamics 365’s Native Tools Can’t Handle This

Dynamics 365 ships with built-in duplicate detection. For small datasets and simple scenarios, it works well enough. For 500,000 records across business units, it falls short in ways that matter.

Native duplicate detection in D365 has a hard cap of 5,000 duplicates per bulk detection job. If your dataset exceeds that threshold, and at this scale it will, the job either stops or returns incomplete results. That's not a configuration issue. It's a platform limit.

There's also no fuzzy matching. Native rules support exact matching, first-N-character matching, and last-N-character matching. That means "Jon Smith" and "Jonathan Smith" won't be flagged as the same person, even if they share an email, a phone number, and an account. In real enterprise data, records rarely duplicate cleanly. People type names differently, companies get entered under multiple abbreviations, and phone numbers appear with or without country codes.
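To see why this matters, here is a minimal Python sketch of the comparison styles native rules support (this is an illustration of the concept, not D365's actual implementation):

```python
# Native D365 rules compare exact strings or fixed-length prefixes/suffixes,
# so near-matches like "Jon Smith" vs. "Jonathan Smith" are never flagged.

def exact_match(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()

def first_n_match(a: str, b: str, n: int) -> bool:
    # First-N-character comparison, as supported by native rules.
    return a.strip().lower()[:n] == b.strip().lower()[:n]

a, b = "Jon Smith", "Jonathan Smith"
print(exact_match(a, b))       # False: strings are not identical
print(first_n_match(a, b, 4))  # False: "jon " vs. "jona"
```

Loosening the prefix length (say, to 3 characters) makes these two match, but it would also match "Jon Brown" to "Jonathan Smith", which is exactly the precision problem prefix rules cannot escape.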

And once duplicates are found? There's no bulk merge. Each record pair requires individual review and manual action. At 500,000 records, that's not a workflow. It's a full-time job for months.

What Actually Matters at This Volume

When evaluating deduplication solutions for large Dynamics 365 deployments, four capabilities determine whether a tool is viable.

Fuzzy and phonetic matching

The tool needs to catch near-matches, not just exact ones. "Acme Corp" and "ACME Corporation" should resolve to the same entity. So should slight name variations, transposed digits in phone numbers, and alternate email formats. Plauti Deduplicate for Dynamics 365 supports a full library of matching methods, including fuzzy text matching, Company Name algorithms that absorb abbreviations and suffixes, Person Name matching that combines phonetic comparison with Levenshtein distance, and Phone Number normalization that treats "(415) 555-2761" and "+1 415 555 2761" as identical.
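As a rough illustration of two of these techniques, edit distance for near-match names and digit normalization for phone numbers, here is a minimal Python sketch. It shows the underlying ideas only; it is not Plauti's implementation:

```python
import re

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def name_similarity(a: str, b: str) -> float:
    # Normalize edit distance into a 0.0-1.0 similarity score.
    a, b = a.lower(), b.lower()
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)

def normalize_phone(raw: str, country_code: str = "1") -> str:
    # Strip punctuation, then drop a leading country code if present.
    digits = re.sub(r"\D", "", raw)
    if digits.startswith(country_code) and len(digits) > 10:
        digits = digits[len(country_code):]
    return digits

# Both phone formats normalize to the same digit string.
print(normalize_phone("(415) 555-2761") == normalize_phone("+1 415 555 2761"))  # True
```

Production matchers layer phonetic algorithms (so "Smyth" and "Smith" score close) on top of edit distance, but even this sketch catches the variations that exact and prefix rules miss.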

Bulk processing without performance impact

Running a dedupe job on 500,000 records shouldn't lock up the system for active users. DC Jobs in Plauti Deduplicate run outside of Dynamics 365, either through Plauti Cloud (a dedicated, fenced-off Azure environment where each job gets its own isolated server) or DC Local (running on your own machine or server). Your D365 environment stays responsive while jobs process. Results come back only after the job is finished.

Cross-entity logic

In D365, duplicates don't just live within a single entity. A Lead might duplicate a Contact. A Contact at one business unit might duplicate a Contact at another. Plauti's cross-entity deduplication for Dynamics 365 handles exactly this, checking Leads against existing Contacts and Accounts, both in real-time prevention and in batch DC Jobs.

Automation that reduces manual review

At 500,000 records, even reviewing 10% manually is 50,000 decisions. Plauti's Auto Merge runs as a sub-job on top of a completed DC Job. You set a minimum matching percentage. Records scoring above that threshold are merged automatically. Records below it stay in the job results for manual review. The queue shrinks to a fraction of the total.

How Plauti Deduplicate Handles High-Volume D365 Environments

Plauti Deduplicate for Dynamics 365 is built natively on the platform. No data leaves your environment through unsecured channels. When DC Jobs run on Plauti Cloud, data is exported and imported via encrypted connections, and the job server is terminated once processing is complete. Even Plauti cannot access your data during a job run.

Scenarios and matching methods

You configure Scenarios per entity: combinations of fields, matching methods (exact or fuzzy), field weights, and score thresholds. For example, a Contact Scenario might combine Email (exact), Full Name (Person Name fuzzy), and Company (Company Name fuzzy), weighted so that email carries the most influence. Two records score as duplicates only if the combined weighted score meets or exceeds your threshold. This keeps detection precise and explainable; admins can see exactly which fields drove a match.
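The weighted-score idea can be sketched in a few lines of Python. The field names, weights, and threshold below are illustrative assumptions, not Plauti configuration:

```python
def scenario_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted average of per-field match scores, each in 0.0-1.0.
    total_weight = sum(weights.values())
    return sum(field_scores[f] * w for f, w in weights.items()) / total_weight

# Email dominates; name and company refine the decision.
weights = {"email": 0.5, "full_name": 0.3, "company": 0.2}

# Hypothetical per-field results for one candidate pair.
field_scores = {"email": 1.0, "full_name": 0.8, "company": 0.6}

combined = scenario_score(field_scores, weights)  # 0.5*1.0 + 0.3*0.8 + 0.2*0.6 = 0.86
print(combined >= 0.85)  # meets an assumed 85% threshold: flagged as duplicate
```

Because the per-field scores survive alongside the combined score, a reviewer can see that the exact email match, not the weaker company match, is what pushed the pair over the line.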

Merge Rules

When duplicates are found, Merge Rules decide what happens. You configure per entity which record becomes the master: the oldest record, the most recently updated, the one with a specific field value, or the one from a specific business unit. You also define what happens to losing records: deactivate or delete. Related records (Activities, Opportunities, etc.) are reparented to the master, so no history is lost.
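A "most recently updated wins" rule reduces to a simple selection, sketched here with an assumed record shape and field names (hypothetical, for illustration only):

```python
from datetime import datetime

def pick_master(records: list[dict]) -> dict:
    # Master selection rule: the most recently modified record wins.
    return max(records, key=lambda r: r["modified_on"])

duplicate_group = [
    {"id": "A", "modified_on": datetime(2025, 6, 1)},
    {"id": "B", "modified_on": datetime(2026, 2, 14)},
]

master = pick_master(duplicate_group)
losers = [r for r in duplicate_group if r is not master]
# Losing records would then be deactivated (or deleted) and their related
# Activities and Opportunities reparented to the master.
print(master["id"])  # B
```

Swapping the rule (oldest record, specific business unit) only changes the key function, which is why the rule is worth documenting explicitly before any bulk merge runs.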

Auto Merge

After a DC Job finishes, Auto Merge can run as a follow-up step. Set a minimum matching percentage, choose where to run it (Plauti Cloud or DC Local), and the system merges all pairs at or above that score. Lower-scoring pairs stay in the job results for manual review. This is the only realistic way to close a project involving hundreds of thousands of records.
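The triage logic behind that threshold split can be sketched as follows; the pair structure and the 0.90 cutoff are illustrative assumptions:

```python
def triage(pairs: list[tuple[str, str, float]], threshold: float = 0.90):
    # Pairs at or above the threshold merge automatically;
    # everything below it stays queued for human review.
    auto_merge = [p for p in pairs if p[2] >= threshold]
    manual_review = [p for p in pairs if p[2] < threshold]
    return auto_merge, manual_review

pairs = [
    ("contact-1", "contact-2", 0.97),  # same email, near-identical name
    ("contact-3", "contact-4", 0.91),  # minor name variation
    ("contact-5", "contact-6", 0.72),  # ambiguous: needs a human
]

auto, review = triage(pairs)
print(len(auto), len(review))  # 2 1
```

At 500,000 records the proportions matter more than the mechanics: if 90% of detected pairs clear the threshold, the human queue shrinks by an order of magnitude.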

Prevention after cleanup

Once historical duplicates are resolved, Plauti's Prevention Features stop new ones from forming. You configure per entity where duplicate checks run: on form save, as-you-type while a user fills in a form, on Quick Create, and on API/import inserts. For high-confidence API matches, Direct Processing can auto-merge incoming records before they ever hit your database. For lower-confidence matches, suspected duplicates are routed into a DC Job queue for review.

For multi-business-unit environments, Plauti handles cross-business-unit visibility so duplicates that exist across organizational boundaries don't slip through. And because everything runs inside Dynamics 365/Dataverse, there are no external ETL pipelines to maintain, no sync delays, and no separate application your team has to learn.

Structuring a Deduplication Project at 500,000 Records

Even with the right tool, a project of this size benefits from a phased approach.

Start with a data audit

Before merging anything, understand the scope. Run a DC Job, or several, filtered by entity and business unit, to quantify duplicates. This surfaces the highest-concentration areas and shapes your merge strategy. It also gives you a before-and-after baseline to measure progress.

Define Scenarios and Merge Rules before you run bulk operations

Decide which matching methods apply to each entity, set field weights, and configure your score thresholds. Then define Merge Rules: which record wins, what happens to related data, and how losing records are handled. Document these decisions so they're applied consistently and can be reviewed later.

Run DC Jobs in stages

Don't try to process all 500,000 records in one job. Filter by entity, business unit, or record creation date range. This keeps jobs manageable, makes errors easier to isolate, and lets you validate results before moving to the next segment. Plauti Cloud can run multiple jobs in parallel, which speeds this up considerably.
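One common way to segment by creation date is fixed-size date windows, one per job. A hypothetical sketch (the dates and 90-day window are illustrative, not a Plauti setting):

```python
from datetime import date, timedelta

def date_segments(start: date, end: date, window_days: int = 90):
    # Yield (from, to) windows covering [start, end) in fixed-size slices.
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=window_days), end)
        yield cur, nxt
        cur = nxt

segments = list(date_segments(date(2024, 1, 1), date(2025, 1, 1)))
# Each window becomes the record filter for one staged job; validate the
# results of each segment before queuing the next, or run several in parallel.
print(len(segments))  # 5
```

The same pattern works for segmenting by entity or business unit: the point is that each job is small enough to validate and roll back in isolation.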

Use Auto Merge for high-confidence pairs

Set a conservative minimum matching percentage for your first run, say 90% or higher. Review a sample of the merged records to confirm accuracy. Then gradually expand to lower thresholds as confidence grows.

Enable Prevention after cleanup

Once existing duplicates are resolved, turn on Prevention Features for new records. Cleaning 500,000 records and then letting duplicates re-enter freely is wasted effort. Configure prevention on Leads, Contacts, and Accounts at a minimum, on form save, Quick Create, and API insert.

The Manual Review Problem

This is where most deduplication projects actually stall. The tool finds 80,000 potential duplicates. Someone has to review them. Nobody has time. The project sits in a queue for months until someone declares the data "good enough" and moves on.

The fix isn't more reviewers. It's smarter triage. A well-configured Plauti setup auto-resolves the easy cases (same email, same name, same phone) and only surfaces genuinely ambiguous records for human judgment. Setting a high Auto Merge threshold handles the obvious cases without manual input. What's left is a much smaller queue of records that actually need a human decision.

If you're managing 500,000 records across business units and need deduplication that actually finishes, the native D365 tools won't get you there. They weren't built for this volume, this complexity, or this scale of ongoing prevention.

Plauti Deduplicate was. Start with a free trial on Microsoft Marketplace or book a call with us to see how it handles your specific environment.
