Synonym Matching

Last published at: 2024-10-04 09:39:53 UTC

Synonym Matching is available for Premium and PDM licenses.

Synonym Matching lets you define words that differ from each other but should be treated as duplicates of each other.

For example, two Leads, Robert Johnson from the US and Bob Johnson from the U.S.A., might very well be the same person. Their first names and their countries are variants that can be used interchangeably. By adding Robert and Bob, and US and U.S.A., as synonyms, the two Leads will score as 100% duplicates based on their first name, last name and country.

Create lists of synonyms for fields that you use in scenarios. Then indicate in your scenarios if and which synonym lists should be used when searching for duplicates.

Synonym lists are applied per field, and you can select only one list per field. It is therefore recommended to create synonym lists for the different fields you use in scenarios, for example a list for First Names, a list for Country Names, and so on.
You can apply the same synonym list to multiple fields.

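A list contains synonyms of the same type. A First Names list could look like this:

(Screenshot: example First Names synonym list, in which Alex and Sasha share one line and Cathy appears on a different line.)
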
These are all first name synonyms. All names on one line are considered synonyms of each other. When this list is used for first names in a scenario, Alex Starr and Sasha Starr will match 100%, but Alex Starr and Cathy Starr will only score 100% on their last name, not on their first name.
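
The matching rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the list contents, the function names, the case-insensitive comparison and the score of 0 for non-synonyms are all hypothetical (in a real scenario the score for non-synonyms depends on the matching method of the field), and this is not how Duplicate Check stores or evaluates lists internally.

```python
# Hypothetical model of a synonym list: each inner list is one line of synonyms.
# "Catherine" is made up for illustration; it is not taken from the article.
FIRST_NAMES = [
    ["Alex", "Sasha"],
    ["Cathy", "Catherine"],
]


def build_lookup(synonym_lines):
    """Map every word (lowercased) to the index of the line it appears on."""
    lookup = {}
    for line_no, words in enumerate(synonym_lines):
        for word in words:
            lookup[word.strip().lower()] = line_no
    return lookup


def first_name_score(a, b, lookup):
    """Return 100 for identical words or words on the same synonym line, else 0.

    Assumption: non-synonyms score 0 here to keep the sketch simple.
    """
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return 100
    line_a, line_b = lookup.get(a), lookup.get(b)
    if line_a is not None and line_a == line_b:
        return 100
    return 0


lookup = build_lookup(FIRST_NAMES)
print(first_name_score("Alex", "Sasha", lookup))  # 100: same synonym line
print(first_name_score("Alex", "Cathy", lookup))  # 0: different lines
```

Because each word maps to exactly one line, the sketch also reflects the rule that a word can be used only once in a list.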

Synonym lists are useful for first names, country names, departments, business entity types, job titles, and similar fields.

Synonyms and Frequent Words

Note that there is a difference between Synonyms and Frequent Words.

  • Synonyms matches two different words as 100% duplicates of each other. "Bob" and "Robert" will score 100% if added as synonyms.
  • Frequent Words ignores certain words when calculating duplicate scores, because two values should not count as duplicates just because they share such a word. The frequent word "University" in Tokyo University and Leiden University is not taken into account when calculating the duplicate score; only "Tokyo" and "Leiden" are compared.

When both are used in a scenario, Frequent Words are applied first and Synonyms second. There is therefore no need to add frequent words to a synonym list, and it is in fact recommended to leave them out.
For example, "Street" and "St." only need to be added to the Frequent Words list, and do not need to be on a synonym list. You can add "Holland" and "Netherlands" to a country name synonym list, but "The Netherlands" should not be added if "the" is already used as a frequent word for the same field.
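
The order of operations can be illustrated with a small Python sketch. It assumes whitespace tokenization, case-insensitive comparison and a hypothetical `normalize` helper; none of these details come from Duplicate Check itself. It only shows why "The Netherlands" does not need its own synonym entry once "the" is a frequent word.

```python
# Hypothetical configuration for a Country field.
FREQUENT_WORDS = {"the"}
COUNTRY_SYNONYMS = [["Holland", "Netherlands"]]


def normalize(value, frequent_words, synonym_lines):
    """Apply frequent words first, then synonyms, and return a comparable value."""
    # Step 1: drop frequent words (assumption: values are split on whitespace).
    words = [w for w in value.lower().split() if w not in frequent_words]
    remaining = " ".join(words)
    # Step 2: map a synonym to the first word on its line as a representative.
    for line in synonym_lines:
        lowered = [s.lower() for s in line]
        if remaining in lowered:
            return lowered[0]
    return remaining


a = normalize("The Netherlands", FREQUENT_WORDS, COUNTRY_SYNONYMS)
b = normalize("Holland", FREQUENT_WORDS, COUNTRY_SYNONYMS)
print(a == b)  # True: "the" is removed before the synonym lookup
```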

Creating a Synonyms List

To create a new list with one or multiple words and their synonyms:

  1. Go to DC Setup.
  2. On the left, under General Setup, click Synonym Matching.
  3. At the top right, click + Add List.
  4. Enter a Synonym List Name. Pick a name that describes for which field the list can be used, like "First Names" or "Country Names".
  5. Click Save.
  6. In the new list, click + Add Synonyms.
  7. Add a word and its synonyms, separated by commas, with or without a space after each comma.
    All words that are synonyms of each other should go on one line. To add another word and its synonyms, press "Enter" to start a new synonym line.
    To include a comma as part of a word or synonym, put the escape character \ (backslash) in front of the comma.
  8. Click Add to list.

The list is added to the Synonym Matching overview. You can now add it to a field in a Scenario.
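
To check a list before pasting it in, the input format from step 7 (one group per line, comma-separated, backslash before a literal comma) can be parsed as in the Python sketch below. The function names and the sample text are hypothetical, and the sketch does not handle an escaped backslash; it is not the parser Duplicate Check uses.

```python
import re


def parse_synonym_line(line):
    """Split one line on commas that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\),", line)
    # Turn each escaped comma back into a plain comma and trim whitespace.
    return [p.replace("\\,", ",").strip() for p in parts if p.strip()]


def parse_synonym_text(text):
    """Parse multi-line input: every non-empty line is one group of synonyms."""
    return [parse_synonym_line(line) for line in text.splitlines() if line.strip()]


# Made-up sample input; the second line contains an escaped comma.
example = "\n".join([
    "Robert, Bob,Rob",
    r"Acme\, Inc., Acme Incorporated",
])
print(parse_synonym_text(example))
# [['Robert', 'Bob', 'Rob'], ['Acme, Inc.', 'Acme Incorporated']]
```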


The limits

You can add a maximum of 10 lists. Each list can contain about 200,000 characters.
In addition, the Salesforce storage limit for managed packages applies. This limit is the same size as your org limit, up to 20 MB.
In practice, each synonym list can hold about 200,000 characters, as long as the Salesforce storage limit for Duplicate Check has not been reached.
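
A quick pre-check against these numbers could look like the Python sketch below. The constants come from the text above; everything else (the function name, counting characters with `len`, passing lists as a dictionary) is an assumption, and the sketch does not check the Salesforce storage limit.

```python
MAX_LISTS = 10
MAX_CHARS_PER_LIST = 200_000  # approximate, per the limits described above


def check_limits(lists):
    """Return a list of problems for a dict mapping list name -> full list text."""
    problems = []
    if len(lists) > MAX_LISTS:
        problems.append(f"{len(lists)} lists exceeds the maximum of {MAX_LISTS}")
    for name, text in lists.items():
        # Assumption: every character counts, including commas and line breaks.
        if len(text) > MAX_CHARS_PER_LIST:
            problems.append(
                f"{name}: {len(text)} characters exceeds about {MAX_CHARS_PER_LIST}"
            )
    return problems


print(check_limits({"First Names": "Alex, Sasha\nRobert, Bob"}))  # []
```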

Editing a Synonyms List

  • At the top of a list, click + Add Synonyms to add a new line with a new word and its synonyms to the list. The list opens in its entirety, so place your cursor under the existing words to add the new word on a new line.
  • On a synonym line, click Edit to edit a word and its synonyms.
  • On a synonym line, click Delete to delete only that word and its synonyms. This cannot be undone!
  • At the top of a list, click Delete All Synonyms to delete all words of the list and their synonyms, but keep the empty list. This cannot be undone!
  • At the top of a list, click Delete List to delete the entire list. This cannot be undone!

Using synonym lists in the search for duplicates

In your Scenarios, indicate which fields should take synonyms into account. The next time you search for duplicates using those scenarios, words that are listed as synonyms will match each other at 100%.

  1. Go to DC Setup and select an Object on the left.
  2. On the Scenarios tab, expand the scenario where you want to apply a synonym list.
  3. Find the field that can contain words from the synonym list.
  4. Under Synonym Matching next to the field, select the synonym list you want to use for that field.


Overlapping Synonyms

Sometimes synonyms overlap. For example, Alex can be a synonym of both Alexander and Alexandra, but you do not want Alexander and Alexandra to match as duplicates. Furthermore, a word can be used only once in a list, and you can use only one list per field in a scenario.

To solve this, you can create two lists, e.g. "Female First Names" and "Male First Names". Then use each in a separate scenario and apply both scenarios.

Note that if you use both scenarios together in a DC Job, a record that contains the overlapping synonym pulls all variants into one group of duplicates. To continue the Alex/Alexander/Alexandra example: in a job that contains an Alex, that record ends up in one group together with both Alexander and Alexandra, all scoring 100% on First Name. In a job with only an Alexander and an Alexandra, the two will not match as duplicates.

When you use both scenarios in DC Entry, or in another method that involves only one record, creating or updating a record for an Alexandra will bring up Alex as a potential duplicate, but not Alexander. Creating or updating a record for Alex will bring up both Alexander and Alexandra.
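
The difference between the two behaviors can be modeled in Python as direct matching (single-record methods) versus transitive grouping (jobs). The sketch below is a simplified model under assumptions: the two lists, the function names and the grouping by connected components are illustrative and are not taken from Duplicate Check's implementation.

```python
from itertools import combinations

# Hypothetical lists, each used in its own scenario.
MALE_FIRST_NAMES = [["Alex", "Alexander"]]
FEMALE_FIRST_NAMES = [["Alex", "Alexandra"]]
SCENARIOS = [MALE_FIRST_NAMES, FEMALE_FIRST_NAMES]


def is_match(a, b, scenarios):
    """Two names match if any single scenario has them on the same synonym line."""
    if a == b:
        return True
    return any(a in line and b in line for lines in scenarios for line in lines)


def direct_matches(name, others, scenarios):
    """Single-record lookup: only names that match the given name directly."""
    return sorted(o for o in others if is_match(name, o, scenarios))


def job_groups(names, scenarios):
    """Job-style grouping: names linked through any chain of matches share a group."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if is_match(a, b, scenarios):
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


print(job_groups(["Alex", "Alexander", "Alexandra"], SCENARIOS))
# [['Alex', 'Alexander', 'Alexandra']]
print(job_groups(["Alexander", "Alexandra"], SCENARIOS))
# [['Alexander'], ['Alexandra']]
print(direct_matches("Alexandra", ["Alex", "Alexander"], SCENARIOS))
# ['Alex']
```

In this model, Alex acts as the link between the two lists: without an Alex record in the job, nothing connects Alexander to Alexandra.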