Content Examination - Entity Match Examples

This article contains information on using entities in Content Examination definitions, detailing exceptions to Privacy Pack requirements, entity operators, and syntax for detecting sensitive information in messages.

Below are examples of using entities as part of a Content Examination definition. Read the related Content Examination pages before using these examples.

Certain entities require the Privacy Pack dictionaries to be enabled on your Mimecast account. The following entities are exceptions and no longer require the Privacy Pack:

  • National Insurance Number (NIN)
  • Vehicle Identification Number (VIN)
  • UK Electoral Roll Number
  • Credit Cards
  • Telephone Numbers
  • National Health Service Number (NHS)
  • UK Driver's License
  • Date of Birth
  • IP Address
  • URL
  • Community Health Index Number (CHI)
  • Email Address
  • Passports
  • IBAN
  • Date

See the Content Examination - Reference Dictionaries and Content Examination - Entities pages for more information, including Privacy Pack and Non-Privacy Pack entities.

Single Entity

You wish to hold all messages containing references to credit card numbers. The "creditcard" entity finds all credit card numbers, regardless of the credit card type. For example, the following would match any credit card number found in the specified areas of an email (header, body, attachment), as long as it is within proximity to a credit card entity keyword. See the "Credit Card" section of the Content Examination - Entity Keywords page for further details.

1 detect creditcard

Multiple Entities with Operators

You want to hold messages that contain a piece of PII (Personally Identifiable Information) and a date of birth that are within a specific distance of each other.

1 (detect aba) Proximity:50 (detect date_dob)

This formula matches when an ABA number and a Date of Birth (DOB) occur within 50 characters of each other.

The "Proximity" operator has a default distance of 300 characters. Specifying a number after the operator (for example, Proximity:50) overrides the default distance.
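
The proximity check can be pictured with a short Python sketch. This is only an illustration and not Mimecast's implementation: the two regular expressions are hypothetical stand-ins for the "aba" and "date_dob" entities, and measuring the gap between the end of one match and the start of the other is an assumption.

```python
import re

DEFAULT_PROXIMITY = 300  # default distance stated in this article


def within_proximity(text, left_pattern, right_pattern, distance=DEFAULT_PROXIMITY):
    """Return True if any left match and any right match are at most
    `distance` characters apart (assumed gap: end of one to start of the other)."""
    left_spans = [m.span() for m in re.finditer(left_pattern, text)]
    right_spans = [m.span() for m in re.finditer(right_pattern, text)]
    for l_start, l_end in left_spans:
        for r_start, r_end in right_spans:
            # One of the two differences is negative; overlapping matches give 0.
            gap = max(r_start - l_end, l_start - r_end, 0)
            if gap <= distance:
                return True
    return False


# Hypothetical stand-in patterns, far simpler than real entity detection.
ABA = r"\b\d{9}\b"
DOB = r"\b\d{2}/\d{2}/\d{4}\b"

sample = "Routing 021000021, born 04/12/1980."
print(within_proximity(sample, ABA, DOB, distance=50))
print(within_proximity(sample, ABA, DOB, distance=3))
```

In the sample, the two values are 7 characters apart, so the check passes with a distance of 50 and fails with a distance of 3.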

Ignoring Terms from Entities

Occasionally specific terms in an entity can cause too many false positives. To prevent this, you can ignore particular terms. This allows you to continue using an entity to look for content matches. Take the following example:

You're checking for FDA drug names in proximity to a person's name, but you wish to exclude the name "Susan" as it also matches the name of one of your customers. Here is the syntax you'd use to ignore the name "Susan":

1 (detect fdadrugs) PROXIMITY (detect names) IGNORE (Susan)

If you wish to ignore the term "Aspirin" from the FDA Drugs entity:

1 (detect Names) PROXIMITY (detect fdadrugs) IGNORE (Aspirin)

The 'Ignore' operator only applies to the second entity specified on a line. No other operators will be allowed once the first operator has been specified.

To ignore multiple terms, you'll need to separate each term with a space as below:

1 (detect fdadrugs) PROXIMITY (detect names) IGNORE (Susan Bob James)

If you're using multiple entities in a search, the 'Ignore' operator only applies to the entity specified at the end of the search. For example:

1 (detect fdadrugs) PROXIMITY (detect names) IGNORE (Susan)  -  correct syntax
1 (detect fdadrugs) ignore (Aspirin) PROXIMITY (detect names)  -  incorrect syntax
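
The placement rule above could be checked before a definition is saved. The following Python sketch is a hypothetical helper, not a Mimecast tool; it only knows the two operators shown in this article and assumes that bracketed groups contain no nested brackets.

```python
import re

OPERATORS = {"PROXIMITY", "IGNORE"}  # only the operators shown in this article


def ignore_is_last(line):
    """Return True if IGNORE, when present, appears once and is the final operator."""
    # Drop the bracketed groups so only the score and the operators remain.
    outside = re.sub(r"\([^)]*\)", " ", line)
    # "Proximity:50" is reduced to "PROXIMITY" by cutting at the colon.
    tokens = [tok.split(":")[0].upper() for tok in outside.split()]
    ops = [tok for tok in tokens if tok in OPERATORS]
    if "IGNORE" not in ops:
        return True
    return ops.count("IGNORE") == 1 and ops[-1] == "IGNORE"


good = "1 (detect fdadrugs) PROXIMITY (detect names) IGNORE (Susan)"
bad = "1 (detect names) IGNORE (Susan) PROXIMITY (detect fdadrugs)"
print(ignore_is_last(good))
print(ignore_is_last(bad))
```

The first line passes because IGNORE closes the search; the second fails because another operator follows it.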

Excluding Entities

An administrator using the PII entity group notices a high volume of false positives with the "Phone Number" entity and would like to check if excluding this resolves the problem. To do this, they can:

  1. Remove the entry for the PII entity group from the Content Examination definition.
  2. Enter all the individual entities that they wish to use.

Original Word / Phrase Match List:

1 detect PII

New Word / Phrase Match List:

1 detect Names
1 detect date_dob
1 detect SSN
1 detect medicare_id
1 detect VIN
1 detect IP
1 detect Email
1 detect URL
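
Writing out the individual entities by hand can be error prone, so the replacement lines could be generated. This Python sketch is a hypothetical helper: the member list is taken only from the example in this section plus the excluded "PhoneNumber" entity, and may not reflect the full PII group on your account.

```python
# Assumed membership, copied from this section's example; not an official list.
PII_MEMBERS = [
    "Names", "date_dob", "SSN", "medicare_id",
    "VIN", "IP", "Email", "URL", "PhoneNumber",
]


def expand_group(members, excluded, score=1):
    """Return one 'score detect entity' line per member that is not excluded."""
    skip = {name.lower() for name in excluded}
    return [f"{score} detect {name}" for name in members if name.lower() not in skip]


for line in expand_group(PII_MEMBERS, excluded=["PhoneNumber"]):
    print(line)
```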

Negative Score

Using the same PII entity group example, you can apply a negative score to the "Phone Number" entity instead of removing the group. If a match is found for the search term, the negative score is applied to the total number of hits. This reduces the overall score and the chance of the Content Examination policy being applied. To do this, the administrator can:

  1. First, leave the PII entity group in the Content Examination definition.
  2. Then, use a negative score for the "Phone Number" entity.

Original Word / Phrase Match List:

1 detect PII

New Word / Phrase Match List:

1 detect PII
-1 detect PhoneNumber
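
The effect of the negative score can be worked through with a small Python sketch. It rests on two assumptions that this article does not confirm in detail: each line contributes its score multiplied by its number of hits, and a phone number counts as a hit for both the PII group and the "PhoneNumber" entity. The hit counts are made up for illustration.

```python
def total_score(lines, hits):
    """Sum (line score x hit count) over every line in the definition."""
    return sum(score * hits.get(entity, 0) for score, entity in lines)


original = [(1, "PII")]
adjusted = [(1, "PII"), (-1, "PhoneNumber")]

# Hypothetical message: 3 PII hits in total, 2 of which are phone numbers.
hits = {"PII": 3, "PhoneNumber": 2}

print(total_score(original, hits))  # 3
print(total_score(adjusted, hits))  # 1
```

Under these assumptions, the two phone number hits cancel out of the total, leaving only the score from the remaining PII hit.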

No Keywords

This operator disables the context keyword matching associated with many of the supported entities. Using it increases the likelihood of false positives but makes it easier to predict whether a match will be found. For example, using the NKW operator with the SSN entity causes the checks for terms associated with social security numbers to be skipped, so the check only looks for a regular expression match:

1 detect SSN_NKW
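
The difference between keyword-backed matching and a pattern-only match can be sketched in Python. Everything here is an assumption for illustration: the regular expression, the keyword list, and the 300-character context window are hypothetical and are not the values Mimecast uses.

```python
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # hypothetical pattern
SSN_KEYWORDS = ("ssn", "social security")            # hypothetical keyword list
KEYWORD_WINDOW = 300                                 # hypothetical context size


def detect_ssn(text, require_keywords=True):
    """Return pattern matches; with require_keywords, keep only those
    that have a context keyword nearby."""
    lowered = text.lower()
    found = []
    for m in SSN_PATTERN.finditer(text):
        if require_keywords:
            start = max(m.start() - KEYWORD_WINDOW, 0)
            context = lowered[start:m.end() + KEYWORD_WINDOW]
            if not any(keyword in context for keyword in SSN_KEYWORDS):
                continue  # no supporting keyword, so skip this match
        found.append(m.group())
    return found


message = "Order ref 123-45-6789 shipped"
print(detect_ssn(message))                          # keyword check on
print(detect_ssn(message, require_keywords=False))  # pattern only, like NKW
```

With the keyword check on, the order reference is not reported because no supporting term is nearby. With the check off, the same string matches, which shows why false positives become more likely.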

Adding Entities

Canadian Social Insurance Numbers (SIN) have been reported in messages sent externally. On investigation, the administrator finds that the SIN entity is not present in the policy. To resolve this, they can add the SIN entity to the Content Examination definition.

1 detect SIN

Combining Entities and Phrases

You want to search for the term "Admission Date" followed by a date in Month / Day / Year format. This can be achieved by using the following policy syntax.

1 ("Admission Date") Proximity (detect date_mdy)

Where:

      • 1 is the line score applied when a match is found.
      • ("Admission Date") is the first check performed. This must be in brackets to mark the boundaries of the search text.
      • Proximity is the operator. In this case, the phrase "Admission Date" needs to be within 300 characters of a date in Month/Day/Year format.
      • (detect date_mdy) is the second check performed. Again, this must be in brackets to mark the boundaries of the search term.
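
The breakdown above can be mirrored by a small Python parser that splits a line into the same parts. This is a hypothetical sketch, not Mimecast's parser: it handles only lines with two bracketed checks and one operator, and it fills in the 300-character default only for the Proximity operator.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?P<score>-?\d+)\s+"                            # line score
    r"\((?P<first>[^)]*)\)\s+"                            # first check, in brackets
    r"(?P<operator>[A-Za-z]+)(?::(?P<distance>\d+))?\s+"  # operator, optional :N
    r"\((?P<second>[^)]*)\)\s*$"                          # second check, in brackets
)


def parse_line(line):
    """Split a two-check definition line into its components."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unsupported line: {line!r}")
    operator = m.group("operator").upper()
    distance = m.group("distance")
    if distance is not None:
        distance = int(distance)
    elif operator == "PROXIMITY":
        distance = 300  # default distance stated in this article
    return {
        "score": int(m.group("score")),
        "first": m.group("first"),
        "operator": operator,
        "distance": distance,
        "second": m.group("second"),
    }


print(parse_line('1 ("Admission Date") Proximity (detect date_mdy)'))
```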

ICD10 Entity

ICD10 Category codes have been split from the ICD10 entity to give more flexibility:

1 (detect icd10cm_categories) IGNORE (r10)
