Content Examination - Entity Groups

This article explains how entities and entity groups function in Content Examination - Definitions, detailing the use of validators, regular expressions, and word lists to identify sensitive information in messages and attachments.

Entities and Entity Groups Overview

Before going into detail, it helps to understand the difference between entities and entity groups. Entities allow administrators to search for sensitive information in messages and attachments without the need to create complicated word lists or regular expressions (regex). Entity groups are collections of entities grouped by category (e.g., PII, PHI, or Financial). This allows administrators to search by subject area rather than listing individual entities to achieve the same goal.

How do Entities Work?

An entity consists of the elements listed below. A match is made only when all of these criteria are met.

      • A validator: This confirms that the structure of the content meets the defined standards for the item you are looking for. For example, if looking for credit cards, the content must contain four blocks of four numbers and a check digit within the specified range.
      • A regular expression: This is applied to the target content if the validator check passes. Should the validator check fail, the content checks stop.
      • A word list: This is used to limit the number of false positives by matching keywords for the subject area. For example, credit card keywords are used with credit card entities. This helps determine the context of the match and allows us to exclude strings of numbers that pass the credit card checks but aren't credit card numbers.

Some entities or entity groups don't contain validators or regular expressions, because these checks don't apply to the subject area. For example, the ICD10Cm entity is just a list of medical conditions.

Example Content Examination Definition

Detailed instructions on configuring a Content Examination definition are covered in the Content Examination - Configuring page. However, before configuring a definition, it is vital you understand the information you are looking for and whether there are any conflicts with other data that could cause false positives. Take the following example:

You wish to hold all messages containing references to American Express credit card numbers. The "americanexpress" entity finds these credit card numbers wherever they appear in the specified areas of an email (header, body, attachment).

1 detect americanexpress     

If a 15-digit American Express number is present in a message, that alone isn't enough for a match to be made. Instead, the "americanexpress" entity performs the following:

  1. Possible matches are located using the entity's corresponding regular expressions.
  2. The possible matches are passed through an appropriate Luhn algorithm to reduce the number of inaccurate matches.
  3. Specific keywords that provide context relating to credit cards are searched for within proximity of the matches found.

To summarize, a content examination hit for an American Express credit card only occurs if:

      • There's a 15-digit number that passes the appropriate Luhn check.
      • A term such as "Credit Card" or "Amex" is found within 300 characters of the credit card number.
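
The checks summarized above can be illustrated with a minimal Python sketch. It is not the product's implementation: the regular expression, the keyword list, and the function names are assumptions made for illustration, and only the 300-character proximity and the Luhn check come from this article.

```python
import re

# Assumed pattern: 15 digits starting with 34 or 37, optionally grouped 4-6-5.
AMEX_PATTERN = re.compile(r"\b3[47]\d{2}[ -]?\d{6}[ -]?\d{5}\b")
# Assumed keyword list; the real word list is not given in this article.
KEYWORDS = ("credit card", "amex", "american express")
PROXIMITY = 300  # characters either side of the candidate number


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_amex_hits(text: str) -> list[str]:
    """Return candidate numbers that pass all three checks."""
    hits = []
    lowered = text.lower()
    for m in AMEX_PATTERN.finditer(text):          # step 1: regex candidates
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):                  # step 2: Luhn check
            continue
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        if any(k in window for k in KEYWORDS):      # step 3: context keywords
            hits.append(digits)
    return hits
```

In this sketch, a Luhn-valid number with no nearby keyword produces no hit, which mirrors how the word list is used to reduce false positives.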

No Keywords Entity Operator

This operator disables the context keyword matching associated with many of the entities we support. Using it increases the likelihood of false positives, but makes it easier to predict whether a match will be found. For example, using the No Keywords (NKW) operator with the social security number entity causes the checks for associated terms to be skipped, so the check only looks for a regular expression match:

1 detect SSN_NKW 
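
The effect of the operator can be sketched in Python as a flag that skips the keyword step. This is an illustration only: the SSN pattern, the keyword list, and the `require_keywords` parameter are hypothetical, and the keyword search here covers the whole text rather than a proximity window.

```python
import re

# Assumed SSN format (NNN-NN-NNNN); the entity's real pattern is not documented here.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumed context keywords for illustration.
SSN_KEYWORDS = ("ssn", "social security")


def ssn_matches(text: str, require_keywords: bool = True) -> list[str]:
    """Return SSN-like strings; require_keywords=False mimics the NKW behavior."""
    candidates = [m.group() for m in SSN_PATTERN.finditer(text)]
    if not require_keywords:
        # No Keywords: a pattern match alone counts as a hit.
        return candidates
    lowered = text.lower()
    if any(k in lowered for k in SSN_KEYWORDS):
        return candidates
    return []
```

With a message such as "Order ref 123-45-6789 shipped", the keyword-aware call returns nothing, while the no-keywords call returns the number. That is the false positive trade-off described above.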

See the "Entity Examples" section of the Phrase Match Examples page for more information.
