This article contains information on using content expressions in Content Examination definitions for Data Leak Prevention (DLP), detailing search parameters, scoring, and handling incorrect matches in spreadsheets.
A content expression is the combination of functions, text, and phrases used in a Content Examination definition to assist with Data Leak Prevention (DLP). A content expression can be a word, a phrase, a regular expression, or a hash that matches a specific document. The content expression is entered into the Content Examination definition, which scans messages for matches against any of the search criteria.
These search terms are used to prevent accidental or malicious data loss through company email. Depending on your requirements, some or all of the examples below may be applicable to make your email system comply with your company's security policy.
Using Content Expressions
Content expressions are entered into the "Word/Phrase Match List" field in a Content Examination definition. The activation score is a required field and is the total score used to trigger the definition should a match be found in a message.
It is highly recommended that you test Content Examination definitions before relying on them, to confirm that they match the intended content and nothing else.
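As a rough first check before testing inside a definition, a regular expression can be tried against sample text locally. The sketch below uses Python's `re` module as a stand-in; the regular expression engine used by Content Examination may behave differently, so this does not replace testing the definition itself. The pattern is a Visa-style card number expression of the kind used in this article, and `find_matches` and the sample strings are hypothetical names and data introduced for illustration.

```python
import re

# Visa-style pattern: a leading 4 followed by 12 digits, optionally 3 more.
# Assumption: Python's `re` is only an approximation of the real scanner.
VISA_PATTERN = re.compile(r"4[0-9]{12}(?:[0-9]{3})?")


def find_matches(text: str) -> list[str]:
    """Return every substring of `text` that the pattern matches."""
    return [match.group(0) for match in VISA_PATTERN.finditer(text)]


# Hypothetical sample messages: one that should match, one that should not.
samples = [
    "Card 4111111111111111 on file",
    "Invoice 12345",
]

for sample in samples:
    print(sample, "->", find_matches(sample))
```

Running a handful of known-good and known-bad samples this way can surface obvious mistakes, such as an unbalanced parenthesis, before the expression is entered into a definition.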
| Search Parameters (*) | Example |
|---|---|
| weight [:maxscore] <search text> | 4 "company Confidential" |
| weight [:maxscore] required <search text> | 1 required "Project X" |
| weight [:maxscore] exclude <search text> | 0 exclude "Tax exemption" |
| weight [:maxscore] regex <regular expression> | 10 regex 4[0-9]{12}(?:[0-9]{3})? |
| weight [:maxscore] regex, cardnumber <regular expression> | 1 regex,cardnumber 6(?<=\b6)(767\|334)(?!\n\t)(\d{12,15}\|[\d- ]{16,19})\b |
| weight [:maxscore] hash <MD5#> | 1 hash 9EBD30E761ED4FF770A90DDBD5CB4190 Confidential.PDF |
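The hash example in the table needs the MD5 digest of the document to be matched. A minimal sketch of producing one with Python's standard `hashlib` module follows. The function names `md5_hex_upper` and `hash_expression` are hypothetical. The digest is upper-cased only to mirror the table's example (whether case matters is not stated here), and the trailing file name is assumed to be a descriptive label, as it appears to be in that example.

```python
import hashlib


def md5_hex_upper(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 digest of the file at `path` as upper-case hex."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large attachments are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def hash_expression(weight: int, path: str, label: str) -> str:
    """Format a line in the 'weight hash <MD5#>' style shown in the table.

    Assumption: the text after the digest is a free-form label.
    """
    return f"{weight} hash {md5_hex_upper(path)} {label}"
```

For example, `hash_expression(1, "Confidential.PDF", "Confidential.PDF")` would yield a line of the same shape as the table entry, with the digest of whatever file is at that path.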
Search Parameter Notation
| Search Parameter Type | Example Parameters | Description |
|---|---|---|
| Mandatory | weight, required, exclude, regex, hash | One or more of these parameters must be included in the search parameters, according to the type of content expression that is being created. At a minimum, the weight must be defined. |
| [Optional] | [:maxscore], cardnumber | Parameters in square brackets [ ] are optional. "cardnumber" applies an additional Luhn algorithm check to the matched digits to determine whether the sequence is a valid credit card number. |
| <To Be Modified> | <search text>, <regular expression>, <MD5#> | Parameters in angle brackets < > are placeholders and must be replaced with your own values to build the required content expression. |
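The Luhn check that "cardnumber" refers to is a public checksum algorithm, and a short sketch may help show why it reduces false positives on arbitrary digit runs. The code below is a generic implementation of the Luhn algorithm, not the scanner's own code, and `luhn_valid` is a hypothetical name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and hyphens are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test card number
print(luhn_valid("4111 1111 1111 1112"))  # same number with the last digit changed
```

Because only about one in ten random digit sequences passes this checksum, combining a regular expression with the check filters out many numbers that merely have the right length and prefix.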
Incorrect Content Matching When Using Spreadsheets
Occasionally an attachment is blocked because of a match (usually numeric) even though the matched content does not appear to be in the spreadsheet. This happens because some document types (e.g., Microsoft Excel) store internal formatting data alongside the visible content, and Mimecast can match against that data. Examples include:
- Spreadsheet columns that have an internal numbering scheme (invisible to the user), which when analyzed appear to be numeric content.
- Dates that are stored internally in a long integer format.
This internal numbering can be mistaken for Social Security Numbers, Credit Card numbers, etc. by the Mimecast content text analyzer. Unfortunately, there is no automatic way to avoid this, and the incorrectly held attachment must be released manually.
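To illustrate the point about dates, the sketch below shows how a calendar date corresponds to the integer serial number that Excel stores internally. It uses the common convention of counting days from 30 December 1899, which holds for dates from 1 March 1900 onward. The name `excel_serial` is hypothetical, and how the content analyzer actually reads these stored values is not described in this article.

```python
from datetime import date

# Day zero for Excel's default (1900) date system, valid from 1 March 1900 on.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(day: date) -> int:
    """Return the integer serial number Excel stores for `day`."""
    return (day - EXCEL_EPOCH).days


# A cell displayed as a date is stored as a plain integer.
print(excel_serial(date(2023, 1, 1)))  # 44927
print(excel_serial(date(1970, 1, 1)))  # 25569
```

A column of such integers is invisible to the person viewing the sheet as dates, but it is numeric content in the stored file, which is why it can contribute to an unexpected numeric match.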
The Content Examination Word/Phrase Match List Examples are listed in the following pages: