Extract And Clean
What is Extract And Clean?
"Extract And Clean" is a tool that extracts e-mail addresses from files and keeps
only the syntactically correct ones.
What are syntactically correct e-mail addresses?
-
A valid address contains only letters (a-z), numbers (0-9), hyphens ('-'), underscores
('_'), periods ('.') and only one 'at' sign ('@').
-
A valid address must begin with a letter or a number.
-
A valid address must not exceed the specified maximum address length. If none
is specified, the maximum length is 45 characters.
-
There must be at least one '.' character in the address.
-
There must be at least one character before the '.' and at least one character
after it.
-
A valid address must end with a character (a-z).
-
A valid address must have at least 2 characters before the '@' sign. For example,
an address such as 1@mail.ru is rejected.
-
In the case of AOL addresses, OILM allows e-mail addresses that contain spaces
to the left of the '@' sign. In addition, a valid AOL screen name (the part to
the left of the '@' sign) must be between 3 and 16 characters long and must begin
with a letter. This conforms to the exact syntax for valid AOL addresses.
-
New! A valid address must not contain '-' after the '@' sign.
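The rules above can be sketched as a single check. This is only an illustration, not the code OILM actually uses: the function name is_valid_address is hypothetical, letters are assumed to be case-insensitive, and the '.' rule is read literally (a dot anywhere in the address). The optional AOL spaces rule is left out here.

```python
import re

# Characters allowed anywhere in an address, per the rules above.
_ALLOWED_CHARS = re.compile(r"[a-z0-9._@-]+")


def is_valid_address(address, max_length=45):
    """Return True if the address passes the syntax rules listed above.

    Hypothetical helper; comparison is case-insensitive by assumption.
    """
    addr = address.lower()
    if len(addr) > max_length:                 # maximum address length (default 45)
        return False
    if not _ALLOWED_CHARS.fullmatch(addr):     # only a-z, 0-9, '-', '_', '.', '@'
        return False
    if addr.count("@") != 1:                   # exactly one 'at' sign
        return False
    local, domain = addr.split("@")
    if len(local) < 2:                         # at least 2 characters before '@'
        return False
    if not addr[0].isalnum():                  # must begin with a letter or number
        return False
    if not ("a" <= addr[-1] <= "z"):           # must end with a letter
        return False
    if "." not in addr:                        # at least one '.' (not first or last,
        return False                           # which the two checks above ensure)
    if "-" in domain:                          # no '-' after the '@' sign
        return False
    if domain == "aol.com":                    # AOL screen name: 3 to 16 characters,
        if not (3 <= len(local) <= 16):        # beginning with a letter
            return False
        if not ("a" <= local[0] <= "z"):
            return False
    return True
```

For example, is_valid_address("joe@aol.com") is accepted, while 1@mail.ru and bob@my-site.com are rejected.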
There are two modes:
-
Extract And Clean All Addresses From These Files And Folders. Input
files can be of any format: ASCII text, binary or anything else. The output file will
contain syntactically correct e-mail addresses, one e-mail address per line.
Output e-mail addresses cannot be longer than the Maximum Address Length, which
is defined in the Options dialog box.
-
Clean Mail Lists. Extracts lines that contain
syntactically correct e-mail addresses and writes them to the output file. Unlike Extract And
Clean All Addresses, this mode can process multicolumn mail lists. In
this mode you can also set Clean Options
to perform some formatting of the resulting list:
- Replace delimiters by TAB/COMMA is useful when columns in some input
lists are delimited by tabs and columns in others are delimited by commas, but you want
unified output lists delimited by tabs only or by commas only.
- Remove quotes and Remove leading and trailing spaces from
fields remove unnecessary characters (quotes and spaces) from fields. Remember that
other functions of the list manager can be sensitive to quotes or spaces.
For example, the two e-mails "joe@aol.com" (with quotes) and joe@aol.com (without quotes)
are treated as different in comparison and sorting. So, to work correctly, your e-mail lists
should not contain quotes or leading/trailing spaces in e-mail fields.
- Move emails to the beginning, Remove empty fields and
Reorder/Remove fields allow you to rearrange columns in the lists
and remove unnecessary ones.
- Convert dates to system format (on the Date/Time Format
tab) converts dates and times in different formats to a unified format
(the one specified in the system Regional Options). Enter the numbers of the date/time
fields, separated by commas, in the edit box below the option. Some examples of date/time
values that can be converted:
2005-7-27 16:55
7.27.2005
20050120
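As a rough illustration of what the Clean Options do to one line of a mail list, here is a minimal sketch. It is not OILM's implementation: the names convert_date and clean_line are hypothetical, the list of recognized date formats covers only the three examples above, field numbers are assumed to start at 1, and a fixed output format stands in for the system Regional Options. The naive split also does not handle commas inside quoted fields.

```python
import re
from datetime import datetime

# Input formats to try, matching only the example values shown above.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%m.%d.%Y", "%Y%m%d")


def convert_date(value, output_format="%Y-%m-%d %H:%M"):
    """Convert a recognized date/time value to one format; else return it unchanged."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.strftime(output_format)
    return value


def clean_line(line, delimiter="\t", date_fields=(), output_format="%Y-%m-%d %H:%M"):
    """Unify delimiters, strip quotes and spaces, and convert the given date fields."""
    fields = re.split(r"[\t,]", line.rstrip("\r\n"))    # columns split on tab or comma
    cleaned = []
    for number, field in enumerate(fields, start=1):    # 1-based field numbers (assumed)
        field = field.strip().strip('"').strip()        # spaces, then double quotes
        if number in date_fields:
            field = convert_date(field, output_format)
        cleaned.append(field)
    return delimiter.join(cleaned)
```

For example, clean_line(' "joe@aol.com" ,7.27.2005', date_fields=(2,)) yields joe@aol.com and 2005-07-27 00:00 separated by a tab.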
Extract And Clean All Addresses From These Files And Folders
The input files and folders from which e-mail addresses will be extracted. You can simply
drag and drop files and folders from Windows Explorer into this list, or press the Add
Files... or Add Dir... button to open the corresponding dialog box.
Output File Containing Clean Email Addresses
Specifies the output file in which the clean e-mail addresses will be stored.
Sort (Output File)
When this option is turned on, the output file is sorted.
Remove Duplicates (Output File)
With this option you can remove duplicate e-mail addresses from the output
file. This option is available only if sorting of the output file is turned on.
Sort By Domain (Output File)
This option allows you to sort the output file by domain. The Sort option
must also be turned on.
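The three output options above (Sort, Remove Duplicates, Sort By Domain) can be pictured with the following sketch. It is an assumption-laden illustration rather than OILM's actual behavior: the function name sort_addresses is hypothetical, and comparison is assumed to be case-insensitive. It also shows one plausible reason why duplicate removal depends on sorting: once the list is sorted, duplicates sit next to each other and can be dropped in a single pass.

```python
def sort_addresses(addresses, by_domain=False, remove_duplicates=False):
    """Return a sorted copy of the list, optionally by domain and without duplicates."""

    def sort_key(address):
        local, _, domain = address.lower().partition("@")
        # Sort By Domain: order by the part after '@' first, then by the user name.
        return (domain, local) if by_domain else (address.lower(),)

    ordered = sorted(addresses, key=sort_key)
    if remove_duplicates:
        unique = []
        for address in ordered:
            # After sorting, equal addresses are adjacent, so comparing with
            # the previously kept entry is enough.
            if not unique or unique[-1].lower() != address.lower():
                unique.append(address)
        ordered = unique
    return ordered
```

For example, sorting bob@zzz.com, amy@zzz.com and carl@aaa.com by domain puts carl@aaa.com first, followed by amy@zzz.com and bob@zzz.com.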
Reject Any Addresses Longer Than
You can tell OILM to reject all addresses longer than a specified number of
characters by checking this option and selecting the maximum allowed address
length.
Allow Embedded Spaces In AOL Usernames
When this option is turned on, AOL addresses such as bill gates@aol.com
are allowed. In the output file, the username portion of the e-mail address
is written without spaces, so the example above becomes
billgates@aol.com.
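The effect of this option on a single address can be sketched as follows. The function name strip_aol_spaces is hypothetical, and the sketch assumes that only addresses whose domain is exactly aol.com are treated as AOL addresses.

```python
def strip_aol_spaces(address):
    """Remove spaces from the username of an AOL address; leave others unchanged."""
    local, at_sign, domain = address.rpartition("@")
    if at_sign and domain.lower() == "aol.com":
        return local.replace(" ", "") + "@" + domain
    return address
```

With this sketch, bill gates@aol.com becomes billgates@aol.com, while an address on any other domain is returned as it was given.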
No Duplicate Domains
Check this option if you want the output file to contain only one e-mail
address from each domain.
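A minimal sketch of this filter is shown below. The help text does not say which address of a domain is kept, so the sketch assumes the first one encountered; the function name one_per_domain is hypothetical, and domains are compared case-insensitively by assumption.

```python
def one_per_domain(addresses):
    """Keep only the first address seen for each domain."""
    seen_domains = set()
    kept = []
    for address in addresses:
        domain = address.rpartition("@")[2].lower()   # the part after the last '@'
        if domain not in seen_domains:
            seen_domains.add(domain)
            kept.append(address)
    return kept
```

For example, from joe@aol.com, amy@aol.com and bob@mail.ru, this sketch keeps joe@aol.com and bob@mail.ru.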
Reject non-country domains with 2 or more dots and country domains with 3 or more dots
You can remove e-mails on non-country domains with 2 or more dots after the '@' sign, such as
xxx@yyy.zzz.com, and e-mails on country domains with 3 or more dots, such as xxx@yyy.zzz.com.au.
Press the Edit country domains button to edit the list of country domains.
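The dot-counting rule can be sketched like this. The function name has_too_many_dots and the short COUNTRY_DOMAINS set are illustrative assumptions; in OILM the real list of country domains is the one you edit in the dialog.

```python
# Illustrative subset only; the actual list is edited via Edit country domains.
COUNTRY_DOMAINS = {"au", "de", "ru", "uk"}


def has_too_many_dots(address, country_domains=COUNTRY_DOMAINS):
    """Return True if the address would be rejected by the dot-count rule."""
    domain = address.rpartition("@")[2].lower()   # the part after the '@' sign
    dots = domain.count(".")
    last_label = domain.rpartition(".")[2]        # e.g. 'com' or 'au'
    # Country domains are allowed one more dot than non-country domains.
    limit = 3 if last_label in country_domains else 2
    return dots >= limit
```

With this sketch, xxx@yyy.zzz.com and xxx@yyy.zzz.com.au are both rejected, while xxx@zzz.com and xxx@zzz.com.au are kept.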
Reject domains that start with numbers New!
Check this option to reject e-mail addresses whose domain starts
with a number.
Multi Column Support
This option is available only in Clean Mail Lists
mode and allows you to process multicolumn e-mail lists. See
What is Multi Column Support for details.
Save Rejected Addresses Into The Following File
You can specify a file in which to save the rejected
addresses. Leave this field blank if you do not want to keep them.