
RegExp Extractor 1.7 (build 65, 2011-07-12)
OS: Windows NT, 2000, XP, Vista, 7
Price: $50
RegExp Extractor
RegExp Extractor is a utility that extracts data from text files and logs using conditions and rules written as regular expressions.
It is very fast and can process huge files.
How to Use
To use this program, you need to know regular expressions (regexp).
Source file(s) - the file or files to extract data from. You can use a mask here, e.g. c:\temp\*.txt
Output file - the file that receives the extracted data.
Output dir - RegExp Extractor can produce several output files. This option defines the destination folder for them.
Save other lines to file - the file that receives the lines that don't match any regular expression.
Conditions/Rules Tabs
Each tab contains a set of conditions and rules for extracting data.
For example, the "emails" tab contains conditions and rules to extract emails, and the "url-domains" tab contains
conditions and rules to extract domains from URLs.
To add a new tab, use the [+] button below the tabs; to remove an existing tab, use the [-] button.
When you press the "Start" button, RegExp Extractor extracts data from the source file(s) using the conditions and rules
from the active tab.
Each set of conditions and rules has a Title (the name of the tab).
Extract Conditions
Each line contains a regular expression with a name. The Extract Rules refer to conditions by these names.
Extract Rules
Each line contains a rule that specifies what data to extract.
Example 1.
Condition: email=/[a-z0-9][a-z0-9.-]+[a-z0-9]@[a-z0-9][a-z0-9.-]+[a-z0-9]/
Rule: email:$0
In this example:
email is the name of the condition used by the rule.
/[a-z0-9][a-z0-9.-]+[a-z0-9]@[a-z0-9][a-z0-9.-]+[a-z0-9]/ is the regular expression that matches emails.
$0 specifies that every substring of the source line that matches the condition is extracted.
In this example, that is each email.
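The behavior of Example 1 can be approximated outside the tool. The following is a minimal Python sketch, assuming that Python's re module treats this pattern the same way RegExp Extractor does; the function name extract_emails and the sample line are hypothetical and only serve the illustration.

```python
import re

# Regular expression from the "email" condition.
# The condition has no /i flag, so only lowercase letters match.
EMAIL = re.compile(r"[a-z0-9][a-z0-9.-]+[a-z0-9]@[a-z0-9][a-z0-9.-]+[a-z0-9]")


def extract_emails(line):
    """Return every substring of the line that matches the condition ($0)."""
    return [m.group(0) for m in EMAIL.finditer(line)]


# Hypothetical sample line with two emails.
line = "from bob@example.com to alice.smith@mail.example.org"
print(extract_emails(line))
# ['bob@example.com', 'alice.smith@mail.example.org']
```

Note that the pattern requires at least three characters on each side of the @ sign, so a very short address such as ab@example.com is not matched.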
Example 2.
Condition: url-domain=/https?://([a-z0-9][a-z0-9.-]+[a-z0-9])|(www\.[a-z0-9.-]+[a-z0-9])/i
Rule: url-domain:$1$2
In this example:
url-domain is the name of the condition used by the rule.
/https?://([a-z0-9][a-z0-9.-]+[a-z0-9])|(www\.[a-z0-9.-]+[a-z0-9])/i is the regular expression that matches URLs.
$1$2 specifies that the first group ([a-z0-9][a-z0-9.-]+[a-z0-9]) and the second group (www\.[a-z0-9.-]+[a-z0-9])
are extracted from each substring that matches the regular expression.
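Because the two groups sit on opposite sides of the | alternation, only one of them takes part in any given match, so $1$2 yields whichever group matched. A rough Python equivalent is sketched below. It assumes that an unmatched group contributes an empty string in RegExp Extractor (in Python it is None, hence the "or" fallback); extract_domains and the sample line are hypothetical.

```python
import re

# Regular expression from the "url-domain" condition; the /i flag maps to re.IGNORECASE.
URL_DOMAIN = re.compile(
    r"https?://([a-z0-9][a-z0-9.-]+[a-z0-9])|(www\.[a-z0-9.-]+[a-z0-9])",
    re.IGNORECASE,
)


def extract_domains(line):
    """Emulate the rule url-domain:$1$2 for a single line."""
    results = []
    for m in URL_DOMAIN.finditer(line):
        # Assumption: a group that did not take part in the match is empty.
        results.append((m.group(1) or "") + (m.group(2) or ""))
    return results


print(extract_domains("see http://example.com/page and www.example.org today"))
# ['example.com', 'www.example.org']
```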
You can also use other characters in the rules to build the result lines, e.g.: email:The email is $0
The result lines will look like this:
The email is email1@domain1.com
The email is email2@domain2.com
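The template part of a rule can be thought of as plain text with $N placeholders. The sketch below imitates that in Python; it is not the tool's implementation. The helper names render and apply_rule are hypothetical, and leaving references to non-existent groups untouched is an assumption.

```python
import re

# Regular expression from the "email" condition of Example 1.
EMAIL = re.compile(r"[a-z0-9][a-z0-9.-]+[a-z0-9]@[a-z0-9][a-z0-9.-]+[a-z0-9]")


def render(template, match):
    """Replace $0, $1, ... in the template with the matching groups."""
    def substitute(ref):
        index = int(ref.group(1))
        if index > match.re.groups:
            return ref.group(0)  # assumption: unknown references stay as-is
        return match.group(index) or ""
    return re.sub(r"\$(\d)", substitute, template)


def apply_rule(template, line):
    """Produce one result line per match of the condition in the source line."""
    return [render(template, m) for m in EMAIL.finditer(line)]


for result in apply_rule("The email is $0", "email1@domain1.com, email2@domain2.com"):
    print(result)
# The email is email1@domain1.com
# The email is email2@domain2.com
```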
Separate by conditions
This option allows you to save lines that match different conditions into different files in the output folder.
See Example 3 below.
Example 3.
Separate by conditions = On
Conditions
sent-ok=/sent ok/i
blocked=/blocked/i
http=/(https?://)|(www\.)[a-z0-9.-]+//
err=/(-ERR \[[0-9]{3}\] : ).+ : (.+)/
Rules
sent-ok!:$L
blocked!^err:$L
http!^err:$L
err!:$L
This example demonstrates how to save all lines that contain the sent ok substring to sent-ok.txt,
lines that contain the blocked substring AND don't match the err condition to blocked.txt,
lines that contain URLs (that match the http condition) AND don't match the err condition to http.txt,
and lines that match the err condition to err.txt.
The ! sign after the condition name in a rule means that RegExp Extractor stops processing
rules for a line once the line matches the condition from this rule. If we omit ! in our example, RegExp Extractor will
save the line sent ok: blocked to both files: sent-ok.txt and blocked.txt.
^ in blocked!^err means that the line should match the condition blocked and not match the condition err.
You can also use the ~ sign, which means that the line SHOULD NOT match the condition after that sign. Example:
blocked!~sent-ok:$L
$L means that the WHOLE line is extracted, not only the substring that matches the regular expression.
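The routing in Example 3 can be sketched in Python as follows. This is only an emulation under stated assumptions: it follows the description of the example (blocked and http lines are saved only when they do not match err), it reads the trailing slash in the http condition as part of the pattern, and it represents each rule as a hypothetical tuple (condition name, stop flag, excluded conditions) instead of parsing the rule syntax.

```python
import re
from collections import defaultdict

# Conditions from Example 3.
CONDITIONS = {
    "sent-ok": re.compile(r"sent ok", re.IGNORECASE),
    "blocked": re.compile(r"blocked", re.IGNORECASE),
    # Assumption: the last "/" before the closing delimiter belongs to the pattern.
    "http": re.compile(r"(https?://)|(www\.)[a-z0-9.-]+/"),
    "err": re.compile(r"(-ERR \[[0-9]{3}\] : ).+ : (.+)"),
}

# (condition, stop after match "!", conditions that must NOT match)
RULES = [
    ("sent-ok", True, []),
    ("blocked", True, ["err"]),
    ("http", True, ["err"]),
    ("err", True, []),
]


def route(lines, rules=RULES):
    """Map each output file name to the whole lines ($L) saved into it."""
    files = defaultdict(list)
    for line in lines:
        for name, stop, excluded in rules:
            if not CONDITIONS[name].search(line):
                continue
            if any(CONDITIONS[other].search(line) for other in excluded):
                continue
            files[name + ".txt"].append(line)
            if stop:
                break  # "!" stops rule processing for this line
    return dict(files)


print(route(["sent ok: blocked"]))
# {'sent-ok.txt': ['sent ok: blocked']}
```

Passing a rule list with the stop flag set to False for every rule shows the effect of omitting !: the line sent ok: blocked then lands in both sent-ok.txt and blocked.txt.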
Example 4.
Separate by conditions = Off
Conditions
http=/https?://([a-z0-9][a-z0-9.-]+[a-z0-9])|(www\.[a-z0-9.-]+[a-z0-9])/i
Rules
http>>$1$2.txt:$L
In this rule, we specify the output file name $1$2.txt, where $1$2 is the domain of the extracted URL.
This example demonstrates how to separate lines by domain.
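A minimal Python sketch of the same idea is shown below. It collects lines in a dictionary keyed by the computed file name instead of writing files, and it assumes that only the first URL in a line decides the file name; how RegExp Extractor handles lines with several URLs is not stated here. The function name group_by_domain and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Regular expression from the "http" condition of Example 4.
HTTP = re.compile(
    r"https?://([a-z0-9][a-z0-9.-]+[a-z0-9])|(www\.[a-z0-9.-]+[a-z0-9])",
    re.IGNORECASE,
)


def group_by_domain(lines):
    """Emulate the rule http>>$1$2.txt:$L and return {file name: [lines]}."""
    files = defaultdict(list)
    for line in lines:
        m = HTTP.search(line)  # assumption: the first URL in the line decides
        if m:
            filename = (m.group(1) or "") + (m.group(2) or "") + ".txt"
            files[filename].append(line)  # $L: save the whole line
    return dict(files)


sample = [
    "GET http://example.com/a",
    "GET http://example.com/b",
    "ref www.example.org/x",
    "no url here",
]
for filename, saved in group_by_domain(sample).items():
    print(filename, saved)
```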