Extract Info From Web Pages 02.06.08
*** Bidding time is actually two days for this project ***
Looking for a PHP script to do the following…
FIRST
- Access http://www.proxz.com/proxy_list_high_anonymous_0_ext.html
- Get the URLs of pages next to the word “Proxylist”
e.g. for “Proxylist 1:2::..” you would need to get the URLs of TWO pages, but this number may vary.
- Now access each URL, extract the proxy addresses and write them in the format “200.132.0.70:3127” to a file called “proxies.txt”
(the list of proxy addresses is HEX encoded but easy to unscramble)
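To give an idea of the unscrambling step, here is a rough sketch. It assumes the list is percent-encoded hex (each character written as %XX); the actual scheme on the page may differ, so check the page source first. The function name decodeHexProxies is made up for illustration.

```php
<?php
// Hypothetical helper: decode a percent-encoded hex string and pull out ip:port pairs.
// ASSUMPTION: each character is encoded as %XX; adjust if the real page uses another scheme.
function decodeHexProxies(string $encoded): array
{
    // rawurldecode() turns every %XX sequence back into the character it stands for.
    $decoded = rawurldecode($encoded);

    // Collect everything that looks like "a.b.c.d:port".
    preg_match_all('/\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b/', $decoded, $matches);

    return $matches[0];
}

// The example address from this brief, "200.132.0.70:3127", written as %XX hex.
$sample = '%32%30%30%2e%31%33%32%2e%30%2e%37%30%3a%33%31%32%37';
print_r(decodeHexProxies($sample));
```

The returned array can then be written to “proxies.txt”, one address per line.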
SECOND
- Extract proxy info from pages http://www.samair.ru/proxy/proxy-01.htm to http://www.samair.ru/proxy/proxy-20.htm
- Append any IP address whose description contains the words “high-anonymous” (including “high-anonymous proxy server” etc.) to proxies.txt
(the port numbers use javascript to write them but the decoding is obvious)
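The page range and the description filter could be handled roughly as below. This is only a sketch: it assumes each page has already been parsed into rows with an 'address' and a 'description' key (a made-up shape) and that the javascript port has already been decoded. Both function names are hypothetical.

```php
<?php
// Build the 20 page URLs named in the brief: proxy-01.htm through proxy-20.htm.
function samairPageUrls(): array
{
    $urls = [];
    for ($page = 1; $page <= 20; $page++) {
        // %02d zero-pads the page number to two digits (01, 02, ... 20).
        $urls[] = sprintf('http://www.samair.ru/proxy/proxy-%02d.htm', $page);
    }
    return $urls;
}

// ASSUMPTION: $rows is a list of ['address' => 'ip:port', 'description' => '...'] arrays
// produced by whatever parsing and port decoding the real page requires.
function keepHighAnonymous(array $rows): array
{
    $kept = [];
    foreach ($rows as $row) {
        // Case-insensitive substring match, so "High-Anonymous proxy server" also qualifies.
        if (stripos($row['description'], 'high-anonymous') !== false) {
            $kept[] = $row['address'];
        }
    }
    return $kept;
}
```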
THIRD
- Append the IP addresses at http://www.proxylists.net/http_highanon.txt to “proxies.txt”
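Something along these lines would do for the append step. It is a sketch that assumes the text file lists entries as "a.b.c.d:port" and ignores anything else on the page; the function name appendProxiesFromText is made up.

```php
<?php
// Extract ip:port entries from plain text and append them to a file, one per line.
// ASSUMPTION: entries look like "a.b.c.d:port"; any other text is skipped.
function appendProxiesFromText(string $text, string $file): int
{
    preg_match_all('/\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b/', $text, $matches);
    if ($matches[0] === []) {
        return 0; // nothing to append
    }

    // FILE_APPEND adds to the end of the file instead of overwriting it.
    file_put_contents($file, implode("\n", $matches[0]) . "\n", FILE_APPEND | LOCK_EX);

    return count($matches[0]);
}
```

The text itself could be fetched with file_get_contents() on the URL, which needs allow_url_fopen to be enabled on the server.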
FINALLY
- Check “proxies.txt” and remove any duplicate entries.
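The duplicate removal could be as simple as the sketch below, assuming one entry per line. The function name removeDuplicateProxies is made up; it keeps the first occurrence of each entry and preserves the original order.

```php
<?php
// Rewrite a one-entry-per-line file without duplicates; returns how many lines were dropped.
function removeDuplicateProxies(string $file): int
{
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $file");
    }

    // Trim stray whitespace (e.g. "\r" from Windows line endings) and drop blank lines.
    $lines = array_values(array_filter(array_map('trim', $lines), fn($line) => $line !== ''));

    // array_unique() keeps the first occurrence of each value.
    $unique = array_values(array_unique($lines));

    $contents = $unique === [] ? '' : implode("\n", $unique) . "\n";
    file_put_contents($file, $contents, LOCK_EX);

    return count($lines) - count($unique);
}
```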



