Freelance Projects

All freelance projects at One Location


Scraper / Web Bot / Data Miner  12.10.08

I used 3 descriptions in my title because I’m not sure which one most accurately describes what I’m looking for. Please don’t feel limited to the “programming options” I listed - I am open to any way of doing this project.

I think this is a fairly simple project. I will be VERY detailed about what I want here so you understand what you are bidding on. Please don’t allow all of my words to make you think the project is complicated :)
I would like you to create a program that will run through the pages of a certain popular web forum. The forum has many different sections. For each section I am interested in, I would like the program to evaluate every thread/post in that section that was made in the last X days.
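A minimal sketch of the "last X days" filter, assuming the scraper has already turned each thread into a dictionary with a `last_post` datetime. The field names and the `recent_threads` helper are hypothetical; the real forum's page layout is not described here.

```python
from datetime import datetime, timedelta

def recent_threads(threads, days, now=None):
    """Keep only the threads whose last post falls within the last `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return [t for t in threads if t["last_post"] >= cutoff]

# Hypothetical sample data standing in for scraped threads.
sample = [
    {"title": "New kitten", "last_post": datetime(2008, 12, 9)},
    {"title": "Old thread", "last_post": datetime(2008, 11, 1)},
]
for thread in recent_threads(sample, days=7, now=datetime(2008, 12, 10)):
    print(thread["title"])
```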

The bot will be looking for a list of words and phrases of my choosing. All I want the bot to do is count how many times EACH word and phrase was used. I would like the ability to create the words/phrases myself and add an unlimited number of words/phrases to the list over time. All words/phrases will belong to one of two groups. Let’s call them Group A and Group B. I’ll explain why a little later.
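One way the counting step could look, as a rough sketch. It assumes matching is case-insensitive and on whole words only (so "cat" is not counted inside "category"); neither rule is stated above, and the `count_phrases` name and the sample word lists are made up for illustration.

```python
import re

def count_phrases(text, phrases):
    """Return {phrase: number of case-insensitive whole-word matches in text}."""
    counts = {}
    for phrase in phrases:
        # The lookarounds stop a match from starting or ending inside a longer word.
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical word lists; new entries can simply be appended over time.
groups = {"A": ["cat", "black cat"], "B": ["dog"]}
post = "My cat chased a black cat. The Cat ignored the dog; see category rules."
for group, phrases in groups.items():
    print(group, count_phrases(post, phrases))
```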

I would also like the program to perform some simple math for me. Again - it’s only counting how many times a word or phrase appears. I would like it to record that number each time a run is made. The math would be taking all of the numbers from the previous runs together and computing the AVERAGE number of times that word or phrase appears. Then take the new number from the current run and have it tell me what percentage increase or decrease the word/phrase shows this time compared to that average.

As an example let’s say I am looking for the word “cat” in a section of the forum. Let’s also say I have previously had the bot search for the word “cat” three other times. On those runs the word was found 3, 8 and 1 times. Since (3 + 8 + 1) divided by 3 (because the search has been made 3 times) = 4, the current run considers 4 to be the average number of times the word appears. On the current run “cat” is found 2 times. This would mean the frequency was 50% lower than average. Let’s call this the “percentage difference”.
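The "cat" example can be checked with a few lines of code. This is only a sketch: the function name is made up, and returning `None` when there is no history or the average is zero is an assumption, since that case is not covered above.

```python
def percentage_difference(previous_counts, current_count):
    """Percent change of the current count relative to the average of earlier runs."""
    if not previous_counts:
        return None  # assumption: nothing to compare against on the first run
    average = sum(previous_counts) / len(previous_counts)
    if average == 0:
        return None  # assumption: a percentage is undefined against a zero average
    return (current_count - average) / average * 100.0

# Earlier runs found "cat" 3, 8 and 1 times; the current run found it 2 times.
print(percentage_difference([3, 8, 1], 2))  # -50.0, i.e. 50% lower than average
```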

Here is where Group A and Group B come into the mix. Take all of the “percentage difference” figures calculated as described above and average them within each group: one average of the percentage differences for all of the words/phrases in Group A, and a separate average for all of those in Group B. This can be called the “overall percentage difference”.
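A sketch of the per-group averaging, assuming each phrase already has its percentage difference and that phrases with no usable figure (marked `None` here) are left out of the average. The function name and the sample numbers are invented for illustration.

```python
from statistics import mean

def overall_percentage_difference(differences):
    """Average the per-phrase percentage differences of one group, skipping None."""
    usable = [d for d in differences.values() if d is not None]
    return mean(usable) if usable else None

# Hypothetical per-phrase results for one forum section.
group_a = {"cat": -50.0, "kitten": 20.0}
group_b = {"dog": 10.0, "puppy": None}

print("Group A:", overall_percentage_difference(group_a))  # -15.0
print("Group B:", overall_percentage_difference(group_b))  # 10.0
```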

This process will be repeated for all of the many sections of the forum that I point the program to (I don’t want all sections to be used because I think that would be too massive and time consuming).

The same words and phrases will be used for all sections. Ultimately I want to find which sections of the forum are currently showing the greatest “overall percentage difference” for Group A and also Group B.
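Ranking the sections could be done roughly like this. The sketch assumes "greatest" means the largest signed value (biggest increase first); sorting by absolute size instead would be a one-line change. Section names and numbers are made up.

```python
def rank_sections(results, group):
    """Sort sections by one group's overall percentage difference, largest first."""
    scored = [
        (section, by_group[group])
        for section, by_group in results.items()
        if by_group.get(group) is not None
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical overall percentage differences per section and group.
results = {
    "General": {"A": 12.5, "B": -3.0},
    "Pets": {"A": 40.0, "B": None},
    "Off-topic": {"A": -20.0, "B": 8.0},
}
for group in ("A", "B"):
    print(group, rank_sections(results, group))
```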

In the end I would like the data put into a spreadsheet or better yet on a website where I can easily sort the data in different ways.
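For the spreadsheet option, a CSV file opens directly in most spreadsheet programs and can be sorted there. A minimal sketch using Python's standard `csv` module; the column names and the sample rows are assumptions, not a required layout.

```python
import csv
import io

FIELDNAMES = ["section", "group", "overall_percentage_difference"]

def write_report(rows, out):
    """Write one CSV row per (section, group) result to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical results; with a real file, open it with newline="" as the csv docs advise.
rows = [
    {"section": "Pets", "group": "A", "overall_percentage_difference": 40.0},
    {"section": "General", "group": "B", "overall_percentage_difference": -3.0},
]
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```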

I have a computer I can dedicate exclusively to this program OR I could run it off of a server or hosting of some kind.

BEFORE BIDDING PLEASE NOTE: price matters very much to me. Again I don’t think this is a super complicated project. I want a qualified programmer but I am unable to overspend on this project. Please be fair in your bidding because I always take the lowest qualified bid.

Thanks for looking at my project. I would be happy to further explain things to you via private messages. Please ask any questions you may have.


