The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Need Help Please - automatic crawling of website
Posted by: HelpPls
Date: November 16, 2013 07:20AM

I will lay out as clearly as I can exactly what I need, and hopefully someone can suggest software to use or a way to automate this process. If it is software, here is what I need it to do.

1. Log in to a secured site (yes, I have a subscription to this site).
2. The next page is a list of states. Click on a state.
3. The next page is a list of businesses. Click on each business.
4. The next page (sometimes several pages, depending on the size of the company) is a list containing contact, department, and email.
   Copy this page or pages and place the contents in an Excel document.

Any ideas on how to automate this, or software to use, will help.
Thank you



Edited 1 time(s). Last edit at 11/16/2013 07:44AM by webmasterphilfv.

Re: Need Help Please
Date: November 16, 2013 07:44AM

This is not related to "data mining". I think you just need a web crawler. For a dynamic website with forms, you would probably need to write your own web crawler, if you know how to program.
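To give an idea of what such a hand-written crawler could look like, here is a minimal Python sketch of the walk described in the question (state list, then business list, then contact pages, then a spreadsheet). It is only an illustration under several assumptions: the names `LinkAndRowParser`, `parse`, `crawl`, and `to_csv` are made up for this example, every link on a list page is assumed to lead to the next level, the contacts are assumed to sit in an HTML table, and the login step and multi-page contact lists are left out. The `fetch` argument stands for any function that returns the HTML of a URL, for example one built on a session that is already logged in.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndRowParser(HTMLParser):
    """Collects the href of every <a> tag and the cell text of every <tr>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.rows = []
        self._row = None   # cells of the table row being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse(html):
    """Return (links, table_rows) found in one HTML page."""
    parser = LinkAndRowParser()
    parser.feed(html)
    return parser.links, parser.rows


def crawl(fetch, start_url):
    """Walk states -> businesses -> contact pages and return all contact rows.

    Assumption: every link on a list page points to the next level.
    """
    contacts = []
    state_links, _ = parse(fetch(start_url))
    for state_link in state_links:
        state_url = urljoin(start_url, state_link)
        business_links, _ = parse(fetch(state_url))
        for business_link in business_links:
            business_url = urljoin(state_url, business_link)
            _, rows = parse(fetch(business_url))
            contacts.extend(rows)
    return contacts


def to_csv(rows):
    """Render the rows as CSV text, a format that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["contact", "department", "email"])
    writer.writerows(rows)
    return buffer.getvalue()
```

On a real site the parsing rules would have to be adapted to the actual page layout, and the text returned by `to_csv` would be saved to a `.csv` file.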

Besides, you should read the license agreement for the website. I doubt that a subscription-based website will allow you to automatically download everything. There is a high chance that they have some way to detect when someone accesses too many pages within a short period of time, and they may ban your account and/or your IP address if you do it.
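If the license agreement does allow automated access, a crawler should still space out its requests. The following Python sketch shows one possible way to do that; the helper name `throttled` and its parameters are assumptions made for this example, and a delay of this kind does not replace permission from the website.

```python
import time


def throttled(fetch, min_interval, clock=time.monotonic, sleep=time.sleep):
    """Wrap fetch so that calls are at least min_interval seconds apart.

    clock and sleep can be replaced, which makes the wrapper easy to test.
    """
    last_finished = [None]  # time at which the previous call finished

    def wrapper(url):
        if last_finished[0] is not None:
            wait = min_interval - (clock() - last_finished[0])
            if wait > 0:
                sleep(wait)
        result = fetch(url)
        last_finished[0] = clock()
        return result

    return wrapper
```

For example, `slow_fetch = throttled(fetch, 5.0)` would give a version of a hypothetical `fetch` function that waits at least five seconds between two page requests.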



Edited 4 time(s). Last edit at 11/16/2013 07:46AM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).