Web Content Extractor Demo 2

This demo shows you how to extract data from listing pages using Web Content Extractor.

First, click the “New Project” button to create a new project. During the first step of the “New Project” wizard, enter the web address from which the program will start the crawling process.

Then identify the links that the program should follow. Basic rules identify links by their position on the page, while advanced rules identify links by URL patterns. If the links stay in the same position on every page of the web site, try the basic rules. Click the “+” button, wait until the page is loaded, and click on the “Next” link.

Next, you need to create an extraction pattern. Click the “Define…” button. An extraction pattern is a set of data fields that define the positions of text and images on the web page. To add a new data field, click the “+” button, wait until the page is loaded, and click on the text or image you need to extract. The program determines the HTML path of the element that contains the title text and displays the “New Data Field” window, which allows you to specify the other parameters of the new data field. You can create other data fields in a similar fashion. Once all the data fields are created, click OK.

In the preview window you can see the selected data fields on the web page, along with the extracted text. Once the extraction pattern is created, click OK. Enter the name of the project and click “Finish”.

To start the extraction process, click “Start”. The program starts crawling from the starting page, follows the specified links, and extracts data using the extraction pattern.

Thank you for your attention.
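
For readers who want to see the same workflow expressed in code, here is a minimal Python sketch of the idea: start from one listing page, follow only the links that match a URL pattern (similar in spirit to an advanced rule), and apply a set of data fields to each page. It is not how Web Content Extractor works internally. The names PAGES, LINK_RULE and EXTRACTION_PATTERN, the example.com addresses, and the sample markup are all hypothetical, and the sample pages are assumed to be well-formed so that the standard library's XML parser can read them. Real-world HTML would usually need a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site": URL -> well-formed markup of a listing page.
PAGES = {
    "http://example.com/list?page=1": """
<html><body>
  <div class="item"><h2>Red Widget</h2><img src="/img/red.png"/></div>
  <div class="item"><h2>Blue Widget</h2><img src="/img/blue.png"/></div>
  <a href="http://example.com/about">About</a>
  <a href="http://example.com/list?page=2">Next</a>
</body></html>""",
    "http://example.com/list?page=2": """
<html><body>
  <div class="item"><h2>Green Widget</h2><img src="/img/green.png"/></div>
  <a href="http://example.com/list?page=1">Previous</a>
</body></html>""",
}

# Assumed link rule: follow only URLs that look like listing pages.
LINK_RULE = re.compile(r"/list\?page=\d+$")

# Assumed extraction pattern: field name -> (path inside an item, attribute).
# An attribute of None means "take the element's text".
EXTRACTION_PATTERN = {
    "title": ("h2", None),
    "image": ("img", "src"),
}


def extract_records(root):
    """Apply every data field to each listing item found on one page."""
    records = []
    for item in root.iterfind(".//div[@class='item']"):
        record = {}
        for name, (path, attr) in EXTRACTION_PATTERN.items():
            node = item.find(path)
            if node is None:
                record[name] = None
            elif attr is None:
                record[name] = (node.text or "").strip()
            else:
                record[name] = node.get(attr)
        records.append(record)
    return records


def crawl(start_url, pages):
    """Visit pages breadth-first, following links that match LINK_RULE."""
    queue = [start_url]
    seen = {start_url}
    results = []
    while queue:
        url = queue.pop(0)
        markup = pages.get(url)
        if markup is None:
            continue  # Unknown URL in this sketch; a real crawler would fetch it.
        root = ET.fromstring(markup.strip())
        results.extend(extract_records(root))
        for link in root.iter("a"):
            href = link.get("href", "")
            if LINK_RULE.search(href) and href not in seen:
                seen.add(href)
                queue.append(href)
    return results
```

Calling crawl("http://example.com/list?page=1", PAGES) returns one dictionary per listing item, collected across both sample pages. The “About” link is skipped because it does not match the link rule, and the “Previous” link is skipped because its page has already been visited.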

