Web Scraping: Extracting Key Data from the Augustana Athletics Website

Developing our Augustana Athletics application requires collecting large amounts of data from the Augustana Athletics website. For this purpose, we have decided to use web scraping (also called screen scraping, web harvesting, or web data extraction), a technique for extracting large amounts of data from websites; in our case, the official Augustana Athletics website.

Typically, the scraped information is collected and then exported to a format that is more useful to the user. Web scraping can be done manually; however, many software tools are available for the purpose.

During our tech demo on web scraping, we showed the class how to use jsoup, an open-source Java library, in an Android application to help with web scraping. jsoup provides a convenient application programming interface (API) for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods. As a result, it can scrape and parse HTML from a URL, a file, or a string.
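
The three styles of access mentioned above can be sketched as follows. This is a minimal, hedged example, assuming jsoup is on the classpath (in an Android project, via the `org.jsoup:jsoup` Gradle dependency). The HTML snippet and the base URI `https://example.edu/` are hypothetical placeholders; the page is parsed from a string so the example runs offline. For a live page, `Jsoup.connect(url).get()` fetches and parses in one step, and `Jsoup.parse(File, charsetName)` reads a saved file.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupBasics {

    // Parse an HTML string. The base URI (a hypothetical placeholder) lets
    // relative links such as "/news/1" be resolved with absUrl().
    static Document parseSample() {
        String html = "<html><head><title>Augustana Athletics</title></head>"
                + "<body><div id='news'>"
                + "<a href='/news/1'>Game recap</a>"
                + "<a href='/news/2'>Season preview</a>"
                + "</div></body></html>";
        return Jsoup.parse(html, "https://example.edu/");
    }

    // CSS-selector (jQuery-like) query: every link inside the element with id "news".
    static Elements newsLinks(Document doc) {
        return doc.select("#news a");
    }

    public static void main(String[] args) {
        Document doc = parseSample();
        System.out.println(doc.title());

        // DOM-style access: look up an element by id and count its children.
        Element news = doc.getElementById("news");
        System.out.println("links in #news: " + news.children().size());

        for (Element link : newsLinks(doc)) {
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
        // Fetching a live page instead (throws IOException):
        // Document live = Jsoup.connect("https://...").get();
    }
}
```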

As part of our presentation, we demonstrated how to create three buttons, each performing a different task: showing the website title, the website description, and the website logo. The presentation also included a walkthrough of the Java code that retrieves the relevant piece of data for each button.
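
The extraction logic behind the three buttons might look roughly like the sketch below. It is not the code from our demo; it is an illustration assuming the description lives in a `<meta name="description">` tag and the logo is an `<img>` with the hypothetical class `logo`. The real selectors depend on the site's actual markup. The sample page is an invented placeholder, parsed from a string so the example runs offline.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiteInfo {

    // Button 1: the text of the page's <title> element.
    static String title(Document doc) {
        return doc.title();
    }

    // Button 2: the content of <meta name="description">, or "" if the page has none.
    static String description(Document doc) {
        Element meta = doc.selectFirst("meta[name=description]");
        return meta == null ? "" : meta.attr("content");
    }

    // Button 3: absolute URL of the logo image. "img.logo" is a hypothetical
    // selector; inspect the real page to find the right one.
    static String logoUrl(Document doc) {
        Element img = doc.selectFirst("img.logo");
        return img == null ? "" : img.absUrl("src");
    }

    public static void main(String[] args) {
        // Placeholder page; on a device you would call Jsoup.connect(url).get()
        // on a background thread instead.
        String html = "<html><head><title>Augustana Athletics</title>"
                + "<meta name='description' content='Official athletics site'></head>"
                + "<body><img class='logo' src='/images/logo.png'></body></html>";
        Document doc = Jsoup.parse(html, "https://example.edu/");
        System.out.println(title(doc));
        System.out.println(description(doc));
        System.out.println(logoUrl(doc));
    }
}
```

In an Android app, each button's click listener would call one of these helpers. Because Android throws `NetworkOnMainThreadException` for network access on the UI thread, the `Jsoup.connect(...).get()` call has to run on a background thread, with the result posted back to the UI (for example with `Activity.runOnUiThread`), and the manifest must declare the `android.permission.INTERNET` permission.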

We plan to use the same scraping technique to retrieve data such as the schedule, the roster, and the score of each game. All of this data is essential to the development of our Augustana Athletics application.
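
Structured data such as a schedule usually sits in an HTML table, so one plausible approach is to walk the table rows and turn each into a record. The sketch below assumes a hypothetical table with id `schedule` and three columns (date, opponent, result); the team names and scores in the sample are invented placeholders, not real Augustana data. The actual selectors and columns would need to be read off the live site, and a roster or box-score page could be handled the same way with different selectors.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ScheduleScraper {

    // One row of the (hypothetical) schedule table.
    static final class Game {
        final String date;
        final String opponent;
        final String result;

        Game(String date, String opponent, String result) {
            this.date = date;
            this.opponent = opponent;
            this.result = result;
        }

        @Override
        public String toString() {
            return date + " | " + opponent + " | " + result;
        }
    }

    // Collects every row of table#schedule that has at least three <td> cells.
    // Header rows use <th>, so they have no <td> cells and are skipped.
    static List<Game> parseSchedule(Document doc) {
        List<Game> games = new ArrayList<>();
        for (Element row : doc.select("table#schedule tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 3) {
                continue;
            }
            games.add(new Game(cells.get(0).text(), cells.get(1).text(), cells.get(2).text()));
        }
        return games;
    }

    public static void main(String[] args) {
        // Invented placeholder markup standing in for the real schedule page.
        String html = "<table id='schedule'>"
                + "<tr><th>Date</th><th>Opponent</th><th>Result</th></tr>"
                + "<tr><td>Sep 7</td><td>Team A</td><td>W 2-1</td></tr>"
                + "<tr><td>Sep 14</td><td>Team B</td><td>L 0-3</td></tr>"
                + "</table>";
        for (Game game : parseSchedule(Jsoup.parse(html))) {
            System.out.println(game);
        }
    }
}
```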
