Scrape Website For Data Using Python’s BeautifulSoup
I was designing a website, an online shopping platform, and I wanted some data to use for demo purposes. At first I thought of hard-coding the details, but that is just too much work.
So I thought, why not be creative about it, and I came up with this piece of code.
For starters, create a file called scrape.py and write the code below.
— —
You can use any website you want.
We make a request to the site of choice, in this case asterixkenya.co.ke,
then download the page content using the request module.
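Here is a minimal sketch of the download step. It assumes the request module mentioned above is the third-party requests library; the function name download_page and the 10-second timeout are my own illustrative choices, not part of the original script.

```python
import requests


def download_page(url, timeout=10):
    """Download a page and return its HTML as text."""
    # A timeout keeps the script from hanging forever on a slow server.
    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


# Example call (needs network access):
# html = download_page("https://asterixkenya.co.ke")
```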
We define the getUrls() method, which uses the downloaded page content and returns an array (a list in Python) of links pointing to products on the page.
The BeautifulSoup module parses the HTML content and, through the use of regular expressions, lets you find any element on the page.
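A rough sketch of what a getUrls() like this could look like. The "/product/" pattern is an assumption based on the product link used later in this post, and the base_url parameter is added here only so that relative links can be resolved; adjust both for your own site.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def getUrls(html, base_url="https://asterixkenya.co.ke"):
    """Return a list of product links found in the downloaded HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: product pages have "/product/" somewhere in their URL.
    pattern = re.compile(r"/product/")
    links = []
    # find_all accepts a compiled regex as an attribute filter.
    for anchor in soup.find_all("a", href=pattern):
        url = urljoin(base_url, anchor["href"])  # make relative links absolute
        if url not in links:  # skip duplicates, keep page order
            links.append(url)
    return links
```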
The fetch_data() method visits the product URL and gets the piece of info that you want from the page.
You do this using a class, id, or any HTML selector you want, but to get one specific item you need to use a selector that is unique on the page.
For example:
Visit https://asterixkenya.co.ke/product/samsung-galaxy-a10s-6-2-32gb-2gb-dual-sim/
Let’s get the title of the product samsung-galaxy-a10s. Hover over the image, right-click and select Inspect Element (or press Ctrl+Shift+I), and you will see that the class of the title is “product_title”. Hence the implementation above.
Finally, we return a dictionary of the items we need from that page.
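As a sketch of the idea, here is one way fetch_data() could be written. The only selector taken from this post is the product_title class; the helper parse_product, the use of the requests library, and the "url" key in the result are assumptions made for the example.

```python
import requests
from bs4 import BeautifulSoup


def parse_product(html):
    """Pull the fields we care about out of a product page."""
    soup = BeautifulSoup(html, "html.parser")
    # Matches any element whose class list contains "product_title".
    title_tag = soup.find(class_="product_title")
    # find() returns None when nothing matches, so guard against that.
    title = title_tag.get_text(strip=True) if title_tag else None
    return {"title": title}


def fetch_data(url):
    """Visit a product URL and return a dictionary describing the product."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    data = parse_product(response.text)
    data["url"] = url  # remember where the item came from
    return data
```

Keeping the parsing in its own function means you can try new selectors on a saved HTML file without hitting the site every time.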
Next we define the init() method, which starts the download process. Here we see three new methods, which I am going to cover below.
To speed up the process, I used a separate thread for every five links.
The dowloadThread() method calls the fetch_data() method to follow a product URL and return a dictionary, which is added to the variable that holds all our data: self.data
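The batching idea can be sketched like this, using only the standard library. The worker keeps the dowloadThread name from above, but its parameters, the start_threads helper, and the lock are assumptions for the sake of a self-contained example; in the real script the data lives on self.data.

```python
import threading

BATCH_SIZE = 5  # one thread per five links


def dowloadThread(links, fetch, data, lock):
    """Worker: follow each link in the batch and store the resulting dictionary."""
    for url in links:
        item = fetch(url)
        with lock:  # make the write to the shared list explicit and safe
            data.append(item)


def start_threads(links, fetch, data):
    """Start one thread per batch of links and return the started threads."""
    lock = threading.Lock()
    threads = []
    for start in range(0, len(links), BATCH_SIZE):
        batch = links[start:start + BATCH_SIZE]
        thread = threading.Thread(target=dowloadThread, args=(batch, fetch, data, lock))
        thread.start()
        threads.append(thread)
    return threads
```

Here fetch stands for whatever function turns a URL into a dictionary, for example fetch_data.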
rangeCheck() lets us handle the last batch, which may have fewer than five links left in it.
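One possible shape for such a check is to clamp the end of each batch to the number of links. The signature below is hypothetical, not taken from the original script.

```python
def rangeCheck(start, total, batch_size=5):
    """Return the end index of the batch starting at `start`, never past `total`."""
    return min(start + batch_size, total)


links = ["link%d" % i for i in range(12)]
batches = [links[s:rangeCheck(s, len(links))] for s in range(0, len(links), 5)]
print([len(b) for b in batches])  # [5, 5, 2]
```

Python slices already stop at the end of a list, so the check matters most when you loop over indexes by hand, where going past the end raises an IndexError.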
The joinThreads() method waits for all the child threads to finish, joining them back into the main thread, and finally calls the write_to_file() method, which converts our data to JSON format and writes it to a file, data.json.
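A small sketch of these two steps with the standard json and threading modules. The method names come from this post; the parameters (a list of threads, the data, and an optional path) are assumptions so the example can stand alone.

```python
import json
import threading


def write_to_file(data, path="data.json"):
    """Convert the collected data to JSON and write it to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def joinThreads(threads, data, path="data.json"):
    """Wait for every worker thread, then save the results."""
    for thread in threads:
        thread.join()  # blocks until this thread has finished
    write_to_file(data, path)
```

Writing only after every join() returns means the file is never saved while a worker is still adding items.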
To initialize the script, the init() method is called in the scraper class constructor.
To start the script, we create an instance of the scraper class.
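The wiring looks roughly like this. It is a hypothetical skeleton: the class name Scraper, the start_url parameter, and the started flag are placeholders, and init() is stubbed out so the snippet runs on its own.

```python
class Scraper:
    """Skeleton showing only how the constructor kicks off init()."""

    def __init__(self, start_url="https://asterixkenya.co.ke"):
        self.start_url = start_url
        self.data = []  # filled in by the worker threads
        self.init()     # creating the object starts the run

    def init(self):
        # Stand-in for the real work: download the page, collect the links,
        # start the threads, join them, and write the file.
        self.started = True


scraper = Scraper()  # no extra method call needed
```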
And that takes care of it.
—
Play around with the code and see where it leads and let me know in the comments below.