Scrape Website For Data Using Python’s BeautifulSoup

brian kiplangat
3 min read · Nov 28, 2020

I was designing a website, an online shopping platform, and I wanted data for demo purposes. At first I thought of hard-coding the details, but that is just too much work.

So I thought, why not be creative about it? I came up with this piece of code.

For starters, create a file called scrape.py and write the code below.


Now, you can use any website you want. We make a request to the site of choice, in this case asterixkenya.co.ke, then download the page content using the requests module.

We define a getUrls() method, which parses the downloaded page content and returns an array (a list in Python) of links pointing to products on the page.

The BeautifulSoup module parses the HTML content and, with optional regular-expression matching, lets you find any element on the page.
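Since the original gist isn't embedded here, a minimal sketch of what getUrls() might look like. The sample HTML and the /product/ pattern are my assumptions; in the real script the page content would come from requests.get("https://asterixkenya.co.ke").text:

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Stand-in for the downloaded page content; the real script would
# pass in the HTML fetched with the requests module.
SAMPLE_HTML = """
<html><body>
  <a href="https://asterixkenya.co.ke/product/samsung-galaxy-a10s-6-2-32gb-2gb-dual-sim/">Galaxy A10s</a>
  <a href="https://asterixkenya.co.ke/about/">About us</a>
  <a href="https://asterixkenya.co.ke/product/another-phone/">Another phone</a>
</body></html>
"""

def getUrls(page_content):
    """Return a list of links pointing to products on the page."""
    soup = BeautifulSoup(page_content, "html.parser")
    # A regular expression keeps only hrefs that point at product pages.
    return [a["href"] for a in soup.find_all("a", href=re.compile(r"/product/"))]
```

Non-product links like the About page are filtered out by the regular expression.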

The fetch_data() method visits a product URL and extracts the pieces of information you want from the page.

You do this using a class, an id, or any HTML selector you want, but to get a specific item you need a selector that is unique on the page.

For example, visit https://asterixkenya.co.ke/product/samsung-galaxy-a10s-6-2-32gb-2gb-dual-sim/

Let's get the title of the samsung-galaxy-a10s product: hover over the title, right-click and select Inspect Element (or press Ctrl+Shift+I), and you will see that its class is “product_title”. Hence the implementation above.

Finally, we return a dictionary of the items we need from that page.
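A hedged sketch of fetch_data(), using the product_title class found above. The sample HTML and the "price" class are made up for illustration; the real method would download the live product page first:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Sample product page standing in for the real download; "product_title"
# comes from inspecting the live page, while "price" is an assumed class.
PRODUCT_HTML = """
<html><body>
  <h1 class="product_title">Samsung Galaxy A10s 6.2&quot; 32GB 2GB Dual SIM</h1>
  <span class="price">KSh 12,999</span>
</body></html>
"""

def fetch_data(page_content):
    """Return a dictionary of the items we need from a product page."""
    soup = BeautifulSoup(page_content, "html.parser")
    return {
        "title": soup.find(class_="product_title").get_text(strip=True),
        "price": soup.find(class_="price").get_text(strip=True),
    }
```

Because "product_title" is unique on the page, soup.find() is enough; no looping over results is needed.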

Next, we define the init() method, which starts the download process. Here we see three new methods, which I am going to cover below.

To speed up the process, I used a separate thread for every batch of five links.

The downloadThread() method calls fetch_data() to follow a product URL and return a dictionary, which is appended to the variable that holds all our data, self.data.

rangeCheck() lets us handle the last batch, which may contain fewer than five links.

The joinThreads() method waits for all the child threads to finish and joins them back into the main thread. Finally, we call the write_to_file() method, which converts our data to JSON and writes it to a file, data.json.
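Putting those pieces together, a minimal sketch of the threading machinery. The method names follow the article, but the batch-size constant, the lock, and the fetch parameter are my assumptions about the original code:

```python
import json
import threading

class Scraper:
    BATCH_SIZE = 5  # one thread per batch of five links

    def __init__(self, urls, fetch):
        self.urls = urls      # product links returned by getUrls()
        self.fetch = fetch    # a fetch_data()-style callable
        self.data = []        # holds all our scraped dictionaries
        self.threads = []
        self.lock = threading.Lock()

    def rangeCheck(self, start):
        # Handles the last batch, which may hold fewer than BATCH_SIZE links.
        return min(start + self.BATCH_SIZE, len(self.urls))

    def downloadThread(self, batch):
        # Follow each product URL and append the resulting dict to self.data.
        for url in batch:
            item = self.fetch(url)
            with self.lock:   # self.data is shared between threads
                self.data.append(item)

    def joinThreads(self):
        # Wait for all child threads and join them back to the main thread.
        for t in self.threads:
            t.join()

    def init(self):
        # Spawn one thread per batch of five links, then wait and write out.
        for start in range(0, len(self.urls), self.BATCH_SIZE):
            batch = self.urls[start:self.rangeCheck(start)]
            t = threading.Thread(target=self.downloadThread, args=(batch,))
            t.start()
            self.threads.append(t)
        self.joinThreads()
        self.write_to_file()

    def write_to_file(self):
        # Convert our data to JSON and write it to data.json.
        with open("data.json", "w") as f:
            json.dump(self.data, f, indent=2)
```

The lock around self.data.append() is a detail worth keeping: lists are shared state, and guarding the append keeps the threads from interleaving badly.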

To initialize the script, the init() method is called in the Scraper class constructor.

To start the script, we simply create an instance of the Scraper class.
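A minimal skeleton of that start-up flow. In the real script init() spawns the download threads; here it only records that the run began:

```python
class Scraper:
    def __init__(self):
        self.data = []
        self.init()  # the constructor kicks off the download process

    def init(self):
        # Stand-in for the real method that spawns and joins the
        # download threads.
        self.data.append("started")

scraper = Scraper()  # creating the instance starts the whole script
```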

And that takes care of it.

Play around with the code, see where it leads, and let me know in the comments below.
