You are required to scrape data from the IMDb Top 250 movies page. The output should have only three fields: movie name, year, and rating.
Answers
Answer:
Hi mate!
Here we go:
Step 1:
The page to scrape is “www.imdb.com/chart/top”.
Then we have to import a few libraries, starting with ‘BeautifulSoup’, an HTML parsing library used mainly for web scraping:
>>>> from bs4 import BeautifulSoup
To fetch the page behind a URL, we also have to import the ‘requests’ library:
>>>> import requests
Create a variable ‘url’ that holds the IMDb URL:
>>>> url = 'http://www.imdb.com/chart/top'
We have to make a request to the URL to get the page content, and assign the result to a variable called ‘resp’:
>>>> resp = requests.get(url)
To check whether the response object was created, type ‘resp’:
>>>> resp
If the response object was created, we will get output like this:
<Response [200]>
[200] means that the request completed successfully.
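The request-and-check step above can also be wrapped in a small helper. This is only a minimal sketch, not part of the original walkthrough: the name `fetch_html`, the timeout value, and the User-Agent string are assumptions added for illustration. Some sites reject requests that carry the library's default User-Agent, so the header is set explicitly here.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, raising if the request failed."""
    # Hypothetical header value; any browser-like string would do.
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so an error response cannot slip through silently.
    resp.raise_for_status()
    return resp.text


# Usage (needs network access):
# html = fetch_html("http://www.imdb.com/chart/top")
```

Raising on a bad status means later parsing code never runs on an error page by accident.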
Now we will create a ‘BeautifulSoup’ object, which parses the response text into a tree we can search:
>>>> soup = BeautifulSoup(resp.text, 'html.parser')
To check that the ‘soup’ variable holds the page content, just type ‘soup’ and it will print the whole HTML document:
>>>> soup
[Image: all the information from the URL]
Now we will open the URL in the browser and inspect the page, and parse the data according to what we find there:
[Image: inspect window]
We can see from the above image that the whole body section sits under the class “lister-list”.
We will create a variable to store the body section:
>>>> llist = soup.find('tbody', {'class': 'lister-list'})
The output starts from the <tbody> tag.
This variable now contains the whole body section of the table.
Next, we want every <tr> row inside that body, collected in a variable called ‘trs’ (the loops below iterate over it).
Each entry in ‘trs’ holds one table row, from <tr> to </tr>.
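The row-collection step can be sketched as follows. This is an illustration under assumptions, not the original snippet: `SAMPLE_HTML` is a made-up stand-in for the page with placeholder titles and values, and the call that builds `trs` is inferred from the fact that the loops iterate over a variable of that name.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page (placeholder data, two rows instead of 250).
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn"><a href="/title/a/">Movie A</a> <span class="secondaryInfo">(1994)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn"><a href="/title/b/">Movie B</a> <span class="secondaryInfo">(1972)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.1</strong></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching element (or None if nothing matches).
llist = soup.find("tbody", {"class": "lister-list"})

# find_all() returns every <tr> nested inside that body.
trs = llist.find_all("tr")
print(len(trs))  # prints 2 for this sample
```

On the live page, `llist` will be `None` if the class name no longer matches, so it is worth checking before calling `find_all`.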
Now we want something more specific, the title column of each row:
>>>> for tr in trs:
Not yet the desired output
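A loop body for this stage might look like the sketch below. The class name `titleColumn` is an assumption based on the "title column" mentioned above, and the row is a made-up placeholder; check the inspect window for the class actually used on the live page.

```python
from bs4 import BeautifulSoup

# One placeholder row standing in for an entry of trs.
row_html = '<tr><td class="titleColumn"><a href="/title/a/">Movie A</a></td></tr>'
tr = BeautifulSoup(row_html, "html.parser").find("tr")

# Looking up the cell returns the whole <td> element, markup included,
# which is why the printed result still contains HTML tags.
cell = tr.find("td", {"class": "titleColumn"})
print(cell)
```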
We are still getting some HTML tags, but we want the simplest form. For that we have to refine the loop so that it fetches only the text:
>>>> for tr in trs:
This will give the desired output.
[Image: final output]
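To pull out the text only, and to cover all three fields the task asks for (movie name, year, rating), one possible version is sketched below. The class names `titleColumn`, `secondaryInfo` and `imdbRating`, the helper name `parse_rows`, and the sample rows are all assumptions for illustration. The live page's markup may differ, so confirm each class in the inspect window first.

```python
from bs4 import BeautifulSoup

# Placeholder markup standing in for the real page.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn"><a href="/title/a/">Movie A</a> <span class="secondaryInfo">(1994)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn"><a href="/title/b/">Movie B</a> <span class="secondaryInfo">(1972)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.1</strong></td>
    </tr>
  </tbody>
</table>
"""


def parse_rows(html):
    """Return a list of dicts with name, year and rating for each row."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("tbody", {"class": "lister-list"})
    movies = []
    for tr in body.find_all("tr"):
        title_cell = tr.find("td", {"class": "titleColumn"})
        # The link text is the movie name, without any surrounding tags.
        name = title_cell.a.get_text(strip=True)
        # The year is rendered as "(1994)", so strip the parentheses.
        year_text = title_cell.find("span", {"class": "secondaryInfo"}).get_text(strip=True)
        year = int(year_text.strip("()"))
        # A class lookup matches any one of the cell's classes.
        rating = float(tr.find("td", {"class": "imdbRating"}).get_text(strip=True))
        movies.append({"name": name, "year": year, "rating": rating})
    return movies


for movie in parse_rows(SAMPLE_HTML):
    print(movie["name"], movie["year"], movie["rating"])
```

Returning a list of dicts keeps the three fields together, which makes it easy to write them to a CSV file or load them into a table later.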
Thank you if you have reached this point of the article. It's my first programming post, so feel free to share your feedback.