Computer Science, asked by gunjan1612, 10 months ago

You are required to scrape data from the IMDb Top 250 movies page. The output should have only these fields: movie name, year and rating.

Answers

Answered by suniltty180

Answer:

Hi mate!

Here we go:

Step 1:

Open the IMDb Top 250 page: www.imdb.com/chart/top

Then we have to import a few libraries. The first is ‘BeautifulSoup’, an HTML parsing library used mainly for web scraping:

>>>> from bs4 import BeautifulSoup

To download the page over HTTP, we also have to import the ‘requests’ library:

>>>> import requests

Create a variable ‘url’ that holds the IMDb URL:

>>>> url = 'http://www.imdb.com/chart/top'

We have to make a request to the URL to get the page text, and assign the result to a variable called ‘resp’:

>>>> resp = requests.get(url)

To check whether the response object was created, type its name:

>>>> resp

If the response object was created, we will get output like this:

<Response [200]>

[200] means that the request completed successfully.
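Here is a minimal sketch of the request step with an explicit status check, so a failed download is never parsed by mistake. The helper name `fetch_page`, the `HEADERS` dictionary and the 10-second timeout are my own assumptions for illustration, not part of the steps above. Some sites reject the default `requests` user agent, so a browser-like header is sent here.

```python
import requests

URL = "http://www.imdb.com/chart/top"

# Hypothetical browser-like header; an assumption, not from the original steps.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(url, headers=HEADERS, timeout=10):
    """Return the page HTML as text, or raise if the request failed."""
    resp = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so only a successful response reaches the parsing steps.
    resp.raise_for_status()
    return resp.text
```

Calling `fetch_page(URL)` would then return the same text as `resp.text` in the steps above.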

Now we will create a ‘BeautifulSoup’ object that holds the parsed page. Naming the parser explicitly avoids a warning from the library:

>>>> soup = BeautifulSoup(resp.text, 'html.parser')

To check whether the ‘soup’ variable holds the whole page, just type ‘soup’ and you will get all the information as text:

>>>> soup

All information of URL
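The parsing step can be tried without any network call. In this small sketch, `sample_html` is a made-up stand-in for `resp.text`, so the names and content in it are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for resp.text so the example runs offline.
sample_html = (
    "<html><head><title>Top 250</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Passing "html.parser" selects Python's built-in parser explicitly.
soup = BeautifulSoup(sample_html, "html.parser")

print(soup.title.string)  # text inside the <title> tag
print(soup.prettify())    # indented view of the whole document
```

Printing `soup.prettify()` is usually easier to read than the raw `soup` output when the page is large.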

Now we will open the URL in a browser, inspect the page, and parse the data according to what we find:

Inspect Window

We can see in the Inspect window that the whole body section of the table lies under the class “lister-list”.

We will create a variable to store the body section:

>>>> llist = soup.find('tbody', {'class': 'lister-list'})

starts from <tbody>

This contains all the information in the body section.

Now we want all the <tr> rows inside that body section.

Step 10:

All the <tr> rows are collected here (the loops further down refer to them as ‘trs’).

from <tr> to </tr>
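Collecting the rows can be sketched as follows. The sample HTML is hand-written to imitate the table layout described above, and using `find_all` to build `trs` is my assumption about this step, since the loops that follow only show that a variable called `trs` exists.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating the table body described above.
sample_html = """
<table>
  <tbody class="lister-list">
    <tr><td>1. The Shawshank Redemption</td></tr>
    <tr><td>2. The Godfather</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
llist = soup.find("tbody", {"class": "lister-list"})

# find() returns None when nothing matches, so fail early with a clear message.
if llist is None:
    raise ValueError("no <tbody class='lister-list'> found in the page")

# find_all() returns every <tr> tag nested inside the body section.
trs = llist.find_all("tr")
print(len(trs))
```

The `None` check matters on the live site: if the page markup differs from the sample, `find` returns `None` and the next call would otherwise fail with a confusing error.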

Now we want specific information. First we fetch the title column:

>>>> for tr in trs:

Not yet the desired output

We are still getting some HTML tags, but we want the simplest form. For that we have to refine the loop so that it fetches only the text:

>>>> for tr in trs:

This will give the desired output.

Final Output
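Putting the loop together, here is a rough sketch that keeps only movie name, year and rating. It runs on hand-written sample rows with illustrative values. The class names “titleColumn”, “secondaryInfo” and “imdbRating” are assumptions based on an older IMDb chart layout, and the live page's markup may differ, so check them in the Inspect window first.

```python
from bs4 import BeautifulSoup

# Sample rows imitating an older IMDb chart layout (class names are assumptions).
sample_html = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">1. <a href="#">The Shawshank Redemption</a>
        <span class="secondaryInfo">(1994)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.3</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">2. <a href="#">The Godfather</a>
        <span class="secondaryInfo">(1972)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
trs = soup.find("tbody", {"class": "lister-list"}).find_all("tr")

movies = []
for tr in trs:
    title_td = tr.find("td", {"class": "titleColumn"})
    # get_text(strip=True) drops the tags and surrounding whitespace.
    name = title_td.find("a").get_text(strip=True)
    year_text = title_td.find("span", {"class": "secondaryInfo"}).get_text(strip=True)
    rating_text = tr.find("td", {"class": "imdbRating"}).find("strong").get_text(strip=True)
    movies.append({
        "name": name,
        "year": int(year_text.strip("()")),  # "(1994)" becomes 1994
        "rating": float(rating_text),
    })

for movie in movies:
    print(movie["name"], movie["year"], movie["rating"])
```

Storing each row as a dictionary keeps exactly the three required fields and makes it easy to write them to a file later.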

Thank you if you have reached this point of the article. It's my first programming post, so feel free to share your feedback.
