Step 1: Install Dependencies
Before starting, ensure you have the required libraries installed. Use the following command to install them:
pip install requests beautifulsoup4
Step 2: Import Necessary Libraries
import requests
from bs4 import BeautifulSoup
Step 3: Send a Request to the Website
Use the requests library to fetch the webpage content.
url = "https://example.com"  # Replace with the target URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    print("Successfully fetched the webpage!")
else:
    print("Failed to fetch the webpage. Status Code:", response.status_code)
Step 4: Parse the HTML Content
Pass the HTML content to BeautifulSoup for parsing.
soup = BeautifulSoup(response.text, "html.parser")
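To sanity-check the parse, you can print an indented preview of the tree (the 500-character cutoff here is an arbitrary choice):

# Print an indented preview of the parsed HTML to verify the structure
print(soup.prettify()[:500])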
Step 5: Extract Specific Data
You can extract different elements using tags such as <title>, <h1>, and <p>.
Extract the Title of the Page
# soup.title is None if the page has no <title> tag
title = soup.title.text if soup.title else "No title found"
print("Page Title:", title)
Extract All Headings (h1 Tags)
headings = soup.find_all("h1")
for h in headings:
    print(h.text)
Extract All Paragraphs (p Tags)
paragraphs = soup.find_all("p")
for p in paragraphs:
    print(p.text)
Extract Links (a Tags)
links = soup.find_all("a")
for link in links:
    href = link.get("href")
    print(href)
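Note that href values are often relative (for example, /about). If you need absolute URLs, the standard library's urljoin can resolve them against the page URL; a quick sketch:

from urllib.parse import urljoin

# Resolve relative hrefs (e.g. "/about") against the page URL
for link in soup.find_all("a"):
    href = link.get("href")
    if href:
        print(urljoin(url, href))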
Step 6: Extract Data from a Specific Section
If you want to extract data from a particular div or table, use find() or find_all() with class or id attributes.
div_content = soup.find("div", class_="example-class")
print(div_content.text if div_content else "No content found!")
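find() returns only the first match, while find_all() returns every match; looking up by id works the same way. A short sketch, where "example-id" and "item" are hypothetical attribute values chosen for illustration:

# Look up an element by id (ids are unique, so find() is enough)
section = soup.find("div", id="example-id")

# Collect every element that carries a given class
items = soup.find_all("li", class_="item")
for item in items:
    print(item.text)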
Step 7: Handle Websites with Dynamic Content
Some websites load content with JavaScript after the initial page load, and requests only sees the raw HTML, not what the scripts produce. In such cases, consider driving a real browser with Selenium.
pip install selenium
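Here is a minimal sketch of the hand-off from Selenium to BeautifulSoup, assuming Chrome is installed locally (Selenium 4 fetches a matching driver automatically):

from selenium import webdriver
from bs4 import BeautifulSoup

# Launch Chrome and let the page's JavaScript run
driver = webdriver.Chrome()
driver.get("https://example.com")  # Replace with the target URL

# Hand the fully rendered HTML to BeautifulSoup as before
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.title.text if soup.title else "No title found")

driver.quit()

If the content only appears after a delay, Selenium's WebDriverWait can block until a specific element is present before you read page_source.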
Conclusion
This tutorial covered the basics of web scraping using BeautifulSoup. If a website blocks your requests, try:
- Adding headers in the requests.get() call, as shown below.
- Using proxies or rotating user-agents.
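A minimal sketch of a request that identifies itself with a browser-like User-Agent (the header string here is just an illustrative example):

# Some sites reject requests with no User-Agent; a browser-like one often helps
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=headers, timeout=10)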
Would you like a more advanced tutorial covering pagination, authentication, or API extraction? 🚀