14.2. Get news links from faculty webpages

Let’s say that you want to get the link to the first news article on each of your favorite UMSI faculty members’ webpages.

Clicking through every page to gather those links by hand would be a pain. Fortunately, we can do that task with BeautifulSoup!

Run the code below to see what it collects.

This code is made up of three plans, each labeled below.

Plan 3: Get a soup from multiple URLs
# Load libraries for web scraping
from bs4 import BeautifulSoup
import requests
# Get a soup from multiple URLs
base_url = 'https://web.archive.org/web/20230128074139/https://www.si.umich.edu/people/'
endings = ['barbara-ericson', 'steve-oney', 'paul-resnick']
for ending in endings:
    url = base_url + ending
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

Plan 4: Get info from a single tag
    # Get first tag of a certain type from the soup
    tag = soup.find('a', class_='item-teaser--heading-link')
    # Get info from tag
    info = tag.get('href')

Plan 9: Print info
    # Print the info
    print(info)
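
One thing to watch for in the code above: `soup.find` returns `None` when no tag matches, and calling `.get('href')` on `None` raises an `AttributeError`. The sketch below shows one possible way to guard against that. It runs on a small made-up HTML string instead of a live page, so `SAMPLE_HTML` and the helper name `first_news_link` are assumptions for illustration, not part of the original program.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a faculty page (the real pages are much larger).
SAMPLE_HTML = """
<div class="item-teaser">
  <a class="item-teaser--heading-link" href="/news/first-article">First article</a>
  <a class="item-teaser--heading-link" href="/news/second-article">Second article</a>
</div>
"""

def first_news_link(html):
    """Return the href of the first news-teaser link, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    # Get first tag of a certain type from the soup
    tag = soup.find('a', class_='item-teaser--heading-link')
    # find returns None when nothing matches, so check before using the tag
    if tag is None:
        return None
    # Get info from tag
    return tag.get('href')

print(first_news_link(SAMPLE_HTML))            # prints /news/first-article
print(first_news_link('<p>No news here</p>'))  # prints None
```

Inside the loop from Plan 3, a call such as `info = first_news_link(r.content)` could stand in for the Plan 4 lines, so a page with no matching link prints `None` instead of stopping the program.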