Posted in python
1172
1:30 am, April 4, 2021
 

Python: extract the title tag from a URL's HTML using regex

The following Python script fetches the HTML at a URL, finds the title tag with a regular expression, and prints its contents as plain text.

Python

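import re
from urllib.request import urlopen

url = "http://olympus.realpython.org/profiles/dionysus"
with urlopen(url) as page:
    html = page.read().decode("utf-8")  # assumes the page is UTF-8 encoded

# .*? inside the tags tolerates attributes and stray characters;
# re.DOTALL lets the title text span more than one line
pattern = "<title.*?>.*?</title.*?>"
match_results = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
if match_results:
    title = re.sub("<.*?>", "", match_results.group()).strip()  # Remove HTML tags
    print(title)
else:
    print("No <title> tag found")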
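
The same extraction can be wrapped in a function that takes an HTML string, so it can be tried without a network request. This is only a minimal sketch: extract_title and TITLE_RE are hypothetical names that are not part of the script above, and the sample string is made up for illustration.

```python
import re
from typing import Optional

# Hypothetical helper names, not part of the original snippet.
# [^>]* tolerates attributes or stray characters inside either tag;
# DOTALL lets the title text span several lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title[^>]*>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first title element, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Remove any nested tags, then collapse runs of whitespace.
    text = re.sub(r"<[^>]*>", "", match.group(1))
    return " ".join(text.split())


# Made-up sample with messy tags, for illustration only.
sample = "<html><head><TITLE >\n  Profile: Dionysus\n</title  / ></head></html>"
print(extract_title(sample))
print(extract_title("<p>no title here</p>"))
```

Returning None when nothing matches leaves it to the caller to decide how to handle a page without a title. For anything beyond simple pages, a real HTML parser is usually more robust than a regular expression.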
