Today I wanted to download a course from an RSS feed. The media files (mp3) were attached to the RSS entries. So instead of downloading tens of files one by one, I wrote a script!
With the help of Alvin's post on parsing RSS with Python, I added a download function and the script was ready to do the work for me.
First, here is the script:
[rss_downloader.py]
#!/usr/bin/python
import feedparser
import sys
import urllib2
#
# Takes a url and a directory for saving the file. Directory must exist.
#
def download(url, dir_name):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(dir_name+'/'+file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading File: %s (Size: %s Bytes)" % (file_name, file_size)
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,
    f.close()
#
# Take url and directory parameters from user call
#
url = sys.argv[1]
dir_name = sys.argv[2]
#
# Get the feed data from the url
#
feed = feedparser.parse(url)
#
# Collect urls to download
#
urls_to_download = []
for entry in feed.entries:
    links = entry.links
    for link in links:
        if link.type == u'audio/mpeg':
            urls_to_download.append(link.href)
print("Files count: %s" % (len(urls_to_download)))
#
# Download files
#
for url in urls_to_download:
    download(url, dir_name)
    # print(url)
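A side note: the script above is written for Python 2 (`urllib2`, `print` statements). As a rough sketch only, the same chunked-copy loop could look something like this in Python 3. The name `copy_with_progress` and its parameters are made up for illustration and are not part of rss_downloader.py; the function takes any binary file-like source, so it does not depend on a particular HTTP library.

```python
import sys


def copy_with_progress(src, dst, total_size=None, block_size=8192, out=sys.stdout):
    """Copy src to dst in blocks, reporting progress on `out`.

    Hypothetical helper, not part of the original script. `src` and `dst`
    are binary file-like objects; `total_size` may be None, for example
    when the server sends no Content-Length header.
    """
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        copied += len(block)
        if total_size:
            status = "%10d [%6.2f%%]" % (copied, copied * 100.0 / total_size)
        else:
            status = "%10d bytes" % copied
        # "\r" moves back to the start of the line so the status overwrites itself.
        out.write("\r" + status)
        out.flush()
    out.write("\n")
    return copied
```

In Python 3 the source could be the response object returned by `urllib.request.urlopen(url)` and the destination a file opened with mode `"wb"`. Unlike the Python 2 loop, this sketch keeps going when the size is unknown instead of failing on a missing header.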
To run it, call rss_downloader.py with the RSS URL and the directory to save the files in:
python rss_downloader.py http://rss.dw.com/xml/DKpodcast_dwn1_en /home/madly/DeutschWarumNicht/serie1
This parses the RSS feed, collects the links of type "audio/mpeg" (you could change this to a value passed by the user), prints how many it found, and downloads them one by one with a progress display.
That's it. But there is more...
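Making the media type a user-supplied value could be sketched roughly as follows (Python 3 syntax). `collect_links` and the `media_type` parameter are hypothetical names; the function only assumes that each entry is dict-like with a `links` list whose items carry `type` and `href` keys, which matches how the script reads the feedparser result.

```python
def collect_links(entries, media_type="audio/mpeg"):
    """Return the href of every link whose type equals media_type.

    Hypothetical helper for illustration; entries without a "links"
    key are skipped rather than treated as errors.
    """
    urls = []
    for entry in entries:
        for link in entry.get("links", []):
            if link.get("type") == media_type:
                urls.append(link["href"])
    return urls
```

With the real feed this might be called as `collect_links(feed.entries, sys.argv[3])`, assuming a third command-line argument were added for the type.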
After doing this, I realized that I could simply print the links, copy them all into my download manager as a batch download, and enjoy its features. After all, I wanted to download the files, not build a full application. Silly me :D
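For the download-manager route, one option is to dump the collected links to a text file, one per line, which many tools can import as a batch (for example `wget -i urls.txt`). A minimal sketch in Python 3, where `write_url_list` and the file name are assumptions for illustration:

```python
def write_url_list(urls, path):
    """Write one URL per line to path and return how many were written.

    Hypothetical helper, not part of the original script.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
            count += 1
    return count
```

Calling `write_url_list(urls_to_download, "urls.txt")` in place of the download loop would leave the actual fetching to whichever tool you prefer.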
However you choose to go with the script, I hope you find it useful.
