Friday, December 18, 2015

Download RSS Media Files using Python

Today I wanted to download a course from an RSS feed. The media files (mp3) were attached to the RSS entries. So instead of downloading tens of files one by one, I wrote a script!

With the help of Alvin's post on parsing RSS with Python, I added a download function, and the script was ready to do the work for me.

First, here is the script:
[rss_downloader.py]
#!/usr/bin/env python3

import os
import sys
import urllib.request

import feedparser

#
# Takes a url and a directory for saving the file. Directory must exist.
#
def download(url, dir_name):
    # Use the last part of the URL path (without any query string) as the file name
    file_name = url.split('?')[0].split('/')[-1]
    with urllib.request.urlopen(url) as u, \
            open(os.path.join(dir_name, file_name), 'wb') as f:
        # Content-Length can be missing; treat the size as unknown (0) in that case
        file_size = int(u.headers.get("Content-Length", 0))
        print("Downloading File: %s (Size: %s Bytes)" % (file_name, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            if file_size:
                status = "%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            else:
                status = "%10d" % file_size_dl
            # '\r' returns to the start of the line so each update overwrites the last one
            print("\r" + status, end="", flush=True)
    print()

#
# Take url and directory parameters from user call
#
if len(sys.argv) != 3:
    sys.exit("Usage: %s <rss_url> <dir_name>" % sys.argv[0])
url = sys.argv[1]
dir_name = sys.argv[2]

#
# Get the feed data from the url
#
feed = feedparser.parse(url)

#
# Collect urls to download
#
urls_to_download = []
for entry in feed.entries:
    for link in entry.get('links', []):
        if link.get('type') == 'audio/mpeg':
            urls_to_download.append(link.href)

print("Files count: %s" % len(urls_to_download))

#
# Download files
#
for url in urls_to_download:
    download(url, dir_name)


To use it, call the script with the RSS URL and the directory to save the files in (the directory must already exist):
python rss_downloader.py http://rss.dw.com/xml/DKpodcast_dwn1_en /home/madly/DeutschWarumNicht/serie1
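The script expects the target directory to exist already, and the media type is fixed in the code. Here is a minimal sketch of an argument parser that also accepts an optional `--type` and creates the directory when needed. `parse_args`, `prepare_dir`, and the `--type` option are my own hypothetical additions; they are not part of the script above.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the feed URL, the target directory, and an optional MIME type."""
    parser = argparse.ArgumentParser(
        description="Download media files attached to an RSS feed.")
    parser.add_argument("url", help="RSS feed URL")
    parser.add_argument("dir_name", help="directory to save the files in")
    # Hypothetical option: the original script hard-codes audio/mpeg
    parser.add_argument("--type", dest="media_type", default="audio/mpeg",
                        help="MIME type of the links to download")
    return parser.parse_args(argv)


def prepare_dir(dir_name):
    """Create the target directory (and any parents) if it does not exist yet."""
    os.makedirs(dir_name, exist_ok=True)
    return dir_name


# Example with an explicit argument list instead of sys.argv
args = parse_args(["http://rss.dw.com/xml/DKpodcast_dwn1_en", "serie1"])
print(args.url, args.dir_name, args.media_type)
```

In the script, you would call `args = parse_args()` and then `prepare_dir(args.dir_name)` before downloading, in place of reading `sys.argv` directly.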

This will parse the RSS feed, count the links with type "audio/mpeg" (you could change this to a value passed by the user), and download each of them with a progress display.
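RSS 2.0 feeds usually attach media files through an `<enclosure>` element with `url` and `type` attributes. feedparser exposes these as entries in `entry.links`. The sketch below filters enclosures by a MIME type passed as a parameter. It uses only the standard library's `xml.etree.ElementTree` instead of feedparser. `enclosure_urls` and the sample feed are illustrative. Unlike feedparser, this handles only plain RSS 2.0, not Atom or namespaced media extensions.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml, media_type="audio/mpeg"):
    """Return the enclosure URLs of the given MIME type from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    urls = []
    # RSS 2.0 puts each media file in <item><enclosure url="..." type="..."/></item>
    for enclosure in root.iter("enclosure"):
        if enclosure.get("type") == media_type and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Demo</title>
  <item><title>Lesson 1</title>
    <enclosure url="http://example.com/l1.mp3" length="100" type="audio/mpeg"/></item>
  <item><title>Lesson 2</title>
    <enclosure url="http://example.com/l2.pdf" length="200" type="application/pdf"/></item>
</channel></rss>"""

print(enclosure_urls(SAMPLE))  # ['http://example.com/l1.mp3']
```

To run it on a live feed, pass the bytes from `urllib.request.urlopen(url).read()`, since `ET.fromstring` accepts bytes as well as strings.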
That's it. But there is more...

After doing this, I realized that I could simply have printed the links, pasted them all into my download manager as a batch download, and enjoyed its features. After all, I wanted to download the files, not build a full application. Dummy me :D
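If all you need is the list, a sketch like the following writes one URL per line. Many download managers can import that format as a batch, but check what yours accepts. `write_link_list` is a hypothetical helper. In the script above, it would replace the download loop as `write_link_list(urls_to_download)`.

```python
import sys


def write_link_list(urls, out=None):
    """Write each URL on its own line, skipping duplicates but keeping the feed order."""
    # Resolve stdout at call time so redirection (e.g. in tests) is respected
    out = out or sys.stdout
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            out.write(url + "\n")


write_link_list([
    "http://example.com/l1.mp3",
    "http://example.com/l2.mp3",
    "http://example.com/l1.mp3",
])
```

Redirecting the script's output to a file (`> links.txt`) then gives you a list ready to paste or import.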

However you choose to go with the script, I hope you find it useful.