After a few months of blogging photos from my life events, I realised that I should have kept a local copy (silly me). I kept everything else, so why did I forget this time?!
Anyway, I realised that this is not an easy job if you are on free hosting at WordPress.com. All I could get was either an XML export with the posts and references to the images, or a full migration to another WordPress blog.
So I rolled up my sleeves and reused a Python script I had used before.
Here is the script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# This script parses a wordpress backup XML and downloads the contained media files.
# Example:
# python wordpress-media-downloader.py /Users/madly/Downloads/mycoolblog.wordpress.2016-05-02.xml /Users/madly/Downloads/media-library
import sys
import urllib2
import re
#
# Takes a url and a directory for saving the file. Directory must exist.
#
def download(url, dir_name):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(dir_name+'/'+file_name, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading File: %s (Size: %s Bytes)" % (file_name, file_size)
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8)*(len(status)+1)
        print status,
    f.close()
#
# Take file name and directory parameters from the user call
#
file_name = sys.argv[1]
dir_name = sys.argv[2]
# Raw string; also accepts https links in case the export contains them
img_regex = r'(https?://.*files\.wordpress\.com.*\.(?:jpeg|jpg|png))'
#
# Get the images urls from the xml file
#
urls_to_download = []
with open(file_name, "r") as xml_file:
    for line in xml_file:
        m = re.search(img_regex, line)
        if m:
            img_url = m.group(1)
            urls_to_download.append(img_url)
urls_count = len(urls_to_download)
print("Images to download: %s images." % (urls_count))
#
# Download images
#
i = 0
for url in urls_to_download:
    i += 1
    print("File %s/%s" % (i, urls_count))
    download(url, dir_name)
    # print(url) # use this line instead of download() if you want to export the output to a download manager
I simply call the script and pass it the backup XML file I've just downloaded from my WordPress.com admin panel (/wp-admin) and the path to the folder where I want to save the images.
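Since the download function expects the target folder to exist already, a small wrapper around the two arguments can take care of that. This is only a sketch written for Python 3 (the script above is Python 2); the function name read_arguments and the argument names xml_file and output_dir are made up for illustration.

```python
import argparse
import os


def read_arguments(argv):
    """Return (xml_file, output_dir) and make sure output_dir exists."""
    parser = argparse.ArgumentParser(
        description="Download the media files referenced in a WordPress backup XML.")
    parser.add_argument("xml_file", help="path to the downloaded backup XML")
    parser.add_argument("output_dir", help="folder where the images are saved")
    args = parser.parse_args(argv)
    # Create the folder (and any missing parents); do nothing if it is already there.
    os.makedirs(args.output_dir, exist_ok=True)
    return args.xml_file, args.output_dir
```

It would be called as read_arguments(sys.argv[1:]) in place of reading sys.argv[1] and sys.argv[2] directly.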
One more improvement would be to use a proper XML parser and the wp:post_id element to sort the images into folders by post. But I am satisfied with the current result. I can do the rest manually as it will be faster in my case :)
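For anyone who does want the per-post sorting, here is a rough Python 3 sketch of the idea, not something I have run against a real backup. It assumes each post is an item element that carries its own wp:post_id, and it matches tags by their local name so the exact wp namespace does not need to be hard-coded. The function names and the slightly stricter URL pattern are my own choices for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Stricter variant of the script's pattern: it stops at quotes, spaces and
# angle brackets, so several image links in one text are matched separately.
IMG_REGEX = re.compile(
    r'https?://[^\s"\'<>]*files\.wordpress\.com[^\s"\'<>]*?\.(?:jpeg|jpg|png)')


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit('}', 1)[-1]


def images_by_post(xml_text):
    """Map each item's post_id to the image URLs found inside that item."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for item in root.iter('item'):
        post_id = None
        urls = []
        for element in item.iter():
            text = element.text or ''
            if local_name(element.tag) == 'post_id':
                post_id = text.strip()
            for url in IMG_REGEX.findall(text):
                if url not in urls:  # skip repeats within the same post
                    urls.append(url)
        if post_id and urls:
            grouped[post_id].extend(urls)
    return dict(grouped)
```

Each key of the returned dictionary could then become a sub-folder of the target folder, for example os.path.join(dir_name, post_id), before handing the URLs to the download function.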














