After a few months of blogging photos from my life events, I realised that I should have kept a local version (dummy me). I kept everything else, so why did I forget this time?!
Anyways, I realised that this is not an easy job if you are free hosting on Wordpress.com. All I could have is either an XML with posts and references to images, or a full migration to another Wordpress blog.
So I rolled up my sleeves and reused a python script I used before.
Here is the script:
I simply call the script and pass the backup XML file I've just downloaded from my Wordpress.com admin panel (/wp-admin) and the path to the folder where I want to save the images.
One more update can be to use a better parsing technique and use an element called wp:post_id to sort the images into folders by their post. But I am satisfied with the current result. I can do the rest manually as it will be faster in my case :)
Anyways, I realised that this is not an easy job if you are free hosting on Wordpress.com. All I could have is either an XML with posts and references to images, or a full migration to another Wordpress blog.
So I rolled up my sleeves and reused a python script I used before.
Here is the script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | #!/usr/bin/python # -*- coding: utf-8 -*- # This script parses a wordpress backup XML and downloads the contained media files. # Example: # python wordpress-media-downloader.py /Users/madly/Downloads/mycoolblog.wordpress.2016-05-02.xml /Users/madly/Downloads/media-library import sys import urllib2 import re # # Takes a url and a directory for saving the file. Directory must exist. # def download(url, dir_name): file_name = url.split( '/' )[ - 1 ] u = urllib2.urlopen(url) f = open (dir_name + '/' + file_name, 'wb' ) meta = u.info() file_size = int (meta.getheaders( "Content-Length" )[ 0 ]) print "Downloading File: %s (Size: %s Bytes)" % (file_name, file_size) file_size_dl = 0 block_sz = 8192 while True : buffer = u.read(block_sz) if not buffer : break file_size_dl + = len ( buffer ) f.write( buffer ) status = r "%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size) status = status + chr ( 8 ) * ( len (status) + 1 ) print status, f.close() # # Take file name and directory parameters from the user call # file_name = sys.argv[ 1 ] dir_name = sys.argv[ 2 ] img_regex = '(http:\/\/.*files\.wordpress\.com.*\.(?:jpeg|jpg|png))' # # Get the images urls from the xml file # urls_to_download = [] file = open (file_name, "r" ) for line in file .readlines(): m = re.search(img_regex, line) if m: img_url = m.group( 1 ) urls_to_download.append(img_url) urls_count = len (urls_to_download) print ( "Images to download: %s images." % (urls_count)) # # Download images # i = 0 for url in urls_to_download: i + = 1 print ( "File %s/%s" % (i, urls_count)) download(url, dir_name) # print(url) # use this line instead of download() if you want to export the output to a download manager |
I simply call the script and pass the backup XML file I've just downloaded from my Wordpress.com admin panel (/wp-admin) and the path to the folder where I want to save the images.
One more update can be to use a better parsing technique and use an element called wp:post_id to sort the images into folders by their post. But I am satisfied with the current result. I can do the rest manually as it will be faster in my case :)