Creating 301 redirects between many pairs of URLs
I recently upgraded a website. It consisted of static HTML documents and some simple PHP scripts, and I turned it into a WordPress blog. In the process every URL changed. To keep the PageRank and not break any inbound links, I created 301 redirects from the old URLs to the new ones.
If there had been an obvious pattern between the old and new URLs, this would have been a simple task. For example, let's say URLs of the pattern:
http://www.website1.com/[pagename].html
should be changed into:
http://www.website2.com/[pagename]/
Then I could just have used this in the .htaccess file on website1:
RedirectMatch 301 ^/(.*)\.html$ http://www.website2.com/$1/
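A pattern like this can be tried out in Python before it goes live. This is only a rough sketch: Python's re module is not identical to Apache's regex engine, and the function name preview_redirect is made up for illustration.

```python
import re

# Same pattern and target as the RedirectMatch line above, with $1 written as \1.
PATTERN = re.compile(r"^/(.*)\.html$")
TARGET = r"http://www.website2.com/\1/"

def preview_redirect(path):
    """Return the redirect target for a URL path, or None if the pattern does not match."""
    if PATTERN.match(path) is None:
        return None
    return PATTERN.sub(TARGET, path)

print(preview_redirect("/information-about-foo.html"))
print(preview_redirect("/a-page-about-bar.htm"))
```

The first call prints http://www.website2.com/information-about-foo/ and the second prints None, because .htm does not match the .html pattern.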
But in this case there were no obvious patterns. Some of the old pages were converted to WordPress pages, some to WordPress posts, and some to a WordPress category with the old text as the category description. All of this resulted in a wide variety of new URLs.
I solved the situation like this:
First I used a sitemap generator to get a list of all the URLs that I needed to redirect. A search for "sitemap generator" turns up several; I used http://www.xml-sitemaps.com/. Once your sitemap is generated, choose the option "Download Sitemap in Text Format" and you will get a text file with the URLs. It might look like this:
urls.txt
http://www.website1.com/information-about-foo.html
http://www.website1.com/a-page-about-bar.htm
http://www.website1.com/index.php?pageid=foo
http://www.website1.com/index.php?pageid=foo&a=b
http://www.website1.com/index.php?pageid=bar
...
Now, after each URL, add the new URL you want to redirect to, separated by a single space. It might look like this:
urls.txt
http://www.website1.com/information-about-foo.html http://www.website2.com/information-about-foo/
http://www.website1.com/a-page-about-bar.htm http://www.website2.com/a-page-about-bar/
http://www.website1.com/index.php?pageid=foo http://www.website2.com/foo/
http://www.website1.com/index.php?pageid=foo&a=b http://www.website2.com/foo/?a=b
http://www.website1.com/index.php?pageid=bar http://www.website2.com/bar/
...
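Since this file is edited by hand, it can help to check it before generating any rules. The following is a minimal sketch, assuming the two-column format shown above; the function name find_bad_lines and the sample data are only illustrative.

```python
def find_bad_lines(lines):
    """Return (line_number, text) for every non-blank line that is not 'old_url new_url'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are harmless
        if len(fields) != 2 or not all(f.startswith("http") for f in fields):
            bad.append((number, line))
    return bad

# Hypothetical sample: the second line is missing its new URL.
sample = [
    "http://www.website1.com/information-about-foo.html http://www.website2.com/information-about-foo/",
    "http://www.website1.com/a-page-about-bar.htm",
    "",
]
print(find_bad_lines(sample))
```

For the real file, the lines could be read with open('urls.txt').read().split("\n") and passed to the same function.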
Next I wrote this Python 3 script to generate the content for my .htaccess file:
from urllib.parse import urlsplit

print('RewriteEngine on')
print('')
with open('urls.txt', 'r') as fh:
    lines = fh.read().split("\n")
for line in lines:
    parts = line.split()
    if len(parts) != 2:
        continue  # skip blank or malformed lines
    old = urlsplit(parts[0])
    to_url = parts[1]
    # In .htaccess, RewriteRule matches the path without scheme, host or leading slash.
    path = old.path.lstrip('/').replace('.', r'\.')
    # A trailing '?' on the target stops Apache from appending the old query string.
    if to_url.count('?') == 0:
        to_url += '?'
    if old.query:
        # RewriteRule never sees the query string, so match it with a RewriteCond.
        print("RewriteCond %{QUERY_STRING} ^" + old.query + "$")
    print("RewriteRule ^" + path + "$ " + to_url + " [R=301,L]")
    print('')
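Once the generated rules are in place, each pair in urls.txt can be checked against the live server. The following is a rough sketch rather than part of the original workflow: check_redirect is a hypothetical helper, and the connection_class parameter exists only so the function can be exercised without a network. The Location header a server actually sends may differ slightly from the target in urls.txt (for example in how a trailing slash or empty query string is written), so the comparison may need loosening.

```python
import http.client
from urllib.parse import urlsplit

def check_redirect(old_url, new_url, connection_class=http.client.HTTPConnection):
    """Request old_url and report whether it answers 301 with Location == new_url."""
    parts = urlsplit(old_url)
    # Rebuild the request target (path plus query string) from the old URL.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = connection_class(parts.netloc)
    try:
        # HEAD is enough: only the status code and headers are needed.
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status == 301 and response.getheader("Location") == new_url
    finally:
        conn.close()
```

Calling check_redirect(old, new) for every line of urls.txt and printing the pairs that return False gives a quick list of redirects to look at by hand.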