Friday, January 31, 2020

Auto-migrate your Bitbucket hg repos to git - in 3 (5) easy steps

Better be at peace with your version control system!

So Bitbucket is sunsetting Mercurial support, and you are wondering what to do with all those hg repos lying in your Bitbucket account? Relax. It is pretty easy to migrate them to git - and get them back in Bitbucket in no time.

Automating hg-to-git migration

There are probably dozens of blogs and articles out there explaining how to import a Bitbucket hg repo into GitHub, and then back into Bitbucket as git. The only difference here is that we will be using some automation - API calls and whatnot. This could come in handy if you are dealing with hundreds of repos - jumping back and forth to do the job manually might not be a pleasant experience after all.

In this example, we will be importing Mercurial repos available in a BB organization account (via a user that has at least read access to them), and transferring the converted Git repos back to the same organization.

First, the bad news

Before jumping in, a few things you should know:

  • This guide uses basic auth to authenticate with BB and GH. These API calls may work with OAuth tokens or other forms of "standard" or "secure" auth - as dictated by each service; however, some admin operations used here do require admin credentials. Normal OAuth tokens don't usually provide these scopes; so in a nutshell, if you decide to go without basic auth, YMMV.
  • [Update] Very recently, I got a warning email from GitHub, mentioning that they are soon removing basic auth support - so by the time you read this, none of it might actually work (although techniques like Personal Access Tokens might still be worth a shot).
  • APIs may be throttled; so if you're in big business, you may have to split these calls into batches of hundreds, thousands, whatever - fewer than 100 calls didn't seem to cause issues though.
  • The bonus steps (ownership transfer and permission updates) need to be run from an account with admin access - which means that user would need to provide their username and password to the script. While there are tricks to prevent the shell from remembering an entered command - and hence the credentials - you may need to go manual if your admin-colleague is too skeptical about the whole thing.
  • You may be in complete violation of GitHub's terms of service - considering you're using them as only a stepping stone. Figure it out on your own. You have been warned.

Consider yourself warned.

All right, let's get to work!

hg-to-git migration: the plan

  1. import to GitHub, wait for completion
  2. import back to Bitbucket, wait for completion
  3. (Bonus!) transfer to appropriate party/organization
  4. (Bonus!) add necessary user/team permissions
  5. clean up

Of course we could have stopped at 1, and ditched BB once and for all; but probably your BB repo used to be private, and GH won't let you keep it that way - unless it is on a personal account. And BB has the cheaper team plans, after all.

Step by step

Step 0: some handy configz

I'm gonna use Python, so first let's start by defining some configs and utility methods in a configz.py.

As you may notice, I will be defining several environment variables, so that I (and you) can avoid hard-coding usernames and passwords in code - even temporarily. When invoking, be sure to define the environment variables; and at the same time, be smart enough not to allow them to seep into your shell command history - perhaps by prefixing the whole thing with some whitespace ( ), which bash leaves out of its history when HISTCONTROL includes ignorespace or ignoreboth.

For example, step 1 could be executed as:

<whitespace>GH_USER=janakaud GH_PASS=naaah BB_ORG=acmeandco BB_USER=janakaud BB_PASS=NAAAH python gh-import.py
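
If keeping secrets off the command line altogether sounds safer, one alternative is to prompt for whatever is missing at run time. A minimal sketch; read_secret is a hypothetical helper, not part of the scripts in this post:

```python
from getpass import getpass
from os import environ


def read_secret(name, env=environ, prompt=getpass):
    """Return the value of `name` from the environment, or prompt for it.

    read_secret is a hypothetical helper, not part of configz.py.
    getpass() does not echo what is typed, and a value entered at its
    prompt never becomes part of a shell command.
    """
    value = env.get(name)
    if value:
        return value
    return prompt("%s: " % name)
```

With this, gh_pass = read_secret("GH_PASS") would behave like environ["GH_PASS"] when the variable is set, and ask interactively otherwise.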

All right, back to configz.py:

from base64 import b64encode
from os import environ
import requests

# we'll use environment variables to fetch auth details

# GitHub username/password
gh_user = environ["GH_USER"]
gh_pass = environ["GH_PASS"]

# Bitbucket organization name, username, password and user ID
bb_org  = environ["BB_ORG"]
bb_user = environ["BB_USER"]
bb_pass = environ["BB_PASS"]

# find the user ID by checking the "id" field on "bb-bootstrap" HTML meta-tag,
# in any Bitbucket webpage after sign-in:
# https://codereview.stackexchange.com/questions/41647/script-to-import-repositories-to-bitbucket#comment-454978
bb_uid  = environ["BB_UID"]

repos = [
	# add your list of BB repo names for migration
	# you could grab them from the https://api.bitbucket.org/2.0/repositories/$BB_ORG API output

	"mercurial-ftw",
	"i-hate-git"
]

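# b64encode() works on bytes in Python 3, so encode first and decode the result
gh_auth = "Basic %s" % b64encode(("%s:%s" % (gh_user, gh_pass)).encode()).decode()
bb_auth = "Basic %s" % b64encode(("%s:%s" % (bb_user, bb_pass)).encode()).decode()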

s = requests.Session()

def call(url, headers, data, method="POST"):
	res = s.request(method, url, headers=headers, data=data, allow_redirects=False)
	print(res.status_code)
	print(res.headers)
	print(res.text)
	return res
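
Rather than typing the repos list by hand, the names could be pulled from the repositories API mentioned in the comment above. A rough sketch, assuming each page of the response carries a "values" list and a "next" URL, and each repo entry carries "scm" and "slug" fields; list_hg_repos is a hypothetical helper, and the fetching is injected so the paging logic can be tried offline:

```python
def list_hg_repos(first_url, fetch):
    """Collect slugs of Mercurial repos, following the "next" links.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    page as a dict. The field names ("values", "next", "scm", "slug") are
    assumptions about the Bitbucket 2.0 response format.
    """
    slugs = []
    url = first_url
    while url:
        page = fetch(url)
        for repo in page.get("values", []):
            if repo.get("scm") == "hg":
                slugs.append(repo["slug"])
        url = page.get("next")  # missing on the last page, which ends the loop
    return slugs
```

Wired into configz.py, fetch could be something like lambda u: s.get(u, headers={"Authorization": bb_auth}).json(), starting from the https://api.bitbucket.org/2.0/repositories/$BB_ORG URL.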

Step 1: import to GitHub

GitHub. Our stepping stone. The poor octocat.

GitHub does have a UI page for this, but if you have a lot of repos, using the source imports API may be faster - and cooler.

You need to create a repo first, and then run the import by providing your BB credentials:

from configz import *
import json

for r in repos:
	print("\n" + r)

	call("https://api.github.com/user/repos", {
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, json.dumps({
	  "name": r,
	  "private": True  # a JSON boolean, not the string "true"
	}))

	call("https://api.github.com/repos/%s/%s/import" % (gh_user, r), {
		"Accept": "application/vnd.github.barred-rock-preview",
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, json.dumps({
	  "vcs": "mercurial",
	  "vcs_url": "https://bitbucket.org/%s/%s" % (bb_org, r),
	  "vcs_username": bb_user,
	  "vcs_password": bb_pass
	}), "PUT")
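
If throttling does become a problem, the repo list can be split into batches with a pause between them. A minimal sketch; the batch size and pause are illustrative guesses rather than limits documented by GitHub or Bitbucket, and both helpers are hypothetical:

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(items, handle, size=50, pause=60, sleep=time.sleep):
    """Call handle(item) for every item, sleeping `pause` seconds between batches."""
    for index, batch in enumerate(batches(items, size)):
        if index > 0:
            sleep(pause)  # no pause before the first batch
        for item in batch:
            handle(item)
```

The body of the for loop above would become the handle function, called as run_in_batches(repos, handle).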

The API even allows you to check the status of each import.

But GitHub is kind enough to send you an email notification for each import completion status. So it might just be faster to filter your mailbox for subject "Import to ..." from noreply@github.com, count the successes and failures, and take any further steps.
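
If you would rather poll the status endpoint than count emails, something along these lines could work. It is only a sketch: the status names are what I recall from GitHub's source import documentation and should be treated as assumptions, and wait_for_import is a hypothetical helper that takes the actual status lookup as a callable:

```python
import time

# Assumed terminal failure states of a GitHub source import.
FAILED = {"auth_failed", "error", "detection_needs_auth",
          "detection_found_nothing", "detection_found_multiple"}


def wait_for_import(get_status, interval=30, max_checks=120, sleep=time.sleep):
    """Poll get_status() until the import completes or fails.

    Returns the final status string, or "timeout" once max_checks
    lookups have all reported a non-final status.
    """
    for _ in range(max_checks):
        status = get_status()
        if status == "complete" or status in FAILED:
            return status
        sleep(interval)
    return "timeout"
```

Here get_status would wrap a GET on the same /import URL used for the PUT above, returning the "status" field of the JSON response.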

Yeah, this time we're betraying a good beast. 😟

Step 2: import back to Bitbucket

BB APIs don't seem to cover this, so we have to mimic a browser call to their "Import existing code" UI page. Luckily it works with basic auth, without having to dig into the gruesome cookie details that are commonplace when automating webpage form submissions.

To rename or not to rename, that is the question

Before the import, we may need to rename the existing repos; say we imported a hg repo named foo into GitHub as foo, and now we need to get it back in Bitbucket as foo - but we still have the old hg repo foo sitting in BB. Luckily the update repo endpoint in BB API can do this for us.

However, since we are doing this on behalf of an organization, there may actually be no conflicts at this stage - since we are importing the repos into your own BB account, not the organization's. But in many cases, your account may have forks of the originals - which go by the same name - causing conflicts. In that case, simply add the names of your existing forks to the existing list in the script below, so that the script will automatically rename them before trying to import the new ones.

We'll use the convention of renaming the original repo with a -hg suffix; the old Mercurial foo repo will become foo-hg, so the new Git one can reclaim the old name foo.

Combining the two:

from configz import *
import json
import time

# repo names that already exist in BB, under your account;
# script will rename them on BB side before importing the new one
# most common scenario: you have forks of some of the original repos on your own BB account,
# in which case you need to rename those before you can import the new ones
existing = []

# add any import-failed repos from step 1 - unless you managed to re-import them
# the script will simply ignore them - you WILL have to deal with them later!
failed = [
	"i-hate-git"
]

for r in repos:
	if r in failed:
		print("\nSkipping %s" % r)
		continue
	print("\n" + r)

	if r in existing:
		print("\nRenaming %s" % r)
		call("https://api.bitbucket.org/2.0/repositories/%s/%s" % (bb_user, r), {
			"Content-Type": "application/json",
			"Authorization": bb_auth
		}, json.dumps({
			"name": "%s-hg" % r
		}), "PUT")
		time.sleep(2)

	call("https://bitbucket.org/repo/import", {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "source_scm=git&source=source-git&sourceforge_scm=hg&codeplex_scm=hg&url=https%3A%2F%2Fgithub.com%2F{}%2F{}&auth=on&username={}&password={}&owner={}&name={}&is_private=True&forking=no_public_forks&no_forks=False&no_public_forks=True&scm=git".format(gh_user, r, gh_user, gh_pass.replace(" ", "%20"), bb_uid, r))
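
The hand-written form body only escapes spaces in the password, so a password containing "&", "=" or "%" would garble the request. A possible alternative, sketched for Python 3, is to let urllib.parse.urlencode build the body; import_form is a hypothetical helper, and the field names are simply copied from the string above:

```python
from urllib.parse import urlencode


def import_form(gh_user, gh_pass, bb_uid, repo):
    """Build the URL-encoded body for the "Import existing code" form.

    urlencode() escapes every value, so special characters in the
    password or repo name cannot break the field boundaries.
    """
    return urlencode({
        "source_scm": "git",
        "source": "source-git",
        "sourceforge_scm": "hg",
        "codeplex_scm": "hg",
        "url": "https://github.com/%s/%s" % (gh_user, repo),
        "auth": "on",
        "username": gh_user,
        "password": gh_pass,
        "owner": bb_uid,
        "name": repo,
        "is_private": "True",
        "forking": "no_public_forks",
        "no_forks": "False",
        "no_public_forks": "True",
        "scm": "git",
    })
```

The result could be passed as the data argument of call() in place of the formatted string: import_form(gh_user, gh_pass, bb_uid, r).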

You can find your own ID (value for the bb_uid parameter, or the BB_UID environment variable for configz.py) via the "id" field on the bb-bootstrap HTML <meta> - found on almost every BB page after sign-in.

Now the transfer part is over; the rest is all about handing it back to your organization and setting the appropriate user/group permissions. So if you were doing this for yourself, you can skip straight down to step 5.

Bitbucket: if you hadn't stopped supporting hg, we wouldn't be going through all this trouble!

Step 3 (Bonus!): transfer Bitbucket repo to appropriate party/organization

If you're doing this whole migration thing on behalf of a team or organization, you ultimately need to hand over control of the new repository to them.

The API doesn't provide for this, either; you have to mimic a form submission on the Transfer Repository page.

from configz import *
import json

for r in repos:
	print("\n" + r)
	call("https://bitbucket.org/%s/%s/admin/transfer" % (bb_user, r), {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "user=%s" % bb_org)

Once transferred, the recipient has to accept the transfer on the other end; this is a bit tricky to automate - because each transfer request has a unique URL, and that seems to be available only on the confirmation email that gets sent to the recipient.

One trick might be to download the set of confirmation emails (e.g. as .eml files via Thunderbird), grep through for "View transfer request: ", clean up to isolate the URLs, and loop through them.
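
The grep-and-clean-up part could be sketched in Python as below. It assumes each URL sits on the same line as the "View transfer request: " marker and contains no whitespace; emails saved in quoted-printable form may wrap long lines, in which case they would need decoding first. transfer_urls is a hypothetical helper:

```python
import re

MARKER = "View transfer request: "


def transfer_urls(text):
    """Return the URLs that follow MARKER in `text`, in order, without duplicates."""
    found = re.findall(re.escape(MARKER) + r"(\S+)", text)
    # dict.fromkeys() keeps the first occurrence of each URL and preserves order
    return list(dict.fromkeys(found))
```

Reading every downloaded .eml file and feeding its contents to transfer_urls() would produce the entries for the urls list in the script further down.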

And here, we do need to rename each existing repo before accepting the transfer of the new one - to avoid the naming conflicts that we discussed before.

Note that this step must run on an account with admin privileges on the organization - probably the organization account itself.

from configz import *
import json
import requests
import time

# add the repository transfer accept URLs, from the transfer request emails
urls = [
]

aSess = requests.Session()
for u in urls:
	r = u.split("/")[4]
	print("\n" + r)

	print("\nRenaming %s" % r)
	res = aSess.put("https://api.bitbucket.org/2.0/repositories/%s/%s" % (bb_user, r), headers={
		"Content-Type": "application/json",
		"Authorization": bb_auth
	}, data=json.dumps({
		"name": "%s-hg" % r
	}))
	print(res.status_code)
	print(res.headers)
	print(res.text)
	time.sleep(2)

	# this part needs a username/password credential; app-passwords don't work
	call(u, {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "owner=%s&submitBtn=Accept" % bb_uid)

Step 4 (Bonus!): add necessary user/team permissions

For the brand new imported git repo, you'd also need to grant Bitbucket-level permissions for your old users and teams.

Here, we need to find and pass another parameter: the UUID of your BB organization:

  • open one of your organization's repos in the browser
  • get the page source (View Source)
  • search for &quot;<your organization name here, without the angle brackets>&quot;, &quot;uuid&quot;:
  • grab the UUID value next to it, excluding the surrounding &quot; marks
  • pass it to the script run command, as one additional variable BB_ORG_ID

import os
from configz import *

def grant(repo, group, level):
	call("https://api.bitbucket.org/1.0/group-privileges/{}/{}/%7B{}%7D/{}/?exclude-members=1".format(bb_org, repo, os.environ["BB_ORG_ID"], group), {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, level, "PUT")

for r in repos:
	print("\n" + r)

	grant(r, "your-write-enabled-group-here", "write")
	grant(r, "your-read-only-group-here", "read")

Step 5: clean up

Always clean up after yourself.

Now Bitbucket is ready to go - but your GitHub account still has the intermediary repos, which you should probably get rid of:

from configz import *

for r in repos:
	print("\n" + r)

	call("https://api.github.com/repos/%s/%s" % (gh_user, r), {
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, None, "DELETE")

Cool! All your Bitbucket hg repos are now git!

So, now that we are familiar with the whole round-trip story, hopefully you can adapt it to tackle any future Bitbucket apocalypses.

Adios, and good luck with changing your mindset to git! 🙏

1 comment:

Asankha said...

This was a great relief Janaka for the migration of all of AdroitLogic repositories in our BB account. Thank you very much for getting this done and happy to see you sharing the knowledge to help others