Friday, January 31, 2020

Auto-migrate your Bitbucket hg repos to git - in 3 (5) easy steps

Better be at peace with your version control system!

So Bitbucket is sunsetting Mercurial support, and you are wondering what to do with all those hg repos lying in your Bitbucket account? Relax. It is pretty easy to migrate them to git - and get them back in Bitbucket in no time.

Automating hg-to-git migration

There are probably dozens of blogs and articles out there explaining how to import a Bitbucket hg repo into GitHub, and then back into Bitbucket as git. The only difference here is that we will be using some automation - API calls and whatnot. This could come in handy if you are dealing with hundreds of repos - jumping back and forth to do the job manually might not be a pleasant experience after all.

In this example, we will be importing Mercurial repos available in a BB organization account (via a user that has been granted (at least) read access to them), and transferring the converted Git repos back to the same organization.

First, the bad news

Before jumping in, a few things you should know:

  • This guide uses basic auth for authenticating to BB and GH. These API calls may work with OAuth tokens or other forms of "standard" or "secure" auth - as dictated by each service; however, some admin operations used here do require admin credentials. Normal OAuth tokens don't usually provide these scopes; so in a nutshell, if you decide to go without basic auth, YMMV.
  • [Update] Very recently, I got a warning email from GitHub, mentioning that they are soon removing basic auth support - so by the time you read this, none of it might actually work (although techniques like Personal Access Tokens might still be worth a shot).
  • APIs may have throttling; so if you're in big business, you may have to batch these calls into groups of hundreds or thousands - fewer than 100 at a time doesn't seem to cause issues, though.
  • The bonus steps (ownership transfer and permission updates) need to be run from an account with admin access - which means that user would need to provide their username and password to the script. While there are tricks to prevent the shell from remembering an entered command - and hence the credentials - you may need to go manual if your admin colleague is too skeptical about the whole thing.
  • You may be in complete violation of GitHub's terms of service - considering you're using them only as a stepping stone. Figure it out on your own. You have been warned.

Consider yourself warned.

All right, let's get to work!

hg-to-git migration: the plan

  1. import to GitHub, wait for completion
  2. import back to Bitbucket, wait for completion
  3. (Bonus!) transfer to appropriate party/organization
  4. (Bonus!) add necessary user/team permissions
  5. clean up

Of course we could have stopped at step 1 and ditched BB once and for all; but your BB repo was probably private, and GH won't let you keep it that way - unless it is on a personal account. And BB has the cheaper team plans, after all.

Step by step

Step 0: some handy configz

I'm gonna use Python, so let's start by defining some configs and utility methods in a configz.py.

As you may notice, I will be defining several environment variables, so that I (and you) can avoid hard-coding usernames and passwords in code - even temporarily. When invoking, be sure to define the environment variables; and at the same time, be smart enough not to let them seep into your shell command history - perhaps by prefixing the whole command with a space (this works in bash when HISTCONTROL includes ignorespace).

For example, step 1 could be executed as:

<whitespace>GH_USER=janakaud GH_PASS=naaah BB_ORG=acmeandco BB_USER=janakaud BB_PASS=NAAAH python gh-import.py

All right, back to configz.py:

from base64 import b64encode
from os import environ
import requests

# we'll use environment variables to fetch auth details

# GitHub username/password
gh_user = environ["GH_USER"]
gh_pass = environ["GH_PASS"]

# Bitbucket organization name, username, password and user ID
bb_org  = environ["BB_ORG"]
bb_user = environ["BB_USER"]
bb_pass = environ["BB_PASS"]

# find the user ID by checking the "id" field on "bb-bootstrap" HTML meta-tag,
# in any Bitbucket webpage after sign-in:
# https://codereview.stackexchange.com/questions/41647/script-to-import-repositories-to-bitbucket#comment-454978
bb_uid  = environ["BB_UID"]

repos = [
	# add your list of BB repo names for migration
	# you could grab them from the https://api.bitbucket.org/2.0/repositories/$BB_ORG API output

	"mercurial-ftw",
	"i-hate-git"
]

gh_auth = "Basic %s" % b64encode("%s:%s" % (gh_user, gh_pass))
bb_auth = "Basic %s" % b64encode("%s:%s" % (bb_user, bb_pass))

s = requests.Session()

# shared helper: fire an HTTP request and dump the response for eyeballing
def call(url, headers, data, method="POST"):
	res = s.request(method, url, headers=headers, data=data, allow_redirects=False)
	print(res.status_code)
	print(res.headers)
	print(res.text)
	return res
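By the way, if you have lots of repos, you don't have to type out the repos list by hand. Here's a quick sketch (a hypothetical list-repos.py; the BB 2.0 repositories API returns paginated responses carrying a values array and a next link) that could print the hg repo slugs for you:

# list-repos.py (hypothetical helper): print the slugs of the org's hg repos,
# paging through the Bitbucket 2.0 repositories API
from configz import *

url = "https://api.bitbucket.org/2.0/repositories/%s" % bb_org
while url:
	data = s.get(url, headers={"Authorization": bb_auth}).json()
	for repo in data["values"]:
		if repo["scm"] == "hg":	# skip the ones that are already git
			print(repo["slug"])
	url = data.get("next")	# absent on the last page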

Step 1: import to GitHub

GitHub. Our stepping stone. The poor octocat.

GitHub does have a UI page for this; but if you've got a lot of repos, using the source imports API may be faster - and cooler.

You need to create a repo first, and then run the import by providing your BB credentials:

from configz import *
import json

for r in repos:
	print("\n" + r)

	call("https://api.github.com/user/repos", {
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, json.dumps({
	  "name": r,
	  "private": "true"
	}))

	call("https://api.github.com/repos/%s/%s/import" % (gh_user, r), {
		"Accept": "application/vnd.github.barred-rock-preview",
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, json.dumps({
	  "vcs": "mercurial",
	  "vcs_url": "https://bitbucket.org/%s/%s" % (bb_org, r),
	  "vcs_username": bb_user,
	  "vcs_password": bb_pass
	}), "PUT")

The API even allows you to check the status of each import.
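If you'd rather poll than wait, a rough sketch like the following could do (a hypothetical check-imports.py, reusing the same preview media type; the response JSON carries a status field, with values like importing, complete or error):

# check-imports.py (hypothetical): poll the source import status of each repo
from configz import *
import time

for r in repos:
	print("\n" + r)
	call("https://api.github.com/repos/%s/%s/import" % (gh_user, r), {
		"Accept": "application/vnd.github.barred-rock-preview",
		"Authorization": gh_auth
	}, None, "GET")
	# look for the "status" field in the printed response JSON
	time.sleep(1)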

But GitHub is kind enough to send you an email notification with each import's completion status. So it might just be faster to filter your mailbox for subject "Import to ..." from noreply@github.com, count the successes and failures, and take it from there.

Yeah, this time we're betraying a good beast. 😟

Step 2: import back to Bitbucket

BB APIs don't seem to cover this, so we have to mimic a browser call to their "Import existing code" UI page. Luckily it works with basic auth - without having to dig into the gruesome cookie details that are commonplace when automating webpage form submissions.

To rename or not to rename, that is the question

Before the import, we may need to rename the existing repos; say we imported a hg repo named foo into GitHub as foo, and now we need to get it back in Bitbucket as foo - but we still have the old hg repo foo sitting in BB. Luckily the update repo endpoint in BB API can do this for us.

However, since we are doing this on behalf of an organization, there may actually be no conflicts at this stage - because we are importing the repos into our own BB account, not the organization's. But in many cases, your account may have forks of the originals - which go by the same names - causing conflicts. In that case, simply add the names of your existing forks to the existing list below, so that the script will automatically rename them before trying to import the new ones.

We'll use the convention of renaming the original repo with a -hg suffix; the old Mercurial foo repo will become foo-hg, so the new Git one can reclaim the old name foo.

Combining the two:

from configz import *
from urllib.parse import urlencode
import json
import time

# repo names that already exist in BB, under your account;
# script will rename them on BB side before importing the new one
# most common scenario: you have forks of some of the original repos on your own BB account,
# in which case you need to rename those before you can import the new ones
existing = []

# add any import-failed repos from step 1 - unless you managed to re-import them
# the script will simply ignore them - you WILL have to deal with them later!
failed = [
	"i-hate-git"
]

for r in repos:
	if r in failed:
		print("\nSkipping %s" % r)
		continue
	print("\n" + r)

	if r in existing:
		print("\nRenaming %s" % r)
		call("https://api.bitbucket.org/2.0/repositories/%s/%s" % (bb_user, r), {
			"Content-Type": "application/json",
			"Authorization": bb_auth
		}, json.dumps({
			"name": "%s-hg" % r
		}), "PUT")
		time.sleep(2)

	call("https://bitbucket.org/repo/import", {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "source_scm=git&source=source-git&sourceforge_scm=hg&codeplex_scm=hg&url=https%3A%2F%2Fgithub.com%2F{}%2F{}&auth=on&username={}&password={}&owner={}&name={}&is_private=True&forking=no_public_forks&no_forks=False&no_public_forks=True&scm=git".format(gh_user, r, gh_user, gh_pass.replace(" ", "%20"), bb_uid, r))

You can find your own ID (the value for the bb_uid parameter - the BB_UID environment variable for configz.py) via the "id" field on the bb-bootstrap HTML <meta> tag - found on almost every BB page after sign-in.
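If you want to script that lookup as well, here's a rough, self-contained sketch (a hypothetical find-uid.py; the escaped-JSON markup - and hence the regex - is an assumption, so fall back to manual inspection if it fails):

# find-uid.py (hypothetical): scrape the numeric user ID from the "bb-bootstrap"
# meta tag of a signed-in Bitbucket page
# self-contained on purpose: importing configz.py would demand a BB_UID,
# which is the very thing we are trying to find here
from base64 import b64encode
from os import environ
import re
import requests

bb_auth = "Basic %s" % b64encode(
	("%s:%s" % (environ["BB_USER"], environ["BB_PASS"])).encode()).decode()

res = requests.get("https://bitbucket.org/dashboard/overview",
	headers={"Authorization": bb_auth})
m = re.search(r'&quot;id&quot;:\s*(\d+)', res.text)
print(m.group(1) if m else "not found - inspect the page source manually")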

Now the transfer part is over; the rest is all about handing it back to your organization and setting the appropriate user/group permissions. So if you were doing this for yourself, you can skip straight down to step 5.

Bitbucket: if you hadn't stopped supporting hg, we wouldn't be going through all this trouble!

Step 3 (Bonus!): transfer Bitbucket repo to appropriate party/organization

If you're doing this whole migration thing on behalf of a team or organization, you ultimately need to hand over control of the new repository to them.

The API doesn't cover this either; you have to mimic a form submission on the Transfer Repository page.

from configz import *
import json

for r in repos:
	print("\n" + r)
	call("https://bitbucket.org/%s/%s/admin/transfer" % (bb_user, r), {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "user=%s" % bb_org)

Once transferred, the recipient has to accept the transfer on the other end; this is a bit tricky to automate - because each transfer request has a unique URL, which seems to be available only in the confirmation email that gets sent to the recipient.

One trick might be to download the set of confirmation emails (e.g. as .eml files via Thunderbird), grep through them for "View transfer request: ", clean up to isolate the URLs, and loop through them - as sketched below.
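Something like this could handle the grepping part (a hypothetical accept-urls.py; the transfer-emails/ folder and the exact email wording are assumptions - and mind that quoted-printable encoding may wrap long URLs across lines in real .eml files):

# accept-urls.py (hypothetical): grep a folder of downloaded .eml files
# for the transfer-accept links, to feed into the "urls" list of the next script
import glob
import re

urls = []
for f in glob.glob("transfer-emails/*.eml"):	# folder name is an assumption
	with open(f) as fp:
		m = re.search(r"View transfer request:\s*(https://\S+)", fp.read())
		if m:
			urls.append(m.group(1))
print("\n".join(urls))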

And here, we do need to rename each existing (hg) repo before accepting the transfer of its new git counterpart - to avoid the naming conflicts that we discussed before.

Note that this step must run on an account with admin privileges on the organization - probably the organization account itself.

from configz import *
import json
import time

# add the repository transfer accept URLs, from the transfer request emails
urls = [
]

for u in urls:
	# the repo slug is the 5th path segment of the accept URL
	r = u.split("/")[4]
	print("\n" + r)

	# rename the organization's old hg repo out of the way first
	print("\nRenaming %s" % r)
	call("https://api.bitbucket.org/2.0/repositories/%s/%s" % (bb_user, r), {
		"Content-Type": "application/json",
		"Authorization": bb_auth
	}, json.dumps({
		"name": "%s-hg" % r
	}), "PUT")
	time.sleep(2)

	# this part needs a username/password credential; app-passwords don't work
	call(u, {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, "owner=%s&submitBtn=Accept" % bb_uid)

Step 4 (Bonus!): add necessary user/team permissions

For the brand-new imported git repos, you'd also need to grant Bitbucket-level permissions to your old users and groups.

Here, we need to find and pass another parameter: the UUID of your BB organization:

  • open one of your organization's repos in the browser
  • get the page source (View Source)
  • search for &quot;<your organization name here, without the angle brackets>&quot;, &quot;uuid&quot;:
  • grab the UUID value next to it, excluding the surrounding &quot; marks
  • pass it to the script run command, as one additional environment variable BB_ORG_ID - or script the lookup, as sketched below
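If you'd rather not do that by hand, here's a rough sketch (a hypothetical find-org-uuid.py, mirroring the bb_uid trick from earlier; the escaped-JSON markup and regex are assumptions - fall back to manual inspection if it fails):

# find-org-uuid.py (hypothetical): scrape the org UUID from a repo page's source
# NOTE: the markup/regex is an assumption - verify against your own page source
import re
from configz import *

res = s.get("https://bitbucket.org/%s/%s" % (bb_org, repos[0]),
	headers={"Authorization": bb_auth})
m = re.search(r'&quot;%s&quot;,\s*&quot;uuid&quot;:\s*&quot;\{?([0-9a-fA-F\-]+)\}?&quot;' % bb_org, res.text)
print(m.group(1) if m else "not found - inspect the page source manually")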

import os
from configz import *

# BB 1.0 group-privileges API: PUT the privilege level ("read"/"write"/"admin") to
# .../group-privileges/{org}/{repo}/{group_owner_uuid}/{group_slug}/
def grant(repo, group, level):
	call("https://api.bitbucket.org/1.0/group-privileges/{}/{}/%7B{}%7D/{}/?exclude-members=1".format(
		bb_org, repo, os.environ["BB_ORG_ID"], group), {
		"Content-Type": "application/x-www-form-urlencoded",
		"Authorization": bb_auth
	}, level, "PUT")

for r in repos:
	print("\n" + r)

	grant(r, "your-write-enabled-group-here", "write")
	grant(r, "your-read-only-group-here", "read")

Step 5: clean up

Always clean up after yourself.

Now Bitbucket is ready to go - but your GitHub account still has the intermediary repos, which you should probably get rid of:

from configz import *

for r in repos:
	print("\n" + r)

	call("https://api.github.com/repos/%s/%s" % (gh_user, r), {
		"Content-Type": "application/json",
		"Authorization": gh_auth
	}, None, "DELETE")

Cool! All your Bitbucket hg repos are now git!

So, now that we are familiar with the whole round-trip story, hopefully you can adapt it to tackle any future Bitbucket apocalypses.

Adios, and good luck with changing your mindset to git! 🙏

How I went serverless, seven years ago

Everybody is crying out "serverless, serverless!" these days. But few realize it has been there, lurking in the shadows, in many disguises... from as far back as 2009. Yeah, you read that right. For more than a decade.

Before Lambda (2014). JAWS The Framework (2015). Sigma (2018). And any of this serverless "hype".

The Beginning

I started my programmer life in 2007 (ninth grade at school). What started with C For Dummies by Dan Gookin had kinda become my whole life by the time I was in my second year at dear ol' UoM - 2013.

C For Dummies - where it all began

Okay, before that.

The Hourly Draw

Somewhere in 2012, while I was just fooling around - first year at uni! - one of my best friends told me about this website where you could play a game each hour, and win "credits".

You could then buy stuff from the site in exchange for these credits. Nothing fancy: teeny tiny stuff like memory cards, flash drives, watches and the like. The fascination of an average kid - yeah yeah, back in those days, at least.

First, it was all manual.

I still remember how I kept on doing it: waking up my laptop at the 25th minute of each hour, refreshing the game page, clicking the button to go to their "promo page", waiting 30-odd seconds, and hitting the "enter drawing" button. (Remember, we're at UTC+0530, so the server-hour ends at the 30th minute.)

Yeah. Every hour. At least the ones when I was awake.

And to think that I actually kept on doing that, for five-odd months. Man.

And, after a few months,

You guessed it: I got tired of it.

Confession: I'm still doing it.

"Old habits die hard", they say. Of course. Especially when those "habits" bring you nice goodies and gadgets, every once in a while!

So how am I still doing it - while doing a day job?

Well, not manually, of course.

Automation, FTW!

Automation: baby steps

In the beginning, all I could think of was: write some JS scripts that would do the wait-and-click stuff for me. I knew about GreaseMonkey, so pretty soon I could whip up something simple.

But obviously, that meant that my laptop had to be alive and online. And yes, I still had to fire the script manually at each 25th minute (although I could probably have come up with something for automating that as well).

And then, one fine day, in the university cafeteria...

The cafeteria ("Goda Uda canteen", shall I say?) was my safe haven; there I could sit at a table for hours, with nothing but my laptop in front of me - and forget about the whole world.

Oh, those were the days.

Anyways.

That day, I was thinking all over about this website thing. What if I could set up my script on some "online machine"? (The whole "cloud server" concept was still a bit beyond my grasp.) I wouldn't have to do anything, not even keep my laptop running; the server would do everything for me.

The Goda Uda Canteen

HTTP, WWW and Cookies, a.k.a. Prof. Gihan's lecture

A few days earlier, we had attended a networks lecture by Prof. Gihan Dias - a pioneer in introducing computing, the internet, e-mail and the like to our country. The lecture explained how websites maintain client "sessions" - via something called "cookies", passed around in HTTP headers. (Luckily I had managed to grasp that part without falling asleep - as we usually used to do during his lectures.)

I had already checked the network tab of my browser, and seen how those little cookie things get passed back and forth. If I could mimic the same thing, and put it in an "online machine" somewhere, all my troubles would be over.

I had seen some of my techy friends doing that on things called "VPS" and "Heroku". But, as always, I wanted things to be simple - just save my code, set up a timer kind of thing, and "sit back and enjoy the ride".

And of course, I didn't wanna spend a dime on all that.

And then I saw him.

Haha. Nope. It's not what you think it is.

It was Thilina අයියා ("big brother" Thilina) - from our senior batch, both at school and campus. He was a මීටරේ - a "bright case" - and one of the best at that in our department of study. He was our first go-to solution for any computer or IT-related problem.

And now he was in the cafeteria queue, waiting for his turn.

So I just walked up to him and asked:

"අයියෙ වෙබ් හොස්ටිං හරි සර්වරයක් හරි නැතුව පොඩි කෝඩ් කෑල්ලක් ඔන්ලයින් දුවාගන්න විදිහක් තියේද?"

"Bro, is there a way to run a piece of code online - without setting up web hosting or a server?"

The guy was thoughtful for a few seconds, and said:

"ගූගල් ඇප්ස් ස්ක්‍රිප්ට් කියල සීන් එකක් තියෙනව, පොඩ්ඩක් සර්ච් කරල බලපං..."

"There's this thing called Google Apps Script; do a search and you'll see..."

Google Apps Script, FTW!

And that's how I met GAS.

I thanked him, walked back to my seat, and started on it. Apps Script looked really cool - just write some JS, set up a trigger, and done! Google would take care of running your code at the right time.

With my daily draw open right in the next tab, I wrote my first Apps Script - crudely replaying every network call of the actual browser-based entry process. Needless to say, it didn't work the first few times - heck, I didn't know how to pass a cookie header to UrlFetchApp.fetch - but in the end...

...when I ran the script and refreshed the actual draw page, there was the message "you have already entered".

Culmination of success.

So, how serverless is Google Apps Script?

Very.

  • GAS lives in Google Drive - one of the first widespread, client-facing storage-as-a-service offerings.
  • It is merely a set of script files containing functions, following a JS-like syntax.
  • You can assign triggers to these functions: time-driven, or fired by events like an edit on a spreadsheet, a form submission, or merely the opening of a Google document.
  • With web app mode you can also expose your function via an HTTPS URL - for request-response mode execution.
  • It offers native integration with a mosaic of Google services - including Gmail, Drive, Docs, Forms, Sheets, Slides, Calendar, etc. - and can be integrated with any other Google Cloud service with little effort.
  • Additionally there are built-in caching, key-value storage and logging always at your disposal - to name a few - without having to set up or deploy anything.

Sound familiar?

GAS: Better than FaaS?!

To me, it all sounds much better than AWS Lambda - or Google Cloud Functions, for that matter.

With normal FaaS platforms, you only get the compute out-of-the-box: you need to set up storage, record persistence, caching, queueing, HTTP URL exposure, timer triggers and a dozen other things before your FaaS application can do something useful.

But with GAS, it's all bundled right in. Just write your code; save it (Ctrl+S); run (and, oh, even debug) it; and set up your trigger, sit back, and relax - all without ever leaving your browser tab!

Debugging on GAS

As if that wasn't enough, GAS can even notify you when your asynchronous trigger runs fail - individually or in batches. Maybe not as powerful or versatile as dead letter queues or Lambda destinations - but GAS has been offering it for at least 8 years, much longer than any other provider can boast.

Yep, one could argue it is against the principles of separation of concerns - but as long as GAS matches the definition of serverless computing, nobody can deny that GAS is serverless.

Versus Google App Engine

I have seen many praise Google App Engine as an early, unsung serverless pioneer. Their claims are partly true; like Apps Script, App Engine offers a remarkable level of abstraction - upload your code, set up your execution triggers, and the rest - launch, scaling, error recovery, etc. - is automagically taken care of. Plus, you can make use of dozens of inbuilt services - memcache, datastore, taskqueue - as well as heaps of others from the Google Cloud Platform APIs.

However, there is a major difference:

App Engine spawns instances that handle multiple concurrent requests, but in serverless - Lambda etc. - one instance usually handles a single request. An instance could handle multiple calls, one after the other; but under concurrency, each request is handled by its own instance.

On the other hand, Apps Script doesn't even have the notion of an instance. One could say, therefore, that it is a better fit for the "serverless" definition - perhaps even more so than FaaS!

Conclusion: so there!

Sorry about that technical side-track; yet, here we are - discussing serverless, back from a time when Lambda was still on the drawing board!

Serverless means server-less.

And that is how, in my quest for exactly that - a "server-less" code hosting solution - I ran into the simplest, coolest, most perfect serverless platform I could ever have dreamed of.

Having known Lambda, GCF and App Engine for several years now, my heart still lies with Apps Script - simply because it is so simple, self-contained, and, er, serverless.

Perhaps, if I hadn't met Apps Script, my life would have been quite different from what it is today. I can't exactly say if it would have been better or worse - but I guess I should be grateful.

Dear Apps Script, I owe you one! 🤗