Latest Posts

Curtain Call

So I now have fancy letters after my name as something to show for the last four years’ hard work and a lot of money. Even better, I have a Ph.D. lined up doing something that I’m sure I’m really going to enjoy. But as a consequence, I’m also losing a lot. Sure, I’m going back to Cambridge for the Ph.D., but there’s going to be a lot missing from Cambridge.

I’ve switched colleges, and while the change is something I wanted, it would have been easier to just stick with Churchill. There’d be fewer unknowns for three months from now, which would be comforting. While objectively I’m sure Clare is going to be great, my gut is unconvinced. Coupled with the fact that a lot of people I’ve become attached to over the last four years are going to be absent, I’m very much heading into the unknown again as I was four years ago. I managed well enough then, though, and fell into company that I’ve treasured since, so I should be able to cope.

This wasn’t intended to be terribly sentimental. I’ve already done that either in person or via email (to my shame) with those that I wanted to. A lot of photos were taken after the end of exams, some of which were rather apt, and there are still more to be circulated. It was the first and only post-exams / May Week period that I’ve had without performing with the Fire Troupe at balls, and I really enjoyed myself. Without the regimen of sleeping during the day, getting up, rehearsing, performing and sleeping again, I’ve had plenty of time for fun – garden parties, 5-a-side football tournaments, musicals… all the trappings. And what fun it was too. I definitely have no regrets about doing it.

Twitter Compression

Take one arbitrary limitation (Twitter only lets you send 140 characters). Note that Twitter allows you to use UTF-8 characters. Add a course in information theory. Roast under the heat of procrastination for several hours until juicy.

In English: I wondered just how much information you could send in a single tweet, and decided to find out.

Armed with the knowledge from an Information Theory course given by the singular David MacKay, I set about measuring it. So, it’s common knowledge that Arithmetic Coding is the best way to compress information, getting you to within a couple of bits of the Shannon Limit so long as you have an accurate modelling system. The ‘best’ bit of the compression comes from having suitably subtle and ingenious models of the text that you’re encoding; without such a model, you’re probably not going to see the benefits of Arithmetic Coding. However, I’m always up for a challenge – especially one set by someone with an Erdős number of two.
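To put a rough number on what a model buys you, here is a minimal Python sketch, assuming the crudest possible model: every character is drawn independently with its observed frequency. The function name `order0_entropy_bits` and the sample sentence are illustrative and not part of the original project. An arithmetic coder driven by this model needs about this many bits per character; a subtler model lowers the figure.

```python
import math
from collections import Counter


def order0_entropy_bits(text: str) -> float:
    """Shannon entropy, in bits per character, of the character frequencies in text."""
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over every distinct character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    sample = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"  # illustrative text only
    print(f"{order0_entropy_bits(sample):.2f} bits per character")
```

A short sample like this one gives only a crude estimate; the point is the calculation, not the specific value.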

Admittedly, I’d attempted to make an arithmetic encoder for a homework, which I almost achieved. However, I cheated by using the arbitrary precision maths module in PHP which would break after a suitably long string. As I presented it to him, I felt a pang of shame even before I explained… so reinvigorated with the fire of project procrastination, I thought I’d try and get one working properly. Admittedly, my first instinct was to just find one written in PHP, to get onto the meat of this project. But there didn’t seem to be one on the internet – until now! And it really does work properly, much to my amazement. Technically, it’s a range encoder, but the two are mathematically identical. As an added bonus, a range encoder isn’t protected by IBM patents, whereas an arithmetic coder is.
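For readers who have not met arithmetic coding, the idea can be sketched in a few lines of Python using exact fractions. This is closer in spirit to the arbitrary-precision homework version than to the range encoder in the download, and it is not that code: the names `build_model`, `encode` and `decode` are hypothetical, the model is a fixed table of character frequencies, and the message length is assumed to be sent separately instead of using an end-of-message symbol.

```python
import math
from collections import Counter
from fractions import Fraction


def build_model(text):
    """Fixed order-0 model: give each symbol its own slice (low, high) of [0, 1)."""
    counts = Counter(text)
    total = sum(counts.values())
    model, low = {}, Fraction(0)
    for symbol in sorted(counts):
        high = low + Fraction(counts[symbol], total)
        model[symbol] = (low, high)
        low = high
    return model


def encode(text, model):
    """Narrow [low, high) once per symbol, then return the shortest bit string inside it."""
    low, high = Fraction(0), Fraction(1)
    for symbol in text:
        span = high - low
        s_low, s_high = model[symbol]
        low, high = low + span * s_low, low + span * s_high
    bits = 0
    while True:
        k = math.ceil(low * 2**bits)  # smallest multiple of 2**-bits that is >= low
        if Fraction(k, 2**bits) < high:
            return format(k, "b").zfill(bits) if bits else ""
        bits += 1


def decode(bitstring, model, length):
    """Replay the narrowing, picking the symbol whose slice contains the coded value."""
    value = Fraction(int(bitstring, 2), 2**len(bitstring)) if bitstring else Fraction(0)
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        span = high - low
        for symbol, (s_low, s_high) in model.items():
            if low + span * s_low <= value < low + span * s_high:
                out.append(symbol)
                low, high = low + span * s_low, low + span * s_high
                break
    return "".join(out)


if __name__ == "__main__":
    message = "HELLO WORLD"
    model = build_model(message)
    code = encode(message, model)
    print(len(code), "bits:", code)
    print(decode(code, model, len(message)))
```

Exact fractions keep the sketch short and correct, at the cost of numbers that grow with every symbol; a range encoder does the same narrowing with fixed-width integers.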

So, how do we transmit our bitstream over Twitter? Happily, Twitter supports UTF-8 characters, which can occupy up to four bytes each. For a four byte character, 21 bits are under control of the user. If we were to use just uppercase characters and space, for English this corresponds to about four bits per character we wish to send – so we could send up to 700 characters in such a case! We like punctuation though, so this means we can send fewer total characters, but it’s probably worth it. And so began my trial with Unicode.
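The arithmetic behind that estimate can be sketched in Python. The function name `letters_per_tweet` is hypothetical. Note that the figure of 700 matches 20 payload bits per tweet character (the number the list below settles on); using all 21 bits would give 735.

```python
def letters_per_tweet(tweet_chars: int, payload_bits: int, bits_per_letter: float) -> int:
    """Whole letters that fit when each tweet character carries payload_bits bits."""
    return int(tweet_chars * payload_bits // bits_per_letter)


if __name__ == "__main__":
    print(letters_per_tweet(140, 20, 4))  # 20 usable bits per character
    print(letters_per_tweet(140, 21, 4))  # all 21 bits of a four-byte character
```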

  1. First attempt: split the binary string into 21 bit blocks, pad the last one and send the corresponding UTF-8 characters. Unfortunately, not all 21 bit long binary strings map to valid characters – those above 0x10FFFF are invalid. So drop down to 20 bits per UTF-8 character, and have the first bit as 0 – we’re now definitely going to map onto a valid Unicode character, right?
  2. Turns out, no. The points 0xD800 to 0xDFFF are reserved for UTF-16 surrogates. So look for these, and if we would be trying to send one of these, send a three byte character instead (which will only be 16 bits, but still a bargain).
  3. Unicode has canonical decompositions – so the character á is canonically identical to the characters representing a´ in sequence, for example. These are defined to be identical, and no application should function differently when presented with one sequence or the other. So now check to see that a character we are trying to send is fully decomposed – if not, drop to a three byte character if we were four, and a two byte character if we were three. I’m not convinced that there’s a PHP library that does a complete job of this anywhere, so this is currently only a ‘probably’ step of the process – if a message fails to send, it’s probably failed here.
  4. Line Feed / Carriage Return. Different OSes introduce the other – or not – after seeing one. So if the character we’re trying to send is actually in the bottom 128 of characters, only take seven bits. Set the 32s bit to 1, so we’re not using the awkward part of the ASCII table.
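The packing idea in step 1 can be sketched in Python with one simplification that is not the scheme described above: instead of falling back to shorter characters, every 20 bit block is offset by 0x10000, so it always lands in U+10000 to U+10FFFF. That range contains no surrogates and no ASCII control characters, which sidesteps steps 2 and 4. Step 3 is only checked, not fixed, and the check assumes the receiving service normalises to NFC. All function names here are hypothetical.

```python
import unicodedata

PAYLOAD_BITS = 20  # bits carried by each character
BASE = 0x10000     # first code point that needs four bytes in UTF-8


def bits_to_text(bits: str) -> str:
    """Pack a string of '0'/'1' into code points U+10000..U+10FFFF, 20 bits each."""
    padded = bits + "0" * (-len(bits) % PAYLOAD_BITS)  # zero-pad the final block
    chars = []
    for i in range(0, len(padded), PAYLOAD_BITS):
        value = int(padded[i:i + PAYLOAD_BITS], 2)
        chars.append(chr(BASE + value))
    return "".join(chars)


def text_to_bits(text: str, bit_count: int) -> str:
    """Inverse of bits_to_text; bit_count is needed to strip the padding."""
    bits = "".join(format(ord(ch) - BASE, "020b") for ch in text)
    return bits[:bit_count]


def survives_nfc(text: str) -> bool:
    """True if NFC normalisation (an assumed behaviour of the service) leaves text unchanged."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    payload = "1011" * 13  # 52 bits of made-up data
    tweet = bits_to_text(payload)
    print(len(tweet), "characters,", len(tweet.encode("utf-8")), "bytes")
    print("unchanged by NFC:", survives_nfc(tweet))
    print(text_to_bits(tweet, len(payload)) == payload)
```

The offset costs nothing in capacity compared with 20 bits per character, but many of the resulting code points are unassigned, so how a given client displays or transmits them is not guaranteed.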

I then set about trying to send some information over Twitter. The first test worked well. I decided to push it to the limit, however, and encoded nearly 600 characters as a string of 138 UTF-8 characters. Dropped my clipboard into Tweetie, and was informed that I was over the limit. Perplexed, I did some poking around and found that, in fact, Twitter isn’t sure what they mean when they say 140 characters. They think they probably mean bytes, not characters, but there is evidence to the contrary. No further updates as yet that I can see, but the problems seem to be at least partly due to how Ruby counts characters. Until this is fixed, however, the amount of information that can be sent in a single tweet is going to be somewhat limited. They’re looking into it.
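The gap between the two ways of counting is easy to see in Python. This is only an illustration of characters versus bytes, not a description of how Twitter or Ruby actually count; the function name `count_lengths` is hypothetical. A string of 138 four-byte characters is 138 code points but 552 bytes, so a byte-based limit of 140 would reject it.

```python
def count_lengths(text: str) -> dict:
    """Report the length of text under three common definitions of 'length'."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


if __name__ == "__main__":
    tweet = chr(0x10400) * 138  # 138 copies of one four-byte character
    print(count_lengths(tweet))
```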

The upshot is that there is now a proof of concept of the compressor up along with the source. I’ve included the UTF-8 libraries with it which are from a couple of sources, but mostly MediaWiki. I based the range encoder on the pseudocode over at the wiki article, changing it to use binary rather than base 10. It’s entirely for use at your own risk.

Lastly, it would seem that I am not the first to have an idea along these lines, and some have even gone much, much further. Arguably too far. Truly, the internet is a wonderful enabler for people who want to do things simply to see if they can be done. I find it amusing how all of these seem to have sprung up in the last month!

Procrastination

Jon Stewart had a pretty great interview last week with Cliff May where they discussed the merits of torture, which was excellent on many levels. What got a lot of traction was Stewart’s assertion that Truman was a war criminal for using the A-Bomb against Japan; he acknowledged that he could understand the decision at the time, but looking back it was an atrocity and should be considered as such by Americans, who should learn from this particular mistake in the past.

Seeing a commentator speak in this way was novel, which was why it got the attention it did. Stewart was essentially forced into admitting this to defend his position on torture without resorting to double standards which Cliff May would have surely jumped on. Unfortunately, Stewart has now retracted what he said, saying it was stupid. I don’t believe that’s true; it’s a valid opinion that can be defended (as Stewart himself did to some extent in the original interview). From a (relative) outsider’s perspective, it seems like he’s being forced to say it simply because his original statement wasn’t popular. It’s unfortunate that Cliff May doesn’t have a chance to respond, because I think he would have relished the chance to be someone to score a (rare) point against Stewart.

Project deadline is looming, so as a bit of a shortcut for content, a selection of links that I’ve run across over the last couple of weeks and found worth bookmarking. Not sure if I’ll do this consistently (it’s very easy to do), but it’s a relatively quick way to get a bulkier blog post with genuine content.

Technology and TeX

The Wii can now boot backups off of a hard drive. “Excellent”, I thought to myself. “I have an iPod sitting around, and if I backed up Brawl onto it, I’d be able to play Brawl without listening to my Wii’s DVD drive grinding itself away to dust”. So I set about obtaining the relevant software, which refused to format the iPod to the (bespoke) WBFS file system. Nor would the homebrew client that runs on the Wii.

My original suspicion was that it was due to the fact that iPods use SCSI over USB, but I did some reading. It turns out that many iPod hard drives use block sizes of 2048 bytes, compared to 512 bytes used by almost every other hard drive, everywhere, which the wbfs utility assumed. So a quick ‘find | xargs grep’ and a compile later, I was able to format the iPod. Unfortunately, I still can’t rip from the Wii, so I’ve been using another hard drive as an intermediary. Because the hard drive in the iPod is relatively slow, loading times are about the same as off the DVD – but no grinding. Whenever I hear it, I get flashbacks to my Dreamcast and PSO v2, with the ever-increasing loading times.

My Part III project is dragging itself along. It’s just managed to stagnate over Easter, despite my efforts to the contrary. I really need to get it rolling again. The computational element taking so long to do each run doesn’t really gel well with my working style, which is to sit down and just do the damned thing, but where I have six hour long enforced gaps, it’s tricky to find my way back once I’ve set it going. I have a reasonable draft for the write-up, up to the point where results are necessary, but after that it’s really just an expanse of blank. I’m back to university on Sunday, which will perhaps spur me on to action if I’ve not already managed to get back on the horse. Having to revise minor options for an exam later this month has hardly helped, either.

As part of my writeup procrastination, I found some interesting articles about common errors in scientific write-ups. I admit to committing some of them in the past, but I’m striving to eliminate them now.

Lightning Strikes Twice

Yet another entry during term?!! Killing some time before going out, a quick post on some topics that have been circulating in my mind for the last few days.

  • My week five blues seem to have arrived exactly a week late. This is unsurprising, as lectures didn’t start for me until week two, so they could be considered right on schedule. Unfortunately, everyone else has already conquered them, so I’m wallowing alone, somewhat.
  • My Khan Machine seems to still be going strong. Two weeks after it was in the B3ta newsletter, it now has over 3500 graphs drawn, and still seems to be drawing around 50 visitors a day, after getting around 5000 hits in the first 24 hours. Pleasingly, people thought it worth submitting to Digg and Reddit, but it didn’t hit the front page on either, thankfully – I am only on free hosting, after all, and I don’t think their servers would have appreciated it!
  • Astrometry.net is absolutely staggering, especially when you see it go to work on your own crude contributions.
  • Having signed up for Twitter quite a while ago, I never used it. I’ve been forcing myself to (slowly) since around Christmas, and while my own contributions are comparatively infrequent, I enjoy those from the people I’m following. These are mostly celebrities of one form or another that I’m interested in for whatever reason. It’s bite-sized celebrity gossip, news and banter. I think Twitterrific is going to be hanging around on my Mac for a while.

License To Ill

An entry during term time!?! What madness is this!?

I’ve had the magnificent fortune to not have suffered from any serious illness during my time in Cambridge – until this week. I was practically bedridden for two days, technically functional but practically ineffective for a third (curiously, between the two bedridden days). It’s hardly been a pleasant experience; knowing that both my supervision work was piling up and work on my research project was going wanting only served to antagonise me as I lay in bed unable to do anything about either. I even had to miss a free formal…

I’m now on the road to recovery (I hope), and rather than going out this evening have committed this brief post to the internet at large. Lastly, I have made the Kh(aX)n machine for B3ta, which seems to be pretty popular.

Countdown

The last episode of Countdown with Carol Vorderman aired yesterday (the final conundrum was, appropriately, ‘Era Closes’). It was the final of the current series, and was contested between two PhD students from Cambridge and Oxford. Unfortunately, the Cambridge guy lost narrowly (by four points), but I was struck by how much I liked the poem from Hilaire Belloc that Gyles Brandreth recited while handing Carol a bouquet:

From quiet homes and first beginning
Out to the undiscovered ends
There’s nothing worth the wear of winning
But laughter and the love of friends.

Secondly, this news pleases me more than it really should. Gambit’s omission from the X-Men movies frustrated me intensely, and Deadpool is simply excellent. Seeing them both (potentially) get the treatment that I think they deserve is delightful.

Michaelmas Post Mortem

Three months since an entry – clearly [minus one point - Ed] another busy term in Cambridge has gone by. It’s been somewhat different to previous ones; living in an actual house has proved remarkably easy to transition to. I suspect a lot of that is based on the fact that, by chance, our natural sleep cycles and lecture timetables have such a disparity that very rarely has anyone ever been caught waiting to use the bathroom in the morning.

Fortuitously, I ended up with a lecture timetable that involved lectures only three days a week; this meant that I was more willing to do things in the evening, and more enthusiastically, which I really enjoyed. Two Fire Troupe sessions a week, Churchill College Dance Club once a week (when it didn’t clash with Troupe) and Pav (bien sûr) were the most regular activities I partook in. More occasional excursions included Salsa, going to see Nizlopi in London (with Jon and Kat, much to my delight) and the usual excursions to the Rainbow Cafe and, right at the end of term, a John’s formal for a birthday. My last 48 hours in Cambridge were, for an alarmingly coincidental combination of reasons, a bit of a downer, but did little to take the sheen off what has been a marvelous term overall.

Of course, it wasn’t all fun and games. Work was pretty relentless throughout term, though I managed to keep on top of it until the very end where – ironically – the Scholar’s feast meant that I didn’t get all my supervision work done for my final set of supervisions crammed into the last two weekdays I was in Cambridge. PhD applications are well under way, too – the forms for which are a challenge in themselves to fill in, with contradicting, misleading and generally unhelpful instructions. Still, at least everyone has to fill in the same forms.

The ‘winterval’ has arrived, bringing with it even less relaxation than usual. Part III has exams throughout the year so, five weeks today, I’ll have had my first exam of the year. The January exams make up a full third of the year’s mark, so I have to take them – unfortunately – pretty seriously, and do revision appropriately over the coming weeks. I’ve also got to get something tangible out for my Part III project whilst not going completely nutty. There are a few fun things planned for the holiday – notably an Evans Challenge Football match – and I am looking forward to them, but they are unfortunately going to have to be the exception rather than the rule this holiday, I fear.

I note, unfortunately, that I’ve still not regained the passion I used to have for photography. I was understandably burned out after my Project365, but I was rather hoping to have regained it by now. I believe that part of it is due to my new camera, which I received through the insurance company after I broke my old one. The new one is the technical equal or superior of the old one in every respect, bar one. That respect is that it makes a godawful fake shutter noise every time it is used. It’s a pet peeve that I have with technology these days and, unfortunately, it seems to be becoming more prevalent. It’s a lovely camera in every other respect, and it really deserves to be used more. A more gentle photography New Year’s resolution, perhaps, than my 2007 one?

LHC

Unless you’ve been living under a rock the last few weeks, you know that yesterday the Large Hadron Collider went online, much to the joy of physicists everywhere. It will clear up a bunch of questions about the Standard Model and, more happily for me, it might confirm the theoretical existence of skittens. It also will not bring about the end of the world, regardless of what some people think, when it starts colliding protons and ramping up the energy in about a month and continuing to do so over the course of a few years. Theoretically, I wouldn’t mind a great deal about people scaremongering, but if it has real consequences then you do have to start thinking carefully about what you’re reporting.

Saying that, of course, there do seem to be some suspicious characters hanging around the LHC – posted after the jump…

My heroes

It seems like it’s a bad time to be a fan of anything I like. In quick succession we’ve had Nizlopi announce they’re not touring next year, one of the Barenaked Ladies get busted for drugs, another narrowly escape death and now Jasper Fforde, quite possibly my all time favourite author, is going to have to watch ‘Lost in Austen’.

I’m not saying that it’s been directly lifted with the names changed, but it bears an astonishing resemblance to the Thursday Next series. In this new TV drama, a woman who lives for reading finds herself inside ‘Pride and Prejudice’, where she ends up accidentally changing the story. In the first of the Thursday Next books, Thursday finds herself inside ‘Jane Eyre’ chasing a villain, and ends up changing the story. I intend to watch it just so that I can see how many similarities there are for myself, but as Phil has pointed out, in the future there will be people who pick up ‘The Eyre Affair’ and think how much of a resemblance it bears to Lost in Austen – which is perhaps the worst aspect of it all.

On a lighter note, work has continued on our ongoing Dalek Project over the last four days, and we’ve made a lot of progress by virtue of spending all our time at Ben’s. There was a small incident with an accidental wheelie, and putting 24V through a 12V relay, but nothing that can’t be repaired. It’s really coming along well (and now has an eye stalk – which it didn’t when those photos were taken). It’s not going to have some features real Daleks have, but equally it will have some features that real Daleks don’t – and by virtue of those features, will be excellent.