Computational Fluid Dynamics takes a leap

People ask me all the time what work is being done that I find exciting. Honestly, it’s a little harder to answer than it used to be. The simultaneous emergence of cloud infrastructure providers and mobile hardware created a Cambrian explosion of innovation that now feels like it’s settling out.

The work done in this video is a huge counter-example though. It’s a couple of years old now but I still find myself referring to it often enough that it’s worth creating a pointer to it.

Computational fluid dynamics (CFD), the use of simulation to model physical systems involving fluids, has traditionally been very hard. Your choices were either a low-fidelity model that did not map well to reality or a high-fidelity model that takes months on thousands of cores to run.

The folks at SpaceX have found a way to create a fractal representation of the CFD domain, meaning that the parts of the simulation where interesting things are going on are modelled at high resolution, while the parts where nothing interesting is going on are represented at low fidelity. The mesh adjusts as the simulation progresses, and the whole thing is tuned to perform well on GPUs like those sold by NVIDIA.
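
To make the idea concrete, here is a minimal quadtree-style adaptive refinement sketch. This is only an illustration of the general pattern (refine where an "interest" metric is high, stay coarse elsewhere), not SpaceX's actual GPU-tuned scheme; the `Cell` class, the `interest` metric, and all thresholds are invented.

```python
# Toy adaptive mesh refinement (AMR) in 2-D: split cells into quadrants
# wherever the flow is "interesting", leave boring regions coarse.
# Purely illustrative; not SpaceX's actual algorithm.

class Cell:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.children = []          # empty list => leaf cell

    def refine(self, interest, threshold, min_size):
        """Recursively split this cell wherever interest is high."""
        if self.size <= min_size or interest(self.x, self.y, self.size) < threshold:
            return
        half = self.size / 2
        self.children = [Cell(self.x + dx, self.y + dy, half)
                         for dx in (0, half) for dy in (0, half)]
        for child in self.children:
            child.refine(interest, threshold, min_size)

    def leaves(self):
        if not self.children:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()

# Invented "interest" metric: pretend a shock sits along the line x == y.
def interest(x, y, size):
    return 1.0 if abs(x - y) < size else 0.0

root = Cell(0.0, 0.0, 1.0)
root.refine(interest, threshold=0.5, min_size=1 / 64)
print(sum(1 for _ in root.leaves()))  # → 190 leaves, vs 4096 for a uniform 64x64 grid
```

The payoff is exactly the one described above: fine cells cluster along the shock, and the total cell count stays a small fraction of a uniform grid at the same peak resolution.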

The result is that simulating novel designs of their upcoming Raptor methane engine becomes something you can run overnight on a laptop. This is really exciting work, and there are myriad other domains it will impact. (Unfortunately, one of them is likely nuclear weapons design.)

The video is 50 minutes long but well worth it for the technically inclined. There is also a higher-level article describing the work. If you just prefer the eye candy, there’s a video of that too.

Tesla, Autopilot and Data Collection


If I were going to build a self-driving car, I’d want to have a large corpus of field data. What situations come up most commonly? How do humans handle them? How might our computer react in a similar situation?

Google’s self-driving car efforts are well known, and they’ve spoken publicly about how all the miles their autonomous cars drive are carefully logged and analyzed. When a human takes over, presumably something has gone wrong or is at risk of going wrong, and those situations are carefully scrutinized. Unfortunately this data is not what I’d call a clean sample of real human behavior. Everyone driving one of Google’s cars is either an employee of Google or closely affiliated, and most importantly, they know they’re driving one of Google’s special and expensive cars. No doubt they’ve signed a bunch of confidentiality agreements and other forms. And so even when they’re driving themselves, they’re likely to be highly cautious. Those cars can only be used in certain controlled circumstances, and the data Google can collect will be constrained.

When Tesla announced the autopilot features, it really struck me that the hardware installed seems much more capable than what is required for the features they’re offering. Radar for adaptive cruise control? Seems like overkill. But hundreds of thousands of Tesla cars with these sensors, all collecting data on their drivers and the situations they encounter, seems like an amazing opportunity to build a corpus of real-world situations encountered by human drivers and what they do. We already know that Tesla has the ability to connect to their cars remotely. What if Tesla is already deploying their self-driving software to their cars and running it in a mode where it’s just not hooked up to the actuators in the car? At every moment the car’s software could be simulating what it might do in the present situation and logging what happens when its choices differ from what the human actually does. Tesla engineers can then analyze these logs, adjust their software, and re-simulate the car encountering that situation. At some point they’ll have it down to where the only places the human and the computer diverge are where they’re convinced the computer is making better choices. At that point, ship it! (Modulo lots of regulatory and insurance concerns.)
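
The core loop of such a "shadow mode" is simple enough to sketch. This is pure speculation about how it might work, not anything Tesla has published; every name, the toy planner, and the 30-meter threshold are invented for illustration.

```python
# Sketch of the speculative "shadow mode" described above: the autopilot
# computes a decision every tick but is never wired to the actuators;
# we only log the moments where it disagrees with the human driver.
# All names, thresholds, and structures here are invented.

from dataclasses import dataclass

@dataclass
class Divergence:
    timestamp: float
    sensor_snapshot: dict
    human_action: str
    computer_action: str

def shadow_tick(t, sensors, human_action, planner, log):
    """Run the planner on live sensor data without actuating anything;
    record a Divergence whenever it disagrees with the human."""
    computer_action = planner(sensors)
    if computer_action != human_action:
        log.append(Divergence(t, sensors, human_action, computer_action))
    return computer_action

# Toy planner: brake if an obstacle is close, otherwise cruise.
def toy_planner(sensors):
    return "brake" if sensors["obstacle_m"] < 30 else "cruise"

log = []
shadow_tick(0.0, {"obstacle_m": 80}, "cruise", toy_planner, log)
shadow_tick(0.1, {"obstacle_m": 25}, "cruise", toy_planner, log)  # human didn't brake
print(len(log))  # → 1 divergence to upload for engineers to study
```

Each logged divergence is exactly the interesting case: either the planner is wrong and needs fixing, or the human is wrong and the planner just earned some trust.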

As an engineer, I find that exciting and cool. For me the ideal version of this would be all the car data being uploaded to HQ, where I could analyze it indefinitely. Of course, I’m sure customers and law enforcement would be interested to know if a full sensor download from all Tesla cars were being stored at Tesla HQ indefinitely. So it’s possible that the data would be anonymized somehow before being uploaded.

Does any Tesla owner out there want to share the privacy policy or the text of any opt-ins for the autopilot features on the new cars?

The world needs only five computers

C|Net hosts an interesting interview with Greg Papadopoulos, CTO of Sun, in which he articulates Sun’s argument that only a small number of companies in the world will achieve the scale and efficiency of datacenter operations needed to compete, and that everybody else will get squeezed out. I think this basic premise is correct – running a huge datacenter is certainly hard and expensive. Google has done better than anyone else at making the complements of good software – CPU time, disk space, bandwidth, and programmer hours – relatively cheap and easier to deploy than any of their competitors. That superior execution is throwing off dividends.

Other software companies must look at that scale and infrastructure cost and be struck by a combination of jealousy and fear. Certainly Microsoft is, and it has started serious investment in datacenter scale.

While Google, Microsoft, Salesforce.com, Sun, and Amazon are all attempting to build the common platform that everybody writes their apps against, Amazon’s EC2 effort has some uniquely interesting attributes not mentioned in the article. While Salesforce.com and Sun both make you write code to their proprietary APIs (and Google and Microsoft won’t run your code at all), Amazon sells access to their grid by the “CPU hour”, where the “API” is just a bootable Linux disk image. As a potential consumer of these services, this is immensely attractive to me. Much as open source software gives me the at-least-conceptual option of taking my existing software to another vendor, Amazon’s design gives me confidence that if they ever get too pricey or just go away entirely, I can always throw my own boxes into a datacenter and run the machines myself. Who wants to bet that the Sun API or Salesforce API will still be around in five years, or that they won’t jack up the prices? What’s in it for Amazon? Amazon gets to achieve scale on the backs of other people’s companies, driving down prices for their own needs.
Of course, a raw Linux disk image is a pretty primitive construct from which to start building massively parallel, reliable systems. Code needs to be written to decide when to spool up new CPUs, to split and join work like Google’s MapReduce, and so on. This seems like complementary software that Amazon should write to spur adoption of EC2.
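
The split/join coordination layer I have in mind looks something like this. It’s a hedged, local-only sketch: real EC2 orchestration would boot machine images rather than local worker processes, and the word-count `mapper`/`reducer` pair is just a stand-in workload.

```python
# Minimal illustration of the split/join pattern described above:
# shard the work, farm shards out to "instances" (here just local
# processes standing in for EC2 machines), and join the results.

from multiprocessing import Pool

def mapper(chunk):
    # Map phase: count words within one shard.
    counts = {}
    for word in chunk:
        counts[word] = counts.get(word, 0) + 1
    return counts

def reducer(partials):
    # Join phase: merge the per-shard counts into one result.
    total = {}
    for part in partials:
        for word, n in part.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == "__main__":
    words = ["cpu", "hour", "cpu", "image", "hour", "cpu"]
    shards = [words[i::3] for i in range(3)]   # split
    with Pool(processes=3) as pool:            # "spool up" the workers
        partials = pool.map(mapper, shards)    # map, in parallel
    print(reducer(partials))                   # join
```

Swap the `Pool` for something that launches disk images on demand and you have the kind of adoption-spurring layer Amazon could ship alongside EC2.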

Google Maps style UX comes to Leopard

At the WWDC last summer, Steve Jobs made a big deal about how he was keeping a few features in the next release of OS X (Leopard) secret because he didn’t want the folks in Redmond (ahem) to “start their photocopiers too early”. Although the whole photocopiers angle is an obvious jab (and not even plausibly realistic), I do think there are some interesting features that still haven’t been announced in Leopard. There’s been a fair bit of speculation as to what Steve Jobs still has up his sleeve, but to my eye, nobody has quite nailed it.

I think the well-known work to make OS X resolution-independent will be applied to creating a “Google Maps-style” interface for the desktop UX. Imagine a desktop that is a variable number of pixels across. Dragging a window so that it sits partially offscreen causes the viewport to smoothly zoom out until everything is visible, just scaled down. In this context, Exposé is just a zoom out, instead of tiling programs in a flat grid with no relation to how they were laid out on screen. Naturally you can zoom in too, or do the Google-esque “drag”, which is really just panning the viewport across a large area.
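
The viewport math behind that zoom-out is simple. Here’s a sketch of the core calculation: find the bounding box of every window in desktop coordinates and scale so it fits the physical screen. This is purely my own illustration, not based on any actual Apple API; the function name and window tuples are invented.

```python
# Sketch of the zooming-viewport idea: compute the scale factor needed
# so every window's bounding box fits on the physical screen.
# Illustrative only; no relation to any real OS X API.

def fit_scale(windows, screen_w, screen_h):
    """windows: list of (x, y, w, h) in desktop coordinates, with
    non-negative origins, possibly extending past the screen edges."""
    right  = max(x + w for x, y, w, h in windows)
    bottom = max(y + h for x, y, w, h in windows)
    # Never zoom in past 1:1; only zoom out when something overflows.
    return min(1.0, screen_w / right, screen_h / bottom)

# An 800x600 window dragged so it hangs well off a 1440x900 display:
windows = [(100, 100, 500, 400), (1040, 200, 800, 600)]
print(round(fit_scale(windows, 1440, 900), 3))  # → 0.783: everything shrinks to fit
```

Animate the scale factor smoothly toward that target each frame and you get exactly the Google Maps feel described above.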

What’s my reasoning?

  • It just makes sense as a feature. You’ve always got more stuff running than you can deal with, and it is much more natural and makes a hell of a lot more sense than multiple desktops, spinning cubes, or Flip 3D.
  • This feature doesn’t affect developers’ lives at all. They don’t even need to know about it, so keeping it secret is possible.
  • It is indeed difficult for others to copy. Nobody else has done the resolution-independence work that Apple has. (512×512 icons? Wow.)
  • It can have the kind of super-snazzy-sounding name that Jobs loves so much. Time Travel?

I’ve read a fair number of the Apple blogs and haven’t seen this prediction anywhere else, but I’m not a regular, so it’s possible I’ve missed something obvious. Personally I’m going to be shelling out for a new Vista laptop with a snazzy SideShow display, but I’m definitely pretty impressed by the Apple stuff.

On being root

Recently, my last friend who was still a student in the PhD program at MIT graduated, and so I had to find a new home for oroup.com. What I’d always really wanted was my own box hosted at a colo where I could play around at will, install bizarre software, and so on. I’d priced this out a few times before, but the costs were always prohibitive – easily several hundred dollars per month on top of the purchase price of the box itself. Boy, how times have changed.
Rimuhosting offers you “root” on your own box and a static IP address for $19.95 a month. Of course, it’s not really your own box; it just feels that way. In reality, through the magic of Xen, I run happily in my own little slice of a larger machine, blissfully unaware of who I share the box with or what they’re doing. They price by RAM: $19.95 only gets you 96MB, which probably won’t get me as far as I’d like, but it’s still pretty amazing.
Suddenly, new things seem possible. I’d always resisted using something like blogger.com or MSN Spaces for a bunch of reasons:

  1. Probably the key issue is I just don’t like the URL. I realized early on that setting my email address to a domain name I hosted meant I could swap out the underlying email provider whenever I wanted. While my friends send out notices announcing their move from hotmail to yahoo to gmail, I have had the same email address for almost 10 years and never plan to change it. (Spam issues notwithstanding.) The same should be true for your blog.
  2. At the end of the day, it’s my data. Google seems pretty enlightened at this point with respect to not trying to lock you in. Any email sent to my gmail account lands in my “real” inbox automatically without me having to log into their website. Still, you just never know how that’s going to evolve in the future. Sourceforge seemed pretty enlightened in this regard too and has gotten somewhat less so. If the data sits on my box, I know I can get to it.
  3. I want flexibility. Blogger will let you turn features on and off, but they’re never going to use a search engine other than Google. MSN Spaces is always going to be pushing Live Search. Neither of them is ever going to connect to Flickr. It’s my site, and I should be able to build it as I want. Sure, I’m unlikely to ever start writing my own PHP plugins, but I want the flexibility to install what I want.

I tried to do a little comparison shopping of blog software and landed at this chart. It’s still basically too much information to grok, but the author ended up going with WordPress. I’ve seen a bunch of WordPress sites around, and the list of plugins seemed impressive. The only downside is that it requires MySQL. My impression has always been that PostgreSQL is a real database and MySQL is a toy, but that impression may be dated, and at the end of the day it wasn’t that important to me. So WordPress it is.
Welcome to the new blog.