Archive for the ‘Uncategorized’ Category

SOSP 2009: Day 3

Monday, October 19th, 2009

Clusters

Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations

  • Goal to make large-scale programming simple for all developers.
  • Write a program in Visual Studio; Dryad(LINQ) takes care of shipping it to the cluster, fault tolerance, etc.
  • Wrestling with the implementation of GroupBy-Aggregate. GroupBy takes a sequence of objects with some kind of key, and groups them together by key. Similar to MapReduce.
  • Naïve execution plan splits map and reduce into two phases, with an all-to-all data exchange between them. However, applying the reduce after this exchange results in a large amount of network I/O.
  • A better idea is to do early partial aggregation: use an aggregation tree to achieve this. Reduces the disk and network I/O by up to one or two orders of magnitude.
  • Want to automate this optimization. Programmer writes the obvious code and the system takes care of the rest.
  • Notion of decomposable functions is key to this. Need an initial reducer that is commutative, and a combiner that is commutative and associative.
  • How do we decompose a function? Two ways: iterator and accumulator interface. Choice can have a significant impact on performance.
  • How do we deal with user-defined functions? Try automatic inference, but fall-through to a good annotation mechanism. Implement simple function and annotate it with the initial reduce and combiner implementation function names.
  • Hadoop interface for this adds quite a lot of complexity. Java’s static typing is not preserved.
  • Iterator interface has to build an entire group and iterate through it. Accumulator can discard the inputs if they are not needed. Oracle uses this approach, implemented with stored procedures. Hard to link in user-defined procedures.
  • Automatic decomposition looks at the expression and checks whether all leaf function calls are decomposable.
  • Want our approach to have good data reduction, pipelining, low memory consumption and parallelisability (multicore). Define six strategies, accumulator- and iterator-based.
  • Iterator PartialSort approach. Idea is to keep only a fixed number of chunks in memory; processed in parallel. The bound on memory makes pipelining possible. Strategy close to MapReduce.
  • Accumulator FullHash approach builds an in-memory parallel hash table with one accumulator object per key. Objects are accumulated immediately. This gives optimal data reduction and memory consumption proportional to the number of keys, not records. This is the DB strategy (DB2 and Oracle).
  • Evaluated with three applications: WordStates, TopDocs and PageRank on a 240-machine cluster. Accumulator-based implemen—

water_laptop

The 9/11 Delusion

Monday, July 21st, 2008

When I saw in last Monday’s Guardian that Charlie Brooker was taking aim at 9/11 conspiracy theories, I hoped that he’d use his wide audience to present a logically watertight argument, in an entertainingly acerbic register. And buried within his piece was the quite probable suggestion that the paperwork alone would be impossible to conceal. Unfortunately, because he’s evidently paid by the ad hominem, he also said that every conspiracy theorist might as well believe that he is the Emperor of Pluto, and unleashed a firestorm in the online comments. By opening up too many fronts in this debate, he left himself open to attacks, even from other Guardian commentators.

(more…)

At least he’s got a bag for life

Sunday, May 25th, 2008

The law of unintended consequences often comes shopping with me.

(more…)

My name is not Bob

Sunday, May 18th, 2008

This weekend finds me back in Glasgow to visit my parents, and I’ve spent much of the afternoon clearing out a desk that I’d used since 1996. Filled with memories, old tickets and trinkets, my first football match, my first gig, my first trip to London on my own, my trip to Cambridge for interview. Filled with old school work and school reports (Computing – “I am pleased with my grades, and I like computers.” R.E. – “I am pleased with my progress in the short course, and look forward to its completion.”), and the surprising insight that I apparently had a “particular ability at Volleyball.” Filled with greetings cards from old friends, people I barely remember, and people I’d rather forget.

(more…)

Strangers on a plane

Friday, May 16th, 2008

The other day, I was getting off a plane from Istanbul back to Stansted, and retrieving my Duty-Free carry-on, when a fellow passenger accosted me:

“I think that’s my bag,” he said.

“I’m fairly sure it’s not.”

“Does it have Turkish Delight in it?”

“Well, yes….”

Civic Pride

Monday, May 5th, 2008

Growing up in Glasgow, I was exposed to more than my fair share of internecine rivalries: when I was more serious about blogging, I planned a grand series of posts cataloguing every single one of them. Easy, I thought, there’s the other football team, the other side of the river, the suburbs, the other city, and don’t even get me started on the English.

(more…)

The Apprentice: I demand a recount!

Thursday, June 14th, 2007

I accept that it’s a bit jejune to get worked up about Reality TV, but last night’s Apprentice took the biscuit.

(more…)