<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>mrry</title>
	<link>http://www.mrry.co.uk/blog</link>
	<description>Derek Murray's weblog</description>
	<pubDate>Mon, 05 May 2008 10:30:58 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.2</generator>
	<language>en</language>
			<item>
		<title>Civic Pride</title>
		<link>http://www.mrry.co.uk/blog/2008/05/05/civic-pride/</link>
		<comments>http://www.mrry.co.uk/blog/2008/05/05/civic-pride/#comments</comments>
		<pubDate>Mon, 05 May 2008 10:30:58 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2008/05/05/civic-pride/</guid>
		<description><![CDATA[Growing up in Glasgow, I was exposed to more than my fair share of internecine rivalries: when I was more serious about blogging, I planned a grand series of posts cataloguing every single one of them. Easy, I thought, there&#8217;s the other football team, the other side of the river, the suburbs, the other city, [...]]]></description>
			<content:encoded><![CDATA[<p>Growing up in Glasgow, I was exposed to more than my fair share of internecine rivalries: when I was more serious about blogging, I planned a grand series of posts cataloguing every single one of them. Easy, I thought, there&#8217;s the <a href="http://en.wikipedia.org/wiki/Old_Firm">other football team</a>, the other side of the river, the suburbs, the <a href="http://en.wikipedia.org/wiki/Edinburgh">other city</a>, and don&#8217;t even get me started on the English.</p>
<p><a id="more-30"></a>Though these rivalries have existed for years, the internet has given them a new lease of life, in the hands of pure, unabashed saddos. Usenet has been the site of a <a href="http://groups.google.com/group/alt.airports.uk.glasgow/browse_thread/thread/11ad497e76334c9a#">proxy war</a>, pitting <a href="http://groups.google.com/group/alt.airports.uk.glasgow/topics">Glasgow <em>Airport</em></a> against its <a href="http://groups.google.com/group/alt.airports.uk.edinburgh/topics?lnk=rgh">rival in Edinburgh</a>. The <a href="http://en.wikipedia.org/w/index.php?title=Glasgow&#038;action=history">revision history</a> for the <a href="http://en.wikipedia.org/w/index.php?title=Glasgow&#038;action=history">Glasgow article on Wikipedia</a> shows frequent changes in the population column, as well as <a href="http://en.wikipedia.org/w/index.php?title=Glasgow&#038;diff=209774696&#038;oldid=209774130">the recent deletion</a> of the comment, &#8220;<span class="diffchange diffchange-inline">Glasgow is the largest city in Scotland and all the more mysteriously is not the capital. Edinburh the second largest city in Scotland is the capital</span>.&#8221; I can even remember reading one message board post where the author had counted the number of A-roads in each city and concluded that, having more, Glasgow was the more important.<br />
In the interests of full disclosure, I must admit that I did once subscribe to both alt.airports.uk.glasgow and the edit feed for the city&#8217;s Wikipedia article. But having lived first in Edinburgh, now in England, I&#8217;ve become more laid back in my civic pride. Until today, that is.</p>
<p>A <a href="http://news.bbc.co.uk/1/hi/uk/7382750.stm">BBC news article</a> reported on fears during the Cold War that a nuclear strike would have a grave effect on the nation&#8217;s tea supply. It mentioned that, for the purposes of planning:</p>
<blockquote><p>the Ministry of Food listed London, Birmingham, Merseyside, Manchester and Clydeside [Glasgow] as H-bomb targets.</p>
<p>Tyneside, Teesside, Leeds, Sheffield, Hull, Derby, Purfleet in Essex, Southampton, Portsmouth, Bristol, Plymouth, Cardiff, Coventry and Belfast were named as A-bomb targets.</p></blockquote>
<p>And all I could think was, &#8216;Ha, take that, Edinburgh&#8230;.&#8217;
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2008/05/05/civic-pride/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Trade</title>
		<link>http://www.mrry.co.uk/blog/2008/03/04/trade/</link>
		<comments>http://www.mrry.co.uk/blog/2008/03/04/trade/#comments</comments>
		<pubDate>Tue, 04 Mar 2008 16:35:29 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Politics</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2008/03/04/trade/</guid>
		<description><![CDATA[I can&#8217;t sleep, and I&#8217;ve got NAFTA on my mind. I won&#8217;t claim that the two are correlated, but when did that ever stop someone writing a blog post?
I am a little worried, however. Like just about everyone to whom I&#8217;ve talked, I&#8217;m hoping for a Democrat in the White House next year, and I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I can&#8217;t sleep, and I&#8217;ve got NAFTA on my mind. I won&#8217;t claim that the two are correlated, but when did that ever stop someone writing a blog post?</p>
<p><a id="more-29"></a>I am a little worried, however. Like just about everyone to whom I&#8217;ve talked, I&#8217;m hoping for a Democrat in the White House next year, and I&#8217;ve been following the primaries intently. But some of the recent pronouncements on trade have troubled me a little. So I&#8217;m hoping that, by writing something so naïve and ill-informed as this, someone will come out of the woodwork and tell me where I&#8217;m going wrong. Okay, here goes.</p>
<p>Let&#8217;s start by assuming that we&#8217;re all liberals. We probably want Barack Obama to be the next president, but maybe we&#8217;d rather it was Hillary Clinton. Either way, for the purpose of this discussion, it doesn&#8217;t matter.</p>
<p>We want things to be better for people in developing countries. To a first approximation, this could mean raising the standard of living in these countries by alleviating poverty. One way of doing this would be to spend more money in these countries. (Is this not why we&#8217;re all drinking fair-trade coffee, and eating fair-trade chocolate? Actually, scratch that, as I&#8217;m not convinced that the fair-trade movement doesn&#8217;t suffer from the exact same problems as what I&#8217;m about to describe. And sorry for the double-negative there.)</p>
<p>We are more likely to spend money in a country if we have free trade with that country, than if prices for goods from that country are inflated by duties or tariffs. We ensure free (or, perhaps more accurately, &#8220;freer&#8221;) trade by making free trade agreements with other countries.</p>
<p>So it&#8217;s heartening that, sometimes at least, both Obama [<a href="http://www.nytimes.com/2008/03/04/us/politics/04nafta.html?ex=1362373200&#038;en=95e274b55a8e3fe3&#038;ei=5124&#038;partner=permalink&#038;exprod=permalink">1</a>] and Clinton [<a href="http://www.ndol.org/ndol_ci.cfm?kaid=106&#038;subid=122&#038;contentid=250750">2</a>] have supported NAFTA, which enables free trade between the US, Canada and Mexico.</p>
<p>Except, there&#8217;s a problem: free trade can cost jobs, especially manufacturing jobs in countries where the standard of living is so high that it is uneconomical to pay workers a living wage, when the same output could be obtained at a fraction of the price from one of our trading partners.</p>
<p>Especially jobs in traditionally blue-collar states like Ohio, where voters go to the polls today. And, therefore, both Clinton [<a href="http://facts.hillaryhub.com/archive/?id=6019">3</a>] and Obama [<a href="http://www.barackobama.com/2008/02/24/remarks_for_senator_barack_oba_1.php">4</a>] have attacked the other for supporting NAFTA, at the expense of American jobs.</p>
<p>So I have three questions:</p>
<ul>
<li>Which of the above assumptions is incorrect?</li>
<li>I&#8217;d like to believe in Obama&#8217;s message of hope and change, but does this change stop at the US border?</li>
<li>If, as is now being reported, at least one of the candidates is engaging in disingenuous political posturing when making these statements [<a href="http://www.nytimes.com/2008/03/04/us/politics/04nafta.html?ex=1362373200&#038;en=95e274b55a8e3fe3&#038;ei=5124&#038;partner=permalink&#038;exprod=permalink">1</a>] (and I&#8217;d be surprised if his opponent <em>wasn&#8217;t</em> also doing the same), then why should we believe in the rest of the rhetoric that goes along with it?</li>
</ul>
<p>Now, I&#8217;m not an idealist, but I don&#8217;t particularly enjoy being cynical. Indeed, I hope I&#8217;ve made a mistake somewhere above, and someone will come along and correct me. However, it seems that, in an election that will probably be won on the strength of aspirational oratory, we should be especially critical of what is said.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2008/03/04/trade/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>SOSP 2007: Day 3</title>
		<link>http://www.mrry.co.uk/blog/2007/10/17/sosp-2007-day-3/</link>
		<comments>http://www.mrry.co.uk/blog/2007/10/17/sosp-2007-day-3/#comments</comments>
		<pubDate>Wed, 17 Oct 2007 15:58:51 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Uni</category>
	<category>Technology</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/10/17/sosp-2007-day-3/</guid>
		<description><![CDATA[Indulge me for a moment. For some reason, the powers that be at SIGOPS like to hold SOSP at remote locations, which in recent years have included Bretton Woods, Banff, and Brighton. So I&#8217;ve not been the only one to point out that this year&#8217;s location, Skamania Lodge in southern Washington state, bears something of [...]]]></description>
			<content:encoded><![CDATA[<p>Indulge me for a moment. For some reason, the powers that be at SIGOPS like to hold SOSP at remote locations, which in recent years have included <a href="http://www.sosp.org/1983">Bretton Woods</a>, <a href="http://www.sosp.org/2001">Banff</a>, and <a href="http://www.sosp.org/2005">Brighton</a>. So I&#8217;ve not been the only one to point out that this year&#8217;s location, <a href="http://www.skamania.com/">Skamania Lodge</a> in southern Washington state, bears something of a resemblance to the infamous <a href="http://en.wikipedia.org/wiki/Overlook_Hotel">Overlook Hotel</a> in Stephen King&#8217;s <em>The Shining</em>. A little Wikipedia surfing last night led me to discover that the basis for the Overlook is Timberline Lodge, <a href="http://maps.google.com/maps?f=d&#038;hl=en&#038;geocode=&#038;time=&#038;date=&#038;ttype=&#038;saddr=Skamania+Lodge+Dr,+Stevenson,+WA+98648&#038;daddr=Timberline+Hwy,+Government+Camp,+Clackamas,+Oregon+97028,+United+States&#038;sll=45.305803,-121.731262&#038;sspn=0.063023,0.138702&#038;ie=UTF8&#038;z=10&#038;om=1">just across the river in Oregon</a>. Imagine my surprise as I watched the local news in our own hotel (which owes more than a little to <a href="http://en.wikipedia.org/wiki/Bates_Motel">Hitchcock</a>), and the weather forecaster cut to a shot of the Timberline Lodge, where apparently the snow has just started falling for the season. Ah well, at least they&#8217;ve got a security camera up there these days.</p>
<p><a id="more-28"></a></p>
<h2>Storage</h2>
<h3>DejaView: A Personal Virtual Computer Recorder</h3>
<ul>
<li>The MEMEX Vision: We lack tools to store all of our books, records and communications quickly and with great flexibility. (Vannevar Bush.) We need to archive, search, view and manipulate what we have seen. Web/desktop search is inadequate: no record of what we have seen but not saved.</li>
<li>DejaView: PVCR that provides complete, transparent, fast recording of the desktop computing experience. Tivo for the desktop. Records display like a video (playback, browse, ff, rw); and text and context to use as an index.</li>
<li>Display recording: transparent, efficient and full-fidelity. Uses a virtual display driver that can redirect the display anywhere: to the user (on the screen) and to persistent storage. It has a standard device interface, intercepts updates at a low level and logs all display updates.</li>
<li>Text/context recording: the naïve approach would be to do OCR on the framebuffer, which is too slow and inaccurate. Instead, leverage the accessibility infrastructure (screen reader technology), which is integrated in most standard GUI toolkits. This also provides useful contextual information: name/type of application, window in focus, special properties (e.g. menu text).</li>
<li>Execution recording: be able to revive execution state at any time, including the entire desktop session, not just a single process, and it needs to be fast enough to save frequently without degrading user experience. Use a virtual execution environment that decouples the desktop from the underlying OS by interposing on the syscall API. Only saves the desktop state and not the entire OS, which cuts down on the size of the checkpoint. Requires some downtime for the application during saving and snapshotting the filesystem, which must be optimised for interactivity. Checkpoint rate must be limited to make the overhead manageable. Also avoid taking pointless checkpoints (like when the desktop clock changes). A checkpoint per second is sufficient granularity.</li>
<li>On revival, use UnionFS to put a CoW R/W layer on top of the R/O checkpointed file system. Can therefore fork history.</li>
<li>Evaluation: based on real implementation - overhead, impact on interactivity and storage requirements; and access to data (latency etc.). Ran a bunch of benchmarks, based on normal workloads, like web browsing, video playback, catting a large file to screen, doing a kernel build, or something in Matlab. Also evaluated based on real desktop usage by grad students.</li>
<li>Little overhead for display or execution recording; 100% overhead for text recording; 125% for full recording when web browsing (due to a bug in Firefox&#8217;s accessibility). All other use cases have much less overhead. Checkpoint latency: 5ms to 22ms downtime; total checkpoint time from 80 to 200ms. Storage growth is very low for the real usage scenario (lower than a PVR with equivalent display resolution). Browse and search latency no more than 200ms, which is good enough for interactive use. Playback speedup about 200x for real usage. Session revive takes 1.5 to just over 4s.</li>
</ul>
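<p>The checkpoint throttling described above (at most one checkpoint per second, and none for trivial updates like the clock ticking) might look something like the sketch below. This is only my illustration, not DejaView&#8217;s code: the <code>CheckpointPolicy</code> class, the pixel threshold and the event list are all made up for the example.</p>

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Decide whether a display update should trigger a checkpoint."""

    min_interval_s: float = 1.0        # at most one checkpoint per second
    min_changed_pixels: int = 2000     # hypothetical cut-off for "trivial" updates
    last_checkpoint_s: float = float("-inf")

    def should_checkpoint(self, now_s, changed_pixels):
        # Skip pointless checkpoints, e.g. when only the desktop clock changed.
        if self.min_changed_pixels > changed_pixels:
            return False
        # Rate-limit so that the checkpoint downtime stays manageable.
        if self.min_interval_s > now_s - self.last_checkpoint_s:
            return False
        self.last_checkpoint_s = now_s
        return True


# Hypothetical stream of (timestamp in seconds, pixels changed) display updates.
policy = CheckpointPolicy()
events = [(0.0, 50_000), (0.2, 40_000), (0.5, 300), (1.3, 10_000)]
decisions = [policy.should_checkpoint(t, px) for t, px in events]
print(decisions)
```

<p>Only the first and last updates are checkpointed here: the second arrives too soon after the first, and the third is too small to matter.</p>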
<h3>Improving File System Reliability with I/O Shepherding</h3>
<ul>
<li>Storage systems are becoming complex, which can lead to complex failures. Want to manage disk and individual block failures. Addressed by checksumming, parity, mirroring, versioning, etc. However, these techniques are insufficient, and poorly understood (unlike performance and consistency). There is no good strategy in commodity FSs, only coarse-grained policies.</li>
<li>Reliability treatments are diffuse and inflexible. Different apps require different policies (e.g. desktop workload versus web servers).</li>
<li>I/O shepherd: localised, flexible policies for different types of storage. e.g. Mirroring for archival data, checksumming for scientific data, or different levels of protection based on the quality of the drive. These policies can be composed.</li>
<li>I/O shepherding layer interposed between the FS and the disk subsystem. Includes one or more policy tables, linking to policy code, policy primitives and policy metadata.</li>
<li>Policy table: for specifying policies. Different kinds of block types and volumes, and levels of reliability and importance. So we have a write policy and read policy for each block type (like a function pointer); and a policy table for each volume (or mount point?).</li>
<li>Policy metadata: need remapping, mirroring and sanity checking. Must be integrated with the FS. Also need I/O Shepherd Maps, to identify e.g. mirror locations.</li>
<li>Primitives and Code: want to make reliability management simple. Primitives like &#8220;Checksum&#8221;, &#8220;Parity&#8221;, &#8220;Sanity Check&#8221;, &#8220;Allocate Near&#8221;, &#8220;Allocate Far&#8221;. These are then composed into a full policy, in the policy code.</li>
<li>Challenge is to keep the new data and metadata consistent in the presence of crashes. Need consistency management.</li>
<li>CrookFS: ext3 with shepherding capabilities. 900LOC changes to the OS, and 3500LOC for the shepherding infrastructure. Small overhead. Integrates reliability policy with journalling: execute policies during checkpoint.</li>
<li>However, we can&#8217;t run e.g. remapping during checkpointing, because the checkpoint might fail, leaving us in an inconsistent state. If we keep the remap table in memory and the remap succeeds, but the program crashes before we can sync the table, then the disk is left in an inconsistent state. At present, there is no consistency for any checkpoint recovery that changes state. Consistent reliability with the current journalling system is impossible!</li>
<li>Thus we need &#8220;chained transactions&#8221;, which say that only after a chained transaction (updating the remap table) can the previous transaction (doing the remapping) be released. This fixes the flaw in journalling. It is repeatable across crashes, as long as we have idempotent policies.</li>
<li>Evaluation: [see the paper].</li>
</ul>
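<p>To make the policy-table idea concrete, here is a toy sketch of per-block-type read and write policies composed from a couple of primitives. It is not CrookFS: the in-memory <code>disk</code>, the function names and the choice of CRC32 are all my own assumptions, and it ignores the crash-consistency problem entirely.</p>

```python
import zlib

# In-memory stand-ins for a disk and the shepherd's metadata (all hypothetical).
disk = {}         # block number to bytes
checksums = {}    # block number to CRC32 of the last good write
mirror_map = {}   # block number to the block number of its mirror copy


def write_checksummed(block, data):
    """Store the data and remember its checksum."""
    checksums[block] = zlib.crc32(data)
    disk[block] = data


def write_mirrored(block, data, mirror_offset=1000):
    """Checksum the primary copy, then write a second copy far away."""
    write_checksummed(block, data)
    mirror = block + mirror_offset
    mirror_map[block] = mirror
    disk[mirror] = data


def read_verified(block):
    """Sanity-check the primary copy; fall back to the mirror if it is bad."""
    expected = checksums.get(block)
    data = disk.get(block)
    if data is not None and zlib.crc32(data) == expected:
        return data
    mirror = mirror_map.get(block)
    if mirror is not None and zlib.crc32(disk[mirror]) == expected:
        disk[block] = disk[mirror]    # repair the primary from the mirror
        return disk[block]
    raise IOError(f"block {block} failed its checksum and has no good mirror")


# Policy table: one (write policy, read policy) pair per block type.
POLICY_TABLE = {
    "inode": (write_mirrored, read_verified),
    "data": (write_checksummed, read_verified),
}


def shepherd_write(block_type, block, data):
    write_policy = POLICY_TABLE[block_type][0]
    write_policy(block, data)


def shepherd_read(block_type, block):
    read_policy = POLICY_TABLE[block_type][1]
    return read_policy(block)


# Corrupt a mirrored inode block and read it back through the shepherd.
shepherd_write("inode", 7, b"inode-bytes")
disk[7] = b"inode-bytez"
recovered = shepherd_read("inode", 7)
print(recovered)
```

<p>The table plays the role of the &#8220;function pointer&#8221; per block type: swapping the entry for <code>"data"</code> to <code>write_mirrored</code> changes the reliability policy without touching the callers.</p>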
<h3>Generalized File System Dependencies</h3>
<ul>
<li>A new architecture for constructing file systems, using the &#8220;generalised dependency abstraction&#8221;.</li>
<li>e.g. for consistency, we must keep the FS consistent after every write, by, e.g. journalling. However, we must trade off durability features against performance. A file system must pick one tradeoff, but why can it not be extensible? This is hard because it&#8217;s difficult to implement, complicated by caches, and because correctness must be maintained.</li>
<li>What is a simple, general mechanism for implementing any consistency model? The &#8220;patch&#8221; abstraction in their implementation, Featherstitch.</li>
<li>Featherstitch contributions: the patch and patchgroup abstractions (explicit write-before abstractions, and FS-agnostic). Replaces Linux&#8217;s FS and buffer cache layer with implementations of ext2 and UFS. Journalling, WAFL and soft updates implemented just by using patch arrangements.</li>
<li>Patches: a disk data change and any dependencies it has on other disk data changes. Includes some undo data. The dependency says which data must be written before which others. The benefits of this are to separate write-before specification from enforcement, and to make these relationships explicit. Inspired by soft updates, but doesn&#8217;t enforce a particular FS and consistency model.</li>
<li>Example of rename: involves adding a directory entry and removing the old one. So we could write the source with the removed entry, then write the new entry. But what if we crash in between?</li>
<li>Soft updates: inc ref count for the file inode, then add the new dirent, then remove the old dirent, then dec the ref count. However, this has a cyclic dependency for block updates. But, in patches, this is not a cycle. So we can do the increment patch, then do the other patches, with no cycle.</li>
<li>Also showed how this would be done with journalling and WAFL: somewhat more complicated.</li>
<li>Patchgroups: arise from application-defined consistency requirements. Commonly just syscall to the buffer cache (fsync, sync), or depend on the underlying FS. Instead, the patch interface is extended to user space, using patchgroups, which enables the specification of write-before requirements between syscalls. Can make things more efficient by not doing a bunch of syncs.</li>
<li>Patch optimisations: massive dependency graph between patches even when allocating a few blocks for a new file, and so patches must have a lightning-fast implementation. The primary overhead is unused undo data (150% overhead on some benchmarks). How do we detect where this is unused? Theorem is that undo data is only necessary when there are block-level cycles. So we should specify all dependencies when creating a patch.</li>
<li>Next: the patch data structures take a large amount of memory. Therefore try to merge them, i.e. if they are on the same block (hard patch merging) and when they overlap. There are several other optimisations in the paper!</li>
<li>Evaluation: measure optimisation effectiveness, compare with ext2 and ext3, check consistency, and run a benchmark (UW IMAP).</li>
<li>The optimisations are successful in reducing the number of patches, system time and undo data overhead.</li>
<li>Comparison with Linux: Featherstitch is between 19 and 26% slower at the PostMark benchmark. It&#8217;s faster on other benchmarks, where differences in the block allocation strategy outweigh the overhead.</li>
<li>Consistency correctness: under random crashes, is the FS consistent? Works with soft updates and journalling, as expected.</li>
<li>Benchmark: moving 1000 messages from one IMAP folder to another. Patchgroups are much faster than using fsync.</li>
</ul>
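<p>The rename example is easier to see with the dependency graph written out. The sketch below is my own reconstruction of the four soft-updates steps as patches (the patch and block names are invented, and this is nothing like the real Featherstitch data structures). It shows that the patches form a simple chain, while the same dependencies collapsed to whole blocks form a cycle. It uses <code>graphlib</code> from the Python 3.9+ standard library.</p>

```python
from graphlib import CycleError, TopologicalSorter

# Each patch: (block it modifies, set of patches that must reach disk first).
PATCHES = {
    "inc_refcount": ("inode_block", set()),
    "add_new_dirent": ("dst_dir_block", {"inc_refcount"}),
    "remove_old_dirent": ("src_dir_block", {"add_new_dirent"}),
    "dec_refcount": ("inode_block", {"remove_old_dirent"}),
}


def patch_graph(patches):
    """Map each patch to the patches it depends on."""
    return {name: set(deps) for name, (_block, deps) in patches.items()}


def block_graph(patches):
    """Collapse patch dependencies into whole-block dependencies."""
    graph = {}
    for _name, (block, deps) in patches.items():
        graph.setdefault(block, set())
        for dep in deps:
            dep_block = patches[dep][0]
            if dep_block != block:
                graph[block].add(dep_block)
    return graph


def write_order(graph):
    """Return an order that writes every node after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    try:
        write_order(graph)
    except CycleError:
        return True
    return False


patch_order = write_order(patch_graph(PATCHES))
print(patch_order)
print(has_cycle(block_graph(PATCHES)))
```

<p>The block-level cycle arises because the inode block carries both the first and the last patch, which is the situation where (per the theorem above) undo data is actually needed.</p>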
<h2>Operating System Security</h2>
<h3>Information Flow Control For Standard OS Abstractions</h3>
<ul>
<li>Problem: vulnerabilities in websites can lead to exploits. Solution: decentralised information flow control (DIFC). Example: admin wants to secure web app that handles secret and non-secret data. Use a &#8220;declassifier&#8221; which decides who gets to see what data. User authenticates to the declassifier then passes all requests through it. Even a bug in the webapp cannot leak data, because the declassifier makes the final decision on who can see what. So why is nobody using this, if it&#8217;s so good?</li>
<li>Too hard to write the declassifier, requires smart people to develop all aspects of the system. Label systems are complex. Programming with DIFC can lead to unexpected behaviour. And it prevents the reuse of existing code, such as commodity operating systems.</li>
<li>Unexpected behaviour: unreliable communication between two processes of different security types; mysterious failures when a secret process elevates the security of another process, and therefore prevents it from continuing with a task such as writing to an unsecure file (so that it can&#8217;t leak data from the secret process).</li>
<li>Solution: Flume to solve DIFC problems (user-level DIFC on Linux with a simple label system, and glue between Unix API and labels). Then develop an application and evaluate it. Goal: want to be able to install this on existing machines using apt-get.</li>
<li>Uses system call delegation which forwards system calls to a user-level Flume reference monitor.</li>
<li>Three process classes: confined (all syscalls go through refmon), Flume-oblivious (with no access to secret data), and unconfined/mediators (a mix of the two).</li>
<li>Simple label system: each process gets a secrecy label which summarises the categories of data that a process is assumed to have seen. (e.g. Financial Documents, HR Documents.) Single category is a tag; set of tags is a label. Any process can add any tag to its label, but it cannot remove it without special privileges. A process can create a new tag, and is given the ability to declassify it (and hence remove it from its secrecy label).</li>
<li>Communication rule: process p can send to q if p&#8217;s label is a subset of q&#8217;s label.</li>
<li>Endpoints: intermediaries between processes for communication (per file-descriptor?), which can be labelled independently from the processes, and which declassify data. This is only possible if the owning process is able to declassify a tag in the secrecy label of the endpoint (better specified in set notation). A process can have many endpoints. This addresses some of the mysterious failures by eagerly revealing errors (which would violate the endpoint invariant).</li>
<li>Example application is the Python-based MoinMoin wiki (100KLOC). 43 instances of the same check to see if the current user is able to access a bit of data, which have led to bugs, and plugins further complicate this. Instead add a 1KLOC declassifier. TCB is server + declassifier. Untrusted code is the Wiki plus any plugins.</li>
<li>Apache is Flume-oblivious. Declassifier is a mediator. MoinMoin is confined, i.e. must syscall through the refmon.</li>
<li>Results: Flume allows the adoption of existing Unix software: just 1% of MoinMoin had to be changed (no change in Python interpreter or Apache). By using Flume, two ACL bypass bugs were solved automatically. Performance is within a factor of 2.</li>
<li>Limitations: bigger TCB than HiStar or Asbestos (Linux stack + 22KLOC refmon). Disk quotas are a covert channel.</li>
<li>Q: On the example showing how endpoints are made, and secrecy labels may be changed, what if a malicious plugin attempts to perform declassification? This isn&#8217;t a problem, because, if the secrecy were critical to other applications, it wouldn&#8217;t be able to declassify the tag (because the tag would have been created by another application). Untrusted software would not have the elevated privilege to declassify tags.</li>
<li>Q: Can the described policies be implemented inside SELinux? For a single application, probably yes, but the policies would become complex if it were possible to add applications to the change. Flume simplifies system administration and transfers these decisions to the content provider.</li>
</ul>
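<p>The label rules are simple enough to write down directly. What follows is a minimal sketch of secrecy labels as sets of tags, with the subset rule for communication. It leaves out endpoints and the reference monitor, and the <code>Process</code> class and tag names are hypothetical rather than anything from Flume&#8217;s API.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    label: set = field(default_factory=set)            # secrecy label: a set of tags
    can_declassify: set = field(default_factory=set)   # tags this process may drop

    def create_tag(self, tag):
        """A tag's creator is given the ability to declassify it."""
        self.can_declassify.add(tag)
        return tag

    def add_tag(self, tag):
        """Any process may add any tag to its own label."""
        self.label.add(tag)

    def remove_tag(self, tag):
        """Removing a tag is declassification, so it needs the privilege."""
        if tag not in self.can_declassify:
            raise PermissionError(f"{self.name} cannot declassify {tag!r}")
        self.label.discard(tag)


def can_send(p, q):
    """Communication rule: p can send to q if p's label is a subset of q's."""
    return p.label.issubset(q.label)


declassifier = Process("declassifier")
wiki = Process("wiki")
network = Process("network")    # empty label, standing in for the outside world

hr = declassifier.create_tag("hr_documents")
wiki.add_tag(hr)                # the wiki has now seen HR data

print(can_send(network, wiki))  # requests can still flow in
print(can_send(wiki, network))  # but the tainted wiki cannot reply directly
```

<p>The only way out for the HR data is via the declassifier: it can add the tag, receive from the wiki, then remove the tag again before talking to the network, whereas the wiki calling <code>remove_tag</code> raises <code>PermissionError</code>.</p>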
<h3>SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes</h3>
<ul>
<li>Kernel rootkits: malware inserted in OS kernels to avoid detection by user-level scanners. Becoming increasingly common in the wild. DMA-based attacks are becoming common. Current (user-space) security tools are insufficient because they assume kernel integrity, and cannot find all attacks (as they are detection-based).</li>
<li>Aim: a hypervisor that prevents injected code from executing at kernel privilege. Permit only user-approved code to execute at kernel privilege, based on a user-specified approval policy. Goals: security and ease of porting (not performance).</li>
<li>~1100LOC hypervisor, that operates at a ring below the kernel, but doesn&#8217;t attempt to do anything with hardware. Enforces approved code execution in kernel mode, which holds over system lifetime.</li>
<li>Assumes: attacker can perform all attacks except HW attacks against CPU and memory. Can modify memory contents, perform DMA writes and modify system firmware. Can also have knowledge of 0-day kernel exploits. Single CPU with SVM or VT. Executed in 32-bit mode without the use of self-modifying code. No vulnerabilities in SecVisor (going to formally verify that).</li>
<li>Require: constrained instruction pointer, which should be within approved code regions whenever CPU is in kernel mode. Approved code regions must be immutable, and so cannot be modified by an attacker.</li>
<li>Constrained IP: kernel mode entry should set IP to being within approved regions. IP must remain within approved regions as long as we stay in kernel mode. Each kernel mode exit should set the CPU privilege level to user mode.</li>
<li>Kernel Entry: exception/interrupt happens, looks up handler in the interrupt vector, and jumps to that IP in kernel mode. So all entries in the vector must point to approved code.</li>
<li>During execution: place write XOR execute protection over kernel memory. But the problem is that the kernel and application share the same address space, so what if the attacker attempts to jump to user-space code, maybe by performing a buffer overrun. Solution: mark all application memory as non-executable (NX) on kernel entry.</li>
<li>Intercepting user-to-kernel switch: all CPU entry pointers point to approved code; mark approved code regions as NX during user mode execution; all user-to-kernel switches raise exceptions.</li>
<li>Kernel exit: intercept all kernel-to-user switches by using the exception that is raised by trying to execute NX code in user space after the switch back to user space.</li>
<li>Immutable approved code: memory may be written by software executing on CPU or by DMA writes by peripherals. Approved code must be marked read-only. IOMMU protects from DMA writes.</li>
<li>Implementation of memory protection: set protection independent of OS, using shadow (or nested) page tables. DMA exclusion vector to protect approved code.</li>
<li>[I don&#8217;t think I need to include the explanation of shadow page tables.]</li>
<li>Shadow PT is used to set memory protections. Approved code regions are set read-only in the SPT, and DEV prevents DMA writes. We also need to prevent aliasing of pages that are in the approved regions to non-approved VAs.</li>
<li>Check entry pointers to ensure they contain VAs of approved code. GDT, LDT and IDT must be protected by shadowing these (to protect them from DMA writes).</li>
<li>Overhead from intercepting all switches, and the shadow PT synchronisation and kernel PT checks. I/O-intensive workloads will perform poorly.</li>
<li>Specint benchmarks show that SecVisor is about 2 to 5% slower than Xen (except for the gcc test, where it is much slower). Some optimisation has been done in the latest version.</li>
<li>Q: Many past kernel exploits change the uid of user processes to overwrite kernel memory; does SecVisor address this? It doesn&#8217;t protect the integrity of kernel data structures. This is future work.</li>
<li>Q: Performance effect of nested page tables? Shadow overhead goes away, but you still have to check and protect the kernel page tables.</li>
<li>Q: Knowing everything you know about the system, how would you attack it as an intruder? Try to attack the TCB, but it is very small, so we don&#8217;t expect that this will be a problem.</li>
<li>Q: Deal with loadable modules? Paper discusses this: you could have a hash-based approval policy, and SecVisor performs code relocation on behalf of the kernel.</li>
<li>Q: How to protect the stable storage off which SecVisor is loaded? Use skinit, along with trusted computing techniques.</li>
</ul>
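<p>As a way of keeping the permission flips straight in my head, here is a toy model of the page protections on each mode switch. It is a long way from the real thing (no page tables, no DEV, no hardware), and the <code>Page</code> class, page names and entry-pointer table are invented for the illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class Page:
    approved_kernel_code: bool = False
    writable: bool = True
    executable: bool = True


def enter_kernel_mode(pages):
    """Only approved code is executable; approved code is never writable."""
    for page in pages.values():
        page.executable = page.approved_kernel_code
        page.writable = not page.approved_kernel_code


def enter_user_mode(pages):
    """Approved kernel code becomes NX, so a switch to the kernel faults."""
    for page in pages.values():
        page.executable = not page.approved_kernel_code
        page.writable = not page.approved_kernel_code


def kernel_invariants_hold(pages, entry_pointers):
    """Check write XOR execute, and that entry pointers hit approved code."""
    for page in pages.values():
        if page.writable and page.executable:
            return False
    return all(pages[target].approved_kernel_code
               for target in entry_pointers.values())


pages = {
    "kernel_text": Page(approved_kernel_code=True),
    "kernel_data": Page(),
    "user_app": Page(),
}
entry_pointers = {"syscall": "kernel_text", "page_fault": "kernel_text"}

enter_kernel_mode(pages)
ok_before = kernel_invariants_hold(pages, entry_pointers)

# An attacker redirecting an entry pointer to user memory breaks the invariant.
entry_pointers["syscall"] = "user_app"
ok_after = kernel_invariants_hold(pages, entry_pointers)
print(ok_before, ok_after)
```

<p>In this model the approved code is read-only in both modes, which is the &#8220;immutable approved code&#8221; requirement; only the executable bit moves between the kernel and user regions.</p>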
<h3>Secure Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems</h3>
<ul>
<li>Safe execution environment: such as that provided by Java or C#. Array indexing within bounds, no use of uninitialised variables, type safe operations, no dangling pointers, control flow follows semantics, etc.</li>
<li>Commodity OSs do not use these kinds of languages.</li>
<li>Using a secure execution environment would provide security, novel design opportunities and the ability to develop new solutions to higher-level security challenges (such as information flow policies, for example).</li>
<li>Secure virtual architecture: interpose a compiler-based VM between a commodity OS and hardware. The VM provides a virtual instruction set architecture.</li>
<li>Contributions: a compiler-based VM that can host a commodity OS. First to provide security guarantees for a complete commodity OS.</li>
<li>Safety-checking compiler: similar to a C compiler which translates C into the virtual ISA (application level). Generates bytecode rather than native code. Bytecode has a type system which enables alias analysis in the VM.</li>
<li>VM contains a safety verifier (which ensures that the compiler has done the right thing; may insert runtime checks), native code generator, and OS and memory safety runtime libraries. The compiler is outside the TCB; only the verifier and code generator are inside.</li>
<li>Virtual ISA based on LLVM. Added OS-neutral operations (SVA-OS) that remove difficult-to-analyse assembly code and encapsulate privileged operations. Porting to SVA-OS is like porting to a new arch.</li>
<li>Goals: maximise safety guarantees, minimise reliance on OS correctness, minimise changes to the kernel, and retain original kernel memory allocators.</li>
<li>Guarantees: just like Java or C#, but type safety only for a subset of objects (due to the use of C code), and dangling pointers are harmless (rather than never used). These limitations do not compromise the other guarantees.</li>
<li>Checks: load/store, bounds, illegal free and indirect call. Transforms: stack to heap promotion to deal with dangling pointers, and initialising memory. Need to track object bounds: use object lookups. Naïvely, the security verifier would add code to register all allocations in the VM metadata. All uses of these objects would have to follow a lookup to see if the access is within bounds. Can improve this by alias analysis, which groups objects into logical partitions, reducing overhead from between 400 and 1100% to between 10 and 30%.</li>
<li>Implementation: ported Linux to the SVA ISA, compiled using LLVM. SVA-OS is a runtime library linked into the kernel. Guarantees safety of everything apart from the kernel memory management code and [blink and you&#8217;ll miss it, see the paper]. Around 5KLOC had to be changed in Linux.</li>
<li>Memory safety overhead is less than 59% (for a web server benchmark). Latency is bad with many small transfers. 80% of exploits caught (the non-caught one was in a library that isn&#8217;t currently ported to the VM).</li>
<li>Several performance improvements are possible, by changing the code, using smarter checks and smarter static analysis.</li>
<li>Q: To what extent does the analysis equate to type safety? Do not guarantee that pointers point to the correct object, but that it is a valid object.</li>
<li>Q: Must the number of memory pools be statically allocated? No. What if memory pools are allocated at run-time? Even then, their use is statically identifiable.</li>
<li>Q: Is the overhead really small enough to make this approach suitable for use today? Improving the overheads is future work.</li>
</ul>
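<p>The object-lookup bounds check described above can be sketched roughly as follows. This is only an illustration in Python, not the SVA implementation: the <code>Pool</code> class and its method names are hypothetical, and each pool stands in for one of the logical partitions produced by alias analysis, so a lookup only searches the objects in its own partition.</p>

```python
import bisect


class Pool:
    """One logical partition of objects (hypothetical name, for illustration)."""

    def __init__(self):
        self.starts = []  # sorted start addresses of registered objects
        self.sizes = {}   # start address mapped to object size in bytes

    def register(self, start, size):
        """Record an allocation in the pool metadata."""
        bisect.insort(self.starts, start)
        self.sizes[start] = size

    def check(self, addr):
        """Return True if addr falls inside some registered object."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i == -1:
            return False  # addr is below every registered object
        start = self.starts[i]
        return addr - start in range(self.sizes[start])
```

<p>Splitting the metadata into many small pools keeps each lookup cheap, which is one way to read the overhead reduction quoted above.</p>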
<p>And, with that, I&#8217;m off to Portland for an afternoon of shopping in a city where £1 buys $2.03, and there&#8217;s no sales tax. See you when I get back!
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/10/17/sosp-2007-day-3/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>SOSP 2007: Day 2</title>
		<link>http://www.mrry.co.uk/blog/2007/10/16/sosp-2007-day-2/</link>
		<comments>http://www.mrry.co.uk/blog/2007/10/16/sosp-2007-day-2/#comments</comments>
		<pubDate>Tue, 16 Oct 2007 16:24:00 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Uni</category>
	<category>Technology</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/10/16/sosp-2007-day-2/</guid>
		<description><![CDATA[I&#8217;ll skip doing a review of the banquet last night, so it&#8217;s on with the papers!

Software Robustness
Bouncer: Securing Software by Blocking Bad Input

Filters block bad input before it is processed. Low overhead and no false positives. Programs can keep running even when under attack.
Performs symbolic execution of programs to perform analysis and generate input filters. [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll skip doing a review of the banquet last night, so it&#8217;s on with the papers!</p>
<p><a id="more-27"></a></p>
<h2>Software Robustness</h2>
<h3>Bouncer: Securing Software by Blocking Bad Input</h3>
<ul>
<li>Filters block bad input before it is processed. Low overhead and no false positives. Programs can keep running even when under attack.</li>
<li>Performs symbolic execution of programs to perform analysis and generate input filters. Works on machine code.</li>
<li>Pre-condition slicing: compute a subsequence of trace instructions, executing which is sufficient to create a vulnerability. Find instruction with vulnerability (e.g. call to sprintf), then move backwards through the trace, adding the instructions on which the vulnerable instruction depends.</li>
<li>Deployment in a distributed scenario. When an exploit is found, run Bouncer, calculate an improved filter, and deploy this. Or run centrally on a cluster to find exploits in parallel.</li>
<li>Nirvana generates traces. Phoenix implements slicing.</li>
<li>Evaluated against four real vulnerabilities (including the Slammer worm). Ran experiment for 24 hours.</li>
<li>Filters have no false positives. Found perfect filters for two vulnerabilities. Some false negatives for two others.</li>
<li>Throughput is much closer to 100% when using filters, compared to having to restart on each attack. Can still maintain 80% throughput with 18000 attack probes per second.</li>
</ul>
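<p>As a rough sketch of the backwards walk used in pre-condition slicing: the Python below keeps only the trace instructions on which the vulnerable instruction depends through data flow. The trace format (dictionaries with <code>defs</code> and <code>uses</code> sets) is an assumption made for illustration, and this simplified version ignores control dependencies, which a real slicer would also have to consider.</p>

```python
def backward_slice(trace, target_index):
    """Return indices of instructions the target depends on (data flow only)."""
    needed = set(trace[target_index]["uses"])  # values we still need to explain
    kept = [target_index]
    for i in range(target_index - 1, -1, -1):
        ins = trace[i]
        if needed.intersection(ins["defs"]):
            kept.append(i)
            # The definitions are now explained; their inputs become needed.
            needed.difference_update(ins["defs"])
            needed.update(ins["uses"])
    return sorted(kept)


# Hypothetical four-instruction trace ending in a vulnerable call.
trace = [
    {"defs": {"len"}, "uses": {"input"}},   # len = length of input
    {"defs": {"x"}, "uses": set()},         # x = 5 (irrelevant)
    {"defs": {"buf"}, "uses": set()},       # buf = fixed-size buffer
    {"defs": set(), "uses": {"buf", "len"}},  # vulnerable copy into buf
]
slice_indices = backward_slice(trace, 3)
```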
<h3>Triage: Diagnosing Production Run Failures at the User&#8217;s Site</h3>
<ul>
<li>What to do with a production run failure (send error report)? At present, we just send a core dump, which takes a lot of manual effort to diagnose. At the same time, these failures cause damage to end users.</li>
<li>Reproduction is hard, because user environment is unique. Furthermore, there are privacy concerns.</li>
<li>Core dumps tell you what the failure is. Bug detection tells you about some errors. Diagnosis is a means of tracing back to the underlying fault, but existing tools are offline (e.g. requiring programmer input).</li>
<li>To diagnose: (i) need information about the failure (fault, error, propagation tree), but off-site techniques don&#8217;t work on-site; (ii) need guidance about what to do next, the kind of analysis to perform, the interesting variables to investigate (but there is no programmer to do this, so decisions must be taken automatically); (iii) need to be able to test &#8220;what-ifs&#8221;, such as changing input and seeing what happens, but most techniques focus on replaying the same thing.</li>
<li>Triage enables on-site diagnosis, using systems techniques to make offline analysis tools useful on-site, addressing the three challenges. It introduces a new technique called delta analysis. Tested in a human study with real programmers and real bugs.</li>
<li>Uses checkpointing and re-execution to capture the bug. Allows replaying the failure repeatedly, using different analysis tools. Little overhead in the normal-run case. Checkpoints every 200ms, keeping 20 previous checkpoints, in memory to lower overhead.</li>
<li>Uses a human-like &#8220;diagnosis protocol&#8221;. Repeated replay enables an incremental diagnosis. e.g. If the bug doesn&#8217;t always repeat, it may be a race condition.</li>
<li>But the replay isn&#8217;t identical: it can either be deterministic (plain), loose (tolerating some variance, such as the introduction of analysis tools which change the syscalls) or wild (including potentially large variations, maybe changing input or the code itself). Delta analysis is similar to diffing a pair of (successful and failing) runs. Triage doesn&#8217;t need to be safe when it replays execution: it only cares about why the bug happens.</li>
<li>Failure analysis uses a variety of methods (bounds checking, taint analysis, symbolic execution, etc.) and delta generation (rearrange allocations, drop inputs, mutate inputs, drop code(!)).</li>
<li>Delta analysis: compute a basic block vector for each run (1 if a block was executed, 0 if not), and subtract them. The two closest runs are diffed (finding the edit distance and shortest edit script). For example, the differing basic block might return NULL, which is then dereferenced.</li>
<li>Human study: 15 programmers from faculty, researchers and grad students. Measured times to repair bugs with and without Triage. (People got core dumps, sample inputs, instructions for how to replicate, and access to debugging tools.) The evaluation didn&#8217;t expect the participants to work out how to replicate the bug, and set a maximum time.</li>
<li>47% time saving on diagnosing real bugs, significant with probability of null hypothesis &lt; 0.01%.</li>
</ul>
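<p>The basic block vector step of delta analysis can be illustrated with a small Python sketch. The block names are made up, and measuring closeness by the number of differing blocks is an assumption for illustration; the talk described diffing the two closest runs using edit distance.</p>

```python
def block_vector(executed, all_blocks):
    """1 if the block was executed in this run, 0 if not."""
    return [1 if block in executed else 0 for block in all_blocks]


def delta(failing, passing):
    """Element-wise difference between two block vectors."""
    return [f - p for f, p in zip(failing, passing)]


def closest_run(failing, passing_runs):
    """Pick the passing run that differs from the failing run in fewest blocks."""
    return min(passing_runs, key=lambda p: sum(abs(d) for d in delta(failing, p)))


def differing_blocks(failing, passing, all_blocks):
    """Names of the blocks executed in only one of the two runs."""
    return [b for b, d in zip(all_blocks, delta(failing, passing)) if d != 0]


# Hypothetical blocks: the failing run takes the allocation-failure path.
blocks = ["entry", "alloc_ok", "alloc_fail", "deref"]
failing = block_vector({"entry", "alloc_fail", "deref"}, blocks)
passing = block_vector({"entry", "alloc_ok", "deref"}, blocks)
```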
<h3>/* iComment: Bugs or Bad Comments? */</h3>
<ul>
<li>Many bugs are due to mismatches between the code and the programmer&#8217;s assumptions. Such as assuming that a lock is held when a piece of code is executed. Comments express these assumptions. 20% of Linux is comments (excluding blank lines and copyrights), but they are not utilised by compilers or bug detection tools.</li>
<li>However, comments are imprecise, compared to code, and cannot be tested. Therefore they tend to become less reliable as software evolves. Nevertheless, they are easier to understand for programmers. So wrong comments are likely to mislead programmers. Difficult to infer assumptions from source code (but see the MUVI paper from yesterday).</li>
<li>Mismatches indicate either bugs or bad comments. Challenge is to understand the natural language of comments. NLP techniques are not enough: POS tagging (97% accurate), chunking (90%) and semantic role labelling (70%). But they assume correct grammar, which comments don&#8217;t always have.</li>
<li>Therefore, also use machine learning, statistics and program analysis (with NLP). 1832 rules extracted, detecting 60 new bugs or bad comments (19 of these confirmed). Looking at call and locking bugs only.</li>
<li>Focus on the comments that contain rules (rather than explaining code), as these are more likely to be inconsistent with the code.</li>
<li>Example rule: &lt;lock&gt; must be held before entering &lt;function&gt;.</li>
<li>Comment classifier is trained using a decision tree building algorithm. Some manual training is required (at least once per topic).</li>
<li>Validated using Linux, Mozilla, Wine and Apache, looking at lock- and call-related topics. Overall, 60 mismatches, 33 (12 confirmed) bugs, 27 (7) bad comments and 38 false positives, from 1832 rules. Training accuracy is between 90 and 100% (for lock-related comments). Cross-software accuracy is between 78 and 90% (saves the amount of training).</li>
</ul>
<h2>Distributed Systems</h2>
<h3>Sinfonia: A New Paradigm for Building Scalable Distributed Systems</h3>
<ul>
<li>The protocols in current distributed algorithms are complicated, error-prone, and often difficult to understand. We want to avoid such protocols. Focus is on data centre systems, with small and predictable latencies, and occasional crashes. Focus also on infrastructure applications, like cluster file systems, lock managers and communication services.</li>
<li>Sinfonia is a data sharing service, comprising a set of memory nodes which each export a linear address space (no structure imposed). Protocol design becomes the easier problem of shared data structure design. Minitransactions are used to access the memory nodes.</li>
<li>Minitransactions have ACID properties, and balance power and efficiency. Reduce network round-trips, but are still flexible and easy to use. Semantics: perform a bunch of compare items (for equality over an address range), if all of these match then retrieve the read items and perform the write items. Execution of the transaction is piggybacked on the two-phase commit process, to minimise round-trips.</li>
<li>Commit coordinator runs at the application, which may crash, so a new two-phase commit protocol was necessary. Based on all memory nodes logging a &#8220;yes&#8221; vote, rather than the coordinator logging a &#8220;commit&#8221; decision.</li>
<li>Applications: sinfoniaFS (a cluster file system) and sinfoniaGCS (a group communication service - i.e. a chatroom with ordered notifications).</li>
<li>sinfoniaFS performs as well as LinuxNFS in the Andrew benchmark. sinfoniaGCS scales better than Spread.</li>
</ul>
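<p>The minitransaction semantics noted above (compare items, then read items, then write items) can be sketched for a single memory node. This is an assumption-laden illustration in Python: the class and function names are invented here, and the two-phase commit across several memory nodes is left out entirely.</p>

```python
class MemoryNode:
    """A memory node exporting a flat, unstructured address space."""

    def __init__(self, size):
        self.mem = bytearray(size)


def minitransaction(node, compares, reads, writes):
    """Run one minitransaction against a single node.

    compares: list of (address, expected bytes)
    reads:    list of (address, length)
    writes:   list of (address, data bytes)
    Returns (committed, read results).
    """
    for addr, expected in compares:
        if bytes(node.mem[addr:addr + len(expected)]) != expected:
            return False, []  # a compare item failed: abort, change nothing
    results = [bytes(node.mem[addr:addr + n]) for addr, n in reads]
    for addr, data in writes:
        node.mem[addr:addr + len(data)] = data
    return True, results
```

<p>A compare-and-swap, for instance, is a minitransaction with one compare item and one write item at the same address.</p>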
<h3>PeerReview: Practical Accountability for Distributed Systems</h3>
<ul>
<li>Fault in a distributed system with distributed state can lead to incomplete information. In the general case, there could be multiple admins with different interests (and possible malicious intent).</li>
<li>Many faults are not fail-stop but instead change the behaviour of a node. Dealing with general faults is difficult, because of a lack of control over the system. How can faults be detected or the faulty nodes identified? Then how do you convince others that a node is faulty (or not faulty) and must be fixed?</li>
<li>Real-world systems rely on accountability. e.g. signed receipts, double-entry bookkeeping and auditing. These can be used to detect, identify and convince about faults.</li>
<li>A fault is when a node deviates from expected behaviour. We want a system that generates a proof of misbehaviour against a faulty node. Difficult if we consider a fault that affects only a node&#8217;s internal state (requires an online trusted probe at each node). So we focus on observable faults: those that causally affect a correct node, and don&#8217;t require a trusted component.</li>
<li>Verifiable evidence: a proof of misbehaviour or a challenge that a faulty node cannot answer.</li>
<li>Accountability: whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node.</li>
<li>Implementation: as a library, PeerReview. Assumes that nodes are modelled as a collection of deterministic state machines, with each node holding a reference implementation of all state machines. Also that correct nodes can eventually communicate and nodes can sign messages.</li>
<li>All nodes log all inputs and outputs. Each log is audited periodically by witnesses (a set of nodes for each node). If misbehaviour is detected, evidence is generated and distributed to other nodes.</li>
<li>Log entries form a hash chain to prevent forgery of a log. Keeping multiple logs or forking logs is prevented by checking the signed hashes to ensure that a single linear log is kept. Faults in a log are recognised by replaying the state machine with the known inputs and checking outputs against the log.</li>
<li>PeerReview guarantees that faults will be detected and good nodes cannot be accused.</li>
<li>Applicability: NFS server in the Linux kernel, overlay multicast, and P2P email system. Deals with small latency-sensitive requests, large transfers, and complex, large and decentralised systems.</li>
<li>Dominant cost depends on the number of witnesses per node.</li>
<li>Normal P2P email scheme scales as O(log n) with n being the number of nodes. PeerReview can scale O((log n)^2) if you accept a 99.999% probability of detecting faults. (100% accuracy is prohibitive (linear, I think)).</li>
</ul>
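<p>The tamper-evident log can be illustrated with a minimal hash chain in Python. The choice of SHA-256 and the all-zero starting value are assumptions made for this sketch, and the signatures and witness audits that PeerReview layers on top are omitted.</p>

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value for the chain


def append(log, entry):
    """Append an entry whose hash covers the previous hash and the entry."""
    prev = log[-1][1] if log else GENESIS
    digest = hashlib.sha256(prev + entry).digest()
    log.append((entry, digest))


def verify(log):
    """Recompute the chain and check every stored hash."""
    prev = GENESIS
    for entry, digest in log:
        if hashlib.sha256(prev + entry).digest() != digest:
            return False
        prev = digest
    return True
```

<p>Because each hash covers everything before it, rewriting an old entry invalidates every later hash, so a node that has already handed out a signed hash cannot quietly change its history.</p>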
<h3>Dynamo: Amazon&#8217;s Highly Available Key-Value Store</h3>
<ul>
<li>Amazon architecture: loosely coupled SOA, with stateful services that manage their own state. Requirements on latency, availability and scale.</li>
<li>State management is the primary factor affecting scalability and availability. Data is only ever accessed by primary key, in self-describing blobs, so key-value is better than RDBMS (don&#8217;t need a query optimiser, triggers, etc.). RDBMS scale up, not out, and (strong, transactional) consistency is less important than availability.</li>
<li>Dynamo requirements: always writable (accept writes during failures), e.g. to add an item to a shopping cart must always be possible; user-perceived consistency (so can&#8217;t be too weak on consistency); performance with latency in the 99.9th percentile; low cost; incremental scalability; and tunable knobs for cost, consistency, durability and latency. No production-ready systems met these requirements.</li>
<li>Replicated DHT with consistency management, using: consistent hashing, optimistic replication, sloppy quorum; anti-entropy mechanisms; and object versioning.</li>
<li>DHT is similar to Chord, using MD5 as the hash function. Clique for routing between replicas. Each storage node given many locations on the ring for load-balancing. Trade-off between balancing load and keeping membership tables concise.</li>
<li>Configurable for different scenarios e.g. consistent, durable, interactive user state; a high-performance read engine; or a distributed web cache.</li>
<li>Able to meet a 500ms SLA on latency, for the implementation of a shopping cart.</li>
<li>Drawbacks: no indefinite scaling, not appropriate if transactional semantics required, more challenging to program than using ACID properties.</li>
</ul>
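<p>Consistent hashing with several ring positions per storage node can be sketched as below. This is an illustrative Python sketch, not Dynamo code: the <code>Ring</code> class, the way virtual positions are named and the default counts are all assumptions; only the use of MD5 and of many ring locations per node comes from the notes above.</p>

```python
import bisect
import hashlib


def ring_position(key):
    """Map a string onto the ring using MD5."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each node is given several positions on the ring for load balancing.
        self.points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [position for position, _ in self.points]

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key, collecting distinct nodes."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        owners = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replicas:
                break
        return owners
```

<p>Removing a node that is not among a key&#8217;s owners leaves that key&#8217;s preference list unchanged, which is what makes incremental scaling cheap.</p>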
<h2>System Maintenance</h2>
<h3>Staged Deployment in Mirage, an Integrated Software Upgrade Testing and Distribution System</h3>
<ul>
<li>Typically test on a couple of vendor machines, then a bunch of beta tests in the outside world, but despite this, the software will still fail on user machines in the outside world (due to dependencies, odd configurations, etc.). Mirage enables vendors to cluster outside world machines in terms of their configurations/environment, so as to ensure coverage of beta testing before public release (and hence improve reliability). This reduces &#8220;upgrade overhead&#8221;: the number of machines that fail.</li>
<li>Environment must be identified and fingerprinted, before clustering. Then the order and speed of deployment must be chosen by the vendor.</li>
<li>In a cluster, all machines &#8220;behave identically wrt an upgrade&#8221;. Benefit depends on quality of clustering and the quality of testing.</li>
<li>Identification: instrument the application to determine the resources (libraries, config files, environment variables and dependent applications) that it uses during a normal run. Filter out noise (log files, temp files and data) based on heuristics and vendor-provided rules. A library becomes a (name, version, hash of binary) tuple. The hash is used in case there are different builds, for example.</li>
<li>Tradeoff between low upgrade overhead (few failures) and fast deployment. Choice between deploying in parallel to many clusters (fast, but high overhead), or sequentially (slow, but low overhead). Fixing a problem for one environment might also fix it for another environment (benefit of serialisation).</li>
<li>Evaluated the quality of clustering (identification and clustering), and staged deployment (the upgrade overhead and deployment speed).</li>
<li>Accurate identification: between 200 and 850 environment resources (on firefox, apache, php and mysql), between 0 and 7 vendor rules, yielded 0 errors in any case.</li>
<li>Accurate clustering: 21 different environments, with 2 distros of linux, PHP and Apache and various MySQL configurations. Yielded 0 misplaced machines within 15 clusters. Varying the fingerprinting granularity can vary the number of clusters.</li>
<li>Controlling the tradeoff: simulation with 100,000 machines, 3 problems and 2 staging protocols. Upgrade overhead reduced very significantly, with, in the worst case, a 25% increase in deployment time.</li>
<li>Studying &#8220;real machines&#8221; and deployments is the subject of future work.</li>
</ul>
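<p>Fingerprinting and clustering can be sketched in a few lines of Python. The sketch assumes the simplest possible policy, grouping machines whose fingerprints match exactly; the real system lets the fingerprinting granularity vary. SHA-1 as the binary hash and all names below are assumptions for illustration.</p>

```python
import hashlib
from collections import defaultdict


def fingerprint(resources):
    """resources: iterable of (name, version, binary content)."""
    items = sorted(
        (name, version, hashlib.sha1(content).hexdigest())
        for name, version, content in resources
    )
    return tuple(items)


def cluster(machines):
    """Group machine names by identical environment fingerprint."""
    clusters = defaultdict(list)
    for machine, resources in machines.items():
        clusters[fingerprint(resources)].append(machine)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical machines: m3 has the same library version but another build.
machines = {
    "m1": [("libssl", "0.9.8", b"build-A"), ("php.ini", "1", b"memory=64M")],
    "m2": [("php.ini", "1", b"memory=64M"), ("libssl", "0.9.8", b"build-A")],
    "m3": [("libssl", "0.9.8", b"build-B"), ("php.ini", "1", b"memory=64M")],
}
groups = cluster(machines)
```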
<h3>AutoBash: Improving Configuration Management with Operating System Causality Analysis</h3>
<ul>
<li>Current approach to configuration management is to ask friends, search online, read manual, try potential solutions. Frustrating, time consuming and tedious! Hard to undo a wrong solution or know if the problem was solved. A solution can cause new problems.</li>
<li>AutoBash: tries many solutions at once, provides an undo capability, explains to the user how it solves a problem and automatically runs regression tests.</li>
<li>When a problem is detected, there are two modes: Replay mode which automatically searches for a solution; and Observation mode which helps the user to fix the problem. All the time, there is a Health monitoring mode.</li>
<li>Observation mode: a modified bash in which the user tries to fix a problem by typing a command, testing whether the app works, then possibly rolling back the command before trying again. Has &#8220;predicates&#8221;, each of which tests whether an application functions correctly and returns true iff the test passes. Idea is for applications to be shipped with a test suite of predicates. Speculator (SOSP&#8217;05) is used to perform process-level speculative execution, and make predicate testing safe (undoable). Shell provides a rollback command, used in conjunction with the speculative execution, to undo the attempted command.</li>
<li>Regression testing is handled by running all predicates in some database. This can be slow, however, so causality tracking is used to find which predicates might be affected by a change.</li>
<li>Tracking causality: Output set of kernel objects that an action causally affects, and Input set of kernel objects on which a predicate causally depends. If the output set of an action and input set of a predicate intersect, then the predicate must be checked. The output set can be tracked by syscall tracing, and then trimmed by excluding temporary objects (such as processes, or temporary files). Similarly, this can be done to determine the input set of a predicate.</li>
<li>Generating a causal explanation: the important actions are the ones whose output sets intersect with the input sets of the newly-passing predicates.</li>
<li>Replay mode: assume that vendors ship &#8220;common solutions&#8221; with their software, so that these may be automatically tested (safely, using speculative execution) in the background, while the user gets on with work.</li>
<li>Evaluation: overhead of speculative execution and effectiveness of causality analysis? Looked at CVS, GCC cross compiler and webserver. Created 10 bugs and 10 solutions.</li>
<li>Very little overhead (non-significant) of speculative execution. Causality analysis almost halves the total replay time, with a slight increase in the predicate testing time.</li>
</ul>
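<p>The causality check that decides which predicates to re-run reduces to a set intersection. The Python below is a minimal sketch; the object names and the dictionary layout are invented for illustration, and gathering the sets by syscall tracing is not shown.</p>

```python
def predicates_to_rerun(action_outputs, predicate_inputs):
    """Return the predicates whose input set intersects the action's output set."""
    return sorted(
        name
        for name, inputs in predicate_inputs.items()
        if not action_outputs.isdisjoint(inputs)
    )


# Hypothetical kernel objects touched by one command and read by predicates.
action_outputs = {"/etc/apache2/httpd.conf"}
predicate_inputs = {
    "web_server_serves_index": {"/etc/apache2/httpd.conf", "/var/www/index.html"},
    "cvs_checkout_works": {"/home/user/.cvsrc", "/usr/bin/cvs"},
}
affected = predicates_to_rerun(action_outputs, predicate_inputs)
```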
<h2>Energy</h2>
<h3>Integrating Concurrency Control and Energy Management in Device Drivers</h3>
<ul>
<li>Concentrating on wireless sensor networks. Concurrency of I/O operations is considered (synchronous versus asynchronous). Energy management is considered in terms of the necessary power state of devices to perform I/O.</li>
<li>The more workload information the app can provide the scheduler, the less energy needs to be used.</li>
<li>Hard to tell OS about application workloads: API extensions for hints to energy management. Done for CPU voltage scaling and disk spin-down. Sensor networks need a unique solution, however: harsh requirements, small power source, and need to run unattended for periods ranging from months to years. 1st generation OSs push all decisions to the applications.</li>
<li>ICEM: a driver architecture that automatically manages energy, implemented in TinyOS 2. New concept of Power Locks: split-phase (asynchronous) locks with integrated energy and configuration management. Dedicated, shared and virtualised drivers and a component library for building them. Energy efficiency is 98.4% of the hand-tuned implementation. Also much easier to program.</li>
<li>Case study on the TMote platform: three interfaces to six total devices. Logging application with producer, which writes sensor samples out to flash every 5 minutes, and a consumer that sends all samples from flash over the radio every 12 hours. The hand-tuned implementation is very complex: it needs to order the power-on and power-off of each different device. ICEM hides these calls, greatly simplifying the code.</li>
<li>Split-phase I/O operations: single thread of control, where read() call is given a callback, readDone(), that allows the driver to poke the application when it is done. These can then be scheduled efficiently, because the driver has information about what has to be done before it considers switching the device off.</li>
<li>Virtualised driver: only a functional interface. Assume multiple concurrent users, buffering requests for concurrency management, and managing energy by looking at pending requests. Such drivers have longer latencies, because the underlying device has to have synchronous access. Suitable for high-level hardware (like a block device?)</li>
<li>Dedicated driver: functional and power control interfaces. A single user only with no concurrency control and explicit energy management. Suitable for low-level hardware.</li>
<li>Shared driver: functional and lock interfaces. Multiple users, explicit concurrency control (using the lock), and implicit energy management based on pending requests.</li>
<li>Power locks: hardware-specific configuration and power interfaces, and a lock interface. Talk to a dedicated driver. First, request access to the lock, which powers on the underlying device and configures it and calls back the application (split-phase). Other locking requests are queued after the current user. Power down only happens when all pending lock requests have been fulfilled and released.</li>
<li>Arbiter: performs concurrency control and queuing/scheduling (FCFS and round-robin). Configurator: calls h/w-specific configuration interface on the dedicated driver. Power Manager: powers down device when it falls idle, and powers it back up when a new lock request comes in (with immediate and deferred policies).</li>
<li>Evaluation: comparing hand-tuned, ICEM, optimal serial and worst-case serial orderings, in the sensor logging application described above. ICEM performance very close to hand-tuned, and better than both serial versions.</li>
<li>ICEM overhead is 5.60% of total sampling energy. Node lifetime for ICEM is between 98.4% and 100% of the hand-tuned version, as the sampling period is increased.</li>
</ul>
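<p>The power lock behaviour described above (power on at the first request, FCFS queueing, power off once every request has been granted and released) can be modelled in Python. This is only a toy model, not TinyOS code: the class, the callback style and the event trace are assumptions, and it implements the immediate power-down policy only.</p>

```python
from collections import deque


class PowerLock:
    """Toy split-phase lock that also manages device power."""

    def __init__(self):
        self.powered = False
        self.owner = None
        self.waiting = deque()  # FCFS queue of (client, granted callback)
        self.events = []        # trace kept for illustration

    def request(self, client, granted):
        """Queue a request; the callback fires when the lock is granted."""
        self.waiting.append((client, granted))
        if self.owner is None:
            self._grant_next()

    def release(self, client):
        assert client == self.owner
        self.owner = None
        if self.waiting:
            self._grant_next()
        else:
            # No pending requests: power the device down immediately.
            self.powered = False
            self.events.append("power_off")

    def _grant_next(self):
        if not self.powered:
            self.powered = True
            self.events.append("power_on")
        client, granted = self.waiting.popleft()
        self.owner = client
        self.events.append("grant:" + client)
        granted(client)
```

<p>The application never powers the device on or off itself; it only requests and releases the lock, which is the simplification the talk was arguing for.</p>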
<h3>VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems</h3>
<ul>
<li>Two key benefits of integrated power management: power savings are proportional to cost savings, and cooling capability is proportional to rack utilisation. Information and capabilities exported by ACPI have led to application-specific power management policies. How do we do this in the context of virtualisation?</li>
<li>Problems: what should be exposed? Are there issues with isolation? Are there power benefits arising from resource sharing? VPM can give a 34% power improvement.</li>
<li>Data centres have a heterogeneous collection of platforms. Virtualisation should abstract this (e.g. because of migration), but this is tricky with different power characteristics in the underlying hardware. The workload stays the same, so we should virtualise the power management states: therefore we can have a consistent view of manageability across migrations.</li>
<li>What about isolation? Disassociate changes to virtual state from the management of the physical state: front-ends in DomUs, back-end in Dom0, the latter of which manages the physical state. Notion of soft scaling which allows the possibility of &#8220;scaling&#8221; a virtual CPU, which enables consolidation of soft-scaled virtual resources. (Two soft-scaled VCPUs on a single full-power CPU, but you might be able to turn off another physical CPU, for example.)</li>
<li>VPM rules in Dom0 which control the physical PM. Virtualisation layer policies are driven by VPM state requests from the VMs, which makes an implicit feedback loop. Might need some learning algorithms to derive rules.</li>
<li>e.g. Dom1 changes VPx state via a hypercall and creates a VPM event (comprising the VM soft state, and a shadow state (the physical manifestation of the current soft state)). This leads to a new set of shadow states being calculated.</li>
<li>Workloads: tiered web service (RUBIS) (Linux on-demand governor); transactional workload (select policy based on the transaction processing rate and the amount of slack); web service with quality of information metric (Travelport (Worldspan)) (policy based on the QoI and processing time of requests across different client classes).</li>
<li>Complex analysis has up to 4% degradation on throughput. Seems to work with multiple VMs too. [Lots of fairly convincing graphs but I think it&#8217;ll be necessary to take a good look at the paper to judge what we&#8217;re actually being presented.]</li>
</ul>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/10/16/sosp-2007-day-2/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>SOSP 2007: Day 1</title>
		<link>http://www.mrry.co.uk/blog/2007/10/15/sosp-2007-day-1/</link>
		<comments>http://www.mrry.co.uk/blog/2007/10/15/sosp-2007-day-1/#comments</comments>
		<pubDate>Mon, 15 Oct 2007 15:56:54 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Uni</category>
	<category>Technology</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/10/15/sosp-2007-day-1/</guid>
		<description><![CDATA[Since day 0 was driving up (through some outstanding scenery) from Portland to Skamania Lodge, I&#8217;ll come back to it when the photos are developed. Therefore, on with a little experiment: the live blog! (The page will grow downwards as more presentations are added.)
N.B. If you&#8217;re reading this through Facebook or are not a die-hard [...]]]></description>
			<content:encoded><![CDATA[<p>Since day 0 was driving up (through some outstanding scenery) from Portland to Skamania Lodge, I&#8217;ll come back to it when the photos are developed. Therefore, on with a little experiment: the live blog! (The page will grow downwards as more presentations are added.)</p>
<p>N.B. If you&#8217;re reading this through Facebook or are not a die-hard Computer Scientist, you might want to ignore it!</p>
<p><a id="more-26"></a></p>
<h2>Web Meets Operating Systems</h2>
<h3>Protection and Communication Abstractions for Web Browsers in MashupOS</h3>
<ul>
<li>Adding inline modules to a single page involves trusting those modules not to interfere with each other or snoop data (e.g. from Google cookie). Want multiple principals in the web browser for client mashups.</li>
<li>&#8220;Same origin policy&#8221; is currently all or nothing: either no cross-domain interactions allowed (iframe) or external scripts run with the privilege of the enclosing page (included script). With latter, risk of cross-site scripting attack (Samy worm: 1 million MySpace users in 24 hours), leads to scripts being disallowed and flexibility being lost.</li>
<li>Browser should be a multi-principal OS. Balance ease-of-use and security.</li>
<li>Protect: memory (heap of script/DOM objects), persistent state (cookies) and remote data access (XMLHttpRequest).</li>
<li>What if integrator (Google) is trusted to access provider (e.g. widget developer) content, but provider is not able to access integrator content? New sandbox and openSandbox tags for &#8220;unauthorised&#8221; content. Invoking code inside sandbox only happens after a setuid call. Put user input in a sandbox and so prevent XSS attacks.</li>
</ul>
<h3>AjaxScope: Remotely Monitoring Client-side Web-App Behavior</h3>
<ul>
<li>Windows Live Local client-side codebase: 70kloc (1MB in size), 2855 functions, interacts with >14 backend services, and all this executes on the host, not in the cloud.</li>
<li>Web 1.0: everything is done on the server, which then serves static content, so the developer has control over the machine and can profile or debug it. How does the developer address bugs or performance issues when code is largely running on the client? And of course there are mashups&#8230;</li>
<li>On-the-fly rewriting to add instrumentation, by a proxy between webapp and the client.</li>
<li>Goals: performance optimisation, debugging, testing (code coverage and A/B tests), user interaction feedback (feature discovery etc.).</li>
<li>No changes to original webapp or client-side browsers.</li>
<li>Experiments tested prototype against 90 websites, including Google Maps, Live Local and 88 other sites.</li>
<li>Adaptive instrumentation: don&#8217;t log every function with timestamps; instead profile whole script and, if it&#8217;s slow, drill down to discover the particular function that&#8217;s taking time (then recurse).</li>
<li>Distributed instrumentation: monitoring for JS memory leaks by looking for runtime patterns indicative of a leak. Check all object assignments for potential cycles: involves an expensive heap traversal, which wouldn&#8217;t scale to all assignments. So give each user a random N% of the assignments to check.</li>
<li>Need to address information protection: should it be possible for credit card information to be stored in the logging DB? At present, don&#8217;t instrument HTTPS pages, but in future could allow developers to mark state that should not be logged.</li>
</ul>
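<p>A minimal sketch of the distributed-instrumentation idea from the AjaxScope notes above, assuming each assignment site is sampled independently per user. The function names, the 5% rate and the 200 sites are illustrative assumptions, not AjaxScope&#8217;s actual rewriting proxy.</p>

```python
import random

def pick_sites(site_ids, percent, rng):
    """Choose the assignment sites that one user's copy of the script will check."""
    # Each site is kept independently with probability percent/100.
    return {s for s in site_ids if percent > rng.random() * 100}

def coverage(site_ids, percent, users, seed=0):
    """Fraction of sites checked by at least one of the simulated users."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(users):
        covered |= pick_sites(site_ids, percent, rng)
    return len(covered) / len(site_ids)

sites = list(range(200))              # hypothetical instrumentation points
one_user = coverage(sites, 5, 1)      # a single user pays for roughly 5% of the checks
many_users = coverage(sites, 5, 100)  # the union over many users covers far more
```

<p>No single user pays for the full heap traversal, but the union of many users&#8217; samples still approaches every assignment site.</p>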
<h3>Swift: Secure Web Applications via Automatic Partitioning</h3>
<ul>
<li>Web development methods lack security reasoning. Swift makes interactive web applications secure and easier to write. Write a single program in a single language, which is automatically split by the compiler. Rich security policies as declarative annotations. Attempts to optimise the partitioning for performance.</li>
<li>For example, want to download state (e.g. a secret number in a guess-the-number game) and behaviour (input validation, checking the guess against the secret) to the browser to reduce number of server roundtrips, but (i) want to keep this secret from a malicious user, and (ii) preserve the integrity of any guess sent back to the server.</li>
<li>Swift code looks like regular Java code with an event-driven GUI.</li>
<li>Security policy on data denoted by labels with named principals (e.g. Alice-&gt;Bob == &#8220;Alice permits Bob to read&#8221;; Alice&lt;-Bob == &#8220;Alice permits Bob to write&#8221;). e.g. int {server-&gt;server; server&lt;-server} secretNumber; int {server-&gt;client; server&lt;-server} numberOfTries;</li>
<li>An endorsement after, say, a bounds check allows a client-provided value to be trusted.</li>
<li>Placement constraints (based on who can read and write), and architectural constraints (e.g. database access must be located on server).</li>
<li>Performance optimisation: minimise number of network messages. Construct weighted control flow graph (with weights estimated by interprocedural dataflow analysis), then execute MIN-CUT/MAX-FLOW on it to find the optimal partitioning. (Slightly more subtle, since nodes could be placed on both the client and server: could for example replicate input validation for fast responsiveness at the client (if wrong) but proper integrity at the server.)</li>
<li>Server keeps state about expected control flow to prevent client attempting to execute arbitrary server code by falsifying a message. Client cannot corrupt server-local variables because server does not accept (non-endorsed?) client-supplied values to update high-integrity variables.</li>
<li>Evaluated by writing a suite of example applications and comparing the number of messages sent by the Swift-generated and hand-optimised versions of the apps.</li>
</ul>
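<p>A toy model of the labels in the Swift notes above, to make the two checks concrete. This is only a sketch: the <em>Label</em> class and the helper names are hypothetical, the owner (server) is listed explicitly among the readers, and the real system does this reasoning statically in the compiler rather than at run time.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    readers: frozenset  # principals that the owner permits to read
    writers: frozenset  # principals that the owner permits to write

# The two labels from the guess-the-number example, both owned by the server.
SECRET_NUMBER = Label(readers=frozenset({"server"}), writers=frozenset({"server"}))
NUMBER_OF_TRIES = Label(readers=frozenset({"server", "client"}), writers=frozenset({"server"}))

def may_replicate_to_client(label):
    """Confidentiality: data may be sent to the browser only if the client may read it."""
    return "client" in label.readers

def needs_endorsement(label):
    """Integrity: a client-supplied value must be checked before it flows into this data."""
    return "client" not in label.writers

def endorse_guess(raw, low=1, high=10):
    """Server-side bounds check; only a value that passes is treated as trusted."""
    if isinstance(raw, int) and raw in range(low, high + 1):
        return raw
    raise ValueError("guess rejected")
```

<p>Under this model the secret number can never be placed on the client, the number of tries can be shown there but not updated there, and a guess only counts once it has passed <em>endorse_guess</em> on the server.</p>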
<h2>Concurrency</h2>
<h3>TxLinux: Managing Transactional Memory in an Operating System</h3>
<ul>
<li>Hardware transactional memory is a reality! Sun &#8220;Rock&#8221; chip supports it, and so should Solaris 10.</li>
<li>Locks are hard (deadlock, priority inversion, etc.). Transactional memory in the OS benefits user programs and simplifies programming.</li>
<li>Conventional wisdom maps lock_acquire and lock_release to tx_begin and tx_end, respectively. Transactionalising Linux this way took six person-years, mostly due to issues with I/O and idiosyncratic locking.</li>
<li>Reject conventional wisdom and insist that locks and transactions must cooperate! Retain legacy code. Flexibility to aid performance: TM is good when contention is rare; locks are good when contention is high.</li>
<li>Innovation: Cooperative Transactional Spinlock. Critical sections choose dynamically between locks and transactions. If a critical section attempts I/O, then roll back and use a lock instead. (This took one person-month to implement on Linux.)</li>
<li>Implemented as x86 extensions, using the Simics machine simulator. Benchmarks on pmake, bonnie++, MAB, (parallel) configure, and (parallel) find. Only kernel is using transactions, not user code. 2.5% speedup over Linux (16 CPUs); 1% speedup over Linux (32 CPUs).</li>
<li>Scheduling-aware transactions to <em>eliminate 100%</em> of priority inversion: contention manager decides in favour of higher priority process.</li>
<li>One pathological behaviour (in the Bonnie++ benchmark) which is caused by the exponential backoff in transaction retry.</li>
</ul>
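<p>A software-only sketch of the &#8220;try a transaction, fall back to a lock on I/O&#8221; behaviour from the TxLinux notes above. Everything here is an assumption for illustration: the class and exception names are made up, a dictionary copy stands in for the hardware checkpoint, and nothing in it provides the isolation that real transactional memory gives.</p>

```python
import threading

class NeedsLock(Exception):
    """Raised when a transactional attempt reaches an action it cannot undo, such as I/O."""

class CoopSpinlock:
    def __init__(self):
        self._lock = threading.Lock()
        self.stats = {"tx": 0, "lock": 0}

    def run(self, state, critical_section):
        snapshot = dict(state)                # stand-in for the hardware checkpoint
        try:
            critical_section(state, True)     # optimistic, transactional attempt
            self.stats["tx"] += 1
        except NeedsLock:
            state.clear()
            state.update(snapshot)            # roll back the partial updates
            with self._lock:                  # retry the same code under the lock
                critical_section(state, False)
            self.stats["lock"] += 1

def bump(state, in_tx):
    state["n"] = state["n"] + 1

def bump_and_log(state, in_tx):
    state["n"] = state["n"] + 1
    if in_tx:
        raise NeedsLock()                     # I/O is not allowed inside a transaction
    state["log"] = "written"                  # pretend this is the I/O

lock = CoopSpinlock()
shared = {"n": 0}
lock.run(shared, bump)            # completes as a transaction
lock.run(shared, bump_and_log)    # rolls back, then completes under the lock
```

<p>The point of the rollback is that the counter is incremented exactly once by the second call, even though its body ran twice.</p>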
<h3>MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs</h3>
<ul>
<li>Variable Access Correlation: programs contain many variables which are not isolated. Correct programs consistently access correlated variables. (e.g. an array and the length of that array, in two variables; different views of the same information (number of packets and bytes received); different aspects of related information (RGBA values in a framebuffer); iterator in a data structure).</li>
<li>Need to ensure consistent access (e.g. when updating array, must update length).</li>
<li>If x and y are correlated, an access (read or write) to x is correlated with an access to y. So we can detect where a programmer forgets to access a correlated variable (e.g. initialise RGB but forget alpha). Or where two correlated variables are updated concurrently (leaving the possibility of reaching an incorrect state due to a race condition).</li>
<li>Statistically infer access correlation based on variable access pattern in source code, assuming that a mature program is mostly correct. Correlation is inferred where variables commonly appear together, and seldom appear separately. Use static code distance between accesses.</li>
<li>Frequent itemset mining to determine potentially-correlated variables. Every variable is an item. Every function is an itemset. The accessed variables, tagged with the type of item, are placed in the itemset. Flow-insensitive, inter-procedural analysis, considering globals and structure-typed variables. IPA means that callee accesses are incorporated in the itemset. This yields frequent variable sets.</li>
<li>Post-processor prunes this set by identifying where items in a sub-itemset appear separately too many times, or are very popular (e.g. stdout and stderr). Then it categorises based on the type of access (read or write).</li>
<li>Inconsistent update bug detection by getting all write(x)->access(y) correlations. Concurrency bug detection by looking for common locks surrounding different accesses to a set of variables. Look for where the variables are accessed without holding that lock.</li>
<li>Approximately 13 to 19 per cent false positives on correlation inference. Analysis takes on the order of two hours. Bad programming shows up even when it is not a bug.</li>
</ul>
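<p>A small sketch of the correlation inference in the MUVI notes above, restricted to pairs of variables. The thresholds, the symmetric confidence measure and the example functions are assumptions for illustration; the real tool mines larger itemsets and distinguishes reads from writes.</p>

```python
from collections import Counter
from itertools import combinations

def infer_correlations(functions, min_support=3, min_confidence=0.75):
    """functions maps a function name to the set of variables it accesses."""
    single = Counter()
    pair = Counter()
    for accessed in functions.values():
        single.update(accessed)
        pair.update(combinations(sorted(accessed), 2))
    correlated = {}
    for (x, y), together in pair.items():
        if together >= min_support:
            # Appear together often, and seldom appear separately.
            confidence = min(together / single[x], together / single[y])
            if confidence >= min_confidence:
                correlated[(x, y)] = confidence
    return correlated

def find_violations(functions, correlated):
    """Report functions that access one variable of a correlated pair but not the other."""
    out = []
    for name, accessed in sorted(functions.items()):
        for x, y in correlated:
            if (x in accessed) != (y in accessed):
                out.append((name, x, y))
    return out

code = {
    "init": {"buf", "len"},
    "push": {"buf", "len"},
    "pop": {"buf", "len"},
    "clear": {"buf", "len"},
    "grow": {"buf"},       # updates the array but forgets the length
    "log": {"stdout"},
}
pairs = infer_correlations(code)
suspects = find_violations(code, pairs)
```

<p>Here <em>buf</em> and <em>len</em> are inferred to be correlated because the mostly-correct code uses them together, and <em>grow</em> is flagged as the odd one out.</p>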
<h2>Byzantine Fault Tolerance</h2>
<h3>Zyzzyva: Speculative Byzantine Fault Tolerance</h3>
<ul>
<li>State of the art in BFT is too complex for system designers. Zyzzyva outperforms existing approaches, gives performance comparable to unreplicated services, and has overhead that approaches lower bounds.</li>
<li>State machine replication to tolerate failures. Two phases: agreement and execution. Agree on the request order (three-phase agreement), followed by execution.</li>
<li>In Zyzzyva, replicas execute requests without agreement, so much less overhead. The output commit happens at the client. Only the client needs to know that the system is consistent, not the replicas. So client only commits if it can confirm that the system is consistent (if it can verify that the reply is stable).</li>
<li>Verify the stable reply by using the request history. Replicas include request history in the replies. If all of the replies and histories match, then all replicas are in a consistent state, and we are done.</li>
<li>What if there are failures? We can commit output if a majority of the replies and histories are available and match. Commit phase: the client sends the histories to the replicas, and if it receives a majority of acks, then it can commit.</li>
<li>Same consistency guarantees as traditional BFT.</li>
<li>A faulty client cannot block progress or compromise safety.</li>
<li>Performance is at least twice as good as existing BFT methods, and about 35% of the unreplicated case (with no failures).</li>
</ul>
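<p>A sketch of the client-side decision in the Zyzzyva notes above. The notes only say &#8220;all&#8221; and &#8220;a majority&#8221;; the concrete thresholds used here (3f+1 replicas, 2f+1 matching replies for the commit phase) are my assumption based on the usual BFT quorum sizes, and the function name and return values are hypothetical.</p>

```python
from collections import Counter

def client_decision(replies, f):
    """replies is a list of (result, history_digest) pairs, one per replica that answered."""
    n = 3 * f + 1
    if not replies:
        return "retry"
    _, matching = Counter(replies).most_common(1)[0]
    if matching == n:
        return "complete"        # every replica agrees: commit the output at once
    if matching >= 2 * f + 1:
        return "commit-phase"    # send the histories back and wait for enough acks
    return "retry"

ok = ("result", "history-A")
bad = ("result", "history-B")
fast = client_decision([ok, ok, ok, ok], f=1)
slow = client_decision([ok, ok, ok, bad], f=1)
stuck = client_decision([ok, ok, bad, bad], f=1)
```

<p>The replicas never run an agreement round in the common case; the extra work only appears on the client&#8217;s slower path.</p>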
<h3>Tolerating Byzantine Faults in Database Systems using Commit Barrier Scheduling</h3>
<ul>
<li>50% of errors in DBMSs are &#8220;non-crash&#8221; errors.</li>
<li>This is the first practical BFTDB.</li>
<li>3f+1 replicas, protocol globally orders requests, replicas execute in order: no concurrency. We want to be able to extract concurrency in the database context.</li>
<li>Introduce &#8220;Shepherd&#8221; into the architecture: it is centralised, and faults in it cannot be handled (but it&#8217;s small compared to the DB replicas). Many clients -> single shepherd -> many replicas. Need f+1 matching votes from the replicas.</li>
<li>Pre-determine which statements conflict to extract concurrency. Difficult to inspect SQL.</li>
<li>Commit Barrier Scheduling: run transactions first on the primary, then duplicate the ordering from the primary on the secondaries. Works best if primary is &#8220;sufficiently blocking&#8221;. Primary sends back result, shepherd responsible for running the same-ordered statements on the secondaries.</li>
<li>For correct execution: execute the statements of a transaction in the same order, and all replicas commit transactions in the same order.</li>
<li>This assumes a non-faulty primary! Concurrency on both primary and secondaries, but there is an increase in latency.</li>
<li>Faulty secondaries are not a problem, because of voting. Faulty primary is a problem, because it could generate an invalid schedule for the whole system. Assuming faulty primaries are rare.</li>
<li>Asymptotically, 17% performance penalty on MySQL or PassThrough scheme (measured in Transactions per second). But a 10x speedup on the serial case.</li>
<li>Masked bugs by using heterogeneous vendors and heterogeneous versions. Found concurrency bugs in MySQL (now patched).</li>
</ul>
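<p>The f+1 voting rule from the shepherd notes above can be sketched in a few lines. The function name is hypothetical, and real query results would need to be compared more carefully than by simple equality.</p>

```python
from collections import Counter

def shepherd_vote(answers, f):
    """Return the answer backed by at least f+1 replicas, or None if there is none yet."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    # With at most f faulty replicas, f+1 matching answers include a correct one.
    return answer if votes >= f + 1 else None

decided = shepherd_vote(["42", "42", "13"], f=1)
undecided = shepherd_vote(["42", "13"], f=1)
```

<p>This is why faulty secondaries are harmless: a wrong answer can never collect f+1 votes on its own.</p>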
<h3>Low-Overhead Byzantine Fault-Tolerant Storage</h3>
<ul>
<li>Block storage protocol that can tolerate arbitrary Byzantine faults with similar performance to systems that can handle crash-faults.</li>
<li>Crash fault-tolerant (erasure-code based) storage performs much better than replicated BFT storage (for write bandwidth). So they introduce erasure-coded BFT storage, with a relatively small overhead over the CFT method (within 10% for sufficiently large writes).</li>
<li>Write overhead = two rounds plus a cryptographic checksum. Read overhead = cryptographic checksum.</li>
</ul>
<h3>Attested Append-Only Memory: Making Adversaries Stick to their Word</h3>
<ul>
<li>BFT protocols enforce linearisability and liveness on a replicated system, with up to 1/3 faulty replicas.</li>
<li>Problem of servers equivocating to clients. Does preventing it help? Can it improve on the 1/3 bound? If so, how do we prevent equivocation?</li>
<li>Introduce a trusted equivocation guard to prevent equivocation.</li>
<li>Attested Append-Only Memory: a set of numbered logs, with sequence #, stored value, and crypto digest of entire log for attestation.</li>
<li>Run A2M as a 3rd party service, in a process, in a virtual machine or even in the VMM, or even in a secure coprocessor (cf. IBM&#8217;s vTPM implementation). Different levels of isolation or sizes of TCB.</li>
<li>Trustworthy systems are built from untrusted components. Looking at what trusted small components can be put together to make better systems.</li>
<li>A2M is simple and easily implementable, and prevents equivocation. It has broader implications on structuring trustworthy systems.</li>
</ul>
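<p>A minimal sketch of one attested append-only log as described in the A2M notes above, assuming a SHA-256 hash chain as the &#8220;crypto digest of entire log&#8221;. The class and method names are made up, and a real attestation would also be signed or MACed by the trusted component, which is left out here.</p>

```python
import hashlib

class AttestedLog:
    def __init__(self):
        self._entries = []          # (sequence number, value, cumulative digest)
        self._digest = bytes(32)    # digest of the empty log

    def append(self, value):
        """Add a value at the next sequence number; there is no way to overwrite."""
        seq = len(self._entries) + 1
        data = self._digest + seq.to_bytes(8, "big") + value
        self._digest = hashlib.sha256(data).digest()
        self._entries.append((seq, value, self._digest))
        return seq

    def lookup(self, seq):
        """Return the entry at seq, including the digest that covers the whole prefix."""
        return self._entries[seq - 1]

honest = AttestedLog()
honest.append(b"prepare op=1")
before = honest.lookup(1)
honest.append(b"commit op=1")

forked = AttestedLog()
forked.append(b"prepare op=2")      # a different statement at sequence number 1
forked.append(b"commit op=1")
```

<p>Because each digest covers everything before it, a server that told two clients different things at sequence number 1 cannot produce matching attestations for any later entry.</p>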
<h2>Work in Progress</h2>
<ul>
<li><strong>GIGA+: Scalable Directories for Shared File Systems:</strong> FS used as a lightweight database of small files. Want scalability to billion-to-trillion entries, striped on many servers; 100K inserts/sec. Usually use fast, dynamic indexing structures, but these are synchronised. GIGA+ is more parallel, with incremental growth over many servers, and high concurrency through minimal synchronisation. Client cache consistency is expensive, so GIGA+ uses stale partition-to-server maps, but without affecting the correctness of operations. Decentralised and concurrent indexing, using increasing prefixes of hashed filenames. Servers can then split partitions independently and in parallel.</li>
<li><strong>Sequoia: Prediction Trees to Support Network-Aware Applications:</strong> Virtual trees to provide properties of the network to support applications like &#8220;What is the server that can provide this file with the greatest bandwidth?&#8221; Predicts bandwidth and latency. Tree position yields coordinates in the network, which can be used to estimate distance and the quality of paths in the network. Free hierarchical partitioning of the network.</li>
<li><strong>Flexible, Wide-Area Storage for Distributed Systems with WheelFS:</strong> Can&#8217;t just use existing tools to make a cooperative web-cache (Apache caching proxy plus network file system). Cache can use old copies of data, and always can fall back to the origin. So why not tell the FS about this? Introduce semantic cues to help apps control behaviour in event of failure. Only one line change to make Apache distributed. Semantic cues embedded in the pathname. Useful for PlanetLab, Parallel Grid computations and Distributed make.</li>
<li><strong>Reforming Software Delivery using P2P Technology:</strong> Analysed managed service department at HP. An edge node spends 1.8 hours daily synchronising its image. Idea is to combine P2P technology and rsync. Large chunk sizes give low per-chunk overhead, but fewer opportunities to exploit similarity. Pure P2P is not feasible because different clients use different tools, and also want to avoid using customer bandwidth. Related to CDNs. Decentralisation makes security and optimisation hard.</li>
<li><strong>Ostra: Leveraging trust to thwart unwanted communication:</strong> i.e. spam, mislabeled videos on YouTube. Usual defenses don&#8217;t work because you can&#8217;t do Bayesian filtering on video. Ostra uses social networking trust relationships, which are assumed to be hard to form. If a source sends unwanted communication to a destination, the destination will cut off the source. For multi-hop transmission, the links along the path from source to destination are all penalised. No requirement of strong identities: can create a network of sybils, or use different identities for friends.</li>
<li><strong>Enabling BITE: High-Performance Snapshots in a High-Level Cache:</strong> Applications control when snapshots are taken, and can get and put objects that reside in pages. Does CoW to save storage. Provides time travel in a storage system. (BITE = Back In Time Execution.) Approach is virtualised, crash consistent and requires a Write-Ahead Snapshot invariant. High-level: database or file system, because lower level makes it difficult to achieve consistent application-requested snapshots. Therefore they sync from higher levels when a snapshot is requested. Need to make declaration of snapshots more efficient (currently transactional).</li>
<li><strong>Kernel Memory Management in Verified Small Kernels:</strong> Constructed a small, highly-assured microkernel. Formal assurance that the abstract model translates to the kernel code, which must be rigid. seL4 exports all memory allocation and freeing to the user, with no implicit allocations within the kernel. Something of a capability model. Formally proved spatial partitioning. Haskell prototype and C/C++ version of the kernel. Ongoing work to evaluate and refine the performance.</li>
<li><strong>The Case for DDoS Resistant Membership Management in P2P Systems:</strong> Malicious node M acts as index and says that a victim node is the source for a file (even though it is not), which, if there are millions of requests, could lead to DDoS. Exploitable mechanisms include: push-based mechanisms and the ability to have multiple logical IDs per physical ID (such as IP address). Achieved an attack magnitude of 700Mbps on a real network. Idea is to have self-validation of membership information. Instead of dumbly sending requests to a victim node, count the number of failures, and, past a threshold, stop hitting it.</li>
<li><strong>Adaptive File Transfers for Diverse Environments:</strong> Goal to correctly and efficiently transfer files in a wide range of scenarios. Existing tools are scenario-specific (files in place, other files, peers, identical peers). Challenges: resources have varying performance, which changes dynamically; receivers may have different initial state (e.g. software version); and resources should not need to be set up in advance. Novel optimisation framework: first decide which resources are available, then schedule across them &#8220;in an intelligent fashion&#8221;. Defers disk operations when network is faster than disk.</li>
<li><strong>Fine-Grained Isolation for the Apache Web Server:</strong> Restructuring legacy apps for the principle of least privilege. The Apache Web Server has a sensitive private key, and a complicated parser, which could be compromised, leading to the private key being exposed. 222 heap objects and 389 globals. Manual isolation took 5 weeks. Crowbar is a binary instrumentation tool that tells you what memory a function accesses, what functions access what memory items and where a sensitive-data-generating function propagates.</li>
<li><strong>A Social Networking-Based Access Control Scheme for Personal Content:</strong> Presently done messily, for example using an email service (but limited bandwidth, file size; plus push-based model is inefficient for content delivery where not all recipients would want the content). Or using a social networking site (but little support for access control on content-specific sites; plus users end up having a bunch of social networks that may get out of sync between sites). Idea is to separate the social network (which we can let people manage), and the sites that serve content. Social relations are captured by digital social attestations, which are maintained in one&#8217;s personal address book. Enables social ACLs (e.g. restrict access to family only, or friends only, etc.). Enables &#8220;social firewalls&#8221; (access only to people whom you know), or a social calendar (provides different views to different people, based on relationship).</li>
<li><strong>Improving Virtual Appliances through Virtual Layered File Systems:</strong> Problem is existing file systems are not sufficient for management of large numbers of machines. Provisioning takes too long; existing file systems treat machines as fully independent; and security models treat every file in the same way(?). A layer is a self-contained set of files, which can be shared in a read-only manner. e.g. a library or an application can be a layer. Layers are composable into a traditional file system view, along with a private read-write layer to make the file systems independent. Layers can be composed into templates for provisioning. Updating is easy: just replace the layer! Security is improved because it is easier to keep machines up to date, and tell which files have been modified.</li>
</ul>
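<p>A sketch of the &#8220;increasing prefixes of hashed filenames&#8221; idea from the GIGA+ item above. The class, the 16-bit hash width and the use of bit strings are assumptions for illustration; the stale partition-to-server maps and the distribution over servers are not modelled.</p>

```python
import hashlib

def hash_bits(name):
    """First 16 bits of the hashed filename, as a string of 0s and 1s."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return format(int.from_bytes(digest[:2], "big"), "016b")

class Directory:
    def __init__(self):
        self.partitions = {"": set()}     # bit prefix mapped to the entries it holds

    def locate(self, name):
        """Find the one partition whose prefix matches the hashed name."""
        bits = hash_bits(name)
        for length in range(len(bits), -1, -1):
            if bits[:length] in self.partitions:
                return bits[:length]

    def insert(self, name):
        self.partitions[self.locate(name)].add(name)

    def split(self, prefix):
        """Split one partition by looking at one more bit; no other partition is touched."""
        entries = self.partitions.pop(prefix)
        zero = {n for n in entries if hash_bits(n).startswith(prefix + "0")}
        self.partitions[prefix + "0"] = zero
        self.partitions[prefix + "1"] = entries - zero

d = Directory()
names = ["file%d" % i for i in range(100)]
for n in names:
    d.insert(n)
d.split("")      # the single partition becomes "0" and "1"
d.split("0")     # only "0" grows further, into "00" and "01"
```

<p>Since a split only moves entries out of the partition being split, different partitions can be split independently and in parallel.</p>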
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/10/15/sosp-2007-day-1/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>SOSP 2007: Day -1 (Travel)</title>
		<link>http://www.mrry.co.uk/blog/2007/10/14/sosp-2007-day-1-travel/</link>
		<comments>http://www.mrry.co.uk/blog/2007/10/14/sosp-2007-day-1-travel/#comments</comments>
		<pubDate>Sun, 14 Oct 2007 01:54:20 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Travel</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/10/14/sosp-2007-day-1-travel/</guid>
		<description><![CDATA[So, if you&#8217;ve seen me in person some time in the past week, I&#8217;ll probably have gushed to you about the fact that I&#8217;m incredibly lucky to be going to SOSP 2007, at Skamania Lodge, Stevenson, WA. I&#8217;ve been up for 22 hours now, and my luck is giving way to psychosis.
Up at 4:30am, I [...]]]></description>
			<content:encoded><![CDATA[<p>So, if you&#8217;ve seen me in person some time in the past week, I&#8217;ll probably have gushed to you about the fact that I&#8217;m incredibly lucky to be going to <a title="Sadly not on the moon...." href="http://www.sosp2007.org/">SOSP 2007</a>, at <a href="http://www.skamania.com/">Skamania Lodge</a>, Stevenson, WA. I&#8217;ve been up for 22 hours now, and my luck is giving way to psychosis.</p>
<p>Up at 4:30am, I left my (new-but-that&#8217;s-another-story) house at 6, caught the 6:30 bus from Cambridge to Gatwick (and was dismayed at the lack of a hostess, jolly or otherwise, on the <a href="http://www.lyricsfreak.com/d/divine+comedy/national+express_20040885.html">National Express</a>), then left Gatwick at 12:45 on an American Airlines flight to Raleigh/Durham, NC. Those with a keen, or even extremely vague, sense of geography will see immediately the flaw in my plan!</p>
<p>The transatlantic leg was fine, nothing special, and I got a good amount of work done on the plane. Arriving at the backwoods &#8220;<a href="http://www.rdu.com/">Raleigh/Durham International Airport</a>&#8221; was something of a come down. I&#8217;m no stranger to queues at immigration (although the hall was so small that they had to let us off the plane in stages), or having to recheck my luggage having cleared customs, but I was dismayed to find that, after doing this, all 300 tired and cranky passengers off the fully-loaded 777 were forced through a full laptops-out, shoes-off security checkpoint, even if they were leaving the airport! A little artefact of those halcyon pre-9/11 days when airport security was such that the entire townsfolk would congregate in the departure lounge for a barn dance&#8230;.</p>
<p>Two hours there, and a bag of Fritos later, I was on another flight, this time to Dallas Fort Worth. A narcoleptic episode beginning just before take-off left me quite disoriented, but now I&#8217;m at DFW, tapping out an unnecessarily sardonic blog post, just trying to stay awake and not miss my four-hour flight to Portland. If you&#8217;re reading this, Henry, the <a href="http://www.aa.com/aadvantage">AAdvantage miles</a> just weren&#8217;t worth it.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/10/14/sosp-2007-day-1-travel/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Faking It</title>
		<link>http://www.mrry.co.uk/blog/2007/07/16/faking-it/</link>
		<comments>http://www.mrry.co.uk/blog/2007/07/16/faking-it/#comments</comments>
		<pubDate>Mon, 16 Jul 2007 11:45:21 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Meta</category>
	<category>Personal</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/07/16/faking-it/</guid>
		<description><![CDATA[Today I read with some distress that scenes from Gordon Ramsay&#8217;s The F Word were edited to make it appear that he caught several fish. He did no such thing. Distress not because I had just signed up Ramsay for a lucrative five-year spearfishing contract, but because of the statement from Channel 4 that:
 
the broadcaster [...]]]></description>
			<content:encoded><![CDATA[<p>Today I read with some distress that scenes from Gordon Ramsay&#8217;s <em>The F Word</em> were edited to make it appear that he caught several fish. <a href="http://news.bbc.co.uk/1/hi/entertainment/6900463.stm">He did no such thing.</a><a id="more-24"></a> Distress not because I had just signed up Ramsay for a lucrative five-year spearfishing contract, but because of the statement from Channel 4 that:</p>
<blockquote><p><font size="2">the broadcaster took &#8220;such errors of judgement seriously&#8221; and was working with production company Optomen to ensure there was no repeat. </font></p></blockquote>
<p>Perhaps the fact that this story was filed under &#8220;Entertainment&#8221; should be the giveaway clue: this is about as troubling as the use of stuntmen in action films, or identical twins to portray a single young child in a heartwarming soap opera. If such &#8220;errors of judgement&#8221; are to be taken &#8220;seriously&#8221;, will we not be left with utterly anaemic entertainment? Will all television be reduced to something like <em>Blue Peter</em>? (Oh, wait, <a href="http://news.bbc.co.uk/1/hi/entertainment/6284014.stm">never mind</a>.)<br />
The obvious reason for this banal mea culpa is the <a href="http://news.bbc.co.uk/1/hi/entertainment/6898379.stm">storm in a teacup</a> over that documentary about the Queen. Yes, it was bad journalism, but an <a href="http://news.bbc.co.uk/1/hi/entertainment/6294472.stm">apology was issued</a>, and that should be the end of the story. I&#8217;m a fan of the monarchy (on utilitarian grounds), but Betty is ill-served by calls for heads to roll.</p>
<p>Perhaps I&#8217;m a hypocrite, though. For eight years now, I&#8217;ve been selectively editing this blog to give me the appearance of an international playboy and raconteur. I apologise if you&#8217;ve been taken in.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/07/16/faking-it/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Self-hating Scotsman</title>
		<link>http://www.mrry.co.uk/blog/2007/06/18/self-hating-scotsman/</link>
		<comments>http://www.mrry.co.uk/blog/2007/06/18/self-hating-scotsman/#comments</comments>
		<pubDate>Mon, 18 Jun 2007 15:01:39 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Personal</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/06/18/self-hating-scotsman/</guid>
		<description><![CDATA[Whenever I introduce myself to someone as being from Glasgow, I invariably get one of the following responses:

&#8220;Oh but I can understand your accent.&#8221;
&#8220;Oh but I thought Scottish people were meant to be thrifty and you&#8217;ve clearly just got a round in for everyone.&#8221;
&#8220;Oh but I can&#8217;t help noticing you haven&#8217;t knifed me to death.&#8221;
&#8220;Is [...]]]></description>
			<content:encoded><![CDATA[<p>Whenever I introduce myself to someone as being from Glasgow, I invariably get one of the following responses:</p>
<ul>
<li>&#8220;Oh but I can understand <em>your</em> accent.&#8221;</li>
<li>&#8220;Oh but I thought Scottish people were meant to be thrifty and you&#8217;ve clearly just got a round in for everyone.&#8221;</li>
<li>&#8220;Oh but I can&#8217;t help noticing you haven&#8217;t knifed me to death.&#8221;</li>
<li>&#8220;Is it true about the kilts?&#8221;</li>
</ul>
<p>In the face of such blatant stereotyping, I emit a beleaguered sigh, and launch into the prepared script that I have for just these occasions; I wonder why people cling on to these old-fashioned notions about Scottishness, surely cribbed from an episode of the Russ Abbot show. But then&#8230;.</p>
<p><a id="more-23"></a>On Saturday, I was outside my house, pouring concrete for the foundations of a new wall. Such is the crucible of the front garden that many passers-by stopped for a chat, some just to ogle. One of the former introduced himself as William, from Glasgow. In the space of a few short minutes, he:</p>
<ul>
<li>Was drunk to the point of staggering</li>
<li>Raised the spectre of Scotland&#8217;s 1967 football victory over England</li>
<li>Complained about the price of beer</li>
<li>Started getting extremely worked up about the price of his fish supper</li>
<li>Swore a lot</li>
</ul>
<p>So, thanks to William, the drunk, aggressive, tight-arsed, anti-English Scotsman with a penchant for fried food. When presented with this evidence, it&#8217;s perfectly clear that the battle is lost. Now, if you&#8217;ll excuse me, I think my head&#8217;s going to explode.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/06/18/self-hating-scotsman/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>The Apprentice: I demand a recount!</title>
		<link>http://www.mrry.co.uk/blog/2007/06/14/the-apprentice-i-demand-a-recount/</link>
		<comments>http://www.mrry.co.uk/blog/2007/06/14/the-apprentice-i-demand-a-recount/#comments</comments>
		<pubDate>Thu, 14 Jun 2007 09:27:54 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Uncategorized</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/06/14/the-apprentice-i-demand-a-recount/</guid>
		<description><![CDATA[I accept that it&#8217;s a bit jejune to get worked up about Reality TV, but last night&#8217;s Apprentice took the biscuit.
Now, bear in mind that I only watched the last two episodes, so I&#8217;m not basing this on all the evidence, but in business, as the cliché goes, you&#8217;re only as good as your last [...]]]></description>
			<content:encoded><![CDATA[<p>I accept that it&#8217;s a bit jejune to get worked up about Reality TV, but last night&#8217;s <em>Apprentice</em> took the biscuit.</p>
<p><a id="more-22"></a>Now, bear in mind that I only watched the last two episodes, so I&#8217;m not basing this on all the evidence, but in business, as the cliché goes, you&#8217;re only as good as your last job/performance/trade/idea/etc., so I think that gives me the right to comment. (Well, that and the fact that, as a blogger, I&#8217;m clearly a narcissist.)</p>
<p>So we had Simon versus Kristina. And I&#8217;ll put my cards on the table, I really thought Kristina should have won. If this is a meritocracy, the winner should have been chosen on merit, and, in my opinion, all the merit belonged to her. We saw her give a confident performance in the interview, in which they made no successful attempt to impeach her. By contrast, we saw Simon fail-to-look-the-interviewer-in-the-eyes as his litany of misdeeds as a landlord was reeled off (there is a special place in hell&#8230;), and his supposed advantage was that he knew so much about the company and Alan Sugar himself. So he read the <a href="http://www.amstrad.com/about/profile.html">company profile</a>, and practically quoted it verbatim, in the form, &#8220;I know &#8230;. I know &#8230;.&#8221; Perhaps he thought that if he repeated &#8220;I know&#8221; over and over, this would be proof of his intelligence, but any fool can repeat a list of facts. In fact, even a computer can do it: someone clearly taught the Amstrad web server, so it can&#8217;t be that hard.<br />
So his two advantages, coming into the last round were intelligence and class: quite literally, &#8220;He went to a <a href="http://www.westminster.org.uk/">good school</a>; he went to a <a href="http://www.cam.ac.uk/">good university</a>.&#8221; (Disclosure: I work for the university in question.) Allow me to advance a theory. He went to a good public school because his father is a multi-millionaire <a href="http://news.bbc.co.uk/1/hi/entertainment/6749303.stm">[1]</a> <a href="http://www.westminster.org.uk/fees.asp">[2]</a>. He was disproportionately likely to go to Cambridge because he went to a public school <a href="http://www.admin.cam.ac.uk/reporter/current/special/11/table6.pdf">[3]</a>.</p>
<p>All of which sounds like so much armchair socialism and inverse snobbery. But it&#8217;s not. If you come from a well-off background, or, for whatever reason, attend an independent school or an elite university, then I wish you no harm. But if you use any of these facts alone as an a priori reason to employ someone then you fully deserve society&#8217;s wrath. It&#8217;s not where you came from or where you went that should matter in this world; it&#8217;s who you are and what you can do.</p>
<p>But should that be the deciding factor in the cut-throat world of business? Perhaps not. By all accounts, in the final episode, Simon delivered an excellent presentation (although this was hardly clear from the programme; and let&#8217;s not forget that life was imitating <a href="http://en.wikipedia.org/wiki/Aldrin_Justice">art</a> when the building he presented looked like (not one but) three semi-erect penises), and his charm and well-spokenness were an asset when dealing with customers. I would ask you, though, what use is it to be well-spoken, if one has to be told what to say?</p>
<p>Which brings me to my punchline. When was the last time we saw a &#8220;nice&#8221;, &#8220;charming&#8221; guy, who came from a distinguished background and went to a <a href="http://www.yale.edu/">good university</a>, chosen in a major contest?<br />
<a href="http://en.wikipedia.org/wiki/George_W._Bush">Oh yeah. Never mind.</a>
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/06/14/the-apprentice-i-demand-a-recount/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Xen Linux kernel configuration for dummies</title>
		<link>http://www.mrry.co.uk/blog/2007/05/21/xen-linux-kernel-configuration-for-dummies/</link>
		<comments>http://www.mrry.co.uk/blog/2007/05/21/xen-linux-kernel-configuration-for-dummies/#comments</comments>
		<pubDate>Mon, 21 May 2007 14:04:32 +0000</pubDate>
		<dc:creator>Derek Murray</dc:creator>
		
	<category>Technology</category>
		<guid isPermaLink="false">http://www.mrry.co.uk/blog/2007/05/21/xen-linux-kernel-configuration-for-dummies/</guid>
		<description><![CDATA[First let me offer my apologies to readers on GUFF or Facebook: normal service will resume shortly. Now, who&#8217;s been having trouble with their paravirtualisation?
It seems to me a common problem that when people try to build Xen from source, they come unstuck when trying to boot their new dom0 kernel. Usually, they get an [...]]]></description>
			<content:encoded><![CDATA[<p>First let me offer my apologies to readers on <a href="http://www.mrry.co.uk/guff/">GUFF</a> or Facebook: normal service will resume shortly. Now, who&#8217;s been having trouble with their paravirtualisation?</p>
<p><a id="more-21"></a>It seems to me a common problem that when people try to build Xen from source, they come unstuck when trying to boot their new dom0 kernel. Usually, they get an error like:</p>
<blockquote><p>VFS: <strong>Cannot open root device</strong> &#8220;hda1&#8221; or unknown-block(0,0)</p></blockquote>
<p>Obviously, or maybe not, hda1 could be replaced by a number of different labels. What it boils down to is that your kernel and initial ramdisk pairing is incorrectly configured to access your root device. And so you can&#8217;t get to any of your files.</p>
<p>The first step is to make sure that you have an initial ramdisk. And this is where the advice usually ends. So, boot up your system using a functioning kernel, and enter the following as root (assuming your version of Xen uses 2.6.18 as the dom0 kernel):</p>
<blockquote><p>mkinitrd -f /boot/initrd-2.6.18-xen.img 2.6.18-xen</p></blockquote>
<p>Then edit your GRUB menu.lst to make sure that this is being loaded (it should be on a line beginning &#8220;module&#8221;, immediately following your dom0 kernel).</p>
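<p>For illustration, a dom0 entry in menu.lst might look something like the sketch below. The title, the root (hd0,0) line, the hypervisor and kernel file names and the root=/dev/hda1 argument are all assumptions, so substitute whatever matches your own system. The part that matters is the second &#8220;module&#8221; line, which loads the initial ramdisk built above.</p>
<pre>
# Hypothetical entry: every path except the initrd name is an assumption.
title Xen (2.6.18-xen)
    root (hd0,0)
    # The "kernel" line loads the Xen hypervisor itself.
    kernel /boot/xen.gz
    # The first "module" line is the dom0 kernel.
    module /boot/vmlinuz-2.6.18-xen root=/dev/hda1 ro
    # The second "module" line is the initial ramdisk.
    module /boot/initrd-2.6.18-xen.img
</pre>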
<p>In many cases, this will fix the problem, but not if you didn&#8217;t build the correct modules in the first place! What you have to do, then, is get a copy of a working kernel .config for your system. For example, if you set up Fedora Core 5, you should be able to get a copy of the FC5 .config files for your kernel version (2.6.15, in my case). Follow the instructions for downloading the kernel headers and sources, and you should find them in a resulting directory. If you&#8217;re extra-lucky, you might get specific Xen .config files (for the distributions that ship with Xen support). Now that you have a working configuration, copy this over $XENROOT/buildconfigs/linux-defconfig_xen_x86_32 (use your smarts to select the appropriate source file for your architecture, and modify your copy target appropriately).<br />
Now, hit `make world`, and go and make some coffee. Do the same mkinitrd shuffle as before, then reboot into Xen. If you&#8217;re lucky like me, it&#8217;ll work.</p>
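<p>The steps above could be strung together roughly as follows. Treat this as a sketch under assumptions rather than a tested recipe: the default values of XENROOT and WORKING_CONFIG are placeholders, the helper function names are made up for this example, and the x86_64 file name is assumed by analogy with the x86_32 one, so check the buildconfigs directory in your own tree first.</p>
<pre>
#!/bin/sh
# Sketch only: the two defaults below are placeholders, not known-good values.
XENROOT="${XENROOT:-/usr/src/xen}"                            # assumed Xen source tree
WORKING_CONFIG="${WORKING_CONFIG:-/boot/config-$(uname -r)}"  # assumed known-good .config

# Map a machine name (as printed by `uname -m`) to a Xen defconfig file name.
defconfig_for_arch() {
    case "$1" in
        i?86)   echo "linux-defconfig_xen_x86_32" ;;
        x86_64) echo "linux-defconfig_xen_x86_64" ;;  # assumed name
        *)      return 1 ;;                           # anything else: give up
    esac
}

# Copy the working config over Xen's default, rebuild, then redo the initrd.
rebuild_dom0() {
    config_name=$(defconfig_for_arch "$(uname -m)") || return 1
    cp "$WORKING_CONFIG" "$XENROOT/buildconfigs/$config_name" || return 1
    cd "$XENROOT" || return 1
    make world || return 1
    mkinitrd -f /boot/initrd-2.6.18-xen.img 2.6.18-xen
}
</pre>
<p>Nothing happens until you call rebuild_dom0 as root, which leaves you a chance to check the two paths at the top before anything gets overwritten.</p>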
<p>I know that this might be obvious to anyone who&#8217;s done serious kernel development before, but it wasn&#8217;t to me, and maybe you&#8217;ll come across this page in Google and it&#8217;ll help you out.</p>
<p>A little aside: I was doing all this in order to get Xen to run in HVM mode, <em>on top of Xen</em>. A big hat-tip must therefore go to Mark Williamson, who set me on this path and helped me out along the way.<br />
As for the blog, expect normal service to return, with a mild rant about customer service and the iniquities of pedestrian access to suburban shopping centres, real soon now&#8230;.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.mrry.co.uk/blog/2007/05/21/xen-linux-kernel-configuration-for-dummies/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>
