astomo.us

energy efficient computing

March 1st, 2010

I just returned from the 14th SIAM conference on parallel computing in Seattle. I was presenting a SIMD matrix compression algorithm Ive been working on for some applications in quantum chemistry. There were a lot of interesting technical talks but my favorite was a non-technical plenary talk by Horst Simon on energy-efficient computing, which you might think would be a tough sell to the supercomputing community. However Simon obviously knew his audience and made a very convincing argument to the effect that greater energy efficiency is an absolute must in order to even build an exaflop machine.

The reason I liked the talk so much though was that he just used that fact as a hook to pull you into a more general discussion of energy in America and the future of computing. I will relate one interesting point he made which is that you cant just build a competitive supercomputer anywhere these days. There are two big reasons for this: one is that you have to have stable access to a huge amount of (preferably renewable) energy; the other related reason is you want to maximize your Power Usage Effectiveness which essentially means building in a cool climate so as to not waste resources on cooling. One result of this is that you are going to see more and more remote ‘data factories’ in the Pacific Northwest such as Google’s Oregon Data Center. These are essentially supercomputers dedicated to fabricating data for consumers in the form of cloud apps such as Gmail or Amazon’s EC2.

sound synthesis workshop

February 19th, 2010

For anyone from the Los Angeles area who might be interested, Ill be running a workshop on sound synthesis at the Machine Project in May (details are here).

i am sitting in a room

January 5th, 2010

Alvin Lucier’s “I am Sitting in a Room” is a classic piece of feedback-based sound design. It consists of Lucier sitting in a room with 2 distant mics, 2 tape decks, and speakers. The recording begins with Lucier verbally describing the process of the piece, which is that he is recording his voice, and after recording, he will play that back through the speakers, and record that through the mics onto the 2nd tape deck. This continues for multiple generations. At first, you just hear reverberation of the rooms acoustics as he replays the tape into the room. Gradually, the resonant frequencies of the room (i.e. sinusoids whose wavelength divides the distance from speaker to mic) begin to stand out, and by the middle of the 2nd side of the lp the original recording is completely obscured.

Last month I wrote a max patch that implements a direct echo/reverb feedback loop to digitally simulate Lucier’s design, with a few extra features. There are several presets you can mess around with- the latter preset object implements a simulated acoustic reverb network from James Anderson Moorer’s 1978 paper ‘about this reverberation business’. Ive also included an example aiff file and its output after several passes through the loop.

NOTE OF CAUTION- if you start toying with the reverb network- the cascade output gain isnt normalized so the feedback tends to get out of control after a few iterations if you dont manually lower the gain so mind your ears.

hexagon

December 28th, 2009

hexagonPerhaps you werent aware that there is a GIANT HEXAGON ON SATURN’S NORTH POLE. It appears to be a variety of standing wave. These are new pictures with an unprecedented level of detail- note the shockwaves (at least ithink theyre shockwaves) propagating off of the corners. Thanks to Joe Mason from the Cassini imaging media team for fwding these to me.

thoughts on research

November 30th, 2009

Two thoughts, in particular, stand out. In any sufficiently large field of study:

(a) there is no single person who is an authority, and

(b) at any given moment, there will be many people working independently on the same problem. To anyone who has done any type of research, these are self-evident truths. Ignorance and uncertainty, as well as a vast and chaotic literature are inevitable by-products of a free research environment. When there is no such freedom, it is easy for one to become an authority and, once an authority, to make sure that everyone in one’s charge is working together on the same problem; at that stage, it is only a small step to further ensure, as Lysenko did in Soviet Russia, that everyone uncovers the same results, their truth notwithstanding.

aphex twin vs. stockhausen

November 10th, 2009

When presented with an Aphex Twin track, Karlheinz Stockhausen once commented: “I heard the piece Aphex Twin of Richard James carefully: I think it would be very helpful if he listens to my work “Song of the Youth,” which is electronic music, and a young boy’s voice singing with himself. Because he would then immediately stop with all these post-African repetitions, and he would look for changing tempi and changing rhythms, and he would not allow to repeat any rhythm if it [was] varied to some extent and if it did not have a direction in its sequence of variations.”

To which AT responded: “I thought he should listen to a couple of tracks of mine: ‘Didgeridoo,’ then he’d stop making abstract, random patterns you can’t dance to”.

- “Advice to Clever Children”. The Wire 67 (141): 553.

The other day I was listening to AT’s “Selected Ambient Works Vol. 2″ and track #2 off the first disc got me thinking. In case you’ve never heard of R.D. James (aka Aphex Twin), he is an electronic musician. Occasionally he does something nerdy and/or snarky which cracks me up. For example his 1999 single Windowlicker had several images embedded as sonograms inside the audio. Much of his music sounds meticulously edited, but occasionally there are melodies or modulation parameters or something that sound algorithmic. For fun I decided to try and ‘cover’ track #2 algorithmically with no editing. To construct the melody and modulation data I used a bit-reversal algorithm on an ascending harmonic scale.

bit-reversal

Bit-reversal (pictured above as appiled to an 8-element array) is, as the name suggests, an operation (permutation actually) wherein you take an array of data and map each element to the unique address whose binary digits are the mirror image of those in its current address. It is a low-level primitive that comes up surprisingly often in computational harmonic analysis. The most prominent example is the FFT of course, but there are several other interesting transforms (such as the sequency ordered Walsh-Hadamard transform) that make similar use of it. Bit-reversal is plenty interesting on its own however. It can be done in-place and in O(n) time and can be profitably parallelized on SIMD architectures (such as GPU’s). To create my data I used a recursive version (written in Python):


from numpy import *

def bit_reverse_traverse(a):
    n = a.shape[0] 
    if n == 1:
        yield a[0]
    else:
        even_index = arange(n/2)*2
        odd_index = arange(n/2)*2 + 1
        for even in bit_reverse_traverse(a[even_index]):
            yield even
        for odd in bit_reverse_traverse(a[odd_index]):
            yield odd
    return

def get_bit_reversed_list(l):
    n = len(l)
    index = arange(n)
    b = []
    for i in bit_reverse_traverse(index):
        b.append(l[i])      
    return b

The first function bit_reverse_traverse is a generator, which is a sortof functor (by which I mean function with internal state, not the category-theoretic kind) that behaves like an iterator. However instead of creating the entire array in memory and feeding it out one element at a time, generators rely on internal state (using the yield statement, which acts like a stack frame freezer) to synthesize a new element each time their next method is called. Hence they are much more efficient than normal iterators (at the cost of not being random access) and enable one to do things like bit-reverse a 65,536 element array on a laptop reasonably quickly. This generator works a bit like merge-sort: by recursively iterating over copies of itself applied to 2 sub-arrays of even and odd-numbered elements (accessible using numpy syntax a[even_index]) until the arrays are of size 1, then it just returns the element. The second function get_bit_reversed_list actually performs the bit-reversal on a list by iterating over the list using the generator and appending each array element returned by the yield statements.

One can do all sorts of interesting things with generators- usually with some kindof explicit recursion (indeed they are conceptually quite similar to basic streams in Scheme), but not always. For example you can easily phrase the Sieve of Erastosthenes algorithm for generating prime numbers as a generator:


from numpy import any

def primes():
    n = 2
    p = []
    while True:
        if not any( n % f == 0 for f in p ): 
            yield n
            p.append( n )
        n += 1

As you can see the code isn’t recursive.

Anyways, after creating some lists this way I converted them into MIDI data and played them on a sampling synthesizer. Here is the result.

static functions in c

November 7th, 2009

Today I was reading the slides from Felix von Leitner’s talk on C compiler optimizations (given at the recent Linux Kongress). Basically his point is that there is no point in making your code confusing in an attempt to optimize it b/c usually the resulting object code will not be faster. I’m not sure how true that is in general (it doesn’t seem to be the case with OpenCL at least in my experience), but I really like his point. The slides are also a good read if you like learning semi-obscure C tricks. For example if you want to inline a function in C its best to actually not use the inline keyword. This ‘destroys code locality’ according to Felix which Im assuming means that it can cause pointers to what is apparently the same function to compare not equal to one another. However what you can do is declare them static if you’re calling them exactly once. In that case the compiler will inline the function. For example, when compiled with the command gcc -S -O2, the code:

static long abs2(long x) {
return x>=0?x:-x;
} /* Note: > vs >= */

long abs3(long x) {
return x>=0 ? x : -x;
}

int main(int argc, char *argv[]) {
int x = 11;
int y;

y = abs2(x);
y = abs3(x);

return 0;

}

will produce assembly that contains no definition of abs2. However if you add a second call to abs2 and look at the assembly again you’ll find that all of a sudden you have an extra:


_abs2:
LFB2:
  pushq  %rbp
LCFI0:
  movq  %rsp, %rbp
LCFI1:
  movq  %rdi, %rdx
  sarq  $63, %rdx
  movq  %rdx, %rax
  xorq  %rdi, %rax
  subq  %rdx, %rax
  leave
  ret
LFE2:
  .align 4,0x90

cray vs. apple

November 7th, 2009

When he was told that Apple Computer had just bought a Cray to help design the next Apple Macintosh, Cray commented that he had just bought a Macintosh to design the next Cray.

- According to Jim Gray (quoted by C. Gordon Bell in his “Seymour Cray Perspective”)

SIAM is holding a conference on parallel computing in Seattle next February. I’m pretty psyched about it mostly because I’ve been sinking a lot of time into GPU-based stream computing lately and from a cursory look at the program I’d say that GPGPU is going to be the subject of a lot of interest.

For example here is an abstract from one of the Joint JSIAM — SIAM Lectures (sortof amusingly entitled: GPU Acceleration: A Fad or the Yellow Brick Road onto Exascale?):

“Since the first commodity x86 cluster Wigraf achieving paltry 10s~100s Megaflops in 1994, we have experienced several orders of magnitude boost in performance. However, the first Petaflop was achieved with the LANL RoadRunner, a Cell-based “accelerated” cluster, and in 2010 we may see the first (GP)GPU-based cluster reaching Petaflops. Do such non-x86“accelerator” merely push the flops superficially, or are they fundamental to scaling? Based on experiences from TSUBAME, the first GPU-accelerated cluster on the Top500, we show that GPUs not only achieve higher performance but also better scaling, and in fact their true nature as multithreaded massively-parallel vector processor would be fundamental for Exascale. Such results are being reflected onto the design of TSUBAME2.0 and its successors.”

In general the computer hardware industry is unpredictable and early adopters often get burned. How widespread GPGPU eventually becomes still remains to be seen, but such talk must be encouraging to anyone who has invested time into constructing and implementing algorithms on GPU architectures. Anyways, I’m hoping to make the trip up and do a poster presentation of some of the GPU-based matrix compression work I’ve been doing.