
Parasitic Computation and JavaScript

by Jonas Huckestein on September 2nd, 2010

Parasitic computing is a technique [...] that uses the legitimate function of computing hosts to perform some other computation.

From Gian David Perrone's Master's Thesis [PDF]

According to a Nielsen report, broadband-connected Americans spend 2.6B hours online every month. That works out to roughly 300,000 computing-years per month (2.6 billion hours divided by 8,760 hours in a year) in the US alone, while people are actually in front of the screen. The global internet population is 2B people! That is a lot of largely unused processing power.

There are projects that use idle processors to solve difficult problems (SETI@Home, Rosetta@Home, SuperDonate, etc.). Most of those projects are based on BOINC and require the installation of client software. All BOINC projects have a combined all-time user base of 2M users. That is 1/1000 of the internet population (probably less, given that BOINC runs on servers as well). For a mind-blowing list of volunteer computing articles, click here.

I think we can do better.

Parasitic JavaScript and MapRejuice

This weekend we (@shiftb, @ryanmcgrath, and I) built MapRejuice for the Node Knockout competition. I finally found some time to reflect on what we learned and where we're at, so here it is.

MapRejuice is a distributed Map Reduce implementation in JavaScript. A website owner can insert a small piece of code into his website, and his visitors will start cranking away at MapRejuice jobs. This computing power can be sold and the webmaster gets a cut. Goodbye advertisements? Not quite yet. There are a few things to figure out first.
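To make that concrete, here is a rough sketch of what the embed snippet could look like. Everything in it (the script path, the message format, the worker file) is made up for illustration and is not the actual MapRejuice code:

    <!-- hypothetical snippet a site owner would paste into their page -->
    <script>
      (function () {
        // Spawn a sandboxed Web Worker so the computation can't touch the host page.
        var worker = new Worker('/maprejuice/worker.js');

        // The host page only relays status messages; the heavy lifting happens
        // inside the worker.
        worker.onmessage = function (event) {
          console.log('MapRejuice status:', event.data);
        };

        worker.postMessage({ cmd: 'start', site: document.location.host });
      })();
    </script>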

Distributed computing in JavaScript has been attempted numerous times before.

The problems commonly associated with Parasitic JavaScript are:

  • Security: The worker scripts could modify the host site. Also, the data and algorithms used in the computation are visible to the clients.
  • Performance: A few years ago JavaScript implementations were still rather slow.
  • Latency: Most distributed computing applications are not a good fit because the input data is so large. The latency from sending this data to workers would kill any large-data computation.
  • Reliability: Workers in this kind of setup are not trustworthy and fail often (when the user navigates away from the host site).
  • Restrictiveness: The in-browser JavaScript runtime is not sufficient for complex calculations. There is no binary protocol and data cannot be encrypted. Data cannot be streamed. Data cannot be requested and sent back across different domains.

Using the upcoming Web Workers standard makes it impossible for workers to modify the host site, and the performance of modern JavaScript engines with JIT compilers is not too bad. Some of the restrictiveness may be solved by technological advances, but let's not count on that.
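To illustrate the sandboxing, here is a minimal sketch of what the worker side might look like. The file name and message format are assumptions, not the real MapRejuice worker:

    // maprejuice-worker.js (hypothetical) -- runs in an isolated Web Worker scope.
    // There is no `document` or `window` in here, so the host page cannot be modified.
    self.onmessage = function (event) {
      if (event.data.cmd !== 'start') return;

      // Toy workload standing in for a real map task.
      var sum = 0;
      for (var i = 0; i < 10000000; i++) {
        sum += Math.sqrt(i);
      }

      // Results can only leave the worker via postMessage.
      self.postMessage({ done: true, result: sum });
    };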

What now?

So what have we got? We can run jobs on large numbers of unreliable, untrustworthy workers with slow transfer speeds in a restricted, sandboxed environment. Now we're looking for a class of problems that are well suited to that scenario.

Most Map Reduce jobs seem to run some sort of analysis on terabytes of data with chunk sizes of 64MB. That’s not going to work in our case. But what if the data sizes were generally small, or the data was publicly available, or the data could be randomly generated? How about a distributed count of words in Wikipedia? Or a Monte Carlo simulation for investment portfolios (thanks Tobias!)? Or an analysis of a person’s interests and habits on the internet?
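A Monte Carlo job fits these constraints nicely because the workers can generate their own input data. As a sketch (the job format below is invented for illustration, not the MapRejuice API), estimating pi could look like this:

    // Hypothetical job definition: estimate pi by sampling random points in the unit square.
    // Each map call only needs a sample count as input, so almost no data has to be transferred.
    var piJob = {
      map: function (samples) {
        var hits = 0;
        for (var i = 0; i < samples; i++) {
          var x = Math.random(), y = Math.random();
          if (x * x + y * y <= 1) hits++;
        }
        return { hits: hits, samples: samples };
      },

      // The reduce step just sums the partial counts from all workers.
      reduce: function (partials) {
        var hits = 0, samples = 0;
        partials.forEach(function (p) { hits += p.hits; samples += p.samples; });
        return 4 * hits / samples;
      }
    };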

As I said, there are a few things to figure out, and the technology may not be quite there yet or widespread enough. But that doesn't mean it's not worth a shot. And before we know it the next wave of technologies such as Google's Native Client will be available and we can run BOINC in everybody's browsers :)

We’d appreciate any input on the topic since we’re new to distributed computing.

Update 1: Voting at Node Knockout has ended and we came in second in the innovation category and seventh overall. Not too bad :) Thanks for all the support and congratulations to the winners!

Update 2: We got covered in the MIT Technology Review (errr, the blog at least). We also got a shout-out on The Changelog (starts at 8:30).



6 Comments
  1. I made jsdc a few years ago, and more recently I made http://distributed-pi.appspot.com/static/beta.html (Someone else made http://www.csc.liv.ac.uk/~acollins/pi which does pretty much the same thing with more pretty charts and a non-broken implementation).

    • jonas.huckestein

      Updated with antimatter's links. Thanks!

  2. How about protein folding? The Folding@Home project is a good place to start looking at this problem. It is in a class of problems that don't require much data, but solving them requires a massive search over a combinatorial space. If that search can be made parallel, a MapRejuice approach might be useful.

    Along those lines you may want to look into SAT problems. Perhaps the SAT competitions site is a good place to start: http://www.satcompetition.org/

    • Hey Mark, those are great ideas! The SAT competition would have been a fantastic proof of concept. Ah well :) Next time.

      How long would a typical Folding@Home computation run? If a computation runs longer than a visitor remains on a website, then the time will have been wasted.

  3. Aaron

    Fantastic!

    You get this going and I will click on your processing URL before I go to bed and let it process all night. I won't load special software, but I want to help.

    BTW, a beautiful looking graphic while it’s working always helps.

  4. Tim Kelley

    Hey Jonas -

    Sent you a message on Facebook. Trying to get ahold of you about some code writing, but I’m not sure how to find you! Katharina Juranek recommended me to you. If you could, check that message and get back to me?

    Tim Kelley
