Parasitic computing is a technique [...] that uses the legitimate function of computing hosts to perform some other computation.
According to a Nielsen report, broadband-connected Americans spend 2.6 billion hours online every month. At roughly 8,760 hours per year, that works out to about 300,000 computing-years per month in the US alone, and that only counts time when people are actually in front of the screen. The global internet population is around 2 billion people! That is a lot of largely unused processing power.
There are projects that use idle processors to solve difficult problems (SETI@Home, Rosetta@Home, SuperDonate, etc.). Most of those projects are based on BOINC and require the installation of client software. All BOINC projects combined have an all-time user base of about 2 million users. That is 1/1000 of the internet population (probably less, given that BOINC runs on servers as well). For a mind-blowing list of volunteer computing articles, click here.
I think we can do better.
- http://jsdc.appspot.com/ is a distributed reverse-hashing implementation that can be embedded into websites. More recently the author built http://distributed-pi.appspot.com/static/beta.html
- Andrew Collins uses the BBP algorithm to compute digits of pi in visitors’ browsers; the algorithm automatically verifies the digits (a rough sketch of BBP digit extraction follows after this list).
- Misco, a MapReduce framework for mobile devices. This is actually implemented in Python, but the mobile aspect makes it interesting.
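For context on why BBP is such a good fit here: the formula lets a worker compute the hexadecimal digit of pi at an arbitrary position without knowing any of the earlier digits, so tasks are tiny, independent, and easy to cross-check by assigning overlapping positions to different workers. Here is a minimal double-precision JavaScript sketch of the idea (my own illustration, not Collins’ code, and only reliable for modest digit positions):

```javascript
// (base^exp) mod m using square-and-multiply; safe in doubles while m*m < 2^53
function powMod(base, exp, m) {
  let result = 1 % m;
  base = base % m;
  while (exp > 0) {
    if (exp % 2 === 1) result = (result * base) % m;
    base = (base * base) % m;
    exp = Math.floor(exp / 2);
  }
  return result;
}

// fractional part of sum_k 16^(d-k) / (8k + j), the building block of the BBP formula
function bbpSeries(j, d) {
  let s = 0;
  for (let k = 0; k <= d; k++) {            // "left" sum, done with modular arithmetic
    const m = 8 * k + j;
    s += powMod(16, d - k, m) / m;
    s -= Math.floor(s);                      // keep only the fractional part
  }
  for (let k = d + 1; k <= d + 100; k++) {   // rapidly vanishing tail of the series
    s += Math.pow(16, d - k) / (8 * k + j);
  }
  return s - Math.floor(s);
}

// hexadecimal digit of pi at 0-based position d after the point
function piHexDigit(d) {
  const x = 4 * bbpSeries(1, d) - 2 * bbpSeries(4, d)
              - bbpSeries(5, d) - bbpSeries(6, d);
  return Math.floor(16 * (x - Math.floor(x))).toString(16);
}

// pi = 3.243F6A8885... in hex, so piHexDigit(0) === "2", piHexDigit(1) === "4", etc.
```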
- Security: The worker scripts could modify the host site. Also, the data and algorithms used in the computation are visible to the clients. (A sandboxing sketch follows after this list.)
- Latency: Most distributed computing applications are not a good fit because the input data is so large. The latency from sending this data to workers would kill any large-data computation.
- Reliability: Workers in this kind of setup are not trustworthy and fail often (e.g. when the user navigates away from the host site).
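To make the security and reliability points concrete: one common approach in a browser setup (my assumption, not something the projects above necessarily do) is to run the job code inside a Web Worker, which has no access to the page’s DOM, and to let the server re-issue any task whose result never comes back. A rough sketch, with a hypothetical job.js and /submit-result endpoint:

```javascript
// --- host page ---------------------------------------------------------
// Run the job in a Web Worker: it cannot touch document/window, which limits
// what a malicious or buggy worker script can do to the hosting site.
const worker = new Worker('job.js');           // hypothetical job script

worker.onmessage = function (event) {
  // report the result to the coordinating server (hypothetical endpoint)
  fetch('/submit-result', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event.data)
  });
};

// hand the worker a small task; if the visitor navigates away, the worker
// dies with the page, so the server must be able to re-issue unfinished tasks
worker.postMessage({ taskId: 42, input: [3, 1, 4, 1, 5] });

// --- job.js (runs inside the worker, no DOM access here) ---------------
self.onmessage = function (event) {
  const { taskId, input } = event.data;
  const result = input.reduce((sum, x) => sum + x * x, 0);  // stand-in computation
  self.postMessage({ taskId: taskId, result: result });
};
```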
So what have we got? We can run jobs on large numbers of unreliable, non-trustworthy workers with slow transfer speeds in a restricted, sandboxed environment. Now we’re looking for a class of problems that are well suited for that scenario.
Most MapReduce jobs seem to run some sort of analysis on terabytes of data with chunk sizes of 64MB. That’s not going to work in our case. But what if the data sizes were generally small, or the data was publicly available, or the data could be randomly generated? How about a distributed count of words in Wikipedia? Or a Monte Carlo simulation for investment portfolios (thanks Tobias!)? Or an analysis of a person’s interests and habits on the internet?
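The portfolio simulation is a nice example of why this class of problems fits the constraints above: the input is a handful of parameters, the data is generated randomly on the worker, and the result is a tiny summary, so transfer latency hardly matters. A minimal sketch (the parameter names and the simple yearly compounding model are my own illustrative assumptions, not a real portfolio model):

```javascript
// standard normal sample via the Box-Muller transform
function randomNormal() {
  const u = 1 - Math.random();               // avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// one trial: compound a yearly return drawn from N(mean, volatility)
function simulateOnce(start, mean, volatility, years) {
  let value = start;
  for (let y = 0; y < years; y++) {
    value *= 1 + mean + volatility * randomNormal();
  }
  return value;
}

// the worker's whole job: run many trials and report a small summary
function monteCarloTask({ start, mean, volatility, years, trials }) {
  let total = 0;
  for (let i = 0; i < trials; i++) {
    total += simulateOnce(start, mean, volatility, years);
  }
  return { averageFinalValue: total / trials, trials: trials };
}

// e.g. monteCarloTask({ start: 10000, mean: 0.07, volatility: 0.15, years: 30, trials: 100000 })
```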
As I said, there are a few things to figure out, and the technology may not quite be there yet or be widespread enough. But that doesn’t mean it’s not worth a shot. And before we know it the next wave of technologies such as Google’s Native Client will be available and we can run BOINC in everybody’s browsers :)
We’d appreciate any input on the topic since we’re new to distributed computing.
Update 1: Voting at Node Knockout has ended and we came in second in the innovation category and seventh overall. Not too bad :) Thanks for all the support and congratulations to the winners!