Yale DB

1. Problem

Large-scale analytical data management (OLAP) on commodity clusters

2. Challenges

  • Performance and efficiency
    • Hadoop
      • is not designed for structured data analysis
      • scales well
      • is open source and free
  • Scalability, fault tolerance, and flexibility
    • Previous parallel DBs scale to tens of nodes, not thousands of nodes.
    • They do not cope well with failures or heterogeneous machines, and their performance at that scale is untested.

3. Solutions

  • MapReduce (Hadoop) + parallel DB (or single-node DBs) = HadoopDB
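To make the "MapReduce + single-node DBs" split concrete, here is a toy Python sketch, not HadoopDB's actual code: each "node" is an in-memory SQLite database holding one partition, the SQL aggregate is pushed down into every node-local database (the map side), and a reduce step merges the partial results. The `sales` table, its columns, and all function names are hypothetical; SQLite merely stands in for whatever single-node DB each machine runs.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema for illustration: sales(region TEXT, revenue INTEGER).
Row = Tuple[str, int]


def make_node(rows: List[Row]) -> sqlite3.Connection:
    """Simulate one cluster node: a single-node DB holding one data partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn: sqlite3.Connection) -> List[Row]:
    """Map side: push the SQL aggregate down into the node-local database."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials: Iterable[List[Row]]) -> Dict[str, int]:
    """Reduce side: combine per-node partial sums into global totals."""
    totals: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


def run_query(nodes: List[sqlite3.Connection]) -> Dict[str, int]:
    """Run the aggregate on every node, then merge the partial results."""
    return reduce_phase(map_phase(conn) for conn in nodes)


if __name__ == "__main__":
    nodes = [
        make_node([("east", 10), ("west", 5)]),
        make_node([("east", 7)]),
        make_node([("west", 1), ("north", 4)]),
    ]
    print(run_query(nodes))
```

Pushing the GROUP BY into each local database lets a real DB engine do the structured-query work, while the MapReduce layer only coordinates the nodes and merges their partial aggregates.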

MPI-SWS, UPenn, Duke, Akamai, paper, slides, video

1. Problems

In hybrid CDN designs (servers assisted by peers), malicious clients can cause significant accounting inaccuracies.

2. Challenges

The infrastructure cannot control P2P communication between clients (peers), some of which may be malicious. Even if the infrastructure provides signed metadata and a fallback (so peers cannot tamper with content), malicious peers can still:

  • Affect service quality
  • Misreport P2P transfers

For example, in an inflation attack, a client submits a fake download report.

3. Solutions

Reliable Client Accounting (RCA)

From MPI-SWS, Germany, paper and slides

1. Problem

How to protect user privacy in distributed systems against leakage through statistical queries.

2. Challenges

The most direct solutions are:

  1. Anonymize user data and add noise to it.

    [-] reduced utility; anonymized data can be de-anonymized

  2. Differential privacy: add noise to the answers of queries.

    [-] scalability, churn tolerance, malicious clients
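The textbook way to "add noise to the answers of queries" is the Laplace mechanism. The sketch below applies it to a counting query; it is a generic illustration of differential privacy, not the PDDP protocol, and it assumes each user contributes one record so that the count changes by at most 1 when a user is added or removed (sensitivity 1). The names `laplace_noise` and `noisy_count` are hypothetical.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records: Iterable[T], predicate: Callable[[T], bool],
                epsilon: float, rng: random.Random) -> float:
    """Answer 'how many records satisfy predicate?' with epsilon-DP noise.

    Assumes one record per user, so the count's sensitivity is 1 and the
    noise scale is sensitivity / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 52, 67, 29, 38]
    # The true answer is 3; the result is 3 plus Laplace noise of scale 1/epsilon.
    print(noisy_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger privacy. Note that this centralized sketch sidesteps exactly the issues listed above (scale, churn, malicious clients), which arise once the noisy answer must be computed across many distributed users.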

3. Solution

PDDP: Practical Distributed Differential Privacy.

Yale Dedis

1. Efficient System-Enforced Deterministic Parallelism (OSDI’10)

Parallelism introduces 1) nondeterminism and 2) data races (heisenbugs). Determinism means that a given input always produces the same output. In other words, the input alone determines the output, regardless of extrinsic events such as the OS's thread scheduling.
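As an illustration of the definition only (a simplified sketch under our own assumptions, not the paper's full design): every task runs on a private copy of the shared state, so it never observes another task's writes, and at the join the changes are merged in a fixed task-index order, with two tasks writing the same key reported as a conflict. The function `fork_join` and its `seed` parameter are hypothetical; `seed` stands in for the OS scheduler, changing the execution order but not the result.

```python
import copy
import random
from typing import Callable, Dict, List

Workspace = Dict[str, int]
Task = Callable[[Workspace], None]


def fork_join(shared: Workspace, tasks: List[Task], seed: int) -> Workspace:
    """Run tasks on private copies, then merge their writes in a fixed order."""
    # Simulated scheduler: tasks execute in a seed-dependent order.
    order = list(range(len(tasks)))
    random.Random(seed).shuffle(order)

    # Fork: each task mutates only its own private snapshot, so there are
    # no data races between tasks.
    private: List[Workspace] = [{} for _ in tasks]
    for i in order:
        workspace = copy.deepcopy(shared)
        tasks[i](workspace)
        private[i] = workspace

    # Join: merge changes in task-index order, never in completion order.
    merged = dict(shared)
    writer: Dict[str, int] = {}
    for i, workspace in enumerate(private):
        for key, value in workspace.items():
            if key in shared and shared[key] == value:
                continue  # unchanged relative to the snapshot
            if key in writer:
                raise RuntimeError(
                    f"conflict: tasks {writer[key]} and {i} both wrote {key!r}")
            writer[key] = i
            merged[key] = value
    return merged


def task_a(ws: Workspace) -> None:
    ws["a"] = ws["x"] + 1


def task_b(ws: Workspace) -> None:
    ws["b"] = ws["x"] * 2


if __name__ == "__main__":
    for seed in range(3):
        print(fork_join({"x": 3}, [task_a, task_b], seed))  # same dict every time
```

With this model, a racy program, such as two tasks both incrementing `x`, does not produce a schedule-dependent heisenbug; it fails with the same conflict error on every run.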

To achieve determinism,

Important update on 09/14/2013: I implemented the flow-based scheduler from this paper on Hadoop as H-Quincy. For a much more detailed discussion, please visit this link.

Microsoft, paper, video, slides

1. Problem

How can we achieve fair scheduling? In other words, a user who arrives early and submits huge jobs to a cluster should not be able to monopolize most of the computing resources, and other users' jobs should not be starved. Otherwise, the cluster is unfair to all of its other users.

Fair sharing of the cluster resources

Job x takes t seconds when running exclusively on the cluster. When the cluster is running J jobs, x should take no more than Jt seconds.

N computers and J jobs: each job gets at least N/J computers.
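The fairness goal above can be illustrated with a max-min fair allocation. This is a generic water-filling sketch, not Quincy's flow-based scheduler: given N identical computers and each job's demand (a hypothetical input), it repeatedly splits the remaining computers evenly among the jobs that still want more, so every job gets at least ⌊N/J⌋ computers unless it asked for fewer. The function name `max_min_fair_share` is our own.

```python
from typing import List


def max_min_fair_share(n_computers: int, demands: List[int]) -> List[int]:
    """Max-min fair (water-filling) allocation of identical computers to jobs.

    demands[j] is how many computers job j could use.
    Each job ends up with at least min(demands[j], n_computers // J) computers.
    """
    alloc = [0] * len(demands)
    remaining = n_computers
    active = [j for j, d in enumerate(demands) if d > 0]
    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer computers left than hungry jobs: hand out the leftovers
            # one at a time in job-index order.
            for j in active[:remaining]:
                alloc[j] += 1
            break
        for j in active:
            give = min(share, demands[j] - alloc[j])
            alloc[j] += give
            remaining -= give
        # Jobs whose demand is met drop out; their unused share is re-split.
        active = [j for j in active if alloc[j] < demands[j]]
    return alloc


if __name__ == "__main__":
    # 10 computers, 3 jobs: the small job takes only what it needs (2),
    # and the two big jobs split the rest evenly.
    print(max_min_fair_share(10, [2, 8, 8]))  # [2, 4, 4]
```

If a job's running time scales inversely with the number of computers it holds, receiving at least about N/J of them is what keeps its completion time near the Jt bound stated above.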

2. Challenges

Traditionally,

I. The Uncle

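The first time I met Uncle Ami (Amittai F. Aviram), I was completely floored. You simply could not believe that this grandfatherly man in front of you was a computer science PhD student, and the first author of an OSDI'10 best paper (OSDI being a top conference in computer systems)! Keep in mind that while all people are created equal, conferences are not: even among top conferences, getting a paper into OSDI is far harder than getting one into VLDB, WWW, NSDI, and the like. Moreover, a systems paper cannot get by with deriving a formula and running a small experiment to validate it; you have to roll up your sleeves and actually implement the whole complex design. And this uncle, whose every word carried the air of a man who had achieved his goals and was ready to step back, was at that very moment defending his dissertation before professors decades younger than him! By coincidence, not long afterwards, when I wanted to see how a simple database is implemented, I happily found a B+ tree implementation in C, and there, in the upper-left corner, was Uncle Ami's name!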

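Oh, Uncle, you are well along in years, yet you can publish great papers and write great code. Where does that leave us youngsters who fancy ourselves code monkeys? Yuanfang, what do you make of this? Something here is fishy! So, with mixed feelings, I opened his CV. It turns out that thirty years ago he was already a tenured professor of English literature at the University of South Carolina, with a passion for writing poetry. Later, for reasons unknown, he developed a deep interest in computers. In 2004 he went to Columbia to study computer science as an undergraduate, and from there he could not stop: after eight years of hard training, he received his PhD from Yale a few months ago. He now works at MathWorks as a senior software engineer...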

UCB, paper

1. Problem

Web browsing is slow on smartphones because of their limited CPU resources. The power wall forces hardware architects to spend increases in transistor counts on improving parallel performance, not sequential performance. The authors therefore propose a parallel mobile browser.

2. Challenges

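Let me first revisit a familiar story and then explain what the Heshi Bi dilemma is. In the Spring and Autumn period, in the State of Chu, my home region, there lived a poor jade worker named Bian He. He dearly loved his country and his craft. Having identified a peerless piece of uncut jade, he offered it again and again to the bosses of the state. The court jade carvers did not recognize its worth and declared the thing an ordinary rock. The bosses were angry, and the consequences were severe: King Li had his left foot cut off, and King Wu had his right foot cut off. Bian He was heartbroken and wept tears of blood for three days and three nights. Fortunately, King Wen was kinder and finally agreed to cut the stone open and take a closer look. Sure enough, it was a priceless treasure, and it was given the fine name "Heshi Bi" (the Jade Disc of He). Incidentally, the disc was later made into the imperial seal of the King of Qin and afterwards fell into the hands of Liu Bang; the Three Kingdoms story includes the episode in which Yuan Shu, because of this imperial seal, is ruined and dies vomiting blood. The seal then passed through the Sui and Tang dynasties before finally vanishing in the chaos of war, having witnessed half of Chinese history.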

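The Heshi Bi dilemma is this. For Bian He as an individual, the story could hardly be more tragic; for the history of the jade, it could hardly be more glorious. The enormous gap between the two hung on a single thought: King Wen's sudden impulse to split open that utterly unremarkable stone, which changed the whole world. Sheldon would say: oh, Schrödinger's cat! Until you open the box, the cat is both dead and alive; the magic is that you don't know until you open it, and when you do, you're in for a shock. What's more, fine jade turns up once in a thousand years, and Bian He came upon it through skill and luck. I suspect even he was not sure deep down; otherwise, why not chip off part of the surface to show everyone? That is the Heshi Bi dilemma: if you merely claim to have jade, nobody believes you, and proving it costs you dearly. Not only does the process hang on a single thought, but the outcomes are distributed big at both ends and small in the middle: you either strike it rich or end up crippled. The bosses, meanwhile, are delighted whatever the result.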

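Everyone who wants to prove themselves faces their own Heshi Bi dilemma: how do you know I can't do it, and how do I know I can, in an era that has no shortage of talent?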

EOF