Yale DB

1. Problem

Large-scale analytical data management (OLAP) on commodity clusters

2. Challenges

  • Performance and efficiency
    • Hadoop
      • is not designed for structured data analysis
      • scales well
      • is open source and free of charge
  • Scalability, fault tolerance, and flexibility
    • Previous parallel databases were deployed on tens of nodes, not thousands of nodes.
    • They do not scale well in the presence of failures and heterogeneous machines, and their performance at that scale is untested.

3. Solutions

  • MapReduce (Hadoop) + parallel db (or single-node DBs) = HadoopDB
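
The combination above can be illustrated with a toy sketch. This is not HadoopDB's actual code: in-memory sqlite3 databases stand in for the single-node databases, plain Python functions stand in for Hadoop's map and reduce phases, and the `sales` table and the helper names (`make_node`, `map_phase`, `reduce_phase`) are hypothetical. The idea it tries to show is that each node answers the SQL query over its own partition, and the MapReduce layer only merges the partial results.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one single-node database holding a partition of the data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(conn):
    """Push the aggregation down to the node's local database."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return conn.execute(query).fetchall()


def reduce_phase(partials):
    """Merge the per-node partial sums into global totals."""
    totals = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


# Two nodes, each holding a different partition of the same table.
nodes = [
    make_node([("east", 10), ("west", 5)]),
    make_node([("east", 7), ("north", 3)]),
]
totals = reduce_phase(map_phase(node) for node in nodes)
```

Here `totals` maps each region to its sum across both partitions, although no single database ever saw the whole table.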

MPI-SWS, UPenn, Duke, Akamai, paper, slides, video

1. Problems

In hybrid CDN designs (servers with assisting peers), malicious clients can cause significant accounting inaccuracies.

2. Challenges

The infrastructure cannot control P2P communication among malicious clients (peers). Even if the infrastructure provides signed metadata and a fallback (so peers cannot mishandle content), malicious clients can still:

  • Affect service quality
  • Misreport P2P transfers

For example, an inflation attack can be mounted with a fake download report.

3. Solutions

Reliable Client Accounting (RCA)

From MPI-SWS, Germany, paper and slides

1. Problem

Protect user privacy in distributed systems from being leaked through statistical queries.

2. Challenges

The most direct solutions are:

  1. Anonymize the user data and add noise to it.

    [-] poor utility; the data can be de-anonymized

  2. Differential privacy: add noise to the answers of queries.

    [-] scale, churn tolerance, malicious clients
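
To make "add noise to the answers of queries" concrete, here is a minimal sketch of the textbook centralized Laplace mechanism for a counting query. It is not the PDDP protocol, which is distributed. The function names, the sample `ages` data, and the choice of `epsilon` are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values, predicate, epsilon, rng):
    """Answer a counting query with epsilon-differential privacy.

    Adding or removing one user changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)           # fixed seed only so the sketch is repeatable
ages = [23, 35, 41, 29, 52, 38, 47]  # hypothetical user data
answer = noisy_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

The true count here is 3. A single answer is deliberately imprecise, but the noise has zero mean, so averages over many queries concentrate around the true value. That is also why a real system has to bound how many times the same data can be queried.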

3. Solution

PDDP: Practical Distributed Differential Privacy.

Yale Dedis

1. Efficient System-Enforced Deterministic Parallelism (OSDI’10)

Parallelism introduces 1) non-determinism and 2) data races (heisenbugs). Determinism means that a given input always produces the same output. In other words, input alone determines the output, regardless of extrinsic events such as the OS’s thread scheduling.

To achieve determinism,

Important update on 09/14/2013. I implemented the flow-based scheduler in this paper on Hadoop – H-Quincy. For a much more detailed discussion, please visit this link.

Microsoft, paper, video, slides

1. Problem

How to achieve fair scheduling? In other words, a guy who gets up early and submits huge tasks to a cluster should not monopolize most of the computing resources, and other users' jobs should not be ignored. Otherwise, it is unfair to the rest of the cluster's users.

Fair sharing of the cluster resources

Job x takes t seconds when running exclusively on the cluster. When the cluster has J jobs, x should take no more than Jt seconds.

N computers and J jobs: each job gets at least N/J computers.
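
The two fairness conditions above are simple arithmetic, sketched below. The function names and the sample numbers are hypothetical and only restate the definitions. This is not the scheduler itself.

```python
def min_fair_share(num_computers, num_jobs):
    """Each of J jobs should get at least N / J computers."""
    return num_computers / num_jobs


def fair_runtime_bound(exclusive_seconds, num_jobs):
    """A job that takes t seconds alone should take at most J * t seconds with J jobs."""
    return num_jobs * exclusive_seconds


def is_fair(exclusive_seconds, observed_seconds, num_jobs):
    """Check an observed runtime against the J * t bound."""
    return observed_seconds <= fair_runtime_bound(exclusive_seconds, num_jobs)


share = min_fair_share(num_computers=100, num_jobs=4)          # 25.0 computers per job
bound = fair_runtime_bound(exclusive_seconds=60, num_jobs=4)   # 240 seconds
```

With 100 computers and 4 jobs, a job that takes 60 seconds alone is treated fairly if it finishes within 240 seconds, so `is_fair(60, 200, 4)` holds and `is_fair(60, 300, 4)` does not.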

2. Challenges



UCB, paper

1. Problem

Browsing web pages is slow on smartphones because of their limited CPU resources. The power wall forces hardware architects to spend growing transistor counts on parallel performance rather than sequential performance, so the authors introduce a parallel mobile browser.

2. Challenges


