Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters







To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, thereby increasing the completion times of jobs accessing this data even when there are machines with spare cycles in the cluster. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop and Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs.
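
The idea of replicating by popularity under a hard storage bound can be illustrated with a minimal sketch. This is not Scarlett's actual algorithm, which the abstract does not spell out. It assumes a uniform baseline of three replicas, a per-file estimate of concurrent accesses, and a greedy policy that serves the most popular files first. The names `FileStats`, `predicted_accesses`, `plan_replication`, and `BASE_REPLICAS` are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: uniform baseline replication factor (not stated in the abstract).
BASE_REPLICAS = 3


@dataclass
class FileStats:
    name: str
    size: int                # bytes consumed by one replica of the file
    predicted_accesses: int  # hypothetical estimate of concurrent accesses


def plan_replication(files, budget):
    """Return {file name: replica count} using at most `budget` extra bytes.

    Greedy sketch: visit files from most to least popular and give each one
    as many extra replicas as it wants and the remaining budget allows.
    """
    plan = {f.name: BASE_REPLICAS for f in files}
    remaining = budget
    for f in sorted(files, key=lambda f: f.predicted_accesses, reverse=True):
        # Target one replica per predicted concurrent access, never below base.
        wanted = max(BASE_REPLICAS, f.predicted_accesses)
        extra = wanted - BASE_REPLICAS
        # Cap the extra replicas by what the remaining budget can pay for.
        affordable = min(extra, remaining // f.size) if f.size > 0 else extra
        plan[f.name] += affordable
        remaining -= affordable * f.size
    return plan


files = [
    FileStats("hot.log", size=100, predicted_accesses=6),
    FileStats("warm.log", size=100, predicted_accesses=4),
    FileStats("cold.log", size=100, predicted_accesses=1),
]
plan = plan_replication(files, budget=350)
print(plan)
```

With a budget of 350 bytes, the hot file takes three extra replicas and exhausts what the budget can pay for, so the warm and cold files stay at the baseline. A real system would also need to decide where the extra replicas go and when to create them so that replication traffic does not disturb running jobs.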


