Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters

To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, thereby increasing the completion times of jobs accessing this data even when there are machines with spare cycles in the cluster. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop and Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by .

wirosa
