August 16, 2016
Pradeep Subedi, Ping Huang, Tong Liu (Virginia Commonwealth University); Joseph Moore, Stan Skelton (NetApp, Inc.); Xubin He (Virginia Commonwealth University).
2016 International Conference on Parallel Processing (ICPP 2016), Philadelphia, PA, USA.
Cloud file systems such as Hadoop have become the norm for handling big data because they scale easily and distribute data across many storage nodes. However, these systems are prone to failures, and data must be recovered whenever a failure is detected. During temporary failures, MapReduce jobs and file system clients satisfy read requests through degraded reads, which reconstruct the unavailable data from the surviving blocks. We argue that two practices place a heavy strain on the system's network resources and increase job execution time: recovered data is not shared across degraded reads, and only the requested data block is recovered. To address this, we propose CoARC (Co-operative, Aggressive Recovery and Caching), a new mechanism for recovering unavailable data during degraded reads in distributed file systems. The main idea is to recover not only the requested data block but also the other temporarily unavailable blocks in the same stripe, and to cache them on a separate data node. We also propose LRF (Least Recently Failed), a cache replacement algorithm designed for such recovery caches. Finally, we show that CoARC significantly reduces network usage and job runtime in erasure-coded Hadoop.
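To make the mechanism concrete, here is a minimal Python sketch of aggressive recovery combined with a shared recovery cache. It is an illustration under stated assumptions, not the authors' implementation. The names `LRFCache`, `degraded_read`, `BlockId`, and `Decoder` are hypothetical. The erasure decoder is passed in as a callable instead of being implemented. The abstract does not spell out the LRF policy, so the sketch assumes one plausible reading: when the cache is full, evict the block whose source node failed earliest.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical block naming scheme: (stripe id, position within the stripe).
BlockId = Tuple[int, int]
# Hypothetical erasure decoder: given a stripe id and the list of its
# unavailable blocks, return reconstructed bytes for each of them.
Decoder = Callable[[int, List[BlockId]], Dict[BlockId, bytes]]


class LRFCache:
    """Recovery cache kept on a separate data node, using an ASSUMED reading
    of LRF: when full, evict the block whose source node failed earliest."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.data: Dict[BlockId, bytes] = {}
        self.failed_at: Dict[BlockId, float] = {}

    def get(self, block: BlockId) -> Optional[bytes]:
        return self.data.get(block)

    def put(self, block: BlockId, payload: bytes, failed_at: float) -> None:
        if block not in self.data and len(self.data) >= self.capacity:
            # Evict the entry with the oldest failure timestamp
            # (ties go to the earliest-inserted entry).
            victim = min(self.failed_at, key=self.failed_at.__getitem__)
            del self.data[victim]
            del self.failed_at[victim]
        self.data[block] = payload
        self.failed_at[block] = failed_at


def degraded_read(
    cache: LRFCache,
    block: BlockId,
    stripes: Dict[int, List[BlockId]],
    failed_at: Dict[BlockId, float],
    decode: Decoder,
) -> bytes:
    """Serve a read for an unavailable block (it must be a key of failed_at)."""
    cached = cache.get(block)
    if cached is not None:
        return cached  # an earlier degraded read already paid the recovery cost
    stripe_id = block[0]
    # Aggressive recovery: rebuild every unavailable block in the stripe,
    # not only the one that was requested, and share them via the cache.
    missing = [b for b in stripes[stripe_id] if b in failed_at]
    recovered = decode(stripe_id, missing)
    for b in missing:
        cache.put(b, recovered[b], failed_at[b])
    return recovered[block]


if __name__ == "__main__":
    calls: List[int] = []

    def fake_decode(stripe_id: int, missing: List[BlockId]) -> Dict[BlockId, bytes]:
        calls.append(stripe_id)  # each call stands for one round of stripe reads
        return {b: f"blk{b}".encode() for b in missing}

    stripes = {0: [(0, i) for i in range(4)], 1: [(1, i) for i in range(4)]}
    failed_at = {(0, 1): 10.0, (0, 2): 10.0, (1, 3): 20.0}
    cache = LRFCache(capacity=2)
    degraded_read(cache, (0, 1), stripes, failed_at, fake_decode)
    degraded_read(cache, (0, 2), stripes, failed_at, fake_decode)  # cache hit
    print(len(calls))  # 1
    degraded_read(cache, (1, 3), stripes, failed_at, fake_decode)
    print(sorted(cache.data))  # [(0, 2), (1, 3)]
```

In the demo, the read of block (0, 2) is served from the cache, so the decoder runs once for stripe 0 instead of twice. That avoided round of stripe reads is the kind of network saving the abstract argues for. The later insertion of (1, 3) then evicts (0, 1), whose node failed earliest under the assumed LRF rule.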
The definitive version of the paper can be found at: http://dx.doi.org/10.1109/ICPP.2016.40.