Designing a Fast File System Crawler with Incremental Differencing

Date

December 15, 2012

Author

Tim Bisson, Yuvraj Patel, and Shankar Pasupathy.

Search engines for storage systems rely on crawlers to gather the list of files that need to be indexed. The recency of an index is determined by the speed at which this list can be gathered. While there has been a substantial amount of literature on building efficient web crawlers, there is very little literature on file system crawlers. In this paper, we discuss the challenges in building a file system crawler. We then present the design of two file system crawlers: the first uses the standard POSIX file system API but carefully controls the amount of memory and CPU that it uses; the second leverages modifications to the file system's internals, and a new API called SnapDiff, to detect modified files rapidly. For both crawlers, we describe the incremental differencing design: the method used to produce a list of changes between a previous crawl and the current point in time.
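To make the idea of incremental differencing concrete, the following is a minimal sketch in Python. It is not the authors' design and does not use SnapDiff. It assumes that a change can be detected by comparing each file's modification time and size between two crawls, and the names `crawl` and `diff_snapshots` are hypothetical. Unlike the POSIX crawler described in the paper, it makes no attempt to bound memory or CPU use, since it holds the whole snapshot in memory.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical snapshot format: path -> (mtime in nanoseconds, size in bytes).
Snapshot = Dict[str, Tuple[int, int]]


def crawl(root: str) -> Snapshot:
    """Walk the tree under root with POSIX-style calls and record file metadata."""
    snapshot: Snapshot = {}
    stack = [root]  # explicit stack avoids recursion limits on deep trees
    while stack:
        directory = stack.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            st = entry.stat(follow_symlinks=False)
                            snapshot[entry.path] = (st.st_mtime_ns, st.st_size)
                    except OSError:
                        continue  # entry vanished or is unreadable; skip it
        except OSError:
            continue  # directory vanished or is unreadable; skip it
    return snapshot


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, List[str]]:
    """List the paths created, modified, or deleted between two crawls."""
    created = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"created": created, "modified": modified, "deleted": deleted}
```

An indexer could persist the snapshot from each crawl and pass the previous and current snapshots to `diff_snapshots` to decide which files to re-index. Note that this sketch still visits every file on every crawl; avoiding that full traversal is the motivation the abstract gives for the second, SnapDiff-based crawler.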

In ACM SIGOPS Operating Systems Review, Vol. 46, No. 3, December 2012, pp. 11-19

Resources

A copy of the paper is attached to this posting. The definitive version of the paper can be found at: https://dl.acm.org/citation.cfm?doid=2421648.2421652.

FS_crawler_Bisson.pdf