July 20, 2020
Douglas Santry, NetApp
12th USENIX Workshop on Hot Topics in Storage and File Systems July 2020 Online virtual conference
Binary comparison, the basis of the venerable B Tree, is perhaps the most successful operator for indexing data on secondary storage. We introduce a different technique, called Neural Trees, that is based on neural networks. Neural Trees increase the fan-out per byte of a search tree by up to 40% compared to B Trees. Increasing fan-out reduces memory demands and leads to increased cacheability while decreasing height and media accesses. A Neural Tree also permits search path layout policies that divorce a key’s value from its physical location in a data structure. This is an advantage over the total ordering required by binary comparison, which totally determines the physical location of keys in a tree. Previous attempts to apply machine learning to indices are based on learning the data directly, which renders insertion too expensive to be supported. The Neural Tree is a hybrid scheme using a tree of small neural networks to learn search paths instead of the data directly. Neural Trees can efficiently handle a general read/write workload. We evaluate Neural Trees with weeks of traces from production storage and SPC1 workloads to demonstrate their viability.
Check out the paper and the presentation at: https://www.usenix.org/conference/hotstorage20/presentation/santry.