1. Home
  2. /
  3. 10/21/2021: Learned Data Structures,...

10/21/2021: Learned Data Structures, Alexander Irribarra, University of Concepción

Speaker/presenter: Alexander Irribarra

i) Presentation description:

Learned data structures are a novel way to design data structures. By “learning” from the data, hidden patterns may be exploited to have faster operations while using less space, in contrast to traditional data structures which usually have to be specifically designed to exploit already known patterns. In general, this “learning” from the data involves applying machine learning techniques to obtain models that can predict the results of certain operations within a bounded error, and then use smaller traditional data structures to correct that error, such as the Recursive Model Index [1]. Despite this, the concept of when a data structure is “learned” is very broad, and there are learned data structures which use other techniques to exploit the patterns on the data.

Some examples of learned data structures include the Recursive Model Index [1], which allows for indexing sorted data, thus providing range and point search over the data, which are operations that can be supported with a B-Tree [2]. In fact, a B-Tree can be seen as a model which can, given a key, predict the position of that key on the database within a bounded error which corresponds to the pagesize. By creating a directed acyclic graph of models and training them on the data, the previously mentioned queries can be supported, although it does not allow for writes and updates. This weakness is covered by ALEX [3], which by providing some operations such as merging models, allocating gapped arrays for the data and splitting models, can support both read and write operations and be faster and smaller than a B+-Tree. Another example is presented in [4], in which the authors present an almost optimal size learned data structure for rank/select dictionaries, which is obtained by storing a Piecewise Linear Approximation of a mapping of the elements into the cartesian plane and the errors from the approximation. This is represented compactly using existing compact data structures for the approximations and for the errors. By doing so, this new representation can support the rank and select queries in a space and time very competitive with the state of the art solutions.

As this field of research is fairly recent, new ideas can be brought up to already existing learned data structures to augment their functionality. One example would be to add dynamism or persistency to existing solutions.

 

References:

[1] Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD ’18). Association for Computing Machinery, New York, NY, USA, 489–504. DOI:https://doi.org/10.1145/3183713.3196909

[2] Bayer R., McCreight E. (2002) Organization and Maintenance of Large Ordered Indexes. In: Broy M., Denert E. (eds) Software Pioneers. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-59412-0_15

[3]     Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In Proceedings of the 2020 ACM SIGMOD International

Conference on Management of Data (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA, 969–984. DOI:https://doi.org/10.1145/3318464.3389711

[4]     Boffa, Antonio, Paolo Ferragina, and Giorgio Vinciguerra. 2021. A “Learned” Approach to Quicken and Compress Rank/Select Dictionaries. Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX). Society for Industrial and Applied Mathematics, 2021. 46-59.DOI:https://doi.org/10.1137/1.9781611976472.4