For a DBMS that provides support for a declarative query language like SQL, the query optimizer is a crucial piece of software. The declarative nature of a query allows it to be translated into many equivalent evaluation plans. The process of choosing a suitable plan from all alternatives is known as query optimization. The basis of this choice are a cost model and statistics over the data. Essential for the costs of a plan is the execution order of join operations in its operator tree, since the runtime of plans with different join orders can vary by several orders of magnitude. An exhaustive search for an optimal solution over all possible operator trees is computationally infeasible. To decrease complexity, the search space must be restricted. Therefore, a well-accepted heuristic is applied: All possible bushy join trees are considered, while cross products are excluded from the search.
There are two efficient approaches to identify the best plan: bottom-up and top-down join enumeration. But only the top-down approach allows for branch-and-bound pruning, which can improve compile time by several orders of magnitude, while still preserving optimality.
Hence, this thesis focuses on the top-down join enumeration. In the first part, we present two efficient graph-partitioning algorithms suitable for top-down join enumeration. However, as we will see, there are two severe limitations: The proposed algorithms can handle only (1) simple (binary) join predicates and (2) inner joins. Therefore, the second part adopts one of the proposed partitioning strategies to overcome those limitations. Furthermore, we propose a more generic partitioning framework that enables every graph-partitioning algorithm to handle join predicates involving more than two relations, and outer joins as well as other non-inner joins. As we will see, our framework is more efficient than the adopted graph-partitioning algorithm. The third part of this thesis discusses the two branch-and-bound pruning strategies that can be found in the literature. We present seven advancements to the combined strategy that improve pruning (1) in terms of effectiveness, (2) in terms of robustness and (3), most importantly, avoid the worst-case behavior otherwise observed.
Different experiments evaluate the performance improvements of our proposed methods. We use the TPC-H, TPC-DS and SQLite test suite benchmarks to evaluate our joined contributions.
Inhaltsverzeichnis (Table of Contents)
- Zusammenfassung
- Abstract
- 1 Einleitung
- 2 Grundlagen
- 2.1 Relationale Algebra
- 2.2 Anfrageoptimierung
- 2.3 Top-Down Join Enumeration
- 2.4 Branch-and-Bound Pruning
- 3 Algorithmen für Top-Down Join Enumeration
- 3.1 Partitionierung von Graphen
- 3.1.1 Der Greedy-Algorithmus
- 3.1.2 Der Greedy-Algorithmus mit dynamischer Partitionierung
- 3.2 Erweiterung der Partitionierungsstrategien
- 3.2.1 Der Greedy-Algorithmus für komplexe Join-Prädikate
- 3.2.2 Ein generisches Framework für Top-Down Join Enumeration
- 4 Optimierung der Branch-and-Bound Pruning-Strategien
- 4.1 Branch-and-Bound Pruning-Strategien
- 4.1.1 Kosten-basierte Pruning
- 4.1.2 Kardinalitäts-basierte Pruning
- 4.2 Verbesserungen der Branch-and-Bound Pruning-Strategien
- 4.2.1 Kombination von Pruning-Strategien
- 4.2.2 Pruning im Greedy-Algorithmus
- 4.2.3 Optimierungen der Kosten-basierten Pruning-Strategie
- 4.2.4 Optimierungen der Kardinalitäts-basierten Pruning-Strategie
- 5 Experimentelle Evaluierung
- 5.1 Evaluierung der Partitionierungsalgorithmen
- 5.2 Evaluierung der Branch-and-Bound Pruning-Strategien
- 6 Zusammenfassung und Ausblick
Zielsetzung und Themenschwerpunkte (Objectives and Key Themes)
The primary objective of this dissertation is to improve the efficiency of top-down join enumeration in query optimization for database management systems. This is achieved by developing novel algorithms and enhancing existing methods for graph partitioning and branch-and-bound pruning.
- Efficient algorithms for graph partitioning in the context of top-down join enumeration
- Handling complex join predicates and outer joins in top-down join enumeration
- Optimization and enhancement of existing branch-and-bound pruning strategies
- Experimental evaluation of the proposed algorithms and strategies using various benchmarks
- Analysis and comparison of the performance improvements achieved through the proposed methods
Zusammenfassung der Kapitel (Chapter Summaries)
- Chapter 1: Einleitung This chapter introduces the topic of query optimization and its importance for database management systems. It highlights the limitations of existing methods for top-down join enumeration, particularly in handling complex join predicates and outer joins. It also introduces the objectives and key themes of the dissertation.
- Chapter 2: Grundlagen This chapter provides essential background information on relational algebra, query optimization, top-down join enumeration, and branch-and-bound pruning. It lays the foundation for the subsequent chapters by defining key concepts and terminology.
- Chapter 3: Algorithmen für Top-Down Join Enumeration This chapter presents two efficient graph-partitioning algorithms designed for top-down join enumeration. It then extends these algorithms to handle complex join predicates and outer joins, introducing a generic framework for top-down join enumeration.
- Chapter 4: Optimierung der Branch-and-Bound Pruning-Strategien This chapter explores existing branch-and-bound pruning strategies and proposes improvements to their combination. It presents optimizations for both cost-based and cardinality-based pruning techniques, aiming to enhance their effectiveness, robustness, and efficiency.
- Chapter 5: Experimentelle Evaluierung This chapter presents experimental results evaluating the performance of the proposed algorithms and strategies. It uses benchmark datasets like TPC-H, TPC-DS, and SQLite to measure the efficiency and effectiveness of the developed methods, showing significant improvements in query optimization performance.
Schlüsselwörter (Keywords)
This dissertation focuses on query optimization, specifically on improving the efficiency of top-down join enumeration. Key areas of research include graph partitioning, complex join predicates, outer joins, branch-and-bound pruning, and experimental evaluation of performance improvements. The work contributes to advancements in the area of database management systems, particularly in the optimization of query processing.
- Quote paper
- Pit Fender (Author), 2014, Algorithms for Efficient Top-Down Join Enumeration, Munich, GRIN Verlag, https://www.grin.com/document/274543