
K-dimensional tree. Anatomy of a KD-tree. Quickly building a KD-tree

For a change, the beginning of the post is written in the spirit of "Sergei, you studied mathematics! Write rigorously." Don't let that put you off; further on everything is fine.
And I don't mean to claim much experience with this subject; I have rather little, but I was asked to write it up.

Suppose we have a set of possible elements X, a selected subset A, and, for a given element, the task of finding the elements of A "most similar" to it: say, the ten most similar ones, or all those whose "similarity" is at least a given threshold. Preferably without scanning all of A, which takes long.

The problem is a fine one and, most importantly, quickly solvable if the elements are numbers and the "similarity" is higher the closer the values are to each other: a sorted array plus binary search is enough, or a search tree (binary, n-ary, B-tree, it does not matter).

What allows the problem to be solved quickly in this case is the presence of an order on the set that is consistent with the "similarity". That is, if the sought element x < a and a < b, then x is more similar to a than to b. This lets us skip elements that are much larger or much smaller, since they are certainly worse matches.

If "similarity" is defined differently, for example through the Levenshtein distance between the decimal notations, the presence of an order stops helping. That is, the point is precisely the consistency of the one with the other, not the possibility of defining an order at all: "somehow" an order can always be defined, there is a theorem in set theory to that effect.

The trouble is that a suitable order does not always exist. The main task, discussed in what follows: the elements are points in the plane, and the similarity is higher the smaller the distance between the points.

In the case of n-dimensional space both the problem and the solutions generalize trivially; and it seems one can even go a little further, more on that below.

Ideas

The first thought that comes to the mind of a normal person who does not want to invent complicated things is "let's break it into cells". That is, the plane is divided into squares, the existing points are assigned to the corresponding squares; for a query point x we determine the square it lies in and search it and a few neighboring ones. This doesn't work very well. The trouble is that this is a "data-independent" way of partitioning: that makes it very simple, and for the same reason it can fit a particular set of points very poorly. For example, if 99% of the points turn out to be concentrated in one cell, you still have to scan them all, and we tried so hard.

Second thought: then let's make the cells finer in the places where there are more points. Fine. But these are half measures; let's carry the thought to its conclusion:

  • We start with one large cell.

  • If some cell contains too many points to scan, we divide it into two parts. To keep assigning points to cells simple, we divide by vertical and horizontal lines, for example alternating one and the other (exactly where to draw them is a separate question).

  • Carry out the previous step until all cells contain an acceptable number of points.

The result is an obvious hierarchical structure: top-level cells contain two smaller cells each (left and right, or top and bottom), and leaf cells contain the points. This is a two-dimensional KD-tree (KD stands for K-dimensional), a generalization of the binary tree to multidimensional space. If the dimension is greater than two, instead of straight lines there are planes (hyperplanes) perpendicular to the coordinate axes. The main idea, I hope, is clear.
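A minimal sketch of such a construction in C++ (the names, the median split and the leaf capacity are my own illustration choices, not a canonical recipe):

#include <algorithm>
#include <vector>

struct Pt { double x, y; };

struct KDNode {
    std::vector<Pt> pts;            // non-empty only in leaves
    KDNode* left  = nullptr;        // the "left/bottom" half
    KDNode* right = nullptr;        // the "right/top" half
};

const size_t LEAF_CAP = 16;         // the "acceptable number of points" per cell

KDNode* build(std::vector<Pt> pts, int depth = 0) {
    KDNode* node = new KDNode;
    if (pts.size() <= LEAF_CAP) {   // few enough points: this cell is a leaf
        node->pts = std::move(pts);
        return node;
    }
    bool byX = (depth % 2 == 0);    // alternate vertical and horizontal cuts
    auto mid = pts.begin() + pts.size() / 2;
    // draw the line through the median: one possible answer to
    // "where exactly to draw it"
    std::nth_element(pts.begin(), mid, pts.end(),
                     [byX](const Pt& a, const Pt& b) {
                         return byX ? a.x < b.x : a.y < b.y;
                     });
    node->left  = build(std::vector<Pt>(pts.begin(), mid), depth + 1);
    node->right = build(std::vector<Pt>(mid, pts.end()), depth + 1);
    return node;
}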

Third thought: why do we need cells where there are no points? Let's build cells only where necessary. I could not describe this as briefly and clearly as the KD-tree, but it goes something like this:

  • There are no dots, no cells.

  • With the appearance of the first point, a rectangle is built around it exactly in size (with sides 0 by 0).

  • We continue to add points one at a time, expanding the rectangle as needed. The sides stay parallel to the axes; this simplifies checking whether a point falls inside. And from here on, let's call the "rectangle" a "node".

  • When there are too many points in a node, the structure is transformed:
    • a root appears that itself does not contain points, but contains child nodes

    • and the original node is split into several (well, for example, two), and they all become children of the root

  • Points keep being added. If a new point falls directly into an existing node, it is added to it; if not, the node is selected that needs to increase its area less than the others to include the new point (the idea being that the smaller the total area of the nodes, the better the result).

  • The nodes keep splitting. When the next node overflows, it falls apart, and all the parts become children of the root.

  • Until there are too many nodes at the root. Then the root itself breaks up into parts, which become the children of the new root. Each such part has an associated rectangle that contains all of its children and expands if necessary.

  • And so on, until all the points have been added.
This is called an R-tree (R for rectangle), a generalization of the B-tree to the multidimensional case. If there are more than two dimensions, we get n-dimensional parallelepipeds. You may have noticed that during construction different nodes can come to intersect one another; this is not critical, although unpleasant. Again, I hope the general idea is clear.
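To make this more concrete, a rough C++ sketch of the node structure and the "grow the rectangle least" rule (all names here are mine, purely for illustration):

#include <algorithm>
#include <utility>
#include <vector>

struct Rect {
    double minX, minY, maxX, maxY;
    double area() const { return (maxX - minX) * (maxY - minY); }
    Rect grown(double x, double y) const {           // expanded to include (x, y)
        return { std::min(minX, x), std::min(minY, y),
                 std::max(maxX, x), std::max(maxY, y) };
    }
};

struct RNode {
    Rect box;                                        // the node's "rectangle"
    std::vector<RNode*> children;                    // empty in leaves
    std::vector<std::pair<double, double>> points;   // used only in leaves
};

// insertion rule from the text: pick the child whose rectangle needs the
// smallest increase in area to accommodate the new point
RNode* chooseChild(RNode& node, double x, double y) {
    RNode* best = nullptr;
    double bestGrowth = 0;
    for (RNode* c : node.children) {
        double growth = c->box.grown(x, y).area() - c->box.area();
        if (best == nullptr || growth < bestGrowth) { best = c; bestGrowth = growth; }
    }
    return best;
}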

How to search

A couple of words about how the trees are built will come later. Once everything is built, it turns out that a KD-tree is just a special case of an R-tree: each node contains exactly two nodes of the next level, laid out in space in a very specific way, but all within what is allowed for R-trees; there are no other differences. So the algorithm for finding the nearest points can be written generically. Something like this:

I have a working implementation; it ran quite fast (compared to exhaustive search), but it was written long ago and I no longer like the code, and I am too lazy to fix it. So what follows is written from memory and has never even been run. Be careful.

Problem #1: find all points at a distance of at most R from x.
For internal nodes, children are nodes:

def getBall(self, x, R):
    # recurse only into children whose node lies within distance R of x
    return sum((c.getBall(x, R) for c in self.children if c.dist(x) <= R), [])

For leaves, children are points:

def getBall(self, x, R):
    # the children are points here: keep those no farther than R from x
    return [p for p in self.children if p.dist(x) <= R]

Optionally, of course, you can define a getBall() method for points as well and erase the difference between leaves and internal nodes at this spot.

Problem #2: find the n points closest to x.
First, you need a priority queue ordered so that the point farthest from x is on top and leaves first. The standard Python heapq does not let you pass a custom comparison function, but the same effect can be had by pushing (-distance, point) pairs: heapq is a min-heap, so negating the distance brings the farthest point to the top. Suppose such a queue exists and is passed in the q parameter, with q.top for the top element and q.push for insertion. Then something like this:

For internal nodes:

def getN(self, x, q, n):
    # the order the children are visited in shouldn't matter, so sort them by distance
    self.children.sort(key=lambda a: a.dist(x))
    for c in self.children:
        if len(q) < n or q.top.dist(x) > c.dist(x):
            # either there are still free slots in q,
            # or there is a chance to meet a point closer than the current worst one
            q = c.getN(x, q, n)
        else:
            break  # the children are sorted, so from here it only gets worse
    return q

For leaves:

def getN(self, x, q, n):
    self.children.sort(key=lambda a: a.dist(x))
    for c in self.children:
        if len(q) < n or q.top.dist(x) > c.dist(x):
            q.push(c)  # free slots remain, or c is closer than the current worst one
        else:
            break
    return q
Sorting the children is not only ideologically dubious; its efficiency is questionable too. If n is small, it may be faster to run min as many times as needed, or something along those lines.

So here is the general point of these trees: when searching for the nearest elements, the distance to the points is estimated from below by the distance to the node containing them; distant nodes are discarded wholesale, and in the end only points from nearby nodes remain, of which there are not many. A small sketch of that estimate follows.
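For rectangles with axis-parallel sides this estimate is just the point-to-box distance; a minimal 2D sketch in C++ (my notation):

#include <algorithm>
#include <cmath>

// lower bound used for pruning: the distance from point p to the axis-aligned
// rectangle [lo, hi] (zero if p is inside); no point stored in the node can be
// closer to p than this
double distToNode(const double p[2], const double lo[2], const double hi[2]) {
    double d2 = 0;
    for (int i = 0; i < 2; i++) {
        double d = std::max({lo[i] - p[i], 0.0, p[i] - hi[i]});
        d2 += d * d;
    }
    return std::sqrt(d2);
}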

What else is this needed for

That is quite a question, of course. I have no statistics and do not know what the main application is. Nevertheless.

3D graphics: ray tracing, the algorithm that produces images of photographic quality with the play of light, reflections, refractions and other diffraction. It does not work very fast. The main task solved while the algorithm runs is finding the intersection of a given ray with the objects of the scene. If the objects have a complex shape (not boxes, say), this is not very easy, and the more complex the shape of the object, the harder it gets. The scene is large enough; an object is some surface, triangulated (for example) and defined by the vertices of its triangles. There are many triangles, and as many vertices. The points are distributed very unevenly through the volume; in theory most of the rays pass by, and it remains to figure out which ones.

Building a KD-tree, or something like an R-tree, around an object (as far as I understand, a rough analogue of this is called a Bounding Volume Hierarchy, BVH) lets you quickly figure out which ray lands where.


How to build

Alas, this is a separate problem, one I have not fully figured out. Maybe someday, but not this time. A KD-tree that simply cuts cells in half (perpendicular to the axes, in rotation), without thinking about where the plane would best be drawn, works too. With this approach it can be built not from a ready set of points but by adding them one at a time, just as I described for the R-tree.

For the R-tree, only the node-splitting algorithm needs to be added to the construction description. Again head-on: we split into two parts; as the "crystallization centers" we pick the two children farthest from each other (points, or nodes of the next level; this is O(n^2), where n is the number of children in a node), and distribute the rest by the principle "which rectangle has to grow less" (a sketch follows). This algorithm is quadratic in the maximum number of children of a node; the result is not optimal, but the optimal split seems to require exponential time. On the whole it works even so, and in terms of price/quality it may well be the best.
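A head-on sketch of choosing the two seeds (C++, illustrative names; the remaining children are then handed out by the same "which rectangle grows less" rule as during insertion):

#include <cmath>
#include <utility>
#include <vector>

// brute-force O(n^2) choice of the two "crystallization centers": the pair of
// children (reduced here to their center points) farthest apart;
// assumes the node has at least two children
std::pair<size_t, size_t> pickSeeds(const std::vector<std::pair<double, double>>& c) {
    std::pair<size_t, size_t> best{0, 1};
    double bestD = -1.0;
    for (size_t i = 0; i < c.size(); i++)
        for (size_t j = i + 1; j < c.size(); j++) {
            double d = std::hypot(c[i].first - c[j].first, c[i].second - c[j].second);
            if (d > bestD) { bestD = d; best = {i, j}; }
        }
    return best;
}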

I advise looking at how this problem is solved in 3D graphics: KD-trees, BVH. But they may have other optimality criteria; here one has to think. Upd: the remark about BVH is not applicable, alas. There it is not just a set of points: the original object from which the set was obtained is known, along with the connections between the points, so they can afford to build hulls around the object. If all you have is points, you can only use something like clustering algorithms to refine the positions of the rectangles. But most likely that would take long.

Oh yes. The Wikipedia article on KD-trees describes something slightly different: it proposes not to distinguish between leaves and internal nodes, and to associate a plane with every point. That turns out to be a completely faithful generalization of the binary tree to multidimensional space. It seems to me that this should not be done. But I have not actually tried it.

A couple more thoughts

Exactly two:
  • It is clear that the rectangles in the R-tree matter only for the speed of the point-inside test. That is, nothing in the algorithm prevents replacing them with any other shapes. If my ability to draw conclusions does not betray me, this lets us generalize the problem to a situation where there are no axes and no coordinates, only the elements of a set and a decent distance function between them; mathematically decent: non-negative, symmetric, with the triangle inequality. For example, the strings and the Levenshtein distance already mentioned. The KD-tree is obviously inapplicable here, since there is no concept of a plane, and no parallelepipeds either. But there are balls, and it seems possible to build a tree of balls (see the sketch after this list). In theory it should be rather clever at suggesting corrections for typos; I wonder whether anyone does this? Upd: they do.

  • It is not at all necessary to store points in the trees. You can store, for example, balls, or any other objects. The main thing is that there is a function for computing the distance, and that an object fit entirely into its cell, i.e., that the distance to it can be estimated from below by the distance to the cell. This is enough for the search algorithm to work.
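A tiny C++ sketch of the "tree of balls" idea mentioned above (the Levenshtein metric and all names are illustrative). The pruning bound follows from the triangle inequality: for any y inside a ball, dist(x, y) >= dist(x, center) - radius.

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: the metric mentioned above
double dist(const std::string& a, const std::string& b) {
    std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); j++) prev[j] = j;
    for (size_t i = 1; i <= a.size(); i++) {
        cur[0] = i;
        for (size_t j = 1; j <= b.size(); j++)
            cur[j] = std::min({ prev[j] + 1, cur[j - 1] + 1,
                                prev[j - 1] + (a[i - 1] == b[j - 1] ? 0u : 1u) });
        std::swap(prev, cur);
    }
    return double(prev[b.size()]);
}

struct Ball { std::string center; double radius; };   // a node of the "ball tree"

// lower bound on the distance from x to anything inside the ball;
// if it already exceeds the current search radius, the whole ball is skipped
double lowerBound(const Ball& node, const std::string& x) {
    return std::max(0.0, dist(x, node.center) - node.radius);
}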

A tree is a data structure: a set of linked nodes organized hierarchically.

A binary tree is a finite set of elements that is either empty or contains a root element linked to two disjoint binary trees called the left and right subtrees. Each element of a binary tree is called a node; the links between the nodes of a tree are called branches.

Binary tree representation:

  • A is the root of the tree
  • B - the root of the left subtree
  • C - the root of the right subtree

The root of the tree is located at the minimal level.

A node D located directly below a node B is called B's child. If D is at level i, then B is at level i-1. Node B is called the parent of D.

The maximum level of any element in the tree is called its depth or height.

If an element has no children, it is called a leaf, or a terminal node of the tree.

The rest of the elements are internal nodes (branch nodes).

The number of children of an internal node is called its degree. The maximum degree of all nodes is the degree of the tree.

The number of branches to go from the root to node x is called the length of the path to x. The root has a path length of 0; a node at level i has a path length equal to i.

A binary tree is used when one of two possible decisions must be made at each point of the computational process.

There are many tasks that can be done in a tree.

A common task is to perform a given operation p on each element of the tree. Here p is considered as a parameter of the more general problem of visiting all nodes or the problem of traversing a tree.

If we consider the task as a single sequential process, then individual nodes are visited in a certain order and can be considered linearly located.

Tree traversal methods

Suppose we have a tree where A is the root, and B and C are the left and right subtrees.

There are three ways to traverse the tree:

  • Traversing the tree from top to bottom (in direct order): A, B, C - prefix form.
  • Traversing the tree in symmetric order (from left to right): B, A, C - infix form.
  • Traversing the tree in reverse order (from bottom to top): B, C, A - postfix form.
Implementing a tree

A tree node can be described as a structure:

struct tnode {
    int field;            // data field
    struct tnode *left;   // left child
    struct tnode *right;  // right child
};

In this case, the tree traversal in prefix form will have the form

void treeprint(tnode *tree) {
    if (tree != NULL) {             // until an empty node is encountered
        cout << tree->field;        // display the root of the tree
        treeprint(tree->left);      // then traverse the left subtree
        treeprint(tree->right);     // then the right subtree
    }
}

Traversing the tree in infix form will look like

void treeprint(tnode *tree) {
    if (tree != NULL) {             // until an empty node is encountered
        treeprint(tree->left);      // first traverse the left subtree,
        cout << tree->field;        // then display the root of the tree,
        treeprint(tree->right);     // then traverse the right subtree
    }
}

The tree traversal in postfix form will look like

void treeprint(tnode *tree) {
    if (tree != NULL) {             // until an empty node is encountered
        treeprint(tree->left);      // first traverse the left subtree,
        treeprint(tree->right);     // then the right subtree,
        cout << tree->field;        // then display the root of the tree
    }
}

A binary search tree is a binary tree for which the following additional conditions (the search tree properties) hold:

  • both subtrees, left and right, are binary search trees;
  • all nodes of the left subtree of an arbitrary node X have data key values ​​less than the data key value of the node X itself;
  • all nodes of the right subtree of an arbitrary node X have data key values ​​not less than the data key value of node X.

The data in each node must have keys on which a "less than" comparison operation is defined.

Typically, the information representing each node is a record rather than a single data field.

To compose a binary search tree, consider the function of adding a node to the tree.

Adding nodes to the tree

struct tnode *addnode(int x, tnode *tree) {
    if (tree == NULL) {               // if there is no tree, form the root
        tree = new tnode;             // memory for the node
        tree->field = x;              // data field
        tree->left  = NULL;
        tree->right = NULL;           // initialize the branches as empty
    } else if (x < tree->field)       // condition for adding a left child
        tree->left = addnode(x, tree->left);
    else                              // condition for adding a right child
        tree->right = addnode(x, tree->right);
    return tree;
}

Deleting a subtree

void freemem(tnode *tree) {
    if (tree != NULL) {
        freemem(tree->left);     // first free both subtrees,
        freemem(tree->right);
        delete tree;             // then delete the node itself
    }
}

Example. Write a program that counts the frequency of words in the input stream.

Since the list of words is not known in advance, we cannot sort it beforehand. Using linear search for each incoming word to determine whether it has occurred before is unreasonable: in that case the program would be too slow.

One way is to constantly maintain the order of the words already received, placing each new word in such a place that the existing order is not disturbed. Let's use a binary tree.

In the tree, each node contains:

  • pointer to word text;
  • counter of the number of occurrence;
  • pointer to the left child;
  • pointer to the right child.

Let us trace the execution of the program on a sample phrase.

In this case the tree takes the following form:

Implementation example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAXWORD 100
struct tnode {               // tree node
    char *word;              // pointer to the string (word)
    int count;               // number of occurrences
    struct tnode *left;      // left child
    struct tnode *right;     // right child
};
// Function for adding a node to the tree
struct tnode *addtree(struct tnode *p, char *w)
{
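    /* The body below is a sketch following the classic K&R word-frequency
       example that this program mirrors; strdup (string.h, POSIX) copies a
       string into freshly allocated memory. */
    int cond;
    if (p == NULL) {                          /* the word is seen for the first time */
        p = (struct tnode *) malloc(sizeof(struct tnode));   /* memory for the node */
        p->word = strdup(w);                  /* save a copy of the word text */
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0) {
        p->count++;                           /* the word occurred again */
    } else if (cond < 0) {                    /* less than the node's word: go left */
        p->left = addtree(p->left, w);
    } else {                                  /* greater: go right */
        p->right = addtree(p->right, w);
    }
    return p;
}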

This article is devoted entirely to KD-Trees: I describe the subtleties of constructing a KD-Tree, the subtleties of implementing "nearest" search functions over a KD-Tree, as well as possible "pitfalls" that come up while solving particular subtasks of the algorithm. In order not to confuse the reader with terminology (plane, hyperplane, etc.), and in general for convenience, it is assumed that the main action unfolds in three-dimensional space; where necessary, I note that we are working in a space of a different dimension. In my opinion, the article will be useful both for programmers and for all those interested in studying algorithms: someone will find something new, and someone will simply refresh the material and perhaps extend the article in the comments. In any case, welcome under the cut.

Introduction

A KD-Tree (K-dimensional tree) is a special "geometric" data structure that splits K-dimensional space into "smaller parts" by cutting it with hyperplanes (K > 3), planes (K = 3), straight lines (K = 2), or, in the case of a one-dimensional space, points (searching such a tree gives something similar to binary search).
It is logical that such a partition is usually used to narrow the search range in K-dimensional space: for example, finding the object (vertex, sphere, triangle, etc.) nearest to a point, projecting points onto a 3D mesh, ray tracing (actively used in Ray Tracing), and so on. The space objects are placed in special parallelepipeds, bounding boxes (we call a bounding box a parallelepiped that encloses the original set of objects, or a single object if we build the box for it alone; for a point, a bounding box with zero surface area and zero volume is taken), whose sides are parallel to the coordinate axes.

Node division process

So, before using a KD-Tree, you need to build it. All objects are placed into one large bounding box that encloses the objects of the original set (each object is bounded by its own bounding box), which is then split (divided) by a plane parallel to one of its sides into two. Two new nodes are added to the tree. In the same way each of the resulting parallelepipeds is divided into two, and so on. The process ends either by a special criterion (see SAH), or on reaching a certain tree depth, or on reaching a certain number of elements in a tree node. Some elements may enter both the left and the right node simultaneously (for example, when triangles are the elements of the tree).

I will illustrate this process with the example of a set of triangles in 2D:


Fig. 1 shows the original set of triangles. Each triangle is placed in its own bounding box, and the entire set of triangles is inscribed in one large root bounding box.


In fig. 2 we divide the root node into two (along OX): triangles 1, 2, 5 fall into the left node, and 3, 4, 5 into the right one.


In fig. 3 the resulting nodes are divided in two once more, and triangle 5 enters each of them. When the process ends, we get 4 leaf nodes.

The choice of the plane for dividing a tree node is of great importance. There are a huge number of ways to do this; I give only a few of the methods most common in practice (it is assumed that the original objects are placed in one large bounding box and the division is parallel to one of the coordinate axes):

Method 1: Choose the largest side of the bounding box. The cutting plane then passes through the middle of the chosen side.

Method 2: Dissect by median: we sort all primitives by one of the coordinates, and the median is the element (or the center of the element) that is in the middle position in the sorted list. The cutting plane will pass through the median so that the number of elements on the left and right will be approximately equal.

Method 3: Alternate the sides when splitting: at depth 0 we cut through the middle of the side parallel to OX, at the next depth level through the middle of the side parallel to OY, then along OZ, and so on. If we have "walked along all the axes", the cycle starts over. The exit criteria are described above.

Method 4: The smartest option is to use SAH (Surface Area Heuristic) function for evaluating the division of the bounding box (This will be discussed in detail below). SAH also provides a universal criterion for stopping node division.

Methods 1 and 3 are good when it comes to the speed of building the tree (it is trivial to find the middle of a side and draw a plane through it, filtering the elements of the original set to the "left" and to the "right"). At the same time they quite often give a loose partition of space, which can hurt the basic operations on the KD-Tree (especially tracing a ray through the tree).

Method 2 allows you to achieve good results, but requires a considerable amount of time spent sorting the elements of the node. Like methods 1, 3, it is quite simple to implement.

Of greatest interest is the method using "smart" SAH heuristics (method 4).

The bounding box of a tree node is divided by N planes (parallel to the axes) into N + 1 parts along each side (the parts are, as a rule, equal; in fact the number of planes for each side can be anything, but for simplicity and efficiency a constant is taken). Then, at the possible positions where a plane intersects the bounding box, the value of a special function is computed: SAH. The division is performed with the plane having the smallest value of the SAH function.

The SAH function value for the current plane is calculated as follows:
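A commonly used form of this function, consistent with the constants cI and cT discussed below (this is my reconstruction, not the author's exact figure), is:

SAH(x) = cT + cI * (SAL(x) * NL(x) + SAR(x) * NR(x)) / SA,

where cT is the cost of traversing a node, cI is the cost of processing one element, SA is the surface area of the current node, SAL/SAR are the surface areas of the parts to the left and to the right of plane x, and NL/NR are the numbers of elements in them.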

In my implementation of the KD-Tree I divide each side into 33 equal parts using 32 planes. Judging by the test results, this turned out to be the "golden mean" between the speed of the tree's main functions and the speed of its construction.

As a SAH heuristic, I use the function shown in the figure above.

Note that only the minimum of this function over all the cutting planes is needed to make the decision. Therefore, using the simplest properties of inequalities, and discarding the multiplication by 2 when computing the surface areas of a node in 3D (SAR, SAL, SA), this formula can be significantly simplified; the full computation is performed only once per node, when evaluating the criterion for exiting the division function. Such a simple optimization gives a substantial increase in the speed of tree construction (x2.5).

To evaluate the SAH function efficiently, one must be able to quickly determine how many of the node's elements lie to the right of a given plane and how many to the left. The results will be unsatisfactory if a rough brute-force approach with quadratic complexity is used; with the binned method, however, the situation changes significantly for the better. The method is described below:

Suppose we split the side of the bounding box into N equal parts (the number of planes is N-1), and we store each bounding box as a pair of points (pointMin, pointMax; see fig. 1). Then, in one pass over all the elements of the node, we can determine exactly, for each plane, how many elements lie to its right and how many to its left. Create two arrays of N elements each and initialize them with zeros; call them aHigh and aLow. Now run through all the elements of the node: for the current element, determine which parts its pointMin and pointMax fall into. This gives two indices per element of the set; call them iHigh (for pointMax) and iLow (for pointMin). After that, do the following: aHigh[iHigh] += 1, aLow[iLow] += 1.

After the pass over all elements we have the filled arrays aHigh and aLow. Compute suffix sums over aLow and prefix sums over aHigh (see the figure). Then the number of elements lying to the right (and only to the right!) of the plane with index i equals the suffix sum of aLow starting at i, the number of elements lying to the left (and only to the left!) equals the prefix sum of aHigh below i, and the number of elements entering both the left and the right node is AllElements minus these two counts.

The task has been solved; a sketch of this simple process is given below.
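A sketch of the counting pass in C++ (the names and the bin-index clamping are my own choices; plane i, for i = 1..N-1, is the boundary between bins i-1 and i):

#include <algorithm>
#include <utility>
#include <vector>

// boxes: projections of the elements' bounding boxes onto the chosen axis
// (pointMin, pointMax); [lo, hi] is the node's side; N is the number of bins.
// Afterwards right[i] = elements entirely to the right of plane i,
// left[i] = elements entirely to the left of it.
void binnedCounts(const std::vector<std::pair<float, float>>& boxes,
                  float lo, float hi, int N,
                  std::vector<int>& left, std::vector<int>& right)
{
    std::vector<int> aLow(N, 0), aHigh(N, 0);
    const float invStep = N / (hi - lo);
    for (const auto& b : boxes) {                    // one pass over all elements
        int iLow  = std::min(N - 1, std::max(0, int((b.first  - lo) * invStep)));
        int iHigh = std::min(N - 1, std::max(0, int((b.second - lo) * invStep)));
        aLow[iLow]   += 1;                           // histogram of pointMin
        aHigh[iHigh] += 1;                           // histogram of pointMax
    }
    right.assign(N + 1, 0);                          // suffix sums of aLow
    for (int i = N - 1; i >= 0; i--)
        right[i] = right[i + 1] + aLow[i];
    left.assign(N + 1, 0);                           // prefix sums of aHigh
    for (int i = 1; i <= N; i++)
        left[i] = left[i - 1] + aHigh[i - 1];
    // elements cut by plane i: boxes.size() - right[i] - left[i]
}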

Note that knowing in advance the number of elements to the left and to the right of the splitting plane lets us pre-allocate the required amount of memory for them: after finding the minimum of SAH we have to run through all the elements once more and place each into the proper array, and a naive push_back (without a prior reserve) leads to repeated memory allocation, a very expensive operation. This has a noticeable positive effect on the speed of the tree building algorithm (x3.3).

Now I would like to describe in more detail the purpose of the constants used in the formula for calculating SAH, and also discuss the criterion for stopping the division of a node.

By varying the constants cI and cT one can obtain a denser tree structure (or the opposite), paying with the running time of the algorithm. In many articles devoted primarily to building KD-Trees for Ray Tracing rendering, the authors use cI = 1 and various larger values of cT: the larger the value of cT, the faster the tree is built, and the worse the ray tracing results in such a tree.

In my own implementation the tree is used to search for the "nearest" element, and, having studied the test results and looked for suitable coefficients, one can see that high cT values produce nodes loosely filled with elements. To avoid this situation, I decided to fix cT at 1 and to tune cI on datasets of various sizes. The result is a fairly dense tree structure, paid for with a noticeable increase in construction time. This had a positive effect on the "nearest" search results: the search speed increased.

The criterion for stopping node division is quite simple:
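In the notation of the SAH formula above, the criterion can be written roughly as

min over all candidate planes x of SAH(x) >= cI * N,

where N is the number of elements in the node: even the best cutting plane must be cheaper than processing the node as a leaf, otherwise we stop.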

In other words: if the cost of tracking child nodes is more than the cost of tracking the parent node, then the division must be stopped.

Now that we know how to divide a KD-Tree node, let me describe the peculiar cases in which the number of elements in a node turns out to be large and an element-count stopping criterion would drive the algorithm to infinity. See the picture (for example, triangles in 2D):

I call such situations "fan-shaped" (the triangles have a common contact point; coincident objects also fall into this category). It can be seen that no matter how we draw the cutting plane, the central point falls into one of the nodes, and with it all the triangles for which it is common.

The tree building process

We have learned how to divide a tree node; now it remains to apply this knowledge to building the whole tree. Below I describe my implementation of KD-Tree construction.

The tree is built from the root. Each tree node stores pointers to the left and right subtrees; a node that has none is considered a leaf. Each node stores a bounding box describing the objects of this node. Leaf nodes (and only leaf nodes!) store the indices of the objects that fall into them. During construction, memory for nodes is allocated in portions (i.e., for K nodes at once: first, this is more efficient for the memory manager, and second, consecutive elements are ideal for caching). Such an approach rules out storing tree nodes in a vector, because adding new elements to a vector can relocate all the existing elements to another place in memory.

Pointers to subtrees would then become meaningless, so I use a container of type list (std::list) in which each element is a vector (std::vector) of a predetermined (constant) size. I build the tree in a multithreaded manner (using OpenMP), that is, each subtree is built in a separate thread (x4 in speed). Move semantics (C++11) work well for moving the indices into a leaf node (+10% speed).

Finding "closest" to a point in the KD-Tree

So, the tree is built; let us move on to the implementation of the search operation in the KD-Tree.

Suppose we search in a set of triangles: given a point, we must find the triangle closest to it.

Solving the problem by brute force is unprofitable: for a set of N points and M triangles the complexity is O(N*M). Besides, I would like the algorithm to compute the distance from a point to a triangle "honestly" and to do it fast.

Let's use the KD-Tree and do the following:

Step 1. Find the leaf node closest to the given point (each node, as we know, stores a bounding box, so we can simply compute the distance to the center ((pointMax + pointMin) * 0.5) of the node's bounding box); denote it nearestNode.

Step 2. Search by brute force among all the elements of the found node (nearestNode); denote the resulting minimal distance minDist.

Step 3. Build a sphere centered at the given point with radius minDist. Check whether this sphere lies entirely inside nearestNode (i.e., does not cross any side of the node's bounding box). If it does, the nearest element has been found; if not, go to step 4.

Step 4. Starting from the root of the tree, search for the nearest element within the radius: descending the tree, check whether the right and left nodes intersect this sphere (a node may also lie entirely inside the sphere, or the sphere inside a node...). If a node is crossed, perform the same check for its internal nodes. On reaching a leaf node, do a brute-force neighbor search in it and compare the result with the current sphere radius; if the radius is greater than the found distance, update the radius with the computed minimum. Further descents down the tree use the updated sphere radius (with a recursive algorithm the radius is simply passed to the function by reference and updated where necessary).
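A schematic of this descent in C++ (the node layout and the exact-distance callback are assumptions for illustration, not the author's actual interface):

#include <algorithm>
#include <vector>

struct P3 { double x, y, z; };
struct BBox { P3 lo, hi; };

inline double sqr(double v) { return v * v; }

// squared distance from point c to the box (0 if c is inside); the sphere of
// radius r around c intersects the box iff this value is <= r*r
double distSqPointBox(const BBox& b, const P3& c) {
    return sqr(std::max({b.lo.x - c.x, 0.0, c.x - b.hi.x})) +
           sqr(std::max({b.lo.y - c.y, 0.0, c.y - b.hi.y})) +
           sqr(std::max({b.lo.z - c.z, 0.0, c.z - b.hi.z}));
}

struct Node {
    BBox box;
    Node *left = nullptr, *right = nullptr;   // both null in a leaf
    std::vector<int> items;                   // object indices, leaves only
};

// distToItem(i, x) computes the exact distance from x to object i
// (e.g. the point-triangle distance discussed below)
template <class DistFn>
void findNearest(const Node* n, const P3& x, DistFn distToItem,
                 double& radius, int& best)
{
    if (n == nullptr || distSqPointBox(n->box, x) > radius * radius)
        return;                               // the whole subtree is too far
    if (n->left == nullptr && n->right == nullptr) {
        for (int i : n->items) {
            double d = distToItem(i, x);
            if (d < radius) { radius = d; best = i; }   // shrink the sphere
        }
    } else {
        findNearest(n->left,  x, distToItem, radius, best);
        findNearest(n->right, x, distToItem, radius, best);  // sees the shrunken radius
    }
}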

The following figure helps to understand the essence of the algorithm described above:

According to the picture: suppose we found the leaf node closest to the given point (the point is highlighted in red, the node in blue); a search within that node suggests the nearest element is the triangle with index 1, but as you can see, that is not the case. A circle with radius R intersects the adjacent node, so the search must be performed in that node as well, and the newly found minimum compared with the existing one. As a result it becomes obvious that the triangle with index 2 is the nearest.

Now I would like to tell you about the effective implementation of intermediate operations used in the "near" search algorithm.

When searching for a neighbor in a node, you need to be able to quickly calculate the distance from a point to a triangle. I will describe the simplest algorithm:

We find the projection of point A (the point for which we are searching for the nearest element) onto the plane of the triangle; denote the found point by P. If P lies inside the triangle, the distance from A to the triangle is the length of the segment AP; otherwise we find the distances from A to each of the sides (segments) of the triangle and choose the minimum. The problem is solved.
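A possible implementation of this simple algorithm in C++ (a non-degenerate triangle is assumed; the small vector type and helper names are mine):

#include <algorithm>
#include <cmath>

struct V3 { double x, y, z; };
static V3 operator-(V3 a, V3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static V3 operator+(V3 a, V3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static V3 operator*(V3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
static double dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 cross(V3 a, V3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static double len(V3 a) { return std::sqrt(dot(a, a)); }

// distance from point p to the segment [a, b]
static double distPointSegment(V3 p, V3 a, V3 b) {
    V3 ab = b - a;
    double t = std::clamp(dot(p - a, ab) / dot(ab, ab), 0.0, 1.0);
    return len(p - (a + ab * t));
}

double distPointTriangle(V3 a, V3 t0, V3 t1, V3 t2) {
    V3 n = cross(t1 - t0, t2 - t0);                   // triangle normal (not unit)
    V3 p = a - n * (dot(a - t0, n) / dot(n, n));      // P: projection of A onto the plane
    // P is inside the triangle iff it lies on the same side of all three edges
    bool inside = dot(cross(t1 - t0, p - t0), n) >= 0 &&
                  dot(cross(t2 - t1, p - t1), n) >= 0 &&
                  dot(cross(t0 - t2, p - t2), n) >= 0;
    if (inside)
        return len(a - p);                            // the distance is |AP|
    // otherwise: the minimum of the distances to the three sides
    return std::min({ distPointSegment(a, t0, t1),
                      distPointSegment(a, t1, t2),
                      distPointSegment(a, t2, t0) });
}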

The described algorithm is not the most efficient one. A more efficient approach relies on constructing and analyzing (finding the minimum via the gradient, etc.) a function whose values are all possible distances from the given point to the points of the triangle. The method is rather hard to digest and, in my opinion, deserves a separate article (it is implemented in my code, and you will find a link to the code below). According to the test results, that method turned out to be about 10 times faster than the one I described above.

Determining whether a sphere with center at point O and radius R lies entirely inside a particular node (represented by its bounding box) is simple (3D); each axis is checked independently:

inline bool isSphereInBBox(const SBBox& bBox, const D3& point, const double& radius)
{
    return bBox.m_minBB[0] < point[0] - radius && bBox.m_maxBB[0] > point[0] + radius &&
           bBox.m_minBB[1] < point[1] - radius && bBox.m_maxBB[1] > point[1] + radius &&
           bBox.m_minBB[2] < point[2] - radius && bBox.m_maxBB[2] > point[2] + radius;
}
With the algorithm for determining the intersection of a sphere with the bounding box of a node (or a node lying inside a sphere, or a sphere inside a node) the situation is somewhat different. I will illustrate once more and give working code that performs this check (in 2D; 3D is analogous):


bool intersects(CircleType circle, RectType rect)
{
    // rect.x, rect.y is the center of the rectangle
    float dx = fabs(circle.x - rect.x);
    float dy = fabs(circle.y - rect.y);

    if (dx > rect.width / 2 + circle.r)  return false;
    if (dy > rect.height / 2 + circle.r) return false;

    if (dx <= rect.width / 2)  return true;
    if (dy <= rect.height / 2) return true;

    float cornerDistance_sq = (dx - rect.width / 2)  * (dx - rect.width / 2)
                            + (dy - rect.height / 2) * (dy - rect.height / 2);
    return cornerDistance_sq <= circle.r * circle.r;
}

First (the first couple of lines) we reduce the computation for the four quadrants to one. The next pair of lines checks whether the circle lies in the green area; if it does, there is no intersection. The following pair checks whether the circle is in the orange or gray area of the drawing; if it does, an intersection is found.

Actually, this calculation returns "false" for all circles whose center is within the red region, and "true" for all circles centered in the white area.

In general, this code provides what we need (I have provided the implementation of the code for 2D here, but in my KD-Tree code I use the 3D version).

It remains to talk about the speed of the search algorithm, as well as about critical situations that slow down the search in the KD-Tree.

As stated above, "fan" situations create a large number of elements inside a node, and the more there are, the slower the search. Moreover, if all elements are equidistant from the given point, the search runs in O(N) (think of a set of points lying on the surface of a sphere while we search for the one nearest to its center). If such situations are excluded, however, the search is on average equivalent to descending the tree with element scans in a few nodes, i.e., O(log N). Clearly, the search speed depends on the number of visited leaf nodes of the tree.

Consider the following two figures:


The point of these figures is as follows: if the point for which we search the nearest element lies very, very far from the initial bounding box of the set, then a sphere with a radius of length minDist (the distance to the nearest element) will cross many more nodes than the same sphere centered at a point much closer to the original bounding box (minDist, of course, will change). Searching near a very distant point is therefore slower than searching near the original set. My tests confirmed this.

Results and summing up

In conclusion, I would like to add a few more words about my implementation of the KD-Tree and give the obtained results. The code was designed so that it can easily be adapted to any objects of the original set (triangles, spheres, points, etc.): all you need is to create a derived class with overridden virtual functions. Moreover, my implementation allows passing a special user-defined class Splitter: this class, or rather its virtual split method, determines exactly where the cutting plane will pass. In my implementation I provide a class that makes the decision based on SAH. Note here that in many articles devoted to accelerating KD-Tree construction based on SAH, the authors use simpler techniques for finding the cutting plane (such as splitting through the center or the median), applying the SAH heuristic only when the number of elements in a node becomes small.

My implementation contains no such tweaks, but allows adding them quickly (you just need to extend the KD-Tree constructor with a new parameter and call the construction member function with the necessary splitter, controlling the necessary restrictions). Searching in the tree is multithreaded. All calculations are performed in double-precision numbers (double). The maximum tree depth is set by a constant (32 by default). Some #defines allow, for example, performing tree traversal during search without recursion (recursion, however, turned out faster: all nodes are elements of some vector, that is, adjacent in memory, and therefore cache well). Along with the code I provide test datasets (3D models in a "modified OFF format" with various numbers of triangles inside, from 2 to 3,000,000). The user can dump the constructed tree (DXF format) and view it in a suitable graphics program. The program also logs (this can be enabled/disabled) the quality of tree construction: the tree depth, the maximum number of elements in a leaf node, the average number of elements in leaf nodes, and the running time. I by no means claim that the resulting implementation is perfect; I know its weak places myself (for example, the allocator is not passed as a template parameter, there is a fair amount of C-style code (I do not read and write files using streams), unnoticed bugs are possible, etc.; there is time to fix all that). And of course, the tree is built and optimized strictly for working in 3D space.

I tested the code on a laptop with the following characteristics: Intel Core i7-4750HQ, 4 cores (8 threads) at 2 GHz, 8 GB RAM, a Win x64 app on Windows 10. Coefficients for calculating SAH: cT = 1., cI = 1.5. As for the results: on 1.5 million triangles the tree is built in less than 1.5 seconds, on 3 million in 2.4 seconds. For 1.5 million points and 1.5 million triangles (the points not very far from the original model) the search was performed in 0.23 seconds, and if the points are moved away from the model, the time grows, up to 3 seconds. For 3 million points (again, close to the model) and 3 million triangles the search takes about 0.7 seconds. I hope I haven't mixed anything up. Finally, an illustration of the constructed KD-Tree:

Consider a binary spatial partitioning structure called the kd-tree. This structure is a binary tree of nested bounding boxes: each parallelepiped in the kd-tree is divided by a plane perpendicular to one of the coordinate axes into two child parallelepipeds.

The entire scene is contained within the root box, and by continuing the recursive partitioning of the boxes we arrive at leaf boxes that contain only a small number of primitives each. Thus the kd-tree allows a binary-search-like procedure for finding the primitive pierced by a ray.

Building a kd-tree

The algorithm for constructing a kd-tree can be represented as follows (we will call a rectangular parallelepiped by the English word "box"):

  1. Add all primitives to a bounding box. That is, build a box bounding all primitives; it will correspond to the root node of the tree.

  2. If there are few primitives in the node, or the tree depth limit has been reached, complete the construction.

  3. Select a splitting plane that divides the node into two children; we will call them the right and left tree nodes.

  4. Add the primitives intersecting the box of the left node to the left node, and the primitives intersecting the box of the right node to the right.

The most difficult part of building a kd-tree is step 3; the efficiency of the accelerating structure depends directly on it. There are several ways to choose the splitting plane; let us look at them in order.

1. Splitting through the center.

The easiest way is to choose the plane through the center of the box: first select the axis (x, y or z) along which the box has the maximum size, then split the box in the middle.

Figure 1: Splitting through the center

This method has poor adaptability; the classic octree has the same disadvantage. Intuitively, the poor speed of such trees can be explained by the fact that each node contains a lot of empty space through which rays travel during traversal of the tree, whereas empty space should be discarded as early as possible. A more rigorous explanation will be given a little later.

2. Splitting by the median.

You might think that it would be nice to split the node into two children each time so that the number of primitives is the same in the right and left subtrees. Thus, we will build a balanced tree, which should speed up the search.

Figure 2: Splitting by the median

This is not a very good idea. The point is that balanced trees help only if the sought element lands in a random node every time, i.e., if the distribution of rays over the nodes during the search is uniform. In reality this is not the case: on average, more rays go into the node with the larger surface area, and with a median partition these areas can differ between the two children.

3. Surface Area Heuristic (SAH)

What are the criteria of a well-constructed kd-tree? At an intuitive level this is actually not very clear, although "as much empty space as possible should be discarded as quickly as possible" will probably do.

Let us use a formal approach. We introduce a traversal cost function that reflects how computationally expensive it is to trace a given node with a random set of rays. We will compute it by the following formula.

SAH (x) = CostEmpty + SurfaceArea (Left) * N (Left) + SurfaceArea (Right) * N (Right)

where CostEmpty is the cost of traversing an empty node (some constant), SurfaceArea is the surface area of the corresponding node, and N is the number of primitives in it. The splitting plane must be chosen so as to minimize this function; the function argument x is the one-dimensional coordinate of the splitting plane.

Primitive boundaries are good candidates for the minimum of SAH. A simple construction algorithm is as follows: each time a plane is chosen, iterate over all possible primitive boundaries in the three dimensions, compute the value of the cost function at each, and take the minimum among all these values. When computing SAH for each plane we need to know N(left) and N(right), the numbers of primitives to the right and to the left of the plane. If N is computed by simple enumeration, the result is a construction algorithm of quadratic complexity.

Figure 3: Partitioning with SAH

In figure 4 you can see that SAH immediately discards large empty spaces, tightly bounding the geometry.


Figure 4: kd-tree built with SAH in mind

The Surface Area Heuristic also yields a smarter stopping criterion than a tree depth limit or a small number of primitives: if, for the chosen splitting plane, the total cost of tracing the child nodes is greater than the cost of tracing the current node as a leaf (computed by the formula SurfaceArea(Node) * N(Node)), then tree construction should be stopped.

4. Fast construction of a kd tree

Option 0

Split through the center, choosing the axis along which the box size is maximal. This method is the fastest, but in some scenes ray tracing in such a tree is about 2-3 times slower.

Option 1

The simplest thing you can do to speed up the construction is to reduce the number of candidate planes. Instead of iterating over N planes (where N is the number of primitives), SAH can be computed only at a small number of predetermined positions: the input box is divided evenly along each axis into M parts, where M usually ranges from a few tens to a couple of hundred. N(left) and N(right) are computed by exhaustive search as before, but the full pass over the array is now done only M times rather than N, so the algorithm has complexity N*M. In effect, we obtained the speedup by coarsening SAH, sampling it. It turns out, though, that the minimum of the resulting coarse function as a rule differs little from its exact minimum.

Option 2

What can be done to speed up option 1? We need to get rid of the exhaustive counting when computing N(left) and N(right). To do this, we build a tree, in each node of which a certain dividing plane and the numbers of primitives to the right and to the left of that plane are stored:

struct IntervalTree
{
    float split;
    int   numPrimsOnLeftSide;
    int   numPrimsOnRightSide;
    IntervalTree* left;
    IntervalTree* right;
};

In fact, three such trees are needed at each level, one each for x, y and z. The input interval is halved each time. With such a tree, the number of primitives to the right and to the left of a plane can be determined fairly accurately in log(N) operations. The tree search algorithm looks simple enough.

vec2i IntervalTree::TraversePlane(float a_split) const
{
    // component 0 stores the count for the min (left) side,
    // component 1 for the max (right) side
    vec2i res(0, 0);   // vector of two integers

    if (a_split < split - EPSILON)
    {
        res[1] = numPrimsOnRightSide;
        if (left != 0)
            res += left->TraversePlane(a_split);
        // if for some reason the left count is zero, count at least the
        // primitives of this node so that SAH does not become degenerate
        if (res[0] == 0)
            res[0] = numPrimsOnLeftSide;
    }
    else if (a_split > split + EPSILON)
    {
        res[0] = numPrimsOnLeftSide;
        if (right != 0)
            res += right->TraversePlane(a_split);
        // the same safeguard for the right count
        if (res[1] == 0)
            res[1] = numPrimsOnRightSide;
    }
    else
    {
        // hit exactly the plane of this node
        res[0] = numPrimsOnLeftSide;
        res[1] = numPrimsOnRightSide;
    }
    return res;
}

The complexity of building the tree of planes is N*log(N). The complexity of evaluating SAH is M*log(N), since we evaluate it only at M fixed planes. The total complexity of the algorithm is thus N*log(N), since M is much smaller than N.

Before building the kd-tree, one can in principle build an arbitrary accelerating structure just to speed up the subsequent SAH evaluation.

Option 3 (almost exact, and in N*log(N))

We use sorting. For each axis (x, y, z) we sort the primitives first by the maxima and then by the minima of their bounding boxes. Let us call the array sorted by maxima sorted_max, and the array sorted by minima sorted_min.

Passing through the sorted_min array from left to right, at each step we know exactly how many primitives (or rather, their bounding boxes) lie to the right of the plane (and almost exactly how many lie to the left). Passing through the sorted_max array from right to left, we know exactly how many primitives lie to the left of the plane and almost exactly how many lie to the right.

for (int axis = 0; axis < 3; axis++)
{
    // sort primitives along the current axis by the min bounds of their boxes
    CompareMin comparator(axis);
    std::sort(prims.begin(), prims.end(), comparator);

    for (int i = 1; i < prims.size(); i++)
    {
        int onLeftSide  = i;
        int onRightSide = prims.size() - i;
        float split     = prims[i].Box().vmin[axis];
        if (split == a_box.vmin[axis])
            continue;

        // evaluate SAH
        AABB3f box_left  = a_box;
        AABB3f box_right = a_box;
        box_left.vmax[axis]  = split;
        box_right.vmin[axis] = split;

        float sah = EMPTY_COST + onLeftSide  * SurfaceArea(box_left)
                               + onRightSide * SurfaceArea(box_right);
        if (sah < min_sah)
        {
            min_sah   = sah;
            min_split = split;
            min_axis  = axis;
        }
    }

    // sort primitives along the current axis by the max bounds of their boxes
    CompareMax comparator2(axis);
    std::sort(prims.begin(), prims.end(), comparator2);

    for (int i = prims.size() - 2; i >= 0; i--)
    {
        int onLeftSide  = i + 1;
        int onRightSide = prims.size() - i - 1;
        float split     = prims[i].Box().vmax[axis];
        if (split == a_box.vmax[axis])
            continue;

        // evaluate SAH
        AABB3f box_left  = a_box;
        AABB3f box_right = a_box;
        box_left.vmax[axis]  = split;
        box_right.vmin[axis] = split;

        float sah = EMPTY_COST + onLeftSide  * SurfaceArea(box_left)
                               + onRightSide * SurfaceArea(box_right);
        if (sah < min_sah)
        {
            min_sah   = sah;
            min_split = split;
            min_axis  = axis;
        }
    }
}

In theory, the method has a problem. Consider the sorted_min array. We go through it from left to right in a loop:

for (int i = 0; i < prims.size(); i++)
{
    int onLeftSide  = i;
    int onRightSide = prims.size() - i;
    // ...
}

We know the number onLeftSide absolutely exactly, but the number onRightSide not quite. The point is that a plane can split some primitives, in which case the same primitive lies both to the right and to the left of it, which this algorithm does not take into account. In practice the problem hardly ever manifests itself.

Option 4

The search for the minimum of the SAH function can be sped up by using some minimization method with several initial approximations, say golden-section search; this again avoids brute force. One only has to keep in mind that SAH becomes more or less smooth only with a large number of primitives. The advantage of this approach is that the brute-force evaluation of SAH transfers easily to the GPU, so by evaluating SAH with brute force each time while reducing the number of evaluations to a small count (~10-20 at most), a kd-tree can be built very quickly this way.

Option 5 (Binning)

This option is often used. The essence is as follows:

    Divide the space into regular intervals along x, y and z. Each such interval is called a bin. Usually one limits oneself to a small number of bins, around 32.

    The centers of the triangles are then distributed into the bins: go through all the triangles, compute their centers, and for each bin count how many centers fall into it. This is not difficult to do: when a center is computed, just increment the corresponding counter. Since the binning is regular, the coordinate of a point immediately determines which bin it falls into.

Ray tracing in kd-tree on CPU


The classical binary search algorithm in kd-trees ("kd-tree traversal" in the English literature), used in most CPU implementations, is roughly as follows. At the first step, the intersection of the ray with the root parallelepiped bounding the scene is computed, and the intersection is remembered as two coordinates (in ray space), t_near and t_far, denoting the intersections with the near and far planes respectively. At each subsequent step, only the current node (its address) and these two coordinates are needed.

There is no need to compute the intersection of the ray with each child parallelepiped; it is enough to find the intersection with the plane dividing the parallelepiped (denote the corresponding coordinate t_split). Every non-leaf node of the kd-tree has two child nodes. Fig. 5 shows the three scenarios of events that can occur when tracing the path of a ray.

Figure 5: Tracing a ray in a kd-tree

In case A (t_split >= t_far) or B (t_split < t_near) the ray crosses only one child node, so we can simply drop the right (respectively, left) node and continue searching for the intersection in the remaining one.

In case C the ray crosses both child nodes, so we must first look for the intersection in the near node and, only if it is not found, look in the far one. Since in general it is not known how many times this last event will occur, a stack is needed: whenever a ray crosses both child nodes, the address of the far node together with t_near and t_far is pushed onto the stack, and the search continues in the near one. If no intersection is found in the near node, the address of the far node with its t_near and t_far is popped from the stack and the search continues there. A sketch of the procedure is given below.
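A sketch of this classical stack-based traversal in C++ (the node layout and the leaf-intersection callback are assumptions for illustration; the initial tNear and tFar come from clipping the ray against the root box):

#include <tuple>
#include <vector>

struct Ray { float org[3], dir[3], invDir[3]; };   // invDir[i] = 1.0f / dir[i]

struct KdNode {
    int axis = -1;                 // split axis of an inner node; -1 marks a leaf
    float split = 0.0f;            // coordinate of the splitting plane
    const KdNode* child[2] = { nullptr, nullptr };   // child[0] is the lower side
};

// LeafFn(leaf, tNear, tFar) returns true if a hit is found inside [tNear, tFar]
template <class LeafFn>
bool traverse(const KdNode* node, const Ray& ray,
              float tNear, float tFar, LeafFn intersectLeaf)
{
    std::vector<std::tuple<const KdNode*, float, float>> stack;
    while (true) {
        while (node->axis >= 0) {                    // descend until a leaf
            int a = node->axis;
            float tSplit = (node->split - ray.org[a]) * ray.invDir[a];
            int nearIdx = (ray.dir[a] >= 0.0f) ? 0 : 1;
            const KdNode* nearN = node->child[nearIdx];
            const KdNode* farN  = node->child[1 - nearIdx];
            if (tSplit >= tFar) {                    // case A: only the near node
                node = nearN;
            } else if (tSplit < tNear) {             // case B: only the far node
                node = farN;
            } else {                                 // case C: both children
                stack.emplace_back(farN, tSplit, tFar);
                node = nearN;
                tFar = tSplit;
            }
        }
        if (intersectLeaf(node, tNear, tFar))
            return true;                             // nearest hit found
        if (stack.empty())
            return false;                            // the ray left the scene
        std::tie(node, tNear, tFar) = stack.back();
        stack.pop_back();
    }
}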

Tracking a ray in a kd-tree on a GPU

Since GPUs originally had no stack, stackless algorithms and algorithms using a short artificial stack appeared. Five GPU ray tracing algorithms are known: restart, backtrack, push-down, short stack, and kd-tree traversal with "bundles" (links to the neighboring nodes, known in the literature as ropes).

kd-tree restart

Figure 2: Operation of the restart algorithm

When a ray finds no intersection in a leaf, its t_near is set equal to t_far and the search continues from the root of the kd-tree (fig. 2). Roughly speaking, the origin of the ray, its emission point, is simply moved forward, and the search starts over. This makes the ray pass over the same nodes many times, which is inefficient.

kd-tree push-down

A small modification of the restart algorithm: the ray is restarted not from the root of the tree but from some subtree. However, a node can be selected as the new subtree root only if, during the entire descent to it, not a single node was encountered in which the ray intersected both child nodes.

That is, if while descending through the near nodes of the kd-tree we met at least one node in which the ray crosses both the near and the far child, it is that far child node that must be remembered as the subtree. If the ray then misses, the restart is performed from the remembered far node, and one can again try to find a new subtree. In effect, this is an attempt to emulate a stack of length 1.

kd-tree short stack

The authors also used a short stack. As long as the stack size suffices, it is filled exactly as in the classical algorithm. When the stack overflows, it begins acting as a ring buffer. If the stack runs empty, a restart is required. For example, if a stack of length 4 contains nodes with numbers (1), (4), (6), (8), then after pushing a new element (12) the stack will look like this: (12), (4), (6), (8); that is, the oldest element was overwritten. Elements are popped in the reverse order of their addition (first 12, then 8, 6 and finally 4), but when element (4) is popped from the stack, a restart will be needed, since we have overwritten element (1). The point of the short stack is that it greatly reduces the number of restarts a ray has to do.

kd-tree backtrack

If the nodes of the kd-tree keep a link to their parent node, then in case of a miss this link can be used to climb back to the parent. In addition, each node will have to store its bounding parallelepiped, to be able to compute a new t_far in case of a miss.

With t_near one can do the same as in the restart algorithm. This requires an extra 6*4 bytes (for the coordinates of the parallelepiped) + 4 bytes (for the parent link) = 28 bytes per node. Since GPU memory is rather limited, such expenditure can create problems. Besides, on every ascent the intersection of the ray with an axis-aligned parallelepiped has to be computed, which is of course not free in terms of computing resources. It should be especially noted that a kd-tree with stored parallelepipeds takes much more memory than a well-built BVH tree for the same scene; the main reason is that in a kd-tree the parallelepipeds share more points, which end up being duplicated.

kd-tree with bundles

In this variant, six links to the neighboring nodes of the tree (the nodes touching this one) are added to each node of the kd-tree; in the literature such links are known as "ropes" (fig. 3). If the ray misses in the current node, the links can be used to get directly to the next nodes in which the ray must be traced.

This algorithm, like backtrack, avoids repeated passes over the same tree nodes. However, the six links require an additional 24 bytes of memory, bringing the node to 32 bytes.

Figure 3: kd-tree with bundles

Benefits of kd trees

  1. A very simple and efficient traverse algorithm. Even for the GPU.
  2. Low memory usage (8 bytes per node).

Disadvantages of kd trees

    Time consuming construction, namely, finding a partition with minimal SAH.

    Deeper than BVH. More building steps.

Conclusion

In summary, the kd-tree is practically ideal for ray tracing. This is true for both the CPU and the GPU.

Literature

  1. Wald I. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis, Saarland University, 2004.

  2. Shevtsov M., Soupikov A., Kapustin A. Highly Parallel Fast KD-tree Construction for Interactive Ray Tracing of Dynamic Scenes. In Proceedings of the EUROGRAPHICS conference, 2007.

  3. Foley T., Sugerman J. KD-Tree Acceleration Structures for a GPU Raytracer. In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 15-22, 2005.

  4. Horn D., Sugerman J., Houston M., Hanrahan P. Interactive k-D Tree GPU Raytracing. In Proceedings of the Symposium on Interactive 3D Graphics and Games, pp. 167-174, 2007.

  5. Popov S., Günther J., Seidel H.-P., Slusallek P. Stackless KD-Tree Traversal for High Performance GPU Ray Tracing. In Proceedings of the EUROGRAPHICS conference, 2007.