Skip to main content

kdtree

kdtree(
@dataset ## llll (required)
@numneighbors 1
@radius 0
) -> llll

Given a dataset, generates a K-dimensional Tree object for efficient neighborhood searches of multi-dimensional data. KDTrees are most useful for repeated querying of a dataset, because there is a cost associated with building them. Whilst KDTrees can offer very good performance relative to naïve search algorithms, they suffer from something called “the curse of dimensionality” (like many algorithms for multi-dimensional data). In practice, this means that as the number of dimensions of your data goes up, the relative performance gains of a KDTree go down.

During prediction, via the predict function, it returns the following llll:

[
[ 'neighbors' <indices> ]
[ 'distances' <distances> ]
]

Where:

  • 'neighbors' is a list of integers representing the index position of the nearest neighbors in the @dataset.
  • 'distances' is a list of floats representing the distances between the target point and its nearest neighbors.

Arguments

  • @dataset [llll]: Dataset to fit KDTree to (required)
  • @numneighbors [int]: The number of neighbors to return. (default: 1).
  • @radius [int/float]: The maximum distance (in high dimensional space) that a returned point can be. Any points beyond radius will not be returned (even if they're within the nearest numneighbors points). When radius is 0 (default), it is no longer a constraint and the distance of a nearest neighbor is not taken into account. (default: 0).

Output

KDTree object [llll]


Usage

$dims = 10; ## data dimensionality
$dataset = dataset(
## dummy dataset
for $i in 1...100 collect [for $j in 1...$dims collect randn()]
);
## generate KDTree based on dataset
$kdtree = kdtree($dataset @numneighbors 3);
## random point
$xpoint = for $j in 1...$dims collect randn();
## search nearest neighbors
$match = predict($kdtree, $xpoint);
## get id and distances of nearest neighbors
$neighborids = getkey($match, 'neighbors');
$distances = getkey($match, 'distances');
## print results
print($neighborids, 'Indices of 3 Nearest neighbors:');
print($distances, 'Distance from 3 Nearest neighbors:');
print(getitems($dataset, $neighborids), '3 Nearest neighbors:')