bigKNN can build exact neighbour graphs directly from
bigmemory::big.matrix references. These graph builders are
useful when the exact neighbourhood structure is the real output you
care about, not only the raw search result.
Typical uses include:
This vignette introduces the graph families available in
bigKNN, the symmetrization rules they use, and the output
formats that make them easy to inspect or hand off downstream.
We will work with six points arranged as two well-separated local groups. That geometry keeps the graph outputs easy to read while still showing meaningful differences between directed, mutual, shared-nearest-neighbour, and radius graphs.
reference_points <- data.frame(
id = paste0("p", 1:6),
x1 = c(0.0, 0.3, 1.2, 4.0, 4.3, 5.2),
x2 = c(0.0, 0.0, 0.0, 4.0, 4.1, 4.0)
)
reference <- as.big.matrix(as.matrix(reference_points[c("x1", "x2")]))
reference_points
#> id x1 x2
#> 1 p1 0.0 0.0
#> 2 p2 0.3 0.0
#> 3 p3 1.2 0.0
#> 4 p4 4.0 4.0
#> 5 p5 4.3 4.1
#> 6 p6 5.2 4.0The returned graph objects always refer to rows of the reference
matrix. In other words, vertices 1:6 correspond to
p1:p6.
k-NN graphs with
knn_graph_bigmatrix()A directed k-nearest-neighbour graph stores one outgoing
edge per retained neighbour for every row. With
include_distance = TRUE, each edge value is the exact
distance between the source row and the neighbour row.
directed_knn <- knn_graph_bigmatrix(
reference,
k = 1,
format = "edge_list",
symmetrize = "none"
)
clean_graph(directed_knn)
#> from to distance
#> 1 1 2 0.3000000
#> 2 2 1 0.3000000
#> 3 3 2 0.9000000
#> 4 4 5 0.3162278
#> 5 5 4 0.3162278
#> 6 6 5 0.9055385Here:
from is the source rowto is the chosen neighbour rowdistance is the exact Euclidean distanceThis is the most literal graph form: every query row keeps its own directed outgoing edges. If one row points to another but not vice versa, both facts are visible.
k-NN graphs with
mutual_knn_graph_bigmatrix()Mutual k-NN graphs keep only reciprocal neighbour
relationships. They are often sparser and more conservative than
directed k-NN graphs, because one-sided edges are
dropped.
mutual_knn <- mutual_knn_graph_bigmatrix(
reference,
k = 1,
format = "edge_list"
)
clean_graph(mutual_knn)
#> from to distance
#> 1 1 2 0.3000000
#> 2 4 5 0.3162278In this example, only the pairs (1, 2) and
(4, 5) survive. Vertices 3 and 6
each point toward their nearest local anchor, but that preference is not
returned in the other direction, so those edges disappear in the mutual
graph.
mutual_knn_graph_bigmatrix() is equivalent to calling
knn_graph_bigmatrix(..., symmetrize = "mutual"):
radius_graph_bigmatrix()Radius graphs use a fixed distance threshold instead of a fixed
k. That means the number of neighbours can vary from row to
row.
radius_directed <- radius_graph_bigmatrix(
reference,
radius = 1.1,
format = "edge_list",
symmetrize = "none"
)
radius_union <- radius_graph_bigmatrix(
reference,
radius = 1.1,
format = "edge_list",
symmetrize = "union"
)
clean_graph(radius_directed)
#> from to distance
#> 1 1 2 0.3000000
#> 2 2 1 0.3000000
#> 3 2 3 0.9000000
#> 4 3 2 0.9000000
#> 5 4 5 0.3162278
#> 6 5 4 0.3162278
#> 7 5 6 0.9055385
#> 8 6 5 0.9055385
clean_graph(radius_union)
#> from to distance
#> 1 1 2 0.3000000
#> 2 2 3 0.9000000
#> 3 4 5 0.3162278
#> 4 5 6 0.9055385Compared with fixed-k graphs:
In this self-search setting with a symmetric metric,
symmetrize = "union" and symmetrize = "mutual"
coincide because any radius edge that appears in one direction also
appears in the other:
For k-NN and radius graph builders,
symmetrize controls how directed edges are collapsed:
"none" keeps directed edges as-is"union" keeps a pair if either direction appears"mutual" keeps a pair only when both directions
appearThe difference is easiest to see on k = 1:
graph_none <- knn_graph_bigmatrix(reference, k = 1, format = "edge_list", symmetrize = "none")
graph_union <- knn_graph_bigmatrix(reference, k = 1, format = "edge_list", symmetrize = "union")
graph_mutual <- knn_graph_bigmatrix(reference, k = 1, format = "edge_list", symmetrize = "mutual")
data.frame(
symmetrize = c("none", "union", "mutual"),
n_edges = c(nrow(clean_graph(graph_none)), nrow(clean_graph(graph_union)), nrow(clean_graph(graph_mutual))),
row.names = NULL
)
#> symmetrize n_edges
#> 1 none 6
#> 2 union 4
#> 3 mutual 2When distances are stored, bigKNN keeps the minimum
directed distance for the collapsed undirected pair. When unit weights
are stored instead (include_distance = FALSE), the retained
value is the maximum weight, which remains 1 for unweighted
graphs.
as_edge_list(),
as_triplet(), and as_sparse_matrix()Different downstream tasks prefer different graph formats:
triplet_graph <- as_triplet(graph_mutual)
sparse_graph <- as_sparse_matrix(
knn_graph_bigmatrix(
reference,
k = 1,
format = "edge_list",
symmetrize = "union",
include_distance = FALSE
)
)
triplet_graph
#> $i
#> [1] 1 4
#>
#> $j
#> [1] 2 5
#>
#> $x
#> [1] 0.3000000 0.3162278
#>
#> $Dim
#> [1] 6 6
#>
#> attr(,"class")
#> [1] "bigknn_graph_triplet" "bigknn_graph"
#> attr(,"value_name")
#> [1] "distance"
#> attr(,"symmetric")
#> [1] TRUE
class(sparse_graph)
#> [1] "dgCMatrix"
#> attr(,"package")
#> [1] "Matrix"
Matrix::summary(sparse_graph)
#> 6 x 6 sparse Matrix of class "dgCMatrix", with 8 entries
#> i j x
#> 1 2 1 1
#> 2 1 2 1
#> 3 3 2 1
#> 4 2 3 1
#> 5 5 4 1
#> 6 4 5 1
#> 7 6 5 1
#> 8 5 6 1One subtle but important point: symmetric edge lists and triplets
store each undirected pair once, while the sparse matrix representation
stores both matrix entries (i, j) and (j, i).
That is why the sparse summary above shows twice as many non-zero
entries as the symmetrized edge list.
You can round-trip back to an edge list when needed:
Once a graph is in sparse-matrix form, ordinary matrix operations become useful immediately. For an unweighted symmetric adjacency, row sums give vertex degrees:
degree <- Matrix::rowSums(sparse_graph)
data.frame(
vertex = reference_points$id,
degree = as.numeric(degree),
row.names = NULL
)
#> vertex degree
#> 1 p1 1
#> 2 p2 2
#> 3 p3 1
#> 4 p4 1
#> 5 p5 2
#> 6 p6 1The same graph could also be handed off to a separate graph package after conversion:
as_edge_list() for packages that ingest edge
tablesas_sparse_matrix() for adjacency-based
workflowsas_triplet() when you want a lightweight sparse
representation without constructing a matrix object yetThat keeps bigKNN focused on exact search and exact
graph construction, without forcing a particular downstream graph
ecosystem on the user.