std.algorithm.setops

This is a submodule of std.algorithm. It contains generic algorithms that implement set operations.

The functions multiwayMerge, multiwayUnion, setDifference, setIntersection, setSymmetricDifference expect a range of sorted ranges as input.

All algorithms are generalized to accept as input not only sets but also multisets. Each algorithm documents behaviour in the presence of duplicated inputs.

Cheat Sheet
Function Name Description
cartesianProduct Computes Cartesian product of two ranges.
largestPartialIntersection Copies out the values that occur most frequently in a range of ranges.
largestPartialIntersectionWeighted Copies out the values that occur most frequently (multiplied by per-value weights) in a range of ranges.
multiwayMerge Merges a range of sorted ranges.
multiwayUnion Computes the union of a range of sorted ranges.
setDifference Lazily computes the set difference of two or more sorted ranges.
setIntersection Lazily computes the intersection of two or more sorted ranges.
setSymmetricDifference Lazily computes the symmetric set difference of two or more sorted ranges.

License:
Boost License 1.0.
Authors:
Andrei Alexandrescu
Source
std/algorithm/setops.d
auto cartesianProduct(R1, R2)(R1 range1, R2 range2)
Constraints: if (!allSatisfy!(isForwardRange, R1, R2) || anySatisfy!(isInfinite, R1, R2));

auto cartesianProduct(RR...)(RR ranges)
Constraints: if (ranges.length >= 2 && allSatisfy!(isForwardRange, RR) && !anySatisfy!(isInfinite, RR));

auto cartesianProduct(R1, R2, RR...)(R1 range1, R2 range2, RR otherRanges)
Constraints: if (!allSatisfy!(isForwardRange, R1, R2, RR) || anySatisfy!(isInfinite, R1, R2, RR));

Lazily computes the Cartesian product of two or more ranges. The product is a range of tuples of elements from each respective range.

The conditions for the two-range case are as follows:

If both ranges are finite, then one must be (at least) a forward range and the other an input range.

If one range is infinite and the other finite, then the finite range must be a forward range, and the infinite range can be an input range.

If both ranges are infinite, then both must be forward ranges.

When there are more than two ranges, the above conditions apply to each adjacent pair of ranges.

Parameters:
R1 range1 The first range
R2 range2 The second range
RR ranges Two or more non-infinite forward ranges
RR otherRanges Zero or more non-infinite forward ranges
Returns:
A forward range of std.typecons.Tuple representing elements of the cartesian product of the given ranges.
Examples:
import std.algorithm.searching : canFind;
import std.range;
import std.typecons : tuple;

auto N = sequence!"n"(0);         // the range of natural numbers
auto N2 = cartesianProduct(N, N); // the range of all pairs of natural numbers

// Various arbitrary number pairs can be found in the range in finite time.
assert(canFind(N2, tuple(0, 0)));
assert(canFind(N2, tuple(123, 321)));
assert(canFind(N2, tuple(11, 35)));
assert(canFind(N2, tuple(279, 172)));
Examples:
import std.algorithm.searching : canFind;
import std.typecons : tuple;

auto B = [ 1, 2, 3 ];
auto C = [ 4, 5, 6 ];
auto BC = cartesianProduct(B, C);

foreach (n; [[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5], [1, 6],
             [2, 6], [3, 6]])
{
    assert(canFind(BC, tuple(n[0], n[1])));
}
Examples:
import std.algorithm.comparison : equal;
import std.typecons : tuple;

auto A = [ 1, 2, 3 ];
auto B = [ 'a', 'b', 'c' ];
auto C = [ "x", "y", "z" ];
auto ABC = cartesianProduct(A, B, C);

assert(ABC.equal([
    tuple(1, 'a', "x"), tuple(1, 'a', "y"), tuple(1, 'a', "z"),
    tuple(1, 'b', "x"), tuple(1, 'b', "y"), tuple(1, 'b', "z"),
    tuple(1, 'c', "x"), tuple(1, 'c', "y"), tuple(1, 'c', "z"),
    tuple(2, 'a', "x"), tuple(2, 'a', "y"), tuple(2, 'a', "z"),
    tuple(2, 'b', "x"), tuple(2, 'b', "y"), tuple(2, 'b', "z"),
    tuple(2, 'c', "x"), tuple(2, 'c', "y"), tuple(2, 'c', "z"),
    tuple(3, 'a', "x"), tuple(3, 'a', "y"), tuple(3, 'a', "z"),
    tuple(3, 'b', "x"), tuple(3, 'b', "y"), tuple(3, 'b', "z"),
    tuple(3, 'c', "x"), tuple(3, 'c', "y"), tuple(3, 'c', "z")
]));
void largestPartialIntersection(alias less = "a < b", RangeOfRanges, Range)(RangeOfRanges ror, Range tgt, SortOutput sorted = No.sortOutput);

Given a range of sorted forward ranges ror, copies to tgt the elements that are common to most ranges, along with their number of occurrences. All ranges in ror are assumed to be sorted by less. Only the most frequent tgt.length elements are returned.

Parameters:
less The predicate the ranges are sorted by.
RangeOfRanges ror A range of forward ranges sorted by less.
Range tgt The target range to copy common elements to.
SortOutput sorted Whether the elements copied should be in sorted order. The function largestPartialIntersection is useful for e.g. searching an inverted index for the documents most likely to contain some terms of interest. The complexity of the search is Ο(n * log(tgt.length)), where n is the sum of lengths of all input ranges. This approach is faster than keeping an associative array of the occurrences and then selecting its top items, and also requires less memory (largestPartialIntersection builds its result directly in tgt and requires no extra memory). If at least one of the ranges is a multiset, then all occurences of a duplicate element are taken into account. The result is equivalent to merging all ranges and picking the most frequent tgt.length elements.
Warning
Because largestPartialIntersection does not allocate extra memory, it will leave ror modified. Namely, largestPartialIntersection assumes ownership of ror and discretionarily swaps and advances elements of it. If you want ror to preserve its contents after the call, you may want to pass a duplicate to largestPartialIntersection (and perhaps cache the duplicate in between calls).
Examples:
import std.typecons : tuple, Tuple;

// Figure which number can be found in most arrays of the set of
// arrays below.
double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto b = new Tuple!(double, uint)[1];
// it will modify the input range, hence we need to create a duplicate
largestPartialIntersection(a.dup, b);
// First member is the item, second is the occurrence count
writeln(b[0]); // tuple(7.0, 4u)
// 7.0 occurs in 4 out of 5 inputs, more than any other number

// If more of the top-frequent numbers are needed, just create a larger
// tgt range
auto c = new Tuple!(double, uint)[2];
largestPartialIntersection(a, c);
writeln(c[0]); // tuple(1.0, 3u)
// 1.0 occurs in 3 inputs

// multiset
double[][] x =
[
    [1, 1, 1, 1, 4, 7, 8],
    [1, 7],
    [1, 7, 8],
    [4, 7],
    [7]
];
auto y = new Tuple!(double, uint)[2];
largestPartialIntersection(x.dup, y);
// 7.0 occurs 5 times
writeln(y[0]); // tuple(7.0, 5u)
// 1.0 occurs 6 times
writeln(y[1]); // tuple(1.0, 6u)
void largestPartialIntersectionWeighted(alias less = "a < b", RangeOfRanges, Range, WeightsAA)(RangeOfRanges ror, Range tgt, WeightsAA weights, SortOutput sorted = No.sortOutput);

Similar to largestPartialIntersection, but associates a weight with each distinct element in the intersection.

If at least one of the ranges is a multiset, then all occurences of a duplicate element are taken into account. The result is equivalent to merging all input ranges and picking the highest tgt.length, weight-based ranking elements.

Parameters:
less The predicate the ranges are sorted by.
RangeOfRanges ror A range of forward ranges sorted by less.
Range tgt The target range to copy common elements to.
WeightsAA weights An associative array mapping elements to weights.
SortOutput sorted Whether the elements copied should be in sorted order.
Examples:
import std.typecons : tuple, Tuple;

// Figure which number can be found in most arrays of the set of
// arrays below, with specific per-element weights
double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto b = new Tuple!(double, uint)[1];
double[double] weights = [ 1:1.2, 4:2.3, 7:1.1, 8:1.1 ];
largestPartialIntersectionWeighted(a, b, weights);
// First member is the item, second is the occurrence count
writeln(b[0]); // tuple(4.0, 2u)
// 4.0 occurs 2 times -> 4.6 (2 * 2.3)
// 7.0 occurs 3 times -> 4.4 (3 * 1.1)

// multiset
double[][] x =
[
    [ 1, 1, 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto y = new Tuple!(double, uint)[1];
largestPartialIntersectionWeighted(x, y, weights);
writeln(y[0]); // tuple(1.0, 5u)
// 1.0 occurs 5 times -> 1.2 * 5 = 6
struct MultiwayMerge(alias less, RangeOfRanges);

MultiwayMerge!(less, RangeOfRanges) multiwayMerge(alias less = "a < b", RangeOfRanges)(RangeOfRanges ror);

Merges multiple sets. The input sets are passed as a range of ranges and each is assumed to be sorted by less. Computation is done lazily, one union element at a time. The complexity of one popFront operation is Ο(log(ror.length)). However, the length of ror decreases as ranges in it are exhausted, so the complexity of a full pass through MultiwayMerge is dependent on the distribution of the lengths of ranges contained within ror. If all ranges have the same length n (worst case scenario), the complexity of a full pass through MultiwayMerge is Ο(n * ror.length * log(ror.length)), i.e., log(ror.length) times worse than just spanning all ranges in turn. The output comes sorted (unstably) by less.

The length of the resulting range is the sum of all lengths of the ranges passed as input. This means that all elements (duplicates included) are transferred to the resulting range.

For backward compatibility, multiwayMerge is available under the name nWayUnion and MultiwayMerge under the name of NWayUnion . Future code should use multiwayMerge and MultiwayMerge as nWayUnion and NWayUnion will be deprecated.

Parameters:
less Predicate the given ranges are sorted by.
RangeOfRanges ror A range of ranges sorted by less to compute the union for.
Returns:
A range of the union of the ranges in ror.
Warning
Because MultiwayMerge does not allocate extra memory, it will leave ror modified. Namely, MultiwayMerge assumes ownership of ror and discretionarily swaps and advances elements of it. If you want ror to preserve its contents after the call, you may want to pass a duplicate to MultiwayMerge (and perhaps cache the duplicate in between calls).
See Also:
std.algorithm.sorting.merge for an analogous function that takes a static number of ranges of possibly disparate types.
Examples:
import std.algorithm.comparison : equal;

double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
auto witness = [
    1, 1, 1, 4, 4, 7, 7, 7, 7, 8, 8
];
assert(equal(multiwayMerge(a), witness));

double[][] b =
[
    // range with duplicates
    [ 1, 1, 4, 7, 8 ],
    [ 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];
// duplicates are propagated to the resulting range
assert(equal(multiwayMerge(b), witness));
static bool compFront(.ElementType!RangeOfRanges a, .ElementType!RangeOfRanges b);
this(RangeOfRanges ror);
@property bool empty();
@property ref auto front();
void popFront();
auto multiwayUnion(alias less = "a < b", RangeOfRanges)(RangeOfRanges ror);

Computes the union of multiple ranges. The input ranges are passed as a range of ranges and each is assumed to be sorted by less. Computation is done lazily, one union element at a time. multiwayUnion(ror) is functionally equivalent to multiwayMerge(ror).uniq.

"The output of multiwayUnion has no duplicates even when its inputs contain duplicates."

Parameters:
less Predicate the given ranges are sorted by.
RangeOfRanges ror A range of ranges sorted by less to compute the intersection for.
Returns:
A range of the union of the ranges in ror. See also: multiwayMerge
Examples:
import std.algorithm.comparison : equal;

// sets
double[][] a =
[
    [ 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 8],
    [ 4 ],
    [ 7 ],
];

auto witness = [1, 4, 7, 8];
assert(equal(multiwayUnion(a), witness));

// multisets
double[][] b =
[
    [ 1, 1, 1, 4, 7, 8 ],
    [ 1, 7 ],
    [ 1, 7, 7, 8],
    [ 4 ],
    [ 7 ],
];
assert(equal(multiwayUnion(b), witness));

double[][] c =
[
    [9, 8, 8, 8, 7, 6],
    [9, 8, 6],
    [9, 8, 5]
];
auto witness2 = [9, 8, 7, 6, 5];
assert(equal(multiwayUnion!"a > b"(c), witness2));
struct SetDifference(alias less = "a < b", R1, R2) if (isInputRange!R1 && isInputRange!R2);

SetDifference!(less, R1, R2) setDifference(alias less = "a < b", R1, R2)(R1 r1, R2 r2);

Lazily computes the difference of r1 and r2. The two ranges are assumed to be sorted by less. The element types of the two ranges must have a common type.

In the case of multisets, considering that element a appears x times in r1 and y times and r2, the number of occurences of a in the resulting range is going to be x-y if x > y or 0 otherwise.

Parameters:
less Predicate the given ranges are sorted by.
R1 r1 The first range.
R2 r2 The range to subtract from r1.
Returns:
A range of the difference of r1 and r2.
See Also:
setSymmetricDifference
Examples:
import std.algorithm.comparison : equal;
import std.range.primitives : isForwardRange;

//sets
int[] a = [ 1, 2, 4, 5, 7, 9 ];
int[] b = [ 0, 1, 2, 4, 7, 8 ];
assert(equal(setDifference(a, b), [5, 9]));
static assert(isForwardRange!(typeof(setDifference(a, b))));

// multisets
int[] x = [1, 1, 1, 2, 3];
int[] y = [1, 1, 2, 4, 5];
auto r = setDifference(x, y);
assert(equal(r, [1, 3]));
assert(setDifference(r, x).empty);
this(R1 r1, R2 r2);
void popFront();
@property ref auto front();
@property typeof(this) save();
@property bool empty();
struct SetIntersection(alias less = "a < b", Rs...) if (Rs.length >= 2 && allSatisfy!(isInputRange, Rs) && !is(CommonType!(staticMap!(ElementType, Rs)) == void));

SetIntersection!(less, Rs) setIntersection(alias less = "a < b", Rs...)(Rs ranges)
Constraints: if (Rs.length >= 2 && allSatisfy!(isInputRange, Rs) && !is(CommonType!(staticMap!(ElementType, Rs)) == void));

Lazily computes the intersection of two or more input ranges ranges. The ranges are assumed to be sorted by less. The element types of the ranges must have a common type.

In the case of multisets, the range with the minimum number of occurences of a given element, propagates the number of occurences of this element to the resulting range.

Parameters:
less Predicate the given ranges are sorted by.
Rs ranges The ranges to compute the intersection for.
Returns:
A range containing the intersection of the given ranges.
Examples:
import std.algorithm.comparison : equal;

// sets
int[] a = [ 1, 2, 4, 5, 7, 9 ];
int[] b = [ 0, 1, 2, 4, 7, 8 ];
int[] c = [ 0, 1, 4, 5, 7, 8 ];
assert(equal(setIntersection(a, a), a));
assert(equal(setIntersection(a, b), [1, 2, 4, 7]));
assert(equal(setIntersection(a, b, c), [1, 4, 7]));

// multisets
int[] d = [ 1, 1, 2, 2, 7, 7 ];
int[] e = [ 1, 1, 1, 7];
assert(equal(setIntersection(a, d), [1, 2, 7]));
assert(equal(setIntersection(d, e), [1, 1, 7]));
this(Rs input);
@property bool empty();
void popFront();
@property ElementType front();
@property SetIntersection save();
struct SetSymmetricDifference(alias less = "a < b", R1, R2) if (isInputRange!R1 && isInputRange!R2);

SetSymmetricDifference!(less, R1, R2) setSymmetricDifference(alias less = "a < b", R1, R2)(R1 r1, R2 r2);

Lazily computes the symmetric difference of r1 and r2, i.e. the elements that are present in exactly one of r1 and r2. The two ranges are assumed to be sorted by less, and the output is also sorted by less. The element types of the two ranges must have a common type.

If both ranges are sets (without duplicated elements), the resulting range is going to be a set. If at least one of the ranges is a multiset, the number of occurences of an element x in the resulting range is abs(a-b) where a is the number of occurences of x in r1, b is the number of occurences of x in r2, and abs is the absolute value.

If both arguments are ranges of L-values of the same type then SetSymmetricDifference will also be a range of L-values of that type.

Parameters:
less Predicate the given ranges are sorted by.
R1 r1 The first range.
R2 r2 The second range.
Returns:
A range of the symmetric difference between r1 and r2.
See Also:
setDifference
Examples:
import std.algorithm.comparison : equal;
import std.range.primitives : isForwardRange;

// sets
int[] a = [ 1, 2, 4, 5, 7, 9 ];
int[] b = [ 0, 1, 2, 4, 7, 8 ];
assert(equal(setSymmetricDifference(a, b), [0, 5, 8, 9][]));
static assert(isForwardRange!(typeof(setSymmetricDifference(a, b))));

//mutisets
int[] c = [1, 1, 1, 1, 2, 2, 2, 4, 5, 6];
int[] d = [1, 1, 2, 2, 2, 2, 4, 7, 9];
assert(equal(setSymmetricDifference(c, d), setSymmetricDifference(d, c)));
assert(equal(setSymmetricDifference(c, d), [1, 1, 2, 5, 6, 7, 9]));
this(R1 r1, R2 r2);
void popFront();
@property ref auto front();
@property typeof(this) save();
ref auto opSlice();
@property bool empty();

© 1999–2021 The D Language Foundation
Licensed under the Boost License 1.0.
https://dlang.org/phobos/std_algorithm_setops.html