aerial.utils.coll
Various supplementary collection functions not included in the
standard Clojure ecosystem. Mostly for seqs, but also for
vectors, maps and sets.
coalesce-xy-yx
(coalesce-xy-yx item-coll f)
Coalesces elements of item-coll, which are or have common "keys",
according to the function f. Two keys k1 and k2 are considered
common if (or (= k1 k2) (= (reverse k1) k2)) for reversible keys or
simply (= k1 k2) for non reversible keys. Reversible keys are
vectors, seqs, or string types.
F is a function of two parameters [x v], where x is an element of
item-coll, and v is the current value associated with the key of x
or nil if no association yet exists. F is expected to return the
updated association for the key of x, based on x and v. If x is a
map entry, (key x) is used to determine the association. If x is a
list or vector, (first x) is used to determine the association.
Ex:
(freqn 1 (map #(apply str %) (combins 2 "auauuagcgccg")))
=> {"aa" 3, "cc" 3, "gg" 3, "uu" 3, "ac" 9, "cg" 4,
"ag" 9, "ua" 4, "uc" 9, "ug" 9, "au" 5, "gc" 5}
(coalesce-xy-yx *1 (fn [x v] (if (not v) (val x) (+ (val x) v))))
=> {"aa" 3, "cc" 3, "gg" 3, "uu" 3, "ac" 9, "cg" 9,
"ag" 9, "ua" 9, "uc" 9, "ug" 9}
concatv
(concatv)
(concatv coll & colls)
Eager concat. WARNING: Use with caution on large colls. Will loop
forever on infinite colls!
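One way to get an eager concat with core functions only is to pour every collection into a single vector. This is a sketch under that assumption; `concatv-sketch` is hypothetical. It also shows why an infinite input never returns: `into` must consume every element.

```clojure
;; Hypothetical eager concat: realizes everything into one vector.
(defn concatv-sketch
  ([] [])
  ([coll & colls]
   (reduce into (vec coll) colls)))

(concatv-sketch [1 2] '(3 4) #{5}) ;; => [1 2 3 4 5]
```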
drop-until
(drop-until pred coll)
dropv
(dropv n coll)
Eager drop. Uses transducers to eagerly drop from a coll.
WARNING: Use with caution on large colls. Will loop forever on
infinite colls!
dropv-until
(dropv-until pred coll)
Eager drop-until. Uses transducers to eagerly drop from a coll.
WARNING: Use with caution on large colls. May loop forever on
infinite colls!
dropv-while
(dropv-while pred coll)
Eager drop-while. Uses transducers to eagerly drop from a coll.
WARNING: Use with caution on large colls. May loop forever on
infinite colls!
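The "uses transducers to eagerly drop" pattern can be sketched with `into` and the `drop` / `drop-while` transducers from clojure.core. The names `dropv-sketch` and `dropv-while-sketch` are hypothetical; dropv-until is left out because its exact semantics (whether the first matching element is kept) are not specified here.

```clojure
;; Hypothetical sketches: `into` runs the transducer over coll and
;; collects whatever survives into a vector, realizing the whole input.
(defn dropv-sketch [n coll]
  (into [] (drop n) coll))

(defn dropv-while-sketch [pred coll]
  (into [] (drop-while pred) coll))

(dropv-sketch 2 [1 2 3 4])            ;; => [3 4]
(dropv-while-sketch odd? [1 3 4 5 7]) ;; => [4 5 7]
```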
ensure-vec
(ensure-vec x)
Return a vector representation for x. If x is a vector, return it;
if it is seqable, return (vec x); if it is an "atom" (any
non-seqable value), return [x].
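The three cases map directly onto a `cond`. This is a sketch, assuming Clojure 1.9+ for `seqable?`; `ensure-vec-sketch` is hypothetical. Under this sketch nil becomes [] and a string becomes a vector of characters, which may differ from the library's choices.

```clojure
;; Hypothetical sketch of the three documented cases.
(defn ensure-vec-sketch [x]
  (cond
    (vector? x)  x        ; already a vector: return as is
    (seqable? x) (vec x)  ; lists, seqs, sets, maps, strings, nil
    :else        [x]))    ; "atom": wrap in a one-element vector

(ensure-vec-sketch '(1 2)) ;; => [1 2]
(ensure-vec-sketch 42)     ;; => [42]
```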
in
(in e coll)
Return whether e is an element of coll.
map->csv-map
(map->csv-map map)
(map->csv-map prefix map)
Transforms a nested map into a "flattened" map where keys are
column names formed by concatenating the path of keys to each
element. If prefix is given, it is prepended (joined with "_") to
each column name.
Ex:
(map->csv-map {:one {:a 1 :b 2}, :two {"hi" 1 "there" 7}})
=> {"one_a" [1], "one_b" [2], "two_hi" [1], "two_there" [7]}
(map->csv-map "P" {:one {:a 1 :b 2}, :two {"hi" 1 "there" 7}})
=> {"P_one_a" [1], "P_one_b" [2], "P_two_hi" [1], "P_two_there" [7]}
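A recursive sketch that reproduces both examples above: walk the map, join the key path with "_", and wrap each leaf value in a vector. `map->csv-map-sketch` and its helper `key->str` are hypothetical, and the sketch assumes keyword or string keys.

```clojure
;; Hypothetical helper: keywords contribute their name, anything else its str.
(defn- key->str [k] (if (keyword? k) (name k) (str k)))

(defn map->csv-map-sketch
  ([m] (map->csv-map-sketch nil m))
  ([prefix m]
   (into {}
         (mapcat (fn [[k v]]
                   (let [col (if prefix
                               (str prefix "_" (key->str k))
                               (key->str k))]
                     (if (map? v)
                       (map->csv-map-sketch col v) ; recurse, extending the path
                       [[col [v]]])))              ; leaf: one column, one value
                 m))))

(map->csv-map-sketch "P" {:one {:a 1 :b 2}})
;; => {"P_one_a" [1], "P_one_b" [2]}
```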
map-entry?
(map-entry? x)
Return whether x is a map entry
merge-with*
added in 1.jsa
(merge-with* f & maps)
Like merge-with, but calls the user-supplied F with the KEY as well.
Returns a map that consists of the rest of the maps conj-ed onto the
first. If a key occurs in more than one map, the mapping(s) from
the latter (left-to-right) will be combined with the mapping in the
result by calling (f key val-in-result val-in-latter).
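A sketch of the documented contract, modeled loosely on clojure.core/merge-with but passing the key to f. `merge-with-key-sketch` is a hypothetical name; like merge-with, it returns nil when every map is nil.

```clojure
;; Hypothetical sketch: fold maps left to right, calling
;; (f key val-in-result val-in-latter) on key collisions.
(defn merge-with-key-sketch [f & maps]
  (when (some identity maps)
    (reduce (fn [acc m]
              (reduce-kv (fn [a k v]
                           (assoc a k (if (contains? a k)
                                        (f k (get a k) v)
                                        v)))
                         (or acc {})
                         (or m {})))
            maps)))

;; Sum values under :x, keep the latter value for other keys.
(merge-with-key-sketch (fn [k a b] (if (= k :x) (+ a b) b))
                       {:x 1 :y 1} {:x 2 :y 2})
;; => {:x 3, :y 2}
```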
partitionv-all
(partitionv-all n coll)
Eager partition-all. Uses transducers to eagerly partition coll
into partitions of size n (with possibly fewer than n items at the
end).
WARNING: Use with caution on large colls. Will loop forever on
infinite colls!
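The `partition-all` transducer already emits vectors, so an eager version can be sketched as a single `into`. `partitionv-all-sketch` is hypothetical.

```clojure
;; Hypothetical sketch: eager partition-all into a vector of vectors.
(defn partitionv-all-sketch [n coll]
  (into [] (partition-all n) coll))

(partitionv-all-sketch 2 [1 2 3 4 5]) ;; => [[1 2] [3 4] [5]]
```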
pos
(pos x coll)
Returns a lazy seq of positions of X within COLL taken as a sequence
pos-any
(pos-any test-coll coll)
Returns a lazy seq of positions of any element of TEST-COLL within
COLL taken as a sequence
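Both position functions can be sketched with `keep-indexed`, which stays lazy and so also works on infinite colls when only a prefix of the result is consumed. `pos-sketch` and `pos-any-sketch` are hypothetical names.

```clojure
;; Hypothetical sketches: keep the index whenever the element matches.
(defn pos-sketch [x coll]
  (keep-indexed (fn [i e] (when (= e x) i)) coll))

(defn pos-any-sketch [test-coll coll]
  (let [targets (set test-coll)]
    (keep-indexed (fn [i e] (when (contains? targets e) i)) coll)))

(pos-sketch \a "banana")       ;; => (1 3 5)
(pos-any-sketch "nb" "banana") ;; => (0 2 4)
```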
positions
(positions pred coll)
pxmap
(pxmap f par coll)
(pxmap f par coll1 coll2)
(pxmap f par coll1 coll2 & colls)
Constrained pmap. Constrain pmap to at most par threads.
Generally, to ensure non degrading behavior, par should be
<= (.. Runtime getRuntime availableProcessors). It can be more,
but if par >> availableProcessors, thrashing (excessive context
switching) can become an issue. Nevertheless, there are cases
where having par be larger can reduce the ill effects of the
partition problem. NOTE: no effort is made to provide the true (or
even a "good") solution to the partitioning of f over coll(s).
Effectively, (pmap #(map f %) (partition-all (/ (count coll) par) coll)), concatenated.
Implicit doall on results to force execution. For multiple
collection variants, chunks the _transpose_ of the collection of
collections.
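A sketch of the single-collection case: split coll into at most par chunks and let pmap run one task per chunk, so no more than par tasks exist at once. `pxmap-sketch` is hypothetical and makes no attempt at a good partitioning. Because pmap runs on futures, a standalone script using it may need (shutdown-agents) to exit promptly.

```clojure
;; Hypothetical sketch: at most `par` chunks, one pmap task per chunk.
(defn pxmap-sketch [f par coll]
  (let [v      (vec coll)
        size   (max 1 (long (Math/ceil (/ (count v) (double par)))))
        chunks (partition-all size v)]
    (doall                                 ; force execution of every chunk
     (apply concat
            (pmap (fn [chunk] (doall (map f chunk))) chunks)))))

(pxmap-sketch inc 3 (range 10)) ;; => (1 2 3 4 5 6 7 8 9 10)
```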
random-subset
(random-subset s n)
Create a "random" N element subset of the collection s treated as a set,
i.e., s with no duplicate elements. If n <= 0, return #{}, the
empty set. If (count (set s)) <= n, return (set s). Otherwise,
pick N random elements from (set s) to form subset.
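The three documented cases translate directly; the random pick below uses `shuffle` from clojure.core. `random-subset-sketch` is hypothetical and not necessarily how the library draws its elements.

```clojure
;; Hypothetical sketch of the documented cases.
(defn random-subset-sketch [s n]
  (let [st (set s)]
    (cond
      (<= n 0)           #{}                          ; empty set
      (<= (count st) n)  st                           ; whole set
      :else              (set (take n (shuffle st)))))) ; n random elements

(random-subset-sketch [1 2 2] 5) ;; => #{1 2}
```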
reducem
(reducem f fr coll)
(reducem f fr coll1 & colls)
Multiple collection reduction. FR is a reducing function which must
return an identity value when called with no arguments. F is a
transform function that is applied to the arguments from the
supplied collections (treated as seqs). Note, for the first
application, the result is (fr (fr) (f ...)).
By default, reduction proceeds on the results of F applied over the
_cross product_ of the collections. If reduction should proceed
over collections in parallel, the first "coll" given should be
the special keyword :||. If given, this causes F to be applied to
the elements of colls as stepped in parallel: (f coll1[i] coll2[i]
... colln[i]) for 0 <= i < (count smallest-given-coll).
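A sketch of both reduction modes, assuming the argument tuples are built first and then reduced with (fr) as the initial value, which gives (fr (fr) (f ...)) on the first step. `reducem-sketch` and the helper `cross` are hypothetical; the parallel mode assumes at least one coll follows :||.

```clojure
(defn- cross
  "Hypothetical helper: every tuple taking one element from each coll."
  [colls]
  (reduce (fn [acc c] (for [xs acc, x c] (conj xs x))) [[]] colls))

(defn reducem-sketch [f fr & colls]
  (let [tuples (if (= :|| (first colls))
                 (apply map vector (rest colls)) ; step colls in parallel
                 (cross colls))]                 ; default: cross product
    (reduce fr (fr) (map #(apply f %) tuples))))

(reducem-sketch * + [1 2] [10 20])     ;; => 90, i.e. 10 + 20 + 20 + 40
(reducem-sketch * + :|| [1 2] [10 20]) ;; => 50, i.e. 10 + 40
```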
rotate
(rotate coll)
(rotate n coll)
Rotate (seq coll) by n positions. In single arg case, n=1.
rotations
(rotations x)
Returns a lazy seq of all rotations of a seq
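A sketch of rotate and rotations, assuming rotation moves elements to the left (the first n elements go to the end) and that n wraps modulo the length. `rotate-sketch` and `rotations-sketch` are hypothetical; the library's direction may differ.

```clojure
;; Hypothetical sketch: left rotation by n, wrapping with mod.
(defn rotate-sketch
  ([coll] (rotate-sketch 1 coll))
  ([n coll]
   (let [s (seq coll)]
     (if (empty? s)
       ()
       (let [k (mod n (count s))]
         (concat (drop k s) (take k s)))))))

;; Lazy seq of the rotations by 0, 1, ..., (dec (count x)).
(defn rotations-sketch [x]
  (let [s (seq x)]
    (map #(rotate-sketch % s) (range (count s)))))

(rotations-sketch [1 2 3]) ;; => ((1 2 3) (2 3 1) (3 1 2))
```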
separate
(separate f s)
Returns a vector:
[(filter f s) (filter (complement f) s)]
separatev
(separatev f s)
Eager separate, returns [(filterv f s) (filterv (complement f) s)]
WARNING: Uses eager drop; use with caution on large colls. Will
loop forever on infinite colls!
sliding-take
(sliding-take n coll)
(sliding-take d n coll)
Sliding window take. N is the "window" size to slide across
collection COLL treated as a sequence. D is the slide displacement
and defaults to 1.
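For intuition, clojure.core's `partition` with a step argument produces the same kind of sliding windows. This sketch assumes incomplete windows at the end are dropped, which is what `partition` does; `sliding-take-sketch` is hypothetical and the library may treat the tail differently.

```clojure
;; Hypothetical sketch: windows of size n, advancing d elements each time.
(defn sliding-take-sketch
  ([n coll] (sliding-take-sketch 1 n coll))
  ([d n coll] (partition n d coll)))

(sliding-take-sketch 3 [1 2 3 4 5])   ;; => ((1 2 3) (2 3 4) (3 4 5))
(sliding-take-sketch 2 2 [1 2 3 4 5]) ;; => ((1 2) (3 4))
```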
splitv-at
(splitv-at n coll)
Eager split. Uses transducers to eagerly split a coll at position n.
WARNING: Uses eager drop; use with caution on large colls. Will
loop forever on infinite colls!
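An eager split can be sketched as a take transducer and a drop transducer over the same coll. The eager drop half is what never finishes on an infinite input. `splitv-at-sketch` is hypothetical.

```clojure
;; Hypothetical sketch: [(first n elements) (the rest)], both vectors.
(defn splitv-at-sketch [n coll]
  [(into [] (take n) coll)
   (into [] (drop n) coll)])

(splitv-at-sketch 2 [1 2 3 4 5]) ;; => [[1 2] [3 4 5]]
```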
subsets
(subsets coll)
All the subsets of elements of coll
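A common way to build all subsets (the power set) is to start from the empty set and, for each element, add a copy of every existing subset extended with that element. This sketch returns subsets as sets and ignores duplicate elements; `subsets-sketch` is hypothetical, and the library's result type may differ.

```clojure
;; Hypothetical sketch: doubles the collection of subsets per element.
(defn subsets-sketch [coll]
  (reduce (fn [acc x]
            ;; every existing subset, plus a copy of each with x added
            (into acc (map #(conj % x)) acc))
          [#{}]
          (distinct coll)))

(count (subsets-sketch [1 2 3])) ;; => 8
```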
take-until
(take-until pred coll)
take-until-nochange
(take-until-nochange sq & {:keys [elt=], :or {elt= =}})
Eagerly takes from SQ until consecutive elements are the same. That
is, takes up to and including element Ei, where Ei = Ei+1. Equality
of elements is determined by elt=, which defaults to =.
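A loop-based sketch of the documented behavior, which is handy for iterating a function to a fixed point. `take-until-nochange-sketch` is hypothetical; it keeps the first of the two equal elements and stops.

```clojure
;; Hypothetical sketch: collect elements until one equals its successor.
(defn take-until-nochange-sketch
  [sq & {:keys [elt=] :or {elt= =}}]
  (loop [acc [] s (seq sq)]
    (if-not s
      acc                                  ; ran out: no repeat found
      (let [x (first s), nxt (next s)]
        (if (and nxt (elt= x (first nxt)))
          (conj acc x)                     ; Ei = Ei+1: stop after Ei
          (recur (conj acc x) nxt))))))

;; Halving until the value stops changing.
(take-until-nochange-sketch (iterate #(quot % 2) 100))
;; => [100 50 25 12 6 3 1 0]
```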
takev
(takev n coll)
Eager take. Uses transducers to eagerly take from a coll
takev-until
(takev-until pred coll)
Eager take-until. Uses transducers to eagerly take from a coll.
WARNING: Use with caution on large colls. May loop forever on
infinite colls!
takev-while
(takev-while pred coll)
Eager take-while. Uses transducers to eagerly take from a coll.
WARNING: Use with caution on large colls. May loop forever on
infinite colls!
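Sketches of the eager takes with the `take` and `take-while` transducers. They also show why the take-while warning says "May": the transducer stops early as soon as pred fails, so an infinite coll is fine unless pred never fails. `takev-sketch` and `takev-while-sketch` are hypothetical.

```clojure
;; Hypothetical sketches: both transducers can end the reduction early.
(defn takev-sketch [n coll]
  (into [] (take n) coll))

(defn takev-while-sketch [pred coll]
  (into [] (take-while pred) coll))

(takev-sketch 3 (range))              ;; => [0 1 2]
(takev-while-sketch #(< % 4) (range)) ;; => [0 1 2 3]
```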
transpose
(transpose colls)
(transpose coll1 coll2 & colls)
Matrix transposition. Well, effectively. Can be used in some
other contexts, but does the same computation. Takes colls, a
collection of collections, and treats it as a matrix of (count colls)
rows, each row being a string or seqable data structure: M[rij],
where rij is the jth element of the ith row. Returns M' = M[cji],
where cji is the ith element of the jth column of M.
For the cases where colls is a string or a collection of strings,
returns M' with rows as strings (effectively M[(apply str cji)]).
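The core of the transposition is the classic `(apply map vector rows)` idiom, with string rows joined back into strings. This is a sketch under assumptions: `transpose-sketch` is hypothetical, it truncates to the shortest row, and it does not handle colls being a single string.

```clojure
;; Hypothetical sketch: columns become rows; string rows stay strings.
(defn transpose-sketch
  ([colls]
   (let [cols (if (seq colls) (apply map vector colls) ())]
     (if (every? string? colls)
       (map #(apply str %) cols)
       cols)))
  ([coll1 coll2 & colls]
   (transpose-sketch (list* coll1 coll2 colls))))

(transpose-sketch ["abc" "def"])       ;; => ("ad" "be" "cf")
(transpose-sketch [[1 2 3] [4 5 6]])   ;; => ([1 4] [2 5] [3 6])
```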
vfold
(vfold f coll)
(vfold f n coll)
(vfold f n coll & colls)
Fold function f over a collection or set of collections (c1, ...,
cn) producing a collection (concrete type of vector). Arity of f
must be equal to the number of collections being folded with
parameters matching the order of the given collections. Folding
here uses the reducers library fold at base, and thus uses
work-stealing fork/join deques to mostly relieve the partition
problem. In signatures
with N given, N provides the packet granularity, or if (< N 1),
granularity is determined automatically (see below) as in the
base (non N case) signature.
While vfold is based on r/fold, it abstracts over the combiner,
reducer, work packet granularity, and transforming multiple
collections for processing by f by chunking the _transpose_ of the
collection of collections.
As indicated above, vfold's fold combiner is monoidal on vectors:
it constructs a new vector from an l and r vector, and has identity
[] (empty vector). In line with this, vfold's reducer builds up
new vectors from elements by conjing (f a1, ... an) onto a prior
result or the combiner identity, [], as initial result.
Packet granularity is determined automatically (base case or N < 1)
or supplied with N >= 1 in signatures with N. Automatic
determination tries to balance significant work chunks (keep thread
overhead low), with chunks that are small enough to have multiple
instances per worker queue (supporting stealing by those workers
that finish ahead of others).
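A sketch of the described design on top of `clojure.core.reducers/fold`: the combiner concatenates vectors with identity [], the reducer conjs (f x) onto the prior result, and multiple colls are handled by folding over their transpose. `vfold-sketch` is hypothetical, and its automatic granularity (roughly four chunks per processor) is an illustrative assumption, not the library's heuristic.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical sketch, not aerial.utils.coll's vfold.
(defn vfold-sketch
  ([f coll] (vfold-sketch f 0 coll))
  ([f n coll]
   (let [v (vec coll)
         n (if (< n 1)
             ;; assumed heuristic: about 4 chunks per available processor
             (max 1 (quot (count v)
                          (* 4 (.availableProcessors (Runtime/getRuntime)))))
             n)]
     (r/fold n
             (fn ([] []) ([l rv] (into l rv))) ; combiner: vector concat, identity []
             (fn [acc x] (conj acc (f x)))     ; reducer: conj (f x) onto prior result
             v)))
  ([f n coll & colls]
   ;; fold over the transpose, spreading each tuple into f
   (vfold-sketch #(apply f %) n (apply map vector coll colls))))

(vfold-sketch + 0 [1 2 3] [10 20 30]) ;; => [11 22 33]
```

Folding a vector keeps left-to-right order because the combiner always joins the left result before the right one.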
xprod
(xprod k coll)
(xprod xfn k coll)
Cross product item generation of size K over COLL. xfn is a
transform function applied to each item set generated, and defaults
to simply aggregating them in a vector.
Examples:
(xprod 2 "ADN")
=> [[\A \A] [\A \D] [\A \N] [\D \A] [\D \D]
[\D \N] [\N \A] [\N \D] [\N \N]]
(xprod str 2 "ADN")
=> ["AA" "AD" "AN" "DA" "DD" "DN" "NA" "ND" "NN"]
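The size-K cross product can be built by extending every partial tuple with every item, K times, which yields the same ordering as the examples above. `xprod-sketch` is hypothetical; its default xfn is `vector`, matching the documented default of aggregating each item set in a vector.

```clojure
;; Hypothetical sketch: grow tuples one position at a time, k times.
(defn xprod-sketch
  ([k coll] (xprod-sketch vector k coll))
  ([xfn k coll]
   (let [items (vec coll)]
     (mapv #(apply xfn %)
           (reduce (fn [acc _] (for [xs acc, x items] (conj xs x)))
                   [[]]
                   (range k))))))

(xprod-sketch str 2 "ADN")
;; => ["AA" "AD" "AN" "DA" "DD" "DN" "NA" "ND" "NN"]
```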
xprod-rng1k
(xprod-rng1k k coll)
(xprod-rng1k xfn k coll)
Cross product item generation ranging from size 1 to K over
COLL. The cross products for each size are concatenated in order.
xfn is a transform function applied to each item set generated, and
defaults to simply aggregating them in a vector.
Examples:
(xprod-rng1k 1 "ADN")
=> ([\A] [\D] [\N])
(xprod-rng1k 2 "ADN")
=> ([\A] [\D] [\N] [\A \A] [\A \D] [\A \N] [\D \A] [\D \D]
[\D \N] [\N \A] [\N \D] [\N \N])
(xprod-rng1k str 3 "ADN")
=> ("A" "D" "N" "AA" "AD" "AN" "DA" "DD"
"DN" "NA" "ND" "NN" "AAA" "AAD" "AAN"
"ADA" "ADD" "ADN" "ANA" "AND" "ANN" "DAA"
"DAD" "DAN" "DDA" "DDD" "DDN" "DNA" "DND"
"DNN" "NAA" "NAD" "NAN" "NDA" "NDD" "NDN"
"NNA" "NND" "NNN")