JAVA8学习——Stream底层的实现(学习过程) (3)

和Collector收集器一样,同时提供了collector接口和 Collectors的工具类

public final class Spliterators {} public interface Spliterator<T> {} 我们先来看看Spliterator接口的javadoc /** * An object for traversing and partitioning elements of a source. The source * of elements covered by a Spliterator could be, for example, an array, a * {@link Collection}, an IO channel, or a generator function. 一个分割迭代器,是一个对象,用于对源中的元素进行遍历和分区。 源可以是:数组,集合,或者IO通道 * * <p>A Spliterator may traverse elements individually ({@link * #tryAdvance tryAdvance()}) or sequentially in bulk * ({@link #forEachRemaining forEachRemaining()}). 一个迭代器可以一个一个的去遍历。tryAdvance() 也可以以块的方式去遍历。 forEachRemaining() * * <p>A Spliterator may also partition off some of its elements (using * {@link #trySplit}) as another Spliterator, to be used in * possibly-parallel operations. Operations using a Spliterator that * cannot split, or does so in a highly imbalanced or inefficient * manner, are unlikely to benefit from parallelism. Traversal * and splitting exhaust elements; each Spliterator is useful for only a single * bulk computation. 也可以使用 trySplit() 对元素进行分区,形成一个新的元素迭代器。也可以以并行的方式去操作。 使用Spliterator的操作,是不能分割,或者效率非常低的分割, 如果用并行的话,不会获得很大的收益。 (比如,100个元素,分区分为 2+98,这种的就是非常低效的。就无法利用并行的优势了。) 每一个分割迭代器,只对自己特定的块有用。 * * <p>A Spliterator also reports a set of {@link #characteristics()} of its * structure, source, and elements from among {@link #ORDERED}, * {@link #DISTINCT}, {@link #SORTED}, {@link #SIZED}, {@link #NONNULL}, * {@link #IMMUTABLE}, {@link #CONCURRENT}, and {@link #SUBSIZED}. These may * be employed by Spliterator clients to control, specialize or simplify * computation. For example, a Spliterator for a {@link Collection} would * report {@code SIZED}, a Spliterator for a {@link Set} would report * {@code DISTINCT}, and a Spliterator for a {@link SortedSet} would also * report {@code SORTED}. Characteristics are reported as a simple unioned bit * set. 分割迭代器还会 设置 特性值 {@link #ORDERED}, {@link #DISTINCT}, {@link #SORTED}, {@link #SIZED}, {@link #NONNULL}, {@link #IMMUTABLE}, {@link #CONCURRENT}, {@link #SUBSIZED}. 这些属性用来控制特定的某些计算。 比如说,一个 Collection就需要SIZED特性值 Set需要DISTINCT * * Some characteristics additionally constrain method behavior; for example if * {@code ORDERED}, traversal methods must conform to their documented ordering. * New characteristics may be defined in the future, so implementors should not * assign meanings to unlisted values. 不要给没有列出来的值赋予新的含义。 * * <p><a>A Spliterator that does not report {@code IMMUTABLE} or * {@code CONCURRENT} is expected to have a documented policy concerning: * when the spliterator <em>binds</em> to the element source; and detection of * structural interference of the element source detected after binding.</a> A * <em>late-binding</em> Spliterator binds to the source of elements at the * point of first traversal, first split, or first query for estimated size, * rather than at the time the Spliterator is created. A Spliterator that is * not <em>late-binding</em> binds to the source of elements at the point of * construction or first invocation of any method. Modifications made to the * source prior to binding are reflected when the Spliterator is traversed. * After binding a Spliterator should, on a best-effort basis, throw * {@link ConcurrentModificationException} if structural interference is * detected. Spliterators that do this are called <em>fail-fast</em>. The * bulk traversal method ({@link #forEachRemaining forEachRemaining()}) of a * Spliterator may optimize traversal and check for structural interference * after all elements have been traversed, rather than checking per-element and * failing immediately. 并不是说一个迭代器在创建的时候就被绑定到源上面了。而是在满足首次遍历,首次分割,首次查询的时候,才进行绑定。 ConcurrentModificationException,在绑定之前操作,会出现这一行的异常。 叫做 快速失败。 * * <p>Spliterators can provide an estimate of the number of remaining elements * via the {@link #estimateSize} method. Ideally, as reflected in characteristic * {@link #SIZED}, this value corresponds exactly to the number of elements * that would be encountered in a successful traversal. However, even when not * exactly known, an estimated value value may still be useful to operations * being performed on the source, such as helping to determine whether it is * preferable to split further or traverse the remaining elements sequentially. * * <p>Despite their obvious utility in parallel algorithms, spliterators are not * expected to be thread-safe; instead, implementations of parallel algorithms * using spliterators should ensure that the spliterator is only used by one * thread at a time. This is generally easy to attain via <em>serial * thread-confinement</em>, which often is a natural consequence of typical * parallel algorithms that work by recursive decomposition. A thread calling * {@link #trySplit()} may hand over the returned Spliterator to another thread, * which in turn may traverse or further split that Spliterator. The behaviour * of splitting and traversal is undefined if two or more threads operate * concurrently on the same spliterator. If the original thread hands a * spliterator off to another thread for processing, it is best if that handoff * occurs before any elements are consumed with {@link #tryAdvance(Consumer) * tryAdvance()}, as certain guarantees (such as the accuracy of * {@link #estimateSize()} for {@code SIZED} spliterators) are only valid before * traversal has begun. * serial-thread-confinement : 线程安全围栏 * <p>Primitive subtype specializations of {@code Spliterator} are provided for * {@link OfInt int}, {@link OfLong long}, and {@link OfDouble double} values. * The subtype default implementations of * {@link Spliterator#tryAdvance(java.util.function.Consumer)} * and {@link Spliterator#forEachRemaining(java.util.function.Consumer)} box * primitive values to instances of their corresponding wrapper class. Such * boxing may undermine any performance advantages gained by using the primitive * specializations. To avoid boxing, the corresponding primitive-based methods * should be used. tryAdvance()方法和forEachRemaining() 提供了原生子类型的特化, int, long, doule 等,子类型默认的实现。 避免包装类型装箱拆箱操作。 如下。 For example, 如下特化版本. * {@link Spliterator.OfInt#tryAdvance(java.util.function.IntConsumer)} * and {@link Spliterator.OfInt#forEachRemaining(java.util.function.IntConsumer)} * should be used in preference to * {@link Spliterator.OfInt#tryAdvance(java.util.function.Consumer)} and * {@link Spliterator.OfInt#forEachRemaining(java.util.function.Consumer)}. * Traversal of primitive values using boxing-based methods * {@link #tryAdvance tryAdvance()} and * {@link #forEachRemaining(java.util.function.Consumer) forEachRemaining()} * does not affect the order in which the values, transformed to boxed values, * are encountered. * * @apiNote * <p>Spliterators, like {@code Iterator}s, are for traversing the elements of * a source. The {@code Spliterator} API was designed to support efficient * parallel traversal in addition to sequential traversal, by supporting * decomposition as well as single-element iteration. In addition, the * protocol for accessing elements via a Spliterator is designed to impose * smaller per-element overhead than {@code Iterator}, an d to avoid the inherent * race involved in having separate methods for {@code hasNext()} and * {@code next()}. Spliterator支持高效的,并行的操作。 支持解耦,分解,氮元素的遍历。 此外,通过accessing协议。。。 相对于 Iterator,遍历元素的时候成本更低。 原因: 之前的{@code hasNext()} and {@code next()}.搭配使用存在竞争。 现在直接使用一个tryAdvance()方法就解决了这两个方法实现的事情。 * * <p>For mutable sources, arbitrary and non-deterministic behavior may occur if * the source is structurally interfered with (elements added, replaced, or * removed) between the time that the Spliterator binds to its data source and * the end of traversal. For example, such interference will produce arbitrary, * non-deterministic results when using the {@code java.util.stream} framework. 如果源在结构上被修改了(增删改),在绑定迭代器之后和执行完毕之前这段时间内进行任意修改。 行为就是不确定的了。 所以在使用流框架的时候,要求源是不可变的 * * <p>Structural interference of a source can be managed in the following ways * (in approximate order of decreasing desirability): 源结构上的修改,是可以通过如下几个方式去修改的 * <ul> * <li>The source cannot be structurally interfered with. 如,源是不允许被修改的。 * <br>For example, an instance of * {@link java.util.concurrent.CopyOnWriteArrayList} is an immutable source. * A Spliterator created from the source reports a characteristic of * {@code IMMUTABLE}.</li> 如:CopyOnWriteArrayList是一个不可变的源。 先拷贝,再追加。 (但是效率会下降。)适合读多写少的操作。 * <li>The source manages concurrent modifications. 源本身自己去管理并发。 * <br>For example, a key set of a {@link java.util.concurrent.ConcurrentHashMap} * is a concurrent source. A Spliterator created from the source reports a * characteristic of {@code CONCURRENT}.</li> 如:ConcurrentHashMap 。 创建的是并发源 * <li>The mutable source provides a late-binding and fail-fast Spliterator. * <br>Late binding narrows the window during which interference can affect * the calculation; fail-fast detects, on a best-effort basis, that structural * interference has occurred after traversal has commenced and throws * {@link ConcurrentModificationException}. For example, {@link ArrayList}, * and many other non-concurrent {@code Collection} classes in the JDK, provide * a late-binding, fail-fast spliterator.</li> 可变的源提供了延迟绑定和快速失败的迭代分割器。 会限制时间点的缩短。 如果在遍历中修改,则会抛出ConcurrentModificationException。 * <li>The mutable source provides a non-late-binding but fail-fast Spliterator. * <br>The source increases the likelihood of throwing * {@code ConcurrentModificationException} since the window of potential * interference is larger.</li> * <li>The mutable source provides a late-binding and non-fail-fast Spliterator. * <br>The source risks arbitrary, non-deterministic behavior after traversal * has commenced since interference is not detected. * </li> * <li>The mutable source provides a non-late-binding and non-fail-fast * Spliterator. * <br>The source increases the risk of arbitrary, non-deterministic behavior * since non-detected interference may occur after construction. * </li> 总结。 1. 源是不是并发的 2. 是不是快速绑定的,是不是快速失败的(2*2 组合的四种情况。) * </ul> * 串行案例 * <p><b>Example.</b> Here is a class (not a very useful one, except * for illustration) that maintains an array in which the actual data * are held in even locations, and unrelated tag data are held in odd * locations. Its Spliterator ignores the tags. 如:类维护了一个数组。 实际的数据是在偶数的位置上存放。不想管的标签数据是存放在奇数位置上。 * * <pre> {@code * class TaggedArray<T> { * private final Object[] elements; // immutable after construction * TaggedArray(T[] data, Object[] tags) { * int size = data.length; * if (tags.length != size) throw new IllegalArgumentException(); * this.elements = new Object[2 * size]; * for (int i = 0, j = 0; i < size; ++i) { * elements[j++] = data[i]; * elements[j++] = tags[i]; * } * } * * public Spliterator<T> spliterator() { * return new TaggedArraySpliterator<>(elements, 0, elements.length); * } * * static class TaggedArraySpliterator<T> implements Spliterator<T> { * private final Object[] array; * private int origin; // current index, advanced on split or traversal * private final int fence; // one past the greatest index * * TaggedArraySpliterator(Object[] array, int origin, int fence) { * this.array = array; this.origin = origin; this.fence = fence; * } * * public void forEachRemaining(Consumer<? super T> action) { * for (; origin < fence; origin += 2) * action.accept((T) array[origin]); * } * * public boolean tryAdvance(Consumer<? super T> action) { * if (origin < fence) { * action.accept((T) array[origin]); * origin += 2; * return true; * } * else // cannot advance * return false; * } * * public Spliterator<T> trySplit() { * int lo = origin; // divide range in half * int mid = ((lo + fence) >>> 1) & ~1; // force midpoint to be even * if (lo < mid) { // split out left half * origin = mid; // reset this Spliterator's origin * return new TaggedArraySpliterator<>(array, lo, mid); * } * else // too small to split * return null; * } * * public long estimateSize() { * return (long)((fence - origin) / 2); * } * * public int characteristics() { * return ORDERED | SIZED | IMMUTABLE | SUBSIZED; * } * } * }}</pre> * 并行案例: * <p>As an example how a parallel computation framework, such as the * {@code java.util.stream} package, would use Spliterator in a parallel * computation, here is one way to implement an associated parallel forEach, * that illustrates the primary usage idiom of splitting off subtasks until * the estimated amount of work is small enough to perform * sequentially. Here we assume that the order of processing across * subtasks doesn't matter; different (forked) tasks may further split * and process elements concurrently in undetermined order. This * example uses a {@link java.util.concurrent.CountedCompleter}; * similar usages apply to other parallel task constructions. * * <pre>{@code * static <T> void parEach(TaggedArray<T> a, Consumer<T> action) { * Spliterator<T> s = a.spliterator(); * long targetBatchSize = s.estimateSize() / (ForkJoinPool.getCommonPoolParallelism() * 8); * new ParEach(null, s, action, targetBatchSize).invoke(); * } * * static class ParEach<T> extends CountedCompleter<Void> { * final Spliterator<T> spliterator; * final Consumer<T> action; * final long targetBatchSize; * * ParEach(ParEach<T> parent, Spliterator<T> spliterator, * Consumer<T> action, long targetBatchSize) { * super(parent); * this.spliterator = spliterator; this.action = action; * this.targetBatchSize = targetBatchSize; * } * * public void compute() { * Spliterator<T> sub; * while (spliterator.estimateSize() > targetBatchSize && * (sub = spliterator.trySplit()) != null) { * addToPendingCount(1); * new ParEach<>(this, sub, action, targetBatchSize).fork(); * } * spliterator.forEachRemaining(action); * propagateCompletion(); * } * }}</pre> * * @implNote * If the boolean system property {@code org.openjdk.java.util.stream.tripwire} * is set to {@code true} then diagnostic warnings are reported if boxing of * primitive values occur when operating on primitive subtype specializations. * * @param <T> the type of elements returned by this Spliterator * * @see Collection * @since 1.8 */

serial-thread-confinement

我们再来看看Spliterator类中的方法

内容版权声明:除非注明,否则皆为本站原创文章。

转载注明出处:https://www.heiqu.com/wpdfjj.html