pyspark.RDD.distinct
RDD.distinct(numPartitions: Optional[int] = None) → pyspark.rdd.RDD[T]
Return a new RDD containing the distinct elements in this RDD.
New in version 0.7.0.
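Deduplicating a distributed collection can be phrased as a key-value reduction: turn each element into a key, merge all records that share a key, then keep only the keys. The sketch below models that idea in plain Python so it runs without a Spark cluster. `reduce_by_key` is a hypothetical local stand-in for `RDD.reduceByKey`, not a PySpark function, and the comments only indicate which RDD step each line is meant to mirror.

```python
def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for RDD.reduceByKey (not a PySpark API).

    Merges the values of all pairs that share a key using func.
    """
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


data = [1, 1, 2, 3]
pairs = [(x, None) for x in data]               # mirrors rdd.map(lambda x: (x, None))
reduced = reduce_by_key(pairs, lambda a, _: a)  # mirrors .reduceByKey(lambda a, _: a)
result = [key for key, _ in reduced]            # mirrors .map(lambda kv: kv[0])
print(sorted(result))  # [1, 2, 3]
```

In this sketch every element becomes a dictionary key, so elements must be hashable (a list element would raise `TypeError`).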
- Parameters
 - numPartitions : int, optional
 the number of partitions in the new RDD
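The numPartitions argument sets how many partitions the deduplicated RDD has. To illustrate why, the hypothetical `model_distinct` function below imitates a hash-partitioned distinct in plain Python: each element is routed to bucket `hash(x) % num_partitions`, so equal elements always land in the same bucket and can be deduplicated there. Python's built-in `hash()` stands in for Spark's partitioner; this is an illustration of the technique under that assumption, not Spark's code.

```python
def model_distinct(partitions, num_partitions):
    """Hypothetical model of a hash-partitioned distinct (not PySpark code).

    partitions: list of input partitions, each a list of hashable elements.
    Returns num_partitions output partitions with duplicates removed.
    """
    # One dict per output partition; dict keys keep a single copy of each
    # element and preserve first-insertion order within the bucket.
    buckets = [{} for _ in range(num_partitions)]
    for part in partitions:
        for x in part:
            # Equal elements have equal hashes, so they reach the same bucket.
            buckets[hash(x) % num_partitions][x] = None
    return [list(bucket) for bucket in buckets]


out = model_distinct([[1, 1], [2, 3, 3]], num_partitions=2)
print(out)       # [[2], [1, 3]]
print(len(out))  # 2
```

The model always returns `num_partitions` buckets, even when some are empty, and flattening `out` gives `[2, 1, 3]` rather than the input order. Spark likewise makes no promise about the order of distinct results, which is why the doctest in the Examples section sorts the collected output.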
- Returns
 - RDD
 a new RDD containing the distinct elements
See also
Examples
>>> sorted(sc.parallelize([1, 1, 2, 3]).distinct().collect())
[1, 2, 3]