Vectors
class pyspark.ml.linalg.Vectors
Factory methods for working with vectors.
Notes
Dense vectors are simply represented as NumPy array objects, so there is no need to convert them for use in MLlib. For sparse vectors, the factory methods in this class create an MLlib-compatible type, or users can pass in SciPy’s scipy.sparse column vectors.
Methods
dense(*elements)            Create a dense vector of 64-bit floats from a Python list or numbers.
norm(vector, p)             Find the norm of the given vector.
sparse(size, *args)         Create a sparse vector, using either a dictionary, a list of (index, value) pairs, or two separate arrays of indices and values (sorted by index).
squared_distance(v1, v2)    Squared distance between two vectors.
zeros(size)
Methods Documentation
static dense(*elements: Union[float, bytes, numpy.ndarray, Iterable[float]]) → pyspark.ml.linalg.DenseVector
Create a dense vector of 64-bit floats from a Python list or numbers.
Examples
>>> Vectors.dense([1, 2, 3])
DenseVector([1.0, 2.0, 3.0])
>>> Vectors.dense(1.0, 2.0)
DenseVector([1.0, 2.0])
static norm(vector: pyspark.ml.linalg.Vector, p: NormType) → numpy.float64
Find the norm of the given vector.
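The entry above comes without an example. As a rough illustration of what a p-norm evaluates to, here is a minimal pure-Python sketch. The helper `p_norm` is hypothetical and not part of PySpark, and it makes no claim about how `Vectors.norm` is implemented internally; it only covers p >= 1 and infinity.

```python
import math
from typing import Sequence


def p_norm(values: Sequence[float], p: float) -> float:
    """Hypothetical helper: p-norm of a sequence, for p >= 1 or math.inf."""
    if p == math.inf:
        # Infinity norm: the largest absolute component.
        return max((abs(x) for x in values), default=0.0)
    # General case: (sum of |x|^p) raised to the power 1/p.
    return sum(abs(x) ** p for x in values) ** (1.0 / p)


print(p_norm([3.0, -4.0], 1))         # 7.0
print(p_norm([3.0, -4.0], 2))         # 5.0
print(p_norm([3.0, -4.0], math.inf))  # 4.0
```

With PySpark available, the corresponding call following the signature above would be `Vectors.norm(Vectors.dense([3.0, -4.0]), 2)`.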
static sparse(size: int, *args: Union[bytes, Tuple[int, float], Iterable[float], Iterable[Tuple[int, float]], Dict[int, float]]) → pyspark.ml.linalg.SparseVector
Create a sparse vector, using either a dictionary, a list of (index, value) pairs, or two separate arrays of indices and values (sorted by index).
Parameters
size : int
    Size of the vector.
args
    Non-zero entries, as a dictionary, list of tuples, or two sorted lists containing indices and values.
Examples
>>> Vectors.sparse(4, {1: 1.0, 3: 5.5})
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [(1, 1.0), (3, 5.5)])
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [1, 3], [1.0, 5.5])
SparseVector(4, {1: 1.0, 3: 5.5})
static squared_distance(v1: pyspark.ml.linalg.Vector, v2: pyspark.ml.linalg.Vector) → numpy.float64
Squared distance between two vectors. v1 and v2 can be of type SparseVector, DenseVector, np.ndarray or array.array.
Examples
>>> a = Vectors.sparse(4, [(0, 1), (3, 4)])
>>> b = Vectors.dense([2, 5, 4, 1])
>>> a.squared_distance(b)
51.0
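To see where the value 51.0 comes from, the arithmetic can be reproduced without Spark. This is only a sketch: the function `squared_distance_sketch` and the dictionary representation of the sparse vector are assumptions made for illustration, not PySpark's implementation.

```python
from typing import Dict, Sequence


def squared_distance_sketch(sparse_entries: Dict[int, float],
                            dense: Sequence[float]) -> float:
    """Hypothetical helper: sum of squared component-wise differences."""
    total = 0.0
    for i, d in enumerate(dense):
        # Indices missing from the sparse mapping are implicit zeros.
        s = sparse_entries.get(i, 0.0)
        total += (s - d) ** 2
    return total


a = {0: 1.0, 3: 4.0}      # same entries as Vectors.sparse(4, [(0, 1), (3, 4)])
b = [2.0, 5.0, 4.0, 1.0]  # same entries as Vectors.dense([2, 5, 4, 1])

# (1-2)^2 + (0-5)^2 + (0-4)^2 + (4-1)^2 = 1 + 25 + 16 + 9
print(squared_distance_sketch(a, b))  # 51.0
```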
static zeros(size: int) → pyspark.ml.linalg.DenseVector
Create a dense vector of the given size with all elements set to zero.