- Machine Learning with Spark(Second Edition)
- Rajdeep Dua Manpreet Singh Ghotra Nick Pentreath
- 2025-04-04 19:20:52
Matrix in Spark
A local matrix in Spark has integer-typed row and column indices and double-typed values, all stored on a single machine. MLlib supports the following local matrix types:
- Dense matrices: Entry values are stored in a single double array in column-major order.
- Sparse matrices: Non-zero entry values are stored in the Compressed Sparse Column (CSC) format in column-major order.
For example, the following dense matrix of size (3, 2) is stored in the one-dimensional array [2.0, 4.0, 4.0, 3.0, 1.0, 5.0], that is, the first column followed by the second:
2.0 3.0
4.0 1.0
4.0 5.0
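The column-major layout can be checked with a few lines of plain Scala. This is a minimal sketch that does not depend on Spark; the object name `ColumnMajorSketch` and the helper `at` are hypothetical and only illustrate that entry (i, j) of a matrix with `numRows` rows sits at position i + j * numRows in the flat array [2.0, 4.0, 4.0, 3.0, 1.0, 5.0].

```scala
object ColumnMajorSketch {
  val numRows = 3
  val numCols = 2

  // Column-major storage of the 3 x 2 matrix: all of column 0, then all of column 1.
  val values: Array[Double] = Array(2.0, 4.0, 4.0, 3.0, 1.0, 5.0)

  // Hypothetical helper: entry (i, j) lives at flat index i + j * numRows.
  def at(i: Int, j: Int): Double = values(i + j * numRows)

  def main(args: Array[String]): Unit = {
    // Print the matrix row by row to recover the usual two-dimensional view.
    for (i <- 0 until numRows)
      println((0 until numCols).map(j => at(i, j)).mkString(" "))
  }
}
```

Running `main` prints the three rows of the matrix, which suggests why the same array passed to a dense-matrix constructor reads "down the columns" rather than across the rows.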
The following example creates one dense matrix and two sparse matrices:
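import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// 2 x 2 dense matrix; the values are given in column-major order
val dMatrix: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
println("dMatrix: \n" + dMatrix)
// 3 x 2 sparse matrix in CSC form: column pointers, row indices, non-zero values
val sMatrixOne: Matrix = Matrices.sparse(3, 2, Array(0, 1, 3),
  Array(0, 2, 1), Array(5.0, 6.0, 7.0))
println("sMatrixOne: \n" + sMatrixOne)
// Same column pointers and values, but different row indices in the second column
val sMatrixTwo: Matrix = Matrices.sparse(3, 2, Array(0, 1, 3),
  Array(0, 1, 2), Array(5.0, 6.0, 7.0))
println("sMatrixTwo: \n" + sMatrixTwo)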
The output of the preceding code is as follows:
[info] Running linalg.matrix.SparkMatrix
dMatrix:
1.0 3.0
2.0 4.0
sMatrixOne:
3 x 2 CSCMatrix
(0,0) 5.0
(2,1) 6.0
(1,1) 7.0
sMatrixTwo:
3 x 2 CSCMatrix
(0,0) 5.0
(1,1) 6.0
(2,1) 7.0
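To see how the three CSC arrays produce the entries listed for sMatrixOne, the decoding can be sketched in plain Scala without Spark. The object `CscSketch` and the helper `entries` are hypothetical names introduced here for illustration; the sketch assumes the standard CSC convention, in which the positions `colPtrs(j)` up to (but not including) `colPtrs(j + 1)` select the row indices and values that belong to column j.

```scala
object CscSketch {
  // Hypothetical helper: expands CSC arrays into (row, column, value) triples.
  // Positions colPtrs(j) until colPtrs(j + 1) hold the entries of column j.
  def entries(colPtrs: Array[Int],
              rowIndices: Array[Int],
              values: Array[Double]): Seq[(Int, Int, Double)] =
    for {
      j <- 0 until colPtrs.length - 1
      k <- colPtrs(j) until colPtrs(j + 1)
    } yield (rowIndices(k), j, values(k))

  def main(args: Array[String]): Unit = {
    // Same column pointers, row indices, and values as sMatrixOne.
    entries(Array(0, 1, 3), Array(0, 2, 1), Array(5.0, 6.0, 7.0))
      .foreach { case (i, j, v) => println(s"($i,$j) $v") }
  }
}
```

With the column pointers (0, 1, 3), column 0 owns one stored entry and column 1 owns the remaining two, so changing only the row indices from (0, 2, 1) to (0, 1, 2) moves the values 6.0 and 7.0 within the second column, as in sMatrixTwo.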