Submitting a Job

To submit a job to the cluster from the Cloud Platform Console, open the Cloud Platform UI, select the appropriate project, and then click on Continue. The first time you submit a job, the following dialog appears:

Click on Submit a job:

To submit a sample Spark job, fill in the fields on the Submit a job page, as follows:

  1. Select a cluster name from the cluster list on the screen.
  2. Set Job type to Spark.
  3. Add file:///usr/lib/spark/lib/spark-examples.jar to Jar files. Here, file:/// denotes the Hadoop LocalFileSystem scheme; Cloud Dataproc installs /usr/lib/spark/lib/spark-examples.jar on the cluster's master node when it creates the cluster. Alternatively, you can specify a Cloud Storage path (gs://my-bucket/my-jarfile.jar) or an HDFS path (hdfs://examples/myexample.jar) to one of your own jars.
  4. Set Main class or jar to org.apache.spark.examples.SparkPi.
  5. Set Arguments to the single argument 1000.

Click on Submit to start the job.
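
Equivalently, you can submit the same sample job from the terminal with the gcloud CLI. The following is a sketch: my-cluster is a placeholder for your cluster name, and on the SDK version used here the command group may be gcloud beta dataproc rather than gcloud dataproc:

  $ gcloud dataproc jobs submit spark --cluster=my-cluster \
      --class=org.apache.spark.examples.SparkPi \
      --jars=file:///usr/lib/spark/lib/spark-examples.jar -- 1000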

Once the job starts, it is added to the Jobs list. Refer to the following screenshot:

Once the job is complete, its status changes:

To view the job output, execute the following command from the terminal with the appropriate Job ID.
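
The Job ID appears in the Jobs list in the UI; as a sketch (assuming the same beta SDK as below), you can also list recent jobs from the terminal:

  $ gcloud beta dataproc --project=rd-spark-1 jobs list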

In our case, the Job ID was 1ed4d07f-55fc-45fe-a565-290dcd1978f7 and the project ID was rd-spark-1; hence, the command looks like this:

  $ gcloud beta dataproc --project=rd-spark-1 jobs wait \
      1ed4d07f-55fc-45fe-a565-290dcd1978f7

The (abridged) output is shown here:

Waiting for job output...
16/01/28 10:04:29 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
16/01/28 10:04:29 INFO Remoting: Starting remoting
...
Submitted application application_1453975062220_0001
Pi is roughly 3.14157732

You can also SSH into the cluster's master node and run spark-shell in interactive mode.
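
For example, here is a sketch of computing Pi by hand with the same Monte Carlo approach the sample uses (Cloud Dataproc names the master node <cluster-name>-m, so my-cluster-m is a placeholder):

  $ gcloud compute ssh my-cluster-m
  $ spark-shell
  ...
  scala> // Sample n random points in the square [-1, 1] x [-1, 1]; the
  scala> // fraction that lands inside the unit circle approximates Pi/4.
  scala> val n = 100000
  scala> val count = sc.parallelize(1 to n).map { _ =>
       |   val x = math.random * 2 - 1
       |   val y = math.random * 2 - 1
       |   if (x * x + y * y < 1) 1 else 0
       | }.reduce(_ + _)
  scala> println(s"Pi is roughly ${4.0 * count / n}")

The 1000 argument we passed to SparkPi earlier plays a similar role: it sets the number of slices (partitions) the sampling is spread over.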