Submitting a Job

To submit a job to the cluster from the Cloud Platform Console, open the Cloud Platform UI, select the appropriate project, and then click on Continue. The first time you submit a job, the following dialog appears:

Click on Submit a job:

To submit a sample Spark job, fill in the fields on the Submit a job page, as follows:

  1. Select a cluster name from the cluster list on the screen.
  2. Set Job type to Spark.
  3. Add file:///usr/lib/spark/lib/spark-examples.jar to Jar files. Here, file:/// denotes the Hadoop LocalFileSystem scheme; Cloud Dataproc installs /usr/lib/spark/lib/spark-examples.jar on the cluster's master node when it creates the cluster. Alternatively, you can specify a Cloud Storage path (gs://my-bucket/my-jarfile.jar) or an HDFS path (hdfs://examples/myexample.jar) to one of your own jars.
  4. Set Main class or jar to org.apache.spark.examples.SparkPi.
  5. Set Arguments to the single argument 1000.

Click on Submit to start the job.
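
Equivalently, you can submit the same sample job from the terminal with the gcloud CLI. The following is a sketch: my-cluster is a placeholder for your cluster name, and on the SDK version used here the command group may be gcloud beta dataproc rather than gcloud dataproc:

  $ gcloud dataproc jobs submit spark --cluster=my-cluster \
      --class=org.apache.spark.examples.SparkPi \
      --jars=file:///usr/lib/spark/lib/spark-examples.jar -- 1000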

Once the job starts, it is added to the Jobs list. Refer to the following screenshot:

Once the job is complete, its status changes:

To view the job output, execute the following command from the terminal with the appropriate Job ID.
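
The Job ID appears in the Jobs list in the UI; as a sketch (assuming the same beta SDK as below), you can also list recent jobs from the terminal:

  $ gcloud beta dataproc --project=rd-spark-1 jobs list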

In our case, the Job ID was 1ed4d07f-55fc-45fe-a565-290dcd1978f7 and the project ID was rd-spark-1; hence, the command looks like this:

  $ gcloud beta dataproc --project=rd-spark-1 jobs wait \
      1ed4d07f-55fc-45fe-a565-290dcd1978f7

The (abridged) output is shown here:

Waiting for job output...
16/01/28 10:04:29 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started
16/01/28 10:04:29 INFO Remoting: Starting remoting
...
Submitted application application_1453975062220_0001
Pi is roughly 3.14157732

You can also SSH into the cluster's master node and run spark-shell in interactive mode.
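
For example, here is a sketch of computing Pi by hand with the same Monte Carlo approach the sample uses (Cloud Dataproc names the master node <cluster-name>-m, so my-cluster-m is a placeholder):

  $ gcloud compute ssh my-cluster-m
  $ spark-shell
  ...
  scala> // Sample n random points in the square [-1, 1] x [-1, 1]; the
  scala> // fraction that lands inside the unit circle approximates Pi/4.
  scala> val n = 100000
  scala> val count = sc.parallelize(1 to n).map { _ =>
       |   val x = math.random * 2 - 1
       |   val y = math.random * 2 - 1
       |   if (x * x + y * y < 1) 1 else 0
       | }.reduce(_ + _)
  scala> println(s"Pi is roughly ${4.0 * count / n}")

The 1000 argument we passed to SparkPi earlier plays a similar role: it sets the number of slices (partitions) the sampling is spread over.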