Spark examples jar


Download spark-examples JAR files with dependencies



spark-examples from group uk.gov.gchq.gaffer (version 0.6.6)

Group: uk.gov.gchq.gaffer
Artifact: spark-examples
Version: 0.6.6
Last update: 03 April 2017
Newest version: Yes
Organization: not specified
URL: not specified
License: not specified
Dependencies (4): example-graph, spark-library, spark-accumulo-library, scala-library
There may be transitive dependencies.


examples_2.12 from group za.co.absa.spline.agent.spark (version 0.6.3)

Group: za.co.absa.spline.agent.spark
Artifact: examples_2.12
Version: 0.6.3
Last update: 03 September 2021
Newest version: Yes
Organization: not specified
URL: not specified
License: not specified
Dependencies (6): spark-core_${scala.binary.version}, spark-sql_${scala.binary.version}, spark-xml_${scala.binary.version}, spark-excel_${scala.binary.version}, agent-core_${scala.binary.version}, kafka-clients
There may be transitive dependencies.


examples_2.11 from group za.co.absa.spline.agent.spark (version 0.6.3)

Group: za.co.absa.spline.agent.spark
Artifact: examples_2.11
Version: 0.6.3
Last update: 03 September 2021
Newest version: Yes
Organization: not specified
URL: not specified
License: not specified
Dependencies (6): spark-core_${scala.binary.version}, spark-sql_${scala.binary.version}, spark-xml_${scala.binary.version}, spark-excel_${scala.binary.version}, agent-core_${scala.binary.version}, kafka-clients
There may be transitive dependencies.


lakefs-spark-examples-247_2.11 from group io.lakefs (version 0.1.5)

Spark client for lakeFS object metadata.

Group: io.lakefs
Artifact: lakefs-spark-examples-247_2.11
Version: 0.1.5
Last update: 29 August 2021
Newest version: Yes
Organization: Treeverse Labs
URL: https://github.com/treeverse/spark-client
License: Apache 2
Dependencies (5): scala-library, lakefs-spark-client-247_2.11, bom, s3, aws-java-sdk
There may be transitive dependencies.


lakefs-spark-examples-301_2.12 from group io.lakefs (version 0.1.5)

Spark client for lakeFS object metadata.

Group: io.lakefs
Artifact: lakefs-spark-examples-301_2.12
Version: 0.1.5
Last update: 29 August 2021
Newest version: Yes
Organization: Treeverse Labs
URL: https://github.com/treeverse/spark-client
License: Apache 2
Dependencies (5): scala-library, lakefs-spark-client-301_2.12, bom, s3, aws-java-sdk
There may be transitive dependencies.


spark-examples_2.12 from group com.github.immuta.hadoop (version 3.1.1-hadoop-2.7)

Group: com.github.immuta.hadoop
Artifact: spark-examples_2.12
Version: 3.1.1-hadoop-2.7
Last update: 18 March 2021
Newest version: Yes
Organization: not specified
URL: http://spark.apache.org/
License: not specified
Dependencies (1): scopt_${scala.binary.version}
There may be transitive dependencies.


spark-examples_2.12 from group ch.cern.spark (version 3.0.1)

Group: ch.cern.spark
Artifact: spark-examples_2.12
Version: 3.0.1
Last update: 05 November 2020
Newest version: Yes
Organization: not specified
URL: http://spark.apache.org/
License: not specified
Dependencies (1): scopt_${scala.binary.version}
There may be transitive dependencies.


snappy-spark-examples_2.11 from group io.snappydata (version 2.1.1.8)

TIBCO ComputeDB distributed data store and execution engine

Group: io.snappydata
Artifact: snappy-spark-examples_2.11
Version: 2.1.1.8
Last update: 17 January 2020
Newest version: Yes
Organization: not specified
URL: http://www.snappydata.io
License: The Apache License, Version 2.0
Dependencies (14): log4j, slf4j-api, slf4j-log4j12, scala-library, scala-reflect, snappy-spark-core_2.11, snappy-spark-streaming_2.11, snappy-spark-mllib_2.11, snappy-spark-hive_2.11, snappy-spark-graphx_2.11, snappy-spark-streaming-flume_2.11, snappy-spark-streaming-kafka-0.10_2.11, commons-math3, scopt_2.11
There may be transitive dependencies.


ti-spark-examples_2.11 from group org.bom4v.ti (version 0.0.1-spark2.3)

Sample/demonstration project for the Spark layer of BOM for Verticals

Group: org.bom4v.ti
Artifact: ti-spark-examples_2.11
Version: 0.0.1-spark2.3
Last update: 28 February 2019
Newest version: Yes
Organization: Business Object Models for Verticals (BOM4V)
URL: https://github.com/bom4v/ti-spark-examples
License: Apache-2.0
Dependencies (11): scala-library, nscala-time_2.11, xgboost4j-spark_2.11, ti-models-customers_2.11, ti-models-calls_2.11, ti-serializers-customers_2.11, ti-serializers-calls_2.11, spark-core_2.11, spark-sql_2.11, spark-mllib_2.11, spark-hive_2.11
There may be transitive dependencies.


snappy-spark-examples_2.10 from group io.snappydata (version 1.6.2-6)

SnappyData distributed data store and execution engine

Group: io.snappydata
Artifact: snappy-spark-examples_2.10
Version: 1.6.2-6
Last update: 09 August 2016
Newest version: Yes
Organization: not specified
URL: http://www.snappydata.io
License: The Apache License, Version 2.0
Dependencies (27): hbase-server, snappy-spark-streaming-flume_2.10, snappy-spark-streaming_2.10, guava, snappy-spark-streaming-zeromq_2.10, hbase-testing-util, snappy-spark-streaming-mqtt_2.10, algebird-core_2.10, scopt_2.10, hbase-common, hbase-client, snappy-spark-streaming-kafka_2.10, commons-math3, snappy-spark-streaming-twitter_2.10, snappy-spark-core_2.10, hbase-protocol, snappy-spark-graphx_2.10, scala-library, log4j, cassandra-all, hbase-hadoop-compat, slf4j-log4j12, scala-reflect, snappy-spark-hive_2.10, snappy-spark-mllib_2.10, snappy-spark-bagel_2.10, slf4j-api
There may be transitive dependencies.




Page 1 of 2 (14 items in total)
Source: https://jar-download.com/?search_box=spark-examples

Apache Spark Deployment

Introduction

We have already worked with the Spark shell and seen how to write transformations, but we also need to understand how to execute a Spark application. Let us look at the deployment of a sample Spark application in detail.

Spark applications can be deployed and executed on a cluster using the spark-submit shell command. Through its uniform interface, spark-submit can use any of the cluster managers, such as YARN, Mesos, or Spark's own standalone cluster manager, and no extra configuration is needed for each of them separately.

If the code has a dependency on other projects, we need to package those dependent projects with the Spark code so that the dependent code is also distributed on the Spark cluster. To package all of these, we need to create an assembly jar (or "uber" jar) containing our Spark code and its dependencies; we can use SBT or Maven to build it. We do not need to bundle the Spark and Hadoop jars in this "uber" jar; they can be listed as provided dependencies, since the cluster manager will provide them at runtime. When the assembly jar is ready, we can spark-submit it.
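As an illustration only, a minimal sbt setup along these lines marks Spark as a provided dependency and wires in the sbt-assembly plugin; the project name and the Scala, Spark, and plugin versions shown here are assumptions rather than values taken from this tutorial:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt
name := "spark-pi-example"
version := "0.1"
scalaVersion := "2.11.12"

// Marked "provided" so these jars are NOT bundled into the assembly jar;
// the cluster manager supplies the Spark and Hadoop classes at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.3" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.4.3" % "provided"
)

Running sbt assembly then produces a single jar under target/scala-2.11/ that can be passed to spark-submit.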

A common spark-submit command would look like this:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Some of the commonly used options are:
  • --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
  • --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
  • --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
  • --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap “key=value” in quotes (as shown).
  • application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
  • application-arguments: Arguments passed to the main method of your main class, if any

Explanation with example

Let us look at running an example Spark application. We will take the SparkPi application provided in the Spark examples as part of the installation.

The Scala code looks like below:

package org.apache.spark.examples

import scala.math.random

import org.apache.spark.sql.SparkSession

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}

Spark can also be used for compute-intensive tasks. This code estimates π by "throwing darts" at a circle. We pick random points in the unit square ((0, 0) to (1,1)) and see how many fall in the unit circle. The fraction should be π / 4, so we use this to get our estimate.

Executing Steps

Please locate the Spark examples jar which comes with the Spark installation. On my installation, it is at this location.

/usr/local/opt/apache-spark/libexec/examples/jars/spark-examples_2.11-2.4.3.jar

The class is called SparkPi. Open the command prompt and execute the below command. This should run the SparkPi example and compute the output. It runs through Spark's default resource manager since we are just running locally. There are multiple options that can be specified in the spark-submit command depending on the environment you are operating in and the resource manager used. For simplicity, I have used Spark's default.

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/opt/apache-spark/libexec/examples/jars/spark-examples_2.11-2.4.3.jar 10


You can see the output printed as

Pi is roughly 3.1396071396071394

You can look at all the options that can be used with the spark-submit command on the official Apache Spark website.

Conclusion

In this module we looked at how to deploy a Spark application and at some of its configuration parameters. We could not go through every parameter, but we covered the most important ones.

Source: https://www.knowledgehut.com/tutorials/apache-spark-tutorial/spark-deploymnet

The spark-submit command is a utility to run or submit a Spark or PySpark application program (or job) to the cluster by specifying options and configurations. The application you are submitting can be written in Scala, Java, or Python (PySpark). The spark-submit command supports the following:

  1. Submitting Spark applications on different cluster managers like Yarn, Kubernetes, Mesos, and Standalone.
  2. Submitting Spark applications in client or cluster deployment modes.


In this article, I will explain different spark-submit command options and configurations, along with how to use an uber jar or zip file for Scala and Java, how to use a Python .py file, and finally how to submit the application on Yarn, Mesos, Kubernetes, and standalone cluster managers.


1. Spark Submit Command

The Spark binary comes with a spark-submit shell script for Linux and Mac, and a spark-submit.cmd command file for Windows; these scripts are available in the $SPARK_HOME/bin directory.

If you are using the Cloudera distribution, you may also find spark2-submit, which is used to run Spark 2.x applications. By adding this, Cloudera supports running both Spark 1.x and Spark 2.x applications in parallel.

The spark-submit command internally uses the org.apache.spark.deploy.SparkSubmit class with the options and command-line arguments you specify.

Below is a spark-submit command with the most-used command options.
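A sketch of such a command, with a placeholder class name, jar path, and resource sizes, might look like this:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 4g \
  --executor-cores 2 \
  --num-executors 10 \
  --conf spark.sql.shuffle.partitions=200 \
  /path/to/your-application.jar \
  <application-arguments>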

You can also submit the application as shown below without using the spark-submit script.
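For instance, spark-submit is itself a thin wrapper around the spark-class launcher and the org.apache.spark.deploy.SparkSubmit class, so a roughly equivalent invocation (the jar path and arguments are placeholders) would be:

./bin/spark-class org.apache.spark.deploy.SparkSubmit \
  --class org.apache.spark.examples.SparkPi \
  --master local[2] \
  /path/to/your-application.jar 10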

2. Spark Submit Options

Below I have explained some of the common options and configurations, and the specific options to use with Scala and Python. You can also list all the available options by running spark-submit --help.

2.1 Deployment Modes (--deploy-mode)

Using --deploy-mode, you specify where to run the Spark application driver program. Spark supports cluster and client deployment modes.

  • cluster: In cluster mode, the driver runs on one of the worker nodes, and this node shows as a driver on the Spark Web UI of your application. Cluster mode is used to run production jobs.
  • client: In client mode, the driver runs locally on the machine you are submitting your application from. Client mode is mainly used for interactive and debugging purposes. Note that in client mode only the driver runs locally; all the executors run on other nodes in the cluster.

2.2 Cluster Managers (--master)

Using the --master option, you specify which cluster manager to use to run your application. Spark currently supports Yarn, Mesos, Kubernetes, Standalone, and local. The uses of these are explained below.

  • Yarn (yarn): Use yarn if your cluster resources are managed by Hadoop Yarn.
  • Mesos (mesos://HOST:PORT): Use mesos://HOST:PORT for the Mesos cluster manager, replacing the host and port of the Mesos cluster manager.
  • Standalone (spark://HOST:PORT): Use spark://HOST:PORT for a standalone cluster, replacing the host and port of the standalone cluster master.
  • Kubernetes (k8s://HOST:PORT or k8s://https://HOST:PORT): Use k8s://HOST:PORT for Kubernetes, replacing the host and port of the Kubernetes API server. This connects with https by default; if you want to use an unsecured connection, use k8s://http://HOST:PORT.
  • Local (local, local[k], local[K,F]): Use local to run locally with one worker thread. Use local[k], with k set to the number of cores you have locally, to run the application with k worker threads. Use local[K,F], with F set to the number of attempts a task should be retried when it fails.

Example: The below command submits the application to a yarn-managed cluster.
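A minimal sketch of that submission (the class and jar path are placeholders) could be:

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /path/to/your-application.jar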

2.3 Driver and Executor Resources (Cores & Memory)

While submitting an application, you can also specify how much memory and how many cores you want to give to the driver and the executors.

  • --driver-memory: Memory to be used by the Spark driver.
  • --driver-cores: CPU cores to be used by the Spark driver.
  • --num-executors: The total number of executors to use.
  • --executor-memory: Amount of memory to use for the executor process.
  • --executor-cores: Number of CPU cores to use for the executor process.
  • --total-executor-cores: The total number of executor cores to use.

Example:
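One possible shape for such a command, with illustrative resource sizes and a placeholder class and jar:

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --driver-cores 2 \
  --num-executors 10 \
  --executor-memory 8g \
  --executor-cores 4 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar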

2.4 Other Options

  • --files: Comma-separated list of files you want to use. Usually, these can be files from your resource folder. Using this option, Spark submits all these files to the cluster.
  • --verbose: Displays verbose information. For example, it writes all the configurations the Spark application uses to the log file.

Note: Files specified with --files are uploaded to the cluster.

Example: The below example submits the application to the yarn cluster manager using cluster deployment mode, with 8g driver memory, and 16g memory and 2 cores for each executor.
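A sketch matching those numbers (the properties file, class, and jar path are placeholders) might be:

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 8g \
  --executor-memory 16g \
  --executor-cores 2 \
  --files /path/to/log4j.properties \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar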

3. Spark Submit Configurations

Spark submit supports several configurations using --conf; these configurations are used to specify application configurations, shuffle parameters, and runtime configurations.

Most of these configurations are the same for Spark applications written in Java, Scala, and Python (PySpark).

  • spark.sql.shuffle.partitions: Number of partitions to create for wider shuffle transformations (joins and aggregations).
  • spark.executor.memoryOverhead: Amount of additional memory to be allocated per executor process in cluster mode; this is typically memory for JVM overheads. (Not supported for PySpark.)
  • spark.serializer: The serializer to use, org.apache.spark.serializer.JavaSerializer (default) or org.apache.spark.serializer.KryoSerializer.
  • spark.sql.files.maxPartitionBytes: The maximum number of bytes to be used for every partition when reading files. Default 128MB.
  • spark.dynamicAllocation.enabled: Specifies whether to dynamically increase or decrease the number of executors based on the workload. Default false (although some distributions enable it by default).
  • spark.dynamicAllocation.minExecutors: The minimum number of executors to use when dynamic allocation is enabled.
  • spark.dynamicAllocation.maxExecutors: The maximum number of executors to use when dynamic allocation is enabled.
  • spark.executor.extraJavaOptions: Specify JVM options (see the example below).

Besides these, Spark also supports many more configurations.

Example:
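One way such a command could look, using a few of the configurations from the table above (the class and jar path are placeholders):

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.sql.shuffle.partitions=300" \
  --conf "spark.executor.memoryOverhead=2g" \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar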

Alternatively, you can also set these globally in $SPARK_HOME/conf/spark-defaults.conf to apply them to every Spark application. And you can also set them programmatically using SparkConf.

First preference goes to SparkConf, then to the spark-submit --conf flags, and then to the configs mentioned in spark-defaults.conf.

4. Submit Scala or Java Application

Regardless of which language you use, most of the options are the same; however, a few options are specific to a language. For example, to run a Spark application written in Scala or Java, you additionally need to use the following options.

  • --jars: If you have all your dependency jars in a folder, you can pass them using this spark-submit option. All your jar files should be comma-separated, for example --jars jar1.jar,jar2.jar,jar3.jar.
  • --packages: Comma-separated list of Maven coordinates. All transitive dependencies will be handled when using this option.
  • --class: The Scala or Java class you want to run. This should be a fully qualified name with the package, for example org.apache.spark.examples.SparkPi.

Note: Files specified with --jars and --packages are uploaded to the cluster.

Example:
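A sketch combining those options (the dependency jars, Maven coordinate, class, and application jar are all placeholders):

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar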

5. Spark Submit PySpark (Python) Application

When you want to spark-submit a PySpark application, you need to specify the .py file you want to run and specify the .egg file or .zip file for dependency libraries.

Below are some of the options and configurations specific to PySpark applications. Besides these, you can also use most of the options and configs covered above.

  • --py-files: Use this to add .py, .zip or .egg dependency files.
  • --conf spark.executor.pyspark.memory: The amount of memory to be used by PySpark for each executor.
  • --conf spark.pyspark.driver.python: Python binary executable to use for PySpark in the driver.
  • --conf spark.pyspark.python: Python binary executable to use for PySpark in both the driver and the executors.

Note: Files specified with --py-files are uploaded to the cluster before it runs the application. You can also upload these files ahead of time and reference them in your PySpark application.

Example 1:
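For instance (the script path and memory value are placeholders):

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.pyspark.memory=2g \
  /path/to/your-pyspark-app.py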

Example 2: The below example uses other Python files as dependencies.
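One way this could look, assuming the extra modules have been bundled into a zip (all file names are placeholders):

./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --py-files /path/to/deps.zip,/path/to/helper.py \
  /path/to/your-pyspark-app.py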

6. Submitting Application to Mesos

Here, we are submitting a Spark application to a Mesos-managed cluster using cluster deployment mode, with 5G memory and 8 cores for each executor.
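A sketch of that submission (the master address, class, and jar path are placeholders; with cluster deploy mode the master URL points at a MesosClusterDispatcher):

./bin/spark-submit \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --executor-memory 5g \
  --conf spark.executor.cores=8 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar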

7. Submitting Application to Kubernetes

The below example runs a Spark application on a Kubernetes-managed cluster using cluster deployment mode, with 5G memory and 8 cores for each executor.
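A sketch of that submission (the API server address, container image, class, and jar path are placeholders):

./bin/spark-submit \
  --master k8s://https://kubernetes-api-host:443 \
  --deploy-mode cluster \
  --executor-memory 5g \
  --executor-cores 8 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar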

8. Submitting Application to Standalone

The below example runs a Spark application on a Standalone cluster using cluster deployment mode, with 5G memory and 8 cores for each executor.
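A sketch of that submission (the standalone master address, class, and jar path are placeholders):

./bin/spark-submit \
  --master spark://spark-master-host:7077 \
  --deploy-mode cluster \
  --executor-memory 5g \
  --executor-cores 8 \
  --class org.apache.spark.examples.SparkPi \
  /path/to/your-application.jar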

Happy Learning !!


Spark Submit Command Explained with Examples
Source: https://sparkbyexamples.com/spark/spark-submit-command/

Submitting Applications

The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface so you don't have to configure your application especially for each one.

If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or "uber" jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar you can call the bin/spark-submit script as shown here while passing your jar.

For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg.

Once a user application is bundled, it can be launched using the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and can support different cluster managers and deploy modes that Spark supports:
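The general form of the command, the same template shown earlier in this document, is:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]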

Some of the commonly used options are:

  • --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
  • --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
  • --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
  • --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap "key=value" in quotes (as shown). Multiple configurations should be passed as separate arguments. (e.g. --conf <key>=<value> --conf <key2>=<value2>)
  • application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
  • application-arguments: Arguments passed to the main method of your main class, if any

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process which acts as a client to the cluster. The input and output of the application is attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Currently, the standalone mode does not support cluster mode for Python applications.

For Python applications, simply pass a .py file in the place of <application-jar>, and add Python .zip, .egg or .py files to the search path with --py-files.

There are a few options available that are specific to the cluster manager that is being used. For example, with a Spark standalone cluster with cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all such options available to spark-submit, run it with --help. Here are a few examples of common options:
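A few representative invocations, with placeholder addresses, resource sizes, and an examples jar path, are sketched below:

# Run an application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000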

The master URL passed to Spark can be in one of the following formats:

  • local: Run Spark locally with one worker thread (i.e. no parallelism at all).
  • local[K]: Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
  • local[K,F]: Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
  • local[*]: Run Spark locally with as many worker threads as logical cores on your machine.
  • local[*,F]: Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
  • spark://HOST:PORT: Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
  • spark://HOST1:PORT1,HOST2:PORT2: Connect to the given Spark standalone cluster with standby masters with Zookeeper. The list must have all the master hosts in the high availability cluster set up with Zookeeper. The port must be whichever each master is configured to use, which is 7077 by default.
  • mesos://HOST:PORT: Connect to the given Mesos cluster. The port must be whichever one your Mesos master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.
  • yarn: Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
  • k8s://HOST:PORT: Connect to a Kubernetes cluster in client or cluster mode depending on the value of --deploy-mode. The HOST and PORT refer to the Kubernetes API Server. It connects using TLS by default. In order to force it to use an unsecured connection, you can use k8s://http://HOST:PORT.

The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it will read options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.

Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.

If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.

When using spark-submit, the application jar along with any jars included with the --jars option will be automatically transferred to the cluster. URLs supplied after --jars must be separated by commas. That list is included in the driver and executor classpaths. Directory expansion does not work with --jars.

Spark uses the following URL scheme to allow different strategies for disseminating jars:

  • file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
  • hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected
  • local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.

Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes. This can use up a significant amount of space over time and will need to be cleaned up. With YARN, cleanup is handled automatically, and with Spark standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property.

Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories. (Note that credentials for password-protected repositories can be supplied in some cases in the repository URI, such as in https://user:password@host/.... Be careful when supplying credentials this way.) These commands can be used with pyspark, spark-shell, and spark-submit to include Spark Packages.

For Python, the equivalent --py-files option can be used to distribute .egg, .zip and .py libraries to executors.

Once you have deployed your application, the cluster mode overview describes the components involved in distributed execution, and how to monitor and debug applications.

Source: https://spark.apache.org/docs/latest/submitting-applications.html
