Set up a Maven Scala and Spark project: build a Spark Scala project with the Spark dependencies and the Scala build plugin.
Set up the Scala and Maven development environment: you need recent versions of Scala and Maven installed on your machine. Create a new Maven project: you can generate one from the quickstart archetype with the following command:
mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
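The archetype generates a standard Java-style layout; since this is a Scala project, rename src/main/java to src/main/scala (and src/test/java to src/test/scala). The resulting structure looks roughly like this:

my-app
  pom.xml
  src/
    main/
      scala/
        com/mycompany/app/App.scala
    test/
      scala/
        com/mycompany/app/AppTest.scala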
Add the Scala library and the Scala Maven plugin: in your project's pom.xml, add the following dependency and build plugin. Note that the old org.scala-tools:maven-scala-plugin is deprecated and does not support Scala 2.13; use its maintained successor, net.alchim31.maven:scala-maven-plugin:
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.13.3</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>4.8.1</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
Those are the basic steps to build a Scala project with Maven. Next, we add the Spark dependencies to the same pom.xml:
<!-- Apache Spark -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.13</artifactId>
  <version>3.2.1</version>
</dependency>
<!-- Spark SQL -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.13</artifactId>
  <version>3.2.1</version>
</dependency>
We can additionally add spark-streaming and scalatest to the project dependencies, as shown below.
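For example (a sketch; the versions should match the Spark and ScalaTest releases you actually use):

<!-- Spark Streaming -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.13</artifactId>
  <version>3.2.1</version>
</dependency>
<!-- ScalaTest -->
<dependency>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest_2.13</artifactId>
  <version>3.2.3</version>
  <scope>test</scope>
</dependency>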
Note that the Scala and Spark versions above are only examples; choose a compatible pair for your environment, keeping in mind that Spark publishes Scala 2.13 artifacts only from version 3.2.0 onward (earlier releases such as 3.1.2 exist only for Scala 2.12).
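With the dependencies in place, you can smoke-test the setup with a minimal application under src/main/scala. This is only a sketch: the package and object names simply follow the archetype coordinates above, and local[*] is used so it runs without a cluster.

package com.mycompany.app

import org.apache.spark.sql.SparkSession

object App {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process, so no cluster is needed for this check.
    val spark = SparkSession.builder()
      .appName("my-app")
      .master("local[*]")
      .getOrCreate()

    // A tiny computation to confirm the Spark dependencies resolve and run.
    val evens = spark.range(100).filter("id % 2 = 0").count()
    println(s"Even numbers: $evens")

    spark.stop()
  }
}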
To avoid repeating version numbers, you can centralize them as Maven properties. Add the following to your pom.xml:
<properties>
  <!-- Scala version -->
  <scala.version>2.13.3</scala.version>
  <!-- Scala binary version, used as the artifact suffix -->
  <scala.binary.version>2.13</scala.binary.version>
  <!-- Spark version -->
  <spark.version>3.2.1</spark.version>
  <!-- ScalaTest version -->
  <scalatest.version>3.2.3</scalatest.version>
</properties>
<dependencies>
  <!-- Scala library -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <!-- Apache Spark; note that the artifact suffix is the Scala binary
       version (2.13), not the full Scala version (2.13.3) -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <!-- Spark SQL -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <!-- Spark Testing -->
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_${scala.binary.version}</artifactId>
    <version>${scalatest.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
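With ScalaTest on the test classpath, a minimal suite under src/test/scala might look like this (the class and test names are illustrative). Note that running ScalaTest suites from Maven typically requires either the scalatest-maven-plugin or ScalaTest's JUnit integration; wiring that up is beyond this sketch.

package com.mycompany.app

import org.scalatest.funsuite.AnyFunSuite

class AppTest extends AnyFunSuite {
  // A trivial check that confirms the test harness is wired up.
  test("sanity check") {
    assert(1 + 1 === 2)
  }
}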
By default, the Scala plugin compiles sources under src/main/scala and tests under src/test/scala, so there is no need to declare the source and test directories explicitly:
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
Likewise, there is no need to configure the maven-compiler-plugin for a pure Scala project, since the Scala plugin takes care of compilation:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.10.1</version>
</plugin>
You can then compile the project with:
mvn compile
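To run the application on Spark, package it and submit the jar. The class and jar names below follow the quickstart coordinates used earlier; in a real cluster deployment the Spark dependencies are usually marked <scope>provided</scope>, since spark-submit supplies them at runtime:

mvn package
spark-submit --class com.mycompany.app.App --master "local[*]" target/my-app-1.0-SNAPSHOT.jar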