Set Up a Maven Spark Project

This guide sets up a Maven project for Scala and Spark by adding the Spark dependencies and the Scala build plugin to pom.xml.

Set up the development environment: install a JDK and Maven. A separate Scala installation is optional, because Maven downloads the Scala library and compiler declared in pom.xml.

Create a new Maven project with the following command:

mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

Add Scala Dependency

In the project's pom.xml, add the Scala library as a dependency and the Scala plugin to the build:


<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.13.3</version>
  </dependency>
</dependencies>

<build>
  <plugins>
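    <plugin>
      <!-- scala-maven-plugin supersedes the retired org.scala-tools maven-scala-plugin, which predates Scala 2.13 -->
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>4.8.1</version>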
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Add Spark Dependency

The steps above produce a basic Scala project built with Maven. Next, add the Spark dependencies inside the <dependencies> element:

  <!-- Apache Spark (Scala 2.13 builds are published from Spark 3.2.0 onward) -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.2.0</version>
  </dependency>

  <!-- Spark SQL -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.2.0</version>
  </dependency>

We can additionally add spark-streaming and scalatest to the project dependencies.
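
As a sketch of the first of these, a spark-streaming entry could look like the following, placed inside <dependencies>. The 3.2.0 version is an assumption, chosen because Spark publishes Scala 2.13 artifacts only from that release onward; keep it equal to the version used for spark-core.

```xml
<!-- Spark Streaming (version is an assumption; keep it equal to spark-core's) -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.13</artifactId>
  <version>3.2.0</version>
</dependency>
```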

Using Libraries with the Correct Scala Version

Note that the Scala and Apache Spark versions above are specific to this example; change them to match the versions you are using. The suffix on each Spark artifactId (for example _2.13) is the Scala binary version, and it has to match the major.minor version of scala-library.

To keep the versions in one place, define them as properties in pom.xml and reference them from the dependencies:

<properties>
  <!-- Scala version (full version, used for scala-library) -->
  <scala.version>2.13.3</scala.version>

  <!-- Scala binary version (major.minor only, used as the artifactId suffix) -->
  <scala.binary.version>2.13</scala.binary.version>

  <!-- Spark version (Scala 2.13 builds are published from 3.2.0 onward) -->
  <spark.version>3.2.0</spark.version>

  <!-- Scalatest version -->
  <scalatest.version>3.2.3</scalatest.version>
</properties>

<dependencies>
  <!-- Scala library -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>

  <!-- Apache Spark -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>

  <!-- Spark SQL -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>

  <!-- Spark Testing -->
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_${scala.binary.version}</artifactId>
    <version>${scalatest.version}</version>
    <scope>test</scope>
  </dependency>

</dependencies>

Maven Project Build

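The Scala plugin compiles src/main/scala and src/test/scala by default, so these directories do not have to be declared. The quickstart archetype only creates src/main/java and src/test/java, so create the Scala directories yourself. For reference, the explicit declaration, which goes inside <build> and can be omitted, is: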

<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>

For a Scala-only project there is also no need to declare the Maven compiler plugin, because the Scala plugin compiles the sources. An entry like the following can be left out:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.10.1</version>
</plugin>

You can then compile the project with

mvn compile
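
Running the ScalaTest suites with mvn test typically needs one more plugin, because Surefire does not pick up plain ScalaTest suites by default. The following is a minimal sketch along the lines of the ScalaTest Maven plugin documentation, placed inside <build><plugins>. The plugin versions and the report file name are assumptions; check them against the current releases.

```xml
<!-- Skip Surefire so the tests are not run twice (version is an assumption) -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.22.2</version>
  <configuration>
    <skipTests>true</skipTests>
  </configuration>
</plugin>

<!-- Run ScalaTest suites during the test phase (version is an assumption) -->
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <version>2.2.0</version>
  <configuration>
    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
    <junitxml>.</junitxml>
    <filereports>WDF TestSuite.txt</filereports>
  </configuration>
  <executions>
    <execution>
      <id>test</id>
      <goals>
        <goal>test</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With this in place, mvn test compiles the sources under src/test/scala through the Scala plugin's testCompile goal and then runs the suites.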