Scala stuff

Syntax overview

trait A {
  // The lack of parens means that this method can be invoked without parentheses, behaving much like a property
  // (It can be invoked *with* parens, but if a method is defined with parens, it must be invoked with them.)
  def method: String
  // Traits can provide default implementations, which makes it easy to build a lot of functionality on top of
  // a small number of implementer-defined methods.
  // Method implementations can be bare expressions, or they can be block expressions
  def method2(arg: String): String = method + arg
  
  // I believe that while this is declared as a `val` rather than `def`, it will still be implemented as a nullary method
  // to allow it to be overridden
  val someValue: Int
}

trait B {
  def anotherMethod(): String = "Hello"
}

// I'm not sure why Scala went with this English-like syntax, other than to make it clearer that if you have a parent class,
// it must be in the first position, while only traits can follow `with`
class Example(constructorArgument: Int, val attributeDerivedFromConstructor: Int) extends A with B {
  def method: String = "This is a class"
  val someValue: Int = constructorArgument + 3
  // B already provides an implementation, so replacing it requires `override`
  override def anotherMethod(): String = attributeDerivedFromConstructor.toString
  
  // Alternate constructor
  def this(seed: Int) = {
    this(seed / 2, seed * 3)
  }
}
// Class is constructed like
new Example(3, 4)
// or
new Example(42)

// This is a companion object to the Example class
// It can see Example's privates, and basically holds what in Java would be static attributes and methods
// Unlike Java, it's implemented as a singleton
object Example {
  // We cover this more later, but this lets us construct Example using just a function call, without using the `new` keyword
  def apply(firstArg: Int, secondArg: Int): Example = new Example(firstArg, secondArg)
}

Example(3, 4)

Generics

Similar to Java, Scala supports generics. It uses square brackets for type parameters, rather than the <> used in Java or C++. This makes parsing much easier, since there's no doubt about whether a < b > c is a comparison or a variable declaration. It does mean that indexing is done with parentheses (and yes, this does mean that a List[T] can be used as a function from Int to T).
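
As a small illustration of the indexing point (a sketch; the value names are made up for this example):

```scala
val letters = List("a", "b", "c")

// Indexing uses parentheses; this is sugar for letters.apply(1)
val second = letters(1)

// Because a List[String] can act as a function from Int to String,
// it can be passed wherever an Int => String is expected
val picked = List(2, 0).map(letters)
```

Here second is "b" and picked is List("c", "a").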

Type erasure

JVM bytecode has no concept of generics: at runtime, a List[Int] and a List[String] are the same class. I think I've only run into this as an obstacle once in Java, but multiple times when writing Scala.

Type bounds

You can constrain a type parameter to be a subtype or a supertype of another type. A <: B means that A must be a subtype of B, while A >: B means that it must be a supertype.

To use an example from the docs:

abstract class Animal {
  def name: String
}
abstract class Pet extends Animal {}
class Cat extends Pet {
  override def name: String = "Cat"
}

class Dog extends Pet {
  override def name: String = "Dog"
}

class Lion extends Animal {
  override def name: String = "Lion"
}

class PetContainer[P <: Pet](p: P) {
  def pet: P = p
}

You can create a PetContainer[Dog] or a PetContainer[Cat], but not a PetContainer[Lion]. PetContainer is also different from

class GenericPetContainer(p: Pet) {
  def pet: Pet = p
}

in that you can store either a Cat or a Dog in a GenericPetContainer. That is,

// Var means we can rebind the variable to a new value
var dogContainer = new PetContainer[Dog](new Dog)
dogContainer = new PetContainer[Cat](new Cat) // Doesn't work: PetContainer[Dog] != PetContainer[Cat]
var petContainer = new GenericPetContainer(new Dog)
petContainer = new GenericPetContainer(new Cat) // Perfectly allowed

Expression-oriented syntax

The last expression of a block or statement is its return value. That means that if the body of a method is a single expression, it's not necessary to wrap it in braces. It also means that things like if statements are actually if expressions.
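
A minimal sketch of what that looks like in practice (describeParity and total are hypothetical names for this example):

```scala
// `if` is an expression, so it can be the whole body of a method
def describeParity(n: Int): String =
  if (n % 2 == 0) "even" else "odd"

// A block evaluates to its last expression
val total: Int = {
  val a = 2
  val b = 3
  a * b // this is the value of the block
}
```

Here describeParity(4) is "even" and total is 6.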

Explicit return is discouraged

There is a return statement, but its use is mostly reserved for early exit. return also behaves somewhat unintuitively when used inside anonymous functions. Consider

def firstEven(l: List[Int]): Option[Int] = {
  l.foreach { v =>
    if (v % 2 == 0) return Some(v)
  }
  None
}

Even though the v => ... is technically a separate function, the return statement will return from firstEven instead. This is all fine and dandy until you have something like

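def applyFilter(l: List[Int], filter: Int => Unit) = {
  l.foreach(filter)
  None
}
// The `return` belongs to makeIsEvenFilter, so the returned value must have makeIsEvenFilter's return type
def makeIsEvenFilter(): Int => Unit = { v => if (v % 2 == 0) return (_ => ()) }
applyFilter(List(1, 2, 3), makeIsEvenFilter())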

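The return will try to return from the call to makeIsEvenFilter (since that's where it lives, lexically), but that stack frame no longer exists. Under the hood, a return inside a closure is implemented by throwing an exception that the enclosing method catches; here nothing is left to catch it, so it escapes as a scala.runtime.NonLocalReturnControl. Sadness ensues. (Note that it's impossible to write an isEvenFilter function with the same semantics.)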

This does mean that it's impossible to do an early return from a closure, but if you've found that's necessary, perhaps that logic shouldn't be in an anonymous function.
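
For the firstEven case, the early exit can usually be replaced by a collection method that already stops at the first hit. A sketch, with firstEvenViaFind as a made-up name:

```scala
// find stops at the first matching element and wraps it in an Option,
// so no explicit return is needed
def firstEvenViaFind(l: List[Int]): Option[Int] =
  l.find(_ % 2 == 0)
```

firstEvenViaFind(List(1, 2, 3, 4)) gives Some(2), and a list with no even numbers gives None.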

Optional parentheses and dots

If a nullary method has no side effects, it's standard to define it without parentheses. Think of it like a property getter.

If you have a unary parameter list, it can be called without parentheses, e.g.

List(1, 2, 3) filter isEven

As demonstrated, you can also call it without a dot. This kind of thing is used to build DSLs. At Sortable, we tended to avoid it.

By extension, operators are just methods with unusual names: 1 + 2 is equivalent to 1.+(2). Most methods are left-associative, unless the name ends with a : character, in which case it's right-associative. I suspect that this rule was added to make :: work as a cons operator.
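
A sketch of both rules, using a hypothetical Vec class (the class and its operators are invented for this example):

```scala
case class Vec(x: Int, y: Int) {
  // An ordinary, left-associative operator method
  def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  // The name ends in ':', so `n *: v` is parsed as `v.*:(n)`
  def *:(scale: Int): Vec = Vec(x * scale, y * scale)
}

val infixSum = Vec(1, 2) + Vec(3, 4)
val methodSum = Vec(1, 2).+(Vec(3, 4)) // same call, written as a method
val scaled = 2 *: Vec(1, 2)            // Vec(2, 4)

// The cons operator works the same way
val consList = 1 :: 2 :: Nil
val consAsMethods = Nil.::(2).::(1)
```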

Block-like syntax

Because unary parameter lists can be called without parentheses, it's common for first-class functions to be in a parameter list on their own. The result can look like custom keywords, e.g.

Try {
  // some operation here
}

returns a Try monad (Result in Rust, Either in Haskell), but is just implemented as

object Try {
  /** Constructs a `Try` using the by-name parameter.  This
   * method will ensure any non-fatal exception is caught and a
   * `Failure` object is returned.
   */
  def apply[T](r: => T): Try[T] =
    try Success(r) catch {
      case NonFatal(e) => Failure(e)
    }
}
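
The same trick works for your own methods. A minimal sketch, assuming a hypothetical retry helper (not something from the standard library): a by-name parameter in its own parameter list lets the call site look like a built-in block.

```scala
import scala.util.Try

// `body` is by-name, so it is re-evaluated on every attempt
def retry[T](times: Int)(body: => T): Try[T] = {
  val attempt = Try(body)
  if (attempt.isSuccess || times <= 1) attempt
  else retry(times - 1)(body)
}

var calls = 0
val result = retry(3) {
  calls += 1
  if (calls < 3) throw new RuntimeException("not yet") else "ok"
}
```

The block fails twice and succeeds on the third attempt, so result is Success("ok") and calls ends up at 3.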

Mutability

The language is immutable by default. The default data types are all persistent data structures: the cost of updating a hashmap, e.g., is only O(log(n) * S), where S is the branching factor (usually 32, IIRC?), as it updates the leaf node, then each of the parents on the way to the root. The result is a new hashmap which shares most of its storage with the old hashmap.

There are algebraic data types, kind of. Product types are "case class"es:

case class Person(name: String, age: Int) {
  // s"" does string interpolation. It also tends to mess a bit with string escaping.
  // You can create your own custom string processors; there's a raw"" which just does nothing.
  def sayHi() = println(s"Hello, ${name}")
  // Nullary functions with side-effects get parentheses; those without tend not to, by convention
  def birthday: Person = this.copy(age = this.age + 1)
}

Case classes automatically generate things like the copy method, and a companion object (which Scala uses instead of static methods and fields) with apply and unapply methods (more on those later). Case classes are immutable. For case classes without any fields, one tends to use singleton case objects instead.
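
A short sketch of what "immutable with copy" means in practice, using a made-up Account class and an immutable Map:

```scala
case class Account(owner: String, balance: Int)

val original = Account("Ann", 100)
// copy builds a new instance; `original` is untouched
val updated = original.copy(balance = original.balance + 50)

// Case classes also get structural equality for free
val sameAsOriginal = Account("Ann", 100) == original

// The default Map is persistent: `+` returns a new map
val ages = Map("Ann" -> 30)
val moreAges = ages + ("Bob" -> 25)
```

After this, original.balance is still 100 and ages still has a single entry.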

Sum types are handled by creating a (sealed) trait, plus classes or objects implementing that trait. A sealed trait can only be implemented by types in the same file. Scala allows multiple class/etc definitions in the same file, unlike Java. Only traits, classes and objects are allowed as top-level items, though (at least in Scala 2).

Scala has a concept of null because of JVM/Java compatibility, but it's generally discouraged in favour of Option. Tying together the above, Option's definition is something like:

// The + indicates variance.
// Option is covariant in its type parameter: If A is a subtype of B, then Option[A] is a subtype of Option[B]. 
// A leading - would indicate contravariance. The default is invariance.
sealed trait Option[+T] {
  def map[S](f: T => S): Option[S]
  def get: T
  // We don't require that the 'default' value be a T, only that its type B is a supertype of T
  // The `default` parameter is an example of call-by-name syntax: we only evaluate the parameter if necessary
  def getOrElse[B >: T](default: => B): B
  // A whole bunch of other useful methods omitted
}

case class Some[T](value: T) extends Option[T] {
  def map[S](f: T => S): Option[S] = Some(f(value))
  def get: T = value
  def getOrElse[B >: T](default: => B): B = value
}

// Nothing is the uninhabited type, i.e. the type with no instances.
// It's the type of functions that never return, thrown exceptions, etc.
// Because you can never have an instance of Nothing, it can do everything: ex falso, quod libet.
// As a result, it's a subtype of every other type, so (by covariance) None is an Option[T] for every T
case object None extends Option[Nothing] {
  def map[S](f: Nothing => S): Option[S] = None
  def get: Nothing = throw new NoSuchElementException("None.get")
  def getOrElse[B >: Nothing](default: => B): B = default
}

(This is not actually how it's implemented, and I'm not quite sure why not. I suspect the answer is "optimizations".)

Pattern matching

There's pattern matching, and under good circumstances, it's also exhaustive. If the value you're matching on is a sealed trait, the compiler can tell you if you're missing something. If you're matching on an unsealed trait, it can't.
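
A sketch of a sum type and an exhaustive match over it (Shape and its cases are invented for this example):

```scala
sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(width: Double, height: Double) extends Shape
case object Empty extends Shape

def area(s: Shape): Double = s match {
  case Circle(r)  => math.Pi * r * r
  case Rect(w, h) => w * h
  case Empty      => 0.0
  // Because Shape is sealed, deleting any case above makes the
  // compiler warn that the match may not be exhaustive
}
```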

Apply

Every "callable" value implements the apply method.

case class Adder(value: Int) {
  def apply(other: Int): Int = value + other
}
val addTwo = Adder(2)
println(addTwo(3)) // Prints 5

In the example above, there's also a generated

object Adder {
  def apply(value: Int): Adder = new Adder(value)
}

Unapply

There's also an unapply method, used for pattern matching. The Adder case class also has a generated

object Adder {
  def unapply(adder: Adder): Option[Int] = Some(adder.value)
}

This can be exploited to do magical things. For example, regexes implement unapply:

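// raw"" keeps the backslashes intact; a plain "" string would reject \w and \d as invalid escapes
val keyValueRe = raw"(\w+) = (\d+)".r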
"age = 6" match {
  case keyValueRe(name, value) => println(s"Saw key with name ${name} and value ${value}")
  case other: String => println(s"'${other}' is not a valid key-value pair")
}

As you might imagine, this can lead to creative uses of unapply.
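
A small sketch of a hand-written extractor, assuming a hypothetical Even object: unapply decides whether the pattern matches and what gets bound.

```scala
object Even {
  // Matches only even numbers, and binds half of the value
  def unapply(n: Int): Option[Int] =
    if (n % 2 == 0) Some(n / 2) else None
}

def describeHalf(n: Int): String = n match {
  case Even(half) => s"twice $half"
  case odd        => s"$odd is odd"
}
```

describeHalf(10) gives "twice 5", while describeHalf(7) gives "7 is odd".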

Underscores are magic

You can use them for wildcards in imports:

import scala.util._ // Probably a bad idea

You can use them in pattern matching as a catch-all that's not captured:

foo match {
  // Other case statements omitted
  case _ => // Default case goes here
}

They're used for partial application and converting methods into functions (eta-expansion):

object Multiplier {
  // mult is a method on the Multiplier singleton
  def mult(a: Int, b: Int) = a * b
}
// mulByThree is a partially applied function, i.e. an object with an 'apply' method.
val mulByThree = Multiplier.mult(3, _)

They're used as placeholders for the parameters of short anonymous functions:

def addTwo(maybeNumber: Option[Int]): Option[Int] = maybeNumber.map(_ + 2)
// Equivalent to 
def addTwo(maybeNumber: Option[Int]): Option[Int] = maybeNumber.map(n => n + 2) 

I feel like I'm missing some other use as well?

Each of these things is useful, but sometimes it's ambiguous about whether _ is being used for eta-expansion, or if you intended the identity function.

Macros

I've used libraries that use macros, but I've never written one myself. They're more akin to Lisp macros than C's purely lexical macros.

For comprehensions

You want for loops? Scala's got 'em!

for { x <- List(1, 2, 3) } yield x * 2
// Returns List(2, 4, 6)

Want nested loops?

for {
  x <- List(1, 2, 3)
  y <- List(10, 20, 30)
} yield x + y
// Returns List(11, 21, 31, 12, 22, 32, 13, 23, 33)

Want to calculate intermediate values, or filter things?

for {
  x <- List(1, 2, 3, 4, 5, 6)
  z = x * 2
  if z % 4 == 0
  y <- List(10, 20)
} yield z + y
// Returns List(14, 24, 18, 28, 22, 32)

These are all just syntactic sugar for (respectively)

List(1, 2, 3).map(x => x * 2)
List(1, 2, 3).flatMap(x => List(10, 20, 30).map(y => x + y))
List(1, 2, 3, 4, 5, 6)
  .map(x => (x, x * 2))
  .withFilter { case (x, z) => z % 4 == 0 } // withFilter is a lazy variant of filter
  .flatMap { case (x, z) => List(10, 20).map(y => z + y) }

And because this transformation is syntactic rather than semantic (e.g. no Monad trait), it works with anything that implements the necessary methods. That is, it's duck-typing monads.

That means the following also works:

for {
  a <- apiCallReturningFuture()
  b <- anotherApiCallReturningAFuture(a)
} yield b

since Future implements map and flatMap.
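
To make the duck-typing point concrete, here is a sketch with a made-up Box type that extends nothing and only defines map and flatMap:

```scala
// Hypothetical wrapper type: no Monad trait anywhere in sight
case class Box[A](value: A) {
  def map[B](f: A => B): Box[B] = Box(f(value))
  def flatMap[B](f: A => Box[B]): Box[B] = f(value)
}

// Desugars to Box(2).flatMap(a => Box(a * 10).map(b => a + b))
val boxed = for {
  a <- Box(2)
  b <- Box(a * 10)
} yield a + b
```

The result is Box(22).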

Partial Functions

Not to be confused with partially applied functions, Scala has a type-level concept of functions that are only valid on some (but not all) input values. There's a number of ways to do it, but a typical way is to use pattern matching, e.g.

val halve: PartialFunction[Int, Int] = {
  case x if x % 2 == 0 => x / 2
}

Calling halve on an odd value is a runtime exception, although there are methods and such that work natively with partial functions.

List(1, 4, 5, 8, 9).collect(halve)

would return List(2, 4): it's like a selective map.

Partial functions also have a lift method that changes the function into a total one by wrapping the return type in Option.
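
A sketch of the pieces mentioned above (halveEven is just a local name for this example):

```scala
val halveEven: PartialFunction[Int, Int] = {
  case x if x % 2 == 0 => x / 2
}

// isDefinedAt lets you check before calling
val definedAtThree = halveEven.isDefinedAt(3)

// lift turns it into a total function returning Option
val halveOption: Int => Option[Int] = halveEven.lift
val liftedOdd = halveOption(3)
val liftedEven = halveOption(8)

// collect maps over only the values where the function is defined
val collected = List(1, 4, 5, 8, 9).collect(halveEven)
```

Here liftedOdd is None, liftedEven is Some(4), and collected is List(2, 4).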

Multiple parameter lists

Methods can have multiple parameter lists. Sometimes this helps with type inference, e.g. List.foldLeft[B](z: B)(op: (B, A) => B): B. Having the initial value in a separate parameter list pins down B before the second list is type-checked, so the function's parameter types can be inferred. You can also write something like List(1, 2, 3).foldLeft("") _ to get a function with a signature of ((String, Int) => String) => String.
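
A sketch of the inference benefit, with applyTwice as a hypothetical helper:

```scala
// B is fixed to String by the first list, so `acc` and `n` need no type annotations
val joined = List(1, 2, 3).foldLeft("")((acc, n) => acc + n)

// The same idea in a hand-written method: the first parameter list fixes T,
// which lets the lambda in the second list be written as `_ * 2`
def applyTwice[T](start: T)(f: T => T): T = f(f(start))

val doubledTwice = applyTwice(3)(_ * 2)
```

joined is "123" and doubledTwice is 12.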

Implicits

It took me a long time to really wrap my head around implicits and when you might want to use them. If you declare a parameter as implicit, Scala will search a series of scopes to find a value marked with implicit and with a matching type to pass in automatically. I can't remember the search order, but I do know that if you have

def frobnicate[T](foo: T)(implicit frobber: Frob[T])

it will first search the immediate lexical scope, then the current file, before looking in places like the companion objects for Frob and T, wherever those might be.

Sometimes implicits are used to provide values that are somewhat environmental, e.g. the ExecutionContext for threaded operations. You can import the default execution context into your module, or import the one used by your library/program/etc., or you can even pass one explicitly. If you're writing a library and don't want to force a given ExecutionContext on your users, you can accept an execution context as an implicit parameter, and it will be implicitly passed down.
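
The same pattern in miniature, using a made-up Settings class in place of ExecutionContext so the example stays deterministic:

```scala
// Hypothetical "environmental" value
case class Settings(greeting: String)

// Callers don't have to thread Settings through by hand
def greet(name: String)(implicit settings: Settings): String =
  s"${settings.greeting}, $name"

implicit val defaultSettings: Settings = Settings("Hello")

val implicitCall = greet("Ann")                       // picks up defaultSettings
val explicitCall = greet("Ann")(Settings("Bonjour"))  // passing one explicitly still works
```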

Reminder about type erasure

The JVM has no concept of generics, beyond arrays. Type parameters exist for type checking purposes, but no longer exist at runtime. That means you can write, e.g.,

foo match {
  case _: String => "String"
  case _: Int => "Int"
}

but not

foo match {
  case _: List[String] => "String"
  case _: List[Int] => "Int"
}

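since those are both reduced to just List at runtime. (The second version does compile, but with an "unchecked" warning, and the first case then matches any List.)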

Type manifests

Scala's solution is to provide type manifests: objects containing erased type information, automatically generated by the compiler as needed at the point where concrete types are available, and passed down implicitly. In this case, you might have

def nameType[T](value: List[T])(implicit m: Manifest[T]) = m.toString // Do reflectiony things on T via m
// This needs to take an implicit manifest as well to provide to nameType, since it doesn't know the concrete type of T
def indirectlyNameType[T](value: List[T])(implicit m: Manifest[T]) = nameType(value)
// Scala knows the concrete type of T here, so passes a Manifest[Int] automagically
nameType(List(1, 2, 3)) 
// Likewise, but Manifest[String]
nameType(List("1", "2", "3")) 

Manifest is effectively deprecated these days, in favour of scala.reflect.ClassTag (which carries just the runtime class) and TypeTag (which carries full type information).
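
One commonly used replacement is scala.reflect.ClassTag, which works the same way for the simple cases. A sketch, with elementTypeName as a hypothetical helper:

```scala
import scala.reflect.ClassTag

// The compiler supplies the ClassTag at the call site, where T is still known
def elementTypeName[T](value: List[T])(implicit tag: ClassTag[T]): String =
  tag.runtimeClass.getName

val stringElements = elementTypeName(List("1", "2", "3")) // "java.lang.String"
val intElements = elementTypeName(List(1, 2, 3))
```

The two calls return different names even though both arguments are plain List instances at runtime.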

Type classes

Implicits are also used a lot for builders and type classes. Methods like List[T].max take an implicit math.Ordering[T]. For built-in types, there are snippets in Ordering.scala like

  trait IntOrdering extends Ordering[Int] {
    def compare(x: Int, y: Int): Int = java.lang.Integer.compare(x, y)
  }
  implicit object Int extends IntOrdering with CachedReverse[Int]

that will be used automatically if no other options are present. If you have your own type, you could write

case class Person(name: String, age: Int)
object Person {
  // Sort people by age; who cares about names?
  implicit val ordering: math.Ordering[Person] = new math.Ordering[Person] {
    def compare(x: Person, y: Person): Int = java.lang.Integer.compare(x.age, y.age)
  }
}

If you wanted to override the ordering, you could even write

implicit val customPersonOrdering: math.Ordering[Person] = new math.Ordering[Person] {
  def compare(x: Person, y: Person): Int = x.name.compareTo(y.name)
}
List(jim, bob, jimmybob).max

math.Ordering is an example of a type class. In Scala 3 it's been given more dedicated syntax.
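
A runnable sketch of the whole thing, using Ordering.by as a shortcut for writing compare by hand (the people and the byName ordering are invented for this example):

```scala
case class Person(name: String, age: Int)
object Person {
  // Found automatically because it lives in Person's companion object
  implicit val byAge: Ordering[Person] = Ordering.by[Person, Int](_.age)
}

val people = List(Person("Jim", 40), Person("Bob", 25), Person("Jimmybob", 33))

// Uses the implicit byAge ordering
val oldest = people.max

// An ordering can also be passed explicitly
val byName: Ordering[Person] = Ordering.by[Person, String](_.name)
val lastByName = people.max(byName)
```

oldest is Jim (age 40), while lastByName is Jimmybob.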

Context bounds

Because being able to pass an implicit type class is fairly common, Scala added syntax for it.

def cofrobbable[T: Frobbable](a: T, b: T)

is equivalent to

def cofrobbable[T](a: T, b: T)(implicit _: Frobbable[T])

You might notice that since we don't have a name for our Frobbable typeclass instance, we can't directly invoke it. Scala's prelude contains the answer: the implicitly function. Its definition is dead simple:

def implicitly[T](implicit e: T): T = e

You tell it the type of the implicit value that you want, and it gives it to you. An explicit definition of cofrobbable above might be

def cofrobbable[T: Frobbable](a: T, b: T) = implicitly[Frobbable[T]].cofrob(a, b)
// Roughly equivalent:
def explicitCofrobbable[T](a: T, b: T)(implicit cofrobber: Frobbable[T]) = cofrobber.cofrob(a, b)
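
Since Frobbable is only a placeholder, here is the same shape as a self-contained sketch, with a hypothetical Show type class standing in for it:

```scala
// Hypothetical type class, standing in for Frobbable
trait Show[T] {
  def show(value: T): String
}

implicit val showInt: Show[Int] = new Show[Int] {
  def show(value: Int): String = s"Int($value)"
}

// The context bound asks for an implicit Show[T]; implicitly fetches it
def showAll[T: Show](values: List[T]): String =
  values.map(v => implicitly[Show[T]].show(v)).mkString(", ")

val shown = showAll(List(1, 2))
```

shown is "Int(1), Int(2)".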

Useful typeclass tricks

One nice thing about implicitly passed type classes (and why Manifest even works) is that you don't even need an instance of the type. E.g., in a real-worldish example, we could have

trait ScalaToTypescript[T] {
  def typeName: String
  def convertValue(value: T): String
}
implicit object IntToTS extends ScalaToTypescript[Int] {
  def typeName = "number"
  def convertValue(value: Int) = value.toString
}
implicit object StringToTS extends ScalaToTypescript[String] {
  def typeName = "string"
  def convertValue(value: String) = "\"" + value.replace("\\", "\\\\").replace("\n", "\\n").replace("\"", "\\\"") + "\""
}
implicit def ListToTs[T: ScalaToTypescript]: ScalaToTypescript[List[T]] = new ScalaToTypescript[List[T]] {
  def typeName = implicitly[ScalaToTypescript[T]].typeName + "[]"
  def convertValue(value: List[T]) = value.map(implicitly[ScalaToTypescript[T]].convertValue).mkString("[", ", ", "]")
}

And elsewhere,

implicit object PersonToTS extends ScalaToTypescript[Person] {
  def typeName = "Person"
  def convertValue(value: Person) = s"new Person(${implicitly[ScalaToTypescript[String]].convertValue(value.name)}, ${implicitly[ScalaToTypescript[Int]].convertValue(value.age)})"
}
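
A stripped-down sketch of the "no instance needed" point, using a smaller hypothetical TsType type class so it stands on its own:

```scala
trait TsType[T] {
  def typeName: String
}

implicit val intTs: TsType[Int] = new TsType[Int] { def typeName = "number" }
implicit val stringTs: TsType[String] = new TsType[String] { def typeName = "string" }

// Derived instance: any List[T] is covered as long as T is
implicit def listTs[T](implicit elem: TsType[T]): TsType[List[T]] =
  new TsType[List[T]] { def typeName = elem.typeName + "[]" }

// Takes only a type argument; no value of type T is ever constructed
def tsTypeName[T](implicit ts: TsType[T]): String = ts.typeName

val nested = tsTypeName[List[List[Int]]]
```

The compiler chains listTs twice on top of intTs, so nested is "number[][]".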

Validator example

You could do something like

trait ValidateInputs[T] {
  def validate(value: T): Option[ValidationError]
}
trait ValidationError {
  def description: String
}
implicit def Validate[T1: ValidateInputs, T2: ValidateInputs]: ValidateInputs[Product2[T1, T2]] = new ValidateInputs[Product2[T1, T2]] {
  def validate(value: Product2[T1, T2]) = implicitly[ValidateInputs[T1]].validate(value._1)
    .orElse(implicitly[ValidateInputs[T2]].validate(value._2))
}
case class PhoneNumber(text: String)
case object InvalidPhoneNumber extends ValidationError {
  def description = "Malformed phone number"
}
implicit object PhoneNumberValidator extends ValidateInputs[PhoneNumber] {
  private val phonePattern = raw"\d{3}-\d{3}-\d{4}".r // This is a bad example, but meh
  def validate(phoneNumber: PhoneNumber) = phoneNumber.text match {
    case phonePattern() => None
    case _ => Some(InvalidPhoneNumber)
  }
}
case class TooYoung(age: Int) extends ValidationError {
  def description = s"${age} is too young"
}
case object WrongName extends ValidationError {
  def description = "It's the no Homer*s* club!"
}
// More specific than Product2
implicit object PersonValidator extends ValidateInputs[Person] {
  def validate(person: Person): Option[ValidationError] = if (person.age < 18) {
    Some(TooYoung(person.age))
  } else if (person.name == "Homer") {
    Some(WrongName)
  } else None
}

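which is meant to automatically define a ValidateInputs implementation for all 2-tuples and case classes with two elements, where both elements also implement ValidateInputs. One caveat: as written, ValidateInputs is invariant, so a ValidateInputs[Product2[T1, T2]] is not a ValidateInputs[(T1, T2)]; for the instance to be picked up for concrete tuple types, the trait would likely need to be declared contravariant (ValidateInputs[-T]). Product1 through Product22 are defined.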

Not covered by this doc