Sudoku Solver in Scala Part 1: Validated types

Sometimes you're trying to solve a puzzle so hard so you have to write a program to solve it. That happened to me with one specific sudoku so I wrote a sudoku solver.

I did some learning during this and I want to share them here. This part is about injecting validation into the type system.

What is sudoku

Sudoku is a puzzle where you need to finish filling a 9x9 grid with digits 1 to 9. Not randomly (that would be boring) but to satisfy 3 conditions:

  • All the digits in every row are unique.
  • All the digits in every column are unique.
  • All the digits in every 9 3x3 box (marked with thick borders) are unique.

You can try solving this sudoku. The rest of the post can wait.

Writing Solver Types

Let's start with a cell. It can be either full (with value) or empty (with a set of values you can write in this cell). This is a nice example of a sum type.

sealed trait Cell
case class FullCell(value: Digit) extends Cell
case class EmptyCell(allowed: Set[Digit]) extends Cell

Not we need to decide on how can we store cells in a grid. It is tempting to create a two-dimensional array just as we see it and it would make sense if we accessed it just by the row and column numbers. But we also care about 3x3 boxes. To have the same approach with any case I defined a flat vector with 81 cells and created a bunch of functions for converting any way to define coordinates to a vector index.

type GridKey = Int
object GridKey {
    def rowCol(row: Digit, col: Digit): GridKey = ???
    def boxNum(box: Digit, num: Digit): GridKey = ???
    ...
}

class Grid(val body: Vector[Cell])

You could notice type Digit used everywhere. I used a library called [refined](class Grid(val body: Vector[Cell])) to create a type that contains an integer in a range from 1 to 9. It gives you a Int => Option[Digit] function to convert integers to digits in runtime but it also validates hardcoded digits and compile time.

type Digit = Int Refined Interval.Closed[1, 9]
val a: Digit = 5 // Works
val b: Digit = 15 // Fails at compile time

// We'll also often need collections of every possible digit
// and it's not fun to validate them at runtime
// so let's hardcode an all digits array to use it everywhere
val allDigits = List[Digit](1, 2, 3, 4, 5, 6, 7, 8, 9)

Unfortunately defining two types with same validations works as if you defined a type once so there is no simple way do differentiate between value digits and coordinate digits so the next code compiles.

type DigitA = Int Refined Interval.Closed[1, 9]
type DigitB = Int Refined Interval.Closed[1, 9]
val a: DigitA = 5
val b: DigitB = a // Works

We still have to wait for Scala 3 with opaque types feature.

Ensuring a valid GridKey

Having a type of GridKey as an alias of int is not safe. It should always be a number between 0 and 80 and it's used as input argument in some Grid methods so it would be nice to have a type that implies this restriction.

The first idea is to use Refined again but there's a problem: we need to generate grid keys dynamically with formulas. This gives us an integer value to be converted into a refined type. And there's no way to convert a value to a refined type dynamically ensuring it's successful - the only function for this returns an Either. Putting .getOrElse(???) in every function to unwrap an Either is also not good. And changing our function signature to something like def rowCol(row: Digit, col: Digit): Option[GridKey] just to use refined types is even worse: such functions should always return a valid value.

Instead of this, I created a class GridKey with a single property:

class GridKey private (val value: Int) {
  assert(value >= 0 && value < gridSize)
}

It has a private constructor so the only way to create an instance of GridKey is through smart constructors that should always create valid instances. We still need property tests to make sure it's true but this is good enough.

There's one new problem though. We are going to create a lot of grid key so a need to allocate memory for a new class instance every time may cause a significant speed loss. But there's only 81 value we can have so we can pregenerate all the grid keys and then retrieve a cached value every time we need to instantiate it.

object GridKey {
  /** All the keys so we don't need to instantiate them every time. */
  val keys: Array[GridKey] = Array.range(0, gridSize).map(new GridKey(_))
  ...
}

At this point, I noticed that we could use the same trick to have grid keys as refined types but there's no sense of rewriting it again.

Static size collection

You could notice that the grid body vector needs to always have 81 elements. It would be nice to enforce this at the type level. But it didn't work out.

Refined library has collection.Size[P] predicate but I couldn't use it for the same reason why I didn't use refined for GridKey: we need to modify grid body a lot and every update requires rewrapping it into refined type. This would be too much. And we can't pregenerate all possible values in a case of grid.

There is also Shapeless library that has a feature to define static size collections. It even adds an implicit method .ensureSized[L <: Nat] to collections in order to convert them to sized ones assuming it's possible. But I faced a problem with natural number types.

Without an out-of-the-box support for dependent types in Scala, shapeless uses Peano number types to define natural numbers as types. This is a neat system where you define zero as _0, 1 as Succ[_0], 2 as Succ[Succ[_0]] and so on. Surprisingly this is enough to define an arithmetic on types using only _0 and Succ[A] as base types. This allows defining numbers within the type system of every language (I guess) but only for small enough numbers. Every next number adds one more level of type nesting and it's very easy to make your compiler cry.

Shapeless has predefined types for numbers up to 22. We need 81. But that's not the real problem.

The real problem is getting a value at an index. The Sized wrapper doesn't just retrieve a value but also uses the type system to validate the index is within collection bounds. And this means you need to provide a number type according to your index together with the index you want to retrieve. In practice this makes commands like body(5) be verified at compile time but commands like i => body(i) impossible.

You can retrieve the underlying collection from sized one via .underlying but it returns an IterableOps type making it too expensive to retrieve an index.

So how can we make sure the grid body is valid? Again, with smart constructors. We make a constructor private, add an assertion (just to be sure) and add two smart constructors that cannot create a grid with an invalid body.

class Grid private (val body: Vector[Cell]) {
  assert(body.size == gridSize)
}
object Grid {
  def empty(): Grid = new Grid(Vector.fill(gridSize)(EmptyCell()))
  def fromString(string: String): Option[Grid] = {
    val cells: List[(GridKey, Digit)] = ??? // Parse strings
    Grid.empty().set(cells)
  }
}

We still have to rely on tests to make sure invalid bodies are not created within Grid but this is good enough.

Summary

At this point, we can remove all the assert instructions because types we use give us no possibility to handle around invalid data. And even if you're dealing with such basic types as integers and collections, you still can wrap them into validated types.

Next time I'll write about composing functions.

Tags: , , ,


Comments