
Spatial search

Anastasios Zouzias edited this page Sep 17, 2016 · 1 revision

Some examples of spatial search using ShapeLuceneRDD.

We assume that you started spark-shell with the script ./spark-shell-csv.sh, which loads the spark-csv package.

import org.zouzias.spark.lucenerdd.spatial.shape.ShapeLuceneRDD
import org.zouzias.spark.lucenerdd.spatial.shape._
import org.zouzias.spark.lucenerdd._
import org.zouzias.spark.lucenerdd.LuceneRDD

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "false").option("inferSchema", "true").option("delimiter", "\t").load("src/test/resources/spatial/CH.txt")
val swissCities = df.select("C0", "C1", "C5", "C4").map(row => ((row.getDouble(2), row.getDouble(3)), row.getString(1).toLowerCase()))
val shapes = ShapeLuceneRDD(swissCities)
shapes.count

The above should return
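To make the column selection in the loading step concrete, here is a plain-Scala sketch (no Spark) of the same mapping for a single line. It assumes CH.txt follows the GeoNames tab-separated layout, where column C1 holds the place name, C4 the latitude and C5 the longitude. Under that assumption, select("C0", "C1", "C5", "C4") followed by getDouble(2), getDouble(3) yields (longitude, latitude) points, the same order used for Bern below. The object GeoNamesRow and the sample line are hypothetical.

```scala
// Hypothetical helper: a minimal sketch of the row mapping used to build swissCities,
// assuming the GeoNames layout (column 1 = name, 4 = latitude, 5 = longitude).
object GeoNamesRow {
  // Points are (longitude, latitude), matching the Bern query coordinates.
  type Point = (Double, Double)

  /** Parses one tab-separated line into ((longitude, latitude), lower-cased name). */
  def parse(line: String): (Point, String) = {
    val cols = line.split("\t", -1) // -1 keeps empty columns (e.g. alternate names)
    val name = cols(1)
    val latitude = cols(4).toDouble
    val longitude = cols(5).toDouble
    ((longitude, latitude), name.toLowerCase)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample line; only columns 1, 4 and 5 are read.
    val sample = "1\tBern\tBern\t\t46.948380\t7.433534"
    println(parse(sample))
  }
}
```

Each parsed pair has the ((Double, Double), String) shape that ShapeLuceneRDD(swissCities) receives from the DataFrame mapping.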

KNN search

Now, let's perform a KNN (k-nearest neighbors) search around Bern, given as the (longitude, latitude) point (7.433534, 46.948380), and print the 10 nearest entries:

shapes.knnSearch((7.433534, 46.948380), 10).foreach(println)

For a more human-friendly output, print only the indexed text field _1 (the lower-cased city name) of the 20 nearest entries:

shapes.knnSearch((7.433534, 46.948380), 20).flatMap(_.doc.textField("_1")).foreach(println)
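As a mental model of what a KNN query returns, the sketch below ranks points by great-circle (haversine) distance and keeps the k closest. This is a brute-force stand-in, not how ShapeLuceneRDD works internally (it relies on Lucene spatial indexing, and its ranking may differ). The spherical Earth with mean radius 6371 km, the object BruteForceKnn, and the toy places with approximate coordinates are all assumptions for illustration.

```scala
// Hypothetical brute-force k-nearest-neighbour sketch over (longitude, latitude) points.
// Assumes a spherical Earth (mean radius 6371 km); not the library's algorithm.
object BruteForceKnn {
  type Point = (Double, Double) // (longitude, latitude) in degrees

  val EarthRadiusKm = 6371.0

  /** Great-circle distance in kilometres between two (lon, lat) points. */
  def haversineKm(a: Point, b: Point): Double = {
    val (lon1, lat1) = a
    val (lon2, lat2) = b
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(h))
  }

  /** Returns the k entries closest to `query`, nearest first (full scan and sort). */
  def knn[T](data: Seq[(Point, T)], query: Point, k: Int): Seq[(Point, T)] =
    data.sortBy { case (p, _) => haversineKm(query, p) }.take(k)

  // Hypothetical toy data with approximate city-centre coordinates.
  val places: Seq[(Point, String)] = Seq(
    ((8.5417, 47.3769), "zurich"),
    ((7.4474, 46.9481), "bern"),
    ((6.1432, 46.2044), "geneva"),
    ((7.5886, 47.5596), "basel")
  )

  def main(args: Array[String]): Unit = {
    val bern: Point = (7.433534, 46.948380)
    knn(places, bern, 2).foreach(println)
  }
}
```

A full scan costs O(n log n) per query, which is fine for a toy list but is exactly what a spatial index avoids on a dataset the size of CH.txt.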

Circle search

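Now, let's count how many entries in our dataset lie within 1 km of Bern. In circleSearch, the second argument is the search radius in kilometers and the third is the maximum number of results to return, so the count below is capped at 1000 and the printout at 10 entries.

shapes.circleSearch((7.433534, 46.948380), 1, 1000).size
shapes.circleSearch((7.433534, 46.948380), 1, 10).foreach(println)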
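Conceptually, a circle search keeps every point whose distance from the center is at most the radius and then caps the result at k entries. The brute-force sketch below shows that filter in plain Scala. It assumes haversine distance on a spherical Earth (radius 6371 km) and returns matches in input order; it is not ShapeLuceneRDD's implementation, and the object BruteForceCircle and its toy places are hypothetical.

```scala
// Hypothetical brute-force circle search over (longitude, latitude) points.
// Assumes haversine distance on a spherical Earth; matches keep input order.
object BruteForceCircle {
  type Point = (Double, Double) // (longitude, latitude) in degrees

  val EarthRadiusKm = 6371.0

  /** Great-circle distance in kilometres between two (lon, lat) points. */
  def haversineKm(a: Point, b: Point): Double = {
    val (lon1, lat1) = a
    val (lon2, lat2) = b
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(h))
  }

  /** Entries within `radiusKm` of `center`, at most k of them (not sorted by distance). */
  def circle[T](data: Seq[(Point, T)], center: Point, radiusKm: Double, k: Int): Seq[(Point, T)] =
    data.filter { case (p, _) => haversineKm(center, p) <= radiusKm }.take(k)

  // Hypothetical toy data with approximate city-centre coordinates.
  val places: Seq[(Point, String)] = Seq(
    ((8.5417, 47.3769), "zurich"),
    ((7.4474, 46.9481), "bern"),
    ((6.1432, 46.2044), "geneva"),
    ((7.5886, 47.5596), "basel")
  )

  def main(args: Array[String]): Unit = {
    val bern: Point = (7.433534, 46.948380)
    // A 2 km radius: the toy "bern" point is roughly 1.05 km from the query point.
    println(circle(places, bern, 2.0, 10).size)
    circle(places, bern, 80.0, 10).foreach(println)
  }
}
```

Because k is applied after filtering, calling .size on the result can never exceed k, which is why a generous k (such as 1000) is needed when the goal is to count matches.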