================================
Integrate Scylla with Databricks
================================

Scylla is Apache Cassandra compatible at the CQL binary protocol level, and any driver which uses CQL will work with Scylla (more :doc:`here `). Any application which uses a CQL driver will therefore work with Scylla; this document shows how to connect a Databricks Spark cluster to a Scylla cluster.

Resource list
-------------

Although your requirements may be different, this example uses the following resources:

* Scylla cluster
* Databricks account

Integration instructions
------------------------

**Before you begin**

Verify that you have installed Scylla and know the Scylla server IP address. Make sure you have a connection on port 9042:

.. code-block:: none

   curl <scylla_server_ip>:9042

**Procedure**

1. Create a new Databricks cluster with the following configuration:

   Databricks runtime version:

   .. code-block:: none

      Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2)

   Spark config:

   .. code-block:: none

      spark.sql.catalog.<catalog_name> com.datastax.spark.connector.datasource.CassandraCatalog
      spark.sql.catalog.<catalog_name>.spark.cassandra.connection.host <scylla_server_ip>
      spark.cassandra.auth.username <username>
      spark.cassandra.auth.password <password>

2. Once this is set up, install the connector library from Maven
   (Path: Libraries --> Install new --> Maven --> Search Packages --> Maven Central):

   .. code-block:: none

      com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0

**Test case**

1. Prepare test data [Scylla]:

   .. code-block:: none

      CREATE KEYSPACE databriks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};
      CREATE TABLE databriks.demo1 (pk text PRIMARY KEY, ck1 text, ck2 text);
      INSERT INTO databriks.demo1 (pk, ck1, ck2) VALUES ('pk', 'ck1', 'ck2');

2. Create and run a new notebook [Databricks]:

   .. code-block:: none

      df = spark.read.table("<catalog_name>.databriks.demo1")
      display(df)
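
As an optional follow-up to the test case, you can also write from Databricks back to Scylla through the same catalog. The following is a minimal sketch, assuming the Spark config above (with ``<catalog_name>`` standing in for the catalog name you chose), the built-in ``spark`` session that Databricks predefines in Python notebooks, and the ``databriks.demo1`` table created in the test case:

.. code-block:: none

   # Append one row to the demo table through the Cassandra catalog
   # configured above. Replace <catalog_name> with the catalog name
   # you chose in the Spark config. Runs in a Python notebook cell,
   # where the `spark` session is predefined by Databricks.
   from pyspark.sql import Row

   new_row = spark.createDataFrame([Row(pk="pk2", ck1="ck1-2", ck2="ck2-2")])

   # DataFrameWriterV2 (Spark 3.x) resolves the target table through
   # the configured catalog and appends the row.
   new_row.writeTo("<catalog_name>.databriks.demo1").append()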
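
To confirm the write landed, query the table with cqlsh on the Scylla side:

.. code-block:: none

   SELECT * FROM databriks.demo1;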