70 lines
2.0 KiB
ReStructuredText
70 lines
2.0 KiB
ReStructuredText
==================================
|
|
Integrate ScyllaDB with Databricks
|
|
==================================
|
|
|
|
ScyllaDB is Apache Cassandra compatible at the CQL binary protocol level, and any driver which uses CQL will work with ScyllaDB (more :doc:`here </using-scylla/drivers/index>`). Any application which uses a CQL driver will work with ScyllaDB, for example, Databricks Spark cluster.
|
|
|
|
Resource list
|
|
-------------
|
|
Although your requirements may be different, this example uses the following resources:
|
|
|
|
* ScyllaDB cluster
|
|
* Databricks account
|
|
|
|
Integration instructions
|
|
------------------------
|
|
|
|
**Before you begin**
|
|
|
|
Verify that you have installed ScyllaDB and know the ScyllaDB server IP address.
|
|
Make sure you have a connection on port 9042:
|
|
|
|
.. code-block:: none
|
|
|
|
curl <scylla_IP>:9042
|
|
|
|
|
|
**Procedure**
|
|
|
|
1. Create a new Databricks cluster with the following configuration:
|
|
|
|
Databricks runtime version:
|
|
|
|
.. code-block:: none
|
|
|
|
Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2)
|
|
|
|
Spark config:
|
|
|
|
.. code-block:: none
|
|
|
|
spark.sql.catalog.<your_catalog> com.datastax.spark.connector.datasource.CassandraCatalog
|
|
spark.sql.catalog.<your_catalog>.spark.cassandra.connection.host <your_host>
|
|
spark.cassandra.auth.username <your_username>
|
|
spark.cassandra.auth.password <your_password>
|
|
|
|
2. Once this set up, install connector library by Maven:
|
|
(Path: Libraries --> Install new --> Maven --> Search Packages --> Maven Centrall)
|
|
|
|
.. code-block:: none
|
|
|
|
com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0
|
|
|
|
**Test case**
|
|
|
|
1. Prepare test data [ScyllaDB]:
|
|
|
|
.. code-block:: none
|
|
|
|
CREATE KEYSPACE databriks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3};
|
|
CREATE TABLE databriks.demo1 (pk text PRIMARY KEY, ck1 text, ck2 text);
|
|
INSERT INTO databriks.demo1 (pk, ck1, ck2) VALUES ('pk', 'ck1', 'ck2');
|
|
|
|
2. Create and play new notebook [Databricks]:
|
|
|
|
.. code-block:: none
|
|
|
|
df = spark.read.cassandraFormat.table("<your_catalog>.databriks.demo1")
|
|
display(df)
|
|
|