==================================
Integrate ScyllaDB with Databricks
==================================

ScyllaDB is compatible with Apache Cassandra at the CQL binary protocol level, so any application that uses a CQL driver, for example a Databricks Spark cluster, works with ScyllaDB. See `ScyllaDB Drivers <https://docs.scylladb.com/stable/drivers/index.html>`_.

Resource list
-------------

Although your requirements may be different, this example uses the following resources:

* ScyllaDB cluster
* Databricks account
Integration instructions
------------------------

**Before you begin**

Verify that you have installed ScyllaDB and know the ScyllaDB server IP address.
Make sure you have a connection on port 9042:

.. code-block:: none

   curl <scylla_IP>:9042
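If you prefer to script this reachability check, the same test can be done with a plain TCP connection attempt. This is a minimal sketch; the ``cql_port_open`` helper is illustrative and not part of ScyllaDB tooling:

.. code-block:: python

   import socket

   def cql_port_open(host: str, port: int = 9042, timeout: float = 3.0) -> bool:
       """Return True if a TCP connection to host:port succeeds within the timeout."""
       try:
           with socket.create_connection((host, port), timeout=timeout):
               return True
       except OSError:
           return False

Like the ``curl`` check, this only confirms that the CQL port accepts TCP connections; it does not authenticate or speak the CQL protocol.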
**Procedure**

1. Create a new Databricks cluster with the following configuration:

   Databricks runtime version:

   .. code-block:: none

      Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2)

   Spark config:

   .. code-block:: none

      spark.sql.catalog.<your_catalog> com.datastax.spark.connector.datasource.CassandraCatalog
      spark.sql.catalog.<your_catalog>.spark.cassandra.connection.host <your_host>
      spark.cassandra.auth.username <your_username>
      spark.cassandra.auth.password <your_password>
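The same four Spark config entries can also be generated programmatically, which is handy when creating clusters via the Databricks REST API or Terraform. A minimal sketch, assuming placeholder values for the catalog name, host, and credentials (the ``scylla_spark_conf`` helper is ours, not a Databricks API):

.. code-block:: python

   def scylla_spark_conf(catalog: str, host: str, user: str, password: str) -> dict:
       """Build the Spark config dict that registers ScyllaDB as a Spark SQL catalog."""
       return {
           f"spark.sql.catalog.{catalog}": "com.datastax.spark.connector.datasource.CassandraCatalog",
           f"spark.sql.catalog.{catalog}.spark.cassandra.connection.host": host,
           "spark.cassandra.auth.username": user,
           "spark.cassandra.auth.password": password,
       }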
2. Once the cluster is set up, install the connector library from Maven
   (Path: Libraries --> Install new --> Maven --> Search Packages --> Maven Central):

   .. code-block:: none

      com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0
**Test case**

1. Prepare test data [ScyllaDB]:

   .. code-block:: cql

      CREATE KEYSPACE databriks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3};
      CREATE TABLE databriks.demo1 (pk text PRIMARY KEY, ck1 text, ck2 text);
      INSERT INTO databriks.demo1 (pk, ck1, ck2) VALUES ('pk', 'ck1', 'ck2');
2. Create and run a new notebook [Databricks]:

   .. code-block:: python

      df = spark.read.table("<your_catalog>.databriks.demo1")
      display(df)