Commit Graph

233 Commits

Author SHA1 Message Date
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Asias He
a8ad385ecd repair: Get rid of the gc_grace_seconds
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.

To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.

In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:

1) GC a tombstone after gc_grace_seconds

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;

This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.

2) Never GC a tombstone

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};

3) GC a tombstone immediately

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};

4) GC a tombstone after repair

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};

In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.

A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.

Tests: compaction_test.py, ninja test

Fixes #3560

[avi: resolve conflicts vs data_dictionary]
2022-01-04 19:48:14 +02:00
Avi Kivity
247f2b69d5 Merge "system tables: create the schema more efficiently" from Botond
"
System tables currently almost uniformly use a pattern like this to
create their schema:

    return schema_builder(make_shared_schema(...))
    // [...]
    .with_version(...)
    .build(...);

This pattern is very wasteful because it first creates a schema, then
dismantles it just to recreate it again. This series abolishes this
pattern without much churn by simply adding a constructor to schema
builder that takes identical parameters to `make_shared_schema()`,
then simply removing `make_shared_schema()` from these users, who now
build a schema builder object directly and build the schema only once.

Tests: unit(dev)
"

* 'schema-builder-make-shared-schema-ctor/v1' of https://github.com/denesb/scylla:
  treewide: system tables: don't use make_shared_schema() for creating schemas
  schema_builder: add a constructor providing make_shared_schema semantics
  schema_builder: without_column(): don't assume column_specification exists
  schema: add static variant of column_name_type()
2021-11-07 18:23:22 +02:00
Botond Dénes
e991604918 schema: make private constructor invokable via make_lw_shared
The schema has a private constructor, which means it can't be
constructed with `make_lw_shared()` even by classes which are otherwise
able to invoke the private constructor themselves.
This results in such classes (`schema_builder`) resorting to building a
local schema object, then invoking `make_lw_shared()` with the schema's
public move constructor. Moving a schema is not cheap at all however, so
each `schema_builder::build()` call results in two expensive schema
construction operations.
We could make `make_lw_shared()` a friend of `schema` to resolve this,
but then we'd de-facto open the private consctructor to the world.
Instead this patch introduces a private tag type, which is added to the
private constructor, which is then made public. Everybody can invoke the
constructor but only friends can create the private tag instance
required to actually call it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211105085940.359708-1-bdenes@scylladb.com>
2021-11-07 12:51:09 +02:00
Botond Dénes
d3833c5978 schema: add static variant of column_name_type()
So schema_builder can use it too (without a schema instance at hand).
2021-11-05 11:41:04 +02:00
Raphael S. Carvalho
4950ce539c schema: replace outdated comment on default compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211104210043.199156-1-raphaelsc@scylladb.com>
2021-11-05 00:35:41 +02:00
Botond Dénes
3f4f408bcf schema: add get_reversed()
A variant of make_reversed() which goes through the schema registry,
teaching the schema to the registry if necessary. This effectively
caches the result of the reversing and as an added bonus double
reversing yields the very same schema C++ object that was the starting
point.

Closes #9365
2021-09-22 18:55:25 +03:00
Botond Dénes
f200c8104a schema: introduce make_reversed()
`make_revered()` creates a schema identical to the schema instance it is
called on, with clustering order reversed. To distinguish the reverse
schema from the original one, the node-id part of its version UUID is
bit-flipped. This ensures that reversing a schema twice will result in
the identical schema to the original one (although a different C++
object).

This reversed schema will be used in reversed reads, so intermediate
layers can be ignorant of the fact that the read happens in reverse.
2021-09-09 11:49:05 +03:00
Botond Dénes
9a9b58e67b schema: add a transforming copy constructor
Taking a transform functor, which is executed after the raw schema is
copied, but before the derivate fields are computed (rebuild()).
2021-09-09 11:49:05 +03:00
Asias He
6350a19f73 compaction: Move compaction_strategy.hh to compaction dir
The top dir is a mess. Move compaction_strategy.hh and
compaction_strategy_type.hh to the new home.
2021-08-07 08:06:37 +08:00
Pavel Solodovnikov
76bea23174 treewide: reduce header interdependencies
Use forward declarations wherever possible.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>

Closes #8813
2021-06-07 15:58:35 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Solodovnikov
e0749d6264 treewide: some random header cleanups
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-06-06 19:18:49 +03:00
Piotr Jastrzebski
76d7c761d1 schema: Stop using deprecated constructor
This is another boring patch.

One of schema constructors has been deprecated for many years now but
was used in several places anyway. Usage of this constructor could
lead to data corruption when using MX sstables because this constructor
does not set schema version. MX reading/writing code depends on schema
version.

This patch replaces all the places the deprecated constructor is used
with schema_builder equivalent. The schema_builder sets the schema
version correctly.

Fixes #8507

Test: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <4beabc8c942ebf2c1f9b09cfab7668777ce5b384.1622357125.git.piotr@scylladb.com>
2021-05-30 11:58:27 +03:00
Pavel Solodovnikov
fff7ef1fc2 treewide: reduce boost headers usage in scylla header files
`dev-headers` target is also ensured to build successfully.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:33:18 +03:00
Pavel Solodovnikov
aa4c359cff column_mapping_entry: extract == and != operators
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20201016123638.99534-1-pa.solodovnikov@scylladb.com>
2020-10-16 14:59:50 +02:00
Pavel Solodovnikov
81cf11f8a0 schema: add equality operator for column_mapping class
Add a comparator for column mappings that will be used later
in unit-tests to check whether two column mappings match or not.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-10-15 19:24:44 +03:00
Pavel Solodovnikov
9aa4712270 lwt: introduce paxos_grace_seconds per-table option to set paxos ttl
Previously system.paxos TTL was set as max(3h, gc_grace_seconds).

Introduce new per-table option named `paxos_grace_seconds` to set
the amount of seconds which are used to TTL data in paxos tables
when using LWT queries against the base table.

Default value is equal to `DEFAULT_GC_GRACE_SECONDS`,
which is 10 days.

This change allows to easily test various issues related to paxos TTL.

Fixes #6284

Tests: unit (dev, debug)

Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>
2020-08-17 16:44:14 +02:00
Nadav Har'El
7e01ae089e cdc: avoid including cdc/cdc_options.hh everywhere
Before this patch, modifying cdc/cdc_options.hh required recompiling 264
source files. This is because this header file was included by a couple
other header files - most notably schema.hh, where a forward declaration
would have been enough. Only the handful of source files which really
need to access the CDC options should include "cdc/cdc_options.hh" directly.

After this patch, modifying cdc/cdc_options.hh requires only 6 source files
to be recompiled.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200813070631.180192-1-nyh@scylladb.com>
2020-08-16 14:41:47 +03:00
Rafael Ávila de Espíndola
efeaded427 Everywhere: Add a make_shared_schema helper
This replaces a lot of make_lw_shared(schema(...)) with
make_shared_schema(...).

This makes it easier to drop a dependency on the differences between
seastar::make_shared and std::make_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Pavel Emelyanov
f045cec586 snap: Get rid of storage_service reference in schema.cc
Now when the snapshot stopping is correctly handled, we may pull the database
reference all the way down to the schema::describe().

One tricky place is in table::napshot() -- the local db reference is pulled
through an smp::submit_to call, but thanks to the shard checks in the place
where it is needed the db is still "local"

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 20:28:25 +03:00
Glauber Costa
44a0e40cb2 compaction: move compaction_strategy_type to its own header
I just hit a circularity in header inclusion that I traced back to the
fact that schema.hh includes compaction_strategy.hh. schema.hh is in
turn included in lots of places, so a circularity is not hard to come
by.

The schema header really only needs to know about the compaction_type,
so it can inform schema users about it. Following the trend in header
clenups, I am moving that to a separate header which will both break
the circularity and make sure we are included less stuff that is not
needed.

With this change, Scylla fails to compile due to a new missing forward
declaration at index/secondary_index_manager.hh, so this is fixed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200527172203.915936-1-glauber@scylladb.com>
2020-05-29 08:14:27 +03:00
Pekka Enberg
ed0d00f51e Revert "Revert "schema: Default dc_local_read_repair_chance to zero""
This reverts commit 43b488a7bc. The commit
was originally reverted because a dtest was sensitive to the value. The
dtest is fixed now, so let's revert the revert as requested by Glauber.
2020-05-21 08:05:13 +03:00
Pavel Solodovnikov
f6e765b70f cql3: pass column_specification via lw_shared_ptr
`column_specification` class is marked as "final": it's safe
to use non-polymorphic pointer "lw_shared_ptr" instead of a
more generic "shared_ptr".

tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200427084016.26068-1-pa.solodovnikov@scylladb.com>
2020-04-27 12:47:42 +03:00
Pekka Enberg
43b488a7bc Revert "schema: Default dc_local_read_repair_chance to zero"
This reverts commit fdd2d9de3d because it
breaks one heat-weighted load balancing dtest:

FAIL: heat_weighted_load_balancing_cl_QUORUM_test (heat_weighted_load_balancing_test.HeatWeightedLB)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 182, in heat_weighted_load_balancing_cl_QUORUM_test
    self.run_heat_weighted_load_balancing('QUORUM')
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 165, in run_heat_weighted_load_balancing
    self.verify_metrics(metrics, cached=False)
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 73, in verify_metrics
    mean_avg, node_mean_avg, key))
AssertionError: 19.0 not found in range(3, 13) : Cache difference between nodes is less then expected: 6469.6/328.2, metric scylla_storage_proxy_coordinator_reads_local_node

I am reverting because it's a test issue, and we should bring this
commit back once the test is fixed.

Gleb Natapov explains:

"dtest result directly depends on replicas we contact. Glauber's patch
make us contacts less replicas, so numbers differ."
2020-04-02 13:43:29 +03:00
Glauber Costa
fdd2d9de3d schema: Default dc_local_read_repair_chance to zero
dc_local_read_repair_chance is a legacy of old times: Cassandra itself
now defaults to zero, and we should look into that too.

Most serious production clusters are either repaired through our
asynchronous repair, or don't need repair at all.

Synchronous read repair can help things converging, but it implies an
impact at query time. For clusters that are on an asynchronous repair
schedule this should not be needed.

Fixes #6109

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200331183418.21452-1-glauber@scylladb.com>
2020-04-01 08:27:49 +02:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
92cdc21123 schema: remove incorrect comment
partitioner is actually part of schema digest and
is stored locally in internal tables.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
7bd2b8d73f schema: make it possible to set sharding_info per schema
Previously schema::get_sharding_info was obtaining
sharding_info from the partitioner but we want to remove
sharding_info from the partitioner so we need a place
in schema to store it there instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
8d81a2498f schema: add get_sharding_info
At the moment, we have a single sharding logic per node
but we want to be able to set it per table in the future.
To make it easy to change in the future sharding_info
will be managed inside schema and all the other code
will access it through schema::get_sharding_info function.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 09:35:27 +02:00
Nadav Har'El
35d95d6887 merge: Add postimage implementation
Merged pull request https://github.com/scylladb/scylla/pull/5996 from
Calle Wilund:

Fixes #4992

Implements post-image support by synthesizing it from
pre-image + delta.

Post-image data differs from the delta data in two ways:

1.) It merges non-atomics into an actual result value
2.) It contains all columns of the row, not just
those affected by the update.

For a non-atomic field, the post-image value of a column
is either the pre-image or the delta (maybe null)

Tested by adding post-image checks to pre-image test
and collection/udt tests
2020-03-16 13:42:07 +02:00
Calle Wilund
ca7046256f schema: Add "columns" accessor for columns by kind
To prevent switch-code everywhere.
2020-03-16 09:21:06 +00:00
Piotr Jastrzebski
5bbb826c49 schema: drop optional from _partitioner field
Always set the field to the default value if no
table specific partitioner has been set.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:21 +01:00
Piotr Jastrzebski
22daa262ee partitioner: move default_partitioner to schema.cc
Make it inaccessible to other compilation units.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
57b69fb804 schema: include partitioner name in scylla tables mutation
There are two results of this patch:
1. New partitioner name column is persited on node's disk in scylla_tables
2. New partitioner name column is included into schema digest

This is achieved by including this new column in scylla tables mutation.
For that we:
1. Add partitioner name to the result of make_scylla_tables_mutation.
   If table does not have a specific partitioner set and uses default
   partitioner then we don't include the name of such default partitioner.
   Only the name of custom partitioner is added if a table has one.
2. In create_table_from_mutations we check whether scylla tables mutation
   has a partitioner name set. If so then we use it as a parameter for
   schema_builder.

Note that previous patches have ensured that this new column will be included
into schema digest only after the whole cluster supports per table partitioners.
Before that, during rolling upgrade, new partitioner name column is hidden and
not shared with other nodes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
1d6cec1b0a schema: make it possible to set custom partitioner
schema_builder::with_partitioner can be used now to
set custom partitioner on a table.
If no such partitioner is set, global partitioner is
still used.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
54d24553bb schema: get_partitioner return const&
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-06 13:33:53 +01:00
Piotr Dulikowski
861c7b5626 schema: get cdc options from schema extensions
Removes logic responsible for setting cdc_options from dedicated column
in scylla_tables, and uses the "cdc" schema extension instead.
2020-03-05 16:11:21 +01:00
Rafael Ávila de Espíndola
151f5e723f Pass string_view to the schema constructor
This moves string copies from the callers of the constructor to the
implementation.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-02-28 17:04:12 -08:00
Piotr Jastrzebski
9b95153136 schema: add get_partitioner()
The plan is to remove dht::global_partitioner()
and use schema::get_partitioner() instead.

This will allow a usage of per schema/table partitioner
instead of a single global partitioner everywhere.

Initially schema::get_partitioner will call
dht::global_partitioner. After all the calls
to dht::global_partitioner are switched to
schema::get_partitioner, the ability to set per schema
partitioner will be implemented.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:04:41 +01:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.

The DESCRIBE is implemented in the cql driver so the functionality needs
to be re-implemented inside scylla.

This series adds a describe method to the schema file and use it when doing
a snapshot.

There are different approach of how to handle materialize views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for materializing view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Amnon Heiman
82367b325a schema: Add a describe method
This patch adds a describe method to a table schema.

It acts similar to a DESCRIBE cql command that is implemented in a CQL
driver.

The method supports tables, secondary indexes local indexes and
materialize views.

relates to: #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Gleb Natapov
16e0fc4742 schema: allow schema to be marked as 'always sync to commitlog'
All writes that uses this schema will be immediately persisted on a
storage.
2020-01-15 12:15:42 +02:00
Calle Wilund
2787b0c4f8 cdc: Move "options" to separate header to avoid to much header inclusion
cdc should not contaminate the whole universe.
2019-12-09 12:12:09 +00:00
Konstantin Osipov
6159c012db schema: pre-allocate the bitset of column_set
The number of columns is usually small, and avoiding
a resize speeds up bit manipulation functions.
2019-11-13 11:41:51 +03:00
Konstantin Osipov
e95d675567 schema: introduce schema::all_columns_count()
schema::all_columns_count() will be used to reserve
memory of the column_set bitmask.
2019-11-13 11:41:42 +03:00
Konstantin Osipov
191acec7ab schema: rename column_mask to column_set
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
2019-11-13 11:41:30 +03:00
Nadav Har'El
631846a852 CDC: Implement minimal version that logs only primary key of each change
Merge a patch series from Piotr Jastrzębski (haaawk):

This PR introduces CDC in it's minimal version.

It is possible now to create a table with CDC enabled or to enable/disable
CDC on existing table. There is a management of CDC log and description
related to enabling/disabling CDC for a table.

For now only primary key of the changed data is logged.

To be able to co-locate cdc streams with related base table partitions it
was needed to propagate the information about the number of shards per node.
This was node through gossip.

There is an assumption that all the nodes use the same value for
sharding_ignore_msb_bits. If it does not hold we would have to gossip
sharding_ignore_msb_bits around together with the number of shards.

Fixes #4986.

Tests: unit(dev, release, debug)
2019-10-20 11:41:01 +03:00
Piotr Jastrzebski
ca9536a771 schema: add _cdc_options field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-10-17 10:55:31 +02:00