Compare commits


76 Commits

Author SHA1 Message Date
Avi Kivity
d112a230c0 Merge 'Fix hang in multishard_writer' from Asias
"
This series fixes a hang in multishard_writer when an error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails

(cherry picked from commit 8925e00e96)
2020-05-02 07:35:46 +03:00
Raphael S. Carvalho
4371cb41d0 api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking a snapshot, there will be a segfault,
because keynames is unconditionally dereferenced. Return an error instead,
because a keyspace must be specified when column families are specified.
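The shape of the fix can be sketched in a few lines (names and signature here are hypothetical, not Scylla's actual API):

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: validate the optional keyspace parameter instead of
// dereferencing it unconditionally.
void take_snapshot(const std::optional<std::string>& keyspace,
                   const std::vector<std::string>& column_families) {
    if (!column_families.empty() && !keyspace) {
        // return an error to the client instead of segfaulting on *keyspace
        throw std::runtime_error(
            "when specifying column families, the keyspace must be specified");
    }
    // ... proceed with taking the snapshot ...
}
```

The check has to come before the first dereference of the optional, which is all the original code was missing.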

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
(cherry picked from commit 02e046608f)

Fixes #6336.
2020-04-30 12:57:39 +03:00
Botond Dénes
a8b9f94dcb schema: schema(): use std::stable_sort() to sort key columns
When multiple key columns (clustering or partition) are passed to
the schema constructor, all having the same column id, the expectation
is that these columns will retain the order in which they were passed to
`schema_builder::with_column()`. Currently, however, this is not
guaranteed, as the schema constructor sorts key columns by column id with
`std::sort()`, which doesn't guarantee that equally comparing elements
retain their order. This can be an issue for indexes, whose schemas
are built independently on each node. If there is any room for
variance in the key column order, this can result in different
nodes having incompatible schemas for the same index.
The fix is to use `std::stable_sort()` which guarantees that the order
of equally comparing elements won't change.
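The difference is easy to demonstrate with a self-contained example (the column struct below is a stand-in for the real schema types, not Scylla's):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a key column: all key columns of one schema may share the
// same column id; name records the schema_builder::with_column() order.
struct column {
    int id;
    std::string name;
};

// Sort by id only. With std::stable_sort, elements that compare equal keep
// their relative (insertion) order; std::sort gives no such guarantee once
// the range exceeds the implementation's insertion-sort threshold
// (observed to matter at 17+ elements, per the note above).
std::vector<std::string> sorted_key_column_names(std::vector<column> cols) {
    std::stable_sort(cols.begin(), cols.end(),
                     [](const column& a, const column& b) { return a.id < b.id; });
    std::vector<std::string> names;
    for (const auto& c : cols) {
        names.push_back(c.name);
    }
    return names;
}
```

With 23 equal-id columns (mirroring the failing clustering key), the stable sort is guaranteed to return them in insertion order on every node.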

This is a suspected cause of #5856, although we don't have hard proof.

Fixes: #5856
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
[avi: upgraded "Refs" to "Fixes", since we saw that std::sort() becomes
      unstable at 17 elements, and the failing schema had a
      clustering key with 23 elements]
Message-Id: <20200417121848.1456817-1-bdenes@scylladb.com>
(cherry picked from commit a4aa753f0f)
2020-04-19 18:25:09 +03:00
Hagit Segev
77500f9171 release: prepare for 3.2.5 2020-04-18 18:57:59 +03:00
Kamil Braun
13328e7253 sstables: freeze types nested in collection types in legacy sstables
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).

SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).

This commit does two things:
1. for all SSTables created after this commit, include a new feature
   flag, CorrectUDTsInCollections, presence of which implies that frozen
   UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
   nested inside collections are always frozen, even if they don't have
   the tag. This assumption is safe to be made, because at the time of
   this commit, Scylla does not allow non-frozen (multi-cell) types inside
   collections or UDTs, and because of point 1 above.

There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.

Fixes #6130.

[avi: adjusted sstable file paths]
(cherry picked from commit 3d811e2f95)
2020-04-17 09:53:17 +03:00
Kamil Braun
79b58f89f1 sstables: move definition of column_translation::state::build to a .cc file
Ref #6130
2020-04-17 09:16:28 +03:00
Asias He
ba2821ec70 gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation
numbers g1, g2, g3.

n1, n2, n3 are running a scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with the new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of nodes as down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the new node with a gossip generation that is within
MAX_GENERATION_DIFFERENCE of the generations the other nodes last saw.

Once all the nodes run the version with commit
0a52ecb6df, the option is no longer
needed.
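A minimal sketch of such a clamped generation, assuming MAX_GENERATION_DIFFERENCE is one year of seconds (as in the gossiper) and with a hypothetical helper name and clamping rule:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: one year of seconds, matching the gossiper's constant.
constexpr int64_t MAX_GENERATION_DIFFERENCE = int64_t{86400} * 365;

// Hypothetical sketch: pick a startup generation peers will accept — the
// current time normally, clamped so the jump from the last generation the
// peers saw stays below MAX_GENERATION_DIFFERENCE.
int64_t choose_generation(int64_t now_seconds, int64_t last_known_generation) {
    return std::min(now_seconds,
                    last_known_generation + MAX_GENERATION_DIFFERENCE - 1);
}
```

After every node has restarted at least once with a recent generation, the clamp never triggers and normal time-based generations resume.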

Fixes #5164

(cherry picked from commit 743b529c2b)
2020-03-27 12:50:23 +01:00
Asias He
d72555e786 gossiper: Always use the new generation number
A user reported an issue where, after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster, while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shuts itself down
- n3 sends the shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join
- n3 receives a response from n1; in gossiper::handle_ack_msg, since
  _enabled == false and _in_shadow_round == false, n3 applies the
  application state in fiber 1; fiber 1 finishes faster than fiber 2 and
  sets _in_shadow_round = false
- n3 receives a response from n2; in gossiper::handle_ack_msg, since
  _enabled == false and _in_shadow_round == false, n3 applies the
  application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, applying the application state about n3 into
  endpoint_state_map; at this point endpoint_state_map contains
  information about n3 itself, learned from n2
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with a new generation number generated correctly in
  storage_service::prepare_to_join, but maybe_initialize_local_state(generation_nbr)
  will not set the new generation and heartbeat if endpoint_state_map
  already contains the node itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop; in gossiper::run,
  hbs.update_heart_beat() sets the heartbeat to a number starting
  from 0
- n1 and n2 will not get updates from n3, because n3 uses the same
  generation number while n1 and n2 have a larger heartbeat version
- n1 and n2 will mark n3 as down even though n3 is alive

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:50:20 +01:00
Hagit Segev
4c38534f75 release: prepare for 3.2.4 2020-03-25 10:12:29 +02:00
Gleb Natapov
a092f5d1f4 transport: pass tracing state explicitly instead of relying on it being in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per request.
Currently, the next request may overwrite the tracing state of the previous
one, causing, in the best case, the wrong trace to be taken, or a crash if
the overwritten pointer is freed prematurely.

Fixes #6014

(cherry picked from commit 866c04dd64)

Message-Id: <20200324144003.GA20781@scylladb.com>
2020-03-24 16:55:46 +02:00
Piotr Sarna
723fd50712 cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
whether the target column is used as the index - in which case there's
no need to fetch it. However, the check incorrectly assumed that
any restriction is eligible for indexing, while that's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully be gone once our
long-planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 09:47:12 +01:00
Konstantin Osipov
89deac7795 locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug: we first add at least one endpoint
to the list, and only then compare the list size with the replication
factor. If RF=0, the size check never stops the loop before the first
endpoint is added.
Fix by moving the RF check before any endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.
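The fixed loop shape can be sketched with hypothetical types (plain ints standing in for endpoints):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the corrected selection loop: check the replication factor
// BEFORE adding an endpoint. The buggy version pushed one endpoint first
// and compared sizes afterwards, so RF=0 wrongly yielded one endpoint.
std::vector<int> calculate_natural_endpoints(const std::vector<int>& token_ring,
                                             std::size_t rf) {
    std::vector<int> endpoints;
    for (int ep : token_ring) {
        if (endpoints.size() >= rf) {  // the fix: check first
            break;
        }
        endpoints.push_back(ep);
    }
    return endpoints;
}
```

With the check first, RF=0 returns an empty list, and every RF > 0 behaves as before.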

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:10:27 +02:00
Avi Kivity
3843e5233c logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:34 +02:00
Benny Halevy
1b3c78480c dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:22:09 +02:00
Yaron Kaikov
48253eb183 release: prepare for 3.2.3 2020-03-04 14:18:38 +02:00
Takuya ASADA
5d60522c81 dist/debian: fix "unable to open node-exporter.service.dpkg-new" error
It seems that *.service conflicts at install time, because the file is
installed twice: via both debian/*.service and debian/scylla-server.install.

We don't need to use *.install, so we can just drop the line.

Fixes #5640

(cherry picked from commit 29285b28e2)
2020-03-02 11:32:30 +02:00
Benny Halevy
63e93110d1 gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:22 +02:00
Gleb Natapov
83105efba8 commitlog: use commitlog IO scheduling class for segment zeroing
There may be other commitlog writes waiting for zeroing to complete, so
not using the proper scheduling class causes priority inversion.

Fixes #5858.

Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
(cherry picked from commit 6a78cc9e31)
2020-02-26 12:51:29 +02:00
Benny Halevy
5840eb602a storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:40:12 +02:00
Avi Kivity
61738999ea Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit dbf72c72b3. It exposes a data
resurrection bug (#5838).
2020-02-24 10:04:06 +02:00
Piotr Dulikowski
0b23e7145d hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica for the mutation,
the hint is sent to all of the mutation's current replicas. Previously,
storage_proxy::mutate was used for that purpose. That was incorrect,
because the function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization formats, so in
this case a "shards" mutation is reinterpreted as a "delta" mutation,
which can cause data corruption.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 16:51:24 +02:00
Hagit Segev
3374aa20bb release: prepare for 3.2.2 2020-02-18 15:19:57 +02:00
Avi Kivity
c4e89ea1b0 Merge "cql3: time_uuid_fcts: validate time UUID" from Benny
"
Throw an error in case we hit an invalid time UUID
rather than hitting an assert.

Fixes #5552

(Ref #5588 that was dequeued and fixed here)

Test: UUID_test, cql_query_test(debug)
"

* 'validate-time-uuid' of https://github.com/bhalevy/scylla:
  cql3: abstract_function_selector: provide assignment_testable_source_context
  test: cql_query_test: add time uuid validation tests
  cql3: time_uuid_fcts: validate timestamp arg
  cql3: make_max_timeuuid_fct: delete outdated FIXME comment
  cql3: time_uuid_fcts: validate time UUID
  test: UUID_test: add tests for time uuid
  utils: UUID: create_time assert nanos_since validity
  utils/UUID_gen: make_nanos_since
  utils: UUID: assert UUID.is_timestamp

(cherry picked from commit 3343baf159)

Conflicts:
	cql3/functions/time_uuid_fcts.hh
	tests/cql_query_test.cc
2020-02-17 20:05:38 +02:00
Piotr Sarna
26d9ce6b98 db,view: fix generating view updates for partition tombstones
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can be simply generated from the new row. However, lack of the
existing row can also be the result of a partition/range tombstone.
If that's the case, the tombstone needs to be applied, because it's
entirely possible that it also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
This patch comes with a test which was proven to fail before the
changes.

Branches 3.1,3.2,3.3
Fixes #5793

Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
(cherry picked from commit e93c54e837)
2020-02-16 18:19:28 +02:00
Avi Kivity
6d1a4e2c0b Update seastar submodule
* seastar dab9f10e76...c8668e98bd (1):
  > config: Do not allow zero rates

Fixes #5360.
2020-02-16 17:01:59 +02:00
Benny Halevy
fad143a441 repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 474ffb6e54)
2020-02-16 16:11:17 +02:00
Rafael Ávila de Espíndola
bc07b877a5 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling core dumps when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
(cherry picked from commit b80852c447)
2020-02-16 16:03:51 +02:00
Gleb Natapov
09ad011f98 lwt: fix write timeout exception reporting
CQL transport code relies on an exception's C++ type to create correct
reply, but in lwt we converted some mutation_timeout exceptions to more
generic request_timeout while forwarding them which broke the protocol.
Do not drop type information.

Fixes #5598.

Message-Id: <20200115180313.GQ9084@scylladb.com>
(cherry picked from commit 51281bc8ad)
2020-02-16 16:00:38 +02:00
Avi Kivity
b34973df4e tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-02-16 15:41:18 +02:00
Pavel Solodovnikov
d65e2ac6af lwt: fix handling of nulls in parameter markers for LWT queries
This patch affects the LWT queries with IF conditions of the
following form: `IF col in :value`, i.e. if the parameter
marker is used.

When executing a prepared query with a bound value
of `(None,)` (tuple with null, example for Python driver), it is
serialized not as NULL but as "empty" value (serialization
format differs in each case).

Therefore, Scylla deserializes the parameters in the request as
empty `data_value` instances, which are, in turn, later translated
to non-empty `bytes_opt` holding an empty byte string.

Account for this case too in the CAS condition evaluation code.

Example of a problem this patch aims to fix:

Suppose we have a table `tbl` with a boolean field `test` and
INSERT a row with NULL value for the `test` column.

Then the following update query fails to apply due to the
error in IF condition evaluation code (assume `v=(null)`):
`UPDATE tbl SET test=false WHERE key=0 IF test IN :v`
returns false in `[applied]` column, but is expected to succeed.

Tests: unit(debug, dev), dtest(prepared stmt LWT tests at https://github.com/scylladb/scylla-dtest/pull/1286)

Fixes: #5710

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200205102039.35851-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit bcc4647552)
2020-02-16 15:37:44 +02:00
Asias He
dbf72c72b3 streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
table::flush_streaming_mutations dates back to the days when streamed
data went to the memtable. After switching to the new streaming, data
goes directly to sstables, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the set of sstables is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:37 +02:00
Botond Dénes
b542b9c89a row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8, is supposed to make sure the invariant of
append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:12:22 +02:00
Takuya ASADA
c0e493edcc dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles when they are removed by the user,
we are currently missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:27:00 +02:00
Takuya ASADA
88718996ed scylla_post_install.sh: fix 'integer expression expected' error
awk returns a float value on Debian, which causes a postinst script failure
since we compare it as an integer.
Replace it with sed + bash.

Fixes #5569

(cherry picked from commit 5627888b7c)
2020-02-04 14:30:28 +02:00
Gleb Natapov
97236a2cee db/system_keyspace: use user memory limits for local.paxos table
Treat writes to local.paxos as user memory, as the number of writes is
dependent on the amount of user data written with LWT.

Fixes #5682

Message-Id: <20200130150048.GW26048@scylladb.com>
(cherry picked from commit b08679e1d3)
2020-02-02 17:37:04 +02:00
Rafael Ávila de Espíndola
6c272b48f5 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using a varint in a
key could produce different tokens.
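A minimal-length encoder can be sketched as follows (the function name is an assumption; the format is big-endian two's complement with redundant leading bytes stripped):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: big-endian two's-complement varint encoding using the fewest
// bytes. A leading 0x00 (or, for negatives, 0xff) byte is redundant when
// the following byte carries the same sign bit — keeping such a byte is
// exactly the "extra 0xff prefix" bug described above.
std::vector<uint8_t> encode_varint(int64_t v) {
    std::vector<uint8_t> out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xff));
    }
    std::size_t i = 0;
    while (i + 1 < out.size() &&
           ((out[i] == 0x00 && !(out[i + 1] & 0x80)) ||
            (out[i] == 0xff && (out[i + 1] & 0x80)))) {
        ++i;  // drop the redundant sign-extension byte
    }
    return {out.begin() + i, out.end()};
}
```

For example, -1 must encode as the single byte 0xff, not 0xff 0xff; since tokens are computed from the serialized key bytes, the two encodings would hash to different tokens.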

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 16:46:27 +02:00
Avi Kivity
6a8ae87efa test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible compared to false-positive failures, which throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.
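The helper's shape, as a hedged sketch (the real eventually() sleeps between attempts; the name, signature, and original attempt budget here are assumptions):

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of an eventually()-style test helper: poll a
// condition up to a fixed attempt budget. The commit's change amounts to
// multiplying that budget by 32.
bool eventually(const std::function<bool()>& cond, int max_attempts = 10 * 32) {
    for (int i = 0; i < max_attempts; ++i) {
        if (cond()) {
            return true;
        }
        // real helper: sleep between attempts, e.g. seastar::sleep(...)
    }
    return false;
}
```

A larger budget only slows down tests that were going to fail anyway; passing tests return as soon as the condition holds.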

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:21:38 +02:00
Pekka Enberg
d24d9d037e Update seastar submodule
* seastar acd63c47...dab9f10e (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution
Fixes #5630
2020-01-30 16:42:51 +02:00
Dejan Mircevski
43766bd453 config: Remove UDF from experimental_features_t
Scylla 3.2 doesn't support UDF, so do not accept UDF as a valid option
to experimental_features.

Fixes #5645.

No fix is needed on master, which does support UDF.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-01-28 19:28:20 +02:00
Takuya ASADA
ddd8f9b1d1 dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX versions correctly on Debian variants
(merged at ae33e9f), but when we moved to the relocatable package we
mistakenly dropped the code, so add it again.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:35:17 +02:00
Yaron Kaikov
e3e301906d release: prepare for 3.2.1 2020-01-22 17:02:42 +02:00
Piotr Sarna
2c822d4c1f db,view: fix checking for secondary index special columns
A mistake in handling legacy checks for special 'idx_token' column
resulted in not recognizing materialized views backing secondary
indexes properly. The mistake is really a typo, but with bad
consequences - instead of checking the view schema for being an index,
we asked for the base schema, which is definitely not an index of
itself.

Branches 3.1,3.2 (asap)
Fixes #5621
Fixes #4744

(cherry picked from commit 9b379e3d63)
2020-01-21 23:23:24 +02:00
Avi Kivity
04f8800b5b Update seastar submodule
* seastar 8e236efda...acd63c479 (1):
  > inet_address: Make inet_address == operator ignore scope (again)

Fixes #5225.
2020-01-21 13:44:41 +02:00
Asias He
a72a06d3b7 repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to an error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, and avoid
unpaired writes.
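The pairing bookkeeping can be sketched like this (a hypothetical class, not the actual repair writer):

```cpp
#include <cassert>

// Sketch: remember whether a partition is currently open, and only emit
// partition_end when a matching partition_start was written. A second
// teardown-time write_partition_end() then becomes a no-op instead of a
// duplicate fragment.
class partition_writer {
    bool _partition_open = false;
    int _ends_written = 0;
public:
    void write_partition_start() { _partition_open = true; }
    void write_partition_end() {
        if (_partition_open) {  // avoid an unpaired partition_end
            _partition_open = false;
            ++_ends_written;
        }
    }
    int ends_written() const { return _ends_written; }
};
```

In the scenario above, step 5's teardown call finds no open partition (step 3 already closed p1) and writes nothing.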

Backports: 3.1 and 3.2
Fixes: #5527
(cherry picked from commit 401854dbaf)
2020-01-21 13:38:57 +02:00
Avi Kivity
f9b11c9b30 cql3: update_statement: do not set query option always_return_static_content for list read-before-write
The query option always_return_static_content was added for lightweight
transactions in commits e0b31dd273 (infrastructure) and 65b86d155e
(actual use). However, the flag was added unconditionally to
update_parameters::options. This caused it to be set for list
read-modify-write operations, not just for lightweight transactions.
This is a little wasteful, and worse, it breaks compatibility as old
nodes do not understand the always_return_static_content flag and
complain when they see it.

To fix, remove the always_return_static_content from
update_parameters::options and only set it from compare-and-swap
operations that are used to implement lightweight transactions.

Fixes #5593.

Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200114135133.2338238-1-avi@scylladb.com>
(cherry picked from commit 6c84dd0045)
2020-01-15 09:15:55 +02:00
Hagit Segev
798357f656 release: prepare for 3.2.0 2020-01-13 14:10:46 +02:00
Takuya ASADA
7eb86fbbb4 docker: fix typo of scylla-jmx script path (#5551)
The path should be /opt/scylladb/jmx, not /opt/scylladb/scripts/jmx.

Fixes #5542

(cherry picked from commit 238a25a0f4)
2020-01-08 10:51:34 +02:00
Nadav Har'El
edf431f581 merge: CDC rolling upgrade
Merged pull request https://github.com/scylladb/scylla/pull/5538 from
Avi Kivity and Piotr Jastrzębski.

This series prepares CDC for rolling upgrade. This consists of
reducing the footprint of cdc, when disabled, on the schema, adding
a cluster feature, and redacting the cdc column when transferring
it to other nodes. The latter is needed because we'll want to backport
this to 3.2, which doesn't have canonical_mutations yet.

Fixes #5191.

(cherry picked from commit f0d8dd4094)
2020-01-07 08:19:11 +02:00
Avi Kivity
3f358c9772 tests: schema_change_test: add ability to adjust the schema that we test
This is part of original commit 52b48b415c
("Test that schema digests with UDFs don't change"). It is needed to
test tables with CDC enabled.

Ref #5191.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-01-07 08:13:30 +02:00
Yaron Kaikov
c8d5738a48 release: prepare for 3.2.rc4 2020-01-01 15:19:46 +02:00
Takuya ASADA
de5c06414b dist: stop replacing /usr/lib/scylla with symlink (#5530)
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink,
so we were doing a dirty hack using an RPM scriptlet, which causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)

To minimize Scylla upgrade/downgrade issues on the user side, it's better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create a symlink for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.

Fixes #5522
Fixes #4585
Fixes #4611

(cherry picked from commit 263385cb4b)
2019-12-30 22:02:29 +02:00
Benny Halevy
de314dfe30 tracing: one_session_records: keep local tracing ptr
Similar to trace_state, keep a shared_ptr<tracing> _local_tracing_ptr
in one_session_records when it is constructed, so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 7aef39e400)
2019-12-24 18:42:07 +02:00
Avi Kivity
40a077bf93 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.

(cherry picked from commit 85822c7786)
2019-12-24 18:34:46 +02:00
Pavel Solodovnikov
d488e762cf LWT: Fix required participants calculation for LOCAL_SERIAL CL
Suppose we have a multi-dc setup (e.g. 9 nodes distributed across
3 datacenters: [dc1, dc2, dc3] -> [3, 3, 3]).

When a query that uses LWT is executed with LOCAL_SERIAL consistency
level, the `storage_proxy::get_paxos_participants` function
incorrectly calculates the number of required participants to serve
the query.

In the example above it's calculated to be 5 (i.e. the number of
nodes needed for a regular QUORUM) instead of 2 (for LOCAL_SERIAL,
which is equivalent to LOCAL_QUORUM cl in this case).
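The corrected arithmetic can be sketched as follows (a hypothetical helper, not storage_proxy's real signature):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

std::size_t quorum(std::size_t rf) { return rf / 2 + 1; }

// Sketch: for LOCAL_SERIAL only the local datacenter's replication factor
// should drive the quorum; for SERIAL it is the cluster-wide total.
std::size_t required_participants(const std::vector<std::size_t>& rf_per_dc,
                                  std::size_t local_dc_index,
                                  bool local_serial) {
    if (local_serial) {
        return quorum(rf_per_dc[local_dc_index]);  // LOCAL_QUORUM of local DC
    }
    std::size_t total = std::accumulate(rf_per_dc.begin(), rf_per_dc.end(),
                                        std::size_t{0});
    return quorum(total);                          // cluster-wide QUORUM
}
```

For the [3, 3, 3] layout above this yields 2 for LOCAL_SERIAL and 5 for SERIAL, matching the numbers in the error message.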

This behavior results in an exception being thrown when executing
the following query with LOCAL_SERIAL cl:

INSERT INTO users (userid, firstname, lastname, age) VALUES (0, 'first0', 'last0', 30) IF NOT EXISTS

Unavailable: Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl LOCAL_SERIAL. Requires 5, alive 3" info={'required_replicas': 5, 'alive_replicas': 3, 'consistency': 'LOCAL_SERIAL'}

Tests: unit(dev), dtest(consistency_test.py)

Fixes #5477.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20191216151732.64230-1-pa.solodovnikov@scylladb.com>
(cherry picked from commit c451f6d82a)
2019-12-23 15:21:32 +02:00
Asias He
637d80ffcf repair: Do not return working_row_buf_nr in get combined row hash verb
In commit b463d7039c (repair: Introduce
get_combined_row_hash_response), working_row_buf_nr is returned in
REPAIR_GET_COMBINED_ROW_HASH in addition to the combined hash. It was
scheduled to be part of the 3.1 release, but by accident it was not
backported to 3.1.

In order for repair to be compatible between 3.1 and 3.2, we need to drop
working_row_buf_nr in the 3.2 release.

Fixes: #5490
Backports: 3.2
Tests: Run repair in a mixed 3.1 and 3.2 cluster
(cherry picked from commit 7322b749e0)
2019-12-22 15:53:18 +02:00
Hagit Segev
39b17be562 release: prepare for 3.2.rc3 2019-12-15 10:33:13 +02:00
Dejan Mircevski
e54df0585e cql3: Fix needs_filtering() for clustering columns
The LIKE operator requires filtering, so needs_filtering() must check
is_LIKE().  This already happens for partition columns, but it was
overlooked for clustering columns in the initial implementation of
LIKE.

Fixes #5400.

Tests: unit(dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit 27b8b6fe9d)
2019-12-12 14:40:09 +02:00
Avi Kivity
7d113bd1e9 Merge "Add experimental_features option" from Dejan
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.

Fixes #5338
"

* 'vecexper' of https://github.com/dekimir/scylla:
  config: Add `experimental_features` option
  utils: Add enum_option

(cherry picked from commit 63474a3380)
2019-12-12 14:39:42 +02:00
Avi Kivity
2e7cd77bc4 Update seastar submodule
* seastar 8837a3fdf1...8e236efda9 (1):
  > Merge "reactor: fix iocb pool underflow due to unaccounted aio fsync" from Avi

Fixes #5443.
2019-12-12 14:17:32 +02:00
Cem Sancak
0a38d2b0ee Fix DPDK mode in prepare script
Fixes #5455.

(cherry picked from commit 86b8036502)
2019-12-12 14:15:37 +02:00
Piotr Sarna
2ff26d1160 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418

(cherry picked from commit 79c3a508f4)
2019-12-05 22:35:05 +02:00
Calle Wilund
9dd714ae64 commitlog_replayer: Ensure applied frozen_mutation is safe during apply
Fixes #5211

In 79935df959 the replay apply call was changed from one with no
continuation to one with a continuation, but the frozen mutation
argument was still only local to the lambda.

Change to use do_with for this case as well.

Message-Id: <20191203162606.1664-1-calle@scylladb.com>
(cherry picked from commit 56a5e0a251)
2019-12-04 15:02:15 +02:00
Hagit Segev
3980570520 release: prepare for 3.2.rc2 2019-12-03 18:16:37 +02:00
Avi Kivity
9889e553e6 Update seastar submodule
* seastar 6f0ef3251...8837a3fdf (1):
  > shared_future: Fix crash when all returned futures time out

Fixes #5322
2019-11-29 11:47:24 +02:00
Avi Kivity
3e0b09faa1 Update seastar submodule to point to scylla-seastar.git
This allows us to add 3.2 specific patches to Seastar.
2019-11-29 11:45:44 +02:00
Asias He
bc4106ff45 repair: Fix rx_hashes_nr metrics (#5213)
In get_full_row_hashes_with_rpc_stream and
repair_get_row_diff_with_rpc_stream_process_op which were introduced in
the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not
updated correctly.

In the test we have 3 nodes and run repair on node3, and we make sure
the following metrics are correct.

assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'],
   	    node3_metrics['scylla_repair_rx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'],
   	    node3_metrics['scylla_repair_tx_hashes_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'],
   	    node3_metrics['scylla_repair_rx_row_nr'])
assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'],
   	    node3_metrics['scylla_repair_tx_row_nr'])
assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'],
   	    node3_metrics['scylla_repair_rx_row_bytes'])
assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'],
            node3_metrics['scylla_repair_tx_row_bytes'])

Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test
Fixes: #5339
Backports: 3.2
(cherry picked from commit 6ec602ff2c)
2019-11-25 18:18:12 +02:00
Rafael Ávila de Espíndola
df3563c1ae rpmbuild: don't use dwz
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:

// This works
$ addr2line  -e build/release/scylla 0x1234567

$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug

// now this fails
$ addr2line -e build/release/scylla 0x1234567

I think the issue is

https://sourceware.org/bugzilla/show_bug.cgi?id=23652

Fixes #5289

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
(cherry picked from commit 8599f8205b)
2019-11-25 11:45:15 +02:00
Rafael Ávila de Espíndola
1c89961c4f commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
(cherry picked from commit 6160b9017d)
2019-11-24 17:47:08 +02:00
Tomasz Grabiec
85b1a45252 row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, the cache aborts when trying to detach partition
entries while they are being updated. This should never happen, but it
can when the update fails with bad_alloc, because the cleanup guard
invalidates the cache before it releases the partition snapshots (held
by the "update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e3d025d014)
2019-11-24 17:42:41 +02:00
Pekka Enberg
6a847e2242 test.py: Append test repeat cycle to output XML filename
Currently, we overwrite the same XML output file for each test repeat
cycle. This can cause invalid XML to be generated if the XML contents
don't match exactly for every iteration.

Fix the problem by appending the test repeat cycle in the XML filename
as follows:

  $ ./test.py --repeat 3 --name vint_serialization_test --mode dev --jenkins jenkins_test

  $ ls -1 *.xml
  jenkins_test.release.vint_serialization_test.0.boost.xml
  jenkins_test.release.vint_serialization_test.1.boost.xml
  jenkins_test.release.vint_serialization_test.2.boost.xml

Fixes #5303.

Message-Id: <20191119092048.16419-1-penberg@scylladb.com>
(cherry picked from commit 505f2c1008)
2019-11-20 22:26:29 +02:00
Nadav Har'El
10cf0e0d91 merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects the condition under which a row is considered expired
by its TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290

(cherry picked from commit 9b9609c65b)
2019-11-20 21:37:16 +02:00
Hagit Segev
8c1474c039 release: prepare for 3.2.rc1 2019-11-19 21:38:26 +02:00
Yaron Kaikov
bb5e9527bb dist/docker: Switch to 3.2 release repository (#5296)
Modify Dockerfile SCYLLA_REPO_URL argument to point to the 3.2 repository.
2019-11-18 11:59:51 +02:00
Nadav Har'El
4dae72b2cd sstables: allow non-traditional characters in table name
The goal of this patch is to fix issue #5280, a rather serious Alternator
bug, where Scylla fails to restart when an Alternator table has secondary
indexes (LSI or GSI).

Traditionally, Cassandra allows table names to contain only alphanumeric
characters and underscores. However, most of our internal implementation
doesn't actually have this restriction. So Alternator uses the characters
':' and '!' in the table names to mark global and local secondary indexes,
respectively. And this actually works. Or almost...

This patch fixes a problem of listing, during boot, the sstables stored
for tables with such non-traditional names. The sstable listing code
needlessly assumes that the *directory* name, i.e., the CF name, matches
the "\w+" regular expression. When an sstable is found in a directory not
matching this regular expression, the boot fails. But there is no real
reason to require such a strict regular expression. So this patch relaxes
this requirement, and allows Scylla to boot with Alternator's GSI and LSI
tables and their names which include the ":" and "!" characters, and in
fact any other name allowed as a directory name.

Fixes #5280.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191114153811.17386-1-nyh@scylladb.com>
(cherry picked from commit 2fb2eb27a2)
2019-11-17 18:07:52 +02:00
Kamil Braun
1e444a3dd5 sstables: fix sstable file I/O CQL tracing when reading multiple files (#5285)
CQL tracing would only report file I/O involving one sstable, even if
multiple sstables were read from during the query.

Steps to reproduce:

1. create a table with NullCompactionStrategy
2. insert row, flush memtables
3. insert row, flush memtables
4. restart Scylla
5. tracing on
6. select * from table
The trace would only report DMA reads from one of the two sstables.

Kudos to @denesb for catching this.

Related issue: #4908

(cherry picked from commit a67e887dea)
2019-11-17 16:09:35 +02:00
Yaron Kaikov
76906d6134 release: prepare for 3.2.rc0 2019-11-17 16:08:11 +02:00
393 changed files with 2028 additions and 337 deletions

2
.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui


@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=666.development
VERSION=3.2.5
if test -f version
then


@@ -254,6 +254,9 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}


@@ -74,7 +74,7 @@ public:
options() = default;
options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
throw exceptions::configuration_exception("Missing enabled CDC option");
return;
}
for (auto& p : map) {
@@ -92,6 +92,9 @@ public:
}
}
std::map<sstring, sstring> to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },


@@ -241,7 +241,9 @@ batch_size_fail_threshold_in_kb: 50
# broadcast_rpc_address: 1.2.3.4
# Uncomment to enable experimental features
# experimental: true
# experimental_features:
# - cdc
# - lwt
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints


@@ -381,6 +381,7 @@ scylla_tests = [
'tests/data_listeners_test',
'tests/truncation_migration_test',
'tests/like_matcher_test',
'tests/enum_option_test',
]
perf_tests = [
@@ -875,6 +876,7 @@ pure_boost_tests = set([
'tests/top_k_test',
'tests/small_vector_test',
'tests/like_matcher_test',
'tests/enum_option_test',
])
tests_not_using_seastar_test_framework = set([


@@ -266,7 +266,7 @@ bool column_condition::applies_to(const data_value* cell_value, const query_opti
return value.has_value() && is_satisfied_by(operator_type::EQ, *cell_value->type(), *column.type, *cell_value, *value);
});
} else {
return std::any_of(in_values.begin(), in_values.end(), [] (const bytes_opt& value) { return value.has_value() == false; });
return std::any_of(in_values.begin(), in_values.end(), [] (const bytes_opt& value) { return !value.has_value() || value->empty(); });
}
}


@@ -61,6 +61,16 @@ make_now_fct() {
});
}
static int64_t get_valid_timestamp(const data_value& ts_obj) {
auto ts = value_cast<db_clock::time_point>(ts_obj);
int64_t ms = ts.time_since_epoch().count();
auto nanos_since = utils::UUID_gen::make_nanos_since(ms);
if (!utils::UUID_gen::is_valid_nanos_since(nanos_since)) {
throw exceptions::server_exception(format("{}: timestamp is out of range. Must be in milliseconds since epoch", ms));
}
return ms;
}
inline
shared_ptr<function>
make_min_timeuuid_fct() {
@@ -74,8 +84,7 @@ make_min_timeuuid_fct() {
if (ts_obj.is_null()) {
return {};
}
auto ts = value_cast<db_clock::time_point>(ts_obj);
auto uuid = utils::UUID_gen::min_time_UUID(ts.time_since_epoch().count());
auto uuid = utils::UUID_gen::min_time_UUID(get_valid_timestamp(ts_obj));
return {timeuuid_type->decompose(uuid)};
});
}
@@ -85,7 +94,6 @@ shared_ptr<function>
make_max_timeuuid_fct() {
return make_native_scalar_function<true>("maxtimeuuid", timeuuid_type, { timestamp_type },
[] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
// FIXME: should values be a vector<optional<bytes>>?
auto& bb = values[0];
if (!bb) {
return {};
@@ -94,12 +102,22 @@ make_max_timeuuid_fct() {
if (ts_obj.is_null()) {
return {};
}
auto ts = value_cast<db_clock::time_point>(ts_obj);
auto uuid = utils::UUID_gen::max_time_UUID(ts.time_since_epoch().count());
auto uuid = utils::UUID_gen::max_time_UUID(get_valid_timestamp(ts_obj));
return {timeuuid_type->decompose(uuid)};
});
}
inline utils::UUID get_valid_timeuuid(bytes raw) {
if (!utils::UUID_gen::is_valid_UUID(raw)) {
throw exceptions::server_exception(format("invalid timeuuid: size={}", raw.size()));
}
auto uuid = utils::UUID_gen::get_UUID(raw);
if (!uuid.is_timestamp()) {
throw exceptions::server_exception(format("{}: Not a timeuuid: version={}", uuid, uuid.version()));
}
return uuid;
}
inline
shared_ptr<function>
make_date_of_fct() {
@@ -110,7 +128,7 @@ make_date_of_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
return {timestamp_type->decompose(ts)};
});
}
@@ -125,7 +143,7 @@ make_unix_timestamp_of_fct() {
if (!bb) {
return {};
}
return {long_type->decompose(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb)))};
return {long_type->decompose(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb)))};
});
}
@@ -176,7 +194,7 @@ make_timeuuidtodate_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
return {simple_date_type->decompose(to_simple_date(ts))};
});
@@ -211,7 +229,7 @@ make_timeuuidtotimestamp_fct() {
if (!bb) {
return {};
}
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb))));
auto ts = db_clock::time_point(db_clock::duration(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb))));
return {timestamp_type->decompose(ts)};
});
}
@@ -245,10 +263,14 @@ make_timeuuidtounixtimestamp_fct() {
if (!bb) {
return {};
}
return {long_type->decompose(UUID_gen::unix_timestamp(UUID_gen::get_UUID(*bb)))};
return {long_type->decompose(UUID_gen::unix_timestamp(get_valid_timeuuid(*bb)))};
});
}
inline bytes time_point_to_long(const data_value& v) {
return data_value(get_valid_timestamp(v)).serialize();
}
inline
shared_ptr<function>
make_timestamptounixtimestamp_fct() {
@@ -263,7 +285,7 @@ make_timestamptounixtimestamp_fct() {
if (ts_obj.is_null()) {
return {};
}
return {long_type->decompose(ts_obj)};
return time_point_to_long(ts_obj);
});
}
@@ -282,7 +304,7 @@ make_datetounixtimestamp_fct() {
return {};
}
auto from_simple_date = get_castas_fctn(timestamp_type, simple_date_type);
return {long_type->decompose(from_simple_date(simple_date_obj))};
return time_point_to_long(from_simple_date(simple_date_obj));
});
}


@@ -478,7 +478,7 @@ inline bool single_column_primary_key_restrictions<clustering_key>::needs_filter
// 3. a SLICE restriction isn't on a last place
column_id position = 0;
for (const auto& restriction : _restrictions->restrictions() | boost::adaptors::map_values) {
if (restriction->is_contains() || position != restriction->get_column_def().id) {
if (restriction->is_contains() || restriction->is_LIKE() || position != restriction->get_column_def().id) {
return true;
}
if (!restriction->is_slice()) {


@@ -390,28 +390,45 @@ std::vector<const column_definition*> statement_restrictions::get_column_defs_fo
if (need_filtering()) {
auto& sim = db.find_column_family(_schema).get_index_manager();
auto [opt_idx, _] = find_idx(sim);
auto column_uses_indexing = [&opt_idx] (const column_definition* cdef) {
return opt_idx && opt_idx->depends_on(*cdef);
auto column_uses_indexing = [&opt_idx] (const column_definition* cdef, ::shared_ptr<single_column_restriction> restr) {
return opt_idx && restr && restr->is_supported_by(*opt_idx);
};
auto single_pk_restrs = dynamic_pointer_cast<single_column_partition_key_restrictions>(_partition_key_restrictions);
if (_partition_key_restrictions->needs_filtering(*_schema)) {
for (auto&& cdef : _partition_key_restrictions->get_column_defs()) {
if (!column_uses_indexing(cdef)) {
::shared_ptr<single_column_restriction> restr;
if (single_pk_restrs) {
auto it = single_pk_restrs->restrictions().find(cdef);
if (it != single_pk_restrs->restrictions().end()) {
restr = dynamic_pointer_cast<single_column_restriction>(it->second);
}
}
if (!column_uses_indexing(cdef, restr)) {
column_defs_for_filtering.emplace_back(cdef);
}
}
}
auto single_ck_restrs = dynamic_pointer_cast<single_column_clustering_key_restrictions>(_clustering_columns_restrictions);
const bool pk_has_unrestricted_components = _partition_key_restrictions->has_unrestricted_components(*_schema);
if (pk_has_unrestricted_components || _clustering_columns_restrictions->needs_filtering(*_schema)) {
column_id first_filtering_id = pk_has_unrestricted_components ? 0 : _schema->clustering_key_columns().begin()->id +
_clustering_columns_restrictions->num_prefix_columns_that_need_not_be_filtered();
for (auto&& cdef : _clustering_columns_restrictions->get_column_defs()) {
if (cdef->id >= first_filtering_id && !column_uses_indexing(cdef)) {
::shared_ptr<single_column_restriction> restr;
if (single_pk_restrs) {
auto it = single_ck_restrs->restrictions().find(cdef);
if (it != single_ck_restrs->restrictions().end()) {
restr = dynamic_pointer_cast<single_column_restriction>(it->second);
}
}
if (cdef->id >= first_filtering_id && !column_uses_indexing(cdef, restr)) {
column_defs_for_filtering.emplace_back(cdef);
}
}
}
for (auto&& cdef : _nonprimary_key_restrictions->get_column_defs()) {
if (!column_uses_indexing(cdef)) {
auto restr = dynamic_pointer_cast<single_column_restriction>(_nonprimary_key_restrictions->get_restriction(*cdef));
if (!column_uses_indexing(cdef, restr)) {
column_defs_for_filtering.emplace_back(cdef);
}
}


@@ -92,6 +92,14 @@ public:
: abstract_function_selector(fun, std::move(arg_selectors))
, _tfun(dynamic_pointer_cast<T>(fun)) {
}
const functions::function_name& name() const {
return _tfun->name();
}
virtual sstring assignment_testable_source_context() const override {
return format("{}", this->name());
}
};
}


@@ -79,11 +79,6 @@ public:
dynamic_pointer_cast<functions::aggregate_function>(func), std::move(arg_selectors))
, _aggregate(fun()->new_aggregate()) {
}
virtual sstring assignment_testable_source_context() const override {
// FIXME:
return "FIXME";
}
};
}


@@ -82,12 +82,6 @@ public:
: abstract_function_selector_for<functions::scalar_function>(
dynamic_pointer_cast<functions::scalar_function>(std::move(fun)), std::move(arg_selectors)) {
}
virtual sstring assignment_testable_source_context() const override {
// FIXME:
return "FIXME";
}
};
}


@@ -111,7 +111,9 @@ lw_shared_ptr<query::read_command> cas_request::read_command() const {
} else {
ranges = query::clustering_range::deoverlap(std::move(ranges), clustering_key::tri_compare(*_schema));
}
query::partition_slice ps(std::move(ranges), *_schema, columns_to_read, update_parameters::options);
auto options = update_parameters::options;
options.set(query::partition_slice::option::always_return_static_content);
query::partition_slice ps(std::move(ranges), *_schema, columns_to_read, options);
ps.set_partition_row_limit(max_rows);
return make_lw_shared<query::read_command>(_schema->id(), _schema->version(), std::move(ps));
}


@@ -60,7 +60,6 @@ public:
static constexpr query::partition_slice::option_set options = query::partition_slice::option_set::of<
query::partition_slice::option::send_partition_key,
query::partition_slice::option::send_clustering_key,
query::partition_slice::option::always_return_static_content,
query::partition_slice::option::collections_as_maps>();
// Holder for data for


@@ -1984,7 +1984,8 @@ flat_mutation_reader make_multishard_streaming_reader(distributed<database>& db,
return make_multishard_combining_reader(make_shared<streaming_reader_lifecycle_policy>(db), partitioner, std::move(s), pr, ps, pc,
std::move(trace_state), fwd_mr);
});
return make_flat_multi_range_reader(std::move(schema), std::move(ms), std::move(range_generator), schema->full_slice(),
auto&& full_slice = schema->full_slice();
return make_flat_multi_range_reader(std::move(schema), std::move(ms), std::move(range_generator), std::move(full_slice),
service::get_local_streaming_read_priority(), {}, mutation_reader::forwarding::no);
}


@@ -1241,6 +1241,34 @@ void db::commitlog::segment_manager::flush_segments(bool force) {
}
}
/// \brief Helper for ensuring a file is closed if an exception is thrown.
///
/// The file provided by the file_fut future is passed to func.
/// * If func throws an exception E, the file is closed and we return
/// a failed future with E.
/// * If func returns a value V, the file is not closed and we return
/// a future with V.
/// Note that when an exception is not thrown, it is the
/// responsibility of func to make sure the file will be closed. It
/// can close the file itself, return it, or store it somewhere.
///
/// \tparam Func The type of function this wraps
/// \param file_fut A future that produces a file
/// \param func A function that uses a file
/// \return A future that passes the file produced by file_fut to func
/// and closes it if func fails
template <typename Func>
static auto close_on_failure(future<file> file_fut, Func func) {
return file_fut.then([func = std::move(func)](file f) {
return futurize_apply(func, f).handle_exception([f] (std::exception_ptr e) mutable {
return f.close().then_wrapped([f, e = std::move(e)] (future<> x) {
using futurator = futurize<std::result_of_t<Func(file)>>;
return futurator::make_exception_future(e);
});
});
});
}
future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager::allocate_segment_ex(const descriptor& d, sstring filename, open_flags flags, bool active) {
file_open_options opt;
opt.extent_allocation_size_hint = max_size;
@@ -1258,7 +1286,7 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
return fut;
});
return fut.then([this, d, active, filename, flags](file f) {
return close_on_failure(std::move(fut), [this, d, active, filename, flags] (file f) {
f = make_checked_file(commit_error_handler, f);
// xfs doesn't like files extended beyond eof, so enlarge the file
auto fut = make_ready_future<>();
@@ -1288,7 +1316,7 @@ future<db::commitlog::segment_manager::sseg_ptr> db::commitlog::segment_manager:
v.emplace_back(iovec{ buf.get_write(), s});
m += s;
}
return f.dma_write(max_size - rem, std::move(v)).then([&rem](size_t s) {
return f.dma_write(max_size - rem, std::move(v), service::get_local_commitlog_priority()).then([&rem](size_t s) {
rem -= s;
return stop_iteration::no;
});


@@ -276,7 +276,7 @@ future<> db::commitlog_replayer::impl::process(stats* s, commitlog::buffer_and_r
}
auto shard = _db.local().shard_of(fm);
return _db.invoke_on(shard, [this, cer = std::move(cer), &src_cm, rp, shard, s] (database& db) -> future<> {
return _db.invoke_on(shard, [this, cer = std::move(cer), &src_cm, rp, shard, s] (database& db) mutable -> future<> {
auto& fm = cer.mutation();
// TODO: might need better verification that the deserialized mutation
// is schema compatible. My guess is that just applying the mutation
@@ -306,7 +306,9 @@ future<> db::commitlog_replayer::impl::process(stats* s, commitlog::buffer_and_r
return db.apply_in_memory(m, cf, db::rp_handle(), db::no_timeout);
});
} else {
return db.apply_in_memory(fm, cf.schema(), db::rp_handle(), db::no_timeout);
return do_with(std::move(cer).mutation(), [&](const frozen_mutation& m) {
return db.apply_in_memory(m, cf.schema(), db::rp_handle(), db::no_timeout);
});
}
}).then_wrapped([s] (future<> f) {
try {


@@ -22,6 +22,7 @@
#include <unordered_map>
#include <regex>
#include <sstream>
#include <boost/any.hpp>
#include <boost/program_options.hpp>
@@ -108,6 +109,10 @@ const config_type config_type_for<int32_t> = config_type("integer", value_to_jso
template <>
const config_type config_type_for<db::seed_provider_type> = config_type("seed provider", seed_provider_to_json);
template <>
const config_type config_type_for<std::vector<enum_option<db::experimental_features_t>>> = config_type(
"experimental features", value_to_json<std::vector<sstring>>);
}
namespace YAML {
@@ -153,6 +158,23 @@ struct convert<db::config::seed_provider_type> {
}
};
template <>
class convert<enum_option<db::experimental_features_t>> {
public:
static bool decode(const Node& node, enum_option<db::experimental_features_t>& rhs) {
std::string name;
if (!convert<std::string>::decode(node, name)) {
return false;
}
try {
std::istringstream(name) >> rhs;
} catch (boost::program_options::invalid_option_value&) {
return false;
}
return true;
}
};
}
#if defined(DEBUG)
@@ -669,7 +691,9 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, shutdown_announce_in_ms(this, "shutdown_announce_in_ms", value_status::Used, 2 * 1000, "Time a node waits after sending gossip shutdown message in milliseconds. Same as -Dcassandra.shutdown_announce_in_ms in cassandra.")
, developer_mode(this, "developer_mode", value_status::Used, false, "Relax environment checks. Setting to true can reduce performance and reliability significantly.")
, skip_wait_for_gossip_to_settle(this, "skip_wait_for_gossip_to_settle", value_status::Used, -1, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.")
, experimental(this, "experimental", value_status::Used, false, "Set to true to unlock experimental features.")
, force_gossip_generation(this, "force_gossip_generation", liveness::LiveUpdate, value_status::Used, -1 , "Force gossip to use the generation number provided by user")
, experimental(this, "experimental", value_status::Used, false, "Set to true to unlock all experimental features.")
, experimental_features(this, "experimental_features", value_status::Used, {}, "Unlock experimental features provided as the option arguments (possible values: 'lwt' and 'cdc'). Can be repeated.")
, lsa_reclamation_step(this, "lsa_reclamation_step", value_status::Used, 1, "Minimum number of segments to reclaim in a single step")
, prometheus_port(this, "prometheus_port", value_status::Used, 9180, "Prometheus port, set to zero to disable")
, prometheus_address(this, "prometheus_address", value_status::Used, "0.0.0.0", "Prometheus listening address")
@@ -779,10 +803,12 @@ db::fs::path db::config::get_conf_dir() {
return confdir;
}
void db::config::check_experimental(const sstring& what) const {
if (!experimental()) {
throw std::runtime_error(format("{} is currently disabled. Start Scylla with --experimental=on to enable.", what));
bool db::config::check_experimental(experimental_features_t::feature f) const {
if (experimental()) {
return true;
}
const auto& optval = experimental_features();
return find(begin(optval), end(optval), enum_option<experimental_features_t>{f}) != end(optval);
}
namespace bpo = boost::program_options;
@@ -827,6 +853,12 @@ const db::extensions& db::config::extensions() const {
return *_extensions;
}
std::unordered_map<sstring, db::experimental_features_t::feature> db::experimental_features_t::map() {
// We decided against using the construct-on-first-use idiom here:
// https://github.com/scylladb/scylla/pull/5369#discussion_r353614807
return {{"lwt", LWT}, {"cdc", CDC}};
}
template struct utils::config_file::named_value<seastar::log_level>;
namespace utils {


@@ -33,6 +33,7 @@
#include "seastarx.hh"
#include "utils/config_file.hh"
#include "utils/enum_option.hh"
namespace seastar { class file; struct logging_settings; }
@@ -75,14 +76,20 @@ sstring config_value_as_json(const std::unordered_map<sstring, log_level>& v);
namespace db {
/// Enumeration of all valid values for the `experimental` config entry.
struct experimental_features_t {
enum feature { LWT, CDC };
static std::unordered_map<sstring, feature> map(); // See enum_option.
};
class config : public utils::config_file {
public:
config();
config(std::shared_ptr<db::extensions>);
~config();
// Throws exception if experimental feature is disabled.
void check_experimental(const sstring& what) const;
/// True iff the feature is enabled.
bool check_experimental(experimental_features_t::feature f) const;
/**
* Scans the environment variables for configuration files directory
@@ -262,7 +269,9 @@ public:
named_value<uint32_t> shutdown_announce_in_ms;
named_value<bool> developer_mode;
named_value<int32_t> skip_wait_for_gossip_to_settle;
named_value<int32_t> force_gossip_generation;
named_value<bool> experimental;
named_value<std::vector<enum_option<experimental_features_t>>> experimental_features;
named_value<size_t> lsa_reclamation_step;
named_value<uint16_t> prometheus_port;
named_value<sstring> prometheus_address;


@@ -405,11 +405,8 @@ future<> manager::end_point_hints_manager::sender::do_send_one_mutation(frozen_m
return _proxy.send_to_endpoint(std::move(m), end_point_key(), { }, write_type::SIMPLE, service::allow_hints::no);
} else {
manager_logger.trace("Endpoints set has changed and {} is no longer a replica. Mutating from scratch...", end_point_key());
// FIXME: using 1h as infinite timeout. If a node is down, we should get an
// unavailable exception.
auto timeout = db::timeout_clock::now() + 1h;
//FIXME: Add required frozen_mutation overloads
return _proxy.mutate({m.fm.unfreeze(m.s)}, consistency_level::ALL, timeout, nullptr, empty_service_permit());
return _proxy.mutate_hint_from_scratch(std::move(m));
}
});
}


@@ -33,12 +33,14 @@ enum class schema_feature {
// See https://github.com/scylladb/scylla/issues/4485
DIGEST_INSENSITIVE_TO_EXPIRY,
COMPUTED_COLUMNS,
CDC_OPTIONS,
};
using schema_features = enum_set<super_enum<schema_feature,
schema_feature::VIEW_VIRTUAL_COLUMNS,
schema_feature::DIGEST_INSENSITIVE_TO_EXPIRY,
schema_feature::COMPUTED_COLUMNS
schema_feature::COMPUTED_COLUMNS,
schema_feature::CDC_OPTIONS
>>;
}


@@ -294,19 +294,24 @@ schema_ptr tables() {
}
// Holds Scylla-specific table metadata.
schema_ptr scylla_tables() {
static thread_local auto schema = [] {
schema_ptr scylla_tables(schema_features features) {
static auto make = [] (bool has_cdc_options) -> schema_ptr {
auto id = generate_legacy_id(NAME, SCYLLA_TABLES);
return schema_builder(NAME, SCYLLA_TABLES, std::make_optional(id))
auto sb = schema_builder(NAME, SCYLLA_TABLES, std::make_optional(id))
.with_column("keyspace_name", utf8_type, column_kind::partition_key)
.with_column("table_name", utf8_type, column_kind::clustering_key)
.with_column("version", uuid_type)
.with_column("cdc", map_type_impl::get_instance(utf8_type, utf8_type, false))
.set_gc_grace_seconds(schema_gc_grace)
.with_version(generate_schema_version(id))
.build();
}();
return schema;
.set_gc_grace_seconds(schema_gc_grace);
if (has_cdc_options) {
sb.with_column("cdc", map_type_impl::get_instance(utf8_type, utf8_type, false));
sb.with_version(generate_schema_version(id, 1));
} else {
sb.with_version(generate_schema_version(id));
}
return sb.build();
};
static thread_local schema_ptr schemas[2] = { make(false), make(true) };
return schemas[features.contains(schema_feature::CDC_OPTIONS)];
}
// The "columns" table lists the definitions of all columns in all tables
@@ -608,14 +613,28 @@ schema_ptr aggregates() {
}
#endif
static
mutation
redact_columns_for_missing_features(mutation m, schema_features features) {
if (features.contains(schema_feature::CDC_OPTIONS)) {
return std::move(m);
}
if (m.schema()->cf_name() != SCYLLA_TABLES) {
return std::move(m);
}
slogger.debug("adjusting schema_tables mutation due to possible in-progress cluster upgrade");
m.upgrade(scylla_tables(features));
return std::move(m);
}
/**
* Read schema from system keyspace and calculate MD5 digest of every row, resulting digest
* will be converted into UUID which would act as content-based version of the schema.
*/
future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>& proxy, schema_features features)
{
auto map = [&proxy] (sstring table) {
return db::system_keyspace::query_mutations(proxy, NAME, table).then([&proxy, table] (auto rs) {
auto map = [&proxy, features] (sstring table) {
return db::system_keyspace::query_mutations(proxy, NAME, table).then([&proxy, table, features] (auto rs) {
auto s = proxy.local().get_db().local().find_schema(NAME, table);
std::vector<mutation> mutations;
for (auto&& p : rs->partitions()) {
@@ -624,6 +643,7 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
if (is_system_keyspace(partition_key)) {
continue;
}
mut = redact_columns_for_missing_features(std::move(mut), features);
mutations.emplace_back(std::move(mut));
}
return mutations;
@@ -647,8 +667,8 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
future<std::vector<canonical_mutation>> convert_schema_to_mutations(distributed<service::storage_proxy>& proxy, schema_features features)
{
auto map = [&proxy] (sstring table) {
return db::system_keyspace::query_mutations(proxy, NAME, table).then([&proxy, table] (auto rs) {
auto map = [&proxy, features] (sstring table) {
return db::system_keyspace::query_mutations(proxy, NAME, table).then([&proxy, table, features] (auto rs) {
auto s = proxy.local().get_db().local().find_schema(NAME, table);
std::vector<canonical_mutation> results;
for (auto&& p : rs->partitions()) {
@@ -657,6 +677,7 @@ future<std::vector<canonical_mutation>> convert_schema_to_mutations(distributed<
if (is_system_keyspace(partition_key)) {
continue;
}
mut = redact_columns_for_missing_features(std::move(mut), features);
results.emplace_back(mut);
}
return results;
@@ -669,6 +690,14 @@ future<std::vector<canonical_mutation>> convert_schema_to_mutations(distributed<
return map_reduce(all_table_names(features), map, std::vector<canonical_mutation>{}, reduce);
}
std::vector<mutation>
adjust_schema_for_schema_features(std::vector<mutation> schema, schema_features features) {
for (auto& m : schema) {
m = redact_columns_for_missing_features(m, features);
}
return std::move(schema);
}
future<schema_result>
read_schema_for_keyspaces(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const std::set<sstring>& keyspace_names)
{
@@ -1673,7 +1702,19 @@ mutation make_scylla_tables_mutation(schema_ptr table, api::timestamp_type times
auto ckey = clustering_key::from_singular(*s, table->cf_name());
mutation m(scylla_tables(), pkey);
m.set_clustered_cell(ckey, "version", utils::UUID(table->version()), timestamp);
store_map(m, ckey, "cdc", timestamp, table->cdc_options().to_map());
auto cdc_options = table->cdc_options().to_map();
if (!cdc_options.empty()) {
store_map(m, ckey, "cdc", timestamp, cdc_options);
} else {
// Avoid storing anything for cdc disabled, so we don't end up with
// different digests on different nodes due to the other node redacting
// the cdc column when the cdc cluster feature is disabled.
//
// Tombstones are not considered for schema digest, so this is okay (and
// needed in order for disabling of cdc to have effect).
auto& cdc_cdef = *scylla_tables()->get_column_definition("cdc");
m.set_clustered_cell(ckey, cdc_cdef, atomic_cell::make_dead(timestamp, gc_clock::now()));
}
return m;
}

@@ -109,7 +109,7 @@ schema_ptr view_virtual_columns();
schema_ptr dropped_columns();
schema_ptr indexes();
schema_ptr tables();
schema_ptr scylla_tables();
schema_ptr scylla_tables(schema_features features = schema_features::full());
schema_ptr views();
schema_ptr computed_columns();
@@ -154,6 +154,7 @@ future<> save_system_keyspace_schema();
future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>& proxy, schema_features);
future<std::vector<canonical_mutation>> convert_schema_to_mutations(distributed<service::storage_proxy>& proxy, schema_features);
std::vector<mutation> adjust_schema_for_schema_features(std::vector<mutation> schema, schema_features features);
future<schema_result_value_type>
read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name);

@@ -104,10 +104,10 @@ api::timestamp_type schema_creation_timestamp() {
// FIXME: Make automatic by calculating from schema structure.
static const uint16_t version_sequence_number = 1;
table_schema_version generate_schema_version(utils::UUID table_id) {
table_schema_version generate_schema_version(utils::UUID table_id, uint16_t offset) {
md5_hasher h;
feed_hash(h, table_id);
feed_hash(h, version_sequence_number);
feed_hash(h, version_sequence_number + offset);
return utils::UUID_gen::get_name_UUID(h.finalize());
}
@@ -1748,7 +1748,7 @@ static void maybe_add_virtual_reader(schema_ptr s, database& db) {
}
static bool maybe_write_in_user_memory(schema_ptr s, database& db) {
return (s.get() == batchlog().get())
return (s.get() == batchlog().get()) || (s.get() == paxos().get())
|| s == v3::scylla_views_builds_in_progress();
}

@@ -152,7 +152,7 @@ schema_ptr aggregates();
}
table_schema_version generate_schema_version(utils::UUID table_id);
table_schema_version generate_schema_version(utils::UUID table_id, uint16_t offset = 0);
// Only for testing.
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp);

@@ -307,7 +307,7 @@ deletable_row& view_updates::get_view_row(const partition_key& base_key, const c
if (!cdef.is_computed()) {
//FIXME(sarna): this legacy code is here for backward compatibility and should be removed
// once "computed_columns feature" is supported by every node
if (!service::get_local_storage_service().db().local().find_column_family(_base->id()).get_index_manager().is_index(*_base)) {
if (!service::get_local_storage_service().db().local().find_column_family(_base->id()).get_index_manager().is_index(*_view)) {
throw std::logic_error(format("Column {} doesn't exist in base and this view is not backing a secondary index", cdef.name_as_text()));
}
computed_value = token_column_computation().compute_value(*_base, base_key, update);
@@ -879,7 +879,11 @@ future<stop_iteration> view_update_builder::on_results() {
if (_update && !_update->is_end_of_partition()) {
if (_update->is_clustering_row()) {
apply_tracked_tombstones(_update_tombstone_tracker, _update->as_mutable_clustering_row());
generate_update(std::move(*_update).as_clustering_row(), { });
auto existing_tombstone = _existing_tombstone_tracker.current_tombstone();
auto existing = existing_tombstone
? std::optional<clustering_row>(std::in_place, _update->as_clustering_row().key(), row_tombstone(std::move(existing_tombstone)), row_marker(), ::row())
: std::nullopt;
generate_update(std::move(*_update).as_clustering_row(), std::move(existing));
}
return advance_updates();
}

@@ -63,7 +63,7 @@ if __name__ == '__main__':
run('ip link set dev {TAP} master {BRIDGE}'.format(TAP=tap, BRIDGE=bridge))
run('chown {USER}.{GROUP} /dev/vhost-net'.format(USER=user, GROUP=group))
elif mode == 'dpdk':
ethpcciid = cfg.get('ETHPCIID')
ethpciid = cfg.get('ETHPCIID')
nr_hugepages = cfg.get('NR_HUGEPAGES')
run('modprobe uio')
run('modprobe uio_pci_generic')
@@ -73,7 +73,6 @@ if __name__ == '__main__':
f.write(nr_hugepages)
if dist_name() == 'Ubuntu':
run('hugeadm --create-mounts')
fi
else:
set_nic_and_disks = get_set_nic_and_disks_config_value(cfg)
ifname = cfg.get('IFNAME')

@@ -125,7 +125,7 @@ if [ -z "$TARGET" ]; then
fi
RELOC_PKG_FULLPATH=$(readlink -f $RELOC_PKG)
RELOC_PKG_BASENAME=$(basename $RELOC_PKG)
SCYLLA_VERSION=$(cat SCYLLA-VERSION-FILE)
SCYLLA_VERSION=$(cat SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/')
SCYLLA_RELEASE=$(cat SCYLLA-RELEASE-FILE)
ln -fv $RELOC_PKG_FULLPATH ../$PRODUCT-server_$SCYLLA_VERSION-$SCYLLA_RELEASE.orig.tar.gz

@@ -4,7 +4,6 @@ etc/security/limits.d/scylla.conf
etc/scylla.d/*.conf
opt/scylladb/share/doc/scylla/*
opt/scylladb/share/doc/scylla/licenses/
usr/lib/systemd/system/*.service
usr/lib/systemd/system/*.timer
usr/lib/systemd/system/*.slice
usr/bin/scylla
@@ -21,6 +20,7 @@ opt/scylladb/scripts/libexec/*
opt/scylladb/bin/*
opt/scylladb/libreloc/*
opt/scylladb/libexec/*
usr/lib/scylla/*
var/lib/scylla/data
var/lib/scylla/commitlog
var/lib/scylla/hints

@@ -24,10 +24,6 @@ if [ "$1" = configure ]; then
fi
ln -sfT /etc/scylla /var/lib/scylla/conf
if [ -d /usr/lib/scylla ]; then
mv /usr/lib/scylla /usr/lib/scylla.old
fi
ln -sfT /opt/scylladb/scripts /usr/lib/scylla
grep -v api_ui_dir /etc/scylla/scylla.yaml | grep -v api_doc_dir > /tmp/scylla.yaml
echo "api_ui_dir: /opt/scylladb/swagger-ui/dist/" >> /tmp/scylla.yaml

@@ -6,8 +6,12 @@ case "$1" in
purge|remove)
rm -rf /etc/systemd/system/scylla-housekeeping-daily.service.d/
rm -rf /etc/systemd/system/scylla-housekeeping-restart.service.d/
rm -rf /etc/systemd/system/scylla-server.service.d/
rm -rf /etc/systemd/system/scylla-helper.slice.d/
# We need to keep dependencies.conf and sysconfdir.conf on 'remove',
# otherwise it will be missing after rollback.
if [ "$1" = "purge" ]; then
rm -rf /etc/systemd/system/scylla-server.service.d/
fi
;;
esac

@@ -5,7 +5,7 @@ MAINTAINER Avi Kivity <avi@cloudius-systems.com>
ENV container docker
# The SCYLLA_REPO_URL argument specifies the URL to the RPM repository this Docker image uses to install Scylla. The default value is the Scylla's unstable RPM repository, which contains the daily build.
ARG SCYLLA_REPO_URL=http://downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo
ARG SCYLLA_REPO_URL=http://downloads.scylladb.com/rpm/unstable/centos/branch-3.2/latest/scylla.repo
ADD scylla_bashrc /scylla_bashrc

@@ -2,4 +2,4 @@
source /etc/sysconfig/scylla-jmx
exec /opt/scylladb/scripts/jmx/scylla-jmx -l /opt/scylladb/scripts/jmx
exec /opt/scylladb/jmx/scylla-jmx -l /opt/scylladb/jmx

@@ -15,6 +15,12 @@ Obsoletes: scylla-server < 1.1
%global __brp_python_bytecompile %{nil}
%global __brp_mangle_shebangs %{nil}
%undefine _find_debuginfo_dwz_opts
# Prevent find-debuginfo.sh from tampering with scylla's build-id (#5881)
%undefine _unique_build_ids
%global _no_recompute_build_ids 1
%description
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
@@ -75,9 +81,6 @@ getent passwd scylla || /usr/sbin/useradd -g scylla -s /sbin/nologin -r -d %{_sh
if [ -f /etc/systemd/coredump.conf ];then
/opt/scylladb/scripts/scylla_coredump_setup
fi
if [ -d /usr/lib/scylla ]; then
mv /usr/lib/scylla /usr/lib/scylla.old
fi
/opt/scylladb/scripts/scylla_post_install.sh
@@ -95,10 +98,6 @@ if [ -d /tmp/%{name}-%{version}-%{release} ]; then
rm -rf /tmp/%{name}-%{version}-%{release}/
fi
ln -sfT /etc/scylla /var/lib/scylla/conf
if [ -d /usr/lib/scylla ]; then
mv /usr/lib/scylla /usr/lib/scylla.old
fi
ln -sfT /opt/scylladb/scripts /usr/lib/scylla
%clean
rm -rf $RPM_BUILD_ROOT
@@ -130,6 +129,7 @@ rm -rf $RPM_BUILD_ROOT
/opt/scylladb/bin/*
/opt/scylladb/libreloc/*
/opt/scylladb/libexec/*
%{_prefix}/lib/scylla/*
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/data
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/commitlog

@@ -76,6 +76,9 @@ Scylla with issue #4139 fixed)
bit 4: CorrectEmptyCounters (if set, indicates the sstable was generated by
Scylla with issue #4363 fixed)
bit 5: CorrectUDTsInCollections (if set, indicates that the sstable was generated
by Scylla with issue #6130 fixed)
## extension_attributes subcomponent
extension_attributes = extension_attribute_count extension_attribute*

@@ -98,6 +98,13 @@ public:
sstring get_message() const { return what(); }
};
class server_exception : public cassandra_exception {
public:
server_exception(sstring msg) noexcept
: exceptions::cassandra_exception{exceptions::exception_code::SERVER_ERROR, std::move(msg)}
{ }
};
class protocol_exception : public cassandra_exception {
public:
protocol_exception(sstring msg) noexcept

@@ -1622,11 +1622,15 @@ future<> gossiper::start_gossiping(int generation_nbr, std::map<application_stat
// message on all cpus and forward them to cpu0 to process.
return get_gossiper().invoke_on_all([do_bind] (gossiper& g) {
g.init_messaging_service_handler(do_bind);
}).then([this, generation_nbr, preload_local_states] {
}).then([this, generation_nbr, preload_local_states] () mutable {
build_seeds_list();
/* initialize the heartbeat state for this localEndpoint */
maybe_initialize_local_state(generation_nbr);
if (_cfg.force_gossip_generation() > 0) {
generation_nbr = _cfg.force_gossip_generation();
logger.warn("Use the generation number provided by user: generation = {}", generation_nbr);
}
endpoint_state& local_state = endpoint_state_map[get_broadcast_address()];
local_state.set_heart_beat_state_and_update_timestamp(heart_beat_state(generation_nbr));
local_state.mark_alive();
for (auto& entry : preload_local_states) {
local_state.add_application_state(entry.first, entry.second);
}
@@ -1831,7 +1835,8 @@ future<> gossiper::do_stop_gossiping() {
if (my_ep_state && !is_silent_shutdown_state(*my_ep_state)) {
logger.info("Announcing shutdown");
add_local_application_state(application_state::STATUS, _value_factory.shutdown(true)).get();
for (inet_address addr : _live_endpoints) {
auto live_endpoints = _live_endpoints;
for (inet_address addr : live_endpoints) {
msg_addr id = get_msg_addr(addr);
logger.trace("Sending a GossipShutdown to {}", id);
ms().send_gossip_shutdown(id, get_broadcast_address()).then_wrapped([id] (auto&&f) {

@@ -69,11 +69,6 @@ struct get_sync_boundary_response {
uint64_t new_rows_nr;
};
struct get_combined_row_hash_response {
repair_hash working_row_buf_combined_csum;
uint64_t working_row_buf_nr;
};
enum class row_level_diff_detect_algorithm : uint8_t {
send_full_set,
send_full_set_rpc_stream,

@@ -219,6 +219,20 @@ EOS
for i in $SBINFILES; do
ln -srf "$rprefix/scripts/$i" "$rusr/sbin/$i"
done
# we need keep /usr/lib/scylla directory to support upgrade/downgrade
# without error, so we need to create symlink for each script on the
# directory
install -m755 -d "$rusr"/lib/scylla/scyllatop/views
for i in $(find "$rprefix"/scripts/ -maxdepth 1 -type f); do
ln -srf $i "$rusr"/lib/scylla/
done
for i in $(find "$rprefix"/scyllatop/ -maxdepth 1 -type f); do
ln -srf $i "$rusr"/lib/scylla/scyllatop
done
for i in $(find "$rprefix"/scyllatop/views -maxdepth 1 -type f); do
ln -srf $i "$rusr"/lib/scylla/scyllatop/views
done
else
install -m755 -d "$rdata"/saved_caches
install -d -m755 "$retc"/systemd/system/scylla-server.service.d

@@ -53,13 +53,13 @@ std::vector<inet_address> simple_strategy::calculate_natural_endpoints(const tok
endpoints.reserve(replicas);
for (auto& token : tm.ring_range(t)) {
if (endpoints.size() == replicas) {
break;
}
auto ep = tm.get_endpoint(token);
assert(ep);
endpoints.push_back(*ep);
if (endpoints.size() == replicas) {
break;
}
}
return std::move(endpoints.get_vector());

main.cc

@@ -54,6 +54,7 @@
#include <seastar/core/file.hh>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/prctl.h>
#include "disk-error-handler.hh"
#include "tracing/tracing.hh"
#include "tracing/tracing_backend_registry.hh"
@@ -464,6 +465,15 @@ inline auto defer_with_log_on_error(Func&& func) {
}
int main(int ac, char** av) {
// Allow core dumps. The would be disabled by default if
// CAP_SYS_NICE was added to the binary, as is suggested by the
// epoll backend.
int r = prctl(PR_SET_DUMPABLE, 1, 0, 0, 0);
if (r) {
std::cerr << "Could not make scylla dumpable\n";
exit(1);
}
int return_value = 0;
try {
// early check to avoid triggering

@@ -39,6 +39,9 @@
#include <seastar/core/execution_stage.hh>
#include "types/map.hh"
#include "compaction_garbage_collector.hh"
#include "utils/exceptions.hh"
logging::logger mplog("mutation_partition");
template<bool reversed>
struct reversal_traits;
@@ -1236,7 +1239,9 @@ row::apply_monotonically(const column_definition& column, atomic_cell_or_collect
void
row::append_cell(column_id id, atomic_cell_or_collection value) {
if (_type == storage_type::vector && id < max_vector_size) {
assert(_storage.vector.v.size() <= id);
if (_storage.vector.v.size() > id) {
on_internal_error(mplog, format("Attempted to append cell#{} to row already having {} cells", id, _storage.vector.v.size()));
}
_storage.vector.v.resize(id);
_storage.vector.v.emplace_back(cell_and_hash{std::move(value), cell_hash_opt()});
_storage.vector.present.set(id);
@@ -1876,7 +1881,7 @@ bool row_marker::compact_and_expire(tombstone tomb, gc_clock::time_point now,
_timestamp = api::missing_timestamp;
return false;
}
if (_ttl > no_ttl && _expiry < now) {
if (_ttl > no_ttl && _expiry <= now) {
_expiry -= _ttl;
_ttl = dead;
}

@@ -679,7 +679,7 @@ public:
if (is_missing() || _ttl == dead) {
return false;
}
if (_ttl != no_ttl && _expiry < now) {
if (_ttl != no_ttl && _expiry <= now) {
return false;
}
return _timestamp > t.timestamp;
@@ -689,7 +689,7 @@ public:
if (_ttl == dead) {
return true;
}
return _ttl != no_ttl && _expiry < now;
return _ttl != no_ttl && _expiry <= now;
}
// Can be called only when is_live().
bool is_expiring() const {

@@ -177,6 +177,13 @@ future<> multishard_writer::distribute_mutation_fragments() {
return handle_end_of_stream();
}
});
}).handle_exception([this] (std::exception_ptr ep) {
for (auto& q : _queue_reader_handles) {
if (q) {
q->abort(ep);
}
}
return make_exception_future<>(std::move(ep));
});
}

@@ -321,11 +321,7 @@ struct get_sync_boundary_response {
};
// Return value of the REPAIR_GET_COMBINED_ROW_HASH RPC verb
struct get_combined_row_hash_response {
repair_hash working_row_buf_combined_csum;
// The number of rows in the working row buf
uint64_t working_row_buf_nr;
};
using get_combined_row_hash_response = repair_hash;
struct node_repair_meta_id {
gms::inet_address ip;

@@ -444,10 +444,14 @@ class repair_writer {
uint64_t _estimated_partitions;
size_t _nr_peer_nodes;
// Needs more than one for repair master
std::vector<std::optional<future<uint64_t>>> _writer_done;
std::vector<std::optional<future<>>> _writer_done;
std::vector<std::optional<seastar::queue<mutation_fragment_opt>>> _mq;
// Current partition written to disk
std::vector<lw_shared_ptr<const decorated_key_with_hash>> _current_dk_written_to_sstable;
// Is current partition still open. A partition is opened when a
// partition_start is written and is closed when a partition_end is
// written.
std::vector<bool> _partition_opened;
public:
repair_writer(
schema_ptr schema,
@@ -462,10 +466,13 @@ public:
future<> write_start_and_mf(lw_shared_ptr<const decorated_key_with_hash> dk, mutation_fragment mf, unsigned node_idx) {
_current_dk_written_to_sstable[node_idx] = dk;
if (mf.is_partition_start()) {
return _mq[node_idx]->push_eventually(mutation_fragment_opt(std::move(mf)));
return _mq[node_idx]->push_eventually(mutation_fragment_opt(std::move(mf))).then([this, node_idx] {
_partition_opened[node_idx] = true;
});
} else {
auto start = mutation_fragment(partition_start(dk->dk, tombstone()));
return _mq[node_idx]->push_eventually(mutation_fragment_opt(std::move(start))).then([this, node_idx, mf = std::move(mf)] () mutable {
_partition_opened[node_idx] = true;
return _mq[node_idx]->push_eventually(mutation_fragment_opt(std::move(mf)));
});
}
@@ -475,6 +482,7 @@ public:
_writer_done.resize(_nr_peer_nodes);
_mq.resize(_nr_peer_nodes);
_current_dk_written_to_sstable.resize(_nr_peer_nodes);
_partition_opened.resize(_nr_peer_nodes, false);
}
void create_writer(unsigned node_idx) {
@@ -516,7 +524,24 @@ public:
return consumer(std::move(reader));
});
},
t.stream_in_progress());
t.stream_in_progress()).then([this, node_idx] (uint64_t partitions) {
rlogger.debug("repair_writer: keyspace={}, table={}, managed to write partitions={} to sstable",
_schema->ks_name(), _schema->cf_name(), partitions);
}).handle_exception([this, node_idx] (std::exception_ptr ep) {
rlogger.warn("repair_writer: keyspace={}, table={}, multishard_writer failed: {}",
_schema->ks_name(), _schema->cf_name(), ep);
_mq[node_idx]->abort(ep);
return make_exception_future<>(std::move(ep));
});
}
future<> write_partition_end(unsigned node_idx) {
if (_partition_opened[node_idx]) {
return _mq[node_idx]->push_eventually(mutation_fragment(partition_end())).then([this, node_idx] {
_partition_opened[node_idx] = false;
});
}
return make_ready_future<>();
}
future<> do_write(unsigned node_idx, lw_shared_ptr<const decorated_key_with_hash> dk, mutation_fragment mf) {
@@ -524,7 +549,7 @@ public:
if (_current_dk_written_to_sstable[node_idx]->dk.equal(*_schema, dk->dk)) {
return _mq[node_idx]->push_eventually(mutation_fragment_opt(std::move(mf)));
} else {
return _mq[node_idx]->push_eventually(mutation_fragment(partition_end())).then([this,
return write_partition_end(node_idx).then([this,
node_idx, dk = std::move(dk), mf = std::move(mf)] () mutable {
return write_start_and_mf(std::move(dk), std::move(mf), node_idx);
});
@@ -534,21 +559,33 @@ public:
}
}
future<> write_end_of_stream(unsigned node_idx) {
if (_mq[node_idx]) {
// Partition_end is never sent on wire, so we have to write one ourselves.
return write_partition_end(node_idx).then([this, node_idx] () mutable {
// Empty mutation_fragment_opt means no more data, so the writer can seal the sstables.
return _mq[node_idx]->push_eventually(mutation_fragment_opt());
});
} else {
return make_ready_future<>();
}
}
future<> do_wait_for_writer_done(unsigned node_idx) {
if (_writer_done[node_idx]) {
return std::move(*(_writer_done[node_idx]));
} else {
return make_ready_future<>();
}
}
future<> wait_for_writer_done() {
return parallel_for_each(boost::irange(unsigned(0), unsigned(_nr_peer_nodes)), [this] (unsigned node_idx) {
if (_writer_done[node_idx] && _mq[node_idx]) {
// Partition_end is never sent on wire, so we have to write one ourselves.
return _mq[node_idx]->push_eventually(mutation_fragment(partition_end())).then([this, node_idx] () mutable {
// Empty mutation_fragment_opt means no more data, so the writer can seal the sstables.
return _mq[node_idx]->push_eventually(mutation_fragment_opt()).then([this, node_idx] () mutable {
return (*_writer_done[node_idx]).then([] (uint64_t partitions) {
rlogger.debug("Managed to write partitions={} to sstable", partitions);
return make_ready_future<>();
});
});
});
}
return make_ready_future<>();
return when_all_succeed(write_end_of_stream(node_idx), do_wait_for_writer_done(node_idx));
}).handle_exception([this] (std::exception_ptr ep) {
rlogger.warn("repair_writer: keyspace={}, table={}, wait_for_writer_done failed: {}",
_schema->ks_name(), _schema->cf_name(), ep);
return make_exception_future<>(std::move(ep));
});
}
};
@@ -1098,14 +1135,14 @@ private:
_working_row_buf_combined_hash.clear();
if (_row_buf.empty()) {
return make_ready_future<get_combined_row_hash_response>(get_combined_row_hash_response{repair_hash(), 0});
return make_ready_future<get_combined_row_hash_response>(get_combined_row_hash_response());
}
return move_row_buf_to_working_row_buf().then([this] {
return do_for_each(_working_row_buf, [this] (repair_row& r) {
_working_row_buf_combined_hash.add(r.hash());
return make_ready_future<>();
}).then([this] {
return get_combined_row_hash_response{_working_row_buf_combined_hash, _working_row_buf.size()};
return get_combined_row_hash_response{_working_row_buf_combined_hash};
});
});
}
@@ -1352,7 +1389,9 @@ public:
auto source_op = get_full_row_hashes_source_op(current_hashes, remote_node, node_idx, source);
auto sink_op = get_full_row_hashes_sink_op(sink);
return when_all_succeed(std::move(source_op), std::move(sink_op));
}).then([current_hashes] () mutable {
}).then([this, current_hashes] () mutable {
stats().rx_hashes_nr += current_hashes->size();
_metrics.rx_hashes_nr += current_hashes->size();
return std::move(*current_hashes);
});
}
@@ -1763,6 +1802,7 @@ static future<stop_iteration> repair_get_row_diff_with_rpc_stream_process_op(
return make_exception_future<stop_iteration>(std::runtime_error("get_row_diff_with_rpc_stream: Inject error in handler loop"));
}
bool needs_all_rows = hash_cmd.cmd == repair_stream_cmd::needs_all_rows;
_metrics.rx_hashes_nr += current_set_diff.size();
auto fp = make_foreign(std::make_unique<std::unordered_set<repair_hash>>(std::move(current_set_diff)));
return smp::submit_to(src_cpu_id % smp::count, [from, repair_meta_id, needs_all_rows, fp = std::move(fp)] {
auto rm = repair_meta::get_repair_meta(from, repair_meta_id);
@@ -2067,6 +2107,7 @@ future<> repair_init_messaging_service_handler(repair_service& rs, distributed<d
std::unordered_set<repair_hash> set_diff, bool needs_all_rows) {
auto src_cpu_id = cinfo.retrieve_auxiliary<uint32_t>("src_cpu_id");
auto from = cinfo.retrieve_auxiliary<gms::inet_address>("baddr");
_metrics.rx_hashes_nr += set_diff.size();
auto fp = make_foreign(std::make_unique<std::unordered_set<repair_hash>>(std::move(set_diff)));
return smp::submit_to(src_cpu_id % smp::count, [from, repair_meta_id, fp = std::move(fp), needs_all_rows] () mutable {
auto rm = repair_meta::get_repair_meta(from, repair_meta_id);
@@ -2170,7 +2211,7 @@ class row_level_repair {
// If the total size of the `_row_buf` on either of the nodes is zero,
// we set this flag, which is an indication that rows are not synced.
bool _zero_rows;
bool _zero_rows = false;
// Sum of estimated_partitions on all peers
uint64_t _estimated_partitions = 0;
@@ -2292,8 +2333,8 @@ private:
// are identical, there is no need to transfer each and every
// row hashes to the repair master.
return master.get_combined_row_hash(_common_sync_boundary, _all_nodes[idx]).then([&, this, idx] (get_combined_row_hash_response resp) {
rlogger.debug("Calling master.get_combined_row_hash for node {}, got combined_hash={}, rows_nr={}", _all_nodes[idx], resp.working_row_buf_combined_csum, resp.working_row_buf_nr);
combined_hashes[idx]= std::move(resp.working_row_buf_combined_csum);
rlogger.debug("Calling master.get_combined_row_hash for node {}, got combined_hash={}", _all_nodes[idx], resp);
combined_hashes[idx]= std::move(resp);
});
}).get();

@@ -931,7 +931,6 @@ future<> row_cache::do_update(external_updater eu, memtable& m, Updater updater)
});
return seastar::async([this, &m, updater = std::move(updater), real_dirty_acc = std::move(real_dirty_acc)] () mutable {
coroutine update;
size_t size_entry;
// In case updater fails, we must bring the cache to consistency without deferring.
auto cleanup = defer([&m, this] {
@@ -939,6 +938,7 @@ future<> row_cache::do_update(external_updater eu, memtable& m, Updater updater)
_prev_snapshot_pos = {};
_prev_snapshot = {};
});
coroutine update; // Destroy before cleanup to release snapshots before invalidating.
partition_presence_checker is_present = _prev_snapshot->make_partition_presence_checker();
while (!m.partitions.empty()) {
with_allocator(_tracker.allocator(), [&] () {

@@ -288,10 +288,10 @@ schema::schema(const raw_schema& raw, std::optional<raw_view_info> raw_view_info
+ column_offset(column_kind::regular_column),
_raw._columns.end(), column_definition::name_comparator(regular_column_name_type()));
std::sort(_raw._columns.begin(),
std::stable_sort(_raw._columns.begin(),
_raw._columns.begin() + column_offset(column_kind::clustering_key),
[] (auto x, auto y) { return x.id < y.id; });
std::sort(_raw._columns.begin() + column_offset(column_kind::clustering_key),
std::stable_sort(_raw._columns.begin() + column_offset(column_kind::clustering_key),
_raw._columns.begin() + column_offset(column_kind::static_column),
[] (auto x, auto y) { return x.id < y.id; });

@@ -109,7 +109,10 @@ std::optional<std::map<sstring, sstring>> schema_mutations::cdc_options() const
if (_scylla_tables) {
auto rs = query::result_set(*_scylla_tables);
if (!rs.empty()) {
return db::schema_tables::get_map<sstring, sstring>(rs.row(0), "cdc");
auto map = db::schema_tables::get_map<sstring, sstring>(rs.row(0), "cdc");
if (map && !map->empty()) {
return map;
}
}
}
return { };

@@ -58,7 +58,8 @@ EOS
# For systems with not a lot of memory, override default reservations for the slices
# seastar has a minimum reservation of 1.5GB that kicks in, and 21GB * 0.07 = 1.5GB.
# So for anything smaller than that we will not use percentages in the helper slice
MEMTOTAL_BYTES=$(cat /proc/meminfo | grep MemTotal | awk '{print $2 * 1024}')
MEMTOTAL=$(cat /proc/meminfo |grep -e "^MemTotal:"|sed -s 's/^MemTotal:\s*\([0-9]*\) kB$/\1/')
MEMTOTAL_BYTES=$(($MEMTOTAL * 1024))
if [ $MEMTOTAL_BYTES -lt 23008753371 ]; then
mkdir -p /etc/systemd/system/scylla-helper.slice.d/
cat << EOS > /etc/systemd/system/scylla-helper.slice.d/memory.conf

Submodule seastar updated: 6f0ef32514...c8668e98bd

@@ -93,6 +93,7 @@ void migration_manager::init_messaging_service()
_feature_listeners.push_back(ss.cluster_supports_view_virtual_columns().when_enabled(update_schema));
_feature_listeners.push_back(ss.cluster_supports_digest_insensitive_to_expiry().when_enabled(update_schema));
_feature_listeners.push_back(ss.cluster_supports_cdc().when_enabled(update_schema));
auto& ms = netw::get_local_messaging_service();
ms.register_definitions_update([this] (const rpc::client_info& cinfo, std::vector<frozen_mutation> fm, rpc::optional<std::vector<canonical_mutation>> cm) {
@@ -311,7 +312,8 @@ future<> migration_manager::merge_schema_from(netw::messaging_service::msg_addr
try {
for (const auto& cm : canonical_mutations) {
auto& tbl = db.find_column_family(cm.column_family_id());
mutations.emplace_back(cm.to_mutation(tbl.schema()));
mutations.emplace_back(cm.to_mutation(
tbl.schema()));
}
} catch (no_such_column_family& e) {
mlogger.error("Error while applying schema mutations from {}: {}", src, e);
@@ -902,8 +904,9 @@ future<> migration_manager::announce(std::vector<mutation> mutations, bool annou
future<> migration_manager::push_schema_mutation(const gms::inet_address& endpoint, const std::vector<mutation>& schema)
{
netw::messaging_service::msg_addr id{endpoint, 0};
-auto fm = std::vector<frozen_mutation>(schema.begin(), schema.end());
-auto cm = std::vector<canonical_mutation>(schema.begin(), schema.end());
+auto adjusted_schema = db::schema_tables::adjust_schema_for_schema_features(schema, get_local_storage_service().cluster_schema_features());
+auto fm = std::vector<frozen_mutation>(adjusted_schema.begin(), adjusted_schema.end());
+auto cm = std::vector<canonical_mutation>(adjusted_schema.begin(), adjusted_schema.end());
return netw::get_local_messaging_service().send_definitions_update(id, std::move(fm), std::move(cm));
}


@@ -38,7 +38,12 @@ private:
public:
query_state(client_state& client_state, service_permit permit)
: _client_state(client_state)
-, _trace_state_ptr(_client_state.get_trace_state())
+, _trace_state_ptr(tracing::trace_state_ptr())
, _permit(std::move(permit))
{ }
query_state(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit)
: _client_state(client_state)
, _trace_state_ptr(std::move(trace_state))
, _permit(std::move(permit))
{ }


@@ -706,7 +706,9 @@ static future<std::optional<utils::UUID>> sleep_and_restart() {
* nodes have seen the most recent commit. Otherwise, return null.
*/
future<utils::UUID> paxos_response_handler::begin_and_repair_paxos(client_state& cs, unsigned& contentions, bool is_write) {
-_proxy->get_db().local().get_config().check_experimental("Paxos");
+if (!_proxy->get_db().local().get_config().check_experimental(db::experimental_features_t::LWT)) {
+throw std::runtime_error("Paxos is currently disabled. Start Scylla with --experimental-features=lwt to enable.");
+}
return do_with(api::timestamp_type(0), shared_from_this(), [this, &cs, &contentions, is_write]
(api::timestamp_type& min_timestamp_micros_to_use, shared_ptr<paxos_response_handler>& prh) {
return repeat_until_value([this, &contentions, &cs, &min_timestamp_micros_to_use, is_write] {
@@ -1883,8 +1885,9 @@ storage_proxy::get_paxos_participants(const sstring& ks_name, const dht::token &
});
pending_endpoints.erase(itend, pending_endpoints.end());
-size_t participants = pending_endpoints.size() + natural_endpoints.size();
-size_t required_participants = db::quorum_for(ks) + pending_endpoints.size();
+const size_t participants = pending_endpoints.size() + natural_endpoints.size();
+const size_t quorum_size = natural_endpoints.size() / 2 + 1;
+const size_t required_participants = quorum_size + pending_endpoints.size();
std::vector<gms::inet_address> live_endpoints;
live_endpoints.reserve(participants);
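The replacement above computes the Paxos quorum as a simple majority of the natural endpoints, then adds every pending endpoint, instead of calling `db::quorum_for(ks)`. A standalone sketch of that arithmetic (struct and function names are illustrative, not Scylla's):

```cpp
#include <cassert>
#include <cstddef>

struct paxos_counts {
    std::size_t participants;
    std::size_t required_participants;
};

// Majority of the natural replicas, plus every pending endpoint,
// mirroring the quorum_size computation in the diff above.
paxos_counts paxos_participants(std::size_t natural, std::size_t pending) {
    const std::size_t quorum_size = natural / 2 + 1;
    return {natural + pending, quorum_size + pending};
}
```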
@@ -2180,6 +2183,14 @@ future<> storage_proxy::send_to_endpoint(
allow_hints);
}
future<> storage_proxy::mutate_hint_from_scratch(frozen_mutation_and_schema fm_a_s) {
// FIXME: using 1h as infinite timeout. If a node is down, we should get an
// unavailable exception.
const auto timeout = db::timeout_clock::now() + 1h;
std::array<mutation, 1> ms{fm_a_s.fm.unfreeze(fm_a_s.s)};
return mutate_internal(std::move(ms), db::consistency_level::ALL, false, nullptr, empty_service_permit(), timeout);
}
/**
* Send the mutations to the right targets, write it locally if it corresponds or writes a hint when the node
* is not available.
@@ -3932,7 +3943,7 @@ storage_proxy::do_query_with_paxos(schema_ptr s,
return make_ready_future<storage_proxy::coordinator_query_result>(f.get0());
} catch (request_timeout_exception& ex) {
_stats.cas_read_timeouts.mark();
-return make_exception_future<storage_proxy::coordinator_query_result>(std::move(ex));
+return make_exception_future<storage_proxy::coordinator_query_result>(std::current_exception());
} catch (exceptions::unavailable_exception& ex) {
_stats.cas_read_unavailables.mark();
return make_exception_future<storage_proxy::coordinator_query_result>(std::move(ex));
@@ -4059,7 +4070,7 @@ future<bool> storage_proxy::cas(schema_ptr schema, shared_ptr<cas_request> reque
return make_ready_future<bool>(f.get0());
} catch (request_timeout_exception& ex) {
_stats.cas_write_timeouts.mark();
-return make_exception_future<bool>(std::move(ex));
+return make_exception_future<bool>(std::current_exception());
} catch (exceptions::unavailable_exception& ex) {
_stats.cas_write_unavailables.mark();
return make_exception_future<bool>(std::move(ex));
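The two fixes above replace `make_exception_future(std::move(ex))` with `std::current_exception()`. A plausible motivation, sketched with the standard library only: moving an exception caught by base-class reference slices it to the handler's static type, while `std::current_exception()` preserves the original dynamic type. The `timeout_error` type here is a stand-in, not Scylla's exception hierarchy:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

struct timeout_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

std::exception_ptr capture_by_move() {
    try {
        throw timeout_error("timed out");
    } catch (std::runtime_error& ex) {
        // Copies/moves into a plain runtime_error: the derived type is sliced off.
        return std::make_exception_ptr(std::move(ex));
    }
}

std::exception_ptr capture_current() {
    try {
        throw timeout_error("timed out");
    } catch (std::runtime_error&) {
        // Keeps the in-flight exception object, timeout_error and all.
        return std::current_exception();
    }
}

bool is_timeout(std::exception_ptr p) {
    try {
        std::rethrow_exception(p);
    } catch (timeout_error&) {
        return true;
    } catch (...) {
        return false;
    }
}
```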


@@ -459,6 +459,8 @@ public:
*/
future<> mutate_atomically(std::vector<mutation> mutations, db::consistency_level cl, clock_type::time_point timeout, tracing::trace_state_ptr tr_state, service_permit permit);
future<> mutate_hint_from_scratch(frozen_mutation_and_schema fm_a_s);
// Send a mutation to one specific remote target.
// Inspired by Cassandra's StorageProxy.sendToHintedEndpoints but without
// hinted handoff support, and just one target. See also


@@ -344,12 +344,11 @@ std::set<sstring> storage_service::get_config_supported_features_set() {
// This should only be true in tests (see cql_test_env.cc:storage_service_for_tests)
auto& db = service::get_local_storage_service().db();
if (db.local_is_initialized()) {
-auto& config = service::get_local_storage_service().db().local().get_config();
+auto& config = db.local().get_config();
if (config.enable_sstables_mc_format()) {
features.insert(MC_SSTABLE_FEATURE);
}
-if (config.experimental()) {
-// push additional experimental features
+if (config.check_experimental(db::experimental_features_t::CDC)) {
features.insert(CDC_FEATURE);
}
}
@@ -1441,7 +1440,8 @@ future<> storage_service::drain_on_shutdown() {
ss._sys_dist_ks.invoke_on_all(&db::system_distributed_keyspace::stop).get();
slogger.info("Drain on shutdown: system distributed keyspace stopped");
-get_storage_proxy().invoke_on_all([&ss] (storage_proxy& local_proxy) mutable {
+get_storage_proxy().invoke_on_all([] (storage_proxy& local_proxy) mutable {
+auto& ss = service::get_local_storage_service();
ss.unregister_subscriber(&local_proxy);
return local_proxy.drain_on_shutdown();
}).get();
@@ -3533,6 +3533,7 @@ db::schema_features storage_service::cluster_schema_features() const {
f.set_if<db::schema_feature::VIEW_VIRTUAL_COLUMNS>(bool(_view_virtual_columns));
f.set_if<db::schema_feature::DIGEST_INSENSITIVE_TO_EXPIRY>(bool(_digest_insensitive_to_expiry));
f.set_if<db::schema_feature::COMPUTED_COLUMNS>(bool(_computed_columns));
f.set_if<db::schema_feature::CDC_OPTIONS>(bool(_cdc_feature));
return f;
}


@@ -2341,8 +2341,8 @@ public:
return bool(_mc_sstable_feature);
}
-bool cluster_supports_cdc() const {
-return bool(_cdc_feature);
+const gms::feature& cluster_supports_cdc() const {
+return _cdc_feature;
}
bool cluster_supports_row_level_repair() const {
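The signature change above returns the feature object itself rather than a `bool` snapshot, so callers can both test the feature and register `when_enabled()` listeners on it, as the `migration_manager` diff earlier does for CDC. A toy analog (names illustrative, not Scylla's `gms::feature` API):

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct feature {
    bool on = false;
    // Listeners may be attached through a const reference, so the list is mutable.
    mutable std::vector<std::function<void()>> listeners;

    explicit operator bool() const { return on; }
    void when_enabled(std::function<void()> f) const { listeners.push_back(std::move(f)); }
    void enable() { on = true; for (auto& f : listeners) f(); }
};

struct cluster_features {
    feature cdc;
    // Before: bool cluster_supports_cdc() const { return bool(_cdc_feature); }
    const feature& cluster_supports_cdc() const { return cdc; }
};
```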


@@ -72,47 +72,8 @@ private:
static std::vector<column_info> build(
const schema& s,
const utils::chunked_vector<serialization_header::column_desc>& src,
-bool is_static) {
-std::vector<column_info> cols;
-if (s.is_dense()) {
-const column_definition& col = is_static ? *s.static_begin() : *s.regular_begin();
-cols.push_back(column_info{
-&col.name(),
-col.type,
-col.id,
-col.type->value_length_if_fixed(),
-col.is_multi_cell(),
-col.is_counter(),
-false
-});
-} else {
-cols.reserve(src.size());
-for (auto&& desc : src) {
-const bytes& type_name = desc.type_name.value;
-data_type type = db::marshal::type_parser::parse(to_sstring_view(type_name));
-const column_definition* def = s.get_column_definition(desc.name.value);
-std::optional<column_id> id;
-bool schema_mismatch = false;
-if (def) {
-id = def->id;
-schema_mismatch = def->is_multi_cell() != type->is_multi_cell() ||
-def->is_counter() != type->is_counter() ||
-!def->type->is_value_compatible_with(*type);
-}
-cols.push_back(column_info{
-&desc.name.value,
-type,
-id,
-type->value_length_if_fixed(),
-type->is_multi_cell(),
-type->is_counter(),
-schema_mismatch
-});
-}
-boost::range::stable_partition(cols, [](const column_info& column) { return !column.is_collection; });
-}
-return cols;
-}
+const sstable_enabled_features& features,
+bool is_static);
utils::UUID schema_uuid;
std::vector<column_info> regular_schema_columns_from_sstable;
@@ -125,10 +86,10 @@ private:
state(state&&) = default;
state& operator=(state&&) = default;
-state(const schema& s, const serialization_header& header)
+state(const schema& s, const serialization_header& header, const sstable_enabled_features& features)
: schema_uuid(s.version())
-, regular_schema_columns_from_sstable(build(s, header.regular_columns.elements, false))
-, static_schema_columns_from_sstable(build(s, header.static_columns.elements, true))
+, regular_schema_columns_from_sstable(build(s, header.regular_columns.elements, features, false))
+, static_schema_columns_from_sstable(build(s, header.static_columns.elements, features, true))
, clustering_column_value_fix_lengths (get_clustering_values_fixed_lengths(header))
{}
};
@@ -136,9 +97,10 @@ private:
lw_shared_ptr<const state> _state = make_lw_shared<const state>();
public:
-column_translation get_for_schema(const schema& s, const serialization_header& header) {
+column_translation get_for_schema(
+const schema& s, const serialization_header& header, const sstable_enabled_features& features) {
if (s.version() != _state->schema_uuid) {
-_state = make_lw_shared(state(s, header));
+_state = make_lw_shared(state(s, header, features));
}
return *this;
}


@@ -38,6 +38,8 @@
*/
#include "mp_row_consumer.hh"
#include "column_translation.hh"
#include "concrete_types.hh"
namespace sstables {
@@ -79,4 +81,86 @@ atomic_cell make_counter_cell(api::timestamp_type timestamp, bytes_view value) {
return ccb.build(timestamp);
}
// See #6130.
static data_type freeze_types_in_collections(data_type t) {
return ::visit(*t, make_visitor(
[] (const map_type_impl& typ) -> data_type {
return map_type_impl::get_instance(
freeze_types_in_collections(typ.get_keys_type()->freeze()),
freeze_types_in_collections(typ.get_values_type()->freeze()),
typ.is_multi_cell());
},
[] (const set_type_impl& typ) -> data_type {
return set_type_impl::get_instance(
freeze_types_in_collections(typ.get_elements_type()->freeze()),
typ.is_multi_cell());
},
[] (const list_type_impl& typ) -> data_type {
return list_type_impl::get_instance(
freeze_types_in_collections(typ.get_elements_type()->freeze()),
typ.is_multi_cell());
},
[&] (const abstract_type& typ) -> data_type {
return std::move(t);
}
));
}
/* If this function returns false, the caller cannot assume that the SSTable comes from Scylla.
* It might, if for some reason a table was created using Scylla that didn't contain any feature bit,
* but that should never happen. */
static bool is_certainly_scylla_sstable(const sstable_enabled_features& features) {
return features.enabled_features;
}
std::vector<column_translation::column_info> column_translation::state::build(
const schema& s,
const utils::chunked_vector<serialization_header::column_desc>& src,
const sstable_enabled_features& features,
bool is_static) {
std::vector<column_info> cols;
if (s.is_dense()) {
const column_definition& col = is_static ? *s.static_begin() : *s.regular_begin();
cols.push_back(column_info{
&col.name(),
col.type,
col.id,
col.type->value_length_if_fixed(),
col.is_multi_cell(),
col.is_counter(),
false
});
} else {
cols.reserve(src.size());
for (auto&& desc : src) {
const bytes& type_name = desc.type_name.value;
data_type type = db::marshal::type_parser::parse(to_sstring_view(type_name));
if (!features.is_enabled(CorrectUDTsInCollections) && is_certainly_scylla_sstable(features)) {
// See #6130.
type = freeze_types_in_collections(std::move(type));
}
const column_definition* def = s.get_column_definition(desc.name.value);
std::optional<column_id> id;
bool schema_mismatch = false;
if (def) {
id = def->id;
schema_mismatch = def->is_multi_cell() != type->is_multi_cell() ||
def->is_counter() != type->is_counter() ||
!def->type->is_value_compatible_with(*type);
}
cols.push_back(column_info{
&desc.name.value,
type,
id,
type->value_length_if_fixed(),
type->is_multi_cell(),
type->is_counter(),
schema_mismatch
});
}
boost::range::stable_partition(cols, [](const column_info& column) { return !column.is_collection; });
}
return cols;
}
}
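`freeze_types_in_collections()` above recursively freezes every *inner* type of a collection while preserving the outermost type's multi-cell flag. A minimal stand-in with a generic type tree (not Scylla's type system) showing the same recursion shape:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a (possibly nested) collection type: multi_cell says
// whether the type is unfrozen; params are the element/key/value types.
struct type {
    bool multi_cell = false;
    std::vector<std::shared_ptr<type>> params;
};

// Analog of abstract_type::freeze(): a copy with multi_cell cleared.
std::shared_ptr<type> freeze(const std::shared_ptr<type>& t) {
    auto copy = std::make_shared<type>(*t);
    copy->multi_cell = false;
    return copy;
}

// Keep the outermost multi_cell flag, freeze everything nested inside,
// mirroring the recursion in freeze_types_in_collections() above.
std::shared_ptr<type> freeze_types_in_params(const std::shared_ptr<type>& t) {
    auto copy = std::make_shared<type>(*t);
    for (auto& p : copy->params) {
        p = freeze_types_in_params(freeze(p));
    }
    return copy;
}
```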


@@ -1344,7 +1344,7 @@ public:
, _consumer(consumer)
, _sst(sst)
, _header(sst->get_serialization_header())
-, _column_translation(sst->get_column_translation(s, _header))
+, _column_translation(sst->get_column_translation(s, _header, sst->features()))
, _has_shadowable_tombstones(sst->has_shadowable_tombstones())
{
setup_columns(_regular_row, _column_translation.regular_columns());


@@ -2699,7 +2699,7 @@ entry_descriptor entry_descriptor::make_descriptor(sstring sstdir, sstring fname
static std::regex la_mc("(la|mc)-(\\d+)-(\\w+)-(.*)");
static std::regex ka("(\\w+)-(\\w+)-ka-(\\d+)-(.*)");
-static std::regex dir(".*/([^/]*)/(\\w+)-[\\da-fA-F]+(?:/staging|/upload|/snapshots/[^/]+)?/?");
+static std::regex dir(".*/([^/]*)/([^/]+)-[\\da-fA-F]+(?:/staging|/upload|/snapshots/[^/]+)?/?");
std::smatch match;
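The regex fix above widens the table-directory group from `(\w+)` to `([^/]+)`: `\w` stops at any non-word character, while `[^/]+` accepts any directory name up to the next path separator. A minimal check of the two character classes:

```cpp
#include <cassert>
#include <regex>
#include <string>

// true iff the whole string matches the ECMAScript pattern.
bool full_match(const std::string& pattern, const std::string& s) {
    return std::regex_match(s, std::regex(pattern));
}
```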


@@ -780,8 +780,9 @@ public:
const serialization_header& get_serialization_header() const {
return get_mutable_serialization_header(*_components);
}
-column_translation get_column_translation(const schema& s, const serialization_header& h) {
-return _column_translation.get_for_schema(s, h);
+column_translation get_column_translation(
+const schema& s, const serialization_header& h, const sstable_enabled_features& f) {
+return _column_translation.get_for_schema(s, h, f);
}
const std::vector<unsigned>& get_shards_for_this_sstable() const {
return _shards;


@@ -459,7 +459,8 @@ enum sstable_feature : uint8_t {
ShadowableTombstones = 2, // See #3885
CorrectStaticCompact = 3, // See #4139
CorrectEmptyCounters = 4, // See #4363
-End = 5,
+CorrectUDTsInCollections = 5, // See #6130
+End = 6,
};
// Scylla-specific features enabled for a particular sstable.
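The enum above assigns each Scylla-specific sstable feature a bit position, with `End` marking one past the last bit. A presumed sketch of how such a feature bitmap is tested (field and function names are illustrative, not necessarily Scylla's exact definitions):

```cpp
#include <cassert>
#include <cstdint>

struct sstable_enabled_features {
    uint64_t enabled_features = 0;

    // Test one feature bit in the bitmap.
    bool is_enabled(unsigned bit) const {
        return enabled_features & (uint64_t(1) << bit);
    }
};

// Mirrors is_certainly_scylla_sstable() in the earlier diff: any
// feature bit set implies the sstable was written by Scylla.
bool is_certainly_scylla_sstable(const sstable_enabled_features& f) {
    return f.enabled_features != 0;
}
```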


@@ -292,7 +292,7 @@ create_single_key_sstable_reader(column_family* cf,
filter_sstable_for_reader(sstables->select(pr), *cf, schema, pr, key, slice)
| boost::adaptors::transformed([&] (const sstables::shared_sstable& sstable) {
tracing::trace(trace_state, "Reading key {} from sstable {}", pr, seastar::value_of([&sstable] { return sstable->get_filename(); }));
-return sstable->read_row_flat(schema, pr.start()->value(), slice, pc, resource_tracker, std::move(trace_state), fwd);
+return sstable->read_row_flat(schema, pr.start()->value(), slice, pc, resource_tracker, trace_state, fwd);
})
);
if (readers.empty()) {
@@ -315,7 +315,7 @@ flat_mutation_reader make_range_sstable_reader(schema_ptr s,
{
auto reader_factory_fn = [s, &slice, &pc, resource_tracker, trace_state, fwd, fwd_mr, &monitor_generator]
(sstables::shared_sstable& sst, const dht::partition_range& pr) mutable {
-return sst->read_range_rows_flat(s, pr, slice, pc, resource_tracker, std::move(trace_state), fwd, fwd_mr, monitor_generator(sst));
+return sst->read_range_rows_flat(s, pr, slice, pc, resource_tracker, trace_state, fwd, fwd_mr, monitor_generator(sst));
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
@@ -587,7 +587,7 @@ flat_mutation_reader make_local_shard_sstable_reader(schema_ptr s,
auto reader_factory_fn = [s, &slice, &pc, resource_tracker, trace_state, fwd, fwd_mr, &monitor_generator]
(sstables::shared_sstable& sst, const dht::partition_range& pr) mutable {
flat_mutation_reader reader = sst->read_range_rows_flat(s, pr, slice, pc,
-resource_tracker, std::move(trace_state), fwd, fwd_mr, monitor_generator(sst));
+resource_tracker, trace_state, fwd, fwd_mr, monitor_generator(sst));
if (sst->is_shared()) {
using sig = bool (&)(const dht::decorated_key&);
reader = make_filtering_reader(std::move(reader), sig(belongs_to_current_shard));
@@ -2543,7 +2543,7 @@ future<row_locker::lock_holder> table::do_push_view_replica_updates(const schema
std::move(slice),
std::move(m),
[base, views = std::move(views), lock = std::move(lock), this, timeout, source = std::move(source), &io_priority] (auto& pk, auto& slice, auto& m) mutable {
-auto reader = source.make_reader(base, pk, slice, io_priority);
+auto reader = source.make_reader(base, pk, slice, io_priority, nullptr, streamed_mutation::forwarding::no, mutation_reader::forwarding::no);
return this->generate_and_propagate_view_updates(base, std::move(views), std::move(m), std::move(reader)).then([lock = std::move(lock)] () mutable {
// return the local partition/row lock we have taken so it
// remains locked until the caller is done modifying this


@@ -131,6 +131,7 @@ boost_tests = [
'data_listeners_test',
'truncation_migration_test',
'like_matcher_test',
'enum_option_test',
]
other_tests = [
@@ -265,7 +266,7 @@ if __name__ == "__main__":
env['UBSAN_OPTIONS'] = 'print_stacktrace=1'
env['BOOST_TEST_CATCH_SYSTEM_ERRORS'] = 'no'
-def run_test(path, type, exec_args):
+def run_test(path, repeat, type, exec_args):
boost_args = []
# avoid modifying in-place, it will change test_to_run
exec_args = exec_args + '--collectd 0'.split()
@@ -274,7 +275,7 @@ if __name__ == "__main__":
mode = 'release'
if path.startswith(os.path.join('build', 'debug')):
mode = 'debug'
-xmlout = (args.jenkins + "." + mode + "." + os.path.basename(path.split()[0]) + ".boost.xml")
+xmlout = (args.jenkins + "." + mode + "." + os.path.basename(path.split()[0]) + "." + str(repeat) + ".boost.xml")
boost_args += ['--report_level=no', '--logger=HRF,test_suite:XML,test_suite,' + xmlout]
if type == 'boost':
boost_args += ['--']
@@ -312,8 +313,8 @@ if __name__ == "__main__":
path = test[0]
test_type = test[1]
exec_args = test[2] if len(test) >= 3 else []
-for _ in range(args.repeat):
-futures.append(executor.submit(run_test, path, test_type, exec_args))
+for repeat in range(args.repeat):
+futures.append(executor.submit(run_test, path, repeat, test_type, exec_args))
results = []
cookie = len(futures)


@@ -77,3 +77,45 @@ BOOST_AUTO_TEST_CASE(test_make_random_uuid) {
std::sort(uuids.begin(), uuids.end());
BOOST_CHECK(std::unique(uuids.begin(), uuids.end()) == uuids.end());
}
BOOST_AUTO_TEST_CASE(test_get_time_uuid) {
using namespace std::chrono;
auto uuid = utils::UUID_gen::get_time_UUID();
BOOST_CHECK(uuid.is_timestamp());
auto tp = system_clock::now();
uuid = utils::UUID_gen::get_time_UUID(tp);
BOOST_CHECK(uuid.is_timestamp());
auto millis = duration_cast<milliseconds>(tp.time_since_epoch()).count();
uuid = utils::UUID_gen::get_time_UUID(millis);
BOOST_CHECK(uuid.is_timestamp());
auto unix_timestamp = utils::UUID_gen::unix_timestamp(uuid);
BOOST_CHECK(unix_timestamp == millis);
}
BOOST_AUTO_TEST_CASE(test_min_time_uuid) {
using namespace std::chrono;
auto tp = system_clock::now();
auto millis = duration_cast<milliseconds>(tp.time_since_epoch()).count();
auto uuid = utils::UUID_gen::min_time_UUID(millis);
BOOST_CHECK(uuid.is_timestamp());
auto unix_timestamp = utils::UUID_gen::unix_timestamp(uuid);
BOOST_CHECK(unix_timestamp == millis);
}
BOOST_AUTO_TEST_CASE(test_max_time_uuid) {
using namespace std::chrono;
auto tp = system_clock::now();
auto millis = duration_cast<milliseconds>(tp.time_since_epoch()).count();
auto uuid = utils::UUID_gen::max_time_UUID(millis);
BOOST_CHECK(uuid.is_timestamp());
auto unix_timestamp = utils::UUID_gen::unix_timestamp(uuid);
BOOST_CHECK(unix_timestamp == millis);
}


@@ -844,14 +844,20 @@ inline std::basic_ostream<Args...> & operator<<(std::basic_ostream<Args...> & os
}
}
namespace {
void throw_on_error(const sstring& opt, const sstring& msg, std::optional<utils::config_file::value_status> status) {
if (status != config::value_status::Invalid) {
throw std::invalid_argument(msg + " : " + opt);
}
}
} // anonymous namespace
SEASTAR_TEST_CASE(test_parse_yaml) {
config cfg;
-cfg.read_from_yaml(cassandra_conf, [](auto& opt, auto& msg, auto status) {
-if (status != config::value_status::Invalid) {
-throw std::invalid_argument(msg + " : " + opt);
-}
-});
+cfg.read_from_yaml(cassandra_conf, throw_on_error);
BOOST_CHECK_EQUAL(cfg.cluster_name(), "Test Cluster");
BOOST_CHECK_EQUAL(cfg.cluster_name.is_set(), true);
@@ -917,3 +923,62 @@ SEASTAR_TEST_CASE(test_parse_broken) {
return make_ready_future<>();
}
using ef = experimental_features_t;
using features = std::vector<enum_option<ef>>;
SEASTAR_TEST_CASE(test_parse_experimental_features_cdc) {
config cfg;
cfg.read_from_yaml("experimental_features:\n - cdc\n", throw_on_error);
BOOST_CHECK_EQUAL(cfg.experimental_features(), features{ef::CDC});
BOOST_CHECK(cfg.check_experimental(ef::CDC));
BOOST_CHECK(!cfg.check_experimental(ef::LWT));
return make_ready_future();
}
SEASTAR_TEST_CASE(test_parse_experimental_features_lwt) {
config cfg;
cfg.read_from_yaml("experimental_features:\n - lwt\n", throw_on_error);
BOOST_CHECK_EQUAL(cfg.experimental_features(), features{ef::LWT});
BOOST_CHECK(!cfg.check_experimental(ef::CDC));
BOOST_CHECK(cfg.check_experimental(ef::LWT));
return make_ready_future();
}
SEASTAR_TEST_CASE(test_parse_experimental_features_multiple) {
config cfg;
cfg.read_from_yaml("experimental_features:\n - cdc\n - lwt\n - cdc\n", throw_on_error);
BOOST_CHECK_EQUAL(cfg.experimental_features(), (features{ef::CDC, ef::LWT, ef::CDC}));
BOOST_CHECK(cfg.check_experimental(ef::CDC));
BOOST_CHECK(cfg.check_experimental(ef::LWT));
return make_ready_future();
}
SEASTAR_TEST_CASE(test_parse_experimental_features_invalid) {
config cfg;
using value_status = utils::config_file::value_status;
cfg.read_from_yaml("experimental_features:\n - invalidoptiontvaluedonotuse\n",
[&cfg] (const sstring& opt, const sstring& msg, std::optional<value_status> status) {
BOOST_REQUIRE_EQUAL(opt, "experimental_features");
BOOST_REQUIRE_NE(msg.find("line 2, column 7"), msg.npos);
BOOST_CHECK(!cfg.check_experimental(ef::CDC));
BOOST_CHECK(!cfg.check_experimental(ef::LWT));
});
return make_ready_future();
}
SEASTAR_TEST_CASE(test_parse_experimental_true) {
config cfg;
cfg.read_from_yaml("experimental: true", throw_on_error);
BOOST_CHECK(cfg.check_experimental(ef::CDC));
BOOST_CHECK(cfg.check_experimental(ef::LWT));
return make_ready_future();
}
SEASTAR_TEST_CASE(test_parse_experimental_false) {
config cfg;
cfg.read_from_yaml("experimental: false", throw_on_error);
BOOST_CHECK(!cfg.check_experimental(ef::CDC));
BOOST_CHECK(!cfg.check_experimental(ef::LWT));
return make_ready_future();
}
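The tests above pin down `check_experimental()`'s semantics. A sketch of those semantics, under the assumption (suggested by the `experimental: true`/`false` cases) that the legacy boolean switch enables every feature, while otherwise only features listed under `experimental_features` count:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class experimental_features_t { CDC, LWT };

// Toy config: the real class reads these from YAML.
struct config {
    bool experimental = false;
    std::vector<experimental_features_t> experimental_features;

    bool check_experimental(experimental_features_t f) const {
        return experimental ||
               std::find(experimental_features.begin(),
                         experimental_features.end(), f) != experimental_features.end();
    }
};
```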


@@ -4070,6 +4070,8 @@ SEASTAR_TEST_CASE(test_like_operator_on_clustering_key) {
require_rows(e, "select s from t where s like '%c' allow filtering", {{T("abc")}});
cquery_nofail(e, "insert into t (p, s) values (2, 'acc')");
require_rows(e, "select s from t where s like '%c' allow filtering", {{T("abc")}, {T("acc")}});
cquery_nofail(e, "insert into t (p, s) values (2, 'acd')");
require_rows(e, "select s from t where p = 2 and s like '%c' allow filtering", {{T("acc")}});
});
}
@@ -4261,3 +4263,272 @@ SEASTAR_TEST_CASE(test_rf_expand) {
});
});
}
// Test that tombstones with future timestamps work correctly
// when a write with lower timestamp arrives - in such case,
// if the base row is covered by such a tombstone, a view update
// needs to take it into account. Refs #5793
SEASTAR_TEST_CASE(test_views_with_future_tombstones) {
return do_with_cql_env_thread([] (auto& e) {
cquery_nofail(e, "CREATE TABLE t (a int, b int, c int, d int, e int, PRIMARY KEY (a,b,c));");
cquery_nofail(e, "CREATE MATERIALIZED VIEW tv AS SELECT * FROM t"
" WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL PRIMARY KEY (b,a,c);");
// Partition tombstone
cquery_nofail(e, "delete from t using timestamp 10 where a=1;");
auto msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
cquery_nofail(e, "insert into t (a,b,c,d,e) values (1,2,3,4,5) using timestamp 8;");
msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
msg = cquery_nofail(e, "select * from tv;");
assert_that(msg).is_rows().with_size(0);
// Range tombstone
cquery_nofail(e, "delete from t using timestamp 16 where a=2 and b > 1 and b < 4;");
msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
cquery_nofail(e, "insert into t (a,b,c,d,e) values (2,3,4,5,6) using timestamp 12;");
msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
msg = cquery_nofail(e, "select * from tv;");
assert_that(msg).is_rows().with_size(0);
// Row tombstone
cquery_nofail(e, "delete from t using timestamp 24 where a=3 and b=4 and c=5;");
msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
cquery_nofail(e, "insert into t (a,b,c,d,e) values (3,4,5,6,7) using timestamp 18;");
msg = cquery_nofail(e, "select * from t;");
assert_that(msg).is_rows().with_size(0);
msg = cquery_nofail(e, "select * from tv;");
assert_that(msg).is_rows().with_size(0);
});
}
shared_ptr<cql_transport::messages::result_message> cql_func_require_nofail(
cql_test_env& env,
const seastar::sstring& fct,
const seastar::sstring& inp,
std::unique_ptr<cql3::query_options>&& qo = nullptr,
const std::experimental::source_location& loc = std::experimental::source_location::current()) {
auto res = shared_ptr<cql_transport::messages::result_message>(nullptr);
auto query = format("SELECT {}({}) FROM t;", fct, inp);
try {
if (qo) {
res = env.execute_cql(query, std::move(qo)).get0();
} else {
res = env.execute_cql(query).get0();
}
BOOST_TEST_MESSAGE(format("Query '{}' succeeded as expected", query));
} catch (...) {
BOOST_ERROR(format("query '{}' failed unexpectedly with error: {}\n{}:{}: originally from here",
query, std::current_exception(),
loc.file_name(), loc.line()));
}
return res;
}
// FIXME: should be in cql_assertions, but we don't want to call boost from cql_assertions.hh
template <typename Exception>
void cql_func_require_throw(
cql_test_env& env,
const seastar::sstring& fct,
const seastar::sstring& inp,
std::unique_ptr<cql3::query_options>&& qo = nullptr,
const std::experimental::source_location& loc = std::experimental::source_location::current()) {
auto query = format("SELECT {}({}) FROM t;", fct, inp);
try {
if (qo) {
env.execute_cql(query, std::move(qo)).get();
} else {
env.execute_cql(query).get();
}
BOOST_ERROR(format("query '{}' succeeded unexpectedly\n{}:{}: originally from here", query,
loc.file_name(), loc.line()));
} catch (Exception& e) {
BOOST_TEST_MESSAGE(format("Query '{}' failed as expected with error: {}", query, e));
} catch (...) {
BOOST_ERROR(format("query '{}' failed with unexpected error: {}\n{}:{}: originally from here",
query, std::current_exception(),
loc.file_name(), loc.line()));
}
}
static void create_time_uuid_fcts_schema(cql_test_env& e) {
cquery_nofail(e, "CREATE TABLE t (id int primary key, t timestamp, l bigint, f float, u timeuuid, d date)");
cquery_nofail(e, "INSERT INTO t (id, t, l, f, u, d) VALUES "
"(1, 1579072460606, 1579072460606000, 1579072460606, a66525e0-3766-11ea-8080-808080808080, '2020-01-13')");
cquery_nofail(e, "SELECT * FROM t;");
}
SEASTAR_TEST_CASE(test_basic_time_uuid_fcts) {
return do_with_cql_env_thread([] (auto& e) {
create_time_uuid_fcts_schema(e);
cql_func_require_nofail(e, "currenttime", "");
cql_func_require_nofail(e, "currentdate", "");
cql_func_require_nofail(e, "now", "");
cql_func_require_nofail(e, "currenttimeuuid", "");
cql_func_require_nofail(e, "currenttimestamp", "");
});
}
SEASTAR_TEST_CASE(test_time_uuid_fcts_input_validation) {
return do_with_cql_env_thread([] (auto& e) {
create_time_uuid_fcts_schema(e);
// test timestamp arg
auto require_timestamp = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "t");
cql_func_require_throw<exceptions::server_exception>(e, fct, "l");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "f");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "u");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "d");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttime()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currentdate()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "now()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttimeuuid()");
cql_func_require_nofail(e, fct, "currenttimestamp()");
};
require_timestamp("mintimeuuid");
require_timestamp("maxtimeuuid");
// test timeuuid arg
auto require_timeuuid = [&e] (const sstring& fct) {
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "t");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "l");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "f");
cql_func_require_nofail(e, fct, "u");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "d");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttime()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currentdate()");
cql_func_require_nofail(e, fct, "now()");
cql_func_require_nofail(e, fct, "currenttimeuuid()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttimestamp()");
};
require_timeuuid("dateof");
require_timeuuid("unixtimestampof");
// test timeuuid or date arg
auto require_timeuuid_or_date = [&e] (const sstring& fct) {
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "t");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "l");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "f");
cql_func_require_nofail(e, fct, "u");
cql_func_require_nofail(e, fct, "d");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttime()");
cql_func_require_nofail(e, fct, "currentdate()");
cql_func_require_nofail(e, fct, "now()");
cql_func_require_nofail(e, fct, "currenttimeuuid()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttimestamp()");
};
require_timeuuid_or_date("totimestamp");
// test timestamp or timeuuid arg
auto require_timestamp_or_timeuuid = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "t");
cql_func_require_throw<std::exception>(e, fct, "l");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "f");
cql_func_require_nofail(e, fct, "u");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "d");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttime()");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currentdate()");
cql_func_require_nofail(e, fct, "now()");
cql_func_require_nofail(e, fct, "currenttimeuuid()");
cql_func_require_nofail(e, fct, "currenttimestamp()");
};
require_timestamp_or_timeuuid("todate");
// test timestamp, timeuuid, or date arg
auto require_timestamp_timeuuid_or_date = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "t");
cql_func_require_throw<exceptions::server_exception>(e, fct, "l");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "f");
cql_func_require_nofail(e, fct, "u");
cql_func_require_nofail(e, fct, "d");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "currenttime()");
cql_func_require_nofail(e, fct, "currentdate()");
cql_func_require_nofail(e, fct, "now()");
cql_func_require_nofail(e, fct, "currenttimeuuid()");
cql_func_require_nofail(e, fct, "currenttimestamp()");
};
require_timestamp_timeuuid_or_date("tounixtimestamp");
});
}
SEASTAR_TEST_CASE(test_time_uuid_fcts_result) {
return do_with_cql_env_thread([] (auto& e) {
create_time_uuid_fcts_schema(e);
// test timestamp arg
auto require_timestamp = [&e] (const sstring& fct) {
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "mintimeuuid(t)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "maxtimeuuid(t)");
cql_func_require_nofail(e, fct, "dateof(u)");
cql_func_require_nofail(e, fct, "unixtimestampof(u)");
cql_func_require_nofail(e, fct, "totimestamp(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "todate(u)");
cql_func_require_nofail(e, fct, "tounixtimestamp(u)");
};
require_timestamp("mintimeuuid");
require_timestamp("maxtimeuuid");
// test timeuuid arg
auto require_timeuuid = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "mintimeuuid(t)");
cql_func_require_nofail(e, fct, "maxtimeuuid(t)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "dateof(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "unixtimestampof(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "totimestamp(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "todate(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "tounixtimestamp(u)");
};
require_timeuuid("dateof");
require_timeuuid("unixtimestampof");
// test timeuuid or date arg
auto require_timeuuid_or_date = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "mintimeuuid(t)");
cql_func_require_nofail(e, fct, "maxtimeuuid(t)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "dateof(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "unixtimestampof(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "totimestamp(u)");
cql_func_require_nofail(e, fct, "todate(u)");
cql_func_require_throw<exceptions::invalid_request_exception>(e, fct, "tounixtimestamp(u)");
};
require_timeuuid_or_date("totimestamp");
// test timestamp or timeuuid arg
auto require_timestamp_or_timeuuid = [&e] (const sstring& fct) {
};
require_timestamp_or_timeuuid("todate");
// test timestamp, timeuuid, or date arg
auto require_timestamp_timeuuid_or_date = [&e] (const sstring& fct) {
cql_func_require_nofail(e, fct, "mintimeuuid(t)");
cql_func_require_nofail(e, fct, "maxtimeuuid(t)");
cql_func_require_nofail(e, fct, "dateof(u)");
cql_func_require_nofail(e, fct, "unixtimestampof(u)");
cql_func_require_nofail(e, fct, "totimestamp(u)");
cql_func_require_nofail(e, fct, "todate(u)");
cql_func_require_nofail(e, fct, "tounixtimestamp(u)");
};
require_timestamp_timeuuid_or_date("tounixtimestamp");
});
}
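The `mintimeuuid`/`maxtimeuuid` functions exercised above build version-1 UUIDs whose 60-bit timestamp field counts 100-ns intervals since the Gregorian epoch (1582-10-15). A minimal sketch of that conversion, independent of Scylla's implementation (the function and constant names here are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Number of 100-ns intervals between the Gregorian epoch (1582-10-15)
// and the Unix epoch (1970-01-01); the standard RFC 4122 offset.
constexpr uint64_t kGregorianUnixOffset = 0x01B21DD213814000ULL;

// Convert a Unix timestamp in milliseconds to the 60-bit v1 UUID
// timestamp field: 1 ms = 10,000 hundred-nanosecond ticks.
uint64_t v1_timestamp_from_unix_millis(int64_t millis) {
    return kGregorianUnixOffset + static_cast<uint64_t>(millis) * 10000;
}
```

`mintimeuuid`/`maxtimeuuid` then fill the remaining clock-sequence and node bits with the smallest/largest values, so the pair brackets all timeuuids for that instant.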


@@ -352,7 +352,10 @@ public:
cfg->view_hints_directory.set(data_dir_path + "/view_hints.dir");
cfg->num_tokens.set(256);
cfg->ring_delay_ms.set(500);
-cfg->experimental.set(true);
+auto features = cfg->experimental_features();
+features.emplace_back(db::experimental_features_t::CDC);
+features.emplace_back(db::experimental_features_t::LWT);
+cfg->experimental_features(features);
cfg->shutdown_announce_in_ms.set(0);
cfg->broadcast_to_all_shards().get();
create_directories((data_dir_path + "/system").c_str());

tests/enum_option_test.cc (new file, 175 lines)

@@ -0,0 +1,175 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#define BOOST_TEST_MODULE core
#include <boost/test/unit_test.hpp>
#include <array>
#include <boost/program_options.hpp>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include "utils/enum_option.hh"
namespace po = boost::program_options;
namespace {
struct days {
enum enumeration { Mo, Tu, We, Th, Fr, Sa, Su };
static std::unordered_map<std::string, enumeration> map() {
return {{"Mon", Mo}, {"Tue", Tu}, {"Wed", We}, {"Thu", Th}, {"Fri", Fr}, {"Sat", Sa}, {"Sun", Su}};
}
};
template <typename T>
enum_option<T> parse(const char* value) {
po::options_description desc("Allowed options");
desc.add_options()("opt", po::value<enum_option<T>>(), "Option");
po::variables_map vm;
const char* argv[] = {"$0", "--opt", value};
po::store(po::parse_command_line(3, argv, desc), vm);
return vm["opt"].as<enum_option<T>>();
}
template <typename T>
std::string format(typename T::enumeration d) {
std::ostringstream os;
os << enum_option<T>(d);
return os.str();
}
} // anonymous namespace
BOOST_AUTO_TEST_CASE(test_parsing) {
BOOST_CHECK_EQUAL(parse<days>("Sun"), days::Su);
BOOST_CHECK_EQUAL(parse<days>("Mon"), days::Mo);
BOOST_CHECK_EQUAL(parse<days>("Tue"), days::Tu);
BOOST_CHECK_EQUAL(parse<days>("Wed"), days::We);
BOOST_CHECK_EQUAL(parse<days>("Thu"), days::Th);
BOOST_CHECK_EQUAL(parse<days>("Fri"), days::Fr);
BOOST_CHECK_EQUAL(parse<days>("Sat"), days::Sa);
}
BOOST_AUTO_TEST_CASE(test_parsing_error) {
BOOST_REQUIRE_THROW(parse<days>("Sunday"), po::invalid_option_value);
BOOST_REQUIRE_THROW(parse<days>(""), po::invalid_option_value);
BOOST_REQUIRE_THROW(parse<days>(" "), po::invalid_option_value);
BOOST_REQUIRE_THROW(parse<days>(" Sun"), po::invalid_option_value);
}
BOOST_AUTO_TEST_CASE(test_formatting) {
BOOST_CHECK_EQUAL(format<days>(days::Mo), "Mon");
BOOST_CHECK_EQUAL(format<days>(days::Tu), "Tue");
BOOST_CHECK_EQUAL(format<days>(days::We), "Wed");
BOOST_CHECK_EQUAL(format<days>(days::Th), "Thu");
BOOST_CHECK_EQUAL(format<days>(days::Fr), "Fri");
BOOST_CHECK_EQUAL(format<days>(days::Sa), "Sat");
BOOST_CHECK_EQUAL(format<days>(days::Su), "Sun");
}
BOOST_AUTO_TEST_CASE(test_formatting_unknown) {
BOOST_CHECK_EQUAL(format<days>(static_cast<days::enumeration>(77)), "?unknown");
}
namespace {
struct names {
enum enumeration { John, Jane, Jim };
static std::map<std::string, enumeration> map() {
return {{"John", John}, {"Jane", Jane}, {"James", Jim}};
}
};
} // anonymous namespace
BOOST_AUTO_TEST_CASE(test_ordered_map) {
BOOST_CHECK_EQUAL(parse<names>("James"), names::Jim);
BOOST_CHECK_EQUAL(format<names>(names::Jim), "James");
BOOST_CHECK_EQUAL(parse<names>("John"), names::John);
BOOST_CHECK_EQUAL(format<names>(names::John), "John");
BOOST_CHECK_EQUAL(parse<names>("Jane"), names::Jane);
BOOST_CHECK_EQUAL(format<names>(names::Jane), "Jane");
BOOST_CHECK_THROW(parse<names>("Jimbo"), po::invalid_option_value);
BOOST_CHECK_EQUAL(format<names>(static_cast<names::enumeration>(77)), "?unknown");
}
namespace {
struct cities {
enum enumeration { SF, TO, NY };
static std::unordered_map<std::string, enumeration> map() {
return {
{"SanFrancisco", SF}, {"SF", SF}, {"SFO", SF}, {"Frisco", SF},
{"Toronto", TO}, {"TO", TO}, {"YYZ", TO}, {"TheSix", TO},
{"NewYork", NY}, {"NY", NY}, {"NYC", NY}, {"BigApple", NY},
};
}
};
} // anonymous namespace
BOOST_AUTO_TEST_CASE(test_multiple_parse) {
BOOST_CHECK_EQUAL(parse<cities>("SanFrancisco"), cities::SF);
BOOST_CHECK_EQUAL(parse<cities>("SF"), cities::SF);
BOOST_CHECK_EQUAL(parse<cities>("SFO"), cities::SF);
BOOST_CHECK_EQUAL(parse<cities>("Frisco"), cities::SF);
BOOST_CHECK_EQUAL(parse<cities>("Toronto"), cities::TO);
BOOST_CHECK_EQUAL(parse<cities>("TO"), cities::TO);
BOOST_CHECK_EQUAL(parse<cities>("YYZ"), cities::TO);
BOOST_CHECK_EQUAL(parse<cities>("TheSix"), cities::TO);
BOOST_CHECK_EQUAL(parse<cities>("NewYork"), cities::NY);
BOOST_CHECK_EQUAL(parse<cities>("NY"), cities::NY);
BOOST_CHECK_EQUAL(parse<cities>("NYC"), cities::NY);
BOOST_CHECK_EQUAL(parse<cities>("BigApple"), cities::NY);
}
BOOST_AUTO_TEST_CASE(test_multiple_format) {
BOOST_CHECK((std::set<std::string>{"SanFrancisco", "SF", "SFO", "Frisco"}).count(format<cities>(cities::SF)));
BOOST_CHECK((std::set<std::string>{"Toronto", "TO", "YYZ", "TheSix"}).count(format<cities>(cities::TO)));
BOOST_CHECK((std::set<std::string>{"NewYork", "NY", "NYC", "BigApple"}).count(format<cities>(cities::NY)));
}
namespace {
struct numbers {
enum enumeration { ONE, TWO };
static std::unordered_map<int, enumeration> map() {
return {{1, ONE}, {2, TWO}};
}
};
} // anonymous namespace
BOOST_AUTO_TEST_CASE(test_non_string) {
BOOST_CHECK_EQUAL(parse<numbers>("1"), numbers::ONE);
BOOST_CHECK_EQUAL(parse<numbers>("2"), numbers::TWO);
BOOST_CHECK_THROW(parse<numbers>("3"), po::invalid_option_value);
BOOST_CHECK_THROW(parse<numbers>("xx"), po::invalid_option_value);
BOOST_CHECK_THROW(parse<numbers>(""), po::invalid_option_value);
BOOST_CHECK_EQUAL(format<numbers>(numbers::ONE), "1");
BOOST_CHECK_EQUAL(format<numbers>(numbers::TWO), "2");
BOOST_CHECK_EQUAL(format<numbers>(static_cast<numbers::enumeration>(77)), "?unknown");
}
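The pattern these tests pin down — a static `map()` from token to enum value, with a parse failure on unknown input — can be sketched minimally as follows (the names here are illustrative, not Scylla's `enum_option` API, which integrates with Boost.Program_options and reports `po::invalid_option_value` instead):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

enum class day { mon, sun };

// Parse a token via a lookup table; throw on unknown input, mirroring
// how enum_option rejects strings absent from the map() table.
day parse_day(const std::string& token) {
    static const std::unordered_map<std::string, day> table{
        {"Mon", day::mon}, {"Sun", day::sun}};
    auto it = table.find(token);
    if (it == table.end()) {
        throw std::invalid_argument("unknown day: " + token);
    }
    return it->second;
}
```

Formatting is the reverse lookup; when several keys map to one value (the `cities` case above), any of the aliases is an acceptable rendering, which is why `test_multiple_format` checks set membership rather than a single string.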


@@ -25,7 +25,7 @@
#include <seastar/util/noncopyable_function.hh>
inline
-void eventually(noncopyable_function<void ()> f, size_t max_attempts = 12) {
+void eventually(noncopyable_function<void ()> f, size_t max_attempts = 17) {
size_t attempts = 0;
while (true) {
try {
@@ -43,7 +43,7 @@ void eventually(noncopyable_function<void ()> f, size_t max_attempts = 12) {
inline
bool eventually_true(noncopyable_function<bool ()> f) {
-const unsigned max_attempts = 10;
+const unsigned max_attempts = 15;
unsigned attempts = 0;
while (true) {
if (f()) {
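The retry loop behind `eventually` can be sketched as below (a simplified synchronous version; the real helper sleeps between attempts, so raising the attempt cap extends the total time allowed before giving up):

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Retry f until it stops throwing, up to max_attempts tries; rethrow
// the last failure. (Sleeping between attempts is elided here.)
void retry(const std::function<void()>& f, unsigned max_attempts) {
    for (unsigned attempt = 1;; ++attempt) {
        try {
            f();
            return;
        } catch (...) {
            if (attempt == max_attempts) {
                throw;
            }
        }
    }
}
```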


@@ -1320,6 +1320,104 @@ SEASTAR_THREAD_TEST_CASE(test_mutation_upgrade_type_change) {
assert_that(m).is_equal_to(m2);
}
// This test checks the behavior of row_marker::{is_live, is_dead, compact_and_expire}. Those functions have some
// duplicated logic that decides if a row is expired, and this test verifies that they behave the same with respect
// to TTL.
SEASTAR_THREAD_TEST_CASE(test_row_marker_expiry) {
can_gc_fn never_gc = [] (tombstone) { return false; };
auto must_be_alive = [&] (row_marker mark, gc_clock::time_point t) {
BOOST_TEST_MESSAGE(format("must_be_alive({}, {})", mark, t));
BOOST_REQUIRE(mark.is_live(tombstone(), t));
BOOST_REQUIRE(mark.is_missing() || !mark.is_dead(t));
BOOST_REQUIRE(mark.compact_and_expire(tombstone(), t, never_gc, gc_clock::time_point()));
};
auto must_be_dead = [&] (row_marker mark, gc_clock::time_point t) {
BOOST_TEST_MESSAGE(format("must_be_dead({}, {})", mark, t));
BOOST_REQUIRE(!mark.is_live(tombstone(), t));
BOOST_REQUIRE(mark.is_missing() || mark.is_dead(t));
BOOST_REQUIRE(!mark.compact_and_expire(tombstone(), t, never_gc, gc_clock::time_point()));
};
const auto timestamp = api::timestamp_type(1);
const auto t0 = gc_clock::now();
const auto t1 = t0 + 1s;
const auto t2 = t0 + 2s;
const auto t3 = t0 + 3s;
// Without timestamp the marker is missing (doesn't exist)
const row_marker m1;
must_be_dead(m1, t0);
must_be_dead(m1, t1);
must_be_dead(m1, t2);
must_be_dead(m1, t3);
// With timestamp and without ttl, a row_marker is always alive
const row_marker m2(timestamp);
must_be_alive(m2, t0);
must_be_alive(m2, t1);
must_be_alive(m2, t2);
must_be_alive(m2, t3);
// A row_marker becomes dead exactly at the moment of expiry
// Reproduces #4263, #5290
const auto ttl = 1s;
const row_marker m3(timestamp, ttl, t2);
must_be_alive(m3, t0);
must_be_alive(m3, t1);
must_be_dead(m3, t2);
must_be_dead(m3, t3);
}
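The boundary this test pins down is that a marker is dead starting exactly at its expiry point, so every liveness check must use a strict `now < expiry` comparison, never `now <= expiry`. A simplified sketch (not Scylla's `row_marker`, just the comparison under test):

```cpp
#include <cassert>

// Simplified liveness check: a marker expiring at time e is alive
// strictly before e, and dead at e and afterwards — the boundary
// behavior fixed by #4263 / #5290.
bool is_live_at(long expiry, long now) {
    return now < expiry;
}
```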
SEASTAR_THREAD_TEST_CASE(test_querying_expired_rows) {
auto s = schema_builder("ks", "cf")
.with_column("pk", bytes_type, column_kind::partition_key)
.with_column("ck", bytes_type, column_kind::clustering_key)
.build();
auto pk = partition_key::from_singular(*s, data_value(bytes("key1")));
auto ckey1 = clustering_key::from_singular(*s, data_value(bytes("A")));
auto ckey2 = clustering_key::from_singular(*s, data_value(bytes("B")));
auto ckey3 = clustering_key::from_singular(*s, data_value(bytes("C")));
auto ttl = 1s;
auto t0 = gc_clock::now();
auto t1 = t0 + 1s;
auto t2 = t0 + 2s;
auto t3 = t0 + 3s;
auto results_at_time = [s] (const mutation& m, gc_clock::time_point t) {
auto slice = partition_slice_builder(*s)
.without_partition_key_columns()
.build();
auto opts = query::result_options{query::result_request::result_and_digest, query::digest_algorithm::xxHash};
return query::result_set::from_raw_result(s, slice, m.query(slice, opts, t));
};
mutation m(s, pk);
m.partition().clustered_row(*m.schema(), ckey1).apply(row_marker(api::new_timestamp(), ttl, t1));
m.partition().clustered_row(*m.schema(), ckey2).apply(row_marker(api::new_timestamp(), ttl, t2));
m.partition().clustered_row(*m.schema(), ckey3).apply(row_marker(api::new_timestamp(), ttl, t3));
assert_that(results_at_time(m, t0))
.has_size(3)
.has(a_row().with_column("ck", data_value(bytes("A"))))
.has(a_row().with_column("ck", data_value(bytes("B"))))
.has(a_row().with_column("ck", data_value(bytes("C"))));
assert_that(results_at_time(m, t1))
.has_size(2)
.has(a_row().with_column("ck", data_value(bytes("B"))))
.has(a_row().with_column("ck", data_value(bytes("C"))));
assert_that(results_at_time(m, t2))
.has_size(1)
.has(a_row().with_column("ck", data_value(bytes("C"))));
assert_that(results_at_time(m, t3)).is_empty();
}
SEASTAR_TEST_CASE(test_querying_expired_cells) {
return seastar::async([] {
auto s = schema_builder("ks", "cf")


@@ -118,6 +118,53 @@ SEASTAR_TEST_CASE(test_multishard_writer) {
});
}
SEASTAR_TEST_CASE(test_multishard_writer_producer_aborts) {
return do_with_cql_env_thread([] (cql_test_env& e) {
auto test_random_streams = [] (random_mutation_generator&& gen, size_t partition_nr, generate_error error = generate_error::no) {
auto muts = gen(partition_nr);
schema_ptr s = gen.schema();
auto source_reader = partition_nr > 0 ? flat_mutation_reader_from_mutations(muts) : make_empty_flat_reader(s);
int mf_produced = 0;
auto get_next_mutation_fragment = [&source_reader, &mf_produced] () mutable {
if (mf_produced++ > 800) {
return make_exception_future<mutation_fragment_opt>(std::runtime_error("the producer failed"));
} else {
return source_reader(db::no_timeout);
}
};
auto& partitioner = dht::global_partitioner();
try {
distribute_reader_and_consume_on_shards(s, partitioner,
make_generating_reader(s, std::move(get_next_mutation_fragment)),
[&partitioner, error] (flat_mutation_reader reader) mutable {
if (error) {
return make_exception_future<>(std::runtime_error("Failed to write"));
}
return repeat([&partitioner, reader = std::move(reader), error] () mutable {
return reader(db::no_timeout).then([&partitioner, error] (mutation_fragment_opt mf_opt) mutable {
if (mf_opt) {
if (mf_opt->is_partition_start()) {
auto shard = partitioner.shard_of(mf_opt->as_partition_start().key().token());
BOOST_REQUIRE_EQUAL(shard, this_shard_id());
}
return make_ready_future<stop_iteration>(stop_iteration::no);
} else {
return make_ready_future<stop_iteration>(stop_iteration::yes);
}
});
});
}
).get0();
} catch (...) {
// distribute_reader_and_consume_on_shards is expected to fail rather than block forever
}
};
test_random_streams(random_mutation_generator(random_mutation_generator::generate_counters::no, local_shard_only::yes), 1000, generate_error::no);
test_random_streams(random_mutation_generator(random_mutation_generator::generate_counters::no, local_shard_only::yes), 1000, generate_error::yes);
});
}
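The fix this test exercises: when the producer fails, the queues feeding the per-shard consumers are aborted with the error, so consumers observe the failure instead of waiting forever for data that will never arrive. A single-threaded sketch of the abortable-queue idea (illustrative; Scylla's queue is an async Seastar type with a different API):

```cpp
#include <cassert>
#include <deque>
#include <exception>
#include <optional>
#include <stdexcept>

// A queue that can be aborted: drained items are still delivered, and
// once empty, pop() rethrows the stored error instead of reporting
// "nothing yet" — so a consumer cannot hang on a dead producer.
template <typename T>
class abortable_queue {
    std::deque<T> _items;
    std::exception_ptr _error;
public:
    void push(T item) { _items.push_back(std::move(item)); }
    void abort(std::exception_ptr e) { _error = std::move(e); }
    std::optional<T> pop() {
        if (!_items.empty()) {
            T v = std::move(_items.front());
            _items.pop_front();
            return v;
        }
        if (_error) {
            std::rethrow_exception(_error);
        }
        return std::nullopt; // a real async queue would wait here
    }
};
```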
namespace {
class bucket_writer {


@@ -371,10 +371,8 @@ SEASTAR_TEST_CASE(test_merging_does_not_alter_tables_which_didnt_change) {
muts2.push_back(db::schema_tables::make_scylla_tables_mutation(s0, api::new_timestamp()));
mm.announce(muts2).get();
-// SCYLLA_TABLES have additional columns so announcing its mutation
-// changes the tables
-BOOST_REQUIRE(s1 != find_table().schema());
-BOOST_REQUIRE(legacy_version != find_table().schema()->version());
+BOOST_REQUIRE(s1 == find_table().schema());
+BOOST_REQUIRE_EQUAL(legacy_version, find_table().schema()->version());
});
});
}
@@ -575,7 +573,7 @@ SEASTAR_TEST_CASE(test_prepared_statement_is_invalidated_by_schema_change) {
// We don't want schema digest to change between Scylla versions because that results in a schema disagreement
// during rolling upgrade.
-future<> test_schema_digest_does_not_change_with_disabled_features(sstring data_dir, std::set<sstring> disabled_features, std::vector<utils::UUID> expected_digests) {
+future<> test_schema_digest_does_not_change_with_disabled_features(sstring data_dir, std::set<sstring> disabled_features, std::vector<utils::UUID> expected_digests, std::function<void(cql_test_env& e)> extra_schema_changes) {
using namespace db;
using namespace db::schema_tables;
@@ -597,7 +595,7 @@ future<> test_schema_digest_does_not_change_with_disabled_features(sstring data_
cql_test_config cfg_in(db_cfg_ptr);
cfg_in.disabled_features = std::move(disabled_features);
-return do_with_cql_env_thread([regenerate, expected_digests = std::move(expected_digests)](cql_test_env& e) {
+return do_with_cql_env_thread([regenerate, expected_digests = std::move(expected_digests), extra_schema_changes = std::move(extra_schema_changes)] (cql_test_env& e) {
if (regenerate) {
// Exercise many different kinds of schema changes.
e.execute_cql(
@@ -613,6 +611,7 @@ future<> test_schema_digest_does_not_change_with_disabled_features(sstring data_
e.execute_cql(
"create keyspace tests2 with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };").get();
e.execute_cql("drop keyspace tests2;").get();
+extra_schema_changes(e);
}
auto expect_digest = [&] (schema_features sf, utils::UUID expected) {
@@ -673,7 +672,7 @@ SEASTAR_TEST_CASE(test_schema_digest_does_not_change) {
utils::UUID("1d91ad22-ea7c-3e7f-9557-87f0f3bb94d7"),
utils::UUID("2dcd4a37-cbb5-399b-b3c9-8eb1398b096b")
};
-return test_schema_digest_does_not_change_with_disabled_features("./tests/sstables/schema_digest_test", std::set<sstring>{"COMPUTED_COLUMNS"}, std::move(expected_digests));
+return test_schema_digest_does_not_change_with_disabled_features("./tests/sstables/schema_digest_test", std::set<sstring>{"COMPUTED_COLUMNS", "CDC"}, std::move(expected_digests), [] (cql_test_env& e) {});
}
SEASTAR_TEST_CASE(test_schema_digest_does_not_change_after_computed_columns) {
@@ -688,5 +687,26 @@ SEASTAR_TEST_CASE(test_schema_digest_does_not_change_after_computed_columns) {
utils::UUID("d58e5214-516e-3d0b-95b5-01ab71584a8d"),
utils::UUID("e1b50bed-2ab8-3759-92c7-1f4288046ae6")
};
-return test_schema_digest_does_not_change_with_disabled_features("./tests/sstables/schema_digest_test_computed_columns", std::set<sstring>{}, std::move(expected_digests));
+return test_schema_digest_does_not_change_with_disabled_features("./tests/sstables/schema_digest_test_computed_columns", std::set<sstring>{"CDC"}, std::move(expected_digests), [] (cql_test_env& e) {});
}
SEASTAR_TEST_CASE(test_schema_digest_does_not_change_with_cdc_options) {
std::vector<utils::UUID> expected_digests{
utils::UUID("a1f07f31-59d6-372a-8c94-7ea467354b39"),
utils::UUID("524d418d-a2e2-3fc3-bf45-5fb79b33c7e4"),
utils::UUID("524d418d-a2e2-3fc3-bf45-5fb79b33c7e4"),
utils::UUID("018fccba-8050-3bb9-a0a5-2b3c5f0371fe"),
utils::UUID("018fccba-8050-3bb9-a0a5-2b3c5f0371fe"),
utils::UUID("58f4254e-cc3b-3d56-8a45-167f9a3ea423"),
utils::UUID("48fda4f8-d7b5-3e59-a47a-7397989a9bf8"),
utils::UUID("8049bcfe-eb01-3a59-af33-16cef8a34b45"),
utils::UUID("2195a821-b2b8-3cb8-a179-2f5042e90841")
};
return test_schema_digest_does_not_change_with_disabled_features(
"./tests/sstables/schema_digest_test_cdc_options",
std::set<sstring>{},
std::move(expected_digests),
[] (cql_test_env& e) {
e.execute_cql("create table tests.table_cdc (pk int primary key, c1 int, c2 int) with cdc = {'enabled':'true'};").get();
});
}


@@ -5262,3 +5262,131 @@ SEASTAR_THREAD_TEST_CASE(test_sstable_log_too_many_rows) {
test_sstable_log_too_many_rows_f(random, (random + 1), false);
test_sstable_log_too_many_rows_f((random + 1), random, true);
}
// The following test runs on tests/sstables/3.x/uncompressed/legacy_udt_in_collection
// It was created using Scylla 3.0.x using the following CQL statements:
//
// CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
// CREATE TYPE ks.ut (a int, b int);
// CREATE TABLE ks.t ( pk int PRIMARY KEY,
// m map<int, frozen<ut>>,
// fm frozen<map<int, frozen<ut>>>,
// mm map<int, frozen<map<int, frozen<ut>>>>,
// fmm frozen<map<int, frozen<map<int, frozen<ut>>>>>,
// s set<frozen<ut>>,
// fs frozen<set<frozen<ut>>>,
// l list<frozen<ut>>,
// fl frozen<list<frozen<ut>>>
// ) WITH compression = {};
// UPDATE ks.t USING TIMESTAMP 1525385507816568 SET
// m[0] = {a: 0, b: 0},
// fm = {0: {a: 0, b: 0}},
// mm[0] = {0: {a: 0, b: 0}},
// fmm = {0: {0: {a: 0, b: 0}}},
// s = s + {{a: 0, b: 0}},
// fs = {{a: 0, b: 0}},
// l[scylla_timeuuid_list_index(7fb27e80-7b12-11ea-9fad-f4d108a9e4a3)] = {a: 0, b: 0},
// fl = [{a: 0, b: 0}]
// WHERE pk = 0;
//
// It checks whether an SSTable containing UDTs nested in collections, whose serialization headers
// are incorrect (nested UDTs are not wrapped in the FrozenType<...> tag), can be loaded by new versions of Scylla.
static const sstring LEGACY_UDT_IN_COLLECTION_PATH =
"tests/sstables/3.x/uncompressed/legacy_udt_in_collection";
SEASTAR_THREAD_TEST_CASE(test_legacy_udt_in_collection_table) {
auto abj = defer([] { await_background_jobs().get(); });
auto ut = user_type_impl::get_instance("ks", to_bytes("ut"),
{to_bytes("a"), to_bytes("b")},
{int32_type, int32_type}, false);
auto m_type = map_type_impl::get_instance(int32_type, ut, true);
auto fm_type = map_type_impl::get_instance(int32_type, ut, false);
auto mm_type = map_type_impl::get_instance(int32_type, fm_type, true);
auto fmm_type = map_type_impl::get_instance(int32_type, fm_type, false);
auto s_type = set_type_impl::get_instance(ut, true);
auto fs_type = set_type_impl::get_instance(ut, false);
auto l_type = list_type_impl::get_instance(ut, true);
auto fl_type = list_type_impl::get_instance(ut, false);
auto s = schema_builder("ks", "t")
.with_column("pk", int32_type, column_kind::partition_key)
.with_column("m", m_type)
.with_column("fm", fm_type)
.with_column("mm", mm_type)
.with_column("fmm", fmm_type)
.with_column("s", s_type)
.with_column("fs", fs_type)
.with_column("l", l_type)
.with_column("fl", fl_type)
.set_compressor_params(compression_parameters::no_compression())
.build();
auto m_cdef = s->get_column_definition(to_bytes("m"));
auto fm_cdef = s->get_column_definition(to_bytes("fm"));
auto mm_cdef = s->get_column_definition(to_bytes("mm"));
auto fmm_cdef = s->get_column_definition(to_bytes("fmm"));
auto s_cdef = s->get_column_definition(to_bytes("s"));
auto fs_cdef = s->get_column_definition(to_bytes("fs"));
auto l_cdef = s->get_column_definition(to_bytes("l"));
auto fl_cdef = s->get_column_definition(to_bytes("fl"));
BOOST_REQUIRE(m_cdef && fm_cdef && mm_cdef && fmm_cdef && s_cdef && fs_cdef && l_cdef && fl_cdef);
auto ut_val = make_user_value(ut, {int32_t(0), int32_t(0)});
auto fm_val = make_map_value(fm_type, {{int32_t(0), ut_val}});
auto fmm_val = make_map_value(fmm_type, {{int32_t(0), fm_val}});
auto fs_val = make_set_value(fs_type, {ut_val});
auto fl_val = make_list_value(fl_type, {ut_val});
mutation mut{s, partition_key::from_deeply_exploded(*s, {0})};
auto ckey = clustering_key::make_empty();
// m[0] = {a: 0, b: 0}
{
collection_mutation_description desc;
desc.cells.emplace_back(int32_type->decompose(0),
atomic_cell::make_live(*ut, write_timestamp, ut->decompose(ut_val), atomic_cell::collection_member::yes));
mut.set_clustered_cell(ckey, *m_cdef, desc.serialize(*m_type));
}
// fm = {0: {a: 0, b: 0}}
mut.set_clustered_cell(ckey, *fm_cdef, atomic_cell::make_live(*fm_type, write_timestamp, fm_type->decompose(fm_val)));
// mm[0] = {0: {a: 0, b: 0}},
{
collection_mutation_description desc;
desc.cells.emplace_back(int32_type->decompose(0),
atomic_cell::make_live(*fm_type, write_timestamp, fm_type->decompose(fm_val), atomic_cell::collection_member::yes));
mut.set_clustered_cell(ckey, *mm_cdef, desc.serialize(*mm_type));
}
// fmm = {0: {0: {a: 0, b: 0}}},
mut.set_clustered_cell(ckey, *fmm_cdef, atomic_cell::make_live(*fmm_type, write_timestamp, fmm_type->decompose(fmm_val)));
// s = s + {{a: 0, b: 0}},
{
collection_mutation_description desc;
desc.cells.emplace_back(ut->decompose(ut_val),
atomic_cell::make_live(*bytes_type, write_timestamp, bytes{}, atomic_cell::collection_member::yes));
mut.set_clustered_cell(ckey, *s_cdef, desc.serialize(*s_type));
}
// fs = {{a: 0, b: 0}},
mut.set_clustered_cell(ckey, *fs_cdef, atomic_cell::make_live(*fs_type, write_timestamp, fs_type->decompose(fs_val)));
// l[scylla_timeuuid_list_index(7fb27e80-7b12-11ea-9fad-f4d108a9e4a3)] = {a: 0, b: 0},
{
collection_mutation_description desc;
desc.cells.emplace_back(timeuuid_type->decompose(utils::UUID("7fb27e80-7b12-11ea-9fad-f4d108a9e4a3")),
atomic_cell::make_live(*ut, write_timestamp, ut->decompose(ut_val), atomic_cell::collection_member::yes));
mut.set_clustered_cell(ckey, *l_cdef, desc.serialize(*l_type));
}
// fl = [{a: 0, b: 0}]
mut.set_clustered_cell(ckey, *fl_cdef, atomic_cell::make_live(*fl_type, write_timestamp, fl_type->decompose(fl_val)));
sstable_assertions sst(s, LEGACY_UDT_IN_COLLECTION_PATH);
sst.load();
assert_that(sst.read_rows_flat()).produces(mut).produces_end_of_stream();
}


@@ -0,0 +1 @@
3519784297


@@ -0,0 +1,9 @@
Scylla.db
CRC.db
Filter.db
Statistics.db
TOC.txt
Digest.crc32
Index.db
Summary.db
Data.db


@@ -0,0 +1,9 @@
CompressionInfo.db
Filter.db
Data.db
Statistics.db
TOC.txt
Digest.crc32
Scylla.db
Index.db
Summary.db

Some files were not shown because too many files have changed in this diff.