A utility which can load a schema from a schema.cql file. The file has
to contain all the "dependencies" of the table: keyspace, UDTs, etc.
This will be used by the scylla-sstable-crawler in the next patch.
"
There's a landmine buried in range_tombstone's move constructor.
Whoever uses it risks grabbing the tombstone out of its
containing list, thus leaking it and potentially invalidating an
iterator pointing at it. There's a safer without_link move
constructor out there, but the risk remains.
To keep this place safe it's better to separate range_tombstone
from its linkage into any container. In particular, to keep the
range tombstones in a range_tombstone_list, this series introduces
an entry that holds the tombstone _and_ the list hook (a
boost::intrusive set hook).
The approach resembles the rows_entry::deletable_row pair.
tests: unit(dev, debug, patch from #9207)
fixes: #9243
"
* 'br-range-tombstone-vs-entry' of https://github.com/xemul/scylla:
range_tombstone: Drop without-link constructor
range_tombstone: Drop move_assign()
range_tombstone: Move linkage into range_tombstone_entry
range_tombstone_list: Prepare to use range_tombstone_entry
range_tombstone, code: Add range_tombstone& getters
range_tombstone_list: Factor out tombstone construction
range_tombstone_list: Simplify (maybe) pop_front_and_lock()
range_tombstone_list: De-templatize pop_as<>
range_tombstone_list: Conceptualize erase_where()
range_tombstone(_list): Mark some bits noexcept
mutation: Use range_tombstone_list's iterators
mutation_partition: Shorten memory usage calculation
mutation_partition: Remove unused local variable
We were silently ignoring INSERTs with NULL values for primary-key
columns, which Cassandra rejects. Fix it by rejecting any
modification_statement that would operate on an empty partition or
clustering range.
This is the most direct fix, because range and slice are calculated in
one place for all modification statements. It covers not only NULL
cases, but also impossible restrictions like c>0 AND c<0.
Unfortunately, Cassandra doesn't treat all modification statements
consistently, so this fix cannot fully match its behavior. We err on
the side of tolerance, accepting some DELETE statements that Cassandra
rejects. We add a TODO for rejecting such DELETEs later.
Fixes #7852.
Tests: unit (dev), cql-pytest against Cassandra 4.0
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #9286
Now it's time to remove the boost set hook from range_tombstone and
wrap the tombstone in another class when it lives in a
range_tombstone_list.
The previously added .tombstone() getters and the _entry alias can
also be removed -- all the code can now work with the new class.
Two places in the code that used the without_link{} move constructor
are patched to get the range_tombstone part from the respective entry,
with the same result.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently all the code operates on the range_tombstone class,
and many of those places get the range tombstone in question
from the range_tombstone_list. The next patches will make that list
carry (and return) a new object called range_tombstone_entry,
so all the code that expects to see the former one there will
need to be patched to get the range_tombstone from the entry.
This patch prepares the ground for that by introducing the
range_tombstone& tombstone() { return *this; }
getter on range_tombstone itself and patching all future
users of the entry to call .tombstone() right now.
The next patch will remove those getters while adding the new
range_tombstone_entry object, thus automatically converting all
the patched places to use the entry properly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tests in this patch verify that null characters are valid characters
inside string and bytes (blob) attributes in Alternator. The tests
verify this for both key attributes and non-key attributes (since those
are serialized differently, it's important to check both cases).
The tests pass on both DynamoDB and Alternator - confirming that we
don't have a bug in this area.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210824163442.186881-1-nyh@scylladb.com>
This patch adds tests for two undocumented (as far as I can tell) corner
cases of CQL's string types:
1. The types "text" and "varchar" are not just similar - they are in
fact exactly the same type.
2. All CQL string and blob types ("ascii", "text" or "varchar", "blob")
allow the null character as a valid character inside them. They are
*not* C strings that get terminated by the first null.
These tests pass on both Cassandra and Scylla, so did not expose any
bug, but having such tests is useful to understand these (so-far)
undocumented behaviors - so we can later document them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210824225641.194146-1-nyh@scylladb.com>
* 'raft-misc-v3' of github.com:scylladb/scylla-dev:
raft: rename snapshot into snapshot_descriptor
raft: drop snapshot if its application failed
raft: fix local snapshot detection
raft: replication_test: store multiple snapshots in a state machine
raft: do not wait for entry to become stable before replicating it
...when using the Segment/TotalSegments option.
The requirement is not specified in the DynamoDB documentation, but is
found in DynamoDB Local:
{"__type":"com.amazon.coral.validate#ValidationException",
"message":"Exclusive start key must lie within the segment"}
Fixes #9272
Signed-off-by: Liu Lan <liulan_yewu@cmss.chinamobile.com>
Closes #9270
Restore the pre-6773563d3 behavior of demanding ALLOW FILTERING when a partition slice is requested over a potentially unlimited number of partitions. Put it behind a flag defaulting to "off" for now.
Fixes #7608; see comments there for justification.
Tests: unit (debug, dev), dtest (cql_additional_test, paging_test)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Closes #9126
* github.com:scylladb/scylla:
cql3: Demand ALLOW FILTERING for unlimited, sliced partitions
cql3: Track warnings in prepared_statement
test: Use ALLOW FILTERING more strictly
cql3: Add statement_restrictions::to_string
When a query requests a partition slice but doesn't limit the number
of partitions, require that it also says ALLOW FILTERING. Although
do_filter() isn't invoked for such queries, the performance can still
be unexpectedly slow, and we want to signal that to the user by
demanding they explicitly say ALLOW FILTERING.
Because we now reject queries that worked fine before, existing
applications can break. Therefore, the behavior is controlled by a
flag currently defaulting to off. We will default to "on" in the next
Scylla version.
Fixes #7608; see comments there for justification.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The state machine should be able to store more than one snapshot at a
time (one may be the currently used one while another is being
transferred from a leader but not applied yet).
Since io_fiber persists entries before sending out messages, even
non-stable entries will become stable before being observed by other
nodes.
This patch also moves generation of append messages into the
get_output() call because, without the change, we would lose batching,
since each advance of last_idx would generate a new append message.
This method has nothing to do with the storage service and
is only needed to move feature service options from one
method to another. This can be done by its only caller.
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210827133954.29535-1-xemul@scylladb.com>
"
This series moves the timeout parameter that is passed to most
f_m_r methods into the reader_permit. This eliminates
the need to pass the timeout around, as it's taken
from the permit when needed.
The permit timeout is updated in certain cases
when the permit/reader is paused and retrieved
later on for reuse.
Following are perf_simple_query results showing a ~1%
reduction in insns/op and a corresponding increase in tps.
$ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10
Before:
102500.38 tps ( 75.1 allocs/op, 12.1 tasks/op, 45620 insns/op)
After:
103957.53 tps ( 75.1 allocs/op, 12.1 tasks/op, 45372 insns/op)
Test: unit(dev)
DTest:
repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release)
materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release)
materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release)
migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release)
"
* tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla:
flat_mutation_reader: get rid of timeout parameter
reader_concurrency_semaphore: use permit timeout for admission
reader_concurrency_semaphore: adjust reactivated reader timeout
multishard_mutation_query: create_reader: validate saved reader permit
repair: row_level: read_mutation_fragment: set reader timeout
flat_mutation_reader: maybe_timed_out: use permit timeout
test: sstable_datafile_test: add sstable_reader_with_timeout
reader_permit: add timeout member
"
This series implements section 6.4 of the Raft PhD. It allows doing
linearisable reads on a follower, bypassing the raft log entirely.
After this series, server::read_barrier can be executed on a follower
as well as on a leader, and after it completes, the local user state
machine's state can be accessed directly.
"
* 'raft-read-v9' of github.com:scylladb/scylla-dev:
raft: test: add read_barrier test to replication_test
raft: test: add read_barrier tests to fsm_test
raft: make read_barrier work on a follower as well as on a leader
raft: add a function to wait for an index to be applied
raft: (server) add a helper to wait through uncertainty period
raft: make fsm::current_leader() public
raft: add hasher for raft::internal::tagged_uint64
serialize: add serialized for std::monostate
raft: fix indentation in applier_fiber
This patch implements a Raft extension that allows performing
linearisable reads by accessing the local state machine. The extension
is described in section 6.4 of the PhD. To sum it up: to perform a read
barrier, a follower asks the leader for the last committed index that
the leader knows about. The leader must make sure that it is still the
leader, by communicating with a quorum, before answering. When the
follower gets the index back, it waits for that index to be applied,
and with that the read_barrier invocation completes.
The patch adds three new RPCs: read_barrier, read_barrier_reply and
execute_read_barrier_on_leader. The last one is the one a follower uses
to ask a leader for the safe index it can read. The first two are used
by a leader to communicate with a quorum.
"
Factor out the replication test, make it work with different clocks,
add some features, and add a many-nodes test with steady_clock. Also
refactor the common test helper.
The many-nodes test passes for release and dev modes with a normal tick
of 100ms for up to 1000 servers. Debug mode is much slower due to lack
of optimizations, so it's only tested with smaller numbers.
Tests: unit ({dev}), unit ({debug}), unit ({release})
"
* 'raft-many-22-v12' of https://github.com/alecco/scylla: (21 commits)
raft: candidate timeout proportional to cluster size
raft: testing: many nodes test
raft: replication test: remove unused tick_all
raft: replication test: delays
raft: replication test: packet drop rpc helper
raft: replication test: connectivity configuration
raft: replication test: rpc network map in raft_cluster
raft: replication test: use minimum granularity
raft: replication test: minor: rename local to int ids
raft: replication test: fix restart_tickers when partitioning
raft: replication test: partition ranges
raft: replication test: isolate one server
raft: replication test: move objects out of header
raft: replication test: make dummy command const
raft: replication test: template clock type
raft: replication test: tick delta inside raft_cluster
raft: replication test: style - member initializer
raft: replication test: move common code out
raft: testing: refactor helper
raft: log election stages
...
Now that the timeout is stored in the reader
permit, use it for admission rather than a timeout
parameter.
Note that evictable_reader::next_partition
currently passes db::no_timeout to
resume_or_create_reader, which is propagated to
maybe_wait_readmission, but this seems to be
an oversight of the f_m_r api, which doesn't
pass a timeout to next_partition().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To avoid dueling candidates with large clusters, make the timeout
proportional to the cluster size.
Debug mode is too slow for a test of 1000 nodes so it's disabled, but
the test passes for release and dev modes.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Tests with many nodes and realistic timers and ticks.
Network delays are kept as a fraction of ticks (e.g. 20/100).
Tests with 600 or more nodes hang in debug mode.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Allow test-supplied delays for rpc communication.
Allow supplying network delay, local delay (nodes within the same
server), how many nodes are local, and an extra small delay simulating
local load.
Modify the rpc class to support delays. If delays are enabled, it no
longer directly calls the other node's server code but schedules it to
be called later. This makes the test more realistic, as in the previous
version the first candidate was always going to reach all followers
first, preventing a dueling-candidates scenario.
Previously, tickers were all scheduled at the same time, so there was no
spread of them across the tick time. Now these tickers are scheduled
with a uniform spread across this time (tick delta).
Also, previously, custom free elections used tick_all(), which
traversed _in_configuration sequentially and ticked each server. This,
combined with outbound rpc directly calling methods in the other server
without yielding, made free elections unrealistic, with the same order
every time and the first candidate always winning. This patch changes
that behavior. The free election uses normal tickers (now uniformly
distributed across the tick delta) and its loop waits for one tick
delta (yielding) and checks if there's a new leader. Also note the
order might not be the same in debug mode if more than one tick is
scheduled.
As rpc messages are sent delayed, network connectivity needs to be
checked again before calling the function on the remote side.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When partitioning, elect_new_leader restarts tickers, so don't
re-restart them in this case.
When the leader is dropped and no new leader is specified, restart
tickers before the free election.
If there is no change of leader, restart tickers.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>