Said test creates two vectors, the vector storage being allocated with
the default allocator, while its content being allocated on LSA. If an
exception is thrown however, both are freed via the default allocator,
triggering an assert in LSA code. Move the cleanup into a `defer()` so
the correct cleanup sequence is executed even on exceptions.
`_request` performed a GET request and extracted a text body out of the
response.
Split it into `_get`, which only performs the request, and `_get_text`,
which calls `_get` and extracts the body as text.
Also extract a `_resource_uri` function which will be used for other
request types.
Add a suite which is basically equivalent to `topology` except that it
doesn't start servers with Raft enabled.
The suite will be used to test the Raft upgrade procedure.
The suite contains a basic test just to check the suite itself can run;
the test will be removed when 'real' tests are added.
Closes#11487
* github.com:scylladb/scylladb:
test.py: PythonTestSuite: sum default config params with user-provided ones
test: add a topology suite with Raft disabled
test: pylib: use Python dicts to manipulate `ScyllaServer` configuration
test: pylib: store `config_options` in `ScyllaServer`
Due to slow debug machines timing out, bump up all timeouts
significantly.
The cause was ExecutionProfile request_timeout. Also set a high
heartbeat timeout and bump already set timeouts to be safe, too.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#11516
This series introduces two configurable options when working with TWCS tables:
- `restrict_twcs_default_ttl` - a LiveUpdate-able tri_mode_restriction which defaults to WARN and will notify the user whenever a TWCS table is created without a `default_time_to_live` setting
- `twcs_max_window_count` - Which forbids the user from creating TWCS tables whose window count (buckets) are past a certain threshold. We default to 50, which should be enough for most use cases, and a setting of 0 effectively disables the check.
Refs: #6923Fixes: #9029Closes#11445
* github.com:scylladb/scylladb:
tests: cql_query_test: add mixed tests for verifying TWCS guard rails
tests: cql_query_test: add test for TWCS window size
tests: cql_query_test: add test for TWCS tables with no TTL defined
cql: add configurable restriction of default_time_to_live when for TimeWindowCompactionStrategy tables
cql: add max window restriction for TimeWindowCompactionStrategy
time_window_compaction_strategy: reject invalid window_sizes
cql3 - create/alter_table_statement: Make check_restricted_table_properties accept a schema_ptr
`feature_service` provided two sets of features: `known_feature_set` and
`supported_feature_set`. The purpose of both and the distinction between
them was unclear and undocumented.
The 'supported' features were gossiped by every node. Once a feature is
supported by every node in the cluster, it becomes 'enabled'. This means
that whatever piece of functionality is covered by the feature, it can
by used by the cluster from now on.
The 'known' set was used to perform feature checks on node start; if the
node saw that a feature is enabled in the cluster, but the node does not
'know' the feature, it would refuse to start. However, if the feature
was 'known', but wasn't 'supported', the node would not complain. This
means that we could in theory allow the following scenario:
1. all nodes support feature X.
2. X becomes enabled in the cluster.
3. the user changes the configuration of some node so feature X will
become unsupported but still known.
4. The node restarts without error.
So now we have a feature X which is enabled in the cluster, but not
every node supports it. That does not make sense.
It is not clear whether it was accidental or purposeful that we used the
'known' set instead of the 'supported' set to perform the feature check.
What I think is clear, is that having two sets makes the entire thing
unnecessarily complicated and hard to think about.
Fortunately, at the base to which this patch is applied, the sets are
always the same. So we can easily get rid of one of them.
I decided that the name which should stay is 'supported', I think it's
more specific than 'known' and it matches the name of the corresponding
gossiper application state.
Closes#11512
Add a suite which is basically equivalent to `topology` except that it
doesn't start servers with Raft enabled.
The suite will be used to test the Raft upgrade procedure.
The suite contains a basic test just to check the suite itself can run;
the test will be removed when 'real' tests are added.
Previously we used a formattable string to represent the configuration;
values in the string were substituted by Python's formatting mechanism
and the resulting string was stored to obtain the config file.
This approach had some downsides, e.g. it required boilerplate work to
extend: to add a new config options, you would have to modify this
template string.
Instead we can represent the configuration as a Python dictionary. Dicts
are easy to manipulate, for example you can sum two dicts; if a key
appears in both, the second dict 'wins':
```
{1:1} | {1:2} == {1:2}
```
This makes the configuration easy to extend without having to write
boilerplate: if the user of `ScyllaServer` wants to add or override a
config option, they can simply add it to the `config_options` dict and
that's it - no need to modify any internal template strings in
`ScyllaServer` implementation like before. The `config_options` dict is
simply summed with the 'base' config dict of `ScyllaServer`
(`config_options` is the right summand so anything in there overrides
anything in the base dict).
An example of this extensibility is the `authenticator` and `authorizer`
options which no longer appear in `scylla_cluster.py` module after this
change, they only appear in the suite.yaml file.
Also, use "workdir" option instead of specifying data dir, commitlog
dir etc. separately.
Previously the code extracted `authenticator` and `authorizer` keys from
the config options and stored them.
Store the entire dict instead. The new code is easier to extend if we
want to make more options configurable.
"
The test in question plays with snitches to simulate the topology
over which tokens are spread. This set replaces explicit snitch
usage with temporary topology object.
Some snitch traces are still left, but those are for token_metadata
internal which still call global snitch for DC/RACK.
"
* 'br-tests-use-topology-not-snitch' of https://github.com/xemul/scylla:
network_topology_strategy_test: Use topology instead of snitch
network_topology_strategy_test: Populate explicit topology
dirty_memory_manager tracks lsa regions (memtables) under region_group:s,
in order to be able to pick up the largest memtable as a candidate for
flushing.
Just as region_group:s contain regions, they can also contain other
region_group:s in a nested structure. It also tracks the nested region_group
that contains the largest region in a binomial heap.
This latter facility is no longer used. It saw use when we had the system
dirty_memory_manager nested under the user dirty_memory_manager, but
that proved too complicated so it was undone. We still nest a virtual
region_group under the real region_group, and in fact it is the
virtual region_group that holds the memtables, but it is accessed
directly to find the largest memtable (region_group::get_largest_region)
and so all the mechanism that sorts region_group:s is bypassed.
Start to dismantle this house of cards by removing the subgroup
sorting. Since the hierarchy has exactly one parent and one child,
it's clearly useless. This is seen by the fact that we can just remove
everything related.
We still need the _subgroups member to hold the virtual region_group;
it's replaced by a vector. I verified that the non-intrusive vector
is exception safe since push_back() happens at the very end; in any
case this is early during setup where we aren't under memory pressure.
A few tests that check the removed functionality are deleted.
Closes#11515
Compaction group can be defined as a set of files that can be compacted together. Today, all sstables belonging to a table in a given shard belong to the same group. So we can say there's one group per table per shard. As we want to eventually allow isolation of data that shouldn't be mixed, e.g. data from different vnodes, then we want to have more than one group per table per shard. That's why compaction groups is being introduced here.
Today, all memtables and sstables are stored in a single structure per table. After compaction groups, there will be memtables and sstables for each group in the table.
As we're taking an incremental approach, table still supports a single group. But work was done on preparing table for supporting multiple groups. Completing that work is actually the next step. Also, a procedure for deriving the group from token is introduced, but today it always return the single group owned by the table. Once multiple groups are supported, then that procedure should be implemented to map a token to a group.
No semantics was changed by this series.
Closes#11261
* github.com:scylladb/scylladb:
replica: Move memtables to compaction_group
replica: move compound SSTable set to compaction group
replica: move maintenance SSTable set to compaction_group
replica: move main SSTable set to compaction_group
replica: Introduce compaction_group
replica: convert table::stop() into coroutine
compaction_manager: restore indentation
compaction_manager: Make remove() and stop_ongoing_compactions() noexcept
test: sstable_compaction_test: Don't reference main sstable set directly
test: sstable_utils: Set data size fields for fake SSTable
test: sstable_compaction_test: remove needless usage of column_family_test::add_sstable
Task manager for observing and managing long-running, asynchronous tasks in Scylla
with the interface for the user. It will allow listing of tasks, getting detailed
task status and progression, waiting for their completion, and aborting them.
The task manager will be configured with a “task ttl” that determines how long
the task status is kept in memory after the task completes.
At first it will support repair and compaction tasks, and possibly more in the future.
Currently:
Sharded `task_manager` is started in `main.cc` where it is further passed
to `http_context` for the purpose of user interface.
Task manager's tasks are implemented in two two layers: the abstract
and the implementation one. The latter is a pure virtual class which needs
to be overriden by each module. Abstract layer provides the methods that
are shared by all modules and the access to module-specific methods.
Each module can access task manager, create and manage its tasks through
`task_manager::module` object. This way data specific to a module can be
separated from the other modules.
User can access task manager rest api interface to track asynchronous tasks.
The available options consist of:
- getting a list of modules
- getting a list of basic stats of all tasks in the requested module
- getting the detailed status of the requested task
- aborting the requested task
- waiting for the requested task to finish
To enable testing of the provided api, test specific task implementation and module
are provided. Their lifetime can be simulated with the standalone test api.
These components are compiled and the tests are run in all but release build modes.
Fixes: #9809Closes#11216
* github.com:scylladb/scylladb:
test: task manager api test
task_manager: test api layer implementation
task_manager: add test specific classes
task_manager: test api layer
task_manager: api layer implementation
task_manager: api layer
task_manager: keep task_manager reference in http_context
start sharded task manager
task_manager: create task manager object
This patch adds set of 10 cenarios that have been unveiled during additional testing.
In particular, most of the scenarios cover ALTER TABLE statements, which - if not handled -
may break the guardrails safe-mode. The situations covered are:
- STCS->TWCS with no TTL defined
- STCS->TWCS with small TTL
- STCS->TWCS with large TTL value
- TWCS table with small to large TTL
- No TTL TWCS to large TTL and then small TTL
- twcs_max_window_count LiveUpdate - Decrease TTL
- twcs_max_window_count LiveUpdate - Switch CompactionStrategy
- No TTL TWCS table to STCS
- Large TTL TWCS table, modify attribute other than compaction and default_time_to_live
- Large TTL STCS table, fail to switch to TWCS with no TTL explicitly defined
This patch adds a test for checking the validity of tables using TimeWindowCompactionStrategy
with an incorrect number of compaction windows.
The twcs_max_window_count LiveUpdate-able parameter is also disabled during the execution of the
test in order to ensure that users can effectively disable the enforcement, should they want.
This patch adds a testcase for TimeWindowCompactionStrategy tables created with no
default_time_to_live defined. It makes use of the LiveUpdate-able restrict_twcs_default_ttl
parameter in order to determine whether TWCS tables without TTL should be forbidden or not.
The test replays all 3 possible variations of the tri_mode_restriction and verifies tables
are correctly created/altered according to the current setting on the replica which receives
the request.
Now memtables live in compaction_group. Also introduced function
that selects group based on token, but today table always return
the single group managed by it. Once multiple groups are supported,
then the function should interpret token content to select the
group.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This commit is restricted to moving maintenance set into compaction_group.
Next, we'll introduce compound set into it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Preparatory change for main sstable set to be moved into compaction
group. After that, tests can no longer direct access the main
set.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
So methods that look at data size and require it to be higher than 0
will work on fake SSTables created using set_values().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
column_family_test::add_sstable will soon be changed to run in a thread,
and it's not needed in this procedure, so let's remove its usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Otherwise it crashes some python versions.
The cast was there before a2dd64f68f
explicitly dropped one while moving the code between files.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#11511
Broadcast tables are tables for which all statements are strongly
consistent (linearizable), replicated to every node in the cluster and
available as long as a majority of the cluster is available. If a user
wants to store a “small” volume of metadata that is not modified “too
often” but provides high resiliency against failures and strong
consistency of operations, they can use broadcast tables.
The main goal of the broadcast tables project is to solve problems which
need to be solved when we eventually implement general-purpose strongly
consistent tables: designing the data structure for the Raft command,
ensuring that the commands are idempotent, handling snapshots correctly,
and so on.
In this MVP (Minimum Viable Product), statements are limited to simple
SELECT and UPDATE operations on the built-in table. In the future, other
statements and data types will be available but with this PR we can
already work on features like idempotent commands or snapshotting.
Snapshotting is not handled yet which means that restarting a node or
performing too many operations (which would cause a snapshot to be
created) will give incorrect results.
In a follow-up, we plan to add end-to-end Jepsen tests
(https://jepsen.io/). With this PR we can already simulate operations on
lists and test linearizability in linear complexity. This can also test
Scylla's implementation of persistent storage, failure detector, RPC,
etc.
Design doc: https://docs.google.com/document/d/1m1IW320hXtsGulzSTSHXkfcBKaG5UlsxOpm6LN7vWOc/edit?usp=sharingCloses#11164
* github.com:scylladb/scylladb:
raft: broadcast_tables: add broadcast_kv_store test
raft: broadcast_tables: add returning query result
raft: broadcast_tables: add execution of intermediate language
raft: broadcast_tables: add compilation of cql to intermediate language
raft: broadcast_tables: add definition of intermediate language
db: system_keyspace: add broadcast_kv_store table
db: config: add BROADCAST_TABLES feature flag
- To isolate the different pytest suites, remove the top level conftest
and move needed contents to existing `test/pylib/cql_repl/conftest.py`
and `test/topology/conftest.py`.
- Add logging to CQL and Python suites.
- Log driver version for CQL and topology tests.
Closes#11482
* github.com:scylladb/scylladb:
test.py: enable log capture for Python suite
test.py: log driver name/version for cql/topology
test.py: remove top level conftest.py
Test queries scylla with following statements:
* SELECT value FROM system.broadcast_kv_store WHERE key = CONST;
* UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST;
* UPDATE system.broadcast_kv_store SET value = CONST WHERE key = CONST IF value = CONST;
where CONST is string randomly chosen from small set of random strings
and half of conditional updates has condition with comparison to last
written value.
We decided to extend `cql_statement` hierarchy with `strongly_consistent_modification_statement`
and `strongly_consistent_select_statement`. Statements operating on
system.broadcast_kv_store will be compiled to these new subclasses if
BROADCAST_TABLES flag is enabled.
If the query is executed on a shard other than 0 it's bounced to that shard.
Change a8ad385ecd introduced
```
thread_local std::unordered_map<utils::UUID, seastar::lw_shared_ptr<repair_history_map>> repair_history_maps;
```
We're trying to avoid global scoped variables as much as we can so this should probably be embedded in some sharded service.
This series moves the thread-local `repair_history_maps` instances to `compaction_manager`
and passes a reference to the shard compaction_manager to functions that need it for compact_for_query
and compact_for_compaction.
Since some paths don't need it and don't have access to the compactio_manager,
the series introduced `utils::optional_reference<T>` that allows to pass nullopt.
In this case, `get_gc_before_for_key` behaves in `tombstone_gc_mode::repair` as if the table wasn't repaired and tombstones are not garbage-collected.
Fixes#11208Closes#11366
* github.com:scylladb/scylladb:
tombstone_gc: deglobalize repair_history_maps
mutation_compactor: pass tombstone_gc_state to compact_mutation_state
mutation_partition: compact_for_compaction_v2: get tombstone_gc_state
mutation_partition: compact_for_compaction: get tombstone_gc_state
mutation_readers: pass tombstone_gc_state to compating_reader
sstables: get_gc_before_*: get tombstone_gc_state from caller
compaction: table_state: add virtual get_tombstone_gc_state method
db: view: get_tombstone_gc_state from compaction_manager
db: view: pass base table to view_update_builder
repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state for db compaction_manager
replica: database: get_tombstone_gc_state from compaction_manager
compaction_manager: add tombstone_gc_state
replica: table: add get_compaction_manager function
tombstone_gc: introduce tombstone_gc_state
repair_service: simplify update_repair_time error handling
tombstone_gc: update_repair_time: get table_id rather than schema_ptr
tombstone_gc: delete unused forward declaration
database: do not drop_repair_history_map_for_table in detach_column_family
Remove top level conftest so different suites have their own (as it was
before).
Move minimal functionality into existing test/pylib/cql_repl/conftest.py
so cql tests can run on their own.
Move param setting into test/topology/conftest.py.
Use uuid module for unique keyspace name for cql tests.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Before/after test checks are done per test case, there's no longer need
to check after pytest finishes.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#11489
When importing from `pylib`, don't modify `sys.path` but use the fact
that both `test/` and `test/pylib/` directories contain an `__init__.py`
file, so `test.pylib` is a valid module if we start with `test/` as the
Python package root.
Both `pytest` and `mypy` (and I guess other tools) understand this
setup.
Also add an `__init__.py` to `test/topology/` so other modules under the
`test/` directory will be able to import stuff from `test/topology/`
(i.e. from `test.topology.X import Y`).
Closes#11467
I created new issues for each missing field in DescribeTable's
response for GSIs and LSIs, so in this patch we edit the xfail
messages in the test to refer to these issues.
Additionally, we only had a test for these fields for GSIs, so this
patch also adds a similar test for LSIs. I turns out there is a
difference between the two tests - the two fields IndexStatus and
ProvisionedThroughput are returned for GSIs, but not for LSIs.
Refs #7750
Refs #11466
Refs #11470
Refs #11471
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11473
and override it in table::table_state to get the tombstone_gc_state
from the table's compaction_manager.
It is going to be used in the next patched to pass the gc state
from the compaction_strategy down to sstables and compaction.
table_state_for_test was modified to just keep a null
tombstone_gc_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Most of the test's cases use rack-inferring snitch driver and get
DC/RACK from it via the test_dc_rack() helper. The helper was introduced
in one of the previous sets to populate token metadata with some DC/RACK
as normal tokens manipulations required respective endpoint in topology.
This patch removes the usage of global snitch and replaces it with the
pre-populated topology. The pre-population is done in rack-inferring
snitch like manner, since token_metadata still uses global snitch and
the locations from snitch and this temporary topology should match.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a test case that makes its own snitch driver that generates
pre-claculated DC/RACK data for test endpoints. This patch replaces this
custom snitch driver with a standalone topology object.
Note: to get DC/RACK info from this topo the get_location() is used
since the get_rack()/get_datacenter() are still wrappers around global
snitch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add experimental flag 'broadcast-tables' for enabling BROADCAST_TABLES feature.
This feature requires raft group0, thus enabling it without RAFT will cause an error.
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op, there is nothing to do there.
Also add two tests: one targeted to this bug and another one testing the
crawling reader with random mutations generated for random schema.
Fixes: #11421Closes#11422
this setting was removed back in
dcdd207349, so despite that we are still
passing `storage_service_config` to the ctor of `storage_service`,
`storage_service::storage_service()` just drops it on the floor.
in this change, `storage_service_config` class is removed, and all
places referencing it are updated accordingly.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes#11415
"
The topology object maintains all sort of node/DC/RACK mappings on
board. When new entries are added to it the DC and RACK are taken
from the global snitch instance which, in turn, checks gossiper,
system keyspace and its local caches.
This set make topology population API require DC and RACK via the
call argument. In most of the cases the populating code is the
storage service that knows exactly where to get those from.
After this set it will be possible to remove the dependency knot
consiting of snitch, gossiper, system keyspace and messaging.
"
* 'br-topology-dc-rack-info' of https://github.com/xemul/scylla:
toplogy: Use the provided dc/rack info
test: Provide testing dc/rack infos
storage_service: Provide dc/rack for snitch reconfiguration
storage_service: Provide dc/rack from system ks on start
storage_service: Provide dc/rack from gossiper for replacement
storage_service: Provide dc/rack from gossiper for remotes
storage_service,dht,repair: Provide local dc/rack from system ks
system_keyspace: Cache local dc-rack on .start()
topology: Some renames after previous patch
topology: Require entry in the map for update_normal_tokens()
topology: Make update_endpoint() accept dc-rack info
replication_strategy: Accept dc-rack as get_pending_address_ranges argument
dht: Carry dc-rack over boot_strapper and range_streamer
storage_service: Make replacement info a real struct
Test teardown involves dropping the test keyspace. If there are stopped servers occasionally we would see timeouts.
Start stopped servers after a test is finished (and passed).
Revert previous commit making teardown async again.
Closes#11412
* github.com:scylladb/scylladb:
test.py: restart stopped servers before teardown...
Revert "test.py: random tables make DDL queries async"