Commit Graph

31703 Commits

Author SHA1 Message Date
Konstantin Osipov
dbdfac7a0f test.py: improve logging of ScyllaServer 2022-06-28 18:22:01 +03:00
Konstantin Osipov
c4bee2860b test.py: use test uname, not id, in log output
Clarify "Test was cancelled" error message.
2022-06-28 18:22:01 +03:00
Konstantin Osipov
b502dd3c4e test.py: support --log-level option and pass it to logging module
scylla-python driver and scylla_server.py can be more verbose
at higher log levels, allow specifying the log level from the command
line.
2022-06-28 18:22:01 +03:00
Konstantin Osipov
ad7423649f test.py: make ScyllaServer more reliable and fast
1) Stick to the specific server in control connections.
It could happen that, when starting a cluster and checking
if a specific node is up, the check would actually execute
against an already running node. Prevent this from happening
by setting a white list connection balancing policy for control
connections.

2) When checking if CQL is up, ignore timeout errors
Scylla in debug mode can easily time out on a DDL query,
and the timeout error at start up would lead to the entire cluster
marked as broken. This is too harsh, allow timeouts at start.

3) No longer force schema migration when starting the server
By default, Raft is on, so the nodes are getting schema
through Raft leader. Schema migration significantly slows
down cluster start in debug mode (60 seconds -> 100 seconds),
and even though it was a great test that helped discover
several bugs in Scylla, it shouldn't be part of normal
cluster boot, so disable it.
2022-06-28 18:21:25 +03:00
Konstantin Osipov
867d5b4eda test.py: don't capture pytest output
Let print()s inside pytest tests go into pytest logs. This simplifies
debugging, especially if someone is not familiar with pytest.
2022-06-28 17:46:47 +03:00
Konstantin Osipov
fd3d08e560 test.py: add type annotations
Add type annotations where possible.
2022-06-28 17:46:47 +03:00
Konstantin Osipov
2470b1d888 test.py: convert log_filename to pathlib 2022-06-28 17:46:47 +03:00
Konstantin Osipov
20070e2d89 test.py: please linter
Rename the static method to not collied with a member variable
with the same name.
2022-06-28 17:46:46 +03:00
Konstantin Osipov
fc232099bb test.py: remove mypy warnings 2022-06-28 17:46:46 +03:00
Botond Dénes
6c818f8625 Merge 'sstables: generation_type tidy-up' from Michael Livshin
- Use `sstables::generation_type` in more places
- Enforce conceptual separation of `sstables::generation_type` and `int64_t`
- Fix `extremum_tracker` so that `sstables::generation_type` can be non-default-constructible

Fixes #10796.

Closes #10844

* github.com:scylladb/scylla:
  sstables: make generation_type an actual separate type
  sstables: use generation_type more soundly
  extremum_tracker: do not require default-constructible value types
2022-06-28 08:50:12 +03:00
Calle Wilund
688fd31e64 commitlog: Add counters for actual pending allocations + segment wait
Fixes #9367

The CL counters pending_allocations and requests_blocked_memory are
exposed in graphana (etc) and often referred to as metrics on whether
we are blocking on commit log. But they don't really show this, as
they only measure whether or not we are blocked on the memory bandwidth
semaphore that provides rate back pressure (fixed num bytes/s - sortof).

However, actual tasks in allocation or segment wait is not exposed, so
if we are blocked on disk IO or waiting for segments to become available,
we have no visible metrics.

While the "old" counters certainly are valid, I have yet to ever see them
be non-zero in modern life.

Closes #9368
2022-06-28 08:36:27 +03:00
Nadav Har'El
e22364dcc5 doc, alternator: split "experimental" features from "unimplemented" ones
Currently in docs/alternator/compatibility.md experimental features
and unimplemented features are bunched together under one heading
("unimplemented features"). In this patch we separate them into two
sections. This makes the "unimplemented features" section shorter,
and also allows us to link to the new "experimental features" section
separately.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10893
2022-06-28 08:08:50 +03:00
Tomasz Grabiec
1a9d1d380a Reads from cache lack preemption check when scanning over range tombstones
A scan over range tombstones will ignore preemption, which may cause
reactor stalls or read failure due to std::bad_alloc.

This is a regression introduced in
5e97fb9fc4.  _lower_bound_changed was
always set to false, which is later checked at preemption point and
inhibits yielding.

Closes #10900
2022-06-28 06:58:48 +03:00
Avi Kivity
7b37e02aa7 Update seastar submodule
* seastar ff46af9ae0...9c016aeebf (8):
  > Merge "Handle overflow in token bucket replenisher" from Pavel E
Fixes #10743
Fixes #10846
  > abort_source: request_abort: restore legacy no-args method
  > configure.py: do not use distutils
  > configure.py: drop unused "import sys"
  > Revert "Use recv syscall instead of read in do_read_some()"
  > Use recv syscall instead of read in do_read_some()
  > Merge 'Add initial support for websocket protocol' from Andrzej Stalke
  > Merge 'abort_source: request_abort: allow passing exception to subscribers' from Benny Halevy

Closes #10898
2022-06-27 23:11:56 +03:00
Kamil Braun
ff4ecfa182 dht: boot_strapper: check if keyspace still exists in bootstrap
While we're iterating over the fetched keyspace names, some of these
keyspaces may get dropped. Handle that by checking if the keyspace still
exists.

Also, when retrieving the replication strategy from the keyspace, store
the pointer (which is an `lw_shared_ptr`) to the strategy to keep it
alive, in case the keyspace that was holding it gets dropped.

Closes #10861
2022-06-27 19:13:46 +02:00
Asias He
d3c6e72c69 repair: Allow abort repair jobs in early stage
Consider this:

- User starts a repair job with http api
- User aborts all repair
- The repair_info object for the repair job is created
- The repair job is not aborted

In this patch, the repair uuid is recorded before repair_info object is
created, so that repair can now abort repair jobs in the early stage.

Fixes #10384

Closes #10428
2022-06-27 16:39:36 +03:00
Pavel Emelyanov
f3841c1b45 exceptions: Define operator<< for exception_code
Otherwise cql_transport::additional_options_for_proto_ext() complains
about inability to format the enum class value

Introduced by efc3953c (transport: add rate_limit_error)
Fmt version 8.1.1-5.fc35, fresher one must have it out of the box

Fixes #10884

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220627052703.32024-1-xemul@scylladb.com>
2022-06-27 14:49:58 +03:00
Avi Kivity
3131cbea62 Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes
Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into query result set, this mandates the query to be stopped on a live row on the replica, lest any dead rows or tombstones processed after the live rows, would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the data-set becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters.

Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933

Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.

Closes #10829

* github.com:scylladb/scylla:
  query: have replica provide the last position
  idl/query: add last_position to query_result
  mutlishard_mutation_query: propagate compaction state to result builder
  multishard_mutation_query: defer creating result builder until needed
  querier: use full_position instead of ad-hoc struct
  querier: rely on compactor for position tracking
  mutation_compactor: add current_full_position() convenience accessor
  mutation_compactor: s/_last_clustering_pos/_last_pos/
  mutation_compactor: add state accessor to compact_mutation
  introduce full_position
  idl: move position_in_partition into own header
  service/paging: use position_in_partition instead of clustering_key for last row
  alternator/serialization: extract value object parsing logic
  service/pagers/query_pagers.cc: fix indentation
  position_in_partition: add to_string(partition_region) and parse_partition_region()
  mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh
2022-06-27 12:23:21 +03:00
Benny Halevy
81fa1ce9a1 Revert 'Compact staging sstables'
This patch reverts the following patches merged in
78750c2e1a "Merge 'Compact staging sstables' from Benny Halevy"

> 597e415c38 "table: clone staging sstables into table dir"
> ce5bd505dc "view_update_generator: discover_staging_sstables: reindent"
> 59874b2837 "table: add get_staging_sstables"
> 7536dd7f00 "distributed_loader: populate table directory first"

The feature causes regressions seen with e.g.
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/41/testReport/materialized_views_test/TestMaterializedViews/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split011___test_base_replica_repair/
```
AssertionError: Expected [[0, 0, 'a', 3.0]] from SELECT * FROM t_by_v WHERE v = 0, but got []
```

Where views aren't updated properly.
Apparently since `table::stream_view_replica_updates`
doesn't exclude the staging sstables anymore and
since they are cloned to the base table as new sstables
it seems to the view builder that no view updates are
required since there's no changes comparing to the base table.

Reopens #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10890
2022-06-27 12:18:48 +03:00
Benny Halevy
8bccd5e9c5 compaction_manager: task: acquire_semaphore: handle abort_requested_exception
Change 8f39547d89 added
`handle_exception_type([] (const semaphore_aborted& e) {})`,
but it turned out that `named_semaphore_aborted` isn't
derived from `semaphore_aborted`, but rather from
`abort_requested_exception` so handle the base exception
instead.

Fixes #10666

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10881
2022-06-27 09:47:48 +03:00
Pavel Emelyanov
708d3a1ea4 cql-test: Initialize cluster port as integer
Otherwise it complains like this:

_________ ERROR at setup of test_allow_filtering_indexed_no_filtering __________

request = <SubRequest 'cql' for <Function test_allow_filtering_indexed_no_filtering>>

    @pytest.fixture(scope="session")
    def cql(request):
        [...<snip>...]
>       return cluster.connect()

test/cql-pytest/conftest.py:66:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cassandra/cluster.py:1708: in cassandra.cluster.Cluster.connect
    ???
cassandra/cluster.py:1765: in cassandra.cluster.Cluster._new_session
    ???
cassandra/cluster.py:2563: in cassandra.cluster.Session.__init__
    ???
cassandra/pool.py:203: in cassandra.pool.Host.__str__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   TypeError: %d format: a real number is required, not str

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-27 09:00:48 +03:00
Benny Halevy
751eceb2e6 types: time_point_to_string: use numeric formatting rather than chrono-format specifiers
As reported in #10867, newer versions of the fmt library
format %Y using 4-characters width, 0-padding the prefix
when needed, while older versions don't do that.

This change moves away from using %Y and friends
fmt specifiers to using explicit numeric-based formatting
conforming to ISO 8601 and making sure the year field
has at least 4 digits and is zero padded.  When
negative, the width is upped to 5 so it would show as -0001
rather than -001.

The unit test was updated respectively.

Fixes #10867

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10870
2022-06-27 08:28:56 +03:00
Benny Halevy
9c231ad0ce repair_reader: construct _reader_handle before _reader
Currently, the `_reader` member is explicitly
initialized with the result of the call to `make_reader`.
And `make_reader`, as a side effect, assigns a value
to the `_reader_handle` member.

Since C++ initializes class members sequentially,
in the order they are defined, the assignment to `_reader_handle`
in `make_reader()` happens before `_reader_handle` is initialized.

This patch fixes that by changing the definition order,
and consequently, the member initialization order
in the constructor so that `_reader_handle` will be (default-)initialized
before the call to `make_reader()`, avoiding the undefined behavior.

Fixes #10882

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10883
2022-06-26 20:17:47 +03:00
Amnon Heiman
6b9b76c919 main.cc: add trailing backslash to the API directories
The API uses the http server to serve two directories: the api_ui_dir
where the swagger-ui directory is found and the api_doc_dir where the
swagger definition files are found.

Internally, the API uses the httpd::directory_handler that append the
files it gets from the path to the base directory name.
A user can override the default configuration and set a directory name
that will not end with a backslash. This will result with files not
found.

This patch check if that backslash is missing, and if it is, adds it to
the API configuration.

Fixes #10700

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes #10877
2022-06-26 20:05:37 +03:00
Piotr Sarna
f2bb676d27 docs: mention python in debugging.md
Evaluating Python code from within gdb is priceless,
especially that all helper classes and functions sourced from
scylla-gdb.py can be used in there. This commit adds a paragraph
in debugging.md mentioning this tool.

Closes #10869
2022-06-24 15:16:43 +03:00
Pavel Emelyanov' via ScyllaDB development
a78af050fd cql: Constify select_statement restrictions
It is in fact immutable (both the pointer and the object it points
to), so is the pointer copy returned by get_restrictions() method,
so are those propagated to filtering stuff.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1028

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220624083351.24970-1-xemul@scylladb.com>
2022-06-24 12:27:36 +03:00
Nadav Har'El
905088ce7a cql: improve error message for static column in materialized view
Static columns are not currently allowed in a materialized view. If the
base table has a static column and one tries to create a view with a
"SELECT *", the following error message is printed today:

  Unable to include static column 'ColumnDefinition{name=s,
  type=org.apache.cassandra.db.marshal.Int32Type, kind=STATIC,
  componentIndex=null, droppedAt=-9223372036854775808}' which would
  be included by Materialized View SELECT * statement

It is completely unnecessary to include all these details about the
column definition - just its name would have sufficed. In other words,
we should print def.name_as_text(), not the entire def. This is what
other error messages in the same file do as well.

After this patch the error message becomes nicer and clearer:

  Unable to include static column 's' which would be included by
  Materialized View SELECT * statement

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #10854
2022-06-24 11:19:33 +03:00
Botond Dénes
78750c2e1a Merge 'Compact staging sstables' from Benny Halevy
This series decouples the staging sstables from the table's sstable set.

The current behavior keeps the sstables in the staging directory until view building is done. They are readable as any other sstable, but fenced off from compaction, so they don't go away in the meanwhile.

Currently, when views are built, the sstables are moved into the main table directory where they will then be compacted normally.

The problem with this design is that the staging sstables are never compacted, in particular they won't get cleaned up or scrubbed.

The cleanup scenario open a backdoor for data resurrection when the staging sstables are moved after view building while possibly containing stale partitions (#9559) which will not be cleaned up until next time cleanup compaction is performed.

With this series, SSTables that are created in or moved to the staging sub-directory are "cloned" into the base table directory by hard-linking the components there and creating a new sstable object which loads the cloned files.

The former, in the staging directory is used solely for view building and is not added to the table's sstable set, while the latter, its clone, behaves like any other sstable and is added either to the regular or maintenance set and is read and compacted normally.

When view building is done, instead of moving the staging sstable into the table's base directory, it is simply unlinked.
If its "clone" wasn't compacted away yet, then it will just remain where it is, exactly like it would be after it was moved there in the present state of things.  If it was already compacted and no longer exists, then unlinking will then free its storage.

Note that snapshot is based on the sstables listed by the table, which do not include the staging sstables with this change.
But that shouldn't matter since even today, the sstables in the snapshot has no notion of "staging" directory and it is expected that the MV's are either updated view `nodetool refresh` if restoring sstables from snapshot using the uploads dir, or if restoring the whole table from backup - MV's are effectively expected to be rebuilt from scratch (they are not included in automatic snapshots anyway since we don't have snapshot-coherency across tables).

A fundamental infrastructure change was done to achieve that which is to change the sstable_list which was a std::unordered_set<shared_sstable> into a std::unordered_map<generation_type, shared_sstable> that keeps the shared_sstable objects indexed by generation number (that must be unique).  With this model, sstables are supposed to be searched by the generation number, not by their pointer, since when the staging sstable is clones, there will be 2 shared_sstable objects with the same generation (and different `dir()`) and we must distinguish between them.

Special care was taken to throw a runtime_error exception if when looking up a shared sstable and finding another one with the same generation, since they must never exist in the same sstable_map.

Fixes #9559

Closes #10657

* github.com:scylladb/scylla:
  table: clone staging sstables into table dir
  view_update_generator: discover_staging_sstables: reindent
  table: add get_staging_sstables
  view_update_generator: discover_staging_sstables: get shared table ptr earlier
  distributed_loader: populate table directory first
  sstables: time_series_sstable_set: insert: make exception safe
  sstables: move_to_new_dir: fix debug log message
2022-06-24 08:05:38 +03:00
Botond Dénes
1f4f8ba773 Merge 'compaction_manager: track if off-startegy compaction was performed in run_offstrategy_compaction' from Benny Halevy
This series moves the logic to not perform off-strategy compaction if the maintenance set is empty from the table layer down to the compaction_manager layer since it is the one that needs to make the decision.

With that compaction_manager::perform_offstrategy will return a future<bool> which resolves to true
iff off-strategy compaction was required and performed.

The sstable_compaction_test was adjusted and a new compaction_manager_for_testing class was added
to make sure the compaction manager is enabled when constructed (it wasn't so test_offstrategy_sstable_compaction didn't perform any off-strategy compactions!) and stopped before destroyed.

Closes #10848

* github.com:scylladb/scylla:
  table: perform_offstrategy_compaction: move off-strategy logic to compaction_manager
  compaction_manager: offstrategy_compaction_task: refactor log printouts
  test: sstable_compaction: compaction_manager_for_testing
2022-06-24 08:04:02 +03:00
Avi Kivity
dab56b82fa Merge 'Per-partition rate limiting' from Piotr Dulikowski
Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode.

This PR introduces per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension:

```
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
	'max_writes_per_second': 100,
	'max_reads_per_second': 200
};
```

Reads and writes which are detected to go over that quota are rejected to the client using a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't really fit well with the rate limit error, so a new error code is added. This code is implemented as a part of a CQL protocol extension and returned to clients only if they requested the extension - if not, the existing CONFIG_ERROR will be used instead.

Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting rate limit being reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation into the rate limit early and return an error in case of exceeding the rate limit before sending any messages to other replicas at all.

The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here.

Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`:

- Write mode:
  ```
  8f690fdd47 (PR base):
  129644.11 tps ( 56.2 allocs/op,  13.2 tasks/op,   49785 insns/op)
  This PR:
  125564.01 tps ( 56.2 allocs/op,  13.2 tasks/op,   49825 insns/op)
  ```
- Read mode:
  ```
  8f690fdd47 (PR base):
  150026.63 tps ( 63.1 allocs/op,  12.1 tasks/op,   42806 insns/op)
  This PR:
  151043.00 tps ( 63.1 allocs/op,  12.1 tasks/op,   43075 insns/op)
  ```

Manual upgrade test:
- Start 3 nodes, 4 shards each, Scylla version 8f690fdd47
- Create a keyspace with scylla-bench, RF=3
- Start reading and writing with scylla-bench with CL=QUORUM
- Manually upgrade nodes one by one to the version from this PR
- Upgrade succeeded, apart from a small number of operations which failed when each node was being put down all reads/writes succeeded
- Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected

Fixes: #4703

Closes #9810

* github.com:scylladb/scylla:
  storage_proxy: metrics for per-partition rate limiting of reads
  storage_proxy: metrics for per-partition rate limiting of writes
  database: add stats for per partition rate limiting
  tests: add per_partition_rate_limit_test
  config: add add_per_partition_rate_limit_extension function for testing
  cf_prop_defs: guard per-partition rate limit with a feature
  query-request: add allow_limit flag
  storage_proxy: add allow rate limit flag to get_read_executor
  storage_proxy: resultize return type of get_read_executor
  storage_proxy: add per partition rate limit info to read RPC
  storage_proxy: add per partition rate limit info to query_result_local(_digest)
  storage_proxy: add allow rate limit flag to mutate/mutate_result
  storage_proxy: add allow rate limit flag to mutate_internal
  storage_proxy: add allow rate limit flag to mutate_begin
  storage_proxy: choose the right per partition rate limit info in write handler
  storage_proxy: resultize return types of write handler creation path
  storage_proxy: add per partition rate limit to mutation_holders
  storage_proxy: add per partition rate limit info to write RPC
  storage_proxy: add per partition rate limit info to mutate_locally
  database: apply per-partition rate limiting for reads/writes
  database: move and rename: classify_query -> classify_request
  schema: add per_partition_rate_limit schema extension
  db: add rate_limiter
  storage_proxy: propagate rate_limit_exception through read RPC
  gms: add TYPED_ERRORS_IN_READ_RPC cluster feature
  storage_proxy: pass rate_limit_exception through write RPC
  replica: add rate_limit_exception and a simple serialization framework
  docs: design doc for per-partition rate limiting
  transport: add rate_limit_error
2022-06-24 01:32:13 +03:00
Benny Halevy
597e415c38 table: clone staging sstables into table dir
clone staging sstables so their content may be compacted while
views are built.  When done, the hard-linked copy in the staging
subdirectory will be simply unlinked.

Fixes #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
ce5bd505dc view_update_generator: discover_staging_sstables: reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
59874b2837 table: add get_staging_sstables
We don't have to go over all sstables in the table to select the
staging sstables out of them, we can get it directly from the
_sstables_staging map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
b8b14d76b3 view_update_generator: discover_staging_sstables: get shared table ptr earlier
It's potentially a bit more efficient since
t.get_sstables is called only once, while
t.shared_from_this() is called per staging sstable.

Also, prepare for the following patches that modify
this function further.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
7536dd7f00 distributed_loader: populate table directory first
So we can clone staging sstables into it later
when populating the table from the staging_dir

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
cd68b04fbf sstables: time_series_sstable_set: insert: make exception safe
Need to erase the shared sstable from _sstables
if insertion to _sstables_reversed fails.

Fixes #10787

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Benny Halevy
9d41676116 sstables: move_to_new_dir: fix debug log message
Remove extraneous `old_dir` arg.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-23 16:55:27 +03:00
Raphael S. Carvalho' via ScyllaDB development
32600f60f3 scylla-gdb: Fix scylla_compaction_tasks
Make it account for all the changes done in the compaction manager
recently. 5.0 is not affected. So does not merit a backport.

(gdb) scylla compaction-tasks
        1 type=sstables::compaction_type::Reshard, state=compaction_manager::task::state::active, "keyspace1"."standard1"
Total: 1 instances of compaction_manager::task

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220621225600.20359-1-raphaelsc@scylladb.com>
2022-06-23 16:17:31 +03:00
Piotr Sarna
026f58f2a4 scylla-gdb: document scylla_shard
The command is quite straightforward, but it didn't offer
any documentation when calling `help scylla shard`, so it's
hereby added. As a small bonus, a more comprehensive message
is printed when the argument is not an integer.
Message-Id: <9b958a4befce1c7baa6f86504ab74b93840b37e9.1655984258.git.sarna@scylladb.com>
2022-06-23 15:24:36 +03:00
Piotr Sarna
280e54c3e6 scylla-gdb: add printing registers from seastar::thread
`scylla thread` command is extended with a non-intrusive
option for dumping saved registers from the jmp_buf structure
in an unmangled form.
It can later be useful, e.g. for peeking into thread's instruction
pointer or reasoning about its stack.

Example debugging session:
(gdb) scylla threads
[shard 1] (seastar::thread_context*) 0x6010000d9e00, stack: 0x601004f00000
[shard 1] (seastar::thread_context*) 0x6010000daf00, stack: 0x601004e00000

(gdb) scylla thread --print-regs 0x6010000d9e00
rbx: 0x601004f1fd00
rbp: 0x601004f1fc20
r12: 0x6010000d9e20
r13: 0x6010002a3190
r14: 0x601004f1fd08
r15: 0x6010000d9e10
rsp: 0x601004f1fbb0
rip: 0x2f0aea6

(gdb) disassemble 0x2f0aea6
Dump of assembler code for function _ZN7seastar12jmp_buf_link10switch_outEv:
   0x0000000002f0ae90 <+0>:	push   %rax
   0x0000000002f0ae91 <+1>:	mov    0xc8(%rdi),%rax
   0x0000000002f0ae98 <+8>:	mov    %rax,%fs:0xfffffffffffe5dc8
   0x0000000002f0aea1 <+17>:	call   0x30333d0 <_setjmp@plt>
   0x0000000002f0aea6 <+22>:	test   %eax,%eax
   0x0000000002f0aea8 <+24>:	je     0x2f0aeac <_ZN7seastar12jmp_buf_link10switch_outEv+28>
   0x0000000002f0aeaa <+26>:	pop    %rax
   0x0000000002f0aeab <+27>:	ret
   0x0000000002f0aeac <+28>:	mov    %fs:0xfffffffffffe5dc8,%rdi
   0x0000000002f0aeb5 <+37>:	mov    $0x1,%esi
   0x0000000002f0aeba <+42>:	call   0x30333c0 <longjmp@plt>
End of assembler dump.
Message-Id: <553c1ed76987776916d5261ed13866650e84df34.1655984258.git.sarna@scylladb.com>
2022-06-23 15:24:36 +03:00
Piotr Sarna
01d281442e test: extend view filtering test case
In order to cover more code paths, the test case
now places filtering on various combinations of base columns,
including both primary keys and regular columns.
It also makes the test scylla_only, as filtering is an extension
not supported in Cassandra right now.

Closes #10860
2022-06-23 14:19:41 +03:00
Botond Dénes
fd5f8f2275 query: have replica provide the last position
Use the recently introduced query-result facility to have the replica
set the position where the query should continue from. For now this is
the same as what the implicit position would have been previously (last
row in result), but it opens up the possibility to stop the query at a
dead row.
2022-06-23 13:36:24 +03:00
Botond Dénes
009d2fe2f7 idl/query: add last_position to query_result
To be used to allow the replica to specify the last position in the
stream, where the query was left off. Currently this is always
the same as the implicit position -- the last row in the result-set --
but this requires only stopping the read on a live row, which is a
requirement we want to lift: we want to be able to stop on a tombstone.
As tombstones are not included in the query result, we have to allow the
replica to overwrite the last seen position explicitly.
This patch introduces the new field in the query-result IDL but it is
not written to yet, nor is it read, that is left for the next patches.
2022-06-23 13:36:24 +03:00
Botond Dénes
7b6b7a49cd mutlishard_mutation_query: propagate compaction state to result builder
Not used in this patch, facilitates further patching.
2022-06-23 13:36:24 +03:00
Botond Dénes
738cb99c53 multishard_mutation_query: defer creating result builder until needed
Currently the result builder is created two frames above the method in
which actually needed. Push down a factory method instead and create it
where actually used. This allows us to pass it arguments that are
present only in the method which uses it.
2022-06-23 13:36:24 +03:00
Botond Dénes
5575f8a55a querier: use full_position instead of ad-hoc struct 2022-06-23 13:36:24 +03:00
Botond Dénes
58d53b66c1 querier: rely on compactor for position tracking
For some time now the compactor track its own position. The querier can
make use of this instead of duplicating this effort.
2022-06-23 13:36:24 +03:00
Botond Dénes
9beef08a1b mutation_compactor: add current_full_position() convenience accessor 2022-06-23 13:36:24 +03:00
Botond Dénes
a3cd235de2 mutation_compactor: s/_last_clustering_pos/_last_pos/
Generalize position tracking to track non-clustering positions too.
Also add an accessor for it.
2022-06-23 13:36:24 +03:00
Botond Dénes
5a6e807a1c mutation_compactor: add state accessor to compact_mutation 2022-06-23 13:36:24 +03:00