Commit Graph

42620 Commits

Author SHA1 Message Date
Tomasz Grabiec
0addca88b9 tests: tablets: Check that nodes are internally balanced
Existing tests are augmented with a check which verifies that
all nodes are internally balanced.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
0e2617336a tests: tablets: Improve debuggability by showing which rows are missing 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
329342bfb2 tablets, storage_service: Support intra-node migration in move_tablet() API 2024-05-16 00:28:47 +02:00
Tomasz Grabiec
db9d3f0128 tablet_allocator: Generate intra-node migration plan
Intra-node migrations are scheduled for each node independently with
the aim to equalize per-shard tablet count on each node.

This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if
RF=N then there is not even a possibility of moving tablets around to
fix the imbalance.  The only way to bring the system to balance is to
move tablets within the nodes.

After scheduling inter-node migrations, the algorithm schedules
intra-node migrations. This means that across-node migrations can
proceed in parallel with intra-node migrations if there is free
capacity to carry them out, but across-node migrations have higher
priority.
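
The goal of equalizing per-shard tablet count on one node can be sketched as below. This is only an illustration, not the load balancer's algorithm: the function name `plan_intra_node`, the dict layout and the greedy most-loaded-to-least-loaded strategy are all assumptions made for the example.

```python
def plan_intra_node(shard_tablets):
    """Greedy sketch: move tablets from the most loaded shard to the least
    loaded shard of one node until the counts differ by at most one.

    shard_tablets maps shard id -> list of tablet ids (hypothetical layout).
    Returns a list of (tablet, src_shard, dst_shard) migrations.
    """
    load = {shard: list(tablets) for shard, tablets in shard_tablets.items()}
    plan = []
    if not load:
        return plan
    while True:
        src = max(load, key=lambda s: len(load[s]))
        dst = min(load, key=lambda s: len(load[s]))
        if len(load[src]) - len(load[dst]) <= 1:
            return plan
        tablet = load[src].pop()
        load[dst].append(tablet)
        plan.append((tablet, src, dst))


# One shard holds all six tablets, e.g. after a table grew and was split.
plan = plan_intra_node({0: ["t0", "t1", "t2", "t3", "t4", "t5"], 1: [], 2: []})
print(plan)
```

Note that the node's total tablet count never changes here, which is why inter-node migration alone cannot fix this kind of imbalance.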

Fixes #16594
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
793af3d6e1 tablet_allocator: Extract make_internode_plan()
Currently the load balancer is only generating an inter-node plan, and
the algorithm is embedded in make_plan(). The method will become even
harder to follow once we add more kinds of plan generating steps,
e.g. intra-node plan. Extract the inter-node plan to make it easier to
add other plans and see the grand flow.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
f95a0f0182 tablet_allocator: Maintain candidate list and shard tablet count for target nodes

The node_load data structure was not updated to reflect migration
decisions on the target node. This is not needed for inter-node
migration because target nodes are not considered as sources. But we
want it to reflect migration decisions so that later intra-node
migration sees an accurate picture with earlier migrations reflected
in node_load.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
c86f659421 tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
Will be needed by member methods which generate migration plans.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
fdcaaea91a tablets, streaming: Implement tablet streaming for intra-node migration 2024-05-16 00:28:46 +02:00
Tomasz Grabiec
aafeacc8d9 dht, auto_refreshing_sharder: Allow overriding write selector
During streaming for intra-node migration we want to write only to the
new shard. To achieve that, allow altering write selector in
sharder::shard_for_writes() and per-instance of
auto_refreshing_sharder.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
dfed4efcc5 multishard_writer: Handle intra-node migration
This writer is used by streaming, on tablet migration and
load-and-stream.

The caller of distribute_reader_and_consume_on_shards(), which provides
a sharder, is supposed to ensure that effective_replication_map is kept
alive around it, in order for topology coordinator to wait for any writes
which may be in flight to reach their shards before tablet replica starts
another migration. This is already the case:

  1) repair and load-and-stream keep the erm around writing.

  2) tablet migration uses auto_refreshing_sharder, so it does not, but
     it keeps the topology_guard around the operation in the consumer,
     which serves the same purpose.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
4df818db98 storage_proxy: Handle intra-node tablet migration for writes
When sharder says that the write should go to multiple shards,
we need to consider the write as applied only if it was applied
to all those shards.

This can happen during intra-node tablet migration. During such migration,
the request coordinator on storage_proxy side is coordinating to hosts
as if no migration was in progress. The replica-side coordinator coordinates
to shards based on sharder response.
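
The rule for the replica-side coordinator can be sketched as follows. The function and its two arguments are hypothetical stand-ins (a list of shard ids as the sharder would return it, and a callable that applies the write on one shard); this is not the storage_proxy code.

```python
def apply_write(shards_for_write, apply_on_shard):
    """A write counts as applied only if every shard returned by the
    sharder applied it. Every shard is attempted, even after a failure."""
    results = [apply_on_shard(shard) for shard in shards_for_write]
    return bool(results) and all(results)


# During intra-node migration the sharder may return two shards.
print(apply_write([3, 5], lambda shard: True))        # both shards applied
print(apply_write([3, 5], lambda shard: shard == 3))  # new shard failed
```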

One way to think about it is that
effective_replication_map::get_natural_endpoints()/get_pending_endpoints()
tells how to coordinate between nodes, and sharder tells how to
coordinate between shards. Both work with some snapshot of tablet
metadata, which should be kept alive around the operation. Sharder is
associated with its own effective_replication_map, which marks the
topology version as used and allows barriers to synchronize with
replica-side operations.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
6c6ce2d928 tablets: Get rid of tablet_map::get_shard()
Its semantics do not fit well with intra-node migration, which allows
two owning shards. Replace uses with the new has_replica() API.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
d000ad0325 tablets: Avoid tablet_map::get_shard in cleanup
In preparation for intra-node migration for which get_shard() is not
prepared.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
daaceda963 tablets: test: Use sharder instead of tablet_map::get_shard()
tablet_map::get_shard() will go away as it is not prepared for
intra-node migration.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
d47dfceb34 tablets: tablet_sharder: Allow working with non-local host
Will be used in tests.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
6946ad2a45 sharding: Prepare for intra-node-migration
Tablet sharder is adjusted to handle intra-node migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.
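
A minimal sketch of the split between the two APIs, assuming a tablet that has a replica on an old and a new shard of the same host. The `TabletOnHost` type and the selector values ("previous", "both", "next") are assumptions for the example, not the real sharder interface.

```python
from dataclasses import dataclass


@dataclass
class TabletOnHost:
    """Hypothetical view of one tablet during intra-node migration."""
    old_shard: int
    new_shard: int
    read_selector: str   # "previous" or "next" (assumed values)
    write_selector: str  # "previous", "both" or "next" (assumed values)


def shard_of(tablet):
    """Shard for reads: the read selector picks exactly one shard."""
    if tablet.read_selector == "previous":
        return tablet.old_shard
    return tablet.new_shard


def shard_for_writes(tablet):
    """Shards for writes: the write selector may pick both shards."""
    choices = {
        "previous": [tablet.old_shard],
        "both": [tablet.old_shard, tablet.new_shard],
        "next": [tablet.new_shard],
    }
    return choices[tablet.write_selector]


t = TabletOnHost(old_shard=1, new_shard=4, read_selector="previous", write_selector="both")
print(shard_of(t), shard_for_writes(t))
```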

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
b5bb46357b docs: Document sharder use for tablets 2024-05-16 00:28:46 +02:00
Tomasz Grabiec
82b34d34d8 tablets: Introduce tablet transition kind for intra-node migration
We need a separate transition kind for intra node migration so that we
don't have to recover this information from replica set in an
expensive way. This information is needed in the hot path - in
effective_replication_map, to not return the pending tablet replica to
the coordinator. From its perspective, replica set is not
transitional.

The transition will also be used to alter the behavior of the
sharder. When not in intra-node migration, the sharder should
advertise the shard which is either in the previous or next replica
set. During intra-node migration, that's not possible as there may be
two such shards. So it will return the shard according to the current
read selector.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
942ea39bf0 tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
balance_tablets() is invoked in a loop, so only the first call will
see non-empty skiplist.

This bug starts to manifest after adding intra-node migration plan,
causing failures of the test_load_balancing_with_skiplist test
case. The reason is that rebalancing will now require multiple passes
before convergence is reached, due to intra-node migrations, and later
calls will not see the skiplist and try to balance skipped nodes,
violating the test's assertions.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
4d84451cf1 sstables, gdb: Track readers in a linked list
For the purpose of scylla-gdb.py command "scylla
active-sstables". Before the patch, readers were located by scanning
the heap for live objects with vtable pointers corresponding to
readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this:

  gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000

This could be explained by there being a live object on the heap which
used to be a reader but now is a different object, and the _sst field
contains some other data which is not a pointer.

To fix, track readers explicitly in a linked list so that the gdb
script can reliably walk readers.

Fixes #18618.
2024-05-16 00:28:46 +02:00
Tomasz Grabiec
fad6c41cee raft topology: Fix global token metadata barrier to not fence ahead of what is drained
Topology version may be updated, for example, by executing a RESTful
API call to move a tablet. If that is done concurrently with an
ongoing token metadata barrier executed by topology coordinator
(because there is active tablet migration, for example), then some
requests may fail due to being fenced out unnecessarily.

The problem is that barrier function assumes no concurrent topology
updates so it sets the fence version to the one which is current after
other nodes are drained. This patch changes it to set the fence to the
version which was current before other nodes were drained. Semantics
of the barrier are preserved because it only guarantees that topology
state from before the invocation of barrier is propagated.
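
The change in fence selection can be illustrated with a toy model. The `Topology` class and `barrier()` function below are hypothetical and only mimic the ordering described above; they are not the topology coordinator code.

```python
class Topology:
    """Toy stand-in for topology state (hypothetical)."""

    def __init__(self, version):
        self.version = version
        self.fence_version = 0


def barrier(topology, drain_other_nodes):
    # Capture the version before draining: the barrier only guarantees
    # that state from before its invocation is propagated.
    version_before = topology.version
    drain_other_nodes()
    # Fencing at topology.version here could fence ahead of what was drained.
    topology.fence_version = version_before
    return topology.fence_version


topo = Topology(version=7)


def drain_with_concurrent_update():
    # Simulates e.g. an API call moving a tablet while nodes are drained.
    topo.version += 1


fence = barrier(topo, drain_with_concurrent_update)
print(fence, topo.version)
```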

Fixes #18699
2024-05-16 00:28:46 +02:00
Amnon Heiman
0c84692c97 replica/table.cc: Add metrics per-table-per-node
This patch adds metrics that will be reported per-table per-node.
The added metrics (that are part of the per-table per-shard metrics)
are:
scylla_column_family_cache_hit_rate
scylla_column_family_read_latency
scylla_column_family_write_latency
scylla_column_family_live_disk_space

Fixes #18642

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes scylladb/scylladb#18645
2024-05-14 07:54:34 +03:00
Raphael S. Carvalho
0b2ec3063c sstables: Fix incremental_reader_selector (for range reads) with tablets
incremental_reader_selector is the mechanism for incremental consumption
of disjoint sstables on range reads.

tablet_sstable_set was implemented, such that selector is efficient with
tablets.

The problem is selector is vnode addicted and will only consider a given
set exhausted when maximum token is reached.

With tablets, that means a range read on first tablet of a given shard
will also consume other tablets living in the same shard. That results
in combined reader having to work with empty sstable readers of tablets
that don't intersect with the range of the read. It won't cause extra
I/O because the underlying sstables don't intersect with the range of
the read. It's only unnecessary CPU work, as it involves creating
readers (= allocation), feeding them into combined reader, which will
in turn invoke the sstable readers only to realize they don't have any
data for that range.

With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables
per tablet, there will be this amount of readers (empty or not):
  (100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billions.

That is ~5000 times more readers. It can be quite significant additional CPU
work, even though I/O dominates in scans. It's an inefficiency
that we would rather get rid of.
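
The figures above can be reproduced by evaluating the expression exactly as written. The baseline of one reader per sstable per tablet is an assumption, inferred from the "~5000 times" comparison; the variable names are made up for the example.

```python
tablets = 100_000
tablets_per_shard = 100
sstables_per_tablet = 5

# (100^2 + 100) / 2 is the sum 1 + 2 + ... + 100.
tablet_factor = (tablets_per_shard**2 + tablets_per_shard) // 2

readers_with_overreach = tablets * tablet_factor * sstables_per_tablet
readers_within_range = tablets * sstables_per_tablet  # assumed baseline

print(readers_with_overreach)                          # 2525000000
print(readers_with_overreach // readers_within_range)  # 5050
```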

The behavior can be observed from logs (there's 1 sstable for each of
4 tablets, but note how readers are created for every single one of
them when reading only 1 tablet range):
```
table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... that has range [{-9151620220812943033, start},{-4813568684827439727, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}]
    incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1}
    sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}]
```

Fix is about making sure the tablet set won't select past the
supplied range of the read.
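
The idea of not selecting past the read range can be sketched like this. It is an illustration only: `select_tablets` and the inclusive `(first, last)` token tuples are assumptions, with four tablets splitting the token space at the positions seen in the log above.

```python
def select_tablets(tablets, read_start, read_end):
    """Return indexes of tablets whose inclusive token range [first, last]
    intersects the inclusive read range [read_start, read_end]."""
    return [
        index
        for index, (first, last) in enumerate(tablets)
        if first <= read_end and last >= read_start
    ]


MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1
tablets = [
    (MIN_TOKEN, -2**62 - 1),
    (-2**62, -1),
    (0, 2**62 - 1),
    (2**62, MAX_TOKEN),
]

# Range read on the first tablet only: (-inf, -4611686018427387905].
print(select_tablets(tablets, MIN_TOKEN, -4611686018427387905))  # [0]
```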

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18556
2024-05-14 07:43:22 +03:00
Pavel Emelyanov
bb1696910c Merge 'scylla-nodetool: make documentation links product and version dependant' from Botond Dénes
Currently, all documentation links that feature anywhere in the help output of scylla-nodetool, are hard-coded to point to the documentation of the latest stable release. As our documentation is version and product (open-source or enterprise) specific, this is not correct. This PR addresses this, by generating documentation links such that they point to the documentation appropriate for the product and version of the scylladb release.

Fixes: https://github.com/scylladb/scylladb/issues/18276

- [x] the native nodetool is a new feature, no backport needed

Closes scylladb/scylladb#18476

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: make doc link version-specific
  release: introduce doc_link()
  build: pass scylla product to release.cc
2024-05-13 18:03:45 +03:00
Botond Dénes
d82a31f15f service/storage_proxy: add useful version of base write throttle metrics
There are two metrics to help observe base-write throttling:
* current_throttled_base_writes
* last_mv_flow_control_delay

Both show a snapshot of what is happening right at the time of querying
these metrics. This doesn't work well when one wants to investigate the
role throttling is playing in occasional write timeouts. Prometheus
scrapes metrics in multi-second intervals, and the probability of that
instant catching the throttling at play is very small (almost zero).
Add two new metrics:
* throttled_base_writes_total
* mv_flow_control_delay_total

These accumulate all values, allowing Grafana to derive the values and
extract information about throttle events that happened in the past
(but not necessarily at the instant of the scrape).
Note that dividing the two values will yield the average delay for a
throttle, which is also useful.
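
A sketch of how the two cumulative metrics could be combined between two scrapes. The sample dicts and the function name are hypothetical, and the unit of the delay counter is not stated here, so the result is in whatever unit the metric uses.

```python
def average_delay_per_throttle(prev, curr):
    """Average delay per throttled base write between two scrapes.

    prev and curr are dicts holding the two cumulative counters as read
    at consecutive scrapes (hypothetical sample format).
    """
    throttled = curr["throttled_base_writes_total"] - prev["throttled_base_writes_total"]
    delay = curr["mv_flow_control_delay_total"] - prev["mv_flow_control_delay_total"]
    if throttled == 0:
        return 0.0  # no throttling happened in this interval
    return delay / throttled


prev = {"throttled_base_writes_total": 10, "mv_flow_control_delay_total": 500}
curr = {"throttled_base_writes_total": 14, "mv_flow_control_delay_total": 900}
print(average_delay_per_throttle(prev, curr))  # 100.0
```

Because the counters only grow, throttling that happened between scrapes still shows up in the difference, unlike with the snapshot metrics.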

Closes scylladb/scylladb#18435
2024-05-13 18:02:06 +03:00
Asias He
952dfc6157 repair: Introduce repair_partition_count_estimation_ratio config option
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.

This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.

It is set to 0.1 by default.

Fixes #18615

Closes scylladb/scylladb#18634
2024-05-13 15:16:55 +03:00
Botond Dénes
afa870a387 Merge 'Some sstable set related improvements' from Raphael "Raph" Carvalho
Closes scylladb/scylladb#18616

* github.com:scylladb/scylladb:
  replica: Make it explicit table's sstable set is immutable
  replica: avoid reallocations in tablet_sstable_set
  replica: Avoid compound set if only one sstable set is filled
2024-05-13 14:17:24 +03:00
Pavel Emelyanov
2ce643d06b table: Directly compare std::optional<shard_id> with shard_id
There's a loop that calculates the number of shard matches over a tablet
map. The check of the given shard against optional<shard> can be made
shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18592
2024-05-13 13:25:05 +03:00
Andrei Chekun
76a766cab0 Migrate alternator tests to PythonTestSuite
As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so that it is easier to maintain and to introduce new features.
Introduce the prepare_sql option for suite.yaml to make it possible to run CQL statements as a precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188

Closes scylladb/scylladb#18442
2024-05-13 13:23:29 +03:00
Avi Kivity
51d09e6a2a cql3: castas_fcts: do not rely on boost casting large multiprecision integers to floats behavior
In [1] a bug casting large multiprecision integers to floats is documented (note that it
received two fixes, the most recent and relevant is [2]). Even with the fix, boost now
returns NaN instead of ±∞ as it did before [3].

Since we cannot rely on boost, detect the conditions that trigger the bug and return
the expected result.
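
The same "detect the condition and return the expected result" approach can be shown by analogy in Python. This is not the C++ fix and does not involve boost: Python raises OverflowError when an integer is too large for a float, and the hypothetical helper below maps that case to ±∞.

```python
import math


def int_to_float(n: int) -> float:
    """Convert an arbitrary-precision integer to a float, returning
    +inf or -inf when the magnitude is too large to represent."""
    try:
        return float(n)
    except OverflowError:
        return math.inf if n > 0 else -math.inf


print(int_to_float(10**400))   # inf
print(int_to_float(-10**400))  # -inf
```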

The unit test is extended to cover large negative numbers.

Boost version behavior:
 - 1.78 - returns ±∞
 - 1.79 - terminates
 - 1.79 + fix - returns NaN

Fixes https://github.com/scylladb/scylladb/issues/18508

[1] https://github.com/boostorg/multiprecision/issues/553
[2] ea786494db
[3] https://github.com/boostorg/math/issues/1132

Closes scylladb/scylladb#18532
2024-05-13 13:18:28 +03:00
Yaniv Michael Kaul
4639ca1bf5 compaction_strategy.cc: typo -> "performanceimproves" -> "performance improves"
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#18629
2024-05-13 08:43:38 +03:00
Patryk Wrobel
ec820e214c scylla_io_setup: ensure correct RLIMIT_NOFILE for iotune
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with large number of cores.

In such a case iotune reports a failure due to
its inability to create files or to set up the seastar
framework.

This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.

The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
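
A minimal sketch of the approach, assuming a Unix-like system. The helper names and the way the required value is chosen are assumptions for the example; the value scylla_io_setup actually uses is not stated here.

```python
import resource
import subprocess
import sys


def ensure_nofile_limit(required):
    """Raise the soft RLIMIT_NOFILE of this process to at least `required`,
    capped by the hard limit. Returns the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= required:
        return soft  # already sufficient
    new_soft = required if hard == unlimited else min(required, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


def child_nofile_limit():
    """Start a child process and report the soft limit it inherited."""
    code = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    out = subprocess.run([sys.executable, "-c", code],
                         check=True, capture_output=True, text=True)
    return int(out.stdout)


current_soft = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
ensure_nofile_limit(current_soft)  # a real caller would pass its own requirement
print(child_nofile_limit())
```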

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#18546
2024-05-13 08:35:52 +03:00
Avi Kivity
cc8b4e0630 batchlog_manager, test: initialize delay configuration
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.

Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.

Closes scylladb/scylladb#18572
2024-05-13 07:57:35 +03:00
Israel Fruchter
a1a6bd6798 Update tools/cqlsh submodule to v6.0.18
* tools/cqlsh e5f5eafd...c8158555 (11):
  > cqlshlib/sslhandling: fix logic of `ssl_check_hostname`
  > cqlshlib/sslhandling.py: don't use empty userkey/usercert
  > Dockerfile: noninteractive isn't enough for answering yet on apt-get
  > fix cqlsh version print
  > cqlshlib/sslhandling: change `check_hostname` default to False
  > Introduce new ssl configuration for disabling check_hostname
  > set the hostname in ssl_options.server_hostname when SSL is used
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73 Fixed a bug where username and password from the credentials file were ignored.
  > issue-73
  > github actions: update `cibuildwheel==v2.16.5`

Fixes: scylladb/scylladb#18590

Closes scylladb/scylladb#18591
2024-05-13 07:25:10 +03:00
Yaron Kaikov
3eb81915c1 docker: drop jmx and tools-java from installation
Following the work done in dd0779675f,
removing the scylla-jmx and scylla-tools-java from our docker image

Closes scylladb/scylladb#18566
2024-05-13 07:24:23 +03:00
Takuya ASADA
9538af0d95 scylla_kernel_check: fix block device size error on latest mkfs.xfs
The latest mkfs.xfs does not allow formatting a block device which is
smaller than 300MB.
There are options to ignore this validation, but that is an unsupported
feature, so it is better to increase the loopback image size to the
"supported size" == 300MB.

reference: https://lore.kernel.org/all/164738662491.3191861.15611882856331908607.stgit@magnolia/

Fixes #18568

Closes scylladb/scylladb#18620
2024-05-13 07:23:29 +03:00
Avi Kivity
c8cc47df2d Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is moved out of this shard.
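
The three rules above can be sketched with a per-shard map. The Python class below is a hypothetical model (the name, the dict layout and the empty-list placeholder for a storage group are assumptions), not the replica code.

```python
class StorageGroupManager:
    """Toy per-shard model: storage groups exist only for local tablets."""

    def __init__(self, this_shard, tablet_to_shard):
        self.this_shard = this_shard
        # On table creation allocate only groups for tablets on this shard.
        self.groups = {
            tablet: []
            for tablet, shard in tablet_to_shard.items()
            if shard == this_shard
        }

    def on_tablet_moved_in(self, tablet):
        # Allocate a group for a tablet that is moved to this shard.
        self.groups.setdefault(tablet, [])

    def on_tablet_moved_out(self, tablet):
        # Deallocate the group of a tablet that is moved out of this shard.
        self.groups.pop(tablet, None)


manager = StorageGroupManager(this_shard=0, tablet_to_shard={"t0": 0, "t1": 1, "t2": 0})
manager.on_tablet_moved_in("t1")
manager.on_tablet_moved_out("t0")
print(sorted(manager.groups))
```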

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64933.90 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42163 insns/op,        0 errors)
65865.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42155 insns/op,        0 errors)
66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
67029.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
68361.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42166 insns/op,        0 errors)

median 66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
median absolute deviation: 784.00
maximum: 68361.21
minimum: 64933.90
```

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
63744.12 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
66613.16 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
69667.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42184 insns/op,        0 errors)
67824.78 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42180 insns/op,        0 errors)
67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)

median 67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)
median absolute deviation: 631.05
maximum: 69667.39
minimum: 63744.12
```

Fixes: #16877.

Closes scylladb/scylladb#17664

* github.com:scylladb/scylladb:
  test: add test for back and forth tablets migration
  replica: allocate storage groups dynamically
  replica: refresh snapshot in compaction_group::cleanup
  replica: add rwlock to storage_group_manager
  replica: handle reads of non-existing tablets gracefully
  service: move to cleanup stage if allow_write_both_read_old fails
  replica: replace table::as_table_state
  compaction: pass compaction group id to reshape_compaction_group
  replica: open code get_compaction_group in perform_cleanup_compaction
  replica: drop single_compaction_group_if_available
2024-05-12 21:22:02 +03:00
Nadav Har'El
9813ec9446 Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz
The code is based on an idea similar to perf_simple_query. The main differences are:
  - it starts full scylla process
  - communicates with alternator via http (localhost)
  - uses richer table schema with all dynamoDB types instead of only strings

  Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

  Results on my machine (with 1 vCPU):
  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
  ...
  median 23402.59616090321
  median absolute deviation: 598.77
  maximum: 24014.41
  minimum: 19990.34

  > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
  ...
  median 16089.34211320635
  median absolute deviation: 552.65
  maximum: 16915.95
  minimum: 14781.97

  The above seems more realistic than the results from perf_simple_query, which are 96k and 49k tps (per core).

Related: https://github.com/scylladb/scylladb/issues/12518

Closes scylladb/scylladb#13121

* github.com:scylladb/scylladb:
  test: perf: alternator: add option to skip data pre-population
  perf-alternator-workloads: add operations-per-shard option
  test: perf: add global secondary indexes write workload for alternator
  test: perf: add option to continue after failed request
  test: perf: add read modify write workload for alternator (lwt)
  test: perf: add scan workload for alternator
  test: perf: add end-to-end benchmark for alternator
  test: perf: extract result aggregation logic to a separate struct
2024-05-12 18:15:29 +03:00
Kefu Chai
fd14b6f26b test/nodetool: do not accept 1 return code when passing --help to nodetool
in 906700d5, we started accepting 0 in addition to 1 as the return code of
"nodetool <command> --help", because we needed to be prepared for
the newer seastar submodule while staying compatible with the older
seastar versions. now that in 305f1bd3, we bumped up the seastar
module, and that commit picked up the change to return 0 when
handling the "--help" command line option in seastar, we are able to
drop the workaround.

so, in this change, we only use "0" as the expected return code.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18627
2024-05-12 14:30:31 +03:00
Avi Kivity
be76527781 Merge 'build: cmake build dist-unified by default and put tarballs under per-config paths' from Kefu Chai
in the same spirit of d57a82c156, this change adds `dist-unified` as one of the default targets, so that it is built by default. the unified package is required when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3.

- [x] cmake related change, no need to backport

Closes scylladb/scylladb#18621

* github.com:scylladb/scylladb:
  build: cmake: use paths to be compatible with CI
  build: cmake build dist-unified by default
2024-05-12 11:16:03 +03:00
Benny Halevy
796ca367d1 gossiper: rename topo_sm member to _topo_sm
Follow scylla convention for class member naming.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18528
2024-05-12 11:02:35 +03:00
Avi Kivity
2ad13e5d76 auth: complete coroutinization of password_authenticator::create_default_if_missing
password_authenticator::create_default_if_missing() is a confusing mix of
coroutines and continuations. Simplify it to a normal coroutine.

Closes scylladb/scylladb#18571
2024-05-11 17:04:20 +03:00
Kefu Chai
1186ddef16 build: cmake: use paths to be compatible with CI
our CI workflow for publishing the packages expects the tar balls
to be located under `build/$buildMode/dist/tar`, where `$buildMode`
is "release" or "debug".

before this change, the CMake building system puts the tar balls
under "build/dist" when the multi-config generator is used. and
`configure.py` uses multi-config generator.

in this change, we put the tar balls for redistribution under
`build/$<CONFIG>/dist/tar`, where `$<CONFIG>` is "RelWithDebInfo"
or "Debug", this works better with the CI workflow -- we just need
to map "release" and "debug" to "RelWithDebInfo" and "Debug" respectively.
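
The mapping described above is small enough to write down. The dict and function names are made up for this sketch; only the path layout and the two mode-to-config pairs come from the message.

```python
# CI build mode -> CMake configuration name.
CI_MODE_TO_CMAKE_CONFIG = {
    "release": "RelWithDebInfo",
    "debug": "Debug",
}


def tarball_dir(build_mode):
    """Directory holding the redistributable tar balls for a CI build mode."""
    config = CI_MODE_TO_CMAKE_CONFIG[build_mode]
    return f"build/{config}/dist/tar"


print(tarball_dir("release"))  # build/RelWithDebInfo/dist/tar
print(tarball_dir("debug"))    # build/Debug/dist/tar
```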

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-11 21:56:50 +08:00
Kefu Chai
0f85255c74 build: cmake build dist-unified by default
in the same spirit of d57a82c156, this change adds `dist-unified`
as one of the default targets. so that it is built by default.
the unified package is required when redistributing the precompiled
packages -- we publish the rpm, deb and tar balls to S3.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-11 18:44:11 +08:00
Raphael S. Carvalho
7faba69f28 replica: Make it explicit table's sstable set is immutable
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 11:58:08 -03:00
Raphael S. Carvalho
55c0272b68 replica: avoid reallocations in tablet_sstable_set
reserve upfront wherever possible to avoid reallocations.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 10:44:39 -03:00
Raphael S. Carvalho
35a0d47408 replica: Avoid compound set if only one sstable set is filled
Most of the time only main set is filled, so we can avoid one layer
of indirection (= compound set) when maintenance set is empty.
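
The shortcut can be sketched as follows. The classes and `make_table_set()` are hypothetical; returning the maintenance set when the main set is empty follows the commit title and is an assumption about the actual change.

```python
class SSTableSet:
    """Toy sstable set holding a list of sstable names (hypothetical)."""

    def __init__(self, sstables=()):
        self.sstables = list(sstables)

    def empty(self):
        return not self.sstables

    def all(self):
        return list(self.sstables)


class CompoundSet:
    """Extra layer of indirection over several sets."""

    def __init__(self, sets):
        self.sets = list(sets)

    def all(self):
        return [sst for sub in self.sets for sst in sub.all()]


def make_table_set(main, maintenance):
    # Skip the compound layer when only one of the sets is filled.
    if maintenance.empty():
        return main
    if main.empty():
        return maintenance
    return CompoundSet([main, maintenance])


main = SSTableSet(["sst1", "sst2"])
print(make_table_set(main, SSTableSet()) is main)  # True
```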

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-05-10 10:44:34 -03:00
Aleksandra Martyniuk
51fdda4199 test: add test for back and forth tablets migration 2024-05-10 15:08:56 +02:00
Aleksandra Martyniuk
b4371a0ea0 replica: allocate storage groups dynamically
Currently empty storage_groups are allocated for tablets that are
not on this shard.

Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is cleaned up.

Stop compaction group before it's deallocated.

Add a flag to table::cleanup_tablet deciding whether to deallocate
sgs and use it in commitlog tests.
2024-05-10 15:08:21 +02:00
Aleksandra Martyniuk
6e1e082e8c replica: refresh snapshot in compaction_group::cleanup
During compaction_group::cleanup the sstables set is updated, but
row_cache::_underlying still keeps a shared ptr to the old set.
Because of that, descriptors to deleted sstables aren't closed.

Refresh snapshot in order to store new sstables set in _underlying
mutation source.
2024-05-10 14:56:38 +02:00