Commit Graph

23804 Commits

Author SHA1 Message Date
Avi Kivity
a43d5079f3 table: fix on_compaction_completion corrupting _sstables_compacted_but_not_deleted during self-race
on_compaction_completion() updates _sstables_compacted_but_not_deleted
through a temporary to avoid an exception causing a partial update:

  1. copy _sstables_compacted_but_not_deleted to a temporary
  2. update temporary
  3. do dangerous stuff
  4. move temporary to _sstables_compacted_but_not_deleted

This is racy when we have parallel compactions, since step 3 yields.
We can have two invocations running in parallel, taking snapshots
of the same _sstables_compacted_but_not_deleted in step 1, each
modifying it in different ways, and only one of them winning the
race and assigning in step 4. With the right timing we can end
with extra sstables in _sstables_compacted_but_not_deleted.

Before a5369881b3, this was a benign race (only resulting in
deleted file space not being reclaimed until the service is shut
down), but afterwards, extra sstable references result in the service
refusing to shut down. This was observed in database_test in debug
mode, where the race more or less reliably happens for system.truncated.

Fix by using a different method to protect
_sstables_compacted_but_not_deleted. We unconditionally update it,
and also unconditionally fix it up (on success or failure) using
seastar::defer(). The fixup includes a call to rebuild_statistics()
which must happen every time we touch the sstable list.
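
The lost update and the shape of the fix can be sketched as follows. This is a simplified Python model, not the C++ code of on_compaction_completion(): generators stand in for tasks that yield, a try/finally stands in for seastar::defer(), and the names `racy_completion`, `fixed_completion` and `run_interleaved` are hypothetical.

```python
def racy_completion(state, done):
    """Old scheme: work on a snapshot and assign it back after yielding."""
    tmp = set(state["pending"])   # step 1: copy
    tmp -= done                   # step 2: update the temporary
    yield                         # step 3: work that yields to other tasks
    state["pending"] = tmp        # step 4: move the temporary back


def fixed_completion(state, done):
    """New scheme: update the shared set in place, fix up unconditionally."""
    state["pending"] |= done      # unconditional in-place update
    try:
        yield                     # work that yields to other tasks
    finally:
        state["pending"] -= done  # runs on success or failure


def run_interleaved(make_task, state, a, b):
    """Start both tasks, let each reach its yield point, then finish both."""
    tasks = [make_task(state, a), make_task(state, b)]
    for t in tasks:
        next(t)                   # run up to the yield
    for t in tasks:
        next(t, None)             # resume and run to completion
    return state["pending"]


racy = run_interleaved(racy_completion,
                       {"pending": {"sst1", "sst2"}}, {"sst1"}, {"sst2"})
fixed = run_interleaved(fixed_completion,
                        {"pending": set()}, {"sst1"}, {"sst2"})
print(racy)   # a stale entry survives: the last writer's snapshot wins
print(fixed)  # nothing is left behind
```

In the racy variant both tasks snapshot the same set, so the second assignment silently discards the first task's removal. In the in-place variant each task only adds and removes its own entries, so the interleaving does not matter.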

Fixes #7331.
2020-10-06 08:29:34 +03:00
Etienne Adam
46f0354cdb redis: pass request as a reference
This patch changes the way the request object is passed,
using a reference instead of temporaries.

'exists' test is passing in debug mode, whereas it was
always failing before.

Fixes #7261 by ensuring request object is alive for all commands
during the whole request duration.

Signed-off-by: Etienne Adam <etienne.adam@gmail.com>
Message-Id: <20200924202034.30399-1-etienne.adam@gmail.com>
2020-10-04 14:58:00 +03:00
Avi Kivity
5b5b8b3264 lua: be compatible with Lua 5.4's lua_resume()
Lua 5.4 added an extra parameter to lua_resume()[1]. The parameter
denotes the number of arguments yielded, but our coroutines
don't yield any arguments, so we can just ignore it.

Define a macro to allow adding extra stuff with Lua 5.4,
and use it to supply the extra parameter.

[1] https://www.lua.org/manual/5.4/manual.html#8.3

Closes #7324
2020-10-04 14:07:51 +03:00
Nadav Har'El
ad48d8b43c Merge 'idl: fix definition order related build failures with clang' from Avi Kivity
Clang eagerly instantiates templates, apparently with the following
algorithm:

 - if both the declaration and definition are seen at the time of
   instantiation, instantiate the template
 - if only the declaration is seen at the time of instantiation, just emit
   a reference to the template; even if the definition is later seen,
   it is not instantiated

The "reference" in the second case is a relocation entry in the object file
that is satisfied at link time by the linker, but if no other object file
instantiated the needed template, a link error results.

These problems are hard to diagnose but easy to fix. This series fixes all
known such issues in the code base. It was tested on gcc as well.

Closes #7322

* github.com:scylladb/scylla:
  query-result-reader: order idl implementations correctly
  frozen_schema: order idl implementations correctly
  idl-compiler: generate views after serializers
2020-10-04 11:16:19 +03:00
Takuya ASADA
d611d74905 dist/common/scripts/scylla_setup: force developer mode on nonroot when NOFILE is too low
On Ubuntu 16/18 and Debian 9, LimitNOFILE is set to 4096 and cannot be overridden
from a user unit.
To run scylla-server in such an environment, we need to turn on developer mode and
show warnings.

Fixes #7133

Closes #7323
2020-10-04 10:16:30 +03:00
Avi Kivity
4b40bc5065 query-result-reader: order idl implementations correctly
Clang eagerly instantiates templates, so if it needs a template
function for which it has a declaration but not a definition, it
will not instantiate the definition when it sees it. This causes
link errors.

Fix by ordering the idl implementation files so that definitions
come before uses.
2020-10-03 19:56:29 +03:00
Avi Kivity
94fcec99d1 frozen_schema: order idl implementations correctly
Clang eagerly instantiates templates, so if it needs a template
function for which it has a declaration but not a definition, it
will not instantiate the definition when it sees it. This causes
link errors.

Fix by ordering the idl implementation files so that definitions
come before uses.
2020-10-03 19:56:28 +03:00
Avi Kivity
a99aba9e48 idl-compiler: generate views after serializers
Clang eagerly instantiates templates, so if it needs a template
function for which it has a declaration but not a definition, it
will not instantiate the definition when it sees it. This causes
link errors.

In this case, the views use the serializer implementations, but are
generated before them.

Fix by generating the view implementations after the serializer
implementations that they use.
2020-10-03 19:56:25 +03:00
Tomasz Grabiec
40b42393d2 Merge "Raft: disable boost tests, add disable to test.py" from Alejo
Add disable option for test configuration.
Tests in this list will be disabled for all modes.

* alejo/next-disable-raft-tests-01:
  Raft: disable boost tests for now
  Tests: add disable to configuration
  Raft: Remove tests for now
2020-10-02 15:51:13 +02:00
Yaron Kaikov
bec0c15ee9 configure.py: Add version to unified tarball filename
Let's add the version and release to unified tarball filename to avoid
having to do that in release engineering pipelines, for example.

Closes #7317
2020-10-02 15:48:11 +03:00
Alejo Sanchez
bb67d15e2f Raft: disable boost tests for now
Disable raft fsm boost tests until raft is part of build.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-10-02 14:03:01 +02:00
Alejo Sanchez
eff7b63c08 Tests: add disable to configuration
For suite.yaml add an extra configuration option disable.

Tests in this list will be disabled for all modes.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-10-02 14:01:50 +02:00
Alejo Sanchez
ef170a5088 Raft: Remove tests for now
Remove raft C++ tests until raft is included in build process.

[tgrabiec]: Fixes test.py failure. Tests are not compiled unless --build-raft is
passed to configure.py and we cannot enable it by default yet.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20201002102847.1140775-1-alejo.sanchez@scylladb.com>
2020-10-02 12:42:21 +02:00
Alejo Sanchez
4e26dad3a0 Raft: Remove tests for now
Remove raft C++ tests until raft is included in build process.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-10-02 12:26:05 +02:00
Tomasz Grabiec
864b2c5736 CMakeLists.txt: Add raft directory to source code directories
Needed for IDE integration. Not used for building currently.
Message-Id: <1601570008-19666-1-git-send-email-tgrabiec@scylladb.com>
2020-10-01 19:38:39 +03:00
Gleb Natapov
3e8dbb3c09 lwt: do not return unavailable exception from the 'learn' stage
An unavailable exception means that the operation was not started and can be
retried safely. If lwt fails in the learn stage, though, it most
certainly means that its effect is already observable. The patch
returns a timeout exception instead, which signals uncertainty.

Fixes #7258

Message-Id: <20201001130724.GA2283830@scylladb.com>
2020-10-01 17:16:52 +02:00
Tomasz Grabiec
ca7f0c61f0 Merge "raft: initial implementation" from Gleb
This is the beginning of raft protocol implementation. It only supports
log replication and voter state machine. The main difference between
this one and the RFC (besides having voter state machine) is that the
approach taken here is to implement raft as a deterministic state
machine and move all the IO processing away from the main logic.
To do that, some changes to the RPC interface were required: all verbs are now
one-way, meaning that sending a request does not wait for a reply and
the reply arrives as a separate message (or not at all, it is safe to
drop packets).

* scylla-dev/raft-v4:
  raft: add a short readme file
  raft: compile raft tests
  raft: add raft tests
  raft: Implement log replication and leader election
  raft: Introduce raft interface header
2020-10-01 17:09:52 +02:00
Konstantin Osipov
9a5f2b87dc raft: add a short readme file
The file has a brief description of the code status, usage and some
implementation assumptions.
2020-10-01 14:30:59 +03:00
Gleb Natapov
16cb009ea2 raft: compile raft tests
Compilation is not enabled by default as it requires coroutine support
and may require a special compiler (until the distributed one fixes all the
bugs related to coroutines). To enable compilation of the raft tests, a new
configure.py option is added (--build-raft).
2020-10-01 14:30:59 +03:00
Gleb Natapov
4959609589 raft: add raft tests
Add tests for the currently implemented raft features. replication_test
tests replication functionality with various initial log configurations.
raft_fsm_test tests voting state machine functionality.
2020-10-01 14:30:59 +03:00
Gleb Natapov
e1ac1a61c9 raft: Implement log replication and leader election
This patch introduces a partial RAFT implementation. It has only log
replication and leader election support. Snapshotting and configuration
changes, along with other, smaller features, are not yet implemented.

The approach taken by this implementation is to have a deterministic
state machine coded in raft::fsm. What makes the FSM deterministic is
that it does not do any IO by itself. It only takes an input (which may
be a networking message, time tick or new append message), changes its
state and produces an output. The output contains the state that has
to be persisted, messages that need to be sent and entries that may
be applied (in that order). The input and output of the FSM are handled
by raft::server class. It uses raft::rpc interface to send and receive
messages and raft::storage interface to implement persistence.
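
The split between a pure state machine and an IO-performing driver can be sketched as below. This is a toy Python model and not Raft: `ToyFsm`, `Output`, `drive` and the "append"/"ack" events are hypothetical names chosen for illustration; only the ordering (persist, then send, then apply) is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    to_persist: list = field(default_factory=list)
    to_send: list = field(default_factory=list)
    to_apply: list = field(default_factory=list)


class ToyFsm:
    """Performs no IO: step() only changes in-memory state and returns work."""

    def __init__(self):
        self.log = []
        self.committed = 0

    def step(self, event):
        kind, payload = event
        out = Output()
        if kind == "append":
            self.log.append(payload)
            out.to_persist.append(payload)
            out.to_send.append(("replicate", payload))
        elif kind == "ack":
            # payload is the number of log entries acknowledged so far
            out.to_apply.extend(self.log[self.committed:payload])
            self.committed = max(self.committed, payload)
        return out


def drive(fsm, events, storage, network, applied):
    """The only place where side effects happen, in a fixed order."""
    for event in events:
        out = fsm.step(event)
        storage.extend(out.to_persist)  # 1. persist state
        network.extend(out.to_send)     # 2. send messages
        applied.extend(out.to_apply)    # 3. apply entries


events = [("append", "x"), ("append", "y"), ("ack", 1), ("ack", 2)]
storage, network, applied = [], [], []
drive(ToyFsm(), events, storage, network, applied)
```

Because `step()` depends only on its input and the current state, feeding the same event sequence to a fresh instance yields the same outputs, which is what makes this style easy to test.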
2020-10-01 14:30:59 +03:00
Gleb Natapov
c073997431 raft: Introduce raft interface header
This commit introduces public raft interfaces. raft::server represents
a single raft server instance. raft::state_machine represents a user
defined state machine. raft::rpc, raft::rpc_client and raft::storage are
used to allow implementing custom networking and storage layers.

A shared failure detector interface defines keep-alive semantics,
required for efficient implementation of thousands of raft groups.
2020-10-01 14:30:59 +03:00
Piotr Dulikowski
bfbf02a657 transport/config: fix cross-shard use of updateable_value
Recently, the cql_server_config::max_concurrent_requests field was
changed to be an updateable_value, so that it is updated when the
corresponding option in Scylla's configuration is live-reloaded.
Unfortunately, due to how cql_server is constructed, this caused
cql_server instances on all shards to store an updateable_value which
pointed to an updateable_value_source on shard 0. Unsynchronized
cross-shard memory operations ensue.

The fix changes the cql_server_config so that it holds a function which
creates an updateable_value appropriate for the given shard. This
pattern is similar to another, already existing option in the config:
get_service_memory_limiter_semaphore.

This fix can be reverted if updateable_value becomes safe to use across
shards.

Tests: unit(dev)

Fixes: #7310
2020-10-01 14:10:56 +03:00
Etienne Adam
98dc0dc03a redis: only create required keyspaces/tables
The 'redis_database_count' option already existed, but
was not used when initializing the keyspaces. This
patch merely uses it. I think it's better that way, it
seems cleaner not to create 15 x 5 tables when we
use only one redis database.

Also change a test to test with a higher max number
of databases.

Signed-off-by: Etienne Adam <etienne.adam@gmail.com>
Message-Id: <20200930210256.4439-1-etienne.adam@gmail.com>
2020-10-01 10:27:03 +03:00
Wojciech Mitros
e79ad38425 tracing: add username to the session table
In order to improve observability, add a username field to the
system_traces.sessions table. The system table should be changed
while upgrading by running the fix_system_distributed_tables.py
script. Until the table is updated, the old behaviour is preserved.

Fixes #6737.
2020-10-01 04:46:40 +02:00
Nadav Har'El
d73cf589e7 docs: fix typos in docs/alternator/alternator.md
Discovered by running a spell-checker.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200930101046.76710-1-nyh@scylladb.com>
2020-10-01 04:46:40 +02:00
Nadav Har'El
8db01aeeb4 docs: fix typo in alternator/getting-started.md
Fix a typo reported by a user. Ran spell-checker to verify there are no
other obvious spelling mistakes in that file.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200930084304.74776-1-nyh@scylladb.com>
2020-10-01 04:46:40 +02:00
Avi Kivity
701d24a832 Merge 'Enhance max concurrent requests code' from Piotr Sarna
This miniseries enhances the code from #7279 by:
 * adding metrics for shed requests, which will make it possible to pinpoint the problem if the max concurrent requests threshold is too low
 * making the error message more comprehensive by pointing at the variable used to set max concurrent requests threshold

Example of an enhanced error message:
```
ConnectionException('Failed to initialize new connection to 127.0.0.1: Error from server: code=1001 [Coordinator node overloaded] message="too many in-flight requests (configured via max_concurrent_requests_per_shard): 18"',)})
```

Closes #7299

* github.com:scylladb/scylla:
  transport: make _requests_serving param uint32_t
  transport: make overloaded error message more descriptive
  transport: add requests_shed metrics
2020-10-01 04:46:40 +02:00
Piotr Sarna
876e9fe51a transport: make _requests_serving param uint32_t
It's not realistic for a shard to have over 4 billion concurrent
requests, so this value can be safely represented in 32 bits.
Also, since the current concurrency limit is represented in uint32_t,
it makes sense for these two to have matching types.
2020-09-30 08:20:52 +02:00
Piotr Sarna
d18f68f1c1 transport: make overloaded error message more descriptive
The message now mentions the config variable used to set the limit
of max allowed concurrent requests.
2020-09-30 08:20:51 +02:00
Piotr Sarna
792ff3757a transport: add requests_shed metrics
The counter shows a total number of requests shed due to overload.
2020-09-30 08:20:50 +02:00
Avi Kivity
fd1dd0eac7 Merge "Track the memory consumption of reader buffers" from Botond
"
The last major untracked area of the reader pipeline is the reader
buffers. These scale with the number of readers as well as with the size
and shape of data, so their memory consumption is unpredictable and varies
wildly. For example many small rows will trigger larger buffers
allocated within the `circular_buffer<mutation_fragment>`, while few
larger rows will consume a lot of external memory.

This series covers this area by tracking the memory consumption of both
the buffer and its content. This is achieved by passing a tracking
allocator to `circular_buffer<mutation_fragment>` so that each
allocation it makes is tracked. Additionally, we now track the memory
consumption of each and every mutation fragment through its whole
lifetime. Initially I contemplated just tracking the `_buffer_size` of
`flat_mutation_reader::impl`, but concluded that as our reader trees are
typically quite deep, this would result in a lot of unnecessary
`signal()`/`consume()` calls, that scales with the number of mutation
fragments and hence adds to the already considerable per mutation
fragment overhead. The solution chosen in this series is to instead
track the memory consumption of the individual mutation fragments, with
the observation that these are typically always moved and very rarely
copied, so the number of `signal()`/`consume()` calls will be minimal.

This additional tracking introduces an interesting dilemma however:
readers will now have significant memory on their account even before
being admitted. So it may happen that they can prevent their own
admission via this memory consumption. To prevent this, memory
consumption is only forwarded to the semaphore upon admission. This
might be solved when the semaphore is moved to the front -- before the
cache.
Another consequence of this additional, more complete tracking is that
evictable readers now consume memory even when the underlying reader is
evicted. So it may happen that even though no reader is currently
admitted, all memory is consumed from the semaphore. To prevent any such
deadlocks, the semaphore now admits a reader unconditionally if no
reader is admitted -- that is if all count resources are available.

Refs: #4176

Tests: unit(dev, debug, release)
"

* 'track-reader-buffers/v2' of https://github.com/denesb/scylla: (37 commits)
  test/manual/sstable_scan_footprint_test: run test body in statement sched group
  test/manual/sstable_scan_footprint_test: move test main code into separate function
  test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s
  test/manual/sstable_scan_footprint_test: make clustering row size configurable
  test/manual/sstable_scan_footprint_test: document sstable related command line arguments
  mutation_fragment_test: add exception safety test for mutation_fragment::mutate_as_*()
  test: simple_schema: add make_static_row()
  reader_permit: reader_resources: add operator==
  mutation_fragment: memory_usage(): remove unused schema parameter
  mutation_fragment: track memory usage through the reader_permit
  reader_permit: resource_units: add permit() and resources() accessors
  mutation_fragment: add schema and permit
  partition_snapshot_row_cursor: row(): return clustering_row instead of mutation_fragment
  mutation_fragment: remove as_mutable_end_of_partition()
  mutation_fragment: s/as_mutable_partition_start/mutate_as_partition_start/
  mutation_fragment: s/as_mutable_range_tombstone/mutate_as_range_tombstone/
  mutation_fragment: s/as_mutable_clustering_row/mutate_as_clustering_row/
  mutation_fragment: s/as_mutable_static_row/mutation_as_static_row/
  flat_mutation_reader: make _buffer a tracked buffer
  mutation_reader: extract the two fill_buffer_result into a single one
  ...
2020-09-29 16:08:16 +03:00
Pekka Enberg
8f17ca2d1a scripts/refresh-submodules.sh: Add python3 submodule
Message-Id: <20200928075422.377888-1-penberg@scylladb.com>
2020-09-29 16:06:32 +03:00
Yaron Kaikov
d48df44f26 configure.py: build python3, jmx, tools and unified-tar only in relevant dist-{mode}
Today, whenever we build scylla in a single mode, we still
build jmx, tools and python3 for all of dev, release and debug.
Let's make sure we build only in the relevant build mode.

Also add unified-tar to the ninja build.

Closes #7260
2020-09-29 15:41:52 +03:00
Juliusz Stasiewicz
0afa738a8f tracing: Fix error on slow batches
`trace_keyspace_helper::make_slow_query_mutation_data` expected a
"query" key in its parameters, which does not appear in case of
e.g. batches of prepared statements. This is an example of a failing
`record.parameters`:
```
...{"query[0]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"},
{"query[1]" : "INSERT INTO ks.tbl (pk, i) values (?, ?);"}...
```

In such a case Scylla recorded no trace and said:
```
ERROR 2020-09-28 10:09:36,696 [shard 3] trace_keyspace_helper - No
"query" parameter set for a session requesting a slow_query_log record
```

The fix here is to leave the query empty if not found. The users can still
retrieve the query contents from existing info.
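
The fallback can be sketched as follows. This is an illustrative Python model rather than the code of `trace_keyspace_helper::make_slow_query_mutation_data`; the helper name `slow_query_text` and the sample parameter maps are hypothetical.

```python
def slow_query_text(parameters):
    """Return the "query" parameter, or an empty string when it is absent."""
    return parameters.get("query", "")


# A single statement carries a plain "query" key (sample data, assumed).
single = {"query": "SELECT * FROM ks.tbl;"}
# A batch of prepared statements only carries indexed keys.
batch = {
    "query[0]": "INSERT INTO ks.tbl (pk, i) values (?, ?);",
    "query[1]": "INSERT INTO ks.tbl (pk, i) values (?, ?);",
}

single_text = slow_query_text(single)
batch_text = slow_query_text(batch)  # no error, just an empty query field
```

The indexed `query[N]` entries remain available in the recorded parameters, so leaving the query field empty loses no information.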

Fixes #5843

Closes #7293
2020-09-29 13:24:39 +02:00
Asias He
eedcee7f31 gossip: Reduce unnecessary VIEW_BACKLOG updates
The current and max backlog values in VIEW_BACKLOG are not changing, but the
nodes are updating VIEW_BACKLOG all the time. For example:

```
INFO  2020-03-06 17:13:46,761 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026590,718)
INFO  2020-03-06 17:13:46,821 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486026531,742)
INFO  2020-03-06 17:13:47,765 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027590,721)
INFO  2020-03-06 17:13:47,825 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486027531,745)
INFO  2020-03-06 17:13:48,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028590,726)
INFO  2020-03-06 17:13:48,833 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486028531,750)
INFO  2020-03-06 17:13:49,772 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.3, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029590,729)
INFO  2020-03-06 17:13:49,832 [shard 0] storage_service - Update system.peers table: endpoint=127.0.0.2, app_state=VIEW_BACKLOG, versioned_value=Value(0:18446744073709551615:1583486029531,753)
```

The downside of such updates:

 - Introduces more gossip exchange traffic
 - Updates system.peers all the time

The extra unnecessary gossip traffic is fine for a cluster in good
shape, but when some of the nodes or shards are loaded, such messages and
the handling of such messages can make the system even busier.

With this patch, VIEW_BACKLOG is updated only when the backlog is really
updated.

Btw, we can even make the update only when the change of the backlog is
greater than a threshold, e.g., 5%, which can reduce the traffic even
further.
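
Publishing only on change, with the optional threshold mentioned above, can be sketched as follows. This is a hypothetical Python model, not the gossip code: `BacklogPublisher`, its `publish` callback and the relative-change rule are assumptions made for illustration.

```python
class BacklogPublisher:
    """Forwards (current, max) to `publish` only when the value has changed."""

    def __init__(self, publish, threshold=0.0):
        self._publish = publish
        self._threshold = threshold  # e.g. 0.05 for a 5% relative change
        self._last = None            # last published (current, max)

    def update(self, current, maximum):
        if self._last is not None and not self._changed(current, maximum):
            return False             # nothing new, skip the gossip update
        self._last = (current, maximum)
        self._publish(current, maximum)
        return True

    def _changed(self, current, maximum):
        last_current, last_max = self._last
        if maximum != last_max:
            return True
        if current == last_current:
            return False
        # Relative change measured against the last published value.
        base = max(last_current, 1)
        return abs(current - last_current) / base > self._threshold


sent = []
exact = BacklogPublisher(lambda c, m: sent.append((c, m)))
results = [exact.update(0, 100), exact.update(0, 100), exact.update(10, 100)]

coarse = BacklogPublisher(lambda c, m: None, threshold=0.05)
coarse_results = [coarse.update(100, 1000),
                  coarse.update(103, 1000),   # 3% change, suppressed
                  coarse.update(110, 1000)]   # 10% change, published
```

With the default threshold of 0 the publisher matches the behaviour described in this patch; a non-zero threshold corresponds to the possible follow-up.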

Fixes #5970
2020-09-29 13:37:37 +03:00
Avi Kivity
6fdc8f28a9 Update tools/jmx submodule
* tools/jmx 45e4f28...25bcd76 (1):
  > install.sh: stop using symlinks for systemd units on nonroot mode

Fixes #7288.
2020-09-29 13:32:45 +03:00
Takuya ASADA
8504332e17 scylla_setup: skip offline warnings on nonroot mode
Since most of the scripts require root privileges, we don't show the offline
warnings in nonroot mode.

Fixes #7286

Closes #7287
2020-09-29 13:30:13 +03:00
Eliran Sinvani
925cdc9ae1 consistency level: fix wrong quorum calculation when RF = 0
We used to calculate the number of endpoints for quorum and local_quorum
unconditionally as ((rf / 2) + 1). This formula doesn't take into
account the corner case where RF = 0; in this situation quorum should
also be 0.
This commit adds the missing corner case.
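
The corrected calculation can be sketched as follows. This is an illustrative Python function, not the C++ code touched by the commit; the name `quorum_for` is hypothetical.

```python
def quorum_for(rf):
    """Number of endpoints needed for quorum with replication factor `rf`."""
    # The plain formula (rf // 2) + 1 would give 1 for rf == 0.
    return rf // 2 + 1 if rf > 0 else 0


quorums = {rf: quorum_for(rf) for rf in (0, 1, 2, 3, 5)}
```

For RF = 0 the unconditional formula would demand one endpoint from a set of zero replicas, which can never be satisfied.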

Tests: Unit Tests (dev)
Fixes #6905

Closes #7296
2020-09-29 13:25:41 +03:00
Takuya ASADA
ba29074c42 install.sh: stop using symlinks for systemd units on nonroot mode
On some environments, systemctl enable <service> fails when we use a symlink.
So just directly copy systemd units to ~/.config/systemd/user, instead of
creating a symlink.

Fixes #7288

Closes #7290
2020-09-29 12:20:41 +03:00
Piotr Sarna
9e5ce5a93c counters: remove unused 1.7.4 counter order code
After cleaning up old cluster features (253a7640e3)
the code for special handling of 1.7.4 counter order was effectively
only used in its own tests, so it can be safely removed.

Closes #7289
2020-09-29 12:16:58 +03:00
Avi Kivity
57f377e1fe Merge 'Add max concurrent requests configuration option to coordinator' from Piotr Sarna
This series approaches issue #7072 and provides a very simple mechanism for limiting the number of concurrent CQL requests being served on a shard. Once the limit is hit, new requests will be instantly refused and OverloadedException will be returned to the client.
This mechanism has many improvement opportunities:
 * shedding requests gradually instead of having one hard limit,
 * having more than one limit per different types of queries (reads, writes, schema changes, ...),
 * not using a preconfigured value at all, and instead figuring out the limit dynamically,
 * etc.

... and none of these are taken into account in this series, which only adds a very basic configuration variable. The variable can be updated live without a restart - it can be done by updating the .yaml file and triggering a configuration re-read via sending the SIGHUP signal to Scylla.

The default value for this parameter is a very large number, which translates to effectively not shedding any requests at all.

Refs #7072

Closes #7279

* github.com:scylladb/scylla:
  transport: make max_concurrent_requests_per_shard reloadable
  transport: return exceptional future instead of throwing
  transport,config: add a param for max request concurrency
  exceptions: make a single-param constructor explicit
  exceptions: add a constructor based on custom message
2020-09-29 12:14:03 +03:00
Pekka Enberg
1adf2cc848 Revert "scylla_ntp_setup: use chrony on all distributions"
This reverts commit 8366d2231d because it
causes the following "scylla_setup" failure on Ubuntu 16.04:

  Command: 'sudo /usr/lib/scylla/scylla_setup --nic ens5 --disks /dev/nvme0n1  --swap-directory / '
  Exit code: 1
  Stdout:
  Setting up libtomcrypt0:amd64 (1.17-7ubuntu0.1) ...
  Setting up chrony (2.1.1-1ubuntu0.1) ...
  Creating '_chrony' system user/group for the chronyd daemon…
  Creating config file /etc/chrony/chrony.conf with new version
  Processing triggers for libc-bin (2.23-0ubuntu11.2) ...
  Processing triggers for ureadahead (0.100.0-19.1) ...
  Processing triggers for systemd (229-4ubuntu21.29) ...
  501 Not authorised
  NTP setup failed.
  Stderr:
  chrony.service is not a native service, redirecting to systemd-sysv-install
  Executing /lib/systemd/systemd-sysv-install enable chrony
  Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_ntp_setup", line 63, in <module>
  run('chronyc makestep')
  File "/opt/scylladb/scripts/scylla_util.py", line 504, in run
  return subprocess.run(cmd, stdout=stdout, stderr=stderr, shell=shell, check=exception, env=scylla_env).returncode
  File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 512, in run
  raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['chronyc', 'makestep']' returned non-zero exit status 1.
2020-09-29 11:23:23 +03:00
Piotr Sarna
4b856cf62d transport: make max_concurrent_requests_per_shard reloadable
This configuration entry is expected to be used as a quick fix
for an overloaded node, so it should be possible to reload this value
without having to restart the server.
2020-09-29 10:11:36 +02:00
Piotr Sarna
4da8957461 transport: return exceptional future instead of throwing
Throwing bears an additional cost, so it's better to simply
construct the error in place and return it.
2020-09-29 10:00:30 +02:00
Piotr Sarna
b4db6d2598 transport,config: add a param for max request concurrency
The newly introduced parameter - max_concurrent_requests_per_shard
- can be used to limit the number of in-flight requests a single
coordinator shard can handle. Each surplus request will be
immediately refused by returning OverloadedException error to the client.
The default value for this parameter is large enough to never
actually shed any requests.
Currently, the limit is only applied to CQL requests - other frontends
like alternator and redis are not throttled yet.
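
The admission check can be sketched as follows. This is a minimal Python model under stated assumptions, not the cql_server code: `RequestLimiter`, `OverloadedError` and the exact comparison against the limit are hypothetical.

```python
class OverloadedError(Exception):
    """Stands in for the OverloadedException returned to the client."""


class RequestLimiter:
    def __init__(self, max_concurrent_requests_per_shard):
        self.limit = max_concurrent_requests_per_shard
        self.in_flight = 0

    def admit(self):
        # Assumption: a request is refused once `limit` requests are in flight.
        if self.in_flight >= self.limit:
            raise OverloadedError("too many in-flight requests")
        self.in_flight += 1

    def release(self):
        self.in_flight -= 1


limiter = RequestLimiter(max_concurrent_requests_per_shard=2)
limiter.admit()
limiter.admit()
try:
    limiter.admit()          # third concurrent request is shed immediately
    third_refused = False
except OverloadedError:
    third_refused = True

limiter.release()            # one request finishes
limiter.admit()              # so a new one is admitted again
```

A refused request never increments the counter, so shedding does not delay the requests that were already admitted.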
2020-09-29 09:59:30 +02:00
Botond Dénes
2ee026f26f test/manual/sstable_scan_footprint_test: run test body in statement sched group
So that queries are processed in said scheduling group and thus they use
the user read concurrency semaphore.
2020-09-28 11:27:49 +03:00
Botond Dénes
272a54b81c test/manual/sstable_scan_footprint_test: move test main code into separate function
2020-09-28 11:27:49 +03:00
Botond Dénes
29861b068e test/manual/sstable_scan_footprint_test: sprinkle some thread::maybe_yield():s
To avoid stalls.
2020-09-28 11:27:49 +03:00
Botond Dénes
daa9fa72f1 test/manual/sstable_scan_footprint_test: make clustering row size configurable
So that large-row workloads can be simulated too.
2020-09-28 11:27:49 +03:00