scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 05:35:48 +00:00

Author	SHA1	Message	Date
Piotr Sarna	5cc5b64d82	github: remove THE REST rule from CODEOWNERS file The rule for THE REST results in each person listed in it receiving notifications about every single pull request, which can easily lead to inbox overload - the generic rule is therefore dropped and authors of pull requests are expected to manually add reviewers. GitHub offers semi-random suggestions for reviewers anyway. Message-Id: <3c0f7a2f13c098438a8abf998ec56b74db87c733.1596450426.git.sarna@scylladb.com>	2020-08-03 13:48:39 +03:00
Eliran Sinvani	779502ab11	Revert "schema: take into account features when converting a table creation to" This reverts commit `b97f466438`. It turns out that the schema mechanism has a lot of nuances. After this change, for an unknown reason, it was empirically observed that the amount of cross-shard operations on an upgraded node increased significantly under steady stress traffic; it was so significant that the node appeared unavailable to the coordinators because all of the requests started to fail on the smp_service_group semaphore. This revert brings back a caveat in Scylla: creating a table in a mixed cluster might, under certain conditions, cause a schema mismatch on the newly created table, which makes the table essentially unusable until the whole cluster has a uniform version (rolling upgrade or rollback completion). Fixes #6893.	2020-08-03 12:51:16 +03:00
Botond Dénes	c81658c96e	configure.py: remove unused variable do_sanitize Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803082724.120916-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Botond Dénes	f4c8163d11	db/config_file.hh: named_value: remove unused members _name and _desc They seem to be just copypasta. Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803080604.45595-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Benny Halevy	3fa0f289de	table: snapshot: do not capture name This captured sstring is unused. Test: database_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803072258.44681-1-bhalevy@scylladb.com>	2020-08-03 12:51:16 +03:00
Botond Dénes	e4d06a3bbf	scylla-gdb.py: collection_element: add circular_buffer support Also add a __getitem__() to circular_buffer and mask indexes so they are mapped to [`_impl.begin`, `_impl.end`). Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200803053646.14689-1-bdenes@scylladb.com>	2020-08-03 12:51:16 +03:00
Benny Halevy	122136c617	tables: snapshot: do not create links from multiple shards We need only one of the shards owning each sstable to call create_links. This will allow us to simplify it and only handle crash/replay scenarios rather than rename/link/remove races. Fixes #1622 Test: unit(dev), database_test(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-3-bhalevy@scylladb.com>	2020-08-03 10:07:07 +03:00
Benny Halevy	ec6e136819	table: snapshot: reduce copies of snapshot dir sstring Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-2-bhalevy@scylladb.com>	2020-08-03 10:07:06 +03:00
Benny Halevy	72365445c6	table: snapshot: create destination dir only once No need to recursive_touch_directory for each sstable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200803065505.42100-1-bhalevy@scylladb.com>	2020-08-03 10:07:05 +03:00
Pekka Enberg	4f0f97773e	configure.py: Use build directory variable The "outdir" variable in configure.py and "$builddir" in build.ninja file specifies the build directory. Let's use them to eliminate hard-coded "build" paths from configure.py. Message-Id: <20200731105113.388073-1-penberg@scylladb.com>	2020-08-03 09:51:51 +03:00
Nadav Har'El	ae25661d9c	alternator test: set streams time window to zero Alternator Streams have an "alternator_streams_time_window_s" parameter which is used to allow for correct ordering in the stream in the face of clock differences between Scylla nodes and possibly network delays. This parameter currently defaults to 10 seconds, and there is a discussion on issue #6929 on whether it is perhaps too high. But in any case, for tests running on a single node there is no reason not to set this parameter to zero. Setting this parameter to zero greatly speeds up the Alternator Streams tests which use ReadRecords to read from the stream. Previously each such test took at least 10 seconds, because the data was only readable after a 10 second delay. With alternator_streams_time_window_s=0, these tests can finish in less than a second. Unfortunately they are still relatively slow because our Streams implementation has 512 shards, and thus we need over a thousand (!) API calls to read from the stream. Running "test/alternator/run test_streams.py" with 25 tests took 114 seconds before this patch; after this patch, it is down to 18 seconds. Refs #6929 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Calle Wilund <calle@scylladb.com> Message-Id: <20200728184612.1253178-1-nyh@scylladb.com>	2020-08-03 09:19:57 +03:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Avi Kivity	bb9ad9c90b	Merge 'Mount RAID volume correctly beyond reboot' from Takuya " To mount RAID volume correctly (#6876), we need to wait for MDRAID initialization. To do so we need to add After=mdmonitor.service on var-lib-scylla.mount. Also, `lsblk -n -oPARTTYPE {dev}` does not work on CentOS7, since older lsblk does not support the PARTTYPE column (#6954). We need to provide a relocatable lsblk and run it in the out() / run() functions instead of the distribution-provided version. " * syuu1228-scylla_raid_setup_mount_correctly_beyond_reboot: scylla_raid_setup: initialize MDRAID before mounting data volume create-relocatable-package.py: add lsblk for relocatable CLI tools scylla_util.py: always use relocatable CLI tools	2020-08-02 16:36:45 +03:00
Piotr Sarna	ccbffc3177	codeowners: add some @psarnas and @penbergs where applicable I shamelessly added myself to some modules I usually take part in reviewing. Also, I assume that the THE REST bucket should show current maintainers, so the list is extended appropriately. Message-Id: <0c172d0f20e367c3ce47fdf8d40755038ddee373.1596195689.git.sarna@scylladb.com>	2020-07-31 17:08:28 +03:00
Rafael Ávila de Espíndola	30722b8c8e	logalloc: Add disable_failure_guard during a few tls variable initialization The constructors of these global variables can allocate memory. Since the variables are thread_local, they are initialized at first use. There is nothing we can do if these allocations fail, so use disable_failure_guard. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729184901.205646-1-espindola@scylladb.com>	2020-07-31 15:49:21 +02:00
Pavel Emelyanov	14b279020b	scylla-gdb.py: Support b+tree-based row_cache::_partitions The row_cache::_partitions type is nowadays a double_decker which is B+tree of intrusive_arrays of cache_entrys, so scylla cache command will raise an error being unable to parse this new data type. The respective iterator for double decker starts on the tree and walks the list of leaf nodes, on each node it walks the plain array of data nodes, then on each data node it walks the intrusive array of cache_entrys yielding them to the caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200730145851.8819-1-xemul@scylladb.com>	2020-07-31 15:48:25 +02:00
Piotr Jastrzębski	b16b2c348f	Add CDC code owners	2020-07-31 14:22:08 +03:00
Piotr Jastrzębski	7eff7a39a0	Add hinted handoff code owners	2020-07-31 14:21:59 +03:00
Piotr Jastrzębski	443affa525	Update counters code owners	2020-07-31 14:21:48 +03:00
Tomasz Grabiec	5263e0453a	CMakeLists.txt: Add abseil to include directories Fixes IDE integration. Message-Id: <1596190352-15467-1-git-send-email-tgrabiec@scylladb.com>	2020-07-31 12:15:23 +02:00
Avi Kivity	66c2b4c8bf	tools: toolchain: regenerate for gcc 10.2 Fixes #6813. As a side effect, this also brings in xxhash 0.7.4.	2020-07-31 08:32:16 +03:00
Takuya ASADA	9e5d548f75	scylla_raid_setup: initialize MDRAID before mounting data volume var-lib-scylla.mount should wait for MDRAID initialization, so we need to add 'After=mdmonitor.service'. However, mdmonitor.service currently fails to start because no mail address is specified, so we need to add the entry to mdadm.conf. Fixes #6876	2020-07-31 06:33:52 +09:00
Takuya ASADA	6ba2a6c42e	create-relocatable-package.py: add lsblk for relocatable CLI tools We need the latest version of lsblk, which supports partition type UUID. Fixes #6954	2020-07-31 04:23:03 +09:00
Takuya ASADA	a19a62e6f6	scylla_util.py: always use relocatable CLI tools For some CLI tools, command options may differ between the latest version and older versions. To maximize compatibility of setup scripts, we should always use relocatable CLI tools instead of the distribution version of the tool. Related #6954	2020-07-31 04:17:01 +09:00
Piotr Sarna	b3ad5042c4	.gitignore: add .vscode to the list Since it looks like vscode is used as main IDE by some developers, including me, let's ignore its helper files. Message-Id: <63931cadc733c3d0345616be633a6479dc85ca19.1596115302.git.sarna@scylladb.com>	2020-07-30 16:35:06 +03:00
Piotr Sarna	8728c70628	.gitignore: allow symlinks when ignoring testlog The .gitignore entry for testlog/ directory is generalized from "testlog/*" to "testlog", in order to please everyone who potentially wants test logs to use ramfs by symlinking testlog to /tmp. Without the change, the symlink remains visible in `git status`. Message-Id: <e600f5954868aea7031beb02b1d8e12a2ff869e2.1596115302.git.sarna@scylladb.com>	2020-07-30 16:35:02 +03:00
Piotr Sarna	0788a77109	Merge 'Replace MAINTAINERS with CODEOWNERS' from Pekka Replace the MAINTAINERS file with a CODEOWNERS file, which Github is able to parse, and suggest reviewers for pull requests. * penberg-penberg/codeowners: Replace MAINTAINERS with CODEOWNERS Update MAINTAINERS	2020-07-30 15:12:59 +02:00
Nadav Har'El	8b9da9c92a	alternator test: tests for combination of query filter and projection The tests in this patch, which pass on DynamoDB but fail on Alternator, reproduce a bug described in issue #6951. This bug makes it impossible for a Query (or Scan) to filter on an attribute if that attribute is not requested to be included in the output. This patch includes two xfailing tests of this type: One testing a combination of FilterExpression and ProjectionExpression, and the second testing a combination of QueryFilter and AttributesToGet; These two pairs are, respectively, DynamoDB's newer and older syntaxes to achieve the same thing. Additionally, we add two xfailing tests that demonstrates that combining old and new style syntax (e.g., FilterExpression with AttributesToGet) should not have been allowed (DynamoDB doesn't allow such combinations), but Alternator currently accepts these combinations. Refs #6951 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200729210346.1308461-1-nyh@scylladb.com>	2020-07-30 09:34:23 +02:00
Rafael Ávila de Espíndola	a548e5f5d1	test: Mark tmpdir::remove noexcept Also disable the allocation failure injection in it. Refs #6831. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729200019.250908-2-espindola@scylladb.com>	2020-07-30 09:55:52 +03:00
Rafael Ávila de Espíndola	d8ba9678b4	test: Move tmpdir code to a .cc file This is not hot, so we can move it out of the header. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200729200019.250908-1-espindola@scylladb.com>	2020-07-30 09:55:52 +03:00
Tomasz Grabiec	3486eba1ce	commitlog: Fix use-after-free on mutation object during replay The mutation object may be freed prematurely during commitlog replay in the schema upgrading path. We will hit the problem if the memtable is full and apply_in_memory() needs to defer. This will typically manifest as a segfault. Fixes #6953 Introduced in `79935df` Tests: - manual using scylla binary. Reproduced the problem then verified the fix makes it go away Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>	2020-07-29 20:58:15 +03:00
Nadav Har'El	665b78253a	alternator test: reduce amount of Scylla logs saved The test/alternator/run script follows the pytest log with a full log of Scylla. This saved log can be useful in diagnosing problems, but most of it is filled with non-useful "INFO"-level messages. The two biggest offenders are compaction - which logs every single compaction happening, and the migration manager, which is just a second (and very long) message about schema change operations (e.g., table creations). Neither of these are interesting for Alternator's tests, which shouldn't care exactly when compaction of which sstable is happening. These two components alone are responsible for 80% of the log lines, and 90% of the log bytes! In this patch we increase the log level of just these two components - compaction and migration_manager - to WARN, which reduces the log by the same percentages (80% by lines, 90% by bytes). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200728191420.1254961-1-nyh@scylladb.com>	2020-07-29 14:17:12 +03:00
Takuya ASADA	3a25e7285b	scylla_post_install.sh: generate memory.conf for CentOS7 On CentOS7, systemd does not support percentage-based parameter. To apply memory parameter on CentOS7, we need to override the parameter in bytes, instead of percentage. Fixes #6783	2020-07-29 14:10:16 +03:00
Avi Kivity	fea5067dfa	Merge "Limit non-paged query memory consumption" from Botond " Non-paged queries completely ignore the query result size limiter mechanism. They consume all the memory they want. With sufficiently large datasets this can easily lead to a handful or even a single unpaged query producing an OOM. This series continues the work started by `134d5a5f7`, by introducing a configurable pair of soft/hard limits (defaulting to 1MB/100MB) that is applied to otherwise unlimited queries, like reverse and unpaged ones. When an unlimited query reaches the soft limit a warning is logged. This should give users some heads-up to adjust their application. When the hard limit is reached the query is aborted. The idea is to not greet users with failing queries after an upgrade while at the same time protecting the database from the really bad queries. The hard limit should be decreased from time to time, gradually approaching the desired goal of 1MB. We don't want to limit internal queries, we trust ourselves to either use another form of memory usage control, or read only small datasets. So the limit is selected according to the query class. User reads use the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration items, while internal reads are not limited. The limit is obtained by the coordinator, who passes it down to replicas using the existing `max_result_size` parameter (which is now a special type containing the two limits), which is now passed on every verb, instead of once per connection. This ensures that all replicas work with the same limits. For normal paged queries `max_result_size` is set to the usual `query::result_memory_limiter::maximum_result_size`. For queries that can consume an unlimited amount of memory -- unpaged and reverse queries -- this is set to the value of the aforementioned `max_memory_for_unlimited_query_{soft,hard}_limit` configuration item, but only for user reads, internal reads are not limited. 
This has the side-effect that reverse reads now send entire partitions in a single page, but this is not that bad. The data was already read, and its size was below the limit, the replica might as well send it all. Fixes: #5870 " * 'nonpaged-query-limit/v5' of https://github.com/denesb/scylla: (26 commits) test: database_test: add test for enforced max result limit mutation_partition: abort read when hard limit is exceeded for non-paged reads query-result.hh: move the definition of short_read to the top test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit test: set the allow_short_read slice option for paged queries partition_slice_builder: add with_option() result_memory_accounter: remove default constructor query_*(): use the coordinator specified memory limit for unlimited queries storage_proxy: use read_command::max_result_size to pass max result size around query: result_memory_limiter: use the new max_result_size type query: read_command: add max_result_size query: read_command: use tagged ints for limit ctor params query: read_command: add separate convenience constructor service: query_pager: set the allow_short_read flag result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit storage_proxy: add get_max_result_size() result_memory_limiter: add unlimited_result_size constant database: add get_statement_scheduling_group() database: query_mutations(): obtain the memory accounter inside query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field ...	2020-07-29 13:41:53 +03:00
Avi Kivity	22fe38732d	Update tools/jmx and tools/java submodules * tools/java a9480f3a87...aa7898d771 (4): > dist: debian: do not require root during package build > cassandra-stress: Add serial consistency options > dist: debian: fix detection of debuild > bin tools: Use non-default `cassandra.config` * tools/jmx c0d9d0f...626fd75 (1): > dist: debian: do not require root during package build Fixes #6655.	2020-07-29 12:55:18 +03:00
Botond Dénes	3804dfcc0c	test: database_test: add test for enforced max result limit Two tests are added: one that works on the low-level database API, and another one that works on the CQL API.	2020-07-29 08:32:34 +03:00
Botond Dénes	f7a4d19fb1	mutation_partition: abort read when hard limit is exceeded for non-paged reads If the read is not paged (short read is not allowed) abort the query if the hard memory limit is reached. On reaching the soft memory limit a warning is logged. This should allow users to adjust their application code while at the same time protecting the database from the really bad queries. The enforcement happens inside the memory accounter and doesn't require cooperation from the result builders. This ensures the memory limit set for the query is respected for all kinds of reads. Previously non-paged reads simply ignored the memory accounter requesting the read to stop and consumed all the memory they wanted.	2020-07-29 08:32:31 +03:00
Rafael Ávila de Espíndola	c4cb3817cf	build: Use -fdata-sections and -ffunction-sections This is a 4.2% reduction in the scylla text size, from 38975956 to 37404404 bytes. When benchmarking perf_simple_query without --shuffle-sections, there is no performance difference. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200724032504.3004-1-espindola@scylladb.com>	2020-07-28 19:39:26 +03:00
Botond Dénes	02a7492d62	query-result.hh: move the definition of short_read to the top It will be used by `result_memory_{limiter,accounter}` soon.	2020-07-28 18:00:29 +03:00
Botond Dénes	43c0da4b63	test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit To an unlimited value, in order to avoid aborting any unpaged queries executed by tests, that would exceed the default result limit of 1MB/100MB.	2020-07-28 18:00:29 +03:00
Botond Dénes	648ce473ab	test: set the allow_short_read slice option for paged queries Some tests use the lower level methods directly and meant to use paging but didn't and nobody noticed. This was revealed by the enforcement of max result size (introduced in a later patch), which caused these tests to fail due to exceeding the max result size. This patch fixes this by setting the `allow_short_reads` slice option.	2020-07-28 18:00:29 +03:00
Botond Dénes	d27f8321d7	partition_slice_builder: add with_option()	2020-07-28 18:00:29 +03:00
Botond Dénes	6660a5df51	result_memory_accounter: remove default constructor If somebody wants to bypass proper memory accounting they should at the very least be forced to consider if that is indeed wise and think a second about the limit they want to apply.	2020-07-28 18:00:29 +03:00
Botond Dénes	9eab5bca27	query_*(): use the coordinator specified memory limit for unlimited queries It is important that all replicas participating in a read use the same memory limits to avoid artificial differences due to different amount of results. The coordinator now passes down its own memory limit for reads, in the form of max_result_size (or max_size). For unpaged or reverse queries this has to be used now instead of the locally set max_memory_unlimited_query configuration item. To avoid the replicas accidentally using the local limit contained in the `query_class_config` returned from `database::make_query_class_config()`, we refactor the latter into `database::get_reader_concurrency_semaphore()`. Most of its callers were interested only in the semaphore anyway and those that were interested in the limit as well should get it from the coordinator instead, so this refactoring is a win-win.	2020-07-28 18:00:29 +03:00
Botond Dénes	159d37053d	storage_proxy: use read_command::max_result_size to pass max result size around Use the recently added `max_result_size` field of `query::read_command` to pass the max result size around, including passing it to remote nodes. This means that the max result size will be sent along each read, instead of once per connection. As we want to select the appropriate `max_result_size` based on the type of the query as well as based on the query class (user or internal) the previous method won't do anymore. If the remote doesn't fill this field, the old per-connection value is used.	2020-07-28 18:00:29 +03:00
Botond Dénes	fbbbc3e05c	query: result_memory_limiter: use the new max_result_size type	2020-07-28 18:00:29 +03:00
Botond Dénes	92a7b16cba	query: read_command: add max_result_size This field will replace max size which is currently passed once per established rpc connection via the CLIENT_ID verb and stored as an auxiliary value on the client_info. For now it is unused, but we update all sites creating a read command to pass the correct value to it. In the next patch we will phase out the old max size and use this field to pass max size on each verb instead.	2020-07-28 18:00:29 +03:00
Botond Dénes	8992bcd1f8	query: read_command: use tagged ints for limit ctor params The convenience constructor of read_command now has two integer parameters next to each other. In the next patch we intend to add another one. This is a recipe for disaster, so to avoid mistakes this patch converts these parameters to tagged integers. This makes sure callers pass what they meant to pass. As a matter of fact, while fixing up call-sites, I already found several ones passing `query::max_partitions` to the `row_limit` parameter. No harm done yet, as `query::max_partitions` == `query::max_rows` but this shows just how easy it is to mix up parameters with the same type.	2020-07-28 18:00:29 +03:00
Botond Dénes	2ca118b2d5	query: read_command: add separate convenience constructor query::read_command currently has a single constructor, which serves both as an idl constructor (order of parameters is fixed) and a convenience one (most parameters have default values). This makes it very error prone to add new parameters that everyone should fill. The new parameter has to be added as last, with a default value, as the previous ones have a default value as well. This means the compiler's help cannot be enlisted to make sure all usages are updated. This patch adds a separate convenience constructor to be used by normal code. The idl constructor loses all default parameters. New parameters can be added to any position in the convenience constructor (to force users to fill in a meaningful value) while the removed default parameters from the idl constructor mean code cannot accidentally use it without noticing.	2020-07-28 18:00:29 +03:00
Botond Dénes	1615fe4c5e	service: query_pager: set the allow_short_read flag All callers should set this already before passing the slice to the pager, however not all actually do (e.g. `cql3::indexed_table_select_statement::read_posting_list()`). Instead of auditing each call site, just make sure this is set in the pager itself. If someone is creating a pager we can be sure they mean to use paging.	2020-07-28 18:00:29 +03:00


22992 Commits