scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	2d64fc3a3e	main: Shut down database with verbose_shutdown helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	636c300db5	main: Shut down prometheus with verbose_shutdown() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> --- v2: - Have stop earlier so that exceptions in start/listen do not prevent prometheus.stop from being called	2019-11-25 18:47:03 +03:00
Pavel Emelyanov	804b152527	main: Sanitize shutting down callbacks As suggested in issue #4586, here is a helper that prints a "shutting down foo" message, then shuts foo down, then prints "shutting down foo was successful". In between, it catches the exception (if any) and logs a warning about it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-11-25 18:45:49 +03:00
Pavel Emelyanov	f6ac969f1e	mm: Stop migration manager Before stopping the db itself, stop the migration service. It must be stopped before RPC, but RPC is not stopped yet itself, so we should be safe here. Here's the tail of the resulting logs: INFO 2019-11-20 11:22:35,193 [shard 0] init - shutdown migration manager INFO 2019-11-20 11:22:35,193 [shard 0] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 1] migration_manager - stopping migration service INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database started INFO 2019-11-20 11:22:35,193 [shard 0] init - Shutdown database finished INFO 2019-11-20 11:22:35,193 [shard 0] init - stopping prometheus API server INFO 2019-11-20 11:22:35,193 [shard 0] init - Scylla version 666.development-0.20191120.25820980f shutdown complete. Also -- stop the mm on drain before the commitlog is stopped. [Tomasz: mm needs the cl because pulling schema changes from other nodes involves applying them into the database. So cl/db needs to be stopped after mm is stopped.] The drain logs would look like ... INFO 2019-11-25 11:00:40,562 [shard 0] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,562 [shard 1] migration_manager - stopping migration service INFO 2019-11-25 11:00:40,563 [shard 0] storage_service - DRAINED: and then on stop ... INFO 2019-11-25 11:00:46,427 [shard 0] init - shutdown migration manager INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database started INFO 2019-11-25 11:00:46,427 [shard 0] init - Shutdown database finished INFO 2019-11-25 11:00:46,427 [shard 0] init - stopping prometheus API server INFO 2019-11-25 11:00:46,427 [shard 0] init - Scylla version 666.development-0.20191125.3eab6cd54 shutdown complete. Fixes #5300 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191125080605.7661-1-xemul@scylladb.com>	2019-11-25 12:59:01 +01:00
Asias He	6ec602ff2c	repair: Fix rx_hashes_nr metrics (#5213) In get_full_row_hashes_with_rpc_stream and repair_get_row_diff_with_rpc_stream_process_op, which were introduced in the "Repair switch to rpc stream" series, rx_hashes_nr metrics are not updated correctly. In the test we have 3 nodes and run repair on node3; we make sure the following metrics are correct. assertEqual(node1_metrics['scylla_repair_tx_hashes_nr'] + node2_metrics['scylla_repair_tx_hashes_nr'], node3_metrics['scylla_repair_rx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_rx_hashes_nr'] + node2_metrics['scylla_repair_rx_hashes_nr'], node3_metrics['scylla_repair_tx_hashes_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_nr'] + node2_metrics['scylla_repair_tx_row_nr'], node3_metrics['scylla_repair_rx_row_nr']) assertEqual(node1_metrics['scylla_repair_rx_row_nr'] + node2_metrics['scylla_repair_rx_row_nr'], node3_metrics['scylla_repair_tx_row_nr']) assertEqual(node1_metrics['scylla_repair_tx_row_bytes'] + node2_metrics['scylla_repair_tx_row_bytes'], node3_metrics['scylla_repair_rx_row_bytes']) assertEqual(node1_metrics['scylla_repair_rx_row_bytes'] + node2_metrics['scylla_repair_rx_row_bytes'], node3_metrics['scylla_repair_tx_row_bytes']) Tests: repair_additional_test.py:RepairAdditionalTest.repair_almost_synced_3nodes_test Fixes: #5339 Backports: 3.2	2019-11-25 13:57:37 +02:00
Nadav Har'El	3eab6cd549	Merged "toolchain: update to Fedora 31" Merged pull request https://github.com/scylladb/scylla/pull/5310 from Avi Kivity: This is a minor update as gcc and boost versions did not change. A notable update is patchelf 0.10, which adds support for large binaries. A few minor issues exposed by the update are fixed in preparatory patches. Patches: dist: rpm: correct systemd post-uninstall scriptlet build: force xz compression on rpm binary payload tools: toolchain: update to Fedora 31	2019-11-24 13:38:45 +02:00
Tomasz Grabiec	e3d025d014	row_cache: Fix abort on bad_alloc during cache update Since `90d6c0b`, cache will abort when trying to detach partition entries while they're updated. This should never happen. It can happen though, when the update fails on bad_alloc, because the cleanup guard invalidates the cache before it releases partition snapshots (held by "update" coroutine). Fix by destroying the coroutine first. Fixes #5327. Tests: - row_cache_test (dev) Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>	2019-11-24 12:06:51 +02:00
Rafael Ávila de Espíndola	8599f8205b	rpmbuild: don't use dwz By default rpm uses dwz to merge the debug info from various binaries. Unfortunately, it looks like addr2line has not been updated to handle this: // This works $ addr2line -e build/release/scylla 0x1234567 $ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug // now this fails $ addr2line -e build/release/scylla 0x1234567 I think the issue is https://sourceware.org/bugzilla/show_bug.cgi?id=23652 Fixes #5289 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123015734.89331-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	25d5d39b3c	reloc: Force using sha1 for build-ids The default build-id used by lld is xxhash, which is 8 bytes long. rpm requires build-ids to be at least 16 bytes long (https://github.com/rpm-software-management/rpm/issues/950). We force using sha1 for now. That has no impact in gold and bfd since that is their default. We set it in here instead of configure.py to not slow down regular builds. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123020801.89750-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	b5667b9c31	build: don't compress debug info in executables By default we were compressing debug info only in release executables. The idea, if I understand it correctly, is that those are the ones we ship, so we want a more compact binary. I don't think that was doing anything useful. The compression is just gzip, so when we ship a .tar.xz, having the debug info compressed inside the scylla binary probably reduces the overall compression a bit. When building an rpm the situation is amusing. As part of the rpm build process the debug info is decompressed and extracted to an external file. Given that most of the link time goes to compressing debug info, it is probably a good idea to just skip that. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191123022825.102837-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	d84859475e	Merge "Refactor test.py and cleanup resources" from Kostja Structure the code to be able to introduce futures. Apply trivial cleanups. Switch to asyncio and use it to work with processes and handle signals. Cleanup all processes upon signal.	2019-11-24 11:35:29 +02:00
Tomasz Grabiec	e166fdfa26	Merge "Optimize LWT query phase" from Vladimir Davydov This patch implements a simple optimization for LWT: it makes PAXOS prepare phase query locally and return the current value of the modified key so that a separate query is not necessary. For more details see patch 6. Patch 1 fixes a bug in next. Patches 2-5 contain trivial preparatory refactoring.	2019-11-24 11:35:29 +02:00
Pavel Solodovnikov	4879db70a6	system_keyspace: support timeouts in queries to `system.paxos` table. Also introduce supplementary `execute_cql_with_timeout` function. Remove redundant comment for `execute_cql`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20191121214148.57921-1-pa.solodovnikov@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	bf5f864d80	paxos: piggyback result query on prepare response Current LWT implementation uses at least three network round trips: - first, execute PAXOS prepare phase - second, query the current value of the updated key - third, propose the change to participating replicas (there's also learn phase, but we don't wait for it to complete). The idea behind the optimization implemented by this patch is simple: piggyback the current value of the updated key on the prepare response to eliminate one round trip. To generate less network traffic, only the replica closest to the coordinator sends data while other participating replicas send digests which are used to check data consistency. Note, this patch changes the API of some RPC calls used by PAXOS, but this should be okay as long as the feature is in the early development stage and marked experimental. To assess the impact of this optimization on LWT performance, I ran a simple benchmark that starts a number of concurrent clients each of which updates its own key (uncontended case) stored in a cluster of three AWS i3.2xlarge nodes located in the same region (us-west-1) and measures the aggregate bandwidth and latency. The test uses shard-aware gocql driver. Here are the results: latency 99% (ms) bandwidth (rq/s) timeouts (rq/s) clients before after before after before after 1 2 2 626 637 0 0 5 4 3 2616 2843 0 0 10 3 3 4493 4767 0 0 50 7 7 10567 10833 0 0 100 15 15 12265 12934 0 0 200 48 30 13593 14317 0 0 400 185 60 14796 15549 0 0 600 290 94 14416 15669 0 0 800 568 118 14077 15820 2 0 1000 710 118 13088 15830 9 0 2000 1388 232 13342 15658 85 0 3000 1110 363 13282 15422 233 0 4000 1735 454 13387 15385 329 0 That is, this optimization improves max LWT bandwidth by about 15% and allows running 3-4x more clients while maintaining the same level of system responsiveness.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	6160b9017d	commitlog: make sure a file is closed If allocate or truncate throws, we have to close the file. Fixes #4877 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191114174810.49004-1-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	3d1d4b018f	paxos: remove unnecessary move constructor invocations invoke_on() guarantees that captured objects won't be destroyed until the future returned by the invoked function is resolved, so there's no need to move key, token, proposal for calling paxos_state::*_impl helpers.	2019-11-24 11:35:29 +02:00
Rafael Ávila de Espíndola	cfb079b2c9	types: Refactor duplicated value_cast implementation The two implementations of value_cast were almost identical. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-3-espindola@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	ef2e96c47c	storage_proxy: factor out helper to sort endpoints by proximity We need it for PAXOS.	2019-11-24 11:35:29 +02:00
Nadav Har'El	854e6c8d7b	alternator-test: test_health_only_works_for_root_path: remove wrong check The test_health_only_works_for_root_path test checks that while Alternator's HTTP server responds to a "GET /" request with success ("health check"), it should respond to different URLs with failures (page not found). One of the URLs it tested was "/..", but unfortunately some versions of Python's HTTP client canonicalize this request to just a "/", causing the request to unexpectedly succeed - and the test to fail. So this patch just drops the "/.." check. A few other nonsense URLs are attempted by the test - e.g., "/abc". Fixes #5321 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	63d4590336	storage_proxy: move digest_algorithm upper We need it for PAXOS. Mark it as static inline while we are at it.	2019-11-24 11:35:29 +02:00
Nadav Har'El	43d3e8adaf	alternator: make DescribeTable return table schema One of the fields still missing in DescribeTable's response (Refs #5026) was the table's schema - KeySchema and AttributeDefinitions. This patch adds this missing feature, and enables the previously-xfailing test test_describe_table_schema. A complication of this patch is that in a table with secondary indexes, we need to return not just the base table's schema, but also the indexes' schema. The existing tests did not cover that feature, so we add here two more tests in test_gsi.py for that. One of these secondary-index schema tests, test_gsi_2_describe_table_schema, still fails, because it outputs a range-key which Scylla added to a view because of its own implementation needs, but wasn't in the user's definition of the GSI. I opened a separate issue #5320 for that. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	f5c2a23118	serializer: add reference_wrapper handling Serialize reference_wrapper<T> as T and make sure is_equivalent<> treats reference_wrapper<T> wrapped in std::optional<>, std::variant<>, or std::tuple<> as T. We need it to avoid copying query::result while serializing paxos::promise.	2019-11-24 11:35:29 +02:00
Botond Dénes	89f9b89a89	scylla-gdb.py: scylla task_histogram: scan all tasks with -a or -s 0 Currently, even if `-a` or `-s 0` is provided, `scylla task_histogram` will scan a limited number of pages due to a bug in the scan loop's stop condition, which triggers a stop once the default sample limit is reached. Fix the loop by skipping this check when the user wants to scan all tasks. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191121141706.29476-1-bdenes@scylladb.com>	2019-11-24 11:35:29 +02:00
Vladimir Davydov	1452653fbc	query_context: fix use after free of timeout_config in execute_cql_with_timeout timeout_config is used by reference by cql3::query_processor::process(), see cql3::query_options, so the caller must make sure it doesn't go away.	2019-11-24 11:35:29 +02:00
Konstantin Osipov	b8b5834cf1	test.py: simplify message output in run_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	90a8f79d7e	test.py: use UnitTest class where possible	2019-11-21 23:16:22 +03:00
Konstantin Osipov	8cd8cfc307	test.py: rename harness command line arguments to 'options' The UnitTest class juggles the name 'args' quite a bit to construct the command line for a unit test, so let's separate the harness command line arguments from the unit test command line arguments by consistently calling the harness command line arguments 'options' and the unit test command line arguments 'args'. Rename usage() to parse_cmd_line().	2019-11-21 23:16:22 +03:00
Konstantin Osipov	e5d624d055	test.py: consolidate argument handling in UnitTest constructor Create unique UnitTest objects in find_tests() for each found match, including repeat, to ensure each test has its own unique id. This will also be used to store execution state in the test.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	dd60673cef	test.py: move --collectd to standard args	2019-11-21 23:16:22 +03:00
Konstantin Osipov	fe12f73d7f	test.py: introduce class UnitTest	2019-11-21 23:16:22 +03:00
Konstantin Osipov	bbcdee37f7	test.py: add add_test_list() to find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	4723afa09c	test.py: add long tests with add_test()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	13f1e2abc6	test.py: store the non-default seastar arguments along with definition	2019-11-21 23:16:22 +03:00
Konstantin Osipov	72ef11eb79	test.py: introduce add_test() to find_tests() To avoid code duplication, and to build upon later.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b50b24a8a7	test.py: avoid an unnecessary loop in find_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a5103d0092	test.py: move args.repeat processing to find_tests() It somewhat stands in the way of using asyncio. This patch also implements a more comprehensive fix for #5303, since we not only have --repeat, but run some tests in different configurations, in which case xml output is also overwritten.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0f0a49b811	test.py: introduce print_summary() and write_xunit_report() (One more moving of the code around).	2019-11-21 23:16:22 +03:00
Konstantin Osipov	22166771ef	test.py: rename test_to_run tests_to_run	2019-11-21 23:16:22 +03:00
Konstantin Osipov	1d94d9827e	test.py: introduce run_all_tests()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	29087e1349	test.py: move out run_test() routine (Trivial code refactoring.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	79506fc5ab	test.py: introduce find_tests() Trivial code refactoring.	2019-11-21 23:16:22 +03:00
Konstantin Osipov	a44a1c4124	test.py: remove print_status_succint (Trivial code cleanup.)	2019-11-21 23:16:22 +03:00
Konstantin Osipov	b9605c1d37	test.py: move mode list evaluation to usage()	2019-11-21 23:16:22 +03:00
Konstantin Osipov	0c4df5a548	test.py: add usage()	2019-11-21 23:16:22 +03:00
Pavel Emelyanov	e0f40ed16a	cli: Add the --workdir|-W option When starting the scylla daemon as non-root, the initialization fails because the standard /var/lib/scylla is not accessible to regular users. Making the default dir accessible to the user is not very convenient either, as it will cause conflicts if two or more instances of scylla are in use. This problem can be resolved by specifying --commitlog-directory, --data-file-directories, etc. on start, but it's too much typing. I propose to revive Nadav's --home option that allows moving all the directories under the same prefix in one go. Unlike Nadav's approach, the --workdir option doesn't do any tricky manipulations with existing directories. Instead, as Pekka suggested, the individual directories are placed under the workdir if and only if the respective option is NOT provided. Otherwise the directory configuration is taken as is, regardless of whether it is an absolute or relative path. The value substitution is done early on start. Avi suggested that this is unsafe wrt HUP config re-read and proper paths must be resolved on the fly, but this patch doesn't address that yet; here's why. First of all, the respective options are MustRestart now and the substitution is done before the HUP handler is installed. Next, commitlog and data_file values are copied on start, so marking the options as LiveUpdate won't have any effect. Finally, the existing named_value::operator() returns a reference, so returning a calculated (and thus temporary) value is not possible (from my current understanding, correct me if I'm wrong). Thus if we want the _directory() to return a calculated value, all callers of them must be patched to call something different (e.g. _directory.get() ?), which will lead to more confusion and errors. Changes v3: - the option is --workdir back again - the existing *directory are only affected if unset - default config doesn't have any of these set - added the short -W alias Changes v2: - the option is --home now - all other paths are changed to be relative Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20191119130059.18066-1-xemul@scylladb.com>	2019-11-21 15:07:39 +02:00
Rafael Ávila de Espíndola	5417c5356b	types: Move get_castas_fctn to cql3 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-9-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	f06d6df4df	types: Simplify casts to string These now just use the to_string member functions, which makes it possible to move the code to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-8-espindola@scylladb.com>	2019-11-21 12:08:50 +02:00
Rafael Ávila de Espíndola	786b1ec364	types: Move json code to its own file Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-7-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	af8e207491	types: Avoid using deserialize_value in json code This makes it independent of internal functions and makes it possible to move it to another file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-6-espindola@scylladb.com>	2019-11-21 12:08:49 +02:00
Rafael Ávila de Espíndola	ed65e2c848	types: Move cql3_kind to the cql3 directory Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20191120181213.111758-5-espindola@scylladb.com>	2019-11-21 12:08:47 +02:00
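
Commit 804b152527 above describes a shutdown helper that logs a message, stops a service, logs success, and warns if the stop raises. A minimal Python sketch of that pattern follows; it is not the actual C++ helper in main. The function signature, the logger name, and the choice to swallow the exception and return a boolean are all assumptions made for illustration (the commit message does not say whether the real helper rethrows).

```python
import logging

# Hypothetical logger name; the real code logs under "init".
log = logging.getLogger("init")


def verbose_shutdown(what, stop):
    """Run stop() for the service named `what`, logging before and after.

    Returns True if stop() completed, False if it raised.
    Swallowing the exception is an assumption of this sketch.
    """
    log.info("Shutting down %s", what)
    try:
        stop()
    except Exception as exc:
        # Warn about the failure instead of letting it abort the
        # remaining shutdown steps.
        log.warning("Shutting down %s failed: %s", what, exc)
        return False
    log.info("Shutting down %s was successful", what)
    return True
```

Each service's stop callback can then be wrapped the same way, for example `verbose_shutdown("database", db_stop)`, where `db_stop` is a hypothetical callable.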

20255 Commits
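
The --workdir rule from commit e0f40ed16a (a directory is placed under the workdir if and only if its own option is not provided; otherwise the configured value is taken as is) can be sketched in Python as below. This is an illustration only, not Scylla's configuration code: the sub-directory names in `DEFAULT_SUBDIRS` are assumptions, and each option is treated as a single path string for simplicity.

```python
import posixpath

# Hypothetical mapping from option name to its sub-directory under workdir.
DEFAULT_SUBDIRS = {
    "commitlog_directory": "commitlog",
    "data_file_directories": "data",
}


def resolve_directories(workdir, explicit):
    """Return the effective directory for every option.

    `explicit` holds only the options the user actually provided.
    Provided values are kept unchanged, whether absolute or relative;
    missing ones are placed under `workdir`.
    """
    resolved = {}
    for option, subdir in DEFAULT_SUBDIRS.items():
        if option in explicit:
            resolved[option] = explicit[option]
        else:
            resolved[option] = posixpath.join(workdir, subdir)
    return resolved
```

For instance, `resolve_directories("/tmp/scylla", {"commitlog_directory": "cl"})` keeps `cl` untouched and places only the data directory under `/tmp/scylla`.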