scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Asias He	64a4c0ede2	streaming: Do not open rpc stream connection if ranges are not relevant to a shard Given a list of ranges to stream, stream_transfer_task will create a reader with the ranges and create an rpc stream connection on all the shards. When the user provides ranges to repair with the -st -et options, e.g., using scylla-manager, such ranges can belong to only one shard, and repair will pass such ranges to streaming. As a result, only one shard will have data to send while the rpc stream connections are created on all the shards, which can cause the kernel to run out of ports on some systems. To mitigate the problem, do not open the connection if the ranges do not belong to the shard at all. Refs: #4708	2019-07-18 18:31:21 +03:00
Avi Kivity	51cff8ad23	Merge "Fix storage service for tests" from Botond " Fix another source of flakiness in mutation_reader_test. This one is caused by storage_service_for_tests lacking a config::broadcast_to_all_shards() call, triggering an invalid memory access (or SEGFAULT) when run on more than one shard. Refs: #4695 " * 'fix_storage_service_for_tests' of https://github.com/denesb/scylla: tests: storage_service_for_tests: broadcast config to all shards tests: move storage_service_for_tests impl to test_services.cc	2019-07-18 18:27:47 +03:00
Nadav Har'El	997b92a666	migration_manager: allow dropping table and all its views The function announce_column_family_drop() drops (deletes) a base table and all the materialized-views used for its secondary indexes, but not other materialized views - if there are any, the operation refuses to continue. This is exactly what CQL's "DROP TABLE" needs, because it is not allowed to drop a table before manually dropping its views. But there is no inherent reason why we can't support an operation to delete a table and all its views - not just those related to indexes. This patch adds such an option to announce_column_family_drop(). This option is not used by the existing CQL layer, but can be used by other code automating operations programmatically without CQL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716150559.11806-1-nyh@scylladb.com>	2019-07-18 13:26:25 +02:00
Takuya ASADA	bd7d1b2d38	dist/common/systemd: change stop timeout sec to 900s Currently scylla-server.service uses DefaultTimeoutStopSec = 90; if Scylla is not able to shut down cleanly within 90 seconds, we may have data corruption on the node. Since we already set TimeoutStartSec = 900, we can use TimeoutSec to set both TimeoutStartSec and TimeoutStopSec to 900. See #4700 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190717095416.10652-1-syuu@scylladb.com>	2019-07-17 15:37:47 +03:00
Nadav Har'El	759752947b	drop_index_statement: fix column_family() All statement objects which derive from cf_statement, including drop_index_statement, have a column_family() returning the name of the column family involved in this statement. For most statements this is known at the time of construction, because it is part of the statement, but for "DROP INDEX", the user doesn't specify the table's name - just the index name. So we need to override column_family() to find the table name. The existing implementation assert()ed that we can always find such a table, but this is not true - for example, in a DROP INDEX with "IF EXISTS", it is perfectly fine for no such table to exist. In this case we don't want a crash, and not even an exception - it's fine that we just return an empty table name. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190716180104.15985-1-nyh@scylladb.com>	2019-07-17 09:44:47 +03:00
Kamil Braun	4417e78125	Fix timestamp_type_impl::timestamp_from_string. Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00. Fixes #4641. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-16 19:16:56 +03:00
Asias He	722ab3bb65	repair: Log repair id in check_failed_ranges Add the word `id` before the repair id in the log. It makes the log easier to figure out what the number stands for.	2019-07-16 19:10:19 +03:00
Avi Kivity	43690ecbdf	Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny " disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize with background deletions done by on_compaction_completion to ensure no sstables will be created or deleted during reshuffle_sstables after storage_service::load_new_sstables disables sstable writes. Fixes #4622 Test: unit(dev), nodetool_additional_test.py migration_test.py " * 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla: table: document _sstables_lock/_sstable_deletion_sem locking order table: disable_sstable_write: acquire _sstable_deletion_sem table: uninline enable_sstable_write table: reshuffle_sstables: add log message	2019-07-16 19:06:58 +03:00
Amnon Heiman	399d79fc6f	init: do not allow replace-address for seeds If a node is a seed node, it can not be started with replace-address-first-boot or the replace-address flag. The issue is that as a seed node it will generate new tokens instead of replacing the existing one the user expects it to replace when supplying the flags. This patch will throw a bad_configuration_error exception in this case. Fixes #3889 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-16 18:53:19 +03:00
Calle Wilund	dbc3499fd1	server: Fix cql notification inet address serialization Fixes #4717 Bug in ipv6 support series caused inet_address serialization to include an additional "size" parameter in the address chunk. Message-Id: <20190716134254.20708-1-calle@scylladb.com>	2019-07-16 16:51:59 +03:00
Botond Dénes	b40cf1c43d	tests: storage_service_for_tests: broadcast config to all shards Due to recent changes to the config subsystem, configuration has to be broadcast to all shards if one wishes to use it on them. The `storage_service_for_tests` has a `sharded<gms::gossiper>` member, which reads config values on initialization on each shard, causing a crash as the configuration was initialized only on shard 0. Add a call to `config::broadcast_to_all_shards()` to ensure all shards have access to valid config values.	2019-07-16 10:37:17 +03:00
Botond Dénes	fc9f46d7c1	tests: move storage_service_for_tests impl to test_services.cc Let's make it easier to find.	2019-07-16 10:36:49 +03:00
Paweł Dziepak	060e3f8ac2	mutation_partition: verify row::append_cell() precondition row::append_cell() has a precondition that the new cell column id needs to be larger than that of any other already existing cell. If this precondition is violated the row will end up in an invalid state. This patch adds assertion to make sure we fail early in such cases.	2019-07-15 23:25:06 +02:00
Botond Dénes	5f22771ea8	tests/mutation_reader_test stabilize test_multishard_combining_reader_non_strictly_monotonic_positions Currently the test_multishard_combining_reader_non_strictly_monotonic_positions is flaky. The test is somewhat unconventional, in that it doesn't use the same instance of data as the input to the test and as its expected output; instead it invokes the method which generates this data (`make_fragments_with_non_monotonic_positions()`) twice, first to generate the input, and second to generate the expected output. This means that the test is prone to any deviation in the data generated by said method. One such deviation, discovered recently, is that the method doesn't explicitly specify the deletion time of the generated range tombstones. This results in this deletion time sometimes differing between the test input and the expected output. Solve by explicitly passing the same deletion time to all created range tombstones. Refs: #4695	2019-07-15 23:24:16 +02:00
Tomasz Grabiec	14700c2ac4	Merge "Fix the system.size_estimates table" from Kamil Fixes a segfault when querying for an empty keyspace. Also, fixes an infinite loop on smp > 1. Queries to system.size_estimates table which are not single-partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. Fixes #4689.	2019-07-15 22:09:30 +02:00
Asias He	8774adb9d0	repair: Avoid deadlock in remove_repair_meta Start n1, n2 Create ks with rf = 2 Run repair on n2 Stop n2 in the middle of repair n1 will notice n2 is DOWN, gossip handler will remove repair instance with n2 which calls remove_repair_meta(). Inside remove_repair_meta(), we have: ``` 1 return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) { 2 return rm->stop(); 3 }).then([repair_metas, from] { 4 rlogger.debug("Removed all repair_meta for single node {}", from); 5 }); ``` Since 3.1, we start 16 repair instances in parallel which will create 16 readers. The reader semaphore is 10. At line 2, it calls ``` 6 future<> stop() { 7 auto gate_future = _gate.close(); 8 auto writer_future = _repair_writer.wait_for_writer_done(); 9 return when_all_succeed(std::move(gate_future), std::move(writer_future)); 10 } ``` The gate protects the reader while it reads data from disk: ``` 11 with_gate(_gate, [] { 12 read_rows_from_disk 13 return _repair_reader.read_mutation_fragment() --> calls reader() to read data 14 }) ``` So line 7 won't return until all the 16 readers return from the call of reader(). The problem is, the reader won't release the reader semaphore until the reader is destroyed! So, even if 10 out of the 16 readers have finished reading, they won't release the semaphore. As a result, the stop() hangs forever. To fix in the short term, we can delete the reader, aka, drop the repair_meta object once it is stopped. Refs: #4693	2019-07-15 21:51:57 +02:00
Benny Halevy	0e4567c881	table: document _sstables_lock/_sstable_deletion_sem locking order Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-15 19:20:35 +03:00
Calle Wilund	1ed9a44396	utils::config_file: Propagate broadcast_to_all_shards to dependent files Fixes #4713 Modifying config files to use sharded storage misses the fact that extensions are allowed to add non-member config fields to the main configuration, typically from "extra" config_file objects. Unless those "extra" files are broadcast when the main file is broadcast, the values will not be readable from other shards. This patch propagates the broadcast to all other config files whose entries are in the top level object. This ensures we always keep data up to date on config reload. Message-Id: <20190715135851.19948-1-calle@scylladb.com>	2019-07-15 17:02:09 +03:00
Nadav Har'El	9cc9facbea	configure.py: atomically overwrite build.ninja configure.py currently takes some time to write build.ninja. If the user interrupts (e.g., control-C) configure.py, it can leave behind a partial or even empty build.ninja file. This is most frustrating when the user didn't explicitly run "configure.py", but rather just ran "ninja" and ninja decided to run configure.py, and after interrupting it the user cannot run "ninja" again because build.ninja is gone. Another result of losing build.ninja is that the user now needs to remember which parameters to run "configure.py" with, because the old ones stored in build.ninja were lost. The solution in this patch is simple: We write the new build.ninja contents into a temporary file, not directly into build.ninja. Then, only when the entire file has been successfully written, do we rename the temporary file to its intended name - build.ninja. Fixes #4706 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190715122129.16033-1-nyh@scylladb.com>	2019-07-15 15:34:48 +03:00
Eliran Sinvani	997a146c7f	auth: Prevent race between role_manager and password_authenticator When scylla is started for the first time with PasswordAuthenticator enabled, it can be that a record of the default superuser will be created in the table with can_login and is_superuser set to null. It happens because the module in charge of creating the row is the role manager and the module in charge of setting the default password salted hash value is the password authenticator. Those two modules are started together; in the case when the password authenticator finishes its initialization first, in the period until the role manager completes its initialization, the row contains those null columns and any login attempt in this period will cause a memory access violation since those columns are not expected to ever be null. This patch removes the race by starting the password authenticator and authorizer only after the role manager has finished its initialization. Tests: 1. Unit tests (release) 2. Auth and cqlsh auth related dtests. Fixes #4226 Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>	2019-07-14 16:19:57 +03:00
Rafael Ávila de Espíndola	67c624d967	Add documentation for large_rows and large_cells Fixes #4552 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190614151907.20292-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Amnon Heiman	1c6dec139f	API: compaction_manager add get pending tasks by table The pending tasks by table name API return an array of pending tasks by keyspace/table names. After this patch the following command would work: curl -X GET 'http://localhost:10000/compaction_manager/metrics/pending_tasks_by_table' Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-12 19:21:26 +03:00
Takuya ASADA	842f75d066	reloc: provide libthread_db.so.1 to debug thread on gdb In scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug but we actually do not have libthread_db.so.1 in /opt/scylladb/libreloc since it does not appear in the ldd output for the scylla binary. To debug threads, we need to add the library to the relocatable package manually. Fixes #4673 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20190711111058.7454-1-syuu@scylladb.com>	2019-07-12 19:21:26 +03:00
Piotr Sarna	ac7531d8d9	db,hints: decouple in-flight hints limits from resource manager The resource manager is used to manage common resources between various hints managers. In-flight hints used to be one of the shared resources, but it proves to cause starvation, when one manager eats the whole limit - which may be especially painful if the background materialized views hints manager starves the regular hints manager, which can in turn start failing user writes because of admission control. This patch makes the limit per-manager again, which effectively reverts the limit to its original behavior. Fixes #4483 Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>	2019-07-12 19:21:26 +03:00
Rafael Ávila de Espíndola	4e7ffb80c0	cql: Fix use of UDT in reversed columns We were missing calls to underlying_type in a few locations and so the insert would think the given literal was invalid and the select would refuse to fetch a UDT field. Fixes #4672 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190708200516.59841-1-espindola@scylladb.com>	2019-07-12 19:21:26 +03:00
Kamil Braun	60a4867a5b	Fix infinite looping when performing a range query on system.size_estimates. Queries to system.size_estimates table which are not single partition queries caused Scylla to go into an infinite loop inside multishard_combining_reader::fill_buffer. This happened because multishard_combining_reader assumes that shards return rows belonging to separate partitions, which was not the case for size_estimates_mutation_reader. This commit fixes the issue and closes #4689.	2019-07-12 18:09:15 +02:00
Kamil Braun	ba5a02169e	Fix segmentation fault when querying system.size_estimates for an empty keyspace.	2019-07-12 18:02:10 +02:00
Kamil Braun	a1665b74a9	Refactor size_estimates_virtual_reader Move the implementation of size_estimates_mutation_reader to a separate compilation unit to speed up compilation times and increase readability. Refactor tests to use seastar::thread.	2019-07-12 17:53:00 +02:00
Benny Halevy	6dad9baa1c	table: disable_sstable_write: acquire _sstable_deletion_sem `disable_sstable_write` needs to acquire `_sstable_deletion_sem` to properly synchronize with background deletions done by `on_compaction_completion` to ensure no sstables will be created or deleted during `reshuffle_sstables` after `storage_service::load_new_sstables` disables sstable writes. Fixes #4622 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	bbbd749f70	table: uninline enable_sstable_write Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Benny Halevy	c6bad3f3c2	table: reshuffle_sstables: add log message To mark the point in time writes are disabled and scanning of the data directory is beginning. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-11 12:14:44 +03:00
Rafael Ávila de Espíndola	281f3a69f8	mc writer: Fix exception safety when closing _index_writer This fixes a possible cause of #4614. From the backtrace in that issue, it looks like a file is being closed twice. The first point in the backtrace where that seems likely is in the MC writer. My first idea was to add a writer::close and make it the responsibility of the code using the writer to call it. That way we would move work out of the destructor. That is a bit hard since the writer is destroyed from flat_mutation_reader::impl::~consumer_adapter and that would need to get a close function too. This patch instead just fixes an exception safety issue. If _index_writer->close() throws, _index_writer is still valid and ~writer will try to close it again. If the exception was thrown after _completed.set_value(), that would explain the assert about _completed.set_value() being called twice. With this patch the path outside of the destructor now moves the writer to a local variable before trying to close it. Fixes #4614 Message-Id: <20190710171747.27337-1-espindola@scylladb.com>	2019-07-10 19:27:19 +02:00
Paweł Dziepak	eb7d17e5c5	lsa: make sure align_up_for_asan() doesn't cause reads past end of segment In debug mode the LSA needs objects to be 8-byte aligned in order to maximise coverage from the AddressSanitizer. Usually `close_active()` creates a dummy object that covers the end of the segment being closed. However, if the last real object ends in the last eight bytes of the segment then that dummy won't be created because of the alignment requirements. This broke exit conditions on loops trying to read all objects in the segment and caused them to attempt to dereference the address at the end of the segment. This patch fixes that. Fixes #4653.	2019-07-10 19:19:24 +02:00
Avi Kivity	e32bdb6b90	Merge "Warn user about using SimpleStrategy with Multi DC deployment" from Kamil " If the user creates a keyspace with the 'SimpleStrategy' replication class in a multi-datacenter environment, they will receive a warning in the CQL shell and in the server logs. Resolves #4481 and #4651. " * 'multidc' of https://github.com/kbr-/scylla: Warn user about using SimpleStrategy with Multi DC deployment Add warning support to the CQL binary protocol implementation	2019-07-10 16:47:07 +03:00
Avi Kivity	138b28ae43	Merge "Fix command line parsing and add logging." from Kamil " Fixes #4203 and #4141. " * 'cmdline' of https://github.com/kbr-/scylla: Add logging of parsed command line options Fix command line argument parsing in main.	2019-07-10 16:40:57 +03:00
Avi Kivity	405fd517b0	Merge "IPv6 support" from Calle " Fixes #2027 Modifies inet address type in scylla to use seastar::net::inet_address, and removes explicit use of ipv4_addr in various network code in favour of socket_address. Thus capable of resolving and binding to ipv6. Adds config option to enable/disable ipv6 (default enabled), so upgrading cluster can continue to work while running mixed version nodes (since gossip message address serialization becomes different). " * 'calle/ipv6' of https://github.com/elcallio/scylla: test-serialization: Add small roundtrip test for inet address (v4 + v6) inet_address/init: Make ipv6 default enabled db::config: Add enable ipv6 switch (default off) gms::inet_address: Make serialization ipv6 aware Remove usage of inet_address::raw_addr() Replace use of "ipv4_addr" with socket_address inet_address: Add optional family to lookup gms::inet_address: Change inet_address to wrap actual seastar::net::inet_address types: Add ipv6_address support	2019-07-10 15:07:56 +03:00
Benny Halevy	b4dc118639	tests: logalloc_test: scale down test_region_groups Post commit `b3adabda2d` (Reduce logalloc differences between debug and release) logalloc_test's memory footprint has grown, in particular in test_region_groups, and it triggers the oom killer on our test automation machines. This patch scales down this test case so it requires less memory. Fixes #4669 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-07-10 12:06:10 +02:00
Pekka Enberg	bb53c109b4	test.py: Add option for repeating test execution This adds a '--repeat N' command line option to test.py, which can be used to execute the tests N times. This is useful for finding flakey tests, for example. Message-Id: <20190710092115.15960-1-penberg@scylladb.com>	2019-07-10 12:42:39 +03:00
Botond Dénes	ce647fac9f	timestamp_based_splitting_writer: fix the handling of partition tombstone Currently the handling of partition tombstones is broken in multiple ways: * The partition-tombstone is lost when the bucket is calculated for its timestamp (due to a misplaced `std::exchange()`). * When the `partition_start` fragment (containing the partition tombstone) is actually written to the bucket we emit another `partition_start` fragment before it because the bucket has not seen that partition before and we fail to notice that we are actually writing the partition header. This bug was allowed to fly under the radar because the unit test was accidentally not creating partition tombstones in the generated data (due to a mistake). It was discovered while working on unit tests for another test and fixing the data generation function to actually generate partition tombstones. This patch fixes both problems in the handling of partition tombstones but it doesn't yet fix the test. That is deferred until the patch series which uncovered this bug is merged to avoid merge conflicts. The other series mentioned here is: [PATCH v6 00/15] compaction: allow collecting purged data Fixes: #4683 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190710092427.122623-1-bdenes@scylladb.com>	2019-07-10 12:36:57 +03:00
Pekka Enberg	e6cc90aa98	test: add 'eventually' block to index paging test (#4681) Without 'eventually', the test is flaky because the index can still be not up to date while checking its conditions. Fixes #4670 Tests: unit(dev)	2019-07-10 11:46:03 +03:00
Kamil Braun	d6736a304a	Add metric for failed memtable flushes Resolves #3316. Signed-off-by: Kamil Braun <kbraun@scylladb.com>	2019-07-10 11:30:10 +03:00
Amnon Heiman	2fbc5ea852	config_file.hh: get_value return a pointer to the value The get_value method returns a pointer to the value that is used by the value_to_json method. The assumption is that the void pointer points to the actual value. Fixes #4678 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-07-10 10:40:35 +03:00
Piotr Sarna	ebbe038d19	test: add 'eventually' block to index paging test Without 'eventually', the test is flaky because the index can still be not up to date while checking its conditions. Fixes #4670	2019-07-09 17:07:16 +02:00
Asias He	39ca044dab	repair: Allow repair when a replica is down Since commit `bb56653` (repair: Sync schema from follower nodes before repair), the behaviour of handling a down node during repair has been changed. That is, if a repair follower is down, it will fail to sync schema with it and the repair of the range will be skipped. This means a range can not be repaired unless all the nodes for the replicas are up. To fix, we filter out the nodes that are down, mark the repair as partial, and repair with the nodes that are still up. Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test Fixes: #4616 Backports: 3.1 Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>	2019-07-09 10:07:36 +03:00
Calle Wilund	5dfc356380	test-serialization: Add small roundtrip test for inet address (v4 + v6) Verify we get back what we put in.	2019-07-08 15:28:21 +00:00
Calle Wilund	3cfb79e0ff	inet_address/init: Make ipv6 default enabled Makes lookup find any (incl ipv6 numeric) address. Init will look at enable_ipv6 and use explicit ipv4 family lookup if not enabled.	2019-07-08 14:13:10 +00:00
Calle Wilund	1f5e1d22bf	db::config: Add enable ipv6 switch (default off) Off by default to prevent problems during cluster migration when needing to gossip with non-ipv6 aware nodes.	2019-07-08 14:13:09 +00:00
Calle Wilund	c540e36fe2	gms::inet_address: Make serialization ipv6 aware Because inet_address was initially hardcoded to ipv4, its wire format is not very forward compatible. Since we potentially need to communicate with older version nodes, we manually define the new serial format for inet_address to be: ipv4: 4 bytes address ipv6: 4 bytes marker 0xffffffff (invalid address) 16 bytes data -> address	2019-07-08 14:13:09 +00:00
Calle Wilund	e9816efe06	Remove usage of inet_address::raw_addr()	2019-07-08 14:13:09 +00:00
Calle Wilund	4ef940169f	Replace use of "ipv4_addr" with socket_address Allows the various sockets to use ipv6 address binding if so configured.	2019-07-08 14:13:09 +00:00
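
Commit 9cc9facbea above describes writing build.ninja to a temporary file and renaming it into place only after the write succeeds. A minimal Python sketch of that write-then-rename pattern follows. The helper name `write_atomically` and the temporary-file naming are hypothetical illustrations, not the code from configure.py.

```python
import os
import tempfile


def write_atomically(path, contents):
    """Replace `path` with `contents` without ever exposing a partial file.

    Hypothetical helper: illustrates the pattern only, not configure.py itself.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        # Rename over the target only after the whole file has been written.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure or interruption, drop the partial temporary file and
        # leave any existing target untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the write is interrupted, the previous build.ninja (if any) is still in place, so a later `ninja` run can still find it. Note that `tempfile.mkstemp` creates the file with owner-only permissions, which a real build script may want to adjust.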


18987 Commits
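
Commit c540e36fe2 above gives the new inet_address serial format as: 4 address bytes for ipv4, and a 4-byte marker 0xffffffff followed by 16 address bytes for ipv6. The Python sketch below illustrates that layout under stated assumptions. The function names are hypothetical, network byte order is assumed, and rejecting 255.255.255.255 on encode is an added assumption based on the commit calling the marker an invalid address. It is not Scylla's serializer.

```python
import ipaddress

# Marker that the commit message describes as an invalid ipv4 address.
IPV6_MARKER = b"\xff\xff\xff\xff"


def encode_address(addr):
    """Encode an address string using the layout described in the commit."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        if ip.packed == IPV6_MARKER:
            # Assumption: this value is reserved, so it cannot be sent as ipv4.
            raise ValueError("255.255.255.255 collides with the ipv6 marker")
        return ip.packed
    return IPV6_MARKER + ip.packed


def decode_address(buf):
    """Decode one address from the start of buf; return (address, bytes_used)."""
    head = buf[:4]
    if len(head) < 4:
        raise ValueError("truncated address")
    if head == IPV6_MARKER:
        body = buf[4:20]
        if len(body) != 16:
            raise ValueError("truncated ipv6 address")
        return ipaddress.IPv6Address(body), 20
    return ipaddress.IPv4Address(head), 4
```

A reader that only understands the old format still sees four bytes first, which is why the marker has to be a value that never occurs as a real ipv4 address.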