Commit Graph

21980 Commits

Piotr Jastrzebski
1a43849cd2 table: Add cache_enabled member function
This function determines cache usage based on both the table's _config
and dynamic schema information.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-06 18:39:01 +02:00
Piotr Jastrzebski
546dbf1fcc cf_prop_defs: persist caching_options in schema
Previously 'WITH CACHING =' was ignored both in
CREATE TABLE and in ALTER TABLE statements.
Now it will be persisted in schema so that
it can be used later to control caching per table.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-06 18:38:37 +02:00
Piotr Jastrzebski
812dfd22bd property_definitions: add get that returns variant
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-06 18:38:04 +02:00
Piotr Jastrzebski
0475dab359 feature: add PER_TABLE_CACHING feature
This feature will ensure that caching can be switched
off per table only after the whole cluster supports it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-05 08:14:49 +02:00
Piotr Jastrzebski
2d727114ed caching_options: add enabled parameter
Scylla inherits from Origin two caching parameters
(keys and rows_per_partition) that are ignored.

This patch adds a new parameter called "enabled"
which is true by default and controls whether cache
is used for a selected table or not.

If the parameter is missing in the map then it
has the default value of true. To minimize the impact
of this change, enabled == true is represented as an
absence of this parameter.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-05-05 08:14:49 +02:00
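The default-is-absence convention described above can be sketched as a small model (helper names are hypothetical, not Scylla's actual code):

```python
# Hypothetical sketch of the "enabled" caching parameter semantics:
# a missing key defaults to true, and enabled == true is represented
# by the absence of the key, minimizing the impact on the schema.

def is_cache_enabled(caching_options):
    # Absence of the 'enabled' key means the cache is on.
    return caching_options.get("enabled", "true") == "true"

def normalize(caching_options):
    # Drop 'enabled' when it holds the default value of true.
    out = dict(caching_options)
    if is_cache_enabled(out):
        out.pop("enabled", None)
    return out
```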
Piotr Sarna
1c4e8f5030 alternator: fix checking max item depth
Maximum item depth accepted by DynamoDB is 32, and alternator
chose 39 as its arbitrary value in order to provide 7 shining
new levels absolutely free of charge. Unfortunately, our code
which checks the nesting level in rapidjson parsing bumps
the counter by 2 for every object, which is due to rapidjson's
internal implementation. In order to actually support
at least 32 levels, the threshold is simply doubled.
This commit comes with a test case which ensures that
32-nested items are accepted both by alternator and DynamoDB.
The test case failed for alternator before the fix.

Fixes #6366
Tests: unit(dev), alternator(local, remote)
2020-05-04 23:46:20 +03:00
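The counting quirk can be modeled in a few lines (illustrative only; the exact comparison operator used by the real parser is an assumption):

```python
# Model of the depth-check bug: rapidjson bumps the nesting counter by 2
# per object, so a threshold of 39 rejects 32-nested items (counter 64).
# Doubling the threshold to 78 makes 32 levels fit again.

OLD_THRESHOLD = 39
NEW_THRESHOLD = 2 * OLD_THRESHOLD

def accepted(nesting_levels, threshold):
    counter = nesting_levels * 2  # two bumps per nested object
    return counter <= threshold
```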
Glauber Costa
c5cdd77f8e gossip_test: start the compaction manager explicitly
Right now the compaction_manager needs to be started explicitly.
We may change it in the future, but right now that's how it is.

Everything works now even without it, because compaction_manager::stop
happens to work even if it was not started. But it is technically
illegal.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200504143048.17201-1-glauber@scylladb.com>
2020-05-04 17:40:32 +03:00
Bentsi Magidovich
e77dad3adf scylla_coredump_setup: Fix incorrect coredump directory mount
The issue is that the mount is /var/lib/scylla/coredump ->
/var/lib/systemd/coredump. But we need to do the opposite in order to
save the coredump on the partition that Scylla is using:
/var/lib/systemd/coredump-> /var/lib/scylla/coredump

Fixes #6301
2020-05-04 15:47:45 +03:00
Avi Kivity
f3bcd4d205 Merge 'Support SSL Certificate Hot Reloading' from Calle
"
Fixes #6067

Makes the scylla endpoint initializations that support TLS use reloadable certificate stores, watching the used cert + key files for changes and reloading iff modified.

Tests in separate dtest set.
"

* elcallio-calle/reloadable-tls:
  transport: Use reloadable tls certificates
  redis: Use reloadable tls certificates
  alternator: Use reloadable tls certificates
  messaging_service: Use reloadable TLS certificates
2020-05-04 15:11:16 +03:00
Piotr Sarna
bec95a0605 treewide: use thread-safe variant of localtime
In order to ensure thread-safety, all usages of localtime()
are replaced with localtime_r(), which writes its result into a
caller-provided buffer.

Tests: unit(dev)
Fixes #6364
Message-Id: <ad4a0c0e1707f0318325718715a3a647e3ebfdfe.1588592156.git.sarna@scylladb.com>
2020-05-04 14:46:08 +03:00
Calle Wilund
70aca26a3e transport: Use reloadable tls certificates 2020-05-04 11:32:21 +00:00
Calle Wilund
bacf2fa981 redis: Use reloadable tls certificates 2020-05-04 11:32:21 +00:00
Calle Wilund
cc9bb6454c alternator: Use reloadable tls certificates 2020-05-04 11:32:21 +00:00
Calle Wilund
08d069f78d messaging_service: Use reloadable TLS certificates
Changes messaging service rpc to use reloadable tls
certificates iff tls is enabled.

Note that this means that the service cannot start
listening at construction time if TLS is active,
and users need to call start_listen_ex to initialize
and actually start the service.

Since the "normal" messaging service is actually started
from gms, this route too is made a continuation.
2020-05-04 11:32:21 +00:00
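The deferred-start behavior can be sketched as follows (class and flag names are hypothetical, not the actual messaging_service API):

```python
# Sketch: with TLS active the service cannot start listening at
# construction time; the caller must invoke start_listen_ex(), which
# would initialize the reloadable certificates and begin listening.

class SketchMessagingService:
    def __init__(self, tls_enabled):
        self.tls_enabled = tls_enabled
        # Without TLS, listening can begin immediately at construction.
        self.listening = not tls_enabled

    def start_listen_ex(self):
        # With TLS, certificates are (re)loaded here before listening.
        self.listening = True
```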
Piotr Sarna
fb7fa7f442 alternator: fix signature timestamps
Generating timestamps for auth signatures used a non-thread-safe
::gmtime function instead of thread-safe ::gmtime_r.

Tests: unit(dev)
Fixes #6345
2020-05-04 14:12:11 +03:00
Piotr Sarna
05ec95134a clocks-impl: switch to thread-safe time conversion
std::gmtime() has a sad property of using a global static buffer
for returning its value. This is not thread-safe, so its usage
is replaced with gmtime_r, which accepts a caller-provided buffer.
While no regressions were observed in this particular area of code,
a similar bug caused failures in alternator, so it's better to simply
replace all std::gmtime calls with their thread-safe counterpart.

Message-Id: <39e91c74de95f8313e6bb0b12114bf12c0e79519.1588589151.git.sarna@scylladb.com>
2020-05-04 14:11:38 +03:00
Takuya ASADA
57f3f82ed1 redis: add EX option for set command
Add the EX option to the SET command, to set a TTL for the key.
The behavior of SET with EX is the same as the SETEX command; only the syntax differs.

see: https://redis.io/commands/set
2020-05-04 13:58:18 +03:00
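A toy key-value store illustrates the equivalence (this is a simplified model, not Scylla's Redis implementation):

```python
import time

class TinyStore:
    """Toy store where SET ... EX and SETEX have identical semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry or None)

    def set(self, key, value, ex=None):
        expiry = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expiry)

    def setex(self, key, seconds, value):
        # Same behavior as SET with EX; only the argument order differs.
        self.set(key, value, ex=seconds)

    def ttl(self, key):
        _, expiry = self._data[key]
        return None if expiry is None else expiry - time.monotonic()
```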
Eliran Sinvani
a346e862c1 Auth: return correct error code when role is not found
Scylla returns the wrong error code (0000 - server internal error)
in response to authentication/authorization operations
that involve a non-existent role.
This commit changes those cases to return error code 2200 (invalid
query), which is the correct one and also the one that Cassandra
returns.
Tests:
    Unit tests (Dev)
    All auth and auth_role dtests
2020-05-04 12:57:27 +03:00
Glauber Costa
55f5ca39a9 sstable_test: rework test to use a thread
The compaction_manager test lives inside a thread but is not taking
advantage of it, with continuations all over.

One of the side effects of this is that the test calls stop() twice
on the compaction_manager. While this works today, it is not good
practice, and a change I am making is just about to break it.

This patch converts the test to fully use .get() instead of chained
continuations and in doing so also guarantees that the compaction
manager will be RAII-stopped just once, from a defer object.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200503161420.8346-2-glauber@scylladb.com>
2020-05-03 19:54:04 +03:00
Piotr Sarna
bf5f247bc5 db: set gc grace period to 0 for local system tables
Local system tables from `system` namespace use LocalStrategy
replication, so they do not need to be concerned about gc grace
period. Some system tables already set gc grace period to 0,
but other ones, including system.large_partitions, did not.
That may result in millions of tombstones being needlessly
kept for these tables, which can cause read timeouts.

Fixes #6325
Tests: unit(dev), local(running cqlsh and playing with system tables)
2020-05-03 17:41:50 +03:00
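The effect of a zero grace period can be illustrated with the usual tombstone-purge rule (a simplified model of compaction-time garbage collection, not Scylla's actual code):

```python
def tombstone_purgeable(deletion_time, gc_grace_seconds, now):
    # A tombstone may be dropped at compaction once its grace period
    # has elapsed; with gc_grace_seconds == 0, as set for local system
    # tables above, it is purgeable immediately.
    return now >= deletion_time + gc_grace_seconds
```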
Avi Kivity
9952cdfec1 Merge "scylla-gdb.py: improve finding references to intrusive container elements" from Botond
"
Intrusive containers often have references between containers elements
that point to some non-first word of the element. This references
currently fly below the radar of `scylla find` and `scylla
generate-object-graph`, as they are looking to references to only the
first word of the objects. So objects that are members of an intrusive
container often appear to have no inbound references at all.

This patch-set improves support for finding such references by looking
for references to non-first words of objects.

It also includes some generic, minor improvements to scylla
generate_object_graph.
"

* 'scylla-gdb.py-scylla-generate-object-graph-linked-lists/v1' of https://github.com/denesb/scylla:
  scylla-gdb.py: scylla generate_object_graph: make label of initial vertice bold
  scylla-gdb.py: scylla generate_object_graph: remove redundant lookup
  scylla-gdb.py: scylla generate_object_graph: print "to" offsets
  scylla-gdb.py: scylla generate-object-graph: use value-range to find references
  scylla-gdb.py: scylla find: allow finding ranges of values
  scylla-gdb.py: find_in_live(): return pointer_metadata instances
2020-05-03 16:22:22 +03:00
Glauber Costa
70e5252a5d table: no longer accept online loading of SSTable files in the main directory
Loading SSTables from the main directory is possible, to be compatible with
Cassandra, but extremely dangerous and not recommended.

From the beginning, we have recommended using a separate upload/ directory.
In all this time, perhaps because the feature's usefulness is reduced
in Cassandra by the possible races, I have never seen anyone coming
from Cassandra doing procedures involving refresh at all.

Loading SSTables from the main directory forces us to disable writes to
the table temporarily until the SSTables are sorted out. If we get rid of
this, we can get rid of the disabling of the writes as well.

We can't do that yet: to be nice to the odd user who may be using
refresh through the main directory without our knowledge, we should
at least error out.

This patch, then, does that: it errors out if SSTables are found in the main
directory. It will not proceed with the refresh, and it directs the user to the
upload directory.

The main loop in reshuffle_sstables is left in place structurally for now, but
most of it is gone. The test for it is deleted.

After a period of deprecation we can start ignoring these SSTables and get rid
of the lock.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200429144511.13681-1-glauber@scylladb.com>
2020-05-03 08:40:38 +03:00
Glauber Costa
e44b2826ab compaction: avoid abandoned futures when using interposers
When using interposers, cancelling compactions can leave futures
that are not waited for (resharding, twcs)

The reason is that when consume_end_of_stream gets called, it tries to
push end_of_stream into the queue_reader_handle. Because cancelling
a compaction is done through an exception, the queue_reader_handle
has already been terminated at this point. Trying to push to it generates
another exception and prevents us from returning the future right
below it.

This patch adds a new method, is_terminated(), and if we detect
that the queue_reader_handle has already been terminated by this point,
we don't try to push. The method is called is_terminated() because the
check is whether the queue_reader_handle still has a _reader; the reader
is also set to null on successful destruction.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200430175839.8292-1-glauber@scylladb.com>
2020-05-01 16:30:23 +03:00
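The guard can be sketched like this (a simplified model; names follow the commit message, but the structure is illustrative):

```python
class QueueReaderHandle:
    """Toy model: the handle is terminated when it no longer holds a _reader."""

    def __init__(self):
        self._reader = object()
        self.pushed = []

    def terminate(self):
        # Cancellation via an exception (or successful destruction)
        # nulls the reader.
        self._reader = None

    def is_terminated(self):
        return self._reader is None

    def push_end_of_stream(self):
        self.pushed.append("end_of_stream")

def consume_end_of_stream(handle):
    # The fix: don't push into an already-terminated handle, which would
    # raise again and leave the returned future abandoned.
    if not handle.is_terminated():
        handle.push_end_of_stream()
```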
Avi Kivity
122f57871d Update seastar submodule
* seastar 0523b0fac...3c2e27811 (2):
  > future: Add a futurizer::satisfy_with_result_of
  > future: Move concept definitions earlier
2020-05-01 12:55:48 +03:00
Tomasz Grabiec
d78fbf7c16 Merge "storage_service: Make replacing node take writes" from Asias
Background:

Replace operation is used to replace a dead node in the cluster.
Currently, during a replace operation the replacing node does not take
any writes. As a result, new writes to a range after the sync for that
range is done, e.g., after streaming for that range is finished, will not
be synced to the replacing node. Hinted handoff or repair after the
replace operation will help, but it is better to make the writes reach
the replacing node and avoid any post-replace actions.

After this series and the repair-based node operation series, the replace
operation will guarantee that the replacing node has the latest copy of
the data, including the new writes made during the replace operation. In
short, no more repairs before or after the replace operation; just
replacing the node is enough.

Implementation:

1) Filter the node being replaced out of the natural endpoints in
   storage_proxy, so that:

- The node being replaced will not be selected as the target for
  normal writes or normal reads.

- We do not depend on gossip liveness to avoid selecting the replacing
  node for normal writes or reads when the replacing node has the same
  IP address as the node being replaced. There is no more special
  handling for the hibernate state in gossip, which makes it simpler and
  more robust. The replacing node will be marked as UP.

2) Put the replacing node in the pending list, so that:

- The replacing node will take writes, but writes to it will not be
  counted towards CL.

- The replacing node will not take normal reads.

Example:

For example, with RF = 3, n1, n2, n3 in the cluster, n3 is dead and
being replaced by node n4. When n4 starts:

writes to nodes {n1, n2, n3} are changed to
normal_replica_writes = {n1, n2} and pending_replica_writes= {n4}.

reads to nodes {n1, n2, n3} are changed to
normal_replica_reads = {n1, n2} only.

This way, the replacing node n4 now takes writes but does not take reads.

Tests:

Measure the number of writes during the pending period, i.e., between
when the replacing node starts and finishes the replace operation.

- Start 5 nodes, n1 to n5.
- Stop n5.
- Start writes in the background.
- Start n6 to replace n5.
- Get the scylla_database_total_writes metrics when the replacing node announces HIBERNATE (replacing) and NORMAL status.
Before:
2020-02-06 08:35:35.921837 Get metrics when other knows replacing node = HIBERNATE
2020-02-06 08:35:35.939493 scylla_database_total_writes: node1={'scylla_database_total_writes': 15483}
2020-02-06 08:35:35.950614 scylla_database_total_writes: node2={'scylla_database_total_writes': 15857}
2020-02-06 08:35:35.961820 scylla_database_total_writes: node3={'scylla_database_total_writes': 16195}
2020-02-06 08:35:35.978427 scylla_database_total_writes: node4={'scylla_database_total_writes': 15764}
2020-02-06 08:35:35.992580 scylla_database_total_writes: node6={'scylla_database_total_writes': 331}
2020-02-06 08:36:49.794790 Get metrics when other knows replacing node = NORMAL
2020-02-06 08:36:49.809189 scylla_database_total_writes: node1={'scylla_database_total_writes': 267088}
2020-02-06 08:36:49.823302 scylla_database_total_writes: node2={'scylla_database_total_writes': 272352}
2020-02-06 08:36:49.837228 scylla_database_total_writes: node3={'scylla_database_total_writes': 274004}
2020-02-06 08:36:49.851104 scylla_database_total_writes: node4={'scylla_database_total_writes': 262972}
2020-02-06 08:36:49.862504 scylla_database_total_writes: node6={'scylla_database_total_writes': 513}

Writes = 513 - 331

After:
2020-02-06 08:28:56.548047 Get metrics when other knows replacing node = HIBERNATE
2020-02-06 08:28:56.560813 scylla_database_total_writes: node1={'scylla_database_total_writes': 290886}
2020-02-06 08:28:56.573925 scylla_database_total_writes: node2={'scylla_database_total_writes': 310304}
2020-02-06 08:28:56.586305 scylla_database_total_writes: node3={'scylla_database_total_writes': 304049}
2020-02-06 08:28:56.601464 scylla_database_total_writes: node4={'scylla_database_total_writes': 303770}
2020-02-06 08:28:56.615066 scylla_database_total_writes: node6={'scylla_database_total_writes': 604}
2020-02-06 08:29:10.537016 Get metrics when other knows replacing node = NORMAL
2020-02-06 08:29:10.553257 scylla_database_total_writes: node1={'scylla_database_total_writes': 336126}
2020-02-06 08:29:10.567181 scylla_database_total_writes: node2={'scylla_database_total_writes': 358549}
2020-02-06 08:29:10.581939 scylla_database_total_writes: node3={'scylla_database_total_writes': 351416}
2020-02-06 08:29:10.595567 scylla_database_total_writes: node4={'scylla_database_total_writes': 350580}
2020-02-06 08:29:10.610548 scylla_database_total_writes: node6={'scylla_database_total_writes': 45460}

Writes = 45460 - 604

As we can see, the replacing node did not take writes before the patch and takes writes after it.

Check the log of the write handler in storage_proxy
storage_proxy - creating write handler for token: -2642068240672386521,
keyspace_name=ks, original_natrual={127.0.0.1, 127.0.0.5, 127.0.0.2},
natural={127.0.0.1, 127.0.0.2}, pending={127.0.0.6}

The node being replaced, n5=127.0.0.5, is filtered out and the replacing
node, n6=127.0.0.6 is in the pending list.

* asias/replace_take_writes:
  storage_service: Make replacing node take writes
  repair: Use token_metadata with the replacing node in do_rebuild_replace_with_repair
  abstract_replication_strategy: Add get_ranges which takes token_metadata
  abstract_replication_strategy: Add get_natural_endpoints_without_node_being_replaced
  abstract_replication_strategy: Add allow_remove_node_being_replaced_from_natural_endpoints
  token_metadata: Calculate pending ranges for replacing node
  storage_service: Unify handling of replaced node removal from gossip
  storage_service: Update tokens and replace address for replace operation
2020-04-30 19:28:35 +02:00
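The endpoint selection described above can be modeled in a few lines (function and field names are illustrative, not storage_proxy's actual API):

```python
def plan_requests(natural_endpoints, node_being_replaced, replacing_node):
    # Filter the node being replaced out of the natural endpoints.
    normal = [n for n in natural_endpoints if n != node_being_replaced]
    return {
        # The pending (replacing) node takes writes...
        "write_targets": normal + [replacing_node],
        # ...but writes to it do not count towards the consistency level.
        "cl_counted_writes": normal,
        # The pending node takes no normal reads.
        "read_targets": normal,
    }
```

With RF = 3 and n3 being replaced by n4, this reproduces the example above: writes go to {n1, n2} plus pending {n4}, reads go to {n1, n2} only.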
Pavel Emelyanov
513ce1e6a5 storage_proxy_stats: Make get_ep_stat() noexcept
The .get_ep_stat(ep) call can throw when registering metrics (we have
issue for it, #5697). This is not expected by it callers, in particular
abstract_write_response_handler::timeout_cb breaks in the middle and
doesn't call the on_timeout() and the _proxy->remove_response_handler(),
which results in not removed and not released responce handler. In turn
not released response handler doesn't set the _ready future on which
response_wait() waits -> stuck.

Although the issue with .get_ep_stat() should be fixed, an exception in
it mustn't lead to deadlocks, so the fix is to make the get_ep_stat()
noexcept by catching the exception and returning a dummy stat object
instead to let caller(s) finish.

Fixes #5985
Tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200430163639.5242-1-xemul@scylladb.com>
2020-04-30 19:40:08 +03:00
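The "make it noexcept" pattern can be sketched as follows (a simplified model; the wrapper and dummy type are hypothetical):

```python
class DummyStats:
    """Placeholder stats object returned when registration throws."""

def get_ep_stat_noexcept(get_ep_stat, endpoint):
    # Swallow the metrics-registration failure so the timeout path can
    # still run on_timeout() and remove/release the response handler.
    try:
        return get_ep_stat(endpoint)
    except Exception:
        return DummyStats()
```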
Avi Kivity
88224619b6 Update seastar submodule
* seastar d0cbf7d1e8...0523b0fac4 (1):
  > Merge "Fix issues found by valgrind" from Rafael
2020-04-30 19:20:37 +03:00
Asias He
b8ac10c451 config: Do not enable repair based node operations by default
Give it some more time to mature. Use the old stream plan based node
operations by default.

Fixes: #6305
Backports: 4.0
2020-04-30 12:37:24 +03:00
Avi Kivity
8925e00e96 Merge 'Fix hang in multishard_writer' from Asias
"
This series fixes a hang in multishard_writer when an error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead

Fixes #6241
Refs: #6248
"

* asias-stream_fix_multishard_writer_hang:
  repair: Fix hang when the writer is dead
  mutation_writer_test: Add test_multishard_writer_producer_aborts
  multishard_writer: Abort the queue attached to consumers when producer fails
2020-04-30 12:27:55 +03:00
Avi Kivity
280854ab46 Merge " Avoid use-after-free of sstable writer" from Rafael
"
The backlog_controller has a timer that periodically accesses the
sstable writers of ongoing writes.

This patch series makes sure we remove entries from the list of ongoing
writes before the corresponding sstable writer is destroyed.

Fixes #6221.
"

* 'espindola/fix-6221-v5' of https://github.com/espindola/scylla:
  sstables: Call revert_charges in compaction_write_monitor::write_failed
  sstables: Call monitor->write_failed earlier.
  sstables: Add write_failed to the write_monitor interface
2020-04-30 12:21:27 +03:00
Pekka Enberg
5c6265d14b Merge 'redis: add setex and ttl commands' from Takuya
"Enable the TTL feature, and add setex and ttl commands to use it."

* 'redis_setex_ttl' of git://github.com/syuu1228/scylla:
  redis: add test for setex/ttl
  redis: add ttl command
  redis: add setex command
2020-04-30 09:39:48 +03:00
Pekka Enberg
d4c0d80f13 Merge 'redis: add lolwut test' from Takuya
"Add test for lolwut command, and also fix a bug on lolwut found by the test."

* 'redis_lolwut_test' of git://github.com/syuu1228/scylla:
  redis: lolwut parameter fix
  redis-test: add lolwut test
2020-04-30 09:30:43 +03:00
Piotr Sarna
c7c8bd0978 Update seastar submodule
* seastar 8fae03c2...d0cbf7d1 (6):
  > tests: restore compatibility with C++14 (broken due to std::filesystem)
  > http: make headers case-insensitive
  > on_internal_error: add scoped_no_abort_on_internal_error
  > Merge "make when_all functions noexcept" from Benny
  > chunked_fifo: fix underflow in reserve()
  > doc: document compatibility promises

Fixes #6319
2020-04-30 07:29:23 +02:00
Asias He
7d86a3b208 storage_service: Make replacing node take writes
Background:

Replace operation is used to replace a dead node in the cluster.
Currently, during a replace operation the replacing node does not take
any writes. As a result, new writes to a range after the sync for that
range is done, e.g., after streaming for that range is finished, will not
be synced to the replacing node. Hinted handoff or repair after the
replace operation will help, but it is better to make the writes reach
the replacing node and avoid any post-replace actions.

After this series and the repair-based node operation series, the replace
operation will guarantee that the replacing node has the latest copy of
the data, including the new writes made during the replace operation. In
short, no more repairs before or after the replace operation; just
replacing the node is enough.

Implementation:

1) Filter the node being replaced out of the natural endpoints in
   storage_proxy, so that:

- The node being replaced will not be selected as the target for
  normal writes or normal reads.

- We do not depend on gossip liveness to avoid selecting the replacing
  node for normal writes or reads when the replacing node has the same
  IP address as the node being replaced. There is no more special
  handling for the hibernate state in gossip, which makes it simpler and
  more robust. The replacing node will be marked as UP.

2) Put the replacing node in the pending list, so that:

- The replacing node will take writes, but writes to it will not be
  counted towards CL.

- The replacing node will not take normal reads.

Example:

For example, with RF = 3, n1, n2, n3 in the cluster, n3 is dead and
being replaced by node n4. When n4 starts:

- writes to nodes {n1, n2, n3} are changed to
  normal_replica_writes = {n1, n2} and pending_replica_writes= {n4}.

- reads to nodes {n1, n2, n3} are changed to
  normal_replica_reads = {n1, n2} only.

This way, the replacing node n4 now takes writes but does not take reads.

Tests:

1) Measure the number of writes during the pending period, i.e., between
   when the replacing node starts and finishes the replace operation.

- Start 5 nodes, n1 to n5.
- Stop n5
- Start write in the background
- Start n6 to replace n5
- Get scylla_database_total_writes metrics when the replacing node announces HIBERNATE (replacing) and NORMAL status.

Before:
2020-02-06 08:35:35.921837 Get metrics when other knows replacing node = HIBERNATE
2020-02-06 08:35:35.939493 scylla_database_total_writes: node1={'scylla_database_total_writes': 15483}
2020-02-06 08:35:35.950614 scylla_database_total_writes: node2={'scylla_database_total_writes': 15857}
2020-02-06 08:35:35.961820 scylla_database_total_writes: node3={'scylla_database_total_writes': 16195}
2020-02-06 08:35:35.978427 scylla_database_total_writes: node4={'scylla_database_total_writes': 15764}
2020-02-06 08:35:35.992580 scylla_database_total_writes: node6={'scylla_database_total_writes': 331}
2020-02-06 08:36:49.794790 Get metrics when other knows replacing node = NORMAL
2020-02-06 08:36:49.809189 scylla_database_total_writes: node1={'scylla_database_total_writes': 267088}
2020-02-06 08:36:49.823302 scylla_database_total_writes: node2={'scylla_database_total_writes': 272352}
2020-02-06 08:36:49.837228 scylla_database_total_writes: node3={'scylla_database_total_writes': 274004}
2020-02-06 08:36:49.851104 scylla_database_total_writes: node4={'scylla_database_total_writes': 262972}
2020-02-06 08:36:49.862504 scylla_database_total_writes: node6={'scylla_database_total_writes': 513}

Writes = 513 - 331

After:
2020-02-06 08:28:56.548047 Get metrics when other knows replacing node = HIBERNATE
2020-02-06 08:28:56.560813 scylla_database_total_writes: node1={'scylla_database_total_writes': 290886}
2020-02-06 08:28:56.573925 scylla_database_total_writes: node2={'scylla_database_total_writes': 310304}
2020-02-06 08:28:56.586305 scylla_database_total_writes: node3={'scylla_database_total_writes': 304049}
2020-02-06 08:28:56.601464 scylla_database_total_writes: node4={'scylla_database_total_writes': 303770}
2020-02-06 08:28:56.615066 scylla_database_total_writes: node6={'scylla_database_total_writes': 604}
2020-02-06 08:29:10.537016 Get metrics when other knows replacing node = NORMAL
2020-02-06 08:29:10.553257 scylla_database_total_writes: node1={'scylla_database_total_writes': 336126}
2020-02-06 08:29:10.567181 scylla_database_total_writes: node2={'scylla_database_total_writes': 358549}
2020-02-06 08:29:10.581939 scylla_database_total_writes: node3={'scylla_database_total_writes': 351416}
2020-02-06 08:29:10.595567 scylla_database_total_writes: node4={'scylla_database_total_writes': 350580}
2020-02-06 08:29:10.610548 scylla_database_total_writes: node6={'scylla_database_total_writes': 45460}

Writes = 45460 - 604

As we can see, the replacing node did not take writes before the patch and takes writes after it.

2) Check the log of the write handler in storage_proxy

storage_proxy - creating write handler for token: -2642068240672386521,
keyspace_name=ks, original_natrual={127.0.0.1, 127.0.0.5, 127.0.0.2},
natural={127.0.0.1, 127.0.0.2}, pending={127.0.0.6}

The node being replaced, n5=127.0.0.5, is filtered out and the replacing
node, n6=127.0.0.6 is in the pending list.

Fixes: #5482
2020-04-30 10:22:30 +08:00
Asias He
e3fbc8fba1 repair: Use token_metadata with the replacing node in do_rebuild_replace_with_repair
We will change the update of tokens in token_metadata in the next patch
so that the tokens of the replacing node are updated in token_metadata
only after the replace operation is done. In order to get the correct
ranges for the replacing node in do_rebuild_replace_with_repair, we need
to use a copy of token_metadata that contains the tokens of the replacing
node.

Refs: #5482
2020-04-30 10:22:30 +08:00
Asias He
b640614aa6 abstract_replication_strategy: Add get_ranges which takes token_metadata
It is useful when the caller wants to calculate ranges using a
custom token_metadata.

It will be used soon in do_rebuild_replace_with_repair for replace
operation.

Refs: #5482
2020-04-30 10:22:30 +08:00
Asias He
37d3d3e051 abstract_replication_strategy: Add get_natural_endpoints_without_node_being_replaced
Similar to natural_endpoints but with the node being replaced filtered out.

Refs: #5482
2020-04-30 10:22:30 +08:00
Asias He
1a75a60cfc abstract_replication_strategy: Add allow_remove_node_being_replaced_from_natural_endpoints
Decide whether the replication strategy allows removing the node being
replaced from the natural endpoints when a node is being replaced in the
cluster. LocalStrategy is not allowed to do so because it always returns
the node itself as the natural endpoints, and the node will not appear
in the pending endpoints.

It is needed by the "Make replacing node take writes" work.

Refs: #5482
2020-04-30 10:22:30 +08:00
Pekka Enberg
eac9e253e7 sstables: Fix open-coded version parsing in make_descriptor()
The make_descriptor() function parses a string representation of sstable
version using a ternary operator. Clean it up by using
sstables::from_string(), which is future-proof when we add support for
later sstable formats.
Message-Id: <20200429082126.15944-1-penberg@scylladb.com>
2020-04-29 16:25:12 +02:00
Asias He
bd6691301e token_metadata: Calculate pending ranges for replacing node
It will be needed soon for making the replacing node take writes.

Refs: #5482
2020-04-29 16:02:10 +08:00
Asias He
75cf1d18b5 storage_service: Unify handling of replaced node removal from gossip
Currently, after the replacing node finishes the replace operation, it
removes the node being replaced from gossip directly in
storage_service::join_token_ring() with gossiper::replaced_endpoint(),
so the gossip states for the replaced node are gone.

When the other nodes know the replace operation is done, they call
storage_service::remove_endpoint() and gossiper::remove_endpoint() to
quarantine the node but keep the gossip states. To prevent the
replacing node from learning the state of the replaced node again from
existing nodes, the replacing node uses 2X quarantine time.

This makes the gossip states for the replaced node differ between the
other nodes and the replacing node, and makes it harder to reason about
the gossip states because of the discrepancy between nodes.

To fix, we unify the handling of replaced node on both replacing node
and other nodes. On all the nodes, once the replacing node becomes
NORMAL status, we remove the replaced node from token_metadata and
quarantine it but keep the gossip state. Since the replaced node is no
longer a member of the cluster, the fatclient timer will count and
expire and remove the replaced node from gossip.

Refs: #5482
2020-04-29 16:02:10 +08:00
Asias He
66c1907524 storage_service: Update tokens and replace address for replace operation
The motivation is to make the replacing node have the same view of the
token ring as the rest of the cluster.

If the replacing node has the same IP address as the node being replaced,
we should update the tokens in token_metadata when the replace operation
starts, so that the replacing node and the rest of the cluster see the
same token ring.

If the replacing node has a different IP address from the node being
replaced, we should update the tokens in token_metadata only when the
replace operation is done, because the other nodes will update the
replacing node's tokens in token_metadata when the replace operation is
done.

Refs: #5482
2020-04-29 16:02:00 +08:00
Nadav Har'El
ff5615d59d alternator test: drastically reduce time to boot Scylla
The alternator test, test/alternator/run, runs Scylla and runs the
various tests against it. Before this patch, just booting Scylla took
about 26 seconds (for a dev build, on my laptop). This patch reduces
this delay to less than one second!

It turns out that almost the entire delay was artificial, two periods
of 12 seconds "waiting for the gossip to settle", which are completely
unnecessary in the one-node cluster used in the Alternator test.
So a simple "--skip-wait-for-gossip-to-settle 0" parameter eliminates
these long delays completely.

Amusingly, the Scylla boot is now so fast, that I had to change a "sleep 2"
in the test script to "sleep 1", because 2 seconds is now much more than
it takes to boot Scylla :-)

Fixes #6310.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200428145035.22894-1-nyh@scylladb.com>
2020-04-29 07:55:03 +02:00
Benny Halevy
3b31acfa80 exceptions: drop OVERFLOW_ERROR cql binary protocol extension
Client drivers act differently on error codes they don't recognize.
Adding new error codes is considered a protocol extension and
should be negotiated with the client.

This change keeps `overflow_error_exception` internally but uses
the INVALID cql error code to return the error message to the client,
similar to keyspace_not_defined_exception.

We (and cassandra) already use `invalid_request_exception` extensively
to return various errors related to invalid values or types in the query.

Fixes #6264

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Gleb Natapov <gleb@scylladb.com>
Message-Id: <20200422130011.108003-1-bhalevy@scylladb.com>
2020-04-28 12:16:00 +03:00
Piotr Sarna
09e4f3b917 alternator: implement ScanIndexForward
The ScanIndexForward parameter is now fully implemented
and can accept ScanIndexForward=false in order to query
the partitions in reverse clustering order.
Note that reading partition slices in reverse order is less
efficient than forward scans and may put a strain on memory
usage, especially for large partitions, since the whole partition
is currently fetched in order to be reversed.
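
The cost described above can be sketched like this (a toy model with hypothetical names, not the actual alternator code): the slice is read in forward clustering order, and only after the whole slice is materialized can it be flipped for ScanIndexForward=false.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the approach: rows arrive in forward
// clustering order; when the request has ScanIndexForward=false the
// fetched rows are reversed in memory. This is why reverse scans cost
// more memory on large partitions: the whole slice must be held
// before it can be reversed.
inline std::vector<int> scan_partition(std::vector<int> forward_rows,
                                       bool scan_index_forward) {
    if (!scan_index_forward) {
        std::reverse(forward_rows.begin(), forward_rows.end());
    }
    return forward_rows;
}
```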

Fixes #5153
2020-04-28 11:44:46 +03:00
Piotr Sarna
be5d3f4733 Merge 'A bunch of refactors in versioned_value and gossiper' from Kamil
1. Remove the `versioned_value::factory` class; it didn't add any value. It just
   forced us to create an object for making `versioned_value`s, for no sensible
   reason.
2. Move some `versioned_value` deserialization code (string -> internal data
   structures) into the versioned_value module. Previously, it was scattered
   all over the place.
3. Make `gossiper::get_seeds` const and return a const reference.

I needed these refactors for a PR I was preparing to fix an issue with CDC. The
attempt to fix the issue failed (I'm trying something different now), but the
refactors might be useful anyway.

* kbr--vv-refactor:
  gossiper: make `get_seeds` method const and return a const ref
  versioned_value: remove versioned_value::factory class
  gms: move TOKENS string deserialization code into versioned_value
2020-04-28 10:27:45 +02:00
Pavel Solodovnikov
ed7a7554b8 storage_proxy: allow cas() to accept nullptr read_command
This patch allows users of storage_proxy::cas() to supply nullptr
as the `query::read_command`, which skips the procedure of reading
the existing value.

The feature is used in alternator code for Read-Modify-Write
operations: some of them don't require reading previous item
values before updating.

Move `read_nothing_read_command` from alternator code to
storage_proxy layer and fabricate a new no-op command from it when
storage_proxy::cas() is used with nullptr read_command.

This allows us to avoid sprinkling if-else branches all over the code
just to check whether `cmd` is null.

We return from storage_proxy::query() very early with an empty
result when we're given an empty partition_slice (which resides
inside the passed `read_command`), so this approach should be
perfectly fine.
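
The shape of this change can be sketched as follows (hypothetical, heavily simplified types for illustration; the real `query::read_command` is far richer): cas() fabricates a no-op "read nothing" command when the caller passes nullptr, so downstream code never has to special-case a null pointer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical sketch of centralizing the nullptr handling: instead of
// if-else branches at every use site, a single substitution point
// fabricates a no-op "read nothing" command (an empty partition slice)
// when the caller passes nullptr. The query layer then returns an
// empty result for the empty slice, as described above.
struct read_command {
    std::vector<int> requested_columns;  // empty == read nothing
};

inline std::shared_ptr<read_command> make_read_nothing_command() {
    return std::make_shared<read_command>();  // empty slice
}

inline std::shared_ptr<read_command>
effective_command(std::shared_ptr<read_command> cmd) {
    // callers that don't need the previous value pass nullptr
    return cmd ? cmd : make_read_nothing_command();
}
```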

Expand the documentation for the `cas()` function to cover the new
possible value of the `cmd` argument.

Fixes: #6238
Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200428065235.5714-1-pa.solodovnikov@scylladb.com>
2020-04-28 10:44:19 +03:00
Asias He
35c5ef78b9 repair: Fix hang when the writer is dead
Consider the following scenario:

When repair master gets data from repair follower:

1) apply_rows_on_master_in_thread is called
2) a repair writer is created with _repair_writer.create_writer
3) the repair writer fails
4) data is written via _mq[node_idx]->push_eventually to the queue
   attached to the writer

Since the writer is dead, no one is going to fetch data from the _mq
queue, and apply_rows_on_master_in_thread will block forever.

To fix this, we should abort the _mq queue when the writer fails.
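
A single-threaded toy model of this fix (hypothetical class, not the actual seastar queue used by the repair code): once the consumer is gone, the queue is aborted, so a producer's push fails fast instead of blocking forever on a reader that will never come.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical model: when the writer (consumer) fails, abort() is
// called on the queue; subsequent pushes throw instead of waiting
// forever for a consumer that no longer exists.
template <typename T>
class abortable_queue {
    std::deque<T> _items;
    bool _aborted = false;
public:
    void abort() { _aborted = true; }  // called when the writer fails
    void push(T item) {
        if (_aborted) {
            throw std::runtime_error("queue aborted: writer is gone");
        }
        _items.push_back(std::move(item));
    }
    std::size_t size() const { return _items.size(); }
};
```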

Refs: #6248
2020-04-28 12:14:32 +08:00
Raphael S. Carvalho
02e046608f api/service: fix segfault when taking a snapshot without keyspace specified
If no keyspace is specified when taking a snapshot, there will be a
segfault because keynames is unconditionally dereferenced. Let's return
an error instead, since a keyspace must be specified when column
families are specified.
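
The guard can be sketched like this (hypothetical signature for illustration, not the actual api/service handler): validate the combination of arguments before anything is dereferenced.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the fix: when column families are named but
// no keyspace list was supplied, return an error up front instead of
// unconditionally dereferencing the missing keyspace argument.
inline std::string take_snapshot(
        const std::optional<std::vector<std::string>>& keyspaces,
        const std::vector<std::string>& column_families) {
    if (!column_families.empty() && !keyspaces) {
        return "error: a keyspace must be specified "
               "when column families are specified";
    }
    return "ok";
}
```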

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200427195634.99940-1-raphaelsc@scylladb.com>
2020-04-27 23:37:00 +03:00
Pekka Enberg
3a10bddd7d configure.py: Add '--with-seastar' option
This patch adds a '--with-seastar=<PATH>' option to configure.py, which
allows the user to override the default seastar submodule path. This is
useful when building packages from source tarballs, for example.

Message-Id: <20200427165511.6448-1-penberg@scylladb.com>
2020-04-27 20:01:35 +03:00