scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Israel Fruchter	f1f5586bf6	scylla_coredump_setup: Remove the coredump created by the check We generate a coredump as part of "scylla_coredump_setup" to verify that coredumps are working. However, we need to remove that test coredump to avoid people and test infrastructure reporting those coredumps. Fixes #6159 (cherry picked from commit `28c3d4f8e8`)	2020-06-03 16:52:51 +03:00
Amos Kong	3a447cd755	activate the coredump directory mount during coredump setup Currently we use a systemd mount (var-lib-systemd-coredump.mount) to mount the default coredump directory (/var/lib/systemd/coredump) to (/var/lib/scylla/coredump). /var/lib/scylla is already mounted on a large storage device, so we will have enough space for coredumps after the mount. Currently in coredump_setup, we only enable var-lib-systemd-coredump.mount, but do not start it. The directory won't be mounted after coredump_setup, so the coredump will still be saved to the default coredump directory. The mount only takes effect after a reboot. Fixes #6566 (cherry picked from commit `abf246f6e5`)	2020-06-03 09:25:59 +03:00
Pekka Enberg	176aa91be5	Revert "scylla_coredump_setup: Fix incorrect coredump directory mount" This reverts commit `e77dad3adf` because it's incorrect. Amos explains: "Quote from https://www.freedesktop.org/software/systemd/man/systemd.mount.html What= Takes an absolute path of a device node, file or other resource to mount. See mount(8) for details. If this refers to a device node, a dependency on the respective device unit is automatically created. Where= Takes an absolute path of a file or directory for the mount point; in particular, the destination cannot be a symbolic link. If the mount point does not exist at the time of mounting, it is created as directory. So the mount point is '/var/lib/systemd/coredump' and '/var/lib/scylla/coredump' is the file to mount, because /var/lib/scylla had mounted a second big storage, which has enough space for Huge coredumps. Bentsi or other touched problem with old scylla-master AMI, a coredump occurred but not successfully saved to disk for enospc. The directory /var/lib/systemd/coredump wasn't mounted to /var/lib/scylla/coredump. They WRONGLY thought the wrong mount was caused by the config problem, so he posted a fix. Actually scylla-ami-setup / coredump wasn't executed on that AMI, err: unit scylla-ami-setup.service not found Because 'scylla-ami-setup.service' config file doesn't exist or is invalid. Details of my testing: https://github.com/scylladb/scylla/issues/6300#issuecomment-637324507 So we need to revert Bentsi's patch, it changed the right config to wrong." (cherry picked from commit `9d9d54c804`)	2020-06-03 09:25:49 +03:00
Avi Kivity	4a3eff17ff	Revert "Revert "config: Do not enable repair based node operations by default"" This reverts commit `71d0d58f8c`. Repair-based node operations are still not ready.	2020-06-02 18:08:03 +03:00
Nadav Har'El	2e00f6d0a1	alternator: fix support for bytes type in Query's KeyConditions Our parsing of values in a KeyConditions parameter of Query was done naively. As a result, we got bizarre error messages "condition not met: false" when these values had an incorrect type (this is issue #6490). Worse - the naive conversion did not decode base64-encoded bytes value as needed, so KeyConditions on bytes-typed keys did not work at all. This patch fixes these bugs by using our existing utility function get_key_from_typed_value(), which takes care of throwing sensible errors when types don't match, and decoding base64 as needed. Unfortunately, we didn't have test coverage for many of the KeyConditions features including bytes keys, which is why this issue escaped detection. A patch will follow with much more comprehensive tests for KeyConditions, which also reproduce this issue and verify that it is fixed. Refs #6490 Fixes #6495 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524141800.104950-1-nyh@scylladb.com> (cherry picked from commit `6b38126a8f`)	2020-05-31 13:53:45 +03:00
Nadav Har'El	bf509c3b16	alternator: add mandatory configurable write isolation mode Alternator supports four ways in which write operations can use quorum writes or LWT or both, which we called "write isolation policies". Until this patch, Alternator defaulted to the most generally safe policy, "always_use_lwt". This default could have been overridden for each table separately, but there was no way to change this default for all tables. This patch adds a "--alternator-write-isolation" configuration option which allows changing the default. Moreover, @dorlaor asked that users must explicitly choose this default mode, and not get "always_use_lwt" without noticing. The previous default, "always_use_lwt" supports any workload correctly but because it uses LWT for all writes it may be disappointingly slow for users who run write-only workloads (including most benchmarks) - such users might find the slow writes so disappointing that they will drop Scylla. Conversely, a default of "forbid_rmw" will be faster and still correct, but will fail on workloads which need read-modify-write operations - and surprise users that need these operations. So Dor asked that none of the write modes be made the default, and users must make an informed choice between the different write modes, rather than being disappointed by a default choice they weren't aware of. So after this patch, Scylla refuses to boot if Alternator is enabled but a "--alternator-write-isolation" option is missing. The patch also modifies the relevant documentation, adds the same option to our docker image, and modifies the test-running script test/alternator/run to run Scylla with the old default mode (always_use_lwt), which we need because we want to test RMW operations as well. Fixes #6452 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200524160338.108417-1-nyh@scylladb.com> (cherry picked from commit `c3da9f2bd4`)	2020-05-31 13:42:11 +03:00
Avi Kivity	84ef30752f	Update seastar submodule * seastar e708d1df3a...78f626af6c (1): > reactor: don't mlock all memory at once Fixes #6460.	2020-05-31 13:34:42 +03:00
Avi Kivity	f1b71ec216	Point seastar submodule at scylla-seastar.git This allows us to backport seastar patches to the 4.1 branch.	2020-05-31 13:34:42 +03:00
Piotr Sarna	93ed536fba	alternator: wait for schema agreement after table creation In order to be sure that all nodes acknowledged that a table was created, the CreateTable request will now only return after seeing that schema agreement was reached. Rationale: alternator users check if the table was created by issuing a DescribeTable request, and assume that the table was correctly created if it returns nonempty results. However, our current implementation of DescribeTable returns local results, which is not enough to judge if all the other nodes acknowledge the new table. CQL drivers are reported to always wait for schema agreement after issuing DDL-changing requests, so there should be no harm in waiting a little longer for alternator's CreateTable as well. Fixes #6361 Tests: alternator(local) (cherry picked from commit `5f2eadce09`)	2020-05-31 13:18:11 +03:00
Nadav Har'El	ab3da4510c	docs, alternator: improve description of status of global tables support The existing text did not explain what happens if additional DCs are added to the cluster, so this patch improves the explanation of the status of our support for global tables, including that issue. Fixes #6353 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200513175908.21642-1-nyh@scylladb.com> (cherry picked from commit `f3fd976120`)	2020-05-31 13:13:13 +03:00
Asias He	bb8fcbff68	repair: Abort the queue in write_end_of_stream in case of error In write_end_of_stream, it does: 1) Write write_partition_end 2) Write empty mutation_fragment_opt If 1) fails, 2) will be skipped, the consumer of the queue will wait for the empty mutation_fragment_opt forever. Found this issue when injecting random exceptions between 1) and 2). Refs #6272 Refs #6248 (cherry picked from commit `b744dba75a`)	2020-05-27 20:11:30 +03:00
Hagit Segev	af43d0c62d	release: prepare for 4.1.rc1 scylla-4.1.rc1	2020-05-26 18:57:30 +03:00
Amnon Heiman	8c8c266f67	storage_service: get_range_to_address_map prevent use after free The implementation of get_range_to_address_map has a default behaviour: when getting an empty keyspace, it uses the first non-system keyspace (first here is basically just a keyspace). The current implementation has two issues. First, it uses a reference to a string that is held on a stack of another function. In other words, there's a use after free that is not clear why we never hit. Second, it calls get_non_system_keyspaces twice. Though this is not a bug, it's redundant (get_non_system_keyspaces uses a loop, so calling that function does have a cost). This patch solves both issues, by changing the implementation to hold a string instead of a reference to a string. Second, it stores the results from get_non_system_keyspaces and reuses them, which is more efficient and holds the returned values on the local stack. Fixes #6465 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `69a46d4179`)	2020-05-25 12:48:11 +03:00
Nadav Har'El	6d1301d93c	alternator: better error messages when 'forbid_rmw' mode is on When the 'forbid_rmw' write isolation policy is selected, read-modify-write are intentionally forbidden. The error message in this case used to say: "Read-modify-write operations not supported" Which can lead users to believe that this operation isn't supported by this version of Alternator - instead of realizing that this is in fact a configurable choice. So in this patch we just change the error message to say: "Read-modify-write operations are disabled by 'forbid_rmw' write isolation policy. Refer to https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md#write-isolation-policies for more information." Fixes #6421. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200518125538.8347-1-nyh@scylladb.com> (cherry picked from commit `5ef9854e86`)	2020-05-25 08:49:48 +03:00
Tomasz Grabiec	be545d6d5d	sstables: index_reader: Fix overflow when calculating promoted index end When the index file is larger than 4GB, offset calculation will overflow uint32_t and _promoted_index_end will be too small. As a result, promoted_index_size calculation will underflow and the rest of the page will be interpreted as a promoted index. The partitions which are in the remainder of the index page will not be found by single-partition queries. Data is not lost. Introduced in `6c5f8e0eda`. Fixes #6040 Message-Id: <20200521174822.8350-1-tgrabiec@scylladb.com> (cherry picked from commit `a6c87a7b9e`)	2020-05-24 09:45:42 +03:00
Rafael Ávila de Espíndola	a1c15f0690	repair: Make sure sinks are always closed In a recent next failure I got the following backtrace function=function@entry=0x270360 "seastar::rpc::sink_impl<Serializer, Out>::~sink_impl() [with Serializer = netw::serializer; Out = {repair_row_on_wire_with_cmd}]") at assert.c:101 at ./seastar/include/seastar/core/shared_ptr.hh:463 at repair/row_level.cc:2059 This patch changes a few functions to use finally to make sure the sink is always closed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200515202803.60020-1-espindola@scylladb.com> (cherry picked from commit `311fbe2f0a`) Ref #6414	2020-05-20 09:00:10 +03:00
Asias He	4d68c53389	repair: Fix race between write_end_of_stream and apply_rows Consider: n1, n2, n1 is the repair master, n2 is the repair follower. === Case 1 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after row r1 is written. data: partition_start, r1 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream() data: partition_start, r1, partition_end 5) Step 2 resumes to apply the rows. data: partition_start, r1, partition_end, partition_end, partition_start, r2 === Case 2 === 1) n1 sends missing rows {r1, r2} to n2 2) n2 runs apply_rows_on_follower to apply rows, e.g., {r1, r2}, r1 is written to sstable, r2 is not written yet, r1 belongs to partition 1, r2 belongs to partition 2. It yields after partition_start for r2 is written but before _partition_opened is set to true. data: partition_start, r1, partition_end, partition_start 3) n1 sends repair_row_level_stop to n2 because error has happened on n1 4) n2 calls wait_for_writer_done() which in turn calls write_end_of_stream(). Since _partition_opened[node_idx] is false, partition_end is skipped, end_of_stream is written. data: partition_start, r1, partition_end, partition_start, end_of_stream This causes unbalanced partition_start and partition_end in the stream written to sstables. To fix, serialize the write_end_of_stream and apply_rows with a semaphore. Fixes: #6394 Fixes: #6296 Fixes: #6414 (cherry picked from commit `b2c4d9fdbc`)	2020-05-20 08:07:53 +03:00
Piotr Dulikowski	7d1f352be2	hinted handoff: don't keep positions of old hints in rps_set When sending hints from one file, rps_set field in send_one_file_ctx keeps track of commitlog positions of hints that are being currently sent, or have failed to be sent. At the end of the operation, if sending of some hints failed, we will choose position of the earliest hint that failed to be sent, and will retry sending that file later, starting from that position. This position is stored in _last_not_complete_rp. Usually, this set has a bounded size, because we impose a limit of at most 128 hints being sent concurrently. Because we do not attempt to send any more hints after a failure is detected, rps_set should not have more than 128 elements at a time. Due to a bug, commitlog positions of old hints (older than gc_grace_seconds of the destination table) were inserted into rps_set but not removed after checking their age. This could cause rps_set to grow very large when replaying a file with old hints. Moreover, if the file mixed expired and non-expired hints (which could happen if it had hints to two tables with different gc_grace_seconds), and sending of some non-expired hints failed, then positions of expired hints could influence calculation _last_not_complete_rp, and more hints than necessary would be resent on the next retry. This simple patch removes commitlog position of a hint from rps_set when it is detected to be too old. Fixes #6422 (cherry picked from commit `85d5c3d5ee`)	2020-05-20 08:05:51 +03:00
Piotr Dulikowski	0fe5335447	hinted handoff: remove discarded hint positions from rps_set Related commit: `85d5c3d` When attempting to send a hint, an exception might occur that results in that hint being discarded (e.g. keyspace or table of the hint was removed). When such an exception is thrown, position of the hint will already be stored in rps_set. We are only allowed to retain positions of hints that failed to be sent and needed to be retried later. Dropping a hint is not an error, therefore its position should be removed from rps_set - but current logic does not do that. Because of that bug, hint files with many discardable hints might cause rps_set to grow large when the file is replayed. Furthermore, leaving positions of such hints in rps_set might cause more hints than necessary to be re-sent if some non-discarded hints fail to be sent. This commit fixes the problem by removing positions of discarded hints from rps_set. Fixes #6433 (cherry picked from commit `0c5ac0da98`)	2020-05-20 08:03:20 +03:00
Avi Kivity	8a026b8b14	Revert "compaction_manager: allow early aborts through abort sources." This reverts commit `e8213fb5c3`. It results in an assertion failure in remove_index_file_test. Fixes #6413. (cherry picked from commit `5b971397aa`)	2020-05-13 18:26:34 +03:00
Yaron Kaikov	0760107b9f	release: prepare for 4.1.rc0 scylla-4.1.rc0	2020-05-11 11:32:01 +03:00
Nadav Har'El	7da949026d	doc, alternator: shorten description of "tags" compatibility The "current compatibility with DynamoDB" section in alternator.md is where we should list very briefly our state of compatibility - it's not the right place to explain implementation details or track obscure bugs. I've significantly shortened the "Tags" section because, in brief, we do fully support tags and should say that we do. I moved the two bugs mentioned in the text into the bug tracker: Refs #6389 Refs #6391 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200507125022.22608-1-nyh@scylladb.com>	2020-05-07 17:48:34 +02:00
Tomasz Grabiec	2078016f84	test: memory_footprint: Avoid invalid identifiers as columnnames Column name should not start with a digit, as can be the case with random_string(). Message-Id: <1588860648-15796-1-git-send-email-tgrabiec@scylladb.com>	2020-05-07 17:33:34 +03:00
Pavel Emelyanov	ef181fb2d0	test: Add option to flush memtables for perf_simple_query The test in question measures the speed of memtables, not the row_cache. With this option it can do both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200507140603.12350-1-xemul@scylladb.com>	2020-05-07 16:09:40 +02:00
Ivan Prisyazhnyy	84e25e8ba4	api: support table auto compaction control The patch implements: - /storage_service/auto_compaction API endpoint - /column_family/autocompaction/{name} API endpoint Those APIs allow to control and request the status of background compaction jobs for the existing tables. The implementation introduces the table::_compaction_disabled_by_user. Then the CompactionManager checks if it can push the background compaction job for the corresponding table. New members === table::enable_auto_compaction(); table::disable_auto_compaction(); bool table::is_auto_compaction_disabled_by_user() const Test === Tests: unit(sstable_datafile_test autocompaction_control_test), manual $ ninja build/dev/test/boost/sstable_datafile_test $ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000 The test tries to submit a compaction job after playing with autocompaction control table switch. However, there is no reliable way to hook pending compaction task. The code assumed that with_scheduling_group() closure will never preempt execution of the stats check. Revert === Reverts commit `c8247ac`. In previous version the execution sometimes resulted into the following error: test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test": critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed This version adds a few sstables to the cf, starts the compaction and awaits until it is finished. API change === - `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled). - `/column_family/autocompaction/` got support for POST/DELETE per table Fixes === Fixes #1488 Fixes #1808 Fixes #440 Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com> Reviewed-by: Glauber Costa <glauber@scylladb.com>	2020-05-07 16:23:38 +03:00
Nadav Har'El	e9aa1173e0	doc, alternator: better documentation for write isolation policies Alternator supports four different write isolation policies, the default being to do all the writes with LWT, but these policies were only briefly explained in alternator.md. This patch significantly expands on this explanation, better explaining the tradeoffs involved in these four options, and when each might make sense (if at all). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506235152.18190-1-nyh@scylladb.com>	2020-05-07 13:59:38 +02:00
Nadav Har'El	f12989ff73	alternator/test: minor cleanup in test_key_condition_expression.py Some minor cleanups, mostly comments, in test_key_condition_expression.py Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506212849.16207-1-nyh@scylladb.com>	2020-05-07 13:58:44 +02:00
Botond Dénes	791acc7f38	sstables: sstable_reader: fix read range upper bound calculation for reverse slices The single-key sstable reader uses the clustering ranges from the slice to determine the upper bound of the disk read-range using the index. For this it simply uses the end bound of the last clustering range. For reverse reads however the clustering ranges in the slice are in reverse order, so this will in fact be the upper bound of the smallest range. Depending on whether the distance between the clustering ranges is big enough for the sstable reader to use the index to skip between them, this will lead to either reading too little data or an assert failure. This patch fixes the problematic function `get_slice_upper_bound()` to consider reverse reads as well. Initially I thought there will be more mishandling of reverse slices, but actually `mutation_fragment_filter`, the component doing the actual slicing of rows, is already reverse-slice aware. A unit test which reproduces the assert failure is also added. Fixes: #6171 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200507114956.271799-1-bdenes@scylladb.com>	2020-05-07 14:52:04 +03:00
Avi Kivity	bef8e5e930	Merge "Don't invalidate row cache when adding GC SStable to SSTable Set" from Raphael " Garbage collected SSTables, created by incremental compaction process, are being added to the SSTable set using a function that invalidates row cache using the range of the SSTable itself. That's incorrect because data in GC SSTables come from preexisting SSTables in set, meaning the state of data isn't changed and so no need for invalidation at all. Incorrect invalidation like this is a source of read performance issues. This problem is fixed by including GC SSTables to the descriptor which is used to specify changes to the SSTable set, which is the correct thing to do given that a midway failure could leave the set in an incorrect state. Fixes #5956. Fixes #6275. tests: unit(dev) " * 'fix_issue_5956_v4' of github.com:raphaelsc/scylla: sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set sstables/compaction: Change meaning of compaction_completion_desc input and output fields sstables/compaction: Clean up code around garbage_collected_sstable_writer	2020-05-07 14:10:49 +03:00
Glauber Costa	e8213fb5c3	compaction_manager: allow early aborts through abort sources. The shutdown process of compaction manager starts with an explicit call from the database object. However that can only happen once everything is already initialized. This works well today, but I am soon to change the resharding process to operate before the node is fully ready. One can still stop the database in this case, but reshardings will have to finish before the abort signal is processed. This patch passes the existing abort source to the construction of the compaction_manager and subscribes to it. If the abort source is triggered, the compaction manager will react to it firing and all compactions it manages will be stopped. We still want the database object to be able to wait for the compaction manager, since the database is the object that owns the lifetime of the compaction manager. To make that possible we'll use a future that is returned from stop(): no matter what triggered the abort, either an early abort during initial resharding or a database-level event like drain, everything will shut down in the right order. The abort source is passed to the database, which is responsible for constructing the compaction manager. Tests: unit (dev), manual start+stop, manual drain + stop Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200506184749.98288-1-glauber@scylladb.com>	2020-05-07 13:24:47 +03:00
Asias He	71d0d58f8c	Revert "config: Do not enable repair based node operations by default" This reverts commit `b8ac10c451`. The repair based node operations will be enabled by default in 4.1. Revert the patch which disables it by default.	2020-05-07 13:17:35 +03:00
Avi Kivity	fbf2194b31	Merge 'cql3: Fix detection of bound variables in tuples' from Juliusz This is unrelated to counters, but happens to fix #4209 `tuple::delayed_value::contains_bind_marker` used to check that ALL terms are bound (not that ANY of them is bound). As a result, scylla would crash in prepare codepath for collections of tuples. After this fix `invalid_request_exception` is thrown instead. * jul-stas-4209-crash-on-counter-shards-set: boost/tests: test for bound variable in a list of tuple literals cql3: fix detection of bound variables in tuples	2020-05-07 13:13:51 +03:00
Botond Dénes	2e09a0317c	types, compound: pass std::current_exception() to on_internal_error() So that nested exceptions are not lost. Also, marshal exceptions, the ones we have in these places, already have a backtrace, so might as well use that, instead of creating a new one, losing unwound frames. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200507091405.244544-1-bdenes@scylladb.com>	2020-05-07 11:25:25 +02:00
Juliusz Stasiewicz	7b48d8c33c	boost/tests: test for bound variable in a list of tuple literals This test checks that the list literals of tuples with some (but not all!) bind markers are rejected.	2020-05-07 11:03:53 +02:00
Pavel Solodovnikov	55d89d2cbe	lwt: add cql tests to test delete+insert behavior on the same row in one batch Add a couple of cql tests regarding conditional batches: 1. Verify that "delete" takes priority over "insert" when applied to the same row within the same batch. 2. Test that a workaround for the issue works as expected (i.e. delete only individual cells instead of the full record). Tests: unit(dev) Fixes: #6273 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200506201200.176590-1-pa.solodovnikov@scylladb.com>	2020-05-07 10:53:22 +02:00
Tomasz Grabiec	b0f2d2bee0	Merge "lwt: fix linearisability issues with reads and writes with non met conditions" from Gleb Fixes #6299.	2020-05-07 10:49:01 +02:00
Juliusz Stasiewicz	b46d7cf8d1	cql3: fix detection of bound variables in tuples `tuple::delayed_value::contains_bind_marker` used to check that ALL terms are bound (not that ANY of them is bound). As a result, scylla would crash in prepare codepath for collections. After this fix `invalid_request_exception` is thrown instead. Fixes #4209	2020-05-07 10:44:52 +02:00
Benny Halevy	b2f50224d9	table: database_sstable_write_monitor: revert charges in destructor We must unregister the monitor upon destruction to prevent use-after-free from `compaction_backlog_tracker::backlog` path. This is similar to ~compaction_read_monitor as implemented in commit `ca284174d0` Fixes #6385 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506214419.569655-1-bhalevy@scylladb.com>	2020-05-07 10:39:39 +02:00
Nadav Har'El	0214f0ad60	main: really enable the "--start-native-transport" option In commit `da3bf20e71` we supposedly enabled support for Cassandra's "start_native_transport" option which can be set to 0 to run Scylla without listening on the CQL port. This can be useful, for example, if a user only wants the DynamoDB or Redis APIs but not CQL. Unfortunately, the option was still marked "Unused", so it wasn't really enabled as a valid command line option. This patch fixes that, and documents the start_native_transport option in docs/protocols.md, where we document the different protocols, ports, and options to configure them. Fixes #6387. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200506174850.13616-1-nyh@scylladb.com>	2020-05-07 11:09:18 +03:00
Avi Kivity	2b0c317dec	test: lib: exception_utils: fix crash with fmt-6.2.0 fmt, the formatting library we use, detects types with conversion to std::string_view (and formats them as strings) and types that support operator<<(std::ostream, const T&) (and performs custom formatting on them). However, if <fmt/ostream.h> is not included, the latter is not done. The problem happens with seastar::sstring, which implements both, and debug mode, which disables inlining. Some translation units do include <fmt/ostream.h>, and so generate code to do custom formatting. exception_utils.cc doesn't, and so generates code to format via string_view conversion. At link time, the compiler picks one of the generated functions and includes it in the final binary; it happened to pick one generated outside exception_utils.cc, using custom formatting. However, there is also code in fmt to encode which path fmt chose - string_view or custom. This code is constexpr and so is evaluated in exception_utils.cc. The result is that the function to perform formatting of seastar::sstring uses custom formatting, while the descriptor containing the method used says it is formatting via string_view. This is enough to cause a crash. The problem is limited to debug mode, since in other modes all this code is inlined, and so is consistent within the translation unit. We need a more general fix (hopefully in fmt), but for now a simple fix is to add the missing include. Ref https://github.com/fmtlib/fmt/issues/1662	2020-05-07 08:59:02 +03:00
Avi Kivity	6f1a8cfeea	Merge 'Use special partitioner for CDC Log' from Piotr " CDC has to create CDC streams that are co-located with corresponding BaseTable data. This is not always easy. Especially for small vnodes. This PR introduces new partitioner which allows us to easily find such stream ids that the stream belongs to a given vnode and shard. The idea is that a partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key. Tests: unit(dev), dtests(CDC) " * haaawk-cdc_partitioner: cdc:use CDCPartitioner for CDC Log dht: Add find_first_token_for_shard dht: use long_token in token::to_int64 cdc: add CDCPartitioner stream_id: add token_from_bytes static function i_partitioner: Stop distinguishing whether keys order is preserved	2020-05-06 20:29:27 +03:00
Pavel Solodovnikov	1d3f9174c5	cql3: avoid using shared_ptr's in unrecognized_entity_exception Using shared_ptr's in `unrecognized_entity_exception` can lead to cross-cpu deletion of a pointer which will trigger an assert `_cpu == std::this_thread::get_id()' when shared_ptr is disposed. Copy `column_identifier` to the exception object and avoid using an instance of `cql3::relation`: just get a string representation from it since nothing more is used in associated exception handling code. Fixes: #6287 Tests: unit(dev, debug), dtest(lwt_destructive_ddl_test.py:LwtDestructiveDDLTest.test_rename_column) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200506155714.150497-1-pa.solodovnikov@scylladb.com>	2020-05-06 19:02:36 +03:00
Piotr Sarna	f48e414eab	db, view: remove duplicate entries from pending endpoints When generating view updates, an endpoint can appear both as a primary paired endpoint for the view update, and as a pending endpoint (due to range movements). In order not to generate the same update twice for the same endpoint, the paired endpoint is removed from the list of pending endpoints if present. Fixes #5459 Tests: unit(dev), dtest(TestMaterializedViews.add_dc_during_mv_insert_test)	2020-05-06 16:42:56 +03:00
Benny Halevy	682fb3acfd	api: storage_service: serialize true_snapshot_size Following up on `91b71a0b1a` We also need to serialize storage_service::true_snapshots_size with snapshot-modifying operations. It seems like it was assumed that get_snapshot_details is done under run_snapshot_list_operation, but the one called here is the table method, not the api::storage_service::get_snapshot_details. Fixes #5603 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200506115732.483966-1-bhalevy@scylladb.com>	2020-05-06 15:33:38 +03:00
Pavel Solodovnikov	b183530f2c	cql3: use lw_shared_ptr instead of shared_ptr for column_condition Both `cql3::column_condition` and `cql3::column_condition::raw` classes are marked as `final`: it's safe to use lw_shared_ptr instead of generic `seastar::shared_ptr`. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200428202249.82785-1-pa.solodovnikov@scylladb.com>	2020-05-06 13:11:07 +03:00
Nadav Har'El	ddb483461a	test/alternator: xfailing tests for FilterExpression feature This patch adds a comprehensive, hopefully complete, test for the yet-unimplemented FilterExpression feature. FilterExpression is the modern syntax which allows filtering the results of Query and Scan requests. The patch includes 50 tests spanning more than 700 lines of code, testing (hopefully) all the various FilterExpression features, sub-cases, syntax peculiarities, and so on. As usual, all included tests pass when run against DynamoDB ("pytest --aws") and xfail when run against Scylla. This test should be helpful to understand how to implement FilterExpression correctly, as well as test the future implementation. Refs #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200503165639.15320-1-nyh@scylladb.com>	2020-05-06 12:56:20 +03:00
Botond Dénes	6de51db84a	tools: introduce scylla_types We often have to examine raw values, obtained from various sources, like sstables, logs and coredumps. For some types it is quite simple to convert raw hex values to human readable ones manually (integers), for others it is very hard or simply not practical. This command-line tool aims to ease working with raw values, by providing facilities to print them in human readable form and compare them. We can extend it with more functions as needed. Examples: $ scylla_types -a print -t Int32Type b34b62d4 -1286905132 $ scylla_types -a compare -t 'ReversedType(TimeUUIDType)' b34b62d46a8d11ea0000005000237906 d00819896f6b11ea00000000001c571b b34b62d4-6a8d-11ea-0000-005000237906 > d0081989-6f6b-11ea-0000-0000001c571b Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200505124914.104827-1-bdenes@scylladb.com>	2020-05-06 12:56:20 +03:00
Avi Kivity	bf2ab10b6a	Update seastar submodule * seastar 3c2e27811...e708d1df3 (10): > Merge "Fix a few issues found by clang's asan" from Rafael > seastar: app_template: allow a description to be provided for the app > membarrier: fix madvise(MADV_DONTNEED) failure and crash with --lock-memory Fixes #6346 > rpc::compressor: Fix static init fiasco with names > fair_queue: express all internal fair_queue quantities as fair_queue_tickets > net: remove API v1 compatibility layer (variadic future in networking) > testing: Move parts of the exchanger out of line > on_internal_error: add overload taking an std::exception_ptr > tuple_utils: Add a missing include > Merge "Fix use of uninitialized found by valgrind" from Rafael	2020-05-06 12:56:20 +03:00
Raphael S. Carvalho	a214ccdf89	sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set A garbage-collected SSTable is incorrectly added to the SSTable set with a function that invalidates the row cache. This problem is fixed by adding the GC SSTable to the set using the mechanism which replaces old sstables with new sstables. Also, adding the GC SSTable to the set in a separate call is not correct. We should make sure that the GC SSTable reaches the SSTable set at the same time its respective old (input) SSTable is removed from the set, and that's done using a single request call to table. Fixes #5956. Fixes #6275. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:19 -03:00
Raphael S. Carvalho	8f4458f1d5	sstables/compaction: Change meaning of compaction_completion_desc input and output fields input_sstables is renamed to old_sstables and is about old SSTables that should be deleted and removed from the SSTable set. output_sstables is renamed to new_sstables and is about new SSTables that should be added to the SSTable set, replacing the old ones. This will allow us, for example, to add auxiliary SSTables to the SSTable set using the same call which replaces input SSTables with output SSTables in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-05-05 12:03:08 -03:00
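
The `scylla_types` examples quoted in commit `6de51db84a` can be checked by hand. The following is a minimal Python sketch, not the tool's implementation: it assumes that an `Int32Type` value is serialized as a 4-byte big-endian two's-complement integer and that a `TimeUUIDType` value is the 16 raw bytes of the UUID. The helper names `decode_int32_hex` and `format_uuid_hex` are hypothetical and exist only for this illustration.

```python
import struct
import uuid


def decode_int32_hex(raw_hex: str) -> int:
    """Decode a hex string as a signed 32-bit integer.

    Assumption: the value is stored big-endian in two's complement.
    """
    data = bytes.fromhex(raw_hex)
    if len(data) != 4:
        raise ValueError(f"expected 4 bytes, got {len(data)}")
    (value,) = struct.unpack(">i", data)  # ">i" = big-endian signed 32-bit
    return value


def format_uuid_hex(raw_hex: str) -> str:
    """Render 32 hex digits in the canonical dashed UUID form.

    Assumption: the raw value is simply the 16 bytes of the UUID.
    """
    return str(uuid.UUID(hex=raw_hex))


# Inputs taken from the examples in the commit message.
print(decode_int32_hex("b34b62d4"))
print(format_uuid_hex("b34b62d46a8d11ea0000005000237906"))
```

Under these assumptions the two calls reproduce the human-readable values shown in the commit message (`-1286905132` and `b34b62d4-6a8d-11ea-0000-005000237906`). The sketch does not attempt the `compare` action, whose ordering rules for `ReversedType(TimeUUIDType)` are not described in the message.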

22037 Commits