This member is potentially dangerous as it only becomes non-null some
time after the virtual table object is constructed. This is asking
for a nullptr dereference.
Instead, remove this member and have virtual table implementations that
need a db ask for it in the constructor; it is available in
`register_virtual_tables()` now.
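To illustrate the shape of the change (the class and member names below are hypothetical stand-ins, not the actual Scylla classes), a virtual table that needs the database can hold a reference bound at construction time:
```
// Hypothetical sketch: a reference member is bound in the constructor,
// so there is no window during which it is null.
class database;  // stand-in forward declaration

class needs_db_virtual_table {
    database& _db;  // never null - supplied at registration time
public:
    explicit needs_db_virtual_table(database& db) : _db(db) {}
};
```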
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJSON function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.
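As a standalone illustration of the pitfall (plain RapidJSON, not Scylla code):
```
#include <rapidjson/document.h>

int main() {
    rapidjson::Document d(rapidjson::kObjectType);
    auto& alloc = d.GetAllocator();
    d.AddMember("a", 1, alloc);
    d.AddMember("a", 2, alloc);  // no duplicate check: d is now {"a":1,"a":2}
}
```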
So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.
After this rename, I was left with dozens of calls to the set() functions
that needed to be changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.
The vast majority of the set() calls start with an empty item
and add members with fixed (string constant) names, so these can
be trivially changed to add().
It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious, but safe,
uses of add().
All but a handful of rjson's uses are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.
Fixes #9583
Refs #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in
some cases returned the OLD (pre-modification) value of some of the
attributes, instead of its NEW value.
The bug was caused by a confusion in our JSON utility function,
rjson::set(), which sounds like it can set any member of a map, but in
fact may only be used to add a *new* member - if a member with the same
name (key) already existed, the result is undefined (two values for the
same key). In ReturnValues=ALL_NEW we did exactly this: we started with
a copy of the original item, and then used set() to override some of the
members. This is not allowed.
So in this patch, we introduce a new function, rjson::replace(), which
does what we previously thought rjson::set() did - i.e., replace a
member if it exists, or if not, add it. We call this function in
the ReturnValues=ALL_NEW code.
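A minimal sketch of what such a replace-style helper has to do on top of RapidJSON (the real rjson::replace() signature and types may differ):
```
#include <rapidjson/document.h>
#include <utility>

// Overwrite the member's value if the key already exists, otherwise add
// a new member - the behavior we previously assumed set() had.
void replace_member(rapidjson::Value& obj, const char* name,
                    rapidjson::Value&& value,
                    rapidjson::Document::AllocatorType& alloc) {
    auto it = obj.FindMember(name);
    if (it != obj.MemberEnd()) {
        it->value = std::move(value);       // key exists: replace its value
    } else {
        rapidjson::Value key(name, alloc);  // copy the key string
        obj.AddMember(key, std::move(value), alloc);
    }
}
```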
This patch also adds a test case that reproduces the incorrect ALL_NEW
results - and gets fixed by this patch.
In an upcoming patch, we should rename the confusingly-named set()
functions and audit all their uses. But we don't do that in this patch
yet: we just add some comments to clarify what set() does - without
changing it - and add one new function, replace().
Fixes #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104134937.40797-1-nyh@scylladb.com>
Cleanup and improvements for compaction
* 'compaction_cleanup_and_improvements_v2' of https://github.com/raphaelsc/scylla:
compaction: fix outdated doc of compact_sstables()
table: fix indentation in compact_sstables()
table: give a more descriptive name to compaction_data in compact_sstables()
compaction_manager: rename submit_major_compaction to perform_major_compaction
compaction: fix indentation in compaction.hh
compaction: move incremental_owned_ranges_checker into cleanup_compaction
compaction: make owned ranges const in cleanup_compaction
compaction: replace outdated comment in regular_compaction
compaction: give a more descriptive name to compaction_data
compaction_manager: simplify creation of compaction_data
for symmetry, let's call it perform_*, as it doesn't work like the
submission functions, which don't wait for the result, like the one
for minor compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
info is no longer descriptive, as compaction now works with
compaction_data instead of compaction_info.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
there's no need to wrap compaction_data in a shared_ptr; also,
let's kill the unused params in create_compaction_data to simplify
its creation.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The GKE metadata server does not provide the same metadata as GCE, so we
should not return True from is_gce().
So try to fetch machine-type from the metadata server, and return False
if it replies with 404 Not Found.
Fixes #9471
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes #9582
"
Some time ago the --parallel-cases option was introduced, set to
False by default. Now everything is ready for making it True.
Running in a BYO job shows that it takes 30 minutes less to
complete the debug tests. Other timings remain almost the same.
tests: unit(dev), unit(debug)
"
* 'br-parallel-cases-by-default' of https://github.com/xemul/scylla:
test.py: Run parallel cases by default
test, raft: Keep many-400 case out of debug mode
test.py: Cache collected test-cases
There were a few missing bits before making this the default.
- the default max number of AIOs -- tests are now run with a greatly
  reduced value
- the 1.5-hour single case from database_test -- it's now split and
  scales with --parallel-cases
- the suite add_test methods called in a loop for --repeat options --
  patch #1 from this set fixes it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This case takes 45+ minutes, which is 1.5 times longer than the
second longest case out there. I propose to keep the many-400
case out of debug runs; there's the many-100 case which is configured
the same way but uses 4x fewer "nodes".
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The add_test method of a suite can be called several times in a
row, e.g. in case of the --repeat option or because there is more
than one custom_args entry in the suite.yaml file. In any case
it's pointless to re-collect the test cases by launching the
test binary again; it's much faster (and 100% safe) to keep the
list of cases from the previous call and re-use it if the test
name matches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
node_exporter is packaged with some random uid/gid in the tarball.
When extracting it as an ordinary user this isn't a problem, since
the uid/gid are reset to the current user, but that doesn't happen
under dbuild since `tar` thinks the current user is root. This causes
a problem if one wants to delete the build directory later, since it
becomes owned by some random user (see /etc/subuid).
Reset the uid/gid information so this doesn't happen.
Closes #9579
Compaction efficiency can be defined as how much backlog is reduced per
byte read or written.
We know a few facts about efficiency:
1) the more files are compacted together (the fan-in) the higher the
efficiency will be, however...
2) the bigger the size difference of input files the worse the
efficiency, i.e. higher write amplification.
so compactions with similar-sized files are the most efficient ones,
and their efficiency increases with the number of files.
However, in order to not have bad read amplification, the number of files
cannot grow out of bounds. So we have to allow parallel compaction
on different tiers, but to avoid "dilution" of overall efficiency,
we will only allow a compaction to proceed if its efficiency is
greater than or equal to the efficiency of ongoing compactions.
For the time being, we'll assume that strategies don't pick candidates
with wildly different sizes, so efficiency is only calculated as a
function of compaction fan-in.
Now, when the system is under heavy load, the fan-in threshold will
automatically grow to guarantee that overall efficiency remains
stable.
Please note that fan-in is defined in number of runs. LCS compaction
on higher levels will have a fan-in of 2. Under heavy load, it
may happen that LCS will temporarily switch to size-tiered mode
for compaction to keep up with the amount of data being produced.
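A rough sketch of the admission rule described above (illustrative only, not the actual compaction manager code):
```
#include <vector>

struct compaction_job {
    unsigned fan_in;  // number of input runs
};

// Admit a candidate only if its fan-in (our efficiency proxy) is at
// least as high as that of every ongoing compaction, so a less
// efficient job cannot dilute the overall efficiency.
bool can_admit(const compaction_job& candidate,
               const std::vector<compaction_job>& ongoing) {
    for (const auto& job : ongoing) {
        if (candidate.fan_in < job.fan_in) {
            return false;
        }
    }
    return true;
}
```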
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>
STCS considers the smallest bucket, out of the ones which contain
more than min_threshold elements, to be the most interesting one
to compact now.
That's basically saying we'll only compact larger tiers once we're
done with smaller ones. That can be problematic because under heavy
load, larger tiers cannot be compacted in a timely manner even though
they're the ones contributing the most to read amplification.
For example, if we're producing sstables in smaller tiers at roughly
the same rate that we can compact them, then it may happen that
larger tiers will not be compacted even though new sstables are
being pushed to them. Therefore, backlog will not be reduced in a
satisfactory manner, so read latency is affected.
By picking the bucket with the largest fan-in instead, we'll choose the
most efficient compaction, as we'll target buckets which reduce the
backlog the most once compacted.
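The selection change can be pictured with a sketch along these lines (simplified; the real STCS code works on sstable buckets, not these stand-in types):
```
#include <cstddef>
#include <vector>

struct sstable_desc { long generation; };
using bucket = std::vector<sstable_desc>;

// Pick the eligible bucket with the largest fan-in (most sstables),
// instead of the smallest one, since compacting it reduces the backlog
// the most per unit of work.
const bucket* most_interesting_bucket(const std::vector<bucket>& buckets,
                                      std::size_t min_threshold) {
    const bucket* best = nullptr;
    for (const auto& b : buckets) {
        if (b.size() >= min_threshold && (!best || b.size() > best->size())) {
            best = &b;
        }
    }
    return best;  // nullptr if no bucket is eligible for compaction
}
```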
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103215110.135633-1-raphaelsc@scylladb.com>
If a memtable flush is segregated into multiple files, the partition
estimation becomes inaccurate and consequently bloom filters are
bigger than needed, leading to an increase in memory consumption.
To fix this, let's wire adjust_partition_estimate() into the flush
procedure, such that original estimation will be adjusted if
segregation is going to be performed. That's done by feeding
mutation_source_metadata, which will leave the original estimation
unchanged if no segregation is needed, but will adjust it
otherwise.
Fixes#9581.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103141600.65806-2-raphaelsc@scylladb.com>
Without tweaking the interface, there was no way to adjust estimated
partitions on flush. For example, when segregating a memtable for
TWCS, all produced sstables would have an estimation equal to
the memtable size, even though each only contains a subset of it,
which leads to a significant increase in memory consumption for
bloom filters. Subsequent work will use this interface to perform
the adjustment.
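The adjustment amounts to something like the following sketch (the real interface passes mutation_source_metadata through the compaction strategy; the bare integers here are just for illustration):
```
#include <algorithm>
#include <cstdint>

// When a flush will be segregated into N output sstables, each output
// should carry roughly 1/N of the memtable's partition estimate, so its
// bloom filter is not sized for the whole memtable.
uint64_t adjust_partition_estimate(uint64_t memtable_partitions,
                                   unsigned expected_output_sstables) {
    if (expected_output_sstables <= 1) {
        return memtable_partitions;  // no segregation: keep the estimate
    }
    return std::max<uint64_t>(1, memtable_partitions / expected_output_sstables);
}
```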
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103141600.65806-1-raphaelsc@scylladb.com>
"
There are 3 overlapping problems with the test case: it has
a use-after-move that covers up wrong window selection, and it
relies on the time-since-epoch being aligned with the time
window by chance.
tests: unit(dev)
"
* 'br-twcs-test-fixes' of https://github.com/xemul/scylla:
test, compaction: Do not rely on random timestamp
test, compaction: Fix use after move in twcs reshape
There are two APIs for checking the repair status and they behave
differently in case the id is not found.
```
{"host": "192.168.100.11:10001", "method": "GET", "uri":
"/storage_service/repair_async/system_auth?id=999", "duration": "1ms",
"status": 400, "bytes": 49, "dump": "HTTP/1.1 400 Bad
Request\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate:
Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar
httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 400}"}
{"host": "192.168.100.11:10001", "method": "GET", "uri":
"/storage_service/repair_status?id=999&timeout=1", "duration": "0ms",
"status": 500, "bytes": 49, "dump": "HTTP/1.1 500 Internal Server
Error\r\nContent-Length: 49\r\nContent-Type: application/json\r\nDate:
Wed, 03 Nov 2021 10:49:33 GMT\r\nServer: Seastar
httpd\r\n\r\n{\"message\": \"unknown repair id 999\", \"code\": 500}"}
```
The correct status code is 400 as this is a parameter error and should
not be retried.
Returning status code 500 makes smarter HTTP clients retry the request
in hopes of the server recovering.
After this patch:
curl -X GET
'http://127.0.0.1:10000/storage_service/repair_async/system_auth?id=9999'
{"message": "unknown repair id 9999", "code": 400}
curl -X GET
'http://127.0.0.1:10000/storage_service/repair_status?id=9999'
{"message": "unknown repair id 9999", "code": 400}
Fixes #9576
Closes #9578
Again, there's a sub-case with sequential timestamps that still
works by chance. This time it's because splitting 256 sstables
into buckets of at most 8 is allowed to leave the 1st and the
last buckets with fewer than 8 items in them, e.g. 3, 8, ..., 8, 5. The
exact generation depends on the time-since-epoch at which it
starts.
When all the cases are run altogether, this time luckily happens
to be well-aligned with the 8-hour window and the generated buckets are
filled perfectly. When this particular test-case is run all alone
(e.g. by --run_test or --parallel-cases) then the starting time
becomes different and it gets fewer than 4 sstables in its first
bucket.
The fix is to adjust the starting time to be aligned with the
8-hour window.
Actually, the 8 hours appeared in the previous patch, before which
it was 24 hours. Nonetheless, the above reasoning applies to any
size of the time window that's less than 256, so it's still an
independent fix.
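The alignment itself is a one-liner; an illustrative version (the test's own time handling may look different):
```
#include <chrono>

// Align a time-since-epoch value down to the start of its window, so
// bucket boundaries do not depend on when the test happens to start.
std::chrono::seconds align_to_window(std::chrono::seconds since_epoch,
                                     std::chrono::seconds window) {
    return since_epoch - (since_epoch % window);
}
// e.g. align_to_window(now, std::chrono::hours(8)) returns the start of
// the current 8-hour window.
```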
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The options are std::move-d twice -- first into the schema builder, then
into the compaction strategy. Surprisingly, the 2nd move is what makes the
test work.
There's a sub-case in this case that checks sstables with incremental
timestamps with a 1-hour step -- 0h, 1h, 2h, ... 255h. Next, the twcs
bucket generator obeys a minimal threshold of 4 sstables per bucket;
those with fewer sstables are not included in the job. Finally,
since the options used to create the twcs are empty after the 1st
move, the default window of 24 hours is used. If they were configured
correctly with a 1-hour window then all buckets would contain 1 sstable
and the generated job would become empty.
So the fix is twofold -- don't move after move, and make the window size
large enough to fit more sstables than the mentioned minimum.
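The bug pattern boils down to the following (a simplified, self-contained illustration; the real test moves the options into a schema builder and a compaction strategy):
```
#include <map>
#include <string>
#include <utility>

static void consume_options(std::map<std::string, std::string> opts) {
    (void)opts;  // stand-in for the schema builder / strategy constructor
}

int main() {
    std::map<std::string, std::string> options = {
        {"compaction_window_unit", "HOURS"},
        {"compaction_window_size", "1"},
    };
    consume_options(std::move(options));  // first move: options is now empty
    consume_options(std::move(options));  // second consumer sees an empty map
}                                         // -> falls back to the 24h default
```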
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).
This two-patch series adds this missing feature. The first patch just moves
an existing function to where we can reuse it, and the second patch is
the actual implementation of the feature (and enabling its test).
Fixes #5893
Closes #9574
* github.com:scylladb/scylla:
alternator: add support for AttributeUpdates ADD operation
alternator: move list_concatenate() function
Currently, scylla-server fails to start on ARM instances because scylla_io_setup does not have preset parameters for them, even though the instance type was added to the 'supported instance' list.
To fix this, we need to add io parameter presets to scylla_io_setup.
Also, we mistakenly added EBS-only instances in a004b1da30; they need to be removed.
Instances that do not have an ephemeral disk should be 'unsupported instances': we can still run our AMI on them, but we print a warning message on the login prompt and the user is required to run scylla_io_setup.
Fixes #9493
Closes #9532
* github.com:scylladb/scylla:
scylla_util.py: remove EBS only ARM instances from support instance list
scylla_io_setup: support ARM instances on AWS
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).
This patch adds this feature, and the test for it begins to pass so its
"xfail" marker is removed.
Fixes #5893
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The list_concatenate() function was only used for UpdateExpression's
ADD operation, so we made it a static function in the source file where
it was used. In the next patch, we'll want to use it in another place
(AttributeUpdates' ADD operation), so let's move it to the same file
where similar functions for sets exist.
This patch is almost entirely a code move, but also makes one small
change: list_concatenate() used to throw an exception if one of the
arguments wasn't a list, but the text of this exception was specific to
UpdateExpression. So in the new version, we return a null value in this
case - and the caller checks for it and throws the right exception.
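The new contract can be sketched roughly like this (a hypothetical signature using plain RapidJSON types, not the actual Alternator code):
```
#include <optional>
#include <rapidjson/document.h>

// Concatenate two lists; return std::nullopt if either argument is not
// a list, letting the caller throw its own context-appropriate error
// (UpdateExpression vs. AttributeUpdates).
std::optional<rapidjson::Document> list_concatenate(const rapidjson::Value& a,
                                                    const rapidjson::Value& b) {
    if (!a.IsArray() || !b.IsArray()) {
        return std::nullopt;
    }
    rapidjson::Document result(rapidjson::kArrayType);
    auto& alloc = result.GetAllocator();
    for (const auto& v : a.GetArray()) {
        result.PushBack(rapidjson::Value(v, alloc), alloc);  // deep copy
    }
    for (const auto& v : b.GetArray()) {
        result.PushBack(rapidjson::Value(v, alloc), alloc);
    }
    return result;
}
```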
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "Authorization" HTTP header is used in DynamoDB API to sign
requests. Our parser for this header, in server::verify_signature(),
required the different components of this header to be separated by
a comma followed by a whitespace - but it turns out that in DynamoDB
both spaces and commas are optional - one of them is enough.
At least one DynamoDB client library - the old "boto" (which predated
boto3) - builds this header without spaces.
In this patch we add a test which shows that an Authorization header
with the spaces removed works fine in DynamoDB but didn't work in
Alternator. After this patch modifies the parsing code for this
header, the test begins to pass (and the other tests show that the
previously-working cases didn't break).
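For illustration, the relaxed separator rule amounts to splitting the header on any run of commas and/or spaces rather than requiring the exact ", " sequence (a sketch only; verify_signature() itself is structured differently):
```
#include <string>
#include <string_view>
#include <vector>

// Split "Credential=...,SignedHeaders=...,Signature=..." style headers
// whose components may be separated by commas, spaces, or both.
std::vector<std::string> split_authorization_components(std::string_view header) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < header.size()) {
        while (pos < header.size() && (header[pos] == ',' || header[pos] == ' ')) {
            ++pos;  // skip any run of separators
        }
        std::size_t start = pos;
        while (pos < header.size() && header[pos] != ',' && header[pos] != ' ') {
            ++pos;
        }
        if (pos > start) {
            out.emplace_back(header.substr(start, pos - start));
        }
    }
    return out;
}
```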
Fixes #9568
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211101214114.35693-1-nyh@scylladb.com>
Since we require ephemeral disks for our AMI, these EBS-only ARM
instances cannot be added to the 'supported instance' list.
We are still able to run our AMI on these instance types, but the login
message warns that it is an 'unsupported instance type', and the user is
required to run scylla_io_setup manually.
* seastar 083898a172...a189cdc45d (7):
> print: deprecate print() family
> treewide: replace uses of fmt_print() and fprint() with direct fmt calls
> circular_buffer: mark clear noexcept
> circular_buffer: mark trivial methods noexcept
> Merge "file: allow destroying append_challenged_posix_file_impl following a close failure" from Benny
> merge: Add parsing HTTP response status
> inet_address: fix usage of `htonl` for clang
scrub_compaction assumes that `make_interposer_consumer()` is called
only when `use_interposer_consumer()` returns true. This is false, so in
effect scrub always ends up using the segregating interposer. Fix this
by short-circuiting the former method when the latter returns false,
returning the passed-in consumer unchanged.
Tests: unit(dev)
Fixes #9541
Closes #9564
The current disk-based segregation method works well enough for most cases, but it struggles greatly when there are a lot of partitions in the input. When this is the case it will produce tons of buckets (sstables), in the order of hundreds or even thousands. This puts a huge strain on different parts of the system.
This series introduces a new segregation method which specializes on the lots of small partitions case. If the conditions are right, it can cause a drastic reduction of buckets. In one case I tested, a 1.1GB sstable with 3.6M partitions in it produced just 2 output sstables, down from the 500+ with the on-disk method.
This new method uses a memtable to sort out-of-order partitions. In-order partitions bypass this sorting altogether and go to the disk directly. This method is not suitable for cases where either the partitions are large or the total amount of data is large. For those, the disk-based method should be used. Scrub compaction decides on the method to use based on heuristics.
Tests: unit(dev)
Closes #9548
* github.com:scylladb/scylla:
compaction: scrub_compaction: add bucket count to finish message
test/boost: mutation_writer_test: harden the partition-based segregator test
mutation_writer: remove now unused on-disk partition segregator
compaction,test: use the new in-memory segregator for scrub
mutation_writer/partition_based_splitting_writer: add memtable-based segregator
We introduce a new operation to the framework: `reconfiguration`.
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.
A side effect of the operation is modifying a `known` set of nodes which
the operation's state has a reference to. This `known` set can then be
used by other operations (such as `raft_call`s) to find the current
leader.
For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we change `known` to the new configuration. If
it fails, we change `known` to be the union of the previous
configuration and the attempted configuration (because we don't know what
the configuration will eventually be - the old or the attempted one - so
any member of the union may eventually become a leader).
We use a dedicated thread (similarly to the network partitioning thread)
to periodically perform random reconfigurations.
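A small sketch of the `known`-set update rule described above (hypothetical types; the test uses its own raft server-id and configuration types):
```
#include <set>

using server_id = int;
using config = std::set<server_id>;

// On success, `known` becomes the new configuration; on failure it
// becomes the union of the old and the attempted configurations, since
// either of them may end up being the one in effect.
void update_known(config& known, const config& attempted, bool success) {
    if (success) {
        known = attempted;
    } else {
        known.insert(attempted.begin(), attempted.end());
    }
}
```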
* kbr/reconfig-v2:
test: raft: randomized_nemesis_test: perform reconfigurations in basic_generator_test
test: raft: randomized_nemesis_test: improve the bouncing algorithm
test: raft: randomized_nemesis_test: handle more error types
test: raft: randomized_nemesis_test put `variant` and `monostate` `ostream` `operator<<`s into `std` namespace
test: raft: randomized_nemesis_test: `reconfiguration` operation
It is useful to know how many buckets (output sstables) scrub produced
in total. The end compaction message will only report those still open
when the scrub finished, but will omit those that were closed in the
middle.
Test both methods, the "old" disk-based one and the recently added
in-memory one, with different configurations, and also add additional
checks to ensure they don't lose data.
The current method of segregating partitions doesn't work well for a huge
number of small partitions. For especially bad input, it can produce
hundreds or even thousands of buckets. This patch adds a new segregator
specialized for this use-case. This segregator uses a memtable to sort
out-of-order partitions in-memory. When the memtable size reaches the
provided max-memory limit, it is flushed to disk and a new empty one is
created. In-order partitions bypass the sorting altogether and go to the
fast-path bucket.
The new method is not used yet; this will come in the next patch.
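A pseudocode-level sketch of the segregation idea (self-contained stand-ins, not the actual mutation_writer code):
```
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

struct partition { int64_t token; };

class in_memory_segregator {
    int64_t _last_token = std::numeric_limits<int64_t>::min();
    std::map<int64_t, partition> _memtable;  // stand-in for a real memtable
    std::size_t _max_memory;
    std::size_t _used = 0;
public:
    std::vector<partition> fast_path;             // in-order partitions
    std::vector<std::vector<partition>> buckets;  // one per memtable flush

    explicit in_memory_segregator(std::size_t max_memory) : _max_memory(max_memory) {}

    void consume(partition p, std::size_t bytes) {
        if (p.token >= _last_token) {      // already in order:
            _last_token = p.token;
            fast_path.push_back(p);        // bypass the sorting memtable
            return;
        }
        _memtable.emplace(p.token, p);     // sort out-of-order partitions
        _used += bytes;
        if (_used >= _max_memory) {
            flush();                       // memory limit reached
        }
    }

    void flush() {
        std::vector<partition> bucket;
        for (auto& [token, part] : _memtable) {  // emitted in token order
            bucket.push_back(part);
        }
        buckets.push_back(std::move(bucket));
        _memtable.clear();
        _used = 0;
    }
};
```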