scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	8bdad0bb28	tests: Generalize bptree stress test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:28:57 +03:00
Pavel Solodovnikov	9d17a654a6	raft: use null_sharder for raft tables Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210201105300.110210-1-pa.solodovnikov@scylladb.com>	2021-02-01 18:52:04 +02:00
Gleb Natapov	382ee066bf	database: drop duplicated function The database class has two duplicated functions, keyspaces() and get_keyspaces(). Drop the former since it is used in one place only. Message-Id: <20210201135333.GA1403508@scylladb.com>	2021-02-01 18:52:04 +02:00
Tomasz Grabiec	eac9c1d80a	Merge "raft: configuration changes with joint consensus" from Kostja Support configuration changes based on joint consensus. When a user adds a configuration entry, commit an interim "joint consensus" configuration to the log first, and transition to the final configuration once both C_old and C_new configurations accept the joint entry. Misc cleanups. * scylla-dev/raft-config-changes-v2: raft: update README.md raft: add a simple test for configuration changes raft: joint consensus, wire up configuration changes in the API raft: joint consensus, count votes using joint config raft: joint consensus, wire up configuration changes in FSM raft: joint consensus, update progress tracker with joint configuration raft: joint consensus, don't store configuration in FSM raft: joint consensus, keep track of the last confchange index in the log raft: joint consensus, implement helpers in class configuration raft: joint consensus, use unordered_set for server_address list raft: joint consensus, switch configuration to joint raft: rename check_committed() to maybe_commit() raft: fix spelling and add comments	2021-02-01 18:52:04 +02:00
Nadav Har'El	75a4281bff	cql-pytest: test the units supposed to be usable for "duration" type This patch adds a test for the different units which are supposed to be usable for assigning a "duration" type in CQL. It turns out that all documented units are supported correctly except µs (with a unicode mu), so the test reproduces issue #8001. The test xfails on Scylla (because µs is not supported) and passes on Cassandra. Refs: #8001. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210131192220.407481-1-nyh@scylladb.com>	2021-02-01 11:05:10 +01:00
Avi Kivity	bb202db1ff	Merge 'dist/offline_installer/redhat: fix umask error' from Takuya ASADA Since the makeself script changes the current umask, scylla_setup fails with the "scylla does not work with current umask setting (0077)" error. To fix that, we need to use the latest version of makeself and specify the --keep-umask option. Fixes #6243 Closes #6244 * github.com:scylladb/scylla: dist/offline_redhat: fix umask error dist/offline_installer/redhat: support cross build	2021-01-31 18:47:27 +02:00
Takuya ASADA	49e4f318a0	dist/offline_redhat: fix umask error Since the makeself script changes the current umask, scylla_setup fails with the "scylla does not work with current umask setting (0077)" error. To fix that, we need to use the latest version of makeself and specify the --keep-umask option. Fixes #6243	2021-01-31 21:37:49 +09:00
Takuya ASADA	74d7e31576	dist/offline_installer/redhat: support cross build Support cross builds by running CentOS 7 in Docker, so the installer can now be built on Fedora. Switching the container image is also supported; tested on Oracle Linux 7 and CentOS 7/8.	2021-01-31 21:37:49 +09:00
Avi Kivity	9271e4bf6e	Update seastar submodule * seastar 52d41277a...cb3aaf07e (2): > tls: reloadable_credentials_base: add_dir_watch: fix root dir detection > scripts/perftune.py: convert nic option in old perftune.yaml to list for compatibility	2021-01-31 13:28:45 +02:00
Raphael S. Carvalho	298d54ceb0	utils/fragment_temporary_buffer: don't push empty fragment if data size is fragment-aligned last fragment is unconditionally pushed to set of fragments, so if data size is fragment-aligned, an empty fragment will be needlessly pushed to the back of the fragment set. note: i haven't tested if empty fragment at back of set will cause issues, i think it won't, but this should be avoided anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210129231532.871405-3-raphaelsc@scylladb.com>	2021-01-30 20:54:20 +02:00
Raphael S. Carvalho	e745f1e697	utils/fragmented_temporary_buffer: avoid reallocations by reserving upfront Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210129231532.871405-2-raphaelsc@scylladb.com>	2021-01-30 20:54:20 +02:00
Raphael S. Carvalho	08e838d4b5	utils/fragmented_temporary_buffer: simplify allocate_to_fit() 1) reuse default_fragment_size for knowledge of max fragment size 2) fragments_count is not a good name as it doesn't include last non-full fragment (if present), so rename it. 3) simplify calculation of last fragment size Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210129231532.871405-1-raphaelsc@scylladb.com>	2021-01-30 20:54:20 +02:00
Konstantin Osipov	a8f2fa7fa0	raft: update README.md	2021-01-29 22:07:08 +03:00
Konstantin Osipov	b7692af8bc	raft: add a simple test for configuration changes Test adding, removing, and replacing a node. With fix-ups by Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-01-29 22:07:08 +03:00
Konstantin Osipov	c7b5a60320	raft: joint consensus, wire up configuration changes in the API Now that we've implemented joint consensus based configuration changes, replace add_server()/remove_server() with a more general set_configuration().	2021-01-29 22:07:08 +03:00
Konstantin Osipov	afadc7c0a1	raft: joint consensus, count votes using joint config Send RequestVote to a joint config. We need to exclude self from the list of peers if we're not part of the current configuration. Avoid disrupting the cluster in this case. Maintain separate status for previous and current config when counting votes.	2021-01-29 22:07:08 +03:00
Konstantin Osipov	8b86d91754	raft: joint consensus, wire up configuration changes in FSM When add_entry() with a new configuration is submitted, create a joint configuration and switch to it immediately. Refuse to enter joint configuration if a configuration change is already in progress. When the leader has committed an entry with joint configuration, append a new entry with final configuration and switch to it. Resign leadership if the current leader is not part of a new configuration. When we change from A, B, C to B, C, D and the leader is A, then, when C_new starts to be used, the leader is not part of the current configuration, so it doesn't have to be in the tracker. Do not try to find & advance leader progress unconditionally then.	2021-01-29 22:07:08 +03:00
Konstantin Osipov	18a684ba11	raft: joint consensus, update progress tracker with joint configuration The leader doesn't have to be part of the current configuration, so add a way to access follower_progress for the leader only if it is present. Upon configuration changes, preserve progress information for intact nodes, remove for removed, and create a new progress object for added nodes. When tracking commit progress in joint configuration mode, calculate two commit indexes for two configurations, and choose the smallest one.	2021-01-29 22:07:08 +03:00
Konstantin Osipov	20df1955b2	raft: joint consensus, don't store configuration in FSM In follower state, FSM doesn't know the current cluster configuration. Instead of trying to watch the follower log for configuration changes to keep FSM copy up to date, remove it from FSM altogether since the follower doesn't need it anyway. When entering candidate or leader state, fetch the most recent configuration from the log and initialize the state specific state with it.	2021-01-29 22:07:07 +03:00
Konstantin Osipov	b29181875c	raft: joint consensus, keep track of the last confchange index in the log When initializing the log, find the most recent configuration change index, if present. Maintain the most recent configuration change index when the log is truncated or entries are appended to it. The last configuration change index will be used by FSM when it enters candidate or leader state to fetch the current configuration. We never truncate beyond a single in-progress configuration change, so storing the previous value of last_conf_idx helps avoid log backward scan on truncation in 100% of cases. Remove all unused log constructors.	2021-01-29 22:07:07 +03:00
Konstantin Osipov	6e128aa357	raft: joint consensus, implement helpers in class configuration	2021-01-29 22:07:07 +03:00
Konstantin Osipov	1ca738d9a2	raft: joint consensus, use unordered_set for server_address list	2021-01-29 22:07:07 +03:00
Konstantin Osipov	df944f953c	raft: joint consensus, switch configuration to joint In order to work correctly in transitional configuration, participants must enter it after crashes, restarts and state changes. This means it must be stored in Raft log and snapshot on the leader and followers. This is most easily done if transitional configuration is just a flavour of standard configuration. In FSM, rename _current_config to _configuration, it now contains both current and future configuration at all times.	2021-01-29 22:07:07 +03:00
Konstantin Osipov	076e46af9e	raft: rename check_committed() to maybe_commit() This is what the function does, and it's the name used in other implementations.	2021-01-29 22:07:07 +03:00
Gleb Natapov	aad0209b1c	raft: fix spelling and add comments Fix spelling errors in a few comments, improve comments. With fix-ups by Gleb Natapov <gleb@scylladb.com>	2021-01-29 22:07:07 +03:00
Pavel Emelyanov	575c992a35	test: Bring test_apply_monotonically_is_monotonic back to work The idea of the monotonicity checking test is: try to apply one random partition to another random one while sequentially failing allocations. Each time an allocation fails (with the bad_alloc exception) -- check that the exception guarantee is respected, then apply (!) the very same two partitions to each other. At the end of the test we make sure that an exception may pop up at any point of application and it will be safe. This idea is flawed currently. When verifying the guarantee the test moves the 2nd partition and leaves it empty for the next loop iteration. So right on the 2nd attempt to apply partitions it becomes a no-op, doesn't fail and no more exceptions arise. Fix by restoring both partitions at the end of each check. Broken since `74db08165d`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210129153641.5449-1-xemul@scylladb.com>	2021-01-29 18:47:15 +01:00
Tomasz Grabiec	16eb4c6ce2	Merge "raft: system table backed persistency module" from Pavel Solodovnikov This series contains an initial implementation of raft persistency module that uses `raft` system table as the underlying storage model. "system.raft" table will be used as a backend storage for implementing raft persistence module in Scylla. It combines both raft log, persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of `raft::snapshot` structure. IDL definitions are also added for every raft struct so that we automatically provide serialization and deserialization facilities needed both for persistency module and for future RPC implementation. The first patch is a side-change needed to provide complete serialization/deserialization for `bytes_ostream`, which we need when persisting the raft log in the table (since `data` is a variant containing `raft::command` (aka `bytes_ostream`) among others). `bytes_ostream` was lacking `deserialize` function, which is added in the patch. The second patch provides serializer for `lw_shared_ptr<T>` which will be used for `raft::append_entries`, which has a field with `std::vector<const lw_shared_ptr<raft::log_entry>>` type. There is also a patch to extend `fragmented_temporary_buffer` with a static function `allocate_to_fit` that allocates an instance of the fragmented buffer that has a specified size. Individual fragment size is limited to 128kb. The patch-set also contains the test suite covering basic functionality of the persistency module.
* manmanson/raft-api-impl-v11: raft/sys_table_storage: add basic tests for raft_sys_table_storage raft: introduce `raft_sys_table_storage` class utils: add `fragmented_temporary_buffer::allocate_to_fit` raft: add IDL definitions for raft types raft: create `system.raft` and `system.raft_snapshots` tables serializer: add `serializer<lw_shared_ptr<T>>` specialization serializer: add `deserialize` function overload for `bytes_ostream`	2021-01-29 11:40:39 +02:00
Pavel Solodovnikov	e309502c42	raft/sys_table_storage: add basic tests for raft_sys_table_storage The test suite covers the most basic use cases for the system table backed raft persistency module: * store/load vote and term * store/load snapshot * store snapshot with log tail truncation * store/load log entries * log truncation Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 02:00:27 +03:00
Pavel Solodovnikov	aebb1987b5	raft: introduce `raft_sys_table_storage` class This is the implementation of raft persistency module that uses `raft` system table as the underlying storage model. The instance is supposed to be bound to a single raft group. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 02:00:12 +03:00
Pavel Solodovnikov	d14dc030ac	utils: add `fragmented_temporary_buffer::allocate_to_fit` Introduce `fragmented_temporary_buffer::allocate_to_fit` static function returning an instance of the buffer of a specified size. The allocated buffer fragments have a size of at most 128kb. `bytes_ostream` has the same hard-coded limit, so just use the same here. This patch will be later needed for `raft::log_entry` raw data serialization when writing to the underlying persistent storage. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 01:59:16 +03:00
Pavel Solodovnikov	e1504bbf0e	raft: add IDL definitions for raft types Changes to the `configuration` and `tagged_uint64` classes are needed to overcome limitations of the IDL compiler tool, i.e. we need to supply a constructor to the struct initializing all the members (raft::configuration) and also need to make an accessor function for private members (in case of raft::tagged_uint64). All other structs mirror raft definitions in exactly the same way they are declared in `raft.hh`. `tagged_id` and `tagged_uint64` are used directly instead of their typedef-ed companions defined in `raft.hh` since we don't want to introduce indirect dependencies. In such case it can be guaranteed that no accidental changes made outside of the idl file will affect idl definitions. This patch also fixes a minor typo in `snapshot_id_tag` struct used in `snapshot_id` typedef.	2021-01-29 01:59:10 +03:00
Pavel Solodovnikov	cf5b8c4b79	raft: create `system.raft` and `system.raft_snapshots` tables System raft table will be used as a backend storage for implementing raft persistence module in Scylla. It combines both raft log, persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of `raft::snapshot` structure. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 01:59:04 +03:00
Pavel Solodovnikov	83c26e542d	serializer: add `serializer<lw_shared_ptr<T>>` specialization This one works similar to `serializer<optional<T>>` and will be later needed for serializing `raft::append_request`, which has a field containing `lw_shared_ptr`. Users to be warned, though: this code assumes that the pointer is never null. This is done to mirror the serialize implementation for `lw_shared_ptr:s` in the messaging_service.cc, which is subject to being deleted in favor of the impl in the `serializer_impl.hh`. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 01:58:46 +03:00
Avi Kivity	b32ece6975	Update tools/java submodule * tools/java 4a55b81941...78c8ef4f54 (1): > nodetool: do not treat table name with dot as a secondary index Fixes #6521.	2021-01-28 16:16:47 +02:00
Kamil Braun	bf115e7d69	schema_tables: put schema tables on shard 0 We use a custom sharder for all schema tables: every table under the `system_schema` keyspace, plus `system.scylla_table_schema_history`. This sharder puts all data on shard 0. To achieve this, we hardcode the sharder in initial schema object definitions. Furthermore - since the sharder is not stored inside schema mutations yet - whenever we deserialize schema objects from mutations, we modify the sharder based on the schema's keyspace and table names. A regression test is added to ensure no one forgets to set the special sharder for newly added schema tables. This test assumes that all newly added schema tables will end up in the `system_schema` keyspace (other tables may go unnoticed, unfortunately). Closes #7947	2021-01-28 13:28:22 +02:00
Avi Kivity	32cdcc0c8b	Merge "sstables: consolidate reader factory methods" from Botond " Currently there are three different methods for creating an sstable reader: * one for single key reads * one for ranged reads * and one nobody uses This patch-set consolidates all these into a single `make_reader()` method, which behind the scenes uses the same logic to dispatch to the right sstable reader constructor that `sstables::as_mutation_source()` uses. This patch-set is part of an effort to clean up the jungle that is the various reader creation methods. The next step is to clean up the sstable_set, which has even more methods. One very sad discovery I made while working on this patch-set is that we still default `mutation_reader::forwarding` to `yes` in the sstable range reader creator method and in the `mutation_source::make_reader()`. I couldn't assume that all callers are passing what they mean as the value for that parameter. I found many sites in tests that create forwardable single partition readers. This is also something we should address soon. Tests: unit(release, debug:v3) " * 'sstables-consolidate-reader-factory-methods-v4' of https://github.com/denesb/scylla: cql_query_test: add unit test covering the non-optimal TWCS sstable read path sstable_mutation_reader: consolidate constructors tests: don't pass temporary ranges to readers sstables: sstable_mutation_reader: remove now unused whole sstable constructor sstables: stats: remove now unused sstable_partition_reads counter sstable: remove read_.row._flat() methods tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods sstables: pass partition_range to create_single_key_sstable_reader() sstables: sstable: add make_reader()	2021-01-28 12:05:06 +02:00
Botond Dénes	1e9ce62ee6	cql_query_test: add unit test covering the non-optimal TWCS sstable read path The sstable read path for TWCS tables takes a different path when the optimized read path cannot be used. This path was found to be not covered at all by unit tests which allowed a trivial use-after-free to slip in. Add a unit test to cover this path as well, so ASAN can catch such bugs in the future.	2021-01-28 11:34:03 +02:00
Avi Kivity	55609f2033	Update seastar submodule * seastar a287bb1a3...52d41277a (8): > fair_queue: Preempted requests got re-queued too far > scripts/perftune.py: remove repeated items after merging options from file > file.hh: Remove fair_queue.hh > Merge "Reloadable TLS certificate tolerance" from Calle > Merge "Cancellable IO" from Pavel E > abort-source: Improve the subscriptions management > fair_queue: Improve requests preemption while in pending state > http: add support for Default handler (/*)	2021-01-28 08:45:33 +01:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Botond Dénes	6024ef5dad	sstable_mutation_reader: consolidate constructors The two remaining sstable constructors are very similar apart from the content of the initialize lambda. Speaking of which, the two remaining initializer lambdas can be easily merged into one too. So this patch does just that: it consolidates the two constructors into one, and merges the initializer lambdas and extracts the result into a member method. This means we have to store the previously captured variables as members, but this is actually a good thing: when debugging we can see the range and slice the reader is reading, and we are not actually paying for it either -- they were already stored, just out of sight.	2021-01-27 17:38:17 +02:00
Botond Dénes	dd26a96e63	tests: don't pass temporary ranges to readers The sstable_mutation_reader, like all other mutation readers, expects that the partition-range passed to it is kept alive by its creator for the duration of its lifetime. However, the single-key constructor of the sstable reader was more tolerant, as it only extracted the key from the range, essentially requiring only the key to be kept alive (but not the containing range). Naturally, in time some code came to rely on it and ended up passing temporary ranges to the reader. This behaviour will no longer be acceptable as we are about to consolidate the various sstable reader constructors, uniformly requiring that the range is kept alive. So this patch fixes up the tests so they work with this stricter requirement. Only two occurrences were found.	2021-01-27 17:38:17 +02:00
Botond Dénes	43ad64db78	sstables: sstable_mutation_reader: remove now unused whole sstable constructor	2021-01-27 17:38:17 +02:00
Botond Dénes	ec6c540c30	sstables: stats: remove now unused sstable_partition_reads counter	2021-01-27 17:38:17 +02:00
Botond Dénes	5f18e9eb37	sstable: remove read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Botond Dénes	c3b4e990a2	tree-wide: use sstables::make_reader() instead of the read_.row._flat() methods	2021-01-27 17:38:17 +02:00
Botond Dénes	080bc2ffec	sstables: pass partition_range to create_single_key_sstable_reader() We want to unify the various sstable reader creation methods and this method taking a ring position instead of a partition range like everybody else stands in the way of that. This in effect reverts `68663d0de`.	2021-01-27 17:38:14 +02:00
Wojciech Mitros	a1f93e4297	api: use a list instead of a vector to remove a large allocation in api handler Follow-up to #7917 The size of a cf::column_family_info is 224 bytes, so an std::vector that contains one for each column family may be very large, causing allocations of over 1MB. Considering the vector is used only for iteration, it can be changed to a non-contiguous list instead. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #7973	2021-01-27 16:02:07 +02:00
Avi Kivity	aec231ba2e	Merge "Unify query paths" from Botond " Currently we have two parallel query paths: * database::query() -> table::query() -> data_query() * mutation::query() The former is used by single partition queries, the latter by range scans, as mutation::query() is used to convert reconcilable_result to query::result (which means it is also used in single partition queries if it triggers read repair). This is a rather unfortunate situation as we have two parallel implementations of the query code, which means they are prone to diverge, and in fact they already have -- more on that later. This patchset aims to remedy this situation by retiring `mutation::query()` and migrating users to an implementation based on the "standard" query path, in other words one using the same building blocks as the `database::query()` path. This means using `compact_mutation` for compacting and `query_result_builder` for result building. These components were created to work with `flat_mutation_reader`; however, introducing a reader into this pipeline would mean that we'd have to make all the related APIs asynchronous, which would cause an insane amount of churn. To avoid this, this patchset adds an API compatible `consume()` method to `mutation`, which can accept a `compact_mutation` instance as-is. This allows an elegant and succinct reimplementation. So far so good. As mentioned above, the two implementations have diverged in time, or have been different from the start. The difference manifests when calculating digests, more precisely in which tombstones are included in the digest. The retired `mutation::query()` path incorporates only non-purgeable tombstones in the digest. The standard query path however incorporates all tombstones, even those that can be purged.
After some scrutiny however this difference proved to be completely theoretical, as the code path where this would matter -- converting reconcilable result to query result -- passes min timestamp as the query time to the compaction, so nothing is compacted and hence the difference has no chance to manifest. This patch-set was motivated by the desire to provide a single solution to #7434, instead of two, one for each path. Tests: unit(release:v2, debug:v2, dev:v3) " * 'unified-query-path/v3' of https://github.com/denesb/scylla: mutation: remove now unused query() and query_compacted() treewide: use query_mutations() instead of mutation::query() mutation_test: test_query_digest: ensure digest is produced consistently mutation_query: introduce query_mutation() mutation_query: to_data_query_result(): migrate to standard query code mutation_query: move to_data_query_result() to mutation_partition.cc mutation: add consume() flat_mutation_reader: move mutation consumer concepts to separate header mutation compactor: query compaction: ignore purgeable tombstones	2021-01-27 15:58:47 +02:00
Botond Dénes	a5a8037f6e	sstables: sstable: add make_reader() This will be the only method to create sstable readers with. For now we leave the other variants, they as well as their users will be removed in a following patch.	2021-01-27 15:20:06 +02:00
Nadav Har'El	2113849a2b	cql-pytest: reproducer for toJson() bug with doubles This patch adds a cql-pytest, test_json.py::test_tojson_double(), which reproduces issue #7972 - where toJson() prints some doubles incorrectly - truncated to integers, but some it prints fine (I still don't know why, this will need to be debugged). The test is marked xfail: It fails on Scylla, and passes on Cassandra. Refs #7972. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210127124338.297544-1-nyh@scylladb.com>	2021-01-27 14:00:25 +01:00
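
Commit 18a684ba11 in the table above says that, in joint configuration mode, the leader calculates two commit indexes for the two configurations and chooses the smallest one. A minimal sketch of that rule follows. It is an illustration only and not the code from the repository: the names `index_t`, `majority_index` and `joint_commit_index` are hypothetical, each configuration is assumed to be given as a plain vector of per-voter match indexes, and the usual Raft check that the committed entry belongs to the leader's current term is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical alias for a log index; not the type used in the repository.
using index_t = std::uint64_t;

// Highest log index that is stored on a majority of one configuration.
// match_idx holds, for each voter, the highest index known to be
// replicated on that voter. Taken by value because it is reordered.
inline index_t majority_index(std::vector<index_t> match_idx) {
    assert(!match_idx.empty());
    const std::size_t quorum = match_idx.size() / 2 + 1;
    // Move the quorum-th largest value to position quorum - 1.
    std::nth_element(match_idx.begin(),
                     match_idx.begin() + (quorum - 1),
                     match_idx.end(),
                     std::greater<index_t>());
    return match_idx[quorum - 1];
}

// In a joint configuration an entry counts as committed only when a
// majority of C_old and a majority of C_new both store it, so the joint
// commit index is the smaller of the two per-configuration indexes.
inline index_t joint_commit_index(const std::vector<index_t>& c_old,
                                  const std::vector<index_t>& c_new) {
    return std::min(majority_index(c_old), majority_index(c_new));
}
```

For example, with match indexes {5, 4, 3} in C_old and {4, 2, 2} in C_new, the per-configuration indexes are 4 and 2, so the joint commit index is 2. Taking the minimum is what keeps either configuration from committing an entry on its own while the transition is in progress.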

24994 Commits