scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Kamil Braun	841f07e9b7	cdc: add config option to disable streams rewriting Rewriting stream descriptions is a long, expensive, and prone-to-failure operation. Due to #8061 it may consume a lot of memory. In general, it may keep failing (and being retried) endlessly, straining the cluster. As a backdoor we add this flag for potential future needs of admins or field engineers. I don't expect it will ever be used, but it won't hurt and may save us some work in the worst case scenario.	2021-02-18 11:44:59 +01:00
Kamil Braun	9bdd000e97	cdc: rewrite streams to the new description table Nodes automatically ensure that the latest CDC generation's list of streams is present in the streams description table. When a new generation appears, we only need to update the table for this generation; old generations are already inserted. However, we've changed the description table (from `cdc_streams_descriptions` to `cdc_streams_descriptions_v2`). The existing mechanism only ensures that the latest generation appears in the new description table. This commit adds an additional procedure that rewrites the older generations as well, if we find that it is necessary to do so (i.e. when some CDC log tables may contain data in these generations).	2021-02-18 11:44:59 +01:00
Kamil Braun	4ef736a0a3	cql3: query_processor: improve internal paged query API The `query_processor::query` method allowed internal paged queries. However, it was quite limited, hardcoding a number of parameters: consistency level, timeout config, page size. This commit does the following improvements: 1. Rename `query` to `query_internal` to make it obvious that this API is supposed to be used for internal queries only 2. Extend the method to take consistency level, timeout config, and page size as parameters 3. Remove unused overloads of `query_internal` 4. Fix a bunch of typos / grammar issues in the docstring	2021-02-18 11:44:59 +01:00
Kamil Braun	67d4e5576d	sys_dist_ks: split CDC streams table partitions into clustered rows Until now, the lists of streams in the `cdc_streams_descriptions` table for a given generation were stored in a single collection. This solution has multiple problems when dealing with large clusters (which produce large lists of streams): 1. large allocations 2. reactor stalls 3. mutations too large to even fit in commitlog segments This commit changes the schema of the table as described in issue #7993. The streams are grouped according to token ranges, each token range being represented by a separate clustering row. Rows are inserted in reasonably large batches for efficiency. The table is renamed to enable easy upgrade. On upgrade, the latest CDC generation's list of streams will be (re-)inserted into the new table. Yet another table is added: one that contains only the generation timestamps clustered in a single partition. This makes it easy for CDC clients to learn about new generations. It also enables an elegant two-phase insertion procedure of the generation description: first we insert the streams; only after ensuring that a quorum of replicas contains them, we insert the timestamp. Thus, if any client observes a timestamp in the timestamps table (even using a ONE query), it means that a quorum of replicas must contain the list of streams.	2021-02-18 11:44:59 +01:00
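The two-phase publication rule can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not ScyllaDB code: `replica`, `write_streams` and `publish_generation` are hypothetical names, replicas are plain structs rather than nodes reached through `storage_proxy`, and a quorum is taken to be a simple majority of the replica set. What it shows is the ordering: the timestamp is written only after the stream list has reached a quorum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical in-memory model of replicas; NOT ScyllaDB's storage_proxy API.
struct replica {
    std::set<int64_t> streams_for;  // generations whose stream list this replica holds
    std::set<int64_t> timestamps;   // generations listed in its timestamps table
    bool up = true;
};

// Phase 1: write the stream list to every reachable replica and
// report whether a quorum (simple majority) acknowledged it.
bool write_streams(std::vector<replica>& rs, int64_t gen) {
    std::size_t acks = 0;
    for (auto& r : rs) {
        if (r.up) {
            r.streams_for.insert(gen);
            ++acks;
        }
    }
    return acks >= rs.size() / 2 + 1;
}

// Phase 2 runs only after phase 1 reached a quorum, so any replica that
// shows the timestamp implies a quorum already holds the streams.
bool publish_generation(std::vector<replica>& rs, int64_t gen) {
    if (!write_streams(rs, gen)) {
        return false;  // do not publish the timestamp; the caller may retry
    }
    for (auto& r : rs) {
        if (r.up) {
            r.timestamps.insert(gen);
        }
    }
    return true;
}
```

With this ordering, a reader that sees a generation timestamp on any single replica can rely on a quorum holding the matching streams, which is why a ONE read of the timestamps table is enough.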
Kamil Braun	ba920361b3	cdc: use chunked_vector for streams in streams_version The vector may get quite long (say, 1.6M stream IDs). We prevent a large allocation by using utils::chunked_vector.	2021-02-18 11:44:59 +01:00
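To show why a chunked container avoids one huge allocation, here is a toy version of the idea. It is not `utils::chunked_vector` and does not mirror its interface; `chunked_vector_sketch` and its default chunk size of 1024 elements are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy illustration of the chunked-vector idea; NOT utils::chunked_vector.
// Elements live in fixed-size chunks, so no single element buffer exceeds
// ChunkSize * sizeof(T) bytes, however many elements are stored.
template <typename T, std::size_t ChunkSize = 1024>
class chunked_vector_sketch {
    std::vector<std::vector<T>> _chunks;  // only this small index of chunks grows contiguously
    std::size_t _size = 0;
public:
    void push_back(T v) {
        if (_size % ChunkSize == 0) {
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkSize);  // one bounded allocation per chunk
        }
        _chunks.back().push_back(std::move(v));
        ++_size;
    }
    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Assuming 16-byte stream IDs, 1.6M IDs at 1024 per chunk come to 1,563 chunks of 16 KiB each, instead of a single contiguous buffer of about 25.6 MB.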
Kamil Braun	9ae4467970	cdc: remove `streams_version::expired` field This field was not used anywhere.	2021-02-18 11:44:59 +01:00
Kamil Braun	3d7b990300	system_distributed_keyspace: use mutation API to insert CDC streams The `storage_proxy::mutate` low-level API is much more powerful than the CQL API. This power is not needed by this commit, but it will be needed by the next one.	2021-02-18 11:44:59 +01:00
Kamil Braun	0df15ca8cc	storage_service: don't use `sys_dist_ks` before it is started It could happen that system_distributed_keyspace was used by storage_service before it was fully started (inside `handle_cdc_generation`), i.e. before sys_dist_ks' `start()` returned (on shard 0). It only checked whether `local_is_initialized()` returns true, so it only ensured that the service is constructed. Currently, sys_dist_ks' `start` only announces migrations, so this was mostly harmless. More concretely: it could result in the node trying to send CQL requests using a table that it didn't yet recognize by calling sys_dist_ks' methods before the `announce_migration` call inside `start` has returned. This would result in an exception; however, the exception would be caught by the caller and the procedure would be retried, eventually succeeding. See `handle_cdc_generation` for details. Still, the initial intention of the code was to wait for the sys_dist_ks service to be fully started before it was used. This commit fixes that.	2021-02-18 11:44:59 +01:00
Botond Dénes	ba7a9d2ac3	imr: switch back to open-coded description of structures Commit `aab6b0ee27` introduced the controversial new IMR format, which relied on a very template-heavy infrastructure to generate serialization and deserialization code via template meta-programming. The promise was that this new format, beyond solving the problems the previous open-coded representation had (working on linearized buffers), would speed up migrating other components to the IMR format, as the IMR infrastructure reduces code bloat and makes the code safer and more readable via declarative type descriptions. However, the results were almost the opposite. The template meta-programming used by the IMR infrastructure proved very hard to understand. Developers don't want to read or modify it. Maintainers don't want to see it being used anywhere else. In short, nobody wants to touch it. This commit does a conceptual revert of `aab6b0ee27`. A verbatim revert is not possible because related code evolved a lot since the merge. Also, going back to the previous code would mean regressing, as we'd revert the move to fragmented buffers. So this revert is only conceptual: it changes the underlying infrastructure back to the previous open-coded one, but keeps the fragmented buffers, as well as the interface of the related components (to the extent possible). Fixes: #5578	2021-02-16 23:43:07 +01:00
Eliran Sinvani	178ced9014	schema tables: Remove mutations to unknown tables when adapting schema mutations Whenever an alter table occurs, the mutations for the just-altered table are sent from the coordinator to all of the replicas. In a mixed cluster the mutations should be adapted to a specific version of the schema. However, the adaptation that happens today doesn't omit mutations to newly added schema tables; more specifically, mutations to the `computed_columns` table, which doesn't exist in, for example, version 2019.1. This makes altering a table during a rolling upgrade from 2019.1 to 2020.1 dangerous.	2021-02-11 13:48:55 +02:00
Eliran Sinvani	ff1ba9bc2b	schema tables: Register 'scylla_tables' versions that were sent to other nodes In a mixed cluster there can be a situation where `scylla_tables` needs to be sent to another node, either because of a schema sync or because the node pulls it since it is referenced by a frozen_mutation. The former is not a problem since the sending node chooses the version to send. However, the latter is problematic since `scylla_tables` versions are not registered anywhere. This registers every `scylla_tables` schema version that is used to adapt mutations, since a schema pull for that version might follow.	2021-02-11 13:47:16 +02:00
Gleb Natapov	d8345c67d9	Consolidate system and non system keyspace creation The code that creates system keyspaces open-codes a lot of what database::create_keyspace() does. The patch makes create_keyspace() suitable for both system and non-system keyspaces and uses it to create system keyspaces as well. Message-Id: <20210209160506.1711177-1-gleb@scylladb.com>	2021-02-09 17:18:04 +01:00
Avi Kivity	4082f57edc	Merge 'Make commitlog disk limit a hard limit.' from Calle Wilund Refs #6148 Commitlog disk limit was previously a "soft" limit, in that we allowed allocating new segments, even if we were over disk usage max. This would also sometimes cause us to create new segments and delete old ones, if needing and releasing segments was badly timed, in turn causing useless disk IO for pre-allocation/zeroing. This patch set does: * Make limit a hard limit. If we have disk usage > max, we wait for delete or recycle. * Make flush threshold configurable. Default is to ask for flush when over 50% usage. (We do not wait for results) * Make flush "partial". We flush X% of the used space (used - thres/2), and set the rp limit accordingly. This means we will try to clear the N oldest segments, not all. I.e. "lighter" flush. Of course, if the CL is wholly dominated by a single CF, this will not really help much. But when > 1 cf is used, it means we can skip those not having unflushed data < req rp. * Force more eager flush/recycle if we're out of segments Note: flush threshold is not exposed in scylla config (yet), because I am unsure of the wording, and even whether it should be. Note: testing is sparse, esp. in regard to latency/timeouts added in high usage scenarios. While I can fairly easily provoke "stalls" (i.e. forced waiting for segments to free up) with simple C-S, it is hard to say exactly where in a more sane config (I set my limits looow) latencies will start accumulating. Closes #7879 * github.com:scylladb/scylla: commitlog: Force earlier cycle/flush iff segment reserve is empty commitlog: Make segment allocation wait iff disk usage > max commitlog: Do partial (memtable) flushing based on threshold commitlog: Make flush threshold configurable table: Add a flush RP mark to table, and shortcut if not above	2021-02-08 16:44:05 +02:00
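The "partial flush" rule (flush roughly `used - thres/2` bytes, oldest segments first, and set the replay-position limit accordingly) can be sketched as a pure function. The `segment_info` struct, the function name, and the byte-based accounting are assumptions made for illustration; the real commitlog tracks segments and replay positions with its own types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a segment: its size on disk and the replay
// position (rp) at which it ends. Not ScyllaDB's real types.
struct segment_info {
    uint64_t size;
    uint64_t end_rp;
};

// Given segments ordered oldest-first, return the rp up to which memtables
// should be flushed so that about (used - threshold / 2) bytes can be freed.
// Returns 0 when usage is at or below threshold / 2 (nothing to request).
uint64_t partial_flush_rp(const std::vector<segment_info>& oldest_first,
                          uint64_t threshold) {
    uint64_t used = 0;
    for (const auto& s : oldest_first) used += s.size;
    if (used <= threshold / 2) return 0;
    const uint64_t to_free = used - threshold / 2;
    uint64_t freed = 0;
    for (const auto& s : oldest_first) {
        freed += s.size;
        // Flushing up to this rp lets the N oldest segments be released.
        if (freed >= to_free) return s.end_rp;
    }
    // Not reached: after the loop freed == used, which is >= to_free.
    return oldest_first.back().end_rp;
}
```

For four 32-byte segments and a threshold of 128, it asks to flush up to the end of the second segment, leaving the two newest segments alone.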
Pavel Emelyanov	a05adb8538	database: Remove global storage proxy reference The db::update_keyspace() needs a sharded<storage_proxy> reference, but its only caller already has one and can pass it as an argument. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210205175611.13464-3-xemul@scylladb.com>	2021-02-08 12:59:46 +01:00
Calle Wilund	c5f6125039	commitlog: Add "add_entries" call to allow inputting N mutations Fixes #7615 Allows N mutations to be written "atomically" (i.e. in the same call). Either all are added to the segment, or none. Returns a vector of rp_handles corresponding to the input vector.	2021-02-02 10:41:08 +00:00
Calle Wilund	5fcc2066ed	commitlog: Make commitlog entries optionally multi-entry Allows writing more than one blob of data into a segment using a single "add" call. The old call sites will still just provide a single entry. To ensure we can determine the health of all the entries as a unit, we need to wrap them in a "parent" entry. For this, we bump the commitlog segment format and introduce a magic marker which, if present, means the entry contains sub-entries totalling "size" bytes. We checksum the entry header, and also checksum the individual checksums of each sub-entry (faster). This is added as a post-word. When parsing/replaying a v2+ segment where the marker is present, we have to read all entries + checksums into memory, verify them, and _then_ we can actually send the info to the caller.	2021-02-02 10:41:08 +00:00
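A minimal sketch of the parent-entry verification scheme, assuming a simplified in-memory layout. The checksum is 32-bit FNV-1a, used only as a self-contained stand-in for whatever checksum the commitlog actually uses, and `sub_entry`, `post_word` and `verify_group` are hypothetical names. It shows the two levels described in the message: per-sub-entry checksums, plus a post-word that checksums those checksums, with the whole group verified before any of it is released.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in checksum (32-bit FNV-1a), chosen only to keep the sketch self-contained.
uint32_t checksum(const void* data, std::size_t len, uint32_t h = 2166136261u) {
    const auto* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

struct sub_entry {
    std::string data;
    uint32_t crc;  // checksum of data, stored alongside it
};

// Post-word: a checksum over the sub-entry checksums (4 bytes each), which is
// cheaper than re-hashing every sub-entry's payload.
uint32_t post_word(const std::vector<sub_entry>& entries) {
    uint32_t h = 2166136261u;
    for (const auto& e : entries) {
        h = checksum(&e.crc, sizeof(e.crc), h);
    }
    return h;
}

// Replay side: verify the whole group before handing anything to the caller,
// so the group is accepted or rejected as a unit.
bool verify_group(const std::vector<sub_entry>& entries, uint32_t stored_post_word) {
    for (const auto& e : entries) {
        if (checksum(e.data.data(), e.data.size()) != e.crc) {
            return false;
        }
    }
    return post_word(entries) == stored_post_word;
}
```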
Calle Wilund	6bef3f9cc3	commitlog: Move entry_writer definition to cc file Should not be public/visible	2021-02-02 10:32:44 +00:00
Pavel Solodovnikov	9d17a654a6	raft: use null_sharder for raft tables Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210201105300.110210-1-pa.solodovnikov@scylladb.com>	2021-02-01 18:52:04 +02:00
Tomasz Grabiec	16eb4c6ce2	Merge "raft: system table backed persistency module" from Pavel Solodovnikov This series contains an initial implementation of a raft persistency module that uses the `raft` system table as the underlying storage model. The "system.raft" table will be used as backend storage for implementing the raft persistence module in Scylla. It combines the raft log, the persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of the corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id, while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of the `raft::snapshot` structure. IDL definitions are also added for every raft struct so that we automatically provide the serialization and deserialization facilities needed both for the persistency module and for the future RPC implementation. The first patch is a side-change needed to provide complete serialization/deserialization for `bytes_ostream`, which we need when persisting the raft log in the table (since `data` is a variant containing `raft::command` (aka `bytes_ostream`) among others). `bytes_ostream` was lacking a `deserialize` function, which is added in the patch. The second patch provides a serializer for `lw_shared_ptr<T>`, which will be used for `raft::append_entries`, which has a field of type `std::vector<const lw_shared_ptr<raft::log_entry>>`. There is also a patch to extend `fragmented_temporary_buffer` with a static function `allocate_to_fit` that allocates an instance of the fragmented buffer that has a specified size. Individual fragment size is limited to 128kb. The patch-set also contains a test suite covering basic functionality of the persistency module.
* manmanson/raft-api-impl-v11: raft/sys_table_storage: add basic tests for raft_sys_table_storage raft: introduce `raft_sys_table_storage` class utils: add `fragmented_temporary_buffer::allocate_to_fit` raft: add IDL definitions for raft types raft: create `system.raft` and `system.raft_snapshots` tables serializer: add `serializer<lw_shared_ptr<T>>` specialization serializer: add `deserialize` function overload for `bytes_ostream`	2021-01-29 11:40:39 +02:00
Pavel Solodovnikov	cf5b8c4b79	raft: create `system.raft` and `system.raft_snapshots` tables The system raft table will be used as backend storage for implementing the raft persistence module in Scylla. It combines the raft log, the persisted vote and term, and snapshot info. The table is partitioned by group id, thus allowing multi-raft operation. The rest of the table structure mirrors the fields of the corresponding core raft structures defined in `raft.hh`, such as `raft::log_entry`. The raft table stores only the latest snapshot id, while the actual snapshot will be available in a separate table called `system.raft_snapshots`. The schema of `raft_snapshots` mirrors the fields of the `raft::snapshot` structure. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-01-29 01:59:04 +03:00
Kamil Braun	bf115e7d69	schema_tables: put schema tables on shard 0 We use a custom sharder for all schema tables: every table under the `system_schema` keyspace, plus `system.scylla_table_schema_history`. This sharder puts all data on shard 0. To achieve this, we hardcode the sharder in initial schema object definitions. Furthermore - since the sharder is not stored inside schema mutations yet - whenever we deserialize schema objects from mutations, we modify the sharder based on the schema's keyspace and table names. A regression test is added to ensure no one forgets to set the special sharder for newly added schema tables. This test assumes that all newly added schema tables will end up in the `system_schema` keyspace (other tables may go unnoticed, unfortunately). Closes #7947	2021-01-28 13:28:22 +02:00
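The sharder-selection rule can be sketched as follows. `sharder_sketch`, `is_schema_table` and `sharder_for` are hypothetical, and the `token % shard_count` placement for ordinary tables is a toy stand-in for the real token-to-shard mapping; only the shard-0 rule for `system_schema` tables and `system.scylla_table_schema_history` comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical sharder; not ScyllaDB's dht::sharder API.
struct sharder_sketch {
    unsigned shard_count;
    bool everything_on_shard_zero;
    unsigned shard_of(uint64_t token) const {
        if (everything_on_shard_zero) {
            return 0;  // all schema-table data lives on shard 0
        }
        return static_cast<unsigned>(token % shard_count);  // toy placement for ordinary tables
    }
};

// Every table in system_schema, plus system.scylla_table_schema_history,
// gets the shard-0 sharder.
bool is_schema_table(std::string_view ks, std::string_view cf) {
    return ks == "system_schema" ||
           (ks == "system" && cf == "scylla_table_schema_history");
}

// Applied both to hardcoded schema definitions and after deserializing a
// schema from mutations, since the sharder is not stored in the mutations.
sharder_sketch sharder_for(std::string_view ks, std::string_view cf, unsigned shards) {
    return sharder_sketch{shards, is_schema_table(ks, cf)};
}
```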
Avi Kivity	114da51d73	Revert "commitlog: fix size of a write used to zero a segment" This reverts commit `df2f67626b`. The fix is correct, but has an unfortunate side effect with O_DSYNC: each 128k write also needs to flush the XFS log. This translates to 32MB/128k = 256 flushes, compared to one flush with the original code. A better fix would be to prezero without O_DSYNC, then reopen the file with O_DSYNC, but we can do that later. Reopens #5857.	2021-01-20 10:23:43 +02:00
Tomasz Grabiec	94749b01eb	Merge "futurize flat_mutation_reader::next_partition" from Benny The main motivation for this patchset is to prepare for adding an async close() method to flat_mutation_reader. In order to close the reader before destroying it in all paths, we need to make next_partition asynchronous so it can asynchronously close the current reader before destroying it, e.g. by reassignment of flat_mutation_reader_opt, as done in scanning_reader::next_partition. Test: unit(release, debug) * git@github.com:bhalevy/scylla.git futurize-next-partition-v1: flat_mutation_reader: return future from next_partition multishard_mutation_query: read_context: save_reader: destroy reader_meta from the calling shard mutation_reader: filtering_reader: fill_buffer: futurize inner loop flat_mutation_reader::impl: consumer_adapter: futurize handle_result flat_mutation_reader: consume_pausable/in_thread: futurize_invoke consumer flat_mutation_reader: FlatMutationReaderConsumer: support also async consumer flat_mutation_reader:impl: get rid of _consume_done member	2021-01-19 10:19:03 +02:00
Avi Kivity	60f5ec3644	Merge 'managed_bytes: switch to explicit linearization' from Michał Chojnowski This is a revival of #7490. Quoting #7490: The managed_bytes class now uses implicit linearization: outside LSA, data is never fragmented, and within LSA, data is linearized on-demand, as long as the code is running within with_linearized_managed_bytes() scope. We would like to stop linearizing managed_bytes and keep it fragmented at all times, since linearization can require large contiguous chunks. Large contiguous allocations are hard to satisfy and cause latency spikes. As a first step towards that, we remove all implicitly linearizing accessors and replace them with an explicit linearization accessor, with_linearized(). Some of the linearization happens long before use, by creating a bytes_view of the managed_bytes object and passing it onwards, perhaps storing it for later use. This does not work with with_linearized(), which creates a temporary linearized view, and does not work towards the longer term goal of never linearizing. As a substitute a managed_bytes_view class is introduced that acts as a view for managed_bytes (for interoperability it can also be a view for bytes and is compatible with bytes_view). By the end of the series, all linearizations are temporary, within the scope of a with_linearized() call and can be converted to fragmented consumption of the data at leisure. This has limited practical value directly, as current uses of managed_bytes are limited to keys (which are limited to 64k). However, it enables converting the atomic_cell layer back to managed_bytes (so we can remove IMR) and the CQL layer to managed_bytes/managed_bytes_view, removing contiguous allocations from the coordinator. 
Closes #7820 * github.com:scylladb/scylla: test: add hashers_test memtable: fix accounting of managed_bytes in partition_snapshot_accounter test: add managed_bytes_test utils: fragment_range: add a fragment iterator for FragmentedView keys: update comments after changes and remove an unused method mutation_test: use the correct preferred_max_contiguous_allocation in measuring_allocator row_cache: more indentation fixes utils: remove unused linearization facilities in `managed_bytes` class misc: fix indentation treewide: remove remaining `with_linearized_managed_bytes` uses memtable, row_cache: remove `with_linearized_managed_bytes` uses utils: managed_bytes: remove linearizing accessors keys, compound: switch from bytes_view to managed_bytes_view sstables: writer: add write_* helpers for managed_bytes_view compound_compat: transition legacy_compound_view from bytes_view to managed_bytes_view types: change equal() to accept managed_bytes_view types: add parallel interfaces for managed_bytes_view types: add to_managed_bytes(const sstring&) serializer_impl: handle managed_bytes without linearizing utils: managed_bytes: add managed_bytes_view::operator[] utils: managed_bytes: introduce managed_bytes_view utils: fragment_range: add serialization helpers for FragmentedMutableView bytes: implement std::hash using appending_hash utils: mutable_view: add substr() utils: fragment_range: add compare_unsigned utils: managed_bytes: make the constructors from bytes and bytes_view explicit utils: managed_bytes: introduce with_linearized() utils: managed_bytes: constrain with_linearized_managed_bytes() utils: managed_bytes: avoid internal uses of managed_bytes::data() utils: managed_bytes: extract do_linearize_pure() thrift: do not depend on implicit conversion of keys to bytes_view clustering_bounds_comparator: do not depend on implicit conversion of keys to bytes_view cql3: expression: linearize get_value_from_mutation() earlier bytes: add to_bytes(bytes) cql3: expression: mark do_get_value() as static	2021-01-18 11:01:28 +02:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Calle Wilund	4be718ebfa	commitlog: Force earlier cycle/flush iff segment reserve is empty Attempt to hurry flushing/segment delete/recycle if we are trying to get a segment for allocation, and reserve is empty when above disk threshold. This is to minimize the time spent waiting on the allocation semaphore.	2021-01-11 12:45:36 +00:00
Calle Wilund	be8c359a62	commitlog: Make segment allocation wait iff disk usage > max Instead of allowing new segments to be added, explicitly wait for either disk delete or recycle to happen iff current disk usage is larger than limit.	2021-01-11 12:45:36 +00:00
Calle Wilund	ab55a1b4e6	commitlog: Do partial (memtable) flushing based on threshold Instead of asking to flush data for all segments, just request up to an RP where we get comfortably below disk usage threshold.	2021-01-11 12:45:10 +00:00
Michał Chojnowski	dbcf987231	keys, compound: switch from bytes_view to managed_bytes_view The keys classes (partition_key et al) already use managed_bytes, but they assume the data is not fragmented and make liberal use of that by casting to bytes_view. The view classes use bytes_view. Change that to managed_bytes_view, and adjust return values to managed_bytes/managed_bytes_view. The callers are adjusted. In some places linearization (to_bytes()) is needed, but this isn't too bad as keys are always <= 64k and thus will not be fragmented when out of LSA. We can remove this linearization later. The serialize_value() template is called from a long chain, and can be reached with either bytes_view or managed_bytes_view. Rather than trace and adjust all the callers, we patch it now with constexpr if. operator bytes_view (in keys) is converted to operator managed_bytes_view, allowing callers to defer or avoid linearization.	2021-01-08 14:16:08 +01:00
Calle Wilund	7c84b16cd8	commitlog: Make flush threshold configurable	2021-01-05 18:16:09 +00:00
Avi Kivity	43a2636229	Merge "Remove proxy from size-estimates reader" from Pavel E " The size_estimates_mutation_reader calls the global proxy to get the database from it. The database is used to find keyspaces to work with. However, it's safe to keep the local database reference on the reader itself. tests: unit(debug) " * 'br-no-proxy-in-size-estimate-reader' of https://github.com/xemul/scylla: size_estimate_reader: Use local db reference not global size_estimate_reader: Keep database reference on mutation reader size_estimate_reader: Keep database reference on virtual_reader	2021-01-05 11:28:09 +02:00
Pavel Emelyanov	9632af5d6b	schema_tables: Drop unused merge_schema overload After commit `d3aa1759`, one of them became unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210105051724.5249-1-xemul@scylladb.com>	2021-01-05 11:25:22 +02:00
Piotr Sarna	d5da455d95	schema_tables: describe calculate_schema_digest better - the mystical `accept_predicate` is renamed to `accept_keyspace` to be more self-descriptive - a short comment is added to the original calculate_schema_digest function header, mentioning that it computes schema digest for non-system keyspaces Refs #7854 Message-Id: <04f1435952940c64afd223bd10a315c3681b1bef.1609763443.git.sarna@scylladb.com>	2021-01-04 14:46:17 +02:00
Piotr Sarna	13a60b02ea	schema_tables: allow custom predicates in schema digest calc For testing purposes it would be useful to be able to skip certain tables (namely, internal distributed tables) when computing the schema digest. To allow that, a function that accepts a custom predicate is added.	2021-01-04 10:11:41 +01:00
Piotr Sarna	f293c59a46	system_keyspace: migrate helper functions to string_view Functions for checking if the keyspace is system/internal were based on sstring references, which is impractical compared to string views and may lead to unnecessary creation of sstring instances.	2021-01-04 09:47:01 +01:00
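A sketch of a string_view-based helper, for illustration only. `std::string` stands in for `seastar::sstring`, and the keyspace list is a hypothetical two-entry example rather than the real set. Because the parameter is a `std::string_view`, a call with a string literal no longer has to build a temporary owning string first.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical keyspace list for illustration; the real set lives in the source tree.
constexpr std::array<std::string_view, 2> system_keyspaces{"system", "system_schema"};

// Taking std::string_view accepts a literal, a std::string (standing in for
// seastar::sstring here) or any other contiguous string without copying it.
constexpr bool is_system_keyspace(std::string_view name) {
    for (auto ks : system_keyspaces) {
        if (ks == name) {
            return true;
        }
    }
    return false;
}
```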
Gleb Natapov	d3aa17591c	migration_manager: drop announce_locally flag It looks like the history of the flag begins in Cassandra's https://issues.apache.org/jira/browse/CASSANDRA-7327 where it was introduced to speed up tests by not needing to start the gossiper. The thing is, we always start the gossiper in our cql tests, so the flag only introduces noise. And, of course, since we want to move schema to use raft, and applying a modification only locally goes against the nature of raft, we had better get rid of the capability ASAP. Tests: units(dev, debug) Message-Id: <20201230111101.4037543-2-gleb@scylladb.com>	2021-01-03 13:58:09 +02:00
Gleb Natapov	491f10bb70	schema-tables: make schema update global when fixing legacy SI tables When a node notices that it uses legacy SI tables, it converts them to the new format, but it updates only the local schema. This only causes schema discrepancy between nodes; the schema change should propagate globally. Fixes #7857. Message-Id: <20201230111101.4037543-1-gleb@scylladb.com>	2021-01-03 13:57:46 +02:00
Pavel Solodovnikov	219ac2bab5	large_data_handler: fix segmentation fault when constructing `data_value` from a `nullptr` It turns out that `cql_table_large_data_handler::record_large_rows` and `cql_table_large_data_handler::record_large_cells` were broken for reporting static cells and static rows from the very beginning: in case a large static cell or a large static row is encountered, it tries to execute `db::try_record` with `nullptr` additional values, denoting that there is no clustering key to be recorded. These values are then passed to `qctx.execute_cql()`, which creates `data_value` instances for each statement parameter, hence invoking `data_value(nullptr)`. This uses the `const char*` overload, which delegates to the `std::string_view` ctor overload. Passing a `nullptr` pointer to the `std::string_view` ctor is UB, which leads to segmentation faults in the aforementioned large data reporting code. What we want here is to make a null `data_value` instead, so just add an overload specifically for `std::nullptr_t`, which will create a null `data_value` with `text` type. A regression test is provided for the issue (written in the `cql-pytest` framework). Tests: test/cql-pytest/test_large_cells_rows.py Fixes: #6780 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20201223204552.61081-1-pa.solodovnikov@scylladb.com>	2020-12-24 11:37:43 +02:00
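The overload-resolution fix can be reproduced with a toy type. `text_value` is a hypothetical stand-in for `data_value` that models only a nullable text value; it is a sketch of the technique, not the actual patch. With only the `const char*` and `std::string_view` constructors, `text_value(nullptr)` would select `const char*` and then build a `std::string_view` from a null pointer; the `std::nullptr_t` overload makes that call an exact match that yields a null value instead.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for data_value: a text value that may be null.
class text_value {
    std::optional<std::string> _v;  // std::nullopt represents a CQL null
public:
    text_value(std::string_view s) : _v(std::string(s)) {}
    // Without the nullptr_t overload below, text_value(nullptr) would land
    // here and construct std::string_view from a null pointer (UB).
    text_value(const char* s) : text_value(std::string_view(s)) {}
    // Exact match for nullptr, so it wins over the const char* conversion.
    text_value(std::nullptr_t) : _v(std::nullopt) {}
    bool is_null() const { return !_v.has_value(); }
    const std::string& get() const { return *_v; }
};
```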
Konstantin Osipov	2c46938c2a	commitlog: avoid a syscall in the most common case of segment recycle When recycling a segment in O_DSYNC mode, if the size of the segment is neither shrunk nor grown, avoid calling file::truncate() or file::allocate(). Message-Id: <20201215182332.1017339-2-kostja@scylladb.com>	2020-12-16 14:57:36 +02:00
Konstantin Osipov	b6c6cc275f	commitlog: align input of dma_write() during segment recycle Normally a file size should be aligned to the block size, since we never write an unaligned size to it. However, we're not protected against partial writes. Just to be safe, align up the number of bytes to zero-fill when recycling a segment. Message-Id: <20201211142628.608269-4-kostja@scylladb.com>	2020-12-14 12:16:18 +02:00
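Rounding up to the block size is the usual power-of-two mask trick. A standalone sketch follows; `align_up` is defined locally rather than taken from any library, and the 4096-byte block size in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Round n up to the next multiple of alignment, which must be a power of
// two (e.g. the device block size), so a direct-I/O write stays aligned
// even if a partial write left the file at an unaligned length.
constexpr uint64_t align_up(uint64_t n, uint64_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}
```

For example, if a partial write left 5000 bytes to zero-fill on a device with 4096-byte blocks, `align_up(5000, 4096)` yields 8192, so the zero-fill write covers two whole blocks.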
Konstantin Osipov	ad6817bcde	commitlog: fix typo in a comment Message-Id: <20201211142628.608269-2-kostja@scylladb.com>	2020-12-14 12:16:14 +02:00
Pavel Emelyanov	3a025cfa52	schema-tables: Use db from make_update_table_mutations in make_update_indices_mutations Two halves of the tunnel finally connect -- the latter helper needs the local database instance and is only called by the former one which already has it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:23:53 +03:00
Pavel Emelyanov	89fd524c5a	schema-tables: Add database argument to make_update_table_mutations There are 3 callers of this helper (cdc, migration manager and tests) and all of them already have the database object at hand. The argument will be used by the next patch to remove the call for the global storage proxy instance from make_update_indices_mutations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:21:22 +03:00
Pavel Emelyanov	1bcef04c7a	schema-tables: Factor out calls getting database instance The make_update_indices_mutations gets the database instance for two things -- to find the cf to work with and to get the value of a feature for index view creation. To serve both, and to remove calls for the global storage proxy and service instances, get the database once at the function entrance. Next patch will clean this further. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:17:11 +03:00
Pavel Emelyanov	6dd10e771d	index-manager: Move feature evaluation one level up The create_view_for_index needs to know the state of the correct-idx-token-in-secondary-index feature. To get it, the method takes quite a long route through the global storage service instance. Since there's only one caller of the method in question, and the method is called in a loop, it's a bit faster to get the feature value in the caller and pass it as an argument. This will also help to get rid of the call for the global storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 21:14:12 +03:00
Pavel Emelyanov	3a3ee45488	size_estimate_reader: Use local db reference not global The get_next_partition uses global proxy instance to get the local database reference. Now it's available in the reader object itself, so it's possible to remove this call for global storage proxy. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:38:21 +03:00
Pavel Emelyanov	107dcbfbd6	size_estimate_reader: Keep database reference on mutation reader This reader uses the local database instance in its get_next_partition method to find keyspaces to work with. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:34:54 +03:00
Pavel Emelyanov	48e494fb62	size_estimate_reader: Keep database reference on virtual_reader The database will be then used to create the mutation reader Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-11 20:31:35 +03:00
Nadav Har'El	781f9d9aca	alternator: make default timeout configurable Whereas in CQL the client can pass a timeout parameter to the server, in the DynamoDB API there is no such feature; the server needs to choose reasonable timeouts for its own internal operations - e.g., writes to disk, querying other replicas, etc. Until now, Alternator had a fixed timeout of 10 seconds for its requests. This choice was reasonable - it is much higher than we expect during normal operations, and still lower than the client-side timeouts that some DynamoDB libraries have (boto3 has a one-minute timeout). However, there's nothing holy about this number of 10 seconds; some installations might want to change this default. So this patch adds a configuration option, "--alternator-timeout-in-ms", to choose this timeout. As before, it defaults to 10 seconds (10,000ms). In particular, some test runs are unusually slow - consider for example testing a debug build (which is already very slow) on an extremely over-committed test host. In some cases (see issue #7706) we noticed the 10 second timeout was not enough. So in this patch we increase the default timeout chosen in the "test/alternator/run" script to 30 seconds. Please note that as the code is structured today, this timeout only applies to some operations, such as GetItem, UpdateItem or Scan, but does not apply to CreateTable, for example. This is a pre-existing issue that this patch does not change. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20201207122758.2570332-1-nyh@scylladb.com>	2020-12-09 14:30:43 +01:00
Avi Kivity	f802356572	Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb"" This reverts commit `dc77d128e9`. It was reverted due to a strange and unexplained diff, which is now explained. The HEAD on the working directory being pulled from was set back, so git thought it was merging the intended commits, plus all the work that was committed from HEAD to master. So it is safe to restore it.	2020-12-08 19:19:55 +02:00


1956 Commits