scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	aad1307b14	row_cache, memtable: Use upgrade_schema()	2019-10-03 13:28:33 +02:00
Tomasz Grabiec	3177732b35	flat_mutation_reader: Introduce upgrade_schema()	2019-10-03 13:28:33 +02:00
Asias He	a9b95f5f01	repair: Fix tracker::start and tracker::done in case of error The operation after gate.enter() in tracker::start() can fail and throw, we should call gate.leave() in such case to avoid unbalanced enter and leave calls. tracker::done() has similar issue too. Fix it by removing the gate enter and leave logic in tracker start and done. A helper tracker::run() is introduced to take care of the gate and repair status. In addition, the error log is improved. It now logs exceptions on all shards in the summary. e.g., [shard 0] repair - repair id 1 failed: std::runtime_error ({shard 0: std::runtime_error (error0), shard 1: std::runtime_error (error1)}) Fixes #5074	2019-10-03 13:33:02 +03:00
Botond Dénes	00b432b61d	querier_cache: correctly account entries evicted on insertion in the population Currently, the population stat is not increased for entries that are evicted immediately on insert, however the code that does the eviction still decreases the population stat, leading to an imbalance and in some cases the underflow of the population stat. To fix, unconditionally increase the population stat upon inserting an entry, regardless of whether it is immediately evicted or not. Fixes: #5123 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>	2019-10-03 11:49:44 +03:00
Konstantin Osipov	e8c13efb41	lwt: move mutation hashers to mutation.hh Prepare mutation hashers for reuse in CAS implementation. Message-Id: <20190930202409.40561-2-kostja@scylladb.com>	2019-10-01 19:49:31 +02:00
Konstantin Osipov	6cde985946	lwt: remove code that no longer servers as a reference Remove ifdef'ed Java code, since LWT implementation is based on the current state of the origin. Message-Id: <20190930201022.40240-2-kostja@scylladb.com>	2019-10-01 19:46:15 +02:00
Konstantin Osipov	4d214b624b	lwt: ensure enum_set::of is constexpr. This allows using it to initialize const static members. Message-Id: <20190930200530.40063-2-kostja@scylladb.com>	2019-10-01 19:45:56 +02:00
Tomasz Grabiec	3b9bf9d448	Merge "storage_proxy: replace variadic futures with structs" from Avi Seastar variadic futures are deprecated, so replace with structs to avoid nasty deprecation warnings.	2019-10-01 19:32:55 +02:00
Avi Kivity	162730862d	storage_proxy: remove variadic future from query_partition_key_range_concurrent() Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:33:44 +03:00
Avi Kivity	968b34a2b4	storage_proxy: remove variadic future from digest_read_resolver Seastar variadic futures are deprecated, so replace with a nice struct.	2019-09-30 21:32:17 +03:00
Nadav Har'El	c9aae13fae	docs/alternator/getting-started.md: fix indentation in example code The example Python code had wrong indentation, and wouldn't actually work if naively copy-pasted. Noticed by Noam Hasson. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190929091440.28042-1-nyh@scylladb.com>	2019-09-30 13:03:29 +03:00
Avi Kivity	c6b66d197b	Merge "Couple of preparatory patches for lwt" from Gleb " This is a collection of assorted patches that will be needed for LWT. Most of them are trivial, but one touches a lot of files, so have a good chance to cause rebase headache (I already had to rebase it on top of Alternator). Lets push them earlier instead of carrying them in the lwt branch. " * 'gleb/lwt-prepare-v2' of github.com:scylladb/seastar-dev: lwt: make _last_timestamp_micros static lwt: Add client_state::get_timestamp_for_paxos() function lwt: Pass client_state reference all the way to storage_proxy::query exceptions: Add a constructor for unavailable_exception that allows providing a custom message serializer: Add std::variant support lwt: Add missing functions to utils/UUID_gen.hh	2019-09-29 13:02:26 +03:00
Avi Kivity	9e990725d9	Merge "Simplify and explain from_varint_to_integer #5031 " from Rafael " This is the second version of the patch series. The previous one was just the second patch, this one adds more tests an another patch to make it easier to test that the new code has the same behavior as the old one. " * 'espindola/overflow-is-intentional' of https://github.com/espindola/scylla: types: Simplify and explain from_varint_to_integer Add more cast tests	2019-09-29 11:27:55 +03:00
Tomasz Grabiec	b0e0f29b06	db: read: Filter-out sstables using its first and last keys Affects single-partition reads only. Refs #5113 When executing a query on the replica we do several things in order to narrow down the sstable set we read from. For tables which use LeveledCompactionStrategy, we store sstables in an interval set and we select only sstables whose partition ranges overlap with the queried range. Other compaction strategies don't organize the sstables and will select all sstables at this stage. The reasoning behind this is that for non-LCS compaction strategies the sstables' ranges will typically overlap and using interval sets in this case would not be effective and would result in quadratic (in sstable count) memory consumption. The assumption for overlap does not hold if the sstables come from repair or streaming, which generates non-overlapping sstables. At a later stage, for single-partition queries, we use the sstables' bloom filter (kept in memory) to drop sstables which surely don't contain given partition. Then we proceed to sstable indexes to narrow down the data file range. Tables which don't use LCS will do unnecessary I/O to read index pages for single-partition reads if the partition is outside of the sstable's range and the bloom filter is ineffective (Refs #5112). This patch fixes the problem by consulting sstable's partition range in addition to the bloom filter, so that the non-overlapping sstables will be filtered out with certainty and not depend on bloom filter's efficiency. It's also faster to drop sstables based on the keys than the bloom filter. Tests: - unit (dev) - manual using cqlsh Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>	2019-09-28 19:42:57 +03:00
Tomasz Grabiec	b93cc21a94	sstables: Fix partition key count estimation for a range The method sstable::estimated_keys_for_range() was severely under-estimating the number of partitions in an sstable for a given token range. The first reason is that it underestimated the number of sstable index pages covered by the range, by one. In extreme, if the requested range falls into a single index page, we will assume 0 pages, and report 1 partition. The reason is that we were using get_sample_indexes_for_range(), which returns entries with the keys falling into the range, not entries for pages which may contain the keys. A single page can have a lot of partitions though. By default, there is a 1:20000 ratio between summary entry size and the data file size covered by it. If partitions are small, that can be many hundreds of partitions. Another reason is that we underestimate the number of partitions in an index page. We multiply the number of pages by: (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval) / _components->summary.header.sampling_level Using defaults, that means multiplying by 128. In the cassandra-stress workload a single partition takes about 300 bytes in the data file and summary entry is 22 bytes. That means a single page covers 22 * 20'000 = 440'000 bytes of the data file, which contains about 1'466 partitions. So we underestimate by an order of magnitude. Underestimating the number of partitions will result in too small bloom filters being generated for the sstables which are the output of repair or streaming. This will make the bloom filters ineffective which results in reads selecting more sstables than necessary. The fix is to base the estimation on the number of index pages which may contain keys for the range, and multiply that by the average key count per index page. Fixes #5112. Refs #4994. The output of test_key_count_estimation: Before: count = 10000 est = 10112 est([-inf; +inf]) = 512 est([0; 0]) = 128 est([0; 63]) = 128 est([0; 255]) = 128 est([0; 511]) = 128 est([0; 1023]) = 128 est([0; 4095]) = 256 est([0; 9999]) = 512 est([5000; 5000]) = 1 est([5000; 5063]) = 1 est([5000; 5255]) = 1 est([5000; 5511]) = 1 est([5000; 6023]) = 128 est([5000; 9095]) = 256 est([5000; 9999]) = 256 est(non-overlapping to the left) = 1 est(non-overlapping to the right) = 1 After: count = 10000 est = 10112 est([-inf; +inf]) = 10112 est([0; 0]) = 2528 est([0; 63]) = 2528 est([0; 255]) = 2528 est([0; 511]) = 2528 est([0; 1023]) = 2528 est([0; 4095]) = 5056 est([0; 9999]) = 10112 est([5000; 5000]) = 2528 est([5000; 5063]) = 2528 est([5000; 5255]) = 2528 est([5000; 5511]) = 2528 est([5000; 6023]) = 5056 est([5000; 9095]) = 7584 est([5000; 9999]) = 7584 est(non-overlapping to the left) = 0 est(non-overlapping to the right) = 0 Tests: - unit (dev) Reviewed-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>	2019-09-28 19:36:43 +03:00
Piotr Sarna	10f90d0e25	types: remove deprecated comment The comment does not apply anymore, as this definition is no more in database.hh. Message-Id: <a0b6ff851e1e3bcb5fcd402fbf363be7af0219af.1569580556.git.sarna@scylladb.com>	2019-09-27 19:32:17 +02:00
Dejan Mircevski	9a89e0c5ec	dbuild: Update README on interactive mode `dbuild` was recently (`24c732057`) updated to run in interactive mode when given no arguments; we can now update the README to mention that. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:33:27 +02:00
Dejan Mircevski	f8638d8ae1	alternator: Add build byproducts to .gitignore Add .pytest_cache and expressions.tokens to the top-level .gitignore. Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-27 16:18:45 +02:00
Dejan Mircevski	332ffa77ea	alternator: Actually use BEGINS_WITH in its tests For some reason, BEGINS_WITH tests used EQ as comparison operator. Tests: pytest test_expected.py Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2019-09-26 22:41:34 +03:00
Tomasz Grabiec	5b0e48f25b	Merge "toppartitions: don't transport schema_ptr across shards" from Avi When the toppartitions operation gathers results, it copies partition keys with their schema_ptr:s. When these schema_ptr:s are copies or destroyed, they can cause leaks or premature frees of the schema in its original shard since reference count operations in are not atomic. Fix that by converting the schema_ptr to a global_schema_ptr during transportation. Fixes #5104 (direct bug) Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node) Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node) Tests: new test added that fails with the existing code in debug mode, manual toppartitions test	2019-09-26 17:09:54 +02:00
Avi Kivity	36b4d55b28	tests: add test for toppartitions cross-shard schema_ptr copy	2019-09-26 17:40:46 +03:00
Avi Kivity	670f398a8a	toppartitions: do not copy schema_ptr:s in item keys across shards Copying schema_ptrs across shards results in memory corruption since lw_shared_ptr does not use atomic operations for reference counts. Prevent that by converting schema_ptr:s to global_schema_ptr:s before shipping them across shards in the map operation, and converting them back to local schema_ptr:s in the reduce operation.	2019-09-26 17:26:40 +03:00
Avi Kivity	f015bd69b7	toppartitions: compare schemas using schema::id(), not pointer to schema This allows keys from different stages in the schema's like to compare equal. This is safe since the partition key cannot change, unlike the rest of the schema. More importantly, it will allow us to compare keys made local after a pass through global_schema_ptr, which does not guarantee that the schema_ptr conversion will be the same even when starting with the same global_schema_ptr.	2019-09-26 17:15:46 +03:00
Avi Kivity	ea4976a128	schema_registry: mark global_schema_ptr move constructor noexcept Throwing move constructors are a a pain; so we should try to make them noexcept. Currently, global_schema_ptr's move constructor throws an exception if used illegaly (moving from a different shard); this patch changes it to an assert, on the grounds that this error is impossible to recover from. The direct motivation for the patch is the desire to store objects containing a global_schema_ptr in a chunked_vector, to move lists of partition keys across shards for the topppartitions functionality. chunked_vector currently requires noexcept move constructors for its value_type.	2019-09-26 16:56:59 +03:00
Avi Kivity	ba64ec78cf	messaging_service: use rpc::tuple instead of variadic futures for rpc Since variadic future<> is deprecated, switch to rpc::tuple for multiple return values in rpc calls. This is more or less mechanical translation.	2019-09-26 12:09:31 +02:00
Tomasz Grabiec	9183e28f2c	Merge "Recreate dependent user types" from Rafael When a user type changes we were not recreating other uses types that use it. This patch series fixes that and makes it clear which code is responsible for it. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when a entry in system.types changes. Fixes #5049	2019-09-26 12:06:32 +02:00
Gleb Natapov	e0b303b432	lwt: make _last_timestamp_micros static If each client_state has its own copy of the variable two clients may generate timestamps that clash and needlessly create contention. Making the variable shared between all client_state on the same shard will make sure this will not happen to two clients on the same shard. It may still happen for two client on two different shards or two different nodes.	2019-09-26 11:44:00 +03:00
Gleb Natapov	622d21f740	lwt: Add client_state::get_timestamp_for_paxos() function Paxos needs a unique timestamp that is greater than some other timestamp, so that the next round had more chances to succeed. Add a function that returns such a timestamp.	2019-09-26 11:44:00 +03:00
Gleb Natapov	e72a105b5e	lwt: Pass client_state reference all the way to storage_proxy::query client_state holds a state to generate monotonically increasing unique timestamp. Queries with a SERIAL consistency level need it to generate a paxos round.	2019-09-26 11:44:00 +03:00
Gleb Natapov	556f65e8a1	exceptions: Add a constructor for unavailable_exception that allows providing a custom message	2019-09-26 11:44:00 +03:00
Gleb Natapov	209414b4eb	serializer: Add std::variant support	2019-09-26 11:44:00 +03:00
Gleb Natapov	f9209e27d4	lwt: Add missing functions to utils/UUID_gen.hh Some lwt related code is missing in our UUID implementation. Add it.	2019-09-26 11:44:00 +03:00
Rafael Ávila de Espíndola	5af8b1e4a3	types: recreate dependent user types. In the system.types table a user type refers to another by name. When a user type is modified, only its entry in the table is changed. At runtime a user type has direct pointer to the types it uses. To handle the discrepancy we need to recreate any dependent types when a entry in system.types changes. Fixes #5049 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	4c3209c549	types: Don't include dependent user types in update. The way schema changes propagate is by editing the system tables and comparing the before and after state. When a user type A uses another user type B and we modify B, the representation of A in the system table doesn't change, so this code was not producing any changes on the diff that the receiving side uses. Deleting it makes it clear that it is the receiver's responsibility to handle dependent user types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	34eddafdb0	types: Don't modify the type list in db::cql_type_parser::raw_builder With this patch db::cql_type_parser::raw_builder creates a local copy of the list of existing types and uses that internally. By doing that build() should have no observable behavior other than returning the new types. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:41:45 -07:00
Rafael Ávila de Espíndola	d6b2e3b23b	types: pass a reference to prepare_internal We were never passing a null pointer and never saving a copy of the lw_shared_ptr. Passing a reference is more flexible as not all callers are required to hold the user_types_metadata in a lw_shared_ptr. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-09-25 15:40:30 -07:00
Avi Kivity	03260dd910	Update seastar submodule * seastar b56a8c5045...c21a7557f9 (3): > net: socket::{set,get}_reuseaddr() should not be virtual > iotune: print verbose message in case of shutdown errors > iotune: close test file on shutdown Fixes #4946.	2019-09-25 16:08:32 +03:00
Tomasz Grabiec	06b9818e98	Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny 1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand. 2. Lookup the view_update_write_response_handler id before calling timeout_cb and tolerate it not found. Just log a warning if this happened. Fixes #5032	2019-09-25 14:49:42 +02:00
Avi Kivity	83bc59a89f	Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099 )" from Tomasz " Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during table schema altering when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095. " * 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla: mvcc: Fix incorrect schema verison being used to copy the mutation when applying mutation_partition: Track and validate schema version in debug builds tests: Use the correct schema to access mutation_partition	2019-09-25 15:30:22 +03:00
Tomasz Grabiec	11440ff792	mvcc: Fix incorrect schema verison being used to copy the mutation when applying Currently affects only counter tables. Introduced in `27014a2`. mutation_partition(s, mp) is incorrect, because it uses s to interpret mp, while it should use mp_schema. We may hit this if the current node has a newer schema than the incoming mutation. This can happen during alter when we receive the mutation from a node which hasn't processed the schema change yet. This is undefined behavior in general. If the alter was adding or removing columns, this may result in corruption of the write where values of one column are inserted into a different column. Fixes #5095.	2019-09-25 11:28:07 +02:00
Tomasz Grabiec	bce0dac751	mutation_partition: Track and validate schema version in debug builds This patch makes mutation_partition validate the invariant that it's supposed to be accessed only with the schema version which it conforms to. Refs #5095	2019-09-25 10:27:06 +02:00
Avi Kivity	721fa44c4f	Update seastar submodule * seastar e51a1a8ed9...b56a8c5045 (3): > net: add support for UNIX-domain sockets > future: Warn on promise::set_exception with no corresponding future or task > Merge "Handle exceptions in repeat_until_value and misc cleanups" from Rafael	2019-09-25 11:21:57 +03:00
Benny Halevy	e9388b3f03	storage_proxy::drain_on_shutdown fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	b7c7af8a75	storage_proxy: validate id from view_update_handlers_list Handle a race where a write handler is removed from _response_handlers but not yet from _view_update_handlers_list. Fixes #5032 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	1fea5f5904	storage_proxy: refactor remove_response_handler Refactor remove_response_handler_entry out of remove_response_handler, to be called on a valid iterator found by _response_handlers.find(id). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Benny Halevy	592c4bcfc2	storage_proxy: remove_response_handler: assert id was found Help identify cases like seen in #5032 where the handler id wasn't found from the on_down -> timeout_cb path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-25 11:19:50 +03:00
Raphael S. Carvalho	571fa94eb5	sstables/compaction_manager: Don't perform upgrade on shared SSTables compaction_manager::perform_sstable_upgrade() fails when it feeds compaction mechanism with shared sstables. Shared sstables should be ignored when performing upgrade and so wait for reshard to pick them up in parallel. Whenever a shared sstable is brought up either on restart or via refresh, reshard procedure kicks in. Reshard picks the highest supported format so the upgrade for shared sstable will naturally take place. Fixes #5056. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>	2019-09-25 11:18:40 +03:00
Asias He	19e8c14ad1	gossiper: Improve the gossip timer callback lock handling (#5097 ) - Update the outdated comments in do_stop_gossiping. It was storage_service not storage_proxy that used the lock. More importantly, storage_service does not use it any more. - Drop the unused timer_callback_lock and timer_callback_unlock API - Use with_semaphore to make sure the semaphore usage is balanced. - Add log in gossiper::do_stop_gossiping when it tries to take the semaphore to help debug hang during the shutdown. Refs: #4891 Refs: #4971	2019-09-25 10:46:38 +03:00
Tomasz Grabiec	4d9b176aaa	tests: Use the correct schema to access mutation_partition	2019-09-24 19:46:57 +02:00
Botond Dénes	425cc0c104	doc: add debugging.md A documentation file that is intended to be a place for anything debugging related: getting started tutorial, tips and tricks and advanced guides. For now it contains a short introductions, some selected links to more in-depth documentation and some trips and tricks that I could think off the top of my head. One of those tricks describes how to load cores obtained from relocatable packages inside the `dbuild` container. I originally intended to add that to `tools/toolchain/README.md` but was convinced that `docs/debugging.md` would be a better place for this. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>	2019-09-24 20:18:45 +03:00

1 2 3 4 5 ...

19720 Commits