Store the failure_detector object inside the gossiper object.
- No more global sharded<failure_detector> object
- No need to initialize sharded<failure_detector> manually, which
simplifies the code in tests/cql_test_env.cc and init.cc.
Switch failure_detector_source_filter to use get_local_gossiper::is_alive
directly since we are going to remove the static
gms::get_local_failure_detector object soon.
Pass the nodes that are down to the filter directly, so that
range_streamer does not depend on the gossiper at all.
"
This series moves sstables uploaded via `nodetool refresh` to the
staging/ directory if view updates need to be generated from them.
The previous behavior (leaving these sstables in the upload/ directory
until view updates are generated) might have caused sstables with
conflicting names to be mistakenly overwritten by the user.
Fixes #4047
Tests: unit (dev)
dtest: backup_restore_tests.py, plus backup_restore_tests.py modified
to include materialized view definitions
"
* 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla:
sstables: simplify requires_view_building
loader: move uploaded view pending sstables to staging
The current code captures a reference to rpc::client in a continuation, but
there is no guarantee that the reference will still be valid when the
continuation runs. Capture a shared pointer to rpc::client instead.
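A minimal sketch of the pattern, with hypothetical names (do_something()
and next_step() stand in for the real calls; Seastar-style futures assumed):

    // Unsafe: 'client' is captured by reference and may be destroyed
    // before the continuation runs:
    //   return do_something(client).then([&client] { return client.next_step(); });

    // Safe: the captured shared pointer keeps the rpc::client alive until
    // the continuation has run.
    auto c = seastar::make_shared<rpc::client>(/* ... */);
    return do_something(*c).then([c] { return c->next_step(); });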
Fixes #4350.
Message-Id: <20190314135538.GC21521@scylladb.com>
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:
    utils::phased_barrier::phase_type creation_phase() const {
        assert(_reader);
        return _reader_creation_phase;
    }
That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but finds
the new range to be empty:
    if (!_reader || _reader_creation_phase != phase) {
        if (_last_key) {
            auto cmp = dht::ring_position_comparator(*_cache._schema);
            auto&& new_range = _range.split_after(*_last_key, cmp);
            if (!new_range) {
                _reader = {};
                return make_ready_future<mutation_fragment_opt>();
            }
Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.
If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since the cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called in this case; it just makes the value more appropriate for the
new semantics.
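Under the new semantics the accessor reduces to (a sketch, assuming
_reader_creation_phase is zero-initialized):

    // The phase in which the reader was last created, or 0 if it was
    // never created. Meaningful even after _reader has been cleared.
    utils::phased_barrier::phase_type creation_phase() const {
        return _reader_creation_phase;
    }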
Tests:
- unit.row_cache_test (debug)
Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
Commit 71bf757b2c added a call to
wait_for_gossip_to_settle() in storage_service::prepare_to_join(),
which takes some time to complete.
tests/cql_query_test calls init_server with do_bind == false, which in
turn calls storage_service::prepare_to_join(). Since the test runs only
one node, there is no point in waiting for gossip to settle.
To make the cql_query_test fast again, do not call
wait_for_gossip_to_settle if do_bind is false.
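A minimal sketch of the change (the surrounding code is assumed, not
quoted from the patch):

    // Skip waiting for gossip to settle when we are not binding to the
    // network, e.g. in single-node test setups.
    if (do_bind) {
        wait_for_gossip_to_settle().get();
    }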
Before this patch, cql_query_test took a very long time to complete.
After it, the test takes 10s.
Tests: tests/cql_query_test
Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as the base data.
It addresses the performance issue of an indexed query
that also specifies a partition key: the index will be queried
locally.
"
* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
tests: add cases for local index prefix optimization
tests: add create/drop local index test case
tests: add non-standard names cases to local index tests
tests: add multi pk case for local index tests
tests: add test for malformed local index definitions
tests: add local index paging test
tests: add local indexing test
cql3: add CREATE INDEX syntax for local indexes
cql3: use serialization function to create index target string
index: add serialization function for index targets
index: use proper local index target when adding index
index: add parsing target column name from local index targets
db: add checking for local index in schema tables
index: add checking if serialized target implies local index
index: enable parsing multi-key targets
index: move target parser code to .cc file
json: add non-throwing overload for to_json_value
cql3: add checking for local indexes in has_supporting_index()
cql3: move finding index restrictions to prepare stage
cql3: add picking an index by score
...
When a materialized view was created, the verification code artificially
forbade creating a view without a clustering key column. However, there
is no real reason to forbid this. In the trivial case, the original base
table might not have had a clustering key, and the view might want to use
the exact same key. In a more complex case, a view may want to have all the
primary key columns as *partition* key columns, and that should be fine.
The patch also includes a regression test, which failed before this patch,
and succeeds with it (we test that we can create materialized views in both
aforementioned scenarios, and these materialized views work as expected).
Duarte raised the opinion that the "trivial" case of a view table with
a key identical to that of the base should be disallowed. However, this
should be done, if at all (I think it shouldn't), in a follow-up patch,
which will implement the non-triviality requirement consistently (e.g.,
require view primary key to be different from base's, regardless of
the existence or non-existence of clustering columns).
Fixes #4340.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320122925.10108-1-nyh@scylladb.com>
When we are tracing requests, we would like to know everything that
happened to a query that can contribute to it having increased
latencies.
We insert some of those latencies explicitly due to throttling, but we
do not log that into tracing.
In the case of storage proxy, we do have a log message at trace level,
but it is rarely used: trace messages are too heavy a hammer, there
is no way to target specific queries, etc.
The correct place for that is CQL tracing. This patch moves that message
to CQL tracing. We also add a matching tracepoint assuring us that no
delay happened, when that is the case.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190320163350.15075-1-glauber@scylladb.com>
"
Static compact tables are tables with compact storage and no
clustering columns.
Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of as static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.
This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in our schema, and
returning no rows as a result. Also, Cassandra and the Scylla tools
would have problems reading those sstables.
Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.
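A sketch of the gating, with hypothetical helper names (the actual flag
is CORRECT_STATIC_COMPACT, per the commit list below):

    // Use the Cassandra-compatible static-row layout only once every node
    // in the cluster advertises the new feature; otherwise keep writing
    // the legacy clustered-row layout so older nodes can still read it.
    if (cluster_supports_correct_static_compact()) { // hypothetical helper
        write_static_row(row);
    } else {
        write_clustered_row(row); // legacy Scylla layout
    }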
Fixes #4139.
Tests:
- unit (dev)
"
* tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla:
tests: sstables: Test reading of static compact sstable generated by Cassandra
tests: sstables: Add test for writing and reading of static compact tables
sstables: mc: Write static compact tables the same way as Cassandra
sstable: mc: writer: Set _static_row_written inside write_static_row()
sstables: Add sstable::features()
sstables: mc: writer: Prepare write_static_row() for working with any column_kind
storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
sstables: mc: writer: Build indexed_columns together with serialization_header
sstables: mc: writer: De-optimize make_serialization_header()
sstable: mc: writer: Move attaching of mc-specific components out of generic code
The repair reader depends on the table object being alive while it is
reading. However, for local reads, there was no synchronization between
the lifecycle of the repair reader and that of the table. In some cases
this can result in use-after-free. Solve by using the table's existing
mechanism for lifecycle extension: `read_in_progress()`.
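A minimal sketch of the pinning (do_local_repair_read() is a hypothetical
stand-in; read_in_progress() is the table's existing hook):

    // Holding the object returned by read_in_progress() keeps the table
    // alive until the local repair read has finished.
    auto guard = table.read_in_progress();
    return do_local_repair_read(table).finally([guard = std::move(guard)] {});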
For the non-local reader, when the local node's shard configuration is
different from the remote one's, this problem is already solved, as the
multishard streaming reader already pins table objects on the used
shards. This creates an inconsistency that might be surprising (in a bad
way): one reader takes care of pinning the needed resources while the
other one doesn't. I was torn on how to reconcile this, and decided to go
with the simplest solution, explicitly pinning the table for local
reads, that is, conserving the inconsistency. It was suggested that this
inconsistency be remedied by building resource pinning into the local
reader as well [1], but there is opposition to this [2]. Adding a wrapper
reader which does just the resource pinning seems excessive, both in
code and runtime overhead.
Spotted while investigating repair-related crashes which occurred during
interrupted repairs.
Fixes: #4342
[1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050
[2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657
Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>
When loading sstables uploaded via `nodetool refresh`, they used to be
left in the upload/ directory if view updates needed to be generated
from them. Since view update generation is asynchronous, sstables
left in the directory could erroneously get overwritten if the user
decided to upload another batch of sstables and some of the names
collided.
To remedy this, uploaded sstables that need view updates are moved
to staging/ directory with a unique generation number, where they
await view update generation.
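The idea in plain std::filesystem terms (the directory layout and the
with_generation() helper are assumptions, not the actual loader code):

    #include <filesystem>
    namespace fs = std::filesystem;

    // Move an uploaded sstable component from upload/ to staging/ under a
    // fresh generation number, so a later upload cannot clash with it.
    void move_to_staging(const fs::path& table_dir, const fs::path& component,
                         int64_t new_generation) {
        fs::create_directories(table_dir / "staging");
        fs::rename(table_dir / "upload" / component,
                   table_dir / "staging" / with_generation(component, new_generation));
    }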
Fixes #4047
Mounting /sys/fs/cgroup inside the image causes the docker cgroup to not
be mounted internally. Therefore, hosts cannot limit resources on
Scylla. This patch removes the cgroup volume mount, allowing directories
under /sys/fs/cgroup to be created inside docker.
Message-Id: <20190320122053.GA20256@shenzou.localdomain>
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.
What this patch does is make ALL a function instead of a constant vector.
The newly named all_table_names() function uses all_tables(), so the list
of schema tables appears only once.
So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
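A sketch of the caching (the exact types are an approximation of the
schema_tables code, not a quote from it):

    // Names of all schema tables, derived from all_tables() so the list
    // is defined in one place only; computed once per thread (shard).
    const std::vector<sstring>& all_table_names() {
        static thread_local std::vector<sstring> names = [] {
            std::vector<sstring> r;
            for (auto&& s : all_tables()) {
                r.push_back(s->cf_name());
            }
            return r;
        }();
        return names;
    }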
Because after this patch all_table_names() includes the "view_virtual_columns"
table that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.
Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.
Fixes #4339
Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
The materialized views flow control mechanism works by adding a certain
delay to each client request, designed to slow down the client to the
rate at which we can complete the background view work. Until now we could
observe this mechanism only indirectly, in whether or not it succeeded
in keeping the view backlog bounded; but we had no way to directly observe
the delay that we decided to add. In fact, we had a bug where this delay
was constantly zero, and we didn't even notice :-)
So in this patch we add a new metric,
scylla_storage_proxy_coordinator_last_mv_flow_control_delay
The metric is a floating point number, in units of seconds.
This metric is somewhat peculiar in that it always contains the *last* delay
used for some request - unlike other metrics it doesn't measure the "current"
value of something. Moreover, it can jump wildly because there is no
guarantee that each request's delay will be identical (in particular,
different requests may involve different base replicas which have different
view backlogs, so decide on different delays). In the future we may want
to supplement this metric with some sort of delay histogram. But even
this simple metric is already useful to debug certain scenarios and
understand if the materialized-views flow control is working or not.
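For illustration, a gauge like this could be registered through the
Seastar metrics API (an approximation; the member names are assumed):

    namespace sm = seastar::metrics;
    _metrics.add_group("storage_proxy", {
        sm::make_gauge("coordinator_last_mv_flow_control_delay",
            [this] { return _last_mv_flow_control_delay.count(); },
            sm::description("The delay (in seconds) added to the last "
                            "request by materialized-views flow control")),
    });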
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227133630.26328-1-nyh@scylladb.com>
Scylla built using the frozen toolchain needs to be debugged
on a system with matching libraries, and it's easiest if that's also done
on the same image. Install gdb in the image so that it's always there
when we need it.
Fixes #4329
Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>
In order to create a local index, the syntax used is:
CREATE INDEX ON t ((p1, p2, p3), v);
where (p1, p2, p3) are partition key columns (all of them),
and v is the indexed column.
With global indexes, the target column name is always the same as the string
kept in the 'options[target]' field. That's not the case for local indexes,
so a proper extracting function is used to get the value.
When (re)creating a local index, the target string needs to be used
to parse out the actual indexed column:
"(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column".
This column is later used to determine if an index should be applied
to a SELECT statement.
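A sketch of the extraction (a hypothetical helper; the real parser also
validates the target):

    // Given a serialized local-index target of the form
    // "(pk1,pk2,pk3),indexed_column", return the indexed column name.
    std::string parse_indexed_column(const std::string& target) {
        auto pos = target.rfind("),");
        return pos == std::string::npos ? target : target.substr(pos + 2);
    }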
With local indexes it's not sufficient to check if a single
restriction is supported by an index in order to decide
that it can be used, because local indexes can be leveraged
only when the full partition key is properly restricted.
(It also serves as a great example of why the restrictions code
would greatly benefit from a facelift! :) )
Index restrictions that match a given index used to be recomputed
during the execution stage, which is redundant and error-prone.
Now, the used index restrictions are cached in the prepared statement.
Instead of choosing the first index that we find (in column definition
order), the index with the highest score is picked. Currently local indexes
score higher than global ones if the restrictions allow local indexing
to be applied.
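A sketch of the scoring idea (the numbers and helper names are hypothetical):

    // Prefer a local index when the whole partition key is restricted;
    // a local index is unusable otherwise.
    int score(const secondary_index& index, const restrictions& r) {
        if (index.is_local()) {
            return r.has_full_partition_key_restriction() ? 2 : 0;
        }
        return 1;
    }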
When computing the paging state for local indexes, the partition
and clustering keys differ from those used with global ones:
- partition key is the same as base's
- clustering key starts with the indexed column
It already accepts several arguments that can be extracted from 'this',
and more will be added in the future.
New parameters include lambdas prepared during the prepare stage
that define how to extract partition/clustering key ranges depending
on which index is used, so keeping it a static function would result
in an unbounded number of parameters with complex types, which would
in turn make the function header almost illegible to a reader.
Hence, read_posting_list becomes a member function with easy access
to any data prepared during the prepare stage.
Instead of having just one column definition, an index target is now
a variant of either a single column definition or a vector of them.
The vector is expected to be used when part of a target definition
is enclosed in parentheses:
$ CREATE INDEX ON t((p),v);
or
$ CREATE INDEX ON t((p1,p2), v);
etc.
This feature will allow providing a (possibly composite) base partition key
to the CREATE INDEX statement, which will result in creating a local index.
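A sketch of the shape described above (the type names are assumptions):

    // An index target is either a single column or a parenthesized group
    // of columns, e.g. the (p1,p2) part of CREATE INDEX ON t((p1,p2), v).
    using single_column = const column_definition*;
    using multiple_columns = std::vector<const column_definition*>;
    using target_value = std::variant<single_column, multiple_columns>;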