scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Duarte Nunes	4859b759b9	Merge 'Make all timeouts explicit' from Avi " This patchset makes all users of query_processor specify their timeouts explicitly, in preparation for the removal of cql_statement::execute_internal() (whose main function was to override timeouts). " * tag 'cql-explicit-timeouts/v1' of https://github.com/avikivity/scylla: query_processor: require clients to specify timeout configuration query_processor: un-default consistency level in make_internal_options	2018-05-26 16:10:58 +02:00
Piotr Sarna	3792bed3ed	view: adapt view_stats to act as write stats This commit adapts view_stats structure so it can be passed to storage_proxy as write stats. Thanks to that, mv replica updates will not interfere with user write metrics. As a side effect it also provides more stats to replica view updates. Closes #3385 Closes #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	9246bb36bc	db: add row locking metrics This commit adds statistics to row_locker class. Metrics are independently counted for all lock types: row<->partition and exclusive<->shared. Metrics gathered: - total acquisitions - operations that wait on the lock - histogram of the time spent on waiting on this type of lock References #3385 References #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	49bebcfa25	view: add view metrics This commit introduces view statistics: - updates pushed to local/remote replicas - updates failed to be pushed to local/remote replicas Metrics are kept on per-table basis, i.e. updates_pushed_remote shows the number of total updates (mutations) pushed to all paired mv replicas that this particular table has. Every single update is taken into consideration, so if view update requires removing a row from one view and adding a row to another, it will be counted as 2 updates. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Calle Wilund	62c3b4c429	commitlog: Ensure file objects are closed before object free Fixes #3446 Previously, only shutdown-synced objects were actually closed, which is wrong. This introduces yet another queue, processed together with the deletion objects, which ensures we explicitly close all objects that have been discarded. Message-Id: <20180521140456.32100-1-calle@scylladb.com>	2018-05-22 14:52:06 +03:00
Glauber Costa	596a525950	commitlog: don't move pointer to segment We are currently moving the pointer we acquired to the segment inside the lambda in which we'll handle the cycle. The problem is, we also use that same pointer inside the exception handler. If an exception happens we'll access it and we'll crash. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180518125820.10726-1-glauber@scylladb.com>	2018-05-18 17:25:18 +02:00
Avi Kivity	a99e820bb9	query_processor: require clients to specify timeout configuration Remove implicit timeouts and replace with caller-specified timeouts. This allows removing the ambiguity about what timeout a statement is executed with, and allows removing cql_statement::execute_internal(), which mostly overrode timeouts and consistency levels. Timeout selection is now as follows: query_processor::*_internal: infinite timeout, CL=ONE query_processor::process(), execute(): user-specified consistency level and timeout All callers were adjusted to specify an infinite timeout. This can be further adjusted later to use the "other" timeout for DCL and the read or write timeout (as needed) for authentication in the normal query path. Note that infinite timeouts don't mean that the query will hang; as soon as the failure detector decides that the node is down, RPC responses will terminate with a failure and the query will fail.	2018-05-14 09:41:06 +03:00
Duarte Nunes	a23bda3393	Merge 'Implement separate timeout for range queries' from Avi " This patchset implements separate timeouts for range queries, and lays the foundations for separate timeouts for other query types. While the feature in itself is worthy, the real motivation is to have the timeouts decided by the caller, instead of storage_proxy. This in turn is required to disentangle each layer behaving differently depending on whether the query is internal or not; instead, the goal is to have each caller declare its needs in terms of consistency level and timeouts, and have the lower layers implement its requirements instead of making their own decisions. Fixes #3013. Tests: unit (release) " * tag '3013/v1.1' of https://github.com/avikivity/scylla: storage_proxy: remove default_query_timeout() storage_proxy: don't use default timeouts query_options: augment with timeout_config thrift: configure thrift transport and handler with a timeout_config transport: configure native transport with a timeout_config cql3: define and populate timeout_config_selector timeout_config: introduce timeout configuration	2018-05-13 20:05:50 +02:00
Paweł Dziepak	75b8b521d9	db/view/build_progress: avoid copying mutation fragment	2018-05-09 16:52:26 +01:00
Paweł Dziepak	0b4c6b8938	types: make some collection_type_impl functions non-static The switch to the new in-memory representation will require larger parts of the logic to be aware of the type of the values they are dealing with. In most cases it is not a significant burden for the users.	2018-05-09 16:52:26 +01:00
Vlad Zolotarov	48c96d09d6	db::hints::manager: drain hints when the node is decommissioned/removed When a node is decommissioned/removed it will drain all its hints and all remote nodes that have hints to it will drain their hints to this node. What does "drain" mean? - The node that "drains" hints to a specific destination will ignore failures and will continue sending hints till the end of the current segment, erase it and move to the next one till there are no more segments left. After all hints are drained the corresponding hints directory is removed. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	ec76f8a27d	db::hints::manager: add a few more trace messages Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	6ede32156f	db::hints::manager::end_point_hints_manager::sender: add set_stopping()/stopping() methods It's nicer to have access methods instead of working directly with enum_set methods and values. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	94da744f37	db::hints::manager::end_point_hints_manager::stop(): log the last exception instead of forwarding it Returning a future with an exception from end_point_manager::stop() is practically useless because the best the caller can do is to log it and continue as if it didn't happen because it has other things to shut down. Therefore in order to simplify the caller we will log the exception if it happens and will always return a non-exceptional future. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	8aedbf9d18	db::hints: manager.hh: cleanup: fix the comments Fix the comments that went out of sync with the current implementation. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	5463b58faa	db::hints::manager: rework end_point_hints_manager::stop() to use seastar::async() This simplifies the code reading and extending. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Duarte Nunes	c053275a48	db/view/row_locking: Add timeout when waiting for the lock This ensures we respect the write timeout set by the client when applying base writes, in case a write takes too long to acquire the row lock for the read-before-write phase of a materialized view update. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180507132755.8751-1-duarte@scylladb.com>	2018-05-07 18:22:39 +01:00
Duarte Nunes	2be75bdfc9	db/timeout_clock: Properly scope type names Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180426134457.21290-1-duarte@scylladb.com>	2018-05-07 11:24:41 +03:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. Proper large_partition_handler is retrievable from config information and is based on existing compaction_large_partition_warning_threshold_mb entry. Right now CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Piotr Sarna	14b3c7e7e7	db: add large_partition_handler interface with implementations This commit introduces large_partition_handler class, which can be used to take additional action when large partitions are written. It comes with two implementations: * NOP, used in tests, which does nothing on large partition update/delete * CQL TABLE, which inserts/deletes information on particular sstable to system.large_partitions table, in order to be retrievable from cqlsh later. References #3292	2018-05-04 12:46:31 +02:00
Piotr Sarna	02822efbc8	db: add system.large_partitions table This commit adds a system.large_partitions table, which can be used to trace largest partitions of a cluster. Schema: ( keyspace_name text, table_name text, sstable_name text, partition_size bigint, key text, compaction_time timestamp, PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key) ) WITH CLUSTERING ORDER BY (partition_size DESC); References #3292	2018-05-04 12:45:40 +02:00
Nadav Har'El	21d7507b74	secondary index: move stuff out of db/index directory The db/index directory contains just a few lines of code that exists there for historical reasons. It's confusing that we have both db/index and index/ directory related to secondary-indexing. This patch moves what little is still in db/index/ to index/. In the future we should probably get rid of the "secondary_index" class we had there, but for now, let's at least not have a whole new directory for it. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180501101246.21143-1-nyh@scylladb.com>	2018-05-01 13:21:24 +03:00
Avi Kivity	d8dd7e05a7	storage_proxy: don't use default timeouts Require all callers to supply timeouts instead of relying on defaults. Since all callers now have the timeouts set up, they can easily supply them.	2018-04-30 13:19:53 +03:00
Nadav Har'El	8012f231ca	materialized views: fix another case-sensitivity bug We had another case-sensitivity bug in materialized views, where if a case-sensitive (quoted) column name was listed explicitly on "SELECT" (instead of implicitly, e.g., in "SELECT *") the column name was incorrectly folded to lower-case and inserts would fail. This patch fixes the code, where a "SELECT" statement was built using the desired column names, but column names that needed quoting were not being quoted. The bug was in a helper function build_select_statement() which took column name strings and failed to quote them. We clean up this function to take column definitions instead of strings - and take care of the quoting itself. It also needs to quote the table's name in the select statement being built. Fixes #3391. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180429221857.6248-6-nyh@scylladb.com>	2018-04-30 00:27:23 +02:00
Tomasz Grabiec	b1465291cf	db: schema_tables: Treat drop of scylla_tables.version as an alter After upgrade from 1.7 to 2.0, nodes will record a per-table schema version which matches that on 1.7 to support the rolling upgrade. Any later schema change (after the upgrade is done) will drop this record from affected tables so that the per-table schema version is recalculated. If nodes perform a schema pull (they detect schema mismatch), then the merge will affect all tables and will wipe the per-table schema version record from all tables, even if their schema did not change. If then only some nodes get restarted, the restarted nodes will load tables with the new (recalculated) per-table schema version, while not restarted nodes will still use the 1.7 per-table schema version. Until all nodes are restarted, writes or reads between nodes from different groups will involve a needless exchange of schema definition. This will manifest in logs with repeated messages indicating schema merge with no effect, triggered by writes: database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f The sync will be performed if the receiving shard forgets the foreign version, which happens if it doesn't process any request referencing it for more than 1 second. This may impact latency of writes and reads. The fix is to treat schema changes which drop the 1.7 per-table schema version marker as an alter, which will switch in-memory data structures to use the new per-table schema version immediately, without the need for a restart. Fixes #3394 Tests: - dtest: schema_test.py, schema_management_test.py - reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git - unit (release) Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com>	2018-04-27 17:12:33 +03:00
Calle Wilund	b1edf75c8b	types: Make seastar::inet_address the "native" type for CQL inet. Fixes #3187 Requires seastar "inet_address: Add constructor and conversion function from/to IPv4" Implements IPv6 support for CQL inet data. The actual data stored will now vary between 4 and 16 bytes. gms::inet_address has been augmented to interop with seastar::inet_address, though of course actually trying to use an IPv6 address there or in any of its tables will throw badly. Tests assuming ipv4 changed. Storing an ipv4_address should be transparent, as it now "widens". However, since all ipv4 is inet_address, but not vice versa, there is no implicit overloading on the read paths. I.e. tests and system_keyspace (where we read ip addresses from tables explicitly) are modified to use the proper type. Message-Id: <20180424161817.26316-1-calle@scylladb.com>	2018-04-24 23:12:07 +01:00
Piotr Jastrzebski	e1e23ec555	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:30:26 +02:00
Duarte Nunes	844e0b41d1	db/view: Move cells instead of copying in add_cells_to_view() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:03 +01:00
Duarte Nunes	4b4d1dbd1f	db/view: Handle unselected base columns and corner cases When a view's PK only contains the columns that form the base's PK, then the liveness of a particular view row is determined not only by the base row's marker, but also by the selected and, more importantly, unselected columns. This patch ensures that unselected columns are considered as much as possible, even though some limitations will still exist. In particular, we need to represent multiple timestamps (from all the unselected columns), but have only mechanisms to record a single timestamp. We also have some issues when dealing with selected columns, and the way we currently delete them. Consider the following: create table cf (p int, c int, a int, b int, primary key (p, c)) create materialized view vcf as select a, b from cf where p is not null and c is not null primary key (p, c) 1) update cf using timestamp 10 set a = 1 where p = 1 and c = 1 2) delete a from cf using timestamp 11 where p = 1 and c = 1 3) update cf using timestamp 1 set a = 2 where p = 1 and c = 1 After 1), the MV should include a row with row marker @ ts10, p = 1, c = 1, a = 1. After 2), this row should be removed. At 3), we should add a row with row marker @ ts1, p = 1, c = 1, a = 1, with a lower timestamp. This means that the delete should not insert a row tombstone with timestamp @ 11, as we do now, but it should just delete the view's row marker (which exists) with ts1. Refs #3362 Fixes #3140 Fixes #3361 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	67dac67c46	mutation_partition: Regular base column in view determines row liveness When views contain a primary key column that is not part of the base table primary key, that column determines whether the row is live or not. We need to ensure that when that cell is dead, and thus the derived row marker, either by normal deletion or by TTL, so is the rest of the row. This patch introduces the idea of a shadowing row marker. We map the status of the regular base column in the view's PK to the view row's marker. If this marker is dead, so is that cell in the base table, and so should the view row become. To enforce that, a view row's dead marker shadows the whole row if that view includes a base regular column in its PK. Fixes #3360 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	4dfce4d369	db/view: Don't avoid read-before-write when view PK matches base When a view's PK only contains the columns that form the base's PK, then the liveness of a particular view row is determined not only by the base row's marker, but also by the selected and, more importantly, unselected columns. When calculating the view's row marker we need to access those unselected columns, so we can't avoid the read-before-write as we were doing. Refs #3362 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	bd3cedd240	db/view: Process base updates to column unselected by its views When a view's PK only contains the columns that form the base's PK, then the liveness of a particular view row is determined not only by the base row's marker, but also by the selected and, more importantly, unselected columns. So, process base updates to columns unselected by any of its views. Refs #3362 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	ac9b93eb89	db/view: Consider partition tombstone when generating updates Not adding the partition tombstone to the current list of tombstones may cause updates to be incorrectly generated. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	164f043768	view_info: Add view_column() overload For when we already have the base's column_definition. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Duarte Nunes	31370fd7b1	view_info: Explicitly initialize base-dependent fields Instead of lazily-initializing the regular base column in the view's PK field, explicitly initialize it. This will be used by future patches that don't have access to the schema when wanting to obtain that column. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Avi Kivity	513479f624	Merge "Implement loading sstables in 3.x format" from Piotr " Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions. For now serialization header from 3.x format is ignored. Tests: units (release) " * 'haaawk/sstables3/loading_v3' of ssh://github.com/scylladb/seastar-dev: Add test for loading the whole sstable Add test for loading statistics Add support for 3_x stats metadata Pass sstable version to describe_type Pass sstable version to write methods metadata_type: add Serialization type Pass sstable_version_types to parse methods Add test for reading filter Add test for read_summary sstables 3.x: Add test for reading TOC sstable: Make component_map version dependent sstable::component_type: add operator<< Extract sstable::component_type to separate header Remove unused sstable::get_shared_components sstable_version_types: add mc version	2018-04-22 16:18:39 +03:00
Piotr Jastrzebski	26ab3056ae	Pass sstable version to describe_type Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 14:41:11 +02:00
Duarte Nunes	17917e12ce	db/view: Wait for schema agreement in background upon view building Waiting for schema agreement in the foreground may cause the node to not boot in useful time. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180417125915.11262-1-duarte@scylladb.com>	2018-04-17 18:03:43 +03:00
Avi Kivity	7c01e66d53	cql3: query_processor: store and use just local shard reference of storage_proxy Since storage_proxy provides access to the entire cluster, a local shard reference is sufficient. Adjust query_processor to store a reference to just the local shard, rather than a seastar::sharded<storage_proxy> and adjust callers. This simplifies the code a little. Message-Id: <20180415142656.25370-3-avi@scylladb.com>	2018-04-16 10:20:50 +02:00
Avi Kivity	9cef37e643	Merge "db/view: View building fixes" from Duarte " Fixes to the view building process, discovered from field experience. Tests: dtest(materialized_view_tests.py, smp=2) " * 'views/view-build-fixes/v1' of https://github.com/duarten/scylla: db/view: Start view building after schema agreement db/system_keyspace: scylla_views_builds_in_progress writes are user mem db/view: Require configuration option to enable view building	2018-04-03 17:42:21 +03:00
Duarte Nunes	ec8960df45	db/view: Reject view entries with non-composite, empty partition key Empty partition keys are not supported on normal tables - they cannot be inserted or queried (surprisingly, the rules for composite partition keys are different: all components are then allowed to be empty). However, the (non-composite) partition key of a view could end up being empty if that column is: a base table regular column, a base table clustering key column, or a base table partition key column, part of a composite key. Fixes #3262 Refs CASSANDRA-14345 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180403122244.10626-1-duarte@scylladb.com>	2018-04-03 15:25:52 +03:00
Duarte Nunes	d4db043f03	db/view: Start view building after schema agreement If a base table or view has been dropped in one node, but another one hasn't yet learned about it, it starts the view build process immediately on boot, possibly calculating unneeded view updates and causing errors at the view replica, if that replica has already processed the schema changes. We should thus wait for schema agreement, even if the node is a seed. Fixes #3328 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-03 13:16:28 +01:00
Duarte Nunes	75bb66a50d	db/system_keyspace: scylla_views_builds_in_progress writes are user mem Treat writes to scylla_views_builds_in_progress as user memory, as the number of writes is dependent on the amount of user data on views (times the number of views, divided by the view building batch size). Fixes #3325 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-03 13:16:28 +01:00
Duarte Nunes	bf5045c7eb	db/view: Require configuration option to enable view building View building, enabled by default, can contain or expose issues that prevent the node from starting. In those cases, it is necessary to disable view building such that the node can be submitted to maintenance operations. Fixes #3329 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-03 13:16:28 +01:00
Duarte Nunes	11ece46f14	db/view: Remove leftover debug statement Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180402175238.5528-1-duarte@scylladb.com>	2018-04-03 09:41:33 +01:00
Avi Kivity	7ab52947dc	conf: define named_value<log_level> externally While building with -O1, I saw that the linker could not find the vtable for named_value<log_level>. Rather than fixing up the includes (and likely lengthening build time), fix by defining the class as an extern template, preventing it from being instantiated at the call site. Message-Id: <20180401150235.13451-1-avi@scylladb.com>	2018-04-02 19:23:06 +01:00
Duarte Nunes	a45fa8eaa2	db/view/view_builder: Allow synchronizing with the end of a build Intended for use by unit tests, this patch allows synchronizing with the end of a build for a particular view. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Duarte Nunes	5f822e3928	db/view/view_builder: Actually build views This patch adds the missing view building code to the eponymous class. We consume from the reader associated with each base table until all its views are built. If the reader reaches the end and there are incomplete views, then a view was added while others were being built. In such cases, we restart the reader to the beginning of the current token, but not to the beginning of the token range, when the view is added. Then, when we exhaust the reader, we simply create a new one for the whole token range, and resume building the pending views. We aim to be resource-conscious. On a given shard, at any given moment, we consume at most from one reader. We also strive for fairness, in that each build step inserts entries for the views of a different base. Each build step reads and generates updates for batch_size rows. We lack a controller, which could potentially allow us to go faster (to execute multiple steps at the same time, or consume more rows per batch), and also which would apply backpressure, so we could, for example, delay executing a build step. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Duarte Nunes	a21efeffa0	db/view/view_builder: React to schema changes The view_builder now uses the migration_manager to subscribe to schema change events, and update its bookkeeping accordingly. We prefer this to having the database call into the view_builder, as that would create a cyclic dependency. We serialize changes to the views of a particular base table, such that schema changes do not interfere with the upcoming view building code. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
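
The timeout selection described in commit a99e820bb9 (internal queries get an infinite timeout and CL=ONE, while the user-facing path takes the caller's consistency level and timeout) can be illustrated with a minimal sketch. Every name below (`query_options_sketch`, `make_internal_options_sketch`, `make_user_options_sketch`, `has_deadline`) is hypothetical and chosen for illustration only; this is not ScyllaDB's actual query_processor or timeout_config API, and an empty `std::optional` is merely one assumed way to model "infinite".

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical types for illustration; not ScyllaDB's real definitions.
using duration = std::chrono::milliseconds;

// An empty optional models an "infinite" timeout: the caller imposes no deadline.
using timeout = std::optional<duration>;

enum class consistency_level { one, quorum, all };

struct query_options_sketch {
    consistency_level cl;
    timeout to;
};

// Internal callers: infinite timeout and CL=ONE, as the commit message states.
inline query_options_sketch make_internal_options_sketch() {
    return {consistency_level::one, std::nullopt};
}

// User-facing path: both values are supplied explicitly by the caller,
// so no layer below has to guess a default.
inline query_options_sketch make_user_options_sketch(consistency_level cl, duration d) {
    return {cl, d};
}

// True when the caller asked for a finite deadline.
inline bool has_deadline(const query_options_sketch& o) {
    return o.to.has_value();
}
```

The point of the sketch is the design choice the commit describes: the timeout travels with the request options, so lower layers apply what the caller declared instead of choosing based on whether the query is internal.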

1063 Commits