scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 12:47:02 +00:00

Author	SHA1	Message	Date
Rafael Ávila de Espíndola	fd5ea2df5a	Avoid including cryptopp headers cryptopp's config.h has the following pragma: #pragma GCC diagnostic ignored "-Wunused-function" It is not wrapped in a push/pop. Because of that, including cryptopp headers disables that warning on scylla code too. The issue has been reported as https://github.com/weidai11/cryptopp/issues/793 To work around it, this patch uses a pimpl to have a single .cc file that has to include cryptopp headers. While at it, it also reduces the differences and code duplication between the md5 and sha1 hashers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-02-20 08:03:46 -08:00
Calle Wilund	4e657c0633	system_keyspace: Add waitable for trunc. migration For tests. Hooray for separation of concern.	2019-02-13 09:08:12 +00:00
Calle Wilund	64e8c6f31d	storage_service: Add features disabling for tests	2019-02-13 09:08:12 +00:00
Calle Wilund	12ebcf1ec7	commitlog_replay: Use dedicated table for truncation Fixes #4083 Instead of sharded collection in system.local, use a dedicated system table (system.truncated) to store truncation positions. Makes query/update easier and easier on the query memory. The code also migrates any existing truncation positions on startup and clears the old data.	2019-02-13 09:08:12 +00:00
Avi Kivity	6c71eae63f	Merge "API: Stream compaction history records" from Amnon " get_compaction_history can return a lot of records which will add up to a big http reply. This series makes sure it will not create large allocations when returning the results. It adds an api to the query_processor to use paged queries with a consumer function that returns a future, this way we can use the http stream after each record. This implementation will prevent large allocations and stalls. Fixes #4152 " * 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev: tests/query_processor_test: add query_with_consumer_test system_keyspace, api: stream get_compaction_history query_processor: query and for_each_cql_result with future	2019-02-05 14:16:36 +02:00
Amnon Heiman	6c7742d616	system_keyspace, api: stream get_compaction_history get_compaction_history can return a big chunk of data. To prevent large memory allocation, get_compaction_history now reads each compaction_history record and uses the http stream to send it. Fixes #4152 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-02-05 11:14:53 +02:00
Piotr Jastrzebski	834bec5cc9	Read shard awareness columns as dropped Without this new version of Scylla won't be able to start with system tables inherited after older version that had shard awareness columns. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>	2019-02-04 18:43:11 +02:00
Piotr Jastrzebski	ad217bbdc7	Revert "system_keyspace: add sharding information to local table" This reverts commit `bdce561ada`. Those columns are not used and cause problems with tools. Refs #4112 Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>	2019-01-31 19:06:55 +01:00
Botond Dénes	4e89dea9ea	database: don't allow access to global semaphores Recently we had a bug (#4096) due to a component (`multishard_mutation_query()`) assuming that all reads used the semaphore obtainable via `database::user_read_concurrency_sem()`. This problem revealed that it is plain wrong to allow access to the shard-global semaphores residing in the database object. Instead all code wishing to access the relevant semaphore for some read, should do so via the relevant `table` object, thus guaranteeing that it will get the correct semaphore, configured for that table. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>	2019-01-21 16:29:02 +02:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write operation over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	038f8c7988	nodetool toppartitions: refactor table::config constructor Eliminate extra parameters to ctor and deduce them instead from db param. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Avi Kivity	d77e044cde	db: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	04b70a2ff8	system_keyspace: simplify complicated sprint() update_peer_info() uses two sprint()s where one would do, which confuses the sprint-to-fmt translator. Simplify the code by using just one call.	2018-11-01 13:16:17 +00:00
Benny Halevy	2a57c454f2	update_compaction_history: handle execute_cql exception Fixes #3774 Tested using view_schema_test with and without injecting an exception in modification_statement::do_execute for "compaction_history". Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>	2018-10-24 18:39:53 +03:00
Tomasz Grabiec	10f6b125c8	database: Run system table flushes in the main scheduling group memtable flushes for system and regular region groups run under the memtable_scheduling_group, but the controller adjusts shares based on the occupancy of the regular region group. It can happen that regular is not under pressure, but system is. In this case the controller will incorrectly assign low shares to the memtable flush of system. This may result in high latency and low throughput for writes in the system group. I observed writes to the system keyspace timing out (on scylla-2.3-rc2) in the dtest: limits_test.py:TestLimits.max_cells_test, which went away after this. Fixes #3717. Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>	2018-08-23 15:07:05 +03:00
Duarte Nunes	2fa7f10429	db/system_keyspace: Add function to remove view build status of a shard This patch adds a function that clears the view build in-progress status for the current shard, similar to the existing one that clears it across all shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:27:39 +01:00
Avi Kivity	6f23403137	Merge "Virtualize IndexInfo system table" from Duarte " The IndexInfo table tracks the secondary indexes that have already been populated. Since our secondary index implementation is backed by materialized views, we can virtualize that table so queries are actually answered by built_views. Fixes #3483 " * 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla: tests/virtual_reader_test: Add test for built indexes virtual reader db/system_keysace: Add virtual reader for IndexInfo table db/system_keyspace: Explain that table_name is the keyspace in IndexInfo index/secondary_index_manager: Expose index_table_name() db/legacy_schema_migrator: Don't migrate indexes	2018-06-06 17:35:51 +03:00
Glauber Costa	bdce561ada	system_keyspace: add sharding information to local table We would like the clients to be able to route work directly to the right shards. To do that, they need to know the sharding algorithm and its parameters. The algorithm can be copied into the client, but the parameters need to be exported somewhere. Let's use the local table for that. Signed-off-by: Glauber Costa <glauber@scylladb.com> --- v2: force msb to zero on non-murmur	2018-06-04 11:25:58 -04:00
Duarte Nunes	3e39985c7a	db/system_keysace: Add virtual reader for IndexInfo table The IndexInfo table tracks the secondary indexes that have already been populated. Since our secondary index implementation is backed by materialized views, we can virtualize that table so queries are actually answered by built_views. Fixes #3483 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Duarte Nunes	65c4205334	db/system_keyspace: Explain that table_name is the keyspace in IndexInfo This patch adds the same comment that exists in Apache Cassandra, explaining that the table_name column in the IndexInfo system table actually refers to the keyspace name. Don't be fooled. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Duarte Nunes	7187963bda	db/legacy_schema_migrator: Don't migrate indexes Previous versions contained no indexes, and Apache Cassandra indexes cannot be migrated to Scylla. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Duarte Nunes	a23bda3393	Merge 'Implement separate timeout for range queries' from Avi " This patchset implements separate timeouts for range queries, and lays the foundations for separate timeouts for other query types. While the feature in itself is worthy, the real motivation is to have the timeouts decided by the caller, instead of storage_proxy. This in turn is required to disentangle each layer behaving differently depending on whether the query is internal or not; instead, the goal is to have each caller declare its needs in terms of consistency level and timeouts, and have the lower layers implement its requirements instead of making their own decisions. Fixes #3013. Tests: unit (release) " * tag '3013/v1.1' of https://github.com/avikivity/scylla: storage_proxy: remove default_query_timeout() storage_proxy: don't use default timeouts query_options: augment with timeout_config thrift: configure thrift transport and handler with a timeout_config transport: configure native transport with a timeout_config cql3: define and populate timeout_config_selector timeout_config: introduce timeout configuration	2018-05-13 20:05:50 +02:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. Proper large_partition_handler is retrievable from config information and is based on existing compaction_large_partition_warning_threshold_mb entry. Right now CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Piotr Sarna	02822efbc8	db: add system.large_partitions table This commit adds a system.large_partitions table, which can be used to trace largest partitions of a cluster. Schema: ( keyspace_name text, table_name text, sstable_name text, partition_size bigint, key text, compaction_time timestamp, PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key) ) WITH CLUSTERING ORDER BY (partition_size DESC); References #3292	2018-05-04 12:45:40 +02:00
Avi Kivity	d8dd7e05a7	storage_proxy: don't use default timeouts Require all callers to supply timeouts instead of relying on defaults. Since all callers now have the timeouts set up, they can easily supply them.	2018-04-30 13:19:53 +03:00
Calle Wilund	b1edf75c8b	types: Make seastar::inet_address the "native" type for CQL inet. Fixes #3187 Requires seastar "inet_address: Add constructor and conversion function from/to IPv4" Implements IPv6 support for CQL inet data. The actual data stored will now vary between 4 and 16 bytes. gms::inet_address has been augmented to interop with seastar::inet_address, though of course actually trying to use an IPv6 address there or in any of its tables will throw badly. Tests assuming ipv4 changed. Storing a ipv4_address should be transparent, as it now "widens". However, since all ipv4 is inet_address, but not vice versa, there is no implicit overloading on the read paths. I.e. tests and system_keyspace (where we read ip addresses from tables explicitly) are modified to use the proper type. Message-Id: <20180424161817.26316-1-calle@scylladb.com>	2018-04-24 23:12:07 +01:00
Duarte Nunes	75bb66a50d	db/system_keyspace: scylla_views_builds_in_progress writes are user mem Treat writes to scylla_views_builds_in_progress as user memory, as the number of writes is dependent on the amount of user data on views (times the number of views, divided by the view building batch size). Fixes #3325 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-03 13:16:28 +01:00
Duarte Nunes	4227641a3d	db/system_keyspace: Add API for MV-related system tables This patch implements an API to access the MV-related system tables, which pertain to the view building process. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	b2cae7ea09	db/system_keyspace: Add virtual reader for MV in-progress build status Provide a virtual reader so users can query the in-progress view table in a way compatible with Apache Cassandra. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	7811474697	db/system_keyspace: Add Scylla-specific MV system table When building a materialized view, we divide our work by shard, so we need to register which shard did what work in the in-progress system table. We also add the token we started at, which will enable some optimizations in the view building code. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	38831888d2	db/system_keyspace: Include MV system tables in all_tables() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Botond Dénes	2e2abf6edb	storage_proxy: add coordinator_query_options and coordinator_query_result As yet more parameters and return-values are about to be added to all storage_proxy::query_* methods we need a way that scales better than changing the signatures every time. To this end we aggregate all non-mandatory query parameters into `coordinator_query_options` and all return values into `coordinator_query_result`. This way new fields can be simply added to the respective structs while the signatures of the methods themselves and their client code can remain unchanged.	2018-03-19 15:17:35 +02:00
Botond Dénes	eac597d726	Add preferred and last replicas to the signature of query() preferred_replicas are added to the parameters and last_replicas are added to the return type. The preferred replicas will be used as a hint for the selection of the replicas to send the read requests to. The last replicas (returned) are the replicas actually selected for the read. This will allow queries to consistently hit the same replicas for each page thus reusing readers created on these replicas. For convenience a query() overload is provided that doesn't take or return the preferred and last replicas. This patch only adds the parameters and propagates them down to query_singular() and query_partition_key_range(). The code to actually use these preferred-replicas will be added in later patches. This reason for separating this is to reduce noise and improve reviewability for those functional changes later.	2018-03-13 10:34:34 +02:00
Avi Kivity	4f6b892aa1	cql3: remove #include of system_keyspace.hh We include system_keyspace for just the string "system" (and a related is_system_keyspace() function). Replace with a forward-declared functions.	2018-03-11 18:02:23 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Duarte Nunes	9254a9a6fe	db/system_keyspace: Move dependency on db/schema_tables to source file And add missing dependencies to header file. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180307111304.2914-1-duarte@scylladb.com>	2018-03-07 14:45:36 +02:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Glauber Costa	08a0c3714c	allow request-specific read timeouts in storage proxy reads Timeouts are a global property. However, for tables in keyspaces like the system keyspace, we don't want to uphold that timeout--in fact, we want no timeout there at all. We already apply such configuration for requests waiting in the queued sstable queue: system keyspace requests won't be removed. However, the storage proxy will insert its own timeouts in those requests, causing them to fail. This patch changes the storage proxy read layer so that the timeout is applied based on the column family configuration, which is in turn inherited from the keyspace configuration. This matches our usual way of passing db parameters down. In terms of implementation, we can either move the timeout inside the abstract read executor or keep it external. The former is a bit cleaner, the latter has the nice property that all executors generated will share the exact same timeout point. In this patch, we chose the latter. We are also careful to propagate the timeout information to the replica. So even if we are talking about the local replica, when we add the request to the concurrency queue, we will do it in accordance with the timeout specified by the storage proxy layer. After this patch, Scylla is able to start just fine with very low timeouts--since read timeouts in the system keyspace are now ignored. Fixes #2462 Implementation notes, and general comments about open discussion in 2462: * Because we are not bypassing the timeout, just setting it high enough, I consider the concerns about the batchlog moot: if we fail for any other reason that will be propagated. Last case, because the timeout is per-CF, we could do what we do for the dirty memory manager and move the batchlog alone to use a different timeout setting. * Storage proxy likes specifying its timeouts as a time_point, whereas when we get low enough as to deal with the read_concurrency_config, we are talking about deltas. So at some point we need to convert time_points to durations. We do that in the database query functions. v2: - use per-request instead of per-table timeouts. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Botond Dénes	fea6214a0a	Update reader restriction related metrics Update description of existing reader count metrics, add memory consumption metrics. Use labels to distinguish between system, user and streaming reads related metrics.	2017-10-03 12:44:17 +03:00
Botond Dénes	47e07b787e	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account their own memory-consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-10-03 12:44:12 +03:00
Avi Kivity	78eae8bf48	Revert "Merge "Make restricting_mutation_reader more accurate" from Botond" This reverts commit `c6e5dcc556`, reversing changes made to `19b21a0ab2`. Fails to build, plus author has more changes.	2017-10-03 11:58:59 +03:00
Botond Dénes	43dba8f173	Update reader restriction related metrics Update description of existing reader count metrics, add memory consumption metrics.	2017-09-20 11:16:21 +03:00
Botond Dénes	33e97e7457	restricted_mutation_reader: restrict based-on memory consumption Restrict readers based on their memory consumption, instead of the count of the top-level readers. To do this an interposer is installed at the input_stream level which tracks buffers emitted by the stream. This way we can have an accurate picture of the readers' actual memory consumption. New readers will consume 16k units from the semaphore up-front. This is to account their own memory-consumption, apart from the buffers they will allocate. Creating the reader will be deferred to when there are enough resources to create it. As before only new readers will be blocked on an exhausted semaphore, existing readers can continue to work.	2017-09-20 11:14:35 +03:00
Avi Kivity	e44517851e	untyped_result_set: reduce dependencies Forward-declare untyped_result_set and untyped_result_set_row, and remove the include from query_processor.hh. Message-Id: <20170916170859.27612-3-avi@scylladb.com>	2017-09-18 15:15:15 +02:00
Avi Kivity	0aaefe665b	system_keyspace: add missing include	2017-09-11 20:09:45 +03:00
Piotr Jastrzebski	dd5dc75605	Stop calling _local_cache.stop in at_exit. This removes a race condition that was causing #2721 Fixes #2721 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <ad060fab43d63c17db9f811c421d7ab26e5e57c8.1503933021.git.piotr@scylladb.com>	2017-09-03 15:55:48 +03:00
Avi Kivity	ebff739a84	Merge "use paging for compaction history" from Amnon "This series adds an option to use paging in internal query and use that for the get compaction history function. Internal paging will be done explicitly, to use paging, you first create a state object (that contains the query as well) and use that state to get the first page, the result will contain both the query result and a new state that can be used to get the next page. Fixes #2366" * 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev: system_keyspace: Use paging for get compaction history Add paging for internal queries query_options: Allows creating query_options from query_options	2017-08-02 18:15:58 +03:00


209 Commits