scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 07:23:15 +00:00

Author	SHA1	Message	Date
Avi Kivity	4f6b892aa1	cql3: remove #include of system_keyspace.hh We include system_keyspace for just the string "system" (and a related is_system_keyspace() function). Replace with forward-declared functions.	2018-03-11 18:02:23 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Raphael S. Carvalho	aa75684ee7	sstables: Warn when an extra-large partition is written Based on https://issues.apache.org/jira/browse/CASSANDRA-9643 With the compaction_large_partition_warning_threshold_mb option set to 1, example output follows: WARN 2018-02-22 19:52:11,029 [shard 0] sstable - Writing large row system/local:{key: pk{00056c6f63616c}, token:-7564491331177403445} (1276758 bytes) Fixes #2209. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180306175912.19259-1-raphaelsc@scylladb.com>	2018-03-07 15:49:46 +00:00
Duarte Nunes	9254a9a6fe	db/system_keyspace: Move dependency on db/schema_tables to source file And add missing dependencies to header file. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180307111304.2914-1-duarte@scylladb.com>	2018-03-07 14:45:36 +02:00
Avi Kivity	d973445a94	Merge "sstable/schema extensions" from Calle " Adds extension points to schema/sstables to enable hooking in stuff, like, say, something that modifies how sstable disk io works. (Cough, cough, encryption) Extensions are processed as property keywords in CQL. To add an extension, a "module" must register it into the extensions object on boot time. To avoid globals (and yet don't), extensions are reachable from config (and thus from db). Table/view tables already contain an extension element, so we utilize this to persist config. schema_tables tables/views from mutations now require a "context" object (currently only extensions, but abstracted for easier further changes). Because of how schemas currently operate, there is a super lame workaround to allow "schema_registry" access to config and by extension extensions. DB, upon instantiation, calls a thread local global "init" in schema_registry and registers the config. It, in turn, can then call table_from_mutations as required. Includes the (modified) patch to encapsulate compression into objects, mainly because it is nice to encapsulate, and isolate a little.
" * 'calle/extensions-v5' of github.com:scylladb/seastar-dev: extensions: Small unit test sstables: Process extensions on file open sstables::types: Add optional extensions attribute to scylla metadata sstables::disk_types: Add hash and comparator(sstring) to disk_string schema_tables: Load/save extensions table cql: Add schema extensions processing to properties schema_tables: Require context object in schema load path schema_tables: Add opaque context object config_file_impl: Remove ostream operators main/init: Formalize configurables + add extensions to init call db::config: Add extensions as a config sub-object db::extensions: Configuration object to store various extensions cql3::statements::property_definitions: Use std::variant instead of any sstables: Add extension type for wrapping file io schema: Add opaque type to represent extensions sstables::compress/compress: Make compression a virtual object	2018-02-26 17:15:29 +02:00
Pekka Enberg	f1f691b555	Merge "Add the GoogleCloudSnitch" from Vlad "This series adds the GoogleCloudSnitch. Fixes #1619" * 'google-cloud-snitch-v4' of https://github.com/vladzcloudius/scylla: config: uncomment/add the supported snitches description tests: added gce_snitch_test locator::gce_snitch: implementation of the GoogleCloudSnitch locator::snitch_base: properly log the failure during the snitch startup	2018-02-19 15:58:56 +02:00
Duarte Nunes	f665f1ab97	db/commitlog: Close the segment file Operations on a segment's underlying append_challenged_posix_file_impl, such as truncate(), schedule asynchronous operations when they are executed, which capture the file object. To synchronize with them and prevent use-after-free, we need to call close() and only delete the segment and file when the returned future resolves. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180216235754.24257-1-duarte@scylladb.com>	2018-02-19 13:09:41 +00:00
Duarte Nunes	7004f6c7ff	db/commitlog: Actually prevent new requests during shutdown When shutting down the commitlog we try to block all new requests by acquiring all available resources. We were, however, letting go of the semaphore permits too early, before closing the gate and shutting down the active segments. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180216234826.24111-1-duarte@scylladb.com>	2018-02-19 13:09:26 +00:00
Glauber Costa	7b6f188e27	controllers: allow a static priority to override the controller output We have merged the I/O controller without this, but we want to integrate the CPU and I/O controllers into one. Currently, the quota can be statically set for the CPU controller. For now, until we gain more experience with it we should allow a static value to override the controller's output as well. That is particularly important since we don't yet control some strategies like LCS and the time-based ones. Users in the field may be using one of those strategies with a static value for background quota. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	c099c98676	controllers: retire auto_adjust_flush_quota It no longer makes sense now that we have the full scheduler + controllers. In its lieu, we will provide an option to statically set the controller's shares as a safe guard against us getting this wrong. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Avi Kivity	2ee163d32b	config: mark background_writer_scheduling_quota as Unused Since the background writer flush quota config is no longer used, mark it Unused.	2018-02-07 17:19:29 -05:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initialization defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Calle Wilund	97f9f572f8	schema_tables: Load/save extensions table Parses the extension map in tables/views using the registered extension. If a schema row contains an unknown extension, we just preserve the data in a placeholder.	2018-02-07 10:11:46 +00:00
Calle Wilund	2b56bbfa7d	schema_tables: Require context object in schema load path Requires "workaround" fix for schema_registry and frozen_mutation, since the former is a free-float thread local, and the latter is a pure data carrier. frozen_schema can take a parameter for unfreeze, but schema registry requires being told which the system extensions are.	2018-02-07 10:11:46 +00:00
Calle Wilund	c2b49ec2e2	schema_tables: Add opaque context object To allow carrying extensions and potentially more	2018-02-07 10:11:46 +00:00
Calle Wilund	c19d8dd602	db::config: Add extensions as a config sub-object The idea being that we should have config be a global, immutable singleton, set up by startup/test then owned/referenced by db etc. Extensions are read-only in this context, so init code should set it up before handing to the config. Or keep a ref to the ext param.	2018-02-07 10:11:46 +00:00
Calle Wilund	78174c6c59	db::extensions: Configuration object to store various extensions A singular, yet not static global, container for schema/sstable extensions.	2018-02-07 10:11:46 +00:00
Vlad Zolotarov	bc90aa79b3	config: uncomment/add the supported snitches description Uncomment descriptions of Ec2SnitchXXX, which have been supported for a long time already. Add the description of the new GoogleCloudSnitch. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-02-05 10:37:13 -05:00
Nadav Har'El	31d0a1dd0c	Materialized views: implement row and partition locking mechanism This patch adds a "row_locker" class providing locking (shard-locally) of individual clustering rows or entire partitions, and both exclusive and shared locks (a.k.a. reader/writer lock). As we'll see in a following patch, we need this locking capability for materialized views, to serialize the read-modify-update modifications which involve the same rows or partitions. The new row_locker is significantly different from the existing cell_locker. The two main differences are that 1. row_locker also supports locking the entire partition, not just individual rows (or cells in them), and that 2. row_locker supports also shared (reader) locks, not just exclusive locks. For this reason we opted for a new implementation, instead of making large modifications to the existing cell_locker. And we put the source files in the view/ directory, because row_locker's requirements are pretty specific to the needs of materialized views. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-01-30 16:16:27 +02:00
Duarte Nunes	1e3fae5bef	db/schema_tables: Only drop UDTs after merging tables Dropping a user type requires that all tables using that type also be dropped. However, a type may appear to be dropped at the same time as a table, for instance due to the order in which a node receives schema notifications, or when dropping a keyspace. When dropping a table, if we build a schema in a shard through a global_schema_pointer, then we'll check for the existence of any user type the schema employs. We thus need to ensure types are only dropped after tables, similarly to how it's done for keyspaces. Fixes #3068 Tests: unit-tests (release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180129114137.85149-1-duarte@scylladb.com>	2018-01-30 12:07:04 +01:00
Piotr Jastrzebski	96c97ad1db	Rename streamed_mutation* files to mutation_fragment* Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-24 20:56:49 +01:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Piotr Jastrzebski	4c74b8c7e7	Migrate materialized views to flat_mutation_reader Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-01-18 07:32:35 +01:00
Duarte Nunes	b607662d2e	collection_type_impl: Make for_each_cell static Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180115013532.67200-1-duarte@scylladb.com>	2018-01-15 11:16:33 +02:00
Glauber Costa	08a0c3714c	allow request-specific read timeouts in storage proxy reads Timeouts are a global property. However, for tables in keyspaces like the system keyspace, we don't want to uphold that timeout--in fact, we want no timeout there at all. We already apply such configuration for requests waiting in the queued sstable queue: system keyspace requests won't be removed. However, the storage proxy will insert its own timeouts in those requests, causing them to fail. This patch changes the storage proxy read layer so that the timeout is applied based on the column family configuration, which is in turn inherited from the keyspace configuration. This matches our usual way of passing db parameters down. In terms of implementation, we can either move the timeout inside the abstract read executor or keep it external. The former is a bit cleaner, the latter has the nice property that all executors generated will share the exact same timeout point. In this patch, we chose the latter. We are also careful to propagate the timeout information to the replica. So even if we are talking about the local replica, when we add the request to the concurrency queue, we will do it in accordance with the timeout specified by the storage proxy layer. After this patch, Scylla is able to start just fine with very low timeouts--since read timeouts in the system keyspace are now ignored. Fixes #2462 Implementation notes, and general comments about open discussion in 2462: * Because we are not bypassing the timeout, just setting it high enough, I consider the concerns about the batchlog moot: if we fail for any other reason that will be propagated. Last case, because the timeout is per-CF, we could do what we do for the dirty memory manager and move the batchlog alone to use a different timeout setting. * Storage proxy likes specifying its timeouts as a time_point, whereas when we get low enough as to deal with the read_concurrency_config, we are talking about deltas.
So at some point we need to convert time_points to durations. We do that in the database query functions. v2: - use per-request instead of per-table timeouts. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Glauber Costa	5140aaea00	add a timeout to fast forward to In the last patch, as part of enabling per-request timeouts, we enabled timeouts in fill_buffer. There are many places, though, in which we fast_forward_to before we fill_buffer, so in order to make that effective we need to propagate the timeouts to fast_forward_to as well. In the same way as fill_buffer, we make the argument optional wherever possible in the high level callers, making them mandatory in the implementations. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:19 -05:00
Glauber Costa	d965af42b0	add a timeout to fill_buffer As part of the work to enable per-request timeouts, we enable timeouts in fill_buffer. The argument is made optional at the main classes, but mandatory in all the ::impl versions. This way we'll make sure we didn't forget anything. At this point we're still mostly passing that information around and don't have any entity that will act on those timeouts. In the next patch we will wire that up. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Glauber Costa	80c4a211d8	consolidate timeout_clock At the moment, various different subsystems use their different ideas of what a timeout_clock is. This makes it a bit harder to pass timeouts between them because although most are actually a lowres_clock, that is not guaranteed to be the case. As a matter of fact, the timeout for restricted reads is expressed as nanoseconds, which is not a valid duration in the lowres_clock. As a first step towards fixing this, we'll consolidate all of the existing timeout_clocks in one, now called db::timeout_clock. Other things that tend to be expressed in terms of that clock--like the fact that the maximum time_point means no timeout and a semaphore that wait()s with that resolution are also moved to the common header. In the upcoming patch we will fix the restricted reader timeouts to be expressed in terms of the new timeout_clock. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Paweł Dziepak	4dfddc97c7	db/schema_tables: do not use moved from shared pointer Shared pointer view is captured by two continuations, one of which is moving it away. Using do_with() solves the problem. Fixes #3092. Message-Id: <20171221111614.16208-1-pdziepak@scylladb.com>	2017-12-21 15:13:25 +01:00
George Tavares	ceecd542cd	db/view: Consume updated rows regardless of static row Using Materialized Views, if the base table has static columns, and the update in the base table mutates static and non-static rows, the streamed_mutation is stopped before the non-static row is processed. The patch avoids stopping the streamed_mutation and adds a test case. Message-Id: <20171220173434.25091-1-tavares.george@gmail.com>	2017-12-21 00:49:15 +01:00
Raphael S. Carvalho	928beae242	Fix compilation of db/hints/manager.cc and row_cache.cc compiler: gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1) Problems introduced in `f6a461c7a4` and `37b19ae6ba`, respectively. They both fail to compile due to use of method in lambda without explicit mention of this. Some of failure is fixed by not using auto in lambda parameter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171218222144.12297-1-raphaelsc@scylladb.com>	2017-12-19 11:15:45 +01:00
Nadav Har'El	101cce3c79	Fix compilation of tests/commitlog_test.cc In commit `878d58d23a`, a new parameter was added to commitlog::descriptor. The commit message says that "It's default value is a descriptor::FILENAME_PREFIX." while in reality, it did not have a default value and compilation of tests/commitlog_test.cc broke, because it didn't specify a value. So this patch adds a default value for this parameter, as was suggested by the original commit message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20171218131020.17883-1-nyh@scylladb.com>	2017-12-18 15:35:34 +02:00
Vlad Zolotarov	c2296c9575	config: add hints related options - hints_directory: - This option allows defining of the directory where hints files are going to be stored if hinted handoff is enabled. - hinted_handoff_enabled: - May receive either a boolean value or a list of DCs. In the latter case this will define the DCs to which Nodes hints are going to be generated. - max_hint_window_in_ms: - Maximum amount of milliseconds the hints are going to be generated to the Node that is DOWN. After this time period the hints are no longer going to be generated until the Node is seen UP. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:08:11 -05:00
Vlad Zolotarov	51bbf18c08	db::hints::manager: initial commit Currently implemented: - Hints generation: db::hints::manager::store_hint(...). - Sending: db::hints::manager::on_timer(). TODO: - Resharding. - Node decommission. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:08:07 -05:00
Vlad Zolotarov	ec15d60a2d	db::commitlog::replay_position: added std::hash<replay_position> It's needed for hinted handoff. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:48 -05:00
Vlad Zolotarov	af70c0a709	db::commitlog: truncate segments to their actual sizes during shutdown Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:48 -05:00
Vlad Zolotarov	033af6c950	db::commitlog: allow defining a metrics category name Add a new field to db::commitlog::config that would define the metrics category name. If not given - metrics are not going to be registered. Set it to "commitlog" in db::commitlog::config(const db::config&). Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Vlad Zolotarov	878d58d23a	db/commitlog/commitlog::descriptor: add a filename_prefix parameter This parameter is used when creating a new segment. Its default value is a descriptor::FILENAME_PREFIX. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Vlad Zolotarov	719b1fb24f	db::commitlog::descriptor::descriptor(filename): pass a filename as a const ref Avoid not needed copy by passing a file name as a reference. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:05:47 -05:00
Michael Munday	5158b3f484	utils::crc: introduce process_le/be(T) methods Replace the oblique process(T) overloads for integer types with explicit process_le/be(T) methods that would interpret the given integer as a stream of bytes using the corresponding endianness. For instance process_le(0x11223344) would treat this integer as the following array of bytes: {0x44, 0x33, 0x22, 0x11}. process_be(0x11223344) on the other hand would treat this integer as if it's {0x11, 0x22, 0x33, 0x44}. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-08 10:12:21 -05:00
Gleb Natapov	357c77a333	consistency_level: constify quorum_for() and local_quorum_for()	2017-12-05 13:01:20 +02:00
Paweł Dziepak	586b61d57d	size_estimates: convert reader to flat mutation readers Message-Id: <20171129105909.27084-1-pdziepak@scylladb.com>	2017-11-29 12:14:05 +00:00
Jesse Haber-Kucharsky	460f3c7065	auth: Add dormant role manager to `service` The role manager still does not interact with the rest of the system, but the role manager is now sharded on all cores and metadata is created. The following metadata tables are created: - `system_auth.roles` - `system_auth.role_members` The default superuser, "cassandra", is also created, but has no function.	2017-11-27 12:14:24 -05:00
Pekka Enberg	0c192c835c	cql3: Fix 'DROP INDEX' to also drop index view This patch fixes 'DROP INDEX' CQL statement to also drop the underlying index view automatically so that we don't leave unused materialized views behind. Message-Id: <1510303421-15945-1-git-send-email-penberg@scylladb.com>	2017-11-10 10:52:08 +01:00
Calle Wilund	959d729428	config: Resurrect command line aliases that were lost	2017-11-06 09:54:46 +00:00
Avi Kivity	d6cd44a725	Revert "Merge 'Single key sstable reader optimization' from Botond" This reverts commit `5e9cd128ad`, reversing changes made to `1f4e6759a7`. Tomek found some serious issues.	2017-10-19 12:47:21 +03:00
Botond Dénes	08502f2d48	Add single_key_parallel_scan_threshold option This option regulates when exactly the single-key optimization is considered ineffective and turned off. The threshold is the proportion of the extra data source candidates that can be read before the optimization is considered ineffective and disabled. The proportion is calculated as follows: (read_data_sources - 1) / (total_data_sources - 1) We subtract 1 from the read_data_sources and total_data_sources to effectively measure the rate of extra data sources we read. This makes sure that the proportion is meaningful even if e.g. we only have a total of 2 data-sources and we read only 1 (best case). Whenever this number goes above the threshold the optimization is disabled. The threshold is a number between 0 and 1, 0 forces the optimization off and 1 forces it on. Increase the threshold to favor throughput over latency for single-row reads, decrease the threshold to improve latency at the expense of throughput. If the threshold is > 0 (it's not force disabled) and the optimization is disabled due to a read crossing the threshold, we will issue "probing" reads (every 100th read) to determine if the optimization is worth re-enabling. Probing reads are allowed to run through the optimization path and if they go below the threshold the optimization is re-enabled.	2017-10-18 17:24:03 +03:00
Calle Wilund	4bd98f7296	db::config: Re-implement on utils/config_file. Re-use config abstraction, and de-couple the seastar logging parts a little bit more.	2017-10-18 00:51:54 +00:00
Duarte Nunes	baeec0935f	Replace query::full_slice with schema::full_slice() query::full_slice doesn't select any regular or static columns, which is at odds with the expectations of its users. This patch replaces it with the schema::full_slice() version. Refs #2885 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1507732800-9448-2-git-send-email-duarte@scylladb.com>	2017-10-17 11:25:53 +02:00
Duarte Nunes	a011eb72c2	Merge branch 'CQL secondary index backing views' from Pekka "This patch series adds backing materialized view for secondary indices. When a new index is created with the 'CREATE INDEX' statement, a backing materialized view is created automatically. For example, assuming the following table: CREATE TABLE ks1.users ( userid uuid, email text, PRIMARY KEY (userid) ); When the following index is created: CREATE INDEX user_email ON ks1.users (email); The following materialized view is also created: cqlsh> DESCRIBE ks1.users; <snip> CREATE MATERIALIZED VIEW ks1.user_email_index AS SELECT email, userid FROM ks1.users WHERE email IS NOT NULL PRIMARY KEY (email, userid) WITH CLUSTERING ORDER BY (userid ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'} AND comment = '' AND compaction = {'class': 'SizeTieredCompactionStrategy'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1.0 AND dclocal_read_repair_chance = 0.1 AND default_time_to_live = 0 AND gc_grace_seconds = 864000 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND read_repair_chance = 0.0 AND speculative_retry = '99.0PERCENTILE'; CQL queries will use the backing materialized view as part of queries on indexed columns to fetch the primary keys." 
* 'penberg/cql-2i-backing-view/v3' of github.com:scylladb/seastar-dev: schema_tables: Create backing view for indices database: Kill obsolete secondary index manager stub cql3: Wire up secondary index manager cql3/restrictions: Add term_slice::is_supported_by() function index: Add secondary_index_manager::create_view_for_index() index: Add target_parser::parse() helper cql3/statements: Add index_target::from_sstring() helper index: Add secondary_index_manager::get_dependent_indices() index: Add secondary_index_manager::reload() index: Add secondary_index_manager::list_indexes() index: Add index class index: Pass column_family to secondary_index_manager constructor database: Make secondary index manager per-column family	2017-10-05 12:08:14 +01:00
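
The threshold arithmetic described in commit 08502f2d48 (single_key_parallel_scan_threshold) can be illustrated with a small sketch. This is not the code from that commit: the function names `extra_read_proportion` and `keep_optimization` are hypothetical, and the handling of a single data source (returning 0) is an assumption, since the commit message does not cover that case. Probing reads are left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a Scylla API: the share of *extra* data sources
// a read touched, i.e. (read_data_sources - 1) / (total_data_sources - 1).
double extra_read_proportion(std::size_t read_data_sources,
                             std::size_t total_data_sources) {
    assert(read_data_sources >= 1 && read_data_sources <= total_data_sources);
    if (total_data_sources == 1) {
        return 0.0;  // assumption: with one source there are no extra sources
    }
    return static_cast<double>(read_data_sources - 1) /
           static_cast<double>(total_data_sources - 1);
}

// Hypothetical decision rule following the commit message: a threshold of 0
// forces the optimization off, 1 forces it on, and in between it is disabled
// once the proportion rises above the threshold.
bool keep_optimization(double proportion, double threshold) {
    if (threshold <= 0.0) {
        return false;
    }
    if (threshold >= 1.0) {
        return true;
    }
    return proportion <= threshold;
}
```

With 2 data sources in total and 1 read (the best case named in the message), the proportion is 0, so the optimization stays enabled for any threshold above 0.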

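The byte orders named in commit 5158b3f484 (process_le/be) can be shown with a minimal sketch. The helpers `bytes_le` and `bytes_be` are hypothetical and only produce the byte sequences that the commit message describes; they are not the utils::crc methods themselves, whose signatures are not shown in this log.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: the bytes of v in little-endian order
// (least significant byte first).
std::array<std::uint8_t, 4> bytes_le(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v),
            static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 24)};
}

// Hypothetical helper: the bytes of v in big-endian order
// (most significant byte first).
std::array<std::uint8_t, 4> bytes_be(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v >> 24),
            static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v)};
}
```

For 0x11223344, `bytes_le` yields {0x44, 0x33, 0x22, 0x11} and `bytes_be` yields {0x11, 0x22, 0x33, 0x44}, matching the example in the commit message. Because the helpers use shifts rather than memory reinterpretation, the result does not depend on the host's endianness.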

994 Commits