scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Asias He	7f826d3343	streaming: Expose reason for streaming On receiving a mutation_fragment or a mutation triggered by a streaming operation, we pass an enum stream_reason to notify the receiver what the streaming is used for. So the receiver can decide further operation, e.g., send view updates, beyond applying the streaming data on disk. Fixes #3276 Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>	2018-10-15 22:03:28 +01:00
Nadav Har'El	36a657fc10	schema: persist "view virtual" columns to a separate system table In the previous patch, we added a "view virtual" flag on columns. In this patch we add persistance to this flag: I.e., writing it to the on-disk schema table and reading it back on startup. But the implementation is not as simple as adding a flag: In the on-disk system tables, we have a "columns" table listing all the columns in the database and their types. Cqlsh's "DESCRIBE MATERIALIZED VIEW" works by reading this "columns" table, and listing all of the requested view's columns. Therefore, we cannot add "virtual columns" - which are columns not added by the user and not intended to be seen - to this list. We therefore need to create in this patch a separate list for virtual columns, in a new table "view_virtual_columns". This table is essentially identical to the existing "columns" table, just separate. We need to write each column to the appropriate table (columns with the view_virtual flag to "view_virtual_columns", columns without it to the old "columns"), read from both on startup, and remember to delete columns from both when a table is dropped. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-08-16 15:30:06 +03:00
Asias He	fd71c5718f	gossip: Reduce continuous memory usage Gossip SYN and ACK uses std::vector to store a list of gossip_digest, the larger the cluster, the more continuous memory is needed. To reduce the memory pressure which might cause std::bad_alloc, switch the std::vector to chunked_vector. In addition, change add_local_application_state to use std::list instead of std::vector. Refs #2782	2018-07-17 20:15:32 +08:00
Paweł Dziepak	ed12555192	idl: add idl description of frozen_mutation_fragments	2018-05-25 10:15:10 +01:00
Paweł Dziepak	aa4e589ace	frozen_mutation: introduce frozen_mutation_fragment This patch introduces IDL definition as well as serialisers and deserialisers for freezing mutation_fragment so that they can be transferred between nodes in a cluster.	2018-05-25 10:15:10 +01:00
Paweł Dziepak	b2e9491728	tests/idl: test variant being the first member of a structure	2018-05-25 10:15:10 +01:00
Paweł Dziepak	d731cf427d	tests/idl: test serialising and deserialising empty structures	2018-05-25 10:15:10 +01:00
Botond Dénes	ddd70dc113	Use dht::token_range alias for last/preferred replicas Use the pre-existing type alias instead of fully spelling out the type everywhere.	2018-05-10 06:22:39 +03:00
Botond Dénes	b55dcc2ce5	Add query_read_repair_decision to paging-state This new field will store the repair-decision made on the first page of the query. This decision will be sticky to all pages of the query. In mixed clusters the decision might not happen on the first page and it might even change during the query as old coordinators will not store nor respect the decision.	2018-03-19 15:17:31 +02:00
Botond Dénes	f281b3e923	Add last_replicas to paging_state Helps paged queries consistently hit the same replicas for each subsequent page. Replicas that already served a page will keep the readers used for filling it around in a cache. Subsequent page request hitting the same replicas can reuse these readers to fill the pages avoiding the work of creating these readers from scratch on every page. In a mixed cluster older coordinators will ignore this value. The value of last_replicas may change between pages as nodes may become available/unavailable or the coordinator may decide to send the read requests to different replicas at its discretion. Replicas are identified by an opaque uuid which should only make sense to the storage-proxy.	2018-03-13 10:34:34 +02:00
Nadav Har'El	fa284f6307	Add query UUID to read command This patch adds the parameter to read_command which is needed for caching of readers during multiple pages of a paged queries, which we will introduce in the next patches. The query_uuid is a UUID of a previously saved reader, which the replica is now asked to recall and resume (if this saved reader is no longer in the cache, it is fine, a new reader will be started). Additionally a helper flag is_first_page is added so that the replica can avoid doing any cache lookups (and incrementing miss counters) for the first page. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-03-13 10:34:34 +02:00
Nadav Har'El	ec7c56d18a	Add query UUID to paging state This patch adds to the "paging_state", the opaque cookie that clients are supposed to provide when asking for the next page on a paged query, a unique id field. This new field will be used to tell that a new request for a page really continues the previous page, and doesn't just by chance start at the same position the previous page stopped. We need to support setups with mixed versions - a client may get a paging state from a coordinator running a new version of Scylla and send it to a different coordinator running an old version - or vice versa. So the new uuid field is set up to have a default uuid of UUID() (a recognizable invalid uuid 0), so new versions receiving no uuid from an old version will set this invalid uuid, and old versions receiving a uuid from a new version will simply ignore it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2018-03-13 10:34:34 +02:00
Duarte Nunes	0bab3e59c2	service/storage_service: Add and use xxhash feature We add a cluster feature that informs whether the xxHash algorithm is supported, and allow nodes to switch to it. We use a cluster feature because older versions are not ready to receive a different digest algorithm than MD5 when answering a data request. If we ever should add a new hash algorithm, we would also need to add a new cluster feature for that algorithm. The alternative would be to add code so a coordinator could negotiate what digest algorithm to use with the set of replicas it is contacting. Fixes #2884 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	3b9a9b7321	query-result: Send row and partition count over the wire To avoid calculating them on the coordinator side. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 10:29:06 +02:00
Tomasz Grabiec	cdf5b67522	schema_tables: Introduce system_schema.scylla_tables It will be used to store Scylla spcific table metadata. We cannot store it in the standard "tables" table for compatibility reasons - Cassandra will fail to read schema if it encounteres columns it is not expecting.	2017-07-11 14:52:23 +02:00
Pekka Enberg	8112d7c5c0	idl: Fix frozen_schema version numbers The IDL changes will appear in 2.0 so fix up the version numbers. Message-Id: <1499680669-6757-1-git-send-email-penberg@scylladb.com>	2017-07-10 14:02:20 +03:00
Gleb Natapov	fab18c0c5a	database: introduce cache_temperature class The class will represent cache hit rate for a column family and is serializable for use with RPC.	2017-06-13 09:57:14 +03:00
Avi Kivity	c4faa1e202	Merge "tracing: tracing spans and time series helper table" from Vlad " - Introduce a parent span IP and span ID paradigm. - Introduce time series tables to simplify traces processing. - Add the "How to get traces?" chapter to the tracing.md. " * 'tracing-span-ids-and-time-series-helpers-v4' of github.com:cloudius-systems/seastar-dev: docs: tracing.md: add a "how to get traces" chapter tracing::trace_keyspace_helper: introduce a time series helper tables tracing: cleanup: use nullptr instead of trace_state_ptr() tracing: introduce a span ID and parent span ID	2017-05-28 12:01:35 +03:00
Calle Wilund	6c8b5fc09d	schema_tables: Use v3 schema tables and formats Switches system/schema_* for system_schema/*, updates schema/schema builder and uses to hold/expect v3 style info (i.e. types & dropped).	2017-05-10 16:44:48 +00:00
Avi Kivity	8c5c5d3004	Merge "CQL front-end for secondary indices" from Pekka "This patch series adds CQL front-end support for secondary indices. You can now execute CREATE INDEX and DROP INDEX statements, which will update the newly added "Indexes" system table. However, the indexes are not actually backed up by anything nor are they available for CQL queries. The feature is hidden behind a new cluster feature flag and enabled only with the "--experimental" flag." * 'penberg/cql-2i/v2' of github.com:cloudius-systems/seastar-dev: (34 commits) schema: Kill index_type enum schema: Kill index_info class cql3/statements/create_index_statement: Use database::existing_index_names() in validation cql3/statements: Use secondary index manager in alter_table_statement class index: Add secondary_index_manager thrift/handler: Use index_metadata db/schema_tables: Index persistence schema: Add all_indices() to schema class schema: Remove add_default_index_names() from schema_builder class db/schema_tables: Add system table for indices cql3/Cgl.g: DROP INDEX cql3/statements: Add drop_index_statement class database: Add find_indexed_table() to database class cql3: Return change event from announce_migration() cql3/statements: Multiple index targets for CREATE INDEX cql3/statements: Use index_metadata in create_index_statement class cql3/statements: Use feature flag in create_index_statement class service/storage_service: Add feature flag for secondary indices database: Add get_available_index_name() to database class schema: Add get_default_index_name() to index_metadata class ...	2017-05-08 17:04:40 +03:00
Pekka Enberg	11474ed4c6	db/schema_tables: Index persistence	2017-05-08 10:03:28 +03:00
Vlad Zolotarov	b0f660331a	tracing: introduce a span ID and parent span ID This patch makes the tracing framework follow the general idea of Google's Dapper paper: traces generated in a context of the same query are forming a single-rooted acyclic tree where in a ScyllaDB case vertexes are spans running on each involved replica Node and edges are RPCs sent from one Node to another. - Each vertex in the tree above has an ID - "span ID". - In order to be able to build the tree from the sessions traces we need to know the parent "span ID" - the ID of a span that sent an RPC that created the current span. - Each span of a tracing session is given a 64-bit random span ID. - The root span has a span_id::illegal_id value. This patch adds: - The described above parent span ID and a span ID to the one_session_records object. - The current span ID is passed in the trace_info struct to the remote replica. - Add parent_id and span_id columns to system_traces.events table for the parent ID and span ID. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-04-25 21:52:23 -04:00
Duarte Nunes	4e693383f7	mutation_partion: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialzied view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though its dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Duarte Nunes	8cc29f84fb	idl-compiler: Support optional fields in views When generating view code, the compiler was ignoring optional fields. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Paweł Dziepak	374c8a56ac	commitlog: avoid copying column_mapping It is safe to copy column_mapping accros shards. Such guarantee comes at the cost of performance. This patch makes commitlog_entry_writer use IDL generated writer to serialise commitlog_entry so that column_mapping is not copied. This also simplifies commitlog_entry itself. Performance difference tested with: perf_simple_query -c4 --write --duration 60 (medians) before after diff write 79434.35 89247.54 +12.3%	2017-02-27 17:05:58 +00:00
Paweł Dziepak	9989239c97	idl: add idl description of consistency level	2017-02-02 10:35:14 +00:00
Paweł Dziepak	9f1ebd4f7c	idl/mutation: add counter serialisation logic	2017-02-02 10:35:14 +00:00
Paweł Dziepak	b8e29cc99c	idl: is_short_read() was added in 1.6	2016-12-22 13:35:04 +01:00
Duarte Nunes	19a76a82e8	frozen_schema: Support view schemas This patch allows a view schema to be frozen. To unfreeze such a schema, we add an is_view attribute to the schema idl. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-20 13:06:11 +00:00
Paweł Dziepak	43fe3439ca	reconcilable_result: properly propagate short_read flag reconcilable_result can be merged with another or transformed into query::result. Make sure that short_read information is never lost. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:02 +00:00
Paweł Dziepak	da7ca85040	query: allow short reads When paging is used the cluster is allowed to return less rows than the client asked for. However, if such possibility is used we need a way of telling that to the coordinator and the paging implementation so that they can differentiate between short reads caused by the replica running out of data to sent and short reads caused by any other means. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:10:01 +00:00
Avi Kivity	18078bea9b	storage_proxy: avoid calculating digest when only one replica is contacted If we're talking to just one replica, the digest is not going to be used, so better not to calculate it at all. The optimization helps with LOCAL_ONE queries where the result is large, but does not contain large blobs (many small rows). This patch adds a digest_algorithm parameter to the READ_DATA verb that can take on two values: none and MD5 (default), and sets it to none when we're reading from one replica. In the future we may add other values for more hardware-friendly digest algorithms. Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>	2016-11-17 13:04:30 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Vlad Zolotarov	a491ac0f18	tracing: introduce a log_slow_query logic The main idea is to log queries that take "too long" to complete. The "too long" is above the given threshold. To achieve the above this patch does the following: - Introduce two new properties to the tracing::trace_state: - "Full tracing": when the tracing of this query was explicitly requested. In this state we will record all possible traces related to this query: both on the coordinator and on any replica involved. - "Log slow query": when slow query logging is enabled. If slow query logging is enabled and a session's "duration" is above the specified threshold we will create a record in the "slow queries log" and write all trace records created on the coordinator and on a replica if a replica's session lasts longer than that threshold. (We will propagate the Coordinator's slow query logging threshold to replicas in the context of a specific tracing/logging session). The properties above are independent, namely they may be enabled and/or disabled independently and any combination of them is legal (naturally, creating a tracing session when both states above are disabled makes no sense). - Instrument the tracing::tracing service to allow the following: - Enable/disable slow query logging. - Set/get the slow query duration threshold (in microseconds). - Set/get the slow query log record TTL value (in seconds). - Instrument the trace_keyspace_helper to write a slow query log entry when requested. - The slow query logging is disabled by default and the threshold is set to half a second. - The TTL of a slow log record is set to 86400 seconds by default. - It makes sense to use the same "slow query logging threshold" and a "slow query record TTL" both on a coordinator and on a replica Nodes in a context of the same tracing session: - Pass both TTL and a threshold to the replica in a trace_info. This patch also implements the new slow query logging specific logic: - Don't write the pending tracing records before the end of a tracing session until "duration" reaches the logging threshold. - Don't build the parameters<sstring, sstring> map unless we know we will write it to I/O. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-28 18:28:44 +03:00
Vlad Zolotarov	8609900621	tracing: introduce trace_state capabilities bit field - Instead of keeping separate booleans introduce a trace_state_props_set enum_set and pass it around instead of separate booleans. - Change the trace_info to hold this value in addition to write_on_close. Initialize a corresponding bit in an enum_set based on a write_on_close value in a trace_info constructor for a backward compatibility. - Separate a trace_state constructor into two: - For a primary session object. - For a secondary session object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 18:34:36 +03:00
Paweł Dziepak	dcf794b04d	idl: make bytes compatible with bytes_ostream This patch makes idl type "bytes" compatible with both bytes and bytes_ostream. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-08-22 09:31:33 +01:00
Duarte Nunes	5161ea283f	query: query::clustering_range can't wrap around This patch changes the type of query::clustering_range to express that ranges that wrap around are not allowed, and ranges that have the start bound after the end bound are considered empty. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:50:20 +00:00
Tomasz Grabiec	d3658b33da	tests: Add test for skip() not doing full deserialization	2016-07-25 17:35:42 +02:00
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In a i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Duarte Nunes	21d0a2c764	query: Optionally send cell ttl This patch adds support to send a cell's ttl as part of a query's result. This is needed for thrift support. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-14 15:36:23 +02:00
Paweł Dziepak	7e06499458	repair: convert hashing to streamed_mutations This patch makes hashing for repair calculate checksums in a way that doesn't require rebuilding whole mutation. Unfortunately, such checksums are incompatible with the old ones so the old way for computing checksums is preserved for compatibility reasons. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Duarte Nunes	69798df95e	query: Limit number of partitions returned This is required to implement a thrift verb. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:13 +02:00
Duarte Nunes	01b18063ea	query: Add per-partition row limit This patch as a per-partition row limit. It ensures both local queries and the reconciliation logic abide by this limit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:46:51 +02:00
Duarte Nunes	6a111fdd01	mutations: Introduce the range_tombstone class This patch introduces the range_tombstone class, composed of a [start, end] pair of clustering_key_prefixes, the type of inclusiveness of each bound, and a tombstone. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:58 +02:00
Duarte Nunes	7f8c35dd8c	idl: Add range tombstone IDL This patch adds the range tombstone IDL, preserving backwards compatibility. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Duarte Nunes	e2812c1b7a	idl: Rename range_tombstone::key to start ... and make it a clustering_key_prefix, in preparation of supporting not-whole-row range tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:36 +02:00
Vlad Zolotarov	6e26909b02	query::read_command: add an optional trace_info field Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:19 +03:00
Vlad Zolotarov	a53d329b25	tracing: add a serializable trace_info object tracing::trace_info is used to pass the tracing information between nodes. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:16:53 +03:00
Gleb Natapov	1e6f64f4ab	query: add latest modification timestamp to result structure	2016-05-24 13:27:34 +03:00
Asias He	a6080773b3	gossip: Add SUPPORTED_FEATURES application_state It is used to negotiate cluster wide features.	2016-04-06 07:12:34 +08:00

1 2

80 Commits