scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-04 22:13:19 +00:00

Author	SHA1	Message	Date
Duarte Nunes	33e18a1779	db/schema_tables: Consider differing dropped columns If a node is notified of a schema change where the schema's dropped columns have changed, that node will miss the changes to the dropped columns. A scenario where this can happen is where a column c is dropped, then added with a different type, and then dropped again, with a node n having seen the first drop and being notified of the subsequent add and drop. Fixes #2616 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170725170622.4380-1-duarte@scylladb.com>	2017-07-26 11:59:34 +02:00
Tomasz Grabiec	ecc85988dd	legacy_schema_migrator: Don't snapshot empty legacy tables Otherwise we will create a new (empty) snapshot each time we boot. Message-Id: <1500573920-31478-2-git-send-email-tgrabiec@scylladb.com>	2017-07-21 16:56:31 +02:00
Duarte Nunes	937fe80a1a	Merge 'Fix possible inconsistency of table schema version' from Tomasz "Fixes issues uncovered in longevity test (#2608). Main problem is that due to time drift scylla_tables.version column may not get deleted on all nodes doing the schema merge, which will make some nodes come up with different table schema version than others. The inconsistency will not heal because scylla_tables doesn't take part in the schema sync. This is fixed by the last patch. This will cause nodes to constantly try to sync the schema, which under some conditions triggers #2617." * tag 'tgrabiec/fix-table-schema-version-inconsistency-v1' of github.com:scylladb/seastar-dev: schema_tables: Add scylla_tables to ALL schema: Make schema_mutations equality consistent with digest schema_tables: Extract compact_for_schema_digest() schema_tables: Always drop scylla_tables::version	2017-07-21 16:55:23 +02:00
Duarte Nunes	7eecda3a61	schema: Support compaction enabled attribute Fixes #2547 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170721132206.3037-1-duarte@scylladb.com>	2017-07-21 15:38:45 +02:00
Tomasz Grabiec	ed2388da2c	schema_tables: Add scylla_tables to ALL So that scylla_tables takes part in the digest and in mutations sent as part of schema sync. Otherwise inconsistencies in scylla_tables will not heal. Refs #2608.	2017-07-20 15:47:10 +02:00
Tomasz Grabiec	6adbe61e2f	schema_tables: Extract compact_for_schema_digest()	2017-07-20 15:47:10 +02:00
Tomasz Grabiec	1b85c316bf	schema_tables: Always drop scylla_tables::version It can happen that due to time drift between nodes, the incoming "version" cell will have higher timestamp than api::new_timestamp(). In such case the column would not be dropped and would cause version mismatch between nodes. Ensure it's always covered by using max of current time and cell's timestamp. Refs #2608.	2017-07-20 15:47:10 +02:00
Avi Kivity	c5ee62a6a4	Merge "restrict background writers with scheduling groups" from Glauber "This patchset restricts background writers - such as compactions, streaming flushes and memtable flushes - to a maximum amount of CPU usage through a seastar::thread_scheduling_group. The recommended maximum is 50 %. The restriction is disabled by default, but can be adjusted through a configuration option until we are able to auto-tune this. The second patch in this series provides a preview of what such auto-tuning would look like. By implementing a simple controller we automatically adjust the quota for the memtable writer processes, so that the rate at which bytes come in is equal to the rate at which bytes are flushed. Tail latencies are greatly reduced by this series, and heavy spikes that previously appeared on CPU-bound workloads are no more." * 'memtable-controller-v5' of https://github.com/glommer/scylla: simple controller for memtable/streaming writer shares. restrict background writers to 50 % of CPU.	2017-07-20 10:58:53 +03:00
Calle Wilund	7a583585a2	system_keyspace: Make sure "system" is written to keyspaces (visible) Fixes #2514 Bug in schema version 3 update: We failed to write "system" to the schema tables. Only visible on an empty instance of course. Message-Id: <1500469809-23546-2-git-send-email-calle@scylladb.com>	2017-07-19 16:18:56 +03:00
Calle Wilund	247c36e048	system_schema: Fix remaining places not handling two system keyspaces Some places remained where code looked directly at system_keyspace::NAME to determine iff a ks is considered special/system/protected. Including schema digest calculation. Export "is_system_keyspace" and use accordingly. Message-Id: <1500469809-23546-1-git-send-email-calle@scylladb.com>	2017-07-19 16:18:45 +03:00
Duarte Nunes	1daf1bc4bb	Merge 'Revert back to 1.7 schema layout in memory' from Tomasz "Fixes schema layout incompatibility in a mixed 1.7 and 2.0 cluster (#2555) by reverting back to using the old layout in memory and thus also in across-node requests. We still use the new v3 layout in schema tables (needed by drivers and external tools). Translations happen when converting to/from schema mutations." * tag 'tgrabiec/use-v2-schema-layout-in-memory-v2' of github.com:scylladb/seastar-dev: schema: Revert back to the 1.7 layout of static compact tables in memory schema: Use v3 column layout when converting to/from schema mutations schema: Encapsulate column layout translations in the v3_columns class	2017-07-19 12:52:52 +02:00
Duarte Nunes	115ff1095e	db/view: Use view schema for view pk operations Instead of base schema. Fixes #2504 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170718190703.12972-1-duarte@scylladb.com>	2017-07-19 09:59:34 +02:00
Tomasz Grabiec	a9237c1666	schema: Revert back to the 1.7 layout of static compact tables in memory We are using C* 3.x compatible layout in schema tables but want to keep using the 1.7 layout in memory for compatibility during rolling upgrade. This patch switches the schema and schema_builder classes back to the old layout. Translation of layout happens when converting to/from schema mutations. Notable changes: 1) Includes a revert of commit `6260f31e08` "thrift: Update CQL mapping of static CFs". 2) Brings back the "default_validation_class" schema attribute. In v3 it can be derived from column definitions, but in v2 it can't, so we have to store it. 3) legacy_schema_migrator and schema_builder don't have to do conversions to v3, this is now handled by the v3_columns class. schema_builder works with the same layout as schema, that is v2. 4) Includes a revert of commit `66991a7ccb` "v3 schema test fixes" Fixes #2555.	2017-07-19 09:52:15 +02:00
Tomasz Grabiec	dc2dc056a4	schema: Use v3 column layout when converting to/from schema mutations	2017-07-19 09:52:15 +02:00
Glauber Costa	c9a529ebee	simple controller for memtable/streaming writer shares. This patch introduces a simple controller that will adjust memtables CPU shares, trying to keep it around the soft limit: if we start going below it means we're too fast (unless we are idle) and shares are adjusted downwards. If we start going above it means we're too slow and shares are adjusted upwards. I have tested this extensively in a single-CPU setup with various CPU-bound workloads while tracking virtual dirty and the results are good, with virtual dirty fluctuating only slightly, somewhere within the desired range. Exceptions to this are: 1) when the load is very light - the idle system goes faster, and that's ok 2) when the load is very high - as foreground requests dominate we can't flush fast enough and hit the hard limit. However, in such scenarios the memtable shares do hit their maximum, and the results are no worse than they are right now and this will only be fixed by CPU-limiting the actual requests. This feature can be disabled with a config option - that is scheduled to go away as we acquire more confidence in this. When the feature is disabled, all background writers (streaming, compaction, memtables) will share the same scheduling group, with static quotas. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-07-18 23:35:47 -04:00
Glauber Costa	4f01ec0910	restrict background writers to 50 % of CPU. In scylla, we have foreground processes, which are latency sensitive and need to be responded to as fast as possible in order to maintain good latency profiles, and background processes, which are less so. The most important background processes we have during normal write workload operations are memtable writes and sstable compactions. Those processes are quite CPU-intensive, and left unchecked will easily dominate the CPU. Lower values of task-quota usually help, as it will force those processes to preempt more, but aren't enough to guarantee good isolation. We have seen boxes with good NVMe storage having their throughput reduced to less than half of the original baseline in a short dive down for the duration of a compaction. In the long run, our goal is to leverage the CPU scheduler to make sure that those processes are balanced with respect to all the others. However, the current state of affairs is causing grievances at this very moment. Thankfully, those processes live in a seastar::thread, that ships with its own rudimentary bandwidth control mechanism: the scheduling group. The goal of this patch is to wrap background processes together in a scheduling group, and assign to such group 50 % of our CPU power; the remainder being left to foreground processes. While we pride ourselves in dynamically adjusting things to the workload, we won't be able to do this properly before the CPU scheduler lands - and let's face it, leaving background processes run wild is not adaptive either. Every workload would benefit most from a different value for such shares, but 50 % is as fair as it gets if we really need static partitioning in the meantime. As a defense against unforeseen consequences, we'll leave the actual value as an option, but will do our best to hide it - as this is not a tunable that we want to be part of a normal Scylla setup.
The most convenient place for this tunable is still db::config, so we can easily pass it down to the database layer - but we will not document it in the yaml, and will clearly note in the help string that it is not supposed to be tuned. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-07-18 23:35:33 -04:00
Asias He	adc5f0bd21	gossip: Implement the missing fd_max_interval_ms and fd_initial_value_ms options They are useful for larger clusters with larger gossip message latency. By default the fd_max_interval_ms is 2 seconds which means the failure_detector will ignore any gossip message update interval larger than 2 seconds. However, in larger clusters, the gossip message update interval can be larger than 2 seconds. Fixes #2603. Message-Id: <49b387955fbf439e49f22e109723d3a19d11a1b9.1500278434.git.asias@scylladb.com>	2017-07-17 13:29:16 +03:00
Duarte Nunes	13caccf1cf	Merge 'Fixes around migration to v3 schema tables' from Tomasz branch 'tgrabiec/schema-migration-fixes' of github.com:scylladb/seastar-dev: schema: Use proper name comparator legacy_schema_migrator: Properly migrate non-UTF8 named columns schema_tables: Store column_name in text form legacy_schema_migrator: Migrate columns like Cassandra schema_builder: Add factory method for default_names legacy_schema_migrator: Simplify logic thrift: Don't set regular_column_name_type schema: Use proper column name type for static columns schema: Fix column_name_type() for static compact tables schema: Introduce clustering_column_at() thrift: Reuse cell_comparator::to_sstring() for obtaining comparator type partition_slice_builder: Use proper column's type instead of regular_column_name_type()	2017-07-17 11:16:52 +02:00
Tomasz Grabiec	7e54290d38	legacy_schema_migrator: Properly migrate non-UTF8 named columns Currently migrator assumed all columns are utf8-named, which doesn't have to be the case for static compact tables. Refs #2597. Due to #2573, we can assume that Scylla wasn't used with non-utf8 column names, and that old names are always in textual form.	2017-07-17 09:40:06 +02:00
Tomasz Grabiec	60a76efd37	schema_tables: Store column_name in text form That's how it is stored by Cassandra. Refs #2597.	2017-07-17 09:40:06 +02:00
Tomasz Grabiec	61229a7536	legacy_schema_migrator: Migrate columns like Cassandra This fixes generation of synthetic columns for static compact tables. Current code always generates synthetic clustering column with utf8 type and synthetic regular column with bytes type (in schema_builder). That's fine when creating a new CQL table, but not when migrating existing tables created via thrift API. Fixes #2584. This also migrates empty compact value columns like Cassandra does. Such columns are present in compact tables without regular columns, e.g.: create table test (k int, ck int, primary key (k, ck)) with compact storage; They should be migrated to a synthetic regular column with empty_type type and a non-empty name.	2017-07-17 09:40:06 +02:00
Tomasz Grabiec	6dc299c27a	legacy_schema_migrator: Simplify logic The expression "is_dense.value_or(true)" is always true inside the if, so drop it. This allows us to drop temporary calulated_is_dense. We can also get rid of one of the if branches by extracting builder.set_is_dense() outside.	2017-07-17 09:40:06 +02:00
Vlad Zolotarov	45e23d8090	db::config: fix the permissions cache related parameters description Make the descriptions of permissions_validity_in_ms, permissions_update_interval_in_ms and permissions_cache_max_entries more readable and more related to what they really do. Mention the non-zero value requirement for the permissions_update_interval_in_ms and the permissions_cache_max_entries when the permissions cache is enabled. Adjust the parameters description in the scylla.yaml too. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1499957053-31792-1-git-send-email-vladz@scylladb.com>	2017-07-13 16:00:40 +01:00
Tomasz Grabiec	30ec4af949	legacy_schema_migrator: Fix calculation of is_dense Current algorithm was marking tables with regular columns not named "value" as not dense, which doesn't have to be the case. It can be either way. It should be enough to look at clustering components. If there is a clustering key, then table is dense if and only if all comparator components belong to the clustering key. If there is no clustering key, then if there are any regular columns we're sure it's not dense. Fixes #2587. Message-Id: <1499877777-7083-1-git-send-email-tgrabiec@scylladb.com>	2017-07-13 17:28:09 +03:00
Avi Kivity	a397889c81	Merge "Preserve table schema digest on schema tables migration" from Tomasz "Currently new nodes calculate digests based on v3 schema mutations, which are very different from v2 mutations. As a result they will use schemas with different table_schema_version than the old nodes. The old nodes will not recognize the version and will try to request its definition. That will fail, because old nodes don't understand v3 schema mutations. To fix this problem, let's preserve the digests during migration, so that they're the same on new and old nodes. This will allow requests to proceed as usual. This does not solve the problem of schema being changed during the rolling upgrade. This is not allowed, as it would bring the same problem back. Fixes #2549." * tag 'tgrabiec/use-consistent-schema-table-digests-v2' of github.com:cloudius-systems/seastar-dev: tests: Add test for concurrent column addition legacy_schema_migrator: Set digest to one compatible with the old nodes schema_tables: Persist table_schema_version schema_tables: Introduce system_schema.scylla_tables schema_tables: Simplify read_table_mutations() schema_tables: Resurrect v2 read_table_mutations() system_keyspace: Forward-declare legacy schemas legacy_schema_migrator: Take storage_proxy as dependency	2017-07-11 17:22:42 +03:00
Gleb Natapov	739dd878e3	consistency_level: report less live endpoints in Unavailable exception if there are pending nodes DowngradingConsistencyRetryPolicy uses live replicas count from Unavailable exception to adjust CL for retry, but when there are pending nodes CL is increased internally by a coordinator and that may prevent retried query from succeeding. Adjust live replica count in case of pending node presence so that retried query will be able to proceed. Fixes #2535 Message-Id: <20170710085238.GY2324@scylladb.com>	2017-07-11 16:51:56 +03:00
Tomasz Grabiec	f5909ec515	legacy_schema_migrator: Set digest to one compatible with the old nodes Calculate and set digest using v2 mutations so that digests are the same before and after migration. This is needed so that no schema definition exchange is required during rolling upgrade. Fixes #2549.	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	5b69d99bf8	schema_tables: Persist table_schema_version When migrating schema tables from v2 to v3, mutations underlying table schema will change, and so will their digest. However, we want the digest to be the same on new nodes as on the old nodes, because schema exchange is not possible between the two nodes, so they must not need to request schema definitions from each other. The solution is to make the digest persistable, so that it sticks to given table schema, surviving both migration and node restarts. On migration from v2, the digest will be calculated from v2 mutations, so it will be the same on new and old nodes.	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	cdf5b67522	schema_tables: Introduce system_schema.scylla_tables It will be used to store Scylla-specific table metadata. We cannot store it in the standard "tables" table for compatibility reasons - Cassandra will fail to read schema if it encounters columns it is not expecting.	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	cdcdf4772f	schema_tables: Simplify read_table_mutations()	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	6e62bc77f1	schema_tables: Resurrect v2 read_table_mutations()	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	4b5818a404	system_keyspace: Forward-declare legacy schemas	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	8624edc0fa	legacy_schema_migrator: Take storage_proxy as dependency Will be needed to query for mutations.	2017-07-11 14:52:23 +02:00
Tomasz Grabiec	310d2a54d2	legacy_schema_migrator: Use separate joinpoint instance for each table Otherwise we may deadlock, as explained in commit `5e8f0efc8`: Table drop starts with creating a snapshot on all shards. All shards must use the same snapshot timestamp which, among other things, is part of the snapshot name. The timestamp is generated using supplied timestamp generating function (joinpoint object). The joinpoint object will wait for all shards to arrive and then generate and return the timestamp. However, we drop tables in parallel, using the same joinpoint instance. So joinpoint may be contacted by snapshotting shards of tables A and B concurrently, generating timestamp t1 for some shards of table A and some shards of table B. Later the remaining shards of table A will get a different timestamp. As a result, different shards may use different snapshot names for the same table. The snapshot creation will never complete because the sealing fiber waits for all shards to signal it, on the same name. Message-Id: <1499762663-21967-1-git-send-email-tgrabiec@scylladb.com>	2017-07-11 11:21:45 +02:00
Avi Kivity	91221e020b	Merge "Silence schema pull errors during upgrade from 1.7 to 2.0" from Tomasz "Old and new nodes will advertise different schema version because of different format of schema tables. This will result in attempts to sync the schema by each of the nodes. Currently this will result in scary error messages in logs about sync failing due to not being able to find schema of given version. It's benign, but may scare users. In the future incompatibilities could result in more subtle errors. Better to inhibit it completely." * 'tgrabiec/fix-schema-pull-errors-during-upgrade' of github.com:cloudius-systems/seastar-dev: migration_manager: Give empty response to schema pulls from incompatible nodes migration_manager: Don't pull schema from incompatible nodes service: Advertise schema tables format version through gossip	2017-07-10 14:04:04 +03:00
Tomasz Grabiec	6555a2f50b	commitlog: Discard active but unused segments on shutdown So that they are not left on disk even though we did a clean shutdown. First part of the fix is to ensure that closed segments are recognized as not allocating (_closed flag). Not doing this prevents them from being collected by discard_unused_segments(). Second part is to actually call discard_unused_segments() on shutdown after all segments were shut down, so that those whose position are cleared can be removed. Fixes #2550. Message-Id: <1499358825-17855-1-git-send-email-tgrabiec@scylladb.com>	2017-07-09 19:25:22 +03:00
Tomasz Grabiec	d33d29ad95	legacy_schema_migrator: Drop tables instead of truncate()+remove() It achieves similar effect, but is safer than non-standard remove() path. The latter was missing unregistration from compaction manager. Fixes #2554. Message-Id: <1499447165-30253-1-git-send-email-tgrabiec@scylladb.com>	2017-07-09 18:36:44 +03:00
Tomasz Grabiec	18a9e1762c	service: Advertise schema tables format version through gossip Will be needed to inhibit schema exchange on per-peer basis.	2017-07-07 19:07:59 +02:00
Glauber Costa	f3742d1e38	disable defragment-memory-on-idle-by-default It's been linked with various performance issues, either by causing them or making them worse. One example is #1634, and also recently I have investigated continuous performance degradation that was also linked to defrag on idle activity. Until we can figure out how to reduce its impact, we should disable it. Signed-off-by: Glauber Costa <glauber@glauber.scylladb> Message-Id: <20170627201109.10775-1-glauber@scylladb.com>	2017-06-28 00:21:11 +03:00
Vlad Zolotarov	6839a50677	db::commitlog: entry_writer add a virtual destructor Add a virtual destructor for a base class commitlog::entry_writer. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1498511180-18391-1-git-send-email-vladz@scylladb.com>	2017-06-27 10:17:10 +03:00
Avi Kivity	9b21a9bfb6	Merge "Implement partial cache" from Tomasz and Piotr "This series enables cache to keep partial partitions. Reads no longer have to read whole partition from sstables in order to cache the result. The 10MB threshold for partition size in cache is lifted. Known issues: - There is no partial eviction yet, whole partitions are still evicted, and partition snapshots held by active reads are not evictable at all - Information about range continuity is not recorded if that would require inserting a dummy entry, or if previous entry doesn't belong to the latest snapshot - Cache update after memtable flush happening concurrently with reads may inhibit that reads' ability to populate cache (new issue) - Cache update from flushed memtables has partition granularity, so may cause latency problems with large partition - Schema is still tracked per-partition, so after schema changes reads may induce high latency due to whole partition needing to be converted atomically - Range tombstones are repeated in the stream for every range between cache entries they cover (new issue) - Populating scans for both small and large partitions (perf_fast_forward) experienced a 40% reduction of throughput, CPU bound How was this tested: - test.py --mode release - row_cache_stress_test -c1 -m1G - perf_fast_forward, passes except for the test case checking range continuity population which would require inserting a dummy entry (mentioned above) - perf_simple_query (-c1 -m1G --duration 32): before: 90k [ops/s] stdev: 4k [ops/s] after: 94k [ops/s] stdev: 2k [ops/s]" * tag 'tgrabiec/introduce-partial-cache-v8' of github.com:cloudius-systems/seastar-dev: (130 commits) tests: row_cache: Add test_tombstone_merging_in_partial_partition test case tests: Introduce row_cache_stress_test utils: Add helpers for dealing with nonwrapping_range<int> tests: simple_schema: Allow passing the tombstone to make_range_tombstone() tests: simple_schema: Accept value by reference tests: 
simple_schema: Make add_row() accept optional timestamp tests: simple_schema: Make new_timestamp() public tests: simple_schema: Introduce make_ckeys() tests: simple_schema: Introduce get_value(const clustered_row&) helper tests: simple_schema: Fix comment tests: simple_schema: Add missing include row_cache: Introduce evict() tests: Add cache_streamed_mutation_test tests: mutation_assertions: Allow expecting fragments mutation_fragment: Implement equality check tests: row_cache: Add test for population of random partitions tests: row_cache: Add test for partition tombstone population tests: row_cache: Test reading randomly populated partition tests: row_cache: Add test_single_partition_update() tests: row_cache: Add test_scan_with_partial_partitions ...	2017-06-26 14:54:37 +03:00
Avi Kivity	c4ae2206c7	messaging: respect inter_dc_tcp_nodelay configuration parameter We respect it partially (client side only) for now. Fixes #6. Message-Id: <20170623172048.23103-1-avi@scylladb.com>	2017-06-24 21:49:27 +02:00
Piotr Jastrzebski	77f944880c	cache: Remove support for wide partitions This will be handled by row cache now. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-06-24 18:06:11 +02:00
Duarte Nunes	4ef25e8e38	db/schema_tables: Add note to make_update_view_mutations Document that a new view schema passed to make_update_view_mutations() might be based on base schema that hasn't yet been loaded. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170618200558.96036-1-duarte@scylladb.com>	2017-06-23 15:24:35 +02:00
Avi Kivity	f0b20be14d	Revert "system_keyspace: Make sure "system" is written to keyspaces (visible)" This reverts commit `89ef69c4b3`. Prevents nodes from joining the cluster.	2017-06-21 16:58:04 +03:00
Calle Wilund	89ef69c4b3	system_keyspace: Make sure "system" is written to keyspaces (visible) Fixes #2514 Bug in schema version 3 update: We failed to write "system" to the schema tables. Only visible on an empty instance of course. Message-Id: <1497966982-10044-1-git-send-email-calle@scylladb.com>	2017-06-20 20:59:47 +02:00
Nadav Har'El	3018df11b5	Allow reading exactly desired byte ranges and fast_forward_to In commit `c63e88d556`, support was added for fast_forward_to() in data_consume_rows(). Because an input stream's end cannot be changed after creation, that patch ignores the specified end byte, and uses the end of file as the end position of the stream. As result of this, even when we want to read a specific byte range (e.g., in the repair code to checksum the partitions in a given range), the code reads an entire 128K buffer around the end byte, or significantly more, with read-ahead enabled. This causes repair to do more than 10 times the amount of I/O it really has to do in the checksumming phase (which in the current implementation, reads small ranges of partitions at a time). This patch has two levels: 1. In the lower level, sstable::data_consume_rows(), which reads all partitions in a given disk byte range, now gets another byte position, "last_end". That can be the range's end, the end of the file, or anything in between the two. It opens the disk stream until last_end, which means 1. we will never read-ahead beyond last_end, and 2. fast_forward_to() is not allowed beyond last_end. 2. In the upper level, we add to the various layers of sstable readers, mutation readers, etc., a boolean flag mutation_reader::forwarding, which says whether fast_forward_to() is allowed on the stream of mutations to move the stream to a different partition range. Note that this flag is separate from the existing boolean flag streamed_mutation::forwarding - that one talks about skipping inside a single partition, while the flag we are adding is about switching the partition range being read. Most of the functions that previously accepted streamed_mutation::forwarding now accept also the option mutation_reader::forwarding. The exceptions are functions which are known to read only a single partition, and not support fast_forward_to() a different partition range.
We note that if mutation_reader::forwarding::no is requested, and fast_forward_to() is forbidden, there is no point in reading anything beyond the range's end, so data_consume_rows() is called with last_end as the range's end. But if forwarding::yes is requested, we use the end of the file as last_end, exactly like the code before this patch did. Importantly, we note that the repair's partition reading code, column_family::make_streaming_reader, uses mutation_reader::forwarding::no, while the other existing reading code will use the default forwarding::yes. In the future, we can further optimize the amount of bytes read from disk by replacing forwarding::yes by an actual last partition that may ever be read, and use its byte position as the last_end passed to data_consume_rows. But we don't do this yet, and it's not a regression from the existing code, which also opened the file input stream until the end of the file, and not until the end of the range query. Moreover, such an improvement will not improve anything if the overall range is always very large, in which case not over-reading at its end will not improve performance. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20170619152629.11703-1-nyh@scylladb.com>	2017-06-19 18:31:32 +03:00
Avi Kivity	58fd3dd006	Merge "cql3: Quote type name when needed" from Duarte "This patch set ensures we quote the name of a UDT when it contains characters that may cause parsing by the CQL parser to fail. Fixes #2491" * 'cql3-quote-type/v1' of https://github.com/duarten/scylla: cql3/util: Make maybe_quote() take argument by const reference cql3/cql3_type: Quote UDT name if needed schema: Lift maybe_quote() into cql3/util	2017-06-18 17:59:47 +03:00
Duarte Nunes	b2c5aca4cf	db/schema_tables: View mutations shouldn't always include base ones When making the schema mutations for a view update, we should only include the base table schema mutations (in case the target node doesn't contain them) when the view is being directly updated. When it is being updated as a side effect of updating the base table, then including the base schema mutations will hide the actual changes being performed on the base. Fixes #2500 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1497782822-2711-1-git-send-email-duarte@scylladb.com>	2017-06-18 16:29:59 +03:00
Avi Kivity	6e2c9ef9fb	Revert "Allow reading exactly desired byte ranges and fast_forward_to" This reverts commit `317d7fc253` (and also the related `2c57ab84b2`). It causes crashes during range scans, reported by Gleb: "To reproduce I run SELECT * FROM keyspace1.standard1; on typical c-s dataset and 3 node cluster. Backtrace: at /home/gleb/work/seastar/seastar/core/apply.hh:36 rvalue=<unknown type in /home/gleb/work/seastar/build/release/scylla, CU 0x54cf307, DIE 0x55ebf2a>) at /home/gleb/work/seastar/seastar/core/do_with.hh:57 range=std::vector of length 6, capacity 8 = {...}) at /home/gleb/work/seastar/seastar/core/future-util.hh:142 at ./seastar/core/future.hh:890 at /home/gleb/work/seastar/seastar/core/future-util.hh:119 at /home/gleb/work/seastar/seastar/core/future-util.hh:142	2017-06-18 16:10:21 +03:00
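
The is_dense rule described in commit 30ec4af949 above can be sketched roughly as follows. This is only an illustration of the rule as worded in the commit message, not the code from legacy_schema_migrator: the `legacy_layout` struct, its field names, and `infer_is_dense()` are hypothetical, and the case the message does not cover (no clustering key and no regular columns) is left undecided here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical summary of a legacy table's layout; not a Scylla type.
struct legacy_layout {
    std::size_t comparator_components; // components of the legacy comparator
    std::size_t clustering_columns;    // comparator components in the clustering key
    std::size_t regular_columns;       // regular (non-key) columns
};

// Returns true or false when the rule from the commit message decides,
// and std::nullopt for the case the message leaves open.
std::optional<bool> infer_is_dense(const legacy_layout& t) {
    // Modeling assumption: clustering columns are a subset of the comparator.
    assert(t.clustering_columns <= t.comparator_components);
    if (t.clustering_columns > 0) {
        // Dense if and only if every comparator component is a clustering column.
        return t.clustering_columns == t.comparator_components;
    }
    if (t.regular_columns > 0) {
        // No clustering key but some regular columns: certainly not dense.
        return false;
    }
    return std::nullopt;
}
```

Returning `std::optional<bool>` keeps the undecided case explicit, so a caller has to choose a fallback rather than silently inheriting one.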


906 Commits