scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Calle Wilund	f317d7a975	commitlog: Simplify commitlog extension iteration Fixes #4640 Iterating extensions in commitlog.cc should mimic that in sstables.cc, i.e. a simple future-chain. Should also use same order for read and write open, as we should preserve transformation stack order. Message-Id: <20190702150028.18042-1-calle@scylladb.com>	2019-07-02 18:37:44 +03:00
Tomasz Grabiec	eb496b5eae	Merge "Allow changing configuration at runtime" from Avi This patchset allows changing the configuration at runtime. The user triggers this by editing the configuration file normally, then signalling the database with SIGHUP (as is traditional). The implementation is somewhat complicated due to the need to store non-atomic mutable state per-shard and to synchronize the values in all shards. This is somewhat similar to Seastar's sharded<>, but that cannot be used since the configuration is read before Seastar is initialized (due to the need to read command-line options). Tests: unit (dev, debug), manual test with extra prints (dev) Ref #2689 Fixes #2517.	2019-07-01 15:04:59 +02:00
Avi Kivity	2abe015150	database: allow live update of the compaction_enforce_min_threshold config item Change the type from bool to updateable_value<bool> throughout the dependency chain and mark it as live updateable. In theory we should also observe the value and trigger compaction if it changes, but I don't think it is worthwhile.	2019-06-28 16:43:25 +03:00
Avi Kivity	8d7c1c7231	db: seed_provider_type: add operator==() Dynamically updateable configuration requires checking whether configuration items changed or not, so we can skip firing notifiers for the common case where nothing changed. This patch adds a comparison operator for seed_provider_type, which was missing it.	2019-06-28 16:43:25 +03:00
Avi Kivity	da2a98cde6	config: don't allow assignment to config values Currently, we allow adjusting configuration via cfg.whatever() = 5; by returning a mutable reference from cfg.whatever(). Soon, however, this operation will have side effects (updating all references to the config item, and triggering notifiers). While this can be done with a proxy, it is too tricky. Switch to an ordinary setter interface: cfg.whatever.set(5); Because boost::program_options no longer gets a reference to the value to be written to, we have to move the update to a notifier, and the value_ex() function has to be adjusted to infer whether it was called with a vector type after it is called, not before.	2019-06-28 16:43:25 +03:00
Glauber Costa	d916601ea4	toppartitions: fix typo toppartitons -> toppartitions Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190627160937.7842-1-glauber@scylladb.com>	2019-06-27 19:13:58 +03:00
Piotr Sarna	85a3a4b458	view: ignore duplicated key entries in progress virtual reader Build progress virtual reader uses Scylla-specific scylla_views_builds_in_progress table in order to represent legacy views_builds_in_progress rows. The Scylla-specific table contains additional cpu_id clustering key part, which is trimmed before returning it to the user. That may cause duplicated clustering row fragments to be emitted by the reader, which may cause undefined behaviour in consumers. The solution is to keep track of previous clustering keys for each partition and drop fragments that would cause duplication. That way if any shard is still building a view, its progress will be returned, and if many shards are still building, the returned value will indicate the progress of a single arbitrary shard. Fixes #4524 Tests: unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>	2019-06-11 13:01:31 +02:00
Juliana Oliveira	fd83f61556	Add a warning for partitions with too many rows This patch adds a warning option to the user for situations where rows count may get bigger than initially designed. Through the warning, users can be aware of possible data modeling problems. The threshold is initially set to '100,000'. Tests: unit (dev) Message-Id: <20190528075612.GA24671@shenzou.localdomain>	2019-06-06 19:48:57 +03:00
Piotr Sarna	74f6ab7599	db: drop unnecessary double computation when feeding hash When feeding hash for schema digest, compact_for_schema_digest is mistakenly called twice, which may result in needless recomputation. Message-Id: <8f52201cf428a55e7057d8438025275023eb9288.1559826555.git.sarna@scylladb.com>	2019-06-06 16:16:47 +03:00
Calle Wilund	1e37e1d40c	commitlog: Add optional use of O_DSYNC mode Refs #3929 Optionally enables O_DSYNC mode for segment files, and when enabled ignores actual flushing and just barriers any ongoing writes. Iff using O_DSYNC mode, we will not only truncate the file to max size, but also do an actual initial write of zeros to it, since XFS (intended target) has observably less good behaviour on non-physical file blocks. Once written (and maybe recycled) we should have rather satisfying throughput on writes. Note that the O_DSYNC behaviour is hidden behind a default disabled option. While users should probably seldom worry about this, we should add some sort of logic in main/init that, unless specified by the user, evaluates the commitlog disk and sets this to true if it is using XFS and looks ok. This is because using O_DSYNC on things like EXT4 etc has quite horrible performance. All above statements about performance and O_DSYNC behaviour are based on a sampling of benchmark results (modified fsqual) on a statistically non-significant selection of disks. However, at least there the observed behaviour is a rather large difference between ::fallocate:ed disk area vs. actually written using O_DSYNC on XFS, and O_DSYNC on EXT4. Note also that measurements on O_DSYNC vs. no O_DSYNC do not take into account the wall-clock time of doing manual disk flush. This is intentionally ignored, since in the commitlog case, at least using periodic mode, flushes are relatively rare. Message-Id: <20190520120331.10229-1-calle@scylladb.com>	2019-05-20 15:10:48 +03:00
Avi Kivity	5b2c8847c7	Merge "Pre timestamp based data segregation cleanup" from Botond " This series contains loosely related generic cleanup patches that the timestamp based data segregation series depends on. Most of the patches have to do with making headers self-sustainable, that is compilable on their own. This was needed to be able to ensure that the new headers introduced or touched by that series are self-sustainable too. This series also introduces `schema_fwd.hh` which contains a forward declaration of `schema` and `schema_ptr` classes. No effort was made to find and replace all existing ad-hoc schema forward declarations in the source tree. " * 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla: encoding_stats.hh: add missing include sstables/time_window_compaction_strategy.hh: make self-sufficient sstables/size_tiered_compaction_strategy.hh: make self-sufficient sstables/compaction_strategy_impl.hh: make header self-sufficient compaction_strategy.hh: use schema_fwd.hh db/extensions.hh: use schema_fwd.hh Add schema_fwd.hh	2019-05-15 17:37:06 +03:00
Avi Kivity	82b91c1511	Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz " Commit `d0f9e00` changed the representation of the gc_clock::duration from int32_t to int64_t. Mutation hashing uses appending_hash<gc_clock::time_point>, which by default feeds duration::count() into the hasher. duration::rep changed from int32_t to int64_t, which changes the value of the hash. This affects schema digest and query digests, resulting in mismatches between nodes during a rolling upgrade. Fixes #4460. Refs #4485. " * tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla: tests: Add test which verifies that schema digest stays the same tests: Add sstables for the schema digest test schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition db/schema_tables: Move feed_hash_for_schema_digest() to .cc file hashing: Introduce type-erased interface for the hasher hashing: Introduce C++ concept for the hasher hashers: Rename hasher to cryptopp_hasher gc_clock: Fix hashing to be backwards-compatible	2019-05-14 16:59:50 +03:00
Tomasz Grabiec	285ada5035	Merge "config: remove _make_config_values macro" from Avi The _make_config_values macro reduces duplication (both the item name and the types need to be available as C++ identifiers and as runtime strings), but is hard to work with. The macro is huge and editors don't handle it well, errors aren't identified at the correct location, and since the macro doesn't have types, it's hard to refactor. This series replaces the macro with ordinary C++ code. Some repetition is introduced, but IMO the result is easier to maintain than the macro. As a bonus the bulk of the code is moved away from the header file. Tests: unit (dev), manual testing of the config REST API * https://github.com/avikivity/scylla config-no-macro/v2 config: make the named_value type name available without requiring _make_config_values config: remove value_status from named_value template parameter list config: add named_value::value_as_json() api: config: stop using _make_config_values config: auto-add named_values into config_file config: add allowed_values parameter to named_value constructor config: convert _make_config_values to individual named_value member declarations and initializers	2019-05-14 16:00:23 +03:00
Botond Dénes	690ef09b8f	db/extensions.hh: use schema_fwd.hh	2019-05-14 13:27:30 +03:00
Tomasz Grabiec	9de071d214	schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition Schema digest is calculated by querying for mutations of all schema tables, then compacting them so that all tombstones in them are dropped. However, even if the mutation becomes empty after compaction, we still feed its partition key. If the same mutations were compacted prior to the query, because the tombstones expire, we won't get any mutation at all and won't feed the partition key. So schema digest will change once an empty partition of some schema table is compacted away. That's not a problem during normal cluster operation because the tombstones will expire at all nodes at the same time, and schema digest, although changes, will change to the same value on all nodes at about the same time. This fix changes digest calculation to not feed any digest for partitions which are empty after compaction. The digest returned by schema_mutations::digest() is left unchanged by this patch. It affects the table schema version calculation. It's not changed because the version is calculated on boot, where we don't yet know all the cluster features. It's possible to fix this but it's more complicated, so this patch defers that. Refs #4485. Asd	2019-05-14 10:43:06 +02:00
Tomasz Grabiec	3a4a903674	db/schema_tables: Move feed_hash_for_schema_digest() to .cc file	2019-05-14 10:43:06 +02:00
Paweł Dziepak	49b4aeca4d	Merge "hinted handoff: prevent sending attempts" from Vlad " Fix the broken logic that is meant to prevent sending hints when node is in a DOWN NORMAL state. " * 'hinted_handoff_stop_sending_to_down_node-v2' of https://github.com/vladzcloudius/scylla: hints_manager: rename the state::ep_state_is_not_normal enum value hinted handoff: fix the logic that detects that the destination node is in DN state hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() types.cc: fix the compilation with fmt v5.3.0	2019-05-09 15:18:57 +01:00
Vlad Zolotarov	f07c341efc	hints_manager: rename the state::ep_state_is_not_normal enum value Rename this state value to better reflect the reality: state::ep_state_is_not_normal -> state::ep_state_left_the_ring The manager gets to this state when the destination Node has left the ring. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 15:46:47 -04:00
Vlad Zolotarov	93ba700458	hinted handoff: fix the logic that detects that the destination node is in DN state When node is in a DN state its gossiper state may be NORMAL, SHUTDOWN or "" depending on the use case. In addition to that if node has been removed from the ring its state is also going to be removed from the gossiper_state map. Let's consider the above when deciding if node is in the DN state. Fixes #4461 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 14:53:01 -04:00
Avi Kivity	1c65ba6e66	Use correct scylla_tables schema for removing version column Mutations carry their schema, so use that instead of bring in a global schema, which may change as features are added. Message-Id: <20190505132542.6472-1-avi@scylladb.com>	2019-05-06 13:51:08 +02:00
Piotr Sarna	cf8d2a5141	Revert "view: cache is_index for view pointer" This reverts commit `dbe8491655`. Caching the value was not done in a correct manner, which resulted in longevity tests failures. Fixes #4478 Branches: 3.1 Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>	2019-05-06 11:45:46 +03:00
Benny Halevy	d9136f96f3	commitlog: descriptor: skip leading path from filename std::regex_match of the leading path may run out of stack with long paths in debug build. Using rfind instead to look up the last '/' in the pathname and skip it if found. Fixes #4464 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>	2019-05-05 17:51:56 +03:00
Gleb Natapov	95c6d19f6c	batchlog_manager: fix array out of bound access endpoint_filter() function assumes that each bucket of std::unordered_multimap contains elements with the same key only, so its size can be used to know how many elements with a particular key are there. But this is not the case, elements with multiple keys may share a bucket. Fix it by counting keys in another way. Fixes #3229 Message-Id: <20190501133127.GE21208@scylladb.com>	2019-05-01 17:30:11 +03:00
Tomasz Grabiec	077c639e42	Merge "Simplify the result_set_row API" from Rafael Currently null and missing values are treated differently. Missing values throw no_such_column. Null values return nullptr, std::nullopt or throw null_column_value. The api is a bit confusing since a function returning a std::optional either returns std::nullopt or throws depending on why there is no value. With this patch series only get_nonnull throws and there is only one exception type. * https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2: query-result-set: merge handling of null and missing values Remove result_set_row::has Return a reference from get_nonnull	2019-04-30 11:06:29 +02:00
Rafael Ávila de Espíndola	63c47117b5	Return a reference from get_nonnull No reason to copy if we don't have to. Now that get_nonnull doesn't copy, replace a raw use of get_data_value with it. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola	0474458872	Remove result_set_row::has Now that the various get methods return nullptr or std::nullopt on missing values, we don't need to do double lookups. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola	2770b29036	query-result-set: merge handling of null and missing values Nothing seems to differentiate a missing and a null value. This patch then merges the two exception types and now the only method that throws is get_nonnull. The other methods return nullptr or std::nullopt as appropriate. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-04-29 19:56:20 -07:00
Tomasz Grabiec	c96ee9882b	db/schema_tables: Include view_virtual_columns in the digest only when all nodes do After `7c87405`, schema sync includes system_schema.view_virtual_columns in the schema digest. Old nodes don't know about this table and will not include it in the digest calculation. As a result, there will be schema disagreement until the whole cluster is upgraded. Fix this by taking the new table into account only when the whole cluster is upgraded. The table should not be used for anything before this happens. This is not currently enforced, but should be. Fixes #4457.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	73b859005c	db/schema_tables: Hash schema tables in the same order as on 3.0 The commit `7c87405` also indirectly changed the order of schema tables during hash calculation (index table should be taken after all other tables). This shows up when there is an index created and any of {user defined type, function, or aggregate}. Refs #4457.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	394a684a99	db/schema_tables: Remove table name caching from all_tables() The set of table names will depend on the features and thus will be dynamic.	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	3cb7b2d72e	treewide: Propagate schema_features to db::schema::all_tables()	2019-04-28 15:50:13 +02:00
Tomasz Grabiec	0633fcde10	schema: Introduce schema_features	2019-04-28 15:50:12 +02:00
Tomasz Grabiec	6e2c190b5f	schema_tables: Propagate storage_service& to merge_schema() We will need to calculate cluster schema features at the time we calculate the schema digest.	2019-04-28 12:33:10 +02:00
Vlad Zolotarov	274b9d8069	hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check gossiper::is_alive() has a lot of unneeded checks (e.g. is_me(ep)) that are irrelevant for the HH use case and we may safely skip them. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:16:07 -04:00
Vlad Zolotarov	74b4076ceb	hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() sender has its own reference to the local gossiper - use it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Avi Kivity	9a6c86e2a7	config: convert _make_config_values to individual named_value member declarations and initializers While causing some duplication (names are explicitly instead of implicitly stringified, and names are repeated in the member declaration and initializer), it is overall more maintainable than the huge macro. It is easier to overload named_value constructors when you can get error reporting on the line where the error occurs, for example.	2019-04-23 16:29:03 +03:00
Avi Kivity	d959fbfc16	config: auto-add named_values into config_file By passing a config_file into named_value, we remove another call to the _make_config_values() macro.	2019-04-23 16:29:03 +03:00
Avi Kivity	6033b6a079	config: add named_value::value_as_json() Currently, the REST API does its own conversion of named_value into json. This requires it to use the _make_config_values macro to perform iteration of all config items, since it needs to preserve the concrete type of the item while iterating, so it can select the correct json conversion. Since we want to remove that macro, we need to provide a different way to convert a config item to json. So this patch adds a value_as_json(). To hide json_return_value from the rest of the system, we extend config_type with a conversion function to handle the details. This usually calls the json_return_type constructor directly, but when it doesn't have default translation, it interposes a conversion into a type that json recognizes. I didn't bother maintaining the existing type names, since they're C++ names which don't make sense for the UI.	2019-04-23 16:28:19 +03:00
Avi Kivity	db3f61776f	config: remove value_status from named_value template parameter list The value_status is only needed at run-time, and removing it from the template parameter list reduces type proliferation (which leads to code bloat) and simplifies the code.	2019-04-23 16:15:28 +03:00
Avi Kivity	daf5744daa	config: make the named_value type name available without requiring _make_config_values I want to remove the _make_config_values macro, but it is needed now in api/config.cc to make the type names available. So as a first step, copy the type names to config_src. Further changes can extract it from there. Because we want to add more type information in following patches, place the type name in a new config_type object, instead of allocating a string_view in config_src.	2019-04-23 16:13:54 +03:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check since the compiler can't derive the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Paweł Dziepak	85409c1a16	Merge "Validate elements of collections" from Piotr " Previously we weren't validating elements of collections so it was possible to add non-UTF-8 string to a column with type list<text>. Tests: unit(release) Fixes #4009 " * 'haaawk/4009/v5' of github.com:scylladb/seastar-dev: types: Test correct map validation types: Test correct in clause validation types: Test correct tuple validation types: Test correct set validation types: Test correct list validation types: Add test_tuple_elements_validation types: Add test_in_clause_validation types: Add test_map_elements_validation types: Add test_set_elements_validation types: Add test_list_elements_validation types: Validate input when tuples types: Validate input when parsing a set types: Validate input when parsing a map types: Validate input when parsing a list types: Implement validation for tuple types: Implement validation for set types: Implement validation for map types: Implement validation for list types: Add cql_serialization_format parameter to validate	2019-04-18 19:07:14 +03:00
Tomasz Grabiec	5dc3f5ea33	Merge "Properly enable MC format on the cluster" from Piotr 1. All nodes in the cluster have to support MC_SSTABLE_FEATURE 2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE then it should start using MC format. 3. Once all shards start to use MC then a node should broadcast that unbounded range tombstones are now supported by the cluster. 4. Once whole cluster supports unbounded range tombstones we can start accepting them on CQL level. tests: unit(release) Fixes #4205 Fixes #4113 * seastar-dev.git dev/haaawk/enable_mc/v11: system_keyspace: Add scylla_local system_keyspace: add accessors for SCYLLA_LOCAL storage_service: add _sstables_format field feature: add when_enabled callbacks system_keyspace: add storage_service param to setup Add sstable format helper methods Register feature listeners in storage_service Add service::read_sstables_format Use read_sstables_format in main.cc Use _sstables_format to determine current format Add _unbounded_range_tombstones_feature Update supported features on format change	2019-04-16 14:07:05 +02:00
Tomasz Grabiec	ac0d435c3e	Merge "hinted handoff: don't reuse_segments and discard corrupted segments" from Vlad This series addresses two issues in the hinted handoff that should complete fixing the infamous #4231. In particular the second patch removes the requirement to manually delete hints files after upgrading to 3.0.4. Tested with manual unit testing. * https://github.com/vladzcloudius/scylla.git hinted_handoff_drop_broken_segments-v3: hinted handoff: disable "reuse_segments" commitlog: introduce a segment_error hinted handoff: discard corrupted segments	2019-04-16 14:07:05 +02:00
Tomasz Grabiec	3fd82021b1	schema_tables: Serialize schema merges fairly All schema changes made to the node locally are serialized on a semaphore which lives on shard 0. For historical reasons, they don't queue but rather try to take the lock without blocking and retry on failure with a random delay from the range [0, 100 us]. Contenders which do not originate on shard 0 will have an extra disadvantage as each lock attempt will be longer by the across-shard round trip latency. If there is constant contention on shard 0, contenders originating from other shards may keep losing the race to take the lock. Schema merge executed on behalf of a DDL statement may originate on any shard. Same for the schema merge which is coming from a push notification. Schema merge executed as part of the background schema pull will originate on shard 0 only, where the application state change listeners run. So if there are constant schema pulls, DDL statements may take a long time to get through. The fix is to serialize merge requests fairly, by using the blocking semaphore::wait(), which is fair. We don't have to back-off any more, since submit_to() no longer has a global concurrency limit. Fixes #4436. Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>	2019-04-15 20:40:38 +03:00
Avi Kivity	3afbe219cd	Merge "UDF/UDA related cleanups and refactoring" from Rafael " These are patches I wrote while working on UDF/UDA, but IMHO they are independent improvements and are ready for review. Tests: unit (debug) dtest (release) I checked that all tests in nosetests -v user_types_test.py sstabledump_test.py cqlsh_tests/cqlsh_tests.py now pass. " * 'espindola/udf-uda-refactoring-v3' of https://github.com/espindola/scylla: Refactor user type merging cql_type_parser::raw_builder: Allow building types incrementally cql3: delete dead code Include missing header return a const reference from return_type delete unused var Add a test on nested user types.	2019-04-15 16:52:13 +03:00
Glauber Costa	b9327f81cf	conf: stop telling people to run auto_bootstrap: false auto_bootstrap: false provides negligible gains for new clusters and it is extremely dangerous everywhere else. We have seen a couple of times in which users, confused by this, added this flag by mistake and added nodes with it. While they were pleased by the extremely fast times to add nodes, they were later displeased to find their data missing. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190414012028.20767-1-glauber@scylladb.com>	2019-04-14 10:42:25 +03:00
Piotr Jastrzebski	caa6798f2c	system_keyspace: add storage_service param to setup Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	0211541d84	system_keyspace: add accessors for SCYLLA_LOCAL Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
Piotr Jastrzebski	4c205b733a	system_keyspace: Add scylla_local Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2019-04-12 09:33:40 +02:00
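
Note on commit 95c6d19f6c (batchlog_manager): the message turns on a property of std::unordered_multimap, namely that a bucket can hold elements with several different keys, so the bucket size is only an upper bound on the number of elements for one key. The following is a minimal sketch of that distinction, not the ScyllaDB code. The alias `endpoint_map`, the helper names and the sample rack/address strings are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a rack -> endpoint multimap.
using endpoint_map = std::unordered_multimap<std::string, std::string>;

// Counts the elements stored under `key` by walking equal_range().
// This is the reliable way to learn how many values a key has.
std::size_t count_for_key(const endpoint_map& m, const std::string& key) {
    auto range = m.equal_range(key);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}

// Size of the bucket that `key` hashes to. Elements with other keys may
// share this bucket, so the result can exceed count_for_key().
std::size_t bucket_size_for_key(const endpoint_map& m, const std::string& key) {
    assert(m.bucket_count() > 0);  // bucket() needs at least one bucket
    return m.bucket_size(m.bucket(key));
}

// Small sample map used to exercise the two helpers.
endpoint_map sample_endpoints() {
    return endpoint_map{
        {"rack1", "10.0.0.1"},
        {"rack1", "10.0.0.2"},
        {"rack2", "10.0.0.3"},
        {"rack3", "10.0.0.4"},
    };
}
```

For the sample map, `count_for_key(m, "rack1")` is 2 regardless of how the keys hash, while `bucket_size_for_key(m, "rack1")` is only guaranteed to be at least 2. Code that indexes into a per-key array using the bucket size can therefore read past the end, which matches the out-of-bounds access the commit describes.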

1383 Commits