scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Nadav Har'El	25bd139508	cross-tree: clean up use of std::random_device() std::random_device() uses the relatively slow /dev/urandom, and we rarely if ever intend to use it directly - we normally want to use it to seed a faster random_engine (a pseudo-random number generator). In many places in the code, we first created a random_device variable, and then using it created a random_engine variable. However, this practice created the risk of a programmer accidentally using the random_device object, instead of the random_engine object, because both have the same API; this hurts performance. This risk materialized in just two places in the code, utils/uuid.cc and gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is not included in this patch, and the fix for gossiper.{cc,hh} is included here. To avoid risking the same mistake in the future, this patch switches across the code to an idiom where the random_device object is not named, so cannot be accidentally used. We use the following idiom: std::default_random_engine _engine{std::random_device{}()}; Here std::random_device{}() creates the random device (/dev/urandom) and pulls a random integer from it. It then uses this seed to create the random_engine (the pseudo-random number generator). The std::random_device{} object is temporary and unnamed, and cannot be unintentionally used directly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180726154958.4405-1-nyh@scylladb.com>	2018-07-26 16:54:58 +01:00
Tomasz Grabiec	894961006b	Merge "db/view/view_builder: Fixes to bookkeeping" from Duarte This series contains a couple of fixes to the bookkeeping of the view build process, which could cause data to be left behind in the system tables. * git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1: Duarte Nunes (3): db/system_keyspace: Add function to remove view build status of a shard db/view: Don't have shard 0 clear other shard's status on drop db/view: Restrict writes to the distributed system keyspace to shard 0	2018-07-17 18:01:28 +02:00
Tomasz Grabiec	25d09e51ac	Merge "db/view/build_progress_virtual_reader: Fixes to clustering key adjusts" from Duarte This series contains a couple of fixes to the adjusting of clustering keys in the build_progress_virtual_reader, some of which could potentially cause heap overflows when querying the legacy system table. * git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1: Duarte Nunes (3): db/view/build_progress_virtual_reader: Use correct schema to adjust ck db/view/build_progress_virtual_reader: Fix full ck detection db/view/build_progress_virtual_reader: Also adjust end RT bound	2018-07-17 18:00:30 +02:00
Avi Kivity	acb3163639	large_partition_handler: output friendly partition key Use abstract_type::to_string() to prettify partition key components. Manually tested by setting --compaction-large-partition-warning-threshold-mb to zero and inspecting the output for compound and non-compound partition keys.	2018-07-17 14:44:52 +03:00
Duarte Nunes	55caaec411	db/view/build_progress_virtual_reader: Also adjust end RT bound Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	eda6b88b0e	db/view/build_progress_virtual_reader: Fix full ck detection As an optimization, the virtual reader doesn't change the underlying key if it is not full, and hence doesn't include the extra clustering key. However, this detection is broken because it checked for 3 clustering columns, instead of 2. This patch fixes that by obtaining the clustering key size from the underlying schema instead of hardcoding the size. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	ff3a0d437a	db/view/build_progress_virtual_reader: Use correct schema to adjust ck The virtual reader adjusts clustering keys obtained from the underlying, scylla-specific schema, and potentially sheds the extra clustering key that's absent from the Cassandra-compatible schema. This patch ensures we use the correct schema to iterate over the key. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 23:28:31 +01:00
Duarte Nunes	df66d7db59	db/view: Restrict writes to the distributed system keyspace to shard 0 Writing to the distributed system keyspace should be confined to a single shard of each host, namely shard 0. We were violating this constraint by having all shards set the host status to "started". This could be problematic when the build finishes quickly or there's a concurrent view drop, such that a write done by shard 0 can have a smaller timestamp than one done by some other shard. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Duarte Nunes	e683c1367f	db/view: Don't have shard 0 clear other shard's status on drop Shard 0 can clear the in-progress build status of all shards when a view finishes building, because we are ensured all writes to the system table have completed with earlier timestamps. This is not the case when dropping a view. A drop can happen concurrently with the build, in which case shard 0 may process the notification before another shard receives it, and before that shard writes to the system table. Fix this by ensuring each shard clears its own status on drop. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:45:26 +01:00
Duarte Nunes	2fa7f10429	db/system_keyspace: Add function to remove view build status of a shard This patch adds a function that clears the view build in-progress status for the current shard, similar to the existing one that clears it across all shards. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-07-11 21:27:39 +01:00
Duarte Nunes	156817e00e	db/size_estimates_virtual_reader: Use left-exclusive token ranges We were considering the token ranges in the size_estimates system table to be inclusive, which is incorrect and incompatible with Cassandra. While we ignore the inclusiveness of the partition_range bounds when selecting sstables, we do take it into account in estimated_keys_for_range(). We would thus select the correct sstables, but could over-estimate the range size nonetheless. Tests: virtual_reader_test(release) Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180709115919.5106-1-duarte@scylladb.com>	2018-07-09 15:26:32 +03:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Vlad Zolotarov	c65a110839	main: remove the "experimental" tag from the hinted handoff feature Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:19:40 -04:00
Vlad Zolotarov	83ba6d84a1	db::hints::manager: implement rebalance() method Rebalance hints segments that need to be sent among all present shards. Ensure that after rebalancing the difference between the number of segments of any two shards is not greater than 1. Try to minimize the amount of "file rename" operations in order to achieve the needed result. Note: "Resharding" is a particular case of rebalancing. Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:18:46 -04:00
Piotr Sarna	828497ad19	hints: amend a comment in device limits To make the comment less confusing, 'group of managers' is used instead of 'device'. Refs #3516 Reported-by: Vlad Zolotarov <vladz@scylladb.com> Signed-off-by: Piotr Sarna <sarna@scylladb.com> Message-Id: <60c9ab6b47195570f7ce7dff9556e3739b7ae00f.1529862547.git.sarna@scylladb.com>	2018-06-24 19:14:59 +01:00
Piotr Sarna	8b43ac3a57	hints: reserve more space for dedicated storage Reserving 10% of space for hints managers makes sense if the device is shared with other components (like /data or /commitlog). But, if hints directory is mounted on a dedicated storage, it makes sense to reserve much more - 90% was chosen as a sane limit. Whether storage is 'dedicated' or not is based on a simple check if given hints directory is a mount point. Fixes #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:27:00 +02:00
Piotr Sarna	32f86ca61e	hints: add is_mountpoint function A helper function that checks whether a path is also a mount point is added. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:52 +02:00
Piotr Sarna	b6c1b8c5ef	hints: make space_watchdog device-aware Instead of having one static space limit for all directories, space_watchdog now keeps a per-device limit, shared among hints managers residing on the same disks. References #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:45 +02:00
Piotr Sarna	d22668de04	hints: add device_id to manager In order to make space_watchdog device-aware, device_id field is added to hints manager. It's an equivalent of stat.st_dev and it identifies the disk that contains manager's root directory. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:37 +02:00
Piotr Sarna	91b5e33c6a	hints: add get_device_id function In order to distinguish which directories reside on which devices, get_device_id function is added to resource manager. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:25:47 +02:00
Glauber Costa	290d553c3a	compaction_strategy: allow the user to tell us if min_threshold has to be strict Now that we have the controller, we would like to take min_threshold as a hint. If there is nothing to compact, we can ignore that and start compacting less than min_threshold SSTables so that the backlog keeps reducing. But there are cases in which we don't want min_threshold to be a hint and we want to enforce it strictly. For instance, if write amplification is more of a concern than space amplification. This patch adds a YAML option that allows the user to tell us that. We will default to false, meaning min_threshold is not strictly enforced. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-06-15 13:42:43 -04:00
Piotr Sarna	6b3a97e34a	hints: fix max_shard_disk_space_size initialization Previously max_shard_disk_space_size was unconditionally initialized with the capacity of hints_directory. But, it's likely that hints_directory doesn't exist at all if hinted handoff is not enabled, which results in Scylla failing to boot. So, max_shard_disk_space_size is now initialized with the capacity of hints_for_views directory, which is always present. This commit also moves max_shard_disk_space_size to the .cc file where it belongs - resource_manager.cc. Tests: unit (release) Message-Id: <9f7b86b6452af328c05c5c6c55bfad3382e12445.1528977363.git.sarna@scylladb.com>	2018-06-14 14:24:01 +01:00
Gleb Natapov	cdf1289b43	Provide available memory size to hinted handoff resource manager during creation	2018-06-11 15:34:13 +03:00
Gleb Natapov	cc47f6c69d	Provide available memory size to commitlog during creation	2018-06-11 15:34:13 +03:00
Nadav Har'El	41472e2618	legacy_schema_migrator: add comment When I came across db/legacy_schema_migrator.cc, I had no idea what it does and though I had obvious guesses (it somehow migrates old schemas, right?) I didn't know what it really does. So after I figured this out, I wrote this comment so the next person doesn't need to guess. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180605120225.25173-1-nyh@scylladb.com>	2018-06-10 19:39:06 +03:00
Avi Kivity	6f23403137	Merge "Virtualize IndexInfo system table" from Duarte " The IndexInfo table tracks the secondary indexes that have already been populated. Since our secondary index implementation is backed by materialized views, we can virtualize that table so queries are actually answered by built_views. Fixes #3483 " * 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla: tests/virtual_reader_test: Add test for built indexes virtual reader db/system_keysace: Add virtual reader for IndexInfo table db/system_keyspace: Explain that table_name is the keyspace in IndexInfo index/secondary_index_manager: Expose index_table_name() db/legacy_schema_migrator: Don't migrate indexes	2018-06-06 17:35:51 +03:00
Duarte Nunes	833d34e88a	Merge 'Make rows in a secondary index ordered by token' from Piotr " As in #3423, ensuring token order on secondary index queries can be done by adding an additional column to views that back secondary indexes. This column is a first clustering column and contains token value, computed on updates. This series also updates tests and comments referring to issue 3423. Tests: unit (release, debug) " * 'order_by_token_in_si_5' of https://github.com/psarna/scylla: cql3: update token order comments index, tests: add token column to secondary index schema view: add handling of a token column for secondary indexes view: add is_index method	2018-06-06 10:07:43 +01:00
Piotr Sarna	d5e7b5507b	view: add handling of a token column for secondary indexes In order to ensure token order on secondary index queries, first clustering column for each view that backs a secondary index is going to store a token computed from base's partition keys. After this commit, if there exists a column that is not present in base schema, it will be filled with computed token.	2018-06-05 18:59:25 +02:00
Piotr Sarna	06eee0f525	view: add is_index method is_index method returns true if view that owns it is backing a secondary index.	2018-06-05 11:10:24 +02:00
Glauber Costa	bdce561ada	system_keyspace: add sharding information to local table We would like the clients to be able to route work directly to the right shards. To do that, they need to know the sharding algorithm and its parameters. The algorithm can be copied into the client, but the parameters need to be exported somewhere. Let's use the local table for that. Signed-off-by: Glauber Costa <glauber@scylladb.com> --- v2: force msb to zero on non-murmur	2018-06-04 11:25:58 -04:00
Duarte Nunes	3e39985c7a	db/system_keysace: Add virtual reader for IndexInfo table The IndexInfo table tracks the secondary indexes that have already been populated. Since our secondary index implementation is backed by materialized views, we can virtualize that table so queries are actually answered by built_views. Fixes #3483 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Duarte Nunes	65c4205334	db/system_keyspace: Explain that table_name is the keyspace in IndexInfo This patch adds the same comment that exists in Apache Cassandra, explaining that the table_name column in the IndexInfo system table actually refers to the keyspace name. Don't be fooled. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Duarte Nunes	7187963bda	db/legacy_schema_migrator: Don't migrate indexes Previous versions contained no indexes, and Apache Cassandra indexes cannot be migrated to Scylla. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-06-04 11:14:17 +01:00
Piotr Sarna	204bc17bd7	hints: decouple hints manager metrics from constructor Now that more than one instance of hints manager can be present at the same time, registering metrics is moved out of the constructor to prevent 'registering metrics twice' errors.	2018-06-04 09:46:06 +02:00
Piotr Sarna	f345efc79a	hints: move space_watchdog to resource manager Space watchdog is decoupled from hints manager and moved to resource manager, so it can be shared among different hints manager instances.	2018-06-04 09:46:01 +02:00
Piotr Sarna	ef40f7e628	hints: move send limiter to resource manager Send limiting semaphore is moved from hints manager to resource manager. In consequence, hints manager now keeps a reference to its resource manager.	2018-06-04 09:35:58 +02:00
Piotr Sarna	2315937854	hints: move constants to resource_manager Constants related to managing resources are moved to newly created resource_manager class. Later, this class will be used to manage (potentially shared) resources of hints managers.	2018-06-04 09:35:58 +02:00
Paweł Dziepak	0ea6d14cf5	atomic_cell: explicitly state when atomic_cell is a collection member Collections are not going to be fully converted to the IMR just yet and still use the old serialisation format. This means that they still don't support fragmented values very well. This patch passes the information when an atomic_cell is created as a member of a collection so that later we can avoid fragmenting the value in such cases.	2018-05-31 15:51:11 +01:00
Paweł Dziepak	aa25f0844f	atomic_cell: introduce fragmented buffer value interface As a preparation for the switch to the new cell representation, this patch changes the type returned by atomic_cell_view::value() to one that requires explicit linearisation of the cell value. Even though the value is still implicitly linearised (and only when managed by the LSA) the new interface is the same as the target one so that no more changes to its users will be needed.	2018-05-31 15:51:11 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Paweł Dziepak	e9d6fc48ac	treewide: require type for creating atomic_cell	2018-05-31 15:51:11 +01:00
Paweł Dziepak	93130e80fb	atomic_cell: require column_definition for creating atomic_cell views	2018-05-31 15:51:11 +01:00
Duarte Nunes	99d678d079	db/view: Remove ifdef'd Java code It provides no useful information, so just get rid of it. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-05-28 11:51:23 +01:00
Duarte Nunes	ad18d535e9	db/view: Ignore scenario where base replica hasn't joined the ring Apache Cassandra handles a case where the node hasn't joined the ring and may consequentially have an outdated view of it. Following the same reasoning as with the previous patch, we ignore this scenario. It happens when there are range movements, and this node is bootstrapping, but there are already other mechanisms in the cluster, such as hinted handoff and dual-writing to replicas during range movements, that contribute to this update eventually making its way to the view. This patch doesn't change any behavior, but it provides the reasoning why we won't use the batchlog as Cassandra does, or the hinted handoff log as we will, to later send the update when the node is joined (note that Cassandra just sends the mutations "later", and doesn't check again for any condition or change). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-05-28 11:51:23 +01:00
Duarte Nunes	be45e6a1b7	db/view: Handle case when base has no paired view replica If no view replica is paired with the current base replica, it means there's a range movement going on (decommission or move), such that this base replica is gaining new token ranges. The current node is thus a pending_endpoint from the POV of the coordinator that sent the request. Sending view updates to the view replica this base will eventually be paired with only makes a difference when the base update didn't make it to the node which is currently being decommissioned or moved-from. The update will, however, make it to that node if HH is enabled at the coordinator, before the range movement finishes, or later to this node when it becomes a natural endpoint for the token. We still ensure we send to any pending view endpoints though, at least until we handle that case more optimally. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-05-28 11:51:18 +01:00
Duarte Nunes	4859b759b9	Merge 'Make all timeouts explicit' from Avi " This patchset makes all users of query_processor specify their timeouts explicitly, in preparation for the removal of cql_statement::execute_internal() (whose main function was to override timeouts). " * tag 'cql-explicit-timeouts/v1' of https://github.com/avikivity/scylla: query_processor: require clients to specify timeout configuration query_processor: un-default consistency level in make_internal_options	2018-05-26 16:10:58 +02:00
Piotr Sarna	3792bed3ed	view: adapt view_stats to act as write stats This commit adapts view_stats structure so it can be passed to storage_proxy as write stats. Thanks to that, mv replica updates will not interfere with user write metrics. As a side effect it also provides more stats to replica view updates. Closes #3385 Closes #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	9246bb36bc	db: add row locking metrics This commit adds statistics to row_locker class. Metrics are independently counted for all lock types: row<->partition and exclusive<->shared. Metrics gathered: - total acquisitions - operations that wait on the lock - histogram of the time spent on waiting on this type of lock References #3385 References #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	49bebcfa25	view: add view metrics This commit introduces view statistics: - updates pushed to local/remote replicas - updates failed to be pushed to local/remote replicas Metrics are kept on per-table basis, i.e. updates_pushed_remote shows the number of total updates (mutations) pushed to all paired mv replicas that this particular table has. Every single update is taken into consideration, so if view update requires removing a row from one view and adding a row to another, it will be counted as 2 updates. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Calle Wilund	62c3b4c429	commitlog: Ensure file objects are closed before object free Fixes #3446 Previously, only shutdown-synced objects were actually closed, which is wrong. This introduces yet another queue, processed together with the deletion objects, which ensures we explicitly close all objects that have been discarded. Message-Id: <20180521140456.32100-1-calle@scylladb.com>	2018-05-22 14:52:06 +03:00
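
The seeding idiom described in commit 25bd139508 can be sketched roughly as follows. This is only an illustration of the idiom quoted in that commit message: the class name `peer_picker` and its `pick()` method are hypothetical and are not taken from the Scylla tree.

```cpp
#include <cstddef>
#include <random>

// Hypothetical class for illustration only; not part of the Scylla sources.
class peer_picker {
    // The std::random_device temporary is created, called once to obtain a
    // seed, and destroyed. Because it is never named, later code cannot draw
    // numbers from the slow device by mistake.
    std::default_random_engine _engine{std::random_device{}()};
public:
    // Returns a pseudo-random index in [0, n). The caller must pass n > 0.
    std::size_t pick(std::size_t n) {
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        return dist(_engine);
    }
};
```

Every call to `pick()` draws from the seeded engine only, so the cost of reading the random device is paid once, when the object is constructed.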


1108 Commits