scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Author	SHA1	Message	Date
Pekka Enberg	1d5f7be447	systemd: Use PermissionsStartOnly instead of running sudo Use the PermissionsStartOnly systemd option to apply the permission related configurations only to the start command. This allows us to stop using "sudo" for ExecStartPre and ExecStopPost hooks and drop the "requiretty" /etc/sudoers hack from Scylla's RPM. Tested-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1466407587-31734-1-git-send-email-penberg@scylladb.com>	2016-06-20 11:53:24 +03:00
Vlad Zolotarov	baf3614e8f	sstables: don't backup sstables that are a result of a compaction According to incremental backup description (http://docs.datastax.com/en/cassandra_win/2.2/cassandra/operations/opsBackupIncremental.html) sstables that are a result of a compaction process should not be backed up since original sstables had already been backed up. Fixes #1308 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1466338622-7323-1-git-send-email-vladz@cloudius-systems.com>	2016-06-20 09:52:30 +03:00
Pekka Enberg	f4153c75a0	cql3: Bump CQL language version to 3.2.1 We already added 3.2.1 support in commit `569d288` ("cql3: Add TRUNCATE TABLE alias for TRUNCATE") but never got around fixing the CQL version reported to drivers. Fixes #1358. Message-Id: <1466403967-28654-1-git-send-email-penberg@scylladb.com>	2016-06-20 09:42:12 +03:00
Avi Kivity	07045ffd7c	dist: fix scylla-kernel-conf postinstall scriptlet failure Because we build on CentOS 7, which does not have the %sysctl_apply macro, the macro is not expanded, and therefore executed incorrectly even on 7.2, which does. Fix by expanding the macro manually. Fixes #1360. Message-Id: <1466250006-19476-1-git-send-email-avi@scylladb.com>	2016-06-20 09:36:39 +03:00
Lucas Meneghel Rodrigues	ae622b0c08	dist/common/scripts/scylla_kernel_check: Update messages Small grammar tweaks to the script's output messages. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com> Message-Id: <1466205496-3885-3-git-send-email-lmr@scylladb.com>	2016-06-19 19:28:58 +03:00
Lucas Meneghel Rodrigues	aacf7eb2ae	dist/common/scripts/scylla_kernel_check: Fix conditional statement Since most of the time people are running scylla_setup on a fully upgraded ubuntu 14.04 box, we rarely reach that code path, but once we do we end up with an error. Let's fix that. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com> Message-Id: <1466205496-3885-2-git-send-email-lmr@scylladb.com>	2016-06-19 19:28:56 +03:00
Nadav Har'El	faa45812b2	Rewrite shared sstables only after entire CF is read Starting in commit `721f7d1d4f`, we start "rewriting" a shared sstable (i.e., splitting it into individual shards) as soon as it is loaded in each shard. However as discovered in issue #1366, this is too soon: Our compaction process relies in several places on compaction being done only after all the sstables of the same CF have been loaded. One example is that we need to know the content of the other sstables to decide which tombstones we can expire (this is issue #1366). Another example is that we use the last generation number we are aware of to decide the number of the next compaction output - and this is wrong before we saw all sstables. So with this patch, while loading sstables we only make a list of shared sstables which need to be rewritten - and the actual rewrite is only started when we finish reading all the sstables for this CF. We need to do this in two cases: reboot (when we load all the existing sstables we find on disk), and nodetool refresh (when we import a set of new sstables). Fixes #1366. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>	2016-06-19 16:50:51 +03:00
Paweł Dziepak	dde87e0b0e	row_cache: drop schema upgrade for new entries in update() Commit `daad2eb` "row_cache: fix memory leak in case of schema upgrade failure" has fixed a memory leak caused by failed upgrade_entry(). However, in case of upgrade failure the memtable_entry used to create the new cache entry was left in some invalid state. If the operation was retried the cache would attempt again to apply that memtable_entry which now would be in an invalid state. The solution is either to ignore upgrade_entry() exceptions or not to call it at all and let the cache entry be upgraded on demand. This patch implements the latter. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:43:01 +02:00
Paweł Dziepak	daad2ebf81	row_cache: fix memory leak in case of schema upgrade failure When update() causes a new entry to be inserted to the cache the procedure is as follows: 1. allocate and construct new entry 2. upgrade entry schema 3. add entry to lru list and cache tree Step 2 may fail and at this point the pointer to the entry is neither protected by RAII nor added in any of the cache containers. The solution is to swap steps 2 and 3 so that even if the upgrade fails the entry is already owned by the cache and won't leak. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:12:01 +02:00
Asias He	4f3ce42163	storage_service: Prevent old version node to join a new version cluster We want to prevent older version of scylla which has fewer features to join a cluster with newer version of scylla which has more features, because when scylla sees a feature is enabled on all other nodes, it will start to use the feature and assume existing nodes and future nodes will always have this feature. In order to support downgrade during rolling upgrade, we need to support mixed old and new nodes case. 1) All old nodes O O O O O <- N OK O O O O O <- O OK 2) All new nodes N N N N N <- N OK N N N N N <- O FAIL 3) Mixed old and new nodes O N O N O <- N OK O N O N O <- O OK (O == old node, N == new node, <- == joining the cluster) With this patch, I tested: 1.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 1.2) Add old node to old node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} 2.1) Add new node to new node cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {RANGE_TOMBSTONES} 2.2) Add old node to new node cluster seastar - Exiting on unhandled exception: std::runtime_error (Feature check failed. This node can not join the cluster because it does not understand the feature. Local node 127.0.0.4 features = {}, Remote common_features = {RANGE_TOMBSTONES}) 3.1) Add new node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {RANGE_TOMBSTONES}, Remote common_features = {} 3.2) Add old node to mixed cluster gossip - Feature check passed. Local node 127.0.0.4 features = {}, Remote common_features = {} Fixes #1253	2016-06-17 10:49:45 +08:00
Asias He	32ed468e42	gossip: Remove empty string feature in get_supported_features If the feature string is empty, boost::split will return std::set<sstring> = {""} instead of std::set<sstring> = {} which will make a node with a feature, e.g. std::set<sstring> = {"RANGE_TOMBSTONES"}, think it does not understand the feature of a node with no features at all.	2016-06-17 10:49:45 +08:00
Gleb Natapov	4659800ab9	storage_proxy: implement custom speculative retry strategy User may specify time after which speculative retry should happen instead of relying on cf statics. Use provided value in speculative executor. Message-Id: <20160616104422.GH5961@scylladb.com>	2016-06-16 13:45:56 +03:00
Pekka Enberg	d72c608868	service/storage_service: Make do_isolate_on_error() more robust Currently, we only stop the CQL transport server. Extract a stop_transport() function from drain_on_shutdown() and call it from do_isolate_on_error() to also shut down the inter-node RPC transport, Thrift, and other communications services. Fixes #1353	2016-06-16 13:34:09 +03:00
Avi Kivity	85bb5ea064	Merge "Reduce LSA reclaim latency" from Tomasz "Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no difference in throughput in a CQL write micro benchmark in neither of these workloads: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274."	2016-06-16 10:47:57 +03:00
Pekka Enberg	a8f95e8081	dist/docker: Use Scylla superpackage for installation Make the Dockerfile more future-proof by using the Scylla superpackage for installation. Message-Id: <1466015996-19792-1-git-send-email-penberg@scylladb.com>	2016-06-16 10:32:18 +03:00
Benoît Canet	c133748a24	scylla_setup: Fix RAID device enumeration Commit `f42673ed1e` ("scylla_setup: Hide busy block devices from RAID0 configuration") wasn't enumerating anything. Additionally it listed from /dev/ and not /dev/dm which broke the test conditions. This one uses blkid instead of /proc/partitions. A follow up patch will be required to mask encrypted devices. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466059657-12377-1-git-send-email-benoit@scylladb.com>	2016-06-16 09:52:25 +03:00
Glauber Costa	01a658f51d	LSA: helper function for region_group current hierarchy walk converted, but more users will come. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-15 22:26:50 -04:00
Glauber Costa	741aa16748	LSA: allow a region_group to have a threshold for throttling specified Allocations will still be allowed if made directly, but callers will have the choice (in an upcoming patch) to proceed only if memory is below this threshold. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-15 22:26:50 -04:00
Glauber Costa	7cd0c0731e	region_group: delete move constructor Tomek correctly points out that since we are now using "this" in lambda captures, we should make the region_group not movable. We currently define a move constructor, but there are no users. So we should just remove them. copy constructor is already deleted, and so are the copy and move assignment operators. So by removing the move constructor, we should be fine. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-06-15 22:26:50 -04:00
Benoît Canet	0cf8144485	scylla_setup: Propose default values when judicious Also takes care of explaining the options. Fixes #1031 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466011848-11054-1-git-send-email-benoit@scylladb.com>	2016-06-15 20:33:55 +03:00
Benoît Canet	263a55c0da	scylla_setup: Inform the user that they can skip any step Fixes: #1188 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466001423-9547-3-git-send-email-benoit@scylladb.com>	2016-06-15 19:38:23 +03:00
Benoît Canet	f42673ed1e	scylla_setup: Hide busy block devices from RAID0 configuration This patch looks in /proc/mount for the device name so the device or its subdevices will be excluded from the available RAID0 targets. It does the same with physical volumes from device mapper. Fixes #1189 Message-Id: <1466001423-9547-4-git-send-email-benoit@scylladb.com>	2016-06-15 19:36:11 +03:00
Paweł Dziepak	c8e75d2e84	schema: cache is_atomic() in column_definition is_atomic() is called for each cell in mutation applies, compaction and query. Since the value doesn't change it can be easily cached which would save one indirection and virtual call. Results of perf_simple_query -c1 (median, duration 60): before after read 54611.49 55396.01 +1.44% write 65378.92 68554.25 +4.86% Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1465991045-11140-1-git-send-email-pdziepak@scylladb.com>	2016-06-15 19:18:13 +03:00
Benoît Canet	4def1f4524	dist: sysctl.d: Disable automatic numa balancing On NUMA hardware, autonuma may reduce performance by unmapping memory. Since we do manual NUMA placement, autonuma will not help anything. We ought to disable it by setting the kernel.numa_balancing sysctl to 0. Fixes: #1120 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466006345-9972-1-git-send-email-benoit@scylladb.com>	2016-06-15 19:11:00 +03:00
Gleb Natapov	7f54333c45	storage_proxy: fix compilation on older boost Boost before 1.56.0 had a broken boost::size() implementation. Do not use it. Message-Id: <20160615123134.GD5961@scylladb.com>	2016-06-15 15:34:57 +03:00
Asias He	de0fd98349	repair: Switch log level to warn instead of error dtest takes error level log as serious error. It is not a serious error for streaming to fail to send a verb and fail a streaming session which triggers a repair failure, for example, the peer node is gone or stopped. Switch to use log level warn instead of level error. Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test Fixes: #1335 Message-Id: <406fb0c4a45b81bd9c0aea2a898d7ca0787b23e9.1465979288.git.asias@scylladb.com>	2016-06-15 13:01:35 +03:00
Asias He	94c9211b0e	streaming: Switch log level to warn instead of error dtest takes error level log as serious error. It is not a serious error for streaming to fail to send a verb and fail a streaming session, for example, the peer node is gone or stopped. Switch to use log level warn instead of level error. Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test Fixes: #1335 Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>	2016-06-15 13:01:22 +03:00
Vlad Zolotarov	c616e74ae4	locator::gossiping_property_file_snitch: use a lowres_clock time source for a timer gossiping_property_file_snitch checks a configuration file every 60s. lowres_clock clock source should be good enough for that. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465314448-11611-1-git-send-email-vladz@cloudius-systems.com>	2016-06-15 13:01:05 +03:00
Tomasz Grabiec	207c8d94f1	idl: Rename variable to a more meaningful name Message-Id: <1465909911-10534-2-git-send-email-tgrabiec@scylladb.com>	2016-06-14 17:02:59 +03:00
Raphael S. Carvalho	80d8c5ef6f	compaction: use proper type in constructor Correctness is not affected due to long type, but an unsigned long type should be definitely used instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d3ab15a3206306de195aeb3d78f9b5bc4ca9208e.1465908970.git.raphaelsc@scylladb.com>	2016-06-14 17:02:32 +03:00
Tomasz Grabiec	8e8f63de85	mutation_partition_view: Avoid unnecessary copy into temporary Message-Id: <1465909038-8174-1-git-send-email-tgrabiec@scylladb.com>	2016-06-14 17:02:17 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	cd9955d2ce	lsa: Reclaim 1 segment by default Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no decrease in throughput compared to the step of 16 segments in either of these modes: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274.	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	86b76171a8	lsa: Use the same step in both internal and external reclamations	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	d74d902a01	lsa: Make reclamation step configurable	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	93bb95bd0d	lsa: Log reclamation rate	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	cb18418022	lsa: Print more details before aborting	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	7cb98c916f	tests: lsa_async_eviction_test: Push to refs with reclaim lock push_back() is not reentrant with pop_front(), used by the evictor. If the reclaimer runs when std::deque allocates a new node it will get corrupted. Fix by running push_back() under reclaim lock.	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	de8772525a	tests: lsa_async_eviction_test: Make sure refs scope encloses reclaimer scope	2016-06-14 15:13:14 +02:00
Tomasz Grabiec	c4a556ac13	tests: lsa_async_eviction_test: Fix use after free due to at_exit() callback The callback will run after thread is destroyed. We don't really need the stop feature, so for now just remove it.	2016-06-14 15:13:14 +02:00
Pekka Enberg	155ad2eeb5	storage_service: Fix start_rpc_server() to use logger Message-Id: <1465882880-7392-1-git-send-email-penberg@scylladb.com>	2016-06-14 09:52:04 +02:00
Raphael S. Carvalho	0b2cd41daf	database: remember sstable level when cleaning it up Cleanup operation wasn't preserving level of sstables. That will have a bad impact on performance because compaction work is lost. Fixes #1317. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <35ce8fbbb4590725bb0414e6a5450fcbe6cb7212.1465843387.git.raphaelsc@scylladb.com>	2016-06-14 08:06:00 +03:00
Vlad Zolotarov	d3960f0bbb	tracing: rearrange shut down tracing::tracing local instance is dereferenced from a cql_server::connection::process_request(), therefore tracing::tracing service may be stop()ed only after a CQL server service is down. On the other hand it may not be stopped before RPC service is down because a remote side may request a tracing for a specific command too. This patch splits the tracing::tracing stop() into two phases: 1) Flush all pending tracing records and stop the backend. 2) Stop the service. The first phase is called after CQL server is down and before RPC is down. The second phase is called after RPC is down. Fixes #1339 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465840496-19990-1-git-send-email-vladz@cloudius-systems.com>	2016-06-14 07:58:04 +03:00
Avi Kivity	49449fc30c	Merge seastar upstream * seastar 864d6dc...401c333 (8): > scollectd: Support filtering specific collectd metrics > core: Integrate error reporting with the logging framework > rpc: wait for all replies to be completed before closing rpc server > rpc: clean up resource accounting > queue: fix race between pop_eventually() and abort() > rpc_test: fix cancel test to not depend on timing. > tutorial: explain application-specific command line options > add ostream output operator for std::unordered_map	2016-06-13 19:35:00 +03:00
Gleb Natapov	e089166cfa	storage_proxy: wait only for expected CL when writing back data during read repair When read repair writes diffs back to replicas it is enough to wait for the requested CL to guarantee read monotonicity. This patch makes read repair write reuse regular mutate functionality which already tracks CL status. This is done by changing write response handler to not hold mutation directly, but instead hold a container that, depending on whether this is read repair write or regular one, can provide different mutation per destination. Message-Id: <20160613124727.GL1096@scylladb.com>	2016-06-13 19:01:51 +03:00
Duarte Nunes	c896309383	database: Actually decrease query_state limit query_state expects the current row limit to be updated so it can be enforced across partition ranges. A regression introduced in `e4e8acc946` prevented that from happening by passing a copy of the limit to querying_reader. This patch fixes the issue by having column_family::query update the limit as it processes partitions from the querying_reader. Fixes #1338 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1465804012-30535-1-git-send-email-duarte@scylladb.com>	2016-06-13 10:03:27 +02:00
Avi Kivity	465c0a4ead	Merge "Make stronger guarantees in row_cache's clear/invalidate" from Tomasz "Correctness of current uses of clear() and invalidate() relies on fact that cache is not populated using readers created before invalidation. Sstables are first modified and then cache is invalidated. This is not guaranteed by current implementation though. As pointed out by Avi, a populating read may race with the call to clear(). If that read started before clear() and completed after it, the cache may be populated with data which does not correspond to the new sstable set. To provide such guarantee, invalidate() variants were adjusted to synchronize using _populate_phaser, similarly like row_cache::update() does. Fixes #1291."	2016-06-13 09:55:29 +03:00
Shlomi Livne	ac6f2b5c13	dist/common: Update scylla_io_setup to use settings done in cpuset.conf scylla_io_setup is searching for --smp and --cpuset setting in SCYLLA_ARGS. We have moved the settings of this args into /etc/scylla.d/cpuset.conf and they are set by scylla_cpuset_setup into CPUSET. Fixes: #1327 Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <2735e3abdd63d245ec96cfa1e65f766b1c12132e.1465508701.git.shlomi@scylladb.com>	2016-06-10 09:37:44 +03:00
Vlad Zolotarov	89375d4c2a	service::storage_proxy: tracing: instrument read_digest and read_mutation_data Instrument read_digest and read_mutation_data handlers similarly to a read_data handler instrumentation. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465304055-4263-1-git-send-email-vladz@cloudius-systems.com>	2016-06-09 14:32:42 +02:00
Pekka Enberg	8df5aa7b0c	utils/exceptions: Whitelist EEXIST and ENOENT in should_stop_on_system_error() There are various call-sites that explicitly check for EEXIST and ENOENT: $ git grep "std::error_code(E" database.cc: if (e.code() != std::error_code(EEXIST, std::system_category())) { database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) { database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) { database.cc: if (e.code() != std::error_code(ENOENT, std::system_category())) { sstables/sstables.cc: if (e.code() == std::error_code(ENOENT, std::system_category())) { sstables/sstables.cc: if (e.code() == std::error_code(ENOENT, std::system_category())) { Commit `961e80a` ("Be more conservative when deciding when to shut down due to disk errors") turned these errors into a storage_io_exception that is not expected by the callers, which causes 'nodetool snapshot' functionality to break, for example. Whitelist the two error codes to revert back to the old behavior of io_check(). Message-Id: <1465454446-17954-1-git-send-email-penberg@scylladb.com>	2016-06-09 10:03:04 +02:00
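
The feature check described in commits 4f3ce42163 and 32ed468e42 above can be sketched roughly as follows. This is a minimal illustration using only the C++ standard library, not the code from the Scylla tree: the names parse_features and can_join are hypothetical, the comma separator is an assumption, and std::string stands in for sstring. It shows why an empty feature string has to parse to an empty set, and that a node passes the check when it understands every feature in the remote common set.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical helper: split a comma-separated feature string into a set,
// skipping empty tokens so that "" yields {} rather than {""}.
std::set<std::string> parse_features(const std::string& s) {
    std::set<std::string> out;
    std::string::size_type start = 0;
    while (start <= s.size()) {
        auto end = s.find(',', start);
        if (end == std::string::npos) {
            end = s.size();
        }
        std::string token = s.substr(start, end - start);
        if (!token.empty()) {
            out.insert(token);
        }
        start = end + 1;
    }
    return out;
}

// Hypothetical check: the joining node must understand every feature that
// the remote nodes have in common (local is a superset of remote_common).
bool can_join(const std::set<std::string>& local,
              const std::set<std::string>& remote_common) {
    // std::set iterates in sorted order, which std::includes requires.
    return std::includes(local.begin(), local.end(),
                         remote_common.begin(), remote_common.end());
}
```

Under these assumptions, a node with features {} is rejected by a cluster whose common features are {RANGE_TOMBSTONES} (case 2.2 in the commit message), while a node with {RANGE_TOMBSTONES} is accepted by a mixed cluster whose common features are {} (case 3.1).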

11716 Commits