scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	5b45d46f82	row_cache: simplify slicing_reader Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	9c83eb9542	mutation_reader: drop joining and lazy readers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	579de26e95	storage_proxy: drop make_local_reader() This code was used only by its unit test. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	c8f4b96e76	tests: add streamed_mutation_tests Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	a1fc5888d3	streamed_mutation: add mutation_merger Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	48e08fa997	mutation: add mutation_from_streamed_mutation() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	9df01c2a36	streamed_mutation: add streamed_mutation_from_mutation() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	22160ae6d5	mutation_partition: make rows_type public Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	675f684788	streamed_mutation: introduce streamed_mutation streamed_mutation represents a mutation in the form of a stream of mutation_fragments. streamed_mutation emits mutation fragments in the order they should appear in the sstables, i.e. the static row is always the first one, then clustering rows and range tombstones are emitted according to the lexicographical ordering of their clustering keys and the bounds of the range tombstones. Range tombstones are disjoint, i.e. after emitting range_tombstone_begin it is guaranteed that there is going to be a single range_tombstone_end before another range_tombstone_begin is emitted. The ordering of mutation_fragments also guarantees that by the time the consumer sees a clustering row it has already received all relevant tombstones. The partition key and partition tombstone are not streamed and are part of the streamed_mutation itself. streamed_mutation uses batching. The mutation implementations are supposed to fill a buffer with mutation fragments until is_buffer_full() or the end of stream is encountered. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	262337768a	streamed_mutation: introduce mutation_fragment This commit introduces the mutation_fragment class, which represents the parts of a mutation streamed by streamed_mutation. mutation_fragment can be: - a static row (only one in the mutation) - a clustering row - start of range tombstone - end of range tombstone There is an ordering (implemented in the position_in_partition class) between mutation_fragment objects. It reflects the order in which the content of a partition appears in the sstables. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	84713d2236	utils: extract optimized_optional<> from mutation_opt Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Paweł Dziepak	847bf878ec	mutation_partition: add more row::apply() overloads Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Paweł Dziepak	7809adc6ce	keys: add compound_wrapper::tri_compare Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Paweł Dziepak	c24f08a683	range_tombstone_list: compare full tombstones not just timestamps Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Paweł Dziepak	df4c1c6293	range_tombstone: simplify bound_view::equal() Bounds are equal only if they are of the same kind. No need to check weights. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Paweł Dziepak	a6aceb179d	range_tombstone: fix bound ordering Assuming the clustering keys are equal: excl_end < incl_start < incl_end < excl_start. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Paweł Dziepak	3a0e76d635	range_tombstone: check for adjacent instead of equal bounds Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:48 +01:00
Nadav Har'El	3372052d48	Rewriting shared sstables only after all shards loaded sstables After commit `faa4581`, each shard only starts splitting its shared sstables after opening all sstables. This was important because compaction needs to be aware of all sstables. However, another bug remained: If one shard finishes loading its sstables and starts the splitting compactions, and in parallel a different shard is still opening sstables - the second shard might find a half-written sstable being written by the first shard, and abort on a malformed sstable. So in this patch we start the shared sstable rewrites - on all shards - only after all shards finished loading their sstables. Doing this is easy, because main.cc already contains a list of sequential steps where each uses invoke_on_all() to make sure the step completes on all shards before continuing to the next step. Fixes #1371 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>	2016-06-20 16:25:24 +03:00
Calle Wilund	7cdea1b889	commitlog: Use flush queue for write/flush ordering, improve batch Using an ordering mechanism better than rw-locks for write/flush means we can wait for pending write in batch mode, and coalesce data from more than one mutation into a chunk. It also means we can wait for a specific read+flush pair (based on file position). Downside is that we will not do parallel writes in batch mode (unless we run out of buffer), which might underutilize the disk bandwidth. Upside is that running in batch mode (i.e. per-write consistency) now has way better bandwidth, and also, at least with high mutation rate, better average latency. Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>	2016-06-20 13:09:16 +03:00
Benoît Canet	77375cefaa	docker: normalize environment variables names Use a more docker like form. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466414939-5019-1-git-send-email-benoit@scylladb.com>	2016-06-20 12:30:13 +03:00
Benoît Canet	4c7ac4cab7	docker: implement seeds and broadcast_address variables Implement the seeds and broadcast_address variables required for clustering behavior. Do it raw with sed in the startup script. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466412846-4760-3-git-send-email-benoit@scylladb.com>	2016-06-20 11:55:03 +03:00
Benoît Canet	fd811c90fc	docker: Complete the missing part of production mode Scylla will not start if the disk was not benchmarked, so run io_tune with the right parameters. Also add the cpu_set environment variables for passing cpu set to iotune and scylla. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466412846-4760-2-git-send-email-benoit@scylladb.com>	2016-06-20 11:54:54 +03:00
Pekka Enberg	1d5f7be447	systemd: Use PermissionsStartOnly instead of running sudo Use the PermissionsStartOnly systemd option to apply the permission related configurations only to the start command. This allows us to stop using "sudo" for ExecStartPre and ExecStopPost hooks and drop the "requiretty" /etc/sudoers hack from Scylla's RPM. Tested-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1466407587-31734-1-git-send-email-penberg@scylladb.com>	2016-06-20 11:53:24 +03:00
Vlad Zolotarov	baf3614e8f	sstables: don't backup sstables that are a result of a compaction According to incremental backup description (http://docs.datastax.com/en/cassandra_win/2.2/cassandra/operations/opsBackupIncremental.html) sstables that are a result of a compaction process should not be backed up since original sstables had already been backed up. Fixes #1308 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1466338622-7323-1-git-send-email-vladz@cloudius-systems.com>	2016-06-20 09:52:30 +03:00
Pekka Enberg	f4153c75a0	cql3: Bump CQL language version to 3.2.1 We already added 3.2.1 support in commit `569d288` ("cql3: Add TRUNCATE TABLE alias for TRUNCATE") but never got around to fixing the CQL version reported to drivers. Fixes #1358. Message-Id: <1466403967-28654-1-git-send-email-penberg@scylladb.com>	2016-06-20 09:42:12 +03:00
Avi Kivity	07045ffd7c	dist: fix scylla-kernel-conf postinstall scriptlet failure Because we build on CentOS 7, which does not have the %sysctl_apply macro, the macro is not expanded, and therefore executed incorrectly even on 7.2, which does. Fix by expanding the macro manually. Fixes #1360. Message-Id: <1466250006-19476-1-git-send-email-avi@scylladb.com>	2016-06-20 09:36:39 +03:00
Lucas Meneghel Rodrigues	ae622b0c08	dist/common/scripts/scylla_kernel_check: Update messages Small grammar tweaks to the script's output messages. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com> Message-Id: <1466205496-3885-3-git-send-email-lmr@scylladb.com>	2016-06-19 19:28:58 +03:00
Lucas Meneghel Rodrigues	aacf7eb2ae	dist/common/scripts/scylla_kernel_check: Fix conditional statement Since most of the time people are running scylla_setup on a fully upgraded ubuntu 14.04 box, we rarely reach that code path, but once we do we end up with an error. Let's fix that. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com> Message-Id: <1466205496-3885-2-git-send-email-lmr@scylladb.com>	2016-06-19 19:28:56 +03:00
Nadav Har'El	faa45812b2	Rewrite shared sstables only after entire CF is read Starting in commit `721f7d1d4f`, we start "rewriting" a shared sstable (i.e., splitting it into individual shards) as soon as it is loaded in each shard. However, as discovered in issue #1366, this is too soon: Our compaction process relies in several places on compaction being done only after all the sstables of the same CF have been loaded. One example is that we need to know the content of the other sstables to decide which tombstones we can expire (this is issue #1366). Another example is that we use the last generation number we are aware of to decide the number of the next compaction output - and this is wrong before we have seen all sstables. So with this patch, while loading sstables we only make a list of shared sstables which need to be rewritten - and the actual rewrite is only started when we finish reading all the sstables for this CF. We need to do this in two cases: reboot (when we load all the existing sstables we find on disk), and nodetool refresh (when we import a set of new sstables). Fixes #1366. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>	2016-06-19 16:50:51 +03:00
Paweł Dziepak	dde87e0b0e	row_cache: drop schema upgrade for new entries in update() Commit `daad2eb` "row_cache: fix memory leak in case of schema upgrade failure" has fixed a memory leak caused by failed upgrade_entry(). However, in case of upgrade failure the memtable_entry used to create the new cache entry was left in some invalid state. If the operation was retried the cache would attempt again to apply that memtable_entry, which would now be in an invalid state. The solution is either to ignore upgrade_entry() exceptions or not to call it at all and let the cache entry be upgraded on demand. This patch implements the latter. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:43:01 +02:00
Paweł Dziepak	daad2ebf81	row_cache: fix memory leak in case of schema upgrade failure When update() causes a new entry to be inserted to the cache the procedure is as follows: 1. allocate and construct new entry 2. upgrade entry schema 3. add entry to lru list and cache tree Step 2 may fail and at this point the pointer to the entry is neither protected by RAII nor added in any of the cache containers. The solution is to swap steps 2 and 3 so that even if the upgrade fails the entry is already owned by the cache and won't leak. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>	2016-06-17 13:12:01 +02:00
Gleb Natapov	4659800ab9	storage_proxy: implement custom speculative retry strategy The user may specify the time after which a speculative retry should happen instead of relying on cf statistics. Use the provided value in the speculative executor. Message-Id: <20160616104422.GH5961@scylladb.com>	2016-06-16 13:45:56 +03:00
Pekka Enberg	d72c608868	service/storage_service: Make do_isolate_on_error() more robust Currently, we only stop the CQL transport server. Extract a stop_transport() function from drain_on_shutdown() and call it from do_isolate_on_error() to also shut down the inter-node RPC transport, Thrift, and other communications services. Fixes #1353	2016-06-16 13:34:09 +03:00
Avi Kivity	85bb5ea064	Merge "Reduce LSA reclaim latency" from Tomasz "Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no difference in throughput in a CQL write micro benchmark in either of these workloads: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274."	2016-06-16 10:47:57 +03:00
Pekka Enberg	a8f95e8081	dist/docker: Use Scylla superpackage for installation Make the Dockerfile more future-proof by using the Scylla superpackage for installation. Message-Id: <1466015996-19792-1-git-send-email-penberg@scylladb.com>	2016-06-16 10:32:18 +03:00
Benoît Canet	c133748a24	scylla_setup: Fix RAID device enumeration Commit `f42673ed1e` ("scylla_setup: Hide busy block devices from RAID0 configuration") wasn't enumerating anything. Additionally it listed from /dev/ and not /dev/dm, which broke the test conditions. This one uses blkid instead of /proc/partitions. A follow-up patch will be required to mask encrypted devices. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466059657-12377-1-git-send-email-benoit@scylladb.com>	2016-06-16 09:52:25 +03:00
Benoît Canet	0cf8144485	scylla_setup: Propose default values when judicious Also takes care of explaining the options. Fixes #1031 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466011848-11054-1-git-send-email-benoit@scylladb.com>	2016-06-15 20:33:55 +03:00
Benoît Canet	263a55c0da	scylla_setup: Inform the user that he can skip any step Fixes: #1188 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466001423-9547-3-git-send-email-benoit@scylladb.com>	2016-06-15 19:38:23 +03:00
Benoît Canet	f42673ed1e	scylla_setup: Hide busy block devices from RAID0 configuration This patch looks in /proc/mount for the device name so the device or its subdevices will be excluded from the available RAID0 targets. It does the same with physical volumes from device mapper. Fixes #1189 Message-Id: <1466001423-9547-4-git-send-email-benoit@scylladb.com>	2016-06-15 19:36:11 +03:00
Paweł Dziepak	c8e75d2e84	schema: cache is_atomic() in column_definition is_atomic() is called for each cell in mutation applies, compaction and query. Since the value doesn't change it can be easily cached which would save one indirection and virtual call. Results of perf_simple_query -c1 (median, duration 60): before after read 54611.49 55396.01 +1.44% write 65378.92 68554.25 +4.86% Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1465991045-11140-1-git-send-email-pdziepak@scylladb.com>	2016-06-15 19:18:13 +03:00
Benoît Canet	4def1f4524	dist: sysctl.d: Disable automatic numa balancing On NUMA hardware, autonuma may reduce performance by unmapping memory. Since we do manual NUMA placement, autonuma will not help anything. We ought to disable it by setting the kernel.numa_balancing sysctl to 0. Fixes: #1120 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466006345-9972-1-git-send-email-benoit@scylladb.com>	2016-06-15 19:11:00 +03:00
Gleb Natapov	7f54333c45	storage_proxy: fix compilation on older boost boost before 1.56.0 had a broken boost::size() implementation. Do not use it. Message-Id: <20160615123134.GD5961@scylladb.com>	2016-06-15 15:34:57 +03:00
Asias He	de0fd98349	repair: Switch log level to warn instead of error dtest takes error level log as serious error. It is not a serious error for streaming to fail to send a verb and fail a streaming session which triggers a repair failure, for example, the peer node is gone or stopped. Switch to use log level warn instead of level error. Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test Fixes: #1335 Message-Id: <406fb0c4a45b81bd9c0aea2a898d7ca0787b23e9.1465979288.git.asias@scylladb.com>	2016-06-15 13:01:35 +03:00
Asias He	94c9211b0e	streaming: Switch log level to warn instead of error dtest takes error level log as serious error. It is not a serious error for streaming to fail to send a verb and fail a streaming session, for example, the peer node is gone or stopped. Switch to use log level warn instead of level error. Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test Fixes: #1335 Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>	2016-06-15 13:01:22 +03:00
Vlad Zolotarov	c616e74ae4	locator::gossiping_property_file_snitch: use a lowres_clock time source for a timer gossiping_property_file_snitch checks a configuration file every 60s. lowres_clock clock source should be good enough for that. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465314448-11611-1-git-send-email-vladz@cloudius-systems.com>	2016-06-15 13:01:05 +03:00
Tomasz Grabiec	207c8d94f1	idl: Rename variable to a more meaningful name Message-Id: <1465909911-10534-2-git-send-email-tgrabiec@scylladb.com>	2016-06-14 17:02:59 +03:00
Raphael S. Carvalho	80d8c5ef6f	compaction: use proper type in constructor Correctness is not affected due to long type, but an unsigned long type should definitely be used instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d3ab15a3206306de195aeb3d78f9b5bc4ca9208e.1465908970.git.raphaelsc@scylladb.com>	2016-06-14 17:02:32 +03:00
Tomasz Grabiec	8e8f63de85	mutation_partition_view: Avoid unnecessary copy into temporary Message-Id: <1465909038-8174-1-git-send-email-tgrabiec@scylladb.com>	2016-06-14 17:02:17 +03:00
Tomasz Grabiec	75f899cc93	lsa: Make reclamation step configurable via config	2016-06-14 15:13:15 +02:00
Tomasz Grabiec	cd9955d2ce	lsa: Reclaim 1 segment by default Reclaiming many segments was observed to cause up to multi-ms latency. With the new setting, the latency of reclamation cycle with full segments (worst case mode) is below 1ms. I saw no decrease in throughput compared to the step of 16 segments in either of these modes: - full segments, reclaim by random eviction - sparse segments (3% occupancy), reclaim by compaction and no eviction Fixes #1274.	2016-06-14 15:13:15 +02:00


9583 Commits