scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	2fd339787b	tests: lsa: Add test for reclaimer starting and stopping	2017-02-01 17:41:56 +01:00
Tomasz Grabiec	f943296da0	tests: lsa: Add request releasing stress test	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	e40fb438f5	lsa: Avoid avalanche releasing of requests Before, the logic for releasing writes blocked on dirty worked like this: 1) When region group size changes and it is not under pressure and there are some requests blocked, then schedule a request releasing task. 2) The request releasing task, if there is no pressure, runs one request and, if there are still blocked requests, schedules the next request releasing task. If requests don't change the size of the region group, then either some request executes or there is a request releasing task scheduled. The amount of scheduled tasks is at most 1; there is a single thread of execution. However, if requests themselves would change the size of the group, then each such change would schedule yet another request releasing thread, growing the task queue size by one. The group size can also change when memory is reclaimed from the groups (e.g. when it contains sparse segments). Compaction may start many request releasing threads due to group size updates. Such behavior is detrimental for performance and stability if there are a lot of blocked requests. This can happen on 1.5 even with modest concurrency because timed out requests stay in the queue. This is less likely on 1.6 where they are dropped from the queue. The releasing of tasks may start to dominate over other processes in the system. When the amount of scheduled tasks reaches 1000, polling stops and the server becomes unresponsive until all of the released requests are done, which is either when they start to block on dirty memory again or run out of blocked requests. It may take a while to reach the pressure condition after memtable flush if it brings virtual dirty much below the threshold, which is currently the case for workloads with overwrites producing sparse regions. Refs #2021. Fix by ensuring there is at most one request releasing thread at a time. There will be one releasing fiber per region group which is woken up when pressure is lifted. It executes blocked requests until pressure occurs. The logic for notification across the hierarchy was replaced by calling region_group::notify_relief() from region_group::update() on the broadest relieved group.	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	d55baa0cd1	lsa: Move definitions to .cc	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	8f8b111b33	lsa: Simplify hard pressure notification management The hard pressure was only signalled on region group when run_when_memory_available() was called after the pressure condition was met. So the following loop is always an infinite loop rather than stopping when enough is allocated to cause pressure: while (!gr.under_pressure()) { region.allocate(...); } It's cleaner if pressure notification works not only if run_when_memory_available() is used but whenever the condition changes, like we do for the soft pressure. There is a comment in run_when_memory_available() which gives reasons why notifications are called from there, but I think those reasons no longer hold: - we already notify on soft pressure conditions from update(), and if that is safe, notifying about hard pressure should also be safe. I checked and it looks safe to me. - avoiding notification in the rare case when we stopped writing right after crossing the threshold doesn't seem beneficial. It's unlikely in the first place, and one could argue it's better to actually flush now so that when writes resume they will not block.	2017-02-01 17:41:55 +01:00
Tomasz Grabiec	9aa1be5d08	lsa: Do not start or stop reclaiming on hard pressure We already call these when crossing the soft threshold. We shouldn't stop reclaiming when hard pressure is gone because soft pressure may still be present. Calling start_reclaiming() on hard pressure is unnecessary because soft pressure also starts it, and when there is hard pressure there is also soft pressure.	2017-02-01 17:40:15 +01:00
Tomasz Grabiec	f053b48f7c	tests: lsa: Adjust to take into account that reclaimers are run synchronously	2017-01-30 19:18:07 +01:00
Tomasz Grabiec	ed9ff19467	lsa: Document and annotate reclaimer notification callbacks They are called from region_group::update(), so must be alloc-free and noexcept.	2017-01-30 19:18:07 +01:00
Tomasz Grabiec	2ec6fe415e	tests: lsa: Use with_timeout() in quiesce() The current construct doesn't interrupt the test; the timeout failure will only be logged.	2017-01-30 19:18:07 +01:00
Duarte Nunes	937ed1bacb	bound_view: Simplify copy ctor By using default generation. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Reviewed-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1485355007-1913-1-git-send-email-duarte@scylladb.com>	2017-01-26 19:29:29 +02:00
Avi Kivity	b91b9b351a	Revert "Merge seastar upstream" This reverts commit f301c678bfe5eb5df71f71fd20e08b422b1023bb; the rpc changes don't compile due to rpc timeout type change.	2017-01-26 18:30:56 +02:00
Avi Kivity	f301c678bf	Merge seastar upstream * seastar 397685c...f5fa2e3 (3): > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier	2017-01-26 18:16:14 +02:00
Gleb Natapov	64660397fc	storage_proxy: move operation type information from counter's name to a label Makes it much more flexible to view the data in various ways in Grafana. Message-Id: <20170126102746.GL11469@scylladb.com>	2017-01-26 12:38:29 +02:00
Tomasz Grabiec	2c7902fb2b	Revert "lsa: Reduce reclamation latency" This reverts commit `d61002cc33`. Introduced a regression in row_cache_alloc_stress. The problem is that reclaim_from_evictable() evicts way too much after the refactor due to the stop condition not taking into account how much data was evicted so far and only looking at occupancy of the minimal segment. This may lead to eviction of the whole region.	2017-01-26 10:43:18 +01:00
Paweł Dziepak	8cdffd7c57	time_type_impl: value initialize result parse_time() adds hours, minutes, etc. to a final value 'result'. However, it is of type std::chrono::nanoseconds, which means it is not zeroed at initialization unless it is explicitly asked to do so. Fixed debug mode failures in types_test and cql_query_test. Message-Id: <20170125155239.1253-1-pdziepak@scylladb.com>	2017-01-25 17:56:31 +02:00
Paweł Dziepak	034d028329	Merge "range_tombstone_list: Properly implement difference()" from Duarte "This patchset properly implements range_tombstone_list::difference(), which was very broken. We add unit tests for the function and ensure we always randomly generate range_tombstones in other unit tests so other problems aren't hidden."	2017-01-25 12:08:19 +00:00
Duarte Nunes	8c65b98ea7	mutation_merger: Emit deferred tombstones This patch ensures the mutation_merger emits any deferred tombstones that it still may be holding before closing the stream. Together with the range_tombstone_list: Properly implement difference() patch set, this fixes breakage of streamed_mutation_test and row_cache_test. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170123195643.9876-1-duarte@scylladb.com>	2017-01-25 12:02:03 +00:00
Takuya ASADA	bce0fb3fa2	dist: add lspci on dependencies, since it used by dpdk-devbind.py On minimum setup environment scylla_sysconfig_setup will fail because lspci command is not installed. So install it on package installation time. Fixes #2035 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1485327435-20543-1-git-send-email-syuu@scylladb.com>	2017-01-25 10:22:57 +02:00
Avi Kivity	d2fc98270e	Merge seastar upstream * seastar 6d80c6a...397685c (4): > Merge "add label to the io_queue" from Amnon > rpc: Modify the shutdown code to wait and handle exceptions > tls.cc: Fix shutdown_input/output to conform with expected socket behaviour > core: Add counter for polls	2017-01-24 18:36:25 +02:00
Gleb Natapov	ccee01f352	storage_proxy: put datacenter name into a label instead of counter's name Having datacenter name as a label makes it possible to create Prometheus board for the counters. Message-Id: <20170124132051.GX11469@scylladb.com>	2017-01-24 15:27:34 +02:00
Duarte Nunes	54a464ae27	random_mutation_generator: Always generate range tombstones Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-01-23 19:02:23 +01:00
Duarte Nunes	a01aa91c82	range_tombstone_list: Add unit tests for difference() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-01-23 18:14:33 +01:00
Duarte Nunes	85315d1760	range_tombstone_list: Correctly implement difference() The difference method wasn't properly implemented. The version in this patch correctly computes the difference and returns a range tombstone list containing those range tombstones in "this" but absent from the other, specified range tombstone list. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-01-23 18:14:33 +01:00
Duarte Nunes	e7d20ea900	range_tombstone_list: Add apply() convenience overload Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-01-23 18:14:33 +01:00
Duarte Nunes	0847954d92	bound_view: Add copy ctor and assignment operator Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-01-23 18:14:33 +01:00
Avi Kivity	1758361640	Merge seastar upstream * seastar 38aaa4a...6d80c6a (2): > DPDK: Change the metrics registration with label support > metric: Fix the error: could not convert {...} from <brace-enclosed initializer list> to struct metric_definition_impl	2017-01-23 11:55:21 +02:00
Takuya ASADA	f6d7a76223	dist: rename dist/ubuntu to dist/debian We now support both Ubuntu and Debian in dist/ubuntu, and Ubuntu is a Debian variant, so dist/debian is a better name. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1485161896-21851-1-git-send-email-syuu@scylladb.com>	2017-01-23 10:59:52 +02:00
Avi Kivity	31c8e6885b	build: improve support for custom builds Add a counter field to RELEASE, just before the date, and fix it at zero. This allows custom package builds to override it in a way that sorts before the official packages. Example: Official release: 1.6.0-0.20160120.<githash> Custom release 1: 1.6.0-1.avi.20160121.<githash> Custom release 2: 1.6.0-2.avi.20160122.<githash> The counter (0/1/2) ensures that the build number dominates over the date when sorting. Message-Id: <20170122102814.19649-1-avi@scylladb.com>	2017-01-22 14:56:52 +02:00
Avi Kivity	1be9c232b6	Merge seastar upstream * seastar ff098c8...38aaa4a (1): > metrics: equal operator should use ==	2017-01-22 14:41:59 +02:00
Tomasz Grabiec	834df74df0	Merge batch statement optimization from github.com/avikivity/scylla/1689/v2 From Avi: In many cases, batch statements are used to mutate a single partition, or a number of partitions that is smaller than the number of statements within the batch. We can detect this case and reduce the numbers of mutations applied, and in some cases, convert a logged batch into an unlogged batch. Ref #1689.	2017-01-20 13:44:05 +01:00
Tomasz Grabiec	6c75614d19	sstables: Fix input_stream not being closed by index_reader Fixes #2022 Message-Id: <1484912679-5729-1-git-send-email-tgrabiec@scylladb.com>	2017-01-20 11:58:33 +00:00
Paweł Dziepak	19ad35610b	sstables: do not discard future returned by fast_forward_to() continuous_data_consumer::fast_forward_to() returns a future which was later ignored by data_consume_context::fast_forward_to(). With the current implementation, the future in question is always ready and that's why the problem didn't manifest itself in the form of crashes or invalid results. Message-Id: <20170120105746.7300-1-pdziepak@scylladb.com>	2017-01-20 12:22:17 +01:00
Avi Kivity	a9403877e4	cql3: add more metrics for batch statements - how many statements are in a batch - different types of batches - whether we were able to convert a logged batch to an unlogged batch	2017-01-20 13:19:00 +02:00
Avi Kivity	e3c003544d	cql3: optimize batch_statement when the same partition is mutated multiple times Batch statements are often used to insert multiple rows into the same partition. Recognize this case and merge mutations to the same partition. If the result is a single mutation, there is an additional win (already present in the code), where a logged batch can be converted into an unlogged batch. Ref #1689.	2017-01-20 13:18:56 +02:00
Benoît Canet	bcc826cc34	mutation_reader: Short circuit the read path on empty range Add a boolean to short circuit the read path on empty range hoping for some speedup. tested in read write with cs using: cl=QUORUM duration=1m -mode native cql3 -rate threads=700 -node localhost Will do some additional benchmark. Fixes #1056 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <20170118194451.16836-1-benoit@scylladb.com>	2017-01-20 10:05:40 +00:00
Avi Kivity	54b8acdd9f	dht: add hashing and comparison helpers to dht::decorated_key An std::hash specialization, and an equality comparator.	2017-01-20 11:24:14 +02:00
Avi Kivity	141048e0e5	dht: improve token hash function For a small token, we can just return it, since it already is a hash. We hash large tokens using murmur3, which is supposedly a good hash.	2017-01-20 11:24:14 +02:00
Raphael S. Carvalho	1857ba0abc	db: fix bad resource usage distribution when resharding due to refresh That's because a single shard is used to calculate generation for new sstables in upload directory, and that will result in that single shard sharing all the resources with other shards. For refresh without upload dir, it currently works fine because we reshuffle column family dir instead. flush_upload_dir() is now a free function, takes a distributed database object, and uses calculate_shard_from_sstable_generation() to decide which shard will move sstable using its own generation namespace. Fixes #2008. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <b0cccf7bbb61416ff8718bac92fdca90cc5fb9c9.1484253232.git.raphaelsc@scylladb.com>	2017-01-19 18:55:21 +02:00
Duarte Nunes	d53f96e0da	column_family: Only update stats once for shared sstables This patch ensures that when adding a shared sstable, we select only one cpu to update that column family's stats. This is important so we don't overestimate the on-disk size of sstables when resharding. This fixes only a temporary miscount of the current load, since shared sstables are eventually re-written, but fixes a permanent miscount of the total load. Refs #1592 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170119144823.31041-1-duarte@scylladb.com>	2017-01-19 17:40:35 +02:00
Tomasz Grabiec	d61002cc33	lsa: Reduce reclamation latency Currently eviction is performed until occupancy of the whole region drops below the 85% threshold. This may take a while if region had high occupancy and is large. We could improve the situation by only evicting until occupancy of the sparsest segment drops below the threshold, as is done by this change. I tested this using a c-s read workload in which the condition triggers in the cache region, with 1G per shard: lsa-timing - Reclamation cycle took 12.934 us. lsa-timing - Reclamation cycle took 47.771 us. lsa-timing - Reclamation cycle took 125.946 us. lsa-timing - Reclamation cycle took 144356 us. lsa-timing - Reclamation cycle took 655.765 us. lsa-timing - Reclamation cycle took 693.418 us. lsa-timing - Reclamation cycle took 509.869 us. lsa-timing - Reclamation cycle took 1139.15 us. The 144ms pause is when large eviction is necessary. The change improves worst case latency. Reclamation time statistics over 30 second period after cache fills up, in microseconds: Before: avg = 1524.283148 stdev = 11021.021118 min = 12.934000 max = 144356.000000 sum = 257603.852000 samples = 169 After: avg = 1317.362414 stdev = 1913.542802 min = 263.935000 max = 19244.600000 sum = 175209.201000 samples = 133 Refs #1634. Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>	2017-01-19 17:35:36 +02:00
Amos Kong	b880bdccef	dist/redhat: fix path of housekeeping.cfg scylla-housekeeping[3857]: Config file /etc/scylla.d/housekeeping.cfg is missing, terminating Housekeeping failed to execute for missing the config file, the config file should be in /etc/scylla.d/. Fixes #2020 Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <e63f2f8cb94410a6dca4e6193932f0079755ad47.1484724328.git.amos@scylladb.com>	2017-01-19 11:08:46 +02:00
Avi Kivity	3c05a81ef9	Merge seastar upstream * seastar 240b0bf...ff098c8 (15): > metrics::impl::shard(): check if reactor is initialized before using it > reactor: introduce engine_is_ready() > fix metric name > Merge "Add label support to the metric layer" from Amnon > core: Avoid memory leak when submission to syscall_work_queue fails > core: Avoid memory leak when submission to smp_message_queue fails > core: append_challenged_posix_file_impl: Make exception-safe > Merge "Log backtrace in report_failed_future" from Tomasz > install-dependencies.sh: add systemtap-sdt-dev to Ubuntu/Debian dependencies > core: add fsqual.cc/.hh to core > dpdk: Fix compile error with rte_pci.h > fstream_test: fix spurious failures due to BOOST_REQUIRE_EQUAL thread-unsafety > reactor: unregister metrics of queue on shard 0 > build: track system header changes too > Prometheus: do not rely on collectd for the hostname	2017-01-19 11:00:12 +02:00
Tomasz Grabiec	dd0fb48564	sstables: Close _file even if random_access_reader::close() reports errors close() operation is like a destructor, it cannot fail. It just reports errors, but close itself succeeds. So we should proceed with the closing even if it fails. Message-Id: <1484245886-7269-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 12:41:55 +00:00
Tomasz Grabiec	d048eec254	row_cache: Fix stats handling for uncached wide partitions Report hitting wide partition dummy as a cache miss instead of a hit. Refs #2011 Message-Id: <1484302266-3828-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 09:58:04 +00:00
Tomasz Grabiec	87f15624f4	row_cache: Add counter for wide partition mispopulations Message-Id: <1484733250-14470-1-git-send-email-tgrabiec@scylladb.com>	2017-01-18 09:57:51 +00:00
Calle Wilund	5da92db432	cell_comparator: Better fix (i.e. potentially correct) for compound/clustered desc. As Tomek pointed out, previous code, regardless of version mismatch, of generating comparator description string was not correct (as in: in sync with origin). This modifies it to look at 1.) Actual clustering size 2.) Compound-ness 3.) Dense-ness to determine whether we should generate a compound desc, and whether it should contain a trailing utf8-desc type. v2: Simplify non-dense base column addition and ensure it handles thrift non-utf8 (as per comments from tomek) Message-Id: <1484670171-18362-1-git-send-email-calle@scylladb.com>	2017-01-17 18:03:11 +01:00
Amnon Heiman	e19fa02a17	remove scollectd from headers As the metrics migration progressed, some includes of scollectd.hh were left behind. Because of the nature of the scollectd implementation, those includes bring a lot of code with them into the header files and eventually into many source files. This patch removes those includes and adds a missing include to storage_proxy.cc. That the compiler didn't complain is an indication of the problematic nature of those includes in the first place. Before this patch, a change in metrics.hh would cause 169 files to compile; after this change, 17. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1484667536-2185-1-git-send-email-amnon@scylladb.com>	2017-01-17 17:39:47 +02:00
Calle Wilund	7d2a4defcf	schema: Fix version check for comparator desc string formatting Fixes #2019 According to the Java driver and cassandra, all versions < 3 include the PK in the comparator descriptor string. This broke for us when bumping the cassandra version 2.1 -> 2.2 Message-Id: <1484657580-14411-1-git-send-email-calle@scylladb.com>	2017-01-17 14:59:47 +02:00
Tomasz Grabiec	ddfee57c97	Replace iostream include with iosfwd in headers Message-Id: <1484656119-8386-4-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
Tomasz Grabiec	50e3e3af08	db: Add missing include Message-Id: <1484656119-8386-3-git-send-email-tgrabiec@scylladb.com>	2017-01-17 14:52:44 +02:00
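
Commit 8cdffd7c57 above turns on a C++ detail that is easy to miss: a default-initialized std::chrono::nanoseconds local is not zeroed. A minimal sketch of the difference follows. The helper name sum_time_components and its parameters are hypothetical, chosen only for illustration; this is not the actual parse_time() code from time_type_impl.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (NOT ScyllaDB's parse_time()): accumulates time
// components into a single nanosecond total.
inline std::chrono::nanoseconds sum_time_components(std::int64_t hours,
                                                    std::int64_t minutes,
                                                    std::int64_t seconds) {
    // Value-initialization with {} guarantees the count starts at zero.
    // Writing `std::chrono::nanoseconds result;` instead would leave the
    // underlying integer indeterminate for a local variable, so the
    // additions below would start from garbage.
    std::chrono::nanoseconds result{};
    assert(result.count() == 0);

    // Coarser durations convert implicitly to nanoseconds without loss.
    result += std::chrono::hours(hours);
    result += std::chrono::minutes(minutes);
    result += std::chrono::seconds(seconds);
    return result;
}
```

Calling sum_time_components(1, 2, 3) yields a duration equal to std::chrono::seconds(3723). With the uninitialized variant the result would depend on whatever happened to be on the stack, which is consistent with the failures showing up only in debug mode.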


11203 Commits