scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Glauber Costa	45f3bc679e	distributed_loader: assume populate_column_families is run in shard 0 This is already the case, since main.cc calls it from shard 0 and relies on it to spread the information to the other shards. We will turn this branch - which is always taken - into an assert for the sake of future-proofing and soon add even more code that relies on this being executed in shard 0. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:27 -04:00
Glauber Costa	bb07678346	api: do not allow user to meddle with auto compaction too early We are about to use the auto compaction property during the populate/reshard process. If the user toggles it, the database can be left in a bad state. There should be no reason why a user would want to set that up this early. So we'll disallow it. To do that property, it is better if the check of whether or not the storage service is ready to accomodate this request is local to the storage service itself. We then move the logic of set_tables_autocompaction from api to the storage service. The API layer now merely translates the table names and pass it along. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-18 09:00:25 -04:00
Glauber Costa	1c70a7c54e	upload: use custom error handler for upload directory SSTables created for the upload directory should be using its custom error handler. There is one user of the custom error handler in tree, which is the current upload directory function. As we will use a free function instead of a lambda in our implementation we also use the opportunity to fix it for consistency. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-16 19:42:19 -04:00
Glauber Costa	c188aef088	sstable_directory: fix debug message I just noticed while working on the reshape patches that there is an extra format bracket in two of the debug message. As they are debug I've seen them less often than the others and that slipped. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2020-06-16 19:42:19 -04:00
Yaron Kaikov	ac7237f991	dbuild: Add an option to run with 'docker' or 'podman' This adds support for configuring whether to run dbuild with 'docker' or 'podman' via a new environment variable, DBUILD_TOOL. While at it, check if 'podman' exists, and prefer that by default as the tool for dbuild.	2020-06-16 15:18:46 +03:00
Gleb Natapov	7ca937778d	cql transport: do not log broken pipe error when a client closes its side of a connection abruptly Fixes #5661 Message-Id: <20200615075958.GL335449@scylladb.com>	2020-06-16 13:59:12 +02:00
Nadav Har'El	41a049d906	README: better explanation of dependencies and build In this patch I rewrote the explanations in both README.md and HACKING.md about Scylla's dependencies, and about dbuild. README.md used to mention only dbuild. It now explains better (I think) why dbuild is needed in the first place, and that the alternative is explained in HACKING.md. HACKING.md used to explain only install-dependencies.sh - and now explains why it is needed, what install-dependencies.sh and that it ONLY works on very recent distributions (e.g., Fedora older than 32 are not supported), and now also mentions the alternative - dbuild. Mentions of incorrect requirements (like "gcc > 8.1") were fixed or dropped. Mention of the archaic 'scripts/scylla_current_repo' script, which we used to need to install additional packages on non-Fedora systems, was dropped. The script itself is also removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200616100253.830139-1-nyh@scylladb.com>	2020-06-16 13:26:04 +02:00
Avi Kivity	bd794629f9	range: rename range template family to interval nonwrapping_range<T> and related templates represent mathematical intervals, and are different from C++ ranges. This causes confusion, especially when C++ ranges and the range templates are used together. As the first step to disentable this, introduce a new interval.hh header with the contents of the old range.hh header, renaming as follows: range_bound -> interval_bound nonwrapping_range -> nonwrapping_interval wrapping_range -> wrapping_interval Range -> Interval (concepts) The range alias, which previously aliased wrapping_range, did not get renamed - instead the interval alias now aliases nonwrapping_interval, which is the natural interval type. I plan to follow up making interval the template, and nonwrapping_interval the alias (or perhaps even remove it). To avoid churn, a new range.hh header is provided with the old names as aliases (range, nonwrapping_range, wrapping_range, range_bound, and Range) with the same meaning as their former selves. Tests: unit (dev)	2020-06-16 13:36:20 +03:00
Piotr Sarna	3bcc2e8f09	Merge 'hinted handoff: improve segment replay logic' from PiotrD This series contains two improvements to hint file replay logic in hints manager: - During replay of a hint file, keeping track of the first hint that fails to be sent is now done via a simple std::optional variable instead of an unordered_set. This slightly reduces complexity of next replay position calculation. - A corner case is handled: if reading commitlog fails, but there won't be an error related to sending hints, starting position wouldn't be updated. This could cause us to replay more hints than necessary. Tests: - unit(dev) - dtest(hintedhandoff_additional_test, dev) * piodul-hints-manager-handle-commitlog-failure-in-replay-position-calculation: hinted handoff: use bool instead of send_state_set hinted handoff: update replay position on commitlog failure hinted handoff: remove rps_set, use first_failed_rp instead	2020-06-16 12:24:55 +02:00
Avi Kivity	6ba7b8f3f5	Update seastar submodule * seastar 81242ccc3f...8f0858cfd7 (18): > Merge 'future, future-utils: stop returning a variadic future from when_all_succeed' > file: introduce layered_file_impl, a helper for layered files > net: packet: mark move assignment operator as noexcept > core: weak_ptr, weakly_referencable: implement empty default constructor > circular_buffer: Fix build with gcc 11 (avoid template parameters in d'tor declaration) > test: weak_ptr_test: fix static asserts about nothrow constructibility > coroutines: Fix clang build > cmake: Delete SEASTAR_COROUTINES_TS > Merge "future-util: Mark a few more functions as noexcept" from Rafael > tests: add a perf test to measure the fair_queue performance > Merge "iostream: make iostream stack nothrow move constructible" from Benny > future: Move most of rethrow_with_nested out of line. > future_test: Add test for nested exceptions in finally > core: Add noexcept to unaligned members functions > Merge "core: make weak_ptr and checked_ptr default and move nothrow constructible" from Benny > core: file: Fix typo in a comment > byteorder: Mark functions as noexcept > future: replace CanInvoke concepts with std::invocable	2020-06-16 13:19:36 +03:00
Piotr Sarna	e59d41dad6	alternator: use plain function pointer instead of std::function Since all function handlers are plain functions without any state, there's no need for wrapping them with a 32-byte std::function when a plain function pointer would suffice. Reported-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <913c1de7d02c252b40dc0c545989ec83fe74e5a9.1592291413.git.sarna@scylladb.com>	2020-06-16 12:08:21 +03:00
Raphael S. Carvalho	238ba899c0	compaction_manager: use double for backlog everywhere Avi says: "The backlog is a large number that changes slowly, so float might not have enough resolution to track small changes. For example, if the backlog is 800GB and changes less than 100kB, then we won't see a change (float resolution is 2^23 ~ 1:8,000,000). This is outside the normal range of values (usually the backlog changes a lot more than 100kB per 15-second period), so it will work, but better to be more careful." Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200615150621.17543-1-raphaelsc@scylladb.com>	2020-06-16 12:05:05 +03:00
Piotr Sarna	45bf039357	alternator: use has_function instead of try-catch With the new interface available, the try-catch idiom can be removed, thus resolving a TODO. Tests: unit(dev) Message-Id: <788a29f8f9d7bcf952b28a6148670dbadb97a619.1592233511.git.sarna@scylladb.com>	2020-06-15 23:55:20 +03:00
Piotr Sarna	911dee5417	schema: add has_column utility function With this simple helper function, a code snippet in alternator can be transformed from try-catch to a simple condition. Message-Id: <553debf4e91c0511566e53e2c8a5e8e6ee6552e2.1592233511.git.sarna@scylladb.com>	2020-06-15 23:55:06 +03:00
Piotr Sarna	b1684cf2e1	alternator: move function handlers to a lookup map Instead of a long chain of `if` statements, handlers are now created in a static map. Fixes a TODO in the code. Tests: unit(dev) Message-Id: <0ea577a44dd56859da170fe82c16c8f810f9d695.1592232448.git.sarna@scylladb.com>	2020-06-15 23:44:45 +03:00
Piotr Sarna	e76fba6f86	alternator: remove outdated TODO for adding timeouts The TODO is already fixed, not to mention that it had an incorrect ordinal number (: Message-Id: <006dc3061e0f30641c2e63ff471686f4c2e82829.1592230155.git.sarna@scylladb.com>	2020-06-15 23:04:42 +03:00
Tomasz Grabiec	1c5db178dd	Merge "logalloc: Get rid of segments migration" from Pavel But not compaction. When reclaiming segments to seastar non-empty segments are copied as-is to some other place. Instead of doing this reclaimer can copy only allocated objects and leave the freed holes behing, i.e. -- do the regular compaction. This would be the same or better from the timing perspective, and will help to avoid yet another compaction pass over the same set of objects in the future. Current migration code checks for the free segments reserve to be above minimum to proceed with migration, so does the code after this patch, thus the segment compaction is called with non-empty free segments set and thus it's guaranteed not to fail the new segment allocation (if it will be required at all). Plus some bikeshedding patches for the run-up. tests: unit(dev) * https://github.com/xemul/scylla/tree/br-logalloc-compact-on-reclaim-2: logalloc: Compact segments on reclaim instead of migration logallog: Introduce RAII allocation lock logalloc: Shuffle code around region::impl::compact logalloc: Do not lock reclaimer twice logalloc: Do not calculate object size twice logalloc: Do not convert obj_desc to migrator back and forth	2020-06-15 16:28:16 +02:00
Glauber Costa	093328741d	compaction: test that sstable set is not null in update_pending_ranges SSTable_set is now an optional, and if we don't want to expire data it will be empty. We need to check that it is not empty before dereferencing it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200610170647.142817-1-glauber@scylladb.com>	2020-06-15 15:43:08 +02:00
Tomasz Grabiec	e81fc1f095	row_cache: Fix undefined behavior on key linearization This is relevant only when using partition or clustering keys which have a representation in memory which is larger than 12.8 KB (10% of LSA segment size). There are several places in code (cache, background garbage collection) which may need to linearize keys because of performing key comparison, but it's not done safely: 1) the code does not run with the LSA region locked, so pointers may get invalidated on linearization if it needs to reclaim memory. This is fixed by running the code inside an allocating section. 2) LSA region is locked, but the scope of with_linearized_managed_bytes() encloses the allocating section. If allocating section needs to reclaim, linearization context will contain invalidated pointers. The fix is to reorder the scopes so that linearization context lives within an allocating section. Example of 1 can be found in range_populating_reader::handle_end_of_stream() where it performs a lookup: auto prev = std::prev(it); if (prev->key().equal(_cache._schema, _last_key->_key)) { it->set_continuous(true); but handle_end_of_stream() is not invoked under allocating section. Example of 2 can be found in mutation_cleaner_impl::merge_some() where it does: return with_linearized_managed_bytes([&] { ... return _worker_state->alloc_section(region, [&] { Fixes #6637. Refs #6108. Tests: - unit (all) Message-Id: <1592218544-9435-1-git-send-email-tgrabiec@scylladb.com>	2020-06-15 16:03:33 +03:00
Nadav Har'El	86a4dfcd29	merge: api: Command to check and repair cdc streams Merged pull request https://github.com/scylladb/scylla/pull/6551 from Juliusz Stasiewicz: The command regenerates streams when: generations corresponding to a gossiped timestamp cannot be fetched from system_distributed table, or when generation token ranges do not align with token metadata. In such case the streams are regenerated and new timestamp is gossiped around. The returned JSON is always empty, regardless of whether streams needed regeneration or not. Fixes #6498 Accompanied by: scylladb/scylla-jmx#109, scylladb/scylla-tools-java#172	2020-06-15 14:17:35 +03:00
Takuya ASADA	ecc83e83e5	scylla_cpuscaling_setup: move the unit file to /etc/systemd Since scylla-cpupower.service isn't installed by .rpm package, but created in the setup script, it's better to not use /usr/lib directory, use /etc. We already doing same way for scylla-server.service.d/.conf, .mount, and *.swap created by setup scripts.	2020-06-15 11:36:20 +03:00
Asias He	61e4387811	repair: Relax node selection in decommission for non network topology strategy In decommission operation, current code requires a node in local dc to sync data with. This requirement is too strong for a non network topology strategy. For example, consider: n1 dc1 n2 dc1 n3 dc2 n2 runs decommission operation. For a keyspace with simple strategy and RF = 2, it is possible n3 is the new owner but n3 is not in the same dc as n2. To fix, perform the dc check only for the network topology strategy. Fixes #6564	2020-06-15 11:26:02 +03:00
Avi Kivity	d17b05e911	Merge 'Adding Optimized pseudo floating point estimated histogram' from Amnon " This series Adds a pseudo-floating-point histogram implementation. The histogram is used for time_estimated_histogram a histogram for latency tracking and then used in storage_proxy as a more efficient with a higher resolution histogram. Follow up series would use the new histogram in other places in the system and will add an implementation that supports lower values. Fixes #5815 Fixes #4746 " * amnonh-quicker_estimated_histogram: storage_proxy: use time_estimated_histogram for latencies test/boost/estimated_histogram_test utils/histogram_metrics_helper Adding histogram converter utils/estimated_histogram: Adding approx_exponential_histogram	2020-06-15 10:19:36 +03:00
Avi Kivity	493d16e800	build: fix --enable-dpdk/--disable-dpdk configure switch `5ceb20c439` switched --enable-dpdk to a tristate switch, but forgot that add_tristate() prepends --enable and --disable itself; so now the switch looks like --enable-enable-dpdk and --disable-enable-dpdk. Fix by removing the "enable-" prefix.	2020-06-15 09:37:45 +03:00
Amnon Heiman	6e1f042b93	storage_proxy: use time_estimated_histogram for latencies This patch change storage_proxy to use time_estimated_histogram. Besides the type, it changes how values are inserted and how the histogram is used by the API. An example how a metric looks like after the change: scylla_storage_proxy_coordinator_write_latency_bucket{le="640.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="896.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1024.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1280.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1792.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2048.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2560.000000",scheduling_group_name="statement",shard="0",type="histogram"} 3 scylla_storage_proxy_coordinator_write_latency_bucket{le="3072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="3584.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="4096.000000",scheduling_group_name="statement",shard="0",type="histogram"} 7 scylla_storage_proxy_coordinator_write_latency_bucket{le="5120.000000",scheduling_group_name="statement",shard="0",type="histogram"} 8 scylla_storage_proxy_coordinator_write_latency_bucket{le="6144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9 scylla_storage_proxy_coordinator_write_latency_bucket{le="7168.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="8192.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="10240.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19 scylla_storage_proxy_coordinator_write_latency_bucket{le="12288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 49 scylla_storage_proxy_coordinator_write_latency_bucket{le="14336.000000",scheduling_group_name="statement",shard="0",type="histogram"} 132 scylla_storage_proxy_coordinator_write_latency_bucket{le="16384.000000",scheduling_group_name="statement",shard="0",type="histogram"} 294 scylla_storage_proxy_coordinator_write_latency_bucket{le="20480.000000",scheduling_group_name="statement",shard="0",type="histogram"} 1035 scylla_storage_proxy_coordinator_write_latency_bucket{le="24576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2790 scylla_storage_proxy_coordinator_write_latency_bucket{le="28672.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5788 scylla_storage_proxy_coordinator_write_latency_bucket{le="32768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9815 scylla_storage_proxy_coordinator_write_latency_bucket{le="40960.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19821 scylla_storage_proxy_coordinator_write_latency_bucket{le="49152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 30063 scylla_storage_proxy_coordinator_write_latency_bucket{le="57344.000000",scheduling_group_name="statement",shard="0",type="histogram"} 38642 scylla_storage_proxy_coordinator_write_latency_bucket{le="65536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 44987 scylla_storage_proxy_coordinator_write_latency_bucket{le="81920.000000",scheduling_group_name="statement",shard="0",type="histogram"} 51821 scylla_storage_proxy_coordinator_write_latency_bucket{le="98304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 54197 scylla_storage_proxy_coordinator_write_latency_bucket{le="114688.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55054 scylla_storage_proxy_coordinator_write_latency_bucket{le="131072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55363 scylla_storage_proxy_coordinator_write_latency_bucket{le="163840.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55520 scylla_storage_proxy_coordinator_write_latency_bucket{le="196608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55545 scylla_storage_proxy_coordinator_write_latency_bucket{le="229376.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="262144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="327680.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="393216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="458752.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="524288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="655360.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="786432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="917504.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1048576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1310720.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1572864.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1835008.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2097152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2621440.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3145728.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3670016.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="4194304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="5242880.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="6291456.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="7340032.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="8388608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="10485760.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="12582912.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="14680064.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="16777216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="20971520.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="25165824.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="29360128.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="33554432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="+Inf",scheduling_group_name="statement",shard="0",type="histogram"} 55549 Fixes #4746 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:23:02 +03:00
Amnon Heiman	1cbc2e3d3e	test/boost/estimated_histogram_test This patch adds basic testing for the approx_exponential_histogram implementations. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:57 +03:00
Amnon Heiman	f30f926703	utils/histogram_metrics_helper Adding histogram converter This patch adds a helper converter function to convert from a approx_exponential_histogram histogram to a seastar::metrics::histogram Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:49 +03:00
Amnon Heiman	3319756f36	utils/estimated_histogram: Adding approx_exponential_histogram This patch adds an efficient histogram implementation. The implementation chooses efficiency over flexibility. That is why templates are used. How the approx_exponential_histogram pseudo floating point histogram works: It split the range [MIN, MAX] into log2(MAX/MIN) ranges it then split each of that ranges linearly according to a given resolution. For example, using resolution of 4, would be similar to using an exponentially growing histogram with a coefficient of 1.2. All values are uint64. To prevent handling of corner cases, it is not allowed to set the MIN to be lower than the resolution. The approx_exponential_histogram will probably not be used directly, the first used is by time_estimated_histogram. A histogram for durations. It should be compared to the estimated_histogram. Performance comparison: Comparison was done by inserting 2^20 values into time_estimated_histogram and estimated_histogram. In debug mode on a local machine insert operation took an average of 26.0 nanoseconds vs 342.2 nanoseconds. In release mode insert operation took an average of 1.90 vs 8.28 nanoseconds Fixes #5815 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:43 +03:00
Piotr Sarna	23c63ec19d	Merge 'alternator: implement FilterExpression' from Nadav The main goal of this series is to implement FilterExpression - the newer syntax for filtering results of Query and Scan requests. This feature itself is just one simple patch - it just needs to have the already-existing filtering code call the already-existing expression evaluation code. However, before we can do this, we need a patch to refactor the expression-evaluation interface (this patch also fixes pre-existing bugs). Then we need three additional patches to fix pre- existing bugs in the various corner cases of expressions (this bugs already existed in ConditionExpression but now became visible in tests for FilerExpression). Finally, in the end of the series, we also do a bit of code cleanup. After this series, the FilterExpression feature is complete, and all tests for this feature pass. Tests: unit(dev) * 'alternator-filterexpression' of git://github.com/nyh/scylla: alternator: avoid unnecessary conversion to string alternator: move some code out of executor.cc alternator: implement FilterExpression alternator: improve error path of attribute_type() function alternator: fix begins_with() error path alternator: fix corner case of contains() function in conditions alternator: refactor resolving of references in expressions	2020-06-14 19:42:46 +02:00
Avi Kivity	4220ed849b	Merge "Use abseil's hash map in a couple places" from Rafael " This is part of the work for replacing global sstring variables with constexpr std::string_view ones. To have std::string_view values we have to convert a few APIs to take std::string_view instead of sstring references. The API conversions are complicated by the fact that std::unordered_map doesn't support heterogeneous lookup, so we need another hash map. The one provided by abseil seems like a natural choice since it has an API that looks like what is being proposed for c++ (http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1690r0.html) but is also much faster. A nice side effect is that this series is a 0.46% win in perf_simple_query --duration 16 --smp 1 -m4G Over 500 runs with randomized section layout and environment on each run. " * 'espindola/absl-v10' of https://github.com/espindola/scylla: database: Use a flat_hash_map for _ks_cf_to_uuid database: Use flat_hash_map for _keyspaces Add absl wrapper headers build: Link with abseil cofigure: Don't overwrite seastar_cflags Add abseil as a submodule	2020-06-14 18:26:59 +03:00
Rafael Ávila de Espíndola	336d541f58	database: Use a flat_hash_map for _ks_cf_to_uuid Given that the key is a std::pair, we have to explicitly mark the hash and eq types as transparent for heterogeneous lookup to work. With that, pass std::string_view to a few functions that just check if a value is in the map. This increases the .text section by 11 KiB (0.03%). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	6da9eef25f	database: Use flat_hash_map for _keyspaces This changes the hash map used for _keyspaces. Using a flat_hash_map allows using std::string_view in has_keyspace thanks to the heterogeneous lookup support. This add 200 KiB to .text, since this is the first use of absl and brings in files from the .a. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	dd0d4ae217	Add absl wrapper headers Using these instead of using the absl headers directly adds support for heterogeneous lookup with sstring as key. The is no gain from having the hash function inline, so this implements it in a .cc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	7d1f6725dd	build: Link with abseil It is a pity we have to list so many libraries, but abseil doesn't provide a .pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	2ad09aefb6	cofigure: Don't overwrite seastar_cflags The variable seastar_cflags was being used for flags passed to seastar and for flags extracted from the seastar.pc file. This introduces a new variable for the flags extracted from the seastar.pc file. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Rafael Ávila de Espíndola	383a9c6da9	Add abseil as a submodule This adds the https://abseil.io library as a submodule. The patch series that follows needs a hash table that supports heterogeneous lookup, and abseil has a really good hash table that supports that (https://abseil.io/blog/20180927-swisstables). The library is still not available in Fedora, but it is fairly easy to use it directly from a submodule. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:37 -07:00
Avi Kivity	08313106ce	Merge 'Repair use table id instead of table name' from Asias " Use table_id instead of table_name in row level repair to find a table. It guarantees we repair the same table even if a table is dropped and a new table is created with the same name. Refs: #5942 " * asias-repair_use_table_id_instead_of_table_name: repair: Do not pass table names to repair_info repair: Add table_id to row_level_repair repair: Use table id to find a table in get_sharder_for_tables repair: Add table_ids to repair_info repair: Make func in tracker::run run inside a thread	2020-06-14 14:58:46 +03:00
Raphael S. Carvalho	9983fa8766	compaction_manager: Export backlog metric This backlog metric holds the sum of backlog for all the tables in the system. This is very useful for understanding the behavior of the backlog trackers. That's how we managed to fix most of backlog bugs like #6054, #6021, etc. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20200612194908.39909-1-raphaelsc@scylladb.com>	2020-06-14 14:07:53 +03:00
Avi Kivity	76d082c2b2	Merge "Decouple client services from storage_service" from Pavel E " The cql_server and thrift are "owned" by storage_service for the sake of managing those, i.e. starting and stopping. Since other services (still) need the storage_service this creates dependencies loops. This set makes the client services independent from the storage service. As a consequence of it the auth service is also removed from storage_service and put standalone. This, in turn, sets some tests free from the need to start and stop auth and makes one step towards NOT join_cluster()-ing in unit tests. Also the set fixes few wierd races on scylla start and stop that can trigger local_is_initialized() asserts, and one case of unclear aborted shutdown when client services remain running till the scylla process exit. Yet another benefit is localization of "isolating" functionality that sits deeper in storage_service than it should. One thing that's not completely clean after it is the need for cql server to continue referencing the service_memory_limiter semaphore from the storage_service, but this will go away with one of the next sets. tests: unit(debug), manual start-stop, nodetool check of cql/thrift start/stop " * 'br-split-transport-1' of https://github.com/xemul/scylla: storage_service: Isolate isolator auth: Move away from storage_service auth: Move start-stop code into main main: Don't forget to stop cql/thrift when start is aborted thrift_controller: Switch on standalone thrift_controller: Pass one through management API thrift_controller: Move the code into thrift/ thrift_controller: Introduce own lock for management thrift: Wrap start/stop/is_running code into a class cql_controller: Switch on standalone cql_controller: Pass one through management API cql_controller: Move the code into transport/ cql_controller: Introduce own lock for management cql: Wrap start/stop/is_running code into a class api: Tune reg/unreg of client services control endpoints	2020-06-14 13:49:23 +03:00
Takuya ASADA	863293576c	scylla_setup: add swapfile setup Adding swapfile setup on scylla_setup. Fixes #6539	2020-06-14 13:18:51 +03:00
Amnon Heiman	06510a4752	service/storage_service.cc: Make effective_ownership preemptable A lot is going on when calculating effective ownership. For each node in the cluster, we need to go over all the ranges belong to that node and see if that node is the owner or not. This patch uses futurized loops with do_for_each so it would preempt if needed. The patch replaces the current for-loops with do_for_each and do_with but keeps the logic. Fixes #6380	2020-06-14 12:56:07 +03:00
Nadav Har'El	493d7e6716	alternator: avoid unnecessary conversion to string In a couple of places, where we already have a std::string_view, there is no need to convert to to a std::string (which requires allocation). One cool observation (by Piotr Sarna) is that map over std::string_view is fine, when the strings in the map are always string constants. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	8c026b9f10	alternator: move some code out of executor.cc The source file alternator/executor.cc has grown too much, reaching almost 4,000 lines. In this patch I move about 400 lines out of executor.cc: 1. Some functions related to serialization of sets and lists were moved to serialization.cc, 2. Functions related to evaluating parsed expressions were moved to expressions.cc. The header file expressions_eval.hh was also removed - the calculate_value() functions now live in expressions.cc, so we can just define them in expressions.hh, no need for a separate header files. This patch just moves code around. It doesn't make any functional changes. Refs #5783. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	0b9f25ab50	alternator: implement FilterExpression This patch provides a complete implementation for the FilterExpression parameter - the newer syntax for filtering the results of the Query or Scan operations. The implementation is pretty straightforward - we already added earlier a result-filtering framework to Alternator, and used it for the older filtering syntax - QuryFilter and ScanFilter. All we had to do now was to run the FilterExpression (which has the same syntax as a ConditionExpression) on each individual items. The previous cleanup patches were important to reduce the friction of running these expressions on the items. After the previous patches fixing small esoteric bugs in a few expression functions, with this patch all the tests in test_filter_expression.py now pass, and so do the two FilterExpression tests in test_query.py and test_scan.py. As far as I know (and of course minus any bugs we'll discover later), this marks the FilterExpression feature complete. Fixes #5038. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:26 +03:00
Nadav Har'El	f87259a762	alternator: improve error path of attribute_type() function The attribute_type() function, which can be used in expressions like ConditionExpression and FilterExpression, is supposed to generate an error if its second parameter is not one of the known types. What we did until now was to just report a failed check in this case. We already had a reproducing test with FilterExpression, but in this patch we also add a test with ConditionExpression - which fails before this patch and passes afterwards (and of course, passes with DynamoDB). Fixes #6641. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:16:20 +03:00
Nadav Har'El	11d86dfb06	alternator: fix begins_with() error path The begins_with() function should report an error if a constant is passed to it which isn't one of the supported types - string or bytes (e.g., a number). The code we had to check this had wrong logic, though. If the item attribute was also a number, we silently returned false, and didn't go on to detect that the second parameter - a constant - was a number too and should generate an error - not be silent. Fixed and added a reproducing test case and another test to validate my understanding of the type of parameters that begins_with() accepts. Fixes #6640. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:13:23 +03:00
Nadav Har'El	f79a4e0e78	alternator: fix corner case of contains() function in conditions It turns out that the contains() functions in the new syntax of conditions (ConditionExpression, FilterExpression) is not identical to the CONTAINS operator in the old-syntax conditions (Expected). In the new syntax, one can check whether any constant object is contained in a list. In the old syntax, the constant object must be of specific types. So we need to move the testing out of the check_CONTAINS() functions that both implementations used, and into just the implementation of the old syntax (in conditions.cc). This bug broke one of the FilterExpression tests, but this patch also adds new tests for the different behaviour of ConditionExpression and Expected - tests which also reproduce this issue and verify its fix. Fixes #6639. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 12:02:14 +03:00
Nadav Har'El	13ef31f38b	alternator: refactor resolving of references in expressions In the DynamoDB API, expressions (e.g., ConditionExpression and many more) may contain references to column names ("#name") or to values (":val") given in a separate part of the request - ExpressionAttributeNames and ExpressionAttributeValues respectively. Before this patch, we resolved these references as part of the expression's evaluation. This approach had two downsides: 1. It often misdiagnosed (both false negatives and false positives) cases of unused names and values in expressions. We already had two xfailing tests with examples - which pass after this patch. This patch also adds two additional tests, which failed before this patch and pass with it. 2. In one of the following patches we will add support for FilterExpression, where the same expression is used repeatedly on many items. It is a waste (as well as makes the code uglier) to resolve the same references again and again each time the expression is evaluated. We should be able to do it just once. So this patch introduces an intermediate step between parsing and evaluating an expression - "resolving" the expression. The new resolve_() functions modify the already parsed expression, replacing references to attribute names and constant values by the actual names and values taken from the request. The resolve_() functions also keep track which references were used, making it very easy to check (as DynamoDB does) if there are any unused names or values, before starting the evaluation. The interface of evaluate() functions become much simpler - they no longer need to know the original request (which was previously needed for ExpressionAttributeNames/Values), the table's schema (which was previously needed only for some error checking), keep track of which references were used. This simplification is helpful for using the expressions in contexts where these things (request and schema) are no longer conveniently available, namely in FilterExpression. A small side-benefit of this patch is that it moves a bit of code, which handled resolving of references in expressions, from executor.cc to expressions.cc. This is just the first step in a bigger effort to reduce the size of executor.cc by moving code to smaller source files. There is no attempt in this patch to move as much code as we can. We will move more code in a separate patch in this series. Fixes #6572. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-06-14 11:57:13 +03:00
Glauber Costa	b0a0c207c3	twcs: move implementations to its own file LCS and SCTS already have their own files, reducing the clutter in compaction_strategy.cc. Do the same for TWCS. I am doing this in preparation to add more functions. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20200611230906.409023-6-glauber@scylladb.com>	2020-06-14 11:50:08 +03:00
Pavel Emelyanov	514a1580da	storage_service: Isolate isolator There is a code that isolates a node on disk error. After all the previous changes this code can be collected in one place (better to move it away from storage_service at all, but still). This simplifies the stop_transport(): now it can avoid rescheduling itself on shard 0 for the 2nd time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-12 22:14:33 +03:00

1 2 3 4 5 ...

22435 Commits