As part of this change, the container for partition index pages was
changed from utils::loading_shared_values to intrusive_btree. This is
to avoid reactor stalls which the former induces with a large number
of elements (pages) due to its use of a hashtable under the hood,
which reallocates contiguous storage.
Simplifies managing non-owning references to LSA-managed objects. The
lsa::weak_ptr is a smart pointer which is not invalidated by LSA and
can be used safely in any allocator context. Dereferencing it always
yields a valid reference.
This can be used as a building block for implementing cursors into
LSA-based caches.
Simple usage example:
// LSA-managed
struct X : public lsa::weakly_referencable<X> {
    int value;
};

lsa::weak_ptr<X> x_ptr = with_allocator(region(), [] {
    X* x = current_allocator().construct<X>();
    return x->weak_from_this();
});

std::cout << x_ptr->value;
Will be needed later for reading a page view which cannot use
make_tracked_temporary_buffer(). Standardize on get_page_units(),
converting existing code to wrap the units in a deleter.
In preparation for caching index objects, manage them under LSA.
Implementation notes:
key_view was changed to be a view on managed_bytes_view instead of
bytes, so it now can be fragmented. Old users of key_view now have to
linearize it. Actual linearization should be rare since partition
keys are typically small.
The index parser no longer constructs the index_entry directly; it
produces value objects which live in the standard allocator space:
class parsed_promoted_index_entry;
class parsed_partition_index_entry;
This change was needed to support consumers which don't populate the
partition index cache and don't use LSA,
e.g. sstable::generate_summary(). It's now the consumer's responsibility
to allocate index_entry out of parsed_partition_index_entry.
Will be needed by the index reader to ensure that the destructor doesn't
invoke the allocator, so that everything is destroyed in the desired
allocation context before the object itself is destroyed.
index_entry will be an LSA-managed object. Those have to be accessed
with care, with the LSA region locked.
This patch hides most of direct index_entry accesses inside the
index_reader so that users are safe.
The file object which is currently stored there has per-request
tracing wrappers (permit, trace_state) attached to it. It doesn't make
sense once the entry is cached and shared. Annotate when the cursor is
created instead.
Index reads and promoted index reads are both using the same
cached_file now, so there's no need to pass the buffers between the
index parser and promoted index reader.
Makes the promoted_index structure easier to move to LSA.
After this patch, there is a single index file page cache per
sstable, shared by index readers. The cache survives reads,
which reduces the amount of I/O on subsequent reads.
As part of this, cached_file needed to be adjusted in the following ways.
The page cache may occupy a significant portion of memory. Keeping the
pages in the standard allocator could cause memory fragmentation
problems. To avoid them, cached_file is changed to keep buffers in LSA
using the lsa_buffer allocation method.
When a page is needed by the seastar I/O layer, it needs to be copied
to a temporary_buffer which is stable, so must be allocated in the
standard allocator space. We copy the page on-demand. Concurrent
requests for the same page will share the temporary_buffer. When a page
is not in use, it lives only in the LSA space.
In subsequent patches, cached_file::stream will be adjusted to also support
access via cached_page::ptr_type directly, to avoid materializing a
temporary_buffer.
While a page is used, it is not linked in the LRU so that it is not
freed. This ensures that the storage which is actively consumed
remains stable, either via temporary_buffer (kept alive by its
deleter), or by cached_page::ptr_type directly.
lsa_buffer is similar in spirit to std::unique_ptr<char[]>. It owns
buffers allocated inside LSA segments. It uses an alternative
allocation method which differs from regular LSA allocations in the
following ways:
1) LSA segments only hold buffers, they don't hold metadata. They
also don't mix with standard allocations. So a 128K segment can
hold 32 4K buffers.
2) objects' lifetime is managed by lsa_buffer, an owning smart
pointer, which is automatically updated when buffers are migrated
to another segment. This makes LSA allocations easier to use and
off-loads metadata management to the client (which can keep the
lsa_buffer wherever it wants).
The metadata is kept inside segment_descriptor, in a vector. Each
allocated buffer will have an entangled object there (8 bytes), which
is paired with an entangled object inside lsa_buffer.
The reason to have an alternative allocation method is to efficiently
pack buffers inside LSA segments.
make_sstable_containing() already calls open_data(), so does
load(). This will trigger the assertion failure added in a later patch:
assert(!_cached_index_file);
There is no need to call load() here.
It's an adaptor between seastar::file and cached_file. It gives a
seastar::file which will serve reads using a given cached_file as a
read-through cache.
We want buffers to be accounted only when they are used outside
cached_file. Cached pages should not be accounted because, after
subsequent commits in this series, they will stay around for longer
than the read.
In preparation for tracking different kinds of objects, not just
rows_entry, in the LRU, switch to the LRU implementation from
utils/lru.hh which can hold arbitrary element type.
The LRU can link objects of different types, which is achieved by
having a virtual base class called "evictable" from which the linked
objects should inherit. When an object is removed from the LRU,
evictable::on_evicted() is called.
The container is non-owning.
We never want to listen on port 0, even if configured so.
When the listen port is set to 0, the OS will choose the
port randomly, which makes it useless for communicating
with other nodes in the cluster, since we don't support that.
Also, it causes the listen_ports_conf_test internode_ssl_test
to fail since it expects to disable listening on storage_port
or ssl_storage_port when set to 0, as seen in
https://github.com/scylladb/scylla-dtest/issues/2174.
Fixes #8957
Test: unit(dev)
DTest: listen_ports_conf_test (modified)
Closes #8956
* github.com:scylladb/scylla:
messaging_service: do_start_listen: improve info log accuracy
messaging_service: never listen on port 0
The replication factor passed to NetworkTopologyStrategy (which we call
by the confusing name "auto expand") may or may not be used (see
explanation why in #8881), but regardless, we should validate that it's
a legal number and not some non-numeric junk, and we should report the error.
Before this patch, the two commands
CREATE KEYSPACE name WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
ALTER KEYSPACE name WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 'foo' }
succeed despite the invalid replication factor "foo". After this patch,
the second command fails.
The problem fixed here is reproduced by the existing test
test_keyspace.py::test_alter_keyspace_invalid when switching it to use
NetworkTopologyStrategy, as suggested by issue #8638.
Fixes#8880
Refs #8881
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210620100442.194610-1-nyh@scylladb.com>
Make sure to log the info message when we actually
start listening.
Also, print a log message when listening on the
broadcast address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We never want to listen on port 0, even if configured so.
When the listen port is set to 0, the OS will choose the
port randomly, which makes it useless for communicating
with other nodes in the cluster, since we don't support that.
Also, it causes the listen_ports_conf_test internode_ssl_test
to fail since it expects to disable listening on storage_port
or ssl_storage_port when set to 0, as seen in
https://github.com/scylladb/scylla-dtest/issues/2174.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Scylla doesn't allow unencrypted connections over encrypted CQL ports
(Cassandra does allow this, by setting "optional: true", but it's not
secure and not recommended). Here we add a test that, indeed, we can't
connect to an SSL port using an unencrypted connection.
The test passes on Scylla, and also on Cassandra (run it on Cassandra
with "test/cql-pytest/run-cassandra --ssl" - for which we added support
in a recent patch).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210629121514.541042-1-nyh@scylladb.com>
Fixtures in conftest.py (e.g., the test_keyspace fixture) can be shared by
all tests in all source files, so they are marked with the "session"
scope: All the tests in the testing session may share the same instance.
This is fine.
Some of the test files have additional fixtures for creating special tables
needed only in those files. Those were unnecessarily marked with
"session" scope as well. This means that these temporary tables are
only deleted at the very end of the test suite, even though they could be
deleted at the end of the test file which needed them - other test
source files don't have access to them anyway. This is exactly what the
"module" fixture scope is for, so this patch changes all the fixtures that
are private to one test file to use the "module" scope.
After this patch, the teardown of the last test in the suite goes down
from 0.26 seconds to just 0.06 seconds.
Another benefit is that the peak disk usage of the test suite is
lower, because some of the temporary tables are deleted sooner.
This patch does not change any test functionality, and also does not
make any test faster - it just changes the order of the fixture
teardowns.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #8932
Previously, the disk block alignment of segments was hardcoded (due to
really old code). Now we use the value as declared in the actual file
opened. If we are using a previously written file (i.e. o_dsync), we
can even use the sometimes smaller "read" alignment.
Also allow config to completely override this with a disk alignment
config option (not exposed to global config yet, but can be).
v2:
* Use overwrite alignment if doing only overwrite
* Ensure to adjust actual alignment if/when doing file wrapping
v3:
* Kill alignment config param. Useless and unsafe.
Closes #8935
This patch adds support for the "--ssl" option in run-cassandra, which
will now be able, like run (which runs Scylla), to run Cassandra with
listening to a *SSL-encrypted* CQL connection. The "--ssl" option is also
passed to the tests, so they know to encrypt their CQL connections.
We already had support for this feature in the test/cql-pytest/run
script - which runs Scylla. Adding this also to the run-cassandra
script can help verify that a behavior we notice in Scylla's SSL support
and we want to add to a test - is also shared by Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210629082532.535229-1-nyh@scylladb.com>
Add architecture name for relocatable packages, to support distributing
both x86_64 version and aarch64 version.
Also create symlink from new filename to old filename to keep
compatibility with older scripts.
Fixes #8675
Closes #8709
[update tools/python3 submodule:
* tools/python3 ad04e8e...afe2e7f (1):
> reloc: add arch to relocatable package filename
]
* seastar 0e48ba883...eaa00e761 (3):
> memory: reduce statistics TLS initialization even more
> Merge "Sanitize io-topology creation on start" from Pavel E
> doc/prometheus: note that metric family is passed by query name
The permit creation path enters the semaphore's permit gate in
on_permit_created(). Entering this gate can throw so this method is not
noexcept. Remove the noexcept specifier accordingly.
Also enter the gate before adding the permit to the permit list, to save
some work when this fails.
Fixes: #8933
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210628074941.32878-1-bdenes@scylladb.com>
Now that all supported versions write mc/md sstables, we can deprecate
the MC_SSTABLE feature bit and consider it implicitly true, and with it
the ability to write la/ka sstables.
We still need to support reading them, e.g. when restoring old snapshots
or migrating data from legacy clusters.
Test: unit(dev, debug)
Fixes #8352
Closes #8884
* github.com:scylladb/scylla:
compress: Remove unused make_compressed_file_k_l_format_output_stream
sstables: move sstable_writer to separate header
sstable_writer: remove get_metadata_collector
sstables: stop including metadata_collector.hh in sstables.hh
sstables: Remove duplicated friend declaration
sstables: remove unused KL writer
sstables: Always use MC/MD writer
sstable_datafile_test: switch tests to use latest sstables format
sstable_datafile_test: switch compaction_with_fully_expired_table to latest sstable version
test_offstrategy_sstable_compaction: test all writable sstables
compaction_with_fully_expired_table: Remove some LA specific code
sstable_mutation_test: test latest sstable format instead of LA
sstable_test: Test MX sstables instead of KA/LA
sstable_datafile_test: Fix schema used by check_compacted_sstables
sstables: Remove LA/KA sstable writing tests that check exact format
sstables: define writable_sstable_versions
features: assume MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES are always enabled
"
query_singular() accepts a partition_range_vector, corresponding to an IN
query. But such queries are rare compared to single-partition queries.
Co-routinise the code and special case non-IN queries by avoiding
the call to map_reduce. Also replace the executors array with small_vector
to avoid an allocation in the common case.
perf_simple_query --smp 1 --operations-per-shard 1000000 --task-quota-ms 10:
before: median 204545.04 tps ( 81.1 allocs/op, 15.1 tasks/op, 48828 insns/op)
after: median 219769.97 tps ( 74.1 allocs/op, 12.1 tasks/op, 46495 insns/op)
So, a ~7% improvement in tps and 5% improvement in instructions per op.
Also large reduction in tasks and allocations.
This is an alternative proposal to https://github.com/scylladb/scylla/pull/8909.
The benefit of this one is that it does not duplicate any code (almost).
"
* 'query_singular-coroutine' of github.com:scylladb/scylla-dev:
storage_proxy: avoid map_reduce in storage_proxy::query_singular if only one pk is queried
storage_proxy: use small_vector in storage_proxy::query_singular to store executors
storage_proxy: co-routinize storage_proxy::query_singular()
This class is used in only a few places and does not have to be included
everywhere the sstable class is needed.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This function is only called internally so it does not have to be
exposed and can be inlined instead.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>