scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Avi Kivity	7534412071	table_helper: de-inline insert() and setup_keyspace() After previous patches de-templated these functions, we can de-inline them. This helps reduce compile time and prepares to reduce header dependencies.	2019-01-05 16:28:46 +02:00
Avi Kivity	cfedf4ab0f	table_helper: de-template setup_keyspace() This setup function has no reason to be a template and is easily converted. We can then later de-inline it to reduce dependencies.	2019-01-05 16:23:10 +02:00
Avi Kivity	659147cd79	table_helper: simplify template body of table_helper::insert() Move most of the body into a non-template overload to reduce dependencies in the header (and template bloat). The function is not on any fast path, and noncopyable_function will likely not even allocate anything.	2019-01-05 16:22:08 +02:00
Avi Kivity	c3ef99f84f	schema_tables: remove #include of database.hh Distribute in source files (and one header - table_helper.hh) that need it.	2019-01-05 15:43:07 +02:00
Avi Kivity	f43f82d1d2	cql_type_parser: remove dependency on user_types_metadata A default parameter of type T (or lw_shared_ptr<T>) requires that T be defined. Remove the dependency by redefining the default parameter as an overload, for T = user_types_metadata.	2019-01-05 15:40:58 +02:00
Avi Kivity	4ba1d4d1dc	thrift: add missing include of sleep.hh Currently obtained indirectly through database.hh.	2019-01-05 15:39:30 +02:00
Avi Kivity	d24962e16c	cql3: ks_prop_defs: remove #include "database.hh" Replace with forward declaration to reduce rebuilds.	2019-01-05 14:26:03 +02:00
Jesse Haber-Kucharsky	17a5f7acab	build: Link against libatomic Since Scylla uses functions from the `atomic` header in its own source code, we need to explicitly link against the stub library that is provided for hardware architectures that do not have native support for atomic operations. Fixes #4053 Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <7d62e762130494d73565ce8c031f53aaf866d3aa.1546645041.git.jhaberku@scylladb.com>	2019-01-05 13:38:57 +02:00
Avi Kivity	36e4e9fb54	Update seastar submodule * seastar 6c8c229...67fd967 (1): > perftune.py: tune only active NVMe HW queues on i3 AWS instances	2019-01-04 13:17:29 +02:00
Avi Kivity	b0980ba7c6	compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads The workload in #3844 has these characteristics: - very small data set size (a few gigabytes per shard) - large working set size (all the data, enough for high cache miss rate) - high overwrite rate (so a compaction results in 12X data reduction) As a result, the compaction backlog controller assigns very few shares to compaction (low data set size -> low backlog), so compaction proceeds very slowly. Meanwhile, we have tons of cache misses, and each cache miss needs to read from a large number of sstables (since compaction isn't progressing). The end result is a high read amplification, and in this test, timeouts. While we could declare that the scenario is very artificial, there are other real-world scenarios that could trigger it. Consider a 100% write load (population phase) followed by 100% read. Towards the end of the last compaction, the backlog will drop more and more until compaction slows to a crawl, and until it completes, all the data (for that compaction) will have to be read from its input sstables, resulting in read amplification. We should probably have read amplification affect the backlog, but for now the simpler solution is to increase the minimum shares to 50 so that compaction always makes forward progress. This will result in higher-than-needed compaction bandwidth in some low write rate scenarios so we will see fluctuations in request rate (what the controller was designed to avoid), but these fluctuations will be limited to 5%. Since the base class backlog_controller has a fixed (0, 0) point, remove it and add it to derived classes (setting it to (0, 50) for compaction). Fixes #3844 (or at least improves it). Message-Id: <20181231162710.29410-1-avi@scylladb.com>	2019-01-04 10:58:43 +01:00
Duarte Nunes	b851cb1a9a	distributed_loader: Forbid uploading MV sstables Instead suggest that the views be re-created. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103142933.35354-1-duarte@scylladb.com>	2019-01-03 16:31:20 +02:00
Duarte Nunes	3235c13125	utils/fragmented_temporary_buffer: Correctly implement remove_suffix() The current implementation breaks the invariant that _size_bytes = reduce(_fragments, &temporary_buffer::size) In particular, this breaks algorithms that check the individual segment size. Correctly implement remove_suffix() by destroying superfluous temporary_buffer's and by trimming the last one, if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103133523.34937-1-duarte@scylladb.com>	2019-01-03 13:37:01 +00:00
Botond Dénes	021feef513	querier_cache: simplify memory eviction use-after-free fix, add tests Simplify the fix for memory based eviction, introduced by `918d255` so there is no need to massage the counters. Also add a check to `test_memory_based_cache_eviction` which checks for the bug fixed. While at it also add a check to `test_time_based_cache_eviction` for the fix to time based eviction (`e5a0ea3`). Tests: tests/querier_cache:debug Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>	2019-01-03 13:44:08 +02:00
Tomasz Grabiec	1613a623e1	Merge "Fix crash on corrupt sstable" from Rafael * https://github.com/espindola/scylla espindola/invalid_boundary4: sstables: Refactor predicates on bound_kind_m Fix crash on corrupt sstable	2019-01-03 12:02:09 +01:00
Duarte Nunes	42d9ca8266	Merge 'Add staging SSTables support to row level repair' from Piotr " This series adds staging SSTables support to row level repair. It was introduced for streaming sessions before, but since row level repair doesn't leverage sessions at all, it's added separately. Tests: unit (release) dtest (repair_additional_test.py:RepairAdditionalTest, excluding repair_abort_test, which fails for me locally on master) " * 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla: repair: add staging sstables support to row level repair main,repair: add params to row level repair init streaming,view: move view update checks to separate file	2019-01-03 09:40:13 +00:00
Piotr Sarna	a73d9ccf31	service: mark existing views as built before bootstrap When a node is bootstrapping, it will receive data from other nodes via streaming, including materialized views. Regardless of whether these views are built on other nodes or not, building them on newly bootstrapped nodes has no effect - updates were either already streamed completely (if view building has finished) or will be propagated via view building, if the process is still ongoing. So, marking all views as 'built' for the bootstrapped node prevents it from spawning superfluous view building processes. Fixes #4001 Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>	2019-01-03 09:39:33 +00:00
Botond Dénes	e5a0ea390a	querier_cache: unregister queriers evicted due to expired TTL Currently queriers evicted due to their TTL expiring are not unregistered from the `reader_concurrency_semaphore`. This can cause a use-after-free when the semaphore tries to evict the same querier at some later point in time, as the querier entry it has a pointer to is now invalid. Fix by unregistering the querier from the semaphore before destroying the entry. Refs: #4018 Refs: #4031 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>	2019-01-03 10:29:26 +02:00
Piotr Sarna	bc74ac6f09	repair: add staging sstables support to row level repair In some cases, sstables created during row level repair should be enqueued as staging in order to generate view updates from them. Fixes #4034	2019-01-03 08:36:45 +01:00
Piotr Sarna	a0003c52cf	main,repair: add params to row level repair init Row level repair needs references to system distributed keyspace and view update generator in order to enqueue some sstables as staging.	2019-01-03 08:31:41 +01:00
Piotr Sarna	9d46715613	streaming,view: move view update checks to separate file Checking if view update path should be used for sstables is going to be reused in row level repair code, so relevant functions are moved to a separate header.	2019-01-03 08:31:40 +01:00
Avi Kivity	918d255168	querier_cache: unregister querier from reader_concurrency_semaphore during eviction In insert_querier(), we may evict older queriers to make room for the new one. However, we forgot to unregister the evicted queriers from reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore eventually wanted to evict something, it saw an inactive_read_handle that was not connected to a querier_cache::entry, and crashed on use-after-free. Fix by evicting through the inactive_read_handle associated with the querier to be evicted. This removes traces of the querier from both reader_concurrency_semaphore and querier_cache. We also have to massage the statistics since querier_inactive_read::evict() updates different counters. Fixes #4018. Tests: unit(release) Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190102175023.26093-1-avi@scylladb.com>	2019-01-03 09:15:07 +02:00
Rafael Ávila de Espíndola	28c014351f	Fix crash on corrupt sstable The check in consume_range_tombstone was too late. Before getting to it we would fail an assert calling to_bound_kind. This moves the check earlier and adds a testcase. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:52:07 -08:00
Rafael Ávila de Espíndola	3c9178d122	sstables: Refactor predicates on bound_kind_m This moves the predicate functions to the start of the file, renames is_in_bound_kind to is_bound_kind for consistency with to_bound_kind and defines all predicates in a similar fashion. It also uses the predicates to reduce code duplication. Tests: unit (release) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2019-01-02 17:50:44 -08:00
Avi Kivity	2717bdd301	tools: toolchain: allow adjusting "docker run" command line It is useful to adjust the command line when running the docker image, for example to attach a data volume or a ccache directory. Add a mechanism to do that. Message-Id: <20181228163306.19439-1-avi@scylladb.com>	2019-01-01 21:44:50 +00:00
Avi Kivity	d19660ec0a	Merge "commitlog: Use fragmented buffers for reading entries" from Duarte " Instead of allocating a contiguous temporary_buffer when reading mutations from the commitlog - or hint - replaying, use fragmented buffers. Refs #4020 " * 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla: db/commitlog: Use fragmented buffers to read entries db/commitlog: Implement skip in terms of input buffer skipping tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() utils/fragmented_temporary_buffer: Add remove_suffix tests/fragmented_temporary_buffer_test: Add unit test for skip() utils/fragmented_temporary_buffer: Allow skipping in the input stream	2019-01-01 19:08:34 +02:00
Avi Kivity	6641353854	tracing: remove static class_registry Static class_registries hinder librarification by requiring linking with all object files (instead of a library from which objects are linked on demand) and reduce readability by hiding dependencies and by their horrible syntax. Hide them behind a non-static, non-template tracing backend registry. Message-Id: <20181229121000.7885-1-avi@scylladb.com>	2018-12-31 13:24:54 +00:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	0e50a9bc6d	db/commitlog: Implement skip in terms of input buffer skipping This simplifies the code and allows to get rid of the overload of advance() taking a temporary_buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8379ac6189	tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	1a88cd7992	utils/fragmented_temporary_buffer: Add remove_suffix Essentially hide some bytes off the end of the buffer. Needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	50dd8b67b2	tests/fragmented_temporary_buffer_test: Add unit test for skip() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8eab0a3e01	utils/fragmented_temporary_buffer: Allow skipping in the input stream Add fragmented_temporary_buffer::istream::skip(), needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Avi Kivity	49958d5836	tools: toolchain: update for lz4 1.8.3 lz4 1.8.3 was released with a fix for data corruption during compression. While the release notes indicate we aren't vulnerable, be cautious and update anyway. Message-Id: <20181230144716.7238-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Hagit Segev	141fad9c14	Update README.md fix a typo	2018-12-31 13:33:04 +02:00
Asias He	d90836a2d3	streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic Currently, they increase and decrease as the stream sessions are created and destroyed. Make them Prometheus monotonically increasing counters for easier monitoring. Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>	2018-12-30 16:52:17 +02:00
Pekka Enberg	96172b7bca	Merge 'Fixes for the view_update_from_staging_generator' from Duarte "This series contains a couple of fixes to the view_update_from_staging_generator, the object responsible for generating view updates from sstables written through streaming. Fixes #4021" * 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla: db/view/view_update_from_staging_generator: Break semaphore on stop() db/view/view_update_from_staging_generator: Restore formatting db/view/view_update_from_staging_generator: Avoid creating more than one fiber	2018-12-29 18:31:40 +02:00
Duarte Nunes	f41d13f38c	db/view/view_update_from_staging_generator: Break semaphore on stop() This avoids having fibers waiting on _registration_sem without ever being notified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:04 +00:00
Duarte Nunes	4974addc5c	db/view/view_update_from_staging_generator: Restore formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:02 +00:00
Duarte Nunes	201196130d	db/view/view_update_from_staging_generator: Avoid creating more than one fiber If view_update_from_staging_generator::maybe_generate_view_updates() is called before view_update_from_staging_generator::start(), as can happen in main.cc, then we can potentially create more than one fiber, which leads to corrupted state and conflicting operations. To avoid this, use just one fiber and be explicit about notifying it that more work is needed, by leveraging a condition-variable. Fixes #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:52:51 +00:00
Duarte Nunes	66113a2d39	Merge 'Replace query_processor's sharded<database> with plain database' from Avi " A sharded<database> is not very useful for accessing data since data is usually distributed across many nodes, while a sharded<database> contains only a single node's view. So it is really only used for accessing replicated metadata, not data. As such only the local shard is accessed. Use that to simplify query_processor a little by replacing sharded<database> with a plain database. We can probably be more ambitious and make all accesses, data and metadata, go through storage_proxy, but this is a start. " * tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla: query_processor: replace sharded<database> with the local shard commitlog_replayer: don't use query_processor client_state: change set_keyspace() to accept a single database shard legacy_schema_migrator: initialize with database reference	2018-12-29 12:14:19 +00:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Avi Kivity	30745eeb72	query_processor: replace sharded<database> with the local shard query_processor uses storage_proxy to access data, and the local database object to access replicated metadata. While it seems strange that the database object is not used to access data, it is logical when you consider that a sharded<database> only contains this node's data, not the cluster data. Take advantage of this to replace sharded<database> with a single database shard.	2018-12-29 11:02:15 +02:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before commitlog, so logically replaying the commitlog shouldn't need it. And in fact the dependency on query_processor can be eliminated, all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	7830086317	client_state: change set_keyspace() to accept a single database shard set_keyspace() only needs one shard (it is checking replicated state, not sharded data) so arrange for it to receive only that one shard.	2018-12-29 10:58:39 +02:00
Avi Kivity	e4233262cf	legacy_schema_migrator: initialize with database reference Provide legacy_schema_migrator with a sharded<database> so it doesn't need to use the one from query_processor. We want to replace query_processor's sharded<database> with just a local database reference in order to simplify it, and this is standing in the way.	2018-12-29 10:58:22 +02:00
Duarte Nunes	bab7e6877b	streaming/stream_session: Only stage sstables for tables with views When streaming, sstables for which we need to generate view updates are placed in a special staging directory. However, we only need to do this for tables that actually have views. Refs #4021 Message-Id: <20181227215412.5632-1-duarte@scylladb.com>	2018-12-28 18:32:24 +02:00
Avi Kivity	feddf0b021	tools: toolchain: patch boost for use-after-free in Boost.Test XML output The version of boost in Fedora 29 has a use-after-free bug that is only exposed when ./test.py is run with the --jenkins flag. To patch it, use a fixed version from the copr repository scylladb/toolchain. Message-Id: <20181228150419.29623-1-avi@scylladb.com>	2018-12-28 16:35:28 +01:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write operations over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	7677d2ba2c	nodetool toppartitions: nodetool-toppartitions script A Python script mimicking the nodetool toppartitions utility, utilizing Scylla REST API. Examples: $ ./nodetool-toppartitions --help usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY] keyspace table duration Samples database reads and writes and reports the most active partitions in a specified table positional arguments: keyspace Name of keyspace table Name of column family duration Query duration in milliseconds optional arguments: -h, --help show this help message and exit -k LIST_SIZE The number of the top partitions to list (default: 10) -s CAPACITY The capacity of stream summary (default: 256) $ ./nodetool-toppartitions ks test1 10000 READ Partition Count 30 2 20 2 10 2 WRITE Partition Count 30 1 20 1 10 1 Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:48:03 +02:00

17595 Commits
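
Commit 3235c13125 states the invariant that remove_suffix() must preserve: the cached byte count equals the sum of the fragment sizes. The following is a minimal Python sketch of that idea, not Scylla's C++ implementation. The class name FragmentedBuffer and all of its methods are hypothetical stand-ins chosen for illustration; only the approach (destroy superfluous trailing fragments, then trim the last one if needed) comes from the commit message.

```python
class FragmentedBuffer:
    """Toy model (hypothetical) of a buffer stored as a list of byte fragments."""

    def __init__(self, fragments):
        # Keep only non-empty fragments and cache the total size.
        self._fragments = [bytes(f) for f in fragments if len(f) > 0]
        self._size_bytes = sum(len(f) for f in self._fragments)

    def size_bytes(self):
        return self._size_bytes

    def fragments(self):
        return list(self._fragments)

    def check_invariant(self):
        # The cached size must equal the sum of the fragment sizes.
        return self._size_bytes == sum(len(f) for f in self._fragments)

    def remove_suffix(self, n):
        """Hide n bytes off the end while keeping the size invariant."""
        if n < 0 or n > self._size_bytes:
            raise ValueError("suffix length out of range")
        self._size_bytes -= n
        # Drop whole trailing fragments that the suffix fully covers.
        while n > 0 and len(self._fragments[-1]) <= n:
            n -= len(self._fragments.pop())
        # Trim the new last fragment if part of the suffix remains.
        if n > 0:
            self._fragments[-1] = self._fragments[-1][:-n]


buf = FragmentedBuffer([b"abcd", b"ef", b"ghi"])
buf.remove_suffix(4)  # removes b"ghi" entirely and one byte of b"ef"
print(buf.fragments(), buf.size_bytes(), buf.check_invariant())
```

Adjusting only the cached size without touching the fragment list would leave check_invariant() false, which is the kind of breakage the commit message describes for code that inspects individual fragment sizes.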