scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Avi Kivity	527e3a58ff	install-dependencies.sh: add maven and ant Add tools needed to build scylla-jmx and scylla-tools-java. While not requirements of this repository, it's nicer if a single setup can be used to build and run everything. We also install pystache as it's used by packaging scripts.	2019-01-03 16:16:45 +02:00
Avi Kivity	918d255168	querier_cache: unregister querier from reader_concurrency_semaphore during eviction In insert_querier(), we may evict older queriers to make room for the new one. However, we forgot to unregister the evicted queriers from reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore eventually wanted to evict something, it saw an inactive_read_handle that was not connected to a querier_cache::entry, and crashed on use-after-free. Fix by evicting through the inactive_read_handle associated with the querier to be evicted. This removes traces of the querier from both reader_concurrency_semaphore and querier_cache. We also have to massage the statistics since querier_inactive_read::evict() updates different counters. Fixes #4018. Tests: unit(release) Reviewed-by: Botond Denes <bdenes@scylladb.com> Message-Id: <20190102175023.26093-1-avi@scylladb.com>	2019-01-03 09:15:07 +02:00
Avi Kivity	2717bdd301	tools: toolchain: allow adjusting "docker run" command line It is useful to adjust the command line when running the docker image, for example to attach a data volume or a ccache directory. Add a mechanism to do that. Message-Id: <20181228163306.19439-1-avi@scylladb.com>	2019-01-01 21:44:50 +00:00
Avi Kivity	d19660ec0a	Merge "commitlog: Use fragmented buffers for reading entries" from Duarte " Instead of allocating a contiguous temporary_buffer when reading mutations from the commitlog - or hint - replaying, use fragmented buffers instead. Refs #4020 " * 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla: db/commitlog: Use fragmented buffers to read entries db/commitlog: Implement skip in terms of input buffer skipping tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() utils/fragmented_temporary_buffer: Add remove_suffix tests/fragmented_temporary_buffer_test: Add unit test for skip() utils/fragmented_temporary_buffer: Allow skipping in the input stream	2019-01-01 19:08:34 +02:00
Avi Kivity	6641353854	tracing: remove static class_registry Static class_registries hinder librarification by requiring linking with all object files (instead of a library from which objects are linked on demand) and reduce readability by hiding dependencies and by their horrible syntax. Hide them behind a non-static, non-template tracing backend registry. Message-Id: <20181229121000.7885-1-avi@scylladb.com>	2018-12-31 13:24:54 +00:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	0e50a9bc6d	db/commitlog: Implement skip in terms of input buffer skipping This simplifies the code and allows getting rid of the overload of advance() taking a temporary_buffer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8379ac6189	tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	1a88cd7992	utils/fragmented_temporary_buffer: Add remove_suffix Essentially hide some bytes off the end of the buffer. Needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	50dd8b67b2	tests/fragmented_temporary_buffer_test: Add unit test for skip() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8eab0a3e01	utils/fragmented_temporary_buffer: Allow skipping in the input stream Add fragmented_temporary_buffer::istream::skip(), needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Avi Kivity	c180a18dbb	Distribute distributed_loader into its own header and source files distributed_loader is a sizeable fraction of database.cc, so moving it out reduces compile time and improves readability. Message-Id: <20181230200926.15074-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Avi Kivity	49958d5836	tools: toolchain: update for lz4 1.8.3 lz4 1.8.3 was released with a fix for data corruption during compression. While the release notes indicate we aren't vulnerable, be cautious and update anyway. Message-Id: <20181230144716.7238-1-avi@scylladb.com>	2018-12-31 14:27:27 +02:00
Hagit Segev	141fad9c14	Update README.md fix a typo	2018-12-31 13:33:04 +02:00
Asias He	d90836a2d3	streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic Currently, they increase and decrease as the stream sessions are created and destroyed. Make them monotonically increasing Prometheus counters for easier monitoring. Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>	2018-12-30 16:52:17 +02:00
Pekka Enberg	96172b7bca	Merge 'Fixes for the view_update_from_staging_generator' from Duarte "This series contains a couple of fixes to the view_update_from_staging_generator, the object responsible for generating view updates from sstables written through streaming. Fixes #4021" * 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla: db/view/view_update_from_staging_generator: Break semaphore on stop() db/view/view_update_from_staging_generator: Restore formatting db/view/view_update_from_staging_generator: Avoid creating more than one fiber	2018-12-29 18:31:40 +02:00
Duarte Nunes	f41d13f38c	db/view/view_update_from_staging_generator: Break semaphore on stop() This avoids having fibers waiting on _registration_sem without ever being notified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:04 +00:00
Duarte Nunes	4974addc5c	db/view/view_update_from_staging_generator: Restore formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:55:02 +00:00
Duarte Nunes	201196130d	db/view/view_update_from_staging_generator: Avoid creating more than one fiber If view_update_from_staging_generator::maybe_generate_view_updates() is called before view_update_from_staging_generator::start(), as can happen in main.cc, then we can potentially create more than one fiber, which leads to corrupted state and conflicting operations. To avoid this, use just one fiber and be explicit about notifying it that more work is needed, by leveraging a condition-variable. Fixes #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-29 12:52:51 +00:00
Duarte Nunes	66113a2d39	Merge 'Replace query_processor's sharded<database> with plain database' from Avi " A sharded<database> is not very useful for accessing data since data is usually distributed across many nodes, while a sharded<database> contains only a single node's view. So it is really only used for accessing replicated metadata, not data. As such only the local shard is accessed. Use that to simplify query_processor a little by replacing sharded<database> with a plain database. We can probably be more ambitious and make all accesses, data and metadata, go through storage_proxy, but this is a start. " * tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla: query_processor: replace sharded<database> with the local shard commitlog_replayer: don't use query_processor client_state: change set_keyspace() to accept a single database shard legacy_schema_migrator: initialize with database reference	2018-12-29 12:14:19 +00:00
Avi Kivity	0c0cc66ee7	system_keyspace, view: reduce interdependencies system_keyspace is an implementation detail for most of its users, not part of the interface, as it's only used to store internal data. Therefore, including it in a header file causes unneeded dependencies. This patch removes a dependency between views and system_keyspace.hh by moving view_name and view_build_progress into a separate header file, and using forward declarations where possible. This allows us to remove an inclusion of system_keyspace.hh from a header file (the last one), so that further changes to system_keyspace.hh will cause fewer recompilations. Message-Id: <20181228215736.11493-1-avi@scylladb.com>	2018-12-29 12:12:15 +00:00
Avi Kivity	30745eeb72	query_processor: replace sharded<database> with the local shard query_processor uses storage_proxy to access data, and the local database object to access replicated metadata. While it seems strange that the database object is not used to access data, it is logical when you consider that a sharded<database> only contains this node's data, not the cluster data. Take advantage of this to replace sharded<database> with a single database shard.	2018-12-29 11:02:15 +02:00
Avi Kivity	f0a709cfc8	commitlog_replayer: don't use query_processor During normal writes, query processing happens before commitlog, so logically replaying the commitlog shouldn't need it. In fact, the dependency on query_processor can be eliminated; all it needs is the local node's database.	2018-12-29 11:00:29 +02:00
Avi Kivity	7830086317	client_state: change set_keyspace() to accept a single database shard set_keyspace() only needs one shard (it is checking replicated state, not sharded data) so arrange for it to receive only that one shard.	2018-12-29 10:58:39 +02:00
Avi Kivity	e4233262cf	legacy_schema_migrator: initialize with database reference Provide legacy_schema_migrator with a sharded<database> so it doesn't need to use the one from query_processor. We want to replace query_processor's sharded<database> with just a local database reference in order to simplify it, and this is standing in the way.	2018-12-29 10:58:22 +02:00
Duarte Nunes	bab7e6877b	streaming/stream_session: Only stage sstables for tables with views When streaming, sstables for which we need to generate view updates are placed in a special staging directory. However, we only need to do this for tables that actually have views. Refs #4021 Message-Id: <20181227215412.5632-1-duarte@scylladb.com>	2018-12-28 18:32:24 +02:00
Avi Kivity	feddf0b021	tools: toolchain: patch boost for use-after-free in Boost.Test XML output The version of boost in Fedora 29 has a use-after-free bug that is only exposed when ./test.py is run with the --jenkins flag. To patch it, use a fixed version from the copr repository scylladb/toolchain. Message-Id: <20181228150419.29623-1-avi@scylladb.com>	2018-12-28 16:35:28 +01:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write operations over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	7677d2ba2c	nodetool toppartitions: nodetool-toppartitions script A Python script mimicking the nodetool toppartitions utility, utilizing Scylla REST API. Examples: $ ./nodetool-toppartitions --help usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY] keyspace table duration Samples database reads and writes and reports the most active partitions in a specified table positional arguments: keyspace Name of keyspace table Name of column family duration Query duration in milliseconds optional arguments: -h, --help show this help message and exit -k LIST_SIZE The number of the top partitions to list (default: 10) -s CAPACITY The capacity of stream summary (default: 256) $ ./nodetool-toppartitions ks test1 10000 READ Partition Count 30 2 20 2 10 2 WRITE Partition Count 30 1 20 1 10 1 Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:48:03 +02:00
Rafi Einstein	197f38d4ee	nodetool toppartitions: Toppartitions query REST API An HTTP GET operation starts the query (with args: ks/cf name and duration in ms). It executes synchronously, results are returned as JSON: $ curl -s -X GET http://localhost:10000/column_family/toppartitions/ks:cf1?duration=10000 | jq { "read": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ], "write": [ { "count": "15", "error": "0", "partition": "4b504d39354f37353131" }, { "count": "15", "error": "0", "partition": "3738313134394d353530" } ] } Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	6b2c21f69b	nodetool toppartitions: Toppartitions query implementation toppartitions_query installs toppartitions_data_listener-s on all database shards, waits for the designated period, uninstalls the listeners and collects top-k read/write partition keys. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	404f75def5	nodetool toppartitions: fully_qualified_cf_name Encapsulate keyspace:column_family REST API argument parsing into fully_qualified_cf_name class. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	0bffe5f83e	nodetool toppartitions: add data_listeners to database/table Add data_listeners member to database. Adds data_listeners* to table::config, to be used by table methods to invoke listeners. Install on_read() listener in table::make_reader(). Install on_write() listener in database::apply_in_memory(). Tests: Unit (release) Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	08ba115c16	nodetool toppartitions: data listeners Mechanism that interfaces with mutation readers in database and table classes, to allow tracking most frequent partition keys in read and write operation. Basic design is specified in #2811. Tracking top #rows and #bytes will be supported in the future. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	038f8c7988	nodetool toppartitions: refactor table::config constructor Eliminate extra parameters to ctor and deduce them instead from db param. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:57 +02:00
Rafi Einstein	eda43b93c9	top_k: support for appending top_k results Allow appending results of one top_k into another. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:56 +02:00
Rafi Einstein	aeebe8e86b	top_k: std::list -> chunked_vector Replaced std::list with chunked_vector. Because chunked_vector requires a noexcept move constructor from its value type, change the bad_boy type in the unit test not to throw in the move constructor. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:07 +02:00
Avi Kivity	8e2f6d0513	Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz " partition_snapshots created in the memtable will keep a reference to the memtable (as region) and to memtable::_cleaner. As long as the reader is alive, the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043` (in >= 3.0-rc1) Fixes #4030. Tests: - mvcc_test (debug) " tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla: tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed tests: mvcc: Introduce mvcc_container::migrate() tests: mvcc: Make mvcc_partition move-constructible tests: mvcc: Introduce mvcc_container::make_not_evictable() tests: mvcc: Allow constructing mvcc_container without a cache_tracker mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup mvcc: partition_snapshot: Introduce migrate() mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-28 12:45:10 +02:00
Tomasz Grabiec	bb1c9cb6f3	tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	4d13dea39a	tests: mvcc: Introduce mvcc_container::migrate()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	676868ed31	tests: mvcc: Make mvcc_partition move-constructible	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	c6798f7872	tests: mvcc: Introduce mvcc_container::make_not_evictable()	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	1fa00656ea	tests: mvcc: Allow constructing mvcc_container without a cache_tracker Some test cases will need many containers to simulate memtable -> cache transitions, but there can be only one cache_tracker per shard due to metrics. Allow constructing a container without a cache_tracker (and thus non-evictable).	2018-12-28 10:32:39 +01:00
Tomasz Grabiec	ac49b1def0	mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup partition_snapshots created in the memtable will keep a reference to the memtable (as region*) and to memtable::_cleaner. As long as the reader is alive the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that, nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When memtable is destroyed without being moved to cache there is no problem, because the snapshot would be queued into memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043`. Fixes #4030.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	20f5d5d1a1	mvcc: partition_snapshot: Introduce migrate() Snapshots which outlive the memtable will need to have their _region and _cleaner references updated. The snapshot can be destroyed after the memtable when it is queued in the mutation_cleaner.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	67f9afbd1a	mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-27 18:08:50 +01:00
Gleb Natapov	37b4043677	streaming: always read from rpc::source until end-of-stream during mutation sending rpc::source cannot be abandoned until EOS is reached, but current code does not obey it if error code is received, it throws exception instead that aborts the reading loop. Fix it by moving exception throwing out of the loop. Fixes: #4025 Message-Id: <20181227135051.GC29458@scylladb.com>	2018-12-27 16:50:53 +02:00
Asias He	4d3c463536	storage_service: Stop cql server before gossip We saw failure in dtest concurrent_schema_changes_test.py: TestConcurrentSchemaChanges.changes_while_node_down_test test. ====================================================================== ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test self.make_schema_changes(session, namespace='ns2') File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes session.execute('USE ks_%s' % namespace) File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result() File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result raise self._final_exception ConnectionShutdown: Connection to 127.0.0.1 is closed The test: session = self.patient_cql_connection(node2) self.prepare_for_changes(session, namespace='ns2') node1.stop() self.make_schema_changes(session, namespace='ns2') --> ConnectionShutdown exception throws The problem is that, after receiving the DOWN event, the python Cassandra driver will call Cluster:on_down which checks if this client has any connections to the node being shut down. If there are any connections, the Cluster:on_down handler will exit early, so the session to the node being shut down will not be removed. If we shut down the cql server first, the connection count will be zero and the session will be removed. Fixes: #4013 Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>	2018-12-27 14:13:43 +02:00
Duarte Nunes	2f69ba2844	lwt: Remove Paxos-related Cassandra code Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227112526.4180-1-duarte@scylladb.com>	2018-12-27 13:30:10 +02:00
Duarte Nunes	66e45469b2	streaming/stream_session: Don't use table reference across defer points When creating an sstable from which to generate view updates, we held on to a table reference across defer points. In case there's a concurrent schema drop, the table object might be destroyed and we will incur a use-after-free. Solve this by holding on to a shared pointer and pinning the table object. Refs #4021 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20181227105921.3601-1-duarte@scylladb.com>	2018-12-27 13:05:46 +02:00
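
The "nodetool toppartitions" merge (7747f2dde3) says it uses a Top-k structure for stream summary statistics, and the REST output reports a "count" and an "error" per partition. Below is a minimal Python sketch of one well-known way to produce such figures, the Space-Saving algorithm. It is an illustration only: the commit log does not confirm that top_k uses this exact algorithm, and the class name `SpaceSaving` and its methods are hypothetical, not Scylla's API.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate top-k counter tracking at most `capacity` keys.

    Hypothetical illustration; not the top_k class from the repository.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # key -> (estimated count, maximum possible overestimate)
        self._counters: Dict[Hashable, Tuple[int, int]] = {}

    def add(self, key: Hashable, inc: int = 1) -> None:
        """Record `inc` occurrences of `key`."""
        if key in self._counters:
            count, error = self._counters[key]
            self._counters[key] = (count + inc, error)
        elif len(self._counters) < self.capacity:
            self._counters[key] = (inc, 0)
        else:
            # Table is full: evict the key with the smallest count. The
            # newcomer inherits that count, which becomes its error bound.
            victim = min(self._counters, key=lambda k: self._counters[k][0])
            min_count, _ = self._counters.pop(victim)
            self._counters[key] = (min_count + inc, min_count)

    def top(self, k: int) -> List[Tuple[Hashable, int, int]]:
        """Return up to `k` (key, count, error) tuples, highest count first."""
        ranked = sorted(
            self._counters.items(), key=lambda kv: kv[1][0], reverse=True
        )
        return [(key, count, error) for key, (count, error) in ranked[:k]]
```

With a capacity of 2, feeding the keys a, a, a, b, c leaves `top(2)` as `[("a", 3, 0), ("c", 2, 1)]`: the key c displaced b and carries an error of 1 because its true count may be lower than reported. The fixed capacity plays the role of the `-s CAPACITY` option shown in the nodetool-toppartitions help text, under the assumption that both bound the number of tracked keys.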

17574 Commits