scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Pekka Enberg	f6da9bc92b	Merge "Additional mutations/queries related collectd metrics" from Vlad "This series introduces some additional metrics (mostly) in a storage_proxy and a database level that are meant to create a better picture of how data flows in the cluster. First of all where possible counters of each category (e.g. total writes in the storage proxy level) are split into the following categories: - operations performed on a local Node - operations performed on remote Nodes aggregated per DC In a storage_proxy level there are the following metrics that have this "split" nature (all on a sending side): - total writes (attempts/errors) - writes performed as a result of a Read Repair logic - total data reads (attempts/completed/errors) - total digest reads (attempts/completed/errors) - total mutations data reads (attempts/completed/errors) In a batchlog_manager: - writes performed as a result of a batchlog replay logic Thereby if for instance somebody wants to get an idea of how many writes the current Node performs due to user requested mutations only he/she has to take a counter of total writes and subtract the writes resulted by Read Repairs and batchlog replays. On a receiving side of a storage_proxy we add the two following counters: - total number of received mutations - total number of forwarded mutations (attempts/errors) In order to get a better picture of what is going on on a local Node we are adding two counters on a database level: - total number of writes - total number of reads Comparing these to total writes/reads in a storage_proxy may give a good idea if there is an excessive access to a local DB for example."	2016-04-21 15:58:45 +03:00
Vlad Zolotarov	4ef5b11e9b	batchlog_manager: add a counter for a total number of write attempts Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:29:21 +03:00
Duarte Nunes	bc90d6a730	udt: type_parser handles user defined types This patch ensures type_parser can handle user defined types. It also prefixes user_type_impl::make_name() with org.apache.cassandra.db.marshal.UserType. Fixes #631 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 18:07:07 +02:00
Duarte Nunes	809b45e160	udt: Add drop type statement Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 18:07:02 +02:00
Duarte Nunes	d1f215b743	udt: Merge user defined type mutations This patch implements the merge_types() function, allowing mutations to user defined types to be applied. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	d6d29f7c52	schema: Replace ad hoc func with indirect_equal_to Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	dd75fe8ec0	udt: Add mutations for user defined types This patch implements mutations for user defined types. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	c7b3a4b144	udt: Parse user types system table This patch loads and parses the user types system table during bootstrap. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Calle Wilund	a7e1af1c06	db::config: Add permissions cache entries/mark auth/perm as used	2016-04-19 11:49:05 +00:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
Gleb Natapov	dbb1217896	cl: enable logging for insufficient LOCAL_QUORUM consistency Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:58 +03:00
Gleb Natapov	dfdbb1e703	storage_proxy: move hack to make coordinator most preferable node for read into sorting function This is kind of sorting, so it belongs there, but it also fixes a bug in storage_proxy::get_read_executor() that assumes filter_for_query() does not change order of nodes in all_nodes when extra replica is chosen. Otherwise if coordinator ip happens to be last in all_nodes then it will be chosen as extra replica and will be queried twice. Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:21 +03:00
Pekka Enberg	47a904c0f6	Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias "There is a need to have an ability to detect whether a feature is supported by entire cluster. The way to do it is to advertise feature availability over gossip and then each node will be able to check if all other nodes have a feature in question. The idea is to have new application state SUPPORTED_FEATURES that will contain set of strings, each string holding feature name. This series adds API to do so. The following patch on top of this series demonstrates how to wait for features during boot up. FEATURE1 and FEATURE2 are introduced. We use wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully. Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout. --- a/service/storage_service.cc +++ b/service/storage_service.cc @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() { // Add features supported by this local node. When a new feature is // introduced in scylla, update it here, e.g., // return sstring("FEATURE1,FEATURE2") - return sstring(""); + return sstring("FEATURE1,FEATURE2"); } std::set<inet_address> get_seeds() { @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() { // gossip snitch infos (local DC and rack) gossip_snitch_info().get(); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get(); + logger.info("Wait for FEATURE1 and FEATURE2 done"); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get(); + logger.info("Wait for FEATURE3 done"); + We can query the supported_features: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)"	2016-04-08 09:22:50 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	50bcfe569a	system_keyspace: Add supported_features into system.local table	2016-04-06 07:12:34 +08:00
Asias He	214c0f72b2	db: Add supported_features column in system.local and system.peers table	2016-04-06 07:12:34 +08:00
Pekka Enberg	32471fcb96	Merge "Do batch log replay in decommission" from Asias "batchlog_manager is modified to allow the storage_service to initiate a batchlog replay operation. Refs #1085. Tested with tests/batchlog_manager_test and batch_test.py"	2016-04-05 08:42:47 +03:00
Gleb Natapov	70575699e4	commitlog, sstables: enlarge XFS extent allocation for large files With big rows I see contention in XFS allocations which cause reactor thread to sleep. Commitlog is a main offender, so enlarge extent to commitlog segment size for big files (commitlog and sstable Data files). Message-Id: <20160404110952.GP20957@scylladb.com>	2016-04-04 14:15:00 +03:00
Paweł Dziepak	c8159eca52	commitlog: make sure that segment destructor doesn't throw Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:42:56 +01:00
Avi Kivity	417bcb122d	commitlog: ignore commitlog segments generated by Cassandra-derived tools Cassandra-derived tools (such as sstable2json) may write commitlog segments, that Scylla cannot recognize. Since we now write them with a distinct name, we can recognize the name and ignore these segments, as we know the data they contain is not interesting. Fixes #1112. Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>	2016-03-31 16:01:08 +03:00
Asias He	5550aeba1d	batchlog_manager: Avoid stopping batchlog_manager more than once We can stop batchlog_manager in decommission and drain. Avoid stopping it more than once. Fix the following error: $ nodetool decommission $ nodetool drain storage_service - DECOMMISSIONING: stop_gossiping done storage_service - messaging_service stopped storage_service - DECOMMISSIONING: stop messaging_service done storage_service - DECOMMISSIONING: set_bootstrap_state done storage_service - DECOMMISSIONED: storage_service - DECOMMISSIONING: done storage_service - DRAINING: starting drain process gossip - gossip is already stopped scylla: ./seastar/core/gate.hh:93: future<> seastar::gate::close(): Assertion `!_stopped && "seastar::gate::close() cannot be called more than once"' failed.	2016-03-30 20:54:30 +08:00
Asias He	cdb43c5586	batchlog_manager: Allow user initiated batchlog replay operation During decommission, the storage_service::unbootstrap() needs to initiate a batchlog replay operation. To sync the replay operation initiated by the timer in batchlog_manager and storage_service, a semaphore is introduced. To simplify the semaphore locking, the management code now always runs on shard zero, but the real work is distributed to all shards.	2016-03-30 20:54:30 +08:00
Glauber Costa	d536846433	commitlog: initialize sync period with actual sync period commitlog's sync period is initialized as the batch period, and not as the sync period itself as it should be. I've found this by code inspection, but unless I am missing something really fundamental, this seems to be completely wrong. It's been working fine because in our defaults, I have checked that both variables default to the same value. But it seems to me that as long as anyone would change one of them, the behavior wouldn't be as expected. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>	2016-03-29 15:21:02 +03:00
Avi Kivity	a919113fdb	schema_tables: fix deadlock in cross-node communications Seastar wrongly limits the number of concurrent submit_to()s to a single remote shard. This can cause an ABBA deadlock: fiberA fiberB (x127) submit_to(0) # lock schema <- returns submit_to(0) # lock schema (waits) submit_to(0) # do work (waits) The fiberBs wait for fiberA, which in turn waits for a fiberB to return. While the correct fix is to remove the client-side limit and replace it with a server-side per-verb limit, we start with a simpler fix that replaces the blocking lock call with a non-blocking call, removing the deadlock. Fixes #1088. Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>	2016-03-28 10:12:10 +03:00
Tomasz Grabiec	53bbcf4a1e	schema_tables: Wait for notifications to be processed. Listeners may defer since: `93015bcc54` "migration_manager: Make the migration callbacks runs inside seastar thread" Not all places were adjusted to wait for them. Fix that. Message-Id: <1458837613-27616-1-git-send-email-tgrabiec@scylladb.com>	2016-03-24 19:04:12 +02:00
Gleb Natapov	0afd1c6f0a	config: enable truncate_request_timeout_in_ms option Option truncate_request_timeout_in_ms is used by truncate. Mark it as used. Message-Id: <20160323162649.GH2282@scylladb.com>	2016-03-23 18:50:24 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Gleb Natapov	c6157dd99e	enable rpc_keepalive parameter Fixes #1044 Message-Id: <20160315104609.GV6117@scylladb.com>	2016-03-15 12:51:12 +02:00
Asias He	93015bcc54	migration_manager: Make the migration callbacks runs inside seastar thread At the moment, the callbacks return void, so it is impossible to wait for the callbacks to complete. Make the callbacks run inside seastar thread, so if we need to wait for the callback, we can make it call foo_operation().get() in the callback. It is easier than making the callbacks return future<>.	2016-03-15 15:41:23 +08:00
Glauber Costa	a339296385	database: turn sstable generation number into an optional This patch makes sure that every time we need to create a new generation number - the very first step in the creation of a new SSTable, the respective CF is already initialized and populated. Failure to do so can lead to data being overwritten. Extensive details about why this is important can be found in Scylla's Github Issue #1014 Nothing should be writing to SSTables before we have the chance to populate the existing SSTables and calculate what should the next generation number be. However, if that happens, we want to protect against it in a way that does not involve overwriting existing tables. This is one of the ways to do it: every column family starts in an unwriteable state, and when it can finally be written to, we mark it as writeable. Note that this cannot be a part of add_column_family. That adds a column family to a db in memory only, and if anybody is about to write to a CF, that was most likely already called. We need to call this explicitly when we are sure we're ready to issue disk operations safely. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Gleb Natapov	f59415b3c6	Take pending endpoints into account while checking for sufficient live nodes During bootstrapping additional copies of data have to be made to ensure that CL level is met (see CASSANDRA-833 for details). Our code does that, but it does not take into account that a bootstrapping node can be dead, which may cause a request to proceed even though there are not enough live nodes for it to be completed. In such a case the request neither completes nor times out, so it appears to be stuck from CQL layer POV. The patch fixes this by taking into account pending nodes while checking that there are sufficient live nodes for the operation to proceed. Fixes #965 Message-Id: <20160303165250.GG2253@scylladb.com>	2016-03-07 13:30:13 +01:00
Pekka Enberg	9c930d88a0	db/system_keyspace: Remove ifdef'd code We have our implementations of all the three ifdef'd functions. Message-Id: <1456926917-12594-1-git-send-email-penberg@scylladb.com>	2016-03-03 12:26:50 +02:00
Tomasz Grabiec	04f2482d74	schema_tables: Log results of schema merge Currently schema changes are only logged at the coordinator node which initiates the change. It would be helpful in post-mortem analysis to also see when and how schema changes are resolved when applied on other nodes. Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>	2016-03-03 11:12:15 +02:00
Calle Wilund	0c3322befd	commitlog: Ensure segment survives whole flush call Must keep shared pointer alive. Likewise though, the shared pointer copy in cycle main continuation is not needed. Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>	2016-03-02 18:22:13 +02:00
Calle Wilund	f1c4e3eb3d	commitlog: Clear reserve segments in orphan_all Otherwise they will keep the segment_manager alive (leak). Fixes jenkins ASan errors. Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>	2016-03-02 18:22:09 +02:00
Calle Wilund	a556f665c0	commitlog: Take segment_manager locks first in write/flush While it is formally better to take a local lock first and then first contend for a global, in this case it is arguably better to ensure we get a gate exception synchronously (early) instead of potentially in a continuation. Old version might cause us to do a gate::leave even while never entered. And since we should really only have one active (contending) segment per shard anyway, it should not matter. Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>	2016-03-02 18:22:05 +02:00
Paweł Dziepak	d50594351b	db: remove old-style serializers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:09:30 +00:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Calle Wilund	e667dcc3d0	commitlog: Make segment->segment_manager relation shared pointer The segment->segment_manager pointer has, until now, been a raw pointer, which in a way is sensible, since making circular shared pointer relations is in general bad. However, since the code and life cycle of segments has evolved quite a bit since that initial relation was defined, becoming both more and then suddenly, in a sense, less, asynchronous over time, the usage of the relation is in fact more consistent with a shared pointer, in that a segment needs to access its manager to properly do things like write and flush. These two ops in particular depend on accessing the segment manager in a way that might be fine even using raw pointers, if it was not again for that little annoying thing of continuation reordering. So, lets just make the relation a shared pointer, solving the issue of whether the manager is alive when a segment accesses it. If it has been "released" (shut down), the existing mechanisms (gate) will then trigger and prevent any actual _actions_ from taking place. And we don't have to complicate anything else even more. Only "big" change is that we need to explicitly orphan all segments in commitlog destructor (segment_manager is essentially a p-impl). This fixes some spurious crashes in nightly unit tests. Fixes #966. Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>	2016-03-01 16:48:28 +02:00
Paweł Dziepak	dec63eac6e	commitlog: add commitlog entry move constructor Default move constructor and assignment didn't handle reference to mutation (_mutation) properly. Fixes #935. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>	2016-02-29 18:10:15 +02:00
Calle Wilund	dc136a6a1c	commitlog: Fix reserve counter overflow Fixes #482 See code comment. Reserve segment allocation count sum can temporarily overflow due to continuation delay/reordering, if we manage to reach the on_timer code before finally clauses from previous reserve allocation invocation has processed. However, since these are benign overflows (just indicating even more that we don't need to do anything right now) simply capping the count should be fine. Avoids assert in boost irange. Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>	2016-02-29 14:56:24 +02:00
Avi Kivity	5cc1b39cc9	Merge "Store gossip generation in system table" from Asias "Kill one FIXME."	2016-02-29 14:53:06 +02:00
Asias He	abafec99a5	system_keyspace: Implement increment_and_get_generation	2016-02-29 16:31:42 +08:00
Tomasz Grabiec	697d9bfa56	serializer: Introduce as_input_stream(bytes_view)	2016-02-26 12:26:13 +01:00
Calle Wilund	590ec1674b	truncate: Require timestamp join-function to ensure equal values Fixes #937 In fixing #884, truncation not truncating memtables properly, time stamping in truncate was made shard-local. This however breaks the snapshot logic, since for all shards in a truncate, the sstables should snapshot to the same location. This patch adds a required function argument to truncate (and by extension drop_column_family) that produces a time stamp in a "join" fashion (i.e. same on all shards), and utilizes the joinpoint type in caller to do so. Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>	2016-02-24 18:59:31 +02:00
Avi Kivity	efabb1a1d8	commitlog: fix buffer size calculation We were adding bool(buffer), instead of buffer.size(); exposed by making temporary_buffer::operator bool explicit.	2016-02-24 13:38:05 +02:00
Paweł Dziepak	1b52264dfd	batchlog_manager: use new canonical_mutation serializers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:12:00 +00:00
Paweł Dziepak	89b75a02d4	commitlog: use IDL-based serialization for entries Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Paweł Dziepak	f548c75200	commitlog: move implementation to *.cc file Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00


638 Commits