scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 11:55:15 +00:00

Author	SHA1	Message	Date
Paweł Dziepak	c8159eca52	commitlog: make sure that segment destructor doesn't throw Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:42:56 +01:00
Avi Kivity	417bcb122d	commitlog: ignore commitlog segments generated by Cassandra-derived tools Cassandra-derived tools (such as sstable2json) may write commitlog segments, that Scylla cannot recognize. Since we now write them with a distinct name, we can recognize the name and ignore these segments, as we know the data they contain is not interesting. Fixes #1112. Message-Id: <1459356904-20699-1-git-send-email-avi@scylladb.com>	2016-03-31 16:01:08 +03:00
Glauber Costa	d536846433	commitlog: initialize sync period with actual sync period commitlog's sync period is initialized as the batch period, and not as the sync period itself as it should be. I've found this by code inspection, but unless I am missing something really fundamental, this seems to be completely wrong. It's been working fine because in our defaults, I have checked that both variables default to the same value. But it seems to me that as long as anyone would change one of them, the behavior wouldn't be as expected. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <2e7c565242fe5d4481a3ee8b0ba425ef14f5e42a.1459252783.git.glauber@scylladb.com>	2016-03-29 15:21:02 +03:00
Avi Kivity	a919113fdb	schema_tables: fix deadlock in cross-node communications Seastar wrongly limits the number of concurrent submit_to()s to a single remote shard. This can cause an ABBA deadlock: fiberA fiberB (x127) submit_to(0) # lock schema <- returns submit_to(0) # lock schema (waits) submit_to(0) # do work (waits) The fiberBs wait for fiberA, which in turn waits for a fiberB to return. While the correct fix is to remove the client-side limit and replace it with a server-side per-verb limit, we start with a simpler fix that replaces the blocking lock call with a non-blocking call, removing the deadlock. Fixes #1088. Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>	2016-03-28 10:12:10 +03:00
Tomasz Grabiec	53bbcf4a1e	schema_tables: Wait for notifications to be processed. Listeners may defer since: `93015bcc54` "migration_manager: Make the migration callbacks runs inside seastar thread" Not all places were adjusted to wait for them. Fix that. Message-Id: <1458837613-27616-1-git-send-email-tgrabiec@scylladb.com>	2016-03-24 19:04:12 +02:00
Gleb Natapov	0afd1c6f0a	config: enable truncate_request_timeout_in_ms option Option truncate_request_timeout_in_ms is used by truncate. Mark it as used. Message-Id: <20160323162649.GH2282@scylladb.com>	2016-03-23 18:50:24 +02:00
Benoît Canet	3b1d3d977d	exceptions: Shutdown communications on non file I/O errors Apply the same treatment to non file filesystem I/O errors. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-2-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:54 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Gleb Natapov	c6157dd99e	enable rpc_keepalive parameter Fixes #1044 Message-Id: <20160315104609.GV6117@scylladb.com>	2016-03-15 12:51:12 +02:00
Asias He	93015bcc54	migration_manager: Make the migration callbacks runs inside seastar thread At the moment, the callbacks return void, so it is impossible to wait for the callbacks to complete. Make the callbacks run inside a seastar thread, so if we need to wait for the callback, we can make it call foo_operation().get() in the callback. It is easier than making the callbacks return future<>.	2016-03-15 15:41:23 +08:00
Glauber Costa	a339296385	database: turn sstable generation number into an optional This patch makes sure that every time we need to create a new generation number - the very first step in the creation of a new SSTable, the respective CF is already initialized and populated. Failure to do so can lead to data being overwritten. Extensive details about why this is important can be found in Scylla's Github Issue #1014 Nothing should be writing to SSTables before we have the chance to populate the existing SSTables and calculate what should the next generation number be. However, if that happens, we want to protect against it in a way that does not involve overwriting existing tables. This is one of the ways to do it: every column family starts in an unwriteable state, and when it can finally be written to, we mark it as writeable. Note that this cannot be a part of add_column_family. That adds a column family to a db in memory only, and if anybody is about to write to a CF, that was most likely already called. We need to call this explicitly when we are sure we're ready to issue disk operations safely. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-10 21:06:05 -05:00
Gleb Natapov	f59415b3c6	Take pending endpoints into account while checking for sufficient live nodes During bootstrapping additional copies of data have to be made to ensure that CL level is met (see CASSANDRA-833 for details). Our code does that, but it does not take into account that a bootstrapping node can be dead, which may cause a request to proceed even though there are not enough live nodes for it to be completed. In such a case the request neither completes nor times out, so it appears to be stuck from the CQL layer POV. The patch fixes this by taking into account pending nodes while checking that there are enough live nodes for the operation to proceed. Fixes #965 Message-Id: <20160303165250.GG2253@scylladb.com>	2016-03-07 13:30:13 +01:00
Pekka Enberg	9c930d88a0	db/system_keyspace: Remove ifdef'd code We have our implementations of all the three ifdef'd functions. Message-Id: <1456926917-12594-1-git-send-email-penberg@scylladb.com>	2016-03-03 12:26:50 +02:00
Tomasz Grabiec	04f2482d74	schema_tables: Log results of schema merge Currently schema changes are only logged at the coordinator node which initiates the change. It would be helpful in post-mortem analysis to also see when and how schema changes are resolved when applied on other nodes. Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>	2016-03-03 11:12:15 +02:00
Calle Wilund	0c3322befd	commitlog: Ensure segment survives whole flush call Must keep shared pointer alive. Likewise though, the shared pointer copy in cycle main continuation is not needed. Message-Id: <1456931988-5876-3-git-send-email-calle@scylladb.com>	2016-03-02 18:22:13 +02:00
Calle Wilund	f1c4e3eb3d	commitlog: Clear reserve segments in orphan_all Otherwise they will keep the segment_manager alive (leak). Fixes jenkins ASan errors. Message-Id: <1456931988-5876-2-git-send-email-calle@scylladb.com>	2016-03-02 18:22:09 +02:00
Calle Wilund	a556f665c0	commitlog: Take segment_manager locks first in write/flush While it is formally better to take a local lock first and then contend for a global one, in this case it is arguably better to ensure we get a gate exception synchronously (early) instead of potentially in a continuation. Old version might cause us to do a gate::leave even though it was never entered. And since we should really only have one active (contending) segment per shard anyway, it should not matter. Message-Id: <1456931988-5876-1-git-send-email-calle@scylladb.com>	2016-03-02 18:22:05 +02:00
Paweł Dziepak	d50594351b	db: remove old-style serializers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:09:30 +00:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Calle Wilund	e667dcc3d0	commitlog: Make segment->segment_manager relation shared pointer The segment->segment_manager pointer has, until now, been a raw pointer, which in a way is sensible, since making circular shared pointer relations is in general bad. However, since the code and life cycle of segments has evolved quite a bit since that initial relation was defined, becoming both more and then suddenly, in a sense, less, asynchronous over time, the usage of the relation is in fact more consistent with a shared pointer, in that a segment needs to access its manager to properly do things like write and flush. These two ops in particular depend on accessing the segment manager in a way that might be fine even using raw pointers, if it was not again for that little annoying thing of continuation reordering. So, lets just make the relation a shared pointer, solving the issue of whether the manager is alive when a segment accesses it. If it has been "released" (shut down), the existing mechanisms (gate) will then trigger and prevent any actual _actions_ from taking place. And we don't have to complicate anything else even more. Only "big" change is that we need to explicitly orphan all segments in commitlog destructor (segment_manager is essentially a p-impl). This fixes some spurious crashes in nightly unit tests. Fixes #966. Message-Id: <1456838735-17108-1-git-send-email-calle@scylladb.com>	2016-03-01 16:48:28 +02:00
Paweł Dziepak	dec63eac6e	commitlog: add commitlog entry move constructor Default move constructor and assignment didn't handle reference to mutation (_mutation) properly. Fixes #935. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1456760905-23478-1-git-send-email-pdziepak@scylladb.com>	2016-02-29 18:10:15 +02:00
Calle Wilund	dc136a6a1c	commitlog: Fix reserve counter overflow Fixes #482 See code comment. Reserve segment allocation count sum can temporarily overflow due to continuation delay/reordering, if we manage to reach the on_timer code before finally clauses from previous reserve allocation invocation has processed. However, since these are benign overflows (just indicating even more that we don't need to do anything right now) simply capping the count should be fine. Avoids assert in boost irange. Message-Id: <1456740679-4537-1-git-send-email-calle@scylladb.com>	2016-02-29 14:56:24 +02:00
Avi Kivity	5cc1b39cc9	Merge "Store gossip generation in system table" from Asias "Kill one FIXME."	2016-02-29 14:53:06 +02:00
Asias He	abafec99a5	system_keyspace: Implement increment_and_get_generation	2016-02-29 16:31:42 +08:00
Tomasz Grabiec	697d9bfa56	serializer: Introduce as_input_stream(bytes_view)	2016-02-26 12:26:13 +01:00
Calle Wilund	590ec1674b	truncate: Require timestamp join-function to ensure equal values Fixes #937 In fixing #884, truncation not truncating memtables properly, time stamping in truncate was made shard-local. This however breaks the snapshot logic, since for all shards in a truncate, the sstables should snapshot to the same location. This patch adds a required function argument to truncate (and by extension drop_column_family) that produces a time stamp in a "join" fashion (i.e. same on all shards), and utilizes the joinpoint type in caller to do so. Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>	2016-02-24 18:59:31 +02:00
Avi Kivity	efabb1a1d8	commitlog: fix buffer size calculation We were adding bool(buffer), instead of buffer.size(); exposed by making temporary_buffer::operator bool explicit.	2016-02-24 13:38:05 +02:00
Paweł Dziepak	1b52264dfd	batchlog_manager: use new canonical_mutation serializers Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:12:00 +00:00
Paweł Dziepak	89b75a02d4	commitlog: use IDL-based serialization for entries Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Paweł Dziepak	f548c75200	commitlog: move implementation to *.cc file Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Tomasz Grabiec	6709c0ac15	cql_serialization_format: Make it CQL protocol version aware We want to serialize it as a single number, the CQL binary protocol version to which it corresponds, so it needs to be aware of the version number.	2016-02-15 17:05:55 +01:00
Calle Wilund	18203a4244	database::truncate/drop: Move time stamp generation to shard Fixes #884 Time stamps for truncation must be generated after flush, either by splitting the truncate into two (or more) for-each-shard operations, or simply by doing time stamping per shard (this solution). We generate TS on each shard after flushing, and then rely on the actual stored value to be the highest time point generated. This should however, from batch replay point of view, be functionally equivalent. And not a problem.	2016-02-09 15:45:37 +00:00
Calle Wilund	ce66acc771	system_keyspace: Always retain highest truncation time stamp Since the table is written from all shards, and we possibly might have conflicting time stamps, we define the truncated_at time as the highest time point. I.e. conservative.	2016-02-09 15:45:37 +00:00
Calle Wilund	22a38f0025	db/serializer: Fix db::serializer<replay_position> format Should match struct/"official" serial format. (64+32) This serializer is however not really used any more and could be removed.	2016-02-09 15:45:37 +00:00
Calle Wilund	1c213e1f38	system_keyspace: Use IDL types + better verification of truncation record Truncation records are not portable between us and Origin. We need to detect and ensure we neither try to use, and more to the point, don't crash because of data format error when loading, origin records from a migrated system. This problem was seen by Tzach when doing a migration from an origin setup. Updated record storage to use IDL-serialized types + added versioning and magic marking + odd-size-checking to ensure we load only correct data. The code will also deal with records from an older version of scylla.	2016-02-09 15:45:37 +00:00
Gleb Natapov	63a5aa6122	prevent superfluous frozen_mutation copying Sometimes frozen_mutation is copied while it can be moved instead. Fix those cases. Message-Id: <20160204165708.GI6705@scylladb.com>	2016-02-07 10:54:16 +02:00
Gleb Natapov	c509e48674	Parallelize batchlog replay Current code is serialized by get_truncated_at(). Use map_reduce to make it run in parallel. Message-Id: <1454421603-13080-4-git-send-email-gleb@scylladb.com>	2016-02-02 17:08:54 +01:00
Gleb Natapov	42e3999a00	Check batchlog version before replaying In case batchlog serialization format changes check it before trying to interpret raw data. Message-Id: <1454421603-13080-3-git-send-email-gleb@scylladb.com>	2016-02-02 17:08:54 +01:00
Pekka Enberg	86173fb8cc	db/commitlog: Fix debug log format string in commitlog_replayer::recover() I saw the following Boost format string related warning during commitlog replay: INFO [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log WARN [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed) While inspecting the code, I noticed that one of the error loggers is missing an argument. As I don't know how the original failure triggered, I wasn't able to verify that that was the only one, though. Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>	2016-01-27 13:40:19 +02:00
Asias He	5003c6e78b	config: Introduce shutdown_announce_in_ms option Time a node waits after sending gossip shutdown message in milliseconds. Reduces ./cql_query_test execution time from real 2m24.272s user 0m8.339s sys 0m10.556s to real 1m17.765s user 0m3.698s sys 0m11.578	2016-01-27 11:19:38 +08:00
Calle Wilund	e6b792b2ff	commitlog bugfix: Fix batch mode Last series accidentally broke batch mode. With new, fancy, potentially blocking ways, we need to treat batch mode differently, since in this case, sync should always come _after_ alloc-write. Previous patch caused infinite loop. Broke jenkins. Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>	2016-01-26 17:13:14 +02:00
Glauber Costa	3f94070d4e	use auto&& instead of auto& for priority classes. By Avi's request, who reminds us that auto& is more suited for situations in which we are assigning to the variable in question. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>	2016-01-26 17:00:20 +02:00
Calle Wilund	89dc0f7be3	commitlog: wait for writes (if needed) on new segment as well Also check closed status in allocate, since alloc queue waiting could lead to us re-allocating in a segment that gets closed in between queue enter and us running the continuation. Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>	2016-01-26 15:05:12 +02:00
Calle Wilund	f2c5315d33	commitlog: Add write/flush limits Configured on start (for now - and dummy values at that). When the shard write/flush count reaches the limit, incoming ops will queue until previous ones finish. Consequently, if an allocation op forces a write, which blocks, any other incoming allocations will also queue up to provide back pressure.	2016-01-26 10:19:24 +00:00
Calle Wilund	7628a4dfe0	commitlog: Add some feedback/measurement methods Suitable to derive "back pressure" from.	2016-01-26 09:47:14 +00:00
Calle Wilund	4f5bd4b64b	commitlog: split write/flush counters	2016-01-26 09:47:14 +00:00
Calle Wilund	215c8b60bf	commitlog: minor cleanup - remove red squiggles in eclipse	2016-01-26 09:42:26 +00:00
Glauber Costa	b63611e148	mark I/O operations with priority classes After this patch, our I/O operations will be tagged into a specific priority class. The available classes are 5, and were defined in the previous patch: 1) memtable flush 2) commitlog writes 3) streaming mutation 4) SSTable compaction 5) CQL query Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Vlad Zolotarov	de3bb01582	config: allow enabling the incremental backup via .yaml Enable the incremental_backups/--incremental-backups option. When enabled there will be a hard link created in the <column family directory>/backup directory for every flushed sstable. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-01-21 12:13:24 +02:00
Calle Wilund	59bf54d59a	commitlog_replayer: Modify logging to more match origin * Match origin log messages - Demote per-file printouts to "debug" level. * Print an all-files stat summary for whole replay (begin/summary) - At info level, like origin Prompted by dtest that expects origin log output. Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>	2016-01-19 17:19:52 +02:00
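
Commit efabb1a1d8 in the log above says the commitlog was adding bool(buffer) instead of buffer.size(), and that the bug surfaced once temporary_buffer::operator bool became explicit. The following is a minimal sketch of that class of bug, not the actual Scylla or Seastar code: demo_buffer, buggy_total and fixed_total are hypothetical names invented for illustration, and the buggy form is spelled with explicit casts so that it still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer type (NOT Seastar's temporary_buffer).
class demo_buffer {
    std::vector<char> _data;
public:
    explicit demo_buffer(std::size_t n) : _data(n) {}
    std::size_t size() const { return _data.size(); }
    // Because this conversion is explicit, `total += buf` is a compile
    // error; a non-explicit version would silently add 0 or 1.
    explicit operator bool() const { return !_data.empty(); }
};

// Buggy accumulation: adds 1 per non-empty buffer instead of its size.
// The casts reproduce what an implicit bool conversion would have done.
std::size_t buggy_total(const std::vector<demo_buffer>& bufs) {
    std::size_t total = 0;
    for (const auto& b : bufs) {
        total += static_cast<std::size_t>(static_cast<bool>(b));
    }
    return total;
}

// Fixed accumulation: adds the byte count of each buffer.
std::size_t fixed_total(const std::vector<demo_buffer>& bufs) {
    std::size_t total = 0;
    for (const auto& b : bufs) {
        total += b.size();
    }
    return total;
}

// Usage sketch: two non-empty buffers of 4096 and 512 bytes.
inline void demo() {
    std::vector<demo_buffer> bufs;
    bufs.emplace_back(4096);
    bufs.emplace_back(512);
    assert(buggy_total(bufs) == 2);     // counts buffers, not bytes
    assert(fixed_total(bufs) == 4608);  // 4096 + 512
}
```

Marking the conversion operator explicit is what turns this kind of silent miscount into a compile-time error, which matches how the commit message says the problem was exposed.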


618 Commits