It is possible that a cf is deleted after we create the cf reader. Skip
such cfs to avoid the unnecessary overhead of sending their mutations on
the wire only for the peer node to drop them.
Hook streaming into a gossip callback so we can abort
the stream_session in these cases:
- a node is restarted
- a node is removed from the cluster
Fixes #1001.
Before this patch, reading large ranges from a compressed data file involved
two inefficiencies:
1. The compressed data file was read one compressed chunk at a time.
Such a chunk is around 30 KB in size, well below our desired sstable
read-ahead size (sstable_buffer_size = 128 KB).
2. Because the compressed chunks have variable length (the uncompressed
chunk has a fixed length) they are not aligned to disk blocks, so
consecutive chunks have overlapping blocks which were unnecessarily
read twice.
The fix for both issues is to build the compressed_file_input_stream on
an existing file_input_stream, instead of using direct file IO to read the
individual chunks. file_input_stream takes care of doing the appropriate
amount of read-ahead, and the compressed_file_input_stream layer does the
decompression of the data read from the underlying layer.
Fixes #992.
Historical note: Implementing compressed_file_input_stream on top of
file_input_stream was already tried in the past, and rejected. The problem
at that time was that compressed_file_input_stream's constructor did not
specify the *end* of the range to read, so that when we wanted to read
only a small range we got too much read-ahead beyond the exactly one
compressed chunk that we needed to read. Following the fix to issue #964,
we now know on every streaming read also the intended *end* of the stream,
so we can now use this to stop reading at the end of the last required
chunk, even when we use a read-ahead buffer much larger than a chunk.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457304335-8507-1-git-send-email-nyh@scylladb.com>
We try to be robust against files disappearing (due to any kind of corruption)
inside the data directory.
But if the data directory itself goes missing, that's a situation that we don't
handle correctly. We will keep accepting writes normally, but when we try to
flush the memtable to disk, we'll fail with a system error.
Having the CF directory disappear is not a common thing. But it is also one
that we can easily protect against, by touching all CF directories we know
about on startup.
Fixes #999
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <ed66373dccca11742150a6d08e21ece3980227d3.1457379853.git.glauber@scylladb.com>
- Start a node
- Inject data
- Start another node to bootstrap
- Before the second node finishes streaming, kill the second node
- After a while the node will be removed from the cluster because it does
not manage to join the cluster.
- At this time, messaging_service might keep retrying the
stream_mutations unnecessarily.
To fix, check if the peer node is still a known node in the gossip.
If the peer node of a stream_session is restarted or removed, we should
abort the streaming. It is better to hook the gossip callback in the stream
manager than in each stream_session.
In due time we will have to fix this, but as an interim step, let's use
a "better" magic number.
The problem with 100 is that as soon as the partitions start to get bigger,
we're using too much memory. Since this is multiplied by the number of token
ranges, and happens in every shard, the final number can become really big,
and the amount of resources we use goes up proportionally.
This means that even if we are mistaken about the new number (we probably are),
in this case it is better to err on the side of more conservative resource
usage.
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <97158f3db5734916cee4ccf12eaa66e7402570bb.1457448855.git.glauber@scylladb.com>
When we do a streaming read that knows the expected *end* position of the
read, we can use a large read-ahead buffer, and at the same time, stop
reading at exactly the intended end (or small rounding of it to the DMA
block size) and not waste resources blindly reading a large amount of data
after the end just to fill the read-ahead buffer.
The sstable reading code, both for reading the data file and the index file,
created a file input stream without specifying its end, thereby losing
this optimization - so when a large buffer was used, we would get a large
over-read. This patch fixes this, so the sstable data file and index file are
read using a file input stream which is aware of its end.
Fixes #964.
Note that this patch does not change the behavior when reading a
*compressed* data file. For compressed read, we did not have the problem
of over-read in the first place, because chunks are read one by one.
But we do have other sources of inefficiencies there (stemming, again,
from the fact that the compressed chunks are read one by one), and I
opened a separate issue #992 for that.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457219304-12680-1-git-send-email-nyh@scylladb.com>
Fixes #967
Frozen lists are just atomic cells. However, the old code inserted the
frozen data directly as an atomic_cell_or_collection, which in turn
meant it lacked the header data of a cell. When it was later
handled by internal serialization (freeze), since the schema said
it was not a (non-frozen) collection, we tried to interpret the frozen
list data as a cell header -> most likely considered dead.
Message-Id: <1457432538-28836-1-git-send-email-calle@scylladb.com>
The background_reads collectd counter was not always properly decremented.
Fix it and streamline background read repair error handling.
Message-Id: <20160307182255.GI4849@scylladb.com>
Currently it is waited upon only if the background read repair check is
needed, and this causes an unhandled-exception warning to be printed if
it enters the failed state. Fix this by always waiting on it, but doing
anything beyond ignoring an exception only if the check is needed.
Message-Id: <1457351304-28721-1-git-send-email-gleb@scylladb.com>
Currently write acknowledgement handling does not take bootstrapping
nodes into account for CL=EACH_QUORUM. The patch fixes it.
Fixes #994
Message-Id: <20160307121620.GR2253@scylladb.com>
During bootstrap, additional copies of data have to be made to ensure
that the CL is met (see CASSANDRA-833 for details). Our code does
that, but it does not take into account that a bootstrapping node can be
dead, which may cause a request to proceed even though there are not
enough live nodes for it to be completed. In such a case the request neither
completes nor times out, so it appears to be stuck from the CQL layer's POV.
The patch fixes this by taking pending nodes into account while checking
that there are enough live nodes for the operation to proceed.
Fixes #965
Message-Id: <20160303165250.GG2253@scylladb.com>
From Vlad:
This series modifies the 'database' class to use the internal
_enable_incremental_backups value (initialized with
'incremental_backups' configuration value) instead of using the
'incremental_backups' configuration value directly.
Then we update this internal value in runtime from 'nodetool
enable/disablebackup' API callback so that newly created keyspaces and
column families use the newly configured incremental backup
configuration.
From Vlad:
This series fixes the first part of issue #909 (the second part has a
separate github issue #965) which is a discrepancy between a
storage_service::token_metadata and a gossiper::endpoint_state_map
contents on non-zero shards.
In the region destructor, after the active segment is freed, the pointer
to it is left unchanged. This confuses the remaining parts of the destructor
logic (namely, removal from the region group) which may rely on the
information in region_impl::_active.
In this particular case the problem was that the code removing from the
region group called region_impl::occupancy(), which dereferences
_active if it is not null.
Fixes #993.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
'nodetool enable/disablebackup' callback was modifying only the
existing keyspaces and column families configurations.
However new keyspaces/column families were using
the original 'incremental_backups' configuration value which could
be different from the value configured by 'nodetool enable/disablebackup'
user command.
This patch updates the database::_enable_incremental_backups per-shard
value in addition to updating the existing keyspaces and column families
configurations.
Fixes #845
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Store the "incremental_backups" configuration value in the database
class (and use it when creating a keyspace::config) in order to be
able to modify it in runtime.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
If storage_service::token_metadata is not distributed together with
gossiper::endpoint_state_map, there may be a situation where a non-zero
shard sees a new value in token_metadata (e.g. a newly added node's
token ranges) while still seeing old gossiper::endpoint_state_map
contents (e.g. the newly added node mentioned above may not be present,
thus causing gossiper::is_alive() to return FALSE for that node, while
the node is actually alive and kicking).
To avoid this discrepancy we will always update the token_metadata together
with the endpoint_state_map when we distribute new token_metadata
among shards.
Fixes #909
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
We will need to access it from the storage_service class when replicating
token_metadata.
Rename _shadow_endpoint_state_map -> shadow_endpoint_state_map
according to our coding convention.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
If a timeout happens after the cl promise is fulfilled, but before the
continuation runs, it removes all the data that the cl continuation needs
to calculate the result. Fix this by calculating the result immediately and
returning it in the cl promise instead of delaying this work until the
continuation runs. This has a nice side effect of simplifying digest
mismatch handling and making it exception free.
Fixes #977.
Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>
The read executor may ask for more than one data reply during the digest
resolving stage, but only one result is actually needed to satisfy
a query, so there is no need to store all of them.
Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>
In the digest resolver, for cl to be achieved it is not enough to get the
correct number of replies; there must also be a data reply among them.
The condition in the digest timeout does not check that; fortunately we have
a variable that we set to true when cl is achieved, so use it instead.
Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>
The Ubuntu 14.04 LTS package is currently broken because iotune is not
statically linked against libstdc++; this patch fixes that.
Requires a seastar patch to add --static-stdc++ in configure.py.
Fixes #982
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1456995050-22007-1-git-send-email-syuu@scylladb.com>
1) As explained in commit 697b16414a (gossip: Make gossip message
handling async), in each gossip round we can talk to the 1-3
peer nodes in parallel to reduce the latency of the gossip round.
2) The gossip syn message uses a one-way rpc message, but currently the
returned future of the one-way message becomes ready only when the message
is dequeued for some reason (sent or dropped). If we wait for the one-way
syn message to return, it might block the gossip round for an unbounded
time. To fix, do not wait for it in the gossip round. The downside is there
will be no back pressure to bound the syn messages; however, since the
messages are sent once per second, I think it is fine.
Message-Id: <ea4655f121213702b3f58185378bb8899e422dd1.1456991561.git.asias@scylladb.com>