scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Duarte Nunes	a3bbd52e2e	Merge 'Add materialized view metrics' from Piotr " This series introduces materialized view statistics, as stated in issue #3385: - updates pushed - updates failed - row lock stats It also addresses issue #3416 by decoupling user write stats from view update stats. " * 'materialized_view_metrics_9' of https://github.com/psarna/scylla: view: adapt view_stats to act as write stats storage_proxy: decouple write_stats from stats db: add row locking metrics view: add view metrics	2018-05-22 18:41:51 +01:00
Avi Kivity	49892a06b9	Merge "exception safety and minimum work for compaction controller" from Glauber " This was sent before as two separate patchsets. It is now unified because it has a lot of common infrastructure. In this patchset I am aiming at two goals: 1) Provide a minimum amount of shares for user-initiated operations like nodetool compact and nodetool cleanup 2) Be more robust with exceptions in the backlog tracker For the first, the main difference is that I now made the compaction controller a part of the compaction manager. It then becomes easy to consult with the compaction controller for the correct amount of shares those operations should have. In compaction_strategy.cc, the major_compaction_strategy object was actually already unused before. So instead of making use of it, which would require some form of information flow downwards about the backlog we need to export, I am creating a user-initiated backlog type inside the compaction manager. With the two changes described above everything is very well self-contained within the compaction manager and the implementation becomes trivial. For the second, I am now handling exceptions in two places: 1) the backlog computation. Those are const functions so if we just have a transient exception when compacting the backlog, all we need to do is return some fixed amount of shares and try again in the next adjustment window. 2) the process of adding / removing SSTables. Those are harder, since if we fail to manipulate the list we'll be left in an inconsistent state. The best approach is then to disable the backlog tracker and return a fixed amount of shares globally. Tests: unit (release) " * 'backlog-improvements-v3' of github.com:glommer/scylla: compaction_manager: disable backlog tracker if we see an exception backlog tracker: protect against exceptions in backlog calculation. STCS_backlog: protect against negative backlog STCS_backlog: remove unused attribute compaction strategy: move size tiered backlog to a header compaction_strategy: delete major_compaction_strategy class compaction: make sure that user-initiated compactions always have a minimum priority backlog_controller: add constants to represent a globally disabled controller backlog_controller: move compaction controller to the compaction manager backlog_controller: allow users to compute inverse function of shares	2018-05-22 18:35:42 +03:00
Piotr Sarna	3792bed3ed	view: adapt view_stats to act as write stats This commit adapts view_stats structure so it can be passed to storage_proxy as write stats. Thanks to that, mv replica updates will not interfere with user write metrics. As a side effect it also provides more stats to replica view updates. Closes #3385 Closes #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	1d590b3ca4	storage_proxy: decouple write_stats from stats This commit extracts metrics related to writes from stats structure, so it can be easily replaced later, e.g. for materialized view metrics. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	9246bb36bc	db: add row locking metrics This commit adds statistics to the row_locker class. Metrics are independently counted for all lock types: row<->partition and exclusive<->shared. Metrics gathered: - total acquisitions - operations that wait on the lock - histogram of the time spent waiting on this type of lock References #3385 References #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	49bebcfa25	view: add view metrics This commit introduces view statistics: - updates pushed to local/remote replicas - updates failed to be pushed to local/remote replicas Metrics are kept on a per-table basis, i.e. updates_pushed_remote shows the total number of updates (mutations) pushed to all paired mv replicas that this particular table has. Every single update is taken into consideration, so if a view update requires removing a row from one view and adding a row to another, it will be counted as 2 updates. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Tomasz Grabiec	e554a39fbb	tests: memtable_snapshot_source: Fix compact() Compactor collects all currently active memtables and later replaces them with the merged result. The problem is that the active memtable belongs to the input set during compaction, and as a result mutations applied concurrently with compaction could be lost once compaction replaces the memtables. The fix is to open a new active memtable when compaction starts. Caused sporadic failures of row_cache_test.cc:test_continuity_is_populated_when_read_overlaps_with_older_version() Message-Id: <1526997724-13037-1-git-send-email-tgrabiec@scylladb.com>	2018-05-22 15:08:07 +01:00
Glauber Costa	d4e7783188	compaction_manager: disable backlog tracker if we see an exception If we see an exception when adding or removing SSTables from the backlog tracker, the backlog tracker can be inconsistent forever. It would be best if we act before that happens and disable the backlog tracker. Once the backlog tracker is disabled it will default to returning a fixed number of shares. We can either disable the backlog tracker or remove it. But if we remove it we can end up with a backlog of zero if that's the only tracker with a backlog. We then keep it registered but mark it as disabled. This also leaves room for recovery in some situations: we can recover the backlog by doing a schema change in the column family that had the backlog disabled, for instance. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:32 -04:00
Glauber Costa	fde26ec633	backlog tracker: protect against exceptions in backlog calculation. Backlog calculations should be exception free, but there are cases in which I can see them happening. One example is if some backlog tracker that uses temporary objects fails an allocation. Memory shortages can be especially pernicious: if we leave the responsibility of catching those to the individual backlog tracker, we will keep trying to make more allocations in the other backlog trackers if we have many column families. By handling it here we can stop that. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:22 -04:00
Glauber Costa	3e08bd17f0	STCS_backlog: protect against negative backlog A negative backlog can be interpreted as a very large backlog. Part of that is because we keep the total_size as an unsigned type, which is what we expect. But if there is an issue, like an exception that causes some SSTable not to be tracked, this size can become negative. Returning a zero backlog is better than allowing it to be interpreted as a giant number. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:22 -04:00
Glauber Costa	4b4e9f6c8c	STCS_backlog: remove unused attribute This attribute ended up being unused in the final version. Spotted now while reading the code for other purposes. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:22 -04:00
Glauber Costa	10046593be	compaction strategy: move size tiered backlog to a header It's very common for other strategies to include a SizeTiered step somehow inside their algorithms: LCS will do SizeTiered on L0, TWCS will do SizeTiered within a window, etc. To make it easier for those strategies to consume the SizeTiered backlog tracker, we will move that to its own file. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:36:22 -04:00
Glauber Costa	36ccb1dd7c	compaction_strategy: delete major_compaction_strategy class It was already unused before this series. In an earlier version I used it to provide an ad-hoc backlog for major compactions. But now that this is done by the compaction manager, this class really isn't being used. And it likely won't be: major compaction is not a compaction strategy a user can choose, unlike the others that need to be built through make_compaction_strategy. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:33:59 -04:00
Glauber Costa	9320d6f17f	compaction: make sure that user-initiated compactions always have a minimum priority We have observed the following behavior with user-initiated compactions, like major compactions: - if there are no writes, the backlog doesn't increase. - as compaction progresses the backlog decreases. - at some point, the backlog is so low that compaction barely makes any progress. Going forward, we should allow one to read from the generated partial SSTables, in which case this doesn't matter that much. But for user-initiated compactions we would like to guarantee a minimum baseline. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:33:25 -04:00
Glauber Costa	c55ab93178	backlog_controller: add constants to represent a globally disabled controller There are situations in which we want the controllers to stop working altogether. Usually that's when we have an unimplemented controller or some exception. We want to return fixed shares in this case, but this is a very different situation from when we want fixed shares for one backlog tracker: we want to return fixed shares, yes, but if we disable 200 backlog trackers (because they all failed, for instance), we don't want that fixed number x 200 to be our backlog. So the mechanism to globally disable the controller is still granted, and infinity is a good way to represent that. It's a float that the controller can easily test against. But actually using infinity in the code is confusing. People reading it may interpret it as the other way around from what it means, just meaning "a very large backlog". Let's turn that into a constant instead. It will help us convey meaning. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:25:23 -04:00
Glauber Costa	d758a416f8	backlog_controller: move compaction controller to the compaction manager There was recently an attempt to add minimum shares to major compactions which ended up being harder than it should be due to all the plumbing necessary to call the compaction controller from inside the compaction manager, since it is currently a database object. We had this problem again when trying to return fixed shares in case of an exception. Taking a step back, all of those problems stem from the fact that the compaction controller really shouldn't be a part of the database: as it deals with compactions and their consequences it is a lot more natural to have it inside the compaction manager to begin with. Once we do that, all the aforementioned problems go away. So let's move it where it belongs. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-22 09:24:19 -04:00
Calle Wilund	62c3b4c429	commitlog: Ensure file objects are closed before object free Fixes #3446 Previously, only shutdown-synced objects were actually closed, which is wrong. This introduces yet another queue, processed together with the deletion objects, which ensures we explicitly close all objects that have been discarded. Message-Id: <20180521140456.32100-1-calle@scylladb.com>	2018-05-22 14:52:06 +03:00
Duarte Nunes	4b2fd8d6f2	Merge 'Use hinted handoff to replay missed updates from base to view' from Piotr "This series leverages hinted handoff for failed view replica updates." * 'materialized_view_updates_with_hh_5' of https://github.com/psarna/scylla: storage_proxy: enable hinted handoff for materialized views storage_proxy: make view updates use consistency_level::ANY	2018-05-22 11:24:37 +01:00
Paweł Dziepak	05c94bc98d	mutation_partition: do not dereference null in find_cell() row::find_cell() may be called for cells that do not exist in that row. In such case nullptr shall be returned, this patch makes sure that it is not dereferenced. Message-Id: <20180522091726.24396-1-pdziepak@scylladb.com>	2018-05-22 10:31:09 +01:00
Glauber Costa	d3f985ef46	backlog_controller: allow users to compute inverse function of shares There are some situations in which we want to force a specific amount of shares and don't have a backlog. We can provide a function to get that from the controller. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-05-21 19:35:07 -04:00
Avi Kivity	51f5599c75	Merge seastar upstream * seastar a6cb005...5da5d4e (6): > append_challenged_posix_file_impl: Ensure continuation uses non-stale object > utils: make make_visitor() public > tcp: Adjust receive window > tcp: Fix allowed sending size calculation in can_send > tcp: Fix assert in tcp::tcb::output_one > be more descriptive with failed syscalls for filesystem operations Contains alternative fix for #3446 (will also be fixed directly).	2018-05-21 20:35:30 +03:00
Piotr Sarna	f5d6326ced	storage_proxy: enable hinted handoff for materialized views This commit initializes and enables hinted handoff for materialized views, even if HH is not explicitly turned on in config. User writes still use hinted handoff only if it is explicitly enabled, while materialized views are allowed to use it unconditionally in order to store failed replica updates somewhere. Fixes #3383	2018-05-21 17:09:27 +02:00
Piotr Sarna	da0d458f5f	storage_proxy: make view updates use consistency_level::ANY This commit makes view replica updates internally use consistency level ANY, so in case an update fails it will fall back to hinted handoff. References #3383	2018-05-21 17:09:27 +02:00
Piotr Sarna	ba9e8a4f2c	tests: initialize hints directory for cql env This commit initializes hints_directory config value for cql_test_env. It's needed now because materialized views support force-enables hinted handoff. Message-Id: <2aadf35eee329c1f89977c4a55660f330bd9d591.1526914827.git.sarna@scylladb.com>	2018-05-21 18:06:01 +03:00
Botond Dénes	204f6fd478	test.py: print test args when listing failed tests This can be very helpful when a test only fails when run with some particular arguments. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <dac1f7e23afa904156e65c3bb3c8fd52b7e999ff.1526906955.git.bdenes@scylladb.com>	2018-05-21 17:28:18 +03:00
Avi Kivity	f9c2ff1f9c	install: prepare /etc directory install(1) creates missing directories on recent Fedora, but not on CentOS 7. This causes the RPM build (which installs to a pristine tree, without an existing /etc) to fail. Fix by setting up /etc. Tests: rpm (Fedora, CentOS) Message-Id: <20180520124937.20466-1-avi@scylladb.com>	2018-05-21 09:51:46 +02:00
Asias He	db8c3a7059	streaming: Do not use dht::split_ranges_to_shards There is no need to call dht::split_ranges_to_shards to split the token range into a <shard> : <a lot of small ranges> mapping and create a flat mutation reader with a lot of small ranges. Because: 1) The flat mutation reader on each shard only returns data belonging to this local shard, so there is no correctness issue if we do not split and feed only the sub-ranges belonging to this local shard. 2) With murmur3_partitioner_ignore_msb_bits = 12, it is almost certain that given a token range, all the shards will have data for the range anyway. Even if we ask all the shards to work on the token range and some of the shards have no data for it, it is fine. We simply send no data from this shard. Tests: update_cluster_layout_tests.py Message-Id: <ac00cd21d6156c47b74451dd415d627481e48212.1526864222.git.asias@scylladb.com>	2018-05-21 10:42:45 +03:00
Takuya ASADA	5407c34c73	dist/debian: depend on coreutils instead of realpath on Ubuntu 18.04 On Ubuntu 18.04 the realpath package was dropped; it became part of coreutils. Fixes #3445 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180521031954.30815-1-syuu@scylladb.com>	2018-05-21 10:42:05 +03:00
Asias He	0c54c6e16f	storage_service: Add node has left the cluster log Removing a node from the cluster is a major operation, so it deserves a log message. Add a log when a node is removed from the cluster by `nodetool decommission` or `nodetool removenode`. Message-Id: <b6adf34492c8138296911f2b37b39e9dd8ed10a2.1523347916.git.asias@scylladb.com>	2018-05-19 21:47:05 +03:00
Asias He	e20038eb84	streaming: Handle stream_mutation rpc handler on all shards In streaming, the sender sends the mutations on all the local shards in parallel, so it is possible that the receiver handles more than one such connection on the same shard. This is determined by where the tcp connection goes: current rpc ignores the dest shard id when sending the rpc message. For instance, say node1 has 2 shards and node2 has 2 shards. Currently, we can end up with this: Node 1 shard 0 -> Node 2 shard 1, Node 1 shard 1 -> Node 2 shard 1. It is better if we do: Node 1 shard 0 -> Node 2 shard 0, Node 1 shard 1 -> Node 2 shard 1. This patch solves this problem by letting the handler always run on shard = src_cpu_id % smp::count. If sender and receiver have the same shard config, the work is distributed completely evenly. If sender and receiver do not have the same shard config, it is unavoidable that some shards will do more work than others. Tests: dtest update_cluster_layout_tests.py Message-Id: <911827bcf67459a07ec92623a9ed4c4fbba195ca.1524622375.git.asias@scylladb.com>	2018-05-19 21:08:25 +03:00
Calle Wilund	f69a52c475	storage_service: Add more error info to "isolate_on_error" shutdown Fixes #2793 Prints error handle class (commitlog or "other/disk") + exception type and message. While not exhaustive, at least gives a correlation point to (hopefully) other log printouts. Message-Id: <20180509081040.7676-1-calle@scylladb.com>	2018-05-19 21:06:03 +03:00
Piotr Jastrzebski	1520ffe7f5	sstables: check buffer size when reading vints Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6ecbedae818fbef1f67a4472aba4ce443b9df0ee.1525888830.git.piotr@scylladb.com>	2018-05-19 21:01:45 +03:00
Avi Kivity	46a0109608	Merge "Support compression when writing SSTables 3.x." from Vladimir " For compression, SSTables 3.x format uses CRC32 for checksumming compressed chunks as well as for calculating the full file checksum. Also, while for older formats "full checksum" of a compressed data file means a combination of checksums of its compressed chunks, in SSTables 3.x this now reads literally and assumes the checksum of all bytes written, including per-chunk digests. Tests: unit {debug, release} " * 'projects/sstables-30/write-compression/v3' of https://github.com/argenet/scylla: tests: Add unit tests for writing compressed SSTables 3.x. tests: Validate Digest32.crc for SSTables 3.x write tests. tests: Fix invalid Digest file for write_counter_table test. sstables: Support writing compressed SSTables 3.0. sstables: Make compressed streams customizable on checksumming. sstables: Move checksum calculation logic to compressed_output_stream.	2018-05-19 20:52:08 +03:00
Vladimir Krivopalov	d588a7e743	tests: Add unit tests for writing compressed SSTables 3.x. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:08 +03:00
Vladimir Krivopalov	e5ab271863	tests: Validate Digest32.crc for SSTables 3.x write tests. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:08 +03:00
Vladimir Krivopalov	fcc7bad777	tests: Fix invalid Digest file for write_counter_table test. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:07 +03:00
Vladimir Krivopalov	dd00d90a05	sstables: Support writing compressed SSTables 3.0. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:07 +03:00
Vladimir Krivopalov	cc62ad3b69	sstables: Make compressed streams customizable on checksumming. Use either Adler32 or CRC32 while writing to or reading from a compressed stream. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:07 +03:00
Vladimir Krivopalov	5183294676	sstables: Move checksum calculation logic to compressed_output_stream. Previously, compressed_output_stream used to calculate checksum of the supplied chunk and pass it to the 'compression' object to combine with the full checksum calculated on prior writes. Now, all the checksum calculation happens inside compressed_output_stream and 'compression' only stores the result. This is done to loosen ties between two classes and simplify compressed_output_stream customisation with various checksum algorithms. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-05-19 20:52:07 +03:00
Glauber Costa	596a525950	commitlog: don't move pointer to segment We are currently moving the pointer we acquired to the segment inside the lambda in which we'll handle the cycle. The problem is, we also use that same pointer inside the exception handler. If an exception happens we'll access it and we'll crash. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180518125820.10726-1-glauber@scylladb.com>	2018-05-18 17:25:18 +02:00
Avi Kivity	684bb2042d	Merge "Fixes and improvements for gdb LSA commands" from Tomasz * tag 'tgrabiec/fixes-and-improvements-for-gdb-scripts-v1' of github.com:tgrabiec/scylla: gdb: Print live object size from 'scylla lsa-segment' gdb: Extend 'scylla segment-descs' output with full occupancy info gdb: Print allocated object's type name instead of full LSA migrator gdb: Fix LSA migrator discovery gdb: Drop code related to LSA zones gdb: Fix uses of removed segment_descriptor::_lsa_managed lsa: Add use for debug::static_migrators	2018-05-17 15:54:21 +03:00
Tomasz Grabiec	d4a2d22812	gdb: Print live object size from 'scylla lsa-segment'	2018-05-17 14:22:20 +02:00
Tomasz Grabiec	08026a64c5	gdb: Extend 'scylla segment-descs' output with full occupancy info After: 0x600007220000: lsa free=24800 used=106272 81.08% region=0x600000403210 0x600007240000: lsa free=13 used=131059 99.99% region=0x600000403210 0x600007260000: lsa free=23072 used=108000 82.40% region=0x600000403210 0x600007280000: lsa free=16772 used=114300 87.20% region=0x600000403210 0x6000072a0000: lsa free=23996 used=107076 81.69% region=0x600000401410 0x6000072c0000: lsa free=15552 used=115520 88.13% region=0x600000403210	2018-05-17 14:22:20 +02:00
Tomasz Grabiec	abd667d924	gdb: Print allocated object's type name instead of full LSA migrator Before: 0x6000302604e0: live {_vptr.migrate_fn_type = 0x3797a00 <vtable for standard_migrator<cache_entry>+16>, _migrators = std::any containing seastar::lw_shared_ptr<(anonymous namespace)::migrators> = {[contained value] = {_p = 0x600000080a80}}, _align = 8, _index = 0} @ 0x6000302604e8 After: 0x6000302604e0: live cache_entry @ 0x6000302604e8	2018-05-17 14:22:14 +02:00
Tomasz Grabiec	653fcc10bb	gdb: Fix LSA migrator discovery Fixes 'scylla lsa-segment' which broke after recent changes, probably commit `b3699f286d`.	2018-05-17 14:22:14 +02:00
Tomasz Grabiec	bb8f82f43f	gdb: Drop code related to LSA zones LSA zones have been removed.	2018-05-17 14:22:14 +02:00
Tomasz Grabiec	84a7961c23	gdb: Fix uses of removed segment_descriptor::_lsa_managed	2018-05-17 14:22:14 +02:00
Tomasz Grabiec	498a4132c5	lsa: Add use for debug::static_migrators Otherwise GDB complains about it being optimized out, breaking our debug scripts.	2018-05-17 14:22:14 +02:00
Avi Kivity	d9c80cac26	dist: move Red Hat installation from .spec %install to new install.sh Move code to a traditional install.sh script (more traditional would be a "make install", but this is close enough). This allows testing installation independently of packaging. In addition, non-Red Hat-packaging can share much of the code in install.sh. Ref #3243. Tests: build+install rpm Message-Id: <20180517114147.30863-1-avi@scylladb.com>	2018-05-17 13:46:27 +02:00
Avi Kivity	98967da94f	Merge seastar upstream * seastar 0a1a327...a6cb005 (1): > Merge " misc fixes for iotune" from Glauber	2018-05-17 12:42:46 +03:00
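The receiver-side shard rule from the streaming commit e20038eb84 (`shard = src_cpu_id % smp::count`) can be modeled in a few lines. This is a hypothetical Python sketch, not Scylla's C++ code: `pick_shard`, `map_senders`, and `load_per_receiver_shard` are illustrative names, and `receiver_shards` stands in for `smp::count` on the receiving node.

```python
from collections import Counter


def pick_shard(src_cpu_id: int, receiver_shards: int) -> int:
    """Pick the receiving shard for a stream_mutation message.

    Models the rule from the commit message: shard = src_cpu_id % smp::count,
    where receiver_shards plays the role of smp::count on the receiver.
    """
    return src_cpu_id % receiver_shards


def map_senders(sender_shards: int, receiver_shards: int) -> dict:
    """Map every sender shard to the receiver shard that handles its stream."""
    return {src: pick_shard(src, receiver_shards) for src in range(sender_shards)}


def load_per_receiver_shard(sender_shards: int, receiver_shards: int) -> Counter:
    """Count how many sender shards each receiver shard ends up serving."""
    return Counter(map_senders(sender_shards, receiver_shards).values())


if __name__ == "__main__":
    # Same shard config (2 -> 2): each sender shard gets its own receiver shard.
    print(map_senders(2, 2))
    # Different shard config (3 -> 2): receiver shard 0 serves two senders.
    print(load_per_receiver_shard(3, 2))
```

With equal shard counts the mapping is one-to-one, matching the "Node 1 shard 0 -> Node 2 shard 0" example; with unequal counts some receiver shards necessarily serve more than one sender shard, which is the imbalance the commit calls unavoidable.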


15432 Commits