scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-28 10:41:12 +00:00

Author	SHA1	Message	Date
Botond Dénes	c274fdf2ec	querier: find_querier(): return end() when no querier matches the range When none of the queriers found for the lookup key match the lookup range `_entries.end()` should be returned as the search failed. Instead the iterator returned from the failed `std::find_if()` is returned which, if the find failed, will be the end iterator returned by the previous call to `_entries.equal_range()`. This is incorrect because as long as `equal_range()`'s end iterator is not also `_entries.end()` the search will always return an iterator to a querier regardless of whether any of them actually matches the read range. Fix by returning `_entries.end()` when it is detected that no queriers match the range. Fixes: #3530 (cherry picked from commit `2609a17a23`)	2018-06-28 18:55:15 +03:00
Botond Dénes	5b88d6b4d6	querier_cache: restructure entries storage Currently querier_cache uses a `std::unordered_map<utils::UUID, querier>` to store cache entries and an `std::list<meta_entry>` to store meta information about the querier entries, like insertion order, expiry time, etc. All cache eviction algorithms use the meta-entry list to evict entries in reverse insertion order (LRU order). To make this possible meta-entries keep an iterator into the entry map so that given a meta-entry one can easily erase the querier entry. This however poses a problem as std::unordered_map can possibly invalidate all its iterators when new items are inserted. This is a use-after-free waiting to happen. Another disadvantage of the current solution is that it requires the meta-entry to use a weak pointer to the querier entry so that in case that is removed (as a result of a successful lookup) it doesn't try to access it. This has an impact on all cache eviction algorithms as they have to be prepared to deal with stale meta-entries. Stale meta-entries also unnecessarily consume memory. To solve these problems redesign how querier_cache stores entries completely. Instead of storing the entries in an `std::unordered_map` and storing the meta-entries in an `std::list`, store the entries in an `std::list` and an intrusive-map (index) for lookups. This new design has several advantages over the old one: * The entries will now be in insert order, so eviction strategies can work on the entry list itself, no need to involve additional data structures for this. * All data related to an entry is stored in one place, no data duplication. * Removing an entry automatically removes it from the index as intrusive containers support auto unlink. This means there is no need to store iterators for long terms, risking use-after-free when the container invalidates its iterators. 
Additional changes: * Modify eviction strategies so that they work with the `entry` interface rather than the stored value directly. Ref #3424 (cherry picked from commit `7ce7f3f0cc`)	2018-06-28 18:55:15 +03:00
Botond Dénes	2d626e1cf8	tests/querier_cache: fix memory based eviction test Do increment the key counter after inserting the first querier into the cache. Otherwise two queriers with the same key will be inserted and will fail the test. This problem is exposed by the changes the next patches make to the querier-cache but will be fixed before to maintain bisectability of the code. Fixes: #3529 (cherry picked from commit `b9d51b4c08`)	2018-06-28 18:55:15 +03:00
Avi Kivity	c11bd3e1cf	Merge "Do not allow compaction controller shares to grow indefinitely" from Glauber " We are seeing some workloads with large datasets where the compaction controller ends up with a lot of shares. Regardless of whether or not we'll change the algorithm, this patchset handles a more basic issue, which is the fact that the current controller doesn't set a maximum explicitly, so if the input is larger than the maximum it will keep growing without bounds. It also pushes the maximum input point of the compaction controller from 10 to 30, allowing us to err on the side of caution for the 2.2 release. " * 'tame-controller' of github.com:glommer/scylla: controller: do not increase shares of controllers for inputs higher than the maximum controller: adjust constants for compaction controller (cherry picked from commit `e0eb66af6b`)	2018-06-20 10:58:20 +03:00
Avi Kivity	9df3df92bc	Merge "Try harder to move STCS towards zero-backlog" from Glauber " Tests: unit (release) Before merging the LCS controller, we merged patches that would guarantee that LCS would move towards zero backlog - otherwise the backlog could get too high. We didn't do the same for STCS, our first controlled strategy. So we may end up with a situation where there are many SSTables inducing a large backlog, but they are not yet meeting the minimum criteria for compaction. The backlog, then, never goes down. This patch changes the SSTable selection criteria so that if there is nothing to do, we'll keep pushing towards reaching a state of zero backlog. Very similar to what we did for LCS. " * 'stcs-min-threshold-v4' of github.com:glommer/scylla: STCS: bypass min_threshold unless configure to enforce strictly compaction_strategy: allow the user to tell us if min_threshold has to be strict (cherry picked from commit `f0fc888381`)	2018-06-18 14:21:52 +03:00
Takuya ASADA	8ad9578a6c	dist/debian: add --jobs <njobs> option just like build_rpm.sh On some build environments we may want to limit the number of parallel jobs: ninja-build runs ncpus jobs by default, which may be too many since g++ uses a lot of memory. So support --jobs <njobs> just like the rpm build script does. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180425205439.30053-1-syuu@scylladb.com> (cherry picked from commit `782ebcece4`)	2018-06-14 15:04:50 +03:00
Tomasz Grabiec	4cb6061a9f	tests: row_cache: Reduce concurrency limit to avoid bad_alloc The test uses random mutations. We saw it failing with bad_alloc from time to time. Reduce concurrency to reduce memory footprint. Message-Id: <20180611090304.16681-1-tgrabiec@scylladb.com> (cherry picked from commit `a91974af7a`)	2018-06-14 13:40:00 +02:00
Tomasz Grabiec	1940e6bd95	tests: row_cache: Do not hang when only one of the readers throws Message-Id: <20180531122729.3314-1-tgrabiec@scylladb.com> (cherry picked from commit `b5e42bc6a0`)	2018-06-14 13:40:00 +02:00
Avi Kivity	044cfde5f3	database: stop using incremental selectors There is a bug in incremental_selector for partitioned_sstable_set, so until it is found, stop using it. This degrades scan performance of Leveled Compaction Strategy tables. Fixes #3513. (as a workaround) Introduced: 2.1 Message-Id: <20180613131547.19084-1-avi@scylladb.com> (cherry picked from commit `aeffbb6732`)	2018-06-13 21:04:56 +03:00
Vlad Zolotarov	262a246436	locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting() ec2_snitch::gossiper_starting() calls the base class (default) method, which sets _gossip_started to TRUE and thereby prevents the following reconnectable_snitch_helper registration. Fixes #3454 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com> (cherry picked from commit `2dde372ae6`)	2018-06-12 19:02:19 +03:00
Botond Dénes	799dbb4f2e	forwardable reader: implement fast_forward_to(position_in_partition) Instead of throwing std::bad_function_call. Needed by the foreign_reader unit test. Not sure how other tests didn't hit this before as the test is using `run_mutation_source_tests()`. (cherry picked from commit `50b67232e5`) Fixes #3491.	2018-06-05 12:34:15 +03:00
Shlomi Livne	a2fe669dd3	dist/docker: Switch to Scylla 2.2 repository Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <83b4ff801b283ade512a7035ecea9057a864dcdd.1526995747.git.shlomi@scylladb.com>	2018-06-05 12:34:15 +03:00
Avi Kivity	56de761daf	Update seastar submodule * seastar 7c6ba3a...6f61d74 (1): > tls: Ensure handshake always drains output before return/throw Fixes #3461.	2018-06-05 12:34:15 +03:00
Shlomi Livne	c3187093a3	release: prepare for 2.2.rc2 Signed-off-by: Shlomi Livne <shlomi@scylladb.com>	2018-05-30 17:32:16 +03:00
Avi Kivity	111c2ecf5d	Update scylla-ami submodule * dist/ami/files/scylla-ami 49896ec...6ed71a3 (1): > scylla_install_ami: Update CentOS to latest version	2018-05-28 14:02:43 +03:00
Takuya ASADA	a6ecdbbba6	Revert "dist/ami: update CentOS base image to latest version" This reverts commit `69d226625a`. Since ami-4bf3d731 is Market Place AMI, not possible to publish public AMI based on it. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180523112414.27307-1-syuu@scylladb.com> (cherry picked from commit 6b1b9f9e602c570bbc96692d30046117e7d31ea7)	2018-05-28 13:40:15 +03:00
Glauber Costa	17cc62d0b3	commitlog: don't move pointer to segment We are currently moving the pointer we acquired to the segment inside the lambda in which we'll handle the cycle. The problem is, we also use that same pointer inside the exception handler. If an exception happens we'll access it and we'll crash. Probably #3440. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180518125820.10726-1-glauber@scylladb.com> (cherry picked from commit `596a525950`)	2018-05-19 19:12:26 +03:00
Shlomi Livne	eb646c61ed	release: prepare for 2.2.rc1 Signed-off-by: Shlomi Livne <shlomi@scylladb.com> scylla-2.2.rc1	2018-05-16 21:31:50 +03:00
Avi Kivity	782d817e84	dist: redhat: get rid of raid0.devices_discard_performance This parameter is not available on recent Red Hat kernels or on non-Red Hat kernels (it was removed on 3.10.0-772.el7, RHBZ 1455932). The presence of the parameter on kernels that don't support it cause the module load to fail, with the result that the storage is not available. Fix by removing the parameter. For someone running an older Red Hat kernel the effect will be that discard is disabled, but they can fix that by updating the kernel. For someone running a newer kernel, the effect will be that they can access their data. Fixes #3437. Message-Id: <20180516134913.6540-1-avi@scylladb.com> (cherry picked from commit `3b8118d4e5`)	2018-05-16 20:13:59 +03:00
Avi Kivity	3ed5e63e8a	Update scylla-ami submodule * dist/ami/files/scylla-ami 02b1853...49896ec (1): > Merge "AMI build fix" from Takuya	2018-05-16 12:37:03 +03:00
Tomasz Grabiec	d17ce46983	Update seastar submodule Fixes #3339. * seastar 491f994...7c6ba3a (2): > Merge "fix perftune.py issues with cpu-masks on big machines" from Vlad > Merge 'Handle Intel's NICs in a special way' from Vlad	2018-05-16 09:37:41 +02:00
Takuya ASADA	7ca5e7e993	dist/redhat: replace scylla-libgcc72/scylla-libstdc++72 with scylla-2.2 metapackage We have conflict between scylla-libgcc72/scylla-libstdc++72 and scylla-libgcc73/scylla-libstdc++73, need to replace *72 package with scylla-2.2 metapackage to prevent it. Fixes #3373 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180510081246.17928-1-syuu@scylladb.com> (cherry picked from commit `6fa3c4dcad`)	2018-05-11 09:42:28 +03:00
Duarte Nunes	07b0ce27fa	Merge 'Include OPTIONS with LIST ROLES' from Jesse " Fixes #3420. Tests: dtest (`auth_test.py`), unit (release) " * 'jhk/fix_3420/v2' of https://github.com/hakuch/scylla: cql3: Include custom options in LIST ROLES auth: Query custom options from the `authenticator` auth: Add type alias for custom auth. options (cherry picked from commit `d49348b0e1`)	2018-05-10 13:22:49 +03:00
Amnon Heiman	27be3cd242	scylla-housekeeping: support new 2018.1 path variation Starting from 2018.1 and 2.2 there was a change in the repository path. It was made to support multiple products (like manager) and to place the enterprise in a different path. As a result, the regular expression that looks for the repository fails. This patch changes the way the path is searched: both rpm and debian variations are combined and both options of the repository path are unified. See scylladb/scylla-enterprise#527 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20180429151926.20431-1-amnon@scylladb.com> (cherry picked from commit `6bf759128b`)	2018-05-09 15:22:55 +03:00
Calle Wilund	abf50aafef	database: Fix assert in truncate Fixes crash in cql_tests.StorageProxyCQLTester.table_test "avoid race condition when deleting sstable on behalf..." changed discard_sstables behaviour to only return rp:s for sstables owned and submitted for deletion (not all matching time stamp), which can in some cases cause zero rp returned. Message-Id: <20180508070003.1110-1-calle@scylladb.com>	2018-05-09 10:02:09 +01:00
Duarte Nunes	dfe5b38a43	db/view: Limit number of pending view updates This patch adds a simple and naive mechanism to ensure a base replica doesn't overwhelm a potentially overloaded view replica by sending too many concurrent view updates. We add a semaphore to limit to 100 the number of outstanding view updates. We limit globally per shard, and not per destination view replica. We also limit statically. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180426134457.21290-2-duarte@scylladb.com> (cherry picked from commit `4b3562c3f5`)	2018-05-08 00:46:33 +01:00
Duarte Nunes	9bdc8c25f5	db/view: Return a future when sending view updates While we now send view mutations asynchronously in the normal view write path, other processes interested in sending view updates, such as streaming or view building, may wish to do it synchronously. Signed-off-by: Duarte Nunes <duarte@scylladb.com> (cherry picked from commit `dc44a08370`)	2018-05-08 00:46:19 +01:00
Duarte Nunes	e75c55b2db	db/timeout_clock: Properly scope type names Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180426134457.21290-1-duarte@scylladb.com> (cherry picked from commit `2be75bdfc9`)	2018-05-07 19:29:48 +01:00
Botond Dénes	756feae052	database: when dropping a table evict all relevant queriers Queriers shouldn't outlive the table they read from as that could lead to use-after-free problems when they are destroyed. Fixes: #3414 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <3d7172cef79bb52b7097596e1d4ebba3a6ff757e.1525716986.git.bdenes@scylladb.com> (cherry picked from commit `6f7d919470`)	2018-05-07 21:20:42 +03:00
Tomasz Grabiec	202b4e6797	storage_proxy: Request schema from the coordinator in the original DC The mutation forwarding intermediary (src_addr) may not always know about the schema which was used by the original coordinator. I think this may be the cause of the "Schema version ... not found" error seen in one of the clusters which entered some pathological state: storage_proxy - Failed to apply mutation from 1.1.1.1#5: std::_Nested_exception<schema_version_loading_failed> (Failed to load schema version 32893223-a911-3a01-ad70-df1eb2a15db1): std::runtime_error (Schema version 32893223-a911-3a01-ad70-df1eb2a15db1 not found) Fixes #3393. Message-Id: <1524639030-1696-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `423712f1fe`)	2018-05-07 13:08:40 +03:00
Raphael S. Carvalho	76ac200eff	database: avoid race condition when deleting sstable on behalf of cf truncate After removal of deletion manager, caller is now responsible for properly submitting the deletion of a shared sstable. That's because deletion manager was responsible for holding deletion until all owners agreed on it. Resharding for example was changed to delete the shared sstables at the end, but truncate wasn't changed and so race condition could happen when deleting same sstable at more than one shard in parallel. Change the operation to only submit a shared sstable for deletion in only one owner. Fixes dtest migration_test.TestMigration.migrate_sstable_with_schema_change_test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180503193427.24049-1-raphaelsc@scylladb.com>	2018-05-04 13:10:12 +01:00
Tomasz Grabiec	9aa172fe8e	db: schema_tables: Treat drop of scylla_tables.version as an alter After upgrade from 1.7 to 2.0, nodes will record a per-table schema version which matches that on 1.7 to support the rolling upgrade. Any later schema change (after the upgrade is done) will drop this record from affected tables so that the per-table schema version is recalculated. If nodes perform a schema pull (they detect schema mismatch), then the merge will affect all tables and will wipe the per-table schema version record from all tables, even if their schema did not change. If then only some nodes get restarted, the restarted nodes will load tables with the new (recalculated) per-table schema version, while not restarted nodes will still use the 1.7 per-table schema version. Until all nodes are restarted, writes or reads between nodes from different groups will involve a needless exchange of schema definition. This will manifest in logs with repeated messages indicating schema merge with no effect, triggered by writes: database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f The sync will be performed if the receiving shard forgets the foreign version, which happens if it doesn't process any request referencing it for more than 1 second. This may impact latency of writes and reads. The fix is to treat schema changes which drop the 1.7 per-table schema version marker as an alter, which will switch in-memory data structures to use the new per-table schema version immediately, without the need for a restart. 
Fixes #3394 Tests: - dtest: schema_test.py, schema_management_test.py - reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git - unit (release) Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com> (cherry picked from commit `b1465291cf`)	2018-05-03 10:51:19 +03:00
Takuya ASADA	c4af043ef7	dist/common/scripts/scylla_raid_setup: prevent 'device or resource busy' on creating mdraid device According to this web site, there is possibility we have race condition with mdraid creation vs udev: http://dev.bizo.com/2012/07/mdadm-device-or-resource-busy.html And looks like it can happen on our AMI, too (see #2784). To initialize RAID safely, we should wait udev events are finished before and after mdadm executed. Fixes #2784 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1505898196-28389-1-git-send-email-syuu@scylladb.com> (cherry picked from commit `4a8ed4cc6f`)	2018-04-24 12:53:34 +03:00
Raphael S. Carvalho	06b25320be	sstables: Fix bloom filter size after resharding by properly estimating partition count We were feeding the total estimated partition count of an input shared sstable to the output unshared ones. So the sstable writer thinks, from the estimate, that each sstable created by resharding will have the same data amount as the shared sstable they are being created from. That's a problem because the estimate is fed to bloom filter creation, which directly influences its size. So if we're resharding all sstables that belong to all shards, the disk usage taken by filter components will be multiplied by the number of shards. That becomes more of a problem with #3302. Partition count estimation for a shard S will now be done as follows: // // TE, the total estimated partition count for a shard S, is defined as // TE = Sum(i = 0...N) { Ei / Si }. // // where i is an input sstable that belongs to shard S, // Ei is the estimated partition count for sstable i, // Si is the total number of shards that own sstable i. Fixes #2672. Refs #3302. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com> (cherry picked from commit `11940ca39e`)	2018-04-24 12:53:34 +03:00
Takuya ASADA	ff70d9f15c	dist: Drop AmbientCapabilities from scylla-server.service for Debian 8 Debian 8 causes "Invalid argument" when we used AmbientCapabilities on systemd unit file, so drop the line when we build .deb package for Debian 8. For other distributions, keep using the feature. Fixes #3344 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20180423102041.2138-1-syuu@scylladb.com> (cherry picked from commit `7b92c3fd3f`)	2018-04-24 12:53:34 +03:00
Avi Kivity	9bbd5821a2	Update scylla-ami submodule * dist/ami/files/scylla-ami 9b4be70...02b1853 (1): > scylla_install_ami: remove the host id file after scylla_setup	2018-04-24 12:53:34 +03:00
Avi Kivity	a7841f1f2e	release: prepare for 2.2.rc0	2018-04-18 11:08:43 +03:00
Takuya ASADA	84859e0745	dist/debian: use ~root as HOME to place .pbuilderrc When 'always_set_home' is specified in /etc/sudoers, pbuilder won't read .pbuilderrc from the current user's home directory, and we don't have a way to change that behavior with a sudo command parameter. So let's use ~root/.pbuilderrc and switch to HOME=/root when sudo is executed; this works in both environments, whether or not always_set_home is specified. Fixes #3366 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1523926024-3937-1-git-send-email-syuu@scylladb.com> (cherry picked from commit `ace44784e8`)	2018-04-17 09:38:43 +03:00
Avi Kivity	6b74e1f02d	Update seastar submodule * seastar bcfbe0c...491f994 (3): > tls: Ensure we always pass through semaphores on shutdown > cpu scheduler: don't penalize first group to run > reactor: fix sleep mode Fixes #3350.	2018-04-14 20:44:11 +03:00
Avi Kivity	520f17b315	Point seastar submodule at scylla-seastar.git This allows backporting seastar patches.	2018-04-14 20:43:28 +03:00
Gleb Natapov	9fe3d04f31	cql_server: fix a race between closing of a connection and notifier registration There is a race between cql connection closure and notifier registration. If a connection is closed before notification registration is complete, a stale pointer to the connection will remain in the notification list, since the attempt to unregister the connection will happen too early. The fix is to move notifier unregistration after the connection's gate is closed, which will ensure that there is no outstanding registration request. But this means that now a connection with a closed gate can be in the notifier list, so with_gate() may throw and abort a notifier loop. Fix that by replacing with_gate() by a call to is_closed(); Fixes: #3355 Tests: unit(release) Message-Id: <20180412134744.GB22593@scylladb.com> (cherry picked from commit `1a9aaece3e`)	2018-04-12 16:57:07 +03:00
Raphael S. Carvalho	a74183eb1e	sstables/compaction_manager: do not break lcs invariant by not allowing parallel compaction for it After change to serialize compaction on compaction weight (`eff62bc61e`), LCS invariant may break because parallel compaction can start, and it's not currently supported for LCS. The condition is that weight is deregistered right before last sstable for a leveled compaction is sealed, so it may happen that a new compaction starts for the same column family meanwhile that will promote a sstable to an overlapping token range. That leads to strategy restoring invariant when it finds the overlapping, and that means wasted resources. The fix is about removing a fast path check which is incorrect now because we release weight early and also fixing a check for ongoing compaction which prevented compaction from starting for LCS whenever weight tracker was not empty. Fixes #3279. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180410034538.30486-1-raphaelsc@scylladb.com> (cherry picked from commit `638a647b7d`)	2018-04-10 20:59:48 +03:00
Raphael S. Carvalho	e059f17bf2	database: make sure sstable is also forwarded to shard responsible for its generation After `f59f423f3c`, sstable is loaded only at shards that own it so as to reduce the sstable load overhead. The problem is that a sstable may no longer be forwarded to a shard that needs to be aware of its existence which would result in that sstable generation being reallocated for a write request. That would result in a failure as follows: "SSTable write failed due to existence of TOC file for generation..." This can be fixed by forwarding any sstable at load to all its owner shards and the shard responsible for its generation, which is determined as follows: s = generation % smp::count Fixes #3273. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180405035245.30194-1-raphaelsc@scylladb.com> (cherry picked from commit `30b6c9b4cd`)	2018-04-05 10:58:29 +03:00
Duarte Nunes	0e8e005357	db/view: Reject view entries with non-composite, empty partition key Empty partition keys are not supported on normal tables - they cannot be inserted or queried (surprisingly, the rules for composite partition keys are different: all components are then allowed to be empty). However, the (non-composite) partition key of a view could end up being empty if that column is: a base table regular column, a base table clustering key column, or a base table partition key column, part of a composite key. Fixes #3262 Refs CASSANDRA-14345 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180403122244.10626-1-duarte@scylladb.com> (cherry picked from commit `ec8960df45`)	2018-04-03 17:20:33 +03:00
Glauber Costa	8bf6f39392	docker: default docker to overprovisioned mode. By default, overprovisioned is not enabled on docker unless it is explicitly set. I have come to believe that this is a mistake. If the user is running alone in the machine, and there are no other processes pinned anywhere - including interrupts - not running overprovisioned is the best choice. But everywhere else, it is not: even if a user runs 2 docker containers in the same machine and statically partitions CPUs with --smp (but without cpuset) the docker containers will pin themselves to the same sets of CPU, as they are totally unaware of each other. It is also very common, specially in some virtualized environments, for interrupts not to be properly distributed - being particularly keen on being delivered on CPU0, a CPU which Scylla will pin by default. Lastly, environments like Kubernetes simply don't support pinning at the moment. This patch enables the overprovisioned flag if it is explicitly set - like we did before - but also by default unless --cpuset is set. Fixes #3336. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180331142131.842-1-glauber@scylladb.com> (cherry picked from commit `ef84780c27`)	2018-04-02 17:07:20 +03:00
Glauber Costa	04ba51986e	parse and ignore background writer controller Unused options are not exposed as command line options and will prevent Scylla from booting when present, although they can still be passed over YAML, for Cassandra compatibility. That has never been a problem, but we have been adding options to i3 (and others) that are now deprecated, but were previously marked as Used. Systems with those options may have issues upgrading. While this problem is common to all Unused options, the likelihood for any other unused option to appear in the command line is near zero, except for those two - since we put them there ourselves. There are two ways to handle this issue: 1) Mark them as Used, and just ignore them. 2) Add them explicitly to boost program options, and then ignore them. The second option is preferred here, because we can add them as hidden options in program_options, meaning they won't show up in the help. We can then just print a discrete message saying that those options are, from now on, ignored. v2: mark set as const (Botond) v3: rebase on top of master, indentation suggested by Duarte. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180329145517.8462-1-glauber@scylladb.com> (cherry picked from commit `a9ef72537f`)	2018-03-29 17:57:43 +03:00
Asias He	1d5379c462	gossip: Relax generation max difference check start node 1 2 3 shutdown node2 shutdown node1 and node3 start node1 and node3 nodetool removenode node2 clean up all scylla data on node2 bootstrap node2 as a new node I saw node2 could not bootstrap stuck at waiting for schema information to complete forever: On node1, node3 [shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704 On node2 [shard 0] storage_service - JOINING: waiting for schema information to complete This is because in the nodetool removenode operation, the generation of node1 was increased from 0 to 2. gossiper::advertise_removing () calls eps.get_heart_beat_state().force_newer_generation_unsafe(); gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe(); Each force_newer_generation_unsafe increases the generation by 1. Here is an example, Before nodetool removenode: ``` curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "generation": 0, "is_alive": false, "update_time": 1521778757334, "version": 0 }, ``` After nodetool removenode: ``` curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" \| python -mjson.tool { "addrs": "127.0.0.2", "application_state": [ { "application_state": 0, "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246", "version": 214 }, { "application_state": 6, "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a", "version": 153 } ], "generation": 2, "is_alive": false, "update_time": 1521779276246, "version": 0 }, ``` In gossiper::apply_state_locally, we have this check: ``` if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) { // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself) 
logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation); } ``` to skip the gossip update. To fix, we relax generation max difference check to allow the generation of a removed node. After this patch, the removed node bootstraps successfully. Tests: dtest:update_cluster_layout_tests.py Fixes #3331 Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com> (cherry picked from commit `f539e993d3`)	2018-03-29 12:10:09 +03:00
Avi Kivity	cb5dc56bfd	Update scylla-ami submodule Ref #3332.	2018-03-29 10:35:54 +03:00
Duarte Nunes	b578b492cd	column_family: Don't retry flushing memtable if shutdown is requested Since we just keep retrying, this can cause Scylla to not shutdown for a while. The data will be safe in the commit log. Note that this patch doesn't fix the issue when shutdown goes through storage_service::drain_on_shutdown - more work is required to handle that case. Ref #3318. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180324140822.3743-3-duarte@scylladb.com> (cherry picked from commit `a985ea0fcb`)	2018-03-26 15:26:56 +03:00
Duarte Nunes	30c950a7f6	column_family: Increase scope of exception handling when flushing a memtable In column_family::try_flush_memtable_to_sstable, the handle_exception() block is on the inside of the continuations to write_memtable_to_sstable(), which, if it fails, will leave the sstable in the compaction_backlog_tracker::_ongoing_writes map, which will waste disk space, and that sstable will map to a dangling pointer to a destroyed database_sstable_write_monitor, which causes a seg fault when accessed (for example, through the backlog_controller, which accounts the _ongoing_writes when calculating the backlog). Fix this by increasing the scope of handle_exception(). Fixes #3315 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180324140822.3743-2-duarte@scylladb.com> (cherry picked from commit `50ad37d39b`)	2018-03-26 15:26:54 +03:00
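
The iterator bug described in commit c274fdf2ec (`find_querier()`) can be illustrated with a small, self-contained sketch. This is not the Scylla code: `entry`, `entry_map`, `find_buggy()` and `find_fixed()` are hypothetical names, and a `std::multimap` stands in for the intrusive index that `querier_cache` actually uses. The sketch only shows why the result of a failed `std::find_if()` over an `equal_range()` must be translated into the container's own `end()`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a cached querier: only the range it covers.
struct entry {
    int range_start;
    int range_end;
};

// Assumption: an ordered multimap replaces the real intrusive index.
using entry_map = std::multimap<std::string, entry>;

// Mirrors the bug: a failed find_if() returns the end of the equal_range(),
// which is usually NOT entries.end() and may point at an unrelated entry.
inline entry_map::const_iterator find_buggy(const entry_map& entries,
                                            const std::string& key,
                                            int start, int end) {
    auto range = entries.equal_range(key);
    return std::find_if(range.first, range.second, [&](const auto& kv) {
        return kv.second.range_start == start && kv.second.range_end == end;
    });
}

// Mirrors the fix: report a failed search as entries.end().
inline entry_map::const_iterator find_fixed(const entry_map& entries,
                                            const std::string& key,
                                            int start, int end) {
    auto range = entries.equal_range(key);
    auto it = std::find_if(range.first, range.second, [&](const auto& kv) {
        return kv.second.range_start == start && kv.second.range_end == end;
    });
    return it == range.second ? entries.end() : it;
}

// Small demonstration: key "a" exists, but not with the range [5, 6].
inline void demo_find() {
    entry_map entries;
    entries.emplace("a", entry{0, 10});
    entries.emplace("b", entry{0, 10});

    // The buggy lookup "finds" the entry stored under key "b".
    assert(find_buggy(entries, "a", 5, 6) != entries.end());
    // The fixed lookup correctly reports a miss.
    assert(find_fixed(entries, "a", 5, 6) == entries.end());
    // A genuine match is still returned.
    assert(find_fixed(entries, "a", 0, 10)->first == "a");
}
```

A caller that compares the result against `entries.end()` gets the right answer only from the fixed variant.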


14876 Commits
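
The per-shard estimate from commit 06b25320be, TE = Sum(i = 0...N) { Ei / Si }, can be worked through with a minimal sketch. The names `input_sstable` and `estimate_partitions_for_shard()` are hypothetical and not taken from the Scylla sources. Rounding the sum up is an assumption made here so that a non-empty input never yields a zero estimate; the commit message does not say how fractions are handled.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical description of one input sstable that belongs to shard S.
struct input_sstable {
    std::uint64_t estimated_partitions; // Ei
    unsigned owner_shards;              // Si, number of shards owning sstable i
};

// TE = Sum(i) { Ei / Si }, computed over the inputs owned by one shard.
// Assumption: the result is rounded up; entries with Si == 0 are skipped.
inline std::uint64_t
estimate_partitions_for_shard(const std::vector<input_sstable>& inputs) {
    double total = 0.0;
    for (const auto& sst : inputs) {
        if (sst.owner_shards == 0) {
            continue; // malformed input, nothing sensible to add
        }
        total += static_cast<double>(sst.estimated_partitions) / sst.owner_shards;
    }
    return static_cast<std::uint64_t>(std::ceil(total));
}
```

For inputs of 1000 partitions shared by 4 shards, 600 shared by 2 and 50 owned by 1, the estimate is 250 + 300 + 50 = 600. Feeding the unshared totals instead would give 1650 and size the bloom filter accordingly.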