Compare commits

...

62 Commits

Author SHA1 Message Date
Gleb Natapov
2d630e068b mutation_query_test: add test for result size calculation
Check that digest-only and digest+data queries calculate the same result
size.

Message-Id: <20180906153800.GK2326@scylladb.com>
(cherry picked from commit 9e438933a2)
2018-09-08 18:55:23 +03:00
Gleb Natapov
5a8e9698d8 mutation_partition: accurately account for result size in digest only queries
When measuring_output_stream is used to calculate a result element's size,
it incorrectly takes into account not only the serialized element size, but
also a placeholder that the ser::qr_partition__rows/qr_partition__static_row__cells
constructors put at the beginning. Fix it by taking the starting point in the
stream before element serialization and subtracting it afterwards.
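The fix can be illustrated with a minimal sketch (the class and function names here are illustrative, not Scylla's actual serialization code): the element's size is the difference between the stream position after and before serializing it, so placeholder bytes written earlier by enclosing constructors are never attributed to the element.

```cpp
#include <cstddef>
#include <string_view>

// Toy byte-counting stream standing in for measuring_output_stream.
struct measuring_output_stream {
    std::size_t _size = 0;
    void write(std::string_view bytes) { _size += bytes.size(); }
    std::size_t position() const { return _size; }
};

// Returns the size of just this element, no matter what was written before.
std::size_t serialize_and_measure(measuring_output_stream& out, std::string_view element) {
    const std::size_t start = out.position(); // starting point before serialization
    out.write(element);
    return out.position() - start;            // subtract afterwards
}
```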

Fixes #3755

Message-Id: <20180906153609.GJ2326@scylladb.com>
(cherry picked from commit d7674288a9)
2018-09-08 18:55:23 +03:00
Gleb Natapov
64f1aa8d99 mutation_partition: correctly measure static row size when doing digest calculation
The code uses the wrong output stream when only a digest is requested,
and thus computes an incorrect data size. Failing to correctly account
for the static row size while calculating a digest may cause a digest
mismatch between the digest and data queries.

Fixes #3753.

Message-Id: <20180905131219.GD2326@scylladb.com>
(cherry picked from commit 98092353df)
2018-09-06 16:51:31 +03:00
Eliran Sinvani
280e6eedb9 cql3: ensure repeated values in IN clauses don't return repeated rows
When the IN list for a single column contains duplicate values,
multiple executors are activated, since the assumption is that each
value in the IN list corresponds to a different partition. This results
in the same row appearing in the result once per duplicate of the
partition value.
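The idea behind the fix can be sketched as follows (a hedged illustration with invented names, not the actual cql3 code): deduplicate the IN-list values before fanning out one per-partition read per value, so a repeated value cannot yield the same row twice.

```cpp
#include <algorithm>
#include <vector>

// Collapse duplicate IN-list values so each surviving value maps to at most
// one per-partition read. Sorting first lets std::unique remove all repeats.
std::vector<int> distinct_in_values(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```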

Added queries to the IN-restriction unit test and fixed a bad result check.

Fixes #2837
Tests: queries as in the use case from the GitHub issue, in both prepared
and plain form (using the Python driver); unit tests.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
(cherry picked from commit d734d316a6)
2018-08-26 15:52:18 +03:00
Tomasz Grabiec
f80f15a6af Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

 std::experimental::optional<clustering_key> row_key;
 if (!column.is_static()) {
   row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
 }
 ...
 auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

* https://github.com/duarten/scylla cql-list-fixes/v1:
  tests/cql_query_test: Test multi-cell static list updates with ckeys
  cql3/lists: Fix multi-cell static list updates in the presence of ckeys
  keys: Add factory for an empty clustering_key_prefix_view

(cherry picked from commit 6937cc2d1c)
2018-08-21 17:37:36 +01:00
Duarte Nunes
d0eb0c0b90 cql3/query_options: Use _value_views in prepare()
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor that calls
transport::request::read_options() leaves _values completely empty.

In query_options::prepare(), however, we were using _values to
associate values with the client-specified column names, rather than
_value_views. Fix this by using _value_views instead.

As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x04 query options flag, which means
column names are omitted. This is the right thing to do since most
drivers have enough information to correctly position the values.

Fixes #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
(cherry picked from commit a4355fe7e7)
2018-08-21 18:24:06 +03:00
Jesse Haber-Kucharsky
1427c4d428 auth: Don't use unsupported hashing algorithms
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.

This is consistent with the documentation of the function in its man
page.

As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".

The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):

    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.

Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.

With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.
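A minimal sketch of such a validity check (an illustration, not the patch itself), covering both failure conventions quoted from the info pages above:

```cpp
#include <string_view>

// A crypt()/crypt_r() result is unusable if it is NULL (old-style failure)
// or an invalid hashed passphrase beginning with '*' (e.g. "*0" on Fedora 28).
// Both cases must be treated as "hashing algorithm unsupported".
bool crypt_result_is_valid(const char* encrypted) {
    if (encrypted == nullptr) {
        return false;                          // old-style failure: NULL
    }
    std::string_view v{encrypted};
    return !v.empty() && v.front() != '*';     // new-style failure: "*0", ...
}
```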

Fixes #3637.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
(cherry picked from commit fce10f2c6e)
2018-08-05 10:30:58 +03:00
Gleb Natapov
034f2cb42d cache_hitrate_calculator: fix race when new table is added during calculations
The calculation consists of several parts with preemption points between
them, so a table can be added while a calculation is ongoing. Do not
assume that the table exists in the intermediate data structure.

Fixes #3636

Message-Id: <20180801093147.GD23569@scylladb.com>
(cherry picked from commit 44a6afad8c)
2018-08-01 14:30:58 +03:00
Amos Kong
e043a5c276 scylla_setup: fix conditional statement of silent mode
Commit 300af65555 introduced a problem in a conditional statement:
in silent mode, the script would always abort, regardless of the
return value.

Fixes #3485

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <1c12ab04651352964a176368f8ee28f19ae43c68.1528077114.git.amos@scylladb.com>
(cherry picked from commit 364c2551c8)
2018-07-25 12:34:11 +03:00
Takuya ASADA
5da9bd3a6e dist/common/scripts/scylla_setup: abort running script when one of setup failed in silent mode
The current script silently continues even when one of the setup steps
fails; it needs to abort instead.

Fixes #3433

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180522180355.1648-1-syuu@scylladb.com>
(cherry picked from commit 300af65555)
2018-07-25 12:34:11 +03:00
Avi Kivity
3578027e2e Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals once the cache is cleared.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if
current clustering range covers all rows, which means that the populator has a
right to set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts before all
clustering rows. Otherwise, we're populating since _last_row and should
consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()

(cherry picked from commit 31151cadd4)
2018-07-25 12:34:11 +03:00
Shlomi Livne
7d2150a057 release: prepare for 2.1.6
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-07-01 22:35:26 +03:00
Avi Kivity
afd3c571cc Merge "Backport Disable sstable filtering based on min/max clustering key components" to 2.1" from Tomasz
"
Changes made:
  - switched the test to use do_with_cql_env_thread due to lack of SEASTAR_TEST_CASE_THREAD macro
  - imported make_local_key() from master, needed for the database_test to pass
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1-branch-2.1' of github.com:tgrabiec/scylla:
  Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
  tests: simple_schema: Generate local keys from make_pkeys()
  tests: Import make_local_key() from master
2018-06-28 12:41:00 +03:00
Avi Kivity
093c8512db Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate the cache, which will appear as a partition deletion or
writes to the static row being lost, until node restart or eviction of
the partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads the cache populator to assume there is no
partition in the sstables, so it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to it until it
is evicted from the cache or the node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
  tests: Add test for slicing a mutation source with date tiered compaction strategy
  tests: Check that database conforms to mutation source
  database: Disable sstable filtering based on min/max clustering key components

(cherry picked from commit e1efda8b0c)
2018-06-28 11:10:41 +02:00
Tomasz Grabiec
9c0b8ec736 tests: simple_schema: Generate local keys from make_pkeys()
Extracted from commit 2b0b703615
2018-06-28 11:10:41 +02:00
Tomasz Grabiec
1794b732b0 tests: Import make_local_key() from master
Imported from master at 8a25bd467c69df94ea3f3638b42d36beee20adf0
2018-06-28 11:10:41 +02:00
Avi Kivity
c1ac4fb8b0 Update seastar submodule
* seastar 2a2c1d2...c89c8b8 (1):
  > tests/test-utils: Add macro for running tests within a seastar thread

Needed for tests in the following patch.
2018-06-28 10:00:05 +03:00
Asias He
2e7e59fb50 gossip: Fix tokens assignment in assassinate_endpoint
The tokens vector is defined a few lines above and is needed outside the
if block.

Do not redefine it inside the if block; otherwise, the tokens vector will
be empty.
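The bug class can be reproduced in a few lines (a toy illustration with invented names, not the actual gossip code): the redefinition inside the if block shadows the outer vector, so everything pushed there is lost when the block ends.

```cpp
#include <string>
#include <vector>

// BUG version: the inner declaration shadows the outer `tokens`,
// so the outer vector is always returned empty.
std::vector<std::string> collect_tokens_buggy(bool have_tokens) {
    std::vector<std::string> tokens;
    if (have_tokens) {
        std::vector<std::string> tokens; // shadows the outer vector
        tokens.push_back("t1");          // lost when the block ends
    }
    return tokens; // always empty
}

// FIXED version: assign to the vector defined above the if block.
std::vector<std::string> collect_tokens_fixed(bool have_tokens) {
    std::vector<std::string> tokens;
    if (have_tokens) {
        tokens.push_back("t1");
    }
    return tokens;
}
```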

Found by code inspection.

Fixes #3551.

Message-Id: <c7a06375c65c950e94236571127f533e5a60cbfd.1530002177.git.asias@scylladb.com>
(cherry picked from commit c3b5a2ecd5)
2018-06-27 12:00:58 +03:00
Vladimir Krivopalov
af29d4bed3 Fix Scylla compilation with Crypto++ v6.
In Crypto++ v6, the `byte` typedef has been moved from the global
namespace to the `CryptoPP::` namespace.

This fix brings in the CryptoPP namespace so that the `byte` typedef is
seen with both old and new versions of Crypto++.

Fixes #3252.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <799d055be710231884d101a52c0be8ed8b0a9806.1520125889.git.vladimir@scylladb.com>
(cherry picked from commit 99bd5180ba)
2018-06-25 17:49:32 +03:00
Shlomi Livne
72494bbe05 release: prepare for 2.1.5
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-06-19 09:05:55 +03:00
Avi Kivity
5784823888 Update scylla-ami submodule
* dist/ami/files/scylla-ami c5d9e96...0df779d (1):
  > scylla_install_ami: Update CentOS to latest version

Fixes #3523.
2018-06-17 12:12:21 +03:00
Takuya ASADA
a7633be1a9 Revert "dist/ami: update CentOS base image to latest version"
This reverts commit 69d226625a.
Since ami-4bf3d731 is a Marketplace AMI, it is not possible to publish a
public AMI based on it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180523112414.27307-1-syuu@scylladb.com>
(cherry picked from commit 55d6be9254)
2018-06-17 11:33:55 +03:00
Takuya ASADA
e78ded74ce dist/debian: add --jobs <njobs> option just like build_rpm.sh
In some build environments we may want to limit the number of parallel
jobs: ninja-build runs ncpus jobs by default, which may be too many since
g++ consumes a lot of memory.
So support --jobs <njobs> just like the rpm build script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180425205439.30053-1-syuu@scylladb.com>
(cherry picked from commit 782ebcece4)
2018-06-14 15:05:09 +03:00
Avi Kivity
6615c2a6a9 database: stop using incremental selectors
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6732)
2018-06-14 10:52:39 +03:00
Vlad Zolotarov
11500ccd3a locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting()
ec2_snitch::gossiper_starting() calls the base class (default) method,
which sets _gossip_started to true and thereby prevents the subsequent
reconnectable_snitch_helper registration.

Fixes #3454

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
(cherry picked from commit 2dde372ae6)
2018-06-14 10:52:39 +03:00
Shlomi Livne
955f3eeb56 release: prepare for 2.1.4
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-06-06 11:27:01 +03:00
Avi Kivity
08bfd96774 Update seastar submodule
* seastar 675acd5...2a2c1d2 (1):
  > tls: Ensure handshake always drains output before return/throw

Fixes #3461.
2018-05-31 12:06:13 +03:00
Mika Eloranta
f6c4d558eb build: fix rpm build script --jobs N handling
Fixes argument misquoting in the $SRPM_OPTS expansion for the mock
commands and makes the --jobs argument work as intended.

Signed-off-by: Mika Eloranta <mel@aiven.io>
Message-Id: <20180113212904.85907-1-mel@aiven.io>
(cherry picked from commit 7266446227)
2018-05-27 10:25:26 +03:00
Avi Kivity
0040ff6de2 Update seastar submodule
* seastar 0e6dcd5...675acd5 (1):
  > net/tls: Wait for output to be sent when shutting down

Fixes #3459.
2018-05-24 12:03:10 +03:00
Glauber Costa
c238bc7a81 commitlog: don't move pointer to segment
We are currently moving the pointer we acquired to the segment inside
the lambda in which we'll handle the cycle.

The problem is, we also use that same pointer inside the exception
handler. If an exception happens we'll access it and we'll crash.
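A hedged illustration of the bug class (all names here are invented, not the commitlog code): moving the shared pointer into the lambda leaves the local variable null, so the error path that still reads through it would dereference a null pointer. Capturing a copy keeps both references valid.

```cpp
#include <memory>
#include <string>

struct segment { std::string name = "seg-0"; };

// Capture the pointer BY COPY, not with std::move(seg), so `seg` stays
// usable in the error-handling path below the lambda's creation.
std::string cycle_fixed(std::shared_ptr<segment> seg) {
    auto handler = [seg] { return seg->name; };
    if (!seg) {
        // With `[seg = std::move(seg)]` above, we would land here (or crash
        // on a dereference) because the local `seg` would already be null.
        return "lost segment";
    }
    return handler();
}
```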

Probably #3440.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180518125820.10726-1-glauber@scylladb.com>
(cherry picked from commit 596a525950)
2018-05-19 19:13:58 +03:00
Avi Kivity
3b984a4293 dist: redhat: get rid of raid0.devices_discard_performance
This parameter is not available on recent Red Hat kernels or on
non-Red Hat kernels (it was removed in 3.10.0-772.el7,
RHBZ 1455932). The presence of the parameter on kernels that don't
support it causes the module load to fail, with the result that the
storage is not available.

Fix by removing the parameter. For someone running an older Red Hat
kernel the effect will be that discard is disabled, but they can fix
that by updating the kernel. For someone running a newer kernel, the
effect will be that they can access their data.

Fixes #3437.
Message-Id: <20180516134913.6540-1-avi@scylladb.com>

(cherry picked from commit 3b8118d4e5)
2018-05-19 19:13:58 +03:00
Takuya ASADA
156761d77e dist/ami: update CentOS base image to latest version
Since we require an updated version of systemd, we need to update the
CentOS base image.

Fixes #3184

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1518118694-23770-1-git-send-email-syuu@scylladb.com>

Conflicts:
	dist/ami/build_ami.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180508083521.18661-1-syuu@scylladb.com>
2018-05-19 19:13:58 +03:00
Avi Kivity
8e33e80ad3 release: prepare for 2.1.3 2018-04-25 09:01:30 +03:00
Duarte Nunes
c35dd86c87 db/schema_tables: Only drop UDTs after merging tables
Dropping a user type requires that all tables using that type also be
dropped. However, a type may appear to be dropped at the same time as
a table, for instance due to the order in which a node receives schema
notifications, or when dropping a keyspace.

When dropping a table, if we build a schema in a shard through a
global_schema_pointer, then we'll check for the existence of any user
type the schema employs. We thus need to ensure types are only dropped
after tables, similarly to how it's done for keyspaces.

Fixes #3068

Tests: unit-tests (release)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180129114137.85149-1-duarte@scylladb.com>
(cherry picked from commit 1e3fae5bef)
2018-04-25 01:15:25 +03:00
Pekka Enberg
87cb8a1fa4 release: prepare for 2.1.2 2018-04-17 09:45:00 +03:00
Takuya ASADA
26f3340c32 dist/debian: use ~root as HOME to place .pbuilderrc
When 'always_set_home' is specified in /etc/sudoers, pbuilder won't read
.pbuilderrc from the current user's home directory, and we have no way to
change that behavior via a sudo command parameter.

So let's use ~root/.pbuilderrc and switch to HOME=/root when running under
sudo; this works in environments both with and without always_set_home.

Fixes #3366

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1523926024-3937-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit ace44784e8)
2018-04-17 09:39:15 +03:00
Avi Kivity
aaba093371 Update seastar submodule
* seastar af1b789...0e6dcd5 (1):
  > tls: Ensure we always pass through semaphores on shutdown

Fixes #3358.
2018-04-14 20:52:02 +03:00
Gleb Natapov
a64c6e6be9 cql_server: fix a race between closing of a connection and notifier registration
There is a race between cql connection closure and notifier
registration. If a connection is closed before notification registration
completes, a stale pointer to the connection will remain in the
notification list, since the attempt to unregister the connection happens
too early. The fix is to move notifier unregistration to after the
connection's gate is closed, which ensures that there is no outstanding
registration request. But this means that a connection with a closed gate
can now be in the notifier list, so with_gate() may throw and abort the
notifier loop. Fix that by replacing with_gate() with a call to is_closed().

Fixes: #3355
Tests: unit(release)

Message-Id: <20180412134744.GB22593@scylladb.com>
(cherry picked from commit 1a9aaece3e)
2018-04-12 16:57:18 +03:00
Duarte Nunes
c83d2d0d77 db/view: Reject view entries with non-composite, empty partition key
Empty partition keys are not supported on normal tables - they cannot
be inserted or queried (surprisingly, the rules for composite
partition keys are different: all components are then allowed to be
empty). However, the (non-composite) partition key of a view could end
up being empty if that column is a base table regular column, a base
table clustering key column, or a base table partition key column that
is part of a composite key.

Fixes #3262
Refs CASSANDRA-14345

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180403122244.10626-1-duarte@scylladb.com>
(cherry picked from commit ec8960df45)
2018-04-03 19:08:38 +03:00
Asias He
0aa49d0311 gossip: Relax generation max difference check
start node 1 2 3
shutdown node2
shutdown node1 and node3
start node1 and node3
nodetool removenode node2
clean up all scylla data on node2
bootstrap node2 as a new node

I saw that node2 could not bootstrap; it was stuck forever waiting for schema information to complete:

On node1, node3

    [shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704

On node2

    [shard 0] storage_service - JOINING: waiting for schema information to complete

This is because, during the nodetool removenode operation, the generation of node2 was increased from 0 to 2:

   gossiper::advertise_removing() calls eps.get_heart_beat_state().force_newer_generation_unsafe();
   gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe();

Each force_newer_generation_unsafe() call increases the generation by 1.

Here is an example,

Before nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
   {
   "addrs": "127.0.0.2",
   "generation": 0,
   "is_alive": false,
   "update_time": 1521778757334,
   "version": 0
   },
```

After nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
 {
     "addrs": "127.0.0.2",
     "application_state": [
         {
             "application_state": 0,
             "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246",
             "version": 214
         },
         {
             "application_state": 6,
             "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a",
             "version": 153
            }
     ],
     "generation": 2,
     "is_alive": false,
     "update_time": 1521779276246,
     "version": 0
 },
```

In gossiper::apply_state_locally, we have this check:

```
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
    // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
  logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation);

}
```
to skip the gossip update.

To fix, we relax the generation max-difference check to allow the
generation of a removed node.

After this patch, the removed node bootstraps successfully.
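A sketch of the relaxed check (the constant and the exact rule here are assumptions for illustration, not the verbatim patch): since generations are derived from epoch seconds on a healthy node, the remote generation can be validated against the receiving node's wall clock instead of against an artificially small stored local generation such as the 2 produced by removenode.

```cpp
#include <cstdint>

// Assumed one-year window, mirroring the MAX_GENERATION_DIFFERENCE constant
// quoted in the check above.
constexpr int64_t MAX_GENERATION_DIFFERENCE = 86400LL * 365;

// Accept a remote generation if it sits within a bounded window around the
// local wall-clock time (in epoch seconds), rather than comparing it to the
// possibly tiny stored local generation.
bool accept_remote_generation(int64_t remote_generation, int64_t local_time_secs) {
    return remote_generation <= local_time_secs + MAX_GENERATION_DIFFERENCE;
}
```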

Tests: dtest:update_cluster_layout_tests.py
Fixes #3331

Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com>
(cherry picked from commit f539e993d3)
2018-04-03 19:08:38 +03:00
Shlomi Livne
cce455b1f5 release: prepare for 2.1.1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-03-25 09:32:02 +03:00
Avi Kivity
6772f3806b tests: mutation_source_test: fix scattering of partition tombstone
The partition tombstone is not part of a mutation_fragment in the old
streamed_mutation, so it was not scattered correctly by fragment_scatterer.
This causes test failures if the mutations to be scattered have a partition
tombstone.

Fix by calling consume(tombstone) directly. This isn't nice, but the code
is dead anyway.
2018-03-24 15:15:02 +03:00
Avi Kivity
6c9d699835 Merge "Fix abort during counter table read-on-delete" from Tomasz
"
This fixes an abort in an sstable reader when querying a partition with no
clustering ranges (happens on counter table mutation with no live rows) which
also doesn't have any static columns. In such case, the
sstable_mutation_reader will setup the data_consume_context such that it only
covers the static row of the partition, knowing that there is no need to read
any clustered rows. See partition.cc::advance_to_upper_bound(). Later when
the reader is done with the range for the static row, it will try to skip to
the first clustering range (missing in this case). If clustering_ranges_walker
tells us to skip to after_all_clustering_rows(), we will hit an assert inside
continuous_data_consumer::fast_forward_to() due to attempt to skip past the
original data file range. If clustering_ranges_walker returns
before_all_clustering_rows() instead, all is fine because we're still at the
same data file position.

Fixes #3304.
"

* 'tgrabiec/fix-counter-read-no-static-columns' of github.com:scylladb/seastar-dev:
  tests: mutation_source_test: Test reads with no clustering ranges and no static columns
  tests: simple_schema: Allow creating schema with no static column
  clustering_ranges_walker: Stop after static row in case no clustering ranges

(cherry picked from commit 054854839a)
2018-03-23 10:47:23 +03:00
Vlad Zolotarov
a75e1632c8 test.py: limit the tests to run on 2 shards with 4GB of memory
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
(cherry picked from commit 57a6ed5aaa)
2018-03-22 12:45:25 +02:00
Jesse Haber-Kucharsky
c5718bf620 auth: Fix improper sharing of sharded service
This change is backported from 092f2e659c.

Previously, the sharded permissions cache was only accessible to the
implementation of `auth::service` in `auth/service.cc`. The intention
was that invoking `auth::service::get_permissions` on shard `k` would
query the cache on shard `k`, which would in turn depend on
`auth::service` on shard k to check for superuser status.

The problem is in `auth::service::start`.

`seastar::sharded<auth::permissions_cache>::start` is invoked with
`*this` of shard 0, causing all instances of the cache to reference the
same object.

I wasn't able to locally reproduce errors or crashes due to this bug
when I compiled a release build of Scylla. However, running a debug
build meant that the glorious `seastar::debug_shared_ptr_counter_type`
quickly saved the day with its checks that `seastar::shared_ptr` isn't
being misused.

To eliminate this problem, we move ownership of a single instance of
`auth::permissions_cache` to a single instance of `auth::service`. When
`auth::service` is sharded, so is the permissions cache.

I verified interactively that no assertions failed in debug mode with
this change.

Fixes #3296.

Tests: unit (debug, release)
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <280a889f551180db1c00d8a80eddf85b2ff0ac60.1521696176.git.jhaberku@scylladb.com>
2018-03-22 10:04:50 +02:00
Duarte Nunes
2315fcd6cf gms/gossiper: Synchronize endpoint state destruction
In gossiper::handle_major_state_change() we set the endpoint_state for
a particular endpoint and replicate the changes to other cores.

This is totally unsynchronized with the execution of
gossiper::evict_from_membership(), which can happen concurrently, and
can remove the very same endpoint from the map  (in all cores).

Replicating the changes to other cores in handle_major_state_change()
can interleave with replicating the changes to other cores in
evict_from_membership(), and result in an undefined final state.

Another issue happened in debug mode dtests, where a fiber executes
handle_major_state_change(), calls into the subscribers, of which
storage_service is one, and ultimately lands on
storage_service::update_peer_info(), which iterates over the
endpoint's application state with deferring points in between (to
update a system table). gossiper::evict_from_membership() was executed
concurrently by another fiber, which freed the state the first one is
iterating over.

Fixes #3299.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180318123211.3366-1-duarte@scylladb.com>
(cherry picked from commit 810db425a5)
2018-03-18 14:55:32 +02:00
Asias He
8c5464d2fd range_streamer: Stream 10% of ranges instead of 10 ranges per time
If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per
stream plan will cause tons of stream plans to be created, each carrying
very little data. This gives each stream plan low transfer bandwidth,
so the total time to complete the streaming increases.

It makes more sense to send a percentage of the total ranges per stream
plan than a fixed number of ranges.
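The percentage-based batching can be sketched as follows (the exact rounding and the minimum of one range are assumptions, not the verbatim patch):

```cpp
#include <algorithm>
#include <cstddef>

// Stream roughly 10% of the total ranges per stream plan instead of a fixed
// 10, so a high-token-count node builds a handful of larger plans rather
// than dozens of tiny ones.
std::size_t ranges_per_stream_plan(std::size_t total_ranges) {
    return std::max<std::size_t>(1, total_ranges / 10);
}
```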

Here is an example to stream a keyspace with 513 ranges in
total, 10 ranges v.s. 10% ranges:

Before:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 51
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 107 seconds

After:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 10
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 22 seconds

Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>
(cherry picked from commit 9b5585ebd5)
2018-03-14 10:13:01 +02:00
Asias He
346d2788e3 Revert "streaming: Do not abort session too early in idle detection"
This reverts commit f792c78c96.

With the "Use range_streamer everywhere" (7217b7ab36) series,
all users of streaming now stream relatively small ranges
and can retry streaming at a higher level.

This reduces the time-to-recover from 5 hours to 10 minutes per stream
session.

Even if the 10-minute idle detection might cause more false positives,
that is fine, since we can retry the "small" stream session anyway. In
the long term, we should replace the whole idle-detection logic with one
where the stream slave goes away whenever the stream initiator goes away.

Message-Id: <75f308baf25a520d42d884c7ef36f1aecb8a64b0.1520992219.git.asias@scylladb.com>
(cherry picked from commit ad7b132188)
2018-03-14 10:12:59 +02:00
Avi Kivity
4f68fede6d Merge "Make reader concurrency dual-restricted by count and memory" from Botond
"
Refs #2692
Fixes #3246

The current restricting algorithm [1] restricts the active-reader queue
based on the memory consumption of the existing active readers. When
this memory consumption is above the limit new readers are not admitted.
The inactive reader queue on the other hand has a fixed length.
This caused performance regressions on two workloads:
* read-only: since the inactive-reader queue length is severly limited
  (compared to the previous situation) reads will timeout at loads
  comfortably handled before.
* mixed: since the memory consumption happens only at admission time
  (already created active readers are not limited) memory consumption
  growed significantly causing problems when compactions kicked in.

The solution is to reintroduce the old limit of 100 active concurrent
user-reads while still keeping the memory-based limit as well. For
workloads that don't consume a lot of memory or on large boxes with lots
of memory the count-based limit will be reached which is reverting to the
old well-known behaviour. For memory-hungry workloads or on small boxes
with little memory the memory based-limit will kick in sooner avoiding
memory overconsumption.

[1] introduced by bdbbfe9390
"

* 'restricted-reader-dual-limit/v3-backport-2.1' of https://github.com/denesb/scylla:
  Modify unit tests so that they test the dual-limits
  Use the reader_concurrency_semaphore to limit reader concurrency
  Add reader_concurrency_semaphore
  Add reader_resource_tracker param to mutation_source
  mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh
2018-03-08 19:10:06 +02:00
Botond Dénes
681f9e4f50 Modify unit tests so that they test the dual-limits 2018-03-08 18:54:16 +02:00
Botond Dénes
c503bc7693 Use the reader_concurrency_semaphore to limit reader concurrency 2018-03-08 18:54:15 +02:00
Botond Dénes
de7024251b Add reader_concurrency_semaphore
This semaphore implements the new dual, count and memory based active
reader limiting. As purely memory-based limiting proved to cause
problems on big boxes admitting a large number of readers (more than any
disk could handle) the previous count-based limit is reintroduced in
addition to the existing memory-based limit.
When creating new readers, the count-based limit is checked first. If
that clears, the memory limit is checked before admitting the reader.
reader_concurrency_semaphore wraps the two semaphores that implement
these limits and enforces the correct order of limit checking.
This class also completely replaces the restricted_reader_config struct;
it encapsulates all data and related functionality of the latter, making
client code simpler.
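The admission order described above (count first, then memory) can be sketched as a minimal dual-limit semaphore. The struct and field names below are illustrative assumptions, not the actual reader_concurrency_semaphore API:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch: admission consults the count-based limit first;
// only if that clears is the memory-based limit checked, so whichever
// limit is exhausted first blocks the new reader.
struct dual_limit_semaphore {
    size_t count_avail;   // remaining reader slots (e.g. 100 for user reads)
    size_t memory_avail;  // remaining memory budget, in bytes

    // Try to admit a reader that will consume `mem` bytes.
    bool try_admit(size_t mem) {
        if (count_avail == 0) {
            return false;          // count limit reached
        }
        if (memory_avail < mem) {
            return false;          // memory limit reached
        }
        --count_avail;
        memory_avail -= mem;
        return true;
    }

    void release(size_t mem) {
        ++count_avail;
        memory_avail += mem;
    }
};
```

On a large box the memory budget is generous, so the count slot pool runs out first (the old behaviour); on a small box the memory budget runs out first, limiting admission sooner.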
2018-03-08 18:54:15 +02:00
Botond Dénes
9a0eb2319c Add reader_resource_tracker param to mutation_source
Soon, reader_resource_tracker will only be constructible after the
reader has been admitted. This means that the resource tracker cannot be
preconstructed and just captured by the lambda stored in the mutation
source; instead it has to be passed in along with the other parameters.
2018-03-08 18:54:12 +02:00
Botond Dénes
9ef462449b mv reader_resource_tracker.hh -> reader_concurrency_semaphore.hh
In preparation to reader_concurrency_semaphore being added to the file.
The reader_resource_tracker is really only a helper class for
reader_concurrency_semaphore so the latter is better suited to provide
the name of the file.
2018-03-08 15:34:48 +02:00
Amnon Heiman
6271f30716 dist/docker: Add support for housekeeping
This patch takes a modified version of the Ubuntu 14.04 housekeeping
service script and uses it in Docker to validate the current version.

To disable the version validation, pass the --disable-version-check flag
when running the container.

Message-Id: <20180220161231.1630-1-amnon@scylladb.com>
(cherry picked from commit edcfab3262)
2018-03-07 16:17:13 +02:00
Takuya ASADA
8b64e80c88 dist/debian: install scylla-housekeeping upstart script correctly on Ubuntu 14.04
Since we split the scylla-housekeeping service into two different services for systemd, we no longer share the same service name between systemd and upstart.
So handle it independently for each distribution, and install
/etc/init/scylla-housekeeping.conf on Ubuntu 14.04.

Fixes #3239

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1519852659-10688-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 101e909483)
2018-03-07 16:16:36 +02:00
Amnon Heiman
c5bffcaa68 scylla-housekeeping: need to support both Debian/Ubuntu variations
Debian and Ubuntu list files come in two variations.
The housekeeping scripts should support both.

This patch changes the regexp that matches the OS in the repository file.
After the introduction of the second list variation, the OS name can appear in the middle of the path, not only at the end.
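A minimal sketch of the class of change being described; the helper and patterns below are illustrative assumptions, not the actual housekeeping script. An end-anchored match misses the second list variation where the OS name sits mid-path, so an unanchored search is needed:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative: match the OS name anywhere between path separators,
// not only as the final path component.
bool repo_matches_os(const std::string& repo_line, const std::string& os) {
    // An end-anchored pattern like ".*/<os>$" only matches the old layout.
    // An unanchored search for "/<os>/" or "/<os>" at the end covers both.
    std::regex unanchored("(^|/)" + os + "(/|$)");
    return std::regex_search(repo_line, unanchored);
}
```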

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180227092543.19538-1-amnon@scylladb.com>
(cherry picked from commit 57d46c6959)
2018-03-07 16:15:54 +02:00
Tomasz Grabiec
8aa0b60e91 tests: cache: Fix invalidate() not being waited for
Probably responsible for occasional failures of the subsequent assertion.
Didn't manage to reproduce.

Message-Id: <1520330967-584-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit d9f0c1f097)
2018-03-06 12:17:16 +02:00
Asias He
dccf762654 storage_service: Add missing return in pieces empty check
If pieces is empty, it is bogus to access pieces[0]:

   sstring move_name = pieces[0];

Fix by adding the missing return.
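The bug class can be sketched as follows; the helper is illustrative, not the storage_service code itself:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative: reading pieces[0] without first returning when the
// vector is empty is undefined behaviour; the guard must bail out.
std::string first_piece_or_empty(const std::vector<std::string>& pieces) {
    if (pieces.empty()) {
        return "";                       // the missing early return
    }
    std::string move_name = pieces[0];   // now guaranteed in-bounds
    return move_name;
}
```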

Spotted by Vlad Zolotarov <vladz@scylladb.com>

Fixes #3258
Message-Id: <bcb446f34f953bc51c3704d06630b53fda82e8d2.1520297558.git.asias@scylladb.com>

(cherry picked from commit 8900e830a3)
2018-03-06 09:58:21 +02:00
Tomasz Grabiec
e5344079d9 intrusive_set_external_comparator: Fix _header having undefined color on move
swap_tree() doesn't change the color of the header, and because the
header was not initialized, its color is undefined (it can be either red
or black). One problem this causes is that algo::is_header() expects the
header to always be red. It is used by unlink(), which would
infinite-loop for trees with a black header.

The fix is to initialize the header.
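A minimal sketch of why initialization fixes this; the types below are illustrative, not the boost::intrusive internals:

```cpp
#include <cassert>

// Illustrative: algorithms assume the header node is always red.
// Leaving the colour member uninitialized means a freshly constructed
// header may be observed as black, breaking that invariant; a default
// member initializer removes the indeterminate state.
enum class colour { red, black };

struct tree_header {
    colour c = colour::red;  // the fix: initialize instead of leaving indeterminate
};

bool is_header(const tree_header& h) {
    return h.c == colour::red;  // invariant relied on by unlink()
}
```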

Fixes #3242.

Message-Id: <1519815091-13111-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 30635510a2)
2018-02-28 13:57:33 +02:00
Paweł Dziepak
7bc8515c48 tests/cql3: increase TTL to avoid spurious failures
The test inserts some values with a TTL of 1 second and then
reads them back, expecting them not to have expired yet. That may not
always be the case if the machine is slow and we are running in
debug mode. Increasing the TTLs 100x should help avoid these
false positives.

Message-Id: <20180219133816.17452-1-pdziepak@scylladb.com>
(cherry picked from commit d97eebe82d)
2018-02-22 14:14:41 +00:00
Duarte Nunes
1228a41eaa cql3/query_processor: Remove prepared statements upon dropping a view
Fixes #3198

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180209143652.31852-1-duarte@scylladb.com>
(cherry picked from commit d757c87107)
2018-02-22 14:11:08 +00:00
63 changed files with 1137 additions and 447 deletions

View File

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=2.1.0
VERSION=2.1.6
if test -f version
then

View File

@@ -141,7 +141,9 @@ static sstring gensalt() {
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
prefix = pfx;
return salt;
}

View File

@@ -89,10 +89,6 @@ class permissions_cache final {
public:
explicit permissions_cache(const permissions_cache_config&, service&, logging::logger&);
future<> start() {
return make_ready_future<>();
}
future <> stop() {
return _cache.stop();
}

View File

@@ -24,7 +24,6 @@
#include <map>
#include <seastar/core/future-util.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/shared_ptr.hh>
#include "auth/allow_all_authenticator.hh"
@@ -86,8 +85,6 @@ private:
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};
static sharded<permissions_cache> sharded_permissions_cache{};
static db::consistency_level consistency_for_user(const sstring& name) {
if (name == meta::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
@@ -130,7 +127,8 @@ service::service(
::service::migration_manager& mm,
std::unique_ptr<authorizer> a,
std::unique_ptr<authenticator> b)
: _cache_config(std::move(c))
: _permissions_cache_config(std::move(c))
, _permissions_cache(nullptr)
, _qp(qp)
, _migration_manager(mm)
, _authorizer(std::move(a))
@@ -240,10 +238,12 @@ future<> service::start() {
return make_ready_future<>();
}).then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
return once_among_shards([this] {
_migration_manager.register_listener(_migration_listener.get());
return sharded_permissions_cache.start(std::ref(_cache_config), std::ref(*this), std::ref(log));
return make_ready_future<>();
});
});
}
@@ -251,7 +251,9 @@ future<> service::start() {
future<> service::stop() {
return once_among_shards([this] {
_delayed.cancel_all();
return sharded_permissions_cache.stop();
return make_ready_future<>();
}).then([this] {
return _permissions_cache->stop();
}).then([this] {
return when_all_succeed(_authorizer->stop(), _authenticator->stop());
});
@@ -335,7 +337,7 @@ future<> service::delete_user(const sstring& name) {
}
future<permission_set> service::get_permissions(::shared_ptr<authenticated_user> u, data_resource r) const {
return sharded_permissions_cache.local().get(std::move(u), std::move(r));
return _permissions_cache->get(std::move(u), std::move(r));
}
//

View File

@@ -60,7 +60,8 @@ struct service_config final {
};
class service final {
permissions_cache_config _cache_config;
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
cql3::query_processor& _qp;

View File

@@ -60,6 +60,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _next_row_in_range = _next.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
reading_from_underlying,
end_of_stream
@@ -96,6 +97,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
partition_snapshot_row_cursor _next_row;
bool _next_row_in_range = false;
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
// because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
// us from marking the interval as continuous.
// Valid when _state == reading_from_underlying.
bool _population_range_starts_before_all_rows;
future<> do_fill_buffer();
void copy_from_cache_to_buffer();
future<> process_static_row();
@@ -225,6 +233,7 @@ inline
future<> cache_flat_mutation_reader::do_fill_buffer() {
if (_state == state::move_to_underlying) {
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}).then([this] {
@@ -348,7 +357,7 @@ future<> cache_flat_mutation_reader::read_from_underlying() {
inline
void cache_flat_mutation_reader::maybe_update_continuity() {
if (can_populate() && (!_ck_ranges_curr->start() || _last_row.refresh(*_snp))) {
if (can_populate() && (_population_range_starts_before_all_rows || _last_row.refresh(*_snp))) {
if (_next_row.is_in_latest_version()) {
clogger.trace("csm {}: mark {} continuous", this, _next_row.get_iterator_in_latest_version()->position());
_next_row.get_iterator_in_latest_version()->set_continuous(true);
@@ -387,6 +396,7 @@ inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
_last_row = nullptr;
_population_range_starts_before_all_rows = false;
_read_context->cache().on_mispopulate();
return;
}
@@ -417,6 +427,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it);
});
_population_range_starts_before_all_rows = false;
});
}

View File

@@ -70,7 +70,7 @@ public:
{
if (!with_static_row) {
if (_current == _end) {
_current_start = _current_end = position_in_partition_view::after_all_clustered_rows();
_current_start = position_in_partition_view::before_all_clustered_rows();
} else {
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);

View File

@@ -209,19 +209,18 @@ void query_options::prepare(const std::vector<::shared_ptr<column_specification>
}
auto& names = *_names;
std::vector<cql3::raw_value> ordered_values;
std::vector<cql3::raw_value_view> ordered_values;
ordered_values.reserve(specs.size());
for (auto&& spec : specs) {
auto& spec_name = spec->name->text();
for (size_t j = 0; j < names.size(); j++) {
if (names[j] == spec_name) {
ordered_values.emplace_back(_values[j]);
ordered_values.emplace_back(_value_views[j]);
break;
}
}
}
_values = std::move(ordered_values);
fill_value_views();
_value_views = std::move(ordered_values);
}
void query_options::fill_value_views()

View File

@@ -606,6 +606,7 @@ void query_processor::migration_subscriber::on_drop_aggregate(const sstring& ks_
}
void query_processor::migration_subscriber::on_drop_view(const sstring& ks_name, const sstring& view_name) {
remove_invalid_prepared_statements(ks_name, view_name);
}
void query_processor::migration_subscriber::remove_invalid_prepared_statements(

View File

@@ -202,6 +202,14 @@ public:
const query_options& options,
gc_clock::time_point now) const override;
virtual std::vector<bytes_opt> values_raw(const query_options& options) const = 0;
virtual std::vector<bytes_opt> values(const query_options& options) const override {
std::vector<bytes_opt> ret = values_raw(options);
std::sort(ret.begin(),ret.end());
ret.erase(std::unique(ret.begin(),ret.end()),ret.end());
return ret;
}
#if 0
@Override
protected final boolean isSupportedBy(SecondaryIndex index)
@@ -224,7 +232,7 @@ public:
return abstract_restriction::term_uses_function(_values, ks_name, function_name);
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
std::vector<bytes_opt> ret;
for (auto&& v : _values) {
ret.emplace_back(to_bytes_opt(v->bind_and_get(options)));
@@ -249,7 +257,7 @@ public:
return false;
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
auto&& lval = dynamic_pointer_cast<multi_item_terminal>(_marker->bind(options));
if (!lval) {
throw exceptions::invalid_request_exception("Invalid null value for IN restriction");

View File

@@ -53,6 +53,9 @@ update_parameters::get_prefetched_list(
return {};
}
if (column.is_static()) {
ckey = clustering_key_view::make_empty();
}
auto i = _prefetched->rows.find(std::make_pair(std::move(pkey), std::move(ckey)));
if (i == _prefetched->rows.end()) {
return {};

View File

@@ -328,9 +328,13 @@ filter_sstable_for_reader(std::vector<sstables::shared_sstable>&& sstables, colu
};
sstables.erase(boost::remove_if(sstables, sstable_has_not_key), sstables.end());
// FIXME: Workaround for https://github.com/scylladb/scylla/issues/3552
// and https://github.com/scylladb/scylla/issues/3553
const bool filtering_broken = true;
// no clustering filtering is applied if schema defines no clustering key or
// compaction strategy thinks it will not benefit from such an optimization.
if (!schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
if (filtering_broken || !schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
return sstables;
}
::cf_stats* stats = cf.cf_stats();
@@ -512,9 +516,9 @@ column_family::make_sstable_reader(schema_ptr s,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) const {
auto& config = service::get_local_streaming_read_priority().id() == pc.id()
? _config.streaming_read_concurrency_config
: _config.read_concurrency_config;
auto* semaphore = service::get_local_streaming_read_priority().id() == pc.id()
? _config.streaming_read_concurrency_semaphore
: _config.read_concurrency_semaphore;
// CAVEAT: if make_sstable_reader() is called on a single partition
// we want to optimize and read exactly this partition. As a
@@ -526,37 +530,39 @@ column_family::make_sstable_reader(schema_ptr s,
return make_empty_flat_reader(s); // range doesn't belong to this shard
}
if (config.resources_sem) {
auto ms = mutation_source([&config, sstables=std::move(sstables), this] (
if (semaphore) {
auto ms = mutation_source([semaphore, this, sstables=std::move(sstables)] (
schema_ptr s,
const dht::partition_range& pr,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) {
return create_single_key_sstable_reader(const_cast<column_family*>(this), std::move(s), std::move(sstables),
_stats.estimated_sstable_per_read, pr, slice, pc, reader_resource_tracker(config.resources_sem), std::move(trace_state), fwd, fwd_mr);
_stats.estimated_sstable_per_read, pr, slice, pc, tracker, std::move(trace_state), fwd, fwd_mr);
});
return make_restricted_flat_reader(config, std::move(ms), std::move(s), pr, slice, pc, std::move(trace_state), fwd, fwd_mr);
return make_restricted_flat_reader(*semaphore, std::move(ms), std::move(s), pr, slice, pc, std::move(trace_state), fwd, fwd_mr);
} else {
return create_single_key_sstable_reader(const_cast<column_family*>(this), std::move(s), std::move(sstables),
_stats.estimated_sstable_per_read, pr, slice, pc, no_resource_tracking(), std::move(trace_state), fwd, fwd_mr);
}
} else {
if (config.resources_sem) {
auto ms = mutation_source([&config, sstables=std::move(sstables)] (
if (semaphore) {
auto ms = mutation_source([semaphore, sstables=std::move(sstables)] (
schema_ptr s,
const dht::partition_range& pr,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) {
return make_local_shard_sstable_reader(std::move(s), std::move(sstables), pr, slice, pc,
reader_resource_tracker(config.resources_sem), std::move(trace_state), fwd, fwd_mr);
tracker, std::move(trace_state), fwd, fwd_mr);
});
return make_restricted_flat_reader(config, std::move(ms), std::move(s), pr, slice, pc, std::move(trace_state), fwd, fwd_mr);
return make_restricted_flat_reader(*semaphore, std::move(ms), std::move(s), pr, slice, pc, std::move(trace_state), fwd, fwd_mr);
} else {
return make_local_shard_sstable_reader(std::move(s), std::move(sstables), pr, slice, pc,
no_resource_tracking(), std::move(trace_state), fwd, fwd_mr);
@@ -2001,6 +2007,18 @@ database::database(const db::config& cfg)
, _memtable_cpu_controller(make_flush_cpu_controller(*_cfg, &_background_writer_scheduling_group, [this, limit = 2.0f * _dirty_memory_manager.throttle_threshold()] {
return (_dirty_memory_manager.virtual_dirty_memory()) / limit;
}))
, _read_concurrency_sem(max_count_concurrent_reads,
max_memory_concurrent_reads(),
_cfg->read_request_timeout_in_ms() * 1ms,
max_inactive_queue_length(),
[this] {
++_stats->sstable_read_queue_overloaded;
return std::make_exception_ptr(std::runtime_error("sstable inactive read queue overloaded"));
})
// No timeouts or queue length limits - a failure here can kill an entire repair.
// Trust the caller to limit concurrency.
, _streaming_concurrency_sem(max_count_streaming_concurrent_reads, max_memory_streaming_concurrent_reads())
, _system_read_concurrency_sem(max_count_system_concurrent_reads, max_memory_system_concurrent_reads())
, _version(empty_version)
, _compaction_manager(std::make_unique<compaction_manager>())
, _enable_incremental_backups(cfg.incremental_backups())
@@ -2132,11 +2150,11 @@ database::setup_metrics() {
sm::description("Counts the number of times the sstable read queue was overloaded. "
"A non-zero value indicates that we have to drop read requests because they arrive faster than we can serve them.")),
sm::make_gauge("active_reads", [this] { return _stats->active_reads; },
sm::make_gauge("active_reads", [this] { return max_count_concurrent_reads - _read_concurrency_sem.available_resources().count; },
sm::description("Holds the number of currently active read operations. "),
{user_label_instance}),
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_concurrent_reads() - _read_concurrency_sem.available_units(); },
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_concurrent_reads() - _read_concurrency_sem.available_resources().memory; },
sm::description(seastar::format("Holds the amount of memory consumed by currently active read operations. "
"If this value gets close to {} we are likely to start dropping new read requests. "
"In that case sstable_read_queue_overloads is going to get a non-zero value.", max_memory_concurrent_reads())),
@@ -2146,12 +2164,12 @@ database::setup_metrics() {
sm::description("Holds the number of currently queued read operations."),
{user_label_instance}),
sm::make_gauge("active_reads", [this] { return _stats->active_reads_streaming; },
sm::make_gauge("active_reads", [this] { return max_count_streaming_concurrent_reads - _streaming_concurrency_sem.available_resources().count; },
sm::description("Holds the number of currently active read operations issued on behalf of streaming "),
{streaming_label_instance}),
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_streaming_concurrent_reads() - _streaming_concurrency_sem.available_units(); },
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_streaming_concurrent_reads() - _streaming_concurrency_sem.available_resources().memory; },
sm::description(seastar::format("Holds the amount of memory consumed by currently active read operations issued on behalf of streaming "
"If this value gets close to {} we are likely to start dropping new read requests. "
"In that case sstable_read_queue_overloads is going to get a non-zero value.", max_memory_streaming_concurrent_reads())),
@@ -2161,11 +2179,11 @@ database::setup_metrics() {
sm::description("Holds the number of currently queued read operations on behalf of streaming."),
{streaming_label_instance}),
sm::make_gauge("active_reads", [this] { return _stats->active_reads_system_keyspace; },
sm::make_gauge("active_reads", [this] { return max_count_system_concurrent_reads - _system_read_concurrency_sem.available_resources().count; },
sm::description("Holds the number of currently active read operations from \"system\" keyspace tables. "),
{system_label_instance}),
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_system_concurrent_reads() - _system_read_concurrency_sem.available_units(); },
sm::make_gauge("active_reads_memory_consumption", [this] { return max_memory_system_concurrent_reads() - _system_read_concurrency_sem.available_resources().memory; },
sm::description(seastar::format("Holds the amount of memory consumed by currently active read operations from \"system\" keyspace tables. "
"If this value gets close to {} we are likely to start dropping new read requests. "
"In that case sstable_read_queue_overloads is going to get a non-zero value.", max_memory_system_concurrent_reads())),
@@ -2647,8 +2665,8 @@ keyspace::make_column_family_config(const schema& s, const db::config& db_config
cfg.enable_cache = _config.enable_cache;
cfg.dirty_memory_manager = _config.dirty_memory_manager;
cfg.streaming_dirty_memory_manager = _config.streaming_dirty_memory_manager;
cfg.read_concurrency_config = _config.read_concurrency_config;
cfg.streaming_read_concurrency_config = _config.streaming_read_concurrency_config;
cfg.read_concurrency_semaphore = _config.read_concurrency_semaphore;
cfg.streaming_read_concurrency_semaphore = _config.streaming_read_concurrency_semaphore;
cfg.cf_stats = _config.cf_stats;
cfg.enable_incremental_backups = _config.enable_incremental_backups;
cfg.background_writer_scheduling_group = _config.background_writer_scheduling_group;
@@ -3386,18 +3404,8 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
}
cfg.dirty_memory_manager = &_dirty_memory_manager;
cfg.streaming_dirty_memory_manager = &_streaming_dirty_memory_manager;
cfg.read_concurrency_config.resources_sem = &_read_concurrency_sem;
cfg.read_concurrency_config.active_reads = &_stats->active_reads;
cfg.read_concurrency_config.timeout = _cfg->read_request_timeout_in_ms() * 1ms;
cfg.read_concurrency_config.max_queue_length = 100;
cfg.read_concurrency_config.raise_queue_overloaded_exception = [this] {
++_stats->sstable_read_queue_overloaded;
throw std::runtime_error("sstable inactive read queue overloaded");
};
// No timeouts or queue length limits - a failure here can kill an entire repair.
// Trust the caller to limit concurrency.
cfg.streaming_read_concurrency_config.resources_sem = &_streaming_concurrency_sem;
cfg.streaming_read_concurrency_config.active_reads = &_stats->active_reads_streaming;
cfg.read_concurrency_semaphore = &_read_concurrency_sem;
cfg.streaming_read_concurrency_semaphore = &_streaming_concurrency_sem;
cfg.cf_stats = &_cf_stats;
cfg.enable_incremental_backups = _enable_incremental_backups;
@@ -4234,16 +4242,14 @@ flat_mutation_reader make_local_shard_sstable_reader(schema_ptr s,
}
return reader;
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto all_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) -> flat_mutation_reader {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(all_readers),
fwd,
fwd_mr);
}
@@ -4261,16 +4267,14 @@ flat_mutation_reader make_range_sstable_reader(schema_ptr s,
auto reader_factory_fn = [s, &slice, &pc, resource_tracker, fwd, fwd_mr] (sstables::shared_sstable& sst, const dht::partition_range& pr) {
return sst->read_range_rows_flat(s, pr, slice, pc, resource_tracker, fwd, fwd_mr);
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto sstable_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(sstable_readers),
fwd,
fwd_mr);
}

View File

@@ -79,7 +79,7 @@
#include "utils/phased_barrier.hh"
#include "cpu_controller.hh"
#include "dirty_memory_manager.hh"
#include "reader_resource_tracker.hh"
#include "reader_concurrency_semaphore.hh"
class cell_locker;
class cell_locker_stats;
@@ -296,8 +296,8 @@ public:
bool enable_incremental_backups = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
restricted_mutation_reader_config read_concurrency_config;
restricted_mutation_reader_config streaming_read_concurrency_config;
reader_concurrency_semaphore* read_concurrency_semaphore;
reader_concurrency_semaphore* streaming_read_concurrency_semaphore;
::cf_stats* cf_stats = nullptr;
seastar::thread_scheduling_group* background_writer_scheduling_group = nullptr;
seastar::thread_scheduling_group* memtable_scheduling_group = nullptr;
@@ -954,8 +954,8 @@ public:
bool enable_incremental_backups = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
restricted_mutation_reader_config read_concurrency_config;
restricted_mutation_reader_config streaming_read_concurrency_config;
reader_concurrency_semaphore* read_concurrency_semaphore;
reader_concurrency_semaphore* streaming_read_concurrency_semaphore;
::cf_stats* cf_stats = nullptr;
seastar::thread_scheduling_group* background_writer_scheduling_group = nullptr;
seastar::thread_scheduling_group* memtable_scheduling_group = nullptr;
@@ -1041,10 +1041,17 @@ public:
using timeout_clock = lowres_clock;
private:
::cf_stats _cf_stats;
static const size_t max_count_concurrent_reads{100};
static size_t max_memory_concurrent_reads() { return memory::stats().total_memory() * 0.02; }
// Assume a queued read takes up 10kB of memory, and allow 2% of memory to be filled up with such reads.
static size_t max_inactive_queue_length() { return memory::stats().total_memory() * 0.02 / 10000; }
// They're rather heavyweight, so limit more
static const size_t max_count_streaming_concurrent_reads{10};
static size_t max_memory_streaming_concurrent_reads() { return memory::stats().total_memory() * 0.02; }
static const size_t max_count_system_concurrent_reads{10};
static size_t max_memory_system_concurrent_reads() { return memory::stats().total_memory() * 0.02; };
static constexpr size_t max_concurrent_sstable_loads() { return 3; }
struct db_stats {
uint64_t total_writes = 0;
uint64_t total_writes_failed = 0;
@@ -1053,10 +1060,6 @@ private:
uint64_t total_reads_failed = 0;
uint64_t sstable_read_queue_overloaded = 0;
uint64_t active_reads = 0;
uint64_t active_reads_streaming = 0;
uint64_t active_reads_system_keyspace = 0;
uint64_t short_data_queries = 0;
uint64_t short_mutation_queries = 0;
};
@@ -1073,11 +1076,9 @@ private:
seastar::thread_scheduling_group _background_writer_scheduling_group;
flush_cpu_controller _memtable_cpu_controller;
semaphore _read_concurrency_sem{max_memory_concurrent_reads()};
semaphore _streaming_concurrency_sem{max_memory_streaming_concurrent_reads()};
restricted_mutation_reader_config _read_concurrency_config;
semaphore _system_read_concurrency_sem{max_memory_system_concurrent_reads()};
restricted_mutation_reader_config _system_read_concurrency_config;
reader_concurrency_semaphore _read_concurrency_sem;
reader_concurrency_semaphore _streaming_concurrency_sem;
reader_concurrency_semaphore _system_read_concurrency_sem;
semaphore _sstable_load_concurrency_sem{max_concurrent_sstable_loads()};
@@ -1245,7 +1246,7 @@ public:
std::unordered_set<sstring> get_initial_tokens();
std::experimental::optional<gms::inet_address> get_replace_address();
bool is_replacing();
semaphore& system_keyspace_read_concurrency_sem() {
reader_concurrency_semaphore& system_keyspace_read_concurrency_sem() {
return _system_read_concurrency_sem;
}
semaphore& sstable_load_concurrency_sem() {

View File

@@ -718,7 +718,7 @@ public:
*/
auto me = shared_from_this();
auto fp = _file_pos;
return _pending_ops.wait_for_pending(timeout).then([me = std::move(me), fp, timeout] {
return _pending_ops.wait_for_pending(timeout).then([me, fp, timeout] {
if (fp != me->_file_pos) {
// some other request already wrote this buffer.
// If so, wait for the operation at our intended file offset

View File

@@ -64,8 +64,11 @@
#include "db/config.hh"
#include "md5_hasher.hh"
#include <seastar/util/noncopyable_function.hh>
#include <boost/algorithm/string/predicate.hpp>
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/algorithm/transform.hpp>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/join.hpp>
@@ -126,7 +129,11 @@ static void merge_tables_and_views(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& views_before,
std::map<qualified_name, schema_mutations>&& views_after);
static void merge_types(distributed<service::storage_proxy>& proxy,
struct user_types_to_drop final {
seastar::noncopyable_function<void()> drop;
};
[[nodiscard]] static user_types_to_drop merge_types(distributed<service::storage_proxy>& proxy,
schema_result&& before,
schema_result&& after);
@@ -832,7 +839,7 @@ static future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std:
#endif
std::set<sstring> keyspaces_to_drop = merge_keyspaces(proxy, std::move(old_keyspaces), std::move(new_keyspaces)).get0();
merge_types(proxy, std::move(old_types), std::move(new_types));
auto types_to_drop = merge_types(proxy, std::move(old_types), std::move(new_types));
merge_tables_and_views(proxy,
std::move(old_column_families), std::move(new_column_families),
std::move(old_views), std::move(new_views));
@@ -840,6 +847,8 @@ static future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std:
mergeFunctions(oldFunctions, newFunctions);
mergeAggregates(oldAggregates, newAggregates);
#endif
types_to_drop.drop();
proxy.local().get_db().invoke_on_all([keyspaces_to_drop = std::move(keyspaces_to_drop)] (database& db) {
// it is safe to drop a keyspace only when all nested ColumnFamilies where deleted
return do_for_each(keyspaces_to_drop, [&db] (auto keyspace_to_drop) {
@@ -996,30 +1005,37 @@ static void merge_tables_and_views(distributed<service::storage_proxy>& proxy,
}).get();
}
static inline void collect_types(std::set<sstring>& keys, schema_result& result, std::vector<user_type>& to)
struct naked_user_type {
const sstring keyspace;
const sstring qualified_name;
};
static inline void collect_types(std::set<sstring>& keys, schema_result& result, std::vector<naked_user_type>& to)
{
for (auto&& key : keys) {
auto&& value = result[key];
auto types = create_types_from_schema_partition(schema_result_value_type{key, std::move(value)});
std::move(types.begin(), types.end(), std::back_inserter(to));
boost::transform(types, std::back_inserter(to), [] (user_type type) {
return naked_user_type{std::move(type->_keyspace), std::move(type->name())};
});
}
}
// see the comments for merge_keyspaces()
static void merge_types(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
// see the comments for merge_keyspaces()
[[nodiscard]] static user_types_to_drop merge_types(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
{
std::vector<user_type> created, altered, dropped;
std::vector<naked_user_type> created, altered, dropped;
auto diff = difference(before, after, indirect_equal_to<lw_shared_ptr<query::result_set>>());
collect_types(diff.entries_only_on_left, before, dropped); // Keyspaces with no more types
collect_types(diff.entries_only_on_right, after, created); // New keyspaces with types
for (auto&& key : diff.entries_differing) {
for (auto&& keyspace : diff.entries_differing) {
// The user types of this keyspace differ, so diff the current types with the updated ones
auto current_types = proxy.local().get_db().local().find_keyspace(key).metadata()->user_types()->get_all_types();
auto current_types = proxy.local().get_db().local().find_keyspace(keyspace).metadata()->user_types()->get_all_types();
decltype(current_types) updated_types;
auto ts = create_types_from_schema_partition(schema_result_value_type{key, std::move(after[key])});
auto ts = create_types_from_schema_partition(schema_result_value_type{keyspace, std::move(after[keyspace])});
updated_types.reserve(ts.size());
for (auto&& type : ts) {
updated_types[type->_name] = std::move(type);
@@ -1027,36 +1043,46 @@ static void merge_types(distributed<service::storage_proxy>& proxy, schema_resul
auto delta = difference(current_types, updated_types, indirect_equal_to<user_type>());
for (auto&& key : delta.entries_only_on_left) {
dropped.emplace_back(current_types[key]);
for (auto&& type_name : delta.entries_only_on_left) {
dropped.emplace_back(naked_user_type{keyspace, current_types[type_name]->name()});
}
for (auto&& key : delta.entries_only_on_right) {
created.emplace_back(std::move(updated_types[key]));
for (auto&& type_name : delta.entries_only_on_right) {
created.emplace_back(naked_user_type{keyspace, updated_types[type_name]->name()});
}
for (auto&& key : delta.entries_differing) {
altered.emplace_back(std::move(updated_types[key]));
for (auto&& type_name : delta.entries_differing) {
altered.emplace_back(naked_user_type{keyspace, updated_types[type_name]->name()});
}
}
proxy.local().get_db().invoke_on_all([&created, &dropped, &altered] (database& db) {
// Create and update user types before any tables/views are created that potentially
// use those types. Similarly, defer dropping until after tables/views that may use
// some of these user types are dropped.
proxy.local().get_db().invoke_on_all([&created, &altered] (database& db) {
return seastar::async([&] {
for (auto&& type : created) {
auto user_type = dynamic_pointer_cast<const user_type_impl>(parse_type(type->name()));
auto user_type = dynamic_pointer_cast<const user_type_impl>(parse_type(type.qualified_name));
db.find_keyspace(user_type->_keyspace).add_user_type(user_type);
service::get_local_migration_manager().notify_create_user_type(user_type).get();
}
for (auto&& type : dropped) {
auto user_type = dynamic_pointer_cast<const user_type_impl>(parse_type(type->name()));
db.find_keyspace(user_type->_keyspace).remove_user_type(user_type);
service::get_local_migration_manager().notify_drop_user_type(user_type).get();
}
for (auto&& type : altered) {
auto user_type = dynamic_pointer_cast<const user_type_impl>(parse_type(type->name()));
auto user_type = dynamic_pointer_cast<const user_type_impl>(parse_type(type.qualified_name));
db.find_keyspace(user_type->_keyspace).add_user_type(user_type);
service::get_local_migration_manager().notify_update_user_type(user_type).get();
}
});
}).get();
return user_types_to_drop{[&proxy, dropped = std::move(dropped)] {
proxy.local().get_db().invoke_on_all([dropped = std::move(dropped)](database& db) {
return do_for_each(dropped, [&db](auto& user_type_to_drop) {
auto user_type = dynamic_pointer_cast<const user_type_impl>(
parse_type(std::move(user_type_to_drop.qualified_name)));
db.find_keyspace(user_type->_keyspace).remove_user_type(user_type);
return service::get_local_migration_manager().notify_drop_user_type(user_type);
});
}).get();
}};
}
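The ordering constraint above (create/alter user types before tables that may use them, drop them only after such tables are gone) is implemented by returning the drops as a deferred closure. A minimal sketch of that pattern, with a plain string registry standing in for the keyspace's type map (types and names here are illustrative, not Scylla's):

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Sketch of the "defer the drops" pattern: merge_types() applies
// creations immediately and hands back a closure performing the drops,
// so the caller can invoke it only after dependent objects are removed.
struct types_to_drop {
    std::function<void()> drop;
};

[[nodiscard]] types_to_drop merge_types(std::vector<std::string>& registry,
                                        const std::vector<std::string>& created,
                                        std::vector<std::string> dropped) {
    // Creations take effect right away.
    registry.insert(registry.end(), created.begin(), created.end());
    // Drops are captured and deferred until the caller runs drop().
    return {[&registry, dropped = std::move(dropped)] {
        for (const auto& name : dropped) {
            registry.erase(std::remove(registry.begin(), registry.end(), name),
                           registry.end());
        }
    }};
}
```

The `[[nodiscard]]` attribute mirrors the diff: forgetting to call `drop()` would silently leak the dropped types, so discarding the return value is flagged at compile time.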
#if 0

View File

@@ -1577,10 +1577,7 @@ void make(database& db, bool durable, bool volatile_testing_only) {
kscfg.enable_commitlog = !volatile_testing_only;
kscfg.enable_cache = true;
// don't make system keyspace reads wait for user reads
kscfg.read_concurrency_config.resources_sem = &db.system_keyspace_read_concurrency_sem();
kscfg.read_concurrency_config.active_reads = &db.get_stats().active_reads_system_keyspace;
kscfg.read_concurrency_config.timeout = {};
kscfg.read_concurrency_config.max_queue_length = std::numeric_limits<size_t>::max();
kscfg.read_concurrency_semaphore = &db.system_keyspace_read_concurrency_sem();
// don't make system keyspace writes wait for user writes (if under pressure)
kscfg.dirty_memory_manager = &db._system_dirty_memory_manager;
keyspace _ks{ksm, std::move(kscfg)};

View File

@@ -175,6 +175,31 @@ static bool update_requires_read_before_write(const schema& base,
return false;
}
static bool is_partition_key_empty(
const schema& base,
const schema& view_schema,
const partition_key& base_key,
const clustering_row& update) {
// Empty partition keys are not supported on normal tables - they cannot
// be inserted or queried, so enforce those rules here.
if (view_schema.partition_key_columns().size() > 1) {
// Composite partition keys are different: all components
// are then allowed to be empty.
return false;
}
auto* base_col = base.get_column_definition(view_schema.partition_key_columns().front().name());
switch (base_col->kind) {
case column_kind::partition_key:
return base_key.get_component(base, base_col->position()).empty();
case column_kind::clustering_key:
return update.key().get_component(base, base_col->position()).empty();
default:
// No multi-cell columns in the view's partition key
auto& c = update.cells().cell_at(base_col->id);
return c.as_atomic_cell().value().empty();
}
}
bool matches_view_filter(const schema& base, const view_info& view, const partition_key& key, const clustering_row& update, gc_clock::time_point now) {
return clustering_prefix_matches(base, view, key, update.key())
&& boost::algorithm::all_of(
@@ -330,7 +355,7 @@ static void add_cells_to_view(const schema& base, const schema& view, const row&
* This method checks that the base row does match the view filter before applying anything.
*/
void view_updates::create_entry(const partition_key& base_key, const clustering_row& update, gc_clock::time_point now) {
if (!matches_view_filter(*_base, _view_info, base_key, update, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, update) || !matches_view_filter(*_base, _view_info, base_key, update, now)) {
return;
}
deletable_row& r = get_view_row(base_key, update);
@@ -346,7 +371,7 @@ void view_updates::create_entry(const partition_key& base_key, const clustering_
void view_updates::delete_old_entry(const partition_key& base_key, const clustering_row& existing, const row_tombstone& t, gc_clock::time_point now) {
// Before deleting an old entry, make sure it was matching the view filter
// (otherwise there is nothing to delete)
if (matches_view_filter(*_base, _view_info, base_key, existing, now)) {
if (!is_partition_key_empty(*_base, *_view, base_key, existing) && matches_view_filter(*_base, _view_info, base_key, existing, now)) {
do_delete_old_entry(base_key, existing, t, now);
}
}
@@ -391,11 +416,11 @@ void view_updates::do_delete_old_entry(const partition_key& base_key, const clus
void view_updates::update_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now) {
// While we know update and existing correspond to the same view entry,
// they may not match the view filter.
if (!matches_view_filter(*_base, _view_info, base_key, existing, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, existing) || !matches_view_filter(*_base, _view_info, base_key, existing, now)) {
create_entry(base_key, update, now);
return;
}
if (!matches_view_filter(*_base, _view_info, base_key, update, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, update) || !matches_view_filter(*_base, _view_info, base_key, update, now)) {
do_delete_old_entry(base_key, existing, row_tombstone(), now);
return;
}
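The rule enforced by `is_partition_key_empty` above can be distilled to a small predicate: a single-component partition key must not be empty (normal tables can neither insert nor query such keys), while composite keys are exempt because any of their components may be empty. A hedged sketch with key components as plain strings rather than Scylla's serialized key types:

```cpp
#include <string>
#include <vector>

// Illustrative version of the view-update guard: reject a view row
// whose partition key would be a single empty component; allow empty
// components inside composite (multi-component) keys.
bool is_partition_key_empty(const std::vector<std::string>& key_components) {
    if (key_components.size() > 1) {
        // Composite partition keys may contain empty components.
        return false;
    }
    return key_components.front().empty();
}
```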

View File

@@ -300,6 +300,7 @@ future<> range_streamer::do_stream_async() {
unsigned sp_index = 0;
unsigned nr_ranges_streamed = 0;
size_t nr_ranges_total = range_vec.size();
size_t nr_ranges_per_stream_plan = nr_ranges_total / 10;
dht::token_range_vector ranges_to_stream;
auto do_streaming = [&] {
auto sp = stream_plan(sprint("%s-%s-index-%d", description, keyspace, sp_index++));
@@ -318,7 +319,7 @@ future<> range_streamer::do_stream_async() {
ranges_to_stream.push_back(*it);
it = range_vec.erase(it);
nr_ranges_streamed++;
if (ranges_to_stream.size() < _nr_ranges_per_stream_plan) {
if (ranges_to_stream.size() < nr_ranges_per_stream_plan) {
continue;
} else {
do_streaming();
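The change above replaces the fixed batch of 10 ranges per stream plan with a batch of roughly a tenth of the total, keeping the number of plans bounded as the range count grows. A sketch of the resulting batching arithmetic (a simplification: only the plan count is modeled, not the streaming itself):

```cpp
#include <cstddef>

// Returns how many stream plans a given range count produces when each
// plan takes ~total/10 ranges and a final partial batch is flushed.
std::size_t count_stream_plans(std::size_t nr_ranges_total) {
    std::size_t nr_ranges_per_stream_plan = nr_ranges_total / 10;
    std::size_t plans = 0;
    std::size_t pending = 0;
    for (std::size_t i = 0; i < nr_ranges_total; ++i) {
        ++pending;
        if (pending < nr_ranges_per_stream_plan) {
            continue; // keep accumulating ranges into the current plan
        }
        ++plans;      // batch full: start a stream plan
        pending = 0;
    }
    if (pending > 0) {
        ++plans;      // flush the final partial batch
    }
    return plans;
}
```

Note the small-count edge case: below 10 ranges the divisor is zero, so every range flushes immediately, matching the structure of the loop in the diff.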

View File

@@ -174,8 +174,6 @@ private:
std::unordered_set<std::unique_ptr<i_source_filter>> _source_filters;
stream_plan _stream_plan;
std::unordered_map<sstring, std::vector<sstring>> _column_families;
// Number of ranges to stream per stream plan
unsigned _nr_ranges_per_stream_plan = 10;
// Retry the stream plan _nr_max_retry times
unsigned _nr_retried = 0;
unsigned _nr_max_retry = 5;

View File

@@ -43,7 +43,7 @@ done
. /etc/os-release
case "$ID" in
"centos")
AMI=ami-46bf8a51
AMI=ami-ae7bfdb8
REGION=us-east-1
SSH_USERNAME=centos
;;

View File

@@ -1 +0,0 @@
options raid0 devices_discard_performance=Y

View File

@@ -112,10 +112,15 @@ run_setup_script() {
name=$1
shift 1
$* &&:
if [ $? -ne 0 ] && [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
if [ $? -ne 0 ]; then
if [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
else
printf "$name setup failed.\n"
exit 1
fi
fi
return 0
}

View File

@@ -2,10 +2,11 @@
. /etc/os-release
print_usage() {
echo "build_deb.sh -target <codename> --dist --rebuild-dep"
echo "build_deb.sh -target <codename> --dist --rebuild-dep --jobs 2"
echo " --target target distribution codename"
echo " --dist create a public distribution package"
echo " --no-clean don't rebuild pbuilder tgz"
echo " --jobs specify number of jobs"
exit 1
}
install_deps() {
@@ -19,6 +20,7 @@ install_deps() {
DIST=0
TARGET=
NO_CLEAN=0
JOBS=0
while [ $# -gt 0 ]; do
case "$1" in
"--dist")
@@ -33,6 +35,10 @@ while [ $# -gt 0 ]; do
NO_CLEAN=1
shift 1
;;
"--jobs")
JOBS=$2
shift 2
;;
*)
print_usage
;;
@@ -127,6 +133,8 @@ if [ "$TARGET" = "jessie" ]; then
cp dist/debian/scylla-server.cron.d debian/
sed -i -e "s/@@REVISION@@/1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping-daily/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@/dh_installinit --no-start --name scylla-housekeeping-restart/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@//g" debian/control
@@ -141,6 +149,8 @@ elif [ "$TARGET" = "stretch" ]; then
cp dist/debian/scylla-server.cron.d debian/
sed -i -e "s/@@REVISION@@/1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping-daily/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@/dh_installinit --no-start --name scylla-housekeeping-restart/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options1.62-dev, libboost-filesystem1.62-dev, libboost-system1.62-dev, libboost-thread1.62-dev, libboost-test1.62-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@//g" debian/control
@@ -155,6 +165,8 @@ elif [ "$TARGET" = "trusty" ]; then
cp dist/debian/scylla-server.cron.d debian/
sed -i -e "s/@@REVISION@@/0ubuntu1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@/--upstart-only/g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping --upstart-only/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@//g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/scylla-gcc72-g++-7, libunwind8-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, num-utils/g" debian/control
@@ -168,6 +180,8 @@ elif [ "$TARGET" = "trusty" ]; then
elif [ "$TARGET" = "xenial" ]; then
sed -i -e "s/@@REVISION@@/0ubuntu1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping-daily/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@/dh_installinit --no-start --name scylla-housekeeping-restart/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, /g" debian/control
@@ -181,6 +195,8 @@ elif [ "$TARGET" = "xenial" ]; then
elif [ "$TARGET" = "bionic" ]; then
sed -i -e "s/@@REVISION@@/0ubuntu1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping-daily/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@/dh_installinit --no-start --name scylla-housekeeping-restart/g" debian/rules
sed -i -e "s#@@COMPILER@@#g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, g++, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options-dev, libboost-filesystem-dev, libboost-system-dev, libboost-thread-dev, libboost-test-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, /g" debian/control
@@ -194,6 +210,8 @@ elif [ "$TARGET" = "bionic" ]; then
elif [ "$TARGET" = "yakkety" ] || [ "$TARGET" = "zesty" ] || [ "$TARGET" = "artful" ]; then
sed -i -e "s/@@REVISION@@/0ubuntu1~$TARGET/g" debian/changelog
sed -i -e "s/@@DH_INSTALLINIT@@//g" debian/rules
sed -i -e "s/@@INSTALL_HK_DAILY_INIT@@/dh_installinit --no-start --name scylla-housekeeping-daily/g" debian/rules
sed -i -e "s/@@INSTALL_HK_RESTART_INIT@@/dh_installinit --no-start --name scylla-housekeeping-restart/g" debian/rules
sed -i -e "s/@@COMPILER@@/g++-7/g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options-dev, libboost-filesystem-dev, libboost-system-dev, libboost-thread-dev, libboost-test-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, /g" debian/control
@@ -221,16 +239,19 @@ sed -i -e "s#@@REPOFILES@@#'/etc/apt/sources.list.d/scylla*.list'#g" debian/scyl
cp dist/common/systemd/scylla-fstrim.service debian/scylla-server.scylla-fstrim.service
cp dist/common/systemd/node-exporter.service debian/scylla-server.node-exporter.service
cp ./dist/debian/pbuilderrc ~/.pbuilderrc
sudo cp ./dist/debian/pbuilderrc ~root/.pbuilderrc
if [ $NO_CLEAN -eq 0 ]; then
sudo rm -fv /var/cache/pbuilder/scylla-server-$TARGET.tgz
sudo -E DIST=$TARGET /usr/sbin/pbuilder clean
sudo -E DIST=$TARGET /usr/sbin/pbuilder create --allow-untrusted
sudo -H DIST=$TARGET /usr/sbin/pbuilder clean
sudo -H DIST=$TARGET /usr/sbin/pbuilder create --allow-untrusted
fi
sudo -E DIST=$TARGET /usr/sbin/pbuilder update --allow-untrusted
if [ $JOBS -ne 0 ]; then
DEB_BUILD_OPTIONS="parallel=$JOBS"
fi
sudo -H DIST=$TARGET /usr/sbin/pbuilder update --allow-untrusted
if [ "$TARGET" = "trusty" ] || [ "$TARGET" = "xenial" ] || [ "$TARGET" = "yakkety" ] || [ "$TARGET" = "zesty" ] || [ "$TARGET" = "artful" ] || [ "$TARGET" = "bionic" ]; then
sudo -E DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/ubuntu_enable_ppa.sh
sudo -H DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/ubuntu_enable_ppa.sh
elif [ "$TARGET" = "jessie" ] || [ "$TARGET" = "stretch" ]; then
sudo -E DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/debian_install_gpgkey.sh
sudo -H DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/debian_install_gpgkey.sh
fi
sudo -E DIST=$TARGET pdebuild --buildresult build/debs
sudo -H DIST=$TARGET DEB_BUILD_OPTIONS=$DEB_BUILD_OPTIONS pdebuild --buildresult build/debs

View File

@@ -1,12 +1,13 @@
#!/usr/bin/make -f
export PYBUILD_DISABLE=1
jobs := $(shell echo $$DEB_BUILD_OPTIONS | sed -r "s/.*parallel=([0-9]+).*/-j\1/")
override_dh_auto_configure:
./configure.py --enable-dpdk --mode=release --static-thrift --static-boost --compiler=@@COMPILER@@ --cflags="-I/opt/scylladb/include -L/opt/scylladb/lib/x86-linux-gnu/" --ldflags="-Wl,-rpath=/opt/scylladb/lib"
override_dh_auto_build:
PATH="/opt/scylladb/bin:$$PATH" ninja
PATH="/opt/scylladb/bin:$$PATH" ninja $(jobs)
override_dh_auto_clean:
rm -rf build/release seastar/build
@@ -15,8 +16,8 @@ override_dh_auto_clean:
override_dh_installinit:
dh_installinit --no-start @@DH_INSTALLINIT@@
dh_installinit --no-start --name scylla-housekeeping-daily @@DH_INSTALLINIT@@
dh_installinit --no-start --name scylla-housekeeping-restart @@DH_INSTALLINIT@@
@@INSTALL_HK_DAILY_INIT@@
@@INSTALL_HK_RESTART_INIT@@
dh_installinit --no-start --name scylla-fstrim @@DH_INSTALLINIT@@
dh_installinit --no-start --name node-exporter @@DH_INSTALLINIT@@

View File

@@ -14,8 +14,10 @@ ADD etc/sysconfig/scylla-server /etc/sysconfig/scylla-server
# Supervisord configuration:
ADD etc/supervisord.conf /etc/supervisord.conf
ADD etc/supervisord.conf.d/scylla-server.conf /etc/supervisord.conf.d/scylla-server.conf
ADD etc/supervisord.conf.d/scylla-housekeeping.conf /etc/supervisord.conf.d/scylla-housekeeping.conf
ADD etc/supervisord.conf.d/scylla-jmx.conf /etc/supervisord.conf.d/scylla-jmx.conf
ADD scylla-service.sh /scylla-service.sh
ADD scylla-housekeeping-service.sh /scylla-housekeeping-service.sh
ADD scylla-jmx-service.sh /scylla-jmx-service.sh
# Docker image startup scripts:

View File

@@ -14,4 +14,5 @@ def parse():
parser.add_argument('--broadcast-address', default=None, dest='broadcastAddress')
parser.add_argument('--broadcast-rpc-address', default=None, dest='broadcastRpcAddress')
parser.add_argument('--api-address', default=None, dest='apiAddress')
parser.add_argument('--disable-version-check', default=False, action='store_true', dest='disable_housekeeping', help="Disable version check")
return parser.parse_args()

View File

@@ -15,6 +15,7 @@ try:
setup.io()
setup.cqlshrc()
setup.arguments()
setup.set_housekeeping()
os.system("/usr/bin/supervisord -c /etc/supervisord.conf")
except:
logging.exception('failed!')

View File

@@ -0,0 +1,6 @@
[program:scylla-housekeeping]
command=/scylla-housekeeping-service.sh
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

View File

@@ -0,0 +1,8 @@
#!/bin/bash
sleep 5
/usr/lib/scylla/scylla-housekeeping --uuid-file /var/lib/scylla-housekeeping/housekeeping.uuid --repo-files '/etc/yum.repos.d/scylla*.repo' -q version --mode cr || true
while true; do
sleep 1d
/usr/lib/scylla/scylla-housekeeping --uuid-file /var/lib/scylla-housekeeping/housekeeping.uuid --repo-files '/etc/yum.repos.d/scylla*.repo' -q version --mode cd || true
done

View File

@@ -15,6 +15,7 @@ class ScyllaSetup:
self._smp = arguments.smp
self._memory = arguments.memory
self._overprovisioned = arguments.overprovisioned
self._housekeeping = not arguments.disable_housekeeping
self._experimental = arguments.experimental
def _run(self, *args, **kwargs):
@@ -38,6 +39,14 @@ class ScyllaSetup:
with open("%s/.cqlshrc" % home, "w") as cqlshrc:
cqlshrc.write("[connection]\nhostname = %s\n" % hostname)
def set_housekeeping(self):
with open("/etc/scylla.d/housekeeping.cfg", "w") as f:
f.write("[housekeeping]\ncheck-version: ")
if self._housekeeping:
f.write("True\n")
else:
f.write("False\n")
def arguments(self):
args = []
if self._memory is not None:

View File

@@ -102,11 +102,11 @@ fi
if [ $JOBS -gt 0 ]; then
SRPM_OPTS="$SRPM_OPTS --define='_smp_mflags -j$JOBS'"
RPM_JOBS_OPTS=(--define="_smp_mflags -j$JOBS")
fi
sudo mock --buildsrpm --root=$TARGET --resultdir=`pwd`/build/srpms --spec=build/scylla.spec --sources=build/scylla-$VERSION.tar $SRPM_OPTS
sudo mock --buildsrpm --root=$TARGET --resultdir=`pwd`/build/srpms --spec=build/scylla.spec --sources=build/scylla-$VERSION.tar $SRPM_OPTS "${RPM_JOBS_OPTS[@]}"
if [ "$TARGET" = "epel-7-x86_64" ]; then
TARGET=scylla-$TARGET
RPM_OPTS="$RPM_OPTS --configdir=dist/redhat/mock"
fi
sudo mock --rebuild --root=$TARGET --resultdir=`pwd`/build/rpms $RPM_OPTS build/srpms/scylla-$VERSION*.src.rpm
sudo mock --rebuild --root=$TARGET --resultdir=`pwd`/build/rpms $RPM_OPTS "${RPM_JOBS_OPTS[@]}" build/srpms/scylla-$VERSION*.src.rpm

View File

@@ -92,9 +92,6 @@ mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
%if 0%{?rhel}
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/modprobe.d/
%endif
mkdir -p $RPM_BUILD_ROOT%{_sysctldir}/
mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
@@ -105,9 +102,6 @@ install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sec
install -m644 dist/common/collectd.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
install -m644 dist/common/scylla.d/*.conf $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
install -m644 dist/common/sysctl.d/*.conf $RPM_BUILD_ROOT%{_sysctldir}/
%if 0%{?rhel}
install -m644 dist/common/modprobe.d/*.conf $RPM_BUILD_ROOT%{_sysconfdir}/modprobe.d/
%endif
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
@@ -300,18 +294,9 @@ if Scylla is the main application on your server and you wish to optimize its la
# We cannot use the sysctl_apply rpm macro because it is not present in 7.0
# following is a "manual" expansion
/usr/lib/systemd/systemd-sysctl 99-scylla-sched.conf >/dev/null 2>&1 || :
# Write modprobe.d params when module already loaded
%if 0%{?rhel}
if [ -e /sys/module/raid0/parameters/devices_discard_performance ]; then
echo Y > /sys/module/raid0/parameters/devices_discard_performance
fi
%endif
%files kernel-conf
%defattr(-,root,root)
%if 0%{?rhel}
%config(noreplace) %{_sysconfdir}/modprobe.d/*.conf
%endif
%{_sysctldir}/*.conf
%changelog

View File

@@ -208,6 +208,12 @@ $ docker run --name some-scylla -d scylladb/scylla --experimental 1
**Since: 2.0**
### `--disable-version-check`
The `--disable-version-check` flag disables the version validation check.
**Since: 2.2**
# User Feedback
## Issues

View File

@@ -461,7 +461,8 @@ future<> gossiper::apply_state_locally(std::map<inet_address, endpoint_state> ma
int local_generation = local_ep_state_ptr.get_heart_beat_state().get_generation();
int remote_generation = remote_state.get_heart_beat_state().get_generation();
logger.trace("{} local generation {}, remote generation {}", ep, local_generation, remote_generation);
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
// A node that was removed with nodetool removenode can have a generation of 2
if (local_generation > 2 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
// assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",
ep, local_generation, remote_generation);
@@ -832,6 +833,7 @@ int gossiper::get_max_endpoint_state_version(endpoint_state state) {
// Runs inside seastar::async context
void gossiper::evict_from_membership(inet_address endpoint) {
auto permit = lock_endpoint(endpoint).get0();
_unreachable_endpoints.erase(endpoint);
container().invoke_on_all([endpoint] (auto& g) {
g.endpoint_state_map.erase(endpoint);
@@ -982,7 +984,7 @@ future<> gossiper::assassinate_endpoint(sstring address) {
logger.warn("Assassinating {} via gossip", endpoint);
if (es) {
auto& ss = service::get_local_storage_service();
auto tokens = ss.get_token_metadata().get_tokens(endpoint);
tokens = ss.get_token_metadata().get_tokens(endpoint);
if (tokens.empty()) {
logger.warn("Unable to calculate tokens for {}. Will use a random one", address);
throw std::runtime_error(sprint("Unable to calculate tokens for %s", endpoint));
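The relaxed generation check above can be isolated as a predicate: a remote generation is rejected as corrupt only when the local generation is itself established, and since `nodetool removenode` can legitimately leave a node at generation 2, the threshold moves from `!= 0` to `> 2`. A sketch, where the value of `MAX_GENERATION_DIFFERENCE` is an assumption (one year of seconds, as in Cassandra's gossiper) rather than taken from this diff:

```cpp
// Assumed constant: one year's worth of seconds, mirroring the
// gossip-generation sanity bound; the exact value is illustrative.
constexpr int MAX_GENERATION_DIFFERENCE = 86400 * 365;

// True when the remote generation is implausibly far ahead of an
// established local generation (> 2, to tolerate removed nodes).
bool is_implausible_generation(int local_generation, int remote_generation) {
    return local_generation > 2 &&
           remote_generation > local_generation + MAX_GENERATION_DIFFERENCE;
}
```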

View File

@@ -105,6 +105,7 @@ private:
public:
intrusive_set_external_comparator() { algo::init_header(_header.this_ptr()); }
intrusive_set_external_comparator(intrusive_set_external_comparator&& o) {
algo::init_header(_header.this_ptr());
algo::swap_tree(_header.this_ptr(), node_ptr(o._header.this_ptr()));
}
iterator begin() { return iterator(algo::begin_node(_header.this_ptr()), priv_value_traits_ptr()); }
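The one-line fix above addresses a classic move-constructor bug: transferring contents via swap requires `*this` to be in a valid empty state first, otherwise `swap_tree` reads an uninitialized header. A minimal sketch of the init-then-swap pattern, with a raw pointer standing in for the intrusive tree header:

```cpp
#include <utility>

// The member's default initializer plays the role of init_header():
// without it, swapping in the move constructor would read
// indeterminate state, which is undefined behavior.
struct movable_container {
    int* data = nullptr; // "header": must be initialized before swapping
    movable_container() = default;
    explicit movable_container(int v) : data(new int(v)) {}
    movable_container(movable_container&& o) noexcept {
        // data is already value-initialized to nullptr here, so the
        // swap leaves the source empty and the destination owning.
        std::swap(data, o.data);
    }
    movable_container(const movable_container&) = delete;
    ~movable_container() { delete data; }
};
```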

View File

@@ -733,6 +733,10 @@ public:
static const compound& get_compound_type(const schema& s) {
return s.clustering_key_prefix_type();
}
static clustering_key_prefix_view make_empty() {
return { bytes_view() };
}
};
class clustering_key_prefix : public prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key> {

View File

@@ -100,7 +100,6 @@ future<> ec2_multi_region_snitch::gossiper_starting() {
// Note: currently gossiper "main" instance always runs on CPU0 therefore
// this function will be executed on CPU0 only.
//
ec2_snitch::gossiper_starting();
using namespace gms;
auto& g = get_local_gossiper();

View File

@@ -31,6 +31,7 @@ class md5_hasher {
CryptoPP::Weak::MD5 hash{};
public:
void update(const char* ptr, size_t length) {
using namespace CryptoPP;
static_assert(sizeof(char) == sizeof(byte), "Assuming lengths will be the same");
hash.Update(reinterpret_cast<const byte*>(ptr), length * sizeof(byte));
}

View File

@@ -1849,9 +1849,10 @@ void mutation_querier::query_static_row(const row& r, tombstone current_tombston
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__static_row__cells<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
get_compacted_row_slice(_schema, slice, column_kind::static_column,
r, slice.static_columns, _static_cells_wr);
_memory_accounter.update(stream.size());
r, slice.static_columns, out);
_memory_accounter.update(stream.size() - start);
}
if (_pw.requested_digest()) {
::feed_hash(_pw.digest(), current_tombstone);
@@ -1909,8 +1910,9 @@ stop_iteration mutation_querier::consume(clustering_row&& cr, row_tombstone curr
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__rows<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
write_row(out);
stop = _memory_accounter.update_and_check(stream.size());
stop = _memory_accounter.update_and_check(stream.size() - start);
}
_live_clustering_rows++;
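Both hunks above apply the same accounting fix: the serializer's constructor may have already written placeholder bytes into the measuring stream, so an element's size must be computed as the stream delta around its serialization, not the absolute stream size. A simplified sketch of that delta pattern (the stream type here is a stand-in, not Seastar's `measuring_output_stream`):

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for a measuring output stream: counts bytes, writes nothing.
struct measuring_stream {
    std::size_t _size = 0;
    void write(const char*, std::size_t n) { _size += n; }
    std::size_t size() const { return _size; }
};

// Measure one element's serialized size as a delta, so placeholder
// bytes already in the stream are not charged to this element.
std::size_t measure_element(measuring_stream& s, const char* elem) {
    auto start = s.size();            // snapshot before serialization
    s.write(elem, std::strlen(elem)); // serialize the element
    return s.size() - start;          // delta = this element only
}
```

Getting this wrong matters beyond memory accounting: as the commits note, a size mismatch between the digest-only and data paths can manifest as a digest mismatch between replicas.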

View File

@@ -27,7 +27,6 @@
#include "core/future-util.hh"
#include "utils/move.hh"
#include "stdx.hh"
#include "reader_resource_tracker.hh"
#include "flat_mutation_reader.hh"
@@ -715,23 +714,55 @@ mutation_reader make_empty_reader() {
return make_mutation_reader<empty_reader>();
}
const reader_concurrency_semaphore::timeout_clock::duration
reader_concurrency_semaphore::no_timeout{reader_concurrency_semaphore::timeout_clock::duration::max()};
void reader_concurrency_semaphore::signal(const resources& r) {
_resources += r;
while (!_wait_list.empty() && has_available_units(_wait_list.front().res)) {
auto& x = _wait_list.front();
_resources -= x.res;
x.pr.set_value(make_lw_shared<reader_permit>(*this, x.res));
_wait_list.pop_front();
}
}
future<lw_shared_ptr<reader_concurrency_semaphore::reader_permit>> reader_concurrency_semaphore::wait_admission(size_t memory) {
if (_wait_list.size() >= _max_queue_length) {
return make_exception_future<lw_shared_ptr<reader_permit>>(_make_queue_overloaded_exception());
}
auto r = resources(1, static_cast<ssize_t>(memory));
if (may_proceed(r)) {
_resources -= r;
return make_ready_future<lw_shared_ptr<reader_permit>>(make_lw_shared<reader_permit>(*this, r));
}
promise<lw_shared_ptr<reader_permit>> pr;
auto fut = pr.get_future();
if (_timeout == no_timeout) {
_wait_list.push_back(entry(std::move(pr), r));
} else {
_wait_list.push_back(entry(std::move(pr), r), timeout_clock::now() + _timeout);
}
return fut;
}
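The semaphore above admits a reader only when both a reader-count unit and the requested memory are available, consuming them together and returning them together through `signal()`. A synchronous sketch of that dual-resource admission logic (no futures or wait list, which the real code adds on top):

```cpp
#include <cstddef>

// Two resources tracked as one unit, as in the diff's resources struct.
struct resources {
    int count = 0;
    long memory = 0;
};

struct concurrency_semaphore {
    resources _avail;
    concurrency_semaphore(int count, long memory) : _avail{count, memory} {}

    // Admit a reader if one count unit and `memory` bytes are free.
    bool try_admit(long memory) {
        if (_avail.count < 1 || _avail.memory < memory) {
            return false; // the real code queues the waiter instead
        }
        _avail.count -= 1;
        _avail.memory -= memory;
        return true;
    }

    // Return both resources; the real code then wakes eligible waiters.
    void signal(long memory) {
        _avail.count += 1;
        _avail.memory += memory;
    }
};
```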
// A file that tracks the memory usage of buffers resulting from read
// operations.
class tracking_file_impl : public file_impl {
file _tracked_file;
semaphore* _semaphore;
lw_shared_ptr<reader_concurrency_semaphore::reader_permit> _permit;
// Shouldn't be called if semaphore is NULL.
temporary_buffer<uint8_t> make_tracked_buf(temporary_buffer<uint8_t> buf) {
return seastar::temporary_buffer<uint8_t>(buf.get_write(),
buf.size(),
make_deleter(buf.release(), std::bind(&semaphore::signal, _semaphore, buf.size())));
make_deleter(buf.release(), std::bind(&reader_concurrency_semaphore::reader_permit::signal_memory, _permit, buf.size())));
}
public:
tracking_file_impl(file file, reader_resource_tracker resource_tracker)
: _tracked_file(std::move(file))
, _semaphore(resource_tracker.get_semaphore()) {
, _permit(resource_tracker.get_permit()) {
}
tracking_file_impl(const tracking_file_impl&) = delete;
@@ -793,9 +824,9 @@ public:
virtual future<temporary_buffer<uint8_t>> dma_read_bulk(uint64_t offset, size_t range_size, const io_priority_class& pc) override {
return get_file_impl(_tracked_file)->dma_read_bulk(offset, range_size, pc).then([this] (temporary_buffer<uint8_t> buf) {
if (_semaphore) {
if (_permit) {
buf = make_tracked_buf(std::move(buf));
_semaphore->consume(buf.size());
_permit->consume_memory(buf.size());
}
return make_ready_future<temporary_buffer<uint8_t>>(std::move(buf));
});
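The tracked-buffer change above ties memory accounting to buffer lifetime: bytes are charged to the permit when a buffer is created and released by the buffer's deleter when it goes away, so no manual release call can be forgotten. A sketch of that RAII accounting, with `std::function` standing in for Seastar's deleter machinery:

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// A buffer that returns its memory charge on destruction, mirroring
// how the tracked temporary_buffer's deleter signals the permit.
struct tracked_buffer {
    std::size_t size;
    std::function<void(std::size_t)> on_release;
    tracked_buffer(std::size_t n, std::function<void(std::size_t)> release)
        : size(n), on_release(std::move(release)) {}
    tracked_buffer(const tracked_buffer&) = delete;
    ~tracked_buffer() { on_release(size); } // release exactly what was charged
};
```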
@@ -819,33 +850,23 @@ class restricting_mutation_reader : public flat_mutation_reader::impl {
streamed_mutation::forwarding _fwd;
mutation_reader::forwarding _fwd_mr;
flat_mutation_reader operator()() {
return _ms.make_flat_mutation_reader(std::move(_s), _range.get(), _slice.get(), _pc.get(), std::move(_trace_state), _fwd, _fwd_mr);
flat_mutation_reader operator()(reader_resource_tracker tracker) {
return _ms.make_flat_mutation_reader(std::move(_s), _range.get(), _slice.get(), _pc.get(), std::move(_trace_state), _fwd, _fwd_mr, tracker);
}
};
const restricted_mutation_reader_config& _config;
boost::variant<mutation_source_and_params, flat_mutation_reader> _reader_or_mutation_source;
struct pending_state {
reader_concurrency_semaphore* semaphore;
mutation_source_and_params reader_factory;
};
struct admitted_state {
lw_shared_ptr<reader_concurrency_semaphore::reader_permit> permit;
flat_mutation_reader reader;
};
boost::variant<pending_state, admitted_state> _state;
static const std::size_t new_reader_base_cost{16 * 1024};
future<> create_reader() {
auto f = _config.timeout.count() != 0
? _config.resources_sem->wait(_config.timeout, new_reader_base_cost)
: _config.resources_sem->wait(new_reader_base_cost);
return f.then([this] {
flat_mutation_reader reader = boost::get<mutation_source_and_params>(_reader_or_mutation_source)();
_reader_or_mutation_source = std::move(reader);
if (_config.active_reads) {
++(*_config.active_reads);
}
return make_ready_future<>();
});
}
template<typename Function>
GCC6_CONCEPT(
requires std::is_move_constructible<Function>::value
@@ -854,15 +875,19 @@ class restricting_mutation_reader : public flat_mutation_reader::impl {
}
)
decltype(auto) with_reader(Function fn) {
if (auto* reader = boost::get<flat_mutation_reader>(&_reader_or_mutation_source)) {
return fn(*reader);
if (auto* state = boost::get<admitted_state>(&_state)) {
return fn(state->reader);
}
return create_reader().then([this, fn = std::move(fn)] () mutable {
return fn(boost::get<flat_mutation_reader>(_reader_or_mutation_source));
return boost::get<pending_state>(_state).semaphore->wait_admission(new_reader_base_cost).then(
[this, fn = std::move(fn)] (lw_shared_ptr<reader_concurrency_semaphore::reader_permit> permit) mutable {
auto reader_factory = std::move(boost::get<pending_state>(_state).reader_factory);
_state = admitted_state{permit, reader_factory(reader_resource_tracker(permit))};
return fn(boost::get<admitted_state>(_state).reader);
});
}
public:
restricting_mutation_reader(const restricted_mutation_reader_config& config,
restricting_mutation_reader(reader_concurrency_semaphore& semaphore,
mutation_source ms,
schema_ptr s,
const dht::partition_range& range,
@@ -872,20 +897,8 @@ public:
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr)
: impl(s)
, _config(config)
, _reader_or_mutation_source(
mutation_source_and_params{std::move(ms), std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr}) {
if (_config.resources_sem->waiters() >= _config.max_queue_length) {
_config.raise_queue_overloaded_exception();
}
}
~restricting_mutation_reader() {
if (boost::get<flat_mutation_reader>(&_reader_or_mutation_source)) {
_config.resources_sem->signal(new_reader_base_cost);
if (_config.active_reads) {
--(*_config.active_reads);
}
}
, _state(pending_state{&semaphore,
mutation_source_and_params{std::move(ms), std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr}}) {
}
virtual future<> fill_buffer() override {
@@ -904,8 +917,8 @@ public:
return;
}
_end_of_stream = false;
if (auto* reader = boost::get<flat_mutation_reader>(&_reader_or_mutation_source)) {
return reader->next_partition();
if (auto* state = boost::get<admitted_state>(&_state)) {
return state->reader.next_partition();
}
}
virtual future<> fast_forward_to(const dht::partition_range& pr) override {
@@ -925,7 +938,7 @@ public:
};
flat_mutation_reader
make_restricted_flat_reader(const restricted_mutation_reader_config& config,
make_restricted_flat_reader(reader_concurrency_semaphore& semaphore,
mutation_source ms,
schema_ptr s,
const dht::partition_range& range,
@@ -934,7 +947,7 @@ make_restricted_flat_reader(const restricted_mutation_reader_config& config,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return make_flat_mutation_reader<restricting_mutation_reader>(config, std::move(ms), std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
return make_flat_mutation_reader<restricting_mutation_reader>(semaphore, std::move(ms), std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
}


@@ -30,6 +30,7 @@
#include "core/do_with.hh"
#include "tracing/trace_state.hh"
#include "flat_mutation_reader.hh"
#include "reader_concurrency_semaphore.hh"
// A mutation_reader is an object which allows iterating on mutations: invoke
// the function to get a future for the next mutation, with an unset optional
@@ -275,7 +276,8 @@ class mutation_source {
io_priority,
tracing::trace_state_ptr,
streamed_mutation::forwarding,
mutation_reader::forwarding
mutation_reader::forwarding,
reader_resource_tracker
)>;
using flat_reader_factory_type = std::function<flat_mutation_reader(schema_ptr,
partition_range,
@@ -283,7 +285,8 @@ class mutation_source {
io_priority,
tracing::trace_state_ptr,
streamed_mutation::forwarding,
mutation_reader::forwarding)>;
mutation_reader::forwarding,
reader_resource_tracker)>;
class impl {
public:
virtual ~impl() { }
@@ -293,14 +296,16 @@ class mutation_source {
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) = 0;
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) = 0;
virtual flat_mutation_reader make_flat_mutation_reader(schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) = 0;
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) = 0;
};
class mutation_reader_mutation_source : public impl {
func_type _fn;
@@ -312,8 +317,9 @@ class mutation_source {
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) override {
return _fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) override {
return _fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker);
}
virtual flat_mutation_reader make_flat_mutation_reader(schema_ptr s,
partition_range range,
@@ -321,9 +327,10 @@ class mutation_source {
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) override {
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) override {
return flat_mutation_reader_from_mutation_reader(s,
_fn(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr),
_fn(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker),
fwd);
}
};
@@ -337,8 +344,9 @@ class mutation_source {
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) override {
return mutation_reader_from_flat_mutation_reader(_fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr));
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) override {
return mutation_reader_from_flat_mutation_reader(_fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker));
}
virtual flat_mutation_reader make_flat_mutation_reader(schema_ptr s,
partition_range range,
@@ -346,8 +354,9 @@ class mutation_source {
io_priority pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) override {
return _fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
mutation_reader::forwarding fwd_mr,
reader_resource_tracker tracker) override {
return _fn(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker);
}
};
// We could have our own version of std::function<> that is nothrow
@@ -368,23 +377,78 @@ public:
: _impl(seastar::make_shared<flat_mutation_reader_mutation_source>(std::move(fn)))
, _presence_checker_factory(make_lw_shared(std::move(pcf)))
{ }
mutation_source(std::function<flat_mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority,
tracing::trace_state_ptr, streamed_mutation::forwarding, mutation_reader::forwarding)> fn,
std::function<partition_presence_checker()> pcf = [] { return make_default_partition_presence_checker(); })
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority pc,
tracing::trace_state_ptr tr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr,
reader_resource_tracker) {
return fn(s, range, slice, pc, std::move(tr), fwd, fwd_mr);
}
, std::move(pcf)) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority,
tracing::trace_state_ptr, streamed_mutation::forwarding, mutation_reader::forwarding)> fn)
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority pc,
tracing::trace_state_ptr tr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr,
reader_resource_tracker) {
return fn(s, range, slice, pc, std::move(tr), fwd, fwd_mr);
}) {}
// For sources which don't care about the mutation_reader::forwarding flag (always fast forwardable)
mutation_source(std::function<mutation_reader(schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc, tracing::trace_state_ptr, streamed_mutation::forwarding)> fn)
: mutation_source([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc, tracing::trace_state_ptr tr, streamed_mutation::forwarding fwd, mutation_reader::forwarding) {
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority pc,
tracing::trace_state_ptr tr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding,
reader_resource_tracker) {
return fn(s, range, slice, pc, std::move(tr), fwd);
}) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&, io_priority)> fn)
: mutation_source([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority pc, tracing::trace_state_ptr, streamed_mutation::forwarding fwd, mutation_reader::forwarding) {
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority pc,
tracing::trace_state_ptr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding,
reader_resource_tracker) {
assert(!fwd);
return fn(s, range, slice, pc);
}) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range, const query::partition_slice&)> fn)
: mutation_source([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice& slice, io_priority, tracing::trace_state_ptr, streamed_mutation::forwarding fwd, mutation_reader::forwarding) {
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice& slice,
io_priority,
tracing::trace_state_ptr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding,
reader_resource_tracker) {
assert(!fwd);
return fn(s, range, slice);
}) {}
mutation_source(std::function<mutation_reader(schema_ptr, partition_range range)> fn)
: mutation_source([fn = std::move(fn)] (schema_ptr s, partition_range range, const query::partition_slice&, io_priority, tracing::trace_state_ptr, streamed_mutation::forwarding fwd, mutation_reader::forwarding) {
: mutation_source([fn = std::move(fn)] (schema_ptr s,
partition_range range,
const query::partition_slice&,
io_priority,
tracing::trace_state_ptr,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding,
reader_resource_tracker) {
assert(!fwd);
return fn(s, range);
}) {}
@@ -404,9 +468,10 @@ public:
io_priority pc = default_priority_class(),
tracing::trace_state_ptr trace_state = nullptr,
streamed_mutation::forwarding fwd = streamed_mutation::forwarding::no,
mutation_reader::forwarding fwd_mr = mutation_reader::forwarding::yes) const
mutation_reader::forwarding fwd_mr = mutation_reader::forwarding::yes,
reader_resource_tracker tracker = no_resource_tracking()) const
{
return _impl->make_mutation_reader(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
return _impl->make_mutation_reader(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker);
}
mutation_reader operator()(schema_ptr s, partition_range range = query::full_partition_range) const {
@@ -422,9 +487,10 @@ public:
io_priority pc = default_priority_class(),
tracing::trace_state_ptr trace_state = nullptr,
streamed_mutation::forwarding fwd = streamed_mutation::forwarding::no,
mutation_reader::forwarding fwd_mr = mutation_reader::forwarding::yes) const
mutation_reader::forwarding fwd_mr = mutation_reader::forwarding::yes,
reader_resource_tracker tracker = no_resource_tracking()) const
{
return _impl->make_flat_mutation_reader(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr);
return _impl->make_flat_mutation_reader(std::move(s), range, slice, pc, std::move(trace_state), fwd, fwd_mr, tracker);
}
flat_mutation_reader
@@ -467,18 +533,6 @@ public:
mutation_source make_empty_mutation_source();
snapshot_source make_empty_snapshot_source();
struct restricted_mutation_reader_config {
semaphore* resources_sem = nullptr;
uint64_t* active_reads = nullptr;
std::chrono::nanoseconds timeout = {};
size_t max_queue_length = std::numeric_limits<size_t>::max();
std::function<void ()> raise_queue_overloaded_exception = default_raise_queue_overloaded_exception;
static void default_raise_queue_overloaded_exception() {
throw std::runtime_error("restricted mutation reader queue overload");
}
};
// Creates a restricted reader whose resource usage will be tracked
// during its lifetime. If there are not enough resources (due to
// existing readers) to create the new reader, its construction will
@@ -488,7 +542,7 @@ struct restricted_mutation_reader_config {
// a semaphore to track and limit the memory usage of readers. It also
// contains a timeout and a maximum queue size for inactive readers
// whose construction is blocked.
flat_mutation_reader make_restricted_flat_reader(const restricted_mutation_reader_config& config,
flat_mutation_reader make_restricted_flat_reader(reader_concurrency_semaphore& semaphore,
mutation_source ms,
schema_ptr s,
const dht::partition_range& range,
@@ -498,12 +552,12 @@ flat_mutation_reader make_restricted_flat_reader(const restricted_mutation_reade
streamed_mutation::forwarding fwd = streamed_mutation::forwarding::no,
mutation_reader::forwarding fwd_mr = mutation_reader::forwarding::yes);
inline flat_mutation_reader make_restricted_flat_reader(const restricted_mutation_reader_config& config,
inline flat_mutation_reader make_restricted_flat_reader(reader_concurrency_semaphore& semaphore,
mutation_source ms,
schema_ptr s,
const dht::partition_range& range = query::full_partition_range) {
auto& full_slice = s->full_slice();
return make_restricted_flat_reader(config, std::move(ms), std::move(s), range, full_slice);
return make_restricted_flat_reader(semaphore, std::move(ms), std::move(s), range, full_slice);
}
template<>


@@ -259,6 +259,11 @@ public:
return is_partition_end() || (_ck && _ck->is_empty(s) && _bound_weight > 0);
}
bool is_before_all_clustered_rows(const schema& s) const {
return _type < partition_region::clustered
|| (_type == partition_region::clustered && _ck->is_empty(s) && _bound_weight < 0);
}
template<typename Hasher>
void feed_hash(Hasher& hasher, const schema& s) const {
::feed_hash(hasher, _bound_weight);


@@ -0,0 +1,204 @@
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* Copyright (C) 2017 ScyllaDB
*/
#pragma once
#include <core/file.hh>
#include <core/semaphore.hh>
/// Specific semaphore for controlling reader concurrency
///
/// Before creating a reader one should obtain a permit by calling
/// `wait_admission()`. This permit can then be used for tracking the
/// reader's memory consumption via `reader_resource_tracker`.
/// The permit should be held onto for the lifetime of the reader
/// and/or any buffers it is tracking.
/// Reader concurrency is limited both by count and by memory.
/// The semaphore can be configured with the desired limits on
/// construction. New readers are only admitted when both enough
/// count units and enough memory units are available. Readers are
/// admitted in FIFO order.
/// It's possible to specify the maximum allowed number of waiting
/// readers via the `max_queue_length` constructor parameter. When the
/// number of waiting readers would be equal to or greater than this
/// number (on a call to `wait_admission()`) an exception is thrown.
/// The exception type, and optionally some additional code to run
/// when this happens, can be customized via the
/// `raise_queue_overloaded_exception` constructor parameter. This
/// function is called every time the queue limit is surpassed.
/// It is expected to return an `std::exception_ptr` that will be
/// injected into the future.
class reader_concurrency_semaphore {
public:
using timeout_clock = lowres_clock;
static const timeout_clock::duration no_timeout;
struct resources {
int count = 0;
ssize_t memory = 0;
resources() = default;
resources(int count, ssize_t memory)
: count(count)
, memory(memory) {
}
bool operator>=(const resources& other) const {
return count >= other.count && memory >= other.memory;
}
resources& operator-=(const resources& other) {
count -= other.count;
memory -= other.memory;
return *this;
}
resources& operator+=(const resources& other) {
count += other.count;
memory += other.memory;
return *this;
}
explicit operator bool() const {
return count >= 0 && memory >= 0;
}
};
class reader_permit {
reader_concurrency_semaphore& _semaphore;
const resources _base_cost;
public:
reader_permit(reader_concurrency_semaphore& semaphore, resources base_cost)
: _semaphore(semaphore)
, _base_cost(base_cost) {
}
~reader_permit() {
_semaphore.signal(_base_cost);
}
reader_permit(const reader_permit&) = delete;
reader_permit& operator=(const reader_permit&) = delete;
reader_permit(reader_permit&& other) = delete;
reader_permit& operator=(reader_permit&& other) = delete;
void consume_memory(size_t memory) {
_semaphore.consume_memory(memory);
}
void signal_memory(size_t memory) {
_semaphore.signal_memory(memory);
}
};
private:
static std::exception_ptr default_make_queue_overloaded_exception() {
return std::make_exception_ptr(std::runtime_error("restricted mutation reader queue overload"));
}
resources _resources;
struct entry {
promise<lw_shared_ptr<reader_permit>> pr;
resources res;
entry(promise<lw_shared_ptr<reader_permit>>&& pr, resources r) : pr(std::move(pr)), res(r) {}
};
struct expiry_handler {
void operator()(entry& e) noexcept {
e.pr.set_exception(semaphore_timed_out());
}
};
expiring_fifo<entry, expiry_handler, timeout_clock> _wait_list;
timeout_clock::duration _timeout;
size_t _max_queue_length = std::numeric_limits<size_t>::max();
std::function<std::exception_ptr()> _make_queue_overloaded_exception = default_make_queue_overloaded_exception;
bool has_available_units(const resources& r) const {
return bool(_resources) && _resources >= r;
}
bool may_proceed(const resources& r) const {
return has_available_units(r) && _wait_list.empty();
}
void consume_memory(size_t memory) {
_resources.memory -= memory;
}
void signal(const resources& r);
void signal_memory(size_t memory) {
signal(resources(0, static_cast<ssize_t>(memory)));
}
public:
reader_concurrency_semaphore(unsigned count,
size_t memory,
timeout_clock::duration timeout = no_timeout,
size_t max_queue_length = std::numeric_limits<size_t>::max(),
std::function<std::exception_ptr()> raise_queue_overloaded_exception = default_make_queue_overloaded_exception)
: _resources(count, memory)
, _timeout(timeout)
, _max_queue_length(max_queue_length)
, _make_queue_overloaded_exception(raise_queue_overloaded_exception) {
}
reader_concurrency_semaphore(const reader_concurrency_semaphore&) = delete;
reader_concurrency_semaphore& operator=(const reader_concurrency_semaphore&) = delete;
reader_concurrency_semaphore(reader_concurrency_semaphore&&) = delete;
reader_concurrency_semaphore& operator=(reader_concurrency_semaphore&&) = delete;
future<lw_shared_ptr<reader_permit>> wait_admission(size_t memory);
const resources available_resources() const {
return _resources;
}
size_t waiters() const {
return _wait_list.size();
}
};
class reader_resource_tracker {
lw_shared_ptr<reader_concurrency_semaphore::reader_permit> _permit;
public:
reader_resource_tracker() = default;
explicit reader_resource_tracker(lw_shared_ptr<reader_concurrency_semaphore::reader_permit> permit)
: _permit(std::move(permit)) {
}
bool operator==(const reader_resource_tracker& other) const {
return _permit == other._permit;
}
file track(file f) const;
lw_shared_ptr<reader_concurrency_semaphore::reader_permit> get_permit() const {
return _permit;
}
};
inline reader_resource_tracker no_resource_tracking() {
return {};
}
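The dual count-plus-memory admission rule implemented by `resources::operator>=` and `wait_admission()` above can be sketched as a simplified model (Python rather than the actual C++; `try_admit` and the synchronous return value are illustrative — the real semaphore queues waiters and returns a future):

```python
class Resources:
    """Mirror of the C++ resources struct: admission needs both units."""
    def __init__(self, count, memory):
        self.count = count
        self.memory = memory

    def __ge__(self, other):
        return self.count >= other.count and self.memory >= other.memory

    def __isub__(self, other):
        self.count -= other.count
        self.memory -= other.memory
        return self


class ReaderSemaphore:
    NEW_READER_BASE_COST = 16 * 1024  # matches new_reader_base_cost

    def __init__(self, count, memory):
        self._resources = Resources(count, memory)

    def try_admit(self, memory=NEW_READER_BASE_COST):
        # One count unit plus the reader's base memory cost.
        cost = Resources(1, memory)
        if self._resources >= cost:
            self._resources -= cost
            return True
        return False  # the real implementation would queue the waiter


sem = ReaderSemaphore(count=2, memory=32 * 1024)
admitted = [sem.try_admit() for _ in range(3)]  # third reader is refused
```

With two count units and 32 KiB of memory, two 16 KiB readers are admitted and the third must wait, regardless of which of the two limits runs out first.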


@@ -1,48 +0,0 @@
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* Copyright (C) 2017 ScyllaDB
*/
#pragma once
#include <core/file.hh>
#include <core/semaphore.hh>
class reader_resource_tracker {
seastar::semaphore* _sem = nullptr;
public:
reader_resource_tracker() = default;
explicit reader_resource_tracker(seastar::semaphore* sem)
: _sem(sem) {
}
bool operator==(const reader_resource_tracker& other) const {
return _sem == other._sem;
}
file track(file f) const;
semaphore* get_semaphore() const {
return _sem;
}
};
inline reader_resource_tracker no_resource_tracking() {
return reader_resource_tracker(nullptr);
}


@@ -87,7 +87,7 @@ def get_repo_file(dir):
for name in files:
with open(name, 'r') as myfile:
for line in myfile:
match = re.search(".*http.?://.*/scylladb/([^/\s]+)/deb/([^/\s]+)\s.*", line)
match = re.search(".*http.?://.*/scylladb/([^/\s]+)/deb/([^/\s]+)[\s/].*", line)
if match:
return match.group(2), match.group(1)
match = re.search(".*http.?://.*/scylladb/([^/]+)/rpm/[^/]+/([^/\s]+)/.*", line)
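The one-character widening above (`\s` to `[\s/]`) matters because the old pattern only matched when whitespace immediately followed the second path component, so repo URLs whose path continues with a `/` were silently skipped. A small check with hypothetical repo-file lines (the URLs are illustrative, only the two patterns are taken verbatim from the diff):

```python
import re

old_pat = r".*http.?://.*/scylladb/([^/\s]+)/deb/([^/\s]+)\s.*"
new_pat = r".*http.?://.*/scylladb/([^/\s]+)/deb/([^/\s]+)[\s/].*"

# Hypothetical repo lines; the second continues the path with '/'.
line_space = "deb http://downloads.scylladb.com/scylladb/scylla/deb/unstable main"
line_slash = "deb http://downloads.scylladb.com/scylladb/scylla/deb/unstable/extra main"

assert re.search(old_pat, line_space)        # old pattern: OK
assert not re.search(old_pat, line_slash)    # old pattern: misses this line
assert re.search(new_pat, line_slash)        # new pattern: matches both
```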

Submodule seastar updated: af1b789855...c89c8b8043


@@ -144,7 +144,11 @@ future<lowres_clock::duration> cache_hitrate_calculator::recalculate_hitrates()
return _db.invoke_on_all([this, rates = std::move(rates), cpuid = engine().cpu_id()] (database& db) {
sstring gstate;
for (auto& cf : db.get_column_families() | boost::adaptors::filtered(non_system_filter)) {
stat s = rates.at(cf.first);
auto it = rates.find(cf.first);
if (it == rates.end()) { // a table may be added before map/reduce completes and this code runs
continue;
}
stat s = it->second;
float rate = 0;
if (s.h) {
rate = s.h / (s.h + s.m);
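The hunk above replaces `rates.at(cf.first)`, which throws when the key is absent, with a `find()` plus an explicit skip: a table created after the map/reduce snapshot was taken simply has no entry yet. The same lookup-and-skip shape in a simplified model (Python; table names and stats are illustrative):

```python
rates = {"ks.t1": (8, 2)}        # hits, misses per table (snapshot)
tables = ["ks.t1", "ks.t2"]      # ks.t2 was created after the snapshot

hitrates = {}
for name in tables:
    stat = rates.get(name)       # the find() in the C++ code
    if stat is None:             # rates.end(): table added concurrently
        continue                 # previously: rates.at(name) -> exception
    h, m = stat
    hitrates[name] = h / (h + m) if h else 0.0
```

After the loop `hitrates` holds only the tables present in the snapshot, instead of aborting the whole recalculation on the first missing one.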


@@ -1011,6 +1011,7 @@ void storage_service::on_change(inet_address endpoint, application_state state,
boost::split(pieces, value.value, boost::is_any_of(sstring(versioned_value::DELIMITER_STR)));
if (pieces.empty()) {
slogger.warn("Fail to split status in on_change: endpoint={}, app_state={}, value={}", endpoint, state, value);
return;
}
sstring move_name = pieces[0];
if (move_name == sstring(versioned_value::STATUS_BOOTSTRAPPING)) {


@@ -27,7 +27,7 @@
#include "core/temporary_buffer.hh"
#include "consumer.hh"
#include "sstables/types.hh"
#include "reader_resource_tracker.hh"
#include "reader_concurrency_semaphore.hh"
// sstables::data_consume_row feeds the contents of a single row into a
// row_consumer object:


@@ -175,10 +175,10 @@ private:
bool _complete_sent = false;
bool _received_failed_complete_message = false;
// If the session is idle for 300 minutes, close the session
std::chrono::seconds _keep_alive_timeout{60 * 300};
// Check every 10 minutes
std::chrono::seconds _keep_alive_interval{60 * 10};
// If the session is idle for 10 minutes, close the session
std::chrono::seconds _keep_alive_timeout{60 * 10};
// Check every minute
std::chrono::seconds _keep_alive_interval{60};
timer<lowres_clock> _keep_alive;
stream_bytes _last_stream_bytes;
lowres_clock::time_point _last_stream_progress;


@@ -148,9 +148,9 @@ if __name__ == "__main__":
for mode in modes_to_run:
prefix = os.path.join('build', mode, 'tests')
for test in other_tests:
test_to_run.append((os.path.join(prefix, test), 'other'))
test_to_run.append((os.path.join(prefix, test), 'other', '-c2 -m4G'.split()))
for test in boost_tests:
test_to_run.append((os.path.join(prefix, test), 'boost'))
test_to_run.append((os.path.join(prefix, test), 'boost', '-c2 -m4G'.split()))
if 'release' in modes_to_run:
test_to_run.append(('build/release/tests/lsa_async_eviction_test', 'other',


@@ -1395,14 +1395,14 @@ SEASTAR_TEST_CASE(test_ttl) {
{{"p1", utf8_type}}, {}, {{"r1", utf8_type}, {"r2", utf8_type}, {"r3", make_my_list_type()}}, {}, utf8_type);
}).then([&e] {
return e.execute_cql(
"update cf using ttl 1000 set r1 = 'value1_1', r3 = ['a', 'b', 'c'] where p1 = 'key1';").discard_result();
"update cf using ttl 100000 set r1 = 'value1_1', r3 = ['a', 'b', 'c'] where p1 = 'key1';").discard_result();
}).then([&e] {
return e.execute_cql(
"update cf using ttl 1 set r1 = 'value1_3', r3 = ['a', 'b', 'c'] where p1 = 'key3';").discard_result();
"update cf using ttl 100 set r1 = 'value1_3', r3 = ['a', 'b', 'c'] where p1 = 'key3';").discard_result();
}).then([&e] {
return e.execute_cql("update cf using ttl 1 set r3[1] = 'b' where p1 = 'key1';").discard_result();
return e.execute_cql("update cf using ttl 100 set r3[1] = 'b' where p1 = 'key1';").discard_result();
}).then([&e] {
return e.execute_cql("update cf using ttl 1 set r1 = 'value1_2' where p1 = 'key2';").discard_result();
return e.execute_cql("update cf using ttl 100 set r1 = 'value1_2' where p1 = 'key2';").discard_result();
}).then([&e] {
return e.execute_cql("insert into cf (p1, r2) values ('key2', 'value2_2');").discard_result();
}).then([&e, my_list_type] {
@@ -1420,7 +1420,7 @@ SEASTAR_TEST_CASE(test_ttl) {
});
});
}).then([&e] {
forward_jump_clocks(2s);
forward_jump_clocks(200s);
return e.execute_cql("select r1, r2 from cf;").then([](auto msg) {
assert_that(msg).is_rows().with_size(2)
.with_row({{}, utf8_type->decompose(sstring("value2_2"))})
@@ -1448,7 +1448,7 @@ SEASTAR_TEST_CASE(test_ttl) {
}).then([&e] {
return e.execute_cql("create table cf2 (p1 text PRIMARY KEY, r1 text, r2 text);").discard_result();
}).then([&e] {
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 5;").discard_result();
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 500;").discard_result();
}).then([&e] {
return e.execute_cql("select p1, r1 from cf2 where p1 = 'foo';").then([] (auto msg) {
assert_that(msg).is_rows().with_rows({
@@ -1456,7 +1456,7 @@ SEASTAR_TEST_CASE(test_ttl) {
});
});
}).then([&e] {
forward_jump_clocks(6s);
forward_jump_clocks(600s);
return e.execute_cql("select p1, r1 from cf2 where p1 = 'foo';").then([] (auto msg) {
assert_that(msg).is_rows().with_rows({ });
});
@@ -1471,7 +1471,7 @@ SEASTAR_TEST_CASE(test_ttl) {
});
});
}).then([&e] {
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 5;").discard_result();
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 500;").discard_result();
}).then([&e] {
return e.execute_cql("update cf2 set r1 = null where p1 = 'foo';").discard_result();
}).then([&e] {
@@ -1481,16 +1481,16 @@ SEASTAR_TEST_CASE(test_ttl) {
});
});
}).then([&e] {
forward_jump_clocks(6s);
forward_jump_clocks(600s);
return e.execute_cql("select p1, r1 from cf2 where p1 = 'foo';").then([] (auto msg) {
assert_that(msg).is_rows().with_rows({ });
});
}).then([&e] {
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 5;").discard_result();
return e.execute_cql("insert into cf2 (p1, r1) values ('foo', 'bar') using ttl 500;").discard_result();
}).then([&e] {
return e.execute_cql("insert into cf2 (p1, r2) values ('foo', null);").discard_result();
}).then([&e] {
forward_jump_clocks(6s);
forward_jump_clocks(600s);
return e.execute_cql("select p1, r1 from cf2 where p1 = 'foo';").then([] (auto msg) {
assert_that(msg).is_rows().with_rows({
{utf8_type->decompose(sstring("foo")), { }}
@@ -1987,10 +1987,9 @@ SEASTAR_TEST_CASE(test_in_restriction) {
assert_that(msg).is_rows().with_size(0);
return e.execute_cql("select r1 from tir where p1 in (2, 0, 2, 1);");
}).then([&e] (auto msg) {
assert_that(msg).is_rows().with_rows({
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(4)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
@@ -2012,6 +2011,22 @@ SEASTAR_TEST_CASE(test_in_restriction) {
{int32_type->decompose(2)},
{int32_type->decompose(1)},
});
return e.prepare("select r1 from tir where p1 in ?");
}).then([&e] (cql3::prepared_cache_key_type prepared_id){
auto my_list_type = list_type_impl::get_instance(int32_type, true);
std::vector<cql3::raw_value> raw_values;
auto in_values_list = my_list_type->decompose(make_list_value(my_list_type,
list_type_impl::native_type{{int(2), int(0), int(2), int(1)}}));
raw_values.emplace_back(cql3::raw_value::make_value(in_values_list));
return e.execute_prepared(prepared_id,raw_values);
}).then([&e] (shared_ptr<cql_transport::messages::result_message> msg) {
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
});
});
});
}
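The bug this test pins down (see the commit message for #2837): each value in a single-column IN list fans out to an executor for that partition, so a duplicated value produced the same row more than once. Conceptually the fix deduplicates the values before dispatch, which is what the `with_rows_ignore_order` expectations above assume. A simplified model (Python; the table contents are illustrative, not the test's schema):

```python
table = {0: 4, 1: 2, 2: 0}        # p1 -> r1 (illustrative data)

def select_in(values, dedup):
    # dict.fromkeys dedups while preserving first-seen order.
    keys = list(dict.fromkeys(values)) if dedup else values
    return [table[v] for v in keys if v in table]

buggy = select_in([2, 0, 2, 1], dedup=False)   # row for p1=2 appears twice
fixed = select_in([2, 0, 2, 1], dedup=True)    # one row per matching partition
```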
@@ -2479,3 +2494,66 @@ SEASTAR_TEST_CASE(test_secondary_index_query) {
});
});
}
SEASTAR_TEST_CASE(test_static_multi_cell_static_lists_with_ckey) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("CREATE TABLE t (p int, c int, slist list<int> static, v int, PRIMARY KEY (p, c));").get();
e.execute_cql("INSERT INTO t (p, c, slist, v) VALUES (1, 1, [1], 1); ").get();
{
e.execute_cql("UPDATE t SET slist[0] = 3, v = 3 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{3}}))) },
{ int32_type->decompose(3) }
});
}
{
e.execute_cql("UPDATE t SET slist = [4], v = 4 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{4}}))) },
{ int32_type->decompose(4) }
});
}
{
e.execute_cql("UPDATE t SET slist = [3] + slist , v = 5 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(5) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist + [5] , v = 6 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4, 5}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("DELETE slist[2] from t WHERE p = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist - [4] , v = 7 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3}))) },
{ int32_type->decompose(7) }
});
}
});
}


@@ -29,6 +29,9 @@
#include "database.hh"
#include "partition_slice_builder.hh"
#include "frozen_mutation.hh"
#include "mutation_source_test.hh"
#include "schema_registry.hh"
#include "service/migration_manager.hh"
#include "disk-error-handler.hh"
@@ -79,3 +82,33 @@ SEASTAR_TEST_CASE(test_querying_with_limits) {
});
});
}
SEASTAR_TEST_CASE(test_database_with_data_in_sstables_is_a_mutation_source) {
return do_with_cql_env_thread([] (cql_test_env& e) {
run_mutation_source_tests([&] (schema_ptr s, const std::vector<mutation>& partitions) -> mutation_source {
try {
e.local_db().find_column_family(s->ks_name(), s->cf_name());
service::get_local_migration_manager().announce_column_family_drop(s->ks_name(), s->cf_name(), true).get();
} catch (const no_such_column_family&) {
// expected
}
service::get_local_migration_manager().announce_new_column_family(s, true).get();
column_family& cf = e.local_db().find_column_family(s);
for (auto&& m : partitions) {
e.local_db().apply(cf.schema(), freeze(m)).get();
}
cf.flush().get();
cf.get_row_cache().invalidate([] {}).get();
return mutation_source([&] (schema_ptr s,
const dht::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return cf.make_reader(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr);
});
});
return make_ready_future<>();
});
}


@@ -26,11 +26,13 @@
#include <boost/test/unit_test.hpp>
#include <query-result-set.hh>
#include <query-result-writer.hh>
#include "tests/test_services.hh"
#include "tests/test-utils.hh"
#include "tests/mutation_assertions.hh"
#include "tests/result_set_assertions.hh"
#include "tests/mutation_source_test.hh"
#include "mutation_query.hh"
#include "core/do_with.hh"
@@ -530,3 +532,22 @@ SEASTAR_TEST_CASE(test_partition_limit) {
}
});
}
SEASTAR_THREAD_TEST_CASE(test_result_size_calculation) {
random_mutation_generator gen(random_mutation_generator::generate_counters::no);
std::vector<mutation> mutations = gen(1);
schema_ptr s = gen.schema();
mutation_source source = make_source(std::move(mutations));
query::result_memory_limiter l(std::numeric_limits<ssize_t>::max());
query::partition_slice slice = make_full_slice(*s);
slice.options.set<query::partition_slice::option::allow_short_read>();
query::result::builder digest_only_builder(slice, query::result_options{query::result_request::only_digest, query::digest_algorithm::xxHash}, l.new_digest_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), digest_only_builder).get0();
query::result::builder result_and_digest_builder(slice, query::result_options{query::result_request::result_and_digest, query::digest_algorithm::xxHash}, l.new_data_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), result_and_digest_builder).get0();
BOOST_REQUIRE_EQUAL(digest_only_builder.memory_accounter().used_memory(), result_and_digest_builder.memory_accounter().used_memory());
}


@@ -753,19 +753,18 @@ class tracking_reader : public flat_mutation_reader::impl {
std::size_t _call_count{0};
std::size_t _ff_count{0};
public:
tracking_reader(semaphore* resources_sem, schema_ptr schema, lw_shared_ptr<sstables::sstable> sst)
tracking_reader(schema_ptr schema, lw_shared_ptr<sstables::sstable> sst, reader_resource_tracker tracker)
: impl(schema)
, _reader(sst->read_range_rows_flat(
schema,
query::full_partition_range,
schema->full_slice(),
default_priority_class(),
reader_resource_tracker(resources_sem),
tracker,
streamed_mutation::forwarding::no,
mutation_reader::forwarding::yes)) {
}
virtual future<> fill_buffer() override {
++_call_count;
return _reader.fill_buffer().then([this] {
@@ -811,16 +810,25 @@ class reader_wrapper {
public:
reader_wrapper(
const restricted_mutation_reader_config& config,
reader_concurrency_semaphore& semaphore,
schema_ptr schema,
lw_shared_ptr<sstables::sstable> sst) : _reader(make_empty_flat_reader(schema)) {
auto ms = mutation_source([this, &config, sst=std::move(sst)] (schema_ptr schema, const dht::partition_range&, auto&&...) {
auto tracker_ptr = std::make_unique<tracking_reader>(config.resources_sem, std::move(schema), std::move(sst));
lw_shared_ptr<sstables::sstable> sst)
: _reader(make_empty_flat_reader(schema))
{
auto ms = mutation_source([this, sst=std::move(sst)] (schema_ptr schema,
const dht::partition_range&,
const query::partition_slice&,
const io_priority_class&,
tracing::trace_state_ptr,
streamed_mutation::forwarding,
mutation_reader::forwarding,
reader_resource_tracker res_tracker) {
auto tracker_ptr = std::make_unique<tracking_reader>(std::move(schema), std::move(sst), res_tracker);
_tracker = tracker_ptr.get();
return flat_mutation_reader(std::move(tracker_ptr));
});
_reader = make_restricted_flat_reader(config, std::move(ms), schema);
_reader = make_restricted_flat_reader(semaphore, std::move(ms), schema);
}
future<> operator()() {
@@ -847,21 +855,6 @@ public:
}
};
struct restriction_data {
std::unique_ptr<semaphore> reader_semaphore;
restricted_mutation_reader_config config;
restriction_data(std::size_t units,
std::chrono::nanoseconds timeout = {},
std::size_t max_queue_length = std::numeric_limits<std::size_t>::max())
: reader_semaphore(std::make_unique<semaphore>(units)) {
config.resources_sem = reader_semaphore.get();
config.timeout = timeout;
config.max_queue_length = max_queue_length;
}
};
class dummy_file_impl : public file_impl {
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return make_ready_future<size_t>(0);
@@ -922,41 +915,43 @@ class dummy_file_impl : public file_impl {
SEASTAR_TEST_CASE(reader_restriction_file_tracking) {
return async([&] {
restriction_data rd(4 * 1024);
reader_concurrency_semaphore semaphore(100, 4 * 1024);
// Testing the tracker here, no need to have a base cost.
auto permit = semaphore.wait_admission(0).get0();
{
reader_resource_tracker resource_tracker(rd.config.resources_sem);
reader_resource_tracker resource_tracker(permit);
auto tracked_file = resource_tracker.track(
file(shared_ptr<file_impl>(make_shared<dummy_file_impl>())));
BOOST_REQUIRE_EQUAL(4 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(4 * 1024, semaphore.available_resources().memory);
auto buf1 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(3 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(3 * 1024, semaphore.available_resources().memory);
auto buf2 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(2 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(2 * 1024, semaphore.available_resources().memory);
auto buf3 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(1 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(1 * 1024, semaphore.available_resources().memory);
auto buf4 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(0 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(0 * 1024, semaphore.available_resources().memory);
auto buf5 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(-1 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(-1 * 1024, semaphore.available_resources().memory);
// Reassign buf1; we should still have the same number of units.
buf1 = tracked_file.dma_read_bulk<char>(0, 0).get0();
BOOST_REQUIRE_EQUAL(-1 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(-1 * 1024, semaphore.available_resources().memory);
// Move buf1 to the heap, so that we can safely destroy it
auto buf1_ptr = std::make_unique<temporary_buffer<char>>(std::move(buf1));
BOOST_REQUIRE_EQUAL(-1 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(-1 * 1024, semaphore.available_resources().memory);
buf1_ptr.reset();
BOOST_REQUIRE_EQUAL(0 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(0 * 1024, semaphore.available_resources().memory);
// Move tracked_file to the heap, so that we can safely destroy it.
auto tracked_file_ptr = std::make_unique<file>(std::move(tracked_file));
@@ -964,126 +959,188 @@ SEASTAR_TEST_CASE(reader_restriction_file_tracking) {
// Move buf4 to the heap, so that we can safely destroy it
auto buf4_ptr = std::make_unique<temporary_buffer<char>>(std::move(buf4));
BOOST_REQUIRE_EQUAL(0 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(0 * 1024, semaphore.available_resources().memory);
// Releasing buffers that overlived the tracked-file they
// originated from should succeed.
buf4_ptr.reset();
BOOST_REQUIRE_EQUAL(1 * 1024, rd.reader_semaphore->available_units());
BOOST_REQUIRE_EQUAL(1 * 1024, semaphore.available_resources().memory);
}
// All units should have been deposited back.
REQUIRE_EVENTUALLY_EQUAL(4 * 1024, rd.reader_semaphore->available_units());
REQUIRE_EVENTUALLY_EQUAL(4 * 1024, semaphore.available_resources().memory);
});
}
SEASTAR_TEST_CASE(restricted_reader_reading) {
return async([&] {
storage_service_for_tests ssft;
restriction_data rd(new_reader_base_cost);
reader_concurrency_semaphore semaphore(2, new_reader_base_cost);
{
simple_schema s;
auto tmp = make_lw_shared<tmpdir>();
auto sst = create_sstable(s, tmp->path);
auto reader1 = reader_wrapper(rd.config, s.schema(), sst);
auto reader1 = reader_wrapper(semaphore, s.schema(), sst);
reader1().get();
BOOST_REQUIRE_LE(rd.reader_semaphore->available_units(), 0);
BOOST_REQUIRE_LE(semaphore.available_resources().count, 1);
BOOST_REQUIRE_LE(semaphore.available_resources().memory, 0);
BOOST_REQUIRE_EQUAL(reader1.call_count(), 1);
auto reader2 = reader_wrapper(rd.config, s.schema(), sst);
auto read_fut = reader2();
auto reader2 = reader_wrapper(semaphore, s.schema(), sst);
auto read2_fut = reader2();
// reader2 shouldn't be allowed just yet.
// reader2 shouldn't be allowed yet
BOOST_REQUIRE_EQUAL(reader2.call_count(), 0);
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 1);
auto reader3 = reader_wrapper(semaphore, s.schema(), sst);
auto read3_fut = reader3();
// reader3 shouldn't be allowed yet
BOOST_REQUIRE_EQUAL(reader3.call_count(), 0);
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 2);
// Move reader1 to the heap, so that we can safely destroy it.
auto reader1_ptr = std::make_unique<reader_wrapper>(std::move(reader1));
reader1_ptr.reset();
// reader1's destruction should've made some space for reader2 by now.
// reader1's destruction should've freed up enough memory for
// reader2 by now.
REQUIRE_EVENTUALLY_EQUAL(reader2.call_count(), 1);
read_fut.get();
read2_fut.get();
// But reader3 should still not be allowed
BOOST_REQUIRE_EQUAL(reader3.call_count(), 0);
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 1);
// Move reader2 to the heap, so that we can safely destroy it.
auto reader2_ptr = std::make_unique<reader_wrapper>(std::move(reader2));
reader2_ptr.reset();
// Again, reader2's destruction should've freed up enough memory
// for reader3 by now.
REQUIRE_EVENTUALLY_EQUAL(reader3.call_count(), 1);
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 0);
read3_fut.get();
{
// Consume all available units.
const auto consume_guard = consume_units(*rd.reader_semaphore, rd.reader_semaphore->current());
BOOST_REQUIRE_LE(semaphore.available_resources().memory, 0);
// Already allowed readers should not be blocked anymore even if
// there are no more units available.
read_fut = reader2();
BOOST_REQUIRE_EQUAL(reader2.call_count(), 2);
read_fut.get();
read3_fut = reader3();
BOOST_REQUIRE_EQUAL(reader3.call_count(), 2);
read3_fut.get();
}
}
// All units should have been deposited back.
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, rd.reader_semaphore->available_units());
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, semaphore.available_resources().memory);
});
}
SEASTAR_TEST_CASE(restricted_reader_timeout) {
using namespace std::chrono_literals;
return async([&] {
storage_service_for_tests ssft;
restriction_data rd(new_reader_base_cost, std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::milliseconds{10}));
reader_concurrency_semaphore semaphore(2,
new_reader_base_cost,
std::chrono::duration_cast<reader_concurrency_semaphore::timeout_clock::duration>(10ms));
{
simple_schema s;
auto tmp = make_lw_shared<tmpdir>();
auto sst = create_sstable(s, tmp->path);
auto reader1 = reader_wrapper(rd.config, s.schema(), sst);
auto reader1 = reader_wrapper(semaphore, s.schema(), sst);
reader1().get();
auto reader2 = reader_wrapper(rd.config, s.schema(), sst);
auto read_fut = reader2();
auto reader2 = reader_wrapper(semaphore, s.schema(), sst);
auto read2_fut = reader2();
auto reader3 = reader_wrapper(semaphore, s.schema(), sst);
auto read3_fut = reader3();
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 2);
seastar::sleep(std::chrono::milliseconds(20)).get();
// The read should have timed out.
BOOST_REQUIRE(read_fut.failed());
BOOST_REQUIRE_THROW(std::rethrow_exception(read_fut.get_exception()), semaphore_timed_out);
// Although we have regular BOOST_REQUIREs for this below, if
// the test goes wrong these futures will still be pending
// when we leave scope and deleted memory will be accessed.
// To stop people from trying to debug a failing test, we
// assert here so they know this really is just the test
// failing and the underlying problem is that the timeout
// doesn't work.
assert(read2_fut.failed());
assert(read3_fut.failed());
// reader2 should have timed out.
BOOST_REQUIRE(read2_fut.failed());
BOOST_REQUIRE_THROW(std::rethrow_exception(read2_fut.get_exception()), semaphore_timed_out);
// reader3 should have timed out.
BOOST_REQUIRE(read3_fut.failed());
BOOST_REQUIRE_THROW(std::rethrow_exception(read3_fut.get_exception()), semaphore_timed_out);
}
// All units should have been deposited back.
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, rd.reader_semaphore->available_units());
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, semaphore.available_resources().memory);
});
}
SEASTAR_TEST_CASE(restricted_reader_max_queue_length) {
return async([&] {
storage_service_for_tests ssft;
restriction_data rd(new_reader_base_cost, {}, 1);
struct queue_overloaded_exception {};
reader_concurrency_semaphore semaphore(2,
new_reader_base_cost,
reader_concurrency_semaphore::no_timeout,
2,
[] { return std::make_exception_ptr(queue_overloaded_exception()); });
{
simple_schema s;
auto tmp = make_lw_shared<tmpdir>();
auto sst = create_sstable(s, tmp->path);
auto reader1_ptr = std::make_unique<reader_wrapper>(rd.config, s.schema(), sst);
auto reader1_ptr = std::make_unique<reader_wrapper>(semaphore, s.schema(), sst);
(*reader1_ptr)().get();
auto reader2_ptr = std::make_unique<reader_wrapper>(rd.config, s.schema(), sst);
auto read_fut = (*reader2_ptr)();
auto reader2_ptr = std::make_unique<reader_wrapper>(semaphore, s.schema(), sst);
auto read2_fut = (*reader2_ptr)();
auto reader3_ptr = std::make_unique<reader_wrapper>(semaphore, s.schema(), sst);
auto read3_fut = (*reader3_ptr)();
auto reader4 = reader_wrapper(semaphore, s.schema(), sst);
BOOST_REQUIRE_EQUAL(semaphore.waiters(), 2);
// The queue should now be full.
BOOST_REQUIRE_THROW(reader_wrapper(rd.config, s.schema(), sst), std::runtime_error);
BOOST_REQUIRE_THROW(reader4().get(), queue_overloaded_exception);
reader1_ptr.reset();
read_fut.get();
read2_fut.get();
reader2_ptr.reset();
read3_fut.get();
}
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, rd.reader_semaphore->available_units());
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, semaphore.available_resources().memory);
});
}
SEASTAR_TEST_CASE(restricted_reader_create_reader) {
return async([&] {
storage_service_for_tests ssft;
restriction_data rd(new_reader_base_cost);
reader_concurrency_semaphore semaphore(100, new_reader_base_cost);
{
simple_schema s;
@@ -1091,7 +1148,7 @@ SEASTAR_TEST_CASE(restricted_reader_create_reader) {
auto sst = create_sstable(s, tmp->path);
{
auto reader = reader_wrapper(rd.config, s.schema(), sst);
auto reader = reader_wrapper(semaphore, s.schema(), sst);
// This fast-forward is stupid, I know but the
// underlying dummy reader won't care, so it's fine.
reader.fast_forward_to(query::full_partition_range).get();
@@ -1102,7 +1159,7 @@ SEASTAR_TEST_CASE(restricted_reader_create_reader) {
}
{
auto reader = reader_wrapper(rd.config, s.schema(), sst);
auto reader = reader_wrapper(semaphore, s.schema(), sst);
reader().get();
BOOST_REQUIRE(reader.created());
@@ -1111,6 +1168,6 @@ SEASTAR_TEST_CASE(restricted_reader_create_reader) {
}
}
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, rd.reader_semaphore->available_units());
REQUIRE_EVENTUALLY_EQUAL(new_reader_base_cost, semaphore.available_resources().memory);
});
}


@@ -656,6 +656,46 @@ void test_streamed_mutation_fragments_have_monotonic_positions(populate_fn popul
});
}
static void test_date_tiered_clustering_slicing(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
simple_schema ss;
auto s = schema_builder(ss.schema())
.set_compaction_strategy(sstables::compaction_strategy_type::date_tiered)
.build();
auto pkey = ss.make_pkey();
mutation m1(pkey, s);
ss.add_static_row(m1, "s");
m1.partition().apply(ss.new_tombstone());
ss.add_row(m1, ss.make_ckey(0), "v1");
mutation_source ms = populate(s, {m1});
// query row outside the range of existing rows to exercise sstable clustering key filter
{
auto slice = partition_slice_builder(*s)
.with_range(ss.make_ckey_range(1, 2))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms(s, prange, slice))
.produces(m1, slice.row_ranges(*s, pkey.key()))
.produces_end_of_stream();
}
{
auto slice = partition_slice_builder(*s)
.with_range(query::clustering_range::make_singular(ss.make_ckey(0)))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms(s, prange, slice))
.produces(m1)
.produces_end_of_stream();
}
}
static void test_clustering_slices(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
auto s = schema_builder("ks", "cf")
@@ -819,6 +859,7 @@ static void test_query_only_static_row(populate_fn populate) {
auto pkeys = s.make_pkeys(1);
mutation m1(pkeys[0], s.schema());
m1.partition().apply(s.new_tombstone());
s.add_static_row(m1, "s1");
s.add_row(m1, s.make_ckey(0), "v1");
s.add_row(m1, s.make_ckey(1), "v2");
@@ -843,6 +884,59 @@ static void test_query_only_static_row(populate_fn populate) {
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
// query just a static row, single-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_singular(m1.decorated_key());
assert_that(ms(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
}
static void test_query_no_clustering_ranges_no_static_columns(populate_fn populate) {
simple_schema s(simple_schema::with_static::no);
auto pkeys = s.make_pkeys(1);
mutation m1(pkeys[0], s.schema());
m1.partition().apply(s.new_tombstone());
s.add_row(m1, s.make_ckey(0), "v1");
s.add_row(m1, s.make_ckey(1), "v2");
mutation_source ms = populate(s.schema(), {m1});
{
auto prange = dht::partition_range::make_ending_with(dht::ring_position(m1.decorated_key()));
assert_that(ms.make_flat_mutation_reader(s.schema(), prange, s.schema()->full_slice()))
.produces(m1)
.produces_end_of_stream();
}
// multi-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_ending_with(dht::ring_position(m1.decorated_key()));
assert_that(ms(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
// single-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_singular(m1.decorated_key());
assert_that(ms(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
}
void test_streamed_mutation_forwarding_succeeds_with_no_data(populate_fn populate) {
@@ -881,6 +975,7 @@ void test_streamed_mutation_forwarding_succeeds_with_no_data(populate_fn populat
}
void run_mutation_reader_tests(populate_fn populate) {
test_date_tiered_clustering_slicing(populate);
test_fast_forwarding_across_partitions_to_empty_range(populate);
test_clustering_slices(populate);
test_streamed_mutation_fragments_have_monotonic_positions(populate);
@@ -890,6 +985,7 @@ void run_mutation_reader_tests(populate_fn populate) {
test_streamed_mutation_forwarding_is_consistent_with_slicing(populate);
test_range_queries(populate);
test_query_only_static_row(populate);
test_query_no_clustering_ranges_no_static_columns(populate);
}
void run_conversion_to_mutation_reader_tests(populate_fn populate) {


@@ -485,7 +485,7 @@ SEASTAR_TEST_CASE(test_cache_delegates_to_underlying_only_once_multiple_mutation
test(ds, query::full_partition_range, partitions.size() + 1);
test(ds, query::full_partition_range, partitions.size() + 1);
cache->invalidate([] {}, key_after_all);
cache->invalidate([] {}, key_after_all).get();
assert_that(ds(s, query::full_partition_range))
.produces(slice(partitions, query::full_partition_range))


@@ -30,6 +30,7 @@
#include "mutation.hh"
#include "schema_builder.hh"
#include "streamed_mutation.hh"
#include "sstable_utils.hh"
// Helper for working with the following table:
//
@@ -47,11 +48,12 @@ public:
return {new_timestamp(), gc_clock::now()};
}
public:
simple_schema()
using with_static = bool_class<class static_tag>;
simple_schema(with_static ws = with_static::yes)
: _s(schema_builder("ks", "cf")
.with_column("pk", utf8_type, column_kind::partition_key)
.with_column("ck", utf8_type, column_kind::clustering_key)
.with_column("s1", utf8_type, column_kind::static_column)
.with_column("s1", utf8_type, ws ? column_kind::static_column : column_kind::regular_column)
.with_column("v", utf8_type)
.build())
, _v_def(*_s->get_column_definition(to_bytes("v")))
@@ -146,12 +148,14 @@ public:
// Creates a sequence of keys in ring order
std::vector<dht::decorated_key> make_pkeys(int n) {
std::vector<dht::decorated_key> keys;
for (int i = 0; i < n; ++i) {
keys.push_back(make_pkey(i));
}
std::sort(keys.begin(), keys.end(), dht::decorated_key::less_comparator(_s));
return keys;
auto local_keys = make_local_keys(n, _s);
return boost::copy_range<std::vector<dht::decorated_key>>(local_keys | boost::adaptors::transformed([this] (sstring& key) {
return make_pkey(std::move(key));
}));
}
dht::decorated_key make_pkey() {
return make_pkey(make_local_key(_s));
}
static std::vector<dht::ring_position> to_ring_positions(const std::vector<dht::decorated_key>& keys) {


@@ -19,6 +19,45 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "sstables/sstables.hh"
#include "dht/i_partitioner.hh"
#include <boost/range/irange.hpp>
#include <boost/range/adaptor/map.hpp>
sstables::shared_sstable make_sstable_containing(std::function<sstables::shared_sstable()> sst_factory, std::vector<mutation> muts);
//
// Make a set of keys sorted by token for the current shard.
//
static std::vector<sstring> make_local_keys(unsigned n, const schema_ptr& s, size_t min_key_size = 1) {
std::vector<std::pair<sstring, dht::decorated_key>> p;
p.reserve(n);
auto key_id = 0U;
auto generated = 0U;
while (generated < n) {
auto raw_key = sstring(std::max(min_key_size, sizeof(key_id)), int8_t(0));
std::copy_n(reinterpret_cast<int8_t*>(&key_id), sizeof(key_id), raw_key.begin());
auto dk = dht::global_partitioner().decorate_key(*s, partition_key::from_single_value(*s, to_bytes(raw_key)));
key_id++;
if (engine_is_ready() && engine().cpu_id() != dht::global_partitioner().shard_of(dk.token())) {
continue;
}
generated++;
p.emplace_back(std::move(raw_key), std::move(dk));
}
boost::sort(p, [&] (auto& p1, auto& p2) {
return p1.second.less_compare(*s, p2.second);
});
return boost::copy_range<std::vector<sstring>>(p | boost::adaptors::map_keys);
}
//
// Return one key for current shard. Note that it always returns the same key for a given shard.
//
inline sstring make_local_key(const schema_ptr& s, size_t min_key_size = 1) {
return make_local_keys(1, s, min_key_size).front();
}


@@ -178,6 +178,7 @@ SEASTAR_TEST_CASE(test_mutation_merger_conforms_to_mutation_source) {
muts.push_back(mutation(m.decorated_key(), m.schema()));
}
fragment_scatterer c{muts};
c.consume(m.partition().partition_tombstone());
auto sm = streamed_mutation_from_mutation(m);
do_consume_streamed_mutation_flattened(sm, c).get();
for (int i = 0; i < n; ++i) {


@@ -66,12 +66,12 @@ void cql_server::event_notifier::on_create_keyspace(const sstring& ks_name)
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::CREATED,
ks_name
}));
});
};
}
}
@@ -79,14 +79,14 @@ void cql_server::event_notifier::on_create_column_family(const sstring& ks_name,
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::CREATED,
event::schema_change::target_type::TABLE,
ks_name,
cf_name
}));
});
};
}
}
@@ -94,14 +94,14 @@ void cql_server::event_notifier::on_create_user_type(const sstring& ks_name, con
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::CREATED,
event::schema_change::target_type::TYPE,
ks_name,
type_name
}));
});
};
}
}
@@ -124,12 +124,12 @@ void cql_server::event_notifier::on_update_keyspace(const sstring& ks_name)
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::UPDATED,
ks_name
}));
});
};
}
}
@@ -137,14 +137,14 @@ void cql_server::event_notifier::on_update_column_family(const sstring& ks_name,
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::UPDATED,
event::schema_change::target_type::TABLE,
ks_name,
cf_name
}));
});
};
}
}
@@ -152,14 +152,14 @@ void cql_server::event_notifier::on_update_user_type(const sstring& ks_name, con
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::UPDATED,
event::schema_change::target_type::TYPE,
ks_name,
type_name
}));
});
};
}
}
@@ -182,12 +182,12 @@ void cql_server::event_notifier::on_drop_keyspace(const sstring& ks_name)
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::DROPPED,
ks_name
}));
});
};
}
}
@@ -195,14 +195,14 @@ void cql_server::event_notifier::on_drop_column_family(const sstring& ks_name, c
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::DROPPED,
event::schema_change::target_type::TABLE,
ks_name,
cf_name
}));
});
};
}
}
@@ -210,14 +210,14 @@ void cql_server::event_notifier::on_drop_user_type(const sstring& ks_name, const
{
for (auto&& conn : _schema_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_schema_change_event(event::schema_change{
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_schema_change_event(event::schema_change{
event::schema_change::change_type::DROPPED,
event::schema_change::target_type::TYPE,
ks_name,
type_name
}));
});
};
}
}
@@ -240,9 +240,9 @@ void cql_server::event_notifier::on_join_cluster(const gms::inet_address& endpoi
{
for (auto&& conn : _topology_change_listeners) {
using namespace cql_transport;
with_gate(conn->_pending_requests_gate, [&] {
return conn->write_response(conn->make_topology_change_event(event::topology_change::new_node(endpoint, conn->_server_addr.port)));
});
if (!conn->_pending_requests_gate.is_closed()) {
conn->write_response(conn->make_topology_change_event(event::topology_change::new_node(endpoint, conn->_server_addr.port)));
};
}
}
@@ -250,9 +250,9 @@ void cql_server::event_notifier::on_leave_cluster(const gms::inet_address& endpo
 {
     for (auto&& conn : _topology_change_listeners) {
         using namespace cql_transport;
-        with_gate(conn->_pending_requests_gate, [&] {
-            return conn->write_response(conn->make_topology_change_event(event::topology_change::removed_node(endpoint, conn->_server_addr.port)));
-        });
+        if (!conn->_pending_requests_gate.is_closed()) {
+            conn->write_response(conn->make_topology_change_event(event::topology_change::removed_node(endpoint, conn->_server_addr.port)));
+        };
     }
 }
@@ -260,9 +260,9 @@ void cql_server::event_notifier::on_move(const gms::inet_address& endpoint)
 {
     for (auto&& conn : _topology_change_listeners) {
         using namespace cql_transport;
-        with_gate(conn->_pending_requests_gate, [&] {
-            return conn->write_response(conn->make_topology_change_event(event::topology_change::moved_node(endpoint, conn->_server_addr.port)));
-        });
+        if (!conn->_pending_requests_gate.is_closed()) {
+            conn->write_response(conn->make_topology_change_event(event::topology_change::moved_node(endpoint, conn->_server_addr.port)));
+        };
     }
 }
@@ -273,9 +273,9 @@ void cql_server::event_notifier::on_up(const gms::inet_address& endpoint)
     if (!was_up) {
         for (auto&& conn : _status_change_listeners) {
             using namespace cql_transport;
-            with_gate(conn->_pending_requests_gate, [&] {
-                return conn->write_response(conn->make_status_change_event(event::status_change::node_up(endpoint, conn->_server_addr.port)));
-            });
+            if (!conn->_pending_requests_gate.is_closed()) {
+                conn->write_response(conn->make_status_change_event(event::status_change::node_up(endpoint, conn->_server_addr.port)));
+            };
         }
     }
 }
@@ -287,9 +287,9 @@ void cql_server::event_notifier::on_down(const gms::inet_address& endpoint)
    if (!was_down) {
        for (auto&& conn : _status_change_listeners) {
            using namespace cql_transport;
-           with_gate(conn->_pending_requests_gate, [&] {
-               return conn->write_response(conn->make_status_change_event(event::status_change::node_down(endpoint, conn->_server_addr.port)));
-           });
+           if (!conn->_pending_requests_gate.is_closed()) {
+               conn->write_response(conn->make_status_change_event(event::status_change::node_down(endpoint, conn->_server_addr.port)));
+           };
        }
    }
}
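Across all the hunks above the change is the same: event pushes no longer enter the connection's `_pending_requests_gate` via `with_gate` (which throws once the gate has been closed during connection teardown), but instead check `is_closed()` and silently skip the write. The behavioural difference can be sketched with a hypothetical toy gate (much simpler than the real `seastar::gate`, and not its actual API):

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Toy stand-in for seastar::gate (hypothetical): enter() fails once the
// gate is closed, is_closed() merely reports the state.
class toy_gate {
    int _count = 0;
    bool _closed = false;
public:
    struct closed_exception : std::runtime_error {
        closed_exception() : std::runtime_error("gate closed") {}
    };
    void enter() {
        if (_closed) {
            throw closed_exception();
        }
        ++_count;
    }
    void leave() { --_count; }
    void close() { _closed = true; }  // the real close() also waits for _count == 0
    bool is_closed() const { return _closed; }
};

// The pattern the old code used: run f inside the gate, throwing if closed.
template <typename Func>
void with_gate(toy_gate& g, Func&& f) {
    g.enter();  // raises closed_exception on a closed gate
    try {
        std::forward<Func>(f)();
    } catch (...) {
        g.leave();
        throw;
    }
    g.leave();
}
```

With this sketch, pushing an event through `with_gate` on a connection whose gate has already closed raises an exception, while the `is_closed()` check from the new code simply drops the event, which is acceptable for a connection that is being torn down anyway.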
@@ -591,8 +591,8 @@ future<> cql_server::connection::process()
             return write_response(make_error(0, exceptions::exception_code::SERVER_ERROR, "unknown error", tracing::trace_state_ptr()));
         }
     }).finally([this] {
-        _server._notifier->unregister_connection(this);
+        return _pending_requests_gate.close().then([this] {
+            _server._notifier->unregister_connection(this);
         return _ready_to_respond.finally([this] {
             return _write_buf.close();
         });
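The `process()` hunk complements the notifier change: since event pushes are now guarded only by an `is_closed()` check, teardown must close the gate and wait for in-flight requests to drain before unregistering the connection and closing the write buffer. A rough single-threaded sketch of that "close resolves only after the last pending operation leaves" ordering, again with a hypothetical toy gate rather than the real future-based `seastar::gate::close()`:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch: close() takes a continuation that runs only after
// every operation that entered the gate has left, mirroring how the diff
// delays unregister_connection until _pending_requests_gate.close() resolves.
struct draining_gate {
    int pending = 0;
    bool closing = false;
    std::function<void()> on_drained;

    void enter() { ++pending; }
    void leave() {
        --pending;
        if (closing && pending == 0 && on_drained) {
            auto cont = std::move(on_drained);
            on_drained = nullptr;
            cont();
        }
    }
    void close(std::function<void()> cont) {
        closing = true;
        if (pending == 0) {
            cont();                        // nothing in flight: run immediately
        } else {
            on_drained = std::move(cont);  // defer until the last leave()
        }
    }
};
```

The point of the ordering is that once `close()` has been called, the `is_closed()` checks in the notifiers start skipping writes, and the drain continuation (here, the stand-in for `unregister_connection` and the buffer close) runs only after every request already inside the gate has completed.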