Compare commits

...

99 Commits

Author SHA1 Message Date
Jenkins
d27eb734a7 release: prepare for 2.2.2 by hagitsegev 2019-01-12 18:28:25 +02:00
Avi Kivity
e6aeb490b5 Update seastar submodule
* seastar 6f61d74...88cb58c (2):
  > reactor: disable nowait aio due to a kernel bug
  > configure.py: Enhance detection for gcc -fvisibility=hidden bug

Fixes #3996.
2018-12-17 15:57:58 +02:00
Vladimir Krivopalov
2e3b09b593 database: Capture io_priority_class by reference to avoid dangling ref.
The original reference points to a thread-local storage object that
guaranteed to outlive the continuation, but copying it make the
subsequent calls point to a local object and introduces a use-after-free
bug.

Fixes #3948

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
(cherry picked from commit 68458148e7)
2018-12-02 13:32:59 +02:00
Tomasz Grabiec
92c74f4e0b utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) beacuse _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>

Fixes #3931.

(cherry picked from commit 57e25fa0f8)
2018-11-21 12:18:25 +02:00
Avi Kivity
89d835e9e3 tests: fix network_topology_test timing out in debug mode
In 2.2, SEASTAR_DEBUG is just DEBUG.
2018-10-21 19:04:08 +03:00
Takuya ASADA
263a740084 dist/debian: use --configfile to specify pbuilderrc
Use --configfile to specify pbuilderrc, instead of copying it to home directory.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180420024624.9661-1-syuu@scylladb.com>
(cherry picked from commit 01c36556bf)
2018-10-21 18:21:18 +03:00
Avi Kivity
7f24b5319e release: prepare for 2.2.1 2018-10-19 21:16:14 +03:00
Avi Kivity
fe16c0e985 locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). The broke cleanup, which
relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.

Fixes #3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>

(cherry picked from commit 1ce52d5432)
2018-10-19 21:16:12 +03:00
Glauber Costa
f85badaaac api: use longs instead of ints for snapshot sizes
Int types in json will be serialized to int types in C++. They will then
only be able to handle 4GB, and we tend to store more data than that.

Without this patch, listsnapshots is broken in all versions.

Fixes: #3845

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181012155902.7573-1-glauber@scylladb.com>
(cherry picked from commit 98332de268)
2018-10-12 22:02:56 +03:00
Eliran Sinvani
2193d41683 cql3 : add workaround to antlr3 null dereference bug
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs the CqlParser throws an exception which in
turn processesed for some special cases in scylla to generate a custom
message. The default case however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside scylla
in antlr headers that resides on the host machine os.

Tested manualy 2 testcases, a typo causing scylla to crash and
a cql comment without a newline at the end also caused scylla to crash.
Ran unit tests (release).

Fixes #3740
Fixes #3764

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
(cherry picked from commit 20f49566a2)
2018-10-08 11:02:16 +03:00
Avi Kivity
1e1f0c29bf utils: crc32: mark power crc32 assembly as not requiring an executable stack
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).

However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.

Fix by adding the correct incantation to the file.

Fixes #3799.

Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
(cherry picked from commit aaab8a3f46)
2018-10-08 11:02:16 +03:00
Calle Wilund
84d4588b5f storage_proxy: Add missing re-throw in truncate_blocking
Iff truncation times out, we want to log it, but the exception should
not be swallowed, but re-thrown.

Fixes #3796.

Message-Id: <20181001112325.17809-1-calle@scylladb.com>
(cherry picked from commit 2996b8154f)
2018-10-08 11:02:16 +03:00
Duarte Nunes
7b43b26709 tests/aggregate_fcts_test: Add test case for wrapped types
Provide a test case which checks a type being wrapped in a
reverse_type plays no role in assignment.

Refs #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-2-duarte@scylladb.com>
(cherry picked from commit 17578c3579)
2018-10-08 11:02:16 +03:00
Duarte Nunes
0ed01acf15 cql3/selection/selector: Unwrap types when validating assignment
When validating assignment between two types, it's possible one of
them is wrapped in a reverse_type, if it comes, for example, from the
type associated with a clustering column. When checking for weak
assignment the types are correctly unwrapped, but not when checking
for an exact match, which this patch fixes.

Technically, the receiver is never a reversed_type for the current
callers, but this is the morally correct implementation, as the type
being reversed or not plays no role in assignment.

Tests: unit(release)

Fixes #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-1-duarte@scylladb.com>
(cherry picked from commit 5e7bb20c8a)
2018-10-08 11:02:16 +03:00
Gleb Natapov
7ce160f408 mutation_query_test: add test for result size calculation
Check that digest only and digest+data query calculate result size to be
the same.

Message-Id: <20180906153800.GK2326@scylladb.com>
(cherry picked from commit 9e438933a2)

Message-Id: <20181008075901.GC2380@scylladb.com>
2018-10-08 11:02:09 +03:00
Gleb Natapov
5017d9b46a mutation_partition: accurately account for result size in digest only queries
When measuring_output_stream is used to calculate result's element size
it incorrectly takes into account not only serialized element size, but
a placeholder that ser::qr_partition__rows/qr_partition__static_row__cells
constructors puts in the beginning. Fix it by taking starting point in a
stream before element serialization and subtracting it afterwords.

Fixes #3755

Message-Id: <20180906153609.GJ2326@scylladb.com>
(cherry picked from commit d7674288a9)
2018-10-07 18:16:19 +03:00
Gleb Natapov
50b6ab3552 mutation_partition: correctly measure static row size when doing digest calculation
The code uses incorrect output stream in case only digest is requested
and thus getting incorrect data size. Failing to correctly account
for static row size while calculating digest may cause digest mismatch
between digest and data query.

Fixes #3753.

Message-Id: <20180905131219.GD2326@scylladb.com>
(cherry picked from commit 98092353df)
2018-09-06 16:51:19 +03:00
Eliran Sinvani
b1652823aa cql3: ensure repeated values in IN clauses don't return repeated rows
When the list of values in the IN list of a single column contains
duplicates, multiple executors are activated since the assumption
is that each value in the IN list corresponds to a different partition.
this results in the same row appearing in the result number times
corresponding to the duplication of the partition value.

Added queries for the in restriction unitest and fixed with a bad result check.

Fixes #2837
Tests: Queries as in the usecase from the GitHub issue in both forms ,
prepared and plain (using python driver),Unitest.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
(cherry picked from commit d734d316a6)
2018-08-26 15:51:17 +03:00
Tomasz Grabiec
02b24aec34 Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

 std::experimental::optional<clustering_key> row_key;
 if (!column.is_static()) {
   row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
 }
 ...
 auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

* https://github.com/duarten/scylla cql-list-fixes/v1:
  tests/cql_query_test: Test multi-cell static list updates with ckeys
  cql3/lists: Fix multi-cell static list updates in the presence of ckeys
  keys: Add factory for an empty clustering_key_prefix_view

(cherry picked from commit 6937cc2d1c)
2018-08-21 21:39:22 +01:00
Duarte Nunes
22eea4d8cf cql3/query_options: Use _value_views in prepare()
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor called
transport::request::read_options() leaves _values completely empty.

In query_options::prepare() we were, however, using _values to
associated values to the client-specified column names, and not
_value_views. Fix this by using _value_views instead.

As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x04 query options flag, which means
column names are omitted. This is the right thing to do since most
drivers have enough information to correctly position the values.

Fixes #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
(cherry picked from commit a4355fe7e7)
2018-08-21 21:39:22 +01:00
Tomasz Grabiec
d257f6d57c mutation_partition: Fix exception safety of row::apply_monotonically()
When emplace_back() fails, value is already moved-from into a
temporary, which breaks monotonicity expected from
apply_monotonically(). As a result, writes to that cell will be lost.

The fix is to avoid the temporary by in-place construction of
cell_and_hash. To do that, appropriate cell_and_hash constructor was
added.

Found by mutation_test.cc::test_apply_monotonically_is_monotonic with
some modifications to the random mutation generator.

Introduced in 99a3e3a.

Fixes #3678.

Message-Id: <1533816965-27328-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 024b3c9fd9)
2018-08-21 21:39:18 +01:00
Takuya ASADA
6fca92ac3c dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2
This is bash version of commit 88fe3c2694.

Since some AMIs using consistent network device naming, primary NIC
ifname is not 'eth0'.
But we hardcoded NIC name as 'eth0' on scylla_ec2_check, we need to add
--nic option to specify custom NIC ifname.

Fixes #3658

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180807231650.13697-1-syuu@scylladb.com>
2018-08-08 09:16:57 +03:00
Jesse Haber-Kucharsky
26e3917046 auth: Don't use unsupported hashing algorithms
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.

This is consistent with the documentation of the function in its man
page.

As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".

The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):

    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.

Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.

With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.

Fixes #3637.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
(cherry picked from commit fce10f2c6e)
2018-08-05 10:30:47 +03:00
Gleb Natapov
3892594a93 cache_hitrate_calculator: fix race when new table is added during calculations
The calculation consists of several parts with preemption point between
them, so a table can be added while calculation is ongoing. Do not
assume that table exists in intermediate data structure.

Fixes #3636

Message-Id: <20180801093147.GD23569@scylladb.com>
(cherry picked from commit 44a6afad8c)
2018-08-01 14:34:08 +03:00
Amos Kong
4b24439841 scylla_setup: fix conditional statement of silent mode
Commit 300af65555 introdued a problem in
conditional statement, script will always abort in silent mode, it doesn't
care about the return value.

Fixes #3485

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <1c12ab04651352964a176368f8ee28f19ae43c68.1528077114.git.amos@scylladb.com>
(cherry picked from commit 364c2551c8)
2018-07-25 09:36:32 +03:00
Takuya ASADA
a02a4592d8 dist/common/scripts/scylla_setup: abort running script when one of setup failed in silent mode
Current script silently continues even one of setup fails, need to
abort.

Fixes #3433

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180522180355.1648-1-syuu@scylladb.com>
(cherry picked from commit 300af65555)
2018-07-25 09:36:29 +03:00
Avi Kivity
b6e1c08451 Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if
current clustering range covers all rows, which means that the populator has a
right to set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts since before all
clustering rows. Otherwise, we're populating since _last_row and should
consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()

(cherry picked from commit 31151cadd4)
2018-07-18 12:07:01 +02:00
Botond Dénes
9469afcd27 storage_proxy: use the original row limits for the final results merging
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuations captures via value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications done to the read
command object done by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed which in turn will result in the
lambda doing the final trimming according to a decremented `row_limits`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.

Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes appearantly reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.

Fixes #3605.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
(cherry picked from commit cc4acb6e26)
2018-07-16 17:51:06 +03:00
Avi Kivity
240b9f122b Merge "Backport empty partition range scan fixes" from Botond
"
This mini-series lumps together the fix for the empty partition range
scan crash (#3564) and the two follow-up patches.
"

* 'paging-fix-backport-2.2/v1' of https://github.com/denesb/scylla:
  query_pager: use query::is_single_partition() to check for singular range
  tests/cql_query_tess: add unit test for querying empty ranges test
  query_pager: be prepared to _ranges being empty
2018-07-05 10:29:31 +03:00
Botond Dénes
cb16cd7724 query_pager: use query::is_single_partition() to check for singular range
Use query::is_single_partition() to check whether the queried ranges are
singular or not. The current method of using
`dht::partition_range::is_singular()` is incorrect, as it is possible to
build a singular range that doesn't represent a single partition.
`query::is_single_partition()` correctly checks for this so use it
instead.

Found during code-review.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>
(cherry picked from commit 8084ce3a8e)
2018-07-04 12:57:45 +03:00
Botond Dénes
c864d198fc tests/cql_query_tess: add unit test for querying empty ranges test
A bug was found recently (#3564) in the paging logic, where the code
assumed the queried ranges list is non-empty. This assumption is
incorrect as there can be valid (if rare) queries that can result in the
ranges list to be empty. Add a unit test that executes such a query with
paging enabled to detect any future bugs related to assumptions about
the ranges list being non-empty.

Refs: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>
(cherry picked from commit c236a96d7d)
2018-07-04 09:52:54 +03:00
Botond Dénes
25125e9c4f query_pager: be prepared to _ranges being empty
do_fetch_page() checks in the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not it
checks whether the query is for a singulular partitions or a range scan
to decide whether to enable the stateful queries or not. This check
assumed that there is at least one range in _ranges which will not hold
under some circumstances. Add a check for _ranges being empty.

Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
(cherry picked from commit 59a30f0684)
2018-07-04 09:52:54 +03:00
Shlomi Livne
faf10fe6aa release: prepare for 2.2.0
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-07-01 22:40:42 +03:00
Calle Wilund
f76269cdcf sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and scylla writes "known" compressor names (i.e. those
in origin namespace) unqualified (i.e. LZ4Compressor).

This behaviour was not preserved in the virtualization change. But
probably should be.

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
(cherry picked from commit 054514a47a)
2018-06-28 18:55:15 +03:00
Avi Kivity
a9b0ccf116 Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as partition deletion or writes
to the static row being lost. Until node restart or eviction of the
partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
  tests: Add test for slicing a mutation source with date tiered compaction strategy
  tests: Check that database conforms to mutation source
  database: Disable sstable filtering based on min/max clustering key components

(cherry picked from commit e1efda8b0c)
2018-06-28 18:55:15 +03:00
Tomasz Grabiec
abc5941f87 flat_mutation_reader: Move field initialization to initializer list
This works around a problem of std::terminate() being called in debug
mode build if initialization of _current throws.

Backtrace:

Thread 2 "row_cache_test_" received signal SIGABRT, Aborted.
0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
(gdb) bt
  #0  0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff17d077d in abort () from /lib64/libc.so.6
  #2  0x00007ffff5773025 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
  #3  0x00007ffff5770c16 in ?? () from /lib64/libstdc++.so.6
  #4  0x00007ffff576fb19 in ?? () from /lib64/libstdc++.so.6
  #5  0x00007ffff5770508 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
  #6  0x00007ffff3ce4ee3 in ?? () from /lib64/libgcc_s.so.1
  #7  0x00007ffff3ce570e in _Unwind_Resume () from /lib64/libgcc_s.so.1
  #8  0x0000000003633602 in reader::reader (this=0x60e0001160c0, r=...) at flat_mutation_reader.cc:214
  #9  0x0000000003655864 in std::make_unique<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (__args#0=...)
    at /usr/include/c++/7/bits/unique_ptr.h:825
  #10 0x0000000003649a63 in make_flat_mutation_reader<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (args#0=...)
    at flat_mutation_reader.hh:440
  #11 0x000000000363565d in make_forwardable (m=...) at flat_mutation_reader.cc:270
  #12 0x000000000303f962 in memtable::make_flat_reader (this=0x61300001d540, s=..., range=..., slice=..., pc=..., trace_state_ptr=..., fwd=..., fwd_mr=...)
    at memtable.cc:592

Message-Id: <1528792447-13336-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 6d6b93d1e7)
2018-06-28 18:55:15 +03:00
Asias He
a152ac12af gossip: Fix tokens assignment in assassinate_endpoint
The tokens vector is defined a few lines above and is needed outsie the
if block.

Do not redefine it again in the if block, otherwise the tokens will be empty.

Found by code inspection.

Fixes #3551.

Message-Id: <c7a06375c65c950e94236571127f533e5a60cbfd.1530002177.git.asias@scylladb.com>
(cherry picked from commit c3b5a2ecd5)
2018-06-28 18:55:15 +03:00
Botond Dénes
c274fdf2ec querier: find_querier(): return end() when no querier matches the range
When none of the queriers found for the lookup key match the lookup
range `_entries.end()` should be returned as the search failed. Instead
the iterator returned from the failed `std::find_if()` is returned
which, if the find failed, will be the end iterator returned by the
previous call to `_entries.equal_range()`. This is incorrect because as
long as `equal_range()`'s end iterator is not also `_entries.end()` the
search will always return an iterator to a querier regardless of whether
any of them actually matches the read range.
Fix by returning `_entries.end()` when it is detected that no queriers
match the range.

Fixes: #3530
(cherry picked from commit 2609a17a23)
2018-06-28 18:55:15 +03:00
Botond Dénes
5b88d6b4d6 querier_cache: restructure entries storage
Currently querier_cache uses a `std::unordered_map<utils::UUID, querier>`
to store cache entries and an `std::list<meta_entry>` to store meta
information about the querier entries, like insertion order, expiry
time, etc.

All cache eviction algorithms use the meta-entry list to evict entries
in reverse insertion order (LRU order). To make this possible
meta-entries keep an iterator into the entry map so that given a
meta-entry one can easily erase the querier entry. This however poses a
problem as std::unordered_map can possibly invalidate all its iterators
when new items are inserted. This is use-after-free waiting to happen.

Another disadvantages of the current solution is that it requires the
meta-entry to use a weak pointer to the querier entry so that in case
that is removed (as a result of a successful lookup) it doesn't try to
access it. This has an impact on all cache eviction algorithms as they
have to be prepared to deal with stale meta-entries. Stale meta-entries
also unnecesarily consume memory.

To solve these problems redesign how querier_cache stores entries
completely. Instead of storing the entries in an `std::unordered_map`
and storing the meta-entries in an `std::list`, store the entries in an
`std::list` and an intrusive-map (index) for lookups. This new design
has severeal advantages over the old one:
* The entries will now be in insert order, so eviction strategies can
  work on the entry list itself, no need to involve additional data
  structures for this.
* All data related to an entry is stored in one place, no data
  duplication.
* Removing an entry automatically removes it from the index as intrusive
  containers support auto unlink. This means there is no need to store
  iterators for long terms, risking use-after-free when the container
  invalidates it's iterators.

Additional changes:
* Modify eviction strategies so that they work with the `entry`
  interface rather than the stored value directly.

Ref #3424

(cherry picked from commit 7ce7f3f0cc)
2018-06-28 18:55:15 +03:00
Botond Dénes
2d626e1cf8 tests/querier_cache: fix memory based eviction test
Do increment the key counter after inserting the first querier into the
cache. Otherwise two queriers with the same key will be inserted and
will fail the test. This problem is exposed by the changes the next
patches make to the querier-cache but will be fixed before to maintain
bisectability of the code.

Fixes: #3529
(cherry picked from commit b9d51b4c08)
2018-06-28 18:55:15 +03:00
Avi Kivity
c11bd3e1cf Merge "Do not allow compaction controller shares to grow indefinitely" from Glauber
"
We are seeing some workloads with large datasets where the compaction
controller ends up with a lot of shares. Regardless of whether or not
we'll change the algorithm, this patchset handles a more basic issue,
which is the fact that the current controller doesn't set a maximum
explicitly, so if the input is larger than the maximum it will keep
growing without bounds.

It also pushes the maximum input point of the compaction controller from
10 to 30, allowing us to err on the side of caution for the 2.2 release.
"

* 'tame-controller' of github.com:glommer/scylla:
  controller: do not increase shares of controllers for inputs higher than the maximum
  controller: adjust constants for compaction controller

(cherry picked from commit e0eb66af6b)
2018-06-20 10:58:20 +03:00
Avi Kivity
9df3df92bc Merge "Try harder to move STCS towards zero-backlog" from Glauber
"
Tests: unit (release)

Before merging the LCS controller, we merged patches that would
guarantee that LCS would move towards zero backlog - otherwise the
backlog could get too high.

We didn't do the same for STCS, our first controlled strategy. So we may
end up with a situation where there are many SSTables inducing a large
backlog, but they are not yet meeting the minimum criteria for
compaction. The backlog, then, never goes down.

This patch changes the SSTable selection criteria so that if there is
nothing to do, we'll keep pushing towards reaching a state of zero
backlog. Very similar to what we did for LCS.
"

* 'stcs-min-threshold-v4' of github.com:glommer/scylla:
  STCS: bypass min_threshold unless configure to enforce strictly
  compaction_strategy: allow the user to tell us if min_threshold has to be strict

(cherry picked from commit f0fc888381)
2018-06-18 14:21:52 +03:00
Takuya ASADA
8ad9578a6c dist/debian: add --jobs <njobs> option just like build_rpm.sh
On some build environment we may want to limit number of parallel jobs since
ninja-build runs ncpus jobs by default, it may too many since g++ eats very
huge memory.
So support --jobs <njobs> just like on rpm build script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180425205439.30053-1-syuu@scylladb.com>
(cherry picked from commit 782ebcece4)
2018-06-14 15:04:50 +03:00
Tomasz Grabiec
4cb6061a9f tests: row_cache: Reduce concurrency limit to avoid bad_alloc
The test uses random mutations. We saw it failing with bad_alloc from time to time.
Reduce concurrency to reduce memory footprint.

Message-Id: <20180611090304.16681-1-tgrabiec@scylladb.com>
(cherry picked from commit a91974af7a)
2018-06-14 13:40:00 +02:00
Tomasz Grabiec
1940e6bd95 tests: row_cache: Do not hang when only one of the readers throws
Message-Id: <20180531122729.3314-1-tgrabiec@scylladb.com>
(cherry picked from commit b5e42bc6a0)
2018-06-14 13:40:00 +02:00
Avi Kivity
044cfde5f3 database: stop using incremental selectors
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6732)
2018-06-13 21:04:56 +03:00
Vlad Zolotarov
262a246436 locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting()
ec2_snitch::gossiper_starting() calls for the base class (default) method
that sets _gossip_started to TRUE and thereby prevents to following
reconnectable_snitch_helper registration.

Fixes #3454

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
(cherry picked from commit 2dde372ae6)
2018-06-12 19:02:19 +03:00
Botond Dénes
799dbb4f2e forwardable reader: implement fast_forward_to(position_in_partition)
Instead of throwing std::bad_function_call. Needed by the foreign_reader
unit test. Not sure how other tests didn't hit this before as the test
is using `run_mutation_source_tests()`.

(cherry picked from commit 50b67232e5)
Fixes #3491.
2018-06-05 12:34:15 +03:00
Shlomi Livne
a2fe669dd3 dist/docker: Switch to Scylla 2.2 repository
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <83b4ff801b283ade512a7035ecea9057a864dcdd.1526995747.git.shlomi@scylladb.com>
2018-06-05 12:34:15 +03:00
Avi Kivity
56de761daf Update seastar submodule
* seastar 7c6ba3a...6f61d74 (1):
  > tls: Ensure handshake always drains output before return/throw

Fixes #3461.
2018-06-05 12:34:15 +03:00
Shlomi Livne
c3187093a3 release: prepare for 2.2.rc2
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-05-30 17:32:16 +03:00
Avi Kivity
111c2ecf5d Update scylla-ami submodule
* dist/ami/files/scylla-ami 49896ec...6ed71a3 (1):
  > scylla_install_ami: Update CentOS to latest version
2018-05-28 14:02:43 +03:00
Takuya ASADA
a6ecdbbba6 Revert "dist/ami: update CentOS base image to latest version"
This reverts commit 69d226625a.
Since ami-4bf3d731 is Market Place AMI, not possible to publish public AMI based on it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180523112414.27307-1-syuu@scylladb.com>
(cherry picked from commit 6b1b9f9e602c570bbc96692d30046117e7d31ea7)
2018-05-28 13:40:15 +03:00
Glauber Costa
17cc62d0b3 commitlog: don't move pointer to segment
We are currently moving the pointer we acquired to the segment inside
the lambda in which we'll handle the cycle.

The problem is, we also use that same pointer inside the exception
handler. If an exception happens we'll access it and we'll crash.

Probably #3440.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180518125820.10726-1-glauber@scylladb.com>
(cherry picked from commit 596a525950)
2018-05-19 19:12:26 +03:00
Shlomi Livne
eb646c61ed release: prepare for 2.2.rc1
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-05-16 21:31:50 +03:00
Avi Kivity
782d817e84 dist: redhat: get rid of raid0.devices_discard_performance
This parameter is not available on recent Red Hat kernels or on
non-Red Hat kernels (it was removed on 3.10.0-772.el7,
RHBZ 1455932). The presence of the parameter on kernels that don't
support it cause the module load to fail, with the result that the
storage is not available.

Fix by removing the parameter. For someone running an older Red Hat
kernel the effect will be that discard is disabled, but they can fix
that by updating the kernel. For someone running a newer kernel, the
effect will be that they can access their data.

Fixes #3437.
Message-Id: <20180516134913.6540-1-avi@scylladb.com>

(cherry picked from commit 3b8118d4e5)
2018-05-16 20:13:59 +03:00
Avi Kivity
3ed5e63e8a Update scylla-ami submodule
* dist/ami/files/scylla-ami 02b1853...49896ec (1):
  > Merge "AMI build fix" from Takuya
2018-05-16 12:37:03 +03:00
Tomasz Grabiec
d17ce46983 Update seastar submodule
Fixes #3339.

* seastar 491f994...7c6ba3a (2):
  > Merge "fix perftune.py issues with cpu-masks on big machines" from Vlad
  > Merge 'Handle Intel's NICs in a special way'  from Vlad
2018-05-16 09:37:41 +02:00
Takuya ASADA
7ca5e7e993 dist/redhat: replace scylla-libgcc72/scylla-libstdc++72 with scylla-2.2 metapackage
We have conflict between scylla-libgcc72/scylla-libstdc++72 and
scylla-libgcc73/scylla-libstdc++73, need to replace *72 package with
scylla-2.2 metapackage to prevent it.

Fixes #3373

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180510081246.17928-1-syuu@scylladb.com>
(cherry picked from commit 6fa3c4dcad)
2018-05-11 09:42:28 +03:00
Duarte Nunes
07b0ce27fa Merge 'Include OPTIONS with LIST ROLES' from Jesse
"
Fixes #3420.

Tests: dtest (`auth_test.py`), unit (release)
"

* 'jhk/fix_3420/v2' of https://github.com/hakuch/scylla:
  cql3: Include custom options in LIST ROLES
  auth: Query custom options from the `authenticator`
  auth: Add type alias for custom auth. options

(cherry picked from commit d49348b0e1)
2018-05-10 13:22:49 +03:00
Amnon Heiman
27be3cd242 scylla-housekeeping: support new 2018.1 path variation
Starting from 2018.1 and 2.2 there was a change in the repository path.
It was made to support multiple product (like manager and place the
enterprise in a different path).

As a result, the regular expression that look for the repository fail.

This patch change the way the path is searched, both rpm and debian
varations are combined and both options of the repository path are
unified.

See scylladb/scylla-enterprise#527

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180429151926.20431-1-amnon@scylladb.com>
(cherry picked from commit 6bf759128b)
2018-05-09 15:22:55 +03:00
Calle Wilund
abf50aafef database: Fix assert in truncate
Fixes crash in cql_tests.StorageProxyCQLTester.table_test
"avoid race condition when deleting sstable on behalf..." changed
discard_sstables behaviour to only return rp:s for sstables owned
and submitted for deletion (not all matching time stamp),
which can in some cases cause zero rp returned.
Message-Id: <20180508070003.1110-1-calle@scylladb.com>
2018-05-09 10:02:09 +01:00
Duarte Nunes
dfe5b38a43 db/view: Limit number of pending view updates
This patch adds a simple and naive mechanism to ensure a base replica
doesn't overwhelm a potentially overloaded view replica by sending too
many concurrent view updates. We add a semaphore to limit to 100 the
number of outstanding view updates. We limit globally per shard, and
not per destination view replica. We also limit statically.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180426134457.21290-2-duarte@scylladb.com>
(cherry picked from commit 4b3562c3f5)
2018-05-08 00:46:33 +01:00
Duarte Nunes
9bdc8c25f5 db/view: Return a future when sending view updates
While we now send view mutations asynchronously in the normal view
write path, other processes interested in sending view updates, such
as streaming or view building, may wish to do it synchronously.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit dc44a08370)
2018-05-08 00:46:19 +01:00
Duarte Nunes
e75c55b2db db/timeout_clock: Properly scope type names
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180426134457.21290-1-duarte@scylladb.com>
(cherry picked from commit 2be75bdfc9)
2018-05-07 19:29:48 +01:00
Botond Dénes
756feae052 database: when dropping a table evict all relevant queriers
Queriers shouldn't outlive the table they read from as that could lead
to use-after-free problems when they are destroyed.

Fixes: #3414

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <3d7172cef79bb52b7097596e1d4ebba3a6ff757e.1525716986.git.bdenes@scylladb.com>
(cherry picked from commit 6f7d919470)
2018-05-07 21:20:42 +03:00
Tomasz Grabiec
202b4e6797 storage_proxy: Request schema from the coordinator in the original DC
The mutation forwarding intermediary (src_addr) may not always know
about the schema which was used by the original coordinator. I think
this may be the cause of the "Schema version ... not found" error seen
in one of the clusters which entered some pathological state:

  storage_proxy - Failed to apply mutation from 1.1.1.1#5: std::_Nested_exception<schema_version_loading_failed> (Failed to load schema version 32893223-a911-3a01-ad70-df1eb2a15db1): std::runtime_error (Schema version 32893223-a911-3a01-ad70-df1eb2a15db1 not found)

Fixes #3393.

Message-Id: <1524639030-1696-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 423712f1fe)
2018-05-07 13:08:40 +03:00
Raphael S. Carvalho
76ac200eff database: avoid race condition when deleting sstable on behalf of cf truncate
After removal of deletion manager, caller is now responsible for properly
submitting the deletion of a shared sstable. That's because deletion manager
was responsible for holding deletion until all owners agreed on it.
Resharding for example was changed to delete the shared sstables at the end,
but truncate wasn't changed and so race condition could happen when deleting
same sstable at more than one shard in parallel. Change the operation to only
submit a shared sstable for deletion in only one owner.

Fixes dtest migration_test.TestMigration.migrate_sstable_with_schema_change_test

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180503193427.24049-1-raphaelsc@scylladb.com>
2018-05-04 13:10:12 +01:00
Tomasz Grabiec
9aa172fe8e db: schema_tables: Treat drop of scylla_tables.version as an alter
After upgrade from 1.7 to 2.0, nodes will record a per-table schema
version which matches that on 1.7 to support the rolling upgrade. Any
later schema change (after the upgrade is done) will drop this record
from affected tables so that the per-table schema version is
recalculated. If nodes perform a schema pull (they detect schema
mismatch), then the merge will affect all tables and will wipe the
per-table schema version record from all tables, even if their schema
did not change. If then only some nodes get restarted, the restarted
nodes will load tables with the new (recalculated) per-table schema
version, while not restarted nodes will still use the 1.7 per-table
schema version. Until all nodes are restarted, writes or reads between
nodes from different groups will involve a needless exchange of schema
definition.

This will manifest in logs with repeated messages indicating schema
merge with no effect, triggered by writes:

  database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
  database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
  database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f

The sync will be performed if the receiving shard forgets the foreign
version, which happens if it doesn't process any request referencing
it for more than 1 second.

This may impact latency of writes and reads.

The fix is to treat schema changes which drop the 1.7 per-table schema
version marker as an alter, which will switch in-memory data
structures to use the new per-table schema version immediately,
without the need for a restart.

Fixes #3394

Tests:
    - dtest: schema_test.py, schema_management_test.py
    - reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git
    - unit (release)

Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit b1465291cf)
2018-05-03 10:51:19 +03:00
Takuya ASADA
c4af043ef7 dist/common/scripts/scylla_raid_setup: prevent 'device or resource busy' on creating mdraid device
According to this web site, there is possibility we have race condition with
mdraid creation vs udev:
http://dev.bizo.com/2012/07/mdadm-device-or-resource-busy.html
And looks like it can happen on our AMI, too (see #2784).

To initialize RAID safely, we should wait udev events are finished before and
after mdadm executed.

Fixes #2784

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1505898196-28389-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 4a8ed4cc6f)
2018-04-24 12:53:34 +03:00
Raphael S. Carvalho
06b25320be sstables: Fix bloom filter size after resharding by properly estimating partition count
We were feeding the total estimation partition count of an input shared
sstable to the output unshared ones.

So sstable writer thinks, *from estimation*, that each sstable created
by resharding will have the same data amount as the shared sstable they
are being created from. That's a problem because estimation is feeded to
bloom filter creation which directly influences its size.
So if we're resharding all sstables that belong to all shards, the
disk usage taken by filter components will be multiplied by the number
of shards. That becomes more of a problem with #3302.

Partition count estimation for a shard S will now be done as follow:
    //
    // TE, the total estimated partition count for a shard S, is defined as
    // TE = Sum(i = 0...N) { Ei / Si }.
    //
    // where i is an input sstable that belongs to shard S,
    //       Ei is the estimated partition count for sstable i,
    //       Si is the total number of shards that own sstable i.

Fixes #2672.
Refs #3302.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180423151001.9995-1-raphaelsc@scylladb.com>
(cherry picked from commit 11940ca39e)
2018-04-24 12:53:34 +03:00
Takuya ASADA
ff70d9f15c dist: Drop AmbientCapabilities from scylla-server.service for Debian 8
Debian 8 causes "Invalid argument" when we used AmbientCapabilities on systemd
unit file, so drop the line when we build .deb package for Debian 8.
For other distributions, keep using the feature.

Fixes #3344

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180423102041.2138-1-syuu@scylladb.com>
(cherry picked from commit 7b92c3fd3f)
2018-04-24 12:53:34 +03:00
Avi Kivity
9bbd5821a2 Update scylla-ami submodule
* dist/ami/files/scylla-ami 9b4be70...02b1853 (1):
  > scylla_install_ami: remove the host id file after scylla_setup
2018-04-24 12:53:34 +03:00
Avi Kivity
a7841f1f2e release: prepare for 2.2.rc0 2018-04-18 11:08:43 +03:00
Takuya ASADA
84859e0745 dist/debian: use ~root as HOME to place .pbuilderrc
When 'always_set_home' is specified on /etc/sudoers pbuilder won't read
.pbuilderrc from current user home directory, and we don't have a way to change
the behavor from sudo command parameter.

So let's use ~root/.pbuilderrc and switch to HOME=/root when sudo executed,
this can work both environment which does specified always_set_home and doesn't
specified.

Fixes #3366

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1523926024-3937-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit ace44784e8)
2018-04-17 09:38:43 +03:00
Avi Kivity
6b74e1f02d Update seastar submodule
* seastar bcfbe0c...491f994 (3):
  > tls: Ensure we always pass through semaphores on shutdown
  > cpu scheduler: don't penalize first group to run
  > reactor: fix sleep mode

Fixes #3350.
2018-04-14 20:44:11 +03:00
Avi Kivity
520f17b315 Point seastar submodule at scylla-seastar.git
This allows backporting seastar patches.
2018-04-14 20:43:28 +03:00
Gleb Natapov
9fe3d04f31 cql_server: fix a race between closing of a connection and notifier registration
There is a race between cql connection closure and notifier
registration. If a connection is closed before notification registration
is complete stale pointer to the connection will remain in notification
list since attempt to unregister the connection will happen to early.
The fix is to move notifier unregisteration after connection's gate
is closed which will ensure that there is no outstanding registration
request. But this means that now a connection with closed gate can be in
notifier list, so with_gate() may throw and abort a notifier loop. Fix
that by replacing with_gate() by call to is_closed();

Fixes: #3355
Tests: unit(release)

Message-Id: <20180412134744.GB22593@scylladb.com>
(cherry picked from commit 1a9aaece3e)
2018-04-12 16:57:07 +03:00
Raphael S. Carvalho
a74183eb1e sstables/compaction_manager: do not break lcs invariant by not allowing parallel compaction for it
After change to serialize compaction on compaction weight (eff62bc61e),
LCS invariant may break because parallel compaction can start, and it's
not currently supported for LCS.

The condition is that weight is deregistered right before last sstable
for a leveled compaction is sealed, so it may happen that a new compaction
starts for the same column family meanwhile that will promote a sstable to
an overlapping token range.

That leads to strategy restoring invariant when it finds the overlapping,
and that means wasted resources.
The fix is about removing a fast path check which is incorrect now because
we release weight early and also fixing a check for ongoing compaction
which prevented compaction from starting for LCS whenever weight tracker
was not empty.

Fixes #3279.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180410034538.30486-1-raphaelsc@scylladb.com>
(cherry picked from commit 638a647b7d)
2018-04-10 20:59:48 +03:00
Raphael S. Carvalho
e059f17bf2 database: make sure sstable is also forwarded to shard responsible for its generation
After f59f423f3c, sstable is loaded only at shards
that own it so as to reduce the sstable load overhead.

The problem is that a sstable may no longer be forwarded to a shard that needs to
be aware of its existence which would result in that sstable generation being
reallocated for a write request.
That would result in a failure as follow:
"SSTable write failed due to existence of TOC file for generation..."

This can be fixed by forwarding any sstable at load to all its owner shards
*and* the shard responsible for its generation, which is determined as follow:
s = generation % smp::count

Fixes #3273.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180405035245.30194-1-raphaelsc@scylladb.com>
(cherry picked from commit 30b6c9b4cd)
2018-04-05 10:58:29 +03:00
Duarte Nunes
0e8e005357 db/view: Reject view entries with non-composite, empty partition key
Empty partition keys are not supported on normal tables - they cannot
be inserted or queried (surprisingly, the rules for composite
partition keys are different: all components are then allowed to be
empty). However, the (non-composite) partition key of a view could end
up being empty if that column is: a base table regular column, a
base table clustering key column, or a base table partition key column,
part of a composite key.

Fixes #3262
Refs CASSANDRA-14345

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180403122244.10626-1-duarte@scylladb.com>
(cherry picked from commit ec8960df45)
2018-04-03 17:20:33 +03:00
Glauber Costa
8bf6f39392 docker: default docker to overprovisioned mode.
By default, overprovisioned is not enabled on docker unless it is
explicitly set. I have come to believe that this is a mistake.

If the user is running alone in the machine, and there are no other
processes pinned anywhere - including interrupts - not running
overprovisioned is the best choice.

But everywhere else, it is not: even if a user runs 2 docker containers
in the same machine and statically partitions CPUs with --smp (but
without cpuset) the docker containers will pin themselves to the same
sets of CPU, as they are totally unaware of each other.

It is also very common, specially in some virtualized environments, for
interrupts not to be properly distributed - being particularly keen on
being delivered on CPU0, a CPU which Scylla will pin by default.

Lastly, environments like Kubernetes simply don't support pinning at the
moment.

This patch enables the overprovisioned flag if it is explicitly set -
like we did before - but also by default unless --cpuset is set.

Fixes #3336.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180331142131.842-1-glauber@scylladb.com>
(cherry picked from commit ef84780c27)
2018-04-02 17:07:20 +03:00
Glauber Costa
04ba51986e parse and ignore background writer controller
Unused options are not exposed as command line options and will prevent
Scylla from booting when present, although they can still be pased over
YAML, for Cassandra compatibility.

That has never been a problem, but we have been adding options to i3
(and others) that are now deprecated, but were previously marked as
Used. Systems with those options may have issues upgrading.

While this problem is common to all Unused options, the likelihood for
any other unused option to appear in the command line is near zero,
except for those two - since we put them there ourselves.

There are two ways to handle this issue:

1) Mark them as Used, and just ignore them.
2) Add them explicitly to boost program options, and then ignore them.

The second option is preferred here, because we can add them as hidden
options in program_options, meaning they won't show up in the help. We
can then just print a discrete message saying that those options are,
for now on ignored.

v2: mark set as const (Botond)
v3: rebase on top of master, identation suggested by Duarte.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180329145517.8462-1-glauber@scylladb.com>
(cherry picked from commit a9ef72537f)
2018-03-29 17:57:43 +03:00
Asias He
1d5379c462 gossip: Relax generation max difference check
start node 1 2 3
shutdown node2
shutdown node1 and node3
start node1 and node3
nodetool removenode node2
clean up all scylla data on node2
bootstrap node2 as a new node

I saw node2 could not bootstrap stuck at waiting for schema information to compelte for ever:

On node1, node3

    [shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704

On node2

    [shard 0] storage_service - JOINING: waiting for schema information to complete

This is becasue in nodetool removenode operation, the generation of node1 was increased from 0 to 2.

   gossiper::advertise_removing () calls eps.get_heart_beat_state().force_newer_generation_unsafe();
   gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe();

Each force_newer_generation_unsafe increases the generation by 1.

Here is an example,

Before nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
   {
   "addrs": "127.0.0.2",
   "generation": 0,
   "is_alive": false,
   "update_time": 1521778757334,
   "version": 0
   },
```

After nodetool revmoenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
 {
     "addrs": "127.0.0.2",
     "application_state": [
         {
             "application_state": 0,
             "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246",
             "version": 214
         },
         {
             "application_state": 6,
             "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a",
             "version": 153
            }
     ],
     "generation": 2,
     "is_alive": false,
     "update_time": 1521779276246,
     "version": 0
 },
```

In gossiper::apply_state_locally, we have this check:

```
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
    // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
  logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation);

}
```
to skip the gossip update.

To fix, we relax generation max difference check to allow the generation
of a removed node.

After this patch, the removed node bootstraps successfully.

Tests: dtest:update_cluster_layout_tests.py
Fixes #3331

Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com>
(cherry picked from commit f539e993d3)
2018-03-29 12:10:09 +03:00
Avi Kivity
cb5dc56bfd Update scylla-ami submodule
Ref #3332.
2018-03-29 10:35:54 +03:00
Duarte Nunes
b578b492cd column_family: Don't retry flushing memtable if shutdown is requested
Since we just keep retrying, this can cause Scylla to not shutdown for
a while.

The data will be safe in the commit log.

Note that this patch doesn't fix the issue when shutdown goes through
storage_service::drain_on_shutdown - more work is required to handle
that case.

Ref #3318.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-3-duarte@scylladb.com>
(cherry picked from commit a985ea0fcb)
2018-03-26 15:26:56 +03:00
Duarte Nunes
30c950a7f6 column_family: Increase scope of exception handling when flushing a memtable
In column_family::try_flush_memtable_to_sstable, the handle_exception()
block is on the inside of the continuations to
write_memtable_to_sstable(), which, if it fails, will leave the
sstable in the compaction_backlog_tracker::_ongoing_writes map, which
will waste disk space, and that sstable will map to a dangling pointer
to a destroyed database_sstable_write_monitor, which causes a seg
fault when accessed (for example, through the backlog_controller,
which accounts the _ongoing_writes when calculating the backlog).

Fix this by increasing the scope of handle_exception().

Fixes #3315

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-2-duarte@scylladb.com>
(cherry picked from commit 50ad37d39b)
2018-03-26 15:26:54 +03:00
Duarte Nunes
f0d1e9c518 backlog_controller: Stop update timer
On database shutdown, this timer can cause use-after-free errors if
not stopped.

Refs #3315

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-1-duarte@scylladb.com>
(cherry picked from commit b7bd9b8058)
2018-03-26 15:26:52 +03:00
Avi Kivity
597aeca93d Merge "Bug fixes for access-control, and finalizing roles" from Jesse
"
This series does not add or change any features of access-control and
roles, but addresses some bugs and finalizes the switch to roles.

"auth: Wait for schema agreement" and the patch prior help avoid false
negatives for integration tests and error messages in logs.

"auth: Remove ordering dependence" fixes an important bug in `auth` that
could leave the default superuser in a corrupted state when it is first
created.

Since roles are feature-complete (to the best of the author's knowledge
as of this writing), the final patch in the series removes any warnings
about them being unimplemented.

Tests: unit (release), dtest (PENDING)
"

* 'jhk/auth_fixes/v1' of https://github.com/hakuch/scylla:
  Roles are implemented
  auth: Increase delay before background tasks start
  auth: Remove ordering dependence
  auth: Don't warn on rescheduled task
  auth: Wait for schema agreement
  Single-node clusters can agree on schema

(cherry picked from commit 999df41a49)
2018-03-26 12:37:41 +03:00
Duarte Nunes
1a94b90a4d Merge 'Grant default permissions' from Jesse
The functional change in this series is in the last patch
("auth: Grant all permissions to object creator").

The first patch addresses `const` correctness in `auth`. This change
allowed the new code added in the last patch to be written with the
correct `const` specifiers, and also some code to be removed.

The second-to-last patch addresses error-handling in the authorizer for
unsupported operations and is a prerequisite for the last patch (since
we now always grant permissions for new database objects).

Tests: unit (release)

* 'jhk/default_permissions/v3' of https://github.com/hakuch/scylla:
  auth: Grant all permissions to object creator
  auth: Unify handling for unsupported errors
  auth: Fix life-time issue with parameter
  auth: Fix `const` correctness

(cherry picked from commit 934d805b4b)
2018-03-26 12:37:35 +03:00
Avi Kivity
acdd42c7c8 Merge "Fix abort during counter table read-on-delete" from Tomasz
"
This fixes an abort in an sstable reader when querying a partition with no
clustering ranges (happens on counter table mutation with no live rows) which
also doesn't have any static columns. In such case, the
sstable_mutation_reader will setup the data_consume_context such that it only
covers the static row of the partition, knowing that there is no need to read
any clustered rows. See partition.cc::advance_to_upper_bound(). Later when
the reader is done with the range for the static row, it will try to skip to
the first clustering range (missing in this case). If clustering_ranges_walker
tells us to skip to after_all_clustering_rows(), we will hit an assert inside
continuous_data_consumer::fast_forward_to() due to attempt to skip past the
original data file range. If clustering_ranges_walker returns
before_all_clustering_rows() instead, all is fine because we're still at the
same data file position.

Fixes #3304.
"

* 'tgrabiec/fix-counter-read-no-static-columns' of github.com:scylladb/seastar-dev:
  tests: mutation_source_test: Test reads with no clustering ranges and no static columns
  tests: simple_schema: Allow creating schema with no static column
  clustering_ranges_walker: Stop after static row in case no clustering ranges

(cherry picked from commit 054854839a)
2018-03-22 18:13:29 +02:00
Takuya ASADA
bd4f658555 scripts/scylla_install_pkg: follow redirection of specified repo URL
We should follow redirection on curl, just like normal web browser does.
Fixes #3312

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1521712056-301-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit bef08087e1)
2018-03-22 12:56:58 +02:00
Vladimir Krivopalov
a983ba7aad perf_fast_forward: fix error in date formatting
Instead of 'month', 'minutes' has been used.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <1e005ecaa992d8205ca44ea4eebbca4621ad9886.1521659341.git.vladimir@scylladb.com>
(cherry picked from commit 3010b637c9)
2018-03-22 12:56:56 +02:00
Duarte Nunes
0a561fc326 gms/gossiper: Synchronize endpoint state destruction
In gossiper::handle_major_state_change() we set the endpoint_state for
a particular endpoint and replicate the changes to other cores.

This is totally unsynchronized with the execution of
gossiper::evict_from_membership(), which can happen concurrently, and
can remove the very same endpoint from the map  (in all cores).

Replicating the changes to other cores in handle_major_state_change()
can interleave with replicating the changes to other cores in
evict_from_membership(), and result in an undefined final state.

Another issue happened in debug mode dtests, where a fiber executes
handle_major_state_change(), calls into the subscribers, of which
storage_service is one, and ultimately lands on
storage_service::update_peer_info(), which iterates over the
endpoint's application state with deferring points in between (to
update a system table). gossiper::evict_from_membership() was executed
concurrently by another fiber, which freed the state the first one is
iterating over.

Fixes #3299.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180318123211.3366-1-duarte@scylladb.com>
(cherry picked from commit 810db425a5)
2018-03-18 14:54:54 +02:00
Takuya ASADA
1f10549056 dist/redhat: build only scylla, iotune
Since we don't package tests, we don't need to build them.
It reduces package building time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1521066363-4859-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 1bb3531b90)
2018-03-15 10:48:36 +02:00
Takuya ASADA
c2a2560ea3 dist/debian: use 3rdparty ppa on Ubuntu 18.04
Currently Ubuntu 18.04 uses distribution provided g++ and boost, but it's easier
to maintain Scylla package to build with same version toolchain/libraries, so
switch to them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1521075576-12064-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 945e6ec4f6)
2018-03-15 10:48:31 +02:00
Takuya ASADA
237e36a0b4 dist/ami: update CentOS base image to latest version
Since we requires updated version of systemd, we need to update CentOS base
image.

Fixes #3184

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1518118694-23770-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 69d226625a)
2018-03-15 10:47:54 +02:00
Takuya ASADA
e78c137bfc dist/redhat: switch to gcc-7.3
We have hit following bug on debug-mode binary:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82560
Since it's fixed on gcc-7.3, we need to upgrade our gcc package.

See: https://groups.google.com/d/topic/scylladb-dev/RIdIpqMeTog/discussion
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1521064473-17906-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 856dc0a636)
2018-03-15 10:39:40 +02:00
Avi Kivity
fb99a7c902 Merge "Ubuntu/Debian build error fixes" from Takuya
* 'debian-ubuntu-build-fixes-v2' of https://github.com/syuu1228/scylla:
  dist/debian: build only scylla, iotune
  dist/debian: switch to boost-1.65
  dist/debian: switch to gcc-7.3

(cherry picked from commit bb4b1f0e91)
2018-03-14 22:51:44 +02:00
111 changed files with 1394 additions and 560 deletions

2
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

View File

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=666.development
VERSION=2.2.2
if test -f version
then

View File

@@ -2193,11 +2193,11 @@
"description":"The column family"
},
"total":{
"type":"int",
"type":"long",
"description":"The total snapshot size"
},
"live":{
"type":"int",
"type":"long",
"description":"The live snapshot size"
}
}

View File

@@ -72,18 +72,22 @@ public:
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(stdx::string_view, const authentication_options& options) override {
virtual future<> create(stdx::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> alter(stdx::string_view, const authentication_options& options) override {
virtual future<> alter(stdx::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> drop(stdx::string_view) override {
virtual future<> drop(stdx::string_view) const override {
return make_ready_future();
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
return make_ready_future<custom_options>();
}
virtual const resource_set& protected_resources() const override {
static const resource_set resources;
return resources;

View File

@@ -58,24 +58,30 @@ public:
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(stdx::string_view, permission_set, const resource&) override {
throw exceptions::invalid_request_exception("GRANT operation is not supported by AllowAllAuthorizer");
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(stdx::string_view, permission_set, const resource&) override {
throw exceptions::invalid_request_exception("REVOKE operation is not supported by AllowAllAuthorizer");
virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<std::vector<permission_details>> list_all() const override {
throw exceptions::invalid_request_exception("LIST PERMISSIONS operation is not supported by AllowAllAuthorizer");
return make_exception_future<std::vector<permission_details>>(
unsupported_authorization_operation(
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(stdx::string_view) override {
return make_ready_future();
virtual future<> revoke_all(stdx::string_view) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(const resource&) override {
return make_ready_future();
virtual future<> revoke_all(const resource&) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
virtual const resource_set& protected_resources() const override {

View File

@@ -43,9 +43,11 @@ std::ostream& operator<<(std::ostream&, authentication_option);
using authentication_option_set = std::unordered_set<authentication_option>;
using custom_options = std::unordered_map<sstring, sstring>;
struct authentication_options final {
std::optional<sstring> password;
std::optional<std::unordered_map<sstring, sstring>> options;
std::optional<custom_options> options;
};
inline bool any_authentication_options(const authentication_options& aos) noexcept {

View File

@@ -69,7 +69,9 @@ namespace auth {
class authenticated_user;
///
/// Abstract interface for authenticating users.
/// Abstract client for authenticating role identity.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authenticator {
public:
@@ -120,7 +122,7 @@ public:
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(stdx::string_view role_name, const authentication_options& options) = 0;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;
///
/// Alter the authentication record of an existing user.
@@ -129,12 +131,19 @@ public:
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) = 0;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(stdx::string_view role_name) = 0;
virtual future<> drop(stdx::string_view role_name) const = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).
///
/// If no options are set the result is an empty container.
///
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.

View File

@@ -44,6 +44,7 @@
#include <experimental/string_view>
#include <functional>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <vector>
@@ -79,8 +80,15 @@ inline bool operator<(const permission_details& pd1, const permission_details& p
< std::forward_as_tuple(pd2.role_name, pd2.resource, pd2.permissions);
}
class unsupported_authorization_operation : public std::invalid_argument {
public:
using std::invalid_argument::invalid_argument;
};
///
/// Abstract interface for authorizing users to access resources.
/// Abstract client for authorizing roles to access resources.
///
/// All state necessary to authorize a role is stored externally to the client instance.
///
class authorizer {
public:
@@ -107,27 +115,37 @@ public:
///
/// Grant a set of permissions to a role for a particular \ref resource.
///
virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) = 0;
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) = 0;
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;
///
/// Query for all directly granted permissions.
///
/// \throws \ref unsupported_authorization_operation if listing permissions is not supported.
///
virtual future<std::vector<permission_details>> list_all() const = 0;
///
/// Revoke all permissions granted directly to a particular role.
///
virtual future<> revoke_all(stdx::string_view role_name) = 0;
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(stdx::string_view role_name) const = 0;
///
/// Revoke all permissions granted to any role for a particular resource.
///
virtual future<> revoke_all(const resource&) = 0;
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(const resource&) const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.

View File

@@ -25,6 +25,7 @@
#include "cql3/query_processor.hh"
#include "cql3/statements/create_table_statement.hh"
#include "database.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
@@ -48,7 +49,7 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {
return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {
if (f.failed()) {
auth_log.warn("Auth task failed with error, rescheduling: {}", f.get_exception());
auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());
return { };
}
return { empty_state() };
@@ -58,13 +59,13 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
}
future<> create_metadata_table_if_missing(
const sstring& table_name,
stdx::string_view table_name,
cql3::query_processor& qp,
const sstring& cql,
stdx::string_view cql,
::service::migration_manager& mm) {
auto& db = qp.db().local();
if (db.has_schema(meta::AUTH_KS, table_name)) {
if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {
return make_ready_future<>();
}
@@ -85,4 +86,12 @@ future<> create_metadata_table_if_missing(
return mm.announce_new_column_family(b.build(), false);
}
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {
static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };
return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {
return do_until([&mm] { return mm.have_schema_agreement(); }, pause);
});
}
}

View File

@@ -22,6 +22,7 @@
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <seastar/core/future.hh>
#include <seastar/core/abort_source.hh>
@@ -36,6 +37,8 @@
using namespace std::chrono_literals;
class database;
namespace service {
class migration_manager;
}
@@ -65,16 +68,18 @@ future<> once_among_shards(Task&& f) {
}
inline future<> delay_until_system_ready(seastar::abort_source& as) {
return sleep_abortable(10s, as);
return sleep_abortable(15s, as);
}
// Func must support being invoked more than once.
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);
future<> create_metadata_table_if_missing(
const sstring& table_name,
stdx::string_view table_name,
cql3::query_processor&,
const sstring& cql,
stdx::string_view cql,
::service::migration_manager&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&);
}

View File

@@ -109,7 +109,7 @@ future<bool> default_authorizer::any_granted() const {
});
}
future<> default_authorizer::migrate_legacy_metadata() {
future<> default_authorizer::migrate_legacy_metadata() const {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
@@ -157,18 +157,18 @@ future<> default_authorizer::start() {
create_table,
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
if (legacy_metadata_exists()) {
return any_granted().then([this](bool any) {
if (!any) {
return migrate_legacy_metadata();
}
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
alogger.warn("Ignoring legacy permissions metadata since role permissions exist.");
return make_ready_future<>();
});
}
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
migrate_legacy_metadata().get0();
return;
}
return make_ready_future<>();
alogger.warn("Ignoring legacy permissions metadata since role permissions exist.");
}
});
});
});
});
@@ -210,7 +210,7 @@ default_authorizer::modify(
stdx::string_view role_name,
permission_set set,
const resource& resource,
stdx::string_view op) {
stdx::string_view op) const {
return do_with(
sprint(
"UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
@@ -230,11 +230,11 @@ default_authorizer::modify(
}
future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) {
future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "+");
}
future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) {
future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "-");
}
@@ -267,7 +267,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
});
}
future<> default_authorizer::revoke_all(stdx::string_view role_name) {
future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ?",
meta::AUTH_KS,
@@ -286,7 +286,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) {
});
}
future<> default_authorizer::revoke_all(const resource& resource) {
future<> default_authorizer::revoke_all(const resource& resource) const {
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
ROLE_NAME,

View File

@@ -77,15 +77,15 @@ public:
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;
virtual future<> grant(stdx::string_view, permission_set, const resource&) override;
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;
virtual future<> revoke( stdx::string_view, permission_set, const resource&) override;
virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;
virtual future<std::vector<permission_details>> list_all() const override;
virtual future<> revoke_all(stdx::string_view) override;
virtual future<> revoke_all(stdx::string_view) const override;
virtual future<> revoke_all(const resource&) override;
virtual future<> revoke_all(const resource&) const override;
virtual const resource_set& protected_resources() const override;
@@ -94,9 +94,9 @@ private:
future<bool> any_granted() const;
future<> migrate_legacy_metadata();
future<> migrate_legacy_metadata() const;
future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view);
future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;
};
} /* namespace auth */

View File

@@ -149,7 +149,9 @@ static sstring gensalt() {
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
prefix = pfx;
return salt;
}
@@ -177,7 +179,7 @@ bool password_authenticator::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> password_authenticator::migrate_legacy_metadata() {
future<> password_authenticator::migrate_legacy_metadata() const {
plogger.info("Starting migration of legacy authentication metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
@@ -201,7 +203,7 @@ future<> password_authenticator::migrate_legacy_metadata() {
});
}
future<> password_authenticator::create_default_if_missing() {
future<> password_authenticator::create_default_if_missing() const {
return default_role_row_satisfies(_qp, &has_salted_hash).then([this](bool exists) {
if (!exists) {
return _qp.process(
@@ -220,8 +222,16 @@ future<> password_authenticator::start() {
return once_among_shards([this] {
gensalt(); // do this once to determine usable hashing
auto f = create_metadata_table_if_missing(
meta::roles_table::name,
_qp,
meta::roles_table::creation_query(),
_migration_manager);
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
if (legacy_metadata_exists()) {
plogger.warn("Ignoring legacy authentication metadata since nondefault data already exist.");
@@ -239,7 +249,7 @@ future<> password_authenticator::start() {
});
});
return make_ready_future<>();
return f;
});
}
@@ -317,7 +327,7 @@ future<authenticated_user> password_authenticator::authenticate(
});
}
future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) {
future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
@@ -328,7 +338,7 @@ future<> password_authenticator::create(stdx::string_view role_name, const authe
{hashpw(*options.password), sstring(role_name)}).discard_result();
}
future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) {
future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
@@ -345,7 +355,7 @@ future<> password_authenticator::alter(stdx::string_view role_name, const authen
{hashpw(*options.password), sstring(role_name)}).discard_result();
}
future<> password_authenticator::drop(stdx::string_view name) {
future<> password_authenticator::drop(stdx::string_view name) const {
static const sstring query = sprint(
"DELETE %s FROM %s WHERE %s = ?",
SALTED_HASH,
@@ -355,6 +365,10 @@ future<> password_authenticator::drop(stdx::string_view name) {
return _qp.process(query, consistency_for_user(name), {sstring(name)}).discard_result();
}
future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {
return make_ready_future<custom_options>();
}
const resource_set& password_authenticator::protected_resources() const {
static const resource_set resources({make_data_resource(meta::AUTH_KS, meta::roles_table::name)});
return resources;

View File

@@ -81,11 +81,13 @@ public:
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) override;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) override;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> drop(stdx::string_view role_name) override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;
virtual const resource_set& protected_resources() const override;
@@ -94,9 +96,9 @@ public:
private:
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata();
future<> migrate_legacy_metadata() const;
future<> create_default_if_missing();
future<> create_default_if_missing() const;
};
}

View File

@@ -93,10 +93,12 @@ using role_set = std::unordered_set<sstring>;
enum class recursive_role_query { yes, no };
///
/// Abstract role manager.
/// Abstract client for managing roles.
///
/// All implementations should throw role-related exceptions as documented, but authorization-related checking is
/// handled by the CQL layer, and not here.
/// All state necessary for managing roles is stored externally to the client instance.
///
/// All implementations should throw role-related exceptions as documented. Authorization is not addressed here, and
/// access-control should never be enforced in implementations.
///
class role_manager {
public:
@@ -113,17 +115,17 @@ public:
///
/// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.
///
virtual future<> create(stdx::string_view role_name, const role_config&) = 0;
virtual future<> create(stdx::string_view role_name, const role_config&) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> drop(stdx::string_view role_name) = 0;
virtual future<> drop(stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> alter(stdx::string_view role_name, const role_config_update&) = 0;
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const = 0;
///
/// Grant `role_name` to `grantee_name`.
@@ -133,7 +135,7 @@ public:
/// \returns an exceptional future with \ref role_already_included if granting the role would be redundant, or
/// create a cycle.
///
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) = 0;
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const = 0;
///
/// Revoke `role_name` from `revokee_name`.
@@ -142,7 +144,7 @@ public:
///
/// \returns an exceptional future with \ref revoke_ungranted_role if the role was not granted.
///
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) = 0;
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

View File

@@ -36,6 +36,21 @@ namespace meta {
namespace roles_table {
stdx::string_view creation_query() {
static const sstring instance = sprint(
"CREATE TABLE %s ("
" %s text PRIMARY KEY,"
" can_login boolean,"
" is_superuser boolean,"
" member_of set<text>,"
" salted_hash text"
")",
qualified_name(),
role_col_name);
return instance;
}
stdx::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;

View File

@@ -40,6 +40,8 @@ namespace meta {
namespace roles_table {
stdx::string_view creation_query();
constexpr stdx::string_view name{"roles", 5};
stdx::string_view qualified_name() noexcept;

View File

@@ -77,11 +77,18 @@ private:
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
_authorizer.revoke_all(auth::make_data_resource(ks_name));
_authorizer.revoke_all(
auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
_authorizer.revoke_all(auth::make_data_resource(ks_name, cf_name));
_authorizer.revoke_all(
auth::make_data_resource(
ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
@@ -177,9 +184,7 @@ future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
}).then([this] {
return _role_manager->start();
}).then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
return when_all_succeed(_role_manager->start(), _authorizer->start(), _authenticator->start());
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
@@ -402,7 +407,7 @@ static void validate_authentication_options_are_supported(
future<> create_role(
service& ser,
const service& ser,
stdx::string_view name,
const role_config& config,
const authentication_options& options) {
@@ -415,7 +420,7 @@ future<> create_role(
&validate_authentication_options_are_supported,
options,
ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {
return ser.underlying_authenticator().create(sstring(name), options);
return ser.underlying_authenticator().create(name, options);
}).handle_exception([&ser, &name](std::exception_ptr ep) {
// Roll-back.
return ser.underlying_role_manager().drop(name).then([ep = std::move(ep)] {
@@ -426,7 +431,7 @@ future<> create_role(
}
future<> alter_role(
service& ser,
const service& ser,
stdx::string_view name,
const role_config_update& config_update,
const authentication_options& options) {
@@ -444,10 +449,15 @@ future<> alter_role(
});
}
future<> drop_role(service& ser, stdx::string_view name) {
future<> drop_role(const service& ser, stdx::string_view name) {
return do_with(make_role_resource(name), [&ser, name](const resource& r) {
auto& a = ser.underlying_authorizer();
return when_all_succeed(a.revoke_all(name), a.revoke_all(r));
return when_all_succeed(
a.revoke_all(name),
a.revoke_all(r)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
});
}).then([&ser, name] {
return ser.underlying_authenticator().drop(name);
}).then([&ser, name] {
@@ -471,7 +481,7 @@ future<bool> has_role(const service& ser, const authenticated_user& u, stdx::str
}
future<> grant_permissions(
service& ser,
const service& ser,
stdx::string_view role_name,
permission_set perms,
const resource& r) {
@@ -480,8 +490,19 @@ future<> grant_permissions(
});
}
future<> grant_applicable_permissions(const service& ser, stdx::string_view role_name, const resource& r) {
return grant_permissions(ser, role_name, r.applicable_permissions(), r);
}
future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r) {
if (is_anonymous(u)) {
return make_ready_future<>();
}
return grant_applicable_permissions(ser, *u.name, r);
}
future<> revoke_permissions(
service& ser,
const service& ser,
stdx::string_view role_name,
permission_set perms,
const resource& r) {

View File

@@ -75,12 +75,14 @@ public:
};
///
/// Central interface into access-control for the system.
/// Client for access-control in the system.
///
/// Access control encompasses user/role management, authentication, and authorization. This class provides access to
/// Access control encompasses user/role management, authentication, and authorization. This client provides access to
/// the dynamically-loaded implementations of these modules (through the `underlying_*` member functions), but also
/// builds on their functionality with caching and abstractions for common operations.
///
/// All state associated with access-control is stored externally to any particular instance of this class.
///
class service final {
permissions_cache_config _permissions_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
@@ -149,26 +151,14 @@ public:
future<bool> exists(const resource&) const;
authenticator& underlying_authenticator() {
return *_authenticator;
}
const authenticator& underlying_authenticator() const {
return *_authenticator;
}
authorizer& underlying_authorizer() {
return *_authorizer;
}
const authorizer& underlying_authorizer() const {
return *_authorizer;
}
role_manager& underlying_role_manager() {
return *_role_manager;
}
const role_manager& underlying_role_manager() const {
return *_role_manager;
}
@@ -206,7 +196,7 @@ bool is_protected(const service&, const resource&) noexcept;
/// \returns an exceptional future with \ref unsupported_authentication_option if an unsupported option is included.
///
future<> create_role(
service&,
const service&,
stdx::string_view name,
const role_config&,
const authentication_options&);
@@ -219,7 +209,7 @@ future<> create_role(
/// \returns an exceptional future with \ref unsupported_authentication_option if an unsupported option is included.
///
future<> alter_role(
service&,
const service&,
stdx::string_view name,
const role_config_update&,
const authentication_options&);
@@ -229,7 +219,7 @@ future<> alter_role(
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
future<> drop_role(service&, stdx::string_view name);
future<> drop_role(const service&, stdx::string_view name);
///
/// Check if `grantee` has been granted the named role.
@@ -247,17 +237,34 @@ future<bool> has_role(const service&, const authenticated_user&, stdx::string_vi
///
/// \returns an exceptional future with \ref nonexistent_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not
/// supported.
///
future<> grant_permissions(
service&,
const service&,
stdx::string_view role_name,
permission_set,
const resource&);
///
/// Like \ref grant_permissions, but grants all applicable permissions on the resource.
///
/// \returns an exceptional future with \ref nonexistent_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not
/// supported.
///
future<> grant_applicable_permissions(const service&, stdx::string_view role_name, const resource&);
future<> grant_applicable_permissions(const service&, const authenticated_user&, const resource&);
///
/// \returns an exceptional future with \ref nonexistent_role if the named role does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if revoking permissions is not
/// supported.
///
future<> revoke_permissions(
service&,
const service&,
stdx::string_view role_name,
permission_set,
const resource&);
@@ -277,6 +284,9 @@ using recursive_permissions = bool_class<struct recursive_permissions_tag>;
/// \returns an exceptional future with \ref nonexistent_role if a role name is included which refers to a role that
/// does not exist.
///
/// \returns an exceptional future with \ref unsupported_authorization_operation if listing permissions is not
/// supported.
///
future<std::vector<permission_details>> list_filtered_permissions(
const service&,
permission_set,

View File

@@ -118,6 +118,10 @@ static future<record> require_record(cql3::query_processor& qp, stdx::string_vie
});
}
static bool has_can_login(const cql3::untyped_result_set_row& row) {
return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());
}
stdx::string_view standard_role_manager_name() noexcept {
static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";
return instance;
@@ -135,18 +139,7 @@ const resource_set& standard_role_manager::protected_resources() const {
return resources;
}
future<> standard_role_manager::create_metadata_tables_if_missing() {
static const sstring create_roles_query = sprint(
"CREATE TABLE %s ("
" %s text PRIMARY KEY,"
" can_login boolean,"
" is_superuser boolean,"
" member_of set<text>,"
" salted_hash text"
")",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
future<> standard_role_manager::create_metadata_tables_if_missing() const {
static const sstring create_role_members_query = sprint(
"CREATE TABLE %s ("
" role text,"
@@ -158,19 +151,19 @@ future<> standard_role_manager::create_metadata_tables_if_missing() {
return when_all_succeed(
create_metadata_table_if_missing(
sstring(meta::roles_table::name),
meta::roles_table::name,
_qp,
create_roles_query,
meta::roles_table::creation_query(),
_migration_manager),
create_metadata_table_if_missing(
sstring(meta::role_members_table::name),
meta::role_members_table::name,
_qp,
create_role_members_query,
_migration_manager));
}
future<> standard_role_manager::create_default_role_if_missing() {
return default_role_row_satisfies(_qp, [](auto&&) { return true; }).then([this](bool exists) {
future<> standard_role_manager::create_default_role_if_missing() const {
return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {
if (!exists) {
static const sstring query = sprint(
"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, true, true)",
@@ -199,7 +192,7 @@ bool standard_role_manager::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> standard_role_manager::migrate_legacy_metadata() {
future<> standard_role_manager::migrate_legacy_metadata() const {
log.info("Starting migration of legacy user metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
@@ -231,7 +224,9 @@ future<> standard_role_manager::start() {
return this->create_metadata_tables_if_missing().then([this] {
_stopped = auth::do_after_system_ready(_as, [this] {
return seastar::async([this] {
if (any_nondefault_role_row_satisfies(_qp, [](auto&&) { return true; }).get0()) {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get0()) {
if (this->legacy_metadata_exists()) {
log.warn("Ignoring legacy user metadata since nondefault roles already exist.");
}
@@ -256,7 +251,7 @@ future<> standard_role_manager::stop() {
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
}
future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) {
future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) const {
static const sstring query = sprint(
"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, ?, ?)",
meta::roles_table::qualified_name(),
@@ -270,7 +265,7 @@ future<> standard_role_manager::create_or_replace(stdx::string_view role_name, c
}
future<>
standard_role_manager::create(stdx::string_view role_name, const role_config& c) {
standard_role_manager::create(stdx::string_view role_name, const role_config& c) const {
return this->exists(role_name).then([this, role_name, &c](bool role_exists) {
if (role_exists) {
throw role_already_exists(role_name);
@@ -281,7 +276,7 @@ standard_role_manager::create(stdx::string_view role_name, const role_config& c)
}
future<>
standard_role_manager::alter(stdx::string_view role_name, const role_config_update& u) {
standard_role_manager::alter(stdx::string_view role_name, const role_config_update& u) const {
static const auto build_column_assignments = [](const role_config_update& u) -> sstring {
std::vector<sstring> assignments;
@@ -312,7 +307,7 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda
});
}
future<> standard_role_manager::drop(stdx::string_view role_name) {
future<> standard_role_manager::drop(stdx::string_view role_name) const {
return this->exists(role_name).then([this, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(role_name);
@@ -379,7 +374,7 @@ future<>
standard_role_manager::modify_membership(
stdx::string_view grantee_name,
stdx::string_view role_name,
membership_change ch) {
membership_change ch) const {
const auto modify_roles = [this, role_name, grantee_name, ch] {
@@ -421,7 +416,7 @@ standard_role_manager::modify_membership(
}
future<>
standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view role_name) {
standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view role_name) const {
const auto check_redundant = [this, role_name, grantee_name] {
return this->query_granted(
grantee_name,
@@ -452,7 +447,7 @@ standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view r
}
future<>
standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view role_name) {
standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view role_name) const {
return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(sstring(role_name));

View File

@@ -66,15 +66,15 @@ public:
virtual future<> stop() override;
virtual future<> create(stdx::string_view role_name, const role_config&) override;
virtual future<> create(stdx::string_view role_name, const role_config&) const override;
virtual future<> drop(stdx::string_view role_name) override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<> alter(stdx::string_view role_name, const role_config_update&) override;
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const override;
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) override;
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const override;
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) override;
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const override;
virtual future<role_set> query_granted(stdx::string_view grantee_name, recursive_role_query) const override;
@@ -89,17 +89,17 @@ public:
private:
enum class membership_change { add, remove };
future<> create_metadata_tables_if_missing();
future<> create_metadata_tables_if_missing() const;
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata();
future<> migrate_legacy_metadata() const;
future<> create_default_role_if_missing();
future<> create_default_role_if_missing() const;
future<> create_or_replace(stdx::string_view role_name, const role_config&);
future<> create_or_replace(stdx::string_view role_name, const role_config&) const;
future<> modify_membership(stdx::string_view role_name, stdx::string_view grantee_name, membership_change);
future<> modify_membership(stdx::string_view role_name, stdx::string_view grantee_name, membership_change) const;
};
}

View File

@@ -118,18 +118,22 @@ public:
});
}
virtual future<> create(stdx::string_view role_name, const authentication_options& options) override {
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override {
return _authenticator->create(role_name, options);
}
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) override {
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override {
return _authenticator->alter(role_name, options);
}
virtual future<> drop(stdx::string_view role_name) override {
virtual future<> drop(stdx::string_view role_name) const override {
return _authenticator->drop(role_name);
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
return _authenticator->query_custom_options(role_name);
}
virtual const resource_set& protected_resources() const override {
return _authenticator->protected_resources();
}
@@ -214,11 +218,11 @@ public:
return make_ready_future<permission_set>(transitional_permissions);
}
virtual future<> grant(stdx::string_view s, permission_set ps, const resource& r) override {
virtual future<> grant(stdx::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->grant(s, std::move(ps), r);
}
virtual future<> revoke(stdx::string_view s, permission_set ps, const resource& r) override {
virtual future<> revoke(stdx::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->revoke(s, std::move(ps), r);
}
@@ -226,11 +230,11 @@ public:
return _authorizer->list_all();
}
virtual future<> revoke_all(stdx::string_view s) override {
virtual future<> revoke_all(stdx::string_view s) const override {
return _authorizer->revoke_all(s);
}
virtual future<> revoke_all(const resource& r) override {
virtual future<> revoke_all(const resource& r) const override {
return _authorizer->revoke_all(r);
}

View File

@@ -47,6 +47,7 @@
class backlog_controller {
public:
future<> shutdown() {
_update_timer.cancel();
return std::move(_inflight_update);
}
protected:
@@ -126,7 +127,7 @@ public:
class compaction_controller : public backlog_controller {
public:
static constexpr unsigned normalization_factor = 10;
static constexpr unsigned normalization_factor = 30;
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, std::function<float()> current_backlog)
: backlog_controller(sg, iop, std::move(interval),

View File

@@ -60,6 +60,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _next_row_in_range = _next.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
reading_from_underlying,
end_of_stream
@@ -86,6 +87,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
partition_snapshot_row_cursor _next_row;
bool _next_row_in_range = false;
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
// because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
// us from marking the interval as continuous.
// Valid when _state == reading_from_underlying.
bool _population_range_starts_before_all_rows;
future<> do_fill_buffer(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
@@ -226,6 +234,7 @@ inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
@@ -351,7 +360,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
inline
bool cache_flat_mutation_reader::ensure_population_lower_bound() {
if (!_ck_ranges_curr->start()) {
if (_population_range_starts_before_all_rows) {
return true;
}
if (!_last_row.refresh(*_snp)) {
@@ -406,6 +415,7 @@ inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
_last_row = nullptr;
_population_range_starts_before_all_rows = false;
_read_context->cache().on_mispopulate();
return;
}
@@ -439,6 +449,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it, true);
});
_population_range_starts_before_all_rows = false;
});
}

View File

@@ -70,7 +70,7 @@ public:
{
if (!with_static_row) {
if (_current == _end) {
_current_start = _current_end = position_in_partition_view::after_all_clustered_rows();
_current_start = position_in_partition_view::before_all_clustered_rows();
} else {
_current_start = position_in_partition_view::for_range_start(*_current);
_current_end = position_in_partition_view::for_range_end(*_current);

View File

@@ -67,6 +67,12 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
*/
const sstring_view _query;
/**
* An empty bitset to be used as a workaround for AntLR null dereference
* bug.
*/
static typename ExceptionBaseType::BitsetListType _empty_bit_list;
public:
/**
@@ -144,6 +150,14 @@ private:
break;
}
default:
// AntLR Exception class has a bug of dereferencing a null
// pointer in the displayRecognitionError. The following
// if statement makes sure it will not be null before the
// call to that function (displayRecognitionError).
// bug reference: https://github.com/antlr/antlr3/issues/191
if (!ex->get_expectingSet()) {
ex->set_expectingSet(&_empty_bit_list);
}
ex->displayRecognitionError(token_names, msg);
}
return msg.str();
@@ -345,4 +359,8 @@ private:
#endif
};
template<typename RecognizerType, typename TokenType, typename ExceptionBaseType>
typename ExceptionBaseType::BitsetListType
error_collector<RecognizerType,TokenType,ExceptionBaseType>::_empty_bit_list = typename ExceptionBaseType::BitsetListType();
}

View File

@@ -209,19 +209,18 @@ void query_options::prepare(const std::vector<::shared_ptr<column_specification>
}
auto& names = *_names;
std::vector<cql3::raw_value> ordered_values;
std::vector<cql3::raw_value_view> ordered_values;
ordered_values.reserve(specs.size());
for (auto&& spec : specs) {
auto& spec_name = spec->name->text();
for (size_t j = 0; j < names.size(); j++) {
if (names[j] == spec_name) {
ordered_values.emplace_back(_values[j]);
ordered_values.emplace_back(_value_views[j]);
break;
}
}
}
_values = std::move(ordered_values);
fill_value_views();
_value_views = std::move(ordered_values);
}
void query_options::fill_value_views()

View File

@@ -202,6 +202,14 @@ public:
const query_options& options,
gc_clock::time_point now) const override;
virtual std::vector<bytes_opt> values_raw(const query_options& options) const = 0;
virtual std::vector<bytes_opt> values(const query_options& options) const override {
std::vector<bytes_opt> ret = values_raw(options);
std::sort(ret.begin(),ret.end());
ret.erase(std::unique(ret.begin(),ret.end()),ret.end());
return ret;
}
#if 0
@Override
protected final boolean isSupportedBy(SecondaryIndex index)
@@ -224,7 +232,7 @@ public:
return abstract_restriction::term_uses_function(_values, ks_name, function_name);
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
std::vector<bytes_opt> ret;
for (auto&& v : _values) {
ret.emplace_back(to_bytes_opt(v->bind_and_get(options)));
@@ -249,7 +257,7 @@ public:
return false;
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
auto&& lval = dynamic_pointer_cast<multi_item_terminal>(_marker->bind(options));
if (!lval) {
throw exceptions::invalid_request_exception("Invalid null value for IN restriction");

View File

@@ -105,9 +105,11 @@ public:
virtual void reset() = 0;
virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) override {
if (receiver->type == get_type()) {
auto t1 = receiver->type->underlying_type();
auto t2 = get_type()->underlying_type();
if (t1 == t2) {
return assignment_testable::test_result::EXACT_MATCH;
} else if (receiver->type->is_value_compatible_with(*get_type())) {
} else if (t1->is_value_compatible_with(*t2)) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
} else {
return assignment_testable::test_result::NOT_ASSIGNABLE;

View File

@@ -128,6 +128,17 @@ cql3::statements::create_keyspace_statement::prepare(database& db, cql_stats& st
return std::make_unique<prepared_statement>(make_shared<create_keyspace_statement>(*this));
}
future<> cql3::statements::create_keyspace_statement::grant_permissions_to_creator(const service::client_state& cs) {
return do_with(auth::make_data_resource(keyspace()), [&cs](const auth::resource& r) {
return auth::grant_applicable_permissions(
*cs.get_auth_service(),
*cs.user(),
r).handle_exception_type([](const auth::unsupported_authorization_operation&) {
// Nothing.
});
});
}
}
}

View File

@@ -84,6 +84,8 @@ public:
virtual future<shared_ptr<cql_transport::event::schema_change>> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
virtual std::unique_ptr<prepared> prepare(database& db, cql_stats& stats) override;
virtual future<> grant_permissions_to_creator(const service::client_state&) override;
};
}

View File

@@ -70,6 +70,8 @@ public:
, _if_not_exists(if_not_exists) {
}
future<> grant_permissions_to_creator(const service::client_state&) const;
void validate(distributed<service::storage_proxy>&, const service::client_state&) override;
virtual future<> check_access(const service::client_state&) override;

View File

@@ -49,6 +49,8 @@
#include "cql3/statements/create_table_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "auth/resource.hh"
#include "auth/service.hh"
#include "schema_builder.hh"
#include "service/storage_service.hh"
@@ -162,6 +164,16 @@ create_table_statement::prepare(database& db, cql_stats& stats) {
abort();
}
future<> create_table_statement::grant_permissions_to_creator(const service::client_state& cs) {
return do_with(auth::make_data_resource(keyspace(), column_family()), [&cs](const auth::resource& r) {
return auth::grant_applicable_permissions(
*cs.get_auth_service(),
*cs.user(),
r).handle_exception_type([](const auth::unsupported_authorization_operation&) {
// Nothing.
});
});
}
create_table_statement::raw_statement::raw_statement(::shared_ptr<cf_name> name, bool if_not_exists)
: cf_statement{std::move(name)}

View File

@@ -106,6 +106,8 @@ public:
virtual std::unique_ptr<prepared> prepare(database& db, cql_stats& stats) override;
virtual future<> grant_permissions_to_creator(const service::client_state&) override;
schema_ptr get_cf_meta_data(const database&);
class raw_statement;

View File

@@ -51,5 +51,8 @@ cql3::statements::grant_statement::execute(distributed<service::storage_proxy>&
}).handle_exception_type([](const auth::nonexistant_role& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
}).handle_exception_type([](const auth::unsupported_authorization_operation& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
});
}

View File

@@ -171,6 +171,9 @@ cql3::statements::list_permissions_statement::execute(
}).handle_exception_type([](const auth::nonexistant_role& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
}).handle_exception_type([](const auth::unsupported_authorization_operation& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
});
});
}

View File

@@ -70,7 +70,7 @@ cql3::statements::list_users_statement::execute(distributed<service::storage_pro
make_column_spec("name", utf8_type),
make_column_spec("super", boolean_type)});
static const auto make_results = [](auth::service& as, std::unordered_set<sstring>&& roles) {
static const auto make_results = [](const auth::service& as, std::unordered_set<sstring>&& roles) {
using cql_transport::messages::result_message;
auto results = std::make_unique<result_set>(metadata);
@@ -98,8 +98,8 @@ cql3::statements::list_users_statement::execute(distributed<service::storage_pro
});
};
auto& cs = state.get_client_state();
auto& as = *cs.get_auth_service();
const auto& cs = state.get_client_state();
const auto& as = *cs.get_auth_service();
const auto user = cs.user();
return auth::has_superuser(as, *user).then([&cs, &as, user](bool has_superuser) {

View File

@@ -51,5 +51,8 @@ cql3::statements::revoke_statement::execute(distributed<service::storage_proxy>&
}).handle_exception_type([](const auth::nonexistant_role& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
}).handle_exception_type([](const auth::unsupported_authorization_operation& e) {
return make_exception_future<::shared_ptr<cql_transport::messages::result_message>>(
exceptions::invalid_request_exception(e.what()));
});
}

View File

@@ -93,6 +93,17 @@ void validate_cluster_support() {
// `create_role_statement`
//
future<> create_role_statement::grant_permissions_to_creator(const service::client_state& cs) const {
return do_with(auth::make_role_resource(_role), [&cs](const auth::resource& r) {
return auth::grant_applicable_permissions(
*cs.get_auth_service(),
*cs.user(),
r).handle_exception_type([](const auth::unsupported_authorization_operation&) {
// Nothing.
});
});
}
void create_role_statement::validate(distributed<service::storage_proxy>&, const service::client_state&) {
validate_cluster_support();
}
@@ -123,9 +134,12 @@ create_role_statement::execute(distributed<service::storage_proxy>&,
std::move(config),
extract_authentication_options(_options),
[this, &state](const auth::role_config& config, const auth::authentication_options& authen_options) {
auto& as = *state.get_client_state().get_auth_service();
const auto& cs = state.get_client_state();
auto& as = *cs.get_auth_service();
return auth::create_role(as, _role, config, authen_options).then([] {
return auth::create_role(as, _role, config, authen_options).then([this, &cs] {
return grant_permissions_to_creator(cs);
}).then([] {
return void_result_message();
}).handle_exception_type([this](const auth::role_already_exists& e) {
if (!_if_not_exists) {
@@ -300,8 +314,6 @@ future<> list_roles_statement::check_access(const service::client_state& state)
future<result_message_ptr>
list_roles_statement::execute(distributed<service::storage_proxy>&, service::query_state& state, const query_options&) {
unimplemented::warn(unimplemented::cause::ROLES);
static const sstring virtual_table_name("roles");
static const auto make_column_spec = [](const sstring& name, const ::shared_ptr<const abstract_type>& ty) {
@@ -312,14 +324,19 @@ list_roles_statement::execute(distributed<service::storage_proxy>&, service::que
ty);
};
static const thread_local auto custom_options_type = map_type_impl::get_instance(utf8_type, utf8_type, true);
static const thread_local auto metadata = ::make_shared<cql3::metadata>(
std::vector<::shared_ptr<column_specification>>{
make_column_spec("role", utf8_type),
make_column_spec("super", boolean_type),
make_column_spec("login", boolean_type)});
make_column_spec("login", boolean_type),
make_column_spec("options", custom_options_type)});
static const auto make_results = [](auth::role_manager& rm, auth::role_set&& roles)
-> future<result_message_ptr> {
static const auto make_results = [](
const auth::role_manager& rm,
const auth::authenticator& a,
auth::role_set&& roles) -> future<result_message_ptr> {
auto results = std::make_unique<result_set>(metadata);
if (roles.empty()) {
@@ -333,14 +350,26 @@ list_roles_statement::execute(distributed<service::storage_proxy>&, service::que
return do_with(
std::move(sorted_roles),
std::move(results),
[&rm](const std::vector<sstring>& sorted_roles, std::unique_ptr<result_set>& results) {
return do_for_each(sorted_roles, [&results, &rm](const sstring& role) {
[&rm, &a](const std::vector<sstring>& sorted_roles, std::unique_ptr<result_set>& results) {
return do_for_each(sorted_roles, [&results, &rm, &a](const sstring& role) {
return when_all_succeed(
rm.can_login(role),
rm.is_superuser(role)).then([&results, &role](bool login, bool super) {
rm.is_superuser(role),
a.query_custom_options(role)).then([&results, &role](
bool login,
bool super,
auth::custom_options os) {
results->add_column_value(utf8_type->decompose(role));
results->add_column_value(boolean_type->decompose(super));
results->add_column_value(boolean_type->decompose(login));
results->add_column_value(
custom_options_type->decompose(
make_map_value(
custom_options_type,
map_type_impl::native_type(
std::make_move_iterator(os.begin()),
std::make_move_iterator(os.end())))));
});
}).then([&results] {
return make_ready_future<result_message_ptr>(::make_shared<result_message::rows>(std::move(results)));
@@ -348,12 +377,13 @@ list_roles_statement::execute(distributed<service::storage_proxy>&, service::que
});
};
auto& cs = state.get_client_state();
auto& as = *cs.get_auth_service();
const auto& cs = state.get_client_state();
const auto& as = *cs.get_auth_service();
const auto user = cs.user();
return auth::has_superuser(as, *user).then([this, &state, &cs, &as, user](bool super) {
auto& rm = as.underlying_role_manager();
const auto& rm = as.underlying_role_manager();
const auto& a = as.underlying_authenticator();
const auto query_mode = _recursive ? auth::recursive_role_query::yes : auth::recursive_role_query::no;
if (!_grantee) {
@@ -361,19 +391,21 @@ list_roles_statement::execute(distributed<service::storage_proxy>&, service::que
// only the roles granted to them.
return cs.check_has_permission(
auth::permission::DESCRIBE,
auth::root_role_resource()).then([&cs, &rm, user, query_mode](bool has_describe) {
auth::root_role_resource()).then([&cs, &rm, &a, user, query_mode](bool has_describe) {
if (has_describe) {
return rm.query_all().then([&rm](auto&& roles) { return make_results(rm, std::move(roles)); });
return rm.query_all().then([&rm, &a](auto&& roles) {
return make_results(rm, a, std::move(roles));
});
}
return rm.query_granted(*user->name, query_mode).then([&rm](auth::role_set roles) {
return make_results(rm, std::move(roles));
return rm.query_granted(*user->name, query_mode).then([&rm, &a](auth::role_set roles) {
return make_results(rm, a, std::move(roles));
});
});
}
return rm.query_granted(*_grantee, query_mode).then([&rm](auth::role_set roles) {
return make_results(rm, std::move(roles));
return rm.query_granted(*_grantee, query_mode).then([&rm, &a](auth::role_set roles) {
return make_results(rm, a, std::move(roles));
});
}).handle_exception_type([](const auth::nonexistant_role& e) {
return make_exception_future<result_message_ptr>(exceptions::invalid_request_exception(e.what()));
@@ -394,8 +426,6 @@ future<> grant_role_statement::check_access(const service::client_state& state)
future<result_message_ptr>
grant_role_statement::execute(distributed<service::storage_proxy>&, service::query_state& state, const query_options&) {
unimplemented::warn(unimplemented::cause::ROLES);
auto& as = *state.get_client_state().get_auth_service();
return as.underlying_role_manager().grant(_grantee, _role).then([] {
@@ -421,8 +451,6 @@ future<result_message_ptr> revoke_role_statement::execute(
distributed<service::storage_proxy>&,
service::query_state& state,
const query_options&) {
unimplemented::warn(unimplemented::cause::ROLES);
auto& rm = state.get_client_state().get_auth_service()->underlying_role_manager();
return rm.revoke(_revokee, _role).then([] {

View File

@@ -59,6 +59,10 @@ schema_altering_statement::schema_altering_statement(::shared_ptr<cf_name> name)
{
}
future<> schema_altering_statement::grant_permissions_to_creator(const service::client_state&) {
return make_ready_future<>();
}
bool schema_altering_statement::uses_function(const sstring& ks_name, const sstring& function_name) const
{
return cf_statement::uses_function(ks_name, function_name);
@@ -103,7 +107,11 @@ schema_altering_statement::execute0(distributed<service::storage_proxy>& proxy,
future<::shared_ptr<messages::result_message>>
schema_altering_statement::execute(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
return execute0(proxy, state, options, false);
return execute0(proxy, state, options, false).then([this, &state](::shared_ptr<messages::result_message> result) {
return grant_permissions_to_creator(state.get_client_state()).then([result = std::move(result)] {
return result;
});
});
}
future<::shared_ptr<messages::result_message>>

View File

@@ -71,6 +71,14 @@ protected:
schema_altering_statement(::shared_ptr<cf_name> name);
/**
* When a new database object (keyspace, table) is created, the creator needs to be granted all applicable
* permissions on it.
*
* By default, this function does nothing.
*/
virtual future<> grant_permissions_to_creator(const service::client_state&);
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override;
virtual bool depends_on_keyspace(const sstring& ks_name) const override;

View File

@@ -53,6 +53,9 @@ update_parameters::get_prefetched_list(
return {};
}
if (column.is_static()) {
ckey = clustering_key_view::make_empty();
}
auto i = _prefetched->rows.find(std::make_pair(std::move(pkey), std::move(ckey)));
if (i == _prefetched->rows.end()) {
return {};

View File

@@ -361,9 +361,13 @@ filter_sstable_for_reader(std::vector<sstables::shared_sstable>&& sstables, colu
};
sstables.erase(boost::remove_if(sstables, sstable_has_not_key), sstables.end());
// FIXME: Workaround for https://github.com/scylladb/scylla/issues/3552
// and https://github.com/scylladb/scylla/issues/3553
const bool filtering_broken = true;
// no clustering filtering is applied if schema defines no clustering key or
// compaction strategy thinks it will not benefit from such an optimization.
if (!schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
if (filtering_broken || !schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
return sstables;
}
::cf_stats* stats = cf.cf_stats();
@@ -1053,30 +1057,31 @@ column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old, sstabl
database_sstable_write_monitor monitor(std::move(permit), newtab, _compaction_manager, _compaction_strategy);
return do_with(std::move(monitor), [this, old, newtab] (auto& monitor) {
auto&& priority = service::get_local_memtable_flush_priority();
return write_memtable_to_sstable(*old, newtab, monitor, incremental_backups_enabled(), priority, false).then([this, newtab, old, &monitor] {
// Switch back to default scheduling group for post-flush actions, to avoid them being staved by the memtable flush
// controller. Cache update does not affect the input of the memtable cpu controller, so it can be subject to
// priority inversion.
return with_scheduling_group(default_scheduling_group(), [this, &monitor, old = std::move(old), newtab = std::move(newtab)] () mutable {
return newtab->open_data().then([this, old, newtab] () {
dblog.debug("Flushing to {} done", newtab->get_filename());
return with_scheduling_group(_config.memtable_to_cache_scheduling_group, [this, old, newtab] {
return update_cache(old, newtab);
auto f = write_memtable_to_sstable(*old, newtab, monitor, incremental_backups_enabled(), priority, false);
// Switch back to default scheduling group for post-flush actions, to avoid them being staved by the memtable flush
// controller. Cache update does not affect the input of the memtable cpu controller, so it can be subject to
// priority inversion.
return with_scheduling_group(default_scheduling_group(), [this, &monitor, old = std::move(old), newtab = std::move(newtab), f = std::move(f)] () mutable {
return f.then([this, newtab, old, &monitor] {
return newtab->open_data().then([this, old, newtab] () {
dblog.debug("Flushing to {} done", newtab->get_filename());
return with_scheduling_group(_config.memtable_to_cache_scheduling_group, [this, old, newtab] {
return update_cache(old, newtab);
});
}).then([this, old, newtab] () noexcept {
_memtables->erase(old);
dblog.debug("Memtable for {} replaced", newtab->get_filename());
return stop_iteration::yes;
});
}).handle_exception([this, old, newtab, &monitor] (auto e) {
monitor.write_failed();
newtab->mark_for_deletion();
dblog.error("failed to write sstable {}: {}", newtab->get_filename(), e);
// If we failed this write we will try the write again and that will create a new flush reader
// that will decrease dirty memory again. So we need to reset the accounting.
old->revert_flushed_memory();
return stop_iteration(_async_gate.is_closed());
});
}).then([this, old, newtab] () noexcept {
_memtables->erase(old);
dblog.debug("Memtable for {} replaced", newtab->get_filename());
return stop_iteration::yes;
}).handle_exception([this, old, newtab, &monitor] (auto e) {
monitor.write_failed();
newtab->mark_for_deletion();
dblog.error("failed to write sstable {}: {}", newtab->get_filename(), e);
// If we failed this write we will try the write again and that will create a new flush reader
// that will decrease dirty memory again. So we need to reset the accounting.
old->revert_flushed_memory();
return stop_iteration::no;
});
});
});
});
});
@@ -1614,7 +1619,7 @@ inline bool column_family::manifest_json_filter(const lister::path&, const direc
// TODO: possibly move it to seastar
template <typename Service, typename PtrType, typename Func>
static future<> invoke_shards_with_ptr(std::vector<shard_id> shards, distributed<Service>& s, PtrType ptr, Func&& func) {
static future<> invoke_shards_with_ptr(std::unordered_set<shard_id> shards, distributed<Service>& s, PtrType ptr, Func&& func) {
return parallel_for_each(std::move(shards), [&s, &func, ptr] (shard_id id) {
return s.invoke_on(id, [func, foreign = make_foreign(ptr)] (Service& s) mutable {
return func(s, std::move(foreign));
@@ -1632,16 +1637,23 @@ future<> distributed_loader::open_sstable(distributed<database>& db, sstables::e
// to distribute evenly the resource usage among all shards.
return db.invoke_on(column_family::calculate_shard_from_sstable_generation(comps.generation),
[&db, comps = std::move(comps), func = std::move(func), pc] (database& local) {
[&db, comps = std::move(comps), func = std::move(func), &pc] (database& local) {
return with_semaphore(local.sstable_load_concurrency_sem(), 1, [&db, &local, comps = std::move(comps), func = std::move(func), pc] {
return with_semaphore(local.sstable_load_concurrency_sem(), 1, [&db, &local, comps = std::move(comps), func = std::move(func), &pc] {
auto& cf = local.find_column_family(comps.ks, comps.cf);
auto f = sstables::sstable::load_shared_components(cf.schema(), cf._config.datadir, comps.generation, comps.version, comps.format, pc);
return f.then([&db, comps = std::move(comps), func = std::move(func)] (sstables::sstable_open_info info) {
// shared components loaded, now opening sstable in all shards that own it with shared components
return do_with(std::move(info), [&db, comps = std::move(comps), func = std::move(func)] (auto& info) {
return invoke_shards_with_ptr(info.owners, db, std::move(info.components),
// All shards that own the sstable is interested in it in addition to shard that
// is responsible for its generation. We may need to add manually this shard
// because sstable may not contain data that belong to it.
auto shards_interested_in_this_sstable = boost::copy_range<std::unordered_set<shard_id>>(info.owners);
shard_id shard_responsible_for_generation = column_family::calculate_shard_from_sstable_generation(comps.generation);
shards_interested_in_this_sstable.insert(shard_responsible_for_generation);
return invoke_shards_with_ptr(std::move(shards_interested_in_this_sstable), db, std::move(info.components),
[owners = info.owners, data = info.data.dup(), index = info.index.dup(), comps, func] (database& db, auto components) {
auto& cf = db.find_column_family(comps.ks, comps.cf);
return func(cf, sstables::foreign_sstable_open_info{std::move(components), owners, data, index});
@@ -2151,6 +2163,11 @@ database::database(const db::config& cfg, database_config dbcfg)
void backlog_controller::adjust() {
auto backlog = _current_backlog();
if (backlog >= _control_points.back().input) {
update_controller(_control_points.back().output);
return;
}
// interpolate to find out which region we are. This run infrequently and there are a fixed
// number of points so a simple loop will do.
size_t idx = 1;
@@ -2650,6 +2667,7 @@ bool database::update_column_family(schema_ptr new_schema) {
void database::remove(const column_family& cf) {
auto s = cf.schema();
auto& ks = find_keyspace(s->ks_name());
_querier_cache.evict_all_for_table(s->id());
_column_families.erase(s->id());
ks.metadata()->remove_column_family(s);
_ks_cf_to_uuid.erase(std::make_pair(s->ks_name(), s->cf_name()));
@@ -2799,6 +2817,7 @@ keyspace::make_column_family_config(const schema& s, const db::config& db_config
cfg.enable_disk_writes = _config.enable_disk_writes;
cfg.enable_commitlog = _config.enable_commitlog;
cfg.enable_cache = _config.enable_cache;
cfg.compaction_enforce_min_threshold = _config.compaction_enforce_min_threshold;
cfg.dirty_memory_manager = _config.dirty_memory_manager;
cfg.streaming_dirty_memory_manager = _config.streaming_dirty_memory_manager;
cfg.read_concurrency_semaphore = _config.read_concurrency_semaphore;
@@ -2813,6 +2832,7 @@ keyspace::make_column_family_config(const schema& s, const db::config& db_config
cfg.commitlog_scheduling_group = _config.commitlog_scheduling_group;
cfg.enable_metrics_reporting = db_config.enable_keyspace_column_family_metrics();
cfg.large_partition_warning_threshold_bytes = db_config.compaction_large_partition_warning_threshold_mb()*1024*1024;
cfg.view_update_concurrency_semaphore = _config.view_update_concurrency_semaphore;
return cfg;
}
@@ -3497,7 +3517,7 @@ future<> database::do_apply(schema_ptr s, const frozen_mutation& m, db::timeout_
if (cf.views().empty()) {
return apply_with_commitlog(std::move(s), cf, std::move(uuid), m, timeout);
}
future<row_locker::lock_holder> f = cf.push_view_replica_updates(s, m);
future<row_locker::lock_holder> f = cf.push_view_replica_updates(s, m, timeout);
return f.then([this, s = std::move(s), uuid = std::move(uuid), &m, timeout] (row_locker::lock_holder lock) {
auto& cf = find_column_family(uuid);
return apply_with_commitlog(std::move(s), cf, std::move(uuid), m, timeout).finally(
@@ -3567,6 +3587,7 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
cfg.enable_commitlog = false;
cfg.enable_cache = false;
}
cfg.compaction_enforce_min_threshold = _cfg->compaction_enforce_min_threshold();
cfg.dirty_memory_manager = &_dirty_memory_manager;
cfg.streaming_dirty_memory_manager = &_streaming_dirty_memory_manager;
cfg.read_concurrency_semaphore = &_read_concurrency_sem;
@@ -3581,6 +3602,7 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
cfg.query_scheduling_group = _dbcfg.query_scheduling_group;
cfg.commitlog_scheduling_group = _dbcfg.commitlog_scheduling_group;
cfg.enable_metrics_reporting = _cfg->enable_keyspace_column_family_metrics();
cfg.view_update_concurrency_semaphore = &_view_update_concurrency_sem;
return cfg;
}
@@ -3757,7 +3779,10 @@ future<> database::truncate(const keyspace& ks, column_family& cf, timestamp_fun
return f.then([this, &cf, truncated_at, low_mark, should_flush] {
return cf.discard_sstables(truncated_at).then([this, &cf, truncated_at, low_mark, should_flush](db::replay_position rp) {
// TODO: indexes.
assert(low_mark <= rp);
// Note: since discard_sstables was changed to only count tables owned by this shard,
// we can get zero rp back. Changed assert, and ensure we save at least low_mark.
assert(low_mark <= rp || rp == db::replay_position());
rp = std::max(low_mark, rp);
return truncate_views(cf, truncated_at, should_flush).then([&cf, truncated_at, rp] {
return db::system_keyspace::save_truncation_record(cf, truncated_at, rp);
});
@@ -4165,8 +4190,11 @@ future<db::replay_position> column_family::discard_sstables(db_clock::time_point
for (auto& p : *cf._sstables->all()) {
if (p->max_data_age() <= gc_trunc) {
rp = std::max(p->get_stats_metadata().position, rp);
remove.emplace_back(p);
// Only one shard that own the sstable will submit it for deletion to avoid race condition in delete procedure.
if (*boost::min_element(p->get_shards_for_this_sstable()) == engine().cpu_id()) {
rp = std::max(p->get_stats_metadata().position, rp);
remove.emplace_back(p);
}
continue;
}
pruned->insert(p);
@@ -4312,13 +4340,17 @@ std::vector<view_ptr> column_family::affected_views(const schema_ptr& base, cons
future<> column_family::generate_and_propagate_view_updates(const schema_ptr& base,
std::vector<view_ptr>&& views,
mutation&& m,
flat_mutation_reader_opt existings) const {
flat_mutation_reader_opt existings,
db::timeout_clock::time_point timeout) const {
auto base_token = m.token();
return db::view::generate_view_updates(base,
std::move(views),
flat_mutation_reader_from_mutations({std::move(m)}),
std::move(existings)).then([base_token = std::move(base_token)] (auto&& updates) {
db::view::mutate_MV(std::move(base_token), std::move(updates));
std::move(existings)).then([this, timeout, base_token = std::move(base_token)] (auto&& updates) mutable {
return seastar::get_units(*_config.view_update_concurrency_semaphore, 1, timeout).then(
[base_token = std::move(base_token), updates = std::move(updates)] (auto units) mutable {
db::view::mutate_MV(std::move(base_token), std::move(updates)).handle_exception([units = std::move(units)] (auto ignored) { });
});
});
}
@@ -4326,7 +4358,7 @@ future<> column_family::generate_and_propagate_view_updates(const schema_ptr& ba
* Given an update for the base table, calculates the set of potentially affected views,
* generates the relevant updates, and sends them to the paired view replicas.
*/
future<row_locker::lock_holder> column_family::push_view_replica_updates(const schema_ptr& s, const frozen_mutation& fm) const {
future<row_locker::lock_holder> column_family::push_view_replica_updates(const schema_ptr& s, const frozen_mutation& fm, db::timeout_clock::time_point timeout) const {
//FIXME: Avoid unfreezing here.
auto m = fm.unfreeze(s);
auto& base = schema();
@@ -4337,7 +4369,7 @@ future<row_locker::lock_holder> column_family::push_view_replica_updates(const s
}
auto cr_ranges = db::view::calculate_affected_clustering_ranges(*base, m.decorated_key(), m.partition(), views);
if (cr_ranges.empty()) {
return generate_and_propagate_view_updates(base, std::move(views), std::move(m), { }).then([] {
return generate_and_propagate_view_updates(base, std::move(views), std::move(m), { }, timeout).then([] {
// In this case we are not doing a read-before-write, just a
// write, so no lock is needed.
return make_ready_future<row_locker::lock_holder>();
@@ -4359,18 +4391,18 @@ future<row_locker::lock_holder> column_family::push_view_replica_updates(const s
// We'll return this lock to the caller, which will release it after
// writing the base-table update.
future<row_locker::lock_holder> lockf = local_base_lock(base, m.decorated_key(), slice.default_row_ranges());
return lockf.then([m = std::move(m), slice = std::move(slice), views = std::move(views), base, this] (row_locker::lock_holder lock) {
return lockf.then([m = std::move(m), slice = std::move(slice), views = std::move(views), base, this, timeout] (row_locker::lock_holder lock) {
return do_with(
dht::partition_range::make_singular(m.decorated_key()),
std::move(slice),
std::move(m),
[base, views = std::move(views), lock = std::move(lock), this] (auto& pk, auto& slice, auto& m) mutable {
[base, views = std::move(views), lock = std::move(lock), this, timeout] (auto& pk, auto& slice, auto& m) mutable {
auto reader = this->as_mutation_source().make_reader(
base,
pk,
slice,
service::get_local_sstable_query_read_priority());
return this->generate_and_propagate_view_updates(base, std::move(views), std::move(m), std::move(reader)).then([lock = std::move(lock)] () mutable {
return this->generate_and_propagate_view_updates(base, std::move(views), std::move(m), std::move(reader), timeout).then([lock = std::move(lock)] () mutable {
// return the local partition/row lock we have taken so it
// remains locked until the caller is done modifying this
// partition/row and destroys the lock object.
@@ -4516,16 +4548,14 @@ flat_mutation_reader make_local_shard_sstable_reader(schema_ptr s,
}
return reader;
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto all_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) -> flat_mutation_reader {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(all_readers),
fwd,
fwd_mr);
}
@@ -4544,16 +4574,14 @@ flat_mutation_reader make_range_sstable_reader(schema_ptr s,
auto reader_factory_fn = [s, &slice, &pc, resource_tracker, fwd, fwd_mr, &monitor_generator] (sstables::shared_sstable& sst, const dht::partition_range& pr) {
return sst->read_range_rows_flat(s, pr, slice, pc, resource_tracker, fwd, fwd_mr, monitor_generator(sst));
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto sstable_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(sstable_readers),
fwd,
fwd_mr);
}

View File

@@ -297,6 +297,7 @@ public:
bool enable_cache = true;
bool enable_commitlog = true;
bool enable_incremental_backups = false;
bool compaction_enforce_min_threshold = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* read_concurrency_semaphore;
@@ -310,6 +311,7 @@ public:
seastar::scheduling_group streaming_scheduling_group;
bool enable_metrics_reporting = false;
uint64_t large_partition_warning_threshold_bytes = std::numeric_limits<uint64_t>::max();
db::timeout_semaphore* view_update_concurrency_semaphore;
};
struct no_commitlog {};
struct stats {
@@ -734,6 +736,10 @@ public:
_config.enable_incremental_backups = val;
}
bool compaction_enforce_min_threshold() const {
return _config.compaction_enforce_min_threshold;
}
const sstables::sstable_set& get_sstable_set() const;
lw_shared_ptr<sstable_list> get_sstables() const;
lw_shared_ptr<sstable_list> get_sstables_including_compacted_undeleted() const;
@@ -787,7 +793,7 @@ public:
void add_or_update_view(view_ptr v);
void remove_view(view_ptr v);
const std::vector<view_ptr>& views() const;
future<row_locker::lock_holder> push_view_replica_updates(const schema_ptr& s, const frozen_mutation& fm) const;
future<row_locker::lock_holder> push_view_replica_updates(const schema_ptr& s, const frozen_mutation& fm, db::timeout_clock::time_point timeout) const;
void add_coordinator_read_latency(utils::estimated_histogram::duration latency);
std::chrono::milliseconds get_coordinator_read_latency_percentile(double percentile);
@@ -803,7 +809,8 @@ private:
future<> generate_and_propagate_view_updates(const schema_ptr& base,
std::vector<view_ptr>&& views,
mutation&& m,
flat_mutation_reader_opt existings) const;
flat_mutation_reader_opt existings,
db::timeout_clock::time_point timeout) const;
mutable row_locker _row_locker;
future<row_locker::lock_holder> local_base_lock(const schema_ptr& s, const dht::decorated_key& pk, const query::clustering_row_ranges& rows) const;
@@ -977,6 +984,7 @@ public:
bool enable_disk_writes = true;
bool enable_cache = true;
bool enable_incremental_backups = false;
bool compaction_enforce_min_threshold = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* read_concurrency_semaphore;
@@ -989,6 +997,7 @@ public:
seastar::scheduling_group query_scheduling_group;
seastar::scheduling_group streaming_scheduling_group;
bool enable_metrics_reporting = false;
db::timeout_semaphore* view_update_concurrency_semaphore = nullptr;
};
private:
std::unique_ptr<locator::abstract_replication_strategy> _replication_strategy;
@@ -1119,6 +1128,8 @@ private:
semaphore _sstable_load_concurrency_sem{max_concurrent_sstable_loads()};
db::timeout_semaphore _view_update_concurrency_sem{100}; // Stand-in hack for #2538
concrete_execution_stage<future<lw_shared_ptr<query::result>>,
column_family*,
schema_ptr,

View File

@@ -723,7 +723,7 @@ public:
*/
auto me = shared_from_this();
auto fp = _file_pos;
return _pending_ops.wait_for_pending(timeout).then([me = std::move(me), fp, timeout] {
return _pending_ops.wait_for_pending(timeout).then([me, fp, timeout] {
if (fp != me->_file_pos) {
// some other request already wrote this buffer.
// If so, wait for the operation at our intended file offset

View File

@@ -125,6 +125,9 @@ public:
val(compaction_static_shares, float, 0, Used, \
"If set to higher than 0, ignore the controller's output and set the compaction shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity" \
) \
val(compaction_enforce_min_threshold, bool, false, Used, \
"If set to true, enforce the min_threshold option for compactions strictly. If false (default), Scylla may decide to compact even if below min_threshold" \
) \
/* Initialization properties */ \
/* The minimal properties needed for configuring a cluster. */ \
val(cluster_name, sstring, "", Used, \

View File

@@ -827,15 +827,6 @@ static future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std:
/*auto& old_aggregates = */read_schema_for_keyspaces(proxy, AGGREGATES, keyspaces).get0();
#endif
// Incoming mutations have the version field deleted. Delete here as well so that
// schemas which are otherwise equal don't appear as differing.
for (auto&& e : old_column_families) {
schema_mutations& sm = e.second;
if (sm.scylla_tables()) {
delete_schema_version(*sm.scylla_tables());
}
}
proxy.local().mutate_locally(std::move(mutations)).get0();
if (do_flush) {

View File

@@ -27,6 +27,6 @@
namespace db {
using timeout_clock = seastar::lowres_clock;
using timeout_semaphore = basic_semaphore<default_timeout_exception_factory, timeout_clock>;
using timeout_semaphore = seastar::basic_semaphore<seastar::default_timeout_exception_factory, timeout_clock>;
static constexpr timeout_clock::time_point no_timeout = timeout_clock::time_point::max();
}

View File

@@ -175,6 +175,31 @@ static bool update_requires_read_before_write(const schema& base,
return false;
}
static bool is_partition_key_empty(
const schema& base,
const schema& view_schema,
const partition_key& base_key,
const clustering_row& update) {
// Empty partition keys are not supported on normal tables - they cannot
// be inserted or queried, so enforce those rules here.
if (view_schema.partition_key_columns().size() > 1) {
// Composite partition keys are different: all components
// are then allowed to be empty.
return false;
}
auto* base_col = base.get_column_definition(view_schema.partition_key_columns().front().name());
switch (base_col->kind) {
case column_kind::partition_key:
return base_key.get_component(base, base_col->position()).empty();
case column_kind::clustering_key:
return update.key().get_component(base, base_col->position()).empty();
default:
// No multi-cell columns in the view's partition key
auto& c = update.cells().cell_at(base_col->id);
return c.as_atomic_cell().value().empty();
}
}
bool matches_view_filter(const schema& base, const view_info& view, const partition_key& key, const clustering_row& update, gc_clock::time_point now) {
return clustering_prefix_matches(base, view, key, update.key())
&& boost::algorithm::all_of(
@@ -330,7 +355,7 @@ static void add_cells_to_view(const schema& base, const schema& view, const row&
* This method checks that the base row does match the view filter before applying anything.
*/
void view_updates::create_entry(const partition_key& base_key, const clustering_row& update, gc_clock::time_point now) {
if (!matches_view_filter(*_base, _view_info, base_key, update, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, update) || !matches_view_filter(*_base, _view_info, base_key, update, now)) {
return;
}
deletable_row& r = get_view_row(base_key, update);
@@ -346,7 +371,7 @@ void view_updates::create_entry(const partition_key& base_key, const clustering_
void view_updates::delete_old_entry(const partition_key& base_key, const clustering_row& existing, const row_tombstone& t, gc_clock::time_point now) {
// Before deleting an old entry, make sure it was matching the view filter
// (otherwise there is nothing to delete)
if (matches_view_filter(*_base, _view_info, base_key, existing, now)) {
if (!is_partition_key_empty(*_base, *_view, base_key, existing) && matches_view_filter(*_base, _view_info, base_key, existing, now)) {
do_delete_old_entry(base_key, existing, t, now);
}
}
@@ -391,11 +416,11 @@ void view_updates::do_delete_old_entry(const partition_key& base_key, const clus
void view_updates::update_entry(const partition_key& base_key, const clustering_row& update, const clustering_row& existing, gc_clock::time_point now) {
// While we know update and existing correspond to the same view entry,
// they may not match the view filter.
if (!matches_view_filter(*_base, _view_info, base_key, existing, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, existing) || !matches_view_filter(*_base, _view_info, base_key, existing, now)) {
create_entry(base_key, update, now);
return;
}
if (!matches_view_filter(*_base, _view_info, base_key, update, now)) {
if (is_partition_key_empty(*_base, *_view, base_key, update) || !matches_view_filter(*_base, _view_info, base_key, update, now)) {
do_delete_old_entry(base_key, existing, row_tombstone(), now);
return;
}
@@ -791,7 +816,7 @@ get_view_natural_endpoint(const sstring& keyspace_name,
// for the writes to complete.
// FIXME: I dropped a lot of parameters the Cassandra version had,
// we may need them back: writeCommitLog, baseComplete, queryStartNanoTime.
void mutate_MV(const dht::token& base_token,
future<> mutate_MV(const dht::token& base_token,
std::vector<mutation> mutations)
{
#if 0
@@ -823,6 +848,7 @@ void mutate_MV(const dht::token& base_token,
() -> asyncRemoveFromBatchlog(batchlogEndpoints, batchUUID));
// add a handler for each mutation - includes checking availability, but doesn't initiate any writes, yet
#endif
auto fs = std::make_unique<std::vector<future<>>>();
for (auto& mut : mutations) {
auto view_token = mut.token();
auto keyspace_name = mut.schema()->ks_name();
@@ -838,9 +864,10 @@ void mutate_MV(const dht::token& base_token,
// do not wait for it to complete.
// Note also that mutate_locally(mut) copies mut (in
// frozen from) so don't need to increase its lifetime.
service::get_local_storage_proxy().mutate_locally(mut).handle_exception([] (auto ep) {
fs->push_back(service::get_local_storage_proxy().mutate_locally(mut).handle_exception([] (auto ep) {
vlogger.error("Error applying local view update: {}", ep);
});
return make_exception_future<>(std::move(ep));
}));
} else {
#if 0
wrappers.add(wrapViewBatchResponseHandler(mutation,
@@ -856,9 +883,10 @@ void mutate_MV(const dht::token& base_token,
// without a batchlog, and without checking for success
// Note we don't wait for the asynchronous operation to complete
// FIXME: need to extend mut's lifetime???
service::get_local_storage_proxy().send_to_endpoint(mut, *paired_endpoint, db::write_type::VIEW).handle_exception([paired_endpoint] (auto ep) {
fs->push_back(service::get_local_storage_proxy().send_to_endpoint(mut, *paired_endpoint, db::write_type::VIEW).handle_exception([paired_endpoint] (auto ep) {
vlogger.error("Error applying view update to {}: {}", *paired_endpoint, ep);
});;
return make_exception_future<>(std::move(ep));
}));
}
} else {
#if 0
@@ -901,6 +929,8 @@ void mutate_MV(const dht::token& base_token,
viewWriteMetrics.addNano(System.nanoTime() - startTime);
}
#endif
auto f = seastar::when_all_succeed(fs->begin(), fs->end());
return f.finally([fs = std::move(fs)] { });
}
} // namespace view

View File

@@ -92,7 +92,7 @@ query::clustering_row_ranges calculate_affected_clustering_ranges(
const mutation_partition& mp,
const std::vector<view_ptr>& views);
void mutate_MV(const dht::token& base_token,
future<> mutate_MV(const dht::token& base_token,
std::vector<mutation> mutations);
}

View File

@@ -120,7 +120,7 @@ else
fi
fi
echo -n " "
/usr/lib/scylla/scylla_ec2_check
/usr/lib/scylla/scylla_ec2_check --nic eth0
if [ $? -eq 0 ]; then
echo
fi

View File

@@ -1 +0,0 @@
options raid0 devices_discard_performance=Y

View File

@@ -2,6 +2,12 @@
. /usr/lib/scylla/scylla_lib.sh
print_usage() {
echo "scylla_ec2_check --nic eth0"
echo " --nic specify NIC"
exit 1
}
get_en_interface_type() {
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 1`
SUBTYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 2`
@@ -18,7 +24,7 @@ get_en_interface_type() {
}
is_vpc_enabled() {
MAC=`cat /sys/class/net/eth0/address`
MAC=`cat /sys/class/net/$1/address`
VPC_AVAIL=`curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/|grep vpc-id`
[ -n "$VPC_AVAIL" ]
}
@@ -27,9 +33,27 @@ if ! is_ec2; then
exit 0
fi
if [ $# -eq 0 ]; then
print_usage
fi
while [ $# -gt 0 ]; do
case "$1" in
"--nic")
verify_args $@
NIC="$2"
shift 2
;;
esac
done
if ! is_valid_nic $NIC; then
echo "NIC $NIC doesn't exist."
exit 1
fi
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type`
EN=`get_en_interface_type`
DRIVER=`ethtool -i eth0|awk '/^driver:/ {print $2}'`
DRIVER=`ethtool -i $NIC|awk '/^driver:/ {print $2}'`
if [ "$EN" = "" ]; then
tput setaf 1
tput bold
@@ -39,7 +63,7 @@ if [ "$EN" = "" ]; then
echo "More documentation available at: "
echo "http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking"
exit 1
elif ! is_vpc_enabled; then
elif ! is_vpc_enabled $NIC; then
tput setaf 1
tput bold
echo "VPC is not enabled!"

View File

@@ -91,6 +91,10 @@ create_perftune_conf() {
/usr/lib/scylla/perftune.py --tune net --nic "$nic" $mode --dump-options-file > /etc/scylla.d/perftune.yaml
}
is_valid_nic() {
[ -d /sys/class/net/$1 ]
}
. /etc/os-release
if is_debian_variant || is_gentoo_variant; then
SYSCONFIG=/etc/default

View File

@@ -96,7 +96,9 @@ elif is_gentoo_variant; then
emerge -uq sys-fs/mdadm sys-fs/xfsprogs
fi
if [ "$ID" = "ubuntu" ] && [ "$VERSION_ID" = "14.04" ]; then
udevadm settle
mdadm --create --verbose --force --run $RAID --level=0 -c1024 --raid-devices=$NR_DISK $DISKS
udevadm settle
mkfs.xfs $RAID -f
else
for dsk in $DISKS; do
@@ -107,7 +109,9 @@ else
fi
done
wait
udevadm settle
mdadm --create --verbose --force --run $RAID --level=0 -c1024 --raid-devices=$NR_DISK $DISKS
udevadm settle
mkfs.xfs $RAID -f -K
fi
if is_debian_variant; then

View File

@@ -39,6 +39,27 @@ print_usage() {
exit 1
}
interactive_choose_nic() {
NICS=$(for i in /sys/class/net/*;do nic=`basename $i`; if [ "$nic" != "lo" ]; then echo $nic; fi; done)
NR_NICS=`echo $NICS|wc -w`
if [ $NR_NICS -eq 0 ]; then
echo "NIC not found."
exit 1
elif [ $NR_NICS -eq 1 ]; then
NIC=$NICS
else
echo "Please select NIC from following list: "
while true; do
echo $NICS
echo -n "> "
read NIC
if is_valid_nic $NIC; then
break
fi
done
fi
}
interactive_ask_service() {
echo $1
echo $2
@@ -112,14 +133,20 @@ run_setup_script() {
name=$1
shift 1
$* &&:
if [ $? -ne 0 ] && [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
if [ $? -ne 0 ]; then
if [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
else
printf "$name setup failed.\n"
exit 1
fi
fi
return 0
}
NIC="eth0"
AMI=0
SET_NIC=0
DEV_MODE=0
@@ -260,7 +287,8 @@ if is_ec2; then
EC2_CHECK=$?
fi
if [ $EC2_CHECK -eq 1 ]; then
/usr/lib/scylla/scylla_ec2_check
interactive_choose_nic
/usr/lib/scylla/scylla_ec2_check --nic $NIC
fi
fi
@@ -447,24 +475,6 @@ if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to setup sysconfig?" "Answer yes to do system wide configuration customized for Scylla. Answer no to do nothing." "yes" &&:
SYSCONFIG_SETUP=$?
if [ $SYSCONFIG_SETUP -eq 1 ]; then
NICS=$(for i in /sys/class/net/*;do nic=`basename $i`; if [ "$nic" != "lo" ]; then echo $nic; fi; done)
NR_NICS=`echo $NICS|wc -w`
if [ $NR_NICS -eq 0 ]; then
echo "NIC not found."
exit 1
elif [ $NR_NICS -eq 1 ]; then
NIC=$NICS
else
echo "Please select NIC from following list: "
while true; do
echo $NICS
echo -n "> "
read NIC
if [ -e /sys/class/net/$NIC ]; then
break
fi
done
fi
interactive_ask_service "Do you want to optimize NIC queue settings?" "Answer yes to enable network card optimization and improve performance. Answer no to skip this optimization." "yes" &&:
SET_NIC=$?
fi
@@ -474,6 +484,7 @@ if [ $SYSCONFIG_SETUP -eq 1 ]; then
if [ $SET_NIC -eq 1 ]; then
SETUP_ARGS="--setup-nic"
fi
interactive_choose_nic
run_setup_script "NIC queue" /usr/lib/scylla/scylla_sysconfig_setup --nic $NIC $SETUP_ARGS
fi

View File

@@ -2,10 +2,11 @@
. /etc/os-release
print_usage() {
echo "build_deb.sh -target <codename> --dist --rebuild-dep"
echo "build_deb.sh -target <codename> --dist --rebuild-dep --jobs 2"
echo " --target target distribution codename"
echo " --dist create a public distribution package"
echo " --no-clean don't rebuild pbuilder tgz"
echo " --jobs specify number of jobs"
exit 1
}
install_deps() {
@@ -19,6 +20,7 @@ install_deps() {
DIST=0
TARGET=
NO_CLEAN=0
JOBS=0
while [ $# -gt 0 ]; do
case "$1" in
"--dist")
@@ -33,6 +35,10 @@ while [ $# -gt 0 ]; do
NO_CLEAN=1
shift 1
;;
"--jobs")
JOBS=$2
shift 2
;;
*)
print_usage
;;
@@ -131,7 +137,7 @@ if [ "$TARGET" = "jessie" ]; then
sed -i -e "s/@@INSTALL_FSTRIM@@/dh_installinit --no-start --name scylla-fstrim/g" debian/rules
sed -i -e "s/@@INSTALL_NODE_EXPORTER@@/dh_installinit --no-start --name node-exporter/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc73-g++-7, libunwind-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options165-dev, scylla-libboost-filesystem165-dev, scylla-libboost-system165-dev, scylla-libboost-thread165-dev, scylla-libboost-test165-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@//g" debian/control
sed -i -e "s#@@INSTALL@@##g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER_D@@#dist/common/systemd/scylla-housekeeping-daily.timer /lib/systemd/system#g" debian/scylla-server.install
@@ -148,7 +154,7 @@ elif [ "$TARGET" = "stretch" ]; then
sed -i -e "s/@@INSTALL_FSTRIM@@/dh_installinit --no-start --name scylla-fstrim/g" debian/rules
sed -i -e "s/@@INSTALL_NODE_EXPORTER@@/dh_installinit --no-start --name node-exporter/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options1.62-dev, libboost-filesystem1.62-dev, libboost-system1.62-dev, libboost-thread1.62-dev, libboost-test1.62-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc73-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options1.62-dev, libboost-filesystem1.62-dev, libboost-system1.62-dev, libboost-thread1.62-dev, libboost-test1.62-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@//g" debian/control
sed -i -e "s#@@INSTALL@@##g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER_D@@#dist/common/systemd/scylla-housekeeping-daily.timer /lib/systemd/system#g" debian/scylla-server.install
@@ -166,7 +172,7 @@ elif [ "$TARGET" = "trusty" ]; then
sed -i -e "s/@@INSTALL_FSTRIM@@//g" debian/rules
sed -i -e "s/@@INSTALL_NODE_EXPORTER@@//g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/scylla-gcc72-g++-7, libunwind8-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/scylla-gcc73-g++-7, libunwind8-dev, scylla-antlr35, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options165-dev, scylla-libboost-filesystem165-dev, scylla-libboost-system165-dev, scylla-libboost-thread165-dev, scylla-libboost-test165-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, num-utils/g" debian/control
sed -i -e "s#@@INSTALL@@#dist/debian/sudoers.d/scylla etc/sudoers.d#g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER_D@@##g" debian/scylla-server.install
@@ -183,7 +189,7 @@ elif [ "$TARGET" = "xenial" ]; then
sed -i -e "s/@@INSTALL_FSTRIM@@/dh_installinit --no-start --name scylla-fstrim/g" debian/rules
sed -i -e "s/@@INSTALL_NODE_EXPORTER@@/dh_installinit --no-start --name node-exporter/g" debian/rules
sed -i -e "s#@@COMPILER@@#/opt/scylladb/bin/g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc72-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options163-dev, scylla-libboost-filesystem163-dev, scylla-libboost-system163-dev, scylla-libboost-thread163-dev, scylla-libboost-test163-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc73-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options165-dev, scylla-libboost-filesystem165-dev, scylla-libboost-system165-dev, scylla-libboost-thread165-dev, scylla-libboost-test165-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, /g" debian/control
sed -i -e "s#@@INSTALL@@##g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER_D@@#dist/common/systemd/scylla-housekeeping-daily.timer /lib/systemd/system#g" debian/scylla-server.install
@@ -200,7 +206,7 @@ elif [ "$TARGET" = "bionic" ]; then
sed -i -e "s/@@INSTALL_FSTRIM@@/dh_installinit --no-start --name scylla-fstrim/g" debian/rules
sed -i -e "s/@@INSTALL_NODE_EXPORTER@@/dh_installinit --no-start --name node-exporter/g" debian/rules
sed -i -e "s#@@COMPILER@@#g++-7#g" debian/rules
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, g++, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, libboost-program-options-dev, libboost-filesystem-dev, libboost-system-dev, libboost-thread-dev, libboost-test-dev/g" debian/control
sed -i -e "s/@@BUILD_DEPENDS@@/libsystemd-dev, scylla-gcc73-g++-7, libunwind-dev, antlr3, scylla-libthrift010-dev, scylla-antlr35-c++-dev, scylla-libboost-program-options165-dev, scylla-libboost-filesystem165-dev, scylla-libboost-system165-dev, scylla-libboost-thread165-dev, scylla-libboost-test165-dev/g" debian/control
sed -i -e "s/@@DEPENDS@@/hugepages, /g" debian/control
sed -i -e "s#@@INSTALL@@##g" debian/scylla-server.install
sed -i -e "s#@@HKDOTTIMER_D@@#dist/common/systemd/scylla-housekeeping-daily.timer /lib/systemd/system#g" debian/scylla-server.install
@@ -237,6 +243,9 @@ fi
if [ "$TARGET" != "trusty" ]; then
cp dist/common/systemd/scylla-server.service.in debian/scylla-server.service
sed -i -e "s#@@SYSCONFDIR@@#/etc/default#g" debian/scylla-server.service
if [ "$TARGET" = "jessie" ]; then
sed -i -e "s#AmbientCapabilities=CAP_SYS_NICE##g" debian/scylla-server.service
fi
cp dist/common/systemd/scylla-housekeeping-daily.service.in debian/scylla-server.scylla-housekeeping-daily.service
sed -i -e "s#@@REPOFILES@@#'/etc/apt/sources.list.d/scylla*.list'#g" debian/scylla-server.scylla-housekeeping-daily.service
cp dist/common/systemd/scylla-housekeeping-restart.service.in debian/scylla-server.scylla-housekeeping-restart.service
@@ -245,16 +254,18 @@ if [ "$TARGET" != "trusty" ]; then
cp dist/common/systemd/node-exporter.service debian/scylla-server.node-exporter.service
fi
cp ./dist/debian/pbuilderrc ~/.pbuilderrc
if [ $NO_CLEAN -eq 0 ]; then
sudo rm -fv /var/cache/pbuilder/scylla-server-$TARGET.tgz
sudo -E DIST=$TARGET /usr/sbin/pbuilder clean
sudo -E DIST=$TARGET /usr/sbin/pbuilder create --allow-untrusted
sudo DIST=$TARGET /usr/sbin/pbuilder clean --configfile ./dist/debian/pbuilderrc
sudo DIST=$TARGET /usr/sbin/pbuilder create --configfile ./dist/debian/pbuilderrc --allow-untrusted
fi
sudo -E DIST=$TARGET /usr/sbin/pbuilder update --allow-untrusted
if [ $JOBS -ne 0 ]; then
DEB_BUILD_OPTIONS="parallel=$JOBS"
fi
sudo -H DIST=$TARGET /usr/sbin/pbuilder update --configfile ./dist/debian/pbuilderrc --allow-untrusted
if [ "$TARGET" = "trusty" ] || [ "$TARGET" = "xenial" ] || [ "$TARGET" = "yakkety" ] || [ "$TARGET" = "zesty" ] || [ "$TARGET" = "artful" ] || [ "$TARGET" = "bionic" ]; then
sudo -E DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/ubuntu_enable_ppa.sh
sudo DIST=$TARGET /usr/sbin/pbuilder execute --configfile ./dist/debian/pbuilderrc --save-after-exec dist/debian/ubuntu_enable_ppa.sh
elif [ "$TARGET" = "jessie" ] || [ "$TARGET" = "stretch" ]; then
sudo -E DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/debian_install_gpgkey.sh
sudo DIST=$TARGET /usr/sbin/pbuilder execute --configfile ./dist/debian/pbuilderrc --save-after-exec dist/debian/debian_install_gpgkey.sh
fi
sudo -E DIST=$TARGET pdebuild --buildresult build/debs
sudo -H DIST=$TARGET DEB_BUILD_OPTIONS=$DEB_BUILD_OPTIONS pdebuild --configfile ./dist/debian/pbuilderrc --buildresult build/debs

View File

@@ -1,12 +1,13 @@
#!/usr/bin/make -f
export PYBUILD_DISABLE=1
jobs := $(shell echo $$DEB_BUILD_OPTIONS | sed -r "s/.*parallel=([0-9]+).*/-j\1/")
override_dh_auto_configure:
./configure.py --enable-dpdk --mode=release --static-thrift --static-boost --static-yaml-cpp --compiler=@@COMPILER@@ --cflags="-I/opt/scylladb/include -L/opt/scylladb/lib/x86-linux-gnu/" --ldflags="-Wl,-rpath=/opt/scylladb/lib"
./configure.py --with=scylla --with=iotune --enable-dpdk --mode=release --static-thrift --static-boost --static-yaml-cpp --compiler=@@COMPILER@@ --cflags="-I/opt/scylladb/include -L/opt/scylladb/lib/x86-linux-gnu/" --ldflags="-Wl,-rpath=/opt/scylladb/lib"
override_dh_auto_build:
PATH="/opt/scylladb/bin:$$PATH" ninja
PATH="/opt/scylladb/bin:$$PATH" ninja $(jobs)
override_dh_auto_clean:
rm -rf build/release seastar/build

View File

@@ -26,7 +26,7 @@ ADD commandlineparser.py /commandlineparser.py
ADD docker-entrypoint.py /docker-entrypoint.py
# Install Scylla:
RUN curl http://downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo -o /etc/yum.repos.d/scylla.repo && \
RUN curl http://downloads.scylladb.com/rpm/centos/scylla-2.2.repo -o /etc/yum.repos.d/scylla.repo && \
yum -y install epel-release && \
yum -y clean expire-cache && \
yum -y update && \

View File

@@ -9,7 +9,8 @@ def parse():
parser.add_argument('--cpuset', default=None, help="e.g. --cpuset 0-3 for the first four CPUs")
parser.add_argument('--smp', default=None, help="e.g --smp 2 to use two CPUs")
parser.add_argument('--memory', default=None, help="e.g. --memory 1G to use 1 GB of RAM")
parser.add_argument('--overprovisioned', default='0', choices=['0', '1'], help="run in overprovisioned environment")
parser.add_argument('--overprovisioned', default=None, choices=['0', '1'],
help="run in overprovisioned environment. By default it will run in overprovisioned mode unless --cpuset is specified")
parser.add_argument('--listen-address', default=None, dest='listenAddress')
parser.add_argument('--broadcast-address', default=None, dest='broadcastAddress')
parser.add_argument('--broadcast-rpc-address', default=None, dest='broadcastRpcAddress')

View File

@@ -53,7 +53,7 @@ class ScyllaSetup:
args += [ "--memory %s" % self._memory ]
if self._smp is not None:
args += [ "--smp %s" % self._smp ]
if self._overprovisioned == "1":
if self._overprovisioned == "1" or (self._overprovisioned is None and self._cpuset is None):
args += [ "--overprovisioned" ]
if self._listenAddress is None:

View File

@@ -7,8 +7,12 @@ Group: Applications/Databases
License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
Requires: scylla-server = @@VERSION@@ scylla-jmx = @@VERSION@@ scylla-tools = @@VERSION@@ scylla-tools-core = @@VERSION@@ scylla-kernel-conf = @@VERSION@@ scylla-libgcc72 scylla-libstdc++72
Requires: scylla-server = @@VERSION@@ scylla-jmx = @@VERSION@@ scylla-tools = @@VERSION@@ scylla-tools-core = @@VERSION@@ scylla-kernel-conf = @@VERSION@@ scylla-libgcc73 scylla-libstdc++73
Obsoletes: scylla-server < 1.1
Obsoletes: scylla-libgcc72
Obsoletes: scylla-libstdc++72
Provides: scylla-libgcc72
Provides: scylla-libstdc++72
%description
Scylla is a highly scalable, eventually consistent, distributed,
@@ -52,7 +56,7 @@ License: AGPLv3
URL: http://www.scylladb.com/
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel yaml-cpp-static lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel lksctp-tools-devel protobuf-devel protobuf-compiler libunwind-devel systemtap-sdt-devel ninja-build cmake python ragel grep kernel-headers
%{?fedora:BuildRequires: boost-devel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing dnf-yum}
%{?rhel:BuildRequires: scylla-libstdc++72-static scylla-boost163-devel scylla-boost163-static scylla-antlr35-tool scylla-antlr35-C++-devel python34 scylla-gcc72-c++, scylla-python34-pyparsing20}
%{?rhel:BuildRequires: scylla-libstdc++73-static scylla-boost163-devel scylla-boost163-static scylla-antlr35-tool scylla-antlr35-C++-devel python34 scylla-gcc73-c++, scylla-python34-pyparsing20}
Requires: scylla-conf systemd-libs hwloc collectd PyYAML python-urwid pciutils pyparsing python-requests curl util-linux python-setuptools pciutils python3-pyudev mdadm xfsprogs
%{?rhel:Requires: python34 python34-PyYAML kernel >= 3.10.0-514}
%{?fedora:Requires: python3 python3-PyYAML}
@@ -86,11 +90,11 @@ cflags="--cflags=${defines[*]}"
%define is_housekeeping_conf %( if @@HOUSEKEEPING_CONF@@; then echo "1" ; else echo "0"; fi )
%if 0%{?fedora}
./configure.py %{?configure_opt} --mode=release "$cflags"
./configure.py %{?configure_opt} --with=scylla --with=iotune --mode=release "$cflags"
%endif
%if 0%{?rhel}
. /etc/profile.d/scylla.sh
python3.4 ./configure.py %{?configure_opt} --mode=release "$cflags" --static-boost --static-yaml-cpp --compiler=/opt/scylladb/bin/g++-7.2 --python python3.4 --ldflag=-Wl,-rpath=/opt/scylladb/lib64
python3.4 ./configure.py %{?configure_opt} --with=scylla --with=iotune --mode=release "$cflags" --static-boost --static-yaml-cpp --compiler=/opt/scylladb/bin/g++-7.3 --python python3.4 --ldflag=-Wl,-rpath=/opt/scylladb/lib64
%endif
ninja-build %{?_smp_mflags} build/release/scylla build/release/iotune
cp dist/common/systemd/scylla-server.service.in build/scylla-server.service
@@ -109,9 +113,6 @@ mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
%if 0%{?rhel}
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/modprobe.d/
%endif
mkdir -p $RPM_BUILD_ROOT%{_sysctldir}/
mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
@@ -122,9 +123,6 @@ install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sec
install -m644 dist/common/collectd.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
install -m644 dist/common/scylla.d/*.conf $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
install -m644 dist/common/sysctl.d/*.conf $RPM_BUILD_ROOT%{_sysctldir}/
%if 0%{?rhel}
install -m644 dist/common/modprobe.d/*.conf $RPM_BUILD_ROOT%{_sysconfdir}/modprobe.d/
%endif
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
@@ -317,18 +315,9 @@ if Scylla is the main application on your server and you wish to optimize its la
# We cannot use the sysctl_apply rpm macro because it is not present in 7.0
# following is a "manual" expansion
/usr/lib/systemd/systemd-sysctl 99-scylla-sched.conf >/dev/null 2>&1 || :
# Write modprobe.d params when module already loaded
%if 0%{?rhel}
if [ -e /sys/module/raid0/parameters/devices_discard_performance ]; then
echo Y > /sys/module/raid0/parameters/devices_discard_performance
fi
%endif
%files kernel-conf
%defattr(-,root,root)
%if 0%{?rhel}
%config(noreplace) %{_sysconfdir}/modprobe.d/*.conf
%endif
%{_sysctldir}/*.conf
%changelog

View File

@@ -77,10 +77,9 @@ $ docker run --name some-scylla --volume /var/lib/scylla:/var/lib/scylla -d scyl
## Configuring resource limits
Scylla utilizes all CPUs and all memory by default.
To configure resource limits for your Docker container, you can use the `--smp`, `--memory`, and `--cpuset` command line options documented in the section "Command-line options".
If you run multiple Scylla instances on the same machine, it is highly recommended that you enable the `--overprovisioned` command line option, which enables certain optimizations for Scylla to run efficiently in an overprovisioned environment.
The Scylla docker image defaults to running on overprovisioned mode and won't apply any CPU pinning optimizations, which it normally does in non-containerized environments.
For better performance, it is recommended to configure resource limits for your Docker container using the `--smp`, `--memory`, and `--cpuset` command line options, as well as
disabling the overprovisioned flag as documented in the section "Command-line options".
## Restart Scylla
@@ -163,12 +162,13 @@ $ docker run --name some-scylla -d scylladb/scylla --memory 4G
### `--overprovisioned ENABLE`
The `--overprovisioned` command line option enables or disables optimizations for running Scylla in an overprovisioned environment.
If no `--overprovisioned` option is specified, Scylla defaults to running with optimizations *disabled*.
If no `--overprovisioned` option is specified, Scylla defaults to running with optimizations *enabled*. If `--overprovisioned` is
not specified and is left at its default, specifying `--cpuset` will automatically disable `--overprovisioned`
For example, to enable optimizations for running in an overprovisioned environment:
For example, to enable optimizations for running in an statically partitioned environment:
```console
$ docker run --name some-scylla -d scylladb/scylla --overprovisioned 1
$ docker run --name some-scylla -d scylladb/scylla --overprovisioned 0
```
### `--cpuset CPUSET`

View File

@@ -183,10 +183,7 @@ flat_mutation_reader make_delegating_reader(flat_mutation_reader& r) {
flat_mutation_reader make_forwardable(flat_mutation_reader m) {
class reader : public flat_mutation_reader::impl {
flat_mutation_reader _underlying;
position_range _current = {
position_in_partition(position_in_partition::partition_start_tag_t()),
position_in_partition(position_in_partition::after_static_row_tag_t())
};
position_range _current;
mutation_fragment_opt _next;
// When resolves, _next is engaged or _end_of_stream is set.
future<> ensure_next() {
@@ -201,7 +198,10 @@ flat_mutation_reader make_forwardable(flat_mutation_reader m) {
});
}
public:
reader(flat_mutation_reader r) : impl(r.schema()), _underlying(std::move(r)) { }
reader(flat_mutation_reader r) : impl(r.schema()), _underlying(std::move(r)), _current({
position_in_partition(position_in_partition::partition_start_tag_t()),
position_in_partition(position_in_partition::after_static_row_tag_t())
}) { }
virtual future<> fill_buffer(db::timeout_clock::time_point timeout) override {
return repeat([this] {
if (is_buffer_full()) {

View File

@@ -487,7 +487,9 @@ flat_mutation_reader transform(flat_mutation_reader r, T t) {
return _reader.fast_forward_to(pr);
}
virtual future<> fast_forward_to(position_range pr, db::timeout_clock::time_point timeout) override {
throw std::bad_function_call();
forward_buffer_to(pr.start());
_end_of_stream = false;
return _reader.fast_forward_to(std::move(pr), timeout);
}
virtual size_t buffer_size() const override {
return flat_mutation_reader::impl::buffer_size() + _reader.buffer_size();

View File

@@ -478,7 +478,8 @@ future<> gossiper::apply_state_locally(std::map<inet_address, endpoint_state> ma
int local_generation = local_ep_state_ptr.get_heart_beat_state().get_generation();
int remote_generation = remote_state.get_heart_beat_state().get_generation();
logger.trace("{} local generation {}, remote generation {}", ep, local_generation, remote_generation);
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
// A node was removed with nodetool removenode can have a generation of 2
if (local_generation > 2 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
// assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",
ep, local_generation, remote_generation);
@@ -853,6 +854,7 @@ int gossiper::get_max_endpoint_state_version(endpoint_state state) {
// Runs inside seastar::async context
void gossiper::evict_from_membership(inet_address endpoint) {
auto permit = lock_endpoint(endpoint).get0();
_unreachable_endpoints.erase(endpoint);
container().invoke_on_all([endpoint] (auto& g) {
g.endpoint_state_map.erase(endpoint);
@@ -1003,7 +1005,7 @@ future<> gossiper::assassinate_endpoint(sstring address) {
logger.warn("Assassinating {} via gossip", endpoint);
if (es) {
auto& ss = service::get_local_storage_service();
auto tokens = ss.get_token_metadata().get_tokens(endpoint);
tokens = ss.get_token_metadata().get_tokens(endpoint);
if (tokens.empty()) {
logger.warn("Unable to calculate tokens for {}. Will use a random one", address);
throw std::runtime_error(sprint("Unable to calculate tokens for %s", endpoint));

View File

@@ -721,6 +721,10 @@ public:
static const compound& get_compound_type(const schema& s) {
return s.clustering_key_prefix_type();
}
static clustering_key_prefix_view make_empty() {
return { bytes_view() };
}
};
class clustering_key_prefix : public prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key> {

View File

@@ -119,9 +119,17 @@ insert_token_range_to_sorted_container_while_unwrapping(
const dht::token& tok,
dht::token_range_vector& ret) {
if (prev_tok < tok) {
ret.emplace_back(
dht::token_range::bound(prev_tok, false),
dht::token_range::bound(tok, true));
auto pos = ret.end();
if (!ret.empty() && !std::prev(pos)->end()) {
// We inserted a wrapped range (a, b] previously as
// (-inf, b], (a, +inf). So now we insert in the next-to-last
// position to keep the last range (a, +inf) at the end.
pos = std::prev(pos);
}
ret.insert(pos,
dht::token_range{
dht::token_range::bound(prev_tok, false),
dht::token_range::bound(tok, true)});
} else {
ret.emplace_back(
dht::token_range::bound(prev_tok, false),

View File

@@ -100,7 +100,6 @@ future<> ec2_multi_region_snitch::gossiper_starting() {
// Note: currently gossiper "main" instance always runs on CPU0 therefore
// this function will be executed on CPU0 only.
//
ec2_snitch::gossiper_starting();
using namespace gms;
auto& g = get_local_gossiper();

13
main.cc
View File

@@ -305,6 +305,12 @@ int main(int ac, char** av) {
return 0;
}
bpo::options_description deprecated("Deprecated options - ignored");
deprecated.add_options()
("background-writer-scheduling-quota", bpo::value<float>())
("auto-adjust-flush-quota", bpo::value<bool>());
app.get_options_description().add(deprecated);
// TODO : default, always read?
init("options-file", bpo::value<sstring>(), "configuration file (i.e. <SCYLLA_HOME>/conf/scylla.yaml)");
cfg->add_options(init);
@@ -331,6 +337,13 @@ int main(int ac, char** av) {
sm::make_gauge("current_version", sm::description("Current ScyllaDB version."), { sm::label_instance("version", scylla_version()), sm::shard_label("") }, [] { return 0; })
});
const std::unordered_set<sstring> ignored_options = { "auto-adjust-flush-quota", "background-writer-scheduling-quota" };
for (auto& opt: ignored_options) {
if (opts.count(opt)) {
print("%s option ignored (deprecated)\n", opt);
}
}
// Check developer mode before even reading the config file, because we may not be
// able to read it if we need to disable strict dma mode.
// We'll redo this later and apply it to all reactors.

View File

@@ -1089,7 +1089,7 @@ row::apply_monotonically(const column_definition& column, atomic_cell_or_collect
if (_type == storage_type::vector && id < max_vector_size) {
if (id >= _storage.vector.v.size()) {
_storage.vector.v.resize(id);
_storage.vector.v.emplace_back(cell_and_hash{std::move(value), std::move(hash)});
_storage.vector.v.emplace_back(std::move(value), std::move(hash));
_storage.vector.present.set(id);
_size++;
} else if (auto& cell_and_hash = _storage.vector.v[id]; !bool(cell_and_hash.cell)) {
@@ -1753,9 +1753,10 @@ void mutation_querier::query_static_row(const row& r, tombstone current_tombston
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__static_row__cells<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
get_compacted_row_slice(_schema, slice, column_kind::static_column,
r, slice.static_columns, _static_cells_wr);
_memory_accounter.update(stream.size());
r, slice.static_columns, out);
_memory_accounter.update(stream.size() - start);
}
if (_pw.requested_digest()) {
max_timestamp max_ts{_pw.last_modified()};
@@ -1816,8 +1817,9 @@ stop_iteration mutation_querier::consume(clustering_row&& cr, row_tombstone curr
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__rows<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
write_row(out);
stop = _memory_accounter.update_and_check(stream.size());
stop = _memory_accounter.update_and_check(stream.size() - start);
}
_live_clustering_rows++;

View File

@@ -74,6 +74,17 @@ using cell_hash_opt = seastar::optimized_optional<cell_hash>;
struct cell_and_hash {
atomic_cell_or_collection cell;
mutable cell_hash_opt hash;
cell_and_hash() = default;
cell_and_hash(cell_and_hash&&) noexcept = default;
cell_and_hash& operator=(cell_and_hash&&) noexcept = default;
cell_and_hash(const cell_and_hash&) = default;
cell_and_hash& operator=(const cell_and_hash&) = default;
cell_and_hash(atomic_cell_or_collection&& cell, cell_hash_opt hash)
: cell(std::move(cell))
, hash(hash)
{ }
};
//

View File

@@ -273,6 +273,11 @@ public:
return is_partition_end() || (_ck && _ck->is_empty(s) && _bound_weight > 0);
}
bool is_before_all_clustered_rows(const schema& s) const {
return _type < partition_region::clustered
|| (_type == partition_region::clustered && _ck->is_empty(s) && _bound_weight < 0);
}
template<typename Hasher>
void feed_hash(Hasher& hasher, const schema& s) const {
::feed_hash(hasher, _bound_weight);

View File

@@ -119,7 +119,7 @@ bool querier::matches(const dht::partition_range& range) const {
bound_eq(qr.start(), range.start()) || bound_eq(qr.end(), range.end());
}
querier::can_use querier::can_be_used_for_page(emit_only_live_rows only_live, const schema& s,
querier::can_use querier::can_be_used_for_page(emit_only_live_rows only_live, const ::schema& s,
const dht::partition_range& range, const query::partition_slice& slice) const {
if (only_live != emit_only_live_rows(std::holds_alternative<lw_shared_ptr<compact_for_data_query_state>>(_compaction_state))) {
return can_use::no_emit_only_live_rows_mismatch;
@@ -152,34 +152,33 @@ const size_t querier_cache::max_queriers_memory_usage = memory::stats().total_me
void querier_cache::scan_cache_entries() {
const auto now = lowres_clock::now();
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end && it->is_expired(now)) {
if (*it) {
++_stats.time_based_evictions;
}
it = _meta_entries.erase(it);
_stats.population = _entries.size();
++_stats.time_based_evictions;
--_stats.population;
it = _entries.erase(it);
}
}
querier_cache::entries::iterator querier_cache::find_querier(utils::UUID key, const dht::partition_range& range, tracing::trace_state_ptr trace_state) {
const auto queriers = _entries.equal_range(key);
const auto queriers = _index.equal_range(key);
if (queriers.first == _entries.end()) {
if (queriers.first == _index.end()) {
tracing::trace(trace_state, "Found no cached querier for key {}", key);
return _entries.end();
}
const auto it = std::find_if(queriers.first, queriers.second, [&] (const std::pair<const utils::UUID, entry>& elem) {
return elem.second.get().matches(range);
const auto it = std::find_if(queriers.first, queriers.second, [&] (const entry& e) {
return e.value().matches(range);
});
if (it == queriers.second) {
tracing::trace(trace_state, "Found cached querier(s) for key {} but none matches the query range {}", key, range);
return _entries.end();
}
tracing::trace(trace_state, "Found cached querier for key {} and range {}", key, range);
return it;
return it->pos();
}
querier_cache::querier_cache(std::chrono::seconds entry_ttl)
@@ -199,8 +198,7 @@ void querier_cache::insert(utils::UUID key, querier&& q, tracing::trace_state_pt
tracing::trace(trace_state, "Caching querier with key {}", key);
auto memory_usage = boost::accumulate(
_entries | boost::adaptors::map_values | boost::adaptors::transformed(std::mem_fn(&querier_cache::entry::memory_usage)), size_t(0));
auto memory_usage = boost::accumulate(_entries | boost::adaptors::transformed(std::mem_fn(&entry::memory_usage)), size_t(0));
// We add the memory-usage of the to-be added querier to the memory-usage
// of all the cached queriers. We now need to makes sure this number is
@@ -210,20 +208,20 @@ void querier_cache::insert(utils::UUID key, querier&& q, tracing::trace_state_pt
memory_usage += q.memory_usage();
if (memory_usage >= max_queriers_memory_usage) {
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end && memory_usage >= max_queriers_memory_usage) {
if (*it) {
++_stats.memory_based_evictions;
memory_usage -= it->get_entry().memory_usage();
}
it = _meta_entries.erase(it);
++_stats.memory_based_evictions;
memory_usage -= it->memory_usage();
--_stats.population;
it = _entries.erase(it);
}
}
const auto it = _entries.emplace(key, entry::param{std::move(q), _entry_ttl}).first;
_meta_entries.emplace_back(_entries, it);
_stats.population = _entries.size();
auto& e = _entries.emplace_back(key, std::move(q), lowres_clock::now() + _entry_ttl);
e.set_pos(--_entries.end());
_index.insert(e);
++_stats.population;
}
querier querier_cache::lookup(utils::UUID key,
@@ -240,9 +238,9 @@ querier querier_cache::lookup(utils::UUID key,
return create_fun();
}
auto q = std::move(it->second).get();
auto q = std::move(*it).value();
_entries.erase(it);
_stats.population = _entries.size();
--_stats.population;
const auto can_be_used = q.can_be_used_for_page(only_live, s, range, slice);
if (can_be_used == querier::can_use::yes) {
@@ -265,18 +263,24 @@ bool querier_cache::evict_one() {
return false;
}
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
++_stats.resource_based_evictions;
--_stats.population;
_entries.pop_front();
return true;
}
void querier_cache::evict_all_for_table(const utils::UUID& schema_id) {
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end) {
const auto is_live = bool(*it);
it = _meta_entries.erase(it);
_stats.population = _entries.size();
if (is_live) {
++_stats.resource_based_evictions;
return true;
if (it->schema().id() == schema_id) {
--_stats.population;
it = _entries.erase(it);
} else {
++it;
}
}
return false;
}
querier_cache_context::querier_cache_context(querier_cache& cache, utils::UUID key, bool is_first_page)

View File

@@ -24,7 +24,8 @@
#include "mutation_compactor.hh"
#include "mutation_reader.hh"
#include <seastar/core/weak_ptr.hh>
#include <boost/intrusive/set.hpp>
#include <variant>
/// One-stop object for serving queries.
@@ -207,6 +208,9 @@ public:
return _reader.buffer_size();
}
schema_ptr schema() const {
return _schema;
}
};
/// Special-purpose cache for saving queriers between pages.
@@ -261,75 +265,65 @@ public:
};
private:
class entry : public weakly_referencable<entry> {
querier _querier;
lowres_clock::time_point _expires;
public:
// Since entry cannot be moved and unordered_map::emplace can pass only
// a single param to it's mapped-type we need to force a single-param
// constructor for entry. Oh C++...
struct param {
querier q;
std::chrono::seconds ttl;
};
class entry : public boost::intrusive::set_base_hook<boost::intrusive::link_mode<boost::intrusive::auto_unlink>> {
// Self reference so that we can remove the entry given an `entry&`.
std::list<entry>::iterator _pos;
const utils::UUID _key;
const lowres_clock::time_point _expires;
querier _value;
explicit entry(param p)
: _querier(std::move(p.q))
, _expires(lowres_clock::now() + p.ttl) {
public:
entry(utils::UUID key, querier q, lowres_clock::time_point expires)
: _key(key)
, _expires(expires)
, _value(std::move(q)) {
}
std::list<entry>::iterator pos() const {
return _pos;
}
void set_pos(std::list<entry>::iterator pos) {
_pos = pos;
}
const utils::UUID& key() const {
return _key;
}
const ::schema& schema() const {
return *_value.schema();
}
bool is_expired(const lowres_clock::time_point& now) const {
return _expires <= now;
}
const querier& get() const & {
return _querier;
}
querier&& get() && {
return std::move(_querier);
}
size_t memory_usage() const {
return _querier.memory_usage();
return _value.memory_usage();
}
const querier& value() const & {
return _value;
}
querier value() && {
return std::move(_value);
}
};
using entries = std::unordered_map<utils::UUID, entry>;
class meta_entry {
entries& _entries;
weak_ptr<entry> _entry_ptr;
entries::iterator _entry_it;
public:
meta_entry(entries& e, entries::iterator it)
: _entries(e)
, _entry_ptr(it->second.weak_from_this())
, _entry_it(it) {
}
~meta_entry() {
if (_entry_ptr) {
_entries.erase(_entry_it);
}
}
bool is_expired(const lowres_clock::time_point& now) const {
return !_entry_ptr || _entry_ptr->is_expired(now);
}
explicit operator bool() const {
return bool(_entry_ptr);
}
const entry& get_entry() const {
return *_entry_ptr;
}
struct key_of_entry {
using type = utils::UUID;
const type& operator()(const entry& e) { return e.key(); }
};
using entries = std::list<entry>;
using index = boost::intrusive::multiset<entry, boost::intrusive::key_of_value<key_of_entry>,
boost::intrusive::constant_time_size<false>>;
private:
entries _entries;
std::list<meta_entry> _meta_entries;
index _index;
timer<lowres_clock> _expiry_timer;
std::chrono::seconds _entry_ttl;
stats _stats;
@@ -382,6 +376,11 @@ public:
/// is empty).
bool evict_one();
/// Evict all queriers that belong to a table.
///
/// Should be used when dropping a table.
void evict_all_for_table(const utils::UUID& schema_id);
const stats& get_stats() const {
return _stats;
}

View File

@@ -55,7 +55,7 @@ if [ -f /etc/debian_version ]; then
cp /etc/hosts /etc/hosts.orig
echo 127.0.0.1 `hostname` >> /etc/hosts
if [ "$REPO_FOR_INSTALL" != "" ]; then
curl -o /etc/apt/sources.list.d/scylla_install.list $REPO_FOR_INSTALL
curl -L -o /etc/apt/sources.list.d/scylla_install.list $REPO_FOR_INSTALL
fi
apt-get -o Acquire::AllowInsecureRepositories=true \
-o Acquire::AllowDowngradeToInsecureRepositories=true update
@@ -78,13 +78,13 @@ if [ -f /etc/debian_version ]; then
rm /usr/sbin/policy-rc.d
rm /etc/apt/sources.list.d/scylla_install.list
if [ "$REPO_FOR_UPDATE" != "" ]; then
curl -o /etc/apt/sources.list.d/scylla.list $REPO_FOR_UPDATE
curl -L -o /etc/apt/sources.list.d/scylla.list $REPO_FOR_UPDATE
fi
apt-get -o Acquire::AllowInsecureRepositories=true \
-o Acquire::AllowDowngradeToInsecureRepositories=true update
else
if [ "$REPO_FOR_INSTALL" != "" ]; then
curl -o /etc/yum.repos.d/scylla_install.repo $REPO_FOR_INSTALL
curl -L -o /etc/yum.repos.d/scylla_install.repo $REPO_FOR_INSTALL
fi
if [ "$ID" = "centos" ]; then
@@ -104,6 +104,6 @@ else
rm /etc/yum.repos.d/scylla_install.repo
if [ "$REPO_FOR_UPDATE" != "" ]; then
curl -o /etc/yum.repos.d/scylla.repo $REPO_FOR_UPDATE
curl -L -o /etc/yum.repos.d/scylla.repo $REPO_FOR_UPDATE
fi
fi

View File

@@ -87,10 +87,7 @@ def get_repo_file(dir):
for name in files:
with open(name, 'r') as myfile:
for line in myfile:
match = re.search(".*http.?://.*/scylladb/([^/\s]+)/deb/([^/\s]+)[\s/].*", line)
if match:
return match.group(2), match.group(1)
match = re.search(".*http.?://.*/scylladb/([^/]+)/rpm/[^/]+/([^/\s]+)/.*", line)
match = re.search(".*http.?://repositories.*/scylladb/([^/\s]+)/.*/([^/\s]+)/scylladb-.*", line)
if match:
return match.group(2), match.group(1)
return None, None

Submodule seastar updated: bcfbe0c3f7...88cb58cfbf

View File

@@ -181,12 +181,9 @@ public:
, _is_thrift(false)
{}
// `nullptr` for internal instances.
auth::service* get_auth_service() {
return _auth_service;
}
// See above.
///
/// `nullptr` for internal instances.
///
const auth::service* get_auth_service() const {
return _auth_service;
}

View File

@@ -132,10 +132,15 @@ future<> migration_manager::schedule_schema_pull(const gms::inet_address& endpoi
return make_ready_future<>();
}
bool migration_manager::is_ready_for_bootstrap() {
bool migration_manager::have_schema_agreement() {
const auto known_endpoints = gms::get_local_gossiper().endpoint_state_map;
if (known_endpoints.size() == 1) {
// Us.
return true;
}
auto our_version = get_local_storage_proxy().get_db().local().get_version();
bool match = false;
for (auto& x : gms::get_local_gossiper().endpoint_state_map) {
for (auto& x : known_endpoints) {
auto& endpoint = x.first;
auto& eps = x.second;
if (endpoint == utils::fb_utilities::get_broadcast_address() || !eps.is_alive()) {

View File

@@ -144,7 +144,10 @@ public:
future<> stop();
bool is_ready_for_bootstrap();
/**
* Known peers in the cluster have the same schema version as us.
*/
bool have_schema_agreement();
void init_messaging_service();
private:

View File

@@ -144,7 +144,11 @@ future<lowres_clock::duration> cache_hitrate_calculator::recalculate_hitrates()
return _db.invoke_on_all([this, rates = std::move(rates), cpuid = engine().cpu_id()] (database& db) {
sstring gstate;
for (auto& cf : db.get_column_families() | boost::adaptors::filtered(non_system_filter)) {
stat s = rates.at(cf.first);
auto it = rates.find(cf.first);
if (it == rates.end()) { // a table may be added before map/reduce compltes and this code runs
continue;
}
stat s = it->second;
float rate = 0;
if (s.h) {
rate = s.h / (s.h + s.m);

View File

@@ -83,7 +83,7 @@ private:
_last_replicas = state->get_last_replicas();
} else {
// Reusing readers is currently only supported for singular queries.
if (_ranges.front().is_singular()) {
if (!_ranges.empty() && query::is_single_partition(_ranges.front())) {
_cmd->query_uuid = utils::make_random_uuid();
}
_cmd->is_first_page = true;

View File

@@ -3220,9 +3220,22 @@ storage_proxy::query_partition_key_range(lw_shared_ptr<query::read_command> cmd,
slogger.debug("Estimated result rows per range: {}; requested rows: {}, ranges.size(): {}; concurrent range requests: {}",
result_rows_per_range, cmd->row_limit, ranges.size(), concurrency_factor);
// The call to `query_partition_key_range_concurrent()` below
// updates `cmd` directly when processing the results. Under
// some circumstances, when the query executes without deferring,
// this updating will happen before the lambda object is constructed
// and hence the updates will be visible to the lambda. This will
// result in the merger below trimming the results according to the
// updated (decremented) limits and causing the paging logic to
// declare the query exhausted due to the non-full page. To avoid
// this save the original values of the limits here and pass these
// to the lambda below.
const auto row_limit = cmd->row_limit;
const auto partition_limit = cmd->partition_limit;
return query_partition_key_range_concurrent(timeout, std::move(results), cmd, cl, ranges.begin(), std::move(ranges), concurrency_factor,
std::move(trace_state), cmd->row_limit, cmd->partition_limit)
.then([row_limit = cmd->row_limit, partition_limit = cmd->partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
.then([row_limit, partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
query::result_merger merger(row_limit, partition_limit);
merger.reserve(results.size());
@@ -3585,6 +3598,7 @@ future<> storage_proxy::truncate_blocking(sstring keyspace, sstring cfname) {
std::rethrow_exception(ep);
} catch (rpc::timeout_error& e) {
slogger.trace("Truncation of {} timed out: {}", cfname, e.what());
throw;
} catch (...) {
throw;
}
@@ -3817,9 +3831,9 @@ void storage_proxy::init_messaging_service() {
p->_stats.forwarded_mutations += forward.size();
return when_all(
// mutate_locally() may throw, putting it into apply() converts exception to a future.
futurize<void>::apply([timeout, &p, &m, reply_to, src_addr = std::move(src_addr)] () mutable {
futurize<void>::apply([timeout, &p, &m, reply_to, shard, src_addr = std::move(src_addr)] () mutable {
// FIXME: get_schema_for_write() doesn't timeout
return get_schema_for_write(m.schema_version(), std::move(src_addr)).then([&m, &p, timeout] (schema_ptr s) {
return get_schema_for_write(m.schema_version(), netw::messaging_service::msg_addr{reply_to, shard}).then([&m, &p, timeout] (schema_ptr s) {
return p->mutate_locally(std::move(s), m, timeout);
});
}).then([reply_to, shard, response_id, trace_state_ptr] () {

View File

@@ -408,7 +408,7 @@ void storage_service::join_token_ring(int delay) {
}
// if our schema hasn't matched yet, keep sleeping until it does
// (post CASSANDRA-1391 we don't expect this to be necessary very often, but it doesn't hurt to be careful)
while (!get_local_migration_manager().is_ready_for_bootstrap()) {
while (!get_local_migration_manager().have_schema_agreement()) {
set_mode(mode::JOINING, "waiting for schema information to complete", true);
sleep(std::chrono::seconds(1)).get();
}
@@ -437,7 +437,7 @@ void storage_service::join_token_ring(int delay) {
}
// Check the schema and pending range again
while (!get_local_migration_manager().is_ready_for_bootstrap()) {
while (!get_local_migration_manager().have_schema_agreement()) {
set_mode(mode::JOINING, "waiting for schema information to complete", true);
sleep(std::chrono::seconds(1)).get();
}

View File

@@ -609,6 +609,27 @@ class resharding_compaction final : public compaction {
shard_id _shard; // shard of current sstable writer
std::function<shared_sstable(shard_id)> _sstable_creator;
compaction_backlog_tracker _resharding_backlog_tracker;
// Partition count estimation for a shard S:
//
// TE, the total estimated partition count for a shard S, is defined as
// TE = Sum(i = 0...N) { Ei / Si }.
//
// where i is an input sstable that belongs to shard S,
// Ei is the estimated partition count for sstable i,
// Si is the total number of shards that own sstable i.
//
struct estimated_values {
uint64_t estimated_size = 0;
uint64_t estimated_partitions = 0;
};
std::vector<estimated_values> _estimation_per_shard;
private:
// return estimated partitions per sstable for a given shard
uint64_t partitions_per_sstable(shard_id s) const {
uint64_t estimated_sstables = std::max(uint64_t(1), uint64_t(ceil(double(_estimation_per_shard[s].estimated_size) / _max_sstable_size)));
return ceil(double(_estimation_per_shard[s].estimated_partitions) / estimated_sstables);
}
public:
resharding_compaction(std::vector<shared_sstable> sstables, column_family& cf, std::function<shared_sstable(shard_id)> creator,
uint64_t max_sstable_size, uint32_t sstable_level)
@@ -616,10 +637,19 @@ public:
, _output_sstables(smp::count)
, _sstable_creator(std::move(creator))
, _resharding_backlog_tracker(std::make_unique<resharding_backlog_tracker>())
, _estimation_per_shard(smp::count)
{
cf.get_compaction_manager().register_backlog_tracker(_resharding_backlog_tracker);
for (auto& s : _sstables) {
_resharding_backlog_tracker.add_sstable(s);
for (auto& sst : _sstables) {
_resharding_backlog_tracker.add_sstable(sst);
const auto& shards = sst->get_shards_for_this_sstable();
auto size = sst->bytes_on_disk();
auto estimated_partitions = sst->get_estimated_key_count();
for (auto& s : shards) {
_estimation_per_shard[s].estimated_size += std::max(uint64_t(1), uint64_t(ceil(double(size) / shards.size())));
_estimation_per_shard[s].estimated_partitions += std::max(uint64_t(1), uint64_t(ceil(double(estimated_partitions) / shards.size())));
}
}
_info->type = compaction_type::Reshard;
}
@@ -665,7 +695,7 @@ public:
sstable_writer_config cfg;
cfg.max_sstable_size = _max_sstable_size;
auto&& priority = service::get_local_compaction_priority();
writer.emplace(sst->get_writer(*_cf.schema(), partitions_per_sstable(), cfg, priority, _shard));
writer.emplace(sst->get_writer(*_cf.schema(), partitions_per_sstable(_shard), cfg, priority, _shard));
}
return &*writer;
}

View File

@@ -26,7 +26,7 @@
#include <seastar/core/metrics.hh>
#include "exceptions.hh"
#include <cmath>
#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/range/algorithm/count_if.hpp>
static logging::logger cmlog("compaction_manager");
@@ -156,14 +156,13 @@ int compaction_manager::trim_to_compact(column_family* cf, sstables::compaction_
}
bool compaction_manager::can_register_weight(column_family* cf, int weight) {
if (_weight_tracker.empty()) {
return true;
}
auto has_cf_ongoing_compaction = [&] {
return boost::algorithm::any_of(_tasks, [&] (const lw_shared_ptr<task>& task) {
auto ret = boost::range::count_if(_tasks, [&] (const lw_shared_ptr<task>& task) {
return task->compacting_cf == cf;
});
// compaction task trying to proceed is already registered in task list,
// so we must check for an additional one.
return ret >= 2;
};
// Only one weight is allowed if parallel compaction is disabled.

View File

@@ -33,6 +33,7 @@
#include "unimplemented.hh"
#include "stdx.hh"
#include "segmented_compress_params.hh"
#include "utils/class_registrator.hh"
namespace sstables {
@@ -299,7 +300,8 @@ size_t local_compression::compress_max_size(size_t input_len) const {
void compression::set_compressor(compressor_ptr c) {
if (c) {
auto& cn = c->name();
unqualified_name uqn(compressor::namespace_prefix, c->name());
const sstring& cn = uqn;
name.value = bytes(cn.begin(), cn.end());
for (auto& p : c->options()) {
if (p.first != compression_parameters::SSTABLE_COMPRESSION) {

View File

@@ -294,6 +294,12 @@ size_tiered_compaction_strategy::get_sstables_for_compaction(column_family& cfs,
return sstables::compaction_descriptor(std::move(most_interesting));
}
// If we are not enforcing min_threshold explicitly, try any pair of SStables in the same tier.
if (!cfs.compaction_enforce_min_threshold() && is_any_bucket_interesting(buckets, 2)) {
std::vector<sstables::shared_sstable> most_interesting = most_interesting_bucket(std::move(buckets), 2, max_threshold);
return sstables::compaction_descriptor(std::move(most_interesting));
}
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
// prefer oldest sstables from biggest size tiers because they will be easier to satisfy conditions for

View File

@@ -215,3 +215,22 @@ SEASTAR_TEST_CASE(test_aggregate_count) {
}
});
}
SEASTAR_TEST_CASE(test_reverse_type_aggregation) {
return do_with_cql_env_thread([&] (auto& e) {
e.execute_cql("CREATE TABLE test(p int, c timestamp, v int, primary key (p, c)) with clustering order by (c desc)").get();
e.execute_cql("INSERT INTO test(p, c, v) VALUES (1, 1, 1)").get();
e.execute_cql("INSERT INTO test(p, c, v) VALUES (1, 2, 1)").get();
{
auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(1);
auto msg = e.execute_cql("SELECT min(c) FROM test").get0();
assert_that(msg).is_rows().with_size(1).with_row({{timestamp_type->decompose(tp)}});
}
{
auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(2);
auto msg = e.execute_cql("SELECT max(c) FROM test").get0();
assert_that(msg).is_rows().with_size(1).with_row({{timestamp_type->decompose(tp)}});
}
});
}

View File

@@ -28,6 +28,8 @@
#include <seastar/util/defer.hh>
#include "auth/authenticated_user.hh"
#include "auth/permission.hh"
#include "auth/service.hh"
#include "cql3/query_processor.hh"
#include "cql3/role_name.hh"
#include "cql3/role_options.hh"
@@ -262,3 +264,67 @@ SEASTAR_TEST_CASE(revoke_role_restrictions) {
});
}, db_config_with_auth());
}
//
// The creator of a database object is granted all applicable permissions on it.
//
///
/// Grant a user appropriate permissions (with `grant_query`) then create a new database object (with `creation_query`).
/// Verify that the user has been granted all applicable permissions on the new object.
///
static void verify_default_permissions(
cql_test_env& env,
stdx::string_view user,
stdx::string_view grant_query,
stdx::string_view creation_query,
const auth::resource& r) {
create_user_if_not_exists(env, user);
env.execute_cql(sstring(grant_query)).get0();
with_user(env, user, [&env, creation_query] {
env.execute_cql(sstring(creation_query)).get0();
});
const auto default_permissions = auth::get_permissions(
env.local_auth_service(),
auth::authenticated_user(user),
r).get0();
BOOST_REQUIRE_EQUAL(
auth::permissions::to_strings(default_permissions),
auth::permissions::to_strings(r.applicable_permissions()));
}
SEASTAR_TEST_CASE(create_role_default_permissions) {
return do_with_cql_env_thread([](auto&& env) {
verify_default_permissions(
env,
alice,
"GRANT CREATE ON ALL ROLES TO alice",
"CREATE ROLE lord",
auth::make_role_resource("lord"));
}, db_config_with_auth());
}
SEASTAR_TEST_CASE(create_keyspace_default_permissions) {
return do_with_cql_env_thread([](auto&& env) {
verify_default_permissions(
env,
alice,
"GRANT CREATE ON ALL KEYSPACES TO alice",
"CREATE KEYSPACE armies WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 }",
auth::make_data_resource("armies"));
}, db_config_with_auth());
}
SEASTAR_TEST_CASE(create_table_default_permissions) {
return do_with_cql_env_thread([](auto&& env) {
verify_default_permissions(
env,
alice,
"GRANT CREATE ON KEYSPACE ks TO alice",
"CREATE TABLE orcs (id int PRIMARY KEY, strength int)",
auth::make_data_resource("ks", "orcs"));
}, db_config_with_auth());
}

View File

@@ -2047,10 +2047,9 @@ SEASTAR_TEST_CASE(test_in_restriction) {
assert_that(msg).is_rows().with_size(0);
return e.execute_cql("select r1 from tir where p1 in (2, 0, 2, 1);");
}).then([&e] (shared_ptr<cql_transport::messages::result_message> msg) {
assert_that(msg).is_rows().with_rows({
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(4)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
@@ -2072,6 +2071,22 @@ SEASTAR_TEST_CASE(test_in_restriction) {
{int32_type->decompose(2)},
{int32_type->decompose(1)},
});
return e.prepare("select r1 from tir where p1 in ?");
}).then([&e] (cql3::prepared_cache_key_type prepared_id){
auto my_list_type = list_type_impl::get_instance(int32_type, true);
std::vector<cql3::raw_value> raw_values;
auto in_values_list = my_list_type->decompose(make_list_value(my_list_type,
list_type_impl::native_type{{int(2), int(0), int(2), int(1)}}));
raw_values.emplace_back(cql3::raw_value::make_value(in_values_list));
return e.execute_prepared(prepared_id,raw_values);
}).then([&e] (shared_ptr<cql_transport::messages::result_message> msg) {
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
});
});
});
}
@@ -2607,3 +2622,81 @@ SEASTAR_TEST_CASE(test_insert_large_collection_values) {
});
});
}
// Corner-case test that checks for the paging code's preparedness for an empty
// range list.
SEASTAR_TEST_CASE(test_empty_partition_range_scan) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("create keyspace empty_partition_range_scan with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};").get();
e.execute_cql("create table empty_partition_range_scan.tb (a int, b int, c int, val int, PRIMARY KEY ((a,b),c) );").get();
auto qo = std::make_unique<cql3::query_options>(db::consistency_level::LOCAL_ONE, std::vector<cql3::raw_value>{},
cql3::query_options::specific_options{1, nullptr, {}, api::new_timestamp()});
auto res = e.execute_cql("select * from empty_partition_range_scan.tb where token (a,b) > 1 and token(a,b) <= 1;", std::move(qo)).get0();
assert_that(res).is_rows().is_empty();
});
}
SEASTAR_TEST_CASE(test_static_multi_cell_static_lists_with_ckey) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("CREATE TABLE t (p int, c int, slist list<int> static, v int, PRIMARY KEY (p, c));").get();
e.execute_cql("INSERT INTO t (p, c, slist, v) VALUES (1, 1, [1], 1); ").get();
{
e.execute_cql("UPDATE t SET slist[0] = 3, v = 3 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{3}}))) },
{ int32_type->decompose(3) }
});
}
{
e.execute_cql("UPDATE t SET slist = [4], v = 4 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{4}}))) },
{ int32_type->decompose(4) }
});
}
{
e.execute_cql("UPDATE t SET slist = [3] + slist , v = 5 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(5) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist + [5] , v = 6 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4, 5}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("DELETE slist[2] from t WHERE p = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist - [4] , v = 7 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3}))) },
{ int32_type->decompose(7) }
});
}
});
}

View File

@@ -29,6 +29,9 @@
#include "database.hh"
#include "partition_slice_builder.hh"
#include "frozen_mutation.hh"
#include "mutation_source_test.hh"
#include "schema_registry.hh"
#include "service/migration_manager.hh"
SEASTAR_TEST_CASE(test_querying_with_limits) {
return do_with_cql_env([](cql_test_env& e) {
@@ -74,3 +77,33 @@ SEASTAR_TEST_CASE(test_querying_with_limits) {
});
});
}
SEASTAR_THREAD_TEST_CASE(test_database_with_data_in_sstables_is_a_mutation_source) {
do_with_cql_env([] (cql_test_env& e) {
run_mutation_source_tests([&] (schema_ptr s, const std::vector<mutation>& partitions) -> mutation_source {
try {
e.local_db().find_column_family(s->ks_name(), s->cf_name());
service::get_local_migration_manager().announce_column_family_drop(s->ks_name(), s->cf_name(), true).get();
} catch (const no_such_column_family&) {
// expected
}
service::get_local_migration_manager().announce_new_column_family(s, true).get();
column_family& cf = e.local_db().find_column_family(s);
for (auto&& m : partitions) {
e.local_db().apply(cf.schema(), freeze(m)).get();
}
cf.flush().get();
cf.get_row_cache().invalidate([] {}).get();
return mutation_source([&] (schema_ptr s,
const dht::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return cf.make_reader(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr);
});
});
return make_ready_future<>();
}).get();
}

View File

@@ -26,11 +26,13 @@
#include <boost/test/unit_test.hpp>
#include <query-result-set.hh>
#include <query-result-writer.hh>
#include "tests/test_services.hh"
#include "tests/test-utils.hh"
#include "tests/mutation_assertions.hh"
#include "tests/result_set_assertions.hh"
#include "tests/mutation_source_test.hh"
#include "mutation_query.hh"
#include "core/do_with.hh"
@@ -525,3 +527,22 @@ SEASTAR_TEST_CASE(test_partition_limit) {
}
});
}
SEASTAR_THREAD_TEST_CASE(test_result_size_calculation) {
random_mutation_generator gen(random_mutation_generator::generate_counters::no);
std::vector<mutation> mutations = gen(1);
schema_ptr s = gen.schema();
mutation_source source = make_source(std::move(mutations));
query::result_memory_limiter l;
query::partition_slice slice = make_full_slice(*s);
slice.options.set<query::partition_slice::option::allow_short_read>();
query::result::builder digest_only_builder(slice, query::result_options{query::result_request::only_digest, query::digest_algorithm::xxHash}, l.new_digest_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), digest_only_builder).get0();
query::result::builder result_and_digest_builder(slice, query::result_options{query::result_request::result_and_digest, query::digest_algorithm::xxHash}, l.new_data_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), result_and_digest_builder).get0();
BOOST_REQUIRE_EQUAL(digest_only_builder.memory_accounter().used_memory(), result_and_digest_builder.memory_accounter().used_memory());
}

View File

@@ -659,6 +659,46 @@ void test_mutation_reader_fragments_have_monotonic_positions(populate_fn populat
});
}
static void test_date_tiered_clustering_slicing(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
simple_schema ss;
auto s = schema_builder(ss.schema())
.set_compaction_strategy(sstables::compaction_strategy_type::date_tiered)
.build();
auto pkey = ss.make_pkey();
mutation m1(s, pkey);
ss.add_static_row(m1, "s");
m1.partition().apply(ss.new_tombstone());
ss.add_row(m1, ss.make_ckey(0), "v1");
mutation_source ms = populate(s, {m1});
// query row outside the range of existing rows to exercise sstable clustering key filter
{
auto slice = partition_slice_builder(*s)
.with_range(ss.make_ckey_range(1, 2))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1, slice.row_ranges(*s, pkey.key()))
.produces_end_of_stream();
}
{
auto slice = partition_slice_builder(*s)
.with_range(query::clustering_range::make_singular(ss.make_ckey(0)))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1)
.produces_end_of_stream();
}
}
static void test_clustering_slices(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
auto s = schema_builder("ks", "cf")
@@ -822,6 +862,7 @@ static void test_query_only_static_row(populate_fn populate) {
auto pkeys = s.make_pkeys(1);
mutation m1(s.schema(), pkeys[0]);
m1.partition().apply(s.new_tombstone());
s.add_static_row(m1, "s1");
s.add_row(m1, s.make_ckey(0), "v1");
s.add_row(m1, s.make_ckey(1), "v2");
@@ -846,6 +887,59 @@ static void test_query_only_static_row(populate_fn populate) {
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
// query just a static row, single-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_singular(m1.decorated_key());
assert_that(ms.make_reader(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
}
static void test_query_no_clustering_ranges_no_static_columns(populate_fn populate) {
simple_schema s(simple_schema::with_static::no);
auto pkeys = s.make_pkeys(1);
mutation m1(s.schema(), pkeys[0]);
m1.partition().apply(s.new_tombstone());
s.add_row(m1, s.make_ckey(0), "v1");
s.add_row(m1, s.make_ckey(1), "v2");
mutation_source ms = populate(s.schema(), {m1});
{
auto prange = dht::partition_range::make_ending_with(dht::ring_position(m1.decorated_key()));
assert_that(ms.make_reader(s.schema(), prange, s.schema()->full_slice()))
.produces(m1)
.produces_end_of_stream();
}
// multi-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_ending_with(dht::ring_position(m1.decorated_key()));
assert_that(ms.make_reader(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
// single-partition case
{
auto slice = partition_slice_builder(*s.schema())
.with_ranges({})
.build();
auto prange = dht::partition_range::make_singular(m1.decorated_key());
assert_that(ms.make_reader(s.schema(), prange, slice))
.produces(m1, slice.row_ranges(*s.schema(), m1.key()))
.produces_end_of_stream();
}
}
void test_streamed_mutation_forwarding_succeeds_with_no_data(populate_fn populate) {
@@ -958,6 +1052,7 @@ void test_slicing_with_overlapping_range_tombstones(populate_fn populate) {
}
void run_mutation_reader_tests(populate_fn populate) {
test_date_tiered_clustering_slicing(populate);
test_fast_forwarding_across_partitions_to_empty_range(populate);
test_clustering_slices(populate);
test_mutation_reader_fragments_have_monotonic_positions(populate);
@@ -967,6 +1062,7 @@ void run_mutation_reader_tests(populate_fn populate) {
test_streamed_mutation_forwarding_is_consistent_with_slicing(populate);
test_range_queries(populate);
test_query_only_static_row(populate);
test_query_no_clustering_ranges_no_static_columns(populate);
}
void test_next_partition(populate_fn populate) {

View File

@@ -30,6 +30,7 @@
#include <map>
#include <iostream>
#include <sstream>
#include <boost/range/algorithm/adjacent_find.hpp>
static logging::logger nlogger("NetworkTopologyStrategyLogger");
@@ -52,6 +53,26 @@ void print_natural_endpoints(double point, const std::vector<inet_address> v) {
nlogger.debug("{}", strm.str());
}
#ifndef DEBUG
static void verify_sorted(const dht::token_range_vector& trv) {
auto not_strictly_before = [] (const dht::token_range a, const dht::token_range b) {
return !b.start()
|| !a.end()
|| a.end()->value() > b.start()->value()
|| (a.end()->value() == b.start()->value() && a.end()->is_inclusive() && b.start()->is_inclusive());
};
BOOST_CHECK(boost::adjacent_find(trv, not_strictly_before) == trv.end());
}
#endif
static void check_ranges_are_sorted(abstract_replication_strategy* ars, gms::inet_address ep) {
// Too slow in debug mode
#ifndef DEBUG
verify_sorted(ars->get_ranges(ep));
verify_sorted(ars->get_primary_ranges(ep));
#endif
}
void strategy_sanity_check(
abstract_replication_strategy* ars_ptr,
const std::map<sstring, sstring>& options) {
@@ -150,6 +171,7 @@ void full_ring_check(const std::vector<ring_point>& ring_points,
auto endpoints2 = ars_ptr->get_natural_endpoints(t2);
endpoints_check(ars_ptr, endpoints2);
check_ranges_are_sorted(ars_ptr, rp.host);
BOOST_CHECK(cache_hit_count + 1 == ars_ptr->get_cache_hits_count());
BOOST_CHECK(endpoints1 == endpoints2);
}

View File

@@ -202,7 +202,7 @@ std::string get_run_date_time() {
using namespace boost::posix_time;
const ptime current_time = second_clock::local_time();
auto facet = std::make_unique<time_facet>();
facet->format("%Y-%M-%d %H:%M:%S");
facet->format("%Y-%m-%d %H:%M:%S");
std::stringstream stream;
stream.imbue(std::locale(std::locale::classic(), facet.release()));
stream << current_time;

View File

@@ -518,7 +518,7 @@ SEASTAR_THREAD_TEST_CASE(test_memory_based_cache_eviction) {
}, 24h);
size_t i = 0;
const auto entry = t.produce_first_page_and_save_querier(i);
const auto entry = t.produce_first_page_and_save_querier(i++);
const size_t queriers_needed_to_fill_cache = floor(querier_cache::max_queriers_memory_usage / entry.memory_usage);

View File

@@ -3012,11 +3012,13 @@ SEASTAR_TEST_CASE(test_concurrent_reads_and_eviction) {
slice, actual, ::join(",\n", possible_versions)));
}
}
}).finally([&, id] {
done = true;
});
});
int n_updates = 100;
while (!readers.available() && n_updates--) {
while (!done && n_updates--) {
auto m2 = gen();
m2.partition().make_fully_continuous();
@@ -3034,8 +3036,8 @@ SEASTAR_TEST_CASE(test_concurrent_reads_and_eviction) {
tracker.region().evict_some();
// Don't allow backlog to grow too much to avoid bad_alloc
const auto max_active_versions = 10;
while (versions.size() > max_active_versions) {
const auto max_active_versions = 7;
while (!done && versions.size() > max_active_versions) {
later().get();
}
}

Some files were not shown because too many files have changed in this diff Show More