Compare commits

...

47 Commits

Author SHA1 Message Date
Jenkins
d27eb734a7 release: prepare for 2.2.2 by hagitsegev 2019-01-12 18:28:25 +02:00
Avi Kivity
e6aeb490b5 Update seastar submodule
* seastar 6f61d74...88cb58c (2):
  > reactor: disable nowait aio due to a kernel bug
  > configure.py: Enhance detection for gcc -fvisibility=hidden bug

Fixes #3996.
2018-12-17 15:57:58 +02:00
Vladimir Krivopalov
2e3b09b593 database: Capture io_priority_class by reference to avoid dangling ref.
The original reference points to a thread-local storage object that
guaranteed to outlive the continuation, but copying it make the
subsequent calls point to a local object and introduces a use-after-free
bug.

Fixes #3948

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
(cherry picked from commit 68458148e7)
2018-12-02 13:32:59 +02:00
Tomasz Grabiec
92c74f4e0b utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) beacuse _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>

Fixes #3931.

(cherry picked from commit 57e25fa0f8)
2018-11-21 12:18:25 +02:00
Avi Kivity
89d835e9e3 tests: fix network_topology_test timing out in debug mode
In 2.2, SEASTAR_DEBUG is just DEBUG.
2018-10-21 19:04:08 +03:00
Takuya ASADA
263a740084 dist/debian: use --configfile to specify pbuilderrc
Use --configfile to specify pbuilderrc, instead of copying it to home directory.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180420024624.9661-1-syuu@scylladb.com>
(cherry picked from commit 01c36556bf)
2018-10-21 18:21:18 +03:00
Avi Kivity
7f24b5319e release: prepare for 2.2.1 2018-10-19 21:16:14 +03:00
Avi Kivity
fe16c0e985 locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). The broke cleanup, which
relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.

Fixes #3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>

(cherry picked from commit 1ce52d5432)
2018-10-19 21:16:12 +03:00
Glauber Costa
f85badaaac api: use longs instead of ints for snapshot sizes
Int types in json will be serialized to int types in C++. They will then
only be able to handle 4GB, and we tend to store more data than that.

Without this patch, listsnapshots is broken in all versions.

Fixes: #3845

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181012155902.7573-1-glauber@scylladb.com>
(cherry picked from commit 98332de268)
2018-10-12 22:02:56 +03:00
Eliran Sinvani
2193d41683 cql3 : add workaround to antlr3 null dereference bug
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs the CqlParser throws an exception which in
turn processesed for some special cases in scylla to generate a custom
message. The default case however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside scylla
in antlr headers that resides on the host machine os.

Tested manualy 2 testcases, a typo causing scylla to crash and
a cql comment without a newline at the end also caused scylla to crash.
Ran unit tests (release).

Fixes #3740
Fixes #3764

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
(cherry picked from commit 20f49566a2)
2018-10-08 11:02:16 +03:00
Avi Kivity
1e1f0c29bf utils: crc32: mark power crc32 assembly as not requiring an executable stack
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).

However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.

Fix by adding the correct incantation to the file.

Fixes #3799.

Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
(cherry picked from commit aaab8a3f46)
2018-10-08 11:02:16 +03:00
Calle Wilund
84d4588b5f storage_proxy: Add missing re-throw in truncate_blocking
Iff truncation times out, we want to log it, but the exception should
not be swallowed, but re-thrown.

Fixes #3796.

Message-Id: <20181001112325.17809-1-calle@scylladb.com>
(cherry picked from commit 2996b8154f)
2018-10-08 11:02:16 +03:00
Duarte Nunes
7b43b26709 tests/aggregate_fcts_test: Add test case for wrapped types
Provide a test case which checks a type being wrapped in a
reverse_type plays no role in assignment.

Refs #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-2-duarte@scylladb.com>
(cherry picked from commit 17578c3579)
2018-10-08 11:02:16 +03:00
Duarte Nunes
0ed01acf15 cql3/selection/selector: Unwrap types when validating assignment
When validating assignment between two types, it's possible one of
them is wrapped in a reverse_type, if it comes, for example, from the
type associated with a clustering column. When checking for weak
assignment the types are correctly unwrapped, but not when checking
for an exact match, which this patch fixes.

Technically, the receiver is never a reversed_type for the current
callers, but this is the morally correct implementation, as the type
being reversed or not plays no role in assignment.

Tests: unit(release)

Fixes #3789

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180927223201.28152-1-duarte@scylladb.com>
(cherry picked from commit 5e7bb20c8a)
2018-10-08 11:02:16 +03:00
Gleb Natapov
7ce160f408 mutation_query_test: add test for result size calculation
Check that digest only and digest+data query calculate result size to be
the same.

Message-Id: <20180906153800.GK2326@scylladb.com>
(cherry picked from commit 9e438933a2)

Message-Id: <20181008075901.GC2380@scylladb.com>
2018-10-08 11:02:09 +03:00
Gleb Natapov
5017d9b46a mutation_partition: accurately account for result size in digest only queries
When measuring_output_stream is used to calculate result's element size
it incorrectly takes into account not only serialized element size, but
a placeholder that ser::qr_partition__rows/qr_partition__static_row__cells
constructors puts in the beginning. Fix it by taking starting point in a
stream before element serialization and subtracting it afterwords.

Fixes #3755

Message-Id: <20180906153609.GJ2326@scylladb.com>
(cherry picked from commit d7674288a9)
2018-10-07 18:16:19 +03:00
Gleb Natapov
50b6ab3552 mutation_partition: correctly measure static row size when doing digest calculation
The code uses incorrect output stream in case only digest is requested
and thus getting incorrect data size. Failing to correctly account
for static row size while calculating digest may cause digest mismatch
between digest and data query.

Fixes #3753.

Message-Id: <20180905131219.GD2326@scylladb.com>
(cherry picked from commit 98092353df)
2018-09-06 16:51:19 +03:00
Eliran Sinvani
b1652823aa cql3: ensure repeated values in IN clauses don't return repeated rows
When the list of values in the IN list of a single column contains
duplicates, multiple executors are activated since the assumption
is that each value in the IN list corresponds to a different partition.
this results in the same row appearing in the result number times
corresponding to the duplication of the partition value.

Added queries for the in restriction unitest and fixed with a bad result check.

Fixes #2837
Tests: Queries as in the usecase from the GitHub issue in both forms ,
prepared and plain (using python driver),Unitest.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
(cherry picked from commit d734d316a6)
2018-08-26 15:51:17 +03:00
Tomasz Grabiec
02b24aec34 Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

 std::experimental::optional<clustering_key> row_key;
 if (!column.is_static()) {
   row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
 }
 ...
 auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

* https://github.com/duarten/scylla cql-list-fixes/v1:
  tests/cql_query_test: Test multi-cell static list updates with ckeys
  cql3/lists: Fix multi-cell static list updates in the presence of ckeys
  keys: Add factory for an empty clustering_key_prefix_view

(cherry picked from commit 6937cc2d1c)
2018-08-21 21:39:22 +01:00
Duarte Nunes
22eea4d8cf cql3/query_options: Use _value_views in prepare()
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor called
transport::request::read_options() leaves _values completely empty.

In query_options::prepare() we were, however, using _values to
associated values to the client-specified column names, and not
_value_views. Fix this by using _value_views instead.

As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x04 query options flag, which means
column names are omitted. This is the right thing to do since most
drivers have enough information to correctly position the values.

Fixes #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
(cherry picked from commit a4355fe7e7)
2018-08-21 21:39:22 +01:00
Tomasz Grabiec
d257f6d57c mutation_partition: Fix exception safety of row::apply_monotonically()
When emplace_back() fails, value is already moved-from into a
temporary, which breaks monotonicity expected from
apply_monotonically(). As a result, writes to that cell will be lost.

The fix is to avoid the temporary by in-place construction of
cell_and_hash. To do that, appropriate cell_and_hash constructor was
added.

Found by mutation_test.cc::test_apply_monotonically_is_monotonic with
some modifications to the random mutation generator.

Introduced in 99a3e3a.

Fixes #3678.

Message-Id: <1533816965-27328-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 024b3c9fd9)
2018-08-21 21:39:18 +01:00
Takuya ASADA
6fca92ac3c dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2
This is bash version of commit 88fe3c2694.

Since some AMIs using consistent network device naming, primary NIC
ifname is not 'eth0'.
But we hardcoded NIC name as 'eth0' on scylla_ec2_check, we need to add
--nic option to specify custom NIC ifname.

Fixes #3658

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180807231650.13697-1-syuu@scylladb.com>
2018-08-08 09:16:57 +03:00
Jesse Haber-Kucharsky
26e3917046 auth: Don't use unsupported hashing algorithms
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.

This is consistent with the documentation of the function in its man
page.

As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".

The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):

    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.

Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.

With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.

Fixes #3637.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
(cherry picked from commit fce10f2c6e)
2018-08-05 10:30:47 +03:00
Gleb Natapov
3892594a93 cache_hitrate_calculator: fix race when new table is added during calculations
The calculation consists of several parts with preemption point between
them, so a table can be added while calculation is ongoing. Do not
assume that table exists in intermediate data structure.

Fixes #3636

Message-Id: <20180801093147.GD23569@scylladb.com>
(cherry picked from commit 44a6afad8c)
2018-08-01 14:34:08 +03:00
Amos Kong
4b24439841 scylla_setup: fix conditional statement of silent mode
Commit 300af65555 introdued a problem in
conditional statement, script will always abort in silent mode, it doesn't
care about the return value.

Fixes #3485

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <1c12ab04651352964a176368f8ee28f19ae43c68.1528077114.git.amos@scylladb.com>
(cherry picked from commit 364c2551c8)
2018-07-25 09:36:32 +03:00
Takuya ASADA
a02a4592d8 dist/common/scripts/scylla_setup: abort running script when one of setup failed in silent mode
Current script silently continues even one of setup fails, need to
abort.

Fixes #3433

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180522180355.1648-1-syuu@scylladb.com>
(cherry picked from commit 300af65555)
2018-07-25 09:36:29 +03:00
Avi Kivity
b6e1c08451 Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if
current clustering range covers all rows, which means that the populator has a
right to set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts since before all
clustering rows. Otherwise, we're populating since _last_row and should
consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()

(cherry picked from commit 31151cadd4)
2018-07-18 12:07:01 +02:00
Botond Dénes
9469afcd27 storage_proxy: use the original row limits for the final results merging
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuations captures via value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications done to the read
command object done by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed which in turn will result in the
lambda doing the final trimming according to a decremented `row_limits`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.

Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes appearantly reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.

Fixes #3605.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
(cherry picked from commit cc4acb6e26)
2018-07-16 17:51:06 +03:00
Avi Kivity
240b9f122b Merge "Backport empty partition range scan fixes" from Botond
"
This mini-series lumps together the fix for the empty partition range
scan crash (#3564) and the two follow-up patches.
"

* 'paging-fix-backport-2.2/v1' of https://github.com/denesb/scylla:
  query_pager: use query::is_single_partition() to check for singular range
  tests/cql_query_tess: add unit test for querying empty ranges test
  query_pager: be prepared to _ranges being empty
2018-07-05 10:29:31 +03:00
Botond Dénes
cb16cd7724 query_pager: use query::is_single_partition() to check for singular range
Use query::is_single_partition() to check whether the queried ranges are
singular or not. The current method of using
`dht::partition_range::is_singular()` is incorrect, as it is possible to
build a singular range that doesn't represent a single partition.
`query::is_single_partition()` correctly checks for this so use it
instead.

Found during code-review.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>
(cherry picked from commit 8084ce3a8e)
2018-07-04 12:57:45 +03:00
Botond Dénes
c864d198fc tests/cql_query_tess: add unit test for querying empty ranges test
A bug was found recently (#3564) in the paging logic, where the code
assumed the queried ranges list is non-empty. This assumption is
incorrect as there can be valid (if rare) queries that can result in the
ranges list to be empty. Add a unit test that executes such a query with
paging enabled to detect any future bugs related to assumptions about
the ranges list being non-empty.

Refs: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>
(cherry picked from commit c236a96d7d)
2018-07-04 09:52:54 +03:00
Botond Dénes
25125e9c4f query_pager: be prepared to _ranges being empty
do_fetch_page() checks in the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not it
checks whether the query is for a singulular partitions or a range scan
to decide whether to enable the stateful queries or not. This check
assumed that there is at least one range in _ranges which will not hold
under some circumstances. Add a check for _ranges being empty.

Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
(cherry picked from commit 59a30f0684)
2018-07-04 09:52:54 +03:00
Shlomi Livne
faf10fe6aa release: prepare for 2.2.0
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2018-07-01 22:40:42 +03:00
Calle Wilund
f76269cdcf sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and scylla writes "known" compressor names (i.e. those
in origin namespace) unqualified (i.e. LZ4Compressor).

This behaviour was not preserved in the virtualization change. But
probably should be.

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
(cherry picked from commit 054514a47a)
2018-06-28 18:55:15 +03:00
Avi Kivity
a9b0ccf116 Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as partition deletion or writes
to the static row being lost. Until node restart or eviction of the
partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
  tests: Add test for slicing a mutation source with date tiered compaction strategy
  tests: Check that database conforms to mutation source
  database: Disable sstable filtering based on min/max clustering key components

(cherry picked from commit e1efda8b0c)
2018-06-28 18:55:15 +03:00
Tomasz Grabiec
abc5941f87 flat_mutation_reader: Move field initialization to initializer list
This works around a problem of std::terminate() being called in debug
mode build if initialization of _current throws.

Backtrace:

Thread 2 "row_cache_test_" received signal SIGABRT, Aborted.
0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
(gdb) bt
  #0  0x00007ffff17ce9fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff17d077d in abort () from /lib64/libc.so.6
  #2  0x00007ffff5773025 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
  #3  0x00007ffff5770c16 in ?? () from /lib64/libstdc++.so.6
  #4  0x00007ffff576fb19 in ?? () from /lib64/libstdc++.so.6
  #5  0x00007ffff5770508 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
  #6  0x00007ffff3ce4ee3 in ?? () from /lib64/libgcc_s.so.1
  #7  0x00007ffff3ce570e in _Unwind_Resume () from /lib64/libgcc_s.so.1
  #8  0x0000000003633602 in reader::reader (this=0x60e0001160c0, r=...) at flat_mutation_reader.cc:214
  #9  0x0000000003655864 in std::make_unique<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (__args#0=...)
    at /usr/include/c++/7/bits/unique_ptr.h:825
  #10 0x0000000003649a63 in make_flat_mutation_reader<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (args#0=...)
    at flat_mutation_reader.hh:440
  #11 0x000000000363565d in make_forwardable (m=...) at flat_mutation_reader.cc:270
  #12 0x000000000303f962 in memtable::make_flat_reader (this=0x61300001d540, s=..., range=..., slice=..., pc=..., trace_state_ptr=..., fwd=..., fwd_mr=...)
    at memtable.cc:592

Message-Id: <1528792447-13336-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 6d6b93d1e7)
2018-06-28 18:55:15 +03:00
Asias He
a152ac12af gossip: Fix tokens assignment in assassinate_endpoint
The tokens vector is defined a few lines above and is needed outsie the
if block.

Do not redefine it again in the if block, otherwise the tokens will be empty.

Found by code inspection.

Fixes #3551.

Message-Id: <c7a06375c65c950e94236571127f533e5a60cbfd.1530002177.git.asias@scylladb.com>
(cherry picked from commit c3b5a2ecd5)
2018-06-28 18:55:15 +03:00
Botond Dénes
c274fdf2ec querier: find_querier(): return end() when no querier matches the range
When none of the queriers found for the lookup key match the lookup
range `_entries.end()` should be returned as the search failed. Instead
the iterator returned from the failed `std::find_if()` is returned
which, if the find failed, will be the end iterator returned by the
previous call to `_entries.equal_range()`. This is incorrect because as
long as `equal_range()`'s end iterator is not also `_entries.end()` the
search will always return an iterator to a querier regardless of whether
any of them actually matches the read range.
Fix by returning `_entries.end()` when it is detected that no queriers
match the range.

Fixes: #3530
(cherry picked from commit 2609a17a23)
2018-06-28 18:55:15 +03:00
Botond Dénes
5b88d6b4d6 querier_cache: restructure entries storage
Currently querier_cache uses a `std::unordered_map<utils::UUID, querier>`
to store cache entries and an `std::list<meta_entry>` to store meta
information about the querier entries, like insertion order, expiry
time, etc.

All cache eviction algorithms use the meta-entry list to evict entries
in reverse insertion order (LRU order). To make this possible
meta-entries keep an iterator into the entry map so that given a
meta-entry one can easily erase the querier entry. This however poses a
problem as std::unordered_map can possibly invalidate all its iterators
when new items are inserted. This is use-after-free waiting to happen.

Another disadvantages of the current solution is that it requires the
meta-entry to use a weak pointer to the querier entry so that in case
that is removed (as a result of a successful lookup) it doesn't try to
access it. This has an impact on all cache eviction algorithms as they
have to be prepared to deal with stale meta-entries. Stale meta-entries
also unnecesarily consume memory.

To solve these problems redesign how querier_cache stores entries
completely. Instead of storing the entries in an `std::unordered_map`
and storing the meta-entries in an `std::list`, store the entries in an
`std::list` and an intrusive-map (index) for lookups. This new design
has severeal advantages over the old one:
* The entries will now be in insert order, so eviction strategies can
  work on the entry list itself, no need to involve additional data
  structures for this.
* All data related to an entry is stored in one place, no data
  duplication.
* Removing an entry automatically removes it from the index as intrusive
  containers support auto unlink. This means there is no need to store
  iterators for long terms, risking use-after-free when the container
  invalidates it's iterators.

Additional changes:
* Modify eviction strategies so that they work with the `entry`
  interface rather than the stored value directly.

Ref #3424

(cherry picked from commit 7ce7f3f0cc)
2018-06-28 18:55:15 +03:00
Botond Dénes
2d626e1cf8 tests/querier_cache: fix memory based eviction test
Do increment the key counter after inserting the first querier into the
cache. Otherwise two queriers with the same key will be inserted and
will fail the test. This problem is exposed by the changes the next
patches make to the querier-cache but will be fixed before to maintain
bisectability of the code.

Fixes: #3529
(cherry picked from commit b9d51b4c08)
2018-06-28 18:55:15 +03:00
Avi Kivity
c11bd3e1cf Merge "Do not allow compaction controller shares to grow indefinitely" from Glauber
"
We are seeing some workloads with large datasets where the compaction
controller ends up with a lot of shares. Regardless of whether or not
we'll change the algorithm, this patchset handles a more basic issue,
which is the fact that the current controller doesn't set a maximum
explicitly, so if the input is larger than the maximum it will keep
growing without bounds.

It also pushes the maximum input point of the compaction controller from
10 to 30, allowing us to err on the side of caution for the 2.2 release.
"

* 'tame-controller' of github.com:glommer/scylla:
  controller: do not increase shares of controllers for inputs higher than the maximum
  controller: adjust constants for compaction controller

(cherry picked from commit e0eb66af6b)
2018-06-20 10:58:20 +03:00
Avi Kivity
9df3df92bc Merge "Try harder to move STCS towards zero-backlog" from Glauber
"
Tests: unit (release)

Before merging the LCS controller, we merged patches that would
guarantee that LCS would move towards zero backlog - otherwise the
backlog could get too high.

We didn't do the same for STCS, our first controlled strategy. So we may
end up with a situation where there are many SSTables inducing a large
backlog, but they are not yet meeting the minimum criteria for
compaction. The backlog, then, never goes down.

This patch changes the SSTable selection criteria so that if there is
nothing to do, we'll keep pushing towards reaching a state of zero
backlog. Very similar to what we did for LCS.
"

* 'stcs-min-threshold-v4' of github.com:glommer/scylla:
  STCS: bypass min_threshold unless configure to enforce strictly
  compaction_strategy: allow the user to tell us if min_threshold has to be strict

(cherry picked from commit f0fc888381)
2018-06-18 14:21:52 +03:00
Takuya ASADA
8ad9578a6c dist/debian: add --jobs <njobs> option just like build_rpm.sh
On some build environment we may want to limit number of parallel jobs since
ninja-build runs ncpus jobs by default, it may too many since g++ eats very
huge memory.
So support --jobs <njobs> just like on rpm build script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180425205439.30053-1-syuu@scylladb.com>
(cherry picked from commit 782ebcece4)
2018-06-14 15:04:50 +03:00
Tomasz Grabiec
4cb6061a9f tests: row_cache: Reduce concurrency limit to avoid bad_alloc
The test uses random mutations. We saw it failing with bad_alloc from time to time.
Reduce concurrency to reduce memory footprint.

Message-Id: <20180611090304.16681-1-tgrabiec@scylladb.com>
(cherry picked from commit a91974af7a)
2018-06-14 13:40:00 +02:00
Tomasz Grabiec
1940e6bd95 tests: row_cache: Do not hang when only one of the readers throws
Message-Id: <20180531122729.3314-1-tgrabiec@scylladb.com>
(cherry picked from commit b5e42bc6a0)
2018-06-14 13:40:00 +02:00
Avi Kivity
044cfde5f3 database: stop using incremental selectors
There is a bug in incremental_selector for partitioned_sstable_set, so
until it is found, stop using it.

This degrades scan performance of Leveled Compaction Strategy tables.

Fixes #3513. (as a workaround)
Introduced: 2.1
Message-Id: <20180613131547.19084-1-avi@scylladb.com>

(cherry picked from commit aeffbb6732)
2018-06-13 21:04:56 +03:00
Vlad Zolotarov
262a246436 locator::ec2_multi_region_snitch: don't call for ec2_snitch::gossiper_starting()
ec2_snitch::gossiper_starting() calls for the base class (default) method
that sets _gossip_started to TRUE and thereby prevents to following
reconnectable_snitch_helper registration.

Fixes #3454

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1528208520-28046-1-git-send-email-vladz@scylladb.com>
(cherry picked from commit 2dde372ae6)
2018-06-12 19:02:19 +03:00
46 changed files with 585 additions and 199 deletions

View File

@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=2.2.rc2
VERSION=2.2.2
if test -f version
then

View File

@@ -2193,11 +2193,11 @@
"description":"The column family"
},
"total":{
"type":"int",
"type":"long",
"description":"The total snapshot size"
},
"live":{
"type":"int",
"type":"long",
"description":"The live snapshot size"
}
}

View File

@@ -149,7 +149,9 @@ static sstring gensalt() {
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), &tlcrypt)) {
const char* e = crypt_r("fisk", salt.c_str(), &tlcrypt);
if (e && (e[0] != '*')) {
prefix = pfx;
return salt;
}

View File

@@ -127,7 +127,7 @@ public:
class compaction_controller : public backlog_controller {
public:
static constexpr unsigned normalization_factor = 10;
static constexpr unsigned normalization_factor = 30;
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, std::function<float()> current_backlog)
: backlog_controller(sg, iop, std::move(interval),

View File

@@ -60,6 +60,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _next_row_in_range = _next.position() < _upper_bound
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
reading_from_underlying,
end_of_stream
@@ -86,6 +87,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
partition_snapshot_row_cursor _next_row;
bool _next_row_in_range = false;
// True iff current population interval, since the previous clustering row, starts before all clustered rows.
// We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
// because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
// us from marking the interval as continuous.
// Valid when _state == reading_from_underlying.
bool _population_range_starts_before_all_rows;
future<> do_fill_buffer(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
@@ -226,6 +234,7 @@ inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
@@ -351,7 +360,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
inline
bool cache_flat_mutation_reader::ensure_population_lower_bound() {
if (!_ck_ranges_curr->start()) {
if (_population_range_starts_before_all_rows) {
return true;
}
if (!_last_row.refresh(*_snp)) {
@@ -406,6 +415,7 @@ inline
void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
if (!can_populate()) {
_last_row = nullptr;
_population_range_starts_before_all_rows = false;
_read_context->cache().on_mispopulate();
return;
}
@@ -439,6 +449,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
with_allocator(standard_allocator(), [&] {
_last_row = partition_snapshot_row_weakref(*_snp, it, true);
});
_population_range_starts_before_all_rows = false;
});
}

View File

@@ -67,6 +67,12 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
*/
const sstring_view _query;
/**
* An empty bitset to be used as a workaround for AntLR null dereference
* bug.
*/
static typename ExceptionBaseType::BitsetListType _empty_bit_list;
public:
/**
@@ -144,6 +150,14 @@ private:
break;
}
default:
// AntLR Exception class has a bug of dereferencing a null
// pointer in the displayRecognitionError. The following
// if statement makes sure it will not be null before the
// call to that function (displayRecognitionError).
// bug reference: https://github.com/antlr/antlr3/issues/191
if (!ex->get_expectingSet()) {
ex->set_expectingSet(&_empty_bit_list);
}
ex->displayRecognitionError(token_names, msg);
}
return msg.str();
@@ -345,4 +359,8 @@ private:
#endif
};
template<typename RecognizerType, typename TokenType, typename ExceptionBaseType>
typename ExceptionBaseType::BitsetListType
error_collector<RecognizerType,TokenType,ExceptionBaseType>::_empty_bit_list = typename ExceptionBaseType::BitsetListType();
}

View File

@@ -209,19 +209,18 @@ void query_options::prepare(const std::vector<::shared_ptr<column_specification>
}
auto& names = *_names;
std::vector<cql3::raw_value> ordered_values;
std::vector<cql3::raw_value_view> ordered_values;
ordered_values.reserve(specs.size());
for (auto&& spec : specs) {
auto& spec_name = spec->name->text();
for (size_t j = 0; j < names.size(); j++) {
if (names[j] == spec_name) {
ordered_values.emplace_back(_values[j]);
ordered_values.emplace_back(_value_views[j]);
break;
}
}
}
_values = std::move(ordered_values);
fill_value_views();
_value_views = std::move(ordered_values);
}
void query_options::fill_value_views()

View File

@@ -202,6 +202,14 @@ public:
const query_options& options,
gc_clock::time_point now) const override;
virtual std::vector<bytes_opt> values_raw(const query_options& options) const = 0;
virtual std::vector<bytes_opt> values(const query_options& options) const override {
std::vector<bytes_opt> ret = values_raw(options);
std::sort(ret.begin(),ret.end());
ret.erase(std::unique(ret.begin(),ret.end()),ret.end());
return ret;
}
#if 0
@Override
protected final boolean isSupportedBy(SecondaryIndex index)
@@ -224,7 +232,7 @@ public:
return abstract_restriction::term_uses_function(_values, ks_name, function_name);
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
std::vector<bytes_opt> ret;
for (auto&& v : _values) {
ret.emplace_back(to_bytes_opt(v->bind_and_get(options)));
@@ -249,7 +257,7 @@ public:
return false;
}
virtual std::vector<bytes_opt> values(const query_options& options) const override {
virtual std::vector<bytes_opt> values_raw(const query_options& options) const override {
auto&& lval = dynamic_pointer_cast<multi_item_terminal>(_marker->bind(options));
if (!lval) {
throw exceptions::invalid_request_exception("Invalid null value for IN restriction");

View File

@@ -105,9 +105,11 @@ public:
virtual void reset() = 0;
virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) override {
if (receiver->type == get_type()) {
auto t1 = receiver->type->underlying_type();
auto t2 = get_type()->underlying_type();
if (t1 == t2) {
return assignment_testable::test_result::EXACT_MATCH;
} else if (receiver->type->is_value_compatible_with(*get_type())) {
} else if (t1->is_value_compatible_with(*t2)) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
} else {
return assignment_testable::test_result::NOT_ASSIGNABLE;

View File

@@ -53,6 +53,9 @@ update_parameters::get_prefetched_list(
return {};
}
if (column.is_static()) {
ckey = clustering_key_view::make_empty();
}
auto i = _prefetched->rows.find(std::make_pair(std::move(pkey), std::move(ckey)));
if (i == _prefetched->rows.end()) {
return {};

View File

@@ -361,9 +361,13 @@ filter_sstable_for_reader(std::vector<sstables::shared_sstable>&& sstables, colu
};
sstables.erase(boost::remove_if(sstables, sstable_has_not_key), sstables.end());
// FIXME: Workaround for https://github.com/scylladb/scylla/issues/3552
// and https://github.com/scylladb/scylla/issues/3553
const bool filtering_broken = true;
// no clustering filtering is applied if schema defines no clustering key or
// compaction strategy thinks it will not benefit from such an optimization.
if (!schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
if (filtering_broken || !schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
return sstables;
}
::cf_stats* stats = cf.cf_stats();
@@ -1633,9 +1637,9 @@ future<> distributed_loader::open_sstable(distributed<database>& db, sstables::e
// to distribute evenly the resource usage among all shards.
return db.invoke_on(column_family::calculate_shard_from_sstable_generation(comps.generation),
[&db, comps = std::move(comps), func = std::move(func), pc] (database& local) {
[&db, comps = std::move(comps), func = std::move(func), &pc] (database& local) {
return with_semaphore(local.sstable_load_concurrency_sem(), 1, [&db, &local, comps = std::move(comps), func = std::move(func), pc] {
return with_semaphore(local.sstable_load_concurrency_sem(), 1, [&db, &local, comps = std::move(comps), func = std::move(func), &pc] {
auto& cf = local.find_column_family(comps.ks, comps.cf);
auto f = sstables::sstable::load_shared_components(cf.schema(), cf._config.datadir, comps.generation, comps.version, comps.format, pc);
@@ -2159,6 +2163,11 @@ database::database(const db::config& cfg, database_config dbcfg)
void backlog_controller::adjust() {
auto backlog = _current_backlog();
if (backlog >= _control_points.back().input) {
update_controller(_control_points.back().output);
return;
}
// interpolate to find out which region we are. This run infrequently and there are a fixed
// number of points so a simple loop will do.
size_t idx = 1;
@@ -2808,6 +2817,7 @@ keyspace::make_column_family_config(const schema& s, const db::config& db_config
cfg.enable_disk_writes = _config.enable_disk_writes;
cfg.enable_commitlog = _config.enable_commitlog;
cfg.enable_cache = _config.enable_cache;
cfg.compaction_enforce_min_threshold = _config.compaction_enforce_min_threshold;
cfg.dirty_memory_manager = _config.dirty_memory_manager;
cfg.streaming_dirty_memory_manager = _config.streaming_dirty_memory_manager;
cfg.read_concurrency_semaphore = _config.read_concurrency_semaphore;
@@ -3577,6 +3587,7 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
cfg.enable_commitlog = false;
cfg.enable_cache = false;
}
cfg.compaction_enforce_min_threshold = _cfg->compaction_enforce_min_threshold();
cfg.dirty_memory_manager = &_dirty_memory_manager;
cfg.streaming_dirty_memory_manager = &_streaming_dirty_memory_manager;
cfg.read_concurrency_semaphore = &_read_concurrency_sem;
@@ -4537,16 +4548,14 @@ flat_mutation_reader make_local_shard_sstable_reader(schema_ptr s,
}
return reader;
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto all_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) -> flat_mutation_reader {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(all_readers),
fwd,
fwd_mr);
}
@@ -4565,16 +4574,14 @@ flat_mutation_reader make_range_sstable_reader(schema_ptr s,
auto reader_factory_fn = [s, &slice, &pc, resource_tracker, fwd, fwd_mr, &monitor_generator] (sstables::shared_sstable& sst, const dht::partition_range& pr) {
return sst->read_range_rows_flat(s, pr, slice, pc, resource_tracker, fwd, fwd_mr, monitor_generator(sst));
};
return make_combined_reader(s, std::make_unique<incremental_reader_selector>(s,
std::move(sstables),
pr,
slice,
pc,
std::move(resource_tracker),
std::move(trace_state),
fwd,
fwd_mr,
std::move(reader_factory_fn)),
auto sstable_readers = boost::copy_range<std::vector<flat_mutation_reader>>(
*sstables->all()
| boost::adaptors::transformed([&] (sstables::shared_sstable sst) {
return reader_factory_fn(sst, pr);
})
);
return make_combined_reader(s,
std::move(sstable_readers),
fwd,
fwd_mr);
}

View File

@@ -297,6 +297,7 @@ public:
bool enable_cache = true;
bool enable_commitlog = true;
bool enable_incremental_backups = false;
bool compaction_enforce_min_threshold = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* read_concurrency_semaphore;
@@ -735,6 +736,10 @@ public:
_config.enable_incremental_backups = val;
}
bool compaction_enforce_min_threshold() const {
return _config.compaction_enforce_min_threshold;
}
const sstables::sstable_set& get_sstable_set() const;
lw_shared_ptr<sstable_list> get_sstables() const;
lw_shared_ptr<sstable_list> get_sstables_including_compacted_undeleted() const;
@@ -979,6 +984,7 @@ public:
bool enable_disk_writes = true;
bool enable_cache = true;
bool enable_incremental_backups = false;
bool compaction_enforce_min_threshold = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* read_concurrency_semaphore;

View File

@@ -125,6 +125,9 @@ public:
val(compaction_static_shares, float, 0, Used, \
"If set to higher than 0, ignore the controller's output and set the compaction shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity" \
) \
val(compaction_enforce_min_threshold, bool, false, Used, \
"If set to true, enforce the min_threshold option for compactions strictly. If false (default), Scylla may decide to compact even if below min_threshold" \
) \
/* Initialization properties */ \
/* The minimal properties needed for configuring a cluster. */ \
val(cluster_name, sstring, "", Used, \

View File

@@ -120,7 +120,7 @@ else
fi
fi
echo -n " "
/usr/lib/scylla/scylla_ec2_check
/usr/lib/scylla/scylla_ec2_check --nic eth0
if [ $? -eq 0 ]; then
echo
fi

View File

@@ -2,6 +2,12 @@
. /usr/lib/scylla/scylla_lib.sh
print_usage() {
echo "scylla_ec2_check --nic eth0"
echo " --nic specify NIC"
exit 1
}
get_en_interface_type() {
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 1`
SUBTYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 2`
@@ -18,7 +24,7 @@ get_en_interface_type() {
}
is_vpc_enabled() {
MAC=`cat /sys/class/net/eth0/address`
MAC=`cat /sys/class/net/$1/address`
VPC_AVAIL=`curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/$MAC/|grep vpc-id`
[ -n "$VPC_AVAIL" ]
}
@@ -27,9 +33,27 @@ if ! is_ec2; then
exit 0
fi
if [ $# -eq 0 ]; then
print_usage
fi
while [ $# -gt 0 ]; do
case "$1" in
"--nic")
verify_args $@
NIC="$2"
shift 2
;;
esac
done
if ! is_valid_nic $NIC; then
echo "NIC $NIC doesn't exist."
exit 1
fi
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type`
EN=`get_en_interface_type`
DRIVER=`ethtool -i eth0|awk '/^driver:/ {print $2}'`
DRIVER=`ethtool -i $NIC|awk '/^driver:/ {print $2}'`
if [ "$EN" = "" ]; then
tput setaf 1
tput bold
@@ -39,7 +63,7 @@ if [ "$EN" = "" ]; then
echo "More documentation available at: "
echo "http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking"
exit 1
elif ! is_vpc_enabled; then
elif ! is_vpc_enabled $NIC; then
tput setaf 1
tput bold
echo "VPC is not enabled!"

View File

@@ -91,6 +91,10 @@ create_perftune_conf() {
/usr/lib/scylla/perftune.py --tune net --nic "$nic" $mode --dump-options-file > /etc/scylla.d/perftune.yaml
}
is_valid_nic() {
[ -d /sys/class/net/$1 ]
}
. /etc/os-release
if is_debian_variant || is_gentoo_variant; then
SYSCONFIG=/etc/default

View File

@@ -39,6 +39,27 @@ print_usage() {
exit 1
}
interactive_choose_nic() {
NICS=$(for i in /sys/class/net/*;do nic=`basename $i`; if [ "$nic" != "lo" ]; then echo $nic; fi; done)
NR_NICS=`echo $NICS|wc -w`
if [ $NR_NICS -eq 0 ]; then
echo "NIC not found."
exit 1
elif [ $NR_NICS -eq 1 ]; then
NIC=$NICS
else
echo "Please select NIC from following list: "
while true; do
echo $NICS
echo -n "> "
read NIC
if is_valid_nic $NIC; then
break
fi
done
fi
}
interactive_ask_service() {
echo $1
echo $2
@@ -112,14 +133,20 @@ run_setup_script() {
name=$1
shift 1
$* &&:
if [ $? -ne 0 ] && [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
if [ $? -ne 0 ]; then
if [ $INTERACTIVE -eq 1 ]; then
printf "${RED}$name setup failed. press any key to continue...${NO_COLOR}\n"
read
return 1
else
printf "$name setup failed.\n"
exit 1
fi
fi
return 0
}
NIC="eth0"
AMI=0
SET_NIC=0
DEV_MODE=0
@@ -260,7 +287,8 @@ if is_ec2; then
EC2_CHECK=$?
fi
if [ $EC2_CHECK -eq 1 ]; then
/usr/lib/scylla/scylla_ec2_check
interactive_choose_nic
/usr/lib/scylla/scylla_ec2_check --nic $NIC
fi
fi
@@ -447,24 +475,6 @@ if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to setup sysconfig?" "Answer yes to do system wide configuration customized for Scylla. Answer no to do nothing." "yes" &&:
SYSCONFIG_SETUP=$?
if [ $SYSCONFIG_SETUP -eq 1 ]; then
NICS=$(for i in /sys/class/net/*;do nic=`basename $i`; if [ "$nic" != "lo" ]; then echo $nic; fi; done)
NR_NICS=`echo $NICS|wc -w`
if [ $NR_NICS -eq 0 ]; then
echo "NIC not found."
exit 1
elif [ $NR_NICS -eq 1 ]; then
NIC=$NICS
else
echo "Please select NIC from following list: "
while true; do
echo $NICS
echo -n "> "
read NIC
if [ -e /sys/class/net/$NIC ]; then
break
fi
done
fi
interactive_ask_service "Do you want to optimize NIC queue settings?" "Answer yes to enable network card optimization and improve performance. Answer no to skip this optimization." "yes" &&:
SET_NIC=$?
fi
@@ -474,6 +484,7 @@ if [ $SYSCONFIG_SETUP -eq 1 ]; then
if [ $SET_NIC -eq 1 ]; then
SETUP_ARGS="--setup-nic"
fi
interactive_choose_nic
run_setup_script "NIC queue" /usr/lib/scylla/scylla_sysconfig_setup --nic $NIC $SETUP_ARGS
fi

View File

@@ -2,10 +2,11 @@
. /etc/os-release
print_usage() {
echo "build_deb.sh -target <codename> --dist --rebuild-dep"
echo "build_deb.sh -target <codename> --dist --rebuild-dep --jobs 2"
echo " --target target distribution codename"
echo " --dist create a public distribution package"
echo " --no-clean don't rebuild pbuilder tgz"
echo " --jobs specify number of jobs"
exit 1
}
install_deps() {
@@ -19,6 +20,7 @@ install_deps() {
DIST=0
TARGET=
NO_CLEAN=0
JOBS=0
while [ $# -gt 0 ]; do
case "$1" in
"--dist")
@@ -33,6 +35,10 @@ while [ $# -gt 0 ]; do
NO_CLEAN=1
shift 1
;;
"--jobs")
JOBS=$2
shift 2
;;
*)
print_usage
;;
@@ -248,16 +254,18 @@ if [ "$TARGET" != "trusty" ]; then
cp dist/common/systemd/node-exporter.service debian/scylla-server.node-exporter.service
fi
sudo cp ./dist/debian/pbuilderrc ~root/.pbuilderrc
if [ $NO_CLEAN -eq 0 ]; then
sudo rm -fv /var/cache/pbuilder/scylla-server-$TARGET.tgz
sudo -H DIST=$TARGET /usr/sbin/pbuilder clean
sudo -H DIST=$TARGET /usr/sbin/pbuilder create --allow-untrusted
sudo DIST=$TARGET /usr/sbin/pbuilder clean --configfile ./dist/debian/pbuilderrc
sudo DIST=$TARGET /usr/sbin/pbuilder create --configfile ./dist/debian/pbuilderrc --allow-untrusted
fi
sudo -H DIST=$TARGET /usr/sbin/pbuilder update --allow-untrusted
if [ $JOBS -ne 0 ]; then
DEB_BUILD_OPTIONS="parallel=$JOBS"
fi
sudo -H DIST=$TARGET /usr/sbin/pbuilder update --configfile ./dist/debian/pbuilderrc --allow-untrusted
if [ "$TARGET" = "trusty" ] || [ "$TARGET" = "xenial" ] || [ "$TARGET" = "yakkety" ] || [ "$TARGET" = "zesty" ] || [ "$TARGET" = "artful" ] || [ "$TARGET" = "bionic" ]; then
sudo -H DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/ubuntu_enable_ppa.sh
sudo DIST=$TARGET /usr/sbin/pbuilder execute --configfile ./dist/debian/pbuilderrc --save-after-exec dist/debian/ubuntu_enable_ppa.sh
elif [ "$TARGET" = "jessie" ] || [ "$TARGET" = "stretch" ]; then
sudo -H DIST=$TARGET /usr/sbin/pbuilder execute --save-after-exec dist/debian/debian_install_gpgkey.sh
sudo DIST=$TARGET /usr/sbin/pbuilder execute --configfile ./dist/debian/pbuilderrc --save-after-exec dist/debian/debian_install_gpgkey.sh
fi
sudo -H DIST=$TARGET pdebuild --buildresult build/debs
sudo -H DIST=$TARGET DEB_BUILD_OPTIONS=$DEB_BUILD_OPTIONS pdebuild --configfile ./dist/debian/pbuilderrc --buildresult build/debs

View File

@@ -1,12 +1,13 @@
#!/usr/bin/make -f
export PYBUILD_DISABLE=1
jobs := $(shell echo $$DEB_BUILD_OPTIONS | sed -r "s/.*parallel=([0-9]+).*/-j\1/")
override_dh_auto_configure:
./configure.py --with=scylla --with=iotune --enable-dpdk --mode=release --static-thrift --static-boost --static-yaml-cpp --compiler=@@COMPILER@@ --cflags="-I/opt/scylladb/include -L/opt/scylladb/lib/x86-linux-gnu/" --ldflags="-Wl,-rpath=/opt/scylladb/lib"
override_dh_auto_build:
PATH="/opt/scylladb/bin:$$PATH" ninja
PATH="/opt/scylladb/bin:$$PATH" ninja $(jobs)
override_dh_auto_clean:
rm -rf build/release seastar/build

View File

@@ -183,10 +183,7 @@ flat_mutation_reader make_delegating_reader(flat_mutation_reader& r) {
flat_mutation_reader make_forwardable(flat_mutation_reader m) {
class reader : public flat_mutation_reader::impl {
flat_mutation_reader _underlying;
position_range _current = {
position_in_partition(position_in_partition::partition_start_tag_t()),
position_in_partition(position_in_partition::after_static_row_tag_t())
};
position_range _current;
mutation_fragment_opt _next;
// When resolves, _next is engaged or _end_of_stream is set.
future<> ensure_next() {
@@ -201,7 +198,10 @@ flat_mutation_reader make_forwardable(flat_mutation_reader m) {
});
}
public:
reader(flat_mutation_reader r) : impl(r.schema()), _underlying(std::move(r)) { }
reader(flat_mutation_reader r) : impl(r.schema()), _underlying(std::move(r)), _current({
position_in_partition(position_in_partition::partition_start_tag_t()),
position_in_partition(position_in_partition::after_static_row_tag_t())
}) { }
virtual future<> fill_buffer(db::timeout_clock::time_point timeout) override {
return repeat([this] {
if (is_buffer_full()) {

View File

@@ -1005,7 +1005,7 @@ future<> gossiper::assassinate_endpoint(sstring address) {
logger.warn("Assassinating {} via gossip", endpoint);
if (es) {
auto& ss = service::get_local_storage_service();
auto tokens = ss.get_token_metadata().get_tokens(endpoint);
tokens = ss.get_token_metadata().get_tokens(endpoint);
if (tokens.empty()) {
logger.warn("Unable to calculate tokens for {}. Will use a random one", address);
throw std::runtime_error(sprint("Unable to calculate tokens for %s", endpoint));

View File

@@ -721,6 +721,10 @@ public:
static const compound& get_compound_type(const schema& s) {
return s.clustering_key_prefix_type();
}
static clustering_key_prefix_view make_empty() {
return { bytes_view() };
}
};
class clustering_key_prefix : public prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key> {

View File

@@ -119,9 +119,17 @@ insert_token_range_to_sorted_container_while_unwrapping(
const dht::token& tok,
dht::token_range_vector& ret) {
if (prev_tok < tok) {
ret.emplace_back(
dht::token_range::bound(prev_tok, false),
dht::token_range::bound(tok, true));
auto pos = ret.end();
if (!ret.empty() && !std::prev(pos)->end()) {
// We inserted a wrapped range (a, b] previously as
// (-inf, b], (a, +inf). So now we insert in the next-to-last
// position to keep the last range (a, +inf) at the end.
pos = std::prev(pos);
}
ret.insert(pos,
dht::token_range{
dht::token_range::bound(prev_tok, false),
dht::token_range::bound(tok, true)});
} else {
ret.emplace_back(
dht::token_range::bound(prev_tok, false),

View File

@@ -100,7 +100,6 @@ future<> ec2_multi_region_snitch::gossiper_starting() {
// Note: currently gossiper "main" instance always runs on CPU0 therefore
// this function will be executed on CPU0 only.
//
ec2_snitch::gossiper_starting();
using namespace gms;
auto& g = get_local_gossiper();

View File

@@ -1089,7 +1089,7 @@ row::apply_monotonically(const column_definition& column, atomic_cell_or_collect
if (_type == storage_type::vector && id < max_vector_size) {
if (id >= _storage.vector.v.size()) {
_storage.vector.v.resize(id);
_storage.vector.v.emplace_back(cell_and_hash{std::move(value), std::move(hash)});
_storage.vector.v.emplace_back(std::move(value), std::move(hash));
_storage.vector.present.set(id);
_size++;
} else if (auto& cell_and_hash = _storage.vector.v[id]; !bool(cell_and_hash.cell)) {
@@ -1753,9 +1753,10 @@ void mutation_querier::query_static_row(const row& r, tombstone current_tombston
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__static_row__cells<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
get_compacted_row_slice(_schema, slice, column_kind::static_column,
r, slice.static_columns, _static_cells_wr);
_memory_accounter.update(stream.size());
r, slice.static_columns, out);
_memory_accounter.update(stream.size() - start);
}
if (_pw.requested_digest()) {
max_timestamp max_ts{_pw.last_modified()};
@@ -1816,8 +1817,9 @@ stop_iteration mutation_querier::consume(clustering_row&& cr, row_tombstone curr
} else if (_short_reads_allowed) {
seastar::measuring_output_stream stream;
ser::qr_partition__rows<seastar::measuring_output_stream> out(stream, { });
auto start = stream.size();
write_row(out);
stop = _memory_accounter.update_and_check(stream.size());
stop = _memory_accounter.update_and_check(stream.size() - start);
}
_live_clustering_rows++;

View File

@@ -74,6 +74,17 @@ using cell_hash_opt = seastar::optimized_optional<cell_hash>;
struct cell_and_hash {
atomic_cell_or_collection cell;
mutable cell_hash_opt hash;
cell_and_hash() = default;
cell_and_hash(cell_and_hash&&) noexcept = default;
cell_and_hash& operator=(cell_and_hash&&) noexcept = default;
cell_and_hash(const cell_and_hash&) = default;
cell_and_hash& operator=(const cell_and_hash&) = default;
cell_and_hash(atomic_cell_or_collection&& cell, cell_hash_opt hash)
: cell(std::move(cell))
, hash(hash)
{ }
};
//

View File

@@ -273,6 +273,11 @@ public:
return is_partition_end() || (_ck && _ck->is_empty(s) && _bound_weight > 0);
}
bool is_before_all_clustered_rows(const schema& s) const {
return _type < partition_region::clustered
|| (_type == partition_region::clustered && _ck->is_empty(s) && _bound_weight < 0);
}
template<typename Hasher>
void feed_hash(Hasher& hasher, const schema& s) const {
::feed_hash(hasher, _bound_weight);

View File

@@ -152,34 +152,33 @@ const size_t querier_cache::max_queriers_memory_usage = memory::stats().total_me
void querier_cache::scan_cache_entries() {
const auto now = lowres_clock::now();
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end && it->is_expired(now)) {
if (*it) {
++_stats.time_based_evictions;
}
it = _meta_entries.erase(it);
_stats.population = _entries.size();
++_stats.time_based_evictions;
--_stats.population;
it = _entries.erase(it);
}
}
querier_cache::entries::iterator querier_cache::find_querier(utils::UUID key, const dht::partition_range& range, tracing::trace_state_ptr trace_state) {
const auto queriers = _entries.equal_range(key);
const auto queriers = _index.equal_range(key);
if (queriers.first == _entries.end()) {
if (queriers.first == _index.end()) {
tracing::trace(trace_state, "Found no cached querier for key {}", key);
return _entries.end();
}
const auto it = std::find_if(queriers.first, queriers.second, [&] (const std::pair<const utils::UUID, entry>& elem) {
return elem.second.get().matches(range);
const auto it = std::find_if(queriers.first, queriers.second, [&] (const entry& e) {
return e.value().matches(range);
});
if (it == queriers.second) {
tracing::trace(trace_state, "Found cached querier(s) for key {} but none matches the query range {}", key, range);
return _entries.end();
}
tracing::trace(trace_state, "Found cached querier for key {} and range {}", key, range);
return it;
return it->pos();
}
querier_cache::querier_cache(std::chrono::seconds entry_ttl)
@@ -199,8 +198,7 @@ void querier_cache::insert(utils::UUID key, querier&& q, tracing::trace_state_pt
tracing::trace(trace_state, "Caching querier with key {}", key);
auto memory_usage = boost::accumulate(
_entries | boost::adaptors::map_values | boost::adaptors::transformed(std::mem_fn(&querier_cache::entry::memory_usage)), size_t(0));
auto memory_usage = boost::accumulate(_entries | boost::adaptors::transformed(std::mem_fn(&entry::memory_usage)), size_t(0));
// We add the memory-usage of the to-be added querier to the memory-usage
// of all the cached queriers. We now need to makes sure this number is
@@ -210,20 +208,20 @@ void querier_cache::insert(utils::UUID key, querier&& q, tracing::trace_state_pt
memory_usage += q.memory_usage();
if (memory_usage >= max_queriers_memory_usage) {
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end && memory_usage >= max_queriers_memory_usage) {
if (*it) {
++_stats.memory_based_evictions;
memory_usage -= it->get_entry().memory_usage();
}
it = _meta_entries.erase(it);
++_stats.memory_based_evictions;
memory_usage -= it->memory_usage();
--_stats.population;
it = _entries.erase(it);
}
}
const auto it = _entries.emplace(key, entry::param{std::move(q), _entry_ttl}).first;
_meta_entries.emplace_back(_entries, it);
_stats.population = _entries.size();
auto& e = _entries.emplace_back(key, std::move(q), lowres_clock::now() + _entry_ttl);
e.set_pos(--_entries.end());
_index.insert(e);
++_stats.population;
}
querier querier_cache::lookup(utils::UUID key,
@@ -240,9 +238,9 @@ querier querier_cache::lookup(utils::UUID key,
return create_fun();
}
auto q = std::move(it->second).get();
auto q = std::move(*it).value();
_entries.erase(it);
_stats.population = _entries.size();
--_stats.population;
const auto can_be_used = q.can_be_used_for_page(only_live, s, range, slice);
if (can_be_used == querier::can_use::yes) {
@@ -265,25 +263,24 @@ bool querier_cache::evict_one() {
return false;
}
auto it = _meta_entries.begin();
const auto end = _meta_entries.end();
while (it != end) {
const auto is_live = bool(*it);
it = _meta_entries.erase(it);
_stats.population = _entries.size();
if (is_live) {
++_stats.resource_based_evictions;
return true;
}
}
return false;
++_stats.resource_based_evictions;
--_stats.population;
_entries.pop_front();
return true;
}
void querier_cache::evict_all_for_table(const utils::UUID& schema_id) {
_meta_entries.remove_if([&] (const meta_entry& me) {
return !me || me.get_entry().get().schema()->id() == schema_id;
});
_stats.population = _entries.size();
auto it = _entries.begin();
const auto end = _entries.end();
while (it != end) {
if (it->schema().id() == schema_id) {
--_stats.population;
it = _entries.erase(it);
} else {
++it;
}
}
}
querier_cache_context::querier_cache_context(querier_cache& cache, utils::UUID key, bool is_first_page)

View File

@@ -24,7 +24,8 @@
#include "mutation_compactor.hh"
#include "mutation_reader.hh"
#include <seastar/core/weak_ptr.hh>
#include <boost/intrusive/set.hpp>
#include <variant>
/// One-stop object for serving queries.
@@ -264,75 +265,65 @@ public:
};
private:
class entry : public weakly_referencable<entry> {
querier _querier;
lowres_clock::time_point _expires;
public:
// Since entry cannot be moved and unordered_map::emplace can pass only
// a single param to it's mapped-type we need to force a single-param
// constructor for entry. Oh C++...
struct param {
querier q;
std::chrono::seconds ttl;
};
class entry : public boost::intrusive::set_base_hook<boost::intrusive::link_mode<boost::intrusive::auto_unlink>> {
// Self reference so that we can remove the entry given an `entry&`.
std::list<entry>::iterator _pos;
const utils::UUID _key;
const lowres_clock::time_point _expires;
querier _value;
explicit entry(param p)
: _querier(std::move(p.q))
, _expires(lowres_clock::now() + p.ttl) {
public:
entry(utils::UUID key, querier q, lowres_clock::time_point expires)
: _key(key)
, _expires(expires)
, _value(std::move(q)) {
}
std::list<entry>::iterator pos() const {
return _pos;
}
void set_pos(std::list<entry>::iterator pos) {
_pos = pos;
}
const utils::UUID& key() const {
return _key;
}
const ::schema& schema() const {
return *_value.schema();
}
bool is_expired(const lowres_clock::time_point& now) const {
return _expires <= now;
}
const querier& get() const & {
return _querier;
}
querier&& get() && {
return std::move(_querier);
}
size_t memory_usage() const {
return _querier.memory_usage();
return _value.memory_usage();
}
const querier& value() const & {
return _value;
}
querier value() && {
return std::move(_value);
}
};
using entries = std::unordered_map<utils::UUID, entry>;
class meta_entry {
entries& _entries;
weak_ptr<entry> _entry_ptr;
entries::iterator _entry_it;
public:
meta_entry(entries& e, entries::iterator it)
: _entries(e)
, _entry_ptr(it->second.weak_from_this())
, _entry_it(it) {
}
~meta_entry() {
if (_entry_ptr) {
_entries.erase(_entry_it);
}
}
bool is_expired(const lowres_clock::time_point& now) const {
return !_entry_ptr || _entry_ptr->is_expired(now);
}
explicit operator bool() const {
return bool(_entry_ptr);
}
const entry& get_entry() const {
return *_entry_ptr;
}
struct key_of_entry {
using type = utils::UUID;
const type& operator()(const entry& e) { return e.key(); }
};
using entries = std::list<entry>;
using index = boost::intrusive::multiset<entry, boost::intrusive::key_of_value<key_of_entry>,
boost::intrusive::constant_time_size<false>>;
private:
entries _entries;
std::list<meta_entry> _meta_entries;
index _index;
timer<lowres_clock> _expiry_timer;
std::chrono::seconds _entry_ttl;
stats _stats;

Submodule seastar updated: 6f61d7456e...88cb58cfbf

View File

@@ -144,7 +144,11 @@ future<lowres_clock::duration> cache_hitrate_calculator::recalculate_hitrates()
return _db.invoke_on_all([this, rates = std::move(rates), cpuid = engine().cpu_id()] (database& db) {
sstring gstate;
for (auto& cf : db.get_column_families() | boost::adaptors::filtered(non_system_filter)) {
stat s = rates.at(cf.first);
auto it = rates.find(cf.first);
if (it == rates.end()) { // a table may be added before map/reduce compltes and this code runs
continue;
}
stat s = it->second;
float rate = 0;
if (s.h) {
rate = s.h / (s.h + s.m);

View File

@@ -83,7 +83,7 @@ private:
_last_replicas = state->get_last_replicas();
} else {
// Reusing readers is currently only supported for singular queries.
if (_ranges.front().is_singular()) {
if (!_ranges.empty() && query::is_single_partition(_ranges.front())) {
_cmd->query_uuid = utils::make_random_uuid();
}
_cmd->is_first_page = true;

View File

@@ -3220,9 +3220,22 @@ storage_proxy::query_partition_key_range(lw_shared_ptr<query::read_command> cmd,
slogger.debug("Estimated result rows per range: {}; requested rows: {}, ranges.size(): {}; concurrent range requests: {}",
result_rows_per_range, cmd->row_limit, ranges.size(), concurrency_factor);
// The call to `query_partition_key_range_concurrent()` below
// updates `cmd` directly when processing the results. Under
// some circumstances, when the query executes without deferring,
// this updating will happen before the lambda object is constructed
// and hence the updates will be visible to the lambda. This will
// result in the merger below trimming the results according to the
// updated (decremented) limits and causing the paging logic to
// declare the query exhausted due to the non-full page. To avoid
// this save the original values of the limits here and pass these
// to the lambda below.
const auto row_limit = cmd->row_limit;
const auto partition_limit = cmd->partition_limit;
return query_partition_key_range_concurrent(timeout, std::move(results), cmd, cl, ranges.begin(), std::move(ranges), concurrency_factor,
std::move(trace_state), cmd->row_limit, cmd->partition_limit)
.then([row_limit = cmd->row_limit, partition_limit = cmd->partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
.then([row_limit, partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
query::result_merger merger(row_limit, partition_limit);
merger.reserve(results.size());
@@ -3585,6 +3598,7 @@ future<> storage_proxy::truncate_blocking(sstring keyspace, sstring cfname) {
std::rethrow_exception(ep);
} catch (rpc::timeout_error& e) {
slogger.trace("Truncation of {} timed out: {}", cfname, e.what());
throw;
} catch (...) {
throw;
}

View File

@@ -33,6 +33,7 @@
#include "unimplemented.hh"
#include "stdx.hh"
#include "segmented_compress_params.hh"
#include "utils/class_registrator.hh"
namespace sstables {
@@ -299,7 +300,8 @@ size_t local_compression::compress_max_size(size_t input_len) const {
void compression::set_compressor(compressor_ptr c) {
if (c) {
auto& cn = c->name();
unqualified_name uqn(compressor::namespace_prefix, c->name());
const sstring& cn = uqn;
name.value = bytes(cn.begin(), cn.end());
for (auto& p : c->options()) {
if (p.first != compression_parameters::SSTABLE_COMPRESSION) {

View File

@@ -294,6 +294,12 @@ size_tiered_compaction_strategy::get_sstables_for_compaction(column_family& cfs,
return sstables::compaction_descriptor(std::move(most_interesting));
}
// If we are not enforcing min_threshold explicitly, try any pair of SStables in the same tier.
if (!cfs.compaction_enforce_min_threshold() && is_any_bucket_interesting(buckets, 2)) {
std::vector<sstables::shared_sstable> most_interesting = most_interesting_bucket(std::move(buckets), 2, max_threshold);
return sstables::compaction_descriptor(std::move(most_interesting));
}
// if there is no sstable to compact in standard way, try compacting single sstable whose droppable tombstone
// ratio is greater than threshold.
// prefer oldest sstables from biggest size tiers because they will be easier to satisfy conditions for

View File

@@ -215,3 +215,22 @@ SEASTAR_TEST_CASE(test_aggregate_count) {
}
});
}
SEASTAR_TEST_CASE(test_reverse_type_aggregation) {
return do_with_cql_env_thread([&] (auto& e) {
e.execute_cql("CREATE TABLE test(p int, c timestamp, v int, primary key (p, c)) with clustering order by (c desc)").get();
e.execute_cql("INSERT INTO test(p, c, v) VALUES (1, 1, 1)").get();
e.execute_cql("INSERT INTO test(p, c, v) VALUES (1, 2, 1)").get();
{
auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(1);
auto msg = e.execute_cql("SELECT min(c) FROM test").get0();
assert_that(msg).is_rows().with_size(1).with_row({{timestamp_type->decompose(tp)}});
}
{
auto tp = db_clock::from_time_t({ 0 }) + std::chrono::milliseconds(2);
auto msg = e.execute_cql("SELECT max(c) FROM test").get0();
assert_that(msg).is_rows().with_size(1).with_row({{timestamp_type->decompose(tp)}});
}
});
}

View File

@@ -2047,10 +2047,9 @@ SEASTAR_TEST_CASE(test_in_restriction) {
assert_that(msg).is_rows().with_size(0);
return e.execute_cql("select r1 from tir where p1 in (2, 0, 2, 1);");
}).then([&e] (shared_ptr<cql_transport::messages::result_message> msg) {
assert_that(msg).is_rows().with_rows({
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(4)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
@@ -2072,6 +2071,22 @@ SEASTAR_TEST_CASE(test_in_restriction) {
{int32_type->decompose(2)},
{int32_type->decompose(1)},
});
return e.prepare("select r1 from tir where p1 in ?");
}).then([&e] (cql3::prepared_cache_key_type prepared_id){
auto my_list_type = list_type_impl::get_instance(int32_type, true);
std::vector<cql3::raw_value> raw_values;
auto in_values_list = my_list_type->decompose(make_list_value(my_list_type,
list_type_impl::native_type{{int(2), int(0), int(2), int(1)}}));
raw_values.emplace_back(cql3::raw_value::make_value(in_values_list));
return e.execute_prepared(prepared_id,raw_values);
}).then([&e] (shared_ptr<cql_transport::messages::result_message> msg) {
assert_that(msg).is_rows().with_rows_ignore_order({
{int32_type->decompose(4)},
{int32_type->decompose(0)},
{int32_type->decompose(1)},
{int32_type->decompose(2)},
{int32_type->decompose(3)},
});
});
});
}
@@ -2607,3 +2622,81 @@ SEASTAR_TEST_CASE(test_insert_large_collection_values) {
});
});
}
// Corner-case test that checks for the paging code's preparedness for an empty
// range list.
SEASTAR_TEST_CASE(test_empty_partition_range_scan) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("create keyspace empty_partition_range_scan with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};").get();
e.execute_cql("create table empty_partition_range_scan.tb (a int, b int, c int, val int, PRIMARY KEY ((a,b),c) );").get();
auto qo = std::make_unique<cql3::query_options>(db::consistency_level::LOCAL_ONE, std::vector<cql3::raw_value>{},
cql3::query_options::specific_options{1, nullptr, {}, api::new_timestamp()});
auto res = e.execute_cql("select * from empty_partition_range_scan.tb where token (a,b) > 1 and token(a,b) <= 1;", std::move(qo)).get0();
assert_that(res).is_rows().is_empty();
});
}
SEASTAR_TEST_CASE(test_static_multi_cell_static_lists_with_ckey) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("CREATE TABLE t (p int, c int, slist list<int> static, v int, PRIMARY KEY (p, c));").get();
e.execute_cql("INSERT INTO t (p, c, slist, v) VALUES (1, 1, [1], 1); ").get();
{
e.execute_cql("UPDATE t SET slist[0] = 3, v = 3 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{3}}))) },
{ int32_type->decompose(3) }
});
}
{
e.execute_cql("UPDATE t SET slist = [4], v = 4 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({{4}}))) },
{ int32_type->decompose(4) }
});
}
{
e.execute_cql("UPDATE t SET slist = [3] + slist , v = 5 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(5) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist + [5] , v = 6 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4, 5}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("DELETE slist[2] from t WHERE p = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3, 4}))) },
{ int32_type->decompose(6) }
});
}
{
e.execute_cql("UPDATE t SET slist = slist - [4] , v = 7 WHERE p = 1 AND c = 1;").get();
auto msg = e.execute_cql("SELECT slist, v FROM t WHERE p = 1 AND c = 1;").get0();
auto slist_type = list_type_impl::get_instance(int32_type, true);
assert_that(msg).is_rows().with_row({
{ slist_type->decompose(make_list_value(slist_type, list_type_impl::native_type({3}))) },
{ int32_type->decompose(7) }
});
}
});
}

View File

@@ -29,6 +29,9 @@
#include "database.hh"
#include "partition_slice_builder.hh"
#include "frozen_mutation.hh"
#include "mutation_source_test.hh"
#include "schema_registry.hh"
#include "service/migration_manager.hh"
SEASTAR_TEST_CASE(test_querying_with_limits) {
return do_with_cql_env([](cql_test_env& e) {
@@ -74,3 +77,33 @@ SEASTAR_TEST_CASE(test_querying_with_limits) {
});
});
}
SEASTAR_THREAD_TEST_CASE(test_database_with_data_in_sstables_is_a_mutation_source) {
do_with_cql_env([] (cql_test_env& e) {
run_mutation_source_tests([&] (schema_ptr s, const std::vector<mutation>& partitions) -> mutation_source {
try {
e.local_db().find_column_family(s->ks_name(), s->cf_name());
service::get_local_migration_manager().announce_column_family_drop(s->ks_name(), s->cf_name(), true).get();
} catch (const no_such_column_family&) {
// expected
}
service::get_local_migration_manager().announce_new_column_family(s, true).get();
column_family& cf = e.local_db().find_column_family(s);
for (auto&& m : partitions) {
e.local_db().apply(cf.schema(), freeze(m)).get();
}
cf.flush().get();
cf.get_row_cache().invalidate([] {}).get();
return mutation_source([&] (schema_ptr s,
const dht::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return cf.make_reader(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr);
});
});
return make_ready_future<>();
}).get();
}

View File

@@ -26,11 +26,13 @@
#include <boost/test/unit_test.hpp>
#include <query-result-set.hh>
#include <query-result-writer.hh>
#include "tests/test_services.hh"
#include "tests/test-utils.hh"
#include "tests/mutation_assertions.hh"
#include "tests/result_set_assertions.hh"
#include "tests/mutation_source_test.hh"
#include "mutation_query.hh"
#include "core/do_with.hh"
@@ -525,3 +527,22 @@ SEASTAR_TEST_CASE(test_partition_limit) {
}
});
}
SEASTAR_THREAD_TEST_CASE(test_result_size_calculation) {
random_mutation_generator gen(random_mutation_generator::generate_counters::no);
std::vector<mutation> mutations = gen(1);
schema_ptr s = gen.schema();
mutation_source source = make_source(std::move(mutations));
query::result_memory_limiter l;
query::partition_slice slice = make_full_slice(*s);
slice.options.set<query::partition_slice::option::allow_short_read>();
query::result::builder digest_only_builder(slice, query::result_options{query::result_request::only_digest, query::digest_algorithm::xxHash}, l.new_digest_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), digest_only_builder).get0();
query::result::builder result_and_digest_builder(slice, query::result_options{query::result_request::result_and_digest, query::digest_algorithm::xxHash}, l.new_data_read(query::result_memory_limiter::maximum_result_size).get0());
data_query(s, source, query::full_partition_range, slice, std::numeric_limits<uint32_t>::max(), std::numeric_limits<uint32_t>::max(), gc_clock::now(), result_and_digest_builder).get0();
BOOST_REQUIRE_EQUAL(digest_only_builder.memory_accounter().used_memory(), result_and_digest_builder.memory_accounter().used_memory());
}

View File

@@ -659,6 +659,46 @@ void test_mutation_reader_fragments_have_monotonic_positions(populate_fn populat
});
}
static void test_date_tiered_clustering_slicing(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
simple_schema ss;
auto s = schema_builder(ss.schema())
.set_compaction_strategy(sstables::compaction_strategy_type::date_tiered)
.build();
auto pkey = ss.make_pkey();
mutation m1(s, pkey);
ss.add_static_row(m1, "s");
m1.partition().apply(ss.new_tombstone());
ss.add_row(m1, ss.make_ckey(0), "v1");
mutation_source ms = populate(s, {m1});
// query row outside the range of existing rows to exercise sstable clustering key filter
{
auto slice = partition_slice_builder(*s)
.with_range(ss.make_ckey_range(1, 2))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1, slice.row_ranges(*s, pkey.key()))
.produces_end_of_stream();
}
{
auto slice = partition_slice_builder(*s)
.with_range(query::clustering_range::make_singular(ss.make_ckey(0)))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1)
.produces_end_of_stream();
}
}
static void test_clustering_slices(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
auto s = schema_builder("ks", "cf")
@@ -1012,6 +1052,7 @@ void test_slicing_with_overlapping_range_tombstones(populate_fn populate) {
}
void run_mutation_reader_tests(populate_fn populate) {
test_date_tiered_clustering_slicing(populate);
test_fast_forwarding_across_partitions_to_empty_range(populate);
test_clustering_slices(populate);
test_mutation_reader_fragments_have_monotonic_positions(populate);

View File

@@ -30,6 +30,7 @@
#include <map>
#include <iostream>
#include <sstream>
#include <boost/range/algorithm/adjacent_find.hpp>
static logging::logger nlogger("NetworkTopologyStrategyLogger");
@@ -52,6 +53,26 @@ void print_natural_endpoints(double point, const std::vector<inet_address> v) {
nlogger.debug("{}", strm.str());
}
#ifndef DEBUG
static void verify_sorted(const dht::token_range_vector& trv) {
auto not_strictly_before = [] (const dht::token_range a, const dht::token_range b) {
return !b.start()
|| !a.end()
|| a.end()->value() > b.start()->value()
|| (a.end()->value() == b.start()->value() && a.end()->is_inclusive() && b.start()->is_inclusive());
};
BOOST_CHECK(boost::adjacent_find(trv, not_strictly_before) == trv.end());
}
#endif
static void check_ranges_are_sorted(abstract_replication_strategy* ars, gms::inet_address ep) {
// Too slow in debug mode
#ifndef DEBUG
verify_sorted(ars->get_ranges(ep));
verify_sorted(ars->get_primary_ranges(ep));
#endif
}
void strategy_sanity_check(
abstract_replication_strategy* ars_ptr,
const std::map<sstring, sstring>& options) {
@@ -150,6 +171,7 @@ void full_ring_check(const std::vector<ring_point>& ring_points,
auto endpoints2 = ars_ptr->get_natural_endpoints(t2);
endpoints_check(ars_ptr, endpoints2);
check_ranges_are_sorted(ars_ptr, rp.host);
BOOST_CHECK(cache_hit_count + 1 == ars_ptr->get_cache_hits_count());
BOOST_CHECK(endpoints1 == endpoints2);
}

View File

@@ -518,7 +518,7 @@ SEASTAR_THREAD_TEST_CASE(test_memory_based_cache_eviction) {
}, 24h);
size_t i = 0;
const auto entry = t.produce_first_page_and_save_querier(i);
const auto entry = t.produce_first_page_and_save_querier(i++);
const size_t queriers_needed_to_fill_cache = floor(querier_cache::max_queriers_memory_usage / entry.memory_usage);

View File

@@ -3012,11 +3012,13 @@ SEASTAR_TEST_CASE(test_concurrent_reads_and_eviction) {
slice, actual, ::join(",\n", possible_versions)));
}
}
}).finally([&, id] {
done = true;
});
});
int n_updates = 100;
while (!readers.available() && n_updates--) {
while (!done && n_updates--) {
auto m2 = gen();
m2.partition().make_fully_continuous();
@@ -3034,8 +3036,8 @@ SEASTAR_TEST_CASE(test_concurrent_reads_and_eviction) {
tracker.region().evict_some();
// Don't allow backlog to grow too much to avoid bad_alloc
const auto max_active_versions = 10;
while (versions.size() > max_active_versions) {
const auto max_active_versions = 7;
while (!done && versions.size() > max_active_versions) {
later().get();
}
}

View File

@@ -775,3 +775,7 @@ FUNC_START(__crc32_vpmsum)
FUNC_END(__crc32_vpmsum)
#endif
// Mark the stack as non-executable so the final executable won't
// have an executable stack
.section .note.GNU-stack,"",@progbits

View File

@@ -164,7 +164,7 @@ class unqualified_name {
public:
// can be optimized with string_views etc.
unqualified_name(const sstring& pkg_pfx, const sstring& name)
: _qname(name.compare(0, pkg_pfx.size(), pkg_pfx) == 0 ? name.substr(pkg_pfx.size() + 1) : name)
: _qname(name.compare(0, pkg_pfx.size(), pkg_pfx) == 0 ? name.substr(pkg_pfx.size()) : name)
{}
operator const sstring&() const {
return _qname;

View File

@@ -68,10 +68,11 @@ public:
// Starts a new phase and waits for all operations started in any of the earlier phases.
// It is fine to start multiple awaits in parallel.
// Strong exception guarantees.
future<> advance_and_await() {
auto new_gate = make_lw_shared<gate>();
++_phase;
auto old_gate = std::move(_gate);
_gate = make_lw_shared<gate>();
auto old_gate = std::exchange(_gate, std::move(new_gate));
return old_gate->close().then([old_gate, op = start()] {});
}