* 'cql-trivial-cleanup' of ssh://github.com/scylladb/scylla-dev:
cql: rename modification_statement::_sets_a_collection to _selects_a_collection
cql: rename _column_conditions to _regular_conditions
cql: remove unnecessary optional around prefetch_data
"
Use a fixed-size, rather than a dynamically growing
bitset for column mask. This avoids unnecessary memory
reallocation in the most common case.
"
* 'column_set' of ssh://github.com/scylladb/scylla-dev:
schema: pre-allocate the bitset of column_set
schema: introduce schema::all_columns_count()
schema: rename column_mask to column_set
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
- memtable_partition_writes - number of write operations performed
on partitions in memtables,
- memtable_partition_hits - number of write operations performed
on partitions that previously existed in a memtable,
- memtable_row_writes - number of row write operations performed
in memtables,
- memtable_row_hits - number of row write operations that overwrote
rows previously present in a memtable.
Tests: unit(release)
Merged patch series from Dejan Mircevski. Implements the "LT" and "GT"
operators of the Expected update option (i.e., conditional updates),
and enables the pre-existing tests for them.
Since it contains a precise set of columns, it's more
accurate to call it a set, not a mask. Besides, the name
column_mask is already used for column options on storage
level.
This is merely to avoid confusion: we use _sets prefix to indicate that
there are operations over static/regular columns (_sets_static_columns,
_sets_regular_columns), but _sets_a_collection is set for both operations
and conditions. So let's rename it to _selects_a_collection and add some
comments.
It's weird that modification_statement has _static_conditions for
conditions on static columns and _column_conditions for conditions on
regular columns, as if conditions on static columns are not column
conditions. Let's rename _column_conditions to _regular_conditions to
avoid confusion.
The --pkg option of install.sh was introduced for .deb packaging, since it
requires a different install directory for each subpackage.
But we are actually able to use "debian/tmp" as a shared install directory,
and then specify the file owner of the package using .install files.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191030203142.31743-1-syuu@scylladb.com>
This change adds a SCYLLA_REPO_URL argument to Dockerfile, which defines
the RPM repository used to install Scylla from.
When building a new Docker image, users can specify the argument by
passing the --build-arg SCYLLA_REPO_URL=<url> option to the docker build
command. If the argument is not specified, the same RPM repository is
used as before, retaining the old default behavior.
We intend to use this in release engineering infrastructure to specify
RPM repositories for nightly builds of release branches (for example,
3.1.x), which are currently only using the stable RPMs.
Code for check_LT(), check_GT(), etc. will be nearly identical, so
factor it out into a single function that takes a comparator object.
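A minimal sketch of the refactoring described above, in Python rather than the project's C++. The names check_compare, check_lt and check_gt, and the rule that a missing or differently typed value never matches, are assumptions made for illustration only; they are not taken from the patch.

```python
import operator


def check_compare(actual, expected, cmp):
    """Shared comparison logic; cmp is the comparator object (hypothetical helper)."""
    # Assumption: a missing attribute never satisfies an ordering condition.
    if actual is None:
        return False
    # Assumption: values of different types are not comparable.
    if type(actual) is not type(expected):
        return False
    return cmp(actual, expected)


def check_lt(actual, expected):
    # "LT" only supplies the comparator; the rest is shared.
    return check_compare(actual, expected, operator.lt)


def check_gt(actual, expected):
    # "GT" differs from "LT" in the comparator alone.
    return check_compare(actual, expected, operator.gt)
```

With this shape, adding another ordering operator would only require passing a different comparator to the shared function.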
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
In 1ca9dc5d47, it was established that the correct way to
base64-decode a JSON value is via string_view, rather than directly
from GetString().
This patch adds a base64_decode(rjson::value) overload, which
automatically uses the correct procedure. It saves typing, ensures
correctness (fixing one incorrect call found), and will come in handy
for future EXPECTED comparisons.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
unwrap_number() is now a public function in serialization.hh instead
of a static function visible only in executor.cc.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Merged patch series from Piotr Sarna:
An otherwise empty partition can still have a valid static column.
Filtering didn't take that fact into account and only filtered
full-fledged rows, which may result in non-matching rows being returned
to the client.
Fixes #5248
"type" label is already in use for the counter type ("derive", "gauge",
etc). Using the same label for "cas" / "non-cas" overwrites it. Let's
instead call the new label "conditional" and use "yes" / "no" for its
value, as suggested by Kostja.
Message-Id: <3082b16e4d6797f064d58da95fb4e50b59ab795c.1572451480.git.vdavydov@scylladb.com>
"
When a single reader contributes a stream of fragments and keeps winning over other readers, mutation_reader_merger will enter gallop mode, in which it is assumed that the reader will keep winning over other readers. Currently, a reader needs to contribute 3 fragments to enter that mode.
In gallop mode, fragments returned by the galloping reader will be compared with the best fragment from _fragment_heap. If it wins, the fragment is directly returned. Otherwise, gallop mode ends and merging is performed as in the general case, which involves heap operations.
In the current implementation, when the end of partition is encountered while in gallop mode, gallop mode is ended unconditionally.
A microbenchmark was added in order to test performance of the galloping reader optimization. A combining reader that merges results from four other readers is created. Each sub-reader provides a range of 32 clustering rows that is disjoint from others. All sub-readers return rows from the same partition. An improvement can be observed after introducing the galloping reader optimization.
As for other benchmarks from the "combined" group, results are pretty close to the old ones. The only one that seems to have suffered slightly is combined.many_overlapping.
Median times from a single run of perf_mutation_readers.combined: (1s run duration, 5 runs per benchmark, release mode)
test name                             before      after       improvement
one_row                               49.070ns    48.287ns     1.60%
single_active                         61.574us    61.235us     0.55%
many_overlapping                      488.193us   514.977us   -5.49%
disjoint_interleaved                  57.462us    57.111us     0.61%
disjoint_ranges                       56.545us    56.006us     0.95%
overlapping_partitions_disjoint_rows  127.039us   80.849us    36.36%
Same results, normalized per mutation fragment:
test name                             before      after       improvement
one_row                               16.36ns     16.10ns      1.60%
single_active                         109.46ns    108.86ns     0.55%
many_overlapping                      216.97ns    228.88ns    -5.49%
disjoint_interleaved                  102.15ns    101.53ns     0.61%
disjoint_ranges                       100.52ns    99.57ns      0.95%
overlapping_partitions_disjoint_rows  246.38ns    156.80ns    36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
Tests: unit(release)
Fixes #3593.
"
* '3593-combined_reader-gallop-mode' of https://github.com/piodul/scylla:
mutation_reader: gallop mode microbenchmark
mutation_reader: combined reader gallop tests
mutation_reader: gallop mode for combined reader
mutation_reader: refactor prepare_next
Update the previous results dictionary using the update_metrics method.
It calls metric_source.query_list to get a list of results (similar to discover()), then updates the results dictionary for each line in the response.
New results may be appended depending on the do_append parameter (True by default).
Previously, with Prometheus, each metric.update called query_list, resulting in O(n^2) work when all metrics were updated, as in the scylla_top dtest, which caused a test timeout when testing a debug build.
(E.g. dtest-debug/216/testReport/scyllatop_test/TestScyllaTop/default_start_test/)
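The idea can be sketched as follows. Only update_metrics, query_list and do_append come from the description above; the FakeSource class, the (name, value) line format and the dictionary layout are assumptions made so the example is self-contained, and the real code may differ.

```python
class FakeSource:
    """Hypothetical stand-in for a metric source; counts how often it is queried."""

    def __init__(self, lines):
        self.lines = lines
        self.calls = 0

    def query_list(self):
        self.calls += 1
        return list(self.lines)


def update_metrics(metric_source, results, do_append=True):
    """Refresh all known metrics with a single query_list call."""
    for name, value in metric_source.query_list():
        # Existing entries are always refreshed; new ones only if do_append is set.
        if name in results or do_append:
            results[name] = value
    return results


source = FakeSource([("reads", 10), ("writes", 4), ("cache_hits", 7)])
results = {"reads": 0, "writes": 0}
update_metrics(source, results, do_append=False)
```

Here one query refreshes every metric, instead of one query per metric, which is what made the full update quadratic.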
This patch adds "type" label to the following CQL metrics:
inserts
updates
deletes
batches
statements_in_batches
The label is set to "cas" for conditional statements and "non-cas" for
unconditional statements.
Note, for a batch to be accounted as CAS, it is enough to have just one
conditional statement. In this case all statements within the batch are
accounted as CAS as well.
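The batch rule can be illustrated with a small sketch. The statement representation (a dict with a "conditional" key) and the helper names are hypothetical; only the metric names and the "cas" / "non-cas" label values come from the description above.

```python
from collections import Counter


def batch_label(statements):
    # One conditional statement is enough to account the whole batch as CAS.
    return "cas" if any(s["conditional"] for s in statements) else "non-cas"


def account_batch(counters, statements):
    label = batch_label(statements)
    counters[("batches", label)] += 1
    # All statements in the batch share the batch's label.
    counters[("statements_in_batches", label)] += len(statements)


counters = Counter()
account_batch(counters, [{"conditional": False}, {"conditional": True}])
account_batch(counters, [{"conditional": False}])
```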
This microbenchmark tests performance of the galloping reader
optimization. A combining reader that merges results from four other
readers is created. Each sub-reader provides a range of 32 clustering
rows that is disjoint from others. All sub-readers return rows from
the same partition. An improvement can be observed after introducing the
galloping reader optimization.
As for other benchmarks from the "combined" group, results are pretty
close to the old ones. The only one that seems to have suffered slightly
is combined.many_overlapping.
Median times from a single run of perf_mutation_readers.combined:
(1s run duration, 5 runs per benchmark, release mode)
test name                             before      after       improvement
one_row                               49.070ns    48.287ns     1.60%
single_active                         61.574us    61.235us     0.55%
many_overlapping                      488.193us   514.977us   -5.49%
disjoint_interleaved                  57.462us    57.111us     0.61%
disjoint_ranges                       56.545us    56.006us     0.95%
overlapping_partitions_disjoint_rows  127.039us   80.849us    36.36%
Same results, normalized per mutation fragment:
test name                             before      after       improvement
one_row                               16.36ns     16.10ns      1.60%
single_active                         109.46ns    108.86ns     0.55%
many_overlapping                      216.97ns    228.88ns    -5.49%
disjoint_interleaved                  102.15ns    101.53ns     0.61%
disjoint_ranges                       100.52ns    99.57ns      0.95%
overlapping_partitions_disjoint_rows  246.38ns    156.80ns    36.36%
Tested on AMD Ryzen Threadripper 2950X @ 3.5GHz.
When a single reader contributes a stream of fragments
and keeps winning over other readers, mutation_reader_merger will
enter gallop mode, in which it is assumed that the reader will keep
winning over other readers. Currently, a reader needs to contribute
3 fragments to enter that mode.
In gallop mode, fragments returned by the galloping reader will be
compared with the best fragment from _fragment_heap. If it wins, the
fragment is directly returned. Otherwise, gallop mode ends and
merging is performed as in the general case, which involves heap operations.
In the current implementation, when the end of partition is encountered
while in gallop mode, gallop mode is ended unconditionally.
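The following is a simplified sketch of the technique in Python, not the actual mutation_reader_merger code. It merges plain sorted sequences and ignores partitions entirely, so the end-of-partition rule is not modelled. The names merge_with_gallop and GALLOP_THRESHOLD are made up for the example; the threshold of 3 follows the description above.

```python
import heapq

GALLOP_THRESHOLD = 3  # consecutive wins needed to enter gallop mode
_END = object()       # sentinel marking an exhausted reader


def merge_with_gallop(readers):
    """Merge sorted iterables, skipping heap push/pop while one reader keeps winning."""
    iters = [iter(r) for r in readers]
    heap = []
    for idx, it in enumerate(iters):
        head = next(it, _END)
        if head is not _END:
            heap.append((head, idx))
    heapq.heapify(heap)

    merged = []
    last_winner, streak = None, 0
    while heap:
        value, idx = heapq.heappop(heap)
        merged.append(value)
        streak = streak + 1 if idx == last_winner else 1
        last_winner = idx

        head = next(iters[idx], _END)
        # Gallop: compare only against the best heap entry; no heap updates.
        while (head is not _END and streak >= GALLOP_THRESHOLD
               and (not heap or (head, idx) <= heap[0])):
            merged.append(head)
            head = next(iters[idx], _END)
        # The galloping reader lost (or never galloped): back to the general case.
        if head is not _END:
            heapq.heappush(heap, (head, idx))
    return merged
```

For inputs such as disjoint ranges, most elements are emitted from the inner loop, which costs a single comparison each rather than a heap pop and push.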
Fixes #3593.
Move the logic responsible for adding readers at a partition boundary
out into `maybe_add_readers_at_partition_boundary`, and the logic for
advancing one reader into `prepare_one`. This allows reusing this logic
outside `prepare_next`.
Since seastar::streams are based on future/promise, variadic streams
suffer the same fate as variadic futures - deprecation and eventual
removal.
This patch therefore replaces a variadic stream in commitlog::read_log_file()
with a non-variadic stream, via a helper struct.
Tests: unit (dev)
Recently, scylla memory started to go beyond just providing raw stats
about the occupancy of the various memory pools, and now also
provides an overview of the "usual suspects" that cause memory pressure.
As part of this, 46341bd63f recently
added a section with coordinator stats. This patch continues this
trend and adds a replica section, with the "usual suspects":
* read concurrency semaphores
* execution stages
* read/write operations
Example:
Replica:
Read Concurrency Semaphores:
user sstable reads: 0/100, remaining mem: 84347453 B, queued: 0
streaming sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0
system sstable reads: 0/ 10, remaining mem: 84347453 B, queued: 0
Execution Stages:
data query stage:
03 "service_level_sg_0" 4967
Total 4967
mutation query stage:
Total 0
apply stage:
03 "service_level_sg_0" 12608
06 "statement" 3509
Total 16117
Tables - Ongoing Operations:
pending writes phaser (top 10):
2 ks.table1
2 Total (all)
pending reads phaser (top 10):
3380 ks.table2
898 ks.table1
410 ks.table3
262 ks.table4
17 ks.table8
2 system_auth.roles
4969 Total (all)
pending streams phaser (top 10):
0 Total (all)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029164817.99865-1-bdenes@scylladb.com>
This patch adds the following per table stats:
cas_prepare_latency
cas_propose_latency
cas_commit_latency
They are equivalent to the CasPrepare, CasPropose and CasCommit metrics
exposed by Cassandra.
This patch implements accounting of Cassandra's metrics related to
lightweight transactions, namely:
cas_read_latency transactional read latency (histogram)
cas_write_latency transactional write latency (histogram)
cas_read_timeouts number of transactional read timeouts
cas_write_timeouts number of transactional write timeouts
cas_read_unavailable number of transactional read
unavailable errors
cas_write_unavailable number of transactional write
unavailable errors
cas_read_unfinished_commit number of transaction commit attempts
that occurred on read
cas_write_unfinished_commit number of transaction commit attempts
that occurred on write
cas_write_condition_not_met number of transaction preconditions
that did not match current values
cas_read_contention how many contended reads were
encountered (histogram)
cas_write_contention how many contended writes were
encountered (histogram)
Pass contention by reference to begin_and_repair_paxos(), where it is
incremented on every sleep. Rationale: we want to account the total
number of times query() / cas() had to sleep, either directly or within
begin_and_repair_paxos(), no matter if the function failed or succeeded.
Even though every Scylla version has its own scylla-gdb.py, because we
don't backport any fixes or improvements, practically we end up always
using master's version when debugging older versions of Scylla too. This
is made harder by the fact that both Scylla's and its dependencies'
(most notably that of libstdc++ and boost) code is constantly changing
between releases, requiring edits to scylla-gdb.py to make it usable
with past releases.
This patch attempts to make it easier to use scylla-gdb.py with past
releases, more specifically Scylla 3.0. This is achieved by wrapping
problematic lines in a `try: except:` and putting the backward
compatible version in the `except:` clause. These lines have comments
with the version they provide support for, so they can be removed when
said version is not supported anymore.
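The pattern looks roughly like the sketch below. It does not use the gdb module, so it can run anywhere; the field names size_v2 and size and the helper read_size are hypothetical and only stand in for a member that was renamed between releases. The sketch catches AttributeError specifically, whereas the patch description mentions a bare `except:`.

```python
from types import SimpleNamespace


def read_size(container):
    try:
        # Current layout (hypothetical field name).
        return container.size_v2
    except AttributeError:
        # 3.0 compatibility (hypothetical older field name).
        return container.size


current = SimpleNamespace(size_v2=8)
legacy = SimpleNamespace(size=5)
```

Both layouts are handled by the same call, and the versioned comment marks the fallback for later removal.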
I did not attempt to provide full coverage, I only fixed up problems
that surfaced when using my favourite commands with 3.0.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029155737.94456-1-bdenes@scylladb.com>
The loop that collects the results of the checksum calculations also logs
any errors. The error logging includes `checksums[0]`, which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However this is
only true when the checksum calculation is successful and is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of a failed checksum calculation,
the result of `checksums[0]` is extracted only once.
Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
* seastar 2963970f6b...75e189c6ba (7):
> posix-stack: Do auto-resolve of ipv6 scope iff not set for link-local dests
> README.md: Add redpanda and smf to 'Projects using Seastar'
> unix_domain_test: don't assume that at temporary_buffer is null terminated
> socket_address: Use offsetof instead of null pointer
> README: add projects using seastar section to readme
> Adjustments for glibc 2.30 and hwloc 2.0
> Mark future::failed() as const
We may want to change the paxos tables format and the internode protocol,
so hide LWT behind an experimental flag for now.
Message-Id: <20191029102725.GM2866@scylladb.com>
Currently end of stream validation is done in the destructor,
but the validator may be destructed prematurely, e.g. on
exception, as seen in https://github.com/scylladb/scylla/issues/5215
This patch adds an on_end_of_stream() method that is explicitly called by
consume_pausable_in_thread. Also, the respective concepts for
PartitionFilter, MutationFragmentFilter and a new one for the
on_end_of_stream method were unified as FlattenedConsumerFilter.
Refs #5215
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)