Commit Graph

1771 Commits

Author SHA1 Message Date
Avi Kivity
1572b9e41c Merge 'transport: Added listener with port-based load balancing' from Juliusz
"
This is inspired by #6781. The idea is to make Scylla listen for CQL connections on port 9042 (where both old shard-aware and shard-unaware clients can still connect the traditional way). On top of that I added a new port, where everything works the same way, only the port from client's socket used to determine the shard No. to connect to. Desired shard No. is the result of `clientside_port % num_shards`.

The new port is configurable from scylla.yaml and defaults to 19042 (unencrypted, unless user configures encryption options and omits `native_shard_aware_transport_port_ssl` in DB config).

Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and "SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility, "SCYLLA_SHARDING_ALGORITHM" is still kept.

Fixes #5239
"

* jul-stas-shard-aware-listener:
  docs: Info about shard-aware listeners in protocol-extensions
  transport: Added listener with port-based load balancing
2020-08-03 19:23:28 +03:00
Calle Wilund
30a700c5b0 system_keyspace: Remove support for legacy truncation records
Fixes #6341

Since scylla no longer supports upgrading from a version without the
"new" (dedicated) truncation record table, we can remove support for these
and the migtration thereof.

Make sure the above holds whereever this is committed.

Note that this does not  remove the "truncated_at" field in
system.local.
2020-08-03 17:16:26 +03:00
Eliran Sinvani
779502ab11 Revert "schema: take into account features when converting a table creation to"
This reverts commit b97f466438.

It turns out that the schema mechanism has a lot of nuances,
after this change, for unknown reason, it was empirically
proven that the amount of cross shard on an upgraded node was
increased significantly with a steady stress traffic, if
was so significant that the node appeared unavailable to
the coordinators because all of the requests started to fail
on smp_srvice_group semaphore.

This revert will bring back a caveat in Scylla, the caveat is
that creating a table in a mixed cluster **might** under certain
condition cause schema mismatch on the newly created table, this
make the table essentially unusable until the whole cluster has
a uniform version (rolling upgrade or rollback completion).

Fixes #6893.
2020-08-03 12:51:16 +03:00
Avi Kivity
257c17a87a Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael
"
While working on another patch I was getting odd compiler errors
saying that a call to ::make_shared was ambiguous. The reason was that
seastar has both:

template <typename T, typename... A>
shared_ptr<T> make_shared(A&&... a);

template <typename T>
shared_ptr<T> make_shared(T&& a);

The second variant doesn't exist in std::make_shared.

This series drops the dependency in scylla, so that a future change
can make seastar::make_shared a bit more like std::make_shared.
"

* 'espindola/make_shared' of https://github.com/espindola/scylla:
  Everywhere: Explicitly instantiate make_lw_shared
  Everywhere: Add a make_shared_schema helper
  Everywhere: Explicitly instantiate make_shared
  cql3: Add a create_multi_column_relation helper
  main: Return a shared_ptr from defer_verbose_shutdown
2020-08-02 19:51:24 +03:00
Juliusz Stasiewicz
1c11d8f4c4 transport: Added listener with port-based load balancing
The new port is configurable from scylla.yaml and defaults to 19042
(unencrypted, unless client configures encryption options and omits
`native_shard_aware_transport_port_ssl`).

Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and
"SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility,
"SCYLLA_SHARDING_ALGORITHM" is still kept.

Fixes #5239
2020-07-31 13:02:13 +02:00
Tomasz Grabiec
3486eba1ce commitlog: Fix use-after-free on mutation object during replay
The mutation object may be freed prematurely during commitlog replay
in the schema upgrading path. We will hit the problem if the memtable
is full and apply_in_memory() needs to defer.

This will typically manifest as a segfault.

Fixes #6953

Introduced in 79935df

Tests:
  - manual using scylla binary. Reproduced the problem then verified the fix makes it go away

Message-Id: <1596044010-27296-1-git-send-email-tgrabiec@scylladb.com>
2020-07-29 20:58:15 +03:00
Avi Kivity
fea5067dfa Merge "Limit non-paged query memory consumption" from Botond
"
Non-paged queries completely ignore the query result size limiter
mechanism. They consume all the memory they want. With sufficiently
large datasets this can easily lead to a handful or even a single
unpaged query producing an OOM.

This series continues the work started by 134d5a5f7, by introducing a
configurable pair of soft/hard limit (default to 1MB/100MB) that is
applied to otherwise unlimited queries, like reverse and unpaged ones.
When an unlimited query reaches the soft limit a warning is logged. This
should give users some heads-up to adjust their application. When the
hard limit is reached the query is aborted. The idea is to not greet
users with failing queries after an upgrade while at the same time
protect the database from the really bad queries. The hard limit should
be decreased from time to time gradually approaching the desired goal of
1MB.

We don't want to limit internal queries, we trust ourselves to either
use another form of memory usage control, or read only small datasets.
So the limit is selected according to the query class. User reads use
the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration
items, while internal reads are not limited. The limit is obtained by
the coordinator, who passes it down to replicas using the existing
`max_result_size` parameter (which is not a special type containing the
two limits), which is now passed on every verb, instead of once per
connection. This ensures that all replicas work with the same limits.
For normal paged queries `max_result_size` is set to the usual
`query::result_memory_limiter::maximum_result_size` For queries that can
consume unlimited amount of memory -- unpaged and reverse queries --
this is set to the value of the aforementioned
`max_memory_for_unlimited_query_{soft,hard}_limit` configuration item,
but only for user reads, internal reads are not limited.

This has the side-effect that reverse reads now send entire
partitions in a single page, but this is not that bad. The data was
already read, and its size was below the limit, the replica might as well
send it all.

Fixes: #5870
"

* 'nonpaged-query-limit/v5' of https://github.com/denesb/scylla: (26 commits)
  test: database_test: add test for enforced max result limit
  mutation_partition: abort read when hard limit is exceeded for non-paged reads
  query-result.hh: move the definition of short_read to the top
  test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit
  test: set the allow_short_read slice option for paged queries
  partition_slice_builder: add with_option()
  result_memory_accounter: remove default constructor
  query_*(): use the coordinator specified memory limit for unlimited queries
  storage_proxy: use read_command::max_result_size to pass max result size around
  query: result_memory_limiter: use the new max_result_size type
  query: read_command: add max_result_size
  query: read_command: use tagged ints for limit ctor params
  query: read_command: add separate convenience constructor
  service: query_pager: set the allow_short_read flag
  result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit
  storage_proxy: add get_max_result_size()
  result_memory_limiter: add unlimited_result_size constant
  database: add get_statement_scheduling_group()
  database: query_mutations(): obtain the memory accounter inside
  query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field
  ...
2020-07-29 13:41:53 +03:00
Botond Dénes
9eab5bca27 query_*(): use the coordinator specified memory limit for unlimited queries
It is important that all replicas participating in a read use the same
memory limits to avoid artificial differences due to different amount of
results. The coordinator now passes down its own memory limit for reads,
in the form of max_result_size (or max_size). For unpaged or reverse
queries this has to be used now instead of the locally set
max_memory_unlimited_query configuration item.

To avoid the replicas accidentally using the local limit contained in
the `query_class_config` returned from
`database::make_query_class_config()`, we refactor the latter into
`database::get_reader_concurrency_semaphore()`. Most of its callers were
only interested in the semaphore only anyway and those that were
interested in the limit as well should get it from the coordinator
instead, so this refactoring is a win-win.
2020-07-28 18:00:29 +03:00
Botond Dénes
159d37053d storage_proxy: use read_command::max_result_size to pass max result size around
Use the recently added `max_result_size` field of `query::read_command`
to pass the max result size around, including passing it to remote
nodes. This means that the max result size will be sent along each read,
instead of once per connection.
As we want to select the appropriate `max_result_size` based on the type
of the query as well as based on the query class (user or internal) the
previous method won't do anymore. If the remote doesn't fill this
field, the old per-connection value is used.
2020-07-28 18:00:29 +03:00
Botond Dénes
92a7b16cba query: read_command: add max_result_size
This field will replace max size which is currently passed once per
established rpc connection via the CLIENT_ID verb and stored as an
auxiliary value on the client_info. For now it is unused, but we update
all sites creating a read command to pass the correct value to it. In the
next patch we will phase out the old max size and use this field to pass
max size on each verb instead.
2020-07-28 18:00:29 +03:00
Botond Dénes
8992bcd1f8 query: read_command: use tagged ints for limit ctor params
The convenience constructor of read_command now has two integer
parameter next to each other. In the next patch we intend to add another
one. This is recipe for disaster, so to avoid mistakes this patch
converts these parameters to tagged integers. This makes sure callers
pass what they meant to pass. As a matter of fact, while fixing up
call-sites, I already found several ones passing `query::max_partitions`
to the `row_limit` parameter. No harm done yet, as
`query::max_partitions` == `query::max_rows` but this shows just how
easy it is to mix up parameters with the same type.
2020-07-28 18:00:29 +03:00
Botond Dénes
46d5b651eb db/config: introduce max_memory_for_unlimited_query_soft_limit and max_memory_for_unlimited_query_hard_limit
This pair of limits replace the old max_memory_for_unlimited_query one,
which remains as an alias to the hard limit. The soft limit inherits the
previous value of the limit (1MB), when this limit is reached a warning
will be logged allowing the users to adjust their client codes without
downtime. The hard limit starts out with a more permissive default of
100MB. When this is reached queries are aborted, the same behaviour as
with the previous single limit.

The idea is to allow clients a grace period for fixing their code, while
at the same time protecting the database from the really bad queries.
2020-07-28 18:00:29 +03:00
Botond Dénes
9faaf46d4b utils: config_src::add_command_line_options(): drop name and desc args
Now that there are no ad-hoc aliases needing to overwrite the name and
description parameter of this method, we can drop these and have each
config item just use `name()` and `desc()` to access these.
2020-07-28 18:00:29 +03:00
Botond Dénes
dc23736d0c db/config: replace ad-hoc aliases with alias mechanism
We already uses aliases for some configuration items, although these are
created with an ad-hoc mechanism that only registers them on the command
line. Replace this with the built-in alias mechanism in the previous
patch, which has the benefit of conflict resolution and also working
with YAML.
2020-07-28 18:00:29 +03:00
Piotr Sarna
ee35c4c3d6 db: handle errors when loading view build progress
Currently, encountering an error when loading view build progress
would result in view builder refusing to start - which also means
that future views would not be built until the server restarts.
A more user-friendly solution would be to log an error message,
but continue to boot the view builder as if no views are currently
in progress, which would at least allow future views to be built
correctly.
The test case is also amended, since now it expects the call
to return that "no view builds are in progress" instead of
an exception.

Fixes #6934
Tests: unit(dev)
Message-Id: <9f26de941d10e6654883a919fd43426066cee89c.1595922374.git.sarna@scylladb.com>
2020-07-28 11:32:09 +03:00
Nadav Har'El
f488eaebaf merge: db/view: view_update_generator: make staging reader evictable
Merged patch set by Botond Dénes:

The view update generation process creates two readers. One is used to
read the staging sstables, the data which needs view updates to be
generated for, and another reader for each processed mutation, which
reads the current value (pre-image) of each row in said mutation. The

staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader however *is* restricted. This creates a situation, where the
staging reader possibly consumes all resources from the semaphore,
leaving none for the later created pre-image reader, which will not be
able to start reading. This will block the view building process meaning
that the staging reader will not be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.

  test/boost: view_build_test: add test_view_update_generator_buffering
  test/boost: view_build_test: add test test_view_update_generator_deadlock
  reader_permit: reader_resources: add operator- and operator+
  reader_concurrency_semaphore: add initial_resources()
  test: cql_test_env: allow overriding database_config
  mutation_reader: expose new_reader_base_cost
  db/view: view_updating_consumer: allow passing custom update pusher
  db/view: view_update_generator: make staging reader evictable
  db/view: view_updating_consumer: move implementation from table.cc to view.cc
  database: add make_restricted_range_sstable_reader()

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
---
 db/view/view_updating_consumer.hh | 51 ++++++++++++++++++++++++++++---
 db/view/view.cc                   | 39 +++++++++++++++++------
 db/view/view_update_generator.cc  | 19 +++++++++---
 3 files changed, 91 insertions(+), 18 deletions(-)
2020-07-27 09:19:37 +02:00
Avi Kivity
39db54a758 Merge "Use seastar::with_file_close_on_failure in commitlog" from Benny
"
`close_on_failure` was committed to seastar so use
the library version.

This requires making the lambda function passed to
it nothrow move constructible, so this series also
makes db::commitlog::descriptor move constructor noexcept
and changes allocate_segment_ex and segment::segment
to get a descriptor by value rather than by reference.

Test: unit(dev), commitlog_test(debug)
"

* tag 'commit-log-use-with_file_close_on_failure-v1' of github.com:bhalevy/scylla:
  commitlog: use seastar::with_file_close_on_failure
  commitlog: descriptor: make nothrow move constructible
  commitlog: allocate_segment_ex, segment: pass descriptor by value
  commitlog: allocate_segment_ex: filename capture is unused
2020-07-23 19:23:23 +03:00
Rafael Ávila de Espíndola
e15c8ee667 Everywhere: Explicitly instantiate make_lw_shared
seastar::make_lw_shared has a constructor taking a T&&. There is no
such constructor in std::make_shared:

https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared

This means that we have to move from

    make_lw_shared(T(...)

to

    make_lw_shared<T>(...)

If we don't want to depend on the idiosyncrasies of
seastar::make_lw_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Rafael Ávila de Espíndola
efeaded427 Everywhere: Add a make_shared_schema helper
This replaces a lot of make_lw_shared(schema(...)) with
make_shared_schema(...).

This makes it easier to drop a dependency on the differences between
seastar::make_shared and std::make_shared.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-07-21 10:33:49 -07:00
Botond Dénes
566e31a5ac db/view: view_updating_consumer: allow passing custom update pusher
So that tests can test the `view_update_consumer` in isolation, without
having to set up the whole database machinery. In addition to less
infrastructure setup, this allows more direct checking of mutations
pushed for view generation.
2020-07-20 11:23:39 +03:00
Botond Dénes
0166f97096 db/view: view_update_generator: make staging reader evictable
The view update generation process creates two readers. One is used to
read the staging sstables, the data which needs view updates to be
generated for, and another reader for each processed mutation, which
reads the current value (pre-image) of each row in said mutation. The
staging reader is created first and is kept alive until all staging data
is processed. The pre-image reader is created separately for each
processed mutation. The staging reader is not restricted, meaning it
does not wait for admission on the relevant reader concurrency
semaphore, but it does register its resource usage on it. The pre-image
reader however *is* restricted. This creates a situation, where the
staging reader possibly consumes all resources from the semaphore,
leaving none for the later created pre-image reader, which will not be
able to start reading. This will block the view building process meaning
that the staging reader will not be destroyed, causing a deadlock.

This patch solves this by making the staging reader restricted and
making it evictable. To prevent thrashing -- evicting the staging reader
after reading only a really small partition -- we only make the staging
reader evictable after we have read at least 1MB worth of data from it.
2020-07-20 11:23:39 +03:00
Botond Dénes
84357f0722 db/view: view_updating_consumer: move implementation from table.cc to view.cc
table.cc is a very counter-intuitive place for view related stuff,
especially if the declarations reside in `db/view/`.
2020-07-20 11:23:39 +03:00
Avi Kivity
5371be71e9 Merge "Reduce fanout of some mutation-related headers" from Pavel E
"
The set's goal is to reduce the indirect fanout of 3 headers only,
but likely affects more. The measured improvement rates are

flat_mutation_reader.hh: -80%
mutation.hh            : -70%
mutation_partition.hh  : -20%

tests: dev-build, 'checkheaders' for changed headers (the tree-wide
       fails on master)
"

* 'br-debloat-mutation-headers' of https://github.com/xemul/scylla:
  headers:: Remove flat_mutation_reader.hh from several other headers
  migration_manager: Remove db/schema_tables.hh inclustion into header
  storage_proxy: Remove frozen_mutation.hh inclustion
  storage_proxy: Move paxos/*.hh inclusions from .hh to .cc
  storage_proxy: Move hint_wrapper from .hh to .cc
  headers: Remove mutation.hh from trace_state.hh
2020-07-19 19:47:59 +03:00
Eliran Sinvani
b97f466438 schema: take into account features when converting a table creation to
schema_mutations

When upgrading from a version that lacks some schema features,
during the transition, when we have a mixed cluster. Schema digests
are calculated without taking into account the mixed cluster supported
features. Every node calculate the digest as if the whole cluster supports
its supported features.
Scylla already has a mechanism of redaction to the lowest common
denominator, but it haven't been used in this context.

This commit is using the redaction mechanism when calculating the digest on
the newly added table so it will match the supported features of the
whole cluster.

Tests: Manual upgrading - upgraded to a version with an additional
feature and additional schema column and validated that the digest
of the tables schema is identical on every node on the mixed cluster.
2020-07-19 10:30:51 +03:00
Pavel Emelyanov
92f58f62f2 headers:: Remove flat_mutation_reader.hh from several other headers
All they can live with forward declaration of the f._m._r. plus a
seastar header in commitlog code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:47 +03:00
Pavel Emelyanov
8618a02815 migration_manager: Remove db/schema_tables.hh inclustion into header
The schema_tables.hh -> migration_manager.hh couple seems to work as one
of "single header for everyhing" creating big blot for many seemingly
unrelated .hh's.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:43 +03:00
Benny Halevy
3ab1d9fe1d commitlog: use seastar::with_file_close_on_failure
`close_on_failure` was committed to seastar so use the
library version.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:32:32 +03:00
Benny Halevy
742298fa2a commitlog: descriptor: make nothrow move constructible
inherit from sstring nothrow move constructor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:32:22 +03:00
Benny Halevy
54c5583b8d commitlog: allocate_segment_ex, segment: pass descriptor by value
Besdies being more robust than passing const descriptor&
to continuations, this helps simplify making allocate_segment_ex's
continuations nothrow_move_constructible, that is need for using
seastar::with_file_close_on_failure().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:31:12 +03:00
Benny Halevy
22c384c2e9 commitlog: allocate_segment_ex: filename capture is unused
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 20:23:57 +03:00
Pavel Solodovnikov
5ff5df1afd storage_proxy: un-hardcode force sync flag for mutate_locally(mutation) overload
Corresponding overload of `storage_proxy::mutate_locally`
was hardcoded to pass `db::commitlog::force_sync::no` to the
`database::apply`. Unhardcode it and substitute `force_sync::no`
to all existing call sites (as it were before).

`force_sync::yes` will be used later for paxos learn writes
when trying to apply mutations upgraded from an obsolete
schema version (similar to the current case when applying
locally a `frozen_mutation` stored in accepted proposal).

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200716124915.464789-1-pa.solodovnikov@scylladb.com>
2020-07-16 16:38:48 +03:00
Avi Kivity
0c7c255f94 Merge "compaction uuid for log and compaction_history" from Benny
"
We'd like to use the same uuid both for printing compaction log
messages and to update compaction_history.

Generate one when starting compaction and keep it in
compaction_info.  Then use it by convention in all
compaction log messages, along with compaction type,
and keyspace.table information.  Finally, use the
same uuid to update compaction_history.

Fixes #6840
"

* tag 'compaction-uuid-v1' of github.com:bhalevy/scylla:
  compaction: print uuid in log messages
  compaction: report_(start|finish): just return description
  compaction: move compaction uuid generation to compaction_info
2020-07-16 16:38:48 +03:00
Benny Halevy
e39fbe1849 compaction: move compaction uuid generation to compaction_info
We'd like to use the same uuid both for printing compaction log
messages and to update compaction_history.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-07-16 13:55:23 +03:00
Calle Wilund
811b531e2d db::config: Add option to set streams confidence window
Option to control the alternator streams CDC query/shard range time
confidence interval, i.e. the period we enforce as timestamp threshold
when reading. The default, 10s, should be sufficient on a normal
cluster, but across DCs:, or with client timestamps or whatever, one might
need a larger window.
2020-07-15 08:10:23 +00:00
Calle Wilund
a9641d4f02 system_distributed_keyspace: Add cdc topology/stream ids reader
To read the full topology (with expired and expirations etc)
from within.
2020-07-15 08:10:23 +00:00
Amnon Heiman
ea8d52b11c row_locking: change estimated histogram with time_estimated_histogram
This patch changes the row locking latencies to use
time_estimated_histogram.

The change consist of changing the histogram definition and changing how
values are inserted to the histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Benny Halevy
d4615f4293 sstables: sstable_version_types: implement operator<=>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200707061715.578604-1-bhalevy@scylladb.com>
2020-07-08 14:23:11 +03:00
Avi Kivity
b0698dfb38 Merge 'Rewrite CQL3 restriction representation' from dekimir
"
This is the first stage of replacing the existing restrictions code with a new representation. It adds a new class `expression` to replace the existing class `restriction`. Lots of the old code is deleted, though not all -- that will come in subsequent stages.

Tests: unit (dev, debug restrictions_test), dtest (next-gating)
"

* dekimir-restrictions-rewrite:
  cql3/restrictions: Drop dead code
  cql3/restrictions: Use free functions instead of methods
  cql3/restrictions: Create expression objects
  cql3/restrictions: Add free functions over new classes
  cql3/restrictions: Add new representation
2020-07-08 10:22:17 +03:00
Dejan Mircevski
37ebe521e3 cql3/restrictions: Use free functions instead of methods
Instead of `restriction` class methods, use the new free functions.
Specific replacement actions are listed below.

Note that class `restrictions` (plural) remains intact -- both its
methods and its type hierarchy remain intact for now.

Ensure full test coverage of the replacement code with new file
test/boost/restrictions_test.cc and some extra testcases in
test/cql/*.

Drop some existing tests because they codify buggy behaviour
(reference #6369, #6382).  Drop others because they forbid relation
combinations that are now allowed (eg, mixing equality and
inequality, comparing to NULL, etc.).

Here are some specific categories of what was replaced:

- restriction::is_foo predicates are replaced by using the free
  function find_if; sometimes it is used transitively (see, eg,
  has_slice)

- restriction::is_multi_column is replaced by dynamic casts (recall
  that the `restrictions` class hierarchy still exists)

- utility methods is_satisfied_by, is_supported_by, to_string, and
  uses_function are replaced by eponymous free functions; note that
  restrictions::uses_function still exists

- restriction::apply_to is replaced by free function
  replace_column_def

- when checking infinite_bound_range_deletions, the has_bound is
  replaced by local free function bounded_ck

- restriction::bounds and restriction::value are replaced by the more
  general free function possible_lhs_values

- using free functions allows us to simplify the
  multi_column_restriction and token_restriction hierarchies; their
  methods merge_with and uses_function became identical in all
  subclasses, so they were moved to the base class

- single_column_primary_key_restrictions<clustering_key>::needs_filtering
  was changed to reuse num_prefix_columns_that_need_not_be_filtered,
  which uses free functions

Fixes #5799.
Fixes #6369.
Fixes #6371.
Fixes #6372.
Fixes #6382.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-07-07 23:08:09 +02:00
Avi Kivity
4c221855a1 Merge 'hinted handoff: fix commitlog memory leak' from Piotr D
"
When commitlog is recreated in hints manager, only shutdown() method is
called, but not release(). Because of that, some internal commitlog
objects (`segment_manager` and `segment`s) may be left pointing to each
other through shared_ptr reference cycles, which may result in memory
leak when the parent commitlog object is destroyed.

This PR prevents memory leaks that may happen this way by calling
release() after shutdown() from the hints manager.

Fixes: #6409, Fixes #6776
"

* piodul-fix-commitlog-memory-leak-in-hinted-handoff:
  hinted handoff: disable warnings about segments left on disk
  hinted handoff: release memory on commitlog termination
2020-07-07 21:36:14 +03:00
Piotr Dulikowski
b955793088 hinted handoff: disable warnings about segments left on disk
When a mutation is written to the commitlog, a rp_handle object is
returned which keeps a reference to commitlog segment. A segment is
"dirty" when its reference count is not zero, otherwise it is "clean".

When commitlog object is being destroyed, a warning is being printed
for every dirty segment. On the other hand, clean segments are deleted.

In case of standard mutation writing path, the rp_handle moves
responsibility for releasing the reference to the memtable to which the
mutation is written. When the memtable is flushed to disk, all
references accumulated in the memtable are released. In this context, it
makes sense to warn about dirty segments, because such segments contain
mutations that are not written to sstables, and need to be replayed.

However, hinted handoff uses a different workflow - it recreates a
commitlog object periodically. When a hint is written to commitlog, the
rp_handle reference is not released, so that segments with hints are not
deleted when destroying the commitlog. When commitlog is created again,
we get a list of saved segments with hints that we can try to send at a
later time.

Although this is intended behavior, now that releasing the hints
commitlog is done properly, it causes the mentioned warning to
periodically appear in the logs.

This patch adds a parameter for the commitlog that allows to disable
this warning. It is only used when creating hinted handoff commitlogs.
2020-07-07 19:40:42 +02:00
Piotr Dulikowski
002e6c4056 hinted handoff: release memory on commitlog termination
When commitlog is recreated in hints manager, only shutdown() method is
called, but not release(). Because of that, some internal commitlog
objects (`segment_manager` and `segment`s) may be left pointing to each
other through shared_ptr reference cycles, which may result in memory
leak when the parent commitlog object is destroyed.

This commit prevents memory leaks that may happen this way by calling
release() after shutdown() from the hints manager.

Fixes: #6409, #6776
2020-07-07 19:40:32 +02:00
Botond Dénes
5ebe2c28d1 db/view: view_update_generator: re-balance wait/signal on the register semaphore
The view update generator has a semaphore to limit concurrency. This
semaphore is waited on in `register_staging_sstable()` and later the
unit is returned after the sstable is processed in the loop inside
`start()`.
This was broken by 4e64002, which changed the loop inside `start()` to
process sstables in per table batches, however didn't change the
`signal()` call to return the amount of units according to the number of
sstables processed. This can cause the semaphore units to dry up, as the
loop can process multiple sstables per table but return just a single
unit. This can also block callers of `register_staging_sstable()`
indefinitely as some waiters will never be released as under the right
circumstances the units on the semaphore can permanently go below 0.
In addition to this, 4e64002 introduced another bug: table entries from
the `_sstables_with_tables` are never removed, so they are processed
every turn. If the sstable list is empty, there won't be any update
generated but due to the unconditional `signal()` described above, this
can cause the units on the semaphore to grow to infinity, allowing
future staging sstables producers to register a huge amount of sstables,
causing memory problems due to the amount of sstable readers that have
to be opened (#6603, #6707).
Both outcomes are equally bad. This patch fixes both issues and modifies
the `test_view_update_generator` unit test to reproduce them and hence
to verify that this doesn't happen in the future.

Fixes: #6774
Refs: #6707
Refs: #6603

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200706135108.116134-1-bdenes@scylladb.com>
2020-07-07 08:53:00 +02:00
Wojciech Mitros
76038b8d8e view: differentiate identical error messages and change them to warnings
Modified log message in view_builder::calculate_shard_build_step to make it distinct from the one in view_builder::execute, changed their logging level to warning, since we're continuing even if we handle an exception.

Fixes #4600
2020-07-06 20:50:34 +03:00
Botond Dénes
62c6859b69 db/view: view_update_generator: use partitioned sstable set
And pass it to `make_range_sstable_reader()` when creating the reader,
thus allowing the incremental selector created therein to exploit the
fact that staging sstables are disjoint (in the case of repair and
streaming at least). This should reduce the memory consumption of the
staging reader considerably when reading from a lot of sstables.
2020-07-06 13:38:23 +03:00
Piotr Sarna
4cb79f04b0 treewide: replace libjsoncpp usage with rjson
In order to eventually switch to a single JSON library,
most of the libjsoncpp usage is dropped in favor of rjson.
Unfortunately, one usage still remains:
test/utils/test_repl utility heavily depends on the *exact textual*
format of its output JSON files, so replacing a library results
in all tests failing because of differences in formatting.
It is possible to force rjson to print its documents in the exact
matching format, but that's left for later, since the issue is not
critical. It would be nice though if our test suite compared
JSON documents with a real JSON parser, since there are more
differences - e.g. libjsoncpp keeps children of the object
sorted, while rapidjson uses an unordered data structure.
This change should cause no change in semantics, it strives
just to replace all usage of libjsoncpp with rjson.
2020-07-03 10:27:23 +02:00
Pavel Emelyanov
f045cec586 snap: Get rid of storage_service reference in schema.cc
Now when the snapshot stopping is correctly handled, we may pull the database
reference all the way down to the schema::describe().

One tricky place is in table::napshot() -- the local db reference is pulled
through an smp::submit_to call, but thanks to the shard checks in the place
where it is needed the db is still "local"

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 20:28:25 +03:00
Pavel Emelyanov
9211df2cdf snapshot: Make check_snapshot_not_exist a method
Sanitation. It now can access the this->_db pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 20:26:15 +03:00
Pavel Emelyanov
ba47ef0397 snapshots: Move ops gate from storage_service
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 20:17:21 +03:00
Pavel Emelyanov
e439873319 snapshot: Move lock from storage_service
For this de-static run_snapshot_*_operation (because we no longer have
the static global to get the lock from) and make the snapshot_ctl be
peering_sharded_service to call invoke_on.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-26 20:17:19 +03:00