740 Commits

Author SHA1 Message Date
Tomasz Grabiec
36d90e637e Merge "Relax migration manager dependencies" from Pavel Emelyanov
The set makes dependencies between mm and other services cleaner,
in particular, after the set:

- the query processor no longer needs migration manager
  (which doesn't need query processor either)

- the database no longer needs migration manager, thus the mutual
  dependency between these two is dropped, only migration manager
  -> database is left

- the migration manager -> storage_service dependency is relaxed,
  one more patchset will be needed to remove it, thus dropping one
  more mutual dependency between them, only the storage_service
  -> migration manager will be left

- the migration manager is stopped on drain, but several more
  services need it on stop, thus causing use after free problems,
  in particular there's a caught bug when view builder crashes
  when unregistering from notifier list on stop. Fixed.

Tests: unit(dev)
Fixes: #5404
2020-01-16 12:12:25 +01:00
Nadav Har'El
9953a33354 merge "Adding a schema file when creating a snapshot"
Merged pull request https://github.com/scylladb/scylla/pull/5294 from
Amnon Heiman:

To use a snapshot we need a schema file that is similar to the result of
running cql DESCRIBE command.

The DESCRIBE is implemented in the cql driver so the functionality needs
to be re-implemented inside scylla.

This series adds a describe method to the schema file and uses it when doing
a snapshot.

There are different approaches to handling materialized views and
secondary indexes.

This implementation creates each schema.cql file in its own relevant
directory, so the schema for a materialized view, for example, will be
placed in the snapshot directory of the table of that view.

Fixes #4192
2020-01-16 12:05:50 +02:00
Amnon Heiman
028525daeb database: add schema.cql file when creating a snapshot
When creating a snapshot we need to add a schema.cql file in the
snapshot directory that describes the table in that snapshot.

This patch adds the file using the schema describe method.

get_snapshot_details and manifest_json_filter were modified to ignore
the schema.cql file.

Fixes #4192

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-01-15 15:06:00 +02:00
Pavel Emelyanov
5cf365d7e7 database: Explicitly pass migration_manager through init_non_system_keyspace
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone, only the migration_manager needs the database, but not
vice versa.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:29:21 +03:00
Pavel Emelyanov
7cfab1de77 database: Switch on mnotifier from migration_manager
Do not call for local migration manager instance to send notifications,
call for the local migration notifier, it will always be alive.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Pavel Emelyanov
e327feb77f database: Prepare to use on-database migration_notifier
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-01-15 14:28:21 +03:00
Gleb Natapov
29574c1271 database: pass sync flag from db::apply function to the commitlog
Allow upper layers to request a mutation to be persisted on a disk before
making future ready independent of which mode commitlog is running in.
2020-01-15 12:15:42 +02:00
Nadav Har'El
aa1de5a171 merge: Synchronize snapshot and staging sstable deletion using sem
Merged pull request https://github.com/scylladb/scylla/pull/5343 from
Benny Halevy.

Fixes #5340

Hold the sstable_deletion_sem in table::move_sstables_from_subdirs to
serialize access to the staging directory. It now synchronizes snapshot,
compaction deletion of sstables, and view_update_generator moving of
sstables from staging.

Tests:

    unit (dev) [except test_user_function_timestamp_return that fails for me locally, but also on master]
    snapshot_test.py (dev)
2019-12-17 14:06:02 +02:00
Benny Halevy
4b3243f5b9 table: move_sstables_from_staging_in_thread with _sstable_deletion_sem
Hold the _sstable_deletion_sem while moving sstables from the staging directory
so as not to move them under the feet of table::snapshot.

Fixes #5340

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-12-17 12:20:20 +02:00
Rafael Ávila de Espíndola
3b61cf3f0b db: Don't use lw_shared_ptr for user_types_metadata
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Rafael Ávila de Espíndola
a55838323b user_types_metadata: don't implement enable_lw_shared_from_this
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding specialization to the seastar namespace.

A followup patch will also stop using lw_shared_ptr for
user_types_metadata.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-12-11 10:44:40 -08:00
Tomasz Grabiec
87b72dad3e Merge "treewide: add missing const qualifiers" from Pavel Solodovnikov
This patchset adds missing "const" function qualifiers throughout
the Scylla code base, which would make code less error-prone.

The changeset incorporates Kostja's work regarding const qualifiers
in the cql code hierarchy along with a follow-up patch addressing the
review comment of the corresponding patch set (the patch subject is
"cql: propagate const property through prepared statement tree.").
2019-11-27 10:56:20 +01:00
Piotr Sarna
9c5a5a5ac2 treewide: add names to semaphores
By default, semaphore exceptions bring along very little context:
either that a semaphore was broken or that it timed out.
In order to make debugging easier without introducing significant
runtime costs, a notion of named semaphore is added.
A named semaphore is simply a semaphore with statically defined
name, which is present in its errors, bringing valuable context.
A semaphore defined as:

  auto sem = semaphore(0);

will present the following message when it breaks:
"Semaphore broken"
However, a named semaphore:

  auto named_sem = named_semaphore(0, named_semaphore_exception_factory{"io_concurrency_sem"});

will present a message with at least some debugging context:

  "Semaphore broken: io_concurrency_sem"

It's not much, but it would really help in pinpointing bugs
without having to inspect core dumps.

At the same time, it does not incur any costs for normal
semaphore operations (except for its creation), but instead
only uses more CPU in case an error is actually thrown,
which is considered rare and not to be on the hot path.

Refs #4999

Tests: unit(dev), manual: hardcoding a failure in view building code
2019-11-26 15:14:21 +02:00
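As a rough illustration of the named-semaphore idea in 9c5a5a5ac2, the sketch below uses plain standard C++ rather than Seastar. The names toy_semaphore, plain_exception_factory, named_exception_factory and broken_message are hypothetical stand-ins invented for this example, and the acquire is non-blocking to keep it self-contained. It only shows how a statically defined name can be attached to the error message without touching the normal acquire path.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not Seastar's classes.
struct plain_exception_factory {
    std::runtime_error broken() const {
        return std::runtime_error("Semaphore broken");
    }
};

struct named_exception_factory {
    std::string name;  // static context attached to every error
    std::runtime_error broken() const {
        // The message is only built when an error is actually raised.
        return std::runtime_error("Semaphore broken: " + name);
    }
};

template <typename ExceptionFactory>
class toy_semaphore {
    long _count;
    bool _broken = false;
    ExceptionFactory _factory;
public:
    explicit toy_semaphore(long count, ExceptionFactory factory = ExceptionFactory{})
        : _count(count), _factory(std::move(factory)) {}

    void mark_broken() { _broken = true; }

    // Non-blocking acquire: returns false when not enough units are
    // available, throws the factory's exception when the semaphore is broken.
    bool try_wait(long units = 1) {
        assert(units > 0);
        if (_broken) {
            throw _factory.broken();
        }
        if (_count < units) {
            return false;
        }
        _count -= units;
        return true;
    }
};

// Breaks the semaphore and returns the message a waiter would then see.
template <typename Semaphore>
std::string broken_message(Semaphore& sem) {
    sem.mark_broken();
    try {
        sem.try_wait();
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

In this sketch the successful acquire path never reads the name; the string concatenation happens only inside broken(), which mirrors the cost argument made in the commit message.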
Pavel Solodovnikov
2f442f28af treewide: add const qualifiers throughout the code base
2019-11-26 02:24:49 +03:00
Benny Halevy
f9e93bba38 sstables: compaction: move cleanup parameter to compaction_descriptor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20191117165806.3234-1-bhalevy@scylladb.com>
2019-11-18 10:52:20 +01:00
Piotr Dulikowski
59fbbb993f memtables: add partition/row hit/miss counters
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
    - memtable_partition_writes - number of write operations performed
          on partitions in memtables,
    - memtable_partition_hits - number of write operations performed
          on partitions that previously existed in a memtable,
    - memtable_row_writes - number of row write operations performed
          in memtables,
    - memtable_row_hits - number of row write operations that overwrote
          rows previously present in a memtable.

Tests: unit(release)
2019-11-12 13:35:41 +01:00
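To make the four counters from 59fbbb993f concrete, here is a minimal sketch of how they might be updated on each write. It assumes a toy memtable modelled as a map from partition key to a set of clustering keys; toy_memtable, toy_memtable_stats and write_row are hypothetical names, and only the counter names come from the commit message. This is not how Scylla's memtable is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Counter names follow the commit message; the struct itself is illustrative.
struct toy_memtable_stats {
    std::uint64_t memtable_partition_writes = 0;
    std::uint64_t memtable_partition_hits = 0;
    std::uint64_t memtable_row_writes = 0;
    std::uint64_t memtable_row_hits = 0;
};

class toy_memtable {
    // partition key -> clustering keys of the rows present in that partition
    std::map<std::string, std::set<std::string>> _partitions;
    toy_memtable_stats _stats;
public:
    void write_row(const std::string& partition_key, const std::string& clustering_key) {
        ++_stats.memtable_partition_writes;
        auto p = _partitions.emplace(partition_key, std::set<std::string>{});
        if (!p.second) {
            // The partition already existed in the memtable.
            ++_stats.memtable_partition_hits;
        }

        ++_stats.memtable_row_writes;
        auto r = p.first->second.insert(clustering_key);
        if (!r.second) {
            // The row already existed, so this write overwrote it.
            ++_stats.memtable_row_hits;
        }
        assert(_stats.memtable_partition_hits <= _stats.memtable_partition_writes);
        assert(_stats.memtable_row_hits <= _stats.memtable_row_writes);
    }

    const toy_memtable_stats& stats() const { return _stats; }
};
```

The ratio of hits to writes then gives a rough measure of how often writes land on data that is still in the memtable.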
Piotr Dulikowski
48f7b2e4fb table: move out table::stats to table_stats
This change was done in order to be able to forward-declare
the table::stats structure.
2019-11-12 13:35:41 +01:00
Vladimir Davydov
b75862610e paxos_state: account paxos round latency
This patch adds the following per table stats:

  cas_prepare_latency
  cas_propose_latency
  cas_commit_latency

They are equivalent to CasPropose, CasPrepare, CasCommit metrics exposed
by Cassandra.
2019-10-29 19:26:18 +03:00
Raphael S. Carvalho
7f1a2156c7 table: Don't account for shared SSTables in compaction backlog tracker
We don't want to add shared sstables to table's backlog tracker because:
1) table's backlog tracker has only an influence on regular compaction
2) shared sstables are never regular compacted, they're worked by
resharding which has its own backlog tracker.

Such sstables belong to more than one shard, meaning that currently
they're added to backlog tracker of all shards that own them.
But the thing is that such sstables end up being resharded in a shard
that may be completely random. So increasing backlog of all shards
such sstables belong to, won't lead to faster resharding. Also, table's
backlog tracker is supposed to deal only with regular compaction.

Accounting for shared sstables in table's tracker may lead to incorrect
speed up of regular compactions because the controller is not aware
that some relevant part of the backlog is due to pending resharding.
The fix is about ignoring sstables that will be resharded and let
table's backlog tracker account only for sstables that can be worked on
by regular compaction, and rely on resharding controlling itself
with its own tracker.
NOTE: this doesn't fix the resharding controlling issue completely,
as described in #4952. We'll still need to throttle regular compaction
on behalf of resharding. So subsequent work may be about:
- move resharding to its own priority class, perhaps streaming.
- make resharding's backlog tracker account for sstables in all of
its pending jobs, not only the ongoing ones (currently limited to 1 per shard).
- limit compaction shares when resharding is in progress.
THIS only fixes the issue in which controller for regular compaction
shouldn't account sstables completely exclusive to resharding.

Fixes #5077.
Refs #4952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190924022109.17400-1-raphaelsc@scylladb.com>
2019-10-13 10:14:13 +03:00
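The filtering rule described in 7f1a2156c7 can be sketched as follows. The types toy_sstable and toy_backlog_tracker and the function add_to_table_tracker are hypothetical, and the backlog is reduced to a plain byte count; the only point is that an sstable owned by more than one shard is left out of the table's tracker.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative model: an sstable knows its size and which shards own it.
struct toy_sstable {
    std::uint64_t bytes;
    std::vector<unsigned> owner_shards;

    // A shared sstable belongs to more than one shard and awaits resharding.
    bool is_shared() const { return owner_shards.size() > 1; }
};

// Stand-in for the table's backlog tracker used by regular compaction.
class toy_backlog_tracker {
    std::uint64_t _backlog_bytes = 0;
public:
    void add(const toy_sstable& sst) { _backlog_bytes += sst.bytes; }
    std::uint64_t backlog() const { return _backlog_bytes; }
};

// Accounts for the sstable only if regular compaction can work on it.
// Returns true when the sstable was added to the tracker.
bool add_to_table_tracker(toy_backlog_tracker& tracker, const toy_sstable& sst) {
    assert(!sst.owner_shards.empty());
    if (sst.is_shared()) {
        // Left to the resharding tracker, which is not modelled here.
        return false;
    }
    tracker.add(sst);
    return true;
}
```

With this rule, a pending reshard no longer inflates the backlog that drives regular compaction on every owning shard.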
Amnon Heiman
64c2d28a7f database: Add counter for the number of schema changes
Schema changes can have big effects on performance, typically it should
be a rare event.

It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.

After this patch the metrics would look like:

scylla_database_schema_changed{shard="0",type="derive"} 2

Fixes #4785

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-10-08 17:54:49 +02:00
Tomasz Grabiec
79935df959 commitlog: replay: Respect back-pressure from memtable space to prevent OOM
Commit log replay was bypassing memtable space back-pressure, and if
replay was faster than memtable flush, it could lead to OOM.

The fix is to call database::apply_in_memory() instead of
table::apply(). The former blocks when memtable space is full.

Fixes #4982.

Tests:
  - unit (release)
  - manual, replay with memtable flush failing and without failing

Message-Id: <1568381952-26256-1-git-send-email-tgrabiec@scylladb.com>
2019-09-15 11:51:56 +03:00
Piotr Sarna
1ab07b80b4 database: assign proper io priority for streaming view updates
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.
2019-08-20 00:24:50 +02:00
Tomasz Grabiec
7604980d63 database: Add missing partition slicing on streaming reader recreation
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.

That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of
partition we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of order, violating
assumptions of the consumers.

This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.

Fixes #4659.

In v2:

  - Added an overload without partition_slice to avoid changing existing users which never slice

Tests:

  - unit (dev)
  - manual (3 node ccm + repair)

Backport: 3.1
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
2019-07-18 18:35:28 +03:00
Benny Halevy
0e4567c881 table: document _sstables_lock/_sstable_deletion_sem locking order
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-15 19:20:35 +03:00
Benny Halevy
bbbd749f70 table: uninline enable_sstable_write
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-07-11 12:14:44 +03:00
Kamil Braun
d6736a304a Add metric for failed memtable flushes
Resolves #3316.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-07-10 11:30:10 +03:00
Avi Kivity
fca1ae69ff database: convert _cfg from a pointer to a reference
_cfg cannot be null, so it can be converted to a reference to
indicate this. Follow-up to fe59997efe.
2019-07-02 17:57:50 +02:00
Avi Kivity
2abe015150 database: allow live update of the compaction_enforce_min_threshold config item
Change the type from bool to updateable_value<bool> throughout the dependency
chain and mark it as live updateable.

In theory we should also observe the value and trigger compaction if it changes,
but I don't think it is worthwhile.
2019-06-28 16:43:25 +03:00
Avi Kivity
fe59997efe database: don't copy config object
Copying the config object breaks the link between the original and the copied
object, so updates to config items will not be visible. To allow updates, don't
copy any more, and instead keep a pointer.

The pointer won't work well once config is updateable, since the same object is
shared across multiple shards, but that can be addressed later.
2019-06-28 15:20:39 +03:00
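The problem that fe59997efe describes, a copied config object no longer seeing updates, can be shown with a small sketch. toy_config, database_with_copy and database_with_pointer are hypothetical types made up for this example; the field name is borrowed from the compaction_enforce_min_threshold item mentioned in this log, but nothing here reflects the real db::config class.

```cpp
#include <cassert>

// Illustrative config with a single item.
struct toy_config {
    bool compaction_enforce_min_threshold = false;
};

// Before: the database keeps its own copy, so later updates are invisible.
class database_with_copy {
    toy_config _cfg;
public:
    explicit database_with_copy(const toy_config& cfg) : _cfg(cfg) {}
    bool enforce_min_threshold() const {
        return _cfg.compaction_enforce_min_threshold;
    }
};

// After: the database keeps a pointer to the original object instead.
class database_with_pointer {
    const toy_config* _cfg;
public:
    explicit database_with_pointer(const toy_config* cfg) : _cfg(cfg) {
        assert(_cfg != nullptr);
    }
    bool enforce_min_threshold() const {
        return _cfg->compaction_enforce_min_threshold;
    }
};
```

If the original toy_config is modified after both objects are constructed, only database_with_pointer observes the new value. The sketch ignores the multi-shard sharing concern that the commit message defers.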
Avi Kivity
339699b627 database: remove default constructor
Currently, database::_cfg is a copy of the global configuration. But this means
that we have multiple master copies of the configuration, which makes updating
the configuration harder. In order to eliminate the copy we have to eliminate the
database default constructor, which creates a config object, so that all
remaining constructors can receive config by reference and retain that reference.
2019-06-28 15:20:39 +03:00
Piotr Sarna
e77ef849af database: add flag for infinite bound range deletions
Database can only support infinite bound range deletions if sstable mc
format is supported. As a first step to implement these checks,
an appropriate flag is added to database.
2019-06-24 15:57:47 +03:00
Dejan Mircevski
8dcb35913a table: Avoid needless allocation of cell lockers
All `table` instances currently unconditionally allocate a cell locker
for counter cells, though not all need one.  Since the lockers occupy
quite a bit of memory (as reported in #4441), it's wasteful to
allocate them when unneeded.

Fixes #4441.

Tests: unit (dev, debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190515190910.87931-1-dejan@scylladb.com>
2019-05-16 11:10:38 +03:00
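One way to picture the change in 8dcb35913a is conditional allocation of the locker. The sketch below assumes, as a simplification not stated in the commit, that only tables holding counters need a cell locker; toy_cell_locker, toy_counter_table and the 64 KiB size are invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for a memory-hungry cell locker.
struct toy_cell_locker {
    std::vector<char> lock_table = std::vector<char>(64 * 1024);
};

class toy_counter_table {
    // Stays null for tables that never need to lock counter cells.
    std::unique_ptr<toy_cell_locker> _counter_cell_locks;
public:
    explicit toy_counter_table(bool is_counter) {
        if (is_counter) {
            // Assumption: only counter tables need the locker.
            _counter_cell_locks = std::make_unique<toy_cell_locker>();
        }
    }

    bool has_cell_locker() const { return _counter_cell_locks != nullptr; }

    // Callers must only ask for the locker on tables that have one.
    toy_cell_locker& cell_locker() {
        assert(_counter_cell_locks != nullptr);
        return *_counter_cell_locks;
    }
};
```

Tables constructed with is_counter set to false then carry only an empty pointer instead of a full lock table.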
Tomasz Grabiec
3cb7b2d72e treewide: Propagate schema_features to db::schema::all_tables()
2019-04-28 15:50:13 +02:00
Benny Halevy
223e1af521 sstables: provide large_data_handler to constructor
And use it for writing the sstable and/or when deleting it.

Refs #4198

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:24:19 +02:00
Benny Halevy
eebc3701a5 sstables: introduce sstables_manager
The goal of the sstables manager is to track and manage sstables life-cycle.
There is a sstable manager instance per database and it is passed to each column-family
(and test environment) on construction.
All sstables created, loaded, and deleted pass through the sstables manager.

The manager will make sure consumers of sstables are in sync so that sstables
will not be deleted while in use.

Refs #4149

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
3a17053cb8 database: add table::make_sstable helper
In most cases we make a sstable based on the table schema
and soon - large_data_handler.
Encapsulate that in a make_sstable method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Rafael Ávila de Espíndola
54b856e5e4 large_data_handler: propagate a future out of stop()
stop() will close a semaphore in a followup patch, so it needs to return a
future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Avi Kivity
0beeb2f721 Merge "implement upgradesstables + scrub" from Calle
"
Fixes #4245

Breaks up "perform_cleanup" in parameterized "rewrite_sstables"
and implements upgrade + scrub in terms of this.

Both run as a "regular" compaction, but ignore the normal criteria
for compaction and select obsolete/all tables.
We also ensure all previous compactions are done so we can guarantee
all tables are rewritten post invocation of command.
"

* 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev:
  api::storage_service: Implement "scrub"
  api/storage_service: Implement "upgradesstables"
  api::storage_service: Add keyspace + tables helper
  compaction_manager: Add perform_sstable_scrub
  compaction_manager: Add perform_sstable_upgrade
  compaction_manager: break out rewrite_sstables from cleanup
  table: parameterize cleanup_sstables
2019-03-06 15:47:26 +02:00
Duarte Nunes
a29ec4be76 Merge 'Update system.large_partitions during shutdown' from Rafael
"
Currently any large partitions found during shutdown are not
recorded. The reason is that the database commit log is already off,
so there is nowhere to record it to.

One possible solution is to have an independent system database. With
that the regular db is shutdown first and writes can continue to the
system db.

That is a pretty big change. It would also not allow us to record
large partitions in any system tables.

This patch series instead tries to stop the commit log later. With
that any large partitions are recorded to the log and moved to a
sstable on the next startup.
"

* 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla:
  db: stop the commit log after the tables during shutdown
  db: stop the compaction manager earlier
  db: Add a stop_database helper
  db: Don't record large partitions in system tables
2019-03-06 10:36:38 -03:00
Tomasz Grabiec
889f31fabe Merge "fix slow truncation under flush pressure" from Glauber
Truncating a table is very slow if the system is under pressure. Because
in that case we mostly just want to get rid of the existing data, it
shouldn't take this long. The problem happens because truncate has to
wait for memtable flushes to end, twice. This is regardless of whether
or not the table being truncated has any data.

1. The first time is when we call truncate itself:

if auto_snapshot is enabled, we will flush the contents of this table
first and we are expected to be slow. However, even if auto_snapshot is
disabled we will still do it -- which is a bug -- if the table is marked
as durable. We should just not flush in this case and it is a silly bug.

2. The second time is when we call cf->stop(). Stopping a table will
wait for a flush to finish. At this point, regardless of which path
(Durable or non-durable) we took in the previous step we will have no
more data in the table. However, calling `flush()` still needs to acquire
a flush_permit, which means we will wait for whichever memtable is
flushing at that very moment to end.

If the system is under pressure and a memtable flush will take many
seconds, so will truncate.  Even if auto_snapshots are enabled, we
shouldn't have to flush twice. The first flush should already put us in
a state in which the next one is immediate (maybe holding on to the
permit, maybe destroying the memtable_list already at that point ->
since no other memtables should be created).

If auto_snapshots are not enabled, the whole thing should just be
instantaneous.

This patchset fixes that by removing the flush need when !auto_snapshot,
and special casing the flush of an empty table.

Fixes #4294

* git@github.com:glommer/scylla.git slowtruncate-v2:
  database: immediately flush tables with no memtables.
  truncate: do not flush memtables if auto_snapshot is false.
2019-03-06 13:54:58 +01:00
Rafael Ávila de Espíndola
16ed9a2574 db: stop the commit log after the tables during shutdown
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola
765d8535f1 db: Add a stop_database helper
This reduces code duplication. A followup patch will add more code to
stop_database.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:45 -08:00
Glauber Costa
ed8261a0fe database: immediately flush tables with no memtables.
If a table has no data, it may still take a long time to flush. This is
because before we even try to flush, we need to acquire a permit and
that can take a while if there is a long running flush already queued.

We can special case the situation in which there is no data in any of
the memtables owned by table and return immediately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-03-05 11:22:48 -05:00
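The special case from ed8261a0fe can be sketched as an early return ahead of permit acquisition. toy_flush_table is a hypothetical synchronous model in which a counter stands in for waiting on the flush permit; the real code is asynchronous and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class toy_flush_table {
    std::vector<std::vector<std::string>> _memtables;
    std::size_t _permits_acquired = 0;  // stands in for waiting on a flush permit
    std::size_t _flushed_rows = 0;

    bool all_memtables_empty() const {
        for (const auto& mt : _memtables) {
            if (!mt.empty()) {
                return false;
            }
        }
        return true;
    }
public:
    toy_flush_table() : _memtables(1) {}

    void write(std::string row) {
        assert(!_memtables.empty());
        _memtables.back().push_back(std::move(row));
    }

    void flush() {
        if (all_memtables_empty()) {
            // Fast path: nothing to write out, so skip the permit entirely.
            return;
        }
        ++_permits_acquired;
        for (auto& mt : _memtables) {
            _flushed_rows += mt.size();
            mt.clear();
        }
    }

    std::size_t permits_acquired() const { return _permits_acquired; }
    std::size_t flushed_rows() const { return _flushed_rows; }
};
```

Flushing an empty table in this model never touches the permit counter, which is the behaviour the commit message asks for when another long-running flush is already queued.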
Piotr Sarna
67e63d4dd7 database: add view_stats getter
It will be used for testing purposes
2019-02-28 10:47:20 +01:00
Calle Wilund
7fb6bbe68c table: parameterize cleanup_sstables
To allow using the logic for one-sstable-at-a-time compaction (i.e.
rewrite) of sstables without the "normal" cleanup logic and partition
selection.
2019-02-27 14:25:31 +00:00
Asias He
75edbe939d database: Add update_schema_version and announce_schema_version
Split the update_schema_version_and_announce() into
update_schema_version() and announce_schema_version(). This is going to
be used in storage_service::prepare_to_join() where we want to first
update the schema version, start gossip, announce the schema version.
2019-02-26 19:10:02 +08:00
Rafael Ávila de Espíndola
9cd14f2602 Don't write to system.large_partition during shutdown
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.

There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.

This patch then adds another step when shutting down a database: first
stop updating system.large_partition.

This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.

This seems to impact only tests, since main.cc calls _exit directly.

Tests: unit (release,debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
2019-02-15 10:49:10 +01:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
is there a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent them.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. Many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is either better, faster, or worse: the only option left.  I
argue that having a flag that allows us to import them when we are sure
it is safe is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00