1223 Commits

Author SHA1 Message Date
Rafael Ávila de Espíndola
26ac2c23ef Change *_row_* names that refer to partitions
This renames some variables and functions to make it clear that they
refer to partitions and not rows.

Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.

This patch doesn't change the exposed API names.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
2019-01-09 14:53:42 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
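As a rough illustration of the types adopted in fa2b0384d2, the sketch below uses std::optional, std::string_view and std::variant from C++17. The helper names (value_after_colon, cell, describe) are hypothetical and do not come from the Scylla tree, and std::visit stands in here for seastar::visit.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical helper: returns the text after the first ':' if there is one.
std::optional<std::string_view> value_after_colon(std::string_view s) {
    auto pos = s.find(':');
    if (pos == std::string_view::npos) {
        return std::nullopt;
    }
    return s.substr(pos + 1);
}

// Hypothetical value type of the kind that could previously be a boost::variant.
using cell = std::variant<int, std::string>;

// std::visit dispatches the generic lambda on the active alternative.
std::string describe(const cell& c) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, int>) {
            return "int:" + std::to_string(v);
        } else {
            return "string:" + v;
        }
    }, c);
}
```

The view returned by value_after_colon() refers to the caller's buffer, so it must not outlive the string it was taken from.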
Nadav Har'El
da090a5458 materialized views: move hints to top-level directory
While we keep ordinary hints in a directory parallel to the data directory,
we decided to keep the materialized view hints in a subdirectory of the data
directory, named "view_pending_updates". But during boot, we expect all
subdirectories of data/ to be keyspace names, and when we notice this one,
we print a warning:

   WARN: database - Skipping undefined keyspace: view_pending_updates

This spurious warning annoyed users. But moreover, we could have bigger
problems if the user actually tries to create a keyspace with that name.

So in this patch, we move the view hints to a separate top-level directory,
which defaults to /var/lib/scylla/view_hints, but as usual can be configured.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190107142257.16342-1-nyh@scylladb.com>
2019-01-07 16:43:43 +02:00
Avi Kivity
f02c64cadf streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
This header, which is easily replaced with a forward declaration,
introduces a dependency on database.hh everywhere. Remove it and scatter
includes of database.hh in source files that really need it.
2019-01-05 17:33:25 +02:00
Avi Kivity
c3ef99f84f schema_tables: remove #include of database.hh
Distribute in source files (and one header - table_helper.hh) that need it.
2019-01-05 15:43:07 +02:00
Avi Kivity
f43f82d1d2 cql_type_parser: remove dependency on user_types_metadata
A default parameter of type T (or lw_shared_ptr<T>) requires that T be
defined. Remove the dependency by redefining the default parameter
as an overload, for T = user_types_metadata.
2019-01-05 15:40:58 +02:00
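The technique in f43f82d1d2 can be sketched as follows. The names types_metadata and count_known are hypothetical stand-ins, not the real cql_type_parser interface. The point is that the declarations only need a forward declaration, while a default argument such as `= types_metadata()` would need the full definition at the point of declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Header" part: a forward declaration is enough for both declarations below.
struct types_metadata;

// Full form: the caller supplies the metadata explicitly.
std::size_t count_known(const std::string& name, const types_metadata& md);

// Overload that replaces a default argument; it needs no complete type here.
std::size_t count_known(const std::string& name);

// "Source file" part: only here is the complete type required.
struct types_metadata {
    std::string known;
};

std::size_t count_known(const std::string& name, const types_metadata& md) {
    return md.known == name ? 1 : 0;
}

std::size_t count_known(const std::string& name) {
    // Supplies the value that the default argument used to provide.
    return count_known(name, types_metadata{});
}
```

Callers that relied on the default keep compiling unchanged, since the one-argument call now resolves to the overload.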
Piotr Sarna
9d46715613 streaming,view: move view update checks to separate file
Checking if view update path should be used for sstables
is going to be reused in row level repair code,
so relevant functions are moved to a separate header.
2019-01-03 08:31:40 +01:00
Duarte Nunes
b7517183fa db/commitlog: Use fragmented buffers to read entries
Leverage fragmented_temporary_buffer when reading commit log
entries, avoiding large allocations.

Refs #4020

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
0e50a9bc6d db/commitlog: Implement skip in terms of input buffer skipping
This simplifies the code and allows us to get rid of the overload of
advance() taking a temporary_buffer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
f41d13f38c db/view/view_update_from_staging_generator: Break semaphore on stop()
This avoids having fibers waiting on _registration_sem without ever being
notified.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:04 +00:00
Duarte Nunes
4974addc5c db/view/view_update_from_staging_generator: Restore formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:02 +00:00
Duarte Nunes
201196130d db/view/view_update_from_staging_generator: Avoid creating more than one fiber
If view_update_from_staging_generator::maybe_generate_view_updates()
is called before view_update_from_staging_generator::start(), as can
happen in main.cc, then we can potentially create more than one fiber,
which leads to corrupted state and conflicting operations.

To avoid this, use just one fiber and be explicit about notifying it
that more work is needed, by leveraging a condition-variable.

Fixes #4021

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:52:51 +00:00
Duarte Nunes
66113a2d39 Merge 'Replace query_processor's sharded<database> with plain database' from Avi
"
A sharded<database> is not very useful for accessing data since data is
usually distributed across many nodes, while a sharded<database>
contains only a single node's view. So it is really only used for
accessing replicated metadata, not data. As such only the local shard
is accessed.

Use that to simplify query_processor a little by replacing sharded<database>
with a plain database.

We can probably be more ambitious and make all accesses, data and metadata,
go through storage_proxy, but this is a start.
"

* tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla:
  query_processor: replace sharded<database> with the local shard
  commitlog_replayer: don't use query_processor
  client_state: change set_keyspace() to accept a single database shard
  legacy_schema_migrator: initialize with database reference
2018-12-29 12:14:19 +00:00
Avi Kivity
0c0cc66ee7 system_keyspace, view: reduce interdependencies
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.

This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
2018-12-29 12:12:15 +00:00
Avi Kivity
30745eeb72 query_processor: replace sharded<database> with the local shard
query_processor uses storage_proxy to access data, and the local
database object to access replicated metadata. While it seems strange
that the database object is not used to access data, it is logical
when you consider that a sharded<database> only contains this node's
data, not the cluster data.

Take advantage of this to replace sharded<database> with a single database
shard.
2018-12-29 11:02:15 +02:00
Avi Kivity
f0a709cfc8 commitlog_replayer: don't use query_processor
During normal writes, query processing happens before commitlog, so
logically replaying the commitlog shouldn't need it. And in
fact the dependency on query_processor can be eliminated; all it needs
is the local node's database.
2018-12-29 11:00:29 +02:00
Avi Kivity
e4233262cf legacy_schema_migrator: initialize with database reference
Provide legacy_schema_migrator with a sharded<database> so it doesn't need
to use the one from query_processor. We want to replace query_processor's
sharded<database> with just a local database reference in order to simplify
it, and this is standing in the way.
2018-12-29 10:58:22 +02:00
Tomasz Grabiec
7747f2dde3 Merge "nodetool toppartitions" from Rafi & Avi
Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write
operations over a period of time.

Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
  interfaces with data_listeners and the REST api),
- REST api for toppartitions query.

Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).

What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).

Fixes #2811

* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
  top_k: whitespace and minor fixes
  top_k: map template arguments
  top_k: std::list -> chunked_vector
  top_k: support for appending top_k results
  nodetool toppartitions: refactor table::config constructor
  nodetool toppartitions: data listeners
  nodetool toppartitions: add data_listeners to database/table
  nodetool toppartitions: fully_qualified_cf_name
  nodetool toppartitions: Toppartitions query implementation
  nodetool toppartitions: Toppartitions query REST API
  nodetool toppartitions: nodetool-toppartitions script
2018-12-28 16:31:24 +01:00
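A top-k structure for stream summaries can be sketched with the space-saving algorithm. This is an assumption about the general approach and not the code merged in 7747f2dde3. The class name and the linear scan for the minimum are simplifications chosen for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Tracks at most `capacity` keys; counts are upper bounds on true frequencies.
class space_saving_top_k {
    std::size_t _capacity;
    std::unordered_map<std::string, std::size_t> _counts;
public:
    explicit space_saving_top_k(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    // Records one occurrence of `key`.
    void append(const std::string& key) {
        auto it = _counts.find(key);
        if (it != _counts.end()) {
            ++it->second;
            return;
        }
        if (_counts.size() < _capacity) {
            _counts.emplace(key, 1);
            return;
        }
        // Table is full: evict the least frequent key, and let the newcomer
        // inherit its count plus one (the space-saving rule).
        auto min_it = std::min_element(_counts.begin(), _counts.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        std::size_t inherited = min_it->second + 1;
        _counts.erase(min_it);
        _counts.emplace(key, inherited);
    }

    // Returns up to k entries, highest count first; ties are ordered by key.
    std::vector<std::pair<std::string, std::size_t>> top(std::size_t k) const {
        std::vector<std::pair<std::string, std::size_t>> out(_counts.begin(), _counts.end());
        std::sort(out.begin(), out.end(), [](const auto& a, const auto& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
        if (out.size() > k) {
            out.resize(k);
        }
        return out;
    }
};
```

Memory stays bounded by the capacity regardless of how many distinct partition keys are seen, at the cost of over-estimating the counts of keys that entered by eviction.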
Rafi Einstein
6b2c21f69b nodetool toppartitions: Toppartitions query implementation
toppartitions_query installs toppartitions_data_listener-s on all database shards, waits for
the designated period, uninstalls the listeners and collects top-k read/write partition keys.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
08ba115c16 nodetool toppartitions: data listeners
Mechanism that interfaces with mutation readers in database and table classes, to
allow tracking the most frequent partition keys in read and write operations.
Basic design is specified in #2811.

Tracking top #rows and #bytes will be supported in the future.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
038f8c7988 nodetool toppartitions: refactor table::config constructor
Eliminate extra parameters to ctor and deduce them instead from db param.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Duarte Nunes
2f69ba2844 lwt: Remove Paxos-related Cassandra code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227112526.4180-1-duarte@scylladb.com>
2018-12-27 13:30:10 +02:00
Avi Kivity
eae030b061 hints: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:44 +00:00
Avi Kivity
cc8312a8b9 commitlog: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:43 +00:00
Duarte Nunes
2bd76f8fc5 db/view: Introduce node_update_backlog class
This class is an atomic view update backlog representation,
safe to update from multiple shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6afbec4685 db/hints: Initialize current backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
12ce517242 db/view: Add view_update_backlog
The view update backlog represents the pending view data that a base replica
maintains. It is the maximum of the memory backlog - how much memory pending
view updates are consuming - and the disk backlog - how much view hints are
consuming. The size of a backlog is relative to its maximum size.

We will use this class to represent a base replica's view update
backlog at the coordinator.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
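The definition in 12ce517242 (the node backlog is the larger of the memory and disk backlogs, compared relative to their maximum sizes) can be sketched as below. The struct layout and the function name node_backlog are assumptions made for illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// A backlog is a current size together with the maximum it may reach.
struct backlog {
    std::size_t current;
    std::size_t max;

    // Fraction of the maximum that is in use.
    double relative_size() const {
        assert(max > 0);
        return double(current) / double(max);
    }
};

// Orders backlogs by how full they are, not by absolute byte counts.
inline bool operator<(const backlog& a, const backlog& b) {
    return a.relative_size() < b.relative_size();
}

// The node-level backlog is whichever of the two is relatively fuller.
inline backlog node_backlog(const backlog& memory, const backlog& disk) {
    return std::max(memory, disk);
}
```

Comparing relative sizes means a small but nearly full memory budget outranks a large, mostly empty hints directory.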
Duarte Nunes
a3d30ea99a db/view: Propagate acquired semaphore units to mutate_MV()
Propagate acquired semaphore units to mutate_MV() to allow the
semaphore to be incrementally signalled as view updates are processed
by view replicas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
8c1e6fcee8 db/timeout_clock: Define timeout_semaphore_units
Defines the type of semaphore_units<> associated with
timeout_semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
2753cfee88 db/view: Generate view updates as frozen_mutations
Working in terms of frozen_mutations allows us to account more
precisely the memory pending view updates consume at the storage_proxy
layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
715da6fd6b db/view: Reserve vector space in mutate_MV()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
5d011eb61f db/view: Cleanup mutate_MV()
In particular, extract out the logic updating the stats in case of a
failed update.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Avi Kivity
dd51c659f7 config: remove "to be removed before release" notice from mc sstable config
The "enable_sstables_mc_format" config item help text wants to remove itself
before release. Since scylla-3.0 did not get enough mc format mileage, we
decided to leave it in, so the notice should be removed.

Fixes #4003.
Message-Id: <20181219082554.23923-1-avi@scylladb.com>
2018-12-19 09:39:29 +00:00
Duarte Nunes
224821303c Merge 'Reduce the dependency on database.hh' from Botond
"
Working on database.hh or any header that is included in database.hh
(of which there are a lot), is a major pain as each change involves the
recompilation of half of our compilation units.
Reduce the impact by removing the `#include "database.hh"` directive
from as many header files as possible. Many headers can make do with
just some forward declarations and don't need to include the entire
headers. I also found some headers that included database.hh without
actually needing it.

Results

Before:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/154] CXX build/release/gen/cql3/CqlParser.o

After:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/107] CXX build/release/gen/cql3/CqlParser.o
"

* 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla:
  treewide: remove include database.hh from headers where possible
  database_fwd.hh: add keyspace fwd declaration
  service/client_state: de-inline set_keyspace()
  Move cache_temperature into its own header
2018-12-14 12:24:48 +00:00
Botond Dénes
1865e5da41 treewide: remove include database.hh from headers where possible
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.

Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
2018-12-14 08:03:57 +02:00
Vlad Zolotarov
7da1ac2c2c large_partition_handler: fix the message
We currently detect large partitions - not rows. So this is what we
should be reporting.

Fixes #3986

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181212215506.9879-1-vladz@scylladb.com>
2018-12-13 00:11:27 +00:00
Duarte Nunes
89ae3fbf11 db/system_distributed_keyspace: Create the schema with min_timestamp
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.

However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we save some schema
mismatches.

This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.

Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.

Fixes #3976

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
2018-12-11 13:35:48 +01:00
Avi Kivity
b251183359 extensions: remove unneeded includes
<boost/any.hpp> is not used, and "schema.hh" can be replaced with forward
declarations.
2018-12-10 21:34:09 +02:00
Avi Kivity
119a83bf2f extensions: deinline extension accessors
Quite complex code that is not performance sensitive. Move it out of line.
2018-12-10 21:22:56 +02:00
Avi Kivity
e9f5641b64 extensions: return concrete types from the extension accessors
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.

Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
2018-12-10 21:16:45 +02:00
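The change described in e9f5641b64 can be sketched as an accessor that returns a vector of plain pointers instead of an `auto` range over unique_ptrs. The types extension, named_extension and registry are hypothetical and only illustrate why a concrete return type lets the definition live out of line.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct extension {
    virtual ~extension() = default;
    virtual std::string name() const = 0;
};

struct named_extension : extension {
    std::string n;
    explicit named_extension(std::string s) : n(std::move(s)) {}
    std::string name() const override { return n; }
};

class registry {
    std::map<std::string, std::unique_ptr<extension>> _exts;
public:
    void add(std::string key, std::unique_ptr<extension> e) {
        _exts.emplace(std::move(key), std::move(e));
    }
    // Concrete return type: this declaration could sit in a header while the
    // definition moves to a source file.
    std::vector<const extension*> all() const;
};

// Out-of-line definition; ownership stays with the registry.
std::vector<const extension*> registry::all() const {
    std::vector<const extension*> out;
    out.reserve(_exts.size());
    for (const auto& kv : _exts) {
        out.push_back(kv.second.get());
    }
    return out;
}
```

The returned pointers are non-owning, so callers should iterate right away, as the commit message suggests.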
Avi Kivity
8e05bcbe71 extensions: remove dependency on cql layer
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.

Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
2018-12-10 20:55:30 +02:00
Tomasz Grabiec
538e041f22 Merge "Remove some dependencies on db::config" from Avi
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.

Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).

In addition, some outright pointless inclusions of db/config.hh are
removed.

The result is somewhat shorter compile times, and fewer needless
recompiles.

* https://github.com/avikivity/scylla unconfig-1/v1:
  config: remove inclusions of db/config.hh from header files
  repair: remove unneeded config.hh inclusion
  batchlog_manager: remove dependency on db::config
  auth: remove permissions_cache dependency on db::config
  auth: remove auth::service dependency on db::config
  auth: remove unneeded db/config.hh includes
2018-12-10 14:53:14 +01:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spend the most time, and while they have a
large number of callers, storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
Calle Wilund
55f10ffc43 commitlog: Recycle used segments instead of delete + new file
Refs #3929

When deleting a segment, IFF we have not yet filled up all reserves,
instead of actually deleting the file, put it on a "recycle" list.
Next segment allocation will instead of creating a new one simply
rename the segment and reuse the file and its allocated space.

We rename the file twice: Once on adding to recycle list, with special
prefix so we don't mix up actual replayable segments and these. Second
when we actually re-use the file (also to ensure consecutive names).

Note that we limit the amount of recyclables, so a really stressed
application which somehow fills up the replenish queue might
cause us to still drop the segments. Could skip this but risk
getting too many files on disk.

Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).

Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error as "normally" in this case, when hitting partially
written block or coming to an old/new chunk boundary.

v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits

v3:
* Use special names for files waiting for reuse
2018-12-10 09:09:07 +00:00
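The recycling policy in 55f10ffc43 can be modelled in memory as below. This is only a sketch: no files are touched, and the class name, the counters and the "Recycled-" and "Segment-" name prefixes are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// In-memory model of a bounded recycle list for segment files.
class segment_pool {
    std::size_t _max_recycled;
    std::size_t _next_id = 1;
    std::deque<std::string> _recycled;  // files waiting for reuse, under special names
public:
    std::size_t created = 0;
    std::size_t deleted = 0;
    std::size_t reused = 0;

    explicit segment_pool(std::size_t max_recycled) : _max_recycled(max_recycled) {}

    // Called when a segment is no longer needed.
    void release(const std::string& name) {
        if (_recycled.size() < _max_recycled) {
            // First rename: mark the file so replay never mistakes it for a live segment.
            _recycled.push_back("Recycled-" + name);
        } else {
            // Recycle list is full: really delete.
            ++deleted;
        }
    }

    // Returns the name of the next segment, reusing a recycled file if one exists.
    std::string allocate() {
        std::string name = "Segment-" + std::to_string(_next_id++);
        if (!_recycled.empty()) {
            // Second rename: the recycled file takes the next consecutive name.
            _recycled.pop_front();
            ++reused;
        } else {
            ++created;
        }
        return name;
    }
};
```

Bounding the list trades some reuse for a cap on the number of idle files kept on disk.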
Calle Wilund
b13b6ef6a0 commitlog: Terminate all segments with a zero chunk
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
2018-12-10 09:09:07 +00:00
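The zero-chunk terminator from b13b6ef6a0 can be illustrated with a deliberately simplified layout. It is assumed here that each chunk is a 4-byte native-endian length followed by the payload, which is not the real commitlog chunk header. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Appends one chunk: 4-byte length, then the payload bytes.
void append_chunk(std::vector<char>& seg, const std::string& payload) {
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    const char* p = reinterpret_cast<const char*>(&len);
    seg.insert(seg.end(), p, p + sizeof(len));
    seg.insert(seg.end(), payload.begin(), payload.end());
}

// Writes the end-of-segment marker: a chunk header of zero.
void terminate_segment(std::vector<char>& seg) {
    append_chunk(seg, "");
}

// Reads chunks until the zero header, a truncated chunk, or the end of the buffer.
std::vector<std::string> replay(const std::vector<char>& seg) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos + sizeof(std::uint32_t) <= seg.size()) {
        std::uint32_t len;
        std::memcpy(&len, seg.data() + pos, sizeof(len));
        pos += sizeof(len);
        if (len == 0) {
            break;  // terminator: stop gracefully, ignore whatever follows
        }
        if (len > seg.size() - pos) {
            break;  // truncated chunk, e.g. after a mid-write crash
        }
        out.emplace_back(seg.data() + pos, len);
        pos += len;
    }
    return out;
}
```

With the terminator in place, stale bytes left behind in a reused file are never reached by the replay loop.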
Calle Wilund
b35af84599 commitlog_replay: Enforce file name based id matching
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore if
mismatched. This is a prerequisite for re-using renamed commitlog files,
as we can then fail-fast should one such be left on disk, instead of
trying to replay it.

We also check said id via the CRC check for each chunk parsed. If we
find a chunk with a mismatched id, we will get a CRC error for the chunk,
and replay will terminate (albeit not gracefully).
2018-12-10 09:09:07 +00:00
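The header check in b35af84599 can be sketched as follows. The file naming scheme "CommitLog-<version>-<id>.log" and both function names are assumptions for this example. The real replayer also folds the id into per-chunk CRCs, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// Extracts the segment id from a name of the assumed form
// "CommitLog-<version>-<id>.log"; returns nullopt for any other name.
std::optional<std::uint64_t> id_from_file_name(const std::string& name) {
    // At most 19 digits, so the value always fits in 64 bits.
    static const std::regex re(R"(CommitLog-(\d+)-(\d{1,19})\.log)");
    std::smatch m;
    if (!std::regex_match(name, m, re)) {
        return std::nullopt;
    }
    return std::stoull(m[2].str());
}

// A segment is replayed only if the id stored in its header matches its name.
bool should_replay(const std::string& file_name, std::uint64_t header_id) {
    auto id = id_from_file_name(file_name);
    return id.has_value() && *id == header_id;
}
```

A renamed file keeps the header of its previous life, so the mismatch is detected before any entry is parsed.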
Avi Kivity
89be47e291 batchlog_manager: remove dependency on db::config
Extract configuration into a new struct batchlog_manager_config and have the
callers populate it using db::config. This reduces dependencies on global objects.
2018-12-09 20:11:38 +02:00
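The pattern in 89be47e291 (a component declares a small config struct and its caller fills it from the global configuration) might look like the sketch below. All type names, field names and default values are hypothetical and are not copied from db::config or batchlog_manager_config.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Stand-in for a large global configuration class with many unrelated options.
struct global_config {
    std::uint32_t write_request_timeout_in_ms = 2000;
    std::uint64_t replay_throttle_in_kb = 1024;
    std::string cluster_name = "test";
};

// Only the settings the component cares about, in the units it wants.
struct batchlog_config {
    std::chrono::milliseconds write_request_timeout;
    std::uint64_t replay_rate_bytes_per_sec;
};

// Translation lives with the caller, which already depends on the global config.
batchlog_config make_batchlog_config(const global_config& cfg) {
    return batchlog_config{
        std::chrono::milliseconds(cfg.write_request_timeout_in_ms),
        cfg.replay_throttle_in_kb * 1024,
    };
}

// The component itself never sees global_config.
class batchlog_component {
    batchlog_config _cfg;
public:
    explicit batchlog_component(batchlog_config cfg) : _cfg(cfg) {}
    std::chrono::milliseconds timeout() const { return _cfg.write_request_timeout; }
    std::uint64_t replay_rate() const { return _cfg.replay_rate_bytes_per_sec; }
};
```

Adding an unrelated option to global_config then recompiles only the translation function's file, not the component.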
Avi Kivity
864f55e745 config: remove inclusions of db/config.hh from header files
Instead, distribute those inclusions to .cc files that require them. This
reduces rebuilds when config.hh changes, and makes it easier to locate files
that need config disaggregation.
2018-12-09 20:11:38 +02:00
Vladimir Krivopalov
6a5d8934a6 db: Enable SSTables 'mc' format by default.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>
2018-12-08 11:07:38 +02:00
Paweł Dziepak
9024187222 partition_slice: use small_vector for column_ids
2018-12-06 14:21:04 +00:00