209 Commits

Author SHA1 Message Date
Rafael Ávila de Espíndola
fd5ea2df5a Avoid including cryptopp headers
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793

To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.

While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
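
The pimpl workaround described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual Scylla hasher: the class name `hasher`, the file split shown in comments, and the toy FNV-1a hash standing in for cryptopp are all assumptions made for the example. The point is only that the header-side class declaration needs no third-party include, so a stray `#pragma GCC diagnostic ignored` cannot leak into code that includes it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string_view>

// --- hasher.hh (hypothetical): no third-party headers are included here ---
class hasher {
    struct impl;                       // defined only in the .cc file
    std::unique_ptr<impl> _impl;
public:
    hasher();
    ~hasher();                         // out of line: impl is incomplete here
    void update(std::string_view data);
    std::uint64_t finalize() const;
};

// --- hasher.cc (hypothetical): the single file that would include the
// third-party headers. A toy FNV-1a hash stands in for cryptopp. ---
struct hasher::impl {
    std::uint64_t state = 14695981039346656037ull;   // FNV-1a 64-bit offset basis
};

hasher::hasher() : _impl(std::make_unique<impl>()) {}
hasher::~hasher() = default;

void hasher::update(std::string_view data) {
    for (unsigned char c : data) {
        _impl->state ^= c;
        _impl->state *= 1099511628211ull;             // FNV-1a 64-bit prime
    }
}

std::uint64_t hasher::finalize() const {
    return _impl->state;
}
```

Callers only see `update()` and `finalize()`; feeding the same bytes in one call or in several yields the same value, which is what a streaming hasher interface needs.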
Calle Wilund
4e657c0633 system_keyspace: Add waitable for trunc. migration
For tests. Hooray for separation of concerns.
2019-02-13 09:08:12 +00:00
Calle Wilund
64e8c6f31d storage_service: Add features disabling for tests 2019-02-13 09:08:12 +00:00
Calle Wilund
12ebcf1ec7 commitlog_replay: Use dedicated table for truncation
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. This makes queries and updates
simpler and is easier on query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
2019-02-13 09:08:12 +00:00
Avi Kivity
6c71eae63f Merge "API: Stream compaction history records" from Amnon
"
get_compaction_history can return a lot of records which will add up to a
big http reply.

This series makes sure it will not create large allocations when
returning the results.

It adds an api to the query_processor to use paged queries with a
consumer function that returns a future, this way we can use the http
stream after each record.

This implementation will prevent large allocations and stalls.

Fixes #4152
"

* 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev:
  tests/query_processor_test: add query_with_consumer_test
  system_keyspace, api: stream get_compaction_history
  query_processor: query and for_each_cql_result with future
2019-02-05 14:16:36 +02:00
Amnon Heiman
6c7742d616 system_keyspace, api: stream get_compaction_history
get_compaction_history can return a big chunk of data.

To prevent large memory allocations, get_compaction_history now reads
each compaction_history record and uses the http stream to send it.

Fixes #4152

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
Piotr Jastrzebski
834bec5cc9 Read shard awareness columns as dropped
Without this, a new version of Scylla won't be able to
start with system tables inherited from an older version
that had shard awareness columns.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>
2019-02-04 18:43:11 +02:00
Piotr Jastrzebski
ad217bbdc7 Revert "system_keyspace: add sharding information to local table"
This reverts commit bdce561ada.

Those columns are not used and cause problems with tools.

Refs #4112
Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>
2019-01-31 19:06:55 +01:00
Botond Dénes
4e89dea9ea database: don't allow access to global semaphores
Recently we had a bug (#4096) due to a component
(`multishard_mutation_query()`) assuming that all reads used the
semaphore obtainable via `database::user_read_concurrency_sem()`.
This problem revealed that it is plain wrong to allow access to the
shard-global semaphores residing in the database object. Instead all
code wishing to access the relevant semaphore for some read, should do
so via the relevant `table` object, thus guaranteeing that it will get
the correct semaphore, configured for that table.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>
2019-01-21 16:29:02 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
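
As an illustration of the C++17 types named in the commit above, here is a small sketch using `std::optional`, `std::string_view` and `std::variant`. The helper names (`table_part`, `describe`, `describe_visitor`) are hypothetical, and the example uses plain `std::visit` rather than `seastar::visit`, so it shows the standard types only, not the actual call sites that were converted.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical helper: returns the text after the first '.', if any.
std::optional<std::string_view> table_part(std::string_view qualified) {
    auto dot = qualified.find('.');
    if (dot == std::string_view::npos) {
        return std::nullopt;
    }
    return qualified.substr(dot + 1);
}

// A value that is either an int or a string.
using value = std::variant<int, std::string>;

// Visitor with one overload per alternative.
struct describe_visitor {
    std::string operator()(int i) const { return "int:" + std::to_string(i); }
    std::string operator()(const std::string& s) const { return "text:" + s; }
};

std::string describe(const value& v) {
    return std::visit(describe_visitor{}, v);
}
```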
Avi Kivity
0c0cc66ee7 system_keyspace, view: reduce interdependencies
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.

This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
2018-12-29 12:12:15 +00:00
Tomasz Grabiec
7747f2dde3 Merge "nodetool toppartitions" from Rafi & Avi
Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write
operations over a period of time.

Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
  interfaces with data_listeners and the REST api),
- REST api for toppartitions query.

Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).

What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).

Fixes #2811

* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
  top_k: whitespace and minor fixes
  top_k: map template arguments
  top_k: std::list -> chunked_vector
  top_k: support for appending top_k results
  nodetool toppartitions: refactor table::config constructor
  nodetool toppartitions: data listeners
  nodetool toppartitions: add data_listeners to database/table
  nodetool toppartitions: fully_qualified_cf_name
  nodetool toppartitions: Toppartitions query implementation
  nodetool toppartitions: Toppartitions query REST API
  nodetool toppartitions: nodetool-toppartitions script
2018-12-28 16:31:24 +01:00
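
For readers unfamiliar with stream summaries, the sketch below shows the Space-Saving algorithm, one common way to track the top-k most frequent keys in bounded memory. Whether the merged `top_k` code works exactly like this is an assumption; the class name `space_saving_top_k` and its interface are hypothetical, and the linear scan for the minimum is chosen for brevity, not speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class space_saving_top_k {
    std::size_t _capacity;                                   // max tracked keys
    std::unordered_map<std::string, std::uint64_t> _counts;  // key -> estimated count
public:
    explicit space_saving_top_k(std::size_t capacity) : _capacity(capacity) {}

    void add(const std::string& key) {
        if (_capacity == 0) {
            return;
        }
        auto it = _counts.find(key);
        if (it != _counts.end()) {
            ++it->second;                 // already tracked: bump its counter
            return;
        }
        if (_counts.size() < _capacity) {
            _counts.emplace(key, 1);      // free slot: start tracking
            return;
        }
        // Table is full: evict the smallest counter. The newcomer inherits
        // that count plus one, so counts may be overestimated, never under.
        auto min_it = std::min_element(_counts.begin(), _counts.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        std::uint64_t inherited = min_it->second + 1;
        _counts.erase(min_it);
        _counts.emplace(key, inherited);
    }

    // Returns up to k tracked keys, highest estimated count first.
    std::vector<std::pair<std::string, std::uint64_t>> top(std::size_t k) const {
        std::vector<std::pair<std::string, std::uint64_t>> out(_counts.begin(), _counts.end());
        std::sort(out.begin(), out.end(),
            [](const auto& a, const auto& b) { return a.second > b.second; });
        if (out.size() > k) {
            out.resize(k);
        }
        return out;
    }
};
```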
Rafi Einstein
038f8c7988 nodetool toppartitions: refactor table::config constructor
Eliminate extra parameters to ctor and deduce them instead from db param.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Avi Kivity
d77e044cde db: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
04b70a2ff8 system_keyspace: simplify complicated sprint()
update_peer_info() uses two sprint()s where one would do, which confuses
the sprint-to-fmt translator. Simplify the code by using just one call.
2018-11-01 13:16:17 +00:00
Benny Halevy
2a57c454f2 update_compaction_history: handle execute_cql exception
Fixes #3774

Tested using view_schema_test with and without injecting an exception in
modification_statement::do_execute for "compaction_history".

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>
2018-10-24 18:39:53 +03:00
Tomasz Grabiec
10f6b125c8 database: Run system table flushes in the main scheduling group
memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.

It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.

I observed writes to the system keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.

Fixes #3717.

Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
2018-08-23 15:07:05 +03:00
Duarte Nunes
2fa7f10429 db/system_keyspace: Add function to remove view build status of a shard
This patch adds a function that clears the view build in-progress
status for the current shard, similar to the existing one that clears
it across all shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-07-11 21:27:39 +01:00
Avi Kivity
6f23403137 Merge "Virtualize IndexInfo system table" from Duarte
"
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.

Fixes #3483
"

* 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla:
  tests/virtual_reader_test: Add test for built indexes virtual reader
  db/system_keysace: Add virtual reader for IndexInfo table
  db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
  index/secondary_index_manager: Expose index_table_name()
  db/legacy_schema_migrator: Don't migrate indexes
2018-06-06 17:35:51 +03:00
Glauber Costa
bdce561ada system_keyspace: add sharding information to local table
We would like the clients to be able to route work directly to the right
shards. To do that, they need to know the sharding algorithm and its
parameters.

The algorithm can be copied into the client, but the parameters need to
be exported somewhere. Let's use the local table for that.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
---
v2: force msb to zero on non-murmur
2018-06-04 11:25:58 -04:00
Duarte Nunes
3e39985c7a db/system_keysace: Add virtual reader for IndexInfo table
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.

Fixes #3483

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
65c4205334 db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
This patch adds the same comment that exists in Apache Cassandra,
explaining that the table_name column in the IndexInfo system table
actually refers to the keyspace name. Don't be fooled.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
7187963bda db/legacy_schema_migrator: Don't migrate indexes
Previous versions contained no indexes, and Apache Cassandra indexes
cannot be migrated to Scylla.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-06-04 11:14:17 +01:00
Duarte Nunes
a23bda3393 Merge 'Implement separate timeout for range queries' from Avi
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.

While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement its requirements
instead of making their own decisions.

Fixes #3013.

Tests: unit (release)
"

* tag '3013/v1.1' of https://github.com/avikivity/scylla:
  storage_proxy: remove default_query_timeout()
  storage_proxy: don't use default timeouts
  query_options: augment with timeout_config
  thrift: configure thrift transport and handler with a timeout_config
  transport: configure native transport with a timeout_config
  cql3: define and populate timeout_config_selector
  timeout_config: introduce timeout configuration
2018-05-13 20:05:50 +02:00
Piotr Sarna
fe02c3d0e2 database, sstables, tests: add large_partition_handler
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
Proper large_partition_handler is retrievable from config information
and is based on existing compaction_large_partition_warning_threshold_mb
entry. Right now CQL TABLE variant of large_partition_handler is used
in the database.

Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
2018-05-04 14:38:13 +02:00
Piotr Sarna
02822efbc8 db: add system.large_partitions table
This commit adds a system.large_partitions table, which can be used
to trace largest partitions of a cluster.
Schema: (
  keyspace_name text,
  table_name text,
  sstable_name text,
  partition_size bigint,
  key text,
  compaction_time timestamp,
  PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);

References #3292
2018-05-04 12:45:40 +02:00
Avi Kivity
d8dd7e05a7 storage_proxy: don't use default timeouts
Require all callers to supply timeouts instead of relying on defaults.

Since all callers now have the timeouts set up, they can easily supply
them.
2018-04-30 13:19:53 +03:00
Calle Wilund
b1edf75c8b types: Make seastar::inet_address the "native" type for CQL inet.
Fixes #3187

Requires seastar "inet_address: Add constructor and conversion function
from/to IPv4"

Implements IPv6 support for CQL inet data. The actual data stored will
now vary between 4 and 16 bytes. gms::inet_address has been augmented
to interop with seastar::inet_address, though of course actually trying
to use an IPv6 address there or in any of its tables will throw badly.

Tests assuming ipv4 changed. Storing an ipv4_address should be
transparent, as it now "widens". However, since all ipv4 is
inet_address, but not vice versa, there is no implicit overloading on
the read paths. I.e. tests and system_keyspace (where we read ip
addresses from tables explicitly) are modified to use the proper type.
Message-Id: <20180424161817.26316-1-calle@scylladb.com>
2018-04-24 23:12:07 +01:00
Duarte Nunes
75bb66a50d db/system_keyspace: scylla_views_builds_in_progress writes are user mem
Treat writes to scylla_views_builds_in_progress as user memory, as the
number of writes is dependent on the amount of user data on views
(times the number of views, divided by the view building batch size).

Fixes #3325

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-04-03 13:16:28 +01:00
Duarte Nunes
4227641a3d db/system_keyspace: Add API for MV-related system tables
This patch implements an API to access the MV-related system tables,
which pertain to the view building process.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
b2cae7ea09 db/system_keyspace: Add virtual reader for MV in-progress build status
Provide a virtual reader so users can query the in-progress view table
in a way compatible with Apache Cassandra.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
7811474697 db/system_keyspace: Add Scylla-specific MV system table
When building a materialized view, we divide our work by shard, so we
need to register which shard did what work in the in-progress system
table. We also add the token we started at, which will enable some
optimizations in the view building code.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
38831888d2 db/system_keyspace: Include MV system tables in all_tables()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Botond Dénes
2e2abf6edb storage_proxy: add coordinator_query_options and coordinator_query_result
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
2018-03-19 15:17:35 +02:00
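
The aggregation idea in the commit above can be sketched as below. The struct and function names (`query_options_sketch`, `query_result_sketch`, `run_query`) and the string-based `replica_id` are hypothetical stand-ins, not the real `coordinator_query_options` or `storage_proxy` signatures. The sketch only shows why adding a field to either struct leaves existing call sites untouched.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using replica_id = std::string;   // hypothetical stand-in for a replica address

// All non-mandatory inputs live in one struct; new fields get defaults.
struct query_options_sketch {
    std::vector<replica_id> preferred_replicas;
    bool trace = false;
};

// All outputs live in one struct as well.
struct query_result_sketch {
    std::vector<std::string> rows;
    std::vector<replica_id> last_replicas;
};

// The signature stays stable when options or results grow.
query_result_sketch run_query(const std::string& cmd, query_options_sketch opts = {}) {
    query_result_sketch res;
    res.rows.push_back("result of " + cmd);
    if (opts.preferred_replicas.empty()) {
        res.last_replicas = {"replica-1"};            // no hint: pick some replica
    } else {
        res.last_replicas = std::move(opts.preferred_replicas);  // honor the hint
    }
    return res;
}
```

A paged caller would feed `last_replicas` from one page back in as `preferred_replicas` for the next, so that consecutive pages land on the same replicas.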
Botond Dénes
eac597d726 Add preferred and last replicas to the signature of query()
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.

This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
The reason for separating this is to reduce noise and improve
reviewability for those functional changes later.
2018-03-13 10:34:34 +02:00
Avi Kivity
4f6b892aa1 cql3: remove #include of system_keyspace.hh
We include system_keyspace for just the string "system" (and a related
is_system_keyspace() function). Replace with forward-declared functions.
2018-03-11 18:02:23 +02:00
Botond Dénes
1259031af3 Use the reader_concurrency_semaphore to limit reader concurrency 2018-03-08 14:12:12 +02:00
Duarte Nunes
9254a9a6fe db/system_keyspace: Move dependency on db/schema_tables to source file
And add missing dependencies to header file.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180307111304.2914-1-duarte@scylladb.com>
2018-03-07 14:45:36 +02:00
José Guilherme Vanz
380bc0aa0d Swap arguments order of mutation constructor
Swap arguments in the mutation constructor keeping the same standard
from the constructor variants. Refs #3084

Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com>
Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>
2018-01-21 12:58:42 +02:00
Glauber Costa
08a0c3714c allow request-specific read timeouts in storage proxy reads
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
want no timeout there at all.

We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.

This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.

In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
but the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.

We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.

After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.

Fixes #2462

Implementation notes, and general comments about open discussion in 2462:

* Because we are not bypassing the timeout, just setting it high enough,
  I consider the concerns about the batchlog moot: if we fail for any
  other reason that will be propagated. Last case, because the timeout
  is per-CF, we could do what we do for the dirty memory manager and
  move the batchlog alone to use a different timeout setting.

* Storage proxy likes specifying its timeouts as a time_point, whereas
  when we get low enough as to deal with the read_concurrency_config,
  we are talking about deltas. So at some point we need to convert time_points
  to durations. We do that in the database query functions.

v2:
- use per-request instead of per-table timeouts.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-12 07:43:21 -05:00
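
The time_point-to-duration conversion mentioned in the implementation notes above might look roughly like this. The function name `remaining` and the choice of `std::chrono::steady_clock` are assumptions for the sketch; the actual database query functions may use a different clock and different clamping rules.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Converts an absolute deadline into the time left as seen from `now`.
// A deadline that has already passed yields a zero duration.
clock_type::duration remaining(clock_type::time_point deadline,
                               clock_type::time_point now) {
    if (deadline <= now) {
        return clock_type::duration::zero();
    }
    return deadline - now;
}
```

Taking `now` as a parameter keeps the conversion deterministic, which also means every executor that shares one deadline derives its delta from the same timeout point.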
Botond Dénes
fea6214a0a Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
2017-10-03 12:44:17 +03:00
Botond Dénes
47e07b787e restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-10-03 12:44:12 +03:00
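
The accounting scheme described in the commit above can be modelled with a simple synchronous budget, sketched below. This is a simplification under stated assumptions: the class `reader_memory_budget` and its method names are hypothetical, and where the real code defers reader creation on a semaphore until resources are available, this sketch just reports whether admission would succeed.

```cpp
#include <cassert>
#include <cstddef>

class reader_memory_budget {
    std::ptrdiff_t _available;   // may go negative: admitted readers are never blocked
public:
    // Up-front charge for a new reader, per the commit message (16k units).
    static constexpr std::ptrdiff_t new_reader_cost = 16 * 1024;

    explicit reader_memory_budget(std::ptrdiff_t total) : _available(total) {}

    // Only new readers are subject to the admission check.
    bool try_admit_reader() {
        if (_available < new_reader_cost) {
            return false;
        }
        _available -= new_reader_cost;
        return true;
    }

    // Buffers handed out to already admitted readers are always accounted,
    // even if that pushes the budget below zero.
    void on_buffer_allocated(std::ptrdiff_t bytes) { _available -= bytes; }
    void on_buffer_released(std::ptrdiff_t bytes) { _available += bytes; }

    // Returns the up-front charge when a reader goes away.
    void on_reader_closed() { _available += new_reader_cost; }

    std::ptrdiff_t available() const { return _available; }
};
```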
Avi Kivity
78eae8bf48 Revert "Merge "Make restricting_mutation_reader more accurate" from Botond"
This reverts commit c6e5dcc556, reversing
changes made to 19b21a0ab2. Fails to build,
plus author has more changes.
2017-10-03 11:58:59 +03:00
Botond Dénes
43dba8f173 Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics.
2017-09-20 11:16:21 +03:00
Botond Dénes
33e97e7457 restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
2017-09-20 11:14:35 +03:00
Avi Kivity
e44517851e untyped_result_set: reduce dependencies
Forward-declare untyped_result_set and untyped_result_set_row, and remove
the include from query_processor.hh.
Message-Id: <20170916170859.27612-3-avi@scylladb.com>
2017-09-18 15:15:15 +02:00
Avi Kivity
0aaefe665b system_keyspace: add missing include 2017-09-11 20:09:45 +03:00
Piotr Jastrzebski
dd5dc75605 Stop calling _local_cache.stop in at_exit.
This removes a race condition that was causing #2721

Fixes #2721

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <ad060fab43d63c17db9f811c421d7ab26e5e57c8.1503933021.git.piotr@scylladb.com>
2017-09-03 15:55:48 +03:00
Avi Kivity
ebff739a84 Merge "use paging for compaction history" from Amnon
"This series adds an option to use paging in internal queries and uses it for the
get compaction history function.

Internal paging is done explicitly. To use paging, you first create a
state object (which also contains the query) and use that state to get the
first page. The result contains both the query result and a new state that
can be used to get the next page.

Fixes #2366"

* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
  system_keyspace: Use paging for get compaction history
  Add paging for internal queries
  query_options: Allows creating query_options from query_options
2017-08-02 18:15:58 +03:00
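
The explicit paging protocol described in the merge message above (create a state, fetch a page, receive a new state) can be sketched as follows. All names here (`paging_state`, `page`, `fetch_page`) are hypothetical, and an in-memory vector of ints stands in for the table being queried; the real internal query API is asynchronous and is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The state carries the query itself plus the position to resume from.
struct paging_state {
    std::string query;
    std::size_t page_size = 0;
    std::size_t next_row = 0;
};

// Each fetch returns the rows of one page and the state for the next one.
struct page {
    std::vector<int> rows;
    paging_state next;
    bool has_more = false;
};

page fetch_page(const std::vector<int>& table, paging_state st) {
    std::size_t begin = std::min(st.next_row, table.size());
    std::size_t end = std::min(table.size(), begin + st.page_size);
    page p;
    p.rows.assign(table.begin() + begin, table.begin() + end);
    st.next_row = end;                 // resume point for the following page
    p.has_more = end < table.size();
    p.next = std::move(st);
    return p;
}
```

A caller loops by passing `p.next` back into `fetch_page` until `has_more` is false, so only one page of rows is held at a time.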