cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Fixes#4083
Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.
The code also migrates any existing truncation
positions on startup and clears the old data.
"
get_compaction_history can return a lot of records which will add up to a
big http reply.
This series makes sure it will not create large allocations when
returning the results.
It adds an api to the query_processor to use paged queries with a
consumer function that returns a future, this way we can use the http
stream after each record.
This implementation will prevent large allocations and stalls.
Fixes#4152
"
* 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev:
tests/query_processor_test: add query_with_consumer_test
system_keyspace, api: stream get_compaction_history
query_processor: query and for_each_cql_result with future
get_compaciton_history can return big chunk of data.
To prevent large memory allocation, the get_compaction_history now read
each compaction_history record and use the http stream to send it.
Fixes#4152
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Recently we had a bug (#4096) due to a component
(`multishard_mutation_query()`) assuming that all reads used the
semaphore obtainable via `database::user_read_concurrency_sem()`.
This problem revealed that it is plain wrong to allow access to the
shard-global semaphores residing in the database object. Instead all
code wishing to access the relevant semaphore for some read, should do
so via the relevant `table` object, thus guaranteeing that it will get
the correct semaphore, configured for that table.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.
This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
Implementation of nodetool toppartiotion query, which samples most frequest PKs in read/write
operation over a period of time.
Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
interfaces with data_listeners and the REST api),
- REST api for toppartitions query.
Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).
What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).
Fixes#2811
* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
top_k: whitespace and minor fixes
top_k: map template arguments
top_k: std::list -> chunked_vector
top_k: support for appending top_k results
nodetool toppartitions: refactor table::config constructor
nodetool toppartitions: data listeners
nodetool toppartitions: add data_listeners to database/table
nodetool toppartitions: fully_qualified_cf_name
nodetool toppartitions: Toppartitions query implementation
nodetool toppartitions: Toppartitions query REST API
nodetool toppartitions: nodetool-toppartitions script
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.
It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.
I observed writes to the sytem keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.
Fixes#3717.
Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
This patch adds a function that clears the view build in-progress
status for the current shard, similar to the existing one that clears
it across all shards.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes#3483
"
* 'built-indexes-virtual-reader/v2' of github.com:duarten/scylla:
tests/virtual_reader_test: Add test for built indexes virtual reader
db/system_keysace: Add virtual reader for IndexInfo table
db/system_keyspace: Explain that table_name is the keyspace in IndexInfo
index/secondary_index_manager: Expose index_table_name()
db/legacy_schema_migrator: Don't migrate indexes
We would like the clients to be able to route work directly to the right
shards. To do that, they need to know the sharding algorithm and its
parameters.
The algorithm can be copied into the client, but the parameters need to
be exported somewhere. Let's use the local table for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
---
v2: force msb to zero on non-murmur
The IndexInfo table tracks the secondary indexes that have already
been populated. Since our secondary index implementation is backed by
materialized views, we can virtualize that table so queries are
actually answered by built_views.
Fixes#3483
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the same comment that exists in Apache Cassandra,
explaining that the table_name column in the IndexInfo system table
actually refers to the keyspace name. Don't be fooled.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.
While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement its requirements
instead of making their own decisions.
Fixes#3013.
Tests: unit (release)
"
* tag '3013/v1.1' of https://github.com/avikivity/scylla:
storage_proxy: remove default_query_timeout()
storage_proxy: don't use default timeouts
query_options: augment with timeout_config
thrift: configure thrift transport and handler with a timeout_config
transport: configure native transport with a timeout_config
cql3: define and populate timeout_config_selector
timeout_config: introduce timeout configuration
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
Proper large_partition_handler is retrievable from config information
and is based on existing compaction_large_partition_warning_threshold_mb
entry. Right now CQL TABLE variant of large_partition_handler is used
in the database.
Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
This commit adds a system.large_partitions table, which can be used
to trace largest partitions of a cluster.
Schema: (
keyspace_name text,
table_name text,
sstable_name text,
partition_size bigint,
key text,
compaction_time timestamp,
PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);
References #3292
Fixes#3187
Requires seastar "inet_address: Add constructor and conversion function
from/to IPv4"
Implements support IPv6 for CQL inet data. The actual data stored will
now vary between 4 and 16 bytes. gms::inet_address has been augumented
to interop with seastar::inet_address, though of course actually trying
to use an Ipv6 address there or in any of its tables with throw badly.
Tests assuming ipv4 changed. Storing a ipv4_address should be
transparent, as it now "widens". However, since all ipv4 is
inet_address, but not vice versa, there is no implicit overloading on
the read paths. I.e. tests and system_keyspace (where we read ip
addresses from tables explicitly) are modified to use the proper type.
Message-Id: <20180424161817.26316-1-calle@scylladb.com>
Treat writes to scylla_views_builds_in_progress as user memory, as the
number of writes is dependent on the amount of user data on views
(times the number of views, divided by the view building batch size).
Fixes#3325
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch implements an API to access the MV-related system tables,
which pertain to the view building process.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Provide a virtual reader so users can query the in-progress view table
in a way compatible with Apache Cassandra.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When building a materialized view, we divide our work by shard, so we
need to register which shard did what work in the in-progress system
table. We also add the token we started at, which will enable some
optimizations in the view building code.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.
This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
This reason for separating this is to reduce noise and improve
reviewability for those functional changes later.
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
wan't no timeout there at all.
We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
the the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes#2462
Implementation notes, and general comments about open discussion in 2462:
* Because we are not bypassing the timeout, just setting it high enough,
I consider the concerns about the batchlog moot: if we fail for any
other reason that will be propagated. Last case, because the timeout
is per-CF, we could do what we do for the dirty memory manager and
move the batchlog alone to use a different timeout setting.
* Storage proxy likes specifying its timeouts as a time_point, whereas
when we get low enough as to deal with the read_concurrency_config,
we are talking about deltas. So at some point we need to convert time_points
to durations. We do that in the database query functions.
v2:
- use per-request instead of per-table timeouts.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
Forward-declare untyped_result_set and untyped_result_set_row, and remove
the include from query_processor.hh.
Message-Id: <20170916170859.27612-3-avi@scylladb.com>
"This series adds an option to use paging in internal query and use that for the
get compaction history function.
Internal paging will be done explicitly, to use paging, you first create a
state object (that contains the query as well) and use that state to get the
first page, the result will contain both the query result and a new state that
can be used to get the next page.
Fixes#2366"
* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
system_keyspace: Use paging for get compaction history
Add paging for internal queries
query_options: Allows creating query_options from query_options