Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emmited by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account their own memory-consumption, apart from the buffers they
will allocate. Creating the reader will be deferred to when there are
enough resources to create it. As before only new readers will be
blocked on an exhausted semaphore, existing readers can continue to
work.
Forward-declare untyped_result_set and untyped_result_set_row, and remove
the include from query_processor.hh.
Message-Id: <20170916170859.27612-3-avi@scylladb.com>
"This series adds an option to use paging in internal query and use that for the
get compaction history function.
Internal paging will be done explicitly, to use paging, you first create a
state object (that contains the query as well) and use that state to get the
first page, the result will contain both the query result and a new state that
can be used to get the next page.
Fixes#2366"
* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
system_keyspace: Use paging for get compaction history
Add paging for internal queries
query_options: Allows creating query_options from query_options
there could be a lot of compactions when querying for compaction
history.
This patch changes the query to use paging. It would collect all results
when returning to the caller.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
- 'net' namespace conflicts with seastar::net, renamed to 'netw'.
- 'transport' namespace conflicts with seastar::transport, renamed to
cql_transport.
- "logger" global variables now conflict with logger global type, renamed
to xlogger.
- other minor changes
Having a varadic parameter being used in implicit sprint is not
very readable + makes it less intuitive when suddenly system keyspace
becomes more than one -> multiple sprints in the chain -> more confusion
or more execution paths.
Its not that horrible with some spread out sprint:s
org::apache::cassandra (the generated namespace name) gets confused with
apache::cassandra (the thrift runtime library namespace), either due to
changes in gcc 7 or in thrift 0.10. Either way, the problem is fixed
by changing the generated namespace to plain cassandra.
This patch moves some duplicate code into the
add_column_family_and_create_directory() function. It also saves some
superfluous keyspace lookups and readies the code to be used by
materialized views.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Batchlog is a potentially memory-intensive table whose workload is
driven by user needs, not system's. Move it to the user dirty memory
manager.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Now that access to the size_estimates system is virtualized, we no
longer need the recorder.
Fixes#1616
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch add a virtual mutation_reader so that queries
to the size_estimates system table are handled by the engine
without needing to perform any IO.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a function to system_keyspace responsible for creating
a mutation to a partition of the size_estimates system table from a
set of range_estimates.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the range_estimates struct so that the tokens are
represented as utf8 encoded bytes. This will make future patches
require less conversions.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed in the deferring point, we need to go and find it
again.
If we use lw_shared_ptr, however, we'll be able to at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of re-finding it just fine.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.
Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
The query_size_estimates() function queries the size_estimates system
table for a given keyspace and table, filtering out the token ranges
according to the specified tokens.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch makes range_estimates a proper struct, where tokens are
represented as dht::tokens rather than dht::ring_position*.
We also pass other arguments to update_ and clear_size_estimates by
copy, since one will already be required.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
This patch implements functions that allow the size_estimates system
table to be updated and cleared. The size_estimates table is updated
per schema with a set of token ranges and the associated estimations
of how many partitions there are and their mean size.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
In the spirit of what we are doing for the read semaphore, this patch moves
system writes to its own dirty memory manager. Not only will it make sure that
system tables will not be serialized by its own semaphore, but it will also put
system tables in its own region group.
Moving system tables to its own region group has the advantage that system
requests won't be waiting during throttle behind a potentially big queue of user
requests, since requests are tended to in FIFO order within the same region
group. However, system tables being more controlled and predictable, we can
actually go a step further and give them some extra reservation so they may not
necessarily block even if under pressure (up to 10 MB more).
Signed-off-by: Glauber Costa <glauber@scylladb.com>
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.
To limit the damage, add two limits on the queue:
- a timeout, which is equal to the read timeout
- a queue length limit, which is equal to 2% of the shard memory divided
by an estimate of the queued request size (1kb)
Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>
Since reading mutations can consume a large amount of memory, which, moreover,
is not predicatable at the time the read is initiated, restrict the number
of reads to 100 per shard. This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.
Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled. This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.
Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads. Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
"There is a need to have an ability to detect whether a feature is
supported by entire cluster. The way to do it is to advertise feature
availability over gossip and then each node will be able to check if all
other nodes have a feature in question.
The idea is to have new application state SUPPORTED_FEATURES that will contain
set of strings, each string holding feature name.
This series adds API to do so.
The following patch on top of this series demostreates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout.
--- a/service/storage_service.cc
+++ b/service/storage_service.cc
@@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
// Add features supported by this local node. When a new feature is
// introduced in scylla, update it here, e.g.,
// return sstring("FEATURE1,FEATURE2")
- return sstring("");
+ return sstring("FEATURE1,FEATURE2");
}
std::set<inet_address> get_seeds() {
@@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
// gossip snitch infos (local DC and rack)
gossip_snitch_info().get();
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
+ logger.info("Wait for FEATURE1 and FEATURE2 done");
+ gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
+ logger.info("Wait for FEATURE3 done");
+
We can query the supported_features:
cqlsh> SELECT supported_features from system.peers;
supported_features
--------------------
FEATURE1,FEATURE2
FEATURE1,FEATURE2
(2 rows)
cqlsh> SELECT supported_features from system.local;
supported_features
--------------------
FEATURE1,FEATURE2
(1 rows)"