168 Commits

Author SHA1 Message Date
Botond Dénes
fea6214a0a Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics. Use labels to distinguish between system, user and
streaming reads related metrics.
2017-10-03 12:44:17 +03:00
Botond Dénes
47e07b787e restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred until there are
enough resources to create it. As before, only new readers will be
blocked on an exhausted semaphore; existing readers can continue to
work.
2017-10-03 12:44:12 +03:00
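The admission rule described in this commit message can be illustrated with a minimal, single-threaded sketch in standard C++. It is not the Seastar-based implementation: `memory_budget` and its member functions are hypothetical names, and letting the budget go negative for buffers of already admitted readers is an assumption made here to model "existing readers can continue to work".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, single-threaded stand-in for the semaphore described above.
class memory_budget {
    std::int64_t _available;
public:
    // Up-front cost charged to every new reader (16k units, per the commit message).
    static constexpr std::int64_t reader_base_cost = 16 * 1024;

    explicit memory_budget(std::int64_t total) : _available(total) {
        assert(total >= 0);
    }

    // New readers are admitted only if the base cost fits in what is left.
    // A real implementation would defer creation instead of returning false.
    bool try_admit_reader() {
        if (_available < reader_base_cost) {
            return false;
        }
        _available -= reader_base_cost;
        return true;
    }

    // Buffers emitted by an already admitted reader are always accounted,
    // even if that drives the budget negative (assumption of this sketch).
    void track_buffer(std::size_t bytes) {
        _available -= static_cast<std::int64_t>(bytes);
    }

    // Called when a tracked buffer is destroyed.
    void release_buffer(std::size_t bytes) {
        _available += static_cast<std::int64_t>(bytes);
    }

    // Called when a reader is destroyed.
    void release_reader() { _available += reader_base_cost; }

    std::int64_t available() const { return _available; }
};
```

In this model only `try_admit_reader()` can refuse work, which mirrors the statement that new readers are the only ones blocked on an exhausted semaphore.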
Avi Kivity
78eae8bf48 Revert "Merge "Make restricting_mutation_reader more accurate" from Botond"
This reverts commit c6e5dcc556, reversing
changes made to 19b21a0ab2. Fails to build,
plus author has more changes.
2017-10-03 11:58:59 +03:00
Botond Dénes
43dba8f173 Update reader restriction related metrics
Update description of existing reader count metrics, add memory
consumption metrics.
2017-09-20 11:16:21 +03:00
Botond Dénes
33e97e7457 restricted_mutation_reader: restrict based-on memory consumption
Restrict readers based on their memory consumption, instead of the count
of the top-level readers. To do this an interposer is installed at the
input_stream level which tracks buffers emitted by the stream. This way
we can have an accurate picture of the readers' actual memory
consumption.
New readers will consume 16k units from the semaphore up-front. This is
to account for their own memory consumption, apart from the buffers they
will allocate. Creating the reader will be deferred until there are
enough resources to create it. As before, only new readers will be
blocked on an exhausted semaphore; existing readers can continue to
work.
2017-09-20 11:14:35 +03:00
Avi Kivity
e44517851e untyped_result_set: reduce dependencies
Forward-declare untyped_result_set and untyped_result_set_row, and remove
the include from query_processor.hh.
Message-Id: <20170916170859.27612-3-avi@scylladb.com>
2017-09-18 15:15:15 +02:00
Avi Kivity
0aaefe665b system_keyspace: add missing include
2017-09-11 20:09:45 +03:00
Piotr Jastrzebski
dd5dc75605 Stop calling _local_cache.stop in at_exit.
This removes a race condition that was causing #2721

Fixes #2721

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <ad060fab43d63c17db9f811c421d7ab26e5e57c8.1503933021.git.piotr@scylladb.com>
2017-09-03 15:55:48 +03:00
Avi Kivity
ebff739a84 Merge "use paging for compaction history" from Amnon
"This series adds an option to use paging in internal queries and uses it for the
get compaction history function.

Internal paging is done explicitly. To use paging, you first create a
state object (that contains the query as well) and use that state to get the
first page. The result will contain both the query result and a new state that
can be used to get the next page.

Fixes #2366"

* 'amnon/paged_compaction_history_v5' of github.com:cloudius-systems/seastar-dev:
  system_keyspace: Use paging for get compaction history
  Add paging for internal queries
  query_options: Allows creating query_options from query_options
2017-08-02 18:15:58 +03:00
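The explicit paging protocol described in this merge (a state object that carries the query, and a result that carries the next state) can be sketched in standard C++ as follows. All names here (`paging_state`, `page`, `fetch_page`, `fetch_all`) are hypothetical, and a plain vector stands in for the queried table; this is not the query_processor API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical paging state: carries the query text plus the resume position.
struct paging_state {
    std::string query;
    std::size_t page_size;
    std::size_t next_row = 0;
};

// One page of results plus the state needed to fetch the following page.
// `next` is empty once the last page has been returned.
struct page {
    std::vector<int> rows;
    std::optional<paging_state> next;
};

// Stand-in for executing the query; here the "table" is just a vector.
page fetch_page(const std::vector<int>& table, const paging_state& state) {
    assert(state.page_size > 0);
    const std::size_t begin = std::min(state.next_row, table.size());
    const std::size_t end = std::min(begin + state.page_size, table.size());
    page result;
    result.rows.assign(table.begin() + begin, table.begin() + end);
    if (end < table.size()) {
        paging_state next = state;
        next.next_row = end;
        result.next = std::move(next);
    }
    return result;
}

// Walks all pages and collects the rows into a single result before
// returning, similar in spirit to the compaction history change.
std::vector<int> fetch_all(const std::vector<int>& table, paging_state state) {
    std::vector<int> all;
    std::optional<paging_state> current = std::move(state);
    while (current) {
        page p = fetch_page(table, *current);
        all.insert(all.end(), p.rows.begin(), p.rows.end());
        current = std::move(p.next);
    }
    return all;
}
```

The caller never inspects the state; it only hands the state from one result back into the next call, which keeps the paging position opaque.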
Amnon Heiman
e345d05ebe system_keyspace: Use paging for get compaction history
There could be a lot of compactions when querying for compaction
history.

This patch changes the query to use paging. It collects all results
before returning to the caller.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2017-07-20 18:17:49 +03:00
Calle Wilund
7a583585a2 system_keyspace: Make sure "system" is written to keyspaces (visible)
Fixes #2514

Bug in schema version 3 update: We failed to write "system" to the
schema tables. Only visible on an empty instance of course.

Message-Id: <1500469809-23546-2-git-send-email-calle@scylladb.com>
2017-07-19 16:18:56 +03:00
Avi Kivity
f0b20be14d Revert "system_keyspace: Make sure "system" is written to keyspaces (visible)"
This reverts commit 89ef69c4b3. Prevents nodes
from joining the cluster.
2017-06-21 16:58:04 +03:00
Calle Wilund
89ef69c4b3 system_keyspace: Make sure "system" is written to keyspaces (visible)
Fixes #2514
Bug in schema version 3 update: We failed to write "system" to the
schema tables. Only visible on an empty instance of course.
Message-Id: <1497966982-10044-1-git-send-email-calle@scylladb.com>
2017-06-20 20:59:47 +02:00
Gleb Natapov
69c5526301 messaging_service: return cache hit ratio as part of data read
2017-06-13 09:57:14 +03:00
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Calle Wilund
6c8b5fc09d schema_tables: Use v3 schema tables and formats
Switches system/schema_* for system_schema/*, updates schema/schema
builder and uses to hold/expect v3 style info (i.e. types & dropped).
2017-05-10 16:44:48 +00:00
Calle Wilund
8066efb710 system_keyspace: Add getter/setter for built index status
Even though we have none.
2017-05-09 13:48:55 +00:00
Calle Wilund
061ef16562 system_tables/schema_tables: Remove special format case of "execute_cql"
Having a variadic parameter used in an implicit sprint is not
very readable, and it becomes less intuitive once the system keyspace
becomes more than one: multiple sprints in the chain mean more confusion
or more execution paths.
It's not that horrible with some spread-out sprint calls.
2017-05-09 13:48:55 +00:00
Calle Wilund
27fdc5cfef schema_tables/system_tables: Add v3 tables to "ALL" and handle in init
I.e. deal with more than one keyspace in system_keyspace::make
2017-05-09 13:48:55 +00:00
Calle Wilund
2fb36e3bf8 system_keyspace: Add query overloads with named keyspace
2017-05-09 13:48:55 +00:00
Calle Wilund
32909d4c84 system_keyspace: Add v3+legacy schema definitions
2017-05-09 13:48:55 +00:00
Avi Kivity
d542cdddf6 thrift: change generated code namespace
org::apache::cassandra (the generated namespace name) gets confused with
apache::cassandra (the thrift runtime library namespace), either due to
changes in gcc 7 or in thrift 0.10.  Either way, the problem is fixed
by changing the generated namespace to plain cassandra.
2017-05-05 05:26:20 +03:00
Tomasz Grabiec
586dbaa8d3 db: Replace virtual_reader_type with mutation_source_opt
Virtual reader is a mutation_source.
2017-02-23 18:23:52 +01:00
Calle Wilund
ef26ab0e1b db::system_keyspace: Find rpc_address by lookup
2017-02-06 09:45:37 +00:00
Duarte Nunes
40c684b5f5 database: Extract common create cf code
This patch moves some duplicate code into the
add_column_family_and_create_directory() function. It also saves some
superfluous keyspace lookups and readies the code to be used by
materialized views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-20 13:06:11 +00:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Glauber Costa
db7cc3cba8 system keyspace: write batchlog mutation in user memory
Batchlog is a potentially memory-intensive table whose workload is
driven by user needs, not system's. Move it to the user dirty memory
manager.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-12-13 13:59:35 -05:00
Duarte Nunes
6a37d87c76 db: Delete size_estimates_recorder
Now that access to the size_estimates system is virtualized, we no
longer need the recorder.

Fixes #1616

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:15:05 +00:00
Duarte Nunes
225648780d size_estimates: Add virtual reader
This patch adds a virtual mutation_reader so that queries
to the size_estimates system table are handled by the engine
without needing to perform any IO.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:15:05 +00:00
Duarte Nunes
636287fdf2 system_keyspace: Build mutations for size estimates
This patch adds a function to system_keyspace responsible for creating
a mutation to a partition of the size_estimates system table from a
set of range_estimates.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:15:04 +00:00
Duarte Nunes
18ddec245e size_estimates: Store the token range as bytes
This patch changes the range_estimates struct so that the tokens are
represented as UTF-8 encoded bytes. This will make future patches
require fewer conversions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:14:21 +00:00
Duarte Nunes
e7a5162c1d range_estimates: Add schema
This will be used in future patches, when virtualizing the
size_estimates system table.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 10:56:32 +00:00
Tomasz Grabiec
c1a7e2090e Revert "database: change find_column_families signature so it returns a lw_shared_ptr"
This reverts commit f3528ede65.
2016-11-04 10:48:21 +01:00
Glauber Costa
f3528ede65 database: change find_column_families signature so it returns a lw_shared_ptr
There are places in which we need to use the column family object many
times, with deferring points in between. Because the column family may
have been destroyed in the deferring point, we need to go and find it
again.

If we use lw_shared_ptr, however, we'll be able to at least guarantee
that the object will be alive. Some users will still need to check, if
they want to guarantee that the column family wasn't removed. But others
that only need to make sure we don't access an invalid object will be
able to avoid the cost of re-finding it just fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>
2016-11-03 13:27:31 +01:00
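The lifetime argument in this commit message can be illustrated with a small sketch. It uses `std::shared_ptr` as a stand-in for Seastar's `lw_shared_ptr`, and the `database`, `column_family`, `find` and `still_registered` names below are simplified, hypothetical versions written for this example only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Simplified stand-in for a column family object.
struct column_family {
    std::string name;
};

// Hypothetical registry that hands out shared ownership of its entries.
class database {
    std::map<std::string, std::shared_ptr<column_family>> _cfs;
public:
    void add(const std::string& name) {
        _cfs[name] = std::make_shared<column_family>(column_family{name});
    }

    void drop(const std::string& name) { _cfs.erase(name); }

    // Returns a shared owner, or an empty pointer if no such entry exists.
    std::shared_ptr<column_family> find(const std::string& name) const {
        auto it = _cfs.find(name);
        if (it == _cfs.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Callers that care about removal still have to re-check explicitly.
    bool still_registered(const std::shared_ptr<column_family>& cf) const {
        assert(cf);
        auto it = _cfs.find(cf->name);
        return it != _cfs.end() && it->second == cf;
    }
};
```

A caller that keeps the returned pointer across a deferring point can keep using the object safely after `drop()`; only callers that must know whether the column family was removed need the extra `still_registered()` check.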
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Duarte Nunes
e0a43a82c6 system_keyspace: Correctly deal with wrapped ranges
This patch ensures we correctly deal with ranges that wrap around when
querying the size_estimates system table.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>
2016-08-05 19:17:00 +03:00
Duarte Nunes
ecfa04da77 system_keyspace: Add query_size_estimates() function
The query_size_estimates() function queries the size_estimates system
table for a given keyspace and table, filtering out the token ranges
according to the specified tokens.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-24 22:43:58 +00:00
Duarte Nunes
e16f3f2969 system_keyspace: Avoid pointers in range_estimates
This patch makes range_estimates a proper struct, where tokens are
represented as dht::tokens rather than dht::ring_position*.

We also pass other arguments to update_ and clear_size_estimates by
copy, since one will already be required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-24 22:43:35 +00:00
Piotr Jastrzebski
636a4acfd0 Add flag to configure
max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-07-21 09:47:20 +02:00
Vlad Zolotarov
baa6496816 service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor
Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Duarte Nunes
f8f61cf246 system_keyspace: Record and clear size estimates
This patch implements functions that allow the size_estimates system
table to be updated and cleared. The size_estimates table is updated
per schema with a set of token ranges and the associated estimations
of how many partitions there are and their mean size.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-18 23:58:31 +00:00
Glauber Costa
7169b727ea move system tables to its own region
In the spirit of what we are doing for the read semaphore, this patch moves
system writes to their own dirty memory manager. Not only will it make sure that
system tables will not be serialized by its own semaphore, but it will also put
system tables in their own region group.

Moving system tables to their own region group has the advantage that system
requests won't be waiting during throttle behind a potentially big queue of user
requests, since requests are tended to in FIFO order within the same region
group. However, system tables being more controlled and predictable, we can
actually go a step further and give them some extra reservation so they may not
necessarily block even if under pressure (up to 10 MB more).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-07-05 17:46:28 -04:00
Avi Kivity
76cc6408cd Merge "feature check for seed node" from Asias
"This series implements feature check for seed node.
2016-07-05 19:01:01 +03:00
Asias He
6f69963ef9 system_keyspace: Simplify load_host_ids implementation
- Use plain loop instead of do_for_each

- Use row.get_as() instead of row.template get_as()
Message-Id: <3e108d3a6258c0caaf569eb9c79532d9789ea411.1467703722.git.asias@scylladb.com>
2016-07-05 09:47:21 +02:00
Asias He
3f31be58b6 system_keyspace: Simplify load_tokens implementation
- Use plain loop instead of do_for_each

- Use row.get_as() instead of row.template get_as()
Message-Id: <f959ace4f30078695d383c849ed4520169228f97.1467703722.git.asias@scylladb.com>
2016-07-05 09:47:21 +02:00
Asias He
31df4e5316 system_keyspace: Introduce load_peer_features
To get the peer features stored in the system.peers table.
2016-07-05 10:09:53 +08:00
Avi Kivity
9ac730dcc9 mutation_reader: make restricting_mutation_reader even more restricting
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.

To limit the damage, add two limits on the queue:
 - a timeout, which is equal to the read timeout
 - a queue length limit, which is equal to 2% of the shard memory divided
   by an estimate of the queued request size (1kb)

Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>
2016-06-29 15:17:35 +02:00
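The two queue limits described in this commit message amount to simple arithmetic, sketched below in standard C++. The names `max_queued_readers`, `queue_limits` and `make_queue_limits` are hypothetical, and treating "1kb" as 1024 bytes is an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Estimated memory footprint of one queued read.
// Assumption: the "1kb" in the commit message is taken as 1024 bytes.
constexpr std::size_t queued_request_estimate = 1024;

// Queue length limit: 2% of shard memory divided by the per-request estimate.
constexpr std::size_t max_queued_readers(std::size_t shard_memory_bytes) {
    return shard_memory_bytes / 50 / queued_request_estimate;
}

// The two limits placed on the reader queue.
struct queue_limits {
    std::size_t max_length;
    std::chrono::milliseconds timeout;
};

// The queue timeout is set equal to the read timeout.
queue_limits make_queue_limits(std::size_t shard_memory_bytes,
                               std::chrono::milliseconds read_timeout) {
    assert(read_timeout.count() > 0);
    return queue_limits{max_queued_readers(shard_memory_bytes), read_timeout};
}
```

Under these assumptions, a shard with 1 GiB of memory would allow 20971 queued readers, so the queue itself is bounded to roughly 2% of the shard's memory.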
Avi Kivity
edeef03b34 db: restrict replica read concurrency
Since reading mutations can consume a large amount of memory, which, moreover,
is not predictable at the time the read is initiated, restrict the number
of reads to 100 per shard.  This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.

Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled.  This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.

Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads.  Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
2016-06-27 17:17:56 +03:00
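The idea of keeping system reads and user reads on separate limiters, selected by keyspace name, can be sketched as follows. This is plain standard C++ rather than a Seastar semaphore; `read_limiter` and `shard_read_limits` are hypothetical names, and the limit used for system reads is an assumed value because the commit message does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counting limiter; a real implementation would suspend
// waiters instead of returning false.
class read_limiter {
    std::size_t _free;
public:
    explicit read_limiter(std::size_t max_concurrent) : _free(max_concurrent) {
        assert(max_concurrent > 0);
    }

    bool try_acquire() {
        if (_free == 0) {
            return false;
        }
        --_free;
        return true;
    }

    void release() { ++_free; }
};

// Per-shard limits, with a separate limiter for the system keyspace.
struct shard_read_limits {
    read_limiter user_reads{100};   // 100 reads per shard, per the commit message
    read_limiter system_reads{10};  // assumed value, not given in the commit message

    // Selection is by keyspace name, as the commit message describes.
    read_limiter& for_keyspace(const std::string& keyspace) {
        return keyspace == "system" ? system_reads : user_reads;
    }
};
```

Because the two limiters share no state, exhausting the user limiter cannot delay a read from the system keyspace.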
Pekka Enberg
47a904c0f6 Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias
"There is a need to be able to detect whether a feature is
supported by the entire cluster. The way to do it is to advertise feature
availability over gossip, and then each node will be able to check whether all
other nodes have the feature in question.

The idea is to have a new application state, SUPPORTED_FEATURES, that will contain
a set of strings, each string holding a feature name.

This series adds API to do so.

The following patch on top of this series demonstrates how to wait for features
during boot up. FEATURE1 and FEATURE2 are introduced. We use
wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully.
Since FEATURE3 is not supported, that wait will not succeed and will time out.

   --- a/service/storage_service.cc
   +++ b/service/storage_service.cc
   @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() {
        // Add features supported by this local node. When a new feature is
        // introduced in scylla, update it here, e.g.,
        // return sstring("FEATURE1,FEATURE2")
   -    return sstring("");
   +    return sstring("FEATURE1,FEATURE2");
    }

    std::set<inet_address> get_seeds() {
   @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() {
        // gossip snitch infos (local DC and rack)
        gossip_snitch_info().get();

   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get();
   +    logger.info("Wait for FEATURE1 and FEATURE2 done");
   +    gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get();
   +    logger.info("Wait for FEATURE3 done");
   +

We can query the supported_features:

    cqlsh> SELECT supported_features from system.peers;

     supported_features
    --------------------
      FEATURE1,FEATURE2
      FEATURE1,FEATURE2

    (2 rows)
    cqlsh> SELECT supported_features from system.local;

     supported_features
    --------------------
      FEATURE1,FEATURE2

    (1 rows)"
2016-04-08 09:22:50 +03:00
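The cluster-wide check described in this merge comes down to comparing a wanted feature set against the comma-separated strings each node advertises. A minimal sketch in standard C++ follows; `join_features`, `parse_features` and `supported_by_all` are hypothetical helper names for illustration and are not part of the gossiper API, and the waiting and timeout behaviour is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Encodes a feature set as "FEATURE1,FEATURE2", the format shown in the
// system.peers output above.
std::string join_features(const std::set<std::string>& features) {
    std::string csv;
    for (const auto& name : features) {
        // A comma inside a name would corrupt the encoding.
        assert(name.find(',') == std::string::npos);
        if (!csv.empty()) {
            csv += ',';
        }
        csv += name;
    }
    return csv;
}

// Decodes an advertised string back into a set of feature names.
std::set<std::string> parse_features(const std::string& csv) {
    std::set<std::string> features;
    std::istringstream in(csv);
    std::string name;
    while (std::getline(in, name, ',')) {
        if (!name.empty()) {
            features.insert(name);
        }
    }
    return features;
}

// True if every advertised string contains all wanted features.
// An empty list of nodes trivially satisfies the check.
bool supported_by_all(const std::vector<std::string>& advertised,
                      const std::set<std::string>& wanted) {
    return std::all_of(advertised.begin(), advertised.end(),
                       [&wanted](const std::string& csv) {
        const std::set<std::string> have = parse_features(csv);
        // std::includes requires sorted ranges, which std::set guarantees.
        return std::includes(have.begin(), have.end(),
                             wanted.begin(), wanted.end());
    });
}
```

A single node that does not advertise a wanted feature makes the check fail for the whole cluster, which is why a wait for an unsupported feature such as FEATURE3 can only end by timing out.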
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00