scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 23:13:15 +00:00

Author	SHA1	Message	Date
Nadav Har'El	b262eb5031	alternator: use simpler API for registering Alternator's HTTP URLs We used the Seastar HTTP server's add() method to register URLs to serve (so-called "routes"), but as suggested by Amnon, when we have fixed URLs without parameters being path components, it's simpler to use the put() method to do the same thing - and also results in slightly less work at run-time to look up these routes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	9de26b73a4	alternator-test: test "/localnodes" feature This is a partial test for the "/localnodes" request, which is supposed to return the list of live nodes in this DC. Because of the limitations of our current alternator-test framework (which should work on any pre-existing cluster), we don't know what to expects as a reply, but we just verify the minimum: The request is understood, returns a JSON list, which contains at least one item. As "/localnodes" is a Scylla-only feature, this test is skipped when running with "--aws". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	3fecf6f641	alternator: add public API for list of nodes in current DC If we want to balance the Alternator request load among the different nodes (Refs #5030), the load balancer - whether it uses HTTP load balancing or DNS - needs to be able to get an up-to-date list of live nodes to which it can direct Alternator traffic. This list should include only the live nodes in the same data center (geographical region) - it is expected that a separate load balancer will be installed in each data center, and clients from within this data center will reach this data center's load balancer. There are multiple APIs in current Scylla to do something similar to what we need, but as far as I know, none of them is exactly what we need or convenient for Alternator installations: We don't want the load balancer to use CQL, and the REST API http://localhost:10000/gossiper/endpoint/live/ doesn't do what we need (it doesn't restrict the list to one data center) plus it's not open to connections outside the machine. So in this patch, we implement a new HTTP request on the Alternator port - "/localnodes", returning a JSON-formatted list of all live nodes in the contacted node's data center: $ curl http://localhost:8000/localnodes ["127.0.0.2","127.0.0.1","127.0.0.3"] Like the existing health check HTTP request, this operation is public and unauthenticated. We consider the security risk low - it allows an attacker to enquire the list of Scylla nodes in this DC, but an attacker can achieve the same thing by just scanning the addresses in this subnet using the health check request (or even with ordinary DynamoDB API requests). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Nadav Har'El	95351016fd	alternator: use LWT timestamp - in BatchWriteItems too A previous patch fixed Alternator's writes to use the timestamp provided by LWT instead of the current timestamp. That patch fixed the PutItem, DeleteItem and UpdateItem operations - and this patch fixes the remaining write operation: BatchWriteItems. So, Fixes #5653. Unfortunatly, the requirements of both BatchWriteItems and LWT make the resulting code - and this patch - somewhat inelegant. BatchWriteItems requires that we prepare all the operations first - failing if any of them has an error. Before this patch, the result of this preparation was an array of mutations, which in a second step we wrote to the database. But we can no longer use mutations for the result of the first step, because creating a mutation requires knowing the timestamp, which we don't know during the preparate phase - we will only know it during the later LWT operation. So now we need to invent a new intermediate format between the request and the mutation. This intermediate format is further complicated by the need to be send it between shards (for LWT's shard forwarding) so it cannot, for example, contain a reference to a schema. The fact that different sub-operations need to be sent to different shards, and that different sub-operations may write to different tables, further complicate the book-keeping and gives us a bunch of funky-typed maps. But eventually it all fits together. After this patch, as before this patch, the same code (now called put_or_delete_item), is used to implement both the PutItem and DeleteItem stand-alone operation, and the BachWriteItems operation which includes a whole list of these PutItem and DeleteItem operation. This patch also includes two more tests in test_batch.py, which test two more corner tests we haven't tested before: One tests the capability of BatchWriteItems to write to more than one table. The other tests that BatchWriteItems can write an empty item (it is not surprising that it does, but we do have special code for this case, so we should test it). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2020-02-05 21:14:18 +02:00
Rafael Ávila de Espíndola	9eae0b57a3	test: Enable all experimental features in the cql_repl The cql repl will hopefully be used to write most new tests, so it should have all experimental features enabled. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200204173448.95892-1-espindola@scylladb.com>	2020-02-04 19:36:37 +02:00
Avi Kivity	7d70bfe20c	Merge "Lua: Fix handling of list<varint> and list<decimal>" from Rafael " This patch series fixes #5711, enables UDF support in CQL tests and and includes a few extra cleanups. " * 'espindola/lua-fixes' of https://github.com/espindola/scylla: lua: Use a negative index for consistency lua: Fix returning list<decimal> lua: Fix returning list<varint> lua: Use a lua_slice_state instead of a from_lua_visitor test: Enable UDF in the cql repl	2020-02-04 18:51:54 +02:00
Nadav Har'El	acafcbfdf4	alternator: use LWT timestamp, not current timestamp The DynamoDB API doesn't have the notion of client-supplied timestamps, so the server is supposed to use its own current timestamp for write operations. However, for LWT writes, we should not use this node's current time: Different nodes may slightly differ in their clocks, and LWT needs a monotonically-increasing notion of time for the consistent operations. LWT provides to the operation's apply() method the specific timestamp that it should use in its returned mutation - and we should use this timestamp, not the current timestamp. In the optional write modes where LWT is not used, we continue to use the current timestamp (api::new_timestamp()) as before. This patch fixes the PutItem, UpdateItem and DeleteItem operations. The BatchWriteItem operation is not yet fixed by this patch - fixing it will require more elaborate code changes so will be done in a separate patch. Refs #5653. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200130122853.7658-1-nyh@scylladb.com>	2020-02-04 10:18:49 +01:00
Nadav Har'El	0a23471eae	alternator: switch BatchWriteItems to use LWT too Today, we use LWT for all PutItem, UpdateItem and GetItem operations. We do this even for pure writes - writes which do not involve a read before the write). But BatchWriteItem also does pure writes - and it doesn't use LWT yet. So this patch changes it so it does. As before we keep in the code - not yet configurable by a user - also the option to do these unconditional writes without LWT. A BatchWriteItem may change multiple partitions (but a fairly low number - DynamoDB allows each BatchWriteItem to only do 25 updates) and we start the different LWT operations in parallel. This patch collects multiple mutations to the same partition together to be done with a single LWT operation, so we also add a test for this case, were we have a batch of writes involving several items in each of several partitions. Fixes #5637 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200128160538.11775-1-nyh@scylladb.com>	2020-02-04 10:08:18 +01:00
Rafael Ávila de Espíndola	6764316576	cql3: Simplify maybe_quote This produce code that is just as fast as the previous implementation and is quite a bit easier to read IMHO. I benchmarked it by temporally adding: BOOST_AUTO_TEST_CASE(bench_maybe_quote) { std::string val(1 << 20, 'x'); using clk = std::chrono::steady_clock; cql3::util::maybe_quote(val); auto start = clk::now(); for (int i = 0; i < 1000; ++i) { cql3::util::maybe_quote(val); } auto end = clk::now(); std::chrono::duration<double> duration = end - start; std::cout << "delta = " << duration.count() << '\n'; } Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200203225140.180262-1-espindola@scylladb.com>	2020-02-04 10:52:04 +02:00
Avi Kivity	cdecb21b78	Update seastar submodule * seastar 65980a9b30...30185fd901 (12): > sstring: resize: NulTerminate when downsizing > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > json/json_elements: Use double quotes around element name > Revert "reactor: make open_flags::dsync respect --unsafe-bypass-fsync" > Merge "smp: reduce allocations in work_item::process" from Avi > task: optimize destruction by making destructor non-virtual > reactor: make open_flags::dsync respect --unsafe-bypass-fsync > Revert "sstring: resize: NulTerminate when downsizing" > sstring: resize: NulTerminate when downsizing > tests: Rename unix domain socket test for consistency > resource: downgrade cgroupsv2 message. > Merge "Simplify the stream/subscription implementation" from Rafael	2020-02-04 10:20:29 +02:00
Nadav Har'El	3de09042bb	CDC topology change support Merged pull request https://github.com/scylladb/scylla/pull/5485 by Kamil Braun: This series introduces the notion of CDC generations: sets of CDC streams used by the cluster to choose partition keys for CDC log writes. Each CDC generation begins operating at a specific time point, called the generation's timestamp (cdc_streams_timestamp in the code). It continues being used by all nodes in the cluster to generate log writes until superseded by a new generation. Generations are chosen so that CDC log writes are colocated with their corresponding base table writes, i.e. their partition keys (which are CDC stream identifiers picked from the generation operating at time of making the write) fall into the same vnode and shard as the corresponding base table write partition keys. Currently this is probabilistic and not 100% of log writes will be colocated - this will change in future commits, after per-table partitioners are implemented. CDC generations are a global property of the cluster -- they don't depend on any particular table's configuration. Therefore the old "CDC stream description tables", which were specific to each CDC-enabled table, were removed and replaced by a new, global description table inside the system_distributed keyspace. A new generation is introduced and supersedes the previous one whenever we insert new tokens into the token ring, which breaks the colocation property of the previous generation. The new generation is chosen to account for the new tokens and restore colocation. This happens when a new node joins the cluster. The joining node is responsible for creating and informing other nodes about the new CDC generation. It does that by serializing it and inserting into an internal distributed table ("CDC topology description table"). If it fails the insert, it fails the joining process. It then announces the generation to other nodes through gossip using the generation's timestamp, which is the partition key of the inserted distributed table entry. Nodes that learn about the new generation through gossip attempt to retrieve it from the distributed table. This might fail - for example, if the node is partitioned away from all replicas that hold this generation's table entry. In that case the node might stop accepting writes, since it knows that it should send log entries to a new generation of streams, but it doesn't know what the generation is. The node will keep trying to retrieve the data in the background until it succeeds or sees that it is no longer necessary (e.g., because yet another generation superseded this one). So we give up some availability to achieve safety. However, this solution is not completely safe (might break consistency properties): if a node learns about a new generation too late (if gossip doesn't reach this node in time), the node might send writes to the wrong (old) generation. In the future we will introduce a transaction-based approach where we will always make sure that all nodes receive the new generation before any of them starts using it (and if it's impossible e.g. due to a network partition, we will fail the bootstrap attempt). In practice, if the admin makes sure that the cluster works correctly before bootstrapping a new node, and a network partition doesn't start in the few seconds window where a new generation is announced, everything will work as it should. After the learning node retrieves the generation, it inserts it into an in-memory data structure called "CDC metadata". This structure is then used when performing writes to the CDC log -- given the timestamp of the written mutation, the data structure will return the CDC generation operating at this time point. CDC metadata might reject the query for two reasons: if the timestamp belongs to an earlier generation, which most probably doesn't have the colocation property anymore, or if it is picked too far away into the future, where we don't know if the current generation won't be superseded by a different one (so we don't yet know the set of streams that this log write should be sent to). If the client uses server-generated timestamps, the query will never be rejected. Clients can also use client-generated timestamps, but they must make sure that their clocks are not too desynchronized with the database -- otherwise some or all of their writes to CDC-enabled tables will be rejected. In the case of rolling upgrade, where we restart nodes that were previously running without CDC, we act a bit differently - there is no naturally selected joining node which must propose a new generation. We have to select such a node using other means. For this we use a bully approach: every node compares its host id with host ids of other nodes and if it finds that it has the greatest host id, it becomes responsible for creating the first generation. This change also fixes the way of choosing values of the "time" column of CDC log writes: the timeuuid is chosen in a way which preserves ordering of corresponding base table mutations (the timestamp of this timeuuid is equal to the base table mutation timestamp). Warning: if you were running a previous CDC version (without topology change support), make sure to disable CDC on all tables before performing the upgrade. This will drop the log data -- backup it if needed. TODO in future patchset: expire CDC generations. Currently, each inserted CDC generation will stay in the distributed tables forever (until manually removed by the administrator). When a generation is superseded, it should become "expired", and 24 hours after expiration, it should be removed. The distributed tables (cdc_topology_description and cdc_description) both have an "expired" column which can be used for this purpose. Unit tests: dev, debug, release dtests (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/907/	2020-02-04 10:20:29 +02:00
Gleb Natapov	2876482373	lwt: account for cases where LWT request were moved to another shard in statistics Now that we bounce lwt requests to the correct shard before calling into storage_proxy the cross shard op accounting does not account for bounced lwt statement. Fix that by increasing corresponding counter when returning a "bounce" reply. Message-Id: <20200203122011.GH26048@scylladb.com>	2020-02-04 10:20:28 +02:00
Nadav Har'El	37f2f6112e	cql3::util::maybe_quote: avoid stack overflow and fix quote doubling Merged patch series from Benny Halevy: The function was reimplemented to solve the following issues. The cutom implementation also improved its performance in close to 19% Using regex_match("[a-z][a-z0-9_]") may cause stack overflow on long input strings as found with the limits_test.py:TestLimits.max_key_length_test dtest. std::regex_replace does not replace in-place so no doubling of quotes was actually done. Add unit test that reproduces the crash without this fix and tests various string patterns for correctness. Note that defining the regex with std::regex::optimize still ended up with stack overflow. Fixes #5671 cql3::util::maybe_quote: avoid stack overflow and fix quote doubling * cql3::util::maybe_quote: further optimize quote doubling	2020-02-04 10:20:28 +02:00
Nadav Har'El	6e91f159fe	LWT: handle bounce_to_shard result for batch statements Merged patch series from Gleb Natapov: Batch statement can also execute LWT and hence need to handle bounce_to_shard result. * transport: handle bounce_to_shard for batch statement * transport: consolidate bounce_to_shard handling between all three verbs that handle it	2020-02-04 10:20:28 +02:00
Takuya ASADA	1446fe930b	dist/redhat: install specified version of scylla-conf on meta package (#5599 ) To install specified version of scylla-conf package, we need to add it on Requires. Fixes #5639	2020-02-04 10:20:28 +02:00
Benny Halevy	f45fabab73	gossiper: do_stop_gossiping: copy live endpoints vector It can be resized asynchronously by mark_dead. Fixes #5701 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>	2020-02-04 10:20:28 +02:00
Avi Kivity	501b24cad3	test.py: use command line option in preference to environment variable when calling a test Command line options are printed out, so if a user cuts-and-pastes a command line they will get a run that is more similar to the one that the test executed. Message-Id: <20200202133209.209608-1-avi@scylladb.com>	2020-02-04 10:20:28 +02:00
Rafael Ávila de Espíndola	1294770970	lua: Use a negative index for consistency In this case we know the size of the stack and both indexes refer to the same position. Using a negative index is just more consistent with the rest of the file and hopefully a bit less brittle to future changes. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:23:09 -08:00
Rafael Ávila de Espíndola	a4d668e8ed	lua: Fix returning list<decimal> We were accessing the wrong stack location if a decimal was not at top of the stack. Fixes: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:10:04 -08:00
Rafael Ávila de Espíndola	39e637f6bf	lua: Fix returning list<varint> We were accessing the wrong stack location if a varint was not at the top of the stack. Refs: #5711 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:09:59 -08:00
Rafael Ávila de Espíndola	530779efb6	lua: Use a lua_slice_state instead of a from_lua_visitor A few places were using a from_lua_visitor only to access the lua_slice_state member variable. This is just a simplification. No functionality changed. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 18:04:36 -08:00
Rafael Ávila de Espíndola	35023c831c	test: Enable UDF in the cql repl A followup commit will use this to write cql tests for UDF. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-02-03 17:58:27 -08:00
Gleb Natapov	9c75a25e9f	transport: consolidate bounce_to_shard handling between all three verbs that handle it All three verbs that need to handle bounce_to_shard have almost identical process_() and process__on_shard() functions. Consolidate them into one to reuse the code.	2020-02-03 14:27:50 +02:00
Gleb Natapov	dd793098fa	transport: handle bounce_to_shard for batch statement Batch statement can also execute LWT and hence need to handle bounce_to_shard result. Fixes: #5644	2020-02-03 14:27:30 +02:00
Kamil Braun	4b3754ff94	docs: add documentation about CDC generations	2020-02-03 10:57:31 +01:00
Kamil Braun	b130b76274	test: disable CDC flag by default When CDC flag is on, the node startup procedure takes a few seconds longer (we have to generate CDC streams). This is not necessary in non-CDC tests.	2020-02-03 10:57:31 +01:00
Kamil Braun	0d41e2c1fe	test: add cdc::generate_timeuuid tests	2020-02-03 10:57:31 +01:00
Kamil Braun	5fb5925fb4	test: add cdc::find_timestamp tests	2020-02-03 10:57:31 +01:00
Kamil Braun	7cb6ac33f5	storage_service: check if we know other nodes' tokens when joining ring If we are a seed node (but not the only one) or we set auto_bootstrap=off, it might happen due to misconfiguration or a network partition that we don't know other nodes' tokens at the end of the join_token_ring function, when we go into the NORMAL status, finishing the joining process. CDC however requires that we know other nodes' tokens at this point: we need them to correctly create a new CDC generation. This commit adds a check which prevents the node from starting if that's not the case. If the check fails, the node first tries waiting a bit until it learns about the tokens or timeouts.	2020-02-03 10:57:28 +01:00
Avi Kivity	2816404f57	test.py: documented exit code value Document our chosen exit failure code value and its relationship to git bisect. Message-Id: <20200202134223.210578-1-avi@scylladb.com>	2020-02-03 00:58:58 +02:00
Avi Kivity	541893e69a	Merge "Fix conversion of lua nil to cql null" from Rafael " The fix itself is fairly simple, but looking at the code I found that our code base was not cleanly distinguishing null and empty values and was treating null and missing values differently, but that distinction was dead since a null is represented as a dead cell. " * 'espindola/lua-fix-null-v6' of https://github.com/espindola/scylla: lua: Handle nil returns correctly types: Return bytes_opt from data_value::serialize query-result-set: Assert that we don't have null values types: Fix comparison of empty and null data_values Revert "tests: Handle null and not present values differently" query-result-set: Avoid a copy during construction types: Move operator== for data_value out-of-line	2020-02-02 15:43:24 +02:00
Avi Kivity	c8890eb124	Merge "Simplify usage of stream subscriptions" from Rafael " In a few places, the only use we had for a subscription was calling done(). With this series we now call done() early and store the future<> instead. " * 'espindola/stream-cleanup' of https://github.com/espindola/scylla: sstable_test: Store a future<> instead of a subscription commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper lister: Store a future<> instead of a subscription	2020-02-02 14:49:00 +02:00
Rafael Ávila de Espíndola	5dfb658e77	build: Add two missing dependencies With this change we always rebuild seastar/libseastar_testing.a for the same reason we always rebuild seastar/libseastar.a: We have no idea what its dependencies are, we have to recurse to seastar to find out. The other missing dependency is that we have to rebuild build.ninja when seastar/CMakeLists.txt changes. A change in seastar/CMakeLists.txt can cause seastar.pc to change which can change the command lines used. That is incomplete as change other seastar files can have the same impact, but it is better than nothing. It is not sufficient to put a dependency in the seastar.pc file as that file will be modified when cmake is run and the scylla ninja process doesn't see the CMakeLists.txt to seastar.pc edge. Fixes: #5687 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200201001126.458992-1-espindola@scylladb.com>	2020-02-01 21:08:26 +02:00
Pavel Emelyanov	4839ca8491	storage_service: Unregister from gossiper notifications ... at all This unregistration doesn't happen currently, but doesn't seem to cause any problems in general, as on stop gossiper is stopped and nothing from it hits the store_service. However (!) if an exception pops up between the storage_service is subscribed on gossiper and the drain_on_shutdown defer action is set up then we _may_ get into the following situation: - main's stuff gets unrolled back - gossiper is not stopped (drain_on_shutdown defer is not set up) - migration manager is stopped (with deferred action in main) - a nitification comes from gossiper -> storage_service::on_change might want to pull schema with the help of local migration manager -> assert(local_is_initialized) strikes Fix this by registering storage_service to gossiper a bit earlier (both are already initialized y that time) and setting up unregister defer right afterwards. Test: unit(dev), manual start-stop Bug: #5628 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200130190343.25656-1-xemul@scylladb.com>	2020-01-31 14:02:18 +01:00
Avi Kivity	ec5b721db7	test: make eventually() more patient We use eventually() in tests to wait for eventually consistent data to become consistent. However, we see spurious failures indicating that we wait too little. Increasing the timeout has a negative side effect in that tests that fail will now take longer to do so. However, this negative side effect is negligible to false-positive failures, since they throw away large test efforts and sometimes require a person to investigate the problem, only to conclude it is a false positive. This patch therefore makes eventually() more patient, by a factor of 32. Fixes #4707. Message-Id: <20200130162745.45569-1-avi@scylladb.com>	2020-01-31 14:02:18 +01:00
Dejan Mircevski	6661ed7de4	cql3: Drop restrictions::values() method No-one seems to invoke this method. Instead, clients invoke restriction::values (note singular "restriction"). Most subclasses of restrictions also inherit from restriction, so values() still exists in their public interface. Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com>	2020-01-31 13:05:51 +01:00
Avi Kivity	985e00efa6	Merge "Fix the serialization of negative varint values" from Rafael " Benny pointed out that we could avoid a branch inside a loop is the old serialization code. That got me looking at the logic and I found that it would also produce an unnecessary 0xff prefix for some negative numbers. This patch series fixes the serialization and optimizes it. It now does no extra copies for positives numbers and only one extra copy for negative numbers, which I think is optimal since cpp_int uses sign magnitude and we want the 2 complement representation. " * 'espindola/serialize_varint-improvements-v2' of https://github.com/espindola/scylla: types: Use a fancy iterator to avoid a temporary buffer types: Use export_bits to serialize cpp_int types: Avoid a branch in a loop types: Fix encoding of negative varint types: Replace "num.sign() < 0" with "num < 0"	2020-01-30 20:35:54 +02:00
Rafael Ávila de Espíndola	cc81ba3432	types: Use a fancy iterator to avoid a temporary buffer By using a fancy iterator we can avoid calling export_bits with a temporary buffer before copying the result to the output. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	7e67ce0bdb	types: Use export_bits to serialize cpp_int This avoid a copy when serializing positive numbers. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	27a67f1a2c	types: Avoid a branch in a loop Thanks to Benny for the suggestion. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:26:39 -08:00
Rafael Ávila de Espíndola	c89c90d07f	types: Fix encoding of negative varint We would sometimes produce an unnecessary extra 0xff prefix byte. The new encoding matches what cassandra does. This was both a efficiency and correctness issue, as using varint in a key could produce different tokens. Fixes #5656 Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:25:09 -08:00
Rafael Ávila de Espíndola	ed747122aa	types: Replace "num.sign() < 0" with "num < 0" Surprisingly, this produces better code with cpp_int. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 10:24:03 -08:00
Rafael Ávila de Espíndola	cc9495d4d3	sstable_test: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	da984f1f33	commitlog: Store a future instead of a subscription in db::commitlog::segment_manager::list_descriptors::helper The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Rafael Ávila de Espíndola	b88f6edee0	lister: Store a future<> instead of a subscription The only use we had for the subscription was calling done, may as well call it early and store the future<>. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-01-30 08:31:28 -08:00
Gleb Natapov	b08679e1d3	db/system_keyspace: use user memory limits for local.paxos table Treat writes to local.paxos as user memory, as the number of writes is dependent on the amount of user data written with LWT. Fixes #5682 Message-Id: <20200130150048.GW26048@scylladb.com>	2020-01-30 17:07:27 +02:00
Piotr Sarna	b783d40aaf	Merge 'Add per scheduling groups statistics' from Eliran This set implements support for per scheduling group statistics in storage proxy and tables view statistics (although tables view per scheduling group stats are not actively applied in this series). Having those statistics per scheduling group can help in finding operations that are performed outside their context, another advantage is that it lays the land for supporting per service level statistics for the workload prioritization enterprise feature. At some point there was a thought to add those stats per role but for now it is not feasible at the moment: 1. The number of roles/user is unbounded so it is dangerous to hold stats (in memory) for all of them. 2. We will need a proper design of how to deal with the hierarchical nature of roles in the stats. Besides these reasons and regardless, it is beneficial to look on resource related stats per scheduling group, looking at resources per user or role will not necessarily give insights since resources are divided per sg and not role, so it can lead to false conclusions if more than one role is attached to the same service level. Tests: unit tests (Dev, Debug) validating the stats with monitor * es/per_sg_stats/v6: storage proxy: migrate to per scheduling group statistics internalize storage proxy statistics metric registration	2020-01-30 15:02:33 +01:00
Eliran Sinvani	971711a546	storage proxy: migrate to per scheduling group statistics This commit builds on top of the introduced per scheduling group statistics template and employs it for achieving a per scheduling group statistics in storage_proxy. Some of the statistics also had meaning as a global - per shard one. Those are the ones for determining if to throttle the write request. This was handled by creating a global stats struct that will hold those stats and by changing the stat update to also include the global one. One point that complicated it is an already existing aggregation over the per shard stats that now became a per scheduling group per shard stats, converting the aggregation to a two-dimensional aggregation. One thing this commit doesn't handle is validating that an individual statistic didn't "cross a scheduling group boundary", such validation is possible but it can easily be added in the future. There is a subtlety to doing so since if the operation did cross to other scheduling group two connected statistics can lose balance for example written bytes and completed write transactions. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:44 +01:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups, instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats as is the case for those statistics. This commit internalize the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Gleb Natapov	c138dfd33e	lwt: introduce LWT gossiper feature Do not allow lwt operation if LWT is not enabled by entire cluster. Message-Id: <20200130120912.GV26048@scylladb.com>	2020-01-30 15:12:56 +02:00

1 2 3 4 5 ...

20925 Commits