scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Avi Kivity	acdd42c7c8	Merge "Fix abort during counter table read-on-delete" from Tomasz " This fixes an abort in an sstable reader when querying a partition with no clustering ranges (happens on counter table mutation with no live rows) which also doesn't have any static columns. In such case, the sstable_mutation_reader will setup the data_consume_context such that it only covers the static row of the partition, knowing that there is no need to read any clustered rows. See partition.cc::advance_to_upper_bound(). Later when the reader is done with the range for the static row, it will try to skip to the first clustering range (missing in this case). If clustering_ranges_walker tells us to skip to after_all_clustering_rows(), we will hit an assert inside continuous_data_consumer::fast_forward_to() due to attempt to skip past the original data file range. If clustering_ranges_walker returns before_all_clustering_rows() instead, all is fine because we're still at the same data file position. Fixes #3304. " * 'tgrabiec/fix-counter-read-no-static-columns' of github.com:scylladb/seastar-dev: tests: mutation_source_test: Test reads with no clustering ranges and no static columns tests: simple_schema: Allow creating schema with no static column clustering_ranges_walker: Stop after static row in case no clustering ranges (cherry picked from commit `054854839a`)	2018-03-22 18:13:29 +02:00
Vladimir Krivopalov	a983ba7aad	perf_fast_forward: fix error in date formatting Instead of 'month', 'minutes' has been used. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <1e005ecaa992d8205ca44ea4eebbca4621ad9886.1521659341.git.vladimir@scylladb.com> (cherry picked from commit `3010b637c9`)	2018-03-22 12:56:56 +02:00
Avi Kivity	f8613a8415	Merge "Save and recall queriers for paged singular-mutation queries" from Botond " Terms ----- querier: A class encapsulating all the logic and state needed to fill a page. This Includes the reader, the compact_mutation object and all associated state. Preamble -------- Currently for paged-queries we throw away all readers, compactors and all associated state that contributed to filling the page and on the next page we create them from scratch again. Thus on each page we throw away a considerable amount of work, only to redo it again on the next page. This has been one of the major contributors to latencies as from the point of view of a replica each page is as much work as a fresh query. Solution -------- The solution presented in this patch-series is to save queriers after filling a page and reuse them on the next pages, thus doing the considerable amount of work involved with creating the them only once. On each page the coordinator will generate a UUID that identifies this page. This UUID is used as the key, under which the contributing queriers will be saved in the cache. On the next page the UUID from the previous page will be used to lookup saved queriers, and the one from the current one to saved them afterwards (if the query isn't finished). These UUIDs (reader_recall_uuid and reader_save_uuid) are attached to the page-state. Also attached to the page state is the list of replicas hit on the last page. On the next page this list will be consulted to hit the same replicas again, thus reusing the queriers saved on them. Cached queriers will be evicted after a certain period of time to avoid unecessary resource consumption by abandoned reads. Cached queriers may also be evicted when the shard faces resource-pressure, to free up resources. Splitting up the work --------------------- This series only fixes the singular-mutation query path, that is queries that either fetch a single partition, or severeal single partitions (IN queries). The fix for the scanning query path will be done in a follow-up series, however much of the infrastructure needed for the general querier reuse is already introduced by this series. Ref #1865 Tests: unit-tests(debug, release), dtests(paging_test, paging_additional_test) Benchmarking summary (read-from-disk) ------------------------------------- 1) Latency BEFORE latency mean : 58.0 latency median : 57.4 latency 95th percentile : 68.8 latency 99th percentile : 79.9 latency 99.9th percentile : 93.6 latency max : 93.6 AFTER latency mean : 41.3 latency median : 40.5 latency 95th percentile : 50.8 latency 99th percentile : 68.9 latency 99.9th percentile : 89.2 latency max : 89.2 2) Throughput (single partition query) sum(scylla_cql_reads): BEFORE: 173'567 AFTER: 427'774 +246% 3) Throughput (IN query, 2 partitions) sum(scylla_cql_reads): BEFORE: 85'637 AFTER: 127'431 +148% " * '1865/singular-mutations/v8.2' of https://github.com/denesb/scylla: (23 commits) Add unit test for resource based cache eviction Add unit tests for querier_cache Add counters to monitor querier-cache efficiency Memory based cache eviction Add buffer_size() to flat_mutation_reader Resource-based cache eviction Time-based cache eviction Save and restore queriers in mutation_query() and data_query() Add the querier_cache_context helper Add querier_cache Add querier Add are_limits_reached() compact_mutation_state Add start_new_page() to compact_mutation_state Save last key of the page and method to query it Make compact_mutation reusable Add the CompactedFragmentsConsumer Use the last_replicas stored in the page_state query_singular(): return the used replicas Consider preferred replicas when choosing endpoints for query_singular() Add preferred and last replicas to the signature of query() ...	2018-03-13 18:38:59 +02:00
Botond Dénes	c0009750c3	Add unit test for resource based cache eviction Specifically for the reader-permit based eviction. This test lives in a separate executable as it uses with_cql_test_env() and thus needs a main() of it's own.	2018-03-13 16:20:50 +02:00
Botond Dénes	c53b6f75c8	Add unit tests for querier_cache	2018-03-13 12:59:45 +02:00
Avi Kivity	636760c282	Merge "Introduce JSON output format to perf_fast_forward tests." from Vladimir " This patchset is a part of a bigger effort for bringing our microbenchmarking tests from the source tree to be used for regression testing purposes with CI. Now, it is possible to export results of tests run into JSON format that can be stored in ElasticSearch and compared among runs to detect performance degradation should it happen. Example of JSON output (formatted for readability): { "results" : { "parameters" : { "read" : "64", "read,skip,test_run_count" : "64,256,1", "skip" : "256", "test_run_count" : 1 }, "stats" : { "(KiB)" : 126960, "aio" : 993, "blocked" : 208, "c blk" : 1, "c hit" : 0, "c miss" : 1, "cpu" : 99.779365539550781, "dropped" : 0, "frag/s" : 311939.61559016741, "frags" : 200000, "idx blk" : 0, "idx hit" : 0, "idx miss" : 0, "time (s)" : 0.641149729 } }, "test_group_properties" : { "message" : "Testing scanning large partition with skips.\nReads whole range interleaving reads with skips according to read-skip pattern", "name" : "large-partition-skips", "needs_cache" : false, "partition_type" : "large" }, "versions" : { "scylla-server" : { "commit_id" : "4acfa17f4", "date" : "20180306", "run_date_time" : "2018-16-06 12:16:41", "version" : "666.development" } } } " * 'issues/2947/v6' of https://github.com/argenet/scylla: Add support for JSON output format for perf_fast_forward results. Wrap output for customization. Move all output handling to a single managing class.	2018-03-13 12:37:34 +02:00
Benoît Canet	1d0cc7cf20	messaging_service: Start messaging service earlier The messaging service was completely started after a bootstraping node finished to join hence leading to #2034. Fixes #2034 Message-Id: <20180313084500.27265-1-amnon@scylladb.com>	2018-03-13 10:59:53 +02:00
Avi Kivity	bd7881066a	tests: reduce dependencies in test_services.hh Convert storage_service_for_test to a pimpl implementation to reduce dependencies. Tests that depended on those includes were fixed to include their dependencies directly.	2018-03-12 20:05:23 +02:00
Avi Kivity	cd668061fc	storage_service: remove system_keyspace.hh include Re-distribute include among the files that really need it.	2018-03-11 18:53:49 +02:00
Avi Kivity	84004a2574	locator: de-inline production_snitch_base De-inlining allows us to remove some dependencies, and those functions are too complex to inline anyway. A few always-throwing functions get the [[noreturn]] attribute to avoid damaging code generation.	2018-03-11 18:22:49 +02:00
Raphael S. Carvalho	87035bd8d1	sstables: fix min and max timestamp when negative timestamp is specified unsigned type was incorrectly used for keeping track of min and max timestamp, so a negative number would be treated as a very high number that would incorrectly end up as max timestamp in sstable metadata. Fixes #3000. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180308162217.18963-1-raphaelsc@scylladb.com>	2018-03-08 18:31:30 +02:00
Botond Dénes	341ddd096a	Modify unit tests so that they test the dual-limits	2018-03-08 14:12:12 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Tomasz Grabiec	180a877db3	tests: cache: Add tests for row-level eviction	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	9fab5068c6	tests: cache: Check that data is evictable after schema change	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	f0e0c79a70	tests: cache: Move definitions to the top	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	1e4f9eb2c1	tests: perf_cache_eviction: Switch eviction counter to row granularity	2018-03-07 16:52:59 +01:00
Tomasz Grabiec	48f91b4605	tests: row_cache_alloc_stress: Avoid quadratic behavior Partitions corresponding to keys have 40k rows. With row-level eviction touching them inside the loop became a serious performance issue, because touch() now needs to walk over all rows.	2018-03-07 16:52:59 +01:00
Vladimir Krivopalov	8028f90460	Add support for JSON output format for perf_fast_forward results. The JSON output is arranged in a way that makes it easier to upload results to ElasticSearch. All the tests results are placed under the perf_forward_data_output/ directory For test groups, we create separate subdirectories where we save results from runs of tests in those groups. For each test run, we store results in a separate file named: <dash-separated-param-list>.<run-number>.json where <dash-separated-param-list> is a dash-separated list of parameters of the current test, e.g., 1-64 (for read-skip pattern). <run-number> is the number of run of this test with the specified parameters. This is needed as the same list of parameters can be used more than once (for instance, when cache is enabled). Those numbers start with 1, i.e., 1, 2, 3. So, the path to a resulting JSON file may look like: perf_fast_forward_output/large-partition-skips/64-4096.1.json Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-03-06 12:09:00 -08:00
Vladimir Krivopalov	e810fc4e09	Wrap output for customization. Move all output handling to a single managing class. Instead of passing the output parameters to std::cout straight away, use helper wrappers. This will allow us to add more formats for gathered tests results. Introduce helper writer classes hierarchy that can be extended to support different output formats (JSON, XML, etc). Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-03-06 09:49:05 -08:00
Tomasz Grabiec	da901b93fc	cache: Track number of rows and row invalidations	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	381bf02f55	cache: Evict with row granularity Instead of evicting whole partitions, evicts whole rows. As part of this, invalidation of partition entries was changed to not evict from snapshots right away, but unlink them and let them be evicted by the reclaimer.	2018-03-06 11:50:29 +01:00
Tomasz Grabiec	19951ede7d	tests: mvcc: Use apply_to_incomplete() to create versions So that the test doesn't depend on internal invariants.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	ed6271fc87	tests: mvcc: Fix test_apply_to_incomplete() It should use evictable entries instead of non-evictable ones, because they are required by apply_to_incomplete().	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	f2bdac2874	tests: cache: Do not depend on particular granularity of eviction	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	c306c1050e	tests: cache: Make sure readers touch rows in test_eviction() With row-level eviction just creating a reader won't necessarily update the LRU.	2018-03-06 11:50:28 +01:00
Tomasz Grabiec	fb2107416b	tests: cache: Invoke partial eviction in test_concurrent_reads_and_eviction In hope of catching more issues.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	5320705300	cache: Propagate cache_tracker to places manipulating evictable entries cache_tracker reference will be needed to link/unlink row entries. No change of behavior in this patch.	2018-03-06 11:50:27 +01:00
Tomasz Grabiec	bbe771e28f	tests: Add more tests for continuity merging	2018-03-06 11:50:26 +01:00
Tomasz Grabiec	9893e8e5f7	mvcc: Make each version have independent continuity This change is a preparation for introducing row-level eviction, such that entries can be evicted from older versions without having to touch other versions. Currently continuity flags on entries are interpreted relative to the combined view merged from all entries. For example: v2: <key=2, cont=1> v1: <key=1, cont=1> In v2, the flag on entry key=2 marks the range (1, 2) as continuous. This is problematic because if the old version is evicted, continuity will change in an incorrect way: v2: <key=2, cont=1> Here, the range (-inf, 1) would be marked as continuous, which is not true. To solve this problem, we change the rules for continuity interpretation in MVCC. Each version will have its own continuity, fully specified in that version, independent of continuity of other versions. Continuity of the snapshot will be a union of continuous ranges in each version. It is assumed that continuous intervals in different versions are non- overlapping, except for points corresponding to complete rows, in which case a later version may overlap with an older version (overwrite). We make use of this assumption to make calculation of the union of intervals on merging easier. I make use of the above assumption in mutation_partition::apply_monotonically(). MVCC population of incomplete entries already almost maintains the non-overlapping invariant, because population intervals correspond to intervals which are incomplete in the old snapshot. The only change needed is to ensure that both population bounds will have entries in the latest version. Population from memtables doesn't mark any intervals as continuous, so also conforms. The only change needed there is to not inherit continuity flags from the old snapshot, effectively making the new version internally discontinuous except for row points. The example from the beginning will become: v2: <key=1, cont=0> <key=2, cont=1> v1: <key=1, cont=1> When marking a range as continuous with some rows present only in older versions, we need to insert entries in the latest version, so that we can mark the range as continuous. The easiest solution is to copy the entry from the old version. Another option would be to add support for incomplete rows and insert such instead. This way we would avoid duplicating row contents. This optimization is deferred.	2018-03-06 11:50:25 +01:00
Tomasz Grabiec	bd1e730053	tests: cache: Add test for merging and reading randomly populated versions	2018-03-06 11:32:09 +01:00
Tomasz Grabiec	1b959cb6e9	tests: cache: Take parameters by const&	2018-03-06 11:32:09 +01:00
Tomasz Grabiec	d2744b6ad8	tests: mvcc: Don't set mutations in versions directly Simply copying mutations which are not fully continuous may violate MVCC invariants, like the one about non-overlapping continuity which will be added later. Use apply_to_incomplete() instead. This unfortunately reduces strenght of the test, since the continuity of the entry is now completely determined by the first version. We should use populate() instead, but it doesn't exist yet. It could be extracted from cache_streamed_mutation, but that's not an easy change. This is alleviated by adding a similar test to row_cache_test_g, in a later patch.	2018-03-06 11:32:09 +01:00
Tomasz Grabiec	d9f0c1f097	tests: cache: Fix invalidate() not being waited for Probably responsible for occasional failures of subsequent assertion. Didn't mange to reproduce. Message-Id: <1520330967-584-1-git-send-email-tgrabiec@scylladb.com>	2018-03-06 12:14:04 +02:00
Duarte Nunes	0c05fc0bff	tests/flush_queue_test: Don't assume continuations run immediately This patch fixes an issue with test_propagation(), where the test assumed that after the future returned from wait_for_pending(0) resolved, the continuations set for the post operation had already run, which is not true. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180305131908.7667-1-duarte@scylladb.com>	2018-03-05 15:22:33 +02:00
Avi Kivity	1dae29b48d	test: mutation_reader_test: fix no-timeout case in reader_wrapper reader_wrapper's _timeout defaults to now(), which means to time out immediately rather than no timeout. Fix by switching to a time_point, defaulting to no_timeout, and provide a compatible constructor (with a duration parameter) for callers that do want a duration-based timeout. Tests: mutation_reader_test (debug, release) Message-Id: <20180305111739.31972-1-avi@scylladb.com>	2018-03-05 12:40:07 +01:00
Vlad Zolotarov	e3ca390333	tests: gce_snitch_test: drop the property file related message The message in question is printed with printf() which is bad by itself. And most importantly this test uses a single .property file so this message doesn't add any interesting information to begin with. Therefore it makes more sense to drop it than to fix it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1519661059-13325-1-git-send-email-vladz@scylladb.com>	2018-03-04 16:16:37 +02:00
Jesse Haber-Kucharsky	90af3d889a	tests: Rename test for consistency Now we have `cql_auth_query_test` and `cql_auth_syntax_test`.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	b84e22acdd	cql: Elaborate error for quoted user names Since quoted names are allowed for role names, we add a more descriptive error message when a quoted name is (erroneously) used for a user name. This behavior is consistent with Apache Cassandra.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	b5264d8bf7	cql: Allow role names to be string literals This behavior matches that of Apache Cassandra. When a role name is specified as a string literal (single quotes), the case is preserved.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	d7f2035dea	cql: Make role syntax more consistent This patch changes the syntax for CQL statements related to roles to favor a form like CREATE ROLE sam WITH PASSWORD = 'shire' AND LOGIN = false; instead of CREATE ROLE sam WITH PASSWORD 'shire' NOLOGIN; This new syntax has the benefit of not imposing any ordering constraints on the modifiers for roles and being consistent with other parts of the CQL grammar. It is also consistent with syntax in Apache Cassandra. The old USER-based statements (CREATE USER and ALTER USER) still have the old forms for backwards compatibility. A previous change modified the USER-related statements to allow for the OPTIONS option. However, this was a mistake; only the PASSWORD option should have been allowed. This patch also corrects this mistake.	2018-03-01 12:04:40 -05:00
Jesse Haber-Kucharsky	62bfc3939c	tests: Add CQL syntax tests for access-control These are quick-running tests for verifying the accepted forms of CQL statements (and fragments) related to access-control: users, roles, and permissions. Establishing the allowed forms of statements is helpful for reference, but also makes syntax changes (like those expected in later patches) clearer and more safe.	2018-03-01 11:46:37 -05:00
Avi Kivity	d973445a94	Merge "sstable/schema extensions" from Calle " Adds extension points to schema/sstables to enable hooking in stuff, like, say, something that modifies how sstable disk io works. (Cough, cough, encryption) Extensions are processed as property keywords in CQL. To add an extension, a "module" must register it into the extensions object on boot time. To avoid globals (and yet don't), extensions are reachable from config (and thus from db). Table/view tables already contain an extension element, so we utilize this to persist config. schema_tables tables/views from mutations now require a "context" object (currently only extensions, but abstracted for easier further changes. Because of how schemas currently operate, there is a super lame workaround to allow "schema_registry" access to config and by extension extensions. DB, upon instansiation, calls a thread local global "init" in schema_registry and registers the config. It, in turn, can then call table_from_mutations as required. Includes the (modified) patch to encapsulate compression into objects, mainly because it is nice to encapsulate, and isolate a little. " * 'calle/extensions-v5' of github.com:scylladb/seastar-dev: extensions: Small unit test sstables: Process extensions on file open sstables::types: Add optional extensions attribute to scylla metadata sstables::disk_types: Add hash and comparator(sstring) to disk_string schema_tables: Load/save extensions table cql: Add schema extensions processing to properties schema_tables: Require context object in schema load path schema_tables: Add opaque context object config_file_impl: Remove ostream operators main/init: Formalize configurables + add extensions to init call db::config: Add extensions as a config sub-object db::extensions: Configuration object to store various extensions cql3::statements::property_definitions: Use std::variant instead of any sstables: Add extension type for wrapping file io schema: Add opaque type to represent extensions sstables::compress/compress: Make compression a virtual object	2018-02-26 17:15:29 +02:00
Calle Wilund	e75d3dc997	extensions: Small unit test Test basic operation of schema and sstable extensions	2018-02-26 10:43:37 +00:00
Jesse Haber-Kucharsky	82c8104c72	cql_test_env: Ignore error if user already exists When a `cql_test_env` points to a data directory that was previously populated with `cql_test_env`, then the "tester" user will already exist. This is not an error, so we can just ignore the exception. Fixes #3224. Tests: unit (debug) Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <7729e5a98d8020a7ed1b6d12d8726559f0850f9d.1519315698.git.jhaberku@scylladb.com>	2018-02-22 19:30:50 +01:00
Pekka Enberg	f1f691b555	Merge "Add the GoogleCloudSnitch" from Vlad "This series adds the GoogleCloudSnitch. Fixes #1619" * 'google-cloud-snitch-v4' of https://github.com/vladzcloudius/scylla: config: uncomment/add the supported snitches description tests: added gce_snitch_test locator::gce_snitch: implementation of the GoogleCloudSnitch locator::snitch_base: properly log the failure during the snitch startup	2018-02-19 15:58:56 +02:00
Paweł Dziepak	d97eebe82d	tests/cql3: increase TTL to avoid spurious failures The test inserts some values with a TTL of 1 second and then reads them back expecting them not to be expired yet. That may not always be the case if the machine is slow and we are running in the debug mode. Increasising the TTLs by x100 should help avoid these false positives. Message-Id: <20180219133816.17452-1-pdziepak@scylladb.com>	2018-02-19 15:40:19 +02:00
Duarte Nunes	d394b30882	tests/flush_queue_test: Ensure queue is closed before being destroyed Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217172008.27551-1-duarte@scylladb.com>	2018-02-19 13:10:28 +00:00
Duarte Nunes	294326b5b1	tests/commitlog_test: Close file Operations on a append_challenged_posix_file_impl schedule asynchronous operations when they are executed, which capture the file object. To synchronize with them and prevent use-after-free, we need to call close() before destroying the file. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217170556.27330-1-duarte@scylladb.com>	2018-02-19 13:10:14 +00:00
Duarte Nunes	ac55210677	tests/logalloc_test: Ensure regions are reclaimed in order This test relied on task execution order to work correctly. Namely, it relied on parent regions being reclaimed before child regions (reclaiming is an asynchronous process started by a call to start_reclaiming()). This order is necessary because child regions don't know about parent regions when calculating the biggest region that should be reclaimed. We fix this by forcing the reclaim order. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217121655.26057-1-duarte@scylladb.com>	2018-02-19 13:09:59 +00:00

1 2 3 4 5 ...

2108 Commits