Commit Graph

15976 Commits

Piotr Sarna
27bf20aa3f cql3: enable ALLOW FILTERING
Enables 'ALLOW FILTERING' queries by transferring control
to result_set_builder::filtering_visitor.
Both regular and primary key columns are allowed,
but some things are left unimplemented:
 - multi-column restrictions
 - CONTAINS queries

Fixes #2025
2018-07-05 10:50:43 +02:00
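The mechanism the commit above describes, rechecking each candidate row against the restrictions the storage layer could not serve directly, can be sketched in a few lines. This is an illustrative Python sketch of the idea, not Scylla's C++ filtering_visitor; the names and row representation are made up:

```python
def filtering_visitor(rows, restrictions):
    """Yield only the rows satisfying every (column, predicate) pair."""
    for row in rows:
        if all(predicate(row[column]) for column, predicate in restrictions):
            yield row

rows = [
    {"pk": 1, "v": 10},
    {"pk": 2, "v": 25},
    {"pk": 3, "v": 7},
]
# The rough equivalent of: SELECT * FROM t WHERE v > 9 ALLOW FILTERING
filtered = list(filtering_visitor(rows, [("v", lambda x: x > 9)]))
```

Both regular and primary key columns can participate in the restriction list, which matches the scope the commit enables.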
Piotr Sarna
7b018f6fd6 service: add filtering_pager
For paged results of an 'ALLOW FILTERING' query, a filtering pager
is provided. It's based on a filtering_visitor for result_builder.
2018-07-05 10:50:43 +02:00
Piotr Sarna
a08fba19e3 cql3: optimize filtering partition keys and static rows
If any restriction on the partition key or the static row fails,
it fails for every row that belongs to the partition.
Hence, the full check of the remaining rows is skipped.
2018-07-05 10:50:43 +02:00
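The optimization in the commit above reads naturally as a short-circuit. A minimal sketch (illustrative Python, not the actual C++ code), assuming the per-partition checks are evaluated before any per-row check:

```python
def filter_partition(partition_checks, rows, row_check):
    # If any restriction on the partition key or static row fails, it
    # fails for every row in the partition: skip the per-row checks.
    if not all(partition_checks):
        return []
    return [row for row in rows if row_check(row)]

# A failed partition-level check skips the rows entirely
skipped = filter_partition([True, False], [1, 2, 3], lambda r: True)
# Otherwise rows are filtered individually
kept = filter_partition([True, True], [1, 2, 3], lambda r: r > 1)
```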
Piotr Sarna
2a0b720102 cql3: add filtering visitor
In order to filter the results of an 'ALLOW FILTERING' query,
a visitor that can take an optional filter for result_builder
is provided. It defaults to nop_filter, which accepts
all rows.
2018-07-05 10:50:43 +02:00
Piotr Sarna
1cf5653f89 cql3: move result_set_builder functions to header
Moving the function definitions to the header is a preparation step
before turning result_set_builder into a template.
2018-07-05 10:50:43 +02:00
Piotr Sarna
4d3d32f465 cql3: amend need_filtering()
The previous implementation of need_filtering() was too eager to assume
that an index query should be used, whereas sometimes the query should
just be filtered.
2018-07-05 10:50:39 +02:00
Piotr Sarna
f42eaff75e cql3: add single column primary key restrictions getters
Getters for single column partition/clustering key restrictions
are added to statement_restrictions.
2018-07-04 09:48:32 +02:00
Piotr Sarna
a99acbc376 cql3: expose single column primary key restrictions
Underlying single_column_restrictions are exposed
for single_column_primary_key_restrictions via a const method.
2018-07-04 09:48:32 +02:00
Piotr Sarna
f7a2f15935 cql3: add needs_filtering to primary key restrictions
Primary key restrictions sometimes require filtering. These functions
return true if ALLOW FILTERING needs to be enabled in order to satisfy
these restrictions.
2018-07-04 09:48:32 +02:00
Piotr Sarna
6aec9e711f cql3: add simpler single_column_restriction::is_satisfied_by
Currently restriction::is_satisfied_by() accepts only keys and rows
as arguments. This commit provides a version that takes only the raw
bytes of the data.
This simpler version applies to single_column_restriction only,
because it compares raw bytes underneath anyway. For other restriction
types, the simplified is_satisfied_by is not defined.
2018-07-04 09:48:32 +02:00
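The raw-bytes variant described above can be pictured with a small sketch (Python pseudocode of the idea, not Scylla's C++ API; the operator encoding is illustrative):

```python
def is_satisfied_by(operator, cell_bytes, restriction_bytes):
    """Decide a single-column restriction directly on serialized bytes."""
    if operator == "EQ":
        # Equality on a single column is just a raw byte comparison,
        # so no key/row deserialization is needed.
        return cell_bytes == restriction_bytes
    # Other restriction types still need the full key/row context.
    raise NotImplementedError(operator)
```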
Alexys Jacob
8c03c1e2ce Support Gentoo Linux on node_health_check script.
Gentoo Linux was not supported by the node_health_check script,
which resulted in the following error message being displayed:

"This is a Non-Supported OS, Please Review the Support Matrix"

This patch adds support for Gentoo Linux, along with a TODO note
to add support for authenticated clusters, which the script does
not support yet.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180703124458.3788-1-ultrabug@gentoo.org>
2018-07-03 20:18:13 +03:00
Tomasz Grabiec
2ffb621271 Merge "Fix atomic_cell_or_collection::external_memory_usage()" from Paweł
After the transition to the new in-memory representation in
aab6b0ee27 'Merge "Introduce new in-memory
representation for cells" from Paweł',
atomic_cell_or_collection::external_memory_usage() stopped accounting
for the externally stored data. Since it wasn't covered by the unit
tests, the bug remained unnoticed until now.

This series fixes the memory usage calculation and adds proper unit
tests.

* https://github.com/pdziepak/scylla.git fix-external-memory-usage/v1:
  tests/mutation: properly mark atomic_cells that are collection members
  imr::utils::object: expose size overhead
  data::cell: expose size overhead of external chunks
  atomic_cell: add external chunks and overheads to
    external_memory_usage()
  tests/mutation: test external_memory_usage()
2018-07-03 14:58:10 +02:00
Botond Dénes
c236a96d7d tests/cql_query_test: add unit test for querying empty ranges
A bug was found recently (#3564) in the paging logic, where the code
assumed the queried ranges list is non-empty. This assumption is
incorrect, as there can be valid (if rare) queries that result in the
ranges list being empty. Add a unit test that executes such a query with
paging enabled to detect any future bugs related to assumptions about
the ranges list being non-empty.

Refs: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>
2018-07-03 13:43:17 +01:00
Botond Dénes
59a30f0684 query_pager: be prepared for _ranges being empty
do_fetch_page() checks at the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not, it
checks whether the query is for singular partitions or a range scan
to decide whether to enable stateful queries. This check assumed that
there is at least one range in _ranges, which does not hold under some
circumstances. Add a check for _ranges being empty.

Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
2018-07-03 11:05:01 +01:00
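The shape of the guard described above can be sketched as follows. This is an illustrative Python sketch of the #3564 fix, not the actual do_fetch_page() code; the names and range representation are made up:

```python
def classify_first_page(ranges):
    """Decide how to serve the first page of a query."""
    if not ranges:
        # The #3564 case: a valid query can yield an empty ranges list,
        # so this must be handled before inspecting ranges[0].
        return "empty-page"
    if len(ranges) == 1 and ranges[0]["singular"]:
        return "single-partition"
    return "range-scan"
```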
Avi Kivity
eafd16266d tests: reduce multishard_mutation_test runtime in debug mode
Debug mode is so slow that generating 1000 mutations is too much for it.
High memory use can also confuse the sanitizers that track each allocation.

Reduce mutation count from 1000 to 10 in debug mode.
2018-07-03 12:01:44 +03:00
Avi Kivity
a36b1f1967 Merge "more scylla_setup fixes" from Takuya
"
Added NIC / disk existence checks and a --force-raid mode to
scylla_raid_setup.
"

* 'scylla_setup_fix4' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_raid_setup: verify specified disks are unused
  dist/common/scripts/scylla_raid_setup: add --force-raid to construct RAID even if only one disk is specified
  dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
  dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device
  dist/common/scripts/scylla_sysconfig_setup: verify NIC existence
2018-07-03 11:03:08 +03:00
Takuya ASADA
d0f39ea31d dist/common/scripts/scylla_raid_setup: verify specified disks are unused
Currently only scylla_setup's interactive mode verifies that the selected
disks are unused; in non-interactive mode we get an mdadm/mkfs.xfs program
error and a Python backtrace when the disks are busy.

So we should also verify that disks are unused in scylla_raid_setup, and
print a simpler error message.
2018-07-03 14:50:34 +09:00
Takuya ASADA
3289642223 dist/common/scripts/scylla_raid_setup: add --force-raid to construct RAID even if only one disk is specified
A user may want to start a RAID volume with only one disk, so add an
option to force constructing the RAID even if only one disk is specified.
2018-07-03 14:50:34 +09:00
Takuya ASADA
e0c16c4585 dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
Ignore the input when the specified path is not a block device.
2018-07-03 14:50:34 +09:00
Takuya ASADA
24ca2d85c6 dist/common/scripts/scylla_raid_setup: verify specified disk paths are block devices
Verify that the disk paths are block devices, and exit with an error if not.
2018-07-03 14:50:34 +09:00
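The check the commit above describes amounts to a few lines of Python. This is a sketch of the idea, not the exact scylla_raid_setup code; the function names are made up:

```python
import os
import stat

def is_block_device(path):
    """True if path exists and is a block device (e.g. /dev/sdb)."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return stat.S_ISBLK(mode)

def bad_disks(paths):
    """Return the paths that are NOT block devices, for error reporting."""
    return [p for p in paths if not is_block_device(p)]
```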
Takuya ASADA
99b5cf1f92 dist/common/scripts/scylla_sysconfig_setup: verify NIC existence
Verify NIC existence before writing the sysconfig file, to prevent
errors while running scylla.

See #2442
2018-07-03 14:50:34 +09:00
Takuya ASADA
084c824d12 scripts: merge scylla_install_pkg to scylla-ami
scylla_install_pkg was initially written for the one-liner installer, but
now it is only used for creating the AMI, and it is just a few lines of
code, so it should be merged into the scylla_install_ami script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-2-syuu@scylladb.com>
2018-07-02 13:20:09 +03:00
Takuya ASADA
fafcacc31c dist/ami: drop Ubuntu AMI support
Drop the Ubuntu AMI since it has not been maintained for a long time,
and we have no plan to officially provide it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-1-syuu@scylladb.com>
2018-07-02 13:20:08 +03:00
Avi Kivity
677991f353 Update scylla-ami submodule
* dist/ami/files/scylla-ami 36e8511...0fd9d23 (2):
  > scylla_install_ami: merge scylla_install_pkg
  > scylla_install_ami: drop Ubuntu AMI
2018-07-02 13:19:34 +03:00
Avi Kivity
0b148d0070 Merge "scylla_setup fixes" from Takuya
"
I found problems in the previously submitted patchsets 'scylla_setup
fixes' and 'more fixes for scylla_setup', so I fixed them and merged them
into one patchset.

Also added a few more patches.
"

* 'scylla_setup_fix3' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_setup: allow inputting multiple disk paths at the RAID disk prompt
  dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk is specified
  dist/common/scripts/scylla_raid_setup: fix module import
  dist/common/scripts/scylla_setup: check whether a disk is used in MDRAID
  dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer to scylla_fstrim_setup
  dist/common/scripts/scylla_setup: use print() instead of logging.error()
  dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux
  dist/common/scripts/scylla_coredump_setup: run os.remove() when the directory being deleted is a symlink
  dist/common/scripts/scylla_setup: don't include a disk in the unused list when it contains partitions
  dist/common/scripts/scylla_setup: skip the rest of the checks when a disk is detected as used
  dist/common/scripts/scylla_setup: add a disk to the selected list correctly
  dist/common/scripts/scylla_setup: fix wrong indent
  dist/common/scripts: sync the instance type list for detecting NIC type to the latest one
  dist/common/scripts: verify systemd unit existence using 'systemctl cat'
2018-07-02 10:21:49 +03:00
Avi Kivity
a45c3aa8c7 Merge "Fix handling of stale write replies in storage_proxy" from Gleb
"
If a coordinator sends write requests with ID=X and restarts, it may get a
reply to the request after it restarts and sends another request with the
same ID (but to different replicas). This condition triggers an assert in
the coordinator. Drop the assertion in favor of a warning, and initialize
the handler id in a way that makes this situation less likely.

Fixes: #3153
"

* 'gleb/write-handler-id' of github.com:scylladb/seastar-dev:
  storage_proxy: initialize write response id counter from wall clock value
  storage_proxy: drop virtual from signal(gms::inet_address)
  storage_proxy: do not assert on getting an unexpected write reply
2018-07-01 17:59:54 +03:00
Gleb Natapov
19e7493d5b storage_proxy: initialize write response id counter from wall clock value
Initializing the write response id to the same value on each reboot may
cause a stale id to be taken for an active one if the node restarts after
sending only a couple of write requests and before receiving the replies.
On the next reboot it will start assigning ids from the same value, and
receiving old replies will confuse it. Mitigate this by assigning the
initial id from the wall clock value in milliseconds. This will not solve
the problem completely, but it will mitigate it.
2018-07-01 17:24:40 +03:00
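The mitigation above can be sketched as follows (illustrative Python, not the storage_proxy C++; the class name is made up):

```python
import time

class WriteResponseIdAllocator:
    """Allocate write-handler ids starting from the wall clock in ms."""

    def __init__(self):
        # After a restart the clock has advanced past any id handed out
        # in the previous boot, so stale replies are unlikely to match a
        # live id. A mitigation, not a full fix (clocks can move back).
        self._next_id = int(time.time() * 1000)

    def allocate(self):
        self._next_id += 1
        return self._next_id

alloc = WriteResponseIdAllocator()
first = alloc.allocate()
second = alloc.allocate()
```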
Nadav Har'El
3194ce16b3 repair: fix combination of "-pr" and "-local" repair options
When nodetool repair is used with the combination of the "-pr" (primary
range) and "-local" (only repair with nodes in the same DC) options,
Scylla needs to define the "primary ranges" differently: Rather than
assign one node in the entire cluster to be the primary owner of every
token, we need one node in each data-center - so that a "-local"
repair will cover all the tokens.

Fixes #3557.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180701132445.21685-1-nyh@scylladb.com>
2018-07-01 16:39:33 +03:00
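The per-DC notion of "primary range" described above can be sketched like this (illustrative Python, not the actual repair code; the ring is simplified to a replica walk order starting at the token):

```python
def primary_replica(ring, datacenter=None):
    """First replica on the ring walk, or first replica in `datacenter`.

    With -pr alone there is one primary per token in the whole cluster;
    with -pr -local each DC needs its own primary, otherwise a local
    repair would skip tokens whose global primary lives in another DC.
    """
    for endpoint, dc in ring:
        if datacenter is None or dc == datacenter:
            return endpoint
    return None

# Ring walk order for some token, as (endpoint, datacenter) pairs
ring = [("n1", "dc1"), ("n2", "dc2"), ("n3", "dc1")]
```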
Gleb Natapov
569437aaa5 storage_proxy: drop virtual from signal(gms::inet_address)
The function is not overridden, so it should not be virtual.
2018-07-01 16:35:59 +03:00
Gleb Natapov
5ee09e5f3b storage_proxy: do not assert on getting an unexpected write reply
In theory we should not get a write reply from a node we did not send a
write to, but in practice a stale reply can be received if a node reboots
between sending the write and getting the reply. Do not assert; log a
warning instead and ignore the reply.

Fixes: #3153
2018-07-01 16:35:09 +03:00
Tomasz Grabiec
b464b66e90 row_cache: Fix memtable reads concurrent with cache update missing writes
Introduced in 5b59df3761.

It is incorrect to erase entries from the memtable being moved to
cache if the partition update can be preempted, because a later memtable
read may create a snapshot in the memtable before the memtable writes for
that partition are made visible through cache. As a result the read
may miss some of the writes which were in the memtable. The code was
checking for the presence of snapshots when entering the partition, but
this condition may change if the update is preempted. The fix is to not
allow erasing if the update is preemptible.

This also caused SIGSEGVs, because we assumed that no such snapshots
would be created and hence were not invalidating iterators on removal
of the entries, which results in undefined behavior when such snapshots
are actually created.

Fixes SIGSEGV in dtest: limits_test.py:TestLimits.max_cells_test

Fixes #3532

Message-Id: <1530129009-13716-1-git-send-email-tgrabiec@scylladb.com>
2018-07-01 15:36:05 +03:00
Avi Kivity
f3da043230 Merge "Make in-memory partition version merging preemptable" from Tomasz
"
Partition snapshots go away when the last read using the snapshot is done.
Currently we will synchronously attempt to merge partition versions on this event.
If partitions are large, that may stall the reactor for a significant amount of time,
depending on the size of newer versions. Cache update on memtable flush can
create especially large versions.

The solution implemented in this series is to allow merging to be preemptable,
and continue in the background. Background merging is done by the mutation_cleaner
associated with the container (memtable, cache). There is a single merging process
per mutation_cleaner. The merging worker runs in a separate scheduling group,
introduced here, called "mem_compaction".

When the last user of a snapshot goes away, the snapshot is slid to the
oldest unreferenced version first, so that the version is no longer
reachable from partition_entry::read(). The cleaner will then keep merging
preceding (newer) versions into it, until it merges a version which is
referenced. The merging is preemptable. If the initial merging is
preempted, the snapshot is enqueued into the cleaner, the worker is woken
up, and merging continues asynchronously.

When memtable is merged with cache, its cleaner is merged with cache cleaner,
so any outstanding background merges will be continued by the cache cleaner
without disruption.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench
shows a similar improvement.
"

* tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla:
  tests: perf_row_cache_update: Test with an active reader surviving memtable flush
  memtable, cache: Run mutation_cleaner worker in its own scheduling group
  mutation_cleaner: Make merge() redirect old instance to the new one
  mvcc: Use RAII to ensure that partition versions are merged
  mvcc: Merge partition version versions gradually in the background
  mutation_partition: Make merging preemptable
  tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
2018-07-01 15:32:51 +03:00
Botond Dénes
5fd9c3b9d4 tests/mutation_reader_test: require min shard-count for multishard tests
Tests testing different aspects of `foreign_reader` and
`multishard_combining_reader` are designed to run with a certain minimum
shard count. Running them with any shard count below this minimum makes
them useless at best and can even fail them.
Refuse to run these tests when the shard count is below the required
minimum to avoid an accidental and unnecessary investigation into a
false-positive test failure.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d24159415b6a9d74eafb8355b6e3fba98c1ff7ff.1530274392.git.bdenes@scylladb.com>
2018-07-01 12:44:41 +03:00
Avi Kivity
f73340e6f8 Merge "Index reader and associated types clean-up." from Vladimir
"
This patchset paves the way to support for reading SSTables 3.x index files.
It aims at streamlining and tidying up the existing index_reader and
helpers and brings no functional or high-level changes.

In v3:
  - do not capture 'found' and just return 'true' in the continuation
    inside advance_and_check_if_present()
  - split code that makes the use of advance_upper_past() internal-only
    into two commits for better readability

GitHub URL: https://github.com/argenet/scylla/tree/projects/sstables-30/index_reader_cleanup/v3

Tests: unit {release}

Performance tests (perf_fast_forward) did not reveal any noticeable
changes. The complete output is below.

========================================
Original code (before the patchset)
========================================
running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1       0         0.336514   1000000    2971642   1000     126956      35       0        0        0        0        0        0        0  99.5%
1       1         1.411239    500000     354299    993     127056       2       0        0        1        1        0        0        0  99.9%
1       8         0.464468    111112     239224    993     127056       2       0        0        1        1        0        0        0  99.8%
1       16        0.330490     58824     177990    993     127056      12       0        0        1        1        0        0        0  99.7%
1       32        0.257010     30304     117910    993     127056      15       0        0        1        1        0        0        0  99.7%
1       64        0.213650     15385      72010    997     127072     268       0        0        3        3        0        0        0  99.5%
1       256       0.159498      3892      24402    993     127056     245       0        0        1        1        0        0        0  95.5%
1       1024      0.088678       976      11006    993     127056     347       0        0        1        1        0        0        0  63.4%
1       4096      0.082627       245       2965    649      22452     389     252        0        1        1        0        0        0  20.0%
64      1         0.411080    984616    2395191   1059     127056      57       1        0        1        1        0        0        0  99.1%
64      8         0.390130    888896    2278461    993     127056       2       0        0        1        1        0        0        0  99.8%
64      16        0.369033    800000    2167828    993     127056       3       0        0        1        1        0        0        0  99.8%
64      32        0.338126    666688    1971714    993     127056      10       0        0        1        1        0        0        0  99.7%
64      64        0.297335    500032    1681711    997     127072      18       0        0        3        3        0        0        0  99.7%
64      256       0.199420    200000    1002910    993     127056     211       0        0        1        1        0        0        0  99.5%
64      1024      0.113953     58880     516704    993     127056     284       0        0        1        1        0        0        0  64.1%
64      4096      0.094596     15424     163051    687      23684     415     248        0        1        1        0        0        0  23.7%

running: large-partition-slicing
Testing slicing of large partition:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000586         1       1706      3        164       2       1        0        1        1        0        0        0   9.0%
0       32        0.000587        32      54539      3        164       2       1        0        1        1        0        0        0   9.9%
0       256       0.000688       256     372343      4        196       2       1        0        1        1        0        0        0  20.7%
0       4096      0.004320      4096     948185     19        676      10       1        0        1        1        0        0        0  36.7%
500000  1         0.000882         1       1134      5        228       3       2        0        1        1        0        0        0  14.3%
500000  32        0.000881        32      36321      5        228       3       2        0        1        1        0        0        0  14.3%
500000  256       0.000961       256     266386      6        260       3       2        0        1        1        0        0        0  21.9%
500000  4096      0.003127      4096    1309805     21        740      14       2        0        1        1        0        0        0  54.0%

running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000639         1       1564      3        164       2       0        0        1        1        0        0        0  13.9%
0       32        0.000626        32      51154      3        164       2       0        0        1        1        0        0        0  15.3%
0       256       0.000716       256     357560      4        168       2       0        0        1        1        0        0        0  23.1%
0       4096      0.003681      4096    1112743     16        680       8       1        0        1        1        0        0        0  38.5%
500000  1         0.000966         1       1035      4        424       3       2        0        1        1        0        0        0  12.4%
500000  32        0.000911        32      35121      5        296       3       1        0        1        1        0        0        0  13.1%
500000  256       0.000978       256     261645      5        296       3       1        0        1        1        0        0        0  19.1%
500000  4096      0.003155      4096    1298139     11        744       6       1        0        1        1        0        0        0  44.5%

running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000756         1       1323      4        484       2       0        0        1        1        0        0        0  11.3%
0       32        0.000625        32      51174      3        164       2       0        0        1        1        0        0        0  15.5%
0       256       0.000705       256     363337      4        196       2       0        0        1        1        0        0        0  24.3%
0       4096      0.003603      4096    1136829     16        900       8       1        0        1        1        0        0        0  44.4%
500000  1         0.000880         1       1136      5        228       3       3        0        1        1        0        0        0  12.6%
500000  32        0.000882        32      36268      5        228       3       1        0        1        1        0        0        0  14.0%
500000  256       0.000965       256     265178      6        260       3       1        0        1        1        0        0        0  20.8%
500000  4096      0.003098      4096    1322024     21        740      14       2        0        1        1        0        0        0  54.6%

running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1000000 1         0.000631         1       1585      3        164       2       2        0        1        1        0        0        0  15.2%
500000  2         0.000873         2       2291      5        228       3       2        0        1        1        0        0        0  13.2%
250000  4         0.001404         4       2850      9        356       5       4        0        1        1        0        0        0  11.9%
125000  8         0.002878         8       2779     21        740      13       8        0        1        1        0        0        0  15.5%
62500   16        0.005184        16       3087     41       1380      25      16        0        1        1        0        0        0  19.3%
2       500000    0.948899    500000     526926   1040     127056      39       0        0        1        1        0        0        0  99.9%

running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan   time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
yes       0.001813         2       1103     11       1380       3       8        0        1        1        0        0        0  18.5%
no        0.000922         2       2170      5        228       3       1        0        1        1        0        0        0  14.1%

running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         1.023396   1000000     977139   1104     139668      12       0        0        2        2        0        0        0  99.7%
-> 1       1         2.176794    500000     229696   6200     177660    5109       0        0     5108     7679        0        0        0  69.9%
-> 1       8         1.130179    111112      98314   6200     177660    5109       0        0     5108     9647        0        0        0  41.5%
-> 1       16        0.972022     58824      60517   6200     177660    5109       0        0     5108     9913        0        0        0  32.0%
-> 1       32        0.880783     30304      34406   6201     177664    5110       0        0     5108    10057        0        0        0  25.2%
-> 1       64        0.829019     15385      18558   6199     177656    5108       0        0     5107    10135        0        0        0  20.4%
-> 1       256       2.248487      3892       1731   5028     168948    3937       0        0     3936     7801        0        0        0   4.6%
-> 1       1024      0.342806       976       2847   2076     146948     985     105        0      984     1955        0        0        0   9.3%
-> 1       4096      0.088605       245       2765    739      18152     492     246        0      247      490        0        0        0  11.1%
-> 64      1         1.796715    984616     548009   6274     177660    5120       0        0     5108     5187        0        0        0  63.1%
-> 64      8         1.688994    888896     526287   6200     177660    5109       0        0     5108     5674        0        0        0  61.2%
-> 64      16        1.593196    800000     502135   6200     177660    5109       0        0     5108     6143        0        0        0  58.7%
-> 64      32        1.438651    666688     463412   6200     177660    5109       0        0     5108     6807        0        0        0  56.5%
-> 64      64        1.290205    500032     387560   6200     177660    5109       0        0     5108     7660        0        0        0  49.2%
-> 64      256       2.136466    200000      93613   5252     170616    4161       0        0     4160     6267        0        0        0  13.8%
-> 64      1024      0.388871     58880     151413   2317     148784    1226     107        0     1225     1844        0        0        0  23.4%
-> 64      4096      0.107253     15424     143809    807      19100     562     244        0      321      482        0        0        0  24.2%

running: small-partition-slicing
Testing slicing small partitions:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.002773         1        361      3         68       2       0        0        1        1        0        0        0  10.5%
0       32        0.002905        32      11015      3         68       2       0        0        1        1        0        0        0  11.6%
0       256       0.003170       256      80764      4        104       2       0        0        1        1        0        0        0  17.8%
0       4096      0.008125      4096     504095     20        616      11       1        0        1        1        0        0        0  54.1%
500000  1         0.002914         1        343      3         72       2       0        0        1        2        0        0        0  10.7%
500000  32        0.002967        32      10786      3         72       2       0        0        1        2        0        0        0  12.6%
500000  256       0.003338       256      76685      5        112       3       0        0        2        2        0        0        0  17.4%
500000  4096      0.008495      4096     482141     21        624      12       1        0        2        2        0        0        0  52.3%

========================================
With the patchset
========================================

running: large-partition-skips
Testing scanning large partition with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1       0         0.340110   1000000    2940229   1000     126956      42       0        0        0        0        0        0        0  97.5%
1       1         1.401352    500000     356798    993     127056       2       0        0        1        1        0        0        0  99.9%
1       8         0.463124    111112     239918    993     127056       2       0        0        1        1        0        0        0  99.8%
1       16        0.330050     58824     178228    993     127056      11       0        0        1        1        0        0        0  99.7%
1       32        0.255981     30304     118384    993     127056       8       0        0        1        1        0        0        0  99.7%
1       64        0.215160     15385      71505    997     127072     263       0        0        3        3        0        0        0  99.4%
1       256       0.159702      3892      24370    993     127056     239       0        0        1        1        0        0        0  95.6%
1       1024      0.094403       976      10339    993     127056     298       0        0        1        1        0        0        0  58.9%
1       4096      0.082501       245       2970    649      22452     391     252        0        1        1        0        0        0  20.1%
64      1         0.415227    984616    2371272   1059     127056      52       1        0        1        1        0        0        0  99.3%
64      8         0.391556    888896    2270166    993     127056       2       0        0        1        1        0        0        0  99.8%
64      16        0.372075    800000    2150102    993     127056       4       0        0        1        1        0        0        0  99.7%
64      32        0.337454    666688    1975641    993     127056      15       0        0        1        1        0        0        0  99.7%
64      64        0.296345    500032    1687333    997     127072      21       0        0        3        3        0        0        0  99.7%
64      256       0.199221    200000    1003911    993     127056     204       0        0        1        1        0        0        0  99.4%
64      1024      0.118224     58880     498037    993     127056     275       0        0        1        1        0        0        0  61.8%
64      4096      0.095098     15424     162191    687      23684     417     248        0        1        1        0        0        0  23.7%

running: large-partition-slicing
Testing slicing of large partition:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000585         1       1709      3        164       2       1        0        1        1        0        0        0  10.7%
0       32        0.000589        32      54353      3        164       2       1        0        1        1        0        0        0  10.0%
0       256       0.000688       256     372293      4        196       2       1        0        1        1        0        0        0  20.7%
0       4096      0.004336      4096     944562     19        676      10       1        0        1        1        0        0        0  36.9%
500000  1         0.000877         1       1140      5        228       3       2        0        1        1        0        0        0  13.6%
500000  32        0.000883        32      36222      5        228       3       2        0        1        1        0        0        0  14.4%
500000  256       0.000963       256     265804      6        260       3       2        0        1        1        0        0        0  22.0%
500000  4096      0.003008      4096    1361779     21        740      17       2        0        1        1        0        0        0  56.7%

running: large-partition-slicing-clustering-keys
Testing slicing of large partition using clustering keys:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000623         1       1604      3        164       2       0        0        1        1        0        0        0  13.9%
0       32        0.000624        32      51261      3        164       2       0        0        1        1        0        0        0  14.7%
0       256       0.000714       256     358484      4        168       2       0        0        1        1        0        0        0  22.6%
0       4096      0.003687      4096    1110990     16        680       8       1        0        1        1        0        0        0  38.6%
500000  1         0.000973         1       1028      4        424       3       2        0        1        1        0        0        0  12.1%
500000  32        0.000914        32      35022      5        296       3       1        0        1        1        0        0        0  12.8%
500000  256       0.000986       256     259646      5        296       3       1        0        1        1        0        0        0  19.7%
500000  4096      0.003155      4096    1298122     11        744       6       1        0        1        1        0        0        0  44.5%

running: large-partition-slicing-single-key-reader
Testing slicing of large partition, single-partition reader:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.000766         1       1305      4        484       2       0        0        1        1        0        0        0  12.2%
0       32        0.000626        32      51111      3        164       2       0        0        1        1        0        0        0  15.2%
0       256       0.000710       256     360563      4        196       2       0        0        1        1        0        0        0  25.2%
0       4096      0.003963      4096    1033440     16        900       8       1        0        1        1        0        0        0  40.2%
500000  1         0.000877         1       1141      5        228       3       1        0        1        1        0        0        0  12.7%
500000  32        0.000882        32      36272      5        228       3       1        0        1        1        0        0        0  14.2%
500000  256       0.000959       256     266937      6        260       3       1        0        1        1        0        0        0  21.1%
500000  4096      0.003103      4096    1319992     21        740      14       2        0        1        1        0        0        0  53.9%

running: large-partition-select-few-rows
Testing selecting few rows from a large partition:
stride  rows      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
1000000 1         0.000631         1       1586      3        164       2       2        0        1        1        0        0        0  13.8%
500000  2         0.000872         2       2295      5        228       3       2        0        1        1        0        0        0  13.4%
250000  4         0.001483         4       2698      9        356       5       4        0        1        1        0        0        0  11.2%
125000  8         0.002894         8       2764     21        740      13       8        0        1        1        0        0        0  15.6%
62500   16        0.005182        16       3087     41       1380      25      16        0        1        1        0        0        0  19.5%
2       500000    0.942943    500000     530255   1040     127056      38       0        0        1        1        0        0        0  99.9%

running: large-partition-forwarding
Testing forwarding with clustering restriction in a large partition:
pk-scan   time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
yes       0.001807         2       1107     11       1380       3       8        0        1        1        0        0        0  18.9%
no        0.000924         2       2165      5        228       3       1        0        1        1        0        0        0  14.1%

running: small-partition-skips
Testing scanning small partitions with skips.
Reads whole range interleaving reads with skips according to read-skip pattern:
   read    skip      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> 1       0         1.009953   1000000     990145   1104     139668      11       0        0        2        2        0        0        0  99.7%
-> 1       1         2.213846    500000     225851   6200     177660    5109       0        0     5108     7679        0        0        0  70.3%
-> 1       8         1.150029    111112      96617   6200     177660    5109       0        0     5108     9647        0        0        0  42.3%
-> 1       16        0.989438     58824      59452   6200     177660    5109       0        0     5108     9913        0        0        0  33.2%
-> 1       32        0.891590     30304      33989   6201     177664    5110       0        0     5108    10057        0        0        0  26.4%
-> 1       64        0.840952     15385      18295   6199     177656    5108       0        0     5107    10135        0        0        0  21.6%
-> 1       256       2.247875      3892       1731   5028     168948    3937       0        0     3936     7801        0        0        0   5.0%
-> 1       1024      0.345917       976       2821   2076     146948     985     105        0      984     1955        0        0        0  10.0%
-> 1       4096      0.088806       245       2759    739      18152     492     246        0      247      490        0        0        0  11.6%
-> 64      1         1.821995    984616     540406   6274     177660    5119       0        0     5108     5187        0        0        0  63.9%
-> 64      8         1.715052    888896     518291   6200     177660    5109       0        0     5108     5674        0        0        0  61.9%
-> 64      16        1.620385    800000     493710   6200     177660    5109       0        0     5108     6143        0        0        0  59.4%
-> 64      32        1.464497    666688     455233   6200     177660    5109       0        0     5108     6807        0        0        0  56.9%
-> 64      64        1.311386    500032     381300   6200     177660    5109       0        0     5108     7660        0        0        0  50.0%
-> 64      256       2.153954    200000      92853   5252     170616    4161       0        0     4160     6267        0        0        0  14.3%
-> 64      1024      0.350275     58880     168097   2317     148784    1226     107        0     1225     1844        0        0        0  27.5%
-> 64      4096      0.107498     15424     143482    807      19100     562     244        0      321      482        0        0        0  24.5%

running: small-partition-slicing
Testing slicing small partitions:
offset  read      time (s)     frags     frag/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
0       1         0.002872         1        348      3         68       2       0        0        1        1        0        0        0  10.2%
0       32        0.002833        32      11297      3         68       2       0        0        1        1        0        0        0  12.1%
0       256       0.003145       256      81404      4        104       2       0        0        1        1        0        0        0  17.9%
0       4096      0.008110      4096     505079     20        616      12       1        0        1        1        0        0        0  54.4%
500000  1         0.002934         1        341      3         72       2       1        0        1        2        0        0        0  10.6%
500000  32        0.002871        32      11145      3         72       2       0        0        1        2        0        0        0  12.0%
500000  256       0.003216       256      79598      5        112       3       0        0        2        2        0        0        0  18.3%
500000  4096      0.008557      4096     478692     21        624      12       1        0        2        2        0        0        0  51.9%
"

* 'projects/sstables-30/index_reader_cleanup/v3' of https://github.com/argenet/scylla:
  sstables: Remove "lower_" from index_reader public methods.
  sstables: Make index_reader::advance_upper_past() method private.
  sstables: Stop using index_reader::advance_upper_past() outside the class.
  sstables: Move promoted_index_block from types.hh to index_entry.hh.
  sstables: Factor out promoted index into a separate class.
  sstables: Use std::optional instead of std::experimental::optional in index_reader.
2018-07-01 12:30:29 +03:00
Botond Dénes
da53ea7a13 tests.py: add --jobs command line parameter
Allows setting the number of jobs used for running the tests.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d58d6393c6271bffc37ab3b5edc37b00ef485d9c.1529433590.git.bdenes@scylladb.com>
2018-07-01 12:26:41 +03:00
Vladimir Krivopalov
b24eb5c11d sstables: Remove "lower_" from index_reader public methods.
The index_reader class public interface has been amended so that the upper
bound cursor is only dealt with alongside advancing the lower bound.
Since the class users can only explicitly operate on the lower bound
cursor (take the data file position, advance to the next partition, etc.),
it no longer makes sense to specify that a method operates on the lower
bound cursor in its name.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:48:33 -07:00
Vladimir Krivopalov
30109a693b sstables: Make index_reader::advance_upper_past() method private.
No changes to the code; it is only moved around.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:47:48 -07:00
Vladimir Krivopalov
80d1d5017f sstables: Stop using index_reader::advance_upper_past() outside the class.
The only case when it needs to be called is when an index_reader is
advanced to a specific partition as part of sstable_reader
initialisation.

Instead, we're passing an optional upper_bound parameter that is used to
call advance_upper_past() internally if the partition is found.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-29 11:47:20 -07:00
Duarte Nunes
0db5419ec5 Merge 'Avoid copies when unfreezing frozen_mutation' from Paweł
"
When a frozen mutation gets deserialised, the current implementation copies
its value three times: from the IDL buffer to a bytes object, from the bytes
object to an atomic_cell, and then the atomic_cell is copied again. Moreover,
the value gets linearised, which may cause a large allocation.

All of that is very wasteful. This patch devirtualises and reworks the IDL
reading code so that, when used with partition_builder, the cell value is
copied only once and without linearisation: from the IDL buffer to the
final atomic_cell.

perf_simple_query -c4, medians of 30 results:
        ./perf_before  ./perf_after  diff
 read       310576.54     316273.90  1.8%
 write      359913.15     375579.44  4.4%

microbenchmark, perf_idl:

BEFORE
test                                      iterations      median         mad         min         max
frozen_mutation.freeze_one_small_row         2142435   462.431ns     0.125ns   462.306ns   467.659ns
frozen_mutation.unfreeze_one_small_row       1640949   601.422ns     0.082ns   601.340ns   605.279ns
frozen_mutation.apply_one_small_row          1538969   645.993ns     0.405ns   645.588ns   656.510ns

AFTER
test                                      iterations      median         mad         min         max
frozen_mutation.freeze_one_small_row         2139548   455.525ns     0.631ns   454.894ns   456.707ns
frozen_mutation.unfreeze_one_small_row       1760139   566.157ns     0.003ns   566.153ns   584.339ns
frozen_mutation.apply_one_small_row          1582050   610.951ns     0.060ns   610.891ns   613.044ns

Tests: unit(release)
"

* tag 'avoid-copy-unfreeze/v2' of https://github.com/pdziepak/scylla:
  mutation_partition_view: use column_mapping_entry::is_atomic()
  schema: column_mapping_entry: cache abstract_type::is_atomic()
  schema: column_mapping_entry: reduce logic duplication
  mutation_partition_view: do not linearise or copy cell value
  atomic_cell: allow passing value via ser::buffer_view
  mutation_partition_view: pass cell by value to visitor
  mutation_partition_view: devirtualise accept()
  storage_proxy: use mutation_partition_view::{first, last}_row_key()
  mutation_partition_view: add last_row_key() and first_row_key() getters
2018-06-28 22:55:20 +01:00
Paweł Dziepak
c45e291084 mutation_partition_view: use column_mapping_entry::is_atomic() 2018-06-28 22:16:42 +01:00
Paweł Dziepak
6c54a97320 schema: column_mapping_entry: cache abstract_type::is_atomic()
The IDL deserialisation code calls is_atomic() for each cell. An additional
indirection and a virtual call can be avoided by caching that value in
column_mapping_entry. A very similar optimisation is already done
for column_definitions.
2018-06-28 22:16:42 +01:00
Paweł Dziepak
2bfdc2d781 schema: column_mapping_entry: reduce logic duplication
User-defined constructors make it more likely that a careless
developer will forget to update one of them when adding a new member to
a structure. That risk can be lowered by reducing code
duplication with delegating constructors.
2018-06-28 22:16:42 +01:00
Paweł Dziepak
199f9196e9 mutation_partition_view: do not linearise or copy cell value 2018-06-28 22:11:19 +01:00
Paweł Dziepak
92700c6758 atomic_cell: allow passing value via ser::buffer_view 2018-06-28 22:11:19 +01:00
Paweł Dziepak
bf330a99f0 mutation_partition_view: pass cell by value to visitor
mutation_partition_view needs to create an atomic_cell from
IDL-serialised data. That cell is then passed to the visitor. However,
because the generic mutation_partition_visitor interface was used, the cell
was passed by const reference, which forced the visitor to needlessly
copy it.

This patch takes advantage of the fact that mutation_partition_view is
now devirtualised and adjusts the interfaces of its visitors so that the
cell can be passed without copying.
2018-06-28 22:11:19 +01:00
Paweł Dziepak
569176aad1 mutation_partition_view: devirtualise accept()
There are only two types of visitors used, and only one of them appears
in the hot path. They can be devirtualised without too much effort,
which also enables future custom interface specialisations specific to
mutation_partition_view and its users, not necessarily within the scope of
the more general mutation_partition_visitor.
2018-06-28 22:11:19 +01:00
Paweł Dziepak
6bd71015e7 storage_proxy: use mutation_partition_view::{first, last}_row_key() 2018-06-28 22:11:19 +01:00
Paweł Dziepak
2259eee97c mutation_partition_view: add last_row_key() and first_row_key() getters
Some users (e.g. the reconciliation code) only need to know the clustering
key of the first or the last row in the partition. This was done with a
full visitor visiting every single cell of the partition, which is very
wasteful. This patch adds direct getters for the needed information.
2018-06-28 22:11:19 +01:00
Vladimir Krivopalov
a497edcbda sstables: Move promoted_index_block from types.hh to index_entry.hh.
It is only used by index_reader internally and never exposed, so it
should not be listed among the commonly used types.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-28 12:28:59 -07:00
Vladimir Krivopalov
81fba73e9d sstables: Factor out promoted index into a separate class.
An index entry may or may not have a promoted index. Scoping all the
optional fields under one class avoids lots of separate optional
fields and gives a better representation.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-06-28 12:28:59 -07:00