Allow the --mode argument to ./configure.py and ./test.py to be repeated. This
is to allow continuous integration to configure only debug and release, leaving dev
to developers.
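Sketched with argparse (a minimal illustration; the mode names and the fallback default are assumptions, not necessarily what configure.py does):

    import argparse

    parser = argparse.ArgumentParser()
    # action='append' lets --mode be repeated, collecting the values in a list.
    parser.add_argument('--mode', action='append', dest='modes',
                        choices=['debug', 'release', 'dev'],
                        help='build mode; may be given multiple times')
    args = parser.parse_args()
    modes = args.modes or ['debug', 'release', 'dev']  # default: all modes

So CI can run, e.g., ./configure.py --mode=debug --mode=release and leave dev out.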
Message-Id: <20190214162736.16443-1-avi@scylladb.com>
"
This series introduces PER PARTITION LIMIT to CQL.
Protocol and storage are already capable of applying per-partition limits,
so for nonpaged queries the changes are superficial - a variable is parsed
and passed down.
For paged queries and filtering the situation is a little bit more complicated
due to corner cases: results for one partition can be split over 2 or more pages,
filtering may drop rows, etc. To solve these, another variable is added to the paging
state - the number of rows already returned from the last served partition.
Note that the "last" partition may be stretched over any number of pages, not just the
last one, which is especially the case when filtering is involved.
As a result, per-partition-limiting queries are not eligible for page generator
optimization, because they may need to have their results locally filtered
for extraneous rows (e.g. when the next page asks for per-partition limit 5,
but we already received 4 rows from the last partition, so we need just 1 more
from the last partition key, but 5 from all the following ones).
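A rough sketch of that trimming step in Python (the function and paging-state field names are invented for illustration; the real implementation lives in the C++ pager and filtering code):

    def trim_per_partition(rows, limit, last_pk, rows_already_sent):
        """Drop rows exceeding the per-partition limit, counting rows of the
        last partition that previous pages already returned.
        `rows` is a list of (partition_key, row) pairs in partition order."""
        kept = []
        current_pk, count = last_pk, rows_already_sent
        for pk, row in rows:
            if pk != current_pk:
                current_pk, count = pk, 0      # new partition: reset counter
            if count < limit:
                kept.append((pk, row))
                count += 1
        return kept, current_pk, count         # new paging-state values

    # Per-partition limit 5, 4 rows of partition 'a' already returned earlier:
    print(trim_per_partition([('a', 1), ('a', 2), ('b', 1)], 5, 'a', 4))
    # -> only one more 'a' row is kept; 'b' starts counting from zero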
Tests: unit (dev)
Fixes#2202
"
* 'add_per_partition_limit_3' of https://github.com/psarna/scylla:
tests: remove superficial ignore_order from filtering tests
tests: add filtering with per partition key limit test
tests: publish extract_paging_state and count_rows_fetched
tests: fix order of parameters in with_rows_ignore_order
cql3,grammar: add PER PARTITION LIMIT
idl,service: add persistent last partition row count
cql3: prevent page generator usage for per-partition limit
cql3: add checking for previous partition count to filtering
pager: add adjusting per-partition row limit
cql3: obey per partition limit for filtering
cql3: clean up unneeded limit variables
cql3: obey per partition limit for select statement
cql3: add get_per_partition_limit
cql3: add per_partition_limit to CQL statement
Python 3.6 is the first version to accept bytes in json.loads(),
which causes the following error on older Python 3 versions:
Traceback (most recent call last):
File "/usr/lib/scylla/scylla-housekeeping", line 175, in <module>
args.func(args)
File "/usr/lib/scylla/scylla-housekeeping", line 121, in check_version
raise e
File "/usr/lib/scylla/scylla-housekeeping", line 116, in check_version
versions = get_json_from_url(version_url + params)
File "/usr/lib/scylla/scylla-housekeeping", line 55, in get_json_from_url
return json.loads(data)
File "/usr/lib64/python3.4/json/__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
To support those older Python versions, convert the bytes read to UTF-8
strings before calling json.loads().
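The described fix boils down to something like this (the function body here is a paraphrase, not the exact scylla-housekeeping code):

    import json
    import urllib.request

    def get_json_from_url(url):
        data = urllib.request.urlopen(url).read()   # bytes
        # json.loads() only accepts bytes from Python 3.6 on, so decode
        # explicitly to stay compatible with older Python 3 versions.
        return json.loads(data.decode('utf-8'))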
Fixes#4239
Branches: master, 3.0
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190218112312.24455-1-amnon@scylladb.com>
When reporting a failure, expected rows were mixed up with received
rows. Also, the message assumed it received more rows, but it can
just as well be fewer, so now it reports a "different number" of rows.
In order to process paged queries with per-partition limits properly,
paging state needs to keep additional information: the row count of the
last partition returned in the previous run.
That's necessary because the end of the previous page and the beginning
of the current one might consist of rows with the same partition key
and we need to be able to trim the results to the number indicated
by the per-partition limit.
Paged queries that impose per-partition limits cannot use the
page generator optimization, as sometimes the results need
to be filtered for extraneous rows on page breaks.
Filtering now needs to take into account per partition limits as well,
and for that it's essential to be able to compare partition keys
and decide which rows should be dropped - if previous page(s) contained
rows with the same partition key, these need to be taken into
consideration too.
For filtering pagers, the per-partition limit should be set
to the page size every time a query is executed, because some rows
may potentially get dropped from results.
Part of the code is already implemented (counters and hinted-handoff).
Part of the code will probably never be (triggers). And the rest is
the code that estimates the number of rows per range to determine query
parallelism, but we implemented exponential growth algorithms instead.
Message-Id: <20190214112226.GE19055@scylladb.com>
"
get_restricted_ranges() is inefficient since it calculates all the
vnodes that cover the requested key ranges in advance, but callers often
use only the first one. Replace the function with a generator interface
that produces the requested number of vnodes on demand.
"
* 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev:
storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator
storage_proxy: remove old get_restricted_ranges() interface
cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface
tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface
storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface
storage_proxy: introduce new query_ranges_to_vnode_generator interface
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
* seastar 11546d4...2313dec (6):
> Deprecate thread_scheduling_group in favor of scheduling_group
> Merge "Fixes for Doxygen documentation" from Jesse
> future: optionally type-erase future::then() and future::then_wrapped
> build: Allow deprecated declarations internally
> rpc: fix insertion of server connections into server's container
> rpc: split BOOST_REQUIRE with long conditions into multiple
read_exactly(), when given a stream that does not contain the amount of data
requested, will loop endlessly, allocating more and more memory as it does, until
it fails with an exception (at which point it will release the memory).
Fix by returning an empty result, like input_stream::read_exactly() (which it
replaces). Add a test case that fails without the fix.
Affected callers are the native transport, commitlog replay, and internal
deserialization.
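The logic of the fix, in a Python sketch over a generic stream object (not Seastar's actual data source API):

    def read_exactly(stream, n):
        """Return exactly n bytes, or an empty buffer if the stream ends
        first; `stream` is anything with a read(k) method."""
        buf = bytearray()
        while len(buf) < n:
            chunk = stream.read(n - len(buf))
            if not chunk:
                # Stream exhausted before n bytes arrived. The old code kept
                # looping (and allocating) here; return empty instead.
                return b''
            buf.extend(chunk)
        return bytes(buf)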
Fixes#4233.
Branches: master, branch-3.0
Tests: unit(dev)
Message-Id: <20190216150825.14841-1-avi@scylladb.com>
When yum-utils is already installed on Fedora, 'yum install dnf-utils' causes
a conflict and fails.
We should show a descriptive message instead of just producing a dnf error
message.
Fixes#4215
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190215221103.2379-1-syuu@scylladb.com>
When bootstrapping, a node should wait for schema agreement
with its peers before it can join the ring. This is to ensure it can
immediately accept writes. Failing to reach schema agreement before
joining is not fatal, as the node can pull unknown schemas on writes
on-demand. However, if such a schema contains references to UDFs, the
node will reject writes using it, due to #3760.
To ensure that schema agreement is reached before joining the ring,
`storage_service::join_token_ring()` has two checks. First, it checks that
at least one peer was connected previously. For this it compares
`database::get_version()` with `database::empty_version`. The (implied)
assumption is that this will become something other than
`database::empty_version` only after having connected (and pulled
schemas from) at least one peer. This assumption doesn't hold anymore,
as we now set the version earlier in the boot process.
The second check verifies that we have the same schema version as all
known, live peers. This check assumes (since 3e415e2) that we have
already "met" all (or at least some) of our peers and if there is just
one known node (us) it concludes that this is a single-node cluster,
which automatically has schema agreement.
It's easy to see how these two checks will fail. The first fails to
ensure that we have met our peers, and the second wrongfully concludes
that we are a one-node cluster, and hence have schema agreement.
To fix this, modify the first check. Instead of relying on the presence
of a non-empty database version, supposedly implying that we already
talked to our peers, explicitly make sure that we have really talked to
*at least* one other node, before proceeding to the second check, which
will now do the correct thing, actually checking the schema versions.
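In pseudocode (Python-flavored, with invented helper names; the real logic sits in storage_service::join_token_ring()), the corrected flow is roughly:

    # Illustrative only: the helper names below do not exist in Scylla.
    def wait_for_schema_agreement(node):
        if node.known_live_peers():
            # First check, fixed: wait until schema was actually pulled from
            # at least one other node, instead of inferring that from a
            # non-empty schema version.
            while not node.pulled_schema_from_some_peer():
                node.sleep(seconds=1)
            # Second check, unchanged: comparing our schema version with all
            # known, live peers is now meaningful.
            while not node.schema_version_matches_live_peers():
                node.sleep(seconds=1)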
Fixes: #4196
Branches: 3.0, 2.3
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.
There doesn't seem to be an order in which we can stop the existing services in
cql_test_env that avoids this.
This patch therefore adds another step when shutting down a database: first
stop updating system.large_partition.
This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.
This seems to impact only tests, since main.cc calls _exit directly.
Tests: unit (release,debug)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
-Og is advertised as a debug-friendly optimization level, improving both compile
time and the debugging experience. It also cuts sstable_mutation_test run time in half:
Changing -O0 to -Og
Before:
real 16m49.441s
user 16m34.641s
sys 0m10.490s
After:
real 8m38.696s
user 8m26.073s
sys 0m10.575s
Message-Id: <20190214205521.19341-1-avi@scylladb.com>
To fix issue #3362 we added to materialized views, in some cases,
"virtual columns" for columns which were not selected into the view.
Although these columns nominally exist in the view's schema, they must
not be visible to the user, and in commit
3f3a76aa8f we prevented a user from being
able to SELECT these columns.
In this patch we also prevent the user from being able to use these
column names (which shouldn't exist in the view) in WHERE restrictions.
Fixes#4216
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212162014.18778-1-nyh@scylladb.com>
The bulk materialized-view building process (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.
Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.
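The resulting batching rule, sketched in Python (batch_size and batch_memory_max are the constants referred to elsewhere in this log; the loop itself is illustrative, not the view_builder code):

    BATCH_SIZE = 128                # rows per build step (pre-existing limit)
    BATCH_MEMORY_MAX = 1024 * 1024  # new: stop batching past ~1MB of row data

    def build_batches(rows):
        """Group base-table rows into batches bounded by row count and by
        accumulated size; `rows` yields (row, size_in_bytes) pairs."""
        batch, batch_bytes = [], 0
        for row, size in rows:
            batch.append(row)
            batch_bytes += size
            if len(batch) >= BATCH_SIZE or batch_bytes >= BATCH_MEMORY_MAX:
                yield batch
                batch, batch_bytes = [], 0
        if batch:
            yield batch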
As a side-effect, this patch also solves another bug: in the worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them would.
Fixes#4213.
This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
Fixes#4222
If an extension creation callback returns null (rather than throwing an exception),
we treat this as "I'm not needed" and simply ignore it.
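In Python terms (the surrounding registration machinery here is made up; only the "null means skip" behaviour is the point):

    def instantiate_extensions(creation_callbacks, config):
        extensions = []
        for make_ext in creation_callbacks:
            ext = make_ext(config)
            if ext is None:       # callback says "I'm not needed": ignore it
                continue          # (an exception would still propagate)
            extensions.append(ext)
        return extensions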
Message-Id: <20190213124311.23238-1-calle@scylladb.com>
The way the `pkg-config` executable works on Fedora and Ubuntu is
different, since on Fedora `pkg-config` is provided by the `pkgconf`
project.
In the build directory of Seastar, `seastar.pc` and `seastar-testing.pc`
are generated. `seastar` is a requirement of `seastar-testing`.
When pkg-config is invoked like this:
pkg-config --libs build/release/seastar-testing.pc
the version of `pkg-config` on Fedora resolves the reference to
`seastar` in `Requires` to the `seastar.pc` in the same directory.
However, the version of `pkg-config` on Ubuntu 18.04 does not:
Package seastar was not found in the pkg-config search path.
Perhaps you should add the directory containing `seastar.pc'
to the PKG_CONFIG_PATH environment variable
Package 'seastar', required by '/seastar-testing', not found
To address the divergent behavior, we set the `PKG_CONFIG_PATH` variable
to point to the directory containing `seastar.pc`. With this change, I
was able to configure Scylla on both Fedora 29 and Ubuntu 18.04.
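Roughly what the configure step needs to do, as a Python sketch (the build directory and invocation details are assumptions, not the literal configure.py code):

    import os
    import subprocess

    build_dir = 'build/release'   # directory that contains seastar.pc
    env = dict(os.environ, PKG_CONFIG_PATH=build_dir)
    # With PKG_CONFIG_PATH set, Ubuntu's pkg-config can resolve the
    # `Requires: seastar` entry inside seastar-testing.pc as well.
    libs = subprocess.check_output(
        ['pkg-config', '--libs', os.path.join(build_dir, 'seastar-testing.pc')],
        env=env, universal_newlines=True)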
Fixes#4218
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <d7164bde2790708425ac6761154d517404818ecd.1550002959.git.jhaberku@scylladb.com>
"
Fixes#4083
Instead of a sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update simpler, and
easier on the query memory.
The code also migrates any existing truncation
positions on startup and clears the old data.
"
* 'calle/truncation' of github.com:scylladb/seastar-dev:
truncation_migration_test: Add rudimentary test
system_keyspace: Add waitable for trunc. migration
cql_test_env: Add separate config w. feature disable
cql_test_env: Add truncation migration to init
cql_assertions: Add null/non-null tests
storage_service: Add features disabling for tests
Add system.truncated documentation in docs
commitlog_replay: Use dedicated table for truncation
storage_service: Add "truncation_table" feature
* seastar 428f4ac...11546d4 (9):
> reactor: Fix an infinite loop caused by the high resolution timer not being monitored
> build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE`
> build: Unify dependency versions
> future-util: optimize parallel_for_each() with single element
> core/sharded.hh: fix doxygen for "Multicore" group
> build: switch from travis-ci to circleci
> perftune.py: fix irqbalance tuning on Ubuntu 18
> build: Make the use of sanitizers transitive
> net: ipv6: fix ipv6 detection and tests by binding to loopback
"
Recently, there has been a series of incidents of the multishard
combining reader deadlocking, when the concurrency of reads was
severely restricted and there was no timeout for the read.
Several fixes have been merged (414b14a6b, 21b4b2b9a, ee193f1ab,
170fa382f) but eliminating all occurrences of deadlocks proved to be a
whack-a-mole game. After the last bug report I have decided that instead
of trying to plug new holes as we find them, I'll try to make it
impossible for holes to appear in the first place. To translate this into the
multishard reader, instead of sprinkling new `reader.pause()` calls all
over the place in the multishard reader to solve the newly found
deadlocks, make the pausing of readers fully automatic on the shard
reader level. Readers are now always kept in a paused state, except when
actually used. This eliminates the entire class of deadlock bugs.
This patch-set also aims at simplifying the multishard reader code, as
well as the code of the existing `lifecycle_policy` implementations.
This effort resulted in:
* mutation_reader.cc: no change in SLOC, although it now also contains
logic that used to be duplicated in every `lifecycle_policy`
implementation;
* multishard_mutation_query.cc: 150 SLOC removed;
* database.cc: 30 SLOC removed;
Also the code is now (hopefully) simpler, safer and has a clearer
structure.
Fixes#4050 (main issue)
Fixes#3970
Fixes#3998 (deprecates, really)
"
* 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla:
query_mutations_on_all_shards(): make states light-weight
query_mutations_on_all_shards(): get rid of read_context::paused_reader
query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
query_mutations_on_all_shards(): pause looked-up readers
query_mutation_on_all_shards(): remove unnecessary indirection
shard_reader: auto pause readers after being used
reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
shard_reader: make reader creation sync
shard_reader: use semaphore directly to pause-resume
shard_reader: recreate_reader(): fix empty range case
foreign_reader: rip out the now unused private API
shard_reader: move away from foreign_reader
multishard_combining_reader: make shard_reader a shared pointer
multishard_combining_reader: move the shard reader definition out
multishard_combining_reader: disentangle shard_reader
Previously the different states a reader can be in were all separate
structs, joined together by a variant. When this was designed, that
made sense, as the states were numerous and quite different. By this
point, however, the number of states has been reduced to 4, with 3 of them
being almost the same. Thus it makes sense to merge these states into a
single struct and keep track of the current state with an enum field.
This can theoretically increase the chances of mistakes, but in practice
I expect the opposite, due to the simpler (and less) code. Also, the
important checks that verify that a reader is in the state expected by
the code are all left in place.
A byproduct of this change is that the number of cross-shard writes is
greatly reduced. Whereas previously the whole state object had to be
rewritten on state change, now a single enum value has to be updated.
Cross-shard reads are likewise reduced to the reading of a few foreign
pointers; all state-related data is now kept on the shard where the
associated reader lives.
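As a loose analogy in Python (the state names below are invented, not the actual ones): instead of a variant over per-state structs, a single object carries the shared fields and an enum says which state it is in, so a state transition is just one field update.

    import enum

    class ReaderState(enum.Enum):
        INACTIVE = enum.auto()
        USED = enum.auto()
        SAVING = enum.auto()
        EVICTED = enum.auto()

    class ReaderMeta:
        def __init__(self, reader):
            self.reader = reader                 # data shared by all states
            self.state = ReaderState.INACTIVE

        def transition(self, new_state):
            self.state = new_state               # cheap, no object rebuild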
These two states are now the same, with the artificial distinction that
all readers are promoted to the ready_to_save state after the compaction
state and the combined buffer are dismantled. From a practical
perspective this distinction is meaningless, so merge the two states into
a single `saving` state.
At the beginning of each page, all saved readers from the previous pages
(if any) are looked up, so they can be reused. Some of these saved
readers can end up not being used at all for the current page, in which
case they will needlessly sit on their permit for the duration of
filling the page. Avoid this by immediately pausing all looked-up
readers. This also allows a nice unification of the reader saving logic, as
now *all* readers will be in a paused state when `save_reader()` is
called. Previously, looked-up, but not used readers were an exception to
this, requiring extra logic to handle both cases. This logic can now be
removed.