scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	17c66224f7	tests: Check that partition is not resurrected on compaction failure Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-13 00:06:51 -03:00
Raphael S. Carvalho	6301a10fd7	tests: Add sstable compaction test for gc-only mutation compactor consumer Make sure gc'able-tombstone-only sstable is properly generated with data that comes from regular compaction's input sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:38:53 -03:00
Raphael S. Carvalho	91260cf91b	sstables: Fix Incremental Compaction Efficiency Compaction prevents data resurrection from happening by checking that there's no way a data shadowed by a GC'able tombstone will survive alone, after a failure for example. Consider the following scenario: We have two runs A and B, each divided to 5 fragments, A1..A5, B1..B5. They have the following token ranges: A: A1=[0, 3] A2=[4, 7] A3=[8, 11] A4=[12, 15] A5=[16,18] B is the same as A's ranges, offset by 1: B: B1=[1,4] B2=[5,8] B3=[9,12] B4=[13,16] B5=[17,19] Let's say we are finished flushing output until position 10 in the compaction. We are currently working on A3 and B3, so obviously those cannot be deleted. Because B2 overlaps with A3, we cannot delete B2 either. Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after B2 is gone, dead data in A3 could be resurrected on failure. Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2. Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so we can't delete A1. So we can't delete any fragment. The problem with this approach is obvious, fragments can potentially not be released due to data dependency, so incremental compaction efficiency is severely reduced. To fix it, let's not purge GC'able tombstones right away in the mutation compactor step. Instead, let's have compaction writing them to a separate sstable run that would be deleted in the end of compaction. By making sure that tombstone information from all compacting sstables is not lost, we no longer need to have incremental compaction imposing lots of restriction on which fragments could be released. Now, any sstable which data is safe in a new sstable can be released right away. In addition, incremental compaction will only take place if compaction procedure is working with one multi-fragment sstable run at least. Fixes #4531. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2019-10-12 21:36:03 -03:00
Kamil Braun	ef9d5750c8	view: fix bug in virtual columns. When creating a virtual column of non-frozen map type, the wrong type was used for the map's keys. Fixes #5165.	2019-10-11 20:47:06 +03:00
Avi Kivity	f12feec2c9	Update seastar submodule * seastar 1f68be436f...e888b1df9c (8): > sharded: Make map work with mapper that returns a future > cmake: Remove FindBoost.cmake > Reduce noncopyable_function instruction cache footprint > doc: add Loops section to the tutorial > Merge "Move file related code out of reactor" from Asias > Merge "Move the io_queue code out of reactor" from Asias > cmake: expose seastar_perf_testing lib > future: class doc: explain why discarding a future is bad - main.cc now includes new file io_queue.hh - perf tests now include seastar perf utilities via user, not system, includes since those are not exported	2019-10-10 18:17:28 +03:00
Nadav Har'El	33027a36b4	alternator: Add authorization Merged patch set from Piotr Sarna: Refs #5046 This commit adds handling "Authorization:" header in incoming requests. The signature sent in the authorization is recomputed server-side and compared with what the client sent. In case of a mismatch, UnrecognizedClientException is returned. The signature computation is based on boto3 Python implementation and uses gnutls to compute HMAC hashes. This series is rebased on a previous HTTPS series in order to ease merging these two. As such, it depends on the HTTPS series being merged first. Tests: alternator(local, remote) The series also comes with a simple authorization test and a docs update. Piotr Sarna (6): alternator: migrate split() function to string_view alternator: add computing the auth signature config: add alternator_enforce_authorization entry alternator: add verifying the auth signature alternator-test: add a basic authorization test case docs: update alternator authorization entry alternator-test/test_authorization.py \| 34 ++++++++ configure.py \| 1 + alternator/{server.hh => auth.hh} \| 22 ++--- alternator/server.hh \| 3 +- db/config.hh \| 1 + alternator/auth.cc \| 88 ++++++++++++++++++++ alternator/server.cc \| 112 +++++++++++++++++++++++--- db/config.cc \| 1 + main.cc \| 2 +- docs/alternator/alternator.md \| 7 +- 10 files changed, 241 insertions(+), 30 deletions(-) create mode 100644 alternator-test/test_authorization.py copy alternator/{server.hh => auth.hh} (58%) create mode 100644 alternator/auth.cc	2019-10-10 15:57:46 +03:00
Nadav Har'El	df62499710	docs/isolation.md: copy-edit Minor spelling and syntax corrections. No new content or semantic changes. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191010093457.20439-1-nyh@scylladb.com>	2019-10-10 15:17:28 +03:00
Piotr Dulikowski	c04e8c37aa	distributed_loader: populate non-system keyspaces in parallel Before this change, when populating non-system keyspaces, each data directory was scanned and for each entry (keyspace directory), a keyspace was populated. This was done in a serial fashion - populating of one keyspace was not started until the previous one was done. Loading keyspaces in such fashion can introduce unnecessary waiting in case of a large number of keyspaces in one data directory. Population process is I/O intensive and barely uses CPU. This change enables parallel loading of keyspaces per data directory. Populating the next keyspace does not wait for the previous one. A benchmark was performed measuring startup time, with the following setup: - 1 data directory, - 200 keyspaces, - 2 tables in each keyspace, with the following schema: CREATE TABLE tbl (a int, b int, c int, PRIMARY KEY(a, b)) WITH CLUSTERING ORDER BY (b DESC), - 1024 rows in each table, with values (i, 2i, 3i) for i in 0..1023, - ran on 6-core virtual machine running on i7-8750H CPU, - compiled in dev mode, - parameters: --smp 6 --max-io-requests 4 --developer-mode=yes --datadir $DIR --commitlog-directory $DIR --hints-directory $DIR --view-hints-directory $DIR The benchmark tested: - boot time, by comparing timestamp of the first message in log, and timestamp of the following message: "init - Scylla version ... initialization completed." - keyspace population time, by comparing timestamps of messages: "init - loading non-system sstables" and "init - starting view update generator" The benchmark was run 5 times for sequential and parallel version, with the following results: - sequential: boot 31.620s, keyspace population 6.051s - parallel: boot 29.966s, keyspace population 4.360s Keyspace population time decreased by ~27.95%, and overall boot time by about ~5.23%. Tests: unit(release) Fixes #2007	2019-10-10 15:12:23 +03:00
Piotr Sarna	6ca55d3c83	docs: update alternator authorization entry The entry now contains a comment that computing a signature works, but is still based on a hardcoded key.	2019-10-10 13:51:00 +02:00
Piotr Sarna	23798b7301	alternator-test: add a basic authorization test case The test case ensures that passing wrong credential results in getting an UnrecognizedClientException.	2019-10-10 13:51:00 +02:00
Piotr Sarna	97cbb9a2c7	alternator: add verifying the auth signature The signature sent in the "Authorization:" header is now verified by computing the signature server-side with a matching secret key and confirming that the signatures match. Currently the secret key is hardcoded to be "whatever" in order to work with current tests, but it should be replaced by a proper key store. Refs #5046	2019-10-10 13:51:00 +02:00
Piotr Sarna	e245b54502	config: add alternator_enforce_authorization entry The config entry will be used to turn authorization for alternator requests on and off. The default is currently off, since the key store is not implemented yet.	2019-10-10 13:51:00 +02:00
Piotr Sarna	589a22d078	alternator: add computing the auth signature A function for computing the auth signature from user requests is added, along with helper functions. The implementation is based on gnutls's HMAC. Refs #5046	2019-10-10 13:51:00 +02:00
Piotr Sarna	ca58b46b4c	alternator: migrate split() function to string_view The implementation of string split was based on sstring type for simplicity, but it turns out that more generic std::string_view will be beneficial later to avoid unneeded string copying. Unfortunately boost::split does not cooperate well with string views, so a simple manual implementation is provided instead.	2019-10-10 13:50:59 +02:00
Botond Dénes	52afbae1e5	README.md: add links to other documentation sources Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-3-bdenes@scylladb.com>	2019-10-10 14:15:01 +03:00
Botond Dénes	e52712f82c	docs: add README.md Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191010103926.34705-2-bdenes@scylladb.com>	2019-10-10 14:14:09 +03:00
Amnon Heiman	64c2d28a7f	database: Add counter for the number of schema changes Schema changes can have big effects on performance, typically it should be a rare event. It is usefull to monitor how frequently the schema changed. This patch adds a counter that increases each time a schema changed. After this patch the metrics would look like: scylla_database_schema_changed{shard="0",type="derive"} 2 Fixes #4785 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2019-10-08 17:54:49 +02:00
Asias He	b89ced4635	streaming: Do not open rpc stream connection if reader has no data We can use the reader::peek() to check if the reader contains any data. If not, do not open the rpc stream connection. It helps to reduce the port usage. Refs: #4943	2019-10-08 10:31:02 +02:00
Konstantin Osipov	94006d77b1	lwt: add cas_contention_timeout_in_ms to config Make the default conform to the origin. Message-Id: <20191006154532.54856-3-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Konstantin Osipov	383e17162a	lwt: implement query_options::check_serial_consistency() Both in a single-statement transaction and in a batch we expect that serial consistency is provided. Move the check to query_options class and make it available for reuse. Keep get_serial_consistency() around for use in transport/server.cc. Message-Id: <20191006154532.54856-2-kostja@scylladb.com>	2019-10-08 00:02:35 +02:00
Piotr Sarna	36a1905e98	storage_proxy: handle unstarted write cancelling When another node is reported to be down, view updates queued for it are cancelled, but some of them may already be initiated. Right now, cancelling such a write resulted in an exception, but on conceptual level it's not really an exception, since this behaviour is expected. Previous version of this patch was based on introducing a special exception type that was later handled specially, but it's not clear if it's a good direction. Instead, this patch simply makes this path non-exceptional, as was originally done by Nadav in the first version of the series that introduced handling unstarted write cancellations. Additionally, a message containing the information that a write is cancelled is logged with debug level.	2019-10-07 16:55:36 +03:00
Vladimir Davydov	e8bcb34ed4	api: drop /storage_proxy/metrics/cas_read/condition_not_met There's no such metric in Cassandra (although Cassadra's docs mistakenly say it exists). Having it would make no sense anyway so let's drop it. Message-Id: <b4f7a6ad278235c443cb8ea740bfa6399f8e4ee1.1570434332.git.vdavydov@scylladb.com>	2019-10-07 16:54:39 +03:00
Piotr Sarna	5ab134abef	alternator-test: update HTTPS section of README README.md has 3 fixes applied: - s/alternator_tls_port/alternator_https_port - conf directory is mentioned more explicitly - it now correctly states that the self-signed certificate warning is explicitly ignored in tests Message-Id: <e5767f7dbea260852fc2fa9b613e1bebf490cc78.1570444085.git.sarna@scylladb.com>	2019-10-07 14:51:16 +03:00
Avi Kivity	8ed6f94a16	Merge "Fix handling of schema alters and eviction in cache" from Tomasz " Fixes #5134, Eviction concurrent with preempted partition entry update after memtable flush may allow stale data to be populated into cache. Fixes #5135, Cache reads may miss some writes if schema alter followed by a read happened concurrently with preempted partition entry update. Fixes #5127, Cache populating read concurrent with schema alter may use the wrong schema version to interpret sstable data. Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may fail or cause a node crash after schema alter. " * tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla: tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read tests: row_cache_stress_test: Verify all entries are evictable at the end tests: row_cache_stress_test: Exercise single-partition reads tests: row_cache_stress_test: Add periodic schema alters tests: memtable_snapshot_source: Allow changing the schema tests: simple_schema: Prepare for schema altering row_cache: Record upgraded schema in memtable entries during update memtable: Extract memtable_entry::upgrade_schema() row_cache, mvcc: Prevent locked snapshots from being evicted row_cache: Make evict() not use invalidate_unwrapped() mvcc: Introduce partition_snapshot::touch() row_cache, mvcc: Do not upgrade schema of entries which are being updated row_cache: Use the correct schema version to populate the partition entry delegating_reader: Optimize fill_buffer() row_cache, memtable: Use upgrade_schema() flat_mutation_reader: Introduce upgrade_schema()	2019-10-07 14:43:36 +03:00
Nadav Har'El	f2f0f5eb0f	alternator: add https support Merged patch series from Piotr Sarna: This series adds HTTPS support for Alternator. The series comes with --https option added to alternator-test, which makes the test harness run all the tests with HTTPS instead of HTTP. All the tests pass, albeit with security warnings that a self-signed x509 certificate was used and it should not be trusted. Fixes #5042 Refs scylladb/seastar#685 Patches: docs: update alternator entry on HTTPS alternator-test: suppress the "Unverified HTTPS request" warning alternator-test: add HTTPS info to README.md alternator-test: add HTTPS to test_describe_endpoints alternator-test: add --https parameter alternator: add HTTPS support config: add alternator HTTPS port	2019-10-07 12:38:20 +03:00
Avi Kivity	969113f0c9	Update seastar submodule * seastar c21a7557f9...1f68be436f (6): > scheduling: Add per scheduling group data support > build: Include dpdk as a single object in libseastar.a > sharded: fix foreign_ptr's move assignment > build: Fix DPDK libraries linking in pkg-config file > http server: https using tls support > Make output_stream blurb Doxygen	2019-10-07 12:18:49 +03:00
Nadav Har'El	754add1688	alternator: fix Expected's BEGINS_WITH error handling The BEGINS_WITH condition in conditional updates (via Expected) requires that the given operand be either a string or a binary. Any other operand should result in a validation exception - not a failed condition as we generate now. This patch fixes the test for this case so it will succeed against Amazon DynamoDB (before this patch it fails - this failure was masked by a typo before commit `332ffa77ea`). The patch then fixes our code to handle this case correctly. Note that BEGINS_WITH handling of wrong types is now asymmetrical: A bad type in the operand is now handled differently from a bad type in the attribute's value. We add another check to the test to verify that this is the case. Fixes #5141 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20191006080553.4135-1-nyh@scylladb.com>	2019-10-06 17:16:55 +03:00
Tomasz Grabiec	020a537ade	tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read	2019-10-04 11:38:13 +02:00
Tomasz Grabiec	ebedefac29	tests: row_cache_stress_test: Verify all entries are evictable at the end	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	1b95f5bf60	tests: row_cache_stress_test: Exercise single-partition reads make_single_key_reader() currently doesn't actually create single-partition readers because it doesn't set mutation_reader::forwarding::no when it creates individual readers. The readers will default to mutation_reader::forwarding::yes and actually create scanning readers in preparation for fast-forwarding across partitions. Fix by passing mutation_reader::forwarding::no.	2019-10-04 11:38:12 +02:00
Tomasz Grabiec	81dd17da4e	tests: row_cache_stress_test: Add periodic schema alters Reproduces #5127.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	2fc144e1a8	tests: memtable_snapshot_source: Allow changing the schema	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	22dde90dba	tests: simple_schema: Prepare for schema altering Currently, methods of simple_schema assume that table's schema doesn't change. Accessors like get_value() assume that rows were generated using simple_schema::_s. Because if that, the column_definition& for the "v" column is cached in the instance. That column_definiion& cannot be used to access objects created with a different schema version. To allow using simple_schema after schema changes, column_definition& caching is now tagged with the table schema version of origin. Methods which access schema-dependent objects, like get_value(), are now accepting schema& corresponding to the objects. Also, it's now possible to tell simple_schema to use a different schema version in its generator methods.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	e6afc89735	row_cache: Record upgraded schema in memtable entries during update Cache update may defer in the middle of moving of partition entry from a flushed memtable to the cache. If the schema was changed since the entry was written, it upgrades the schema of the partition_entry first but doesn't update the schema_ptr in memtable_entry. The entry is removed from the memtable afterward. If a memtable reader encounters such an entry, it will try to upgrade it assuming it's still at the old schema. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. It's the more likely to happen the more rows there are in a partition. It's unlikely to happen with single-row partitions. Introduced in `70c7277`. Fixes #5128.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	ea461a3884	memtable: Extract memtable_entry::upgrade_schema()	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	90d6c0b9a2	row_cache, mvcc: Prevent locked snapshots from being evicted If the whole partition entry is evicted while being updated from the memtable, a subsequent read may populate the partition using the old version of data if it attempts to do it before cache update advances past that partition. Partial eviction is not affected because populating reads will notice that there is a newer snapshot corresponding to the updater. This can happen only in OOM situations where the whole cache gets evicted. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. Introduced in `70c7277`. Fixes #5134.	2019-10-03 22:03:29 +02:00
Tomasz Grabiec	57a93513bd	row_cache: Make evict() not use invalidate_unwrapped() invalidate_unwrapped() calls cache_entry::evict(), which cannot be called concurrently with cache update. invalidate() serializes it properly by calling do_update(), but evict() doesn't. The purpose of evict() is to stress eviction in tests, which can happen concurrently with cache update. Switch it to use memory reclaimer, so that it's both correct and more realistic. evict() is used only in tests.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	c88a4e8f47	mvcc: Introduce partition_snapshot::touch()	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	25e2f87a37	row_cache, mvcc: Do not upgrade schema of entries which are being updated When a read enters a partition entry in the cache, it first upgrades it to the current schema of the cache. The same happens when an entry is updated after a memtable flush. Upgrading the entry is currently performed by squashing all versions and replacing them with a single upgraded version. That has a side effect of detaching all snapshots from the partition entry. Partition entry update on memtable flush is writing into a snapshot. If that snapshot is detached by a schema upgrade, the entry will be missing writes from the memtable which fall into continuous ranges in that entry which have not yet been updated. This can happen only if the update of the entry is preempted and the schema was altered during that, and a read hit that partition before the update went past it. Affects only tables with multi-row partitions, which are the only ones that can experience the update of partition entry being preempted. The problem is fixed by locking updated entries and not upgrading schema of locked entries. cache_entry::read() is prepared for this, and will upgrade on-the-fly to the cache's schema. Fixes #5135	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	0675088818	row_cache: Use the correct schema version to populate the partition entry The sstable reader which populates the partition entry in the cache is using the schema of the partition entry snapshot, which will be the schema of the cache at the time the partition was entered. If there was a schema change after the cache reader entered the partition but before it created the sstable reader, the cache populating reader will interpret sstable fragments using the wrong schema version. That is more likely if partitions have many rows, and the front of the partition is populated. With single-row partitions that's unlikely to happen. That is undefined behavior in general, which may include: - read failures due to bad_alloc, if fixed-size cells are interpreted as variable-sized cells, and we misinterpret a value for a huge size - wrong read results - node crash This doesn't result in a permanent corruption, restarting the node should help. Fixes #5127.	2019-10-03 22:03:28 +02:00
Tomasz Grabiec	10992a8846	delegating_reader: Optimize fill_buffer() Use move_buffer_content_to() which is faster than fill_buffer_from() because it doesn't involve popping and pushing the fragments across buffers. We save on size estimation costs.	2019-10-03 22:03:28 +02:00
Piotr Sarna	07ac3ea632	docs: update alternator entry on HTTPS The HTTPS entry is updated - it's now supported, but still misses the same features as HTTP - CRC headers, etc.	2019-10-03 19:10:30 +02:00
Piotr Sarna	b63077a8dc	alternator-test: suppress the "Unverified HTTPS request" warning Running with --https and a self-signed certificate results in a flood of expected warnings, that the connection is not to be trusted. These warnings are silenced, as users runing a local test with --https usually use self-signed certificates.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e65fd490da	alternator-test: add HTTPS info to README.md A short paragraph about running tests with `--https` and configuring the cluster to work correctly with this parameter is added to README.md.	2019-10-03 19:10:30 +02:00
Piotr Sarna	0d28d7f528	alternator-test: add HTTPS to test_describe_endpoints The test_describe_endpoints test spawns another client connection to the cluster, so it needs to be HTTPS-aware in order to work properly with --https parameter.	2019-10-03 19:10:30 +02:00
Piotr Sarna	9fd77ed81d	alternator-test: add --https parameter Running with --https parameter will result in sending the requests via HTTPS instead of HTTP. By default, port 8043 is used for a local cluster. Before running pytest --https, make sure that Scylla was properly configured to initialize a HTTPS alternator server by providing the alternator_tls_port parameter. The HTTPS-based connection runs with verification disabled, otherwise it would not work with self-signed certificates, which are useful for tests.	2019-10-03 19:10:30 +02:00
Piotr Sarna	e1b0537149	alternator: add HTTPS support By providing a server based on a TLS socket, it's now possible to serve HTTPS requests in alternator. The HTTPS server is enabled by setting its port in scylla.yaml: alternator_tls_port=XXXX. Alternator TLS relies on the existing TLS configuration, which is provided by certificate, keyfile, truststore, priority_string options. Fixes #5042	2019-10-03 19:10:30 +02:00
Piotr Sarna	b42eb8b80a	config: add alternator HTTPS port The config variable will be used to set up a TLS-based server for serving alternator HTTPS requests.	2019-10-03 19:10:29 +02:00
Nadav Har'El	9d4e71bbc6	alternator-test: fix misleading xfail message The test test_update_expression_function_nesting() fails because DynamoDB don't allow an expression like list_append(list_append(:val1, :val2), :val3) but Alternator doesn't check for this (and supports this expression). The "xfail" message was outdated, suggesting that the test fails because the "SET" expression isn't supported - but it is. So replace the message by a more accurate one. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190915104708.30471-1-nyh@scylladb.com>	2019-10-03 18:45:03 +03:00
Nadav Har'El	9747019e7b	alternator: implement additional Expected operators Merged patch set from Dejan Mircevski implementing some of the missing operators for Expected: NE, IN, NULL and NOT_NULL. Patches: alternator: Factor out Expected operand checks alternator: Implement NOT_NULL operator in Expected alternator: Implement NULL operator in Expected alternator: Fix expected_1_null testcase alternator: Implement IN operator in Expected alternator: Implement NE operator in Expected alternator: Factor out common code in Expected	2019-10-03 18:12:38 +03:00

1 2 3 4 5 ...

19784 Commits