16367 Commits

Author SHA1 Message Date
Tomasz Grabiec
75cde85349 Merge "Support reading range tombstones" from Piotr and Vladimir
Implement and test support for reading range tombstones in SSTables 3.

Does not yet support reads which are using slicing or fast forwarding.

From github.com/scylladb/seastar-dev.git haaawk/sstables3/tombstones_v11:

Piotr Jastrzebski (5):
  sstables: Add consumer_m::consume_range_tombstone
  sstables: Support null columns in ck
  sstables: Support reading range_tombstones
  sstables: Test reading range_tombstones
  sstables: Add test for RT with non-full key

Vladimir Krivopalov (2):
  sstables: Add operator<< overload for bound_kind_m.
  keys: Add clustering_key_prefix::make_full helper.
2018-08-27 20:43:38 +02:00
Avi Kivity
5792a59c96 migration_manager: downgrade frightening "Can't send migration request" ERROR
This error is transient, since as soon as the node is up we will be able
to send the migration request.  Downgrade it to a warning to reduce anxiety
among people who actually read the logs (like QA).

The message is also badly worded as no one can guess what a migration
request is, but that is left to another patch.

Fixes #3706.
Message-Id: <20180821070200.18691-1-avi@scylladb.com>
2018-08-27 14:49:36 +02:00
Takuya ASADA
10b67c7934 dist/ami: package scylla-ami as rpm
Now that scylla-ami is no longer a submodule of the scylla repo, it works as
an independent repository, just like scylla-jmx and scylla-tools, and provides
an .rpm package that installs the AMI scripts on the AMI.

Most files are gone from dist/ami/files, but scylla_install_ami was copied
from scylla-ami: since it is needed to install the scylla .rpms, it cannot
be packaged in the scylla-ami rpm.

In scylla_install_ami, we dropped the ixgbevf/ena driver code; we will
provide 'scylla-ixgbevf' and 'scylla-ena' DKMS .rpms instead.
They will automatically build kernel modules for the current kernel.

A repo of the driver packages is on
https://copr.fedorainfracloud.org/coprs/scylladb/scylla-ami-drivers/

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180821201101.4631-1-syuu@scylladb.com>
2018-08-27 11:48:52 +03:00
Avi Kivity
62750eb517 Merge "Prepare for removing Iterator from simple_memory_input_stream" from Paweł
"
Right now, simple_memory_input_stream takes Iterator as a template
parameter. That iterator is supposed to point to fragments in an
underlying fragmented buffer. This makes no sense, since simple streams
deal only with contiguous buffers.

This series removes any assumption that simple_memory_input_stream has
iterator_type member from Scylla so that it can be removed.
"

* tag 'prepare-simple-stream-no-iterator/v1' of https://github.com/pdziepak/scylla:
  idl: deserialized_bytes_proxy do not assume presence of iterator_type
  idl-compiler: specify return type of with_serialized_stream() lambdas
2018-08-26 16:29:06 +03:00
Avi Kivity
16478355be Merge "Refactor password handling" from Jesse
"
This series is a refactor of password management, motivated by a
combination of correctness bugs, improving testability, improving
clarity, and adding documentation.

Tests: unit (release)
"

* 'jhk/passwords_refactor/v2' of https://github.com/hakuch/scylla:
  auth: Clean up implementation comments
  auth: Remove unnecessary local variable
  auth: Allow different random engines for salt
  auth: Correct modulo bias in salt generation
  auth: Extract random byte generation for salt
  auth: Split out test for best supported scheme
  auth: Rename function to use full words
  auth: Add domain-specific exception for passwords
  auth: Document passwords interface
  auth: Move passsword stuff to its own namespace
  auth: Identify password hashing errors correctly
  auth: Add unit tests for password handling
  auth: Move password handling to its own files
  auth: Construct `std::random_device` instances once
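
For context on the "modulo bias" item above: reducing a raw random integer with `%` over-represents some results whenever the generator's range is not a multiple of the target range. The following is a minimal sketch of the general remedy, not the code from this series. The name `random_salt` and the default alphabet are hypothetical, and the alphabet is assumed to be non-empty.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <string_view>

// Hypothetical helper, not the function from the series: builds a salt of
// `length` characters drawn uniformly from `alphabet` (assumed non-empty).
// std::uniform_int_distribution avoids the bias that
// `engine() % alphabet.size()` would introduce when the engine's range is
// not a multiple of the alphabet size.
template <typename RandomEngine>
std::string random_salt(RandomEngine& engine, std::size_t length,
                        std::string_view alphabet =
                            "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                            "abcdefghijklmnopqrstuvwxyz") {
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string salt;
    salt.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        salt.push_back(alphabet[pick(engine)]);
    }
    return salt;
}
```

Taking the engine as a template parameter lets a test pass a seeded `std::mt19937` while production code passes something else, which is one plausible reading of "Allow different random engines for salt".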
2018-08-26 11:18:31 +03:00
Tomasz Grabiec
2afce13967 database: Avoid OOM when soft pressure but nothing to flush
There could be soft pressure, but soft-pressure flusher may not be
able to make progress (Refs #3716). It will keep trying to flush empty
memtables, which block on earlier flushes to complete, and thus
allocate continuations in memory. Those continuations accumulate in
memory and can cause OOM.

flush will take longer to complete. Due to scheduling group isolation,
the soft-pressure flusher will keep getting the CPU.

This causes bad_alloc and crashes of dtest:
limits_test.py:TestLimits.max_cells_test

Fixes #3717

Message-Id: <1535102520-23039-1-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:03:58 +03:00
Tomasz Grabiec
1e50f85288 database: Make soft-pressure memtable flusher not consider already flushed memtables
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.

The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.

I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.

The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.

This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.
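
The selection rule described above can be sketched as follows. This is a simplified model under stated assumptions, not Scylla's region or memtable types: `memtable_list_model`, `evictable_space` and `pick_list_to_flush` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a memtable list; not Scylla's types.
struct memtable_list_model {
    std::size_t total_space;  // bytes held by the list's largest region
    bool flush_started;       // true once that memtable began flushing

    // Mirrors the idea of the patch: a memtable that already started
    // flushing reports zero, so its list is picked last.
    std::size_t evictable_space() const {
        return flush_started ? 0 : total_space;
    }
};

// Returns the index of the list the flusher should target, or -1 if there
// are no lists at all.
inline int pick_list_to_flush(const std::vector<memtable_list_model>& lists) {
    auto it = std::max_element(lists.begin(), lists.end(),
        [](const memtable_list_model& a, const memtable_list_model& b) {
            return a.evictable_space() < b.evictable_space();
        });
    return it == lists.end() ? -1 : static_cast<int>(it - lists.begin());
}
```

With this rule, a large list whose memtable is already being flushed no longer wins over a smaller list that still has flushable data.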

Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:34 +03:00
Tomasz Grabiec
364418b5c5 logalloc: Make evictable_occupancy() indicate no free space
Doesn't fix any bug, but it's closer to the truth that all segments
are used rather than none is used.

Message-Id: <1535040132-11153-1-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:32 +03:00
Avi Kivity
54ac334f4b Update scylla-ami submodule
* dist/ami/files/scylla-ami c7e5a70...b7db861 (2):
  > scylla-ami-setup.service: run only on first startup
  > Use fstab to mount RAID volume on every reboot
2018-08-26 10:57:32 +03:00
Takuya ASADA
ff55e3c247 dist/common/scripts/scylla_raid_setup: refuse start scylla-server.service when RAID volume is not mounted
Since a Linux system aborts booting when it fails to mount fstab entries,
the user may not be able to see an error message when we use fstab to mount
/var/lib/scylla on an AMI.

Instead of aborting the boot, we can just refuse to start scylla-server.service
when the RAID volume is not mounted, using the RequiresMountsFor directive of
the systemd unit file.

See #3640

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180824185511.17557-1-syuu@scylladb.com>
2018-08-26 10:55:34 +03:00
Paweł Dziepak
4ca991ea65 idl: deserialized_bytes_proxy do not assume presence of iterator_type
deserialized_bytes_proxy assumes that the provided input stream has
iterator_type that represents the iterator pointing to the next
fragment of the fragmented underlying buffer. This makes little sense
if the input stream is a contiguous one (i.e.
simple_memory_input_stream) so let's not make such assumptions.
2018-08-24 16:19:40 +01:00
Paweł Dziepak
3b7579aa0e idl-compiler: specify return type of with_serialized_stream() lambdas
IDL-generated code uses with_serialized_stream() to optimise for cases
when the underlying buffer is not fragmented. The provided lambda will
be called with either a simple or a fragmented stream as an argument. The
consequence of this is that both instantiations of the generic lambda need to
return the same type. This is a problem if the type is deduced and
depends on the provided input stream (e.g. different type for fragmented
and simple streams). The solution is to explicitly specify the return
type as the type returned by deserialising general utils::input_stream.
This way each instantiation of the lambda can return whatever it wants as
long as it is convertible to the type that the serialiser would return
if utils::input_stream was given.
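
The problem and the fix can be illustrated with a small stand-alone sketch. All names here (`simple_stream`, `fragmented_stream`, `with_stream`, `read_all`) are hypothetical stand-ins, not the IDL-generated code or the real with_serialized_stream() signature.

```cpp
#include <string>

// Hypothetical stand-ins for a contiguous and a fragmented input stream.
// Their read() functions deliberately return different types.
struct simple_stream {
    const char* data;
    const char* read() const { return data; }
};
struct fragmented_stream {
    std::string data;
    std::string read() const { return data; }
};

// Hypothetical dispatcher: calls `fn` with one stream type or the other.
// Both return statements must yield the same type, so a lambda whose
// deduced return type depends on the stream type would not compile here.
template <typename Func>
auto with_stream(bool contiguous, const std::string& buf, Func&& fn) {
    if (contiguous) {
        return fn(simple_stream{buf.c_str()});
    }
    return fn(fragmented_stream{buf});
}

inline std::string read_all(bool contiguous, const std::string& buf) {
    // The explicit `-> std::string` makes both instantiations agree:
    // const char* and std::string each convert to the stated type.
    return with_stream(contiguous, buf,
        [](const auto& stream) -> std::string { return stream.read(); });
}
```

Dropping the trailing return type from the lambda makes `with_stream` fail to compile, because its two return statements would then deduce `const char*` and `std::string`.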
2018-08-24 16:07:20 +01:00
Tomasz Grabiec
10f6b125c8 database: Run system table flushes in the main scheduling group
memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.

It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.

I observed writes to the system keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.

Fixes #3717.

Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
2018-08-23 15:07:05 +03:00
Piotr Sarna
94262cf5d0 tests: add null collection test scenario to INSERT JSON
Refs #3664
Message-Id: <a34b9f5e8b9d7e3dd8906b559957220d74734b41.1534848313.git.sarna@scylladb.com>
2018-08-23 11:22:07 +03:00
Piotr Sarna
465045368f cql3: add proper setting of empty collections in INSERT JSON
Previously empty collections were incorrectly added as dead cells,
which resulted in serialization errors later.

Fixes #3664
Message-Id: <a9c90d66c6737641cafe40edb779df490ada0309.1534848313.git.sarna@scylladb.com>
2018-08-23 11:22:05 +03:00
Duarte Nunes
36a293bb23 cell_locking: Use xxhash instead of fnv1a
Being the single user of fnv1a, this allows us to get rid of it. As
the TODO inside fnv1a_hasher.hh indicates, and judging by any
independent benchmark, fnv1a is very slow. As we have added xx_hash
since then, and we know it to be fast, use it instead.
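
For reference, a textbook 64-bit FNV-1a looks like the sketch below. It is an illustration only, not the contents of fnv1a_hasher.hh, and the 64-bit width is an assumption. The loop handles one byte per iteration, and each step depends on the previous one, which is a plausible reason for the slowness mentioned above.

```cpp
#include <cstdint>
#include <string_view>

// Textbook 64-bit FNV-1a; illustrative, not the removed Scylla hasher.
// Each input byte costs one xor and one multiply, and every step depends
// on the result of the previous step.
constexpr std::uint64_t fnv1a_64(std::string_view bytes) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char byte : bytes) {
        hash ^= byte;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}
```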

Tests: unit(release/cell_locker_test)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180823081715.26089-1-duarte@scylladb.com>
2018-08-23 11:21:00 +03:00
Piotr Jastrzebski
2997fda1b1 sstables: Add test for RT with non-full key
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:28:11 +02:00
Piotr Jastrzebski
c50929233f sstables: Test reading range_tombstones
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:28:11 +02:00
Piotr Jastrzebski
7434be348c sstables: Support reading range_tombstones
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 18:27:41 +02:00
Piotr Jastrzebski
d19a108d87 sstables: Support null columns in ck
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 14:32:10 +02:00
Piotr Jastrzebski
3636697663 sstables: Add consumer_m::consume_range_tombstone
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-08-22 12:53:15 +02:00
Vladimir Krivopalov
8acf4ddb8e keys: Add clustering_key_prefix::make_full helper.
This method fills a non-full clustering key with trailing empty values to
make it full.
This can be used for clustering keys of rows in a compact table as,
unlike in regular tables, they can be non-full.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-22 12:13:23 +02:00
Amnon Heiman
ab207356a5 API: storage_service stream endpoints
This patch changes how the list of tokens is returned from the
storage_service API.

Instead of creating a vector and constructing a JSON object from it, use the
streaming capabilities of the HTTP layer.

This is important for large clusters and prevents large allocations.

Fixes #3701

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20180820195631.26792-1-amnon@scylladb.com>
2018-08-22 11:24:38 +03:00
Takuya ASADA
e4f38b7c22 dist/redhat: support package renaming on build script
To automatically rename packages for the enterprise release, added the package
name prefix as a variable in build_rpm.sh.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180822072105.9420-1-syuu@scylladb.com>
2018-08-22 11:03:39 +03:00
Piotr Sarna
4a274ee7e2 tests: add parsing varint from JSON string test
Refs #3666
Message-Id: <f4205e9484f5385796fade7986e3e38dcbc65bac.1534845398.git.sarna@scylladb.com>
2018-08-21 11:20:11 +01:00
Piotr Sarna
37a5c38471 types: enable deserializing varint from JSON string
Previously deserialization failed because the JSON string
representing a number was unnecessarily quoted.

Fixes #3666
Message-Id: <a0a100dbac7c151d627522174303657d1da05c27.1534845398.git.sarna@scylladb.com>
2018-08-21 11:20:11 +01:00
Tomasz Grabiec
6937cc2d1c Merge 'Fix multi-cell static list updates in the presence of ckeys' from Duarte
Fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

 std::experimental::optional<clustering_key> row_key;
 if (!column.is_static()) {
   row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
 }
 ...
 auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

* https://github.com/duarten/scylla cql-list-fixes/v1:
  tests/cql_query_test: Test multi-cell static list updates with ckeys
  cql3/lists: Fix multi-cell static list updates in the presence of ckeys
  keys: Add factory for an empty clustering_key_prefix_view
2018-08-21 12:14:30 +02:00
Vladimir Krivopalov
c8422c9a91 sstables: Add operator<< overload for bound_kind_m.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-20 16:22:53 -07:00
Duarte Nunes
ff7304b190 tests/cql_query_test: Test multi-cell static list updates with ckeys
Refs #3703

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Duarte Nunes
05731cb5ad cql3/lists: Fix multi-cell static list updates in the presence of ckeys
This patch fixes a regression introduced in
9e88b60ef5, which broke the lookup for
prefetched values of lists when a clustering key is specified.

This is the code that was removed from some list operations:

std::experimental::optional<clustering_key> row_key;
if (!column.is_static()) {
  row_key = clustering_key::from_clustering_prefix(*params._schema, prefix);
}
...
auto&& existing_list = params.get_prefetched_list(m.key().view(), row_key, column);

Put it back, in the form of common code in the update_parameters class.

Fixes #3703

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Duarte Nunes
ce461b06d7 keys: Add factory for an empty clustering_key_prefix_view
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-20 21:39:37 +01:00
Avi Kivity
231174cda9 build: auto-detect g++ -gz support
Older combinations of g++/binutils don't support -gz, so auto-detect its
presence.

Fixes #3697.
Message-Id: <20180817161113.2287-1-avi@scylladb.com>
2018-08-20 18:48:18 +02:00
Tomasz Grabiec
c31dff8211 Merge 'Skip inside wide partitions using index (rows only)' from Vladimir
This patchset adds support for skipping inside wide partitions using
index for sliced queries. This can significantly reduce disk I/O for
queries that only need to read a small amount of data from a wide
partition.

Other changes include general code clean-up and simplification.

 * github.com/argenet/scylla.git tree/projects/sstables-30/skip_using_index/v6:
  sstables: Support resetting data_consume_rows_context_m to
    indexable_element::cell.
  tests: Add tests to cover skipping with index through SSTables 3.x.
  sstables: Support skipping inside wide partitions using index.
  to_string: Add operator<< overload for std::optional.
  sstables: Use std::optional instead of std::experimental::optional.
2018-08-20 18:39:51 +02:00
Avi Kivity
e605cd4ff8 multishard_writer_test: reduce mutation count in release mode
We see occasional bad_alloc failures in release mode; this is due
to the random mutation generator generating large mutations.

Reduce the mutation count to 300. I tested 100 runs and all passed,
so it reduces the false positive rate to < 1%.
2018-08-20 16:53:05 +03:00
Gleb Natapov
7277ee2939 storage_proxy: do not fail read without speculation on connection error
After ac27d1c93b, if a read executor has just enough targets to
achieve the request's CL and the connection to one of them is dropped
during execution, a ReadFailed error is returned immediately and the
client has no chance to issue a speculative read (retry). The
patch changes the code to not return the ReadFailed error immediately, but
to wait for the timeout instead, giving the client a chance to issue a
speculative read in case the read executor does not have additional targets
to send speculative reads to by itself.

Fixes #3699.
Message-Id: <20180819131646.GK2326@scylladb.com>
2018-08-20 10:12:31 +03:00
Vladimir Krivopalov
f1b9f82ff5 sstables: Use std::optional instead of std::experimental::optional.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:05 -07:00
Vladimir Krivopalov
7b1d4915a1 to_string: Add operator<< overload for std::optional.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:05 -07:00
Vladimir Krivopalov
3e92434eed sstables: Support skipping inside wide partitions using index.
This fix adds proper support for skipping inside wide partitions using
index for sliced reads. This significantly reduces disk I/O for filtered
queries.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:20:04 -07:00
Vladimir Krivopalov
ec78fb9f13 tests: Add tests to cover skipping with index through SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 18:19:22 -07:00
Vladimir Krivopalov
4bf1e9de3f sstables: Support resetting data_consume_rows_context_m to indexable_element::cell.
Set the proper parsing state when resetting to indexable_element::cell.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-08-17 10:09:19 -07:00
Eliran Sinvani
f5f6cf2096 cql3: remove rejection of an IN relation if not on last partition KEY
The constraint is no longer relevant, since Cassandra removed
it in version 2.2. In addition, the mechanism for handling this
case is already implemented and is identical to the case of
clustering keys with single-column EQ (=) and IN relations
(Cartesian product of singular ranges).
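
The "Cartesian product of singular ranges" idea can be sketched as below: each partition key column contributes one value (EQ) or several (IN), and every combination forms one full key. The type aliases and the function name are hypothetical and keys are modelled as plain strings. This is not the cql3 restriction code.

```cpp
#include <string>
#include <utility>
#include <vector>

using key_component = std::string;
using partition_key = std::vector<key_component>;

// Expands per-column value lists (one value for EQ, several for IN) into
// every full partition key, column by column.
inline std::vector<partition_key> cartesian_product(
        const std::vector<std::vector<key_component>>& per_column) {
    std::vector<partition_key> keys;
    keys.emplace_back();  // start from a single empty prefix
    for (const auto& column_values : per_column) {
        std::vector<partition_key> next;
        next.reserve(keys.size() * column_values.size());
        for (const auto& prefix : keys) {
            for (const auto& value : column_values) {
                partition_key extended = prefix;
                extended.push_back(value);
                next.push_back(std::move(extended));
            }
        }
        keys = std::move(next);
    }
    return keys;
}
```

For a query such as `WHERE pk1 IN ('a', 'b') AND pk2 = '1'`, the per-column lists would be `{"a", "b"}` and `{"1"}`, giving the keys `("a", "1")` and `("b", "1")`.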

A unit test for this case was added.

Fixes #1735
Tests:
1. Unit Tests.
2. Manual testing with the case described in the issue.
3. dtest: ql_additional_tests.py:TestCQL.composite_row_key_test

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <83b43fdc1ca0e0cc287f66f11816fc71b8bd2925.1534430405.git.eliransin@scylladb.com>
2018-08-16 19:32:43 +01:00
Eliran Sinvani
d743ceae76 cql3: ignore LIMIT in select statement with aggregate
LIMIT should restrict the output result and not the query whose result
set is aggregated. When using an aggregate, the output is guaranteed to
be only one row long. Since LIMIT accepts only non-negative numbers,
it has no effect and can be ignored.

Fixes #2028
Tests: the test case described in the issue, unit tests.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <6c235376c81f052020e2ed23d0a3d071b36d4415.1534416997.git.eliransin@scylladb.com>
2018-08-16 19:31:56 +01:00
Duarte Nunes
a4355fe7e7 cql3/query_options: Use _value_views in prepare()
_value_views is the authoritative data structure for the
client-specified values. Indeed, the ctor called
transport::request::read_options() leaves _values completely empty.

In query_options::prepare() we were, however, using _values to
associate values with the client-specified column names, and not
_value_views. Fix this by using _value_views instead.

As for the reasons we didn't see this bug earlier, I assume it's
because very few drivers set the 0x40 query options flag (names for
values), which means column names are usually omitted. This is the right
thing to do since most drivers have enough information to correctly
position the values.

Fixes #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814234605.14775-1-duarte@scylladb.com>
2018-08-15 10:38:09 +01:00
Duarte Nunes
8751a58a2b cql3/query_options: Preserve unset values when building value_views
A raw value can be in one of three states: a valid value, an unset
value, a null value. When translating raw_values to their views, we
were treating both unset and null values as null raw_value_views.
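
A minimal sketch of a conversion that keeps the three states apart, using std::variant. The types `null_value`, `unset_value`, `raw_value`, `raw_value_view` and the function `make_view` below are hypothetical models for illustration and do not reproduce Scylla's cql3 classes.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical model of the three states; names are illustrative only.
struct null_value {};
struct unset_value {};
using raw_value = std::variant<std::string, null_value, unset_value>;
using raw_value_view = std::variant<std::string_view, null_value, unset_value>;

// Converts an owning value to a view while keeping null and unset distinct.
inline raw_value_view make_view(const raw_value& value) {
    return std::visit([](const auto& v) -> raw_value_view {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, std::string>) {
            return std::string_view(v);  // view over the owned bytes
        } else {
            return v;  // null_value or unset_value carried over unchanged
        }
    }, value);
}
```

The bug described above corresponds to collapsing the last two alternatives into one. Handling each alternative in the visitor makes that kind of loss harder to introduce by accident.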

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814231051.14385-1-duarte@scylladb.com>
2018-08-15 10:37:29 +01:00
Duarte Nunes
805ce6e019 cql3/query_processor: Validate presence of statement values timeously
We need to validate before calling query_options::prepare() whether
the set of prepared statement values sent in the query matches the
amount of names we need to bind, otherwise we risk an out-of-bounds
access if the client also specified names together with the values.
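
The kind of up-front check described above might look like this sketch. `validate_value_count` and its error text are hypothetical. The point is only that the count comparison happens before any index-based binding.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check: reject a request whose value count differs from the
// number of bind markers before any index-based lookup takes place.
inline void validate_value_count(std::size_t bound_names,
                                 const std::vector<std::string>& values) {
    if (values.size() != bound_names) {
        throw std::invalid_argument(
            "Invalid number of bind variables: expected " +
            std::to_string(bound_names) + ", got " +
            std::to_string(values.size()));
    }
}
```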

Refs #3688

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814225607.14215-1-duarte@scylladb.com>
2018-08-15 10:37:13 +01:00
Eliran Sinvani
d734d316a6 cql3: ensure repeated values in IN clauses don't return repeated rows
When the list of values in the IN list of a single column contains
duplicates, multiple executors are activated since the assumption
is that each value in the IN list corresponds to a different partition.
This results in the same row appearing in the result a number of times
corresponding to the duplication of the partition value.
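
One straightforward way to avoid the repeated rows is to drop duplicate values before fanning out, as in the sketch below. This is an assumption about the general approach, not necessarily how the patch does it, and `unique_in_values` is a hypothetical name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: drops repeated values from an IN list so that each
// partition is queried once. Note that sorting changes the value order.
template <typename T>
std::vector<T> unique_in_values(std::vector<T> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```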

Added queries to the IN restriction unit test and fixed a bad result check.

Fixes #2837
Tests: Queries as in the use case from the GitHub issue in both forms,
prepared and plain (using the Python driver), unit test.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <ad88b7218fa55466be7bc4303dc50326a3d59733.1534322238.git.eliransin@scylladb.com>
2018-08-15 10:21:22 +01:00
Duarte Nunes
a025bf6a7d Merge seastar upstream
Seastar introduced a "compat" namespace, which conflicts with Scylla's
own "compat" namespaces. The merge thus includes changes to scope
uses of Scylla's "compat" namespaces.

* seastar 8ad870f...9bb1611  (5):
  > util/variant_utils: Ensure variant_cast behaves well with rvalues
  > util/std-compat: Fix infinite recursion
  > doc/tutorial: Undo namespace changes
  > util/variant_utils: Add cast_variant()
  > Add compatbility with C++17's library types

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-14 13:07:09 +01:00
Duarte Nunes
25a0a0f83d tests/cql_test_env: Increase eventually() attempts
The current value has proved to be insufficient for our CI
infrastructure.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112201.8595-1-duarte@scylladb.com>
2018-08-14 12:37:32 +01:00
Duarte Nunes
495a92c5b6 tests/gossip_test: Use RAII for orderly destruction
Change the test so that services are correctly torn down, in the
correct order (e.g., storage_service accesses the messaging_service when
stopping).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112111.8521-2-duarte@scylladb.com>
2018-08-14 12:27:14 +01:00
Duarte Nunes
3956a77235 tests/gossip_test: Don't bind address to avoid conflicts
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180814112111.8521-1-duarte@scylladb.com>
2018-08-14 12:27:02 +01:00