18124 Commits

Author SHA1 Message Date
Jesse Haber-Kucharsky
afed9c7bee tests: Validate authentication correctly
There are additional validation steps that the server executes in
addition to simply invoking the authenticator, so we adapt the tests to
also perform that validation.

We also eliminate lots of code duplication.
2019-02-28 15:01:14 -05:00
Jesse Haber-Kucharsky
baefde0f6c tests: Ensure test roles are created and dropped
Since the role manager and authenticator work in tandem, the test cases
should use the wrapper for `auth::service` to create and drop users
instead of just doing it through the authenticator.
2019-02-28 15:00:20 -05:00
Jesse Haber-Kucharsky
fd88d59ad9 tests: Use static variables in auth_test
This way, we avoid copies and alleviate resource-management concerns.
2019-02-28 14:59:38 -05:00
Jesse Haber-Kucharsky
f274982522 tests: Remove non-useful test
Password handling is verified in its own test suite, and this test not
only makes a number of assumptions about implementation details, but
also tries to verify a hashing scheme (bcrypt) which is not supported on
most Linux distributions.
2019-02-28 14:58:27 -05:00
Avi Kivity
7c968f4a9e build: move XXH_PRIVATE_API and SEASTAR_TESTING_MAIN non-mode-specific
These defines are global, so they can be in the mode-agnostic cxxflags
rather than the mode-specific cxxflags_{mode}.
Message-Id: <20190228081247.20116-1-avi@scylladb.com>
2019-02-28 09:51:02 +00:00
Avi Kivity
20eadb2c39 relocatable-package: package and redirect gnutls configuration
gnutls requires a configuration file, and the configuration file must match
the one used by the library. Since we ship our own version of the library with
the relocatable package, we must also ship the configuration file.

Luckily, it is possible to override the location of the configuration file via
an environment variable, so all we need to do is to copy the file to the archive
and provide the environment variable in the thunk that adjusts the library path.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227110529.14146-1-avi@scylladb.com>
2019-02-28 10:57:32 +02:00
Avi Kivity
4022a919f6 test: allocate at least one logical core per unit test
Currently, we only allocate memory for concurrent unit test runs. This can cause
CPU overcommit when running test.py on machines with a lot of memory but few cores.
This overcommit can cause timeouts in tests that are time-sensitive (bad practice,
but can happen) and makes the desktop sluggish.

Improve by allocating at least one logical core per running test.

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190227132516.22147-1-avi@scylladb.com>
2019-02-28 10:34:33 +02:00
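The allocation rule described in commit 4022a919f6 above can be sketched as follows. This is a minimal illustration and not the actual test.py code: the function name `max_concurrent_tests`, its parameters, and the per-test memory figure are assumptions made for the example.

```python
def max_concurrent_tests(total_memory_bytes, logical_cores, memory_per_test_bytes):
    """Return how many unit tests to run at once.

    Hypothetical helper: the limit is the smaller of what memory allows
    and what the CPU count allows, but never less than one.
    """
    by_memory = total_memory_bytes // memory_per_test_bytes
    by_cpu = logical_cores  # at least one logical core per running test
    return max(1, min(by_memory, by_cpu))


GIB = 1 << 30
# A machine with plenty of memory but few cores is limited by its cores.
print(max_concurrent_tests(64 * GIB, 4, 2 * GIB))
# A machine with many cores but little memory is still limited by memory.
print(max_concurrent_tests(8 * GIB, 32, 2 * GIB))
```

Taking the minimum of the two limits keeps the memory-based behaviour on small-memory machines while preventing CPU overcommit on large-memory ones.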
Dan Yasny
6dbb48a12a node_health_check: collect scylla.d contents with node_health_check
We are missing data for CPU conf files and potentially other
information when collecting node data.

Fixes #4094

Message-Id: <20190225204727.20805-5-dyasny@scylladb.com>
2019-02-28 10:23:19 +02:00
Dan Yasny
9055e7a49e node_health_check: Add redhat-release to health check if present
Collect /etc/redhat-release as well as os-release from relevant
hosts. The problem with os-release is that it doesn't contain the
minor version of the EL OS family. Since this is only present in
Red Hat distributions and derivatives, it will not be collected
in Debian derivatives.

Another approach is to use lsb_release -a but it will not provide
anything more useful than os-release on Debian and lsb needs to be
installed on EL derivatives first.

Fixes #4093

Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>
2019-02-28 10:23:12 +02:00
Dan Yasny
2f26390f52 node_health_check: Use clear hostname instead of -i for filenames and report names
`hostname -i` produces garbled output on new systems with IPv6
enabled, so it is better to use the clean hostname for the file
names.

Message-Id: <20190225204727.20805-3-dyasny@scylladb.com>
2019-02-28 10:23:06 +02:00
Dan Yasny
f483c594ee node_health_check: Detect the address for the CQL (port 9042) listener and use it
The script relies on `hostname -i` for the host address, which can be
wrong on some systems. This patch checks where the defined
CQL_PORT is listening, and uses the correct IP address instead.

Message-Id: <20190225204727.20805-2-dyasny@scylladb.com>
2019-02-28 10:22:58 +02:00
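The idea in commit f483c594ee above, finding the address on which the CQL port is actually listening, might look roughly like this. The commit message does not say which tool the script uses, so this sketch assumes text in the column layout printed by `ss -tln` and parses a hard-coded sample. `find_listen_address` and `SAMPLE` are hypothetical names.

```python
def find_listen_address(ss_output, port):
    """Return the local address listening on `port`, or None.

    `ss_output` is assumed to follow the column layout of `ss -tln`:
    State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port.
    """
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skips the header and any non-listening rows
        # rpartition splits on the last colon, so IPv6 forms such as
        # [::1]:9042 keep their address part intact.
        address, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            return address
    return None


SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     127.0.0.1:7000      0.0.0.0:*
LISTEN  0       128     10.0.0.5:9042       0.0.0.0:*
"""

CQL_PORT = 9042
print(find_listen_address(SAMPLE, CQL_PORT))
```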
Avi Kivity
632c7c303a Merge "auth: Restructure SASL code" from Jesse
"
This series restructures the SASL code that was previously internal
to the `password_authenticator` so that it can be used in other contexts.
"

* 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla:
  auth: Rename SASL challenge class for "PLAIN"
  auth: Make a ctor `explicit`
  auth: Move `sasl_challenge` to its own file
  auth: Decouple SASL code from its parent class
2019-02-28 10:19:41 +02:00
Jesse Haber-Kucharsky
f2d92f81e8 auth: Report a more specific error with bad creds
Without this change, the resulting error message for an invalid password
is "authentication failed".

With this change, we report "Username and/or password are incorrect".

Fixes #4285

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <32d00be8af5075ee10d2c14f85b76843a9adac10.1551306914.git.jhaberku@scylladb.com>
2019-02-28 09:53:57 +02:00
Jesse Haber-Kucharsky
3d883e8cf2 auth: Rename SASL challenge class for "PLAIN"
2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
0c955b7992 auth: Make a ctor `explicit`
2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
dc41f1098b auth: Move sasl_challenge to its own file
This will allow authenticators other than
`password_authenticator` to make use of the PLAIN SASL
authentication code.
2019-02-27 18:36:52 -05:00
Jesse Haber-Kucharsky
2d59fa6be9 auth: Decouple SASL code from its parent class
This way, we can (in the future) use this implementation of the SASL
"PLAIN" mechanism in contexts other than `password_authenticator`.
2019-02-27 18:11:31 -05:00
Avi Kivity
88322086cb Merge "Add fuzzer-type unit test for range scans" from Botond
"
This series adds a fuzzer-type unit test for range scans, which
generates a semi-random dataset and executes semi-random range scans
against it, validating the result.
This test aims to cover a wide range of corner cases with the help of
randomness. Data and queries against it are generated in such a way that
various corner cases and their combinations are likely to be covered.

The infrastructure under range scans has undergone massive changes in
the last year, growing in complexity and scope. The correctness of range
scans is critical for the correct functioning of any Scylla cluster, and
while the current unit tests served well in detecting any major problems
(mostly while developing), they are too simplistic and can only be
relied on to check the correctness of the basic functionality. This test
aims to extend coverage drastically, testing cases that the author of
the range-scan code or that of the existing unit tests didn't even think
existed, by relying on some randomness.

Fixes: #3954 (deprecates really)
"

* 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla:
  tests/multishard_mutation_query_test: add fuzzy test
  tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
  tests/test_table: add advanced `create_test_table()` overload
  tests/test_table: make `create_test_table()` customizable
  query: add trim_clustering_row_ranges_to()
  tests/test_table: add keyspace and table name params
  tests/test_table: s/create_test_cf/create_test_table/
  tests: move create_test_cf() to tests/test_table.{hh,cc}
  tests/multishard_mutation_query_test: drop many partition test
  tests/multishard_mutation_query_test: drop range tombstone test
2019-02-27 17:26:53 +02:00
Avi Kivity
cc2f9841c4 Merge "Simplify -g and -gz checks in configure.py" from Rafael
* 'simplify-g-gz-check-v2' of https://github.com/espindola/scylla:
  Assume -gz is always available
  Assume -g is always available
2019-02-27 17:19:37 +02:00
Duarte Nunes
871790a340 Merge 'Hide virtual columns write time and ttl from the user' from Piotr
"
This miniseries hides virtual columns' writetime and ttl
from the user.

Tests: unit (dev)

Fixes #4288
"

* 'hide_virtual_columns_writetime_and_ttl_2' of https://github.com/psarna/scylla:
  tests: add test for hiding virtual columns from WRITETIME
  cql3: hide virtual columns from WRITETIME() and TTL()
  schema: add column_definition::is_hidden_from_cql
2019-02-27 14:36:08 +00:00
Piotr Sarna
09eb0429ce tests: add test for hiding virtual columns from WRITETIME
Visibility checks for virtual columns' WRITETIME and TTL
are added.
2019-02-27 15:08:16 +01:00
Piotr Sarna
af39787bf0 cql3: hide virtual columns from WRITETIME() and TTL()
Virtual columns should not be visible to the user,
so they are now hidden not only from directly selecting them,
but also via WRITETIME() and TTL() keywords.

Fixes #4288
2019-02-27 15:08:15 +01:00
Piotr Sarna
b0ab4c28cf schema: add column_definition::is_hidden_from_cql
Right now the only columns hidden from CQL are view virtual columns,
but in case of expanding this set, a helper function is provided.
2019-02-27 15:07:54 +01:00
Avi Kivity
d189e12438 tests: database_test: fix misaligned dma write
test_distributed_loader_with_pending_delete issues a dma write, but violates
the unwritten contract to temporary_buffer::aligned(), which requires that
size be a multiple of alignment. As a result the test fails spuriously.

Instead of playing with the alignment, rewrite that snippet to use the
easier-to-use make_file_output_stream().

Introduced in 1ba88b709f.
Branches: master.
Message-Id: <20190226181850.3074-1-avi@scylladb.com>
2019-02-27 09:00:31 +01:00
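The unwritten contract mentioned in commit d189e12438 above, that the size of an aligned buffer be a multiple of its alignment, can be illustrated with a small sketch. This is not Seastar code: `is_aligned` and `align_up` are hypothetical helpers, and the 4096-byte alignment is an assumed example value.

```python
def is_aligned(size, alignment):
    """True when `size` is a whole multiple of `alignment`."""
    return size % alignment == 0


def align_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


ALIGNMENT = 4096  # assumed example value
payload = 100     # bytes the caller actually wants to write

print(is_aligned(payload, ALIGNMENT))
print(align_up(payload, ALIGNMENT))
```

The commit avoids this bookkeeping altogether by switching to make_file_output_stream(), which is why the fix does not adjust sizes by hand.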
Rafael Ávila de Espíndola
a586ac209a Assume -gz is always available
It is available since clang 5 and gcc 5.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
054078b6af Assume -g is always available
From the log it looks like these checks were added in 2014 because of
a broken clang.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
87106ea5e2 Improve the build mode documentation
With this patch HACKING suggests using just ./configure.py and passing
the mode to ninja. It also expands on the characteristics of each mode
and mentions the dev mode.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190208020444.19145-1-espindola@scylladb.com>
2019-02-26 19:54:50 +02:00
Nadav Har'El
da54d0fc7d Materialized views: fix accidental zeroing of flow-control delay
The materialized-views flow control carefully calculates an amount of
microseconds to delay a client to slow it down to the desired rate -
but then a typo (std::min instead of std::max) causes this delay to
be zeroed, which in effect completely nullifies the flow control
algorithm.

Before this fix, experiments suggested that view flow control was
not having any effect and view backlog not bounded at all. After this
fix, we can see the flow control having its desired effect, and the
view backlog converging.

Fixes #4143.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190226161452.498-1-nyh@scylladb.com>
2019-02-26 18:22:18 +02:00
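How a min/max mix-up like the one in commit da54d0fc7d above can zero a delay is easy to show in isolation. The commit message does not include the affected expression, so the shape below (a time budget clamped at zero and then used to cap the computed delay) is an assumption, and `delay_us` is a hypothetical function.

```python
def delay_us(computed_us, remaining_budget_us, clamp):
    """Cap the computed delay by the remaining time budget.

    `clamp` is either max (the intended function) or min (the typo).
    It is meant to keep the budget from going negative.
    """
    budget = clamp(0, remaining_budget_us)
    return min(computed_us, budget)


# Intended behaviour: the budget stays positive and the delay survives.
print(delay_us(1500, 10_000, max))
# With the typo: the budget collapses to zero, and so does the delay.
print(delay_us(1500, 10_000, min))
```

With `min`, any positive budget is replaced by zero, so the carefully computed delay never reaches the client.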
Tomasz Grabiec
1a63a313c8 Merge "repair: Rename names to be consistent with rpc verb" from Asias

Some of the function names were not updated after we changed the rpc verb
names. Rename them to make them consistent with the rpc verb names.

* seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1:
  repair: Rename request_sync_boundary to get_sync_boundary
  repair: Rename request_full_row_hashes to get_full_row_hashes
  repair: Rename request_combined_row_hash to get_combined_row_hash
  repair: Rename request_row_diff to get_row_diff
  repair: Rename send_row_diff to put_row_diff
  repair: Update function name in docs/row_level_repair.md
2019-02-26 13:01:36 +01:00
Tomasz Grabiec
b06aac4fdb Merge "Fix temporary spurious schema version mismatch when nodes are restarted" from Asias
Fixes: #4148
Fixes: #4258

Tests: resharding_test.py:reshardingtest_nodes4_with_sizetieredcompactionstrategy.resharding_by_smp_increase_test

* seastar-dev.git asias/fix_schema_mismatch_when_nodes_restarts/v1:
  database: Add update_schema_version and announce_schema_version
  storage_service: Add application_state::SCHEMA when gossip starts
2019-02-26 12:55:52 +01:00
Avi Kivity
5f94bc902a transport: add option to disable shard-aware drivers
The shard-aware drivers can cause a huge number of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.

Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).

Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
2019-02-26 12:44:11 +01:00
Asias He
459836079c storage_service: Add application_state::SCHEMA when gossip starts
In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw:

   4 nodes in the tests

   n1, n2, n3, n4 are started

   n1 is stopped

   n1 is changed to use different shard config

   n1 is restarted ( 2019-01-27 04:56:00,377 )

The backtrace happened on n2 right after n1 restarts:

   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
   INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled
   WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed)
   INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status =
   Segmentation fault on shard 0.
   Backtrace:
   0x00000000041c0782
   0x00000000040d9a8c
   0x00000000040d9d35
   0x00000000040d9d83
   /lib64/libpthread.so.0+0x00000000000121af
   0x0000000001a8ac0e
   0x00000000040ba39e
   0x00000000040ba561
   0x000000000418c247
   0x0000000004265437
   0x000000000054766e
   /lib64/libc.so.6+0x0000000000020f29
   0x00000000005b17d9

The theory is: migration_manager::maybe_schedule_schema_pull is scheduled; at this
time n1 has the SCHEMA application_state. When n1 restarts, n2 gets a new application
state from n1 which does not have SCHEMA yet. When migration_manager::maybe_schedule
wakes up from the 60 sleep, n1 has a non-empty endpoint_state but an empty
application_state for SCHEMA. We dereference the nullptr
application_state and abort.

In commit da80f27f44, we fixed the problem by
checking the pointer before dereference.

To prevent this from happening in the first place, we had better add
application_state::SCHEMA when gossip starts. This way, peer nodes
always see the application_state::SCHEMA when a node restarts.

Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test

Fixes #4148
Fixes #4258
2019-02-26 19:30:22 +08:00
Asias He
75edbe939d database: Add update_schema_version and announce_schema_version
Split the update_schema_version_and_announce() into
update_schema_version() and announce_schema_version(). This is going to
be used in storage_service::prepare_to_join() where we want to first
update the schema version, start gossip, announce the schema version.
2019-02-26 19:10:02 +08:00
Amnon Heiman
b8a838c66c node_exporter_install: Add a force install option
It is sometimes useful to force reinstallation of the node_exporter,
for example during upgrade or if something is wrong with the current
installation.

This patch adds a --force command line option.

If --force is given to node_exporter_install, it will reinstall
node_exporter to the latest version, regardless of whether it was already
installed.

The symbolic link in /usr/bin/node_exporter will be set to the installed
version, so if there are other installed versions, they will remain.

Examples:
$ sudo ./dist/common/scripts/node_exporter_install
node_exporter already installed, you can use `--force` to force reinstallation

$ sudo ./dist/common/scripts/node_exporter_install --force
node_exporter already installed, reinstalling

Fixes #4201

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190225151120.21919-1-amnon@scylladb.com>
2019-02-25 20:16:58 +02:00
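The `--force` behaviour described in commit b8a838c66c above can be sketched with `argparse`. This is not the actual node_exporter_install script: `parse_args` and `plan` are hypothetical helpers, and only the two messages are taken from the examples in the commit message.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(description="node_exporter installer (sketch)")
    parser.add_argument("--force", action="store_true",
                        help="reinstall even if node_exporter is already installed")
    return parser.parse_args(argv)


def plan(already_installed, force):
    """Return (should_install, message) for the given state."""
    if already_installed and not force:
        return False, ("node_exporter already installed, "
                       "you can use `--force` to force reinstallation")
    if already_installed:
        return True, "node_exporter already installed, reinstalling"
    return True, ""


args = parse_args(["--force"])
print(plan(already_installed=True, force=args.force))
```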
Pekka Enberg
ca288189a9 dist/ami: Support different products for the AMI
Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so
that we can override package names for enterprise AMIs.

Message-Id: <20190225063319.19516-1-penberg@scylladb.com>
2019-02-25 11:17:44 +02:00
Asias He
3e615c3a15 repair: Update function name in docs/row_level_repair.md
The repair rpc request_* functions are renamed to get_*.
The send_row_diff is renamed to put_row_diff.
2019-02-25 15:13:39 +08:00
Asias He
62104902db repair: Rename send_row_diff to put_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6e4ea1b3c4 repair: Rename request_row_diff to get_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
5b29fb30ac repair: Rename request_combined_row_hash to get_combined_row_hash
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6f6c4878d5 repair: Rename request_full_row_hashes to get_full_row_hashes
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
02ddfa393e repair: Rename request_sync_boundary to get_sync_boundary
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Avi Kivity
a0b0db7915 Merge "Fix regression in perf_fast_forward results" from Paweł
"
After adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") we have
noticed a regression in the results of some perf_fast_forward tests.
This was caused by those tests not disabling partition-level
fast-forwarding even though it was not needed and the commit in question
fixed an incorrect optimisation in such cases.

However, after solving that issue it has also become apparent that
mutation_reader_merger performs worse when the fast-forwarding is
disabled. This was attributed to logic responsible for dropping readers
as soon as they have reached the end of stream (which cannot be done if
fast-forwarding is enabled). This problem was mitigated by avoiding a
scan of the list and removing readers in small batches.

Fixes #4246.
Fixes #4254.

Tests: unit(dev)
"

* tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla:
  mutation_reader_merger: drop unneeded readers in small batches
  mutation_reader_merger: track readers by iterators and not pointers
  tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
2019-02-24 19:24:00 +02:00
Avi Kivity
e3c53ff3ff Update seastar submodule
* seastar 2313dec...ab54765 (10):
  > Fix C++-17-only uses of static_assert() with a single parameter.
  > README.md: fix out-of-date explanation of C++ dialect
  > net: fix tcp load balancer accounting leak while moving socket to other shard
  > Revert "deleter: prevent early memory free caused by deleter append."
  > deleter: prevent early memory free caused by deleter append.
  > Solve seastar.unit.thread failure in debug mode
  > Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb
  > build: Fix the required version of `fmt`
  > app_template: fix use after move in app constructor
  > build: Rename CMake variable for private flags

Fixes #4269.
2019-02-24 16:06:23 +02:00
Avi Kivity
a3a7bea12f Merge "Clean up preprocessor definitions" from Jesse
* 'jhk/define_debug/v1' of https://github.com/hakuch/scylla:
  build: Remove the `DEBUG_SHARED_PTR` pp variable
  build: Prefer the Seastar version of a pp variable
2019-02-23 14:04:08 +02:00
Jesse Haber-Kucharsky
f9297895c1 auth: Change the log level for async. retries
The log message is benign, but it has caused some users of Scylla to
think that an error has occurred.

Fixes #3850

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <ba49c38266c0e77c3ed23cfca3c1a082b3060f17.1550777586.git.jhaberku@scylladb.com>
2019-02-23 14:03:16 +02:00
Tomasz Grabiec
3f698701c2 gdb: Drop incorrect throw of StopIteration
It is converted into a RuntimeError by python3:

  https://docs.python.org/3/library/exceptions.html#StopIteration

We should just return.

Message-Id: <20190221144321.18093-1-tgrabiec@scylladb.com>
2019-02-23 14:02:47 +02:00
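The Python behaviour behind commit 3f698701c2 above can be reproduced with two small generators. Under PEP 479, which is the default from Python 3.7, a StopIteration raised inside a generator body is converted into a RuntimeError. The generators below are hypothetical and are not the gdb script's code.

```python
def first_n_buggy(items, n):
    for i, item in enumerate(items):
        if i == n:
            raise StopIteration  # old idiom for "we are done"
        yield item


def first_n_fixed(items, n):
    for i, item in enumerate(items):
        if i == n:
            return  # ends the generator cleanly
        yield item


print(list(first_n_fixed("abcde", 2)))
try:
    list(first_n_buggy("abcde", 2))
except RuntimeError as exc:
    print("RuntimeError:", exc)
```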
Nadav Har'El
0eddf19432 main: add INFO log messages at start, initialization end, and end.
Scylla currently prints a welcome message when it starts, with the
Scylla version, but this is not printed to the regular log so in some
cases (e.g., Jenkins runs) we do not see it in the log. So let's add
a regular INFO-level log message with the same information.

Also, Scylla currently doesn't print any specific log message when it
normally completes its shutdown. In some cases, users may end up
wondering whether Scylla hung in the middle of the shutdown, or in
fact exited normally. Refs #4238. So in this patch we add a "shutdown
complete" message as the very last message in a successful shutdown.
We print Scylla's version also in the shutdown message, which may be
useful to see in the logs when shutting down one version of Scylla
and starting a different version.

Finally, we also add a log message when initialization is complete,
which may also be useful to understand whether Scylla hung during
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217140659.19512-1-nyh@scylladb.com>
2019-02-22 16:52:31 +01:00
Tomasz Grabiec
b90cb91468 gdb: Introduce 'scylla cache'
Prints contents of the row cache for each table on current shard.
Message-Id: <20190222144420.19677-1-tgrabiec@scylladb.com>
2019-02-22 14:58:58 +00:00
Paweł Dziepak
b524f96a74 mutation_reader_merger: drop unneeded readers in small batches
It was observed that destroying readers as soon as they are not needed
negatively affects performance of relatively small reads. We don't want
to keep them alive for too long either, since they may own a lot of
memory, but deferring the destruction slightly and removing them in
batches of 4 seems to solve the problem for the small reads.
2019-02-22 14:43:38 +00:00
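The deferred, batched destruction described in commit b524f96a74 above can be sketched as follows. This is only an illustration of the idea in Python, not the C++ implementation: `BatchedDropper` is a hypothetical class, `close` stands in for destroying a reader, and the batch size of 4 is the figure from the commit message.

```python
class BatchedDropper:
    """Hold finished readers briefly and release them in batches."""

    def __init__(self, close, batch_size=4):
        self._close = close
        self._batch_size = batch_size
        self._pending = []

    def drop(self, reader):
        # Defer the release until a full batch has accumulated.
        self._pending.append(reader)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        for reader in self._pending:
            self._close(reader)
        self._pending.clear()


closed = []
dropper = BatchedDropper(closed.append)
for name in ["r1", "r2", "r3"]:
    dropper.drop(name)
print(closed)  # nothing has been released yet
dropper.drop("r4")
print(closed)  # the whole batch is released together
```

The small batch size bounds how long an unneeded reader, and the memory it owns, can stay alive.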
Paweł Dziepak
435e24f509 mutation_reader_merger: track readers by iterators and not pointers
mutation_reader_merger uses a std::list of mutation_reader to keep them
alive while the rest of the logic operates on non-owning pointers.

This means that when it is time to drop some of the readers that are
no longer needed, the merger needs to scan the list looking for them.
That's not ideal.

The solution is to make the logic use iterators to elements in that
list, which allows for O(1) removal of an unneeded reader. Iterators to
list are just pointers to the node and are not invalidated by unrelated
additions and removals.
2019-02-22 14:33:10 +00:00
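The reason iterators give O(1) removal in commit 435e24f509 above is that an iterator into a linked list is a handle to the node itself. Python has no direct equivalent of std::list iterators, so this sketch uses a hand-written doubly linked list in which `append` returns the node as a handle. `Node` and `LinkedList` are hypothetical names.

```python
class Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class LinkedList:
    def __init__(self):
        # Sentinel node: head.next is the first element, head.prev the last.
        self._head = Node(None)
        self._head.prev = self._head.next = self._head

    def append(self, value):
        node = Node(value)
        last = self._head.prev
        last.next = node
        node.prev = last
        node.next = self._head
        self._head.prev = node
        return node  # plays the role of a list iterator

    def remove(self, node):
        # O(1): only the neighbours are touched, no scan is needed.
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None

    def __iter__(self):
        cur = self._head.next
        while cur is not self._head:
            yield cur.value
            cur = cur.next


readers = LinkedList()
handles = {name: readers.append(name) for name in ("r1", "r2", "r3")}
readers.remove(handles["r2"])
print(list(readers))
```

Handles to other nodes stay valid after an unrelated removal, which mirrors the invalidation guarantee the commit message relies on.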