Commit Graph

18556 Commits

Author SHA1 Message Date
Glauber Costa
178fb5fe5f make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with newlines in the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.

This patch makes our OS detection robust against that.

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
(cherry picked from commit 99c00547ad)
2019-05-03 09:57:21 +03:00
Nadav Har'El
a45b6e41a0 materialized views and secondary index: sometimes allow dropping base columns
Until this patch, dropping columns from a table was completely forbidden
if this table has any materialized views or secondary indexes. However,
this is excessively harsh, and not compatible with Cassandra which does
allow dropping columns from a base table which has a secondary index on
*other* columns. This incompatibility was raised in the following
Stackoverflow question:
https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490

In this patch, we allow dropping a base table column if none of its
materialized views *needs* this column. Columns selected by a view
(as regular or key columns) are needed by it, of course, but when
virtual columns are used (namely, there is a view with same key columns
as the base), *all* columns are needed by the view, so unfortunately none
of the columns may be dropped.

After this patch, when a base-table column cannot be dropped because one
of the materialized views needs it, the error message will look like:

   exceptions::invalid_request_exception: Cannot drop column a from base
   table ks.cf: a materialized view cf_a_idx_index needs this column.

This patch also includes extensive testing for the cases where dropping
columns are now allowed, and not allowed. The secondary-index tests are
especially interesting, because they demonstrate that now usually (when
a non-key column is being indexed) dropping columns will be allowed,
which is what originally bothered the Stackoverflow user.

Fixes #4448.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190429214805.2972-1-nyh@scylladb.com>
2019-04-30 12:13:10 +01:00
Nadav Har'El
92d5f61ba5 cql: support single-value IN restriction wherever EQ restriction is supported
There are several places were IN restrictions are not currently supported,
especially in queries involving a secondary index. However, when the IN
restriction has just a single value, it is nothing more than an equality
restriction and can be converted into one and be supported. So this patch
does exactly this.

Note that Cassandra does this conversion since August 2016, and therefore
supports the special case of single-value IN even where general IN is not
supported. So it's important for Cassandra compatibility that we do this
conversion too.

This patch also includes a test with two queries involving a secondary
index that were previously disallowed because of the "IN" on the primary
key or the indexed column - and are now allowed when the IN restriction
has just a single value. A third query tested is not related to secondary
indexes, but confirms we don't break multi-column single-value IN queries.

Fixes #4455.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190428160317.23328-1-nyh@scylladb.com>
2019-04-30 12:13:06 +01:00
Tomasz Grabiec
1adcb3637e Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond
The shard readers of the multishard reader assumed that the positions in
the data stream are strictly monotonous. This assumption is invalid.
Range tombstones can have positions that they can share with other range
tombstones and/or a clustering row. The effect of this false assumption
was that when the shard reader was evicted such that the last seen
fragment was a range tombstone, when recreated it would skip any unseen
fragments that have the same position as that of the last seen range
tombstone.

Fixes: #4418

Branches: master, 3.0, 2019.1

Tests: unit(dev)

* https://github.com/denesb/scylla.git
multishard_reader_handle_non_strictly_monotonous_positions/v4:
  multishard_combining_reader: shard_reader::remote_reader extract
    fill-buffer logic into do_fill_buffer()
  mutlishard_combining_reader: reorder
    shard_reader::remote_reader::do_fill_buffer() code
  position_in_partition_view: add region() accessor
  multishard_combining_reader: fix handling of non-strictly monotonous
    positions
  flat_mutation_reader: add flat_mutation_reader_from_mutations()
    overload with range and slice
  flat_mutation_reader: add make_flat_mutation_reader_from_fragments()
    overload with range and slice
  tests: add unit test for multishard reader correctly handling
    non-strictly monotonous positions
2019-04-30 12:35:28 +02:00
Tomasz Grabiec
077c639e42 Merge "Simplify the result_set_row API" from Rafael
Currently null and missing values are treated differently. Missing
values throw no_such_column. Null values return nullptr, std::nullopt
or throw null_column_value.

The api is a bit confusing since a function returning a std::optional
either returns std::nullopt or throws depending on why there is no
value.

With this patch series only get_nonnull throws and there is only one
exception type.

* https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2:
  query-result-set: merge handling of null and missing values
  Remove result_set_row::has
  Return a reference from get_nonnull
2019-04-30 11:06:29 +02:00
Rafael Ávila de Espíndola
63c47117b5 Return a reference from get_nonnull
No reason to copy if we don't have to. Now that get_nonnull doesn't
copy, replace a raw used of get_data_value with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola
0474458872 Remove result_set_row::has
Now that the various get methods return nullptr or std::nullopt on
missing values, we don't need to do double lookups.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola
2770b29036 query-result-set: merge handling of null and missing values
Nothing seems to differentiate a missing and a null value. This patch
then merges the two exception types and now the only method that
throws is get_nonnull. The other methods return nullptr or
std::nullopt as appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:20 -07:00
Avi Kivity
3726a4fbd9 Merge "Fix schema disagreement during rolling upgrade" from Tomasz
"
After 7c87405, schema sync includes system_schema.view_virtual_columns in the
schema digest. Old nodes don't know about this table and will not include it
in the digest calculation. As a result, there will be schema disagreement
until the whole cluster is upgraded.

Also, the order in which tables were hashed changed in 7c87405, which
causes digests to differ in some schemas.

Fixes #4457.
"

* tag 'fix-disagreement-during-upgrade-v2' of github.com:tgrabiec/scylla:
  db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
  storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
  db/schema_tables: Hash schema tables in the same order as on 3.0
  db/schema_tables: Remove table name caching from all_tables()
  treewide: Propagate schema_features to db::schema::all_tables()
  enum_set: Introduce full()
  service/storage_service: Introduce cluster_schema_features()
  schema: Introduce schema_features
  schema_tables: Propagate storage_service& to merge_schema()
  gms/feature: Introduce a more convenient when_enabled()
  gms/feature: Mark all when_enabled() overloads as const
2019-04-29 14:23:53 +03:00
Avi Kivity
ede1d248af tools: toolchain: improve dbuild signal handing
Currently, we use --sig-proxy to forward signals to the container. However, this
requires the container's co-operation, which usually doesn't exist. For example,

    docker run --sig-proxy fedora:29 bash -c "sleep 5"

Does not respond to ctrl-C.

This is a problem for continuous integration. If a build is aborted, Jenkins will
first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give
up and use SIGKILL. If the graceful termination doesn't work, we end up with an
orphan container running on the node, which can then consume enough memory and CPU
to harm the following jobs.

To fix this, trap signals and handle them by killing the container. Also trap
shell exit, and even kill the container unconditionally, since if Jenkins happens
to kill the "docker wait" process the regular paths will not be taken.

We lose a lot by running the container asynchronously with the dbuild shell
script, so we need to add it back:

 - log display: via the "docker logs" command
 - auto-removal of the container: add a "docker rm -f" command on signal
   or normal exit
Message-Id: <20190424130112.794-1-avi@scylladb.com>
2019-04-29 10:05:21 +02:00
Botond Dénes
aa18bb33b9 tests: add unit test for multishard reader correctly handling non-strictly monotonous positions 2019-04-29 10:24:14 +03:00
Botond Dénes
51e81cf027 flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice
To be able to support this new overload, the reader is made
partition-range aware. It will now correctly only return fragments that
fall into the partition-range it was created with. For completeness'
sake and to be able to test it, also implement
`fast_forward_to(const dht::partition_range)`. Slicing is done by
filtering out non-overlapping fragments from the initial list of
fragments. Also add a unit test that runs it through the mutation_source
test suite.
2019-04-29 10:24:14 +03:00
Tomasz Grabiec
c96ee9882b db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
After 7c87405, schema sync includes system_schema.view_virtual_columns
in the schema digest. Old nodes don't know about this table and will
not include it in the digest calculation. As a result, there will be
schema disagreement until the whole cluster is upgraded.

Fix this by taking the new table into account only when the whole
cluster is upgraded.

The table should not be used for anything before this happens. This is
not currently enforced, but should be.

Fixes #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
a108df09f9 storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
Needed for determining if all nodes in the cluster are aware of the
new schema table. Only when all nodes are aware of it we can take it
into account when calculating schema digest, otherwise there would be
permanent schema disagreement in during rolling upgrade.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
73b859005c db/schema_tables: Hash schema tables in the same order as on 3.0
The commit 7c87405 also indirectly changed the order of schema tables
during hash calculation (index table should be taken after all other
tables). This shows up when there is an index created and any of {user
defined type, function, or aggregate}.

Refs #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
394a684a99 db/schema_tables: Remove table name caching from all_tables()
The set of table names will depend on the features and thus will be dynamic.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
3cb7b2d72e treewide: Propagate schema_features to db::schema::all_tables() 2019-04-28 15:50:13 +02:00
Tomasz Grabiec
f33f0d759d enum_set: Introduce full() 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
1d9b88dceb service/storage_service: Introduce cluster_schema_features() 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
0633fcde10 schema: Introduce schema_features 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
6e2c190b5f schema_tables: Propagate storage_service& to merge_schema()
We will need to calculate cluster schema features at the time we
calculate the schema digest.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
6db002163f gms/feature: Introduce a more convenient when_enabled()
It can be invoked with a lambda without the ceremony of creating a
class deriving from gms::feature::listener.

The reutrned registration object controls listener's scope.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
22c07b9183 gms/feature: Mark all when_enabled() overloads as const 2019-04-28 12:33:10 +02:00
Rafael Ávila de Espíndola
ee9f3388f6 cql_query_test: Fix a use after return
There was nothing keeping the verify lambda alive after the return. It
worked most of the time since the only state kept by the lambda was
a pointer to cql_test_env.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190426203823.15562-1-espindola@scylladb.com>
2019-04-27 08:06:35 +03:00
Avi Kivity
07d06aee43 Update seastar submodule
* seastar e84d2647c...4cdccae53 (4):
  > Merge "future: Move some code out of line" from Rafael
  > tests: socket_test: Add missing virtual and override
  > build: Don't pass -Wno-maybe-uninitialized to clang
  > Merge "expose file_permssions for creating files and dirs in API" from Benny
2019-04-26 22:58:48 +03:00
Tomasz Grabiec
c6274fdef3 keys: Avoid implicit conversion to partition_key in the hasher of partition_key_view
Message-Id: <1556230107-13557-1-git-send-email-tgrabiec@scylladb.com>
2019-04-26 20:02:35 +03:00
Botond Dénes
bc08f8fd07 flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice
To be able to run the mutation-source test suite with this reader. In
the next patch, this reader will be used in testing another reader, so
it is important to make sure it works correctly first.
2019-04-26 12:43:45 +03:00
Botond Dénes
eba310163d multishard_combining_reader: fix handling of non-strictly monotonous positions
The shard readers under a multishard reader are paused after every
operation executed on them. When paused they can be evicted at any time.
When this happens, they will be re-created lazily on the next
operation, with a start position such that they continue reading from
where the evicted reader left off. This start position is determined
from the last fragment seen by the previous reader. When this position
is clustering position, the reader will be recreated such that it reads
the clustering range (from the half-read partition): (last-ckey, +inf).
This can cause problems if the last fragment seen by the evicted reader
was a range-tombstone. Range tombstones can share the same clustering
position with other range tombstones and potentially one clustering row.
This means that when the reader is recreated, it will start from the
next clustering position, ignoring any unread fragments that share the
same position as the last seen range tombstone.
To fix, ensure that on each fill-buffer call, the buffer contains all
fragments for the last position. To this end, when the last fragment in
the buffer is a range tombstone (with pos x), we continue reading until
we see a fragment with a position y that is greater. This way it is
ensured that we have seen all fragments for pos x and it is safe to
resume the read, starting from after position x.
2019-04-26 11:38:12 +03:00
Botond Dénes
b30af48c83 position_in_partition_view: add region() accessor 2019-04-26 11:38:12 +03:00
Piotr Sarna
037b517c85 service: initialize system distributed keyspace after schema agreement
In order to avoid schema disagreements during upgrades (which may lead
to deadlocks), system distributed keyspace initialization is moved
right before starting the bootstrapping process, after the schema
agreement checks already succeeded.

Fixes #3976
Message-Id: <932e642659df1d00a2953df988f939a81275774a.1556204185.git.sarna@scylladb.com>
2019-04-25 18:44:08 +02:00
Raphael S. Carvalho
ccb29c6c20 sstables: make partitioned sstable set available to custom compaction strategies
To make it available, we'll need to make it optional the usage of level metadata,
used to deal with interval map's fragmentation issue when level 0 falls behind,
and also introduce a interface for compaction strategies to implement
make_sstable_set() that instantiate partitioned sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190424232948.668-1-raphaelsc@scylladb.com>
2019-04-25 12:59:04 +03:00
Botond Dénes
a3f79bfe5e mutlishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code
Reduce the number of indentations - use early return for the short path.
2019-04-24 10:55:16 +03:00
Botond Dénes
bbd3f0acc3 multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer() 2019-04-24 10:55:16 +03:00
Avi Kivity
b19792405f main: RAII-ify shutdown
Instead of app-template::run_deprecated() and at_exit() hooks, use
app_template::run() and RAII (via defer()) to stop services. This makes it
easier to add services that do support shutdown correctly.

Ref #2737
Message-Id: <20190420175733.29454-1-avi@scylladb.com>
2019-04-23 16:13:39 +02:00
Tomasz Grabiec
21fbf59fa8 lsa: Fix compact_and_evict() being called with a too low step
compact_and_evict gets memory_to_release in bytes while
reclamation step is in segments.

Broken in f092decd90.

It doesn't make much difference with the current default step of 1
segment since we cannot reclaim less than that, so shouldn't cause
problems in practice.

Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>
2019-04-23 13:14:43 +03:00
Gleb Natapov
c6b3b9ff13 cache_hitrate_calculator: wait for ongoing calculation to complete during stop
Currently stop returns ready future immediately. This is not a problem
since calculation loop holds a shared pointer to the local service, so
it will not be destroyed until calculation completes and global database
object db, that also used by the calculation, is never destroyed. But the
later is just a workaround for a shutdown sequence that cannot handle
it and will be changed one day. Make cache hitrate calculation service
ready for it.

Message-Id: <20190422113538.GR21208@scylladb.com>
2019-04-22 14:44:42 +03:00
Takuya ASADA
64c2aa8f9b reloc/python3: add missing SCYLLA-PRODUCT-FILE to python3 relocatable package
Since 214c74a, we need SCYLLA-PRODUCT-FILE on relocatable package so add
it on python3 package as well.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190422085620.22486-1-syuu@scylladb.com>
2019-04-22 13:56:38 +03:00
Gleb Natapov
306f5b99b5 cache_hitrate_calculator: fix use after free in non_system_filter lambda
non_system_filter lambda is defined static which means it is initialized
only once, so the 'this' that is will capture will belong to a shard
where the function runs first. During service destruction the function
may run on different shard and access already other's shard service that
may be already freed.

Fixed #4425

Message-Id: <20190421152139.GN21208@scylladb.com>
2019-04-21 18:22:31 +03:00
Amnon Heiman
9ad63efcfe Adding node_exporter to docker
This patch add the node_exporter to the docker image.
It install it create and run a service with it.

After this patch node_exporter will run and will be part of scylla
Docker image.

Fixes #4300

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190421130643.6837-1-amnon@scylladb.com>
2019-04-21 18:12:58 +03:00
Benny Halevy
0c9aaef673 sstables: make lamdas that std:move mutable
As noticed by Rafael Ávila de Espíndola <espindola@scylladb.com>
regarding commit 5a99023d4a:
Without the lambda being mutable, the second std::move actually doesn't move anything.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421150422.19304-1-bhalevy@scylladb.com>
2019-04-21 18:11:42 +03:00
Benny Halevy
5a99023d4a treewide: use lambda for io_check of *touch_directory
To prepare for a seastar change that adds an optional file_permissions
parameter to touch_directory and recursive_touch_directory.
This change messes up the call to io_check since the compiler can't
derive the Func&& argument.  Therefore, use a lambda function instead
to wrap the call to {recursive_,}touch_directory.

Ref #4395

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>
2019-04-21 12:04:39 +03:00
Tomasz Grabiec
f092decd90 lsa: Fix potential bad_alloc even though evictable memory exists
When we start the LSA reclamation it can be that
segment_pool::_free_segments is 0 under some conditions and
segment_pool::_current_emergency_reserve_goal is set to 1. The
reclamation step is 1 segment, and compact_and_evict_locked() frees 1
segment back into the segment_pool. However,
segment_pool::reclaim_segments() doesn't free anything to the standard
allocator because the condition _free_segments >
_current_emergency_reserve_goal is false. As a result,
tracker::impl::reclaim() returns 0 as the amount of released memory,
tracker::reclaim() returns
memory::reclaiming_result::reclaimed_nothing and the seastar allocator
thinks it's a real OOM and throws std::bad_alloc.

The fix is to change compact_and_evict() to make sure that reserves
are met, by releasing more if they're not met at entry.

This change also allows us to drop the variant of allocate_segment()
which accepts the reclamation step as a means to refill reserves
faster. This is now not needed, because compact_and_evict() will look
at the reserve deficit to increase the amount of memory to reclaim.

Fixes #4445

Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>
2019-04-20 09:17:49 +03:00
Avi Kivity
704600f829 Update seastar submodule
* seastar eb03ba5cd...e84d2647c (14):
  > Fix hardcoded python paths in shebang line
  > Disable -Wmaybe-uninitialized everywhere
  > app_template: allow opting out of automatic SIGINT/SIGTERM handling
  > build: Restore DPDK machine inference from cflags
  > http: capture request content for POST requests
  > Merge "Simplify future_state and promise" from Rafael
  > temporary_buffer: fix memleak on fast path
  > perftune.py: allow explicitly giving a CPU mask to be used for binding IRQs
  > perftune.py: fix the sanity check for args.tune
  > perftune.py: identify fast-path hardware queues IRQs of Mellanox NICs
  > memory: malloc_allocator should be always available
  > Merge "Using custom allocator in the posix network stack" from Elazar
  > memory: Tell reclaimers how much should be reclaimed
  > net/ipv4_addr: add std::hash & operator== overloads
2019-04-20 09:16:53 +03:00
Avi Kivity
d485facea2 Revert "tools: toolchain: improve dbuild signal handing"
This reverts commit 6c672e674b. It loses
build logs, and the patch that restores logs causes build failures, so
the whole thing needs to be revisited.
2019-04-19 15:16:42 +03:00
Takuya ASADA
0a874f1897 dist/docker/redhat: prioritize /opt/scylladb/python3/bin on $PATH
To prevent running entrypoint script in another python3 package like
python36 in EPEL, move /opt/scylladb/python3/bin to top of $PATH.
It won't happen on this container image, but may occurs when user tries to
extend the image.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417165806.12212-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
c3dae6673f dist/common/scripts: use out() to run perftune.py
perftune.py executes hwloc-calc, the command is now provided as
relocatable binary, placed under /opt/scylladb/bin.
So we need to add the directory to PATH when calling
subprocess.check_output(), but our utility function already do that,
switch to it.

Fixes #4443

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190418124345.24973-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Benny Halevy
9785754e0d distributed_loader: do not follow symlinks when verifying mode and owner
We allow only regular files and directotries so to detect symlinks
we must not follow them.

Fixes #4375

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190418051627.9298-1-bhalevy@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
214c74a71d dist: merge product name parameter on single place
When we add product name customization, we mistakenly defined the
parameter on each package build script.
Number of script is increasing since we recently added relocatable
python3 package, we should merge it in single place.

Also we should save the parameter on relocatable package, just like
version-release parameters.

So move the definition to SCYLLA-VERSION-GEN, save it to
build/SCYLLA-PRODUCT-FILE then archive it to relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417163335.10191-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Paweł Dziepak
d47ea66ec6 messaging_service: add lz4_fragmented RPC compressor
Seastar now supports two RPC compression algorithm: the original LZ4 one
and LZ4_FRAGMENTED. The latter uses lz4 stream interface which allows it
to process large messages without fully linearising them. Since, RPC
requests used by Scylla often contain user-provided data that
potentially could be very large, LZ4_FRAGMENTED is a better choice for
the default compression algorithm.

Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>
2019-04-18 19:07:14 +03:00
Takuya ASADA
592fec32a0 dist/common/scripts: use /etc/os-release to detect distributions
Since we moved relocatable .rpm now Scylla able to run on Amazon Linux
2.
However, is_redhat_variant() on scylla_util.py does not works on Amazon
Linux 2, since it does not have /etc/redhat-release.
So we need to switch to /etc/os-release, use ID_LIKE to detect Redhat
variants/Debian variants.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417115634.9635-1-syuu@scylladb.com>
2019-04-18 19:07:14 +03:00