"
A good portion of the values that one would want to be examine with
scylla-tools will be partition or clustering keys. While examining them
was possible before too, especially for single component keys, it
required manually extracting the components from it, so they can be
individually examined.
This series adds support for working with keys directly, by adding
prefixable and full compound type support.
When passing --prefix-compound or --full-compound, multiple types can be
passed, which will form the compound type.
Example:
$ scylla_types --print --prefix-compound -t TimeUUIDType -t Int32Type 0010d00819896f6b11ea00000000001c571b000400000010
(d0081989-6f6b-11ea-0000-0000001c571b, 16)
Another feature added in this series is validation. For this,
`compound_type::validate()` had to be implemented first. We already use
this in our code, but currently has a no-op body.
Example:
$ scylla-types --validate --full-compound -t TimeUUIDType -t Int32Type 0010d00819896f6b11ea00000000001c571b0004000000
0010d00819896f6b11ea00000000001c571b0004000000: INVALID - seastar::internal::backtraced<marshal_exception> (marshaling error: compound_type iterator - not enough bytes, expected 4, got 3 Backtrace: 0x1b2e30f
0x85c9d5
0x85cb07
0x85cc7b
0x85cd7c
0x85d2d7
0x844e03
0x84241b
0x84490b
0x844ae5
0x19c0362
0x19c0741
0x19c13d1
0x19c4b44
0x8aeb7a
0x8aeca7
0x19ebc90
0x19fb8d5
0x1a12b49
0x19c4376
0x19c47a6
0x19c4900
0x843373
/lib64/libc.so.6+0x271a2
0x84202d
)
Tests: unit(dev)
"
* 'tools-scylla-types-compound-support/v1' of https://github.com/denesb/scylla:
tools/scylla_types: add validation action
tools/scylla_types: add compound_type support
tools/scylla_types: single source of truth for actions
compound_type: implement validate()
compound_type: fix const correctness
tools: mv scylla_types scylla-types
The patch implements:
- /storage_service/auto_compaction API endpoint
- /column_family/autocompaction/{name} API endpoint
Those APIs allow to control and request the status of background
compaction jobs for the existing tables.
The implementation introduces the table::_compaction_disabled_by_user.
Then the CompactionManager checks if it can push the background
compaction job for the corresponding table.
New members
===
table::enable_auto_compaction();
table::disable_auto_compaction();
bool table::is_auto_compaction_disabled_by_user() const
Test
===
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
$ ninja build/dev/test/boost/sstable_datafile_test
$ ./build/dev/test/boost/sstable_datafile_test --run_test=autocompaction_control_test -- -c1 -m2G --overprovisioned --unsafe-bypass-fsync 1 --blocked-reactor-notify-ms 2000000
The test tries to submit a compaction job after playing
with autocompaction control table switch. However, there is
no reliable way to hook pending compaction task. The code
assumed that with_scheduling_group() closure will never
preempt execution of the stats check.
Revert
===
Reverts commit c8247ac. In previous version the execution
sometimes resulted into the following error:
test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed
This version adds a few sstables to the cf, starts
the compaction and awaits until it is finished.
API change
===
- `/column_family/autocompaction/` always returned `true` while answering to the question: if the autocompaction disabled (see https://github.com/scylladb/scylla-jmx/blob/master/src/main/java/org/apache/cassandra/db/ColumnFamilyStore.java#L321). now it answers to the question: if the autocompaction for specific table is enabled. The question logic is inverted. The patch to the JMX is required. However, the change is decent because all old values were invalid (it always reported all compactions are disabled).
- `/column_family/autocompaction/` got support for POST/DELETE per table
Fixes
===
Fixes#1488Fixes#1808Fixes#440
Signed-off-by: Ivan Prisyazhnyy <ivan@scylladb.com>
Reviewed-by: Glauber Costa <glauber@scylladb.com>
The single-key sstable reader uses the clustering ranges from the slice
to determine the upper bound of the disk read-range using the index.
For this is simply uses the end bound of the last clustering ranges. For
reverse reads however the clustering ranges in the slice are in reverse
order, so this will in fact be the upper bound of the smallest range.
Depending on whether the distance between the clustering range is big
enough for the sstable reader to use the index to skip between them,
this will lead to either reading too little data or an assert failure.
This patch fixes the problematic function `get_slice_upper_bound()` to
consider reverse reads as well.
Initially I thought there will be more mishandling of reverse slices,
but actually `mutation_fragment_filter`, the component doing the actual
slicing of rows, is already reverse-slice aware.
A unit test which reproduces the assert failure is also added.
Fixes: #6171
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200507114956.271799-1-bdenes@scylladb.com>
"
Garbage collected SSTables, created by incremental compaction process,
are being added to the SSTable set using a function that invalidates
row cache using the range of the SSTable itself. That's incorrect
because data in GC SSTables come from preexisting SSTables in set,
meaning the state of data isn't changed and so no need for
invalidation at all. Incorrect invalidation like this is a source of
read performance issues. This problem is fixed by including GC
SSTables to the descriptor which is used to specify changes to the
SSTable set, which is the correct thing to do given that a midway
failure could leave the set in an incorrect state.
Fixes#5956.
Fixes#6275.
tests: unit(dev)
"
* 'fix_issue_5956_v4' of github.com:raphaelsc/scylla:
sstables/compaction: Don't invalidate row cache when adding GC SSTable to SSTable set
sstables/compaction: Change meaning of compaction_completion_desc input and output fields
sstables/compaction: Clean up code around garbage_collected_sstable_writer
The shutdown process of compaction manager starts with an explicit call
from the database object. However that can only happen everything is
already initialized. This works well today, but I am soon to change
the resharding process to operate before the node is fully ready.
One can still stop the database in this case, but reshardings will
have to finish before the abort signal is processed.
This patch passes the existing abort source to the construction of the
compaction_manager and subscribes to it. If the abort source is
triggered, the compaction manager will react to it firing and all
compactions it manages will be stopped.
We still want the database object to be able to wait for the compaction
manager, since the database is the object that owns the lifetime of
the compaction manager. To make that possible we'll use a future
that is return from stop(): no matter what triggered the abort, either
an early abort during initial resharding or a database-level event like
drain, everything will shut down in the right order.
The abort source is passed to the database, who is responsible from
constructing the compaction manager.
Tests: unit (dev), manual start+stop, manual drain + stop
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200506184749.98288-1-glauber@scylladb.com>
This is unrelated to counters, but happens to fix#4209
`tuple::delayed_value::contains_bind_marker` used to check that
ALL terms are bound (not that ANY of them is bound). As a result,
scylla would crash in prepare codepath for collections of tuples.
After this fix `invalid_request_exception` is thrown instead.
* jul-stas-4209-crash-on-counter-shards-set:
boost/tests: test for bound variable in a list of tuple literals
cql3: fix detection of bound variables in tuples
"
CDC has to create CDC streams that are co-located with corresponding BaseTable data. This is not always easy. Especially for small vnodes. This PR introduces new partitioner which allows us to easily find such stream ids that the stream belongs to a given vnode and shard.
The idea is that a partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key.
Tests: unit(dev), dtests(CDC)
"
* haaawk-cdc_partitioner:
cdc:use CDCPartitioner for CDC Log
dht: Add find_first_token_for_shard
dht: use long_token in token::to_int64
cdc: add CDCPartitioner
stream_id: add token_from_bytes static function
i_partitioner: Stop distinguishing whether keys order is preserved
Garbage collected SSTable is incorrectly added to SSTable set with a function
that invalidates row cache. This problem is fixed by adding GC SStable
to set using mechanism which replaces old sstables with new sstables.
Also, adding GC SSTable to set in a separate call is not correct.
We should make sure that GC SSTable reaches the SSTable set at the same time
its respective old (input) SSTable is removed from the set, and that's done
using a single request call to table.
Fixes#5956.
Fixes#6275.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
input_sstables is renamed to old_sstables and is about old SSTables that should be
deleted and removed from the SSTable set.
output_sstables is renamed to new_sstables and is about new SSTable that should be
added to the SSTable set, replacing the old ones.
This will allow us, for example, to add auxiliary SSTables to SSTable set using
the same call which replaces output SSTables by input SSTables in compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, the test seems to use the tmpdir class in a wrong way,
just to get a path to a temporary directory.
It should keep the tmpdir object around for the duration of the test
so the temporary directory will be automatically removed when the test
completes.
Refs #6344
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200504153810.202218-1-bhalevy@scylladb.com>
Right now the compaction_manager needs to be started explicitly.
We may change it in the future, but right now that's how it is.
Everything works now even without it, because compaction_manager::stop
happens to work even if it was not started. But it is technically
illegal.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200504143048.17201-1-glauber@scylladb.com>
"
Fixes#6067
Makes the scylla endpoint initializations that support TLS use reloadable certificate stores, watching used cert + key files for changes, and reload iff modified.
Tests in separate dtest set.
"
* elcallio-calle/reloadable-tls:
transport: Use reloadable tls certificates
redis: Use reloadable tls certificates
alternator: Use reloadable tls certificates
messaging_service: Use reloadable TLS certificates
Changes messaging service rpc to use reloadable tls
certificates iff tls is enabled-
Note that this means that the service cannot start
listening at construction time if TLS is active,
and user need to call start_listen_ex to initialize
and actually start the service.
Since "normal" messaging service is actually started
from gms, this route too is made a continuation.
The compaction_manager test lives inside a thread and it is not taking
advantage of it, with continuations all over.
One of the side effects of it is that the test is calling stop() twice
on the compaction_manager. While this works today, it is not good
practice. A change I am making is just about to break it.
This patch converts the test to fully use .get() instead of chained
continuations and in doing so also guarantees that the compaction
manager will be RAII-stopped just one, from a defer object.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200503161420.8346-2-glauber@scylladb.com>
Loading SSTables from the main directory is possible, to be compatible with
Cassandra, but extremely dangerous and not recommended.
From the beginning, we recommend using an separate, upload/ directory.
In all this time, perhaps due to how the feature's usefulness is reduced
in Cassandra due to the possible races, I have never seen anyone coming
from Cassandra doing procedures involving refresh at all.
Loading SSTables from the main directory forces us to disable writes to
the table temporarily until the SSTables are sorted out. If we get rid of
this, we can get rid of the disabling of the writes as well.
We can't do it now because if we want to be nice to the odd user that may
be using refresh through the main directory without our knowledge we should
at least error out.
This patch, then, does that: it errors out if SSTables are found in the main
directory. It will not proceed with the refresh, and direct the user to the
upload directory.
The main loop in reshuffle_sstables is left in place structurally for now, but
most of it is gone. The test for is is deleted.
After a period of deprecation we can start ignoring these SSTables and get rid
of the lock.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200429144511.13681-1-glauber@scylladb.com>
"
This series fix hang in multishard_writer when error happens. It contains
- multishard_writer: Abort the queue attached to consumers when producer fails
- repair: Fix hang when the writer is dead
Fixes#6241
Refs: #6248
"
* asias-stream_fix_multishard_writer_hang:
repair: Fix hang when the writer is dead
mutation_writer_test: Add test_multishard_writer_producer_aborts
multishard_writer: Abort the queue attached to consumers when producer fails
The constructor of `read_command` is used both by IDL and clients in the
code. However, this constructor has a parameter that is not used by IDL:
`read_timestamp`. This requires that this parameter is the very last in
the list and that new parameters that are used by IDL are added before
it. One such new parameter was `bool is_first_page`. Adding this
parameter right before the read timestamp one created a situation where
the last parameter (read_timestamp) implicitly converts to the one
before it (is_first_page). This means that some call sites passing
`read_timestamp` were now silently converting this to `is_first_page`,
effectively dropping the timestamp.
This patch aims to rectify this, while also avoiding similar accidents
in the future, by making `is_first_page` a `bool_class` which doesn't
have any implicit convertions defined. This change does not break the
ABI as `bool_class` is also sent as a `bool` on the wire.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit(dev)
Message-Id: <20200422073657.87241-1-bdenes@scylladb.com>
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such SSTable,
Scylla would detect a mismatch between the schema saved in schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).
SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).
This commit does two things:
1. for all SSTables created after this commit, include a new feature
flag, CorrectUDTsInCollections, presence of which implies that frozen
UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
nested inside collections are always frozen, even if they don't have
the tag. This assumption is safe to be made, because at the time of
this commit, Scylla does not allow non-frozen (multi-cell) types inside
collections or UDTs, and because of point 1 above.
There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.
Fixes#6130.
There is no reason to read a single SSTable at a time from the staging
directory. Moving SSTables from staging directory essentially involves
scanning input SSTables and creating new SSTables (albeit in a different
directory).
We have a mechanism that does that: compactions. In a follow up patch, I
will introduce a new specialization of compaction that moves SSTables
from staging (potentially compacting them if there are plenty).
In preparation for that, some signatures have to be changed and the
view_updating_consumer has to be more compaction friendly. Meaning:
- Operating with an sstable vector
- taking a table reference, not a database
Because this code is a bit fragile and the reviewer set is fundamentally
different from anything compaction related, I am sending this separately
* glommer-view_build:
staging: potentially read many SSTables at the same time
view_build_test: make sure it works with smp > 1
This test doesn't work with higher smp counts, because it relies on
dealing with keys named 'a' and 'b' and creates SSTables containing one
of them manually. This throws an exception if we happen to execute on
a shard that don't own the tokens corresponding to those keys.
This patch avoids that problem by pre-selecting keys that we know to
belong to the current shard in which the test is executed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Fixes#6195
test_commitlog_delete_when_over_disk_limit reads current segment list
in flush handler, to compare with result after allowing deletetion of
segement. However, it might be called more than once in rare cases,
because timing and us using rather small sizes.
Reading the list the second time however is not a good idea, because
it might just very well be exactly the same as what we read in the
test check code, and we actually overwrite the list we want to
check against. Because callback is on timer. And test is not.
Message-Id: <20200414114322.13268-1-calle@scylladb.com>
"
This is a continuation of recent efforts to cover more and more internal
de-serialization paths with `on_internal_error()`. Errors like this
should always be investigated but this can only be done with a core.
This patch covers the error paths of `composite::iterator` with
`on_internal_error()`. As we need this patch to investigate a 4.0
blocker issue (#6121) it only does the minimal amount of changes needed
to allow generating a core for de-serializiation failures of composites.
There are a few FIXMEs left in the code that I plan to address in a
follow-up.
Ref: #6121
"
* 'compound-on-internal-error/v1' of https://github.com/denesb/scylla:
compound_compat: composite::iterator cover error-paths with on_internal_error()
compound_compat: composite_view: add is_valid()
For a column of type `frozen<list<T>>` in base table, a corresponding
column of type `frozen<map<timeuuid, T>>` is created in cdc log.
Although a similar change of type takes place in case of non-frozen
lists, this is unneeded in case of frozen lists - frozen collections are
atomic, therefore there is no need for complicated type that will be
able to represent a column update that depends on its previous value
(e.g. appending elements to the end of the list).
Moreover, only cdc log table creation logic performs this type change
for frozen lists. The logic of `transformer::transform`, which is
responsible for creation of mutations to cdc log, assumes that atomic
columns will have their types unchanged in cdc log table. It simply
copies new value of the column from original mutation to the cdc log
mutation. A serialized frozen list might be copied to a field that is of
frozen map type, which may cause the field to become impossible to
deserialize.
This patch causes frozen list base table columns to have a corresponding
column in cdc log with the same type.
A test is added which asserts that the type of cdc log columns is not
changed in the case of frozen base columns.
Tests: unit(dev)
Fixes#6172
Fixes#5808
Seems some gcc:s will generate the code as sign extending. Mine does not,
but this should be more correct anyhow.
Added small stringify test to serialization_test for inet_address
"
We currently allow null on the right-hand side of certain relations, while Cassandra prohibits it. Since our handling of these null values is mostly incorrect, it's better to match Cassandra in prohibiting it.
See the discussion (https://github.com/scylladb/scylla/pull/5763#discussion_r405557323.
NB: any reverse mismatch (Scylla prohibiting something that Cassandra allows) is left remaining. For example, we forbid null bounds on clustering columns, which Cassandra allows.
Tests: unit (dev)
"
* dekimir-match-cass-null:
restrictions: Forbid null bound for nonkey columns
restrictions: Forbid null equality
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:
test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed
Ivan is working on a fix, but let's revert this commit to avoid blocking
next promotion failing from time to time.
This patch adds API endpoint /column_family/autocompaction/{name}
that listen to GET and POST requests to pick and control table
background compactions.
To implement that the patch introduces "_compaction_disabled_by_user"
flag that affects if CompactionManager is allowed to push background
compactions jobs into the work.
It introduces
table::enable_auto_compaction();
table::disable_auto_compaction();
bool table::is_auto_compaction_disabled_by_user() const
to control auto compaction state.
Fixes#1488Fixes#1808Fixes#440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
We typically use `std::bad_function_call` to throw from
mandatory-to-implement virtual functions, that cannot have a meaningful
implementation in the derived class. The problem with
`std::bad_function_call` is that it carries absolutely no information
w.r.t. where was it thrown from.
I originally wanted to replace `std::bad_function_call` in our codebase
with a custom exception type that would allow passing in the name of the
function it is thrown from to be included in the exception message.
However after I ended up also including a backtrace, Benny Halevy
pointed out that I might as well just throw `std:bad_function_call` with
a backtrace instead. So this is what this patch does.
All users are various unimplemented methods of the
`flat_mutation_reader::impl` interface.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>
Fixes#6143
When doing post-image generation, we also write values for columns not
in delta (actual update), based on data selected in pre-image row.
However, if we are doing initial update/insert with only a subset of
columns, when the pre-image result set is nil, this cannot be done.
Adds check to non-touched column post-image code. Also uses the
pre-image value extractor to handle non-atomic sets properly.
Tests updated.
But only non-validation error paths. When validating we do expect it to
maybe fail, so we don't want to generate cores for validation.
Validation is in fact a de-serialization pass with some additional
checks. To be able to keep reusing the same code for de-serialization
and validation just with different error handling, introduce a
`strict_mode` flag that can be passed to `composite::iterator`
constructor. When in strict mode (the default) the iterator will convert
any `marshal_exception` thrown during the de-serialization to
`on_internal_error()`.
We don't want anybody to use the iterator in non-strict mode, besides
validation, so the iterator constructors are made private. This is
standard practice for iterators anyway.
Any alter table statement that doesn't explicitly set the default time
to live will reset it to 0.
That can be very dangerous for time series use cases, which rely on
all data being eventually expired, and a default TTL of 0 means
data never being expired.
Fixes#5048.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6136 from
Piotr Sarna:
An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test (#5856) resulted in introducing an assertion (7616290)
which breaks when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.
Fixes#6134
Refs #5856
Tests: unit(dev),
dtest(TestSecondaryIndexes.test_truncate_base)
"
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.
There are two motivations for this work:
Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.
We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects
In order to achieve that, we start by changing the way resharding works:
it will now work with a read interposer, similar to the one TWCS uses for
streaming data. Once we do that, a lot of assumptions that exist in the
compaction code can be simplified and supporting TWCS major
compactions become a matter of simply enabling its interposer in the
compaction code as well.
There are many further simplifications that this work exposes:
The compaction method create_new_sstable seems out of place. It is not
used by resharding, and it seems duplicated for normal compactions. We
could clean it up with more refactoring in a later patch.
The whole logic of the feed_writer could be part of the consumer code.
Testing details:
scylla unit tests (dev, release)
sstable_datafile_test (debug)
dtests (resharding_test.py)
manual scylla resharding
Fixes#1431
"
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
* 'twcs-major-v3' of github.com:glommer/scylla:
compaction: make major compaction time-aware with TWCS
compaction: do resharding through an interposer
mutation_writer: introduce shard_based splitting writer
mutation_writer: factor out part of the code for the timestamp splitter
compaction: abort if create_new_sstable is called from resharding
In order to check regressions related to #6136 and similar issues,
test cases for handling paging state with empty partition/clustering
key pair are added.
This removes the need to include reactor.hh, a source of compile
time bloat.
In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.
Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().
Ref #1