"
Includes:
- code cleanups
- support for measuring data stores with more than one partition
- measure sstable footprint for all supported formats
- less verbose mode by default
"
* tag 'memory-footprint-test-improvement-v2' of github.com:tgrabiec/scylla:
test: memory_footprint: Silence logging by default
test: memory_footprint: Introduce --partition-count option
test: memory_footprint: Run under a cql_test_env
test: memory_footprint: Calculate sstable size for each format version
sstables: Move all_sstable_versions to version.hh
Some legacy `mc` SSTables (created in Scylla 3.0) may contain incorrect
serialization headers, which don't wrap frozen UDTs nested inside collections
with the FrozenType<...> tag. When reading such an SSTable,
Scylla would detect a mismatch between the schema saved in the schema
tables (which correctly wraps UDTs in the FrozenType<...> tag) and the schema
from the serialization header (which doesn't have these tags).
SSTables created in Scylla versions 3.1 and above, in particular in
Scylla versions that contain this commit, create correct serialization
headers (which wrap UDTs in the FrozenType<...> tag).
This commit does two things:
1. for all SSTables created after this commit, include a new feature
flag, CorrectUDTsInCollections, presence of which implies that frozen
UDTs inside collections have the FrozenType<...> tag.
2. when reading a Scylla SSTable without the feature flag, we assume that UDTs
nested inside collections are always frozen, even if they don't have
the tag. This assumption is safe to make, because at the time of
this commit, Scylla does not allow non-frozen (multi-cell) types inside
collections or UDTs, and because of point 1 above.
There is one edge case not covered: if we don't know whether the SSTable
comes from Scylla or from C*. In that case we won't make the assumption
described in 2. Therefore, if we get a mismatch between schema and
serialization headers of a table which we couldn't confirm to come from
Scylla, we will still reject the table. If any user encounters such an
issue (unlikely), we will have to use another solution, e.g. using a
separate tool to rewrite the SSTable.
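The reading-side rule (point 2 above, plus the edge case) can be sketched roughly as follows. This is a hypothetical, simplified stand-in: the real code works on Scylla's abstract_type hierarchy and SSTable feature-flag plumbing, not these toy types.

```cpp
#include <cassert>

// Toy model of the rule described above.
enum class sstable_origin { scylla, cassandra, unknown };

struct header_type {
    bool is_udt_in_collection; // a UDT nested inside a collection
    bool has_frozen_tag;       // wrapped in FrozenType<...> in the header
};

// May the reader treat this type as frozen?
bool assume_frozen(sstable_origin origin, bool has_correct_udts_flag,
                   const header_type& t) {
    if (t.has_frozen_tag) {
        return true;                 // tag present: nothing to assume
    }
    if (!t.is_udt_in_collection) {
        return false;
    }
    // Untagged UDT inside a collection: safe only for SSTables known to
    // come from Scylla and written without the CorrectUDTsInCollections
    // flag, since those versions never allowed non-frozen types inside
    // collections. An unknown origin means we cannot assume anything.
    return origin == sstable_origin::scylla && !has_correct_udts_flag;
}
```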
Fixes #6130.
There is no reason to read a single SSTable at a time from the staging
directory. Moving SSTables from staging directory essentially involves
scanning input SSTables and creating new SSTables (albeit in a different
directory).
We have a mechanism that does that: compactions. In a follow up patch, I
will introduce a new specialization of compaction that moves SSTables
from staging (potentially compacting them if there are plenty).
In preparation for that, some signatures have to be changed and the
view_updating_consumer has to be more compaction friendly. Meaning:
- Operating with an sstable vector
- Taking a table reference, not a database
Because this code is a bit fragile and the reviewer set is fundamentally
different from anything compaction related, I am sending this separately.
* glommer-view_build:
staging: potentially read many SSTables at the same time
view_build_test: make sure it works with smp > 1
This test doesn't work with higher smp counts, because it relies on
dealing with keys named 'a' and 'b' and creates SSTables containing one
of them manually. This throws an exception if we happen to execute on
a shard that doesn't own the tokens corresponding to those keys.
This patch avoids that problem by pre-selecting keys that we know to
belong to the current shard in which the test is executed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Fixes #6195
test_commitlog_delete_when_over_disk_limit reads the current segment list
in the flush handler, to compare with the result after allowing deletion
of a segment. However, in rare cases the handler might be called more
than once, because of timing and the rather small sizes we use.
Reading the list a second time is not a good idea, because it might very
well be exactly the same as what we read in the test check code, and we
end up overwriting the list we want to check against: the callback runs
on a timer, while the test does not.
Message-Id: <20200414114322.13268-1-calle@scylladb.com>
"
This is a continuation of recent efforts to cover more and more internal
de-serialization paths with `on_internal_error()`. Errors like this
should always be investigated but this can only be done with a core.
This patch covers the error paths of `composite::iterator` with
`on_internal_error()`. As we need this patch to investigate a 4.0
blocker issue (#6121) it only does the minimal amount of changes needed
to allow generating a core for de-serialization failures of composites.
There are a few FIXMEs left in the code that I plan to address in a
follow-up.
Ref: #6121
"
* 'compound-on-internal-error/v1' of https://github.com/denesb/scylla:
compound_compat: composite::iterator cover error-paths with on_internal_error()
compound_compat: composite_view: add is_valid()
For a column of type `frozen<list<T>>` in base table, a corresponding
column of type `frozen<map<timeuuid, T>>` is created in cdc log.
Although a similar change of type takes place in case of non-frozen
lists, this is unneeded in case of frozen lists - frozen collections are
atomic, therefore there is no need for complicated type that will be
able to represent a column update that depends on its previous value
(e.g. appending elements to the end of the list).
Moreover, only cdc log table creation logic performs this type change
for frozen lists. The logic of `transformer::transform`, which is
responsible for creation of mutations to cdc log, assumes that atomic
columns will have their types unchanged in cdc log table. It simply
copies new value of the column from original mutation to the cdc log
mutation. A serialized frozen list might be copied to a field that is of
frozen map type, which may cause the field to become impossible to
deserialize.
This patch causes frozen list base table columns to have a corresponding
column in cdc log with the same type.
A test is added which asserts that the type of cdc log columns is not
changed in the case of frozen base columns.
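As a rough illustration of the rule (a toy enum with hypothetical names; the real logic maps actual column types during cdc log-table creation):

```cpp
#include <cassert>

// Toy model of the column-type rule described above.
enum class col_type { frozen_list, nonfrozen_list, frozen_timeuuid_map, atomic_other };

// Type of the corresponding column in the cdc log table.
col_type cdc_log_column_type(col_type base) {
    if (base == col_type::nonfrozen_list) {
        // non-frozen list<T> -> frozen<map<timeuuid, T>>: needed to
        // represent updates that depend on previous state, e.g. appends.
        return col_type::frozen_timeuuid_map;
    }
    // Atomic columns, now including frozen<list<T>>, keep their type.
    return base;
}
```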
Tests: unit(dev)
Fixes #6172
We have in alternator-test a set of over 340 functional tests for
Alternator. These tests are written in Python using the pytest
framework, expect Scylla to be running and connect to it using the
DynamoDB API with the "boto3" library (the AWS SDK for Python).
We have a script alternator-test/run which does everything needed
to run all these tests: Starts Scylla with the appropriate parameters
in a temporary directory, runs all the tests against it, and makes
sure the temporary directory is removed (regardless of whether the
tests succeeded or failed).
The goal of this small patch series is to integrate these Alternator
tests into test.py in a *simple* way. The idea is that we add *one*
test which just runs the aforementioned "run" script which does its
own business.
The changes we needed to do in this series to achieve this are:
1. Make the alternator-test/run script pick a unique IP address on which
to listen, instead of always using 127.0.0.1. This allows running
this test in parallel with dtest tests, or even parallel to itself.
2. Move the alternator-test directory to test/alternator. This is
the directory where test.py expects all the tests to live in.
It also makes sense - since we already have multiple subdirectories
in test/, to put the Alternator tests there too.
3. Add a new test suite type, "Run". A "Run" suite is simply a directory
with a script called "run"; running this script runs the entire
suite, and the script does its own business.
4. Tests (such as the new "Run" ones) which can be killed gently and clean
up after themselves should be killed with SIGTERM instead of
SIGKILL.
After this series, to run the Alternator tests from test.py, do:
./test.py --mode dev alternator
Note that in this version, the "--mode" has no effect - test/alternator/run
always runs the latest compiled Scylla, regardless of the chosen mode.
This can be fixed later.
The Alternator tests can still be run manually and individually against
a running Scylla or DynamoDB as before - just go to the test/alternator
directory and run "pytest" with the desired parameters.
Fixes #6046
* nyh/alternator-test-v3:
alternator-test: make Alternator tests runnable from test.py
test.py: add xunit XML output file for "Run" tests
test.py: add new test type "Run"
test.py: flag for aborting tests with SIGTERM, not SIGKILL
alternator-test: change "run" script to pick random IP address
alternator-test: add "--url" option to choose Alternator's URL
Fixes #5808
It seems some gcc versions will generate sign-extending code. Mine does
not, but this should be more correct anyhow.
Added small stringify test to serialization_test for inet_address
"
We currently allow null on the right-hand side of certain relations, while Cassandra prohibits it. Since our handling of these null values is mostly incorrect, it's better to match Cassandra in prohibiting it.
See the discussion (https://github.com/scylladb/scylla/pull/5763#discussion_r405557323).
NB: any reverse mismatch (Scylla prohibiting something that Cassandra allows) is left remaining. For example, we forbid null bounds on clustering columns, which Cassandra allows.
Tests: unit (dev)
"
* dekimir-match-cass-null:
restrictions: Forbid null bound for nonkey columns
restrictions: Forbid null equality
To make the tests in alternator-test runnable by test.py, we need to
move the directory alternator-test/ to test/alternator, because test.py
only looks for tests in subdirectories of test/. Then, we need to create
a test/alternator/suite.yaml saying that this test directory is of type
"Run", i.e., it has a single run script "run" which runs all its tests.
The "run" script had to be slightly modified to be aware of its new
location relative to the source directory.
To run the Alternator tests from test.py, do:
./test.py --mode dev alternator
Note that in this version, the "--mode" has no effect - test/alternator/run
always runs the latest compiled Scylla, regardless of the chosen mode.
The Alternator tests can still be run manually and individually against
a running Scylla or DynamoDB as before - just go to the test/alternator
directory (instead of alternator-test previously) and run "pytest" with
the desired parameters.
Fixes #6046
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This reverts commit 1c444b7e1e. The test
it adds sometimes fails as follows:
test/boost/sstable_datafile_test.cc(1076): fatal error: in "autocompaction_control_test":
critical check cm->get_stats().pending_tasks == 1 || cm->get_stats().active_tasks == 1 has failed
Ivan is working on a fix, but let's revert this commit to avoid the
next promotion failing from time to time.
This patch adds an API endpoint, /column_family/autocompaction/{name},
that listens to GET and POST requests to query and control a table's
background compactions.
To implement that, the patch introduces a "_compaction_disabled_by_user"
flag that controls whether the compaction manager is allowed to submit
background compaction jobs.
It introduces
table::enable_auto_compaction();
table::disable_auto_compaction();
bool table::is_auto_compaction_disabled_by_user() const
to control auto compaction state.
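A minimal sketch of the flag and its consumers (method names follow the commit message, but the surrounding classes are hypothetical stand-ins for the real table and compaction manager):

```cpp
#include <cassert>

// Simplified stand-in for the per-table auto-compaction control.
class table {
    bool _compaction_disabled_by_user = false;
public:
    void enable_auto_compaction()  { _compaction_disabled_by_user = false; }
    void disable_auto_compaction() { _compaction_disabled_by_user = true; }
    bool is_auto_compaction_disabled_by_user() const {
        return _compaction_disabled_by_user;
    }
};

// The compaction manager checks the flag before queuing background jobs.
bool may_submit_background_compaction(const table& t) {
    return !t.is_auto_compaction_disabled_by_user();
}
```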
Fixes #1488
Fixes #1808
Fixes #440
Tests: unit(sstable_datafile_test autocompaction_control_test), manual
We typically use `std::bad_function_call` to throw from
mandatory-to-implement virtual functions that cannot have a meaningful
implementation in the derived class. The problem with
`std::bad_function_call` is that it carries absolutely no information
about where it was thrown from.
I originally wanted to replace `std::bad_function_call` in our codebase
with a custom exception type that would allow passing in the name of the
function it is thrown from to be included in the exception message.
However, after I ended up also including a backtrace, Benny Halevy
pointed out that I might as well just throw `std::bad_function_call` with
a backtrace instead. So this is what this patch does.
All users are various unimplemented methods of the
`flat_mutation_reader::impl` interface.
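Seastar's helper for this is seastar::throw_with_backtrace<Exc>(); the standalone sketch below imitates the call sites, with the backtrace capture itself elided (the stand-in simply throws):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for seastar::throw_with_backtrace<Exc>(): the real helper
// captures the current backtrace and embeds it in the exception's
// what() string, so the origin of the throw shows up in logs.
template <typename Exc, typename... Args>
[[noreturn]] void throw_with_backtrace(Args&&... args) {
    throw Exc(std::forward<Args>(args)...);
}

// Typical user: a mandatory-to-implement virtual that has no meaningful
// default, as on flat_mutation_reader::impl in Scylla.
struct reader_impl {
    virtual ~reader_impl() = default;
    virtual void fast_forward_to() {
        throw_with_backtrace<std::bad_function_call>();
    }
};
```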
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>
Fixes #6143
When doing post-image generation, we also write values for columns not
in delta (actual update), based on data selected in pre-image row.
However, if we are doing an initial update/insert with only a subset of
columns, the pre-image result set is nil and this cannot be done.
Adds check to non-touched column post-image code. Also uses the
pre-image value extractor to handle non-atomic sets properly.
Tests updated.
But only non-validation error paths. When validating we do expect it to
maybe fail, so we don't want to generate cores for validation.
Validation is in fact a de-serialization pass with some additional
checks. To be able to keep reusing the same code for de-serialization
and validation just with different error handling, introduce a
`strict_mode` flag that can be passed to `composite::iterator`
constructor. When in strict mode (the default) the iterator will convert
any `marshal_exception` thrown during the de-serialization to
`on_internal_error()`.
We don't want anybody to use the iterator in non-strict mode, besides
validation, so the iterator constructors are made private. This is
standard practice for iterators anyway.
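A condensed sketch of the control flow (hypothetical function names; in Scylla, on_internal_error() logs and aborts so a core file is produced):

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

struct marshal_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class strict_mode { yes, no };

[[noreturn]] void on_internal_error(const std::string&) {
    std::abort(); // real code logs the message and a backtrace first
}

// Shared de-serialization path: strict mode escalates failures, while the
// non-strict (validation-only) mode lets the caller handle them.
void deserialize_component(bool corrupt, strict_mode mode) {
    try {
        if (corrupt) {
            throw marshal_exception("bad composite component");
        }
        // ... actual decoding would happen here ...
    } catch (const marshal_exception& e) {
        if (mode == strict_mode::yes) {
            on_internal_error(e.what()); // never returns
        }
        throw; // validation path: report, don't crash
    }
}
```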
Any alter table statement that doesn't explicitly set the default time
to live will reset it to 0.
That can be very dangerous for time series use cases, which rely on
all data being eventually expired, and a default TTL of 0 means
data never being expired.
Fixes #5048.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6136 from
Piotr Sarna:
An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test (#5856) resulted in introducing an assertion (7616290)
which breaks when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.
Fixes #6134
Refs #5856
Tests: unit(dev),
dtest(TestSecondaryIndexes.test_truncate_base)
"
This patch makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.
There are two motivations for this work:
1. Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done, in which case data will have a hard time
expiring later.
2. We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects.
In order to achieve that, we start by changing the way resharding works:
it will now work with a read interposer, similar to the one TWCS uses for
streaming data. Once we do that, a lot of assumptions that exist in the
compaction code can be simplified, and supporting TWCS major
compactions becomes a matter of simply enabling its interposer in the
compaction code as well.
There are many further simplifications that this work exposes:
The compaction method create_new_sstable seems out of place. It is not
used by resharding, and it seems duplicated for normal compactions. We
could clean it up with more refactoring in a later patch.
The whole logic of the feed_writer could be part of the consumer code.
Testing details:
scylla unit tests (dev, release)
sstable_datafile_test (debug)
dtests (resharding_test.py)
manual scylla resharding
Fixes #1431
"
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
* 'twcs-major-v3' of github.com:glommer/scylla:
compaction: make major compaction time-aware with TWCS
compaction: do resharding through an interposer
mutation_writer: introduce shard_based splitting writer
mutation_writer: factor out part of the code for the timestamp splitter
compaction: abort if create_new_sstable is called from resharding
When trying to get rid of a large stack warning for gossip test,
I found out that it actually does not run at all for multiple reasons:
1. It segfaults due to wrong initialization order
2. After fixing that, it segfaults on use-after-free (due to capturing
a shared pointer by reference instead of by copy)
3. After that, cleanups are in order:
* seastar thread does not need to be spawned inside another thread;
* default captures are harmful, so they're made explicit instead;
* db::config is moved to heap, to finally get rid of the warning.
Tests: manual(gossip)
Message-Id: <feaca415d0d29a16c541f9987645365310663630.1585128338.git.sarna@scylladb.com>
In order to check regressions related to #6136 and similar issues,
test cases for handling paging state with empty partition/clustering
key pair are added.
This removes the need to include reactor.hh, a source of compile
time bloat.
In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.
Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().
Ref #1
This allows us to drop a #include <reactor.hh>, reducing compile time.
Several translation units that lost access to required declarations
are updated with the required includes (this can be an include of
reactor.hh itself, in case the translation unit that lost it got it
indirectly via logalloc.hh)
Ref #1.
This patch makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.
There are two motivations for this work:
1. Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.
2. We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects.
With the motivation out of the way, let's talk about the implementation:
The implementation is quite simple and builds upon the previous patches.
It simply specializes the interposer implementation for regular compaction
with a table-specific interposer.
Fixes #1431
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
This patch series is part of my effort to make resharding less special -
and hopefully less problematic. The next steps are a bit heavy, so I'd
like to, if possible, get this out of the way.
After these two patches, there is no more need to ever call
reshard_sstables: compact_sstables will do, and it will be able to
recognize resharding compactions.
To do that we need to unify the creator function, which is trivially
done by adding a shard parameter to regular compactions as well: they
can just ignore it. I have considered just making the
compaction_descriptor have a virtual create() function and specializing
it, but because we have to store the creator in the compaction object I
decided to keep the virtual function for now.
In a later cleanup step, if we can for instance store the entire
compaction_descriptor object in the compaction object we could do that.
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit tests (dev), dtest (resharding.py)
"
* 'resharding-through-compact-sstables' of github.com:glommer/scylla:
resharding: get rid of special reshard_sstables
compaction: enhance compaction_descriptor with creator and replace function
Always enable lightweight transactions. Remove the check for the command
line switch from the feature service, assuming LWT is always enabled.
Remove the check for LWT from Alternator.
Note that in order for the cluster to work with LWT, all nodes need
to support it.
Rename LWT to UNUSED in db/config.hh, to keep accepting the lwt keyword
in the --experimental-features command line option, but do nothing with it.
Changes in v2:
* remove enable_lwt feature flag, it's always there
Closes #6102
test: unit (dev, debug)
Message-Id: <20200401071149.41921-1-kostja@scylladb.com>
There is a method, reshard_sstables(), whose sole purpose is to call a
resharding compaction. There is nothing special about this method: all
the information it needs is now present in the compaction_descriptor.
This patch extends the compaction_options class to recognize resharding
compactions as well, and uses that so that make_compaction() can also
create resharding compactions.
To make that happen we have to create a compaction_descriptor object in
the resharding method. Note however that resharding works by passing an
object very close to the compaction_descriptor around. Once this patch
is merged, a logical next step is to reuse it, and avoid creating the
descriptor right before calling compact_sstables().
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are many differences between resharding and compaction that are
artificial, arising more from the way we ended up implementing it than
from necessity. This patch attempts to pass the creator and replacer functions
through the compaction_descriptor.
There is a difference between the creator function for resharding and
regular compaction: resharding has to pass the shard number on behalf
of which the SSTable is created. However regular compactions can just
ignore this. No need to have a special path just for this.
After this is done, the constructor for the compaction object can be
greatly simplified. In further patches I intend to simplify it a bit
further, but some more cleanup has to happen first.
To make that happen we have to construct a compaction_descriptor object
inside the resharding function. This is temporary: resharding currently
works with a descriptor, but at some point that descriptor is lost and
broken into pieces to be passed to this function. The overarching goal
of this work is exactly to be able to keep that descriptor for as long
as possible, which should simplify things a lot.
Callers are patched, but there are plenty in sstable_datafile_test.cc.
For their benefit, a helper function is provided to keep the previous
signature (test only).
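The shape of the unified creator can be sketched like this (a pared-down, hypothetical model; the real compaction_descriptor carries much more state):

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct sstable {
    unsigned owner_shard;
};

// Toy descriptor: the creator takes the shard on whose behalf the output
// SSTable is created; regular compaction simply ignores the argument.
struct compaction_descriptor {
    std::function<std::shared_ptr<sstable>(unsigned shard)> creator;
};

compaction_descriptor make_regular_descriptor(unsigned this_shard) {
    return { [this_shard](unsigned) {
        // regular compaction: output always belongs to the local shard
        return std::make_shared<sstable>(sstable{this_shard});
    } };
}

compaction_descriptor make_resharding_descriptor() {
    return { [](unsigned shard) {
        // resharding: create the SSTable for the shard that will own it
        return std::make_shared<sstable>(sstable{shard});
    } };
}
```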
Signed-off-by: Glauber Costa <glauber@scylladb.com>
"
Currently, both sharding and partitioning logic is encapsulated into partitioners. This is not desirable because these two concepts are totally independent and shouldn't be coupled together in such a way.
This PR separates sharding and partitioning. Partitioning will still live in i_partitioner class and its subclasses. Sharding is extracted to a new class called sharding_info. Both partitioners and sharding_info are still managed by schema class. Partitioner can be accessed with schema::get_partitioner while sharding_info can be accessed with schema::get_sharding_info.
The transition is done in steps:
1. sharding_info class is defined and all the sharding logic is extracted from partitioner to the new class. Temporarily sharding_info is still embedded into i_partitioner and all sharding-related functions in i_partitioner delegate to the embedded sharding_info object.
2. All calls to i_partitioner functions that are related to sharding are gradually switched to calls to their sharding_info equivalents.
3. Once everything uses sharding_info, all sharding logic is dropped from i_partitioner.
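The separation can be pictured with two tiny stand-in classes (hypothetical simplifications; the real i_partitioner and the extracted sharding class have much richer interfaces):

```cpp
#include <cassert>
#include <cstdint>

using token = uint64_t;

// Partitioning: key -> token (stand-in for i_partitioner subclasses).
struct partitioner {
    token get_token(uint64_t key) const {
        return key * 0x9e3779b97f4a7c15ull; // toy mixer, not murmur3
    }
};

// Sharding: token -> shard (stand-in for the extracted class).
struct sharder {
    unsigned shard_count;
    unsigned shard_of(token t) const {
        return static_cast<unsigned>(t % shard_count); // toy mapping
    }
};
```

With this split a schema can hand out both objects independently, per the commit message via schema::get_partitioner and schema::get_sharding_info (later renamed to sharder).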
Tests: unit(dev, release)
"
* haaawk-sharding_info: (32 commits)
dummy_sharder: rename dummy_sharding_info.* to dummy_sharder.*
sharding_info: rename the class to sharder
i_partitioner: remove embedded sharding_info
i_partitioner: remove unused get_sharding_info
schema: remove incorrect comment
schema: make it possible to set sharding_info per schema
i_partitioner: remove unused shard_count
multishard_writer: stop calling i_partitioner::shard_count
i_partitioner: remove sharding_ignore_msb
partitioner_test: test ranges and sharding_infos
i_partitioner: remove unused split_ranges_to_shards
i_partitioner: remove unused shard_of function
sstable-utils: use sharding_info::shard_of
create_token_range_from_keys: use sharding info for shard_of
multishard_mutation_query_test: use sharding info for shard_of
distribute_reader_and_consume_on_shards: use sharding_info::shard_of
multishard_mutation_query: use sharding_info::shard_of
dht::shard_of: use schema::get_sharding_info
i_partitioner: remove unused token_for_next_shard
split_range_to_single_shard: use sharding info instead of partitioner
...