"
This is hopefully the last large refactoring on the way to UDF.
In UDF we have to convert internal types to Lua and back. Currently
almost all our types are hidden in types.cc and expose functionality
via virtual functions. While it should be possible to add
convert_{to|from}_lua virtual functions, that seems like a bad design.
In compilers, the type definitions are normally public and different
passes know how to reason about each type. The alias analysis knows
about ints and floats, not the other way around.
This patch series is inspired by both the LLVM RTTI
(https://www.llvm.org/docs/HowToSetUpLLVMStyleRTTI.html) and
std::variant.
The series makes the types public, adds a visit function and converts
the various virtual methods to just use visit. As a small example of
why this is useful, it then moves a bit of cql3 and json specific
logic out of types.cc and types.hh. In a similar way, the UDF code
will be able to use visit to convert objects to Lua.
In comparison with the previous versions, this series doesn't require
the intermediate step of converting void* to data_value& in a few
member functions. This version also has fewer double dispatches, and I
am fairly confident it now has all the tools needed to avoid double
dispatch entirely.
"
* 'simplify-types-v3' of https://github.com/espindola/scylla: (80 commits)
types: Move abstract_type visit to a header
types: Move uuid_type_impl to a header
types: Move inet_addr_type_impl to a header
types: Move varint_type_impl to a header
types: Move timeuuid_type_impl to a header
types: Move date_type_impl to a header
types: Move bytes_type_impl to a header
types: Move utf8_type_impl to a header
types: Move ascii_type_impl to a header
types: Move string_type_impl to a header
types: Move time_type_impl to a header
types: Move simple_date_type_impl to a header
types: Move timestamp_type_impl to a header
types: Move duration_type_impl to a header
types: Move decimal_type_impl to a header
types: Move floating point types to a header
types: Move boolean_type_impl to a header
types: Move integer types to a header
types: Move integer_type_impl to a header
types: Move simple_type_impl to a header
...
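The visit-based design described above can be sketched roughly as follows. This is a minimal illustration of LLVM-style RTTI combined with a visit() helper, not the actual Scylla code; the class layout, the `kind` enum, and `type_name()` are simplified stand-ins:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Base class carries a public kind tag instead of relying purely on
// virtual functions, in the style of LLVM's hand-rolled RTTI.
struct abstract_type {
    enum class kind { int32, text };
    explicit abstract_type(kind k) : _kind(k) {}
    kind get_kind() const { return _kind; }
    virtual ~abstract_type() = default;
private:
    kind _kind;
};

struct int32_type_impl : abstract_type {
    int32_type_impl() : abstract_type(kind::int32) {}
};
struct text_type_impl : abstract_type {
    text_type_impl() : abstract_type(kind::text) {}
};

// visit() dispatches on the public tag, so passes (CQL, JSON, Lua
// conversion) can live outside the type definitions.
template <typename Visitor>
auto visit(const abstract_type& t, Visitor&& v) {
    switch (t.get_kind()) {
    case abstract_type::kind::int32:
        return v(static_cast<const int32_type_impl&>(t));
    case abstract_type::kind::text:
        return v(static_cast<const text_type_impl&>(t));
    }
    __builtin_unreachable();
}

// Example "pass" written without adding any virtual function to the types.
std::string type_name(const abstract_type& t) {
    return visit(t, [](const auto& concrete) -> std::string {
        if constexpr (std::is_same_v<std::decay_t<decltype(concrete)>,
                                     int32_type_impl>) {
            return "int32";
        } else {
            return "text";
        }
    });
}
```

A Lua converter would be just another such visitor, added without touching types.cc.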
With this patch the logic for walking all nested types is moved to a
helper function. It also fixes reversed_type_impl not being handled in
references_duration.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The corresponding code is correct, but while refactoring it I noticed
that no tests would fail if it were broken.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
There are two reasons for the move. First, the cql server's lifetime
is shorter than the storage_proxy's, and the latter stores references
to the semaphore in each service_permit it holds. Second, we want thrift
(and in the future other user APIs) to share the same admission control
memory pool.
Fixes #4844
Message-Id: <20190814142614.GT17984@scylladb.com>
Make the reader recreation logic more robust, by moving away from
deciding which fragments have to be dropped based on a bunch of
special cases, instead replacing this with a general logic which just
drops all already seen fragments (based on their position). Special
handling is added for the case when the last position is a range
tombstone with a non full prefix starting position. Reproducer unit
tests are added for both cases.
Refs #4695
Fixes #4733
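The "drop all already seen fragments" rule described above can be illustrated with a toy model. Positions are plain ints here and the function name is hypothetical; the real reader compares `position_in_partition` values:

```cpp
#include <algorithm>
#include <vector>

// On reader recreation, discard every fragment at or before the last
// position already emitted, instead of enumerating special cases.
std::vector<int> drop_seen_fragments(const std::vector<int>& stream,
                                     int last_emitted) {
    std::vector<int> out;
    std::copy_if(stream.begin(), stream.end(), std::back_inserter(out),
                 [&](int pos) { return pos > last_emitted; });
    return out;
}
```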
Add `single_fragment_buffer` test variable. When set, the shard readers
are created with a max buffer size of 1, effectively causing them to
read a single fragment at a time. This, when combined with
`evict_readers=true` will stress the recreate reader logic to the max.
Be more tolerant with different but equivalent representation of range
deletions. When expecting a range tombstone, keep reading range
tombstones while these can be merged with the cumulative range
tombstone, resulting from the merging of the previous range tombstones.
This results in tests tolerating range tombstones that are split into
several, potentially overlapping range tombstones, representing the
same underlying deletion.
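The "keep merging while mergeable" tolerance above can be sketched with a toy cumulative tombstone. Ranges are modeled as [start, end] ints and the names are illustrative, not the real range_tombstone API:

```cpp
#include <algorithm>

// Toy range tombstone covering [start, end].
struct range_tombstone {
    int start;
    int end;
};

// Absorb `next` into the cumulative tombstone `acc` if they overlap or
// touch; several overlapping tombstones then compare equal to a single
// one covering the same range.
bool try_merge(range_tombstone& acc, const range_tombstone& next) {
    if (next.start > acc.end) {
        return false; // disjoint: stop merging
    }
    acc.end = std::max(acc.end, next.end);
    return true;
}
```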
This is tailored to the multishard_combining_reader, to make sure it
doesn't lose rows following a range tombstone with a prefix starting
position (rows whose keys fall into that prefix).
Match SQL's LIKE in allowing an empty pattern, which matches only
an empty text field.
Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
The current cql transport code acquires a permit before processing a
query and releases it when the query gets a reply, but some queries
leave work behind. If this work is allowed to accumulate without any
limit, a server may eventually run out of memory. To prevent that, the
permit system should account for the background work as well. This patch
is a first step in that direction: it passes a permit down to storage
proxy, where it will later be held by background work.
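The permit idea above amounts to a shared RAII handle: memory accounting is released only when the last holder, including background work, drops it. A minimal sketch with hypothetical names (the real type is service_permit):

```cpp
#include <memory>

// Accounts one outstanding unit of work for as long as it lives.
struct semaphore_units {
    int& outstanding;
    explicit semaphore_units(int& o) : outstanding(o) { ++outstanding; }
    ~semaphore_units() { --outstanding; }
};

// Shared ownership: a query handler and any background work it spawns
// hold copies of the same permit.
using service_permit = std::shared_ptr<semaphore_units>;

service_permit acquire_permit(int& outstanding) {
    return std::make_shared<semaphore_units>(outstanding);
}
```

The query path can reply and drop its copy while background work keeps the permit (and the accounted memory) alive.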
ignore_partial_runs() brings confusion because i__p__r() == true
doesn't mean partial runs are filtered out from compaction. It actually
means not caring that a run being compacted is partial.
The logic was wrong because under any compaction strategy that chooses
not to ignore partial sstable runs[1], every fragment composing such a
run would incorrectly become a candidate for compaction.
This problem could make compaction include only a subset of the
fragments composing the partial run, or even make the same fragment be
compacted twice due to parallel compaction.
[1]: a partial sstable run is an sstable run that is still being
generated by compaction and as a result cannot be selected as a
candidate whatsoever.
The fix makes sure a partial sstable run has none of its fragments
selected for compaction, and also renames i__p__r.
Fixes#4729.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190807022814.12567-1-raphaelsc@scylladb.com>
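The fix described above boils down to a filter over candidate fragments. A toy version, with illustrative names only (the real code works on sstable run identifiers inside the compaction strategy):

```cpp
#include <algorithm>
#include <vector>

// A fragment belongs to an sstable run identified by run_id.
struct sstable_fragment {
    int run_id;
};

// Exclude every fragment whose run is still being written (partial),
// so no fragment of a partial run is ever picked for compaction.
std::vector<sstable_fragment> compaction_candidates(
        const std::vector<sstable_fragment>& all,
        const std::vector<int>& partial_run_ids) {
    std::vector<sstable_fragment> out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out),
                 [&](const sstable_fragment& f) {
                     return std::find(partial_run_ids.begin(),
                                      partial_run_ids.end(),
                                      f.run_id) == partial_run_ids.end();
                 });
    return out;
}
```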
The files in tests/sstables/3.x/compressed/ were not used in the tests.
This commit:
- renames the directory to tests/sstables/3.x/lz4/,
- adds analogous directories and files for other compressors,
- adds tests using these files,
- does some minor refactoring.
Signed-off-by: Kamil Braun <kbraun@scylladb.com>
"
Not emitting partition_end for a partition is incorrect. The sstable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which will result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.
It's better to catch this problem at the time of writing, and not
generate bad sstables.
Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.
Enabled for both mc and la/ka.
Passing --abort-on-internal-error on the command line will switch to
aborting instead of throwing an exception.
The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
"
* 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla:
sstables: writer: Validate that partition is closed when the input mutation stream ends
config, exceptions: Add helper for handling internal errors
utils: config_file: Introduce named_value::observe()
The handler is intended to be called when internal invariants are
violated and the operation cannot safely continue. The handler either
throws (default) or aborts, depending on configuration option.
Passing --abort-on-internal-error on the command line will switch to
aborting.
The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
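A minimal sketch of such a handler, with illustrative names and logic rather than the actual Scylla implementation:

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Set from the --abort-on-internal-error command line option.
bool abort_on_internal_error = false;

// Called when an internal invariant is violated and the operation
// cannot safely continue: abort (core dump for debugging) or throw
// (fail only the affected operation), depending on configuration.
[[noreturn]] void on_internal_error(const std::string& msg) {
    if (abort_on_internal_error) {
        std::abort();
    }
    throw std::runtime_error(msg);
}
```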
Use the query_time passed in to the populate function and forward it to
the sstable constructor, so that the compaction happening during sstable
write uses the same query time as any compaction done by the mutation
source test suite.
This way, tests that do compaction can pass the query_time they used
for it to clients that do some compaction themselves, making sure all
compactions happen with the same query time and avoiding false positive
test failures.
"
Fixes#4663.
Fixes#4718.
"
* 'make_cleanup_run_aware_v3' of https://github.com/raphaelsc/scylla:
tests/sstable_datafile_test: Check cleaned sstable is generated with expected run id
table: Make SSTable cleanup run aware
compaction: introduce constants for compaction descriptor
compaction: Make it possible to config the identifier of the output sstable run
table: do not rely on undefined behavior in cleanup_sstables
"
This series is a batch of first small steps towards devirtualizing CQL restrictions:
- one artificial parent class in the hierarchy is removed: abstract_restriction
- the following functions are devirtualized:
* is_EQ()
* is_IN()
* is_slice()
* is_contains()
* is_LIKE()
* is_on_token()
* is_multi_column()
Future steps can involve the following:
- introducing a std::variant of restriction targets: it's either a column
or a vector of columns
- introducing a std::variant of restriction values: it's one of:
{term, term_slice, std::vector<term>, abstract_marker}
The steps above will allow devirtualizing most of the remaining virtual functions
in favor of std::visit. They will also reduce the number of subclasses,
e.g. what's currently `token_restriction::IN_with_values` can be just an instance
of `restriction`, knowing that it's on a token, having a target of std::vector<column>
and a value of std::vector<term>.
Tests: unit(dev), dtest: cql_tests, cql_additional_tests
"
* 'refactor_restrictions_2' of https://github.com/psarna/scylla:
cql3: devirtualize is_on_token()
cql3: devirtualize is_multi_column()
cql3: devirtualize is_EQ, is_IN, is_contains, is_slice, is_LIKE
tests: add enum_set adding case
cql3: allow adding enum_sets
cql3: remove abstract_restriction class
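The std::variant direction outlined in the series description can be sketched as a single `restriction` struct with variant target and value members; the subclass-per-combination hierarchy collapses into plain data. `term` and `column` are simplified stand-ins for the real CQL types:

```cpp
#include <string>
#include <variant>
#include <vector>

struct term { std::string text; };
struct column { std::string name; };

// One restriction type instead of a subclass hierarchy: the target is
// either a column or a vector of columns, the value one of the value
// shapes (only two shown here for brevity).
struct restriction {
    std::variant<column, std::vector<column>> target;
    std::variant<term, std::vector<term>> value;
    bool on_token = false;
};

// What is currently token_restriction::IN_with_values becomes just an
// instance of restriction: on a token, targeting a vector of columns,
// valued by a vector of terms.
restriction make_token_in(std::vector<column> cols, std::vector<term> values) {
    return restriction{std::move(cols), std::move(values), true};
}
```

Dispatch over such restrictions then goes through std::visit instead of virtual calls.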
Currently all cells generated by this method use a ttl of 1. This
causes test flakiness, as tests often compact the input and output
mutations to weed out artificial differences between them. If this
compaction is not done with the exact same query time, then some cells
will be expired in one compaction but not in the other.
733c68cb1 attempted to solve this by passing the same query time to
`flat_mutation_reader_assertions::produce_compacted()` as well as
`mutation_partition::compact_for_query()` when compacting the input
mutation. However a hidden compaction spot remained: the ka/la sstable
writer also does some compaction, and for this it uses the time point
passed to the `sstable` constructor, which defaults to
`gc_clock::now()`. This leads to false positive failures in
`sstable_mutation_test.cc`.
At this point I don't know what the original intent was behind this low
`ttl` value. To solve the immediate problem of the tests failing, I
increased it. If it turns out that this `ttl` value has a good reason,
we can do a more involved fix, of making sure all sstables written also
get the same query time as that used for the compaction.
Fixes: #4747
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190731081522.22915-1-bdenes@scylladb.com>
`produces_compacted()` is usually used in tandem with another
compaction done on the expected output (the `m` param). This is usually
done so that, even though the reader works with an uncompacted stream,
the checking of the result will not fail due to insignificant
changes to the data, e.g. expired collection cells dropped while merging
two collections. Currently the two compactions, the one inside
`produce_compacted()` and the one done by the caller, use two separate
calls to `gc_clock::now()` to obtain the query time. This can lead to
off-by-one errors in the two query times and subsequently artificial
differences between the two compacted mutations, ultimately failing the
test due to a false positive.
To prevent this allow callers to pass in a query time, the same they
used to compact the input mutation (`m`).
This solves another source of flakiness in unit tests using the mutation
source test suite.
Refs: #4695
Fixes: #4747
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190726144032.3411-1-bdenes@scylladb.com>
Across all calls to `make_fragments_with_non_monotonic_positions()`, to
prevent off-by-one errors between the separately generated test input
and expected output. This problem was already supposed to be fixed by
5f22771ea8 but for some reason that only
used the same deletion time *inside* a single call, which will still
fall short in some cases.
This should hopefully fix this problem for good.
Refs: #4695
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190724073240.125975-1-bdenes@scylladb.com>
compaction: allow collecting purged data
Allow the compaction initiator to pass an additional consumer that will
consume any data that is purged during the compaction process. This
allows the separate retention of these dead cells and tombstones until
some long-running process like compaction safely finishes. If the
process fails or is interrupted the purged data can be used to prevent
data resurrection.
This patch was developed to serve as the basis for a solution to #4531
but it is not a complete solution in and of itself.
This series is a continuation of the patch: "[PATCH v1 1/3] Introduce
Garbage Collected Consumer to Mutation Compactor" by Raphael S.
Carvalho <raphaelsc@scylladb.com>.
Refs: #4531
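The extra-consumer idea above can be sketched as follows. Cells are modeled as strings and the purge rule is a toy; the real interface is the compaction_garbage_collector introduced by this series:

```cpp
#include <functional>
#include <string>
#include <vector>

// Consumer invoked for every piece of data purged during compaction,
// so the caller can retain it until the long-running process finishes.
using purged_consumer = std::function<void(const std::string&)>;

std::vector<std::string> compact(const std::vector<std::string>& cells,
                                 const purged_consumer& collect_purged) {
    std::vector<std::string> live;
    for (const auto& c : cells) {
        // Toy purge rule: cells prefixed "dead:" are purged; the rest
        // survive into the compaction output.
        if (c.rfind("dead:", 0) == 0) {
            if (collect_purged) {
                collect_purged(c);
            }
        } else {
            live.push_back(c);
        }
    }
    return live;
}
```

If compaction fails or is interrupted, the data handed to the consumer can be used to prevent resurrection.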
* https://github.com/denesb/scylla.git compaction_collect_purged_data/v8:
Introduce compaction_garbage_collector interface
collection_type_impl::mutation: compact_and_expire() add collector
parameter
row: add garbage_collector
row_marker: de-inline compact_and_expire()
row_marker: add garbage_collector
Introduce Garbage Collected Consumer to Mutation Compactor
tests: mutation_writer_test.cc/generate_mutations() ->
random_schema.hh/generate_random_mutations()
tests/random_schema: generate_random_mutations(): remove `engine`
parameter
tests/random_schema: add assert to make_clustering_key()
tests/random_schema: generate_random_mutations(): allow customizing
generated data
tests: random_schema: futurize generate_random_mutations()
random_schema: generate_random_mutations(): restore indentation
data_model: extend ttl and expiry support
tests/random_schema: generate_random_mutations(): generate partition
tombstone
random_schema: add ttl and expiry support
tests/random: add get_bool() overload with random engine param
random_schema: generate_random_mutations(): ensure partitions are
unique
tests: add unit tests for the data stream split in compaction
"
Issue #4558 describes a situation in which failure to execute clearsnapshots
will hard crash the node. The problem is that clearsnapshots will internally use
lister::rmdir, which in turn has two in-tree users: clearing snapshots and clearing
temporary directories during sstable creation. The way it is currently
coded, it wraps the io functions in io_check, which means that failures
to remove the directory will crash the database.
We recently saw how benign failures crashed a database during
clearsnapshot: we had snapshot creation running in parallel, adding more
files to a directory that was therefore not empty by the time of
deletion. I have also very often seen users add files to existing
directories by accident, which is another way to trigger this.
This patch removes the io_check from lister, and moves it to the caller
in which we want to be more strict. We still want to be strict about
the creation of temporary directories, since users shouldn't be touching
that in any way.
Also while working on that, I realized we have no tests for snapshots of
any kind in tree, so let's write some.
"
* 'snapshots' of https://github.com/glommer/scylla:
tests: add tests for snapshots.
lister: don't crash the node on failure to remove snapshot
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using the "{}" formatter.
The compiler's complaint is: "error: static assertion failed: mismatch
between char-types of context and argument".
Fix this by explicitly using the to_hex() converter.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190716221231.22605-3-vladz@scylladb.com>
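A minimal to_hex() in the spirit of the fix above: instead of handing raw bytes straight to the "{}" formatter (which fmt 5.3.0 rejects with the char-type mismatch), convert them to a hex string first. This is a standalone sketch over std::vector<uint8_t>, not Scylla's actual bytes_view-based converter:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Render a byte sequence as lowercase hex, two digits per byte.
std::string to_hex(const std::vector<uint8_t>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```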
The original "test_schema_digest_does_not_change" test case ensures
that schema digests will match for older nodes that do not support
all the features yet (including computed columns).
The additional case uses sstables generated after computed columns
are allowed, in order to make sure that the digest computed
including computed columns does not change spuriously as well.
The schema change test might need regenerating every time a system table
is added. In order to save future developers' time on debugging this
test, a short description of that requirement is added.
In order to make sure that the old schema digest is not recomputed
and can be verified, the computed columns feature is initially disabled
in schema_change_test.
The reason for that is as follows: running CQL test env assumes that
we are running the newest cluster with all features enabled. However,
the mere existence of some features might influence digest calculation.
So, in order for the existing test to work correctly, it should have
exactly the same set of cluster supported features as it had during
its creation. It used to be "all features", but now it's "all features
except computed columns". One can think of that as running a cluster
with some nodes not yet knowing what computed columns are, so they
are not taken into account when computing digests.
Additionally, a separate test case that takes computed column digest
into account will be generated and added in this series.
"
Fix another source of flakiness in mutation_reader_test. This one is
caused by storage_service_for_tests lacking a
config::broadcast_to_all_shards() call, triggering an invalid memory
access (or SEGFAULT) when run on more than one shard.
Refs: #4695
"
* 'fix_storage_service_for_tests' of https://github.com/denesb/scylla:
tests: storage_service_for_tests: broadcast config to all shards
tests: move storage_service_for_tests impl to test_services.cc
While inspecting the snapshot code, I realized that we don't have any
tests for it. So I decided to add some.
Unfortunately I couldn't come up with a test of clearsnapshot reliably
failing to remove the directory: relying on create snapshot +
clearsnapshot is racy (the failure doesn't always happen), and other
tricks that could be used to reproduce this -- like creating a
root-owned file inside the snapshots directory -- are
environment-dependent, and a bit ugly for unit tests. Dtests would
probably be a better place for that.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Due to recent changes to the config subsystem, configuration has to be
broadcast to all shards if one wishes to use it on them. The
`storage_service_for_tests` has a `sharded<gms::gossiper>` member, which
reads config values on initialization on each shard, causing a crash as
the configuration was initialized only on shard 0. Add a call to
`config::broadcast_to_all_shards()` to ensure all shards have access to
valid config values.
Currently the
test_multishard_combining_reader_non_strictly_monotonic_positions test
is flaky. The test is somewhat unconventional, in that it doesn't use
the same instance of data as the input to the test and as its expected
output; instead it invokes the method which generates this data
(`make_fragments_with_non_monotonic_positions()`) twice, first to
generate the input, and second to generate the expected output. This
means that the test is prone to any deviation in the data generated by
said method.
doesn't explicitly specify the deletion time of the generated range
tombstones. This results in this deletion time sometimes differing
between the test input and the expected output. Solve by explicitly
passing the same deletion time to all created range tombstones.
Refs: #4695
Fixes a segfault when querying for an empty keyspace.
Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.
Fixes #4689.
Duplicate partitions can appear as a result of the same partition key
being generated more than once. For now we simply remove any duplicates.
This means that in some circumstances fewer partitions will be generated
than asked for.
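The dedup step described above can be illustrated in a few lines. Partition keys are modeled as ints here and the function name is hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Sort the generated partition keys and drop duplicates; this is why
// fewer partitions than requested may be produced.
std::vector<int> dedup_partition_keys(std::vector<int> keys) {
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
    return keys;
}
```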
When generating data, the user can now also generate ttls and
expiry for all generated atoms. This happens in a controlled way, via a
generator functor, very similar to how the timestamps are generated.
This functor is also used by `random_schema` to generate `deletion_time`
for all tombstones, so the user now has full control of when all of the
atoms can be GC'd.