"In time-series, it's common for tables in a given time window to be eventually
fully expired. The deletion of such tables is done by compaction, but there's
*no* need to *actually* compact such fully expired sstables *iff* their full
deletion will not cause older data to be resurrected. In other words, a fully
expired table can be actually skipped (but deleted in the end) by compaction
*iff* it doesn't contain newer data than its overlapping counterparts. So there
may be false negatives, but never false positives.
All that said, the goal behind this patchset is to save disk read bandwidth
in such scenarios. Given that fully expired sstables will no longer be read by
the compaction process, read amplification will be greatly reduced too.
Fixes #2620."
* 'time_series_performance_improvement_v2_2' of github.com:raphaelsc/scylla:
tests: check sstable auto correct bad max deletion time
tests: add test for compaction with fully expired table
sstables/compaction: do not actually compact fully expired sstables
sstables: make sstable auto correct max_local_deletion_time
sstables: switch to const ref wherever possible
sstables: use gc_clock::time_point for gc_before
gc_clock: introduce operator<<(ostream&, gc_clock::time_point)
sstables: introduce sstable::get_max_local_deletion_time
sstables: remove unnecessary copy in time series strategies
sstables: change return value type of get_fully_expired_sstables
dtcs: make code to extract non expired tables faster
sstables: add has_correct_max_deletion_time to sstable
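The skip condition described in the message above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the code from the
series: `sstable_stats` and `fully_expired()` are hypothetical names, and
"doesn't contain newer data than its overlapping counterparts" is read
conservatively as "its newest timestamp is older than the oldest timestamp of
every sstable that stays alive".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-in for per-sstable statistics.
struct sstable_stats {
    int64_t min_timestamp;            // oldest write timestamp in the sstable
    int64_t max_timestamp;            // newest write timestamp in the sstable
    int64_t max_local_deletion_time;  // latest point at which any of its data expires
};

// Returns the compacting sstables that could be dropped without being read.
std::vector<sstable_stats> fully_expired(const std::vector<sstable_stats>& compacting,
                                         const std::vector<sstable_stats>& overlapping,
                                         int64_t gc_before) {
    // Oldest timestamp among all sstables that will survive.
    int64_t min_live_timestamp = std::numeric_limits<int64_t>::max();
    for (const auto& s : overlapping) {
        assert(s.min_timestamp <= s.max_timestamp);
        min_live_timestamp = std::min(min_live_timestamp, s.min_timestamp);
    }

    // Everything in a candidate has expired before gc_before.
    std::vector<sstable_stats> candidates;
    for (const auto& s : compacting) {
        assert(s.min_timestamp <= s.max_timestamp);
        if (s.max_local_deletion_time < gc_before) {
            candidates.push_back(s);
        } else {
            min_live_timestamp = std::min(min_live_timestamp, s.min_timestamp);
        }
    }

    // A candidate holding data newer than some surviving sstable may contain
    // tombstones that still shadow older data, so it must be compacted normally.
    std::vector<sstable_stats> expired;
    for (const auto& s : candidates) {
        if (s.max_timestamp < min_live_timestamp) {
            expired.push_back(s);
        }
    }
    return expired;
}
```

The check errs on the side of compacting: an sstable that could safely be
dropped may still be rejected, but one that is needed is never dropped.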
"Soon we will have resources beyond just keyspaces and table names. There
will be resources for roles, for user-defined functions (UDFs), and
possibly resources for REST end-points. This change generalizes the
implementation of a `data_resource` to many different kinds of
resources, though there is still only one kind (`data`).
The most important patch is 2/5 ("auth/resource: Generalize to different
kinds"), which re-writes `auth::data_resource`. The patch message should
sufficiently explain the design decisions involved.
The other patches rename files and identifiers based on the expanded
role of this class, except for 5/5 ("auth/resource.hh: Rename
`resource_ids`"): this patch gives a more appropriate name to a type
alias.
Fixes #3027."
* 'jhk/generalize_resource/v3' of https://github.com/hakuch/scylla:
auth/resource.hh: Rename `resource_ids`
auth: Rename `data_resource` files
cql3/authorization_statement: Fix typo
auth/resource: Generalize to different kinds
auth: Rename `data_resource` to `resource`
sstables created prior to cc6c383 can contain a bad max deletion time stat,
which would make get_fully_expired_sstables return sstables that aren't
actually fully expired. Let's make the sstable invalidate the stat if it
is potentially incorrect.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
unordered_set will allow us to quickly extract fully expired tables
from a set of compacting sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
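A minimal sketch of why a hash set helps here, assuming sstables are held by
shared pointer. The `sstable` struct and `without_expired()` are hypothetical
names used only for illustration. Each membership check is O(1) on average, so
filtering the compacting set is linear in its size.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>
#include <vector>

struct sstable { int generation; };  // hypothetical stand-in
using shared_sstable = std::shared_ptr<sstable>;

// Returns the compacting sstables that still have to be read, in their
// original order, by skipping the ones found in the fully expired set.
std::vector<shared_sstable> without_expired(
        const std::vector<shared_sstable>& compacting,
        const std::unordered_set<shared_sstable>& fully_expired) {
    std::vector<shared_sstable> result;
    result.reserve(compacting.size());
    for (const auto& sst : compacting) {
        assert(sst != nullptr);
        if (fully_expired.count(sst) == 0) {  // hashes the pointer value
            result.push_back(sst);
        }
    }
    return result;
}
```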
This change generalizes the implementation of a `resource` to many
different kinds of resources, though there is still only one
kind (`data`). In the future, we also expect resource kinds for roles,
user-defined functions (UDFs), and possibly for particular REST
end-points.
I considered several approaches to generalizing to different kinds of
resources.
One approach is to have a base class that is inherited from by different
resource kinds. The common functionality would be accessed through
virtual member functions and kind-specific functions would exist in
sub-classes. I rejected this approach because dealing with different
kinds of resources uniformly requires storage and life-time management
through something like `std::unique_ptr<auth::resource>`, which means
that we lose value semantics (including comparison) and must deal with
complications around ownership.
Another option was to use `boost::variant` (or, in future,
`std::variant`). This is closer to what we want, since there is a static
set of resource kinds that we support. I rejected this approach for two
reasons. The first is that all resource kinds share the same data (a
list of segments and a root identifier), which would be duplicated in
each type that composed the variant. The second is that the complexity
and source-code overhead of `boost::variant` didn't seem warranted.
The solution I ended up with is a home-grown variant. All resources are
described in the same `final` class: `auth::resource`. This class has
value semantics, supports equality comparison, and has a strict
ordering. All resources have in common a tag ("kind") and a list of
parts. Most operations on resources don't care about the kind of
resource (like getting its name, parsing a name, querying for the
parent, etc). These are just member functions of the class.
When we care about a kind-specific interpretation of a resource, we can
produce a "view" of the resource. For example, `data_resource_view`
allows for accessing the (optional) keyspace and table names.
I anticipate adding, in the future, functions for creating role
resources (`auth::resource::role`) and also a `role_resource_view`.
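The shape of this design can be sketched as below. This is a simplified,
hypothetical model and not the actual `auth::resource` class: the factory
functions, the `name()` format, and the use of `assert` for the kind check are
assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a tagged resource with value semantics.
enum class resource_kind { data };

class resource final {
    resource_kind _kind;
    std::vector<std::string> _parts;

    resource(resource_kind kind, std::vector<std::string> parts)
        : _kind(kind), _parts(std::move(parts)) {}
public:
    // Factories for the only kind that exists so far.
    static resource data() {
        return resource(resource_kind::data, std::vector<std::string>{});
    }
    static resource data(std::string keyspace) {
        return resource(resource_kind::data, {std::move(keyspace)});
    }
    static resource data(std::string keyspace, std::string table) {
        return resource(resource_kind::data, {std::move(keyspace), std::move(table)});
    }

    // Kind-agnostic operations are plain member functions.
    resource_kind kind() const { return _kind; }
    const std::vector<std::string>& parts() const { return _parts; }
    std::string name() const {
        std::string result = "data";  // the only root in this sketch
        for (const auto& p : _parts) {
            result += "/" + p;
        }
        return result;
    }

    friend bool operator==(const resource& a, const resource& b) {
        return a._kind == b._kind && a._parts == b._parts;
    }
    friend bool operator<(const resource& a, const resource& b) {
        return std::tie(a._kind, a._parts) < std::tie(b._kind, b._parts);
    }
};

// Kind-specific interpretation of a resource. The view does not own the
// resource, so it must not outlive it.
class data_resource_view final {
    const resource& _resource;
public:
    explicit data_resource_view(const resource& r) : _resource(r) {
        assert(r.kind() == resource_kind::data);
    }
    // Null when the resource does not name a keyspace (or table).
    const std::string* keyspace() const {
        return _resource.parts().size() > 0 ? &_resource.parts()[0] : nullptr;
    }
    const std::string* table() const {
        return _resource.parts().size() > 1 ? &_resource.parts()[1] : nullptr;
    }
};
```

Adding a kind in this model means adding an enumerator, a factory, and a view,
while the storage and comparison code stay shared.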
The functional behaviour of the system should be unchanged with this
patch.
I've added new unit tests in `auth_resource_test.cc` and removed the old
test from `auth_test.cc`.
Fixes #3027.
"This fix for the issue #2989 first adds unit tests for caching_options which
is the only class that uses the helpers from json.hh. This is done to
have regression tests in place for the main change.
The second commit adds conditional use of the new recommended JsonCpp API
where available. For older versions of the library, it uses the old
code."
* 'issues/2989/v1' of https://github.com/argenet/scylla:
Use CharReaderBuilder/CharReader and StreamWriterBuilder from JsonCpp.
tests: Add unit tests for caching_options.
The assertions already have produces(mutation) and
produces(dht::decorated_key) overloads. An additional overload that accepts
a range of elements makes it possible to check whether a range of mutations
or decorated keys is produced.
The same interface is exposed by mutation_reader_assertions.
produces(mutation_fragment::kind) is provided by
streamed_mutation_assertions and is going to be needed in order to
fully convert tests to the flat mutation readers.
Both fast_forward_to() overloads return a future which should be waited
for. Additionally, fast_forward_to(const dht::partition_range&) expects
the range to remain valid at least until the next call to
fast_forward_to(). The original mutation_reader_assertions guaranteed
that and so should flat_mutation_reader_assertions.
Flushing a memtable in tests now requires storage_service_for_tests;
without it the flush fails with "Assertion `local_is_initialized()' failed".
storage_service_for_tests needs to run in a thread, which is why
flush_memtable was flattened.
Last but not least, we need to revert the flushed memory accounting, because
the same memtable is used for all sstables in the perf test, so as not
to trigger `_mt._flushed_memory <= _mt.occupancy().used_space()'
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20171205012853.21559-1-raphaelsc@scylladb.com>
A few edge cases were untested, and since this patch series completely
reworks how the combined reader works, they should be tested
as well to ensure they keep working.
For now only the interface is converted; behind the scenes the previous
implementation remains, and its output is simply converted by
flat_mutation_reader_from_mutation_reader. The implementation will be
converted in the following patches.
"This series makes it easier to comprehend assertion failures which
involve printing mutation contents."
* 'tgrabiec/mutation-printout' of github.com:scylladb/seastar-dev:
tests: Introduce mutation_diff script
mutation: Make printout more concise
mutation_partition: Don't print absent elements
mutation_partition: Make row_marker printout similar to other partition elements
database: Move operator<<() overloads to appropriate source files
mutation_partition: Use multi-line printout
position_in_partition: Improve printout
Add test to verify we can write and read non-compound tombstones and
compound ones for backward compatibility.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"This simplifies implementation of mutation_partition merging by relaxing
exception guarantees it needs to provide. This allows reverters to be dropped.
Direct motivation for this is to make it easier to implement new semantics
for merging of clustering range continuity.
Implementation details:
We only need strong exception guarantees when applying to the memtable, which is
using MVCC. Instead of calling apply() with strong exception guarantees on the latest
version, we will move the incoming mutation to a new partition_version and then
use monotonic apply() to merge them. If that merging fails, we attach the version with
the remainder, which cannot fail. This way apply() always succeeds if the allocation
of partition_version object succeeds.
Results of `perf_simple_query_g -c1 -m1G --write` (high overwrite rate):
Before:
101011.13 tps
102498.07 tps
103174.68 tps
102879.55 tps
103524.48 tps
102794.56 tps
103565.11 tps
103018.51 tps
103494.37 tps
102375.81 tps
103361.65 tps
After:
101785.37 tps
101366.19 tps
103532.26 tps
100834.83 tps
100552.11 tps
100891.31 tps
101752.06 tps
101532.00 tps
100612.06 tps
102750.62 tps
100889.16 tps
Fixes #2012."
* tag 'tgrabiec/drop-reversible-apply-v1' of github.com:scylladb/seastar-dev:
mutation_partition: Drop apply_reversibly()
mutation_partition: Relax exception guarantees of apply()
mutation_partition: Introduce apply_weak()
tests: mvcc: Add test for atomicity of partition_entry::apply()
tests: Move failure_injecting_allocation_strategy to a header
tests: mutation_partition: Test exception guarantees of apply_monotonically()
mvcc: Use apply_monotonically() where sufficient
mvcc: partition_version: Use apply_monotonically() to provide atomicity
mvcc: Extract partition_entry::add_version()
mutation_partition: Introduce apply_monotonically()
mutation_partition: Introduce row::consume_with()
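The merge strategy from the implementation details above can be modelled with a
toy example. Everything here is a hypothetical simplification: a partition is
a map from cell name to timestamp, a `std::list` node stands in for a
partition_version, and `failure_injector` simulates allocation failures. The
names mirror the series but the code is not taken from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <new>
#include <string>
#include <utility>

using partition = std::map<std::string, int64_t>;  // cell name -> timestamp (toy model)

// Simulates allocation failure: throws once `remaining` steps have succeeded.
// A negative value never fails.
struct failure_injector {
    int remaining = -1;
    void maybe_fail() {
        if (remaining == 0) { throw std::bad_alloc(); }
        if (remaining > 0) { --remaining; }
    }
};

// Moves cells from src into dst one at a time, keeping the newest timestamp.
// If it throws midway, every cell is still present in dst or in src, so the
// union of the two never loses information.
void apply_monotonically(partition& dst, partition& src, failure_injector& inj) {
    while (!src.empty()) {
        inj.maybe_fail();
        auto it = src.begin();
        auto res = dst.insert(*it);
        if (!res.second) {
            res.first->second = std::max(res.first->second, it->second);
        }
        src.erase(it);
    }
}

class partition_entry {
    std::list<partition> _versions;  // newest first
public:
    void apply(partition m, failure_injector& inj) {
        // Stand-in for allocating the new version. If this throws, the entry
        // has not been touched yet.
        std::list<partition> node;
        node.push_back(std::move(m));
        if (!_versions.empty()) {
            try {
                apply_monotonically(_versions.front(), node.front(), inj);
                assert(node.front().empty());
                return;  // fully merged, the extra version is not needed
            } catch (const std::bad_alloc&) {
                // Fall through and keep the remainder as its own version.
            }
        }
        // Linking an existing node does not allocate, so it cannot fail.
        _versions.splice(_versions.begin(), node);
    }

    // Logical content of the entry: all versions merged together.
    partition squashed() const {
        partition result;
        for (const auto& v : _versions) {
            for (const auto& cell : v) {
                auto res = result.insert(cell);
                if (!res.second) {
                    res.first->second = std::max(res.first->second, cell.second);
                }
            }
        }
        return result;
    }

    std::size_t version_count() const { return _versions.size(); }
};
```

In this model a failed merge leaves an extra version behind, but readers that
combine all versions still observe the whole mutation, which is what makes
`apply()` atomic from their point of view.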
The uses which needed strong or weak exception guarantees were
switched to a solution involving apply_monotonically(). All remaining
uses don't need any exception guarantees.
The role manager is responsible for creating, removing, querying for,
granting, and revoking roles.
The role manager does not yet run in production, and is not connected to
the rest of the system.
Included in this patch is the definition of the abstract role management
interface, and also the implementation of the standard role manager.
The standard role manager is tested fully in the `role_manager_test`.
These tests now require the storage service to be initialized, which
is needed to decide whether correct non-compound range tombstones
should be emitted or not.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171126152921.5199-1-duarte@scylladb.com>
The following patches convert sstable writers to use flat mutation
readers instead of the legacy mutation_reader interface.
Writers were already using the flat consumer interface and used
consume_flattened_in_thread(), so most of the work was limited to
providing an appropriate equivalent for flat mutation readers.
* https://github.com/pdziepak/scylla.git flat_mutation_reader-sstable-write/v1:
flat_mutation_reader: move consumer_adapter out of consume()
flat_mutation_reader: introduce consume_in_thread()
tests/flat_mutation_reader: test consume_in_thread()
sstables: switch write_components() to flat_mutation_reader
streamed_mutation: drop streamed_mutation_returning()
sstables: convert compaction to flat_mutation_reader
mutation_reader: drop consume_flattened_in_thread()
This series mainly fixes issues with the serialization of promoted
index entries for non-compound schemas and with the serialization of
range tombstones, also for non-compound schemas.
We lift the correct cell name writing code into its own function,
and direct all users to it. We also ensure backward compatibility with
incorrectly generated promoted indexes and range tombstones.
Fixes #2995
Fixes #2986
Fixes #2979
Fixes #2992
Fixes #2993
* git@github.com:duarten/scylla.git promoted-index-serialization/v3:
sstables/sstables: Unify column name writers
sstables/sstables: Don't write index entry for a missing row maker
sstables/sstables: Reuse write_range_tombstone() for row tombstones
sstables/sstables: Lift index writing for row tombstones
sstables/sstables: Leverage index code upon range tombstone consume
sstables/sstables: Move out tombstone check in write_range_tombstone()
sstables/sstables: A schema with static columns is always compound
sstables/sstables: Lift column name writing logic
sstables/sstables: Use schema-aware write_column_name() for collections
sstables/sstables: Use schema-aware write_column_name() for row marker
sstables/sstables: Use schema-aware write_column_name() for static row
sstables/sstables: Writing promoted index entry leverages column_name_writer
sstables/sstables: Add supported feature list to sstables
sstables/sstables: Don't use incorrectly serialized promoted index
cql3/single_column_primary_key_restrictions: Implement is_inclusive()
cql3/delete_statement: Constrain range deletions for non-compound schemas
tests/cql_query_test: Verify range deletion constraints
sstables/sstables: Correctly deserialize range tombstones
service/storage_service: Add feature for correct non-compound RTs
tests/sstable_*: Start the storage service for some cases
sstables/sstable_writer: Prepare to control range tombstone serialization
sstables/sstables: Correctly serialize range tombstones
tests/sstable_assertions: Fix monotonicity check for promoted indexes
tests/sstable_assertions: Assert a promoted index is empty
tests/sstable_mutation_test: Verify promoted index serializes correctly
tests/sstable_mutation_test: Verify promoted index repeats tombstones
tests/sstable_mutation_test: Ensure range tombstone serializes correctly
tests/sstable_datafile_test: Add test for incorrect promoted index
tests/sstable_datafile_test: Verify reading of incorrect range tombstones
sstables/sstable: Rename schema-oblivious write_column_name() function
sstables/sstables: No promoted index without clustering keys
tests/sstable_mutation_test: Verify promoted index is not generated
sstables/sstables: Optimize column name writing and indexing
compound_compat: Don't assume compoundness
A TTL of 1 second may cause the cell to expire right after we write it,
if the seconds component of the current time changes right after the write.
Use a larger TTL to avoid spurious failures due to this.
Message-Id: <1511463392-1451-1-git-send-email-tgrabiec@scylladb.com>
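The race can be illustrated with a small model. It assumes, as a
simplification, that expiry is tracked at one-second resolution and that a
cell is live only while the truncated current time is before its expiry;
`truncate()` and `is_live()` are hypothetical helpers, not functions from the
code base.

```cpp
#include <cassert>
#include <chrono>

using seconds = std::chrono::seconds;
using milliseconds = std::chrono::milliseconds;

// Models a clock that only has one-second resolution.
seconds truncate(milliseconds t) {
    return std::chrono::duration_cast<seconds>(t);
}

// A cell written at `written_at` with the given TTL, observed at `read_at`.
bool is_live(milliseconds written_at, seconds ttl, milliseconds read_at) {
    assert(ttl > seconds::zero());
    seconds expiry = truncate(written_at) + ttl;
    return truncate(read_at) < expiry;
}
```

Under these assumptions, a cell written at 10.999 s with a TTL of 1 second is
already expired when read at 11.001 s, only 2 ms later, while a larger TTL
keeps it live.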