So that is_set() will be true for that option. Needed in tests which
set some config options in higher layer and then lower layers detects
if option was set or not before applying its default.
The previous fix removed the additional insertion of "min rp" per source
shard based on whether we had processed existing CF:s or not (i.e. if
a CF does not exist as sstable at all, we must tag it as zero-rp, and
make whole shard for it start at same zero.
This is bad in itself, because it can cause data loss. It does not cause
crashing however. But it did uncover another, old old lingering bug,
namely the commitlog reader initiating its stream wrongly when reading
from an actual offset (i.e. not processing the whole file).
We opened the file stream from the file offset, then tried
to read the file header and magic number from there -> boom, error.
Also, rp-to-file mapping was potentially suboptimal due to using
bucket iterator instead of actual range.
I.e. three fixes:
* Reinstate min position guarding for unencoutered CF:s
* Fix stream creating in CL reader
* Fix segment map iterator use.
v2:
* Fix typo
Message-Id: <1490611637-12220-1-git-send-email-calle@scylladb.com>
Fixes#2173
Per-shard min positions can be unset if we never collected any
sstable/truncation info for it, yet replay segments of that id.
Wrap the lookups to handle "missing data -> default", which should have been
there in the first place.
Message-Id: <1490185101-12482-1-git-send-email-calle@scylladb.com>
This patch ensures that the schema merging atomically publishes
schema changes. In particular, it ensures that when a base schema
and a subset of its views are modified together (i.e., upon an alter
table or alter type statement), then they are published together as
well, without any deferring in-between.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The write path uses a base schema at a particular version, and we
want it to use the materialized views at the corresponding version.
To achieve this, we need to map the state currently in db::view::view
to a particular schema version, which this patch does by introducing
the view_info class to hold the state previously in db::view::view,
and by having a view schema directly point to it.
The changes in the patch are thus:
1) Introduce view_info to hold the extra view state;
2) Point to the view_info from the schema;
3) Make the functions in the now stateless db::view::view non-member;
4) Remove the db::view::view class.
All changes are structural and don't affect current behavior.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Fixes#2098
Replay previously did all segments in parallel on shard 0, which
caused heavy memory load. To reduce this and spread footprint
across shards, instead do X segments per shard, sequential per shard.
v2:
* Fixed whitespace errors
Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>
This patch changes a lambda argument type so the keyspace name is
passed by reference instead of copying it, in
read_schema_for_keyspaces().
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170309213134.10331-1-duarte@scylladb.com>
Add batch_size_fail_threshold_in_kb to prevent huge batch from been
applied and causing troubles. Also do not warn or fail if only one
partition is affected.
Fixes: #2128
Message-Id: <20170309111247.GE8197@scylladb.com>
"If a node is bootstrapped with auto_boostrap disabled, it will not
wait for schema sync before creating global keyspaces for auth and
tracing. When such schema changes are then reconciled with schema on
other nodes, they may overwrite changes made by the user before the
node was started, because they will have higher timestamp.
To prevent that, let's use minimum timestamp so that default schema
always looses with manual modifications. This is what Cassandra does.
Fixes #2129."
* tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev:
db: Create default auth and tracing keyspaces using lowest timestamp
migration_manager: Append actual keyspace mutations with schema notifications
There is a workaround for notification race, which attaches keyspace
mutations to other schema changes in case the target node missed the
keyspace creation. Currently that generated keyspace mutations on the
spot instead of using the ones stored in schema tables. Those
mutations would have current timestamp, as if the keyspace has been
just modified. This is problematic because this may generate an
overwrite of keyspace parameters with newer timestamp but with stale
values, if the node is not up to date with keyspace metadata.
That's especially the case when booting up a node without enabling
auto_bootstrap. In such case the node will not wait for schema sync
before creating auth tables. Such table creation will attach
potentially out of date mutations for keyspace metadata, which may
overwrite changes made to keyspace paramteters made earlier in the
cluster.
Refs #2129.
"This series adds various optimisations to counter implementation
(nothing extreme, mostly just avoiding unnecessary operations) as well
as some missing features such as tracing and dropping timed out queries.
Performance was tested using:
perf-simple-query -c4 --counters --duration 60
The following results are medians.
before after diff
write 18640.41 33156.81 +77.9%
read 58002.32 62733.93 +8.2%"
* tag 'pdziepak/optimise-counters/v3' of github.com:cloudius-systems/seastar-dev: (30 commits)
cell_locker: add metrics for lock acquisition
storage_proxy: count counter updates for which the node was a leader
storage_proxy: use counter-specific timeout for writes
storage_proxy: transform counter timeouts to mutation_write_timeout_exception
db: avoid allocations in do_apply_counter_update()
tests/counters: add test for apply reversability
counters: attempt to apply in place
atomic_cell: add COUNTER_IN_PLACE_REVERT flag
counters: add equality operators
counters: implement decrement operators for shard_iterator
counters: allow using both views and mutable_views
atomic_cell: introduce atomic_cell_mutable_view
managed_bytes: add cast to mutable_view
bytes: add bytes_mutable_view
utils: introduce mutable_view
db: add more tracing events for counter writes
db: propagate tracing state for counter writes
tests/cell_locker: add test for timing out lock acquisition
counter_cell_locker: allow setting timeouts
db: propagate timeout for counter writes
...
It is safe to copy column_mapping accros shards. Such guarantee comes at
the cost of performance.
This patch makes commitlog_entry_writer use IDL generated writer to
serialise commitlog_entry so that column_mapping is not copied. This
also simplifies commitlog_entry itself.
Performance difference tested with:
perf_simple_query -c4 --write --duration 60
(medians)
before after diff
write 79434.35 89247.54 +12.3%
"This introduces an API which allows forward navigation in a stream of mutation
fragments. It allows one to consume only a subset of the stream by iteratively
specifying sub-ranges from which fragments should be returned.
API outline:
When in forwarding mode, the stream does not return all fragments right away,
but only those belonging to the current range. Initially current range only
covers the static row. The stream can be forwarded, even before reaching end-
of-stream for current range, to a later range with fast_forward_to().
Forwarding doesn't change initial restrictions of the stream, it can only be
used to skip over data.
Monotonicity of positions is preserved by forwarding. That is fragments
emitted after forwarding will have greater positions than any fragments
emitted before forwarding.
For any range, all range tombstones relevant for that range which are present
in the original stream will be emitted. Range tombstones emitted before
forwarding which overlap with the new range are not necessarily re-emitted.
When not in forwarding mode, the stream acts as if the current range was equal
to the full range. This implies that fast_forward_to() cannot be
used.
Whether stream is in forwarding mode or not is specified when the stream
is created, typically via mutation_source interface.
What's left for later series:
Optimization by providing specialized implementations. This series implements
forwarding support in all mutation sources via generic wrapper which simply
drops fragments."
* tag 'tgrabiec/clustering-fast-forward-to-v2' of github.com:scylladb/seastar-dev:
tests: mutation_source_tests: Verify monotonicty of positions
tests: random_mutation_generator: Spread the keys more
tests: mutation_source_test: Make blobs more easily distinguishable
tests: streamed_mutation: Test that merged stream passes mutation source tests
tests: mutation_source_test: Add tests for forwarding of streamed_mutation
tests: streamed_mutation_assertions: Add methods for navigating the stream
tests: Add range generators to random_mutation_generator
partition_slice_builder: Add with_ranges()
query: Introduce full_clustering_range
streamed_mutation: Add non-owning variant of mutation_from_streamed_mutation()
db: Enable creating forwardable readers via mutation_source
mutation_source: Document liveness requirements
mutation_source: Cleanup
db: Replace virtual_reader_type with mutation_source_opt
partition_version: Refactor make_partition_snapshot_reader() overloads
database: Fix mutation_source created by as_mutation_source() to not ignore trace_state_ptr
memtable: Accept all mutation_source parameters
streamed_mutation: Implement fast_forward_to() in stream merger
streamed_mutation: Add generic implementation of forwardable streamed_mutation
streamed_mutation: Add fast_forward_to() API
position_in_partition: Introduce position_range
position_in_partition: Introduce position constructor for right after the static row
streamed_mutation: Make cast to view non-explicit
streamed_mutation: Make schema() getter non-copying
Do not ignore a future<> retuned by cycle() since it will produce a
warning in case of an error. Log it instead.
Message-Id: <20170219151811.GN11471@scylladb.com>
On database stop, we do flush memtables and clean up commit log segment usage.
However, since we never actually destroy the distributed<database>, we
don't actually free the commitlog either, and thus never clear out
the remaining (clean) segments. Thus we leave perfectly clean segments
on disk.
This just adds a "release" method to commitlog, and calls it from
database::stop, after flushing CF:s.
Message-Id: <1485784950-17387-1-git-send-email-calle@scylladb.com>
mutation_fragments are going to be caching their size in memory. In
order to be able to invalidate that correctly, they need to know when
that size may change (but avoid invalidation when it is not necessary).
This adds a function mutate_MV() which takes view mutations and sends
them to the appropriate nodes (this may be the current node, or a
remote node).
This is only a partial implementation - we still don't do the local
batch log (to survive reboots and failures) and some other stuff which
is left commented out.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This adds to the "write_type" enum also the "VIEW" write type.
To be honest, I don't understand why the "write_type" distinction
is important.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A function to find the appropriate replica to send a view update to.
This patch creates a new source file db/view/view.cc. We should
eventually move a lot more of the materialized views code there.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds the view_update_builder class, which is responsible
for calculating the mutations to apply to a column family's
materialized views, given a streamed_mutation representing an update
to the base table and a streamed_mutation representing the
pre-existing rows which the update covers.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the view_updates::generate_update() function to
generate view updates given a base row update and the corresponding,
pre-existing row. This function will decide which of the previously
introduced functions to call based on whether there is a pre-existing
row and whether there exists a regular base column that's part of the
view's PK.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a function to replace a view row given a base
table update and the pre-existing row, which simply deletes the old
view entry and adds a new one.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the view_updates::update_entry function,
which creates the updates to apply to the existing view entry given
the base table row before and after the update.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the view_updates::delete_old_entry function,
which creates a view row mutation to delete an entry given an updated
base table row.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the view_updates::create_entry function, which
creates a view row mutation given a new base table row.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds a function to compute the row marker of a view row
given the base row. There are two cases to consider when building the
row marker: 1) there is a column C that is a regular base column but
is in the view PK; and 2) the columns for the base and the view PKs
are the same.
For 1), the view row marker timestamp will be the biggest between the
base's row marker and C. The TTL will be that of C. This means that if
C expires, the view row maker will expire as well (and the row, if no
other column is keeping it alive). Note that if the base row marker
expires but not C, then the base row will still be live due to C and
we shouldn't expire the view row.
For 2), the view row timestamp will be the same as the base row
timestamp. The TTL should be set in such a way that both base and view
rows live for the same time. We thus set the view row TTL to be the
max of any other TTL in the base row. This is particularly important
in the case where the base row marker has a TTL, but a column *absent*
from the view holds a greater one.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the view_updates class, which is responsible
for generating and storing updates to a particular materialized view.
The updates will be generated from an updated base row and the
pre-existing one (if any), in later patches.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
To help calculate the view mutations from a base update, we store in
the view class the column that's part of the view's primary key but
not part of the base's, if such column exists.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the matches_view_filter() function which specifies
whether a given base row matches the view filter. Unlike
may_be_affected_by(), this function has no false positives.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the may_be_affected_by() function to the view class,
which is responsible to determine whether an update to a base class
affects one of its views.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Refs #1813 (fixes scylla part)
Added require_client_auth and priority_string options to
server_encryption_options/client_encryption_options an process them.
Allows TLS method/algo specification. Also enabled enforcing known cert
authentication for both node-to-node and client communication.
* seastar 397685c...c1dbd89 (13):
> lowres_clock: drop cache-line alignment for _timer
> net/packet: add missing include
> Merge "Adding histogram and description support" from Amnon
> reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
> Set the option '--server' of tests/tcp_sctp_client to be required
> core/memory: Remove superfluous assignment
> core/memory: Remove dead code
> core/reactor: Use logger instead of cerr
> fix inverted logic in overprovision parameter
> rpc: fix timeout checking condition
> rpc: use lowres_clock instead of high resolution one
> semaphore: make semaphore's clock configurable
> rpc: detect timedout outgoing packets earlier
Includes treewide change to accomodate rpc changing its timeout clock
to lowres_clock.
Includes fixup from Amnon:
collectd api should use the metrics getters
As part of a preperation of the change in the metrics layer, this change
the way the collectd api uses the metrics value to use the getters
instead of calling the member directly.
This will be important when the internal implementation will changed
from union to variant.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
The per-node limit will be total memory divided by number of shards
instead of just total memory. For example, when Scylla is started with
-c16 -m16G, the commit log will induce flushes on given shard when
unflushed data exceeds on that shard 62MB instead of 1GB.
Fixes#2046.
Message-Id: <1485874534-10939-1-git-send-email-tgrabiec@scylladb.com>
It still has problems:
- while resharding a very large leveled compaction strategy table, a huge
amount of tiny sstables are generated, overwhelming the file descriptor
limits
- there is a large impact on read latency while resharding is going on
(cherry picked from commit cf27d44412)
(forward-ported from branch-1.6)
Commit f0c28e1 ("db/schema_tables: Add schema_functions and
schema_aggregates tables") forgot to add the newly added tables to the
db::schema_tables::ALL list, which is used for authorization checks, for
example.
Fixes the following auth_test.py dtest failures:
('Unable to connect to any servers', {'127.0.0.1': Unauthorized('Error from server: code=2100 [Unauthorized] message="User cathy has no SELECT permission on <table system.schema_functions> or any of its parents"',)})
Message-Id: <1484045277-4997-1-git-send-email-penberg@scylladb.com>
This reverts commit b81a57e8eb.
With exponential range scanning, we should now be able to survive
msb ignore bits of 12, which allows better sharding on large clusters.
This patch adds an utility function that creates a raw select
statement from a set of columns and a where clause. It is intended to
be used to create the prepared select statement used by the view
class.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>