The set makes dependencies between the migration manager (mm) and other
services cleaner. In particular, after the set:
- the query processor no longer needs migration manager
(which doesn't need query processor either)
- the database no longer needs migration manager, thus the mutual
dependency between these two is dropped, only migration manager
-> database is left
- the migration manager -> storage_service dependency is relaxed,
one more patchset will be needed to remove it, thus dropping one
more mutual dependency between them, only the storage_service
-> migration manager will be left
- the migration manager is stopped on drain, but several more
services need it on stop, which causes use-after-free problems.
In particular, a bug was caught where the view builder crashes
when unregistering from the notifier list on stop. Fixed.
Tests: unit(dev)
Fixes: #5404
This is the last place where database code needs the migration_manager
instance to be alive, so now the mutual dependency between these two
is gone: only the migration_manager needs the database, not
vice versa.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Do not call for local migration manager instance to send notifications,
call for the local migration notifier, it will always be alive.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The _listeners list on migration_manager class and the corresponding
notify_xxx helpers have nothing to do with its instances, they
are just transport for notification delivery.
At the same time some services need the migration manager to be alive
at their stop time to unregister from it, while the manager itself
may need them for its needs.
The proposal is to move the migration notifier into a completely separate
sharded "service". This service doesn't need anything, so it's started
first and stopped last.
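
As a rough illustration of the idea, here is a minimal sketch of a notifier that owns nothing but the listener list. The class and method names below are simplified stand-ins chosen for this sketch, not the actual Scylla interfaces, and the sharded<> wrapping is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical listener interface; the real one has many more callbacks.
struct migration_listener {
    virtual ~migration_listener() = default;
    virtual void on_create_keyspace(const std::string& ks_name) = 0;
};

// Stand-alone notifier: it only keeps the listener list and delivers
// notifications, so it does not depend on any other service.
class migration_notifier {
    std::vector<migration_listener*> _listeners;
public:
    void register_listener(migration_listener* l) {
        assert(l != nullptr);
        _listeners.push_back(l);
    }
    void unregister_listener(migration_listener* l) {
        _listeners.erase(std::remove(_listeners.begin(), _listeners.end(), l),
                         _listeners.end());
    }
    void create_keyspace(const std::string& ks_name) {
        for (auto* l : _listeners) {
            l->on_create_keyspace(ks_name);
        }
    }
    std::size_t listener_count() const { return _listeners.size(); }
};

// Example listener that just counts the notifications it receives.
struct counting_listener : migration_listener {
    int calls = 0;
    void on_create_keyspace(const std::string&) override { ++calls; }
};
```

Because the notifier has no dependencies, a service can still unregister from it at stop time even after the migration manager itself is gone.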
While it's not effectively a "migration" notifier, we inherited the name
from Cassandra and renaming it will "scramble neurons in the old-timers'
brains but will make it easier for newcomers" as Avi says.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5533
from Avi Kivity:
canonical_mutation objects are used for schema reconciliation, which is a
fragile area and thus deserves some debugging help.
This series makes canonical_mutation objects printable.
"
The original fix (10f6b125c8) didn't
take into account that if there was a failed memtable flush (Refs
flush) but is not a flushable memtable because it's not the latest in
the memtable list. If that happens, it means no other memtable is
flushable as well, because otherwise it would be picked due to
evictable_occupancy(). Therefore the right action is to not flush
anything in this case.
Suspected to be observed in #4982. I didn't manage to reproduce after
triggering a failed memtable flush.
Fixes #3717
"
* tag 'avoid-ooming-with-flush-continuations-v2' of github.com:tgrabiec/scylla:
database: Avoid OOMing with flush continuations after failed memtable flush
lsa: Introduce operator bool() to occupancy_stats
lsa: Expose region_impl::evictable_occupancy in the region class
The user_types_metadata can simply be owned by the keyspace. This
simplifies the code since we never have to worry about nulls and the
ownership is now explicit.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
It looks like this was done just to avoid including
user_types_metadata.hh, which seems a bit much considering that it
requires adding specialization to the seastar namespace.
A followup patch will also stop using lw_shared_ptr for
user_types_metadata.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.
Fix by evaluating full_slice before moving the schema.
Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.
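
The pattern can be sketched as follows. The schema type, make_reader() and make_reader_safely() are hypothetical stand-ins (a std::unique_ptr is used instead of the real schema_ptr); only the ordering of the two evaluations is the point.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real schema and reader types.
struct schema {
    std::string slice = "full";
    const std::string& full_slice() const { return slice; }
};
using schema_ptr = std::unique_ptr<schema>;

// Takes ownership of the schema and reads using the given slice.
std::string make_reader(schema_ptr s, const std::string& slice) {
    assert(s);
    return slice;
}

// Risky form: make_reader(std::move(s), s->full_slice());
// The order in which the two arguments are evaluated is unspecified, so
// `s` may already be moved from when s->full_slice() runs.

std::string make_reader_safely(schema_ptr s) {
    // Evaluated first, while `s` still owns the schema. The reference stays
    // valid because the schema object itself is only handed over, not destroyed.
    auto&& slice = s->full_slice();
    return make_reader(std::move(s), slice);
}
```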
Fixes #5419.
Exception messages contain semaphore's name (provided in ctor).
This affects the queue overflow exception as well as timeout
exception. Also, custom throwing function in ctor was changed
to `prethrow_action', i.e. metrics can still be updated there but
now callers have no control over the type of the exception being
thrown. This affected `restricted_reader_max_queue_length' test.
`reader_concurrency_semaphore'-s docs are updated accordingly.
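
A minimal sketch of the resulting behaviour, under the assumption of a simplified, non-Seastar class: named_semaphore, its constructor parameters and the exception text below are invented for illustration and are not the real reader_concurrency_semaphore API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified semaphore that only models the wait queue limit.
class named_semaphore {
    std::string _name;
    std::size_t _max_queue_length;
    std::size_t _queued = 0;
    std::function<void()> _prethrow_action;
public:
    named_semaphore(std::string name, std::size_t max_queue_length,
                    std::function<void()> prethrow_action = nullptr)
        : _name(std::move(name))
        , _max_queue_length(max_queue_length)
        , _prethrow_action(std::move(prethrow_action)) {
        assert(!_name.empty());
    }

    void enqueue() {
        if (_queued >= _max_queue_length) {
            // The caller may update metrics here, but cannot choose the
            // type of the exception that is thrown afterwards.
            if (_prethrow_action) {
                _prethrow_action();
            }
            throw std::runtime_error(_name + ": queue overloaded");
        }
        ++_queued;
    }
};
```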
"
This patch series adds only UDF support, UDA will be in the next patch series.
With this all CQL types are mapped to Lua. Right now we set up a new
Lua state and copy the values for each argument and return value. This
will be optimized once profiled.
We require --experimental to enable UDF in case there is some change
to the table format.
"
* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
Lua: Document the conversions between Lua and CQL
Lua: Implement decimal subtraction
Lua: Implement decimal addition
Lua: Implement support for returning decimal
Lua: Implement decimal to string conversion
Lua: Implement decimal to floating point conversion
Lua: Implement support for decimal arguments
Lua: Implement support for returning varint
Lua: Implement support for returning duration
Lua: Implement support for duration arguments
Lua: Implement support for returning inet
Lua: Implement support for inet arguments
Lua: Implement support for returning time
Lua: Implement support for time arguments
Lua: Implement support for returning timeuuid
Lua: Implement support for returning uuid
Lua: Implement support for uuid and timeuuid arguments
Lua: Implement support for returning date
Lua: Implement support for date arguments
Lua: Implement support for returning timestamp
...
Adds per-table metrics for counting partition and row reuse
in memtables. New metrics are as follows:
- memtable_partition_writes - number of write operations performed
on partitions in memtables,
- memtable_partition_hits - number of write operations performed
on partitions that previously existed in a memtable,
- memtable_row_writes - number of row write operations performed
in memtables,
- memtable_row_hits - number of row write operations that overwrote
rows previously present in a memtable.
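
To make the meaning of the four counters concrete, here is a toy sketch. The memtable below is a plain nested std::map invented for illustration, not the real memtable; only the counting rules follow the list above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct memtable_stats {
    std::uint64_t partition_writes = 0;
    std::uint64_t partition_hits = 0;
    std::uint64_t row_writes = 0;
    std::uint64_t row_hits = 0;
};

// Toy memtable: partition key -> (clustering key -> value).
class memtable {
    using rows_type = std::map<std::string, int>;
    std::map<std::string, rows_type> _partitions;
    memtable_stats _stats;
public:
    void apply(const std::string& pk, const std::string& ck, int value) {
        assert(!pk.empty());
        ++_stats.partition_writes;
        auto p = _partitions.emplace(pk, rows_type{});
        if (!p.second) {
            ++_stats.partition_hits;   // the partition already existed
        }
        ++_stats.row_writes;
        auto r = p.first->second.emplace(ck, value);
        if (!r.second) {
            ++_stats.row_hits;         // the row already existed, overwrite it
            r.first->second = value;
        }
    }
    const memtable_stats& stats() const { return _stats; }
};
```

A high hits-to-writes ratio suggests that writes mostly land on data that is still in the memtable.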
Tests: unit(release)
With this it is possible to create user defined functions and
aggregates and they are saved to disk and the schema change is
propagated.
It is just not possible to call them yet.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Schema changes can have a big effect on performance, so they should
typically be rare events.
It is useful to monitor how frequently the schema changes.
This patch adds a counter that increases each time the schema changes.
After this patch the metrics would look like:
scylla_database_schema_changed{shard="0",type="derive"} 2
Fixes #4785
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.
The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns the backlog needed to
create a certain number of shares. That function scans the existing
control points, but when we run without the controller there are no
control points and we crash.
Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.
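
A minimal sketch of the guard, assuming a simplified controller that maps backlog to shares through linearly interpolated control points. The class layout, the interpolation and the value returned in the disabled case are assumptions of this sketch, not a copy of the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct control_point {
    float input;   // backlog
    float output;  // shares
};

class backlog_controller {
    // Empty when the controller is disabled.
    std::vector<control_point> _control_points;
public:
    explicit backlog_controller(std::vector<control_point> points)
        : _control_points(std::move(points)) {
        assert(_control_points.empty() || _control_points.size() >= 2);
    }

    float backlog_of_shares(float shares) const {
        // No control points means the controller is disabled. The value is
        // immaterial in that case, so return right away instead of scanning.
        if (_control_points.empty()) {
            return 1.0f;
        }
        std::size_t idx = 1;
        while (idx < _control_points.size() - 1 && _control_points[idx].output < shares) {
            ++idx;
        }
        const control_point& cp = _control_points[idx];
        const control_point& last = _control_points[idx - 1];
        // Invert the linear interpolation between the two surrounding points.
        return last.input + (shares - last.output) * (cp.input - last.input) / (cp.output - last.output);
    }
};
```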
Fixes #5016
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch silences those future discard warnings where it is clear that
discarding the future was actually the intent of the original author,
*and* they took the necessary precautions (handling errors). The patch
also adds some trivial error handling (logging the error) in some
places that lacked it but otherwise look OK. No functional
changes.
If a schema was created before computed columns were implemented,
its token column may not have been marked as computed.
To remedy this, if no computed column is found, the schema
will be recreated.
The code will work correctly even without this patch in order to support
upgrading from legacy versions, but it's still important: it transforms
token columns from the legacy format to new computed format, which will
eventually (after a few release cycles) allow dropping the support for
legacy format altogether.
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.
That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of
partition we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of order, violating
assumptions of the consumers.
This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.
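
The effect can be reproduced with a toy model. Everything below is hypothetical: rows are plain integers, the slice is reduced to a starting key, and read_with_eviction() simulates one eviction followed by a reader recreation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slice: only rows with key >= first_key are read.
struct partition_slice {
    int first_key = 0;
};

// Creates a "reader" over sorted row keys and reads up to `limit` rows,
// honouring the slice passed in.
std::vector<int> create_reader_and_read(const std::vector<int>& rows,
                                        partition_slice slice, std::size_t limit) {
    std::vector<int> out;
    for (int key : rows) {
        if (key >= slice.first_key && out.size() < limit) {
            out.push_back(key);
        }
    }
    return out;
}

// Reads all rows in two steps, simulating an eviction after `limit` rows.
// With honour_slice == false the recreated reader starts from the full slice.
std::vector<int> read_with_eviction(const std::vector<int>& rows,
                                    std::size_t limit, bool honour_slice) {
    assert(limit > 0);
    auto out = create_reader_and_read(rows, partition_slice{}, limit);
    partition_slice resume;
    if (honour_slice && !out.empty()) {
        resume.first_key = out.back() + 1;  // continue after the last row seen
    }
    auto rest = create_reader_and_read(rows, resume, rows.size());
    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}
```

With the slice ignored, the rows read before the eviction are emitted a second time, which is the kind of duplication described above.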
Fixes #4659.
In v2:
- Added an overload without partition_slice to avoid changing existing users which never slice
Tests:
- unit (dev)
- manual (3 node ccm + repair)
Backport: 3.1
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
Change the type from bool to updateable_value<bool> throughout the dependency
chain and mark it as live updateable.
In theory we should also observe the value and trigger compaction if it changes,
but I don't think it is worthwhile.
Copying the config object breaks the link between the original and the copied
object, so updates to config items will not be visible. To allow updates, don't
copy any more, and instead keep a pointer.
The pointer won't work well once config is updateable, since the same object is
shared across multiple shards, but that can be addressed later.
Currently, database::_cfg is a copy of the global configuration. But this means
that we have multiple master copies of the configuration, which makes updating
the configuration harder. In order to eliminate the copy we have to eliminate the
database default constructor, which creates a config object, so that all
remaining constructors can receive config by reference and retain that reference.
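
The difference between holding a copy and holding a reference can be sketched like this. The config struct, its some_flag member and both database classes are hypothetical and exist only to show the ownership difference.

```cpp
#include <cassert>

// Hypothetical configuration with a single updateable item.
struct config {
    bool some_flag = false;
};

// Keeps its own copy: later updates to the original are not visible.
class database_with_copy {
    config _cfg;
public:
    explicit database_with_copy(const config& cfg) : _cfg(cfg) {}
    bool some_flag() const { return _cfg.some_flag; }
};

// Keeps a reference: there is a single master copy of the configuration.
// The referenced config must outlive the database object.
class database_with_reference {
    const config& _cfg;
public:
    explicit database_with_reference(const config& cfg) : _cfg(cfg) {}
    bool some_flag() const { return _cfg.some_flag; }
};

// Small demonstration of the difference.
inline void update_is_visible_only_through_reference() {
    config cfg;
    database_with_copy copied(cfg);
    database_with_reference referenced(cfg);
    cfg.some_flag = true;
    assert(!copied.some_flag());
    assert(referenced.some_flag());
}
```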
This patch adds a warning option that alerts the user in situations where
the row count may grow larger than initially designed for. Through the
warning, users can become aware of possible data modeling problems.
The threshold is initially set to '100,000'.
Tests: unit (dev)
Message-Id: <20190528075612.GA24671@shenzou.localdomain>
To prepare for a seastar change that adds an optional file_permissions
parameter to touch_directory and recursive_touch_directory.
This change messes up the call to io_check since the compiler can't
derive the Func&& argument. Therefore, use a lambda function instead
to wrap the call to {recursive_,}touch_directory.
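
A minimal sketch of the problem and the workaround. io_check and touch_directory below are simplified stand-ins, and modelling the new optional parameter as a second overload is an assumption made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for io_check: just invokes func(args...).
template <typename Func, typename... Args>
auto io_check(Func&& func, Args&&... args) {
    return std::forward<Func>(func)(std::forward<Args>(args)...);
}

// Stand-ins for touch_directory; each returns the permissions it would use.
int touch_directory(const std::string& name, int permissions) {
    assert(!name.empty());
    return permissions;
}
int touch_directory(const std::string& name) {
    return touch_directory(name, 0755);
}

int touch_directory_checked(const std::string& name) {
    // io_check(touch_directory, name) does not compile here: the name refers
    // to an overload set, so Func cannot be deduced from it.
    // The lambda has a single type, and overload resolution happens inside it.
    return io_check([&name] { return touch_directory(name); });
}
```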
Ref #4395
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>
1. All nodes in the cluster have to support MC_SSTABLE_FEATURE
2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE
then it should start using MC format.
3. Once all shards start to use MC then a node should broadcast that
unbounded range tombstones are now supported by the cluster.
4. Once whole cluster supports unbounded range tombstones we can
start accepting them on CQL level.
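
The sequence above can be sketched with a small callback-based feature class. This is a single-node, single-shard toy: the feature class, when_enabled() and the node members are simplified assumptions and do not reproduce the gossip or per-shard logic.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy cluster feature: callbacks run once the feature becomes enabled.
class feature {
    bool _enabled = false;
    std::vector<std::function<void()>> _callbacks;
public:
    void when_enabled(std::function<void()> cb) {
        assert(cb);
        if (_enabled) {
            cb();
        } else {
            _callbacks.push_back(std::move(cb));
        }
    }
    void enable() {
        if (_enabled) {
            return;
        }
        _enabled = true;
        auto callbacks = std::move(_callbacks);
        _callbacks.clear();
        for (auto& cb : callbacks) {
            cb();
        }
    }
    bool enabled() const { return _enabled; }
};

enum class sstable_format { la, mc };

class node {
    feature _mc_sstable_feature;
    feature _unbounded_range_tombstones_feature;
    sstable_format _format = sstable_format::la;
    bool _advertises_unbounded_range_tombstones = false;
public:
    node() {
        // Step 2 and 3: switch format, then advertise the follow-up feature.
        _mc_sstable_feature.when_enabled([this] {
            _format = sstable_format::mc;
            _advertises_unbounded_range_tombstones = true;
        });
    }
    node(const node&) = delete;
    node& operator=(const node&) = delete;

    // Called when the whole cluster is seen to support the given feature.
    void on_cluster_supports_mc() { _mc_sstable_feature.enable(); }
    void on_cluster_supports_unbounded_range_tombstones() {
        _unbounded_range_tombstones_feature.enable();
    }

    sstable_format format() const { return _format; }
    bool advertises_unbounded_range_tombstones() const {
        return _advertises_unbounded_range_tombstones;
    }
    // Step 4: CQL may accept them only once the whole cluster supports them.
    bool accepts_unbounded_range_tombstones() const {
        return _unbounded_range_tombstones_feature.enabled();
    }
};
```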
tests: unit(release)
Fixes #4205
Fixes #4113
* seastar-dev.git dev/haaawk/enable_mc/v11:
system_keyspace: Add scylla_local
system_keyspace: add accessors for SCYLLA_LOCAL
storage_service: add _sstables_format field
feature: add when_enabled callbacks
system_keyspace: add storage_service param to setup
Add sstable format helper methods
Register feature listeners in storage_service
Add service::read_sstables_format
Use read_sstables_format in main.cc
Use _sstables_format to determine current format
Add _unbounded_range_tombstones_feature
Update supported features on format change
Before this patch raw_builder would always start with an empty list of
user types. This means that every time a type is added to a keyspace,
every type in that keyspace needs to be recreated.
With this patch we pass a keyspace_metadata instead of just the
keyspace name and can construct new user types on top of previous
ones.
This will be used in the followup patch, where only new types are
created.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The goal of the sstables manager is to track and manage sstables life-cycle.
There is one sstables manager instance per database and it is passed to each column-family
(and test environment) on construction.
All sstables created, loaded, and deleted pass through the sstables manager.
The manager will make sure consumers of sstables are in sync so that sstables
will not be deleted while in use.
Refs #4149
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Currently any large partitions found during shutdown are not
recorded. The reason is that the database commit log is already off,
so there is nowhere to record it to.
One possible solution is to have an independent system database. With
that the regular db is shut down first and writes can continue to the
system db.
That is a pretty big change. It would also not allow us to record
large partitions in any system tables.
This patch series instead tries to stop the commit log later. With
that any large partitions are recorded to the log and moved to a
sstable on the next startup.
"
* 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla:
db: stop the commit log after the tables during shutdown
db: stop the compaction manager earlier
db: Add a stop_database helper
db: Don't record large partitions in system tables
Truncating a table is very slow if the system is under pressure. Because
in that case we mostly just want to get rid of the existing data, it
shouldn't take this long. The problem happens because truncate has to
wait for memtable flushes to end, twice. This is regardless of whether
or not the table being truncated has any data.
1. The first time is when we call truncate itself:
if auto_snapshot is enabled, we will flush the contents of this table
first and we are expected to be slow. However, even if auto_snapshot is
disabled we will still do it -- which is a bug -- if the table is marked
as durable. We should just not flush in this case and it is a silly bug.
2. The second time is when we call cf->stop(). Stopping a table will
wait for a flush to finish. At this point, regardless of which path
(durable or non-durable) we took in the previous step we will have no
more data in the table. However, calling `flush()` still needs to acquire
a flush_permit, which means we will wait for whichever memtable is
flushing at that very moment to end.
If the system is under pressure and a memtable flush takes many
seconds, so will truncate. Even if auto_snapshots are enabled, we
shouldn't have to flush twice. The first flush should already put us in
a state in which the next one is immediate (maybe holding on to the
permit, maybe destroying the memtable_list already at that point ->
since no other memtables should be created).
If auto_snapshots are not enabled, the whole thing should just be
instantaneous.
This patchset fixes that by removing the flush need when !auto_snapshot,
and special casing the flush of an empty table.
Fixes #4294
* git@github.com:glommer/scylla.git slowtruncate-v2:
database: immediately flush tables with no memtables.
truncate: do not flush memtables if auto_snapshot is false.
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We want to finish all large data logging in stop_system, so stopping
the compaction manager should be the first thing stop_system does.
The make_ready_future<>() will be removed in a followup patch.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>