The mapping between a base table update and a view update is schema
dependent, so we need to ensure the view schema versions match the
base schema version. For example, we match base columns to view
columns by name, so we need to ensure the base and view schemas we're
using for writing are isolated with respect to a previous alter
table statement.
We thus need to match base schema versions with view schema versions,
and we need to do so atomically to ensure that when one fiber sees a
schema, it also sees the complete set of corresponding view schemas.
This series ensures the schemas modified as a result of an alter
table statement are published atomically, under the schema lock. This
way, all the schemas referenced by the database are consistent with
each other when they are observed by other fibers.
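The publication rule above can be modelled as an immutable snapshot that is swapped in a single step, so that a reader sees either the old base and view versions or the new ones, never a mix. This is a minimal sketch, not Scylla's schema code. schema_set, schema_registry and the std::mutex are hypothetical stand-ins. Seastar fibers on a shard are cooperative, so the actual change relies on publishing without deferring rather than on a thread mutex.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical, simplified model: a base schema version together with the
// schema versions of the views derived from it.
struct schema_set {
    int base_version = 0;
    std::map<std::string, int> view_versions; // view name -> schema version
};

class schema_registry {
    std::mutex _lock; // stands in for the schema lock
    std::shared_ptr<const schema_set> _current = std::make_shared<const schema_set>();
public:
    // Publish the base schema and all affected views in one step, so a reader
    // can never observe a new base version paired with stale view versions.
    void publish(schema_set updated) {
        auto next = std::make_shared<const schema_set>(std::move(updated));
        std::lock_guard<std::mutex> guard(_lock);
        _current = std::move(next);
    }

    // Readers take a snapshot; base and view versions inside it always match.
    std::shared_ptr<const schema_set> snapshot() {
        std::lock_guard<std::mutex> guard(_lock);
        return _current;
    }
};
```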
Finally, we upgrade the mutation to the current schema before
generating the view updates, so that it matches the most recent view
schemas known to the base replica, as registered in the database.
The db::view::view class was replaced by a set of non-member
functions, and its state, which used to reflect only the most recent
schema version, was moved to a new view_info class.
One of the goals of can_allocate_more_memory() is to prevent depleting
seastar's free memory close to its minimum, leaving headroom above
that minimum so that standard allocations will not cause reclamation
immediately. Currently the function doesn't take into account the actual
threshold used by the seastar allocator, so there could be no gap at all,
or free memory could even drop below the minimum.
Fix that by ensuring there's always a gap above min_free_memory().
min_gap was reduced to 1 MiB so that low memory setups are not
impacted significantly by the change.
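A minimal sketch of the headroom check, assuming free memory, seastar's minimum and the size of the requested segment are passed in as plain numbers. The real function reads these values from the allocator and its exact form may differ; only min_gap = 1 MiB comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone version of the check; the inputs are explicit
// parameters instead of queries to seastar's allocator.
constexpr std::size_t MiB = 1024 * 1024;
constexpr std::size_t min_gap = 1 * MiB; // headroom kept above the low water mark

bool can_allocate_more_memory(std::size_t free_memory,
                              std::size_t min_free_memory,
                              std::size_t segment_size) {
    // Allow the allocation only if, after taking one more segment, free
    // memory still stays at least min_gap above seastar's minimum.
    return free_memory >= min_free_memory + min_gap + segment_size;
}
```

With a 20 MiB minimum, 21 MiB of free memory is no longer enough for a 128 KiB segment, because the gap would shrink below 1 MiB.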
Message-Id: <1489667863-15099-1-git-send-email-tgrabiec@scylladb.com>
* seastar 4d25b85...6b21197 (3):
> core: memory: Expose control of the free memory low water mark
> scripts: add perftune.py
> tutorial: make network examples work on multi-core
"The test allocates objects in batches (allocation is always under a reclaim
lock) of ~3MiB and assumes that it will always succeed because if we cross the
low water mark for free memory (20MiB) in seastar, reclamation will be
performed between the batches, asynchronously.
Unfortunately that's prevented by can_allocate_more_memory(), which fails
segment allocation when we're below the low water mark. LSA currently doesn't
allow allocating below the low water mark.
The solution which is employed across the code base is to use allocating_section,
so use it here as well.
Exposed by recent consistent failures on branch-1.7."
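The allocating_section approach mentioned above can be sketched as a retry loop: reserve memory first, run the allocating code, and on std::bad_alloc grow the reserve and try again. allocating_section_model and its parameters are hypothetical; this is a simplified model of the pattern, not LSA's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Simplified, hypothetical model of an allocating section.
class allocating_section_model {
    std::size_t _reserve;
    std::size_t _max_reserve;
public:
    allocating_section_model(std::size_t initial, std::size_t max)
        : _reserve(initial), _max_reserve(max) {}

    template <typename Reserve, typename Func>
    auto operator()(Reserve&& do_reserve, Func&& func) {
        while (true) {
            do_reserve(_reserve); // make room before allocating
            try {
                return func(); // in LSA this would run under a reclaim lock
            } catch (const std::bad_alloc&) {
                if (_reserve >= _max_reserve) {
                    throw; // give up once the reserve cannot grow further
                }
                // Retry with a larger reserve.
                _reserve = _reserve == 0 ? 1 : _reserve * 2;
            }
        }
    }

    std::size_t current_reserve() const { return _reserve; }
};
```

The reserve found to be sufficient is remembered, so later batches start with it instead of failing again.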
* 'tgrabiec/fix-lsa-async-eviction-test' of github.com:cloudius-systems/seastar-dev:
tests: lsa_async_eviction_test: Allocate objects under allocating section
lsa: Allow adjusting reserves in allocating_section
This patch ensures we upgrade the mutation to the current schema when
generating and pushing view updates, so that it matches the most
up-to-date views.
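A minimal sketch of upgrading a mutation to the current schema before generating view updates, matching columns by name as described earlier. schema, mutation and upgrade() here are hypothetical simplified types, not Scylla's classes; a real upgrade also has to handle column type changes and other metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types: a schema is a version plus an ordered list of column
// names, and a mutation stores one optional value per column index.
struct schema {
    int version;
    std::vector<std::string> columns;
};

struct mutation {
    const schema* s;
    std::vector<std::optional<std::string>> cells; // indexed like s->columns
};

// Re-express m in terms of `to`, matching columns by name. Dropped columns
// are discarded; newly added columns are left empty.
mutation upgrade(const mutation& m, const schema& to) {
    if (m.s->version == to.version) {
        return m;
    }
    std::map<std::string, std::size_t> old_index;
    for (std::size_t i = 0; i < m.s->columns.size(); ++i) {
        old_index[m.s->columns[i]] = i;
    }
    mutation result{&to, std::vector<std::optional<std::string>>(to.columns.size())};
    for (std::size_t i = 0; i < to.columns.size(); ++i) {
        auto it = old_index.find(to.columns[i]);
        if (it != old_index.end()) {
            result.cells[i] = m.cells[it->second];
        }
    }
    return result;
}
```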
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures that the schema merging atomically publishes
schema changes. In particular, it ensures that when a base schema
and a subset of its views are modified together (i.e., upon an alter
table or alter type statement), then they are published together as
well, without any deferring in-between.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the migration path for table updates such that the
base table mutations are sent and applied atomically with the view
schema mutations.
This ensures that after schema merging, we have a consistent mapping
of base table versions to view table versions, which will be used in
later patches.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The write path uses a base schema at a particular version, and we
want it to use the materialized views at the corresponding version.
To achieve this, we need to map the state currently in db::view::view
to a particular schema version, which this patch does by introducing
the view_info class to hold the state previously in db::view::view,
and by having a view schema directly point to it.
The changes in the patch are thus:
1) Introduce view_info to hold the extra view state;
2) Point to the view_info from the schema;
3) Make the functions in the now stateless db::view::view non-member;
4) Remove the db::view::view class.
All changes are structural and don't affect current behavior.
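The resulting layout can be sketched as follows: each view schema version points to its own view_info, and operations that used to live on the stateful class become free functions taking the schema. The fields of view_info and the is_view_of() helper are hypothetical, chosen only to illustrate the structure.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical per-version view state.
struct view_info {
    std::string base_table;
    std::string where_clause;
};

struct schema {
    std::string name;
    int version;
    std::shared_ptr<const view_info> view; // null for regular tables
    bool is_view() const { return view != nullptr; }
};

// Formerly a member of a stateful class; now a stateless function that reads
// the view_info attached to the given schema version.
bool is_view_of(const schema& view_schema, const std::string& base) {
    return view_schema.is_view() && view_schema.view->base_table == base;
}
```

Because the state hangs off the schema, code holding an older schema version keeps seeing the view state that belongs to that version.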
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
In preparation for upcoming patches, which will deal with
moving the state in db::view::view to view_info.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"The current implementations of collection_type_impl::is_empty() and
collection_type_impl::difference() don't handle tombstoned collection
mutations correctly. In particular:
- is_empty() considers a collection mutation with a tombstone (and no
entries) as empty;
- difference() doesn't compute the set difference between the cells'
tombstones, and always returns the highest one.
Fixes #2152"
* 'collection-diff/v4' of github.com:duarten/scylla:
mutation_test: Add more test cases for difference()
mutation_source_test: Randomly generate collection cells
collection_type_impl: Use set difference for tombstones
collection_type_impl: A mutation with a tombstone is not empty
This patch fixes collection_type_impl::difference() so it does set
difference for tombstones instead of just returning the larger
one, as difference() is supposed to return only the information in
mutation A that supersedes that in B, given difference(A, B).
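A simplified sketch of these difference(A, B) semantics: A's tombstone is kept only if it supersedes B's, and a cell is kept only if B lacks it or holds an older version. The types are hypothetical, tombstones are reduced to a timestamp, and shadowing of cells by tombstones is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical simplified collection mutation.
struct cell {
    int64_t timestamp;
    std::string value;
};

struct collection_mutation {
    std::optional<int64_t> tomb;       // collection tombstone timestamp, if any
    std::map<std::string, cell> cells; // element key -> cell
};

// difference(a, b): keep only the information in a that supersedes b.
collection_mutation difference(const collection_mutation& a, const collection_mutation& b) {
    collection_mutation diff;
    // Keep a's tombstone only if it is newer than b's (or b has none),
    // rather than always returning the larger of the two.
    if (a.tomb && (!b.tomb || *a.tomb > *b.tomb)) {
        diff.tomb = a.tomb;
    }
    for (const auto& [key, c] : a.cells) {
        auto it = b.cells.find(key);
        if (it == b.cells.end() || c.timestamp > it->second.timestamp) {
            diff.cells.emplace(key, c);
        }
    }
    return diff;
}
```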
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the collection_type_impl::is_empty() function so
that it no longer considers a collection_mutation that has a
tombstone to be empty.
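A minimal sketch of the corrected emptiness check, using a hypothetical simplified collection_mutation with an optional tombstone timestamp: a mutation that only carries a tombstone still deletes data, so it must not be treated as empty.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical simplified collection mutation.
struct collection_mutation {
    std::optional<int64_t> tomb;              // collection tombstone timestamp, if any
    std::map<std::string, std::string> cells; // element key -> value
};

// Empty only if there is neither a tombstone nor any cell.
bool is_empty(const collection_mutation& m) {
    return !m.tomb && m.cells.empty();
}
```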
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Discarding blocks on a large RAID volume takes too much time, and the user
may suspect the script isn't working correctly, so it's better to skip that
and discard directly on each volume instead.
Fixes #1896
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1489533460-30127-1-git-send-email-syuu@scylladb.com>
Fixes #2098
Replay previously processed all segments in parallel on shard 0, which
caused heavy memory load. To reduce this and spread the memory footprint
across shards, replay now does X segments per shard, sequentially on each shard.
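A sketch of how segments might be split so that each shard replays its own share one segment at a time. Round-robin assignment, assign_segments() and replay_sequentially() are assumptions for illustration; the patch text does not say how segments are assigned to shards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: distribute commitlog segments round-robin across
// shards instead of replaying all of them on shard 0.
std::vector<std::vector<std::string>> assign_segments(const std::vector<std::string>& segments,
                                                      std::size_t shard_count) {
    assert(shard_count > 0);
    std::vector<std::vector<std::string>> per_shard(shard_count);
    for (std::size_t i = 0; i < segments.size(); ++i) {
        per_shard[i % shard_count].push_back(segments[i]);
    }
    return per_shard;
}

// Each shard replays its own list in order, one segment after another.
// In Seastar this would be a chain of futures rather than a plain loop.
template <typename Replay>
void replay_sequentially(const std::vector<std::string>& mine, Replay&& replay_one) {
    for (const auto& seg : mine) {
        replay_one(seg);
    }
}
```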
v2:
* Fixed whitespace errors
Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>
Metric names should be unique per type.
requests_blocked_memory was registered twice, once as a gauge and once as
a derived metric.
This is not allowed.
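A hypothetical registry illustrating the constraint: once a name is registered, a second registration under the same name (here with a different type) is rejected. metric_registry and metric_type are illustrative only, not Seastar's metrics API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

enum class metric_type { gauge, derive };

// Hypothetical registry: each metric name may be registered only once.
class metric_registry {
    std::map<std::string, metric_type> _metrics;
public:
    void add(const std::string& name, metric_type type) {
        if (!_metrics.emplace(name, type).second) {
            throw std::invalid_argument("metric already registered: " + name);
        }
    }

    bool contains(const std::string& name) const { return _metrics.count(name) > 0; }
};
```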
Fixes #2165
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314162826.25521-1-amnon@scylladb.com>
* seastar fd29fd0...4d25b85 (2):
> core/file: fix EOF detection for file with custom impl
> tutorial: fix echo server example
Includes patch from Raphael updating checked_file_impl:
"Now file_impl requires dma_read_bulk to be implemented, and for
checked_file_impl, it's only a matter of calling dma_read_bulk on
the posix file it wraps."
Metrics should have unique names. This patch renames the queue-length
metric throttled_writes to current_throttled_writes.
Without it, metrics will be reported twice under the same name, which
may cause errors in the Prometheus server.
This could be related to scylladb/seastar#250
Fixes #2163.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314081456.6392-1-amnon@scylladb.com>
* seastar 84a0b70...fd29fd0 (4):
> Fix smp::submit_to() with function reference
> execution_stage: add concept restraint for operator()
> core/temporary_buffer: Add operator==()
> map_reduce: allow reducer to take accumulated value by rref
"This replaces the use of a generic forwarding wrapper in the sstable reader
with a specialized implementation. In this series, forwarding doesn't use
indexes yet; it is only integrated with mp_row_consumer, which is a
prerequisite. It's still an optimization, since mp_row_consumer will no
longer try to consume past the range as it used to.
Sending early for easier consumption."
* tag 'tgrabiec/forwarding-of-mp-row-consumer-v2' of github.com:scylladb/seastar-dev:
sstables: Remove use of forwarding wrapper
sstables: Implement sstable_streamed_mutation::fast_forward_to()
sstables: Extract and use clustering_ranges_walker
tests: sstables: Add test for handling of repeated tombstones
sstables: Extract writer parameters into config objects
tests: Move as_mutation_source() helper to header
tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions
streamed_mutation: Add streamed_mutation_returning() helper
tests: mutation_source_test: Add test case for forwarding to a full range
tests: simple_schema: Add fragment factories
tests: Extract simple_schema
sstables: Move workaround for out-of-order range tombstones to mp_row_consumer
sstables: Drop default mp_row_consumer constructor
sstables: Swap order of values in "proceed" so that "no" is assigned 0
util/optimized_optional: Make printable
position_in_partition: Add is_static_row() in the view
range_tombstone_stream: Add reset()
range_tombstone_stream: Add get_next(position_in_partition_view)
sstables: streamed_mutation: Stop reading when end of slice reached
sstables: Switch is_in_range() to position_in_partition
Handling of forwarding is done inside mp_row_consumer, because it
allows us to filter out irrelevant data sooner and thus more
efficiently.
Because the static row can now be skipped as well, _skip_clustering_row
was renamed to the more generic _skip_in_progress.
This is a preliminary step before adding support for fast-forwarding
to mp_row_consumer, so that range handling can be solely in
mp_row_consumer rather than split between it and
sstable_streamed_mutation.
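A simplified sketch of the kind of range walking mp_row_consumer can do once it owns range handling: positions are checked against sorted clustering ranges, and once the last range has been passed the consumer can stop reading. ranges_walker, range and the int positions are hypothetical simplifications, not the clustering_ranges_walker interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical inclusive clustering range over int positions.
struct range {
    int start;
    int end;
};

// Walks sorted, non-overlapping ranges. Positions must be fed in
// non-decreasing order, as a reader consuming a partition produces them.
// The caller must keep the ranges vector alive while the walker is used.
class ranges_walker {
    const std::vector<range>& _ranges;
    std::size_t _current = 0;
public:
    explicit ranges_walker(const std::vector<range>& ranges) : _ranges(ranges) {}

    // Returns true if pos falls inside a range; ranges ending before pos are
    // dropped and never revisited.
    bool advance_to(int pos) {
        while (_current < _ranges.size() && _ranges[_current].end < pos) {
            ++_current;
        }
        return _current < _ranges.size() && _ranges[_current].start <= pos;
    }

    // True once every range has been passed, so reading can stop.
    bool out_of_range() const { return _current == _ranges.size(); }
};
```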
This also alleviates #2080 by reading all tombstones only up to the
first row; after that, range tombstones are treated like other
fragments.
As part of this change, skip detection is refactored. This
simplifies reasoning about mp_row_consumer's state a bit, because
is_mutation() is no longer reset externally and depends only on the
current position of the reader.
It will prove useful when we extend the mutation reader to decide whether
it should skip to the next partition up front, before calling
_context.read(), so that we can, for instance, skip using the index instead.
Fixes #2088.