Commit Graph

11551 Commits

Author SHA1 Message Date
Tomasz Grabiec
bb0ce5d8fe Merge "Ensure base and view schema versions match" from Duarte
The mapping between a base table update and a view update is schema
dependent, so we need to ensure the view schema versions match the
base schema version. For example, we match base columns to view
columns by name, so we need to ensure the base and view schemas we're
using for writing are isolated with respect to a previous alter
table statement.

We thus need to match base schema versions with view schema versions,
and we need to do so atomically to ensure that when one fiber sees a
schema, it also sees the complete set of corresponding view schemas.
This series ensures the schemas modified as a result of an alter
table statement are published atomically, under the schema lock. This
way, all the schemas referenced by the database are consistent with
each other when they are observed by other fibers.

Finally, we upgrade the mutation schema before generating the view
updates, to ensure it matches the most recent view schemas the base
replica knows about, registered in the database.

The db::view::view class was replaced by a set of non-member
functions, with its state, which used to reflect only the most recent
schema version, being moved to a new view_info class.
2017-03-17 12:40:00 +01:00
Duarte Nunes
b27da688f9 mutation: Remove dead get_cell() function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170316234843.23130-1-duarte@scylladb.com>
2017-03-17 11:18:23 +02:00
Pekka Enberg
91b9e0d914 Update scylla-ami submodule
* dist/ami/files/scylla-ami eedd12f...407e8f3 (1):
  > scylla_create_devices: check block device is exists

Fixes #2171
2017-03-17 11:13:07 +02:00
Tomasz Grabiec
3609665b19 lsa: Fix debug-mode compilation error
By moving definitions of setters out of #ifdef
2017-03-16 18:23:05 +01:00
Tomasz Grabiec
88e7b3ff79 lsa: Ensure can_allocate_more_memory() always leaves a gap above seastar's min_free_memory()
One of the goals of can_allocate_more_memory() is to prevent depleting
seastar's free memory close to its minimum, leaving headroom above
that minimum so that standard allocations will not cause reclamation
immediately. Currently the function doesn't take into account the actual
threshold used by the seastar allocator, so there could be no gap, or
free memory could even go below the minimum.

Fix that by ensuring there's always a gap above min_free_memory().

min_gap was reduced to 1 MiB so that low memory setups are not
impacted significantly by the change.
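
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual LSA code: the function name can_allocate_more_memory_sketch and its parameters are hypothetical, and the real function obtains these values from the seastar allocator rather than taking them as arguments.

```cpp
#include <cstddef>

// Gap kept above the allocator's minimum; 1 MiB as stated in the message.
constexpr std::size_t min_gap = 1 * 1024 * 1024;

// Hypothetical sketch: allow further allocation only while free memory
// stays strictly above min_free_memory plus the gap.
inline bool can_allocate_more_memory_sketch(std::size_t free_memory,
                                            std::size_t min_free_memory) {
    return free_memory > min_free_memory + min_gap;
}
```

With a 20 MiB minimum, this sketch refuses allocation once free memory drops to 21 MiB or less.
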
Message-Id: <1489667863-15099-1-git-send-email-tgrabiec@scylladb.com>
2017-03-16 12:42:50 +00:00
Tomasz Grabiec
17ede24a77 Update seastar submodule
* seastar 4d25b85...6b21197 (3):
  > core: memory: Expose control of the free memory low water mark
  > scripts: add perftune.py
  > tutorial: make network examples work on multi-core
2017-03-16 13:32:45 +01:00
Pekka Enberg
3afd7f39b5 cql3: Wire up functions for floating-point types
Fixes #2168
Message-Id: <1489661748-13924-1-git-send-email-penberg@scylladb.com>
2017-03-16 11:04:59 +00:00
Avi Kivity
434a4fee28 Merge "tests: Use allocating_section in lsa_async_eviction_test" from Tomasz
"The test allocates objects in batches (allocation is always under a reclaim
lock) of ~3MiB and assumes that it will always succeed because if we cross the
low water mark for free memory (20MiB) in seastar, reclamation will be
performed between the batches, asynchronously.

Unfortunately that's prevented by can_allocate_more_memory(), which fails
segment allocation when we're below the low water mark. LSA currently doesn't
allow allocating below the low water mark.

The solution which is employed across the code base is to use allocating_section,
so use it here as well.

Exposed by recent consistent failures on branch-1.7."

* 'tgrabiec/fix-lsa-async-eviction-test' of github.com:cloudius-systems/seastar-dev:
  tests: lsa_async_eviction_test: Allocate objects under allocating section
  lsa: Allow adjusting reserves in allocating_section
2017-03-16 12:44:14 +02:00
Tomasz Grabiec
cefb6b604a tests: lsa_async_eviction_test: Allocate objects under allocating section
2017-03-16 10:21:10 +01:00
Tomasz Grabiec
4ab8b255da lsa: Allow adjusting reserves in allocating_section
2017-03-16 10:21:10 +01:00
Raphael S. Carvalho
6b6bb38f38 compaction_manager: stop manager after storage io error
The manager will stop itself if a compaction fails due to a storage I/O
error, which unconditionally results in the transport services being
stopped.

Fixes #2147.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170316054538.23423-1-raphaelsc@scylladb.com>
2017-03-16 10:37:47 +02:00
Duarte Nunes
876a514743 database: Upgrade mutation to current schema to push view updates
This patch ensures we upgrade the mutation to the current schema when
generating and pushing view updates, so that it matches the most
up to date views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 18:15:27 +01:00
Duarte Nunes
be12a2bf0a db/schema_tables: Atomically publish base and view changes
This patch ensures that the schema merging atomically publishes
schema changes. In particular, it ensures that when a base schema
and a subset of its views are modified together (i.e., upon an alter
table or alter type statement), then they are published together as
well, without any deferring in-between.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 16:35:07 +01:00
Duarte Nunes
e215f25b11 migration_manager: Atomically migrate table and views
This patch changes the migration path for table updates such that the
base table mutations are sent and applied atomically with the view
schema mutations.

This ensures that after schema merging, we have a consistent mapping
of base table versions to view table versions, which will be used in
later patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 16:03:56 +01:00
Duarte Nunes
bfb8a3c172 materialized views: Replace db::view::view class
The write path uses a base schema at a particular version, and we
want it to use the materialized views at the corresponding version.

To achieve this, we need to map the state currently in db::view::view
to a particular schema version, which this patch does by introducing
the view_info class to hold the state previously in db::view::view,
and by having a view schema directly point to it.

The changes in the patch are thus:

1) Introduce view_info to hold the extra view state;
2) Point to the view_info from the schema;
3) Make the functions in the now stateless db::view::view non-member;
4) Remove the db::view::view class.

All changes are structural and don't affect current behavior.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 15:50:05 +01:00
Duarte Nunes
a64c47f315 schema: Move raw_view_info outside of raw_schema
In preparation for an upcoming patch, where the schema
won't directly store the raw_view_info.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 15:38:31 +01:00
Duarte Nunes
4b209be8b8 view_info: Rename to raw_view_info
In preparation for upcoming patches, which will deal with
moving the state in db::view::view to view_info.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 15:38:31 +01:00
Paweł Dziepak
dc99197318 Merge "Correctly handle tombstoned collections" from Duarte
"The current implementations of collection_type_impl::is_empty() and
collection_type_impl::difference() don't handle tombstoned collection
mutations correctly. In particular:

- is_empty() considers a collection mutation with a tombstone (and no
  entries) as empty;
- difference() doesn't do set difference between the cells' tombstones,
  and always returns the highest.

Fixes #2152"

* 'collection-diff/v4' of github.com:duarten/scylla:
  mutation_test: Add more test cases for difference()
  mutation_source_test: Randomly generate collection cells
  collection_type_impl: Use set difference for tombstones
  collection_type_impl: A mutation with a tombstone is not empty
2017-03-15 13:39:55 +00:00
Duarte Nunes
143136647a mutation_test: Add more test cases for difference()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Duarte Nunes
005e4741e3 mutation_source_test: Randomly generate collection cells
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Duarte Nunes
61741a69b6 collection_type_impl: Use set difference for tombstones
This patch fixes collection_type_impl::difference() so it does set
difference for tombstones instead of just returning the larger
one, as difference() is supposed to return only the information in
mutation A that supersedes that in B, given difference(A, B).
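
The rule for tombstones can be sketched as below. This is an assumption-laden illustration, not the collection_type_impl code: tombstone_sketch and tombstone_difference are hypothetical names, and a tombstone is reduced to a single timestamp.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical simplified tombstone carrying only a deletion timestamp.
struct tombstone_sketch {
    std::int64_t timestamp;
};

// difference(a, b) for tombstones: keep a's tombstone only when it
// supersedes b's; otherwise a contributes nothing new.
inline std::optional<tombstone_sketch>
tombstone_difference(std::optional<tombstone_sketch> a,
                     std::optional<tombstone_sketch> b) {
    if (!a) {
        return std::nullopt;
    }
    if (!b || a->timestamp > b->timestamp) {
        return a;
    }
    return std::nullopt;
}
```

Returning the larger of the two tombstones, as before the fix, would report information that B already has whenever B's tombstone is newer or equal.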

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Duarte Nunes
19fcd2d140 collection_type_impl: A mutation with a tombstone is not empty
This patch changes the collection_type_impl::is_empty() function so
that it doesn't consider empty a collection_mutation which has a
tombstone.
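
A minimal sketch of the changed emptiness rule, assuming a simplified mutation layout. The type collection_mutation_sketch and the function is_empty_sketch are hypothetical and do not mirror the real collection_mutation representation.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical simplified collection mutation: an optional tombstone
// timestamp plus a list of cell values.
struct collection_mutation_sketch {
    std::optional<std::int64_t> tombstone_timestamp;
    std::vector<int> cells;
};

// A mutation is empty only if it has neither a tombstone nor any cells.
inline bool is_empty_sketch(const collection_mutation_sketch& m) {
    return !m.tombstone_timestamp.has_value() && m.cells.empty();
}
```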

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-03-15 14:34:01 +01:00
Takuya ASADA
b65d58e90e dist/common/scripts/scylla_raid_setup: don't discard blocks at mkfs time
Discarding blocks on a large RAID volume takes too much time, and the user may
suspect the script isn't working correctly, so it's better to skip the discard
at mkfs time and discard directly on each volume instead.

Fixes #1896

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1489533460-30127-1-git-send-email-syuu@scylladb.com>
2017-03-15 13:13:57 +02:00
Calle Wilund
078589c508 commitlog_replayer: Make replay parallel per shard
Fixes #2098

Replay previously did all segments in parallel on shard 0, which
caused heavy memory load. To reduce this and spread footprint
across shards, instead do X segments per shard, sequential per shard.
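
One way to picture the distribution is the sketch below. The round-robin policy and the name assign_segments_to_shards are assumptions made for illustration; the message does not say how segments are mapped to shards.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: spread segment names across shards round-robin.
// Each shard would then replay its own list sequentially.
inline std::vector<std::vector<std::string>>
assign_segments_to_shards(const std::vector<std::string>& segments,
                          std::size_t shard_count) {
    std::vector<std::vector<std::string>> per_shard(shard_count);
    if (shard_count == 0) {
        return per_shard; // nothing to assign to
    }
    for (std::size_t i = 0; i < segments.size(); ++i) {
        per_shard[i % shard_count].push_back(segments[i]);
    }
    return per_shard;
}
```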

v2:
* Fixed whitespace errors

Message-Id: <1489503382-830-1-git-send-email-calle@scylladb.com>
2017-03-15 13:07:17 +02:00
Amnon Heiman
0a2eba1b94 database: requests_blocked_memory metric should be unique
Metric names should be unique per type.

requests_blocked_memory was registered twice, once as a gauge and once as
derived.

This is not allowed.

Fixes #2165

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314162826.25521-1-amnon@scylladb.com>
2017-03-14 19:36:45 +02:00
Avi Kivity
ed4b5f5a18 Merge seastar upstream
* seastar fd29fd0...4d25b85 (2):
  > core/file: fix EOF detection for file with custom impl
  > tutorial: fix echo server example

Includes patch from Raphael updating checked_file_impl:

"Now file_impl requires dma_read_bulk to be implemented, and for
checked_file_impl, it's only about calling dma_read_bulk from
the posix file it wraps."
2017-03-14 13:38:38 +02:00
Takuya ASADA
d016dd4b74 dist: schedule daily fstrim for data directory and commitlog directory
Schedule daily fstrim for data directory and commitlog directory, which is
recommended by Scylla doc:
http://www.scylladb.com/doc/admin/#schedule-fstrim

Fixes #1347

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1489447472-2981-1-git-send-email-syuu@scylladb.com>
2017-03-14 11:51:53 +02:00
Amnon Heiman
295a981c61 storage_proxy: metrics should have unique name
Metrics should have unique names. This patch renames the
throttled_writes metric for the queue length to current_throttled_writes.

Without it, metrics will be reported twice under the same name, which
may cause errors in the prometheus server.

This could be related to scylladb/seastar#250

Fixes #2163.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170314081456.6392-1-amnon@scylladb.com>
2017-03-14 11:19:39 +02:00
Tomasz Grabiec
ed530dfb3a tests: sstables: Add test for skipping within a compressed stream
Refs #2143.
2017-03-13 13:08:24 +01:00
Tomasz Grabiec
1e0af2efc3 Update seastar submodule
* seastar 84a0b70...fd29fd0 (4):
  > Fix smp::submit_to() with function reference
  > execution_stage: add concept restraint for operator()
  > core/temporary_buffer: Add operator==()
  > map_reduce: allow reducer to take accumulated value by rref
2017-03-13 10:13:03 +01:00
Paweł Dziepak
60c6b9a240 Merge "Implement sstable_streamed_mutation::fast_forward_to()" from Tomasz
"This replaces use of a generic forwarding wrapper in sstable reader with
a specialized implementation. Forwarding doesn't yet utilize indexes in this
series, only integrates it with mp_row_consumer, which is a prerequisite.

It's still an optimization, since mp_row_consumer will not try to consume
past the range as it used to.

Sending early for easier consumption."

* tag 'tgrabiec/forwarding-of-mp-row-consumer-v2' of github.com:scylladb/seastar-dev:
  sstables: Remove use of forwarding wrapper
  sstables: Implement sstable_streamed_mutation::fast_forward_to()
  sstables: Extract and use clustering_ranges_walker
  tests: sstables: Add test for handling of repeated tombstones
  sstables: Extract writer parameters into config objects
  tests: Move as_mutation_source() helper to header
  tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions
  streamed_mutation: Add streamed_mutation_returning() helper
  tests: mutation_source_test: Add test case for forwarding to a full range
  tests: simple_schema: Add fragment factories
  tests: Extract simple_schema
  sstables: Move workaround for out-of-order range tombstones to mp_row_consumer
  sstables: Drop default mp_row_consumer constructor
  sstables: Swap order of values in "proceed" so that "no" is assigned 0
  util/optimized_optional: Make printable
  position_in_partition: Add is_static_row() in the view
  range_tombstone_stream: Add reset()
  range_tombstone_stream: Add get_next(position_in_partition_view)
  sstables: streamed_mutation: Stop reading when end of slice reached
  sstables: Switch is_in_range() to position_in_partition
2017-03-10 13:55:46 +00:00
Tomasz Grabiec
1f1b516b31 sstables: Remove use of forwarding wrapper
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
d7afab21e7 sstables: Implement sstable_streamed_mutation::fast_forward_to()
Handling of forwarding is done inside mp_row_consumer, because it
allows us to filter out irrelevant data sooner and thus more
efficiently.

Because the static row can now be skipped as well, _skip_clustering_row
was renamed to more generic _skip_in_progress.
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
4750216387 sstables: Extract and use clustering_ranges_walker
Extracted from mp_row_consumer.
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
88ccc99017 tests: sstables: Add test for handling of repeated tombstones
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
124dde30db sstables: Extract writer parameters into config objects
Also enables users to change the default promoted index block size.
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
ad1e69c4c5 tests: Move as_mutation_source() helper to header
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
6f409d367b tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
dc7b93a326 streamed_mutation: Add streamed_mutation_returning() helper
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
06a964b3a0 tests: mutation_source_test: Add test case for forwarding to a full range
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
929842ad3f tests: simple_schema: Add fragment factories
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
d98f013b07 tests: Extract simple_schema
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
01374c41f2 sstables: Move workaround for out-of-order range tombstones to mp_row_consumer
This is a preliminary step before adding support for fast-forwarding
to mp_row_consumer, so that range handling can be solely in
mp_row_consumer rather than split between it and
sstable_streamed_mutation.

This also alleviates #2080 by reading all tombstones only up to the
first row, after that range tombstones are treated like other
fragments.
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
d41a7c5eb4 sstables: Drop default mp_row_consumer constructor
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
56f1ad7841 sstables: Swap order of values in "proceed" so that "no" is assigned 0
2017-03-10 14:42:22 +01:00
Tomasz Grabiec
58c29be45c util/optimized_optional: Make printable
2017-03-10 14:42:21 +01:00
Tomasz Grabiec
a32cf6c4cc position_in_partition: Add is_static_row() in the view
2017-03-10 14:42:21 +01:00
Tomasz Grabiec
e4db643730 range_tombstone_stream: Add reset()
2017-03-10 14:42:21 +01:00
Tomasz Grabiec
48ad2e2d64 range_tombstone_stream: Add get_next(position_in_partition_view)
2017-03-10 14:42:21 +01:00
Tomasz Grabiec
084747b1ee sstables: streamed_mutation: Stop reading when end of slice reached
As part of this change, skip detection is refactored. This
simplifies reasoning about mp_row_consumer's state a bit because now
is_mutation() is not reset externally and only depends on current
position of the reader.

It will prove useful when we extend mutation reader to decide if it
should skip to the next partition up front before calling
_context.read(), so that we can for instance skip using index instead.

Fixes #2088.
2017-03-10 14:42:19 +01:00