Commit Graph

17295 Commits

Author SHA1 Message Date
Benny Halevy
ef53ddf3ae scylla_io_setup: correct units in low space warning
GiB -> GB

Refs #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>
2018-12-10 13:58:49 +02:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector with utils::small_vector. The motivation for this change was
an observation that the profiler points to memory allocation functions
as the ones where we spend the most time and, while they have a large
number of callers, storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
Duarte Nunes
a42b2895c2 Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias
"
This is a backport of CASSANDRA-8236.

Before this patch, scylla sends the node UP event to the cql client when
it sees a new node join the cluster, i.e., when the new node's status
becomes NORMAL. The problem is that, at this time, the cql server might
not be ready yet. Once the client receives the UP event, it tries to
connect to the new node's cql port and fails.

To fix, a new application_state::RPC_READY is introduced. A new node sets
RPC_READY to false when it starts gossip in the very beginning and sets
RPC_READY to true when the cql server is ready.

The RPC_READY is a bad name but I think it is better to follow Cassandra.

Nodes with or without this patch are supposed to work together with no
problem.

Refs #3843
"

* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
  storage_service: Use cql_ready facility
  storage_service: Handle application_state::RPC_READY
  storage_service: Add notify_cql_change
  storage_service: Add debug log in notify_joined
  storage_service: Add extra check in notify_joined
  storage_service: Add notify_joined
  storage_service: Add debug log in notify_up
  storage_service: Add extra check in notify_up
  storage_service: Add notify_up
  storage_service: Make notify_left log debug level
  storage_service: Introduce notify_left
  storage_service: Add debug log in notify_down
  storage_service: Introduce notify_down
  storage_service: Add set_cql_ready
  gossip: Add gossiper::is_cql_ready
  gms: Add endpoint_state::is_cql_ready
  gms: Add application_state::RPC_READY
  gms: Introduce cql_ready in versioned_value
2018-12-10 11:37:59 +00:00
Asias He
06dc9b8da0 storage_service: Use cql_ready facility
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:

- When a node boots, set it to false
- When cql server is ready, set it to true
- When cql server is down, set it to false
2018-12-10 19:20:20 +08:00
Asias He
4761b53035 storage_service: Handle application_state::RPC_READY 2018-12-10 19:20:20 +08:00
Asias He
0e64814206 storage_service: Add notify_cql_change
It is called when a RPC_READY gossip application state is received.
2018-12-10 19:20:20 +08:00
Asias He
a1bbd7bcc7 storage_service: Add debug log in notify_joined 2018-12-10 19:20:20 +08:00
Asias He
17d68cb408 storage_service: Add extra check in notify_joined
Do not send the node joined event unless the node is in NORMAL status,
which means the node has officially joined the cluster.
2018-12-10 19:20:20 +08:00
Asias He
9abb15192f storage_service: Add notify_joined
Add a helper for node joined event.
2018-12-10 19:20:20 +08:00
Asias He
60c74431f7 storage_service: Add debug log in notify_up 2018-12-10 19:20:20 +08:00
Asias He
948d2b6c78 storage_service: Add extra check in notify_up
Do not send up event if is_cql_ready is false which means cql server is
not ready yet or node is down.
2018-12-10 19:20:20 +08:00
Asias He
48cd31dc1e storage_service: Add notify_up
Add a helper for node up event.
2018-12-10 19:20:20 +08:00
Asias He
03f9c3e7e5 storage_service: Make notify_left log debug level
Be consistent with the other notification logs.
2018-12-10 19:20:20 +08:00
Asias He
a5ec25f28b storage_service: Introduce notify_left
Add a helper for node left event.
2018-12-10 19:20:20 +08:00
Asias He
15d7fce902 storage_service: Add debug log in notify_down 2018-12-10 19:20:19 +08:00
Asias He
f18cb0654d storage_service: Introduce notify_down
Add a helper for node down event.
2018-12-10 19:20:19 +08:00
Asias He
2f3130b36f storage_service: Add set_cql_ready
It is used to set the status of the RPC_READY of this node so it can be
advertised by gossip.
2018-12-10 19:20:17 +08:00
Asias He
e07150166a gossip: Add gossiper::is_cql_ready
- A new scylla node always sends application_state::RPC_READY = false
when the node boots and sends application_state::RPC_READY = true when
the cql server is up

- An old scylla node that does not support application_state::RPC_READY
never has application_state::RPC_READY in the endpoint_state. We can
only assume its cql server is up, so we return true here if
application_state::RPC_READY is not present
2018-12-10 19:16:44 +08:00
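
The backward-compatible default described in the commit above can be sketched as follows. This is a minimal illustration, not the Scylla code: `endpoint_state` is modelled here as a plain map of strings, and the "true"/"false" value encoding is an assumption made only for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Reduced stand-in for the gossip application states (assumption: only the
// two values needed for the example are listed).
enum class application_state { STATUS, RPC_READY };

// Hypothetical simplification: the real endpoint_state holds versioned
// values, here it is just state -> string.
using endpoint_state = std::map<application_state, std::string>;

// Returns whether the peer's cql server should be treated as ready.
inline bool is_cql_ready(const endpoint_state& eps) {
    auto it = eps.find(application_state::RPC_READY);
    if (it == eps.end()) {
        // Old node that never advertises RPC_READY: assume cql is up.
        return true;
    }
    // Assumed encoding for this sketch: the literal string "true".
    return it->second == "true";
}
```

Treating a missing state as "ready" is what lets nodes with and without the patch coexist: only nodes that explicitly advertise false are held back.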
Asias He
2737654c75 gms: Add endpoint_state::is_cql_ready
Return whether the endpoint_state has the RPC_READY application_state.
2018-12-10 19:16:44 +08:00
Asias He
67093324ad gms: Add application_state::RPC_READY
It is used to tell peer nodes that the cql server is ready and can
accept client requests.

Follow the same name which Cassandra uses.
2018-12-10 19:16:44 +08:00
Asias He
4ed2ef23e9 gms: Introduce cql_ready in versioned_value 2018-12-10 19:16:43 +08:00
Avi Kivity
7c7da0b462 sstables: fix overflow in clustering key blocks header bit access
_ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too.
Otherwise, a shift in the range 32-63 will produce wrong results.

Fix by using a 64-bit mask.

Found by Fedora 29's ubsan.

Fixes #3973.
Message-Id: <20181209120549.21371-1-avi@scylladb.com>
2018-12-10 11:09:25 +00:00
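
The class of bug fixed in the commit above can be illustrated with a small sketch. The helper name `header_bit_set` is hypothetical and is not taken from the Scylla sources; it only shows why the mask literal has to be as wide as the 64-bit variable it is applied to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tests bit `pos` (0..63) of a 64-bit header word.
inline bool header_bit_set(uint64_t header, unsigned pos) {
    // The mask is built from a 64-bit one. Writing `1 << pos` instead would
    // shift an int, which is 32 bits wide on common platforms, so any shift
    // in the range 32-63 would be undefined behaviour (the kind of issue
    // ubsan reports).
    const uint64_t mask = uint64_t(1) << pos;
    return (header & mask) != 0;
}
```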
Takuya ASADA
a2d0ebf4d9 dist/offline_installer/redhat: fix missing dependencies
The offline installer with Scylla 3.0 causes a dependency error on CentOS;
add the missing packages.

Fixes #3969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181207020711.23055-1-syuu@scylladb.com>
2018-12-10 12:47:10 +02:00
Avi Kivity
904db433d9 Merge "Re-use commitlog segments" from Calle
"
Refs #3929

Enables re-use of commitlog segments.

First, ensures we never succeed playing back a commitlog
segment with a name not matching the IDs in the actual
file data, by determining the expected id based on file name.
This will also handle partially written re-used files, as
each chunk header's CRC is dependent on the ID, and will
fail once we hit any left-overs.

The second part renames files and puts them into a recycle
list instead of actually deleting them when finished.
Allocating new files will then prioritize this list
before creating a new file.

Note that since consumption and release of segments can
be somewhat unbalanced, this does not really guarantee
we will use recycled files even in all cases when it
might be possible, simply because of timing. It does
however give a good chance of it.

We limit recycled files based on the max disk size
setting. Thus, depending on timing, we can potentially
use more disk space than without recycling, but not
without bound.

While all this theoretically might improve disk
writes in some cases, it is far from any magic bullet.
No real performance testing has been done yet, only
functional.
"

* 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev:
  commitlog: Recycle used segments instead of delete + new file
  commitlog: Terminate all segments with a zero chunk
  commitlog_replay: Enforce file name based id matching
2018-12-10 11:15:02 +02:00
Calle Wilund
55f10ffc43 commitlog: Recycle used segments instead of delete + new file
Refs #3929

When deleting a segment, IFF we have not yet filled up all reserves,
instead of actually deleting the file, put it on a "recycle" list.
Next segment allocation will instead of creating a new one simply
rename the segment and reuse the file and its allocated space.

We rename the file twice: Once on adding to recycle list, with special
prefix so we don't mix up actual replayable segments and these. Second
when we actually re-use the file (also to ensure consecutive names).

Note that we limit the number of recyclables, so a really stressed
application which somehow fills up the replenish queue might
cause us to still drop the segments. We could skip this limit, but
would risk getting too many files on disk.

Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).

Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error as "normally" in this case, when hitting partially
written block or coming to an old/new chunk boundary.

v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits

v3:
* Use special names for files waiting for reuse
2018-12-10 09:09:07 +00:00
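
The recycle-list behaviour described in the commit above can be sketched roughly as below. This is a toy model under stated assumptions: `segment_pool`, the "Recycled-" prefix and the "CommitLog-<id>.log" names are all hypothetical, and the sketch only tracks names; it performs no file system operations and no directory sync.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical bookkeeping for recycled commitlog segment files.
class segment_pool {
    std::deque<std::string> _recycled;  // names of files waiting for reuse
    std::size_t _max_recycled;          // cap, so disk usage stays bounded
    unsigned long _next_id = 1;         // ids of active segments are consecutive
public:
    explicit segment_pool(std::size_t max_recycled)
        : _max_recycled(max_recycled) {}

    // Called when a segment is no longer needed. Returns true if the file
    // is kept for reuse (first rename, to a special prefix), false if the
    // cap is reached and the caller should really delete it.
    bool release(const std::string& name) {
        if (_recycled.size() >= _max_recycled) {
            return false;
        }
        _recycled.push_back("Recycled-" + name);
        return true;
    }

    // Returns the name of the next active segment. `reused` reports whether
    // a recycled file would be renamed to that name (second rename) instead
    // of creating a new file.
    std::string allocate(bool& reused) {
        reused = !_recycled.empty();
        if (reused) {
            _recycled.pop_front();
        }
        return "CommitLog-" + std::to_string(_next_id++) + ".log";
    }

    std::size_t recycled_count() const { return _recycled.size(); }
};
```

The cap on the recycle list corresponds to the trade-off the message mentions: without it, a burst of released segments could leave an unbounded number of idle files on disk.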
Calle Wilund
b13b6ef6a0 commitlog: Terminate all segments with a zero chunk
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
2018-12-10 09:09:07 +00:00
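
How a zero chunk header lets replay stop cleanly in a recycled file can be sketched with a toy layout. The format below is an assumption made for illustration only (one header word holding the payload length in words, followed by the payload); it is not the real commitlog chunk format, and `count_replayable_chunks` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the chunks a replayer would process before stopping.
inline std::size_t count_replayable_chunks(const std::vector<uint32_t>& words) {
    std::size_t pos = 0;
    std::size_t chunks = 0;
    while (pos < words.size()) {
        const uint32_t len = words[pos];
        if (len == 0) {
            break;  // zero header: end-of-segment marker, stop gracefully
        }
        if (len > words.size() - pos - 1) {
            break;  // chunk claims more data than the file holds
        }
        pos += 1 + len;  // skip the header word and its payload
        ++chunks;
    }
    return chunks;
}
```

In this toy model, left-over data from a previous use of the file sits after the zero header and is never looked at. The real code additionally relies on the per-chunk CRC, which this sketch does not model.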
Calle Wilund
b35af84599 commitlog_replay: Enforce file name based id matching
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore if
mismatched. This is a prerequisite for re-using renamed commitlog files,
as we can then fail-fast should one such be left on disk, instead of
trying to replay it.

We also check said id via the CRC check for each chunk parsed. If we
find a chunk with mismatched id, we will get a CRC error for the chunk,
and replay will terminate (albeit not gracefully).
2018-12-10 09:09:07 +00:00
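
The file-name check described in the commit above might look roughly like this. The naming scheme "<prefix>-<version>-<id>.log" is an assumption for the sketch, and both function names are hypothetical; overflow of very long digit strings is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Extracts the decimal id that follows the last '-' of a "*.log" name.
// Returns nullopt if the name does not follow the assumed scheme.
inline std::optional<uint64_t> id_from_file_name(const std::string& name) {
    const std::string suffix = ".log";
    if (name.size() <= suffix.size() ||
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    const std::string stem = name.substr(0, name.size() - suffix.size());
    const auto dash = stem.rfind('-');
    if (dash == std::string::npos || dash + 1 == stem.size()) {
        return std::nullopt;
    }
    uint64_t id = 0;
    for (std::size_t i = dash + 1; i < stem.size(); ++i) {
        const char c = stem[i];
        if (c < '0' || c > '9') {
            return std::nullopt;  // non-numeric id part
        }
        id = id * 10 + uint64_t(c - '0');
    }
    return id;
}

// A segment is replayed only if the id stored in its header chunk matches
// the id derived from its file name; otherwise it is ignored.
inline bool should_replay(const std::string& file_name, uint64_t header_id) {
    const auto expected = id_from_file_name(file_name);
    return expected && *expected == header_id;
}
```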
Amnon Heiman
09c2b8b48a node_exporter_install: switch to node_exporter 0.17
The newer version of node_exporter comes with important bug fixes. This
is especially important for I3.metal, which is not supported by the
older version of node_exporter.

The dashboards can now support both the new and the old version of
node_exporter.

Fixes #3927

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181210085251.23312-1-amnon@scylladb.com>
2018-12-10 10:54:50 +02:00
Benny Halevy
bcb486b8b9 scylla_io_setup: io_tune should not run when there is less than 10GB of disk space
Fixes #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181209174852.3620-1-bhalevy@scylladb.com>
2018-12-10 10:38:33 +02:00
Yibo Cai (Arm Technology China)
6717816a8d utils/gz: optimize crc_combine for arm64
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>
2018-12-10 10:31:08 +02:00
Avi Kivity
40677fae37 Merge "Compaction strategy aware major compaction" from Raphael
"
Make major compaction aware of compaction strategy, by using an
optimal approach which suits the strategy needs.

Refs #1431.
"

* 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla:
  tests: add test for compaction-strategy-aware major compaction
  compaction: implement major compaction heuristic for leveled strategy
  compaction: introduce notion of compaction-strategy-aware major compaction
2018-12-10 10:10:22 +02:00
Amos Kong
09a3b11c2f scylla_setup: only ask for nic in interactive mode
Currently, scylla_setup still asks for the nic even if it is already assigned on the command line.

Fixes #3908

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com>
2018-12-09 15:29:31 +02:00
Gleb Natapov
9fb79bf379 storage_proxy: fix crash during write timeout callback invocation
rh_entry address is captured inside timeout's callback lambda, so the
structure should not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.

Fixes #3972.

Message-Id: <20181206164043.GN25283@scylladb.com>
2018-12-09 10:33:37 +02:00
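
The hazard described in the commit above, and the in-place construction that avoids it, can be sketched in isolation. The `entry` type and `add_entry` helper are hypothetical stand-ins, not the actual rh_entry code; the sketch assumes C++17 for `try_emplace`.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Stand-in for an object whose own address is captured by a callback.
struct entry {
    int value;
    std::function<int()> on_timeout;

    explicit entry(int v)
        : value(v), on_timeout([this] { return value; }) {}

    // Moving or copying would leave the lambda pointing at the old object,
    // so both are forbidden and the compiler rejects accidental moves.
    entry(const entry&) = delete;
    entry(entry&&) = delete;
};

// Constructs the entry directly inside the map node instead of building a
// temporary and moving it in.
inline entry& add_entry(std::unordered_map<int, entry>& m, int id, int v) {
    return m.try_emplace(id, v).first->second;
}
```

Because std::unordered_map keeps elements at stable addresses across rehashes, the captured `this` stays valid for as long as the element remains in the map.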
Vladimir Krivopalov
6a5d8934a6 db: Enable SSTables 'mc' format by default.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>
2018-12-08 11:07:38 +02:00
Tomasz Grabiec
b78d98a358 tests: perf_fast_forward: Fix result_collector::add() for multi-element results
The results vector should be populated vertically, not horizontally.

Responsible for assertion failure with --cache-enabled:

  void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed.

Introduced in 3fc78a25bf.
Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:44:32 +00:00
Tomasz Grabiec
10cde9ae50 tests: perf_fast_forward: Fix live_range not being initialized
Broken in 470552b7ab

Causes test failure when running with --cache-enabled
Message-Id: <1544105835-24530-1-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:38:01 +00:00
Tomasz Grabiec
bb24d378b2 Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir
This patchset fixes several remaining issues found during thorough
testing of SSTables 3.x statistics and enriches ~30 unit tests with
statistics validation against Cassandra-generated golden copies.

* https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1:
  sstables: Enforce estimated_partitions in generate_summary() to be
    always positive.
  sstables: Don't enforce default max_local_deletion_time value for 'mc'
    files.
  sstables: Update TTL/local deletion stats for non-expiring and live
    liveness_info.
  sstables: Collect statistics when writing RT markers to SSTables 3.x.
  tests: Return sstable_assertions from validate_read() helper.
  tests: Introduce helper for validating stats metadata in SSTables 3.x
    tests.
  tests: Add stats metadata validation to test_write_static_row.
  tests: Add stats metadata validation to
    test_write_composite_partition_key.
  tests: Add stats metadata validation to
    test_write_composite_clustering_key.
  tests: Add stats metadata validation to test_write_wide_partitions.
  tests: Add stats metadata validation to write_ttled_row
  tests: Add stats metadata validation to write_ttled_column
  tests: Add stats metadata validation to write_deleted_column
  tests: Add stats metadata validation to write_deleted_row
  tests: Add stats metadata validation to write_collection_wide_update
  tests: Add stats metadata validation to
    write_collection_incremental_update
  tests: Add stats metadata validation to write_multiple_partitions
  tests: Add stats metadata validation to write_multiple_rows
  tests: Add stats metadata validation to
    write_missing_columns_large_set
  tests: Add stats metadata validation to write_different_types
  tests: Add stats metadata validation to write_empty_clustering_values
  tests: Add stats metadata validation to write_large_clustering_key
  tests: Add stats metadata validation to write_compact_table
  tests: Add stats metadata validation to write_user_defined_type_table
  tests: Add stats metadata validation to write_simple_range_tombstone
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_non_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_mixed_rows_and_range_tombstones
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones_with_rows
  tests: Add stats metadata validation to
    write_range_tombstone_same_start_with_row
  tests: Add stats metadata validation to
    write_range_tombstone_same_end_with_row
  tests: Add stats metadata validation to
    write_two_non_adjacent_range_tombstones
  tests: Delete unused (bogus) Statistics.db file from write_ SST3
    tests.
2018-12-07 12:05:55 +01:00
Vladimir Krivopalov
98ae39f920 tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
dcd639b4d5 tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d07ab3b3ef tests: Add stats metadata validation to write_range_tombstone_same_end_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
b856cf837e tests: Add stats metadata validation to write_range_tombstone_same_start_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
ba24572fb6 tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
4167c9e51d tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
fd1c9b84c6 tests: Add stats metadata validation to write_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
1a6d613654 tests: Add stats metadata validation to write_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
57d2d1a1c6 tests: Add stats metadata validation to write_simple_range_tombstone
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
bc5d5633dc tests: Add stats metadata validation to write_user_defined_type_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d9f2829ca0 tests: Add stats metadata validation to write_compact_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
3a1e287c6a tests: Add stats metadata validation to write_large_clustering_key
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
722fc7222a tests: Add stats metadata validation to write_empty_clustering_values
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00