Commit Graph

18835 Commits

Author SHA1 Message Date
Asias He
a1e19514f9 repair: Add get_row_diff_source_op
It is a helper that works on the source() of the
get_row_diff rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
473bd7599c repair: Add get_full_row_hashes_with_rpc_stream
It is rpc stream version of get_full_row_hashes. It uses rpc stream
instead of rpc verb to get the repair hashes data from follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
1e2a598fe7 repair: Add get_full_row_hashes_sink_op
It is a helper that works on the sink() of the get_full_row_hashes
rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
149c54b000 repair: Add get_full_row_hashes_source_op
It is a helper that works on the source() of the
get_full_row_hashes rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
b3e7299032 repair: Add sink and source object into repair_meta
They will soon be used to sync repair hashes and repair rows bewteen
master and follower nodes.
2019-07-02 21:22:41 +08:00
Asias He
acd40fd529 repair: Add sink_source_for_put_row_diff
Use sink_source_for_repair to define sink_source_for_put_row_diff
with sink = repair_row_on_wire_with_cmd, source = repair_stream_cmd
for REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
4405f7a6ff repair: Add sink_source_for_get_row_diff
Use sink_source_for_repair to define sink_source_for_get_row_diff with
sink = repair_hash_with_cmd, source = repair_row_on_wire_with_cmd for
REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
0bffd07e7e repair: Add sink_source_for_get_full_row_hashes
Use the sink_source_for_repair to define
sink_source_for_get_full_row_hashes with sink = repair_stream_cmd,
source = repair_hash_with_cmd for
REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM rpc stream verb.
2019-07-02 21:22:41 +08:00
Asias He
8400dafa12 repair: Add sink_source_for_repair helper class
It is used to store the sink and source objects for the rpc stream verbs
used by row level repair.
2019-07-02 21:22:41 +08:00
Asias He
37b3de4ea0 messaging_service: Add REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
a7c7ba9765 messaging_service: Add REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM support
It is used by row level repair.
2019-07-02 21:18:55 +08:00
Asias He
dc92bda93b messaging_service: Add REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM support 2019-07-02 21:18:55 +08:00
Asias He
f312c95b74 messaging_service: Add do_make_sink_source helper
It is used by the row level repair rpc stream verbs to make sink and
source object.
2019-07-02 21:18:55 +08:00
Asias He
bc295a00a6 messaging_service: Add rpc stream verb for row level repair
- REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM

Get repair rows from follower nodes

- REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM

Put repair rows to follower nodes

- REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM:

Get full hashes from follower nodes
2019-07-02 21:18:55 +08:00
Asias He
c93113f3a5 idl: Add repair_row_on_wire_with_cmd 2019-07-02 21:18:54 +08:00
Asias He
a90fb24efc idl: Add repair_hash_with_cmd 2019-07-02 21:18:37 +08:00
Asias He
599d40fbe9 idl: Add repair_stream_cmd 2019-07-02 21:18:15 +08:00
Asias He
672c24f6b0 idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm 2019-07-02 21:17:36 +08:00
Asias He
fb3f0125ee repair: Add default construct for partition_key_and_mutation_fragments
This is useful when we want to add an empty
partition_key_and_mutation_fragments.
2019-06-26 09:12:55 +08:00
Asias He
3fc53a6b72 repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm
It is used to negotiate if the master can use the rpc stream interface
to transfer data.
2019-06-26 09:12:55 +08:00
Asias He
6054a56333 repair: Add repair_row_on_wire_with_cmd
It is used to contain both a repair cmd and repair_row_on_wire object.
2019-06-26 09:12:55 +08:00
Asias He
9f36d775dc repair: Add repair_hash_with_cmd
It is a wrapper contains both a repair cmd and repair_hash object.
2019-06-26 09:12:55 +08:00
Asias He
6b59279e26 repair: Add repair_stream_cmd
It is used by row level repair to add small protocol on top of the rpc stream
interface.
2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola
94d2194c77 dht: token: Simplify operator<
While this is a strict weak ordering, it is not obvious and duplicates
a bit of logic. This ptach simplifies it by using tri_compare.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190621204820.37874-1-espindola@scylladb.com>
2019-06-25 19:06:30 +03:00
Tomasz Grabiec
269e65a8db Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair
2019-06-25 19:05:29 +03:00
Amos Kong
f0cd589a75 dist: suppress the yaml load warning
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated,
as the default Loader is unsafe. Please read https://msg.pyyaml.org/load
for full details.

Fix it by use new safe interface - yaml.safe_load()

Signed-off-by: Amos Kong <amos@scylladb.com>
Cc: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>
2019-06-25 19:05:29 +03:00
Avi Kivity
fc629bb14f Merge "cql3: lift infinite bound check" from Benny & Piotr
"
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

Fixes #432

Update test_range_deletion_scenarios unit test accordingly.
"

* 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla:
  cql3: lift infinite bound check if it's supported
  service: enable infinite bound range deletions with mc
  database: add flag for infinite bound range deletions
2019-06-25 19:05:29 +03:00
Nadav Har'El
a88c9ca5a5 Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next
Piotr Sarna says:

Fixes #4540
This series adds proper handling of aggregation for paged indexed queries.
Before this series returned results were presented to the user in per-page
partial manner, while they should have been returned as a single aggregated
value.

Tests: unit(dev)

Piotr Sarna (8):
  cql3: split execute_base_query implementation
  cql3: enable explicit copying of query_options
  cql3: add a query options constructor with explicit page size
  cql3: add proper aggregation to paged indexing
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  tests: add query_options to cquery_nofail
  tests: add indexing + paging + aggregation test case
  tests: add indexing+paging test case for clustering keys
2019-06-25 19:05:29 +03:00
Avi Kivity
7195f75fb2 Update seastar submodule
* seastar ded50bd8a4...b629d5ef7a (9):
  > sharded: no_sharded_instance_exception: fix grammar
  > core,net: output_stream: remove redundant std::move()
  > perftune: make sure that ethtool -K has a chance of succeeding
  > net/dpdk: upgrade to dpdk-19.05
  > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones
  > reactor: provide an uptime function
  > rpc: add sink::flush() to streaming api
  > Use a table to document the various build modes
  > foreign_ptr: Fix compilation error due to unused variable
2019-06-25 19:05:29 +03:00
Avi Kivity
9d21341733 review-checklist.md: add common checks
- code style
 - naming
 - micro-performance
 - concurrency
 - unit-testing
 - templates and type erasure
 - singletons
2019-06-25 19:05:29 +03:00
Piotr Sarna
efa7951ea5 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589
2019-06-25 19:05:29 +03:00
Asias He
bb5665331c repair: Sync schema from follower nodes before repair
Since commit "repair: Use the same schema version for repair master and
followers", repair master and followers uses the same schema version
that master decides to use during the whole repair operation. If master
has older version of schema, repair could ignore the data which makes use
of the new schema, e.g., writes to new columns.

To fix, always sync the schema agreement before repair.

The master node pulls schema from followers and applies locally. The
master then uses the "merged" schema. The followers use
get_schema_for_write() to pull the "merged" schema.

Fixes #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Asias He
14c1a71860 migration_manager: Add sync_schema
Makes sure this node knows about all schema changes known by
"nodes" that were made prior to this call.

Refs: #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Piotr Sarna
add40d4e59 cql3: lift infinite bound check if it's supported
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

[bhalevy] Update test_range_deletion_scenarios unit test accordingly.

Fixes #432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:34 +03:00
Piotr Sarna
c19fdc4c90 service: enable infinite bound range deletions with mc
As soon as it's agreed that the cluster supports sstables in mc format,
infinite bound range deletions in statements can be safely enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:28 +03:00
Piotr Sarna
e77ef849af database: add flag for infinite bound range deletions
Database can only support infinite bound range deletions if sstable mc
format is supported. As a first step to implement these checks,
an appropriate flag is added to database.
2019-06-24 15:57:47 +03:00
Piotr Sarna
b668ee2b2d tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-06-24 14:51:17 +02:00
Piotr Sarna
3d9a37f28f tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now ensures that there would be no regressions.

Refs #4540
2019-06-24 14:06:42 +02:00
Piotr Sarna
60cafcc39c tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-06-24 14:06:41 +02:00
Piotr Sarna
fe18638de3 cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-06-24 13:21:37 +02:00
Piotr Sarna
bb08af7e68 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.

Fixes #4540
2019-06-24 13:21:32 +02:00
Piotr Sarna
97d476b90f cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-06-24 13:21:32 +02:00
Piotr Sarna
fa89e220ef cql3: enable explicit copying of query_options 2019-06-24 12:57:04 +02:00
Piotr Sarna
7a8b243ce4 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-06-24 12:57:03 +02:00
Benny Halevy
b1e78313fe log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0)
pow2_rank is undefined for 0.
bucket_of currently works around that by using a bitmask of 0.
To allow asserting that count_{leading,trailing}_zeros are not
called with 0, we want to avoid it at all call sites.

Fixes #4153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>
2019-06-23 19:32:51 +03:00
Avi Kivity
779b378785 Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny
"
partitioned_sstable_set is not self sufficient because it relies on
compatible_ring_position_view, which in turn relies on lifetime of
sstable object. This leads to use-after-free. Fix this problem by
introducing compatible_ring_position and using it in p__s__s.

Fixes #4572.

Test: unit (dev), compaction dtests (dev)
"

* 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla:
  tests: Test partitioned sstable set's self-sufficiency
  sstables: Fix partitioned_sstable_set by making it self sufficient
  Introduce compatible_ring_position and compatible_ring_position_or_view
2019-06-23 17:17:18 +03:00
Raphael S. Carvalho
14fa7f6c02 tests: Test partitioned sstable set's self-sufficiency
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
293557a34e sstables: Fix partitioned_sstable_set by making it self sufficient
Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view
as key for interval map, which is constructed from a decorated key in sstable object.
If sstable object is destroyed, like when compaction releases it early, partitioned set
potentially no longer works because c__r__p__v would store information that is already freed,
meaning its use implies use-after-free.
Therefore, the problem happens when partitioned set tries to access the interval of its
interval map and uses freed information from c__r__p__v.

Fix is about using the newly introduced compatible_ring_position_or_view which can hold a
ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable
object.

Retire compatible_ring_position_view.hh as it is now unused.

Fixes #4572.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
9a83561700 Introduce compatible_ring_position and compatible_ring_position_or_view
The motivation for supporting ring position is that containers using
it can be self sufficient. The existing compatible_ring_position_view
could lead to use after free when the ring position data, it was built
from, is gone.

The motivation for compatible_ring_position_or_view is to allow lookup
on containers that don't support different key types using c__r__p,
and also to avoid unnecessary copies.
If the user is provided only with a ring_position_view, c__r__p__or_v
could be built from it and used for lookups.
Converting ring_position_view to ring_position is very bug prone because
there could be information lost in the process.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:12 +03:00
Rafael Ávila de Espíndola
65ac0a831c Add to_string_impl that takes a data_value
Currently to_string takes raw bytes. This means that to print a
data_value it has to first be serialized to be passed to to_string,
which will then deserializes it.

This patch adds a virtual to_string_impl that takes a data_value and
implements a now non virtual to_sting on top of it.

I don't expect this to have a performance impact. It mostly documents
how to access a data_value without converting it to bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-3-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00