Commit Graph

18822 Commits

Author SHA1 Message Date
Asias He
bc295a00a6 messaging_service: Add rpc stream verb for row level repair
- REPAIR_GET_ROW_DIFF_WITH_RPC_STREAM

Get repair rows from follower nodes

- REPAIR_PUT_ROW_DIFF_WITH_RPC_STREAM

Put repair rows to follower nodes

- REPAIR_GET_FULL_ROW_HASHES_WITH_RPC_STREAM

Get full hashes from follower nodes
2019-07-02 21:18:55 +08:00
Asias He
c93113f3a5 idl: Add repair_row_on_wire_with_cmd 2019-07-02 21:18:54 +08:00
Asias He
a90fb24efc idl: Add repair_hash_with_cmd 2019-07-02 21:18:37 +08:00
Asias He
599d40fbe9 idl: Add repair_stream_cmd 2019-07-02 21:18:15 +08:00
Asias He
672c24f6b0 idl: Add send_full_set_rpc_stream for row_level_diff_detect_algorithm 2019-07-02 21:17:36 +08:00
Asias He
fb3f0125ee repair: Add default constructor for partition_key_and_mutation_fragments
This is useful when we want to add an empty
partition_key_and_mutation_fragments.
2019-06-26 09:12:55 +08:00
Asias He
3fc53a6b72 repair: Add send_full_set_rpc_stream in row_level_diff_detect_algorithm
It is used to negotiate if the master can use the rpc stream interface
to transfer data.
2019-06-26 09:12:55 +08:00
Asias He
6054a56333 repair: Add repair_row_on_wire_with_cmd
It is used to contain both a repair cmd and a repair_row_on_wire object.
2019-06-26 09:12:55 +08:00
Asias He
9f36d775dc repair: Add repair_hash_with_cmd
It is a wrapper that contains both a repair cmd and a repair_hash object.
2019-06-26 09:12:55 +08:00
Asias He
6b59279e26 repair: Add repair_stream_cmd
It is used by row level repair to add a small protocol on top of the rpc stream
interface.
2019-06-26 09:12:55 +08:00
Rafael Ávila de Espíndola
94d2194c77 dht: token: Simplify operator<
While this is a strict weak ordering, that is not obvious, and it duplicates
a bit of logic. This patch simplifies it by using tri_compare.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190621204820.37874-1-espindola@scylladb.com>
2019-06-25 19:06:30 +03:00
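The simplification described in 94d2194c77 can be sketched as follows. This is a minimal illustration, not the actual dht::token code: the `token` layout and `token_kind` enum are hypothetical, and only the pattern of building `operator<` on a three-way `tri_compare` is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified token: a kind plus one integer payload.
// The real dht::token differs; this only illustrates the pattern.
enum class token_kind { before_all_keys, key, after_all_keys };

struct token {
    token_kind kind;
    int64_t data;  // only meaningful when kind == token_kind::key
};

// Three-way comparison: negative, zero, or positive.
inline int tri_compare(const token& a, const token& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key) {
        return 0;  // two sentinels of the same kind compare equal
    }
    return a.data < b.data ? -1 : (a.data > b.data ? 1 : 0);
}

// operator< delegates to tri_compare instead of repeating the ordering logic.
inline bool operator<(const token& a, const token& b) {
    return tri_compare(a, b) < 0;
}
```

Because every relational operator can be derived from the same three-way result, the ordering rules live in one place and the strict weak ordering follows directly from `tri_compare`.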
Tomasz Grabiec
269e65a8db Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair
2019-06-25 19:05:29 +03:00
Amos Kong
f0cd589a75 dist: suppress the yaml load warning
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated,
as the default Loader is unsafe. Please read https://msg.pyyaml.org/load
for full details.

Fix it by using the new safe interface, yaml.safe_load().

Signed-off-by: Amos Kong <amos@scylladb.com>
Cc: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <9b68601845117274573474ede0341cc81f80efa6.1561156205.git.amos@scylladb.com>
2019-06-25 19:05:29 +03:00
Avi Kivity
fc629bb14f Merge "cql3: lift infinite bound check" from Benny & Piotr
"
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

Fixes #432

Update test_range_deletion_scenarios unit test accordingly.
"

* 'cql3-lift-infinite-bound-check' of https://github.com/bhalevy/scylla:
  cql3: lift infinite bound check if it's supported
  service: enable infinite bound range deletions with mc
  database: add flag for infinite bound range deletions
2019-06-25 19:05:29 +03:00
Nadav Har'El
a88c9ca5a5 Merge branch 'add_proper_aggregation_for_paged_indexing_2' of git://github.com/psarna/scylla into next
Piotr Sarna says:

Fixes #4540
This series adds proper handling of aggregation for paged indexed queries.
Before this series, results were returned to the user as partial per-page
values, whereas they should have been returned as a single aggregated
value.

Tests: unit(dev)

Piotr Sarna (8):
  cql3: split execute_base_query implementation
  cql3: enable explicit copying of query_options
  cql3: add a query options constructor with explicit page size
  cql3: add proper aggregation to paged indexing
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  tests: add query_options to cquery_nofail
  tests: add indexing + paging + aggregation test case
  tests: add indexing+paging test case for clustering keys
2019-06-25 19:05:29 +03:00
Avi Kivity
7195f75fb2 Update seastar submodule
* seastar ded50bd8a4...b629d5ef7a (9):
  > sharded: no_sharded_instance_exception: fix grammar
  > core,net: output_stream: remove redundant std::move()
  > perftune: make sure that ethtool -K has a chance of succeeding
  > net/dpdk: upgrade to dpdk-19.05
  > perftune.py: Fix a few more places where we use deprecated pyudev.Device ones
  > reactor: provide an uptime function
  > rpc: add sink::flush() to streaming api
  > Use a table to document the various build modes
  > foreign_ptr: Fix compilation error due to unused variable
2019-06-25 19:05:29 +03:00
Avi Kivity
9d21341733 review-checklist.md: add common checks
- code style
 - naming
 - micro-performance
 - concurrency
 - unit-testing
 - templates and type erasure
 - singletons
2019-06-25 19:05:29 +03:00
Piotr Sarna
efa7951ea5 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589
2019-06-25 19:05:29 +03:00
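The conditional stop described in efa7951ea5 can be sketched as below. The `view_builder` and `config` types and the `maybe_start`/`maybe_stop` helpers are hypothetical stand-ins; only the idea of stopping the service solely when it was started comes from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for the view builder service.
class view_builder {
    bool _running = false;
public:
    void start() { _running = true; }
    void stop() {
        // Mirrors the failure mode from the commit: stopping a builder
        // that was never started trips an assertion.
        assert(_running && "stop() called on a builder that was never started");
        _running = false;
    }
    bool running() const { return _running; }
};

// Hypothetical config holding the view_building switch.
struct config {
    bool view_building;
};

// Starts the builder only if enabled; returns whether it was started.
inline bool maybe_start(view_builder& vb, const config& cfg) {
    if (cfg.view_building) {
        vb.start();
        return true;
    }
    return false;
}

// Stops the builder only if it was previously started.
inline void maybe_stop(view_builder& vb, bool started) {
    if (started) {
        vb.stop();
    }
}
```

Carrying the `started` flag from startup to shutdown keeps the two paths symmetric, so a disabled builder is never touched during shutdown.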
Asias He
bb5665331c repair: Sync schema from follower nodes before repair
Since commit "repair: Use the same schema version for repair master and
followers", the repair master and followers use the same schema version,
which the master chooses, for the whole repair operation. If the master
has an older version of the schema, repair could ignore data that makes use
of the new schema, e.g., writes to new columns.

To fix, always sync the schema agreement before repair.

The master node pulls schema from followers and applies locally. The
master then uses the "merged" schema. The followers use
get_schema_for_write() to pull the "merged" schema.

Fixes #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Asias He
14c1a71860 migration_manager: Add sync_schema
Makes sure this node knows about all schema changes known by
"nodes" that were made prior to this call.

Refs: #4575
Backports: 3.1
2019-06-25 17:13:47 +08:00
Piotr Sarna
add40d4e59 cql3: lift infinite bound check if it's supported
If the database supports infinite bound range deletions,
CQL layer will no longer throw an error indicating that both ranges
need to be specified.

[bhalevy] Update test_range_deletion_scenarios unit test accordingly.

Fixes #432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:34 +03:00
Piotr Sarna
c19fdc4c90 service: enable infinite bound range deletions with mc
As soon as it's agreed that the cluster supports sstables in mc format,
infinite bound range deletions in statements can be safely enabled.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-24 15:58:28 +03:00
Piotr Sarna
e77ef849af database: add flag for infinite bound range deletions
Database can only support infinite bound range deletions if sstable mc
format is supported. As a first step to implement these checks,
an appropriate flag is added to database.
2019-06-24 15:57:47 +03:00
Piotr Sarna
b668ee2b2d tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-06-24 14:51:17 +02:00
Piotr Sarna
3d9a37f28f tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now ensures that there are no regressions.

Refs #4540
2019-06-24 14:06:42 +02:00
Piotr Sarna
60cafcc39c tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-06-24 14:06:41 +02:00
Piotr Sarna
fe18638de3 cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-06-24 13:21:37 +02:00
Piotr Sarna
bb08af7e68 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.

Fixes #4540
2019-06-24 13:21:32 +02:00
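The core idea in bb08af7e68, aggregating across all pages instead of returning one partial result per page, can be sketched as below. This is a simplified illustration under stated assumptions: `page`, `fetch_page` and `count_rows_paged` are hypothetical, the data source is a plain vector, and the translation of paging state between the base table and the view is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical page of results plus a "more pages follow" flag.
struct page {
    std::vector<int> rows;
    bool has_more;
};

// Hypothetical fetch: returns up to page_size rows starting at offset.
inline page fetch_page(const std::vector<int>& data, std::size_t offset,
                       std::size_t page_size) {
    const std::size_t begin = std::min(offset, data.size());
    const std::size_t end = std::min(begin + page_size, data.size());
    auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
    return page{std::vector<int>(first, last), end < data.size()};
}

// Aggregates (here: counts) over every page and returns one value,
// rather than handing a partial count back after each page.
inline std::size_t count_rows_paged(const std::vector<int>& data,
                                    std::size_t page_size) {
    assert(page_size > 0);
    std::size_t total = 0;
    std::size_t offset = 0;
    while (true) {
        page p = fetch_page(data, offset, page_size);
        total += p.rows.size();
        offset += p.rows.size();
        if (!p.has_more) {
            break;
        }
    }
    return total;
}
```

For seven rows and a page size of three, the loop visits three pages and returns a single count of seven.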
Piotr Sarna
97d476b90f cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-06-24 13:21:32 +02:00
Piotr Sarna
fa89e220ef cql3: enable explicit copying of query_options 2019-06-24 12:57:04 +02:00
Piotr Sarna
7a8b243ce4 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-06-24 12:57:03 +02:00
Benny Halevy
b1e78313fe log_histogram: log_heap_options::bucket_of: avoid calling pow2_rank(0)
pow2_rank is undefined for 0.
bucket_of currently works around that by using a bitmask of 0.
To allow asserting that count_{leading,trailing}_zeros are not
called with 0, we want to avoid it at all call sites.

Fixes #4153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190623162137.2401-1-bhalevy@scylladb.com>
2019-06-23 19:32:51 +03:00
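A sketch of the approach in b1e78313fe: handle zero at the call site so that pow2_rank can assert on its precondition. The bucket layout below is hypothetical and much simpler than log_heap_options::bucket_of; only the "never call pow2_rank(0)" rule is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// floor(log2(n)). Undefined for n == 0, so the precondition is asserted.
inline unsigned pow2_rank(std::size_t n) {
    assert(n != 0);
    unsigned rank = 0;
    while (n >>= 1) {
        ++rank;
    }
    return rank;
}

// Hypothetical bucketing: zero gets its own bucket and never reaches
// pow2_rank; every other value lands in bucket floor(log2(value)) + 1.
inline unsigned bucket_of(std::size_t value) {
    if (value == 0) {
        return 0;
    }
    return pow2_rank(value) + 1;
}
```

With the zero case handled explicitly, the assertion inside `pow2_rank` can stay enabled without ever firing from this caller.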
Avi Kivity
779b378785 Merge "Fix partitioned_sstable_set by making it self sufficient" from Raphael & Benny
"
partitioned_sstable_set is not self sufficient because it relies on
compatible_ring_position_view, which in turn relies on lifetime of
sstable object. This leads to use-after-free. Fix this problem by
introducing compatible_ring_position and using it in p__s__s.

Fixes #4572.

Test: unit (dev), compaction dtests (dev)
"

* 'projects/fix_partitioned_sstable_set/v4' of ssh://github.com/bhalevy/scylla:
  tests: Test partitioned sstable set's self-sufficiency
  sstables: Fix partitioned_sstable_set by making it self sufficient
  Introduce compatible_ring_position and compatible_ring_position_or_view
2019-06-23 17:17:18 +03:00
Raphael S. Carvalho
14fa7f6c02 tests: Test partitioned sstable set's self-sufficiency
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
293557a34e sstables: Fix partitioned_sstable_set by making it self sufficient
Partitioned sstable set is not self sufficient, because it uses compatible_ring_position_view
as key for interval map, which is constructed from a decorated key in sstable object.
If sstable object is destroyed, like when compaction releases it early, partitioned set
potentially no longer works because c__r__p__v would store information that is already freed,
meaning its use implies use-after-free.
Therefore, the problem happens when partitioned set tries to access the interval of its
interval map and uses freed information from c__r__p__v.

The fix is to use the newly introduced compatible_ring_position_or_view, which can hold a
ring_position, meaning that partitioned set is no longer dependent on lifetime of sstable
object.

Retire compatible_ring_position_view.hh as it is now unused.

Fixes #4572.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:13 +03:00
Raphael S. Carvalho
9a83561700 Introduce compatible_ring_position and compatible_ring_position_or_view
The motivation for supporting ring position is that containers using
it can be self sufficient. The existing compatible_ring_position_view
could lead to use-after-free when the ring position data it was built
from is gone.

The motivation for compatible_ring_position_or_view is to allow lookup
on containers that don't support different key types using c__r__p,
and also to avoid unnecessary copies.
If the user is provided only with a ring_position_view, c__r__p__or_v
could be built from it and used for lookups.
Converting ring_position_view to ring_position is very bug prone because
there could be information lost in the process.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-06-23 16:29:12 +03:00
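The owning-or-view idea from 9a83561700 can be illustrated with a small sketch. Everything here is an assumption made for illustration: `std::string` plays the owning ring_position, `std::string_view` plays ring_position_view, and `position_or_view` is a hypothetical analogue of compatible_ring_position_or_view, not the real class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical key type that either owns its data or only views it.
class position_or_view {
    std::variant<std::string, std::string_view> _v;
public:
    explicit position_or_view(std::string owned) : _v(std::move(owned)) {}
    explicit position_or_view(std::string_view view) : _v(view) {}

    // Uniform read access, computed on demand so it stays valid after moves.
    std::string_view view() const {
        if (const auto* s = std::get_if<std::string>(&_v)) {
            return *s;
        }
        return std::get<std::string_view>(_v);
    }

    friend bool operator<(const position_or_view& a, const position_or_view& b) {
        return a.view() < b.view();
    }
};

// Keys stored in the container own their data, so the container is
// self-sufficient; lookups can use a cheap non-owning view.
using position_map = std::map<position_or_view, int>;

inline void insert_owned(position_map& m, const std::string& key, int value) {
    m.emplace(position_or_view(std::string(key)), value);
}

inline bool contains(const position_map& m, std::string_view key) {
    return m.find(position_or_view(key)) != m.end();
}
```

Stored keys copy the data and therefore outlive the object they were built from, while lookups avoid a copy by wrapping a view in the same key type.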
Rafael Ávila de Espíndola
65ac0a831c Add to_string_impl that takes a data_value
Currently to_string takes raw bytes. This means that to print a
data_value it first has to be serialized and passed to to_string,
which then deserializes it.

This patch adds a virtual to_string_impl that takes a data_value and
implements a now non-virtual to_string on top of it.

I don't expect this to have a performance impact. It mostly documents
how to access a data_value without converting it to bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-3-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3bd5dd7570 Add a few more tests of data_value::to_string
I found that no tests covered this code while refactoring it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620183449.64779-2-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Nadav Har'El
6e87bca65d storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90448,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not make such invasive changes and just fix the bug
that we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>
2019-06-23 16:03:06 +03:00
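The defensive lookup described in 6e87bca65d can be sketched as below. The function names create_write_response_handler and mutate_begin come from the commit message, but their signatures, the `proxy` class, the `cancel` method and the id-keyed map are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical handler state.
struct write_response_handler {
    bool started = false;
};

class proxy {
    std::unordered_map<uint64_t, write_response_handler> _handlers;
    uint64_t _next_id = 0;
public:
    // Registers a handler and returns its id.
    uint64_t create_write_response_handler() {
        const uint64_t id = _next_id++;
        _handlers.emplace(id, write_response_handler{});
        return id;
    }

    // Hypothetical cancellation path: runs when the target node goes down,
    // possibly between handler creation and mutate_begin().
    void cancel(uint64_t id) { _handlers.erase(id); }

    // Tolerates a handler that was removed in the meantime instead of
    // dereferencing a missing entry. Returns whether the request started.
    bool mutate_begin(uint64_t id) {
        auto it = _handlers.find(id);
        if (it == _handlers.end()) {
            return false;
        }
        it->second.started = true;
        return true;
    }
};
```

A caller that loses the race simply sees `mutate_begin` report that nothing was started, rather than crashing.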
Asias He
b99c75429a repair: Avoid searching all the rows in to_repair_rows_on_wire
The repair_rows in row_list are sorted. It is only possible for the
current repair_row to share the same partition key with the last
repair_row inserted into repair_row_on_wire. So there is no need to search
from the beginning of repair_rows_on_wire, which has quadratic complexity.
To fix, look only at the last item in repair_rows_on_wire.

Fixes #4580
Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>
2019-06-23 16:03:06 +03:00
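The optimization in b99c75429a can be sketched as follows. The struct layouts and the `to_rows_on_wire` function are hypothetical simplifications; the point taken from the commit message is that, with sorted input, only the last emitted entry can share the current row's partition key.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified row: a partition key and one fragment.
struct repair_row {
    std::string partition_key;
    std::string fragment;
};

// Hypothetical wire entry: one partition key with all its fragments.
struct partition_with_fragments {
    std::string partition_key;
    std::vector<std::string> fragments;
};

// Groups sorted rows by partition key in linear time by checking only
// the last output entry, never scanning the output from the start.
inline std::vector<partition_with_fragments>
to_rows_on_wire(const std::vector<repair_row>& sorted_rows) {
    std::vector<partition_with_fragments> out;
    for (const auto& row : sorted_rows) {
        if (out.empty() || out.back().partition_key != row.partition_key) {
            out.push_back({row.partition_key, {}});
        }
        out.back().fragments.push_back(row.fragment);
    }
    return out;
}
```

Each input row triggers one comparison against `out.back()`, so the cost grows linearly with the number of rows instead of quadratically.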
Benny Halevy
883cb4318f Merge pull request #4583 from bhalevy/init-and-shutdown-logging
Init and shutdown logging
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
3660caff77 Reduce memory used by all tests
Tests without custom flags were already being run with -m2G. Tests
with custom flags have to manually specify it, but some were missing
it. This could cause tests to fail with std::bad_alloc when two
concurrent tests tried to allocate all the memory.

This patch adds -m2G to all tests that were missing it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190620002921.101481-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00
Avi Kivity
9229afe64f Merge "Fix infinite paging for indexed queries" from Piotr
"
Fixes #4569

This series fixes the infinite paging for indexed queries issue.
Before this fix, paged index queries tended to end up in an infinite loop
of returning pages with 0 results but with the has_more_pages flag set to
true, which confused the drivers.

Tests: unit(dev)
Branches: 3.0, 3.1
"

* 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla:
  tests: add test case for finishing index paging
  cql3: fix infinite paging for indexed queries
2019-06-23 16:03:06 +03:00
Takuya ASADA
2135d2ae7f dist/debian: install capabilities.conf on postinst script
We still have the "{{^jessie}}" tag in the scylla-server systemd unit file to
skip using AmbientCapabilities on Debian 8, but it can no longer work:
since we moved to a single binary .deb package for all Debian variants,
we must share the same systemd unit file across all of them.

To do so, we need a separate file under /etc/systemd that defines
AmbientCapabilities. Create that file while running the postinst script, only
if the distribution is not Debian 8, just like we do in the .rpm.

See #3344
See #3486

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190619064224.23035-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
46341bd63f gdb: Print coordinator stats related to memory usage from 'scylla memory'
Example:

 Coordinator:
  fg writes:            150
  bg writes:          39980, 21429280 B
  fg reads:               0
  bg reads:               0
  hints:                  0 B
  view hints:             0 B

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906745-17150-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
f7e79b07d1 lsa: Respect the reclamation step hint from seastar allocator
This will allow us to reduce the amount of segment compaction when
reclaiming on behalf of a large allocation because we'll evict much
more up front.

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906584-16770-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Tomasz Grabiec
c5184b3dd0 gdb: Print region_impl pointer from scylla lsa
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1559906684-17019-1-git-send-email-tgrabiec@scylladb.com>
2019-06-23 16:03:06 +03:00
Alexys Jacob
98bc9edf6f thrift/: support version 0.11+ after THRIFT-2221
Thrift 0.11 changed to generate c++ code with
std::shared_ptr instead of boost::shared_ptr.

- https://issues.apache.org/jira/browse/THRIFT-2221

This was forcing scylla to stick with older versions
of thrift.

Fixes issue #3097.

thrift: add type aliases to build with old and new versions.

update to using namespace =
2019-06-23 16:03:06 +03:00
Takuya ASADA
e4320d6537 dist/debian: run 'systemctl daemon-reload' automatically on package install/uninstall
We cannot use dh --with=systemd because we don't want to enable systemd
units automatically; they are managed by our setup scripts. We therefore
have to run 'systemctl daemon-reload' manually.
(With dh --with=systemd, the systemd helper provides such scripts
automatically.)

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190618000210.28972-1-syuu@scylladb.com>
2019-06-23 16:03:06 +03:00
Rafael Ávila de Espíndola
8c067c26d9 Add support for the sanitize build mode in scylla
Running tests in debug mode takes 25:22.08 on my machine. Using
sanitize instead takes that down to 10:46.39.

The mode is opt in, in that it must be explicitly selected with
"configure.py --mode=sanitize" or "ninja sanitize". It must also be
explicitly passed to test.py.

Unfortunately building with asan, optimizations and debug info is
very slow and there is nothing like -gline-tables-only in gcc.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190617170007.44117-1-espindola@scylladb.com>
2019-06-23 16:03:06 +03:00