22336 Commits

Author SHA1 Message Date
Tomasz Grabiec
c95dd67d11 utils: Introduce cached_file
It is a read-through cache of a file.

Will be used to cache contents of the promoted index area from the
index file.

Currently, cached pages are evicted manually using the invalidate_*()
method family, or when the object is destroyed.

The cached_file represents a subset of the file. The reason for this
is to satisfy two requirements. One is that we have a page-aligned
caching, where pages are aligned relative to the start of the
underlying file. This matches requirements of the seastar I/O engine
on I/O requests.  Another requirement is to have an effective way to
populate the cache using an unaligned buffer which starts in the
middle of the file when we know that we won't need to access bytes
located before the buffer's position. See populate_front(). If we
couldn't assume that, we wouldn't be able to insert an unaligned
buffer into the cache.
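The two alignment requirements can be illustrated with a small Python model. This is a sketch, not the C++ `cached_file`. The class name `CachedFileModel`, the 4096-byte `PAGE_SIZE`, and the in-memory `bytes` standing in for the file are all assumptions. Pages are keyed by their index relative to the start of the underlying file, and `populate_front()` can accept a buffer that starts at an unaligned offset because nothing before that offset is ever read. Eviction via `invalidate_*()` is not modeled.

```python
PAGE_SIZE = 4096  # assumed page size; the real value is not stated here


class CachedFileModel:
    """Hypothetical read-through cache over the byte range [start, end) of a file."""

    def __init__(self, data: bytes, start: int, end: int):
        assert 0 <= start <= end <= len(data)
        self._data = data      # stands in for the underlying file
        self._start = start    # no byte before this offset is ever read
        self._end = end
        self._pages = {}       # page index -> PAGE_SIZE bytes
        self.disk_reads = 0

    @staticmethod
    def page_of(pos: int) -> int:
        # Pages are aligned to the start of the underlying file,
        # not to the start of the cached range.
        return pos // PAGE_SIZE

    def _get_page(self, idx: int) -> bytes:
        if idx not in self._pages:  # miss: read the aligned page from "disk"
            self.disk_reads += 1
            base = idx * PAGE_SIZE
            self._pages[idx] = self._data[base:base + PAGE_SIZE]
        return self._pages[idx]

    def read(self, pos: int, size: int) -> bytes:
        assert self._start <= pos and pos + size <= self._end
        out = bytearray()
        while size > 0:
            idx = self.page_of(pos)
            off = pos - idx * PAGE_SIZE
            chunk = self._get_page(idx)[off:off + size]
            out += chunk
            pos += len(chunk)
            size -= len(chunk)
        return bytes(out)

    def populate_front(self, buf: bytes) -> None:
        # buf holds the file contents starting at the unaligned offset
        # self._start. Since bytes before self._start are never read, the
        # first page can be zero-padded in front and cached as a full page.
        idx = self.page_of(self._start)
        padded = bytes(self._start - idx * PAGE_SIZE) + buf
        # Cache only complete pages; a trailing partial page is dropped.
        for i in range(0, len(padded) - PAGE_SIZE + 1, PAGE_SIZE):
            self._pages.setdefault(idx + i // PAGE_SIZE, padded[i:i + PAGE_SIZE])
```

With a range starting at offset 5000, `populate_front()` fills pages 1 and 2 without any disk read, and a later read that crosses into page 3 triggers exactly one read-through.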
2020-06-16 16:15:23 +02:00
Tomasz Grabiec
ab274b8203 sstables: clustered_index: Relax scope of validity of entry_info
entry_info holds views, which may get invalidated when the containing
index blocks are removed. The current implementation of next_entry() keeps
the blocks in memory as long as the cursor is alive, but that will
change in new implementations of the cursor.

Adjust the tests' assumptions accordingly.
2020-06-16 16:15:23 +02:00
Tomasz Grabiec
ea2fbcc2cd sstables: index_entry: Introduce owning promoted_index_block_position 2020-06-16 16:15:23 +02:00
Tomasz Grabiec
714da3c644 compound_compat: Allow constructing composite from a view 2020-06-16 16:15:23 +02:00
Tomasz Grabiec
f2e52c433f sstables: index_entry: Rename promoted_index_block_position to promoted_index_block_position_view 2020-06-16 16:15:23 +02:00
Tomasz Grabiec
101fd613c5 sstables: mc: Extract parser for promoted index block
It will be reused in binary search over the index.
2020-06-16 16:15:14 +02:00
Tomasz Grabiec
a557c374fd sstables: mc: Extract parser for clustering out of the promoted index block parser
This parser will be used stand-alone when doing a binary search over
promoted index blocks. We will parse only the start key, not the whole
block.
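A sketch of why parsing only the start key is enough for a binary search over promoted index blocks. The block layout, the `block_offsets` list, and the use of plain byte comparison in place of the clustering-key comparator are hypothetical simplifications, not the mc format.

```python
import struct


def encode_block(start_key: bytes, end_key: bytes, offset: int) -> bytes:
    # Hypothetical block layout: [u16 len][start][u16 len][end][u64 offset]
    return (struct.pack(">H", len(start_key)) + start_key
            + struct.pack(">H", len(end_key)) + end_key
            + struct.pack(">Q", offset))


def parse_start_key(buf: bytes, pos: int) -> bytes:
    # Stand-alone clustering parser: decodes only the block's start key
    # and never touches the rest of the block.
    (n,) = struct.unpack_from(">H", buf, pos)
    return buf[pos + 2:pos + 2 + n]


def find_block(buf: bytes, block_offsets: list, key: bytes) -> int:
    """Index of the last block whose start key is <= key, or -1."""
    lo, hi = 0, len(block_offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        if parse_start_key(buf, block_offsets[mid]) <= key:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1
```

Each probe costs one small parse, so a lookup touches O(log n) start keys instead of decoding every block.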
2020-06-16 16:14:31 +02:00
Tomasz Grabiec
95df7126a7 sstables: consumer: Extract primitive_consumer
This change extracts the parser for primitive types out of
continuous_data_consumer so that it can be used stand-alone
or embedded in other parsers.
2020-06-16 16:14:30 +02:00
Tomasz Grabiec
d5bf540079 sstables: Abstract the clustering index cursor behavior
In preparation for supporting more than one algorithm for lookups in
the promoted index, extract relevant logic out of the index_reader
(which is a partition index cursor).

The clustered index cursor implementation is now hidden behind an
abstract interface called clustered_index_cursor.

The current implementation is put into the
scanning_clustered_index_cursor. It's mostly code movement with minor
adjustments.

In order to encapsulate iteration over promoted index entries,
clustered_index_cursor::next_entry() was introduced.

No change in behavior intended in this patch.
2020-06-16 16:14:17 +02:00
Tomasz Grabiec
a858f87b11 sstables: index_reader: Rearrange to reduce branching and optionals
No change in logic.

This will make further refactoring easier.
2020-06-16 16:13:39 +02:00
Calle Wilund
5105e9f5e1 cdc::log: Missing "preimage" check in row deletion pre-image
Fixes #6561

Pre-image generation in the row deletion case only checked whether we had a
pre-image result set row. But that row can come from the post-image. Also check
that the pre-image CK actually exists.
Message-Id: <20200608132804.23541-1-calle@scylladb.com>
2020-06-09 10:56:41 +03:00
Piotr Sarna
2746a3597f Update seastar submodule
* seastar 42e77050...81242ccc (7):
  > demos: coroutine_demo: fix for SEASTAR_API_LEVEL >= 3
  > core: Avoid warning on disable_backtrace_temporarily::_old being unused
  > future: Add a couple of friend declarations
  > Merge "net: make socket stack nothrow move constructible" from Benny
  > reactor: Avoid declaring _Unwind_RaiseException
  > future-util: Delete SEASTAR__WAIT_ALL__AVOID_ALLOCATION_WHEN_ALL_READY
  > file: io_priority_class: specify constructor as noexcept
2020-06-08 19:38:28 +02:00
Takuya ASADA
1e2509ffec dist/offline_installer/debian: fix umask error
Same as on redhat: the makeself script changes the current umask, which causes
scylla_setup to fail with the "scylla does not work with current umask setting (0077)" error.
To fix that, we need to use the latest version of makeself and specify the
--keep-umask option.

See #6243
2020-06-08 20:06:21 +03:00
Takuya ASADA
4eae7f66eb dist/offline_installer/debian: support cross build
Unlike the redhat version, the debian version already supported cross build since
it uses debootstrap, but the shell script refused to continue the build on a
non-debian distribution, so drop those lines to allow building on Fedora.

[avi: regenerate toolchain]
2020-06-08 19:54:09 +03:00
Takuya ASADA
058da69a3b dist/debian/python3: cleanup build/debian, rename build directory
This is the scylla-python3 version of #6611, but we also need to rename the
.deb build directory for scylla-python3, since we may lose a .deb when
building both the scylla and scylla-python3 .deb packages, because they
currently share a build directory.
So rename it to build/python3/debian.
2020-06-08 15:49:22 +03:00
Takuya ASADA
260d264d3c dist/debian: cleanup build/debian before building .deb
In 287d6e5, we stopped running rm -rf debian/ in build_deb.sh, since we now have
a prebuilt debian/ directory.
However, this may cause .deb build errors when we modify the debian package source,
since the directory is never cleaned up.

To prevent build errors, we need to clean up build/debian in reloc/build_deb.sh
before extracting contents from the relocatable package.
2020-06-08 15:18:42 +03:00
Kamil Braun
013330199d cdc/storage_proxy: keep cdc_service alive in storage_proxy operations
storage_proxy is never deinitialized, so it could still use cdc_service
after cdc_service's destructor had been called.

This fixes the problem by making cdc_service inherit from
async_sharded_service and having storage_proxy call shared_from_this on
the service whenever it uses it.

cdc_service inherits from async_sharded_service and not simply from
enable_shared_from_this, because there might be other services that
cdc_service depends on. Assuming that these services are
deinitialized after cdc_service (as they should), i.e. after stop() is
called on cdc_service, making cdc_service async_sharded_service will
keep their deinitialization code from being called until all references
to cdc_service disappear (async_sharded_service keeps stop() from
returning until this happens).
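The keep-alive semantics described above can be modeled with asyncio. Each user takes a reference (the analogue of `shared_from_this()`), and `stop()` does not complete until every reference is released. `KeepAliveService` and `ServiceRef` are hypothetical names. This is not seastar's `async_sharded_service` API, only a sketch of its ordering guarantee.

```python
import asyncio


class KeepAliveService:
    """Hypothetical model: stop() waits until all references are released."""

    def __init__(self) -> None:
        self._refs = 0
        self._idle = asyncio.Event()  # construct inside a running event loop
        self._idle.set()

    def acquire(self) -> "ServiceRef":
        # Analogue of shared_from_this(): each user holds a reference.
        self._refs += 1
        self._idle.clear()
        return ServiceRef(self)

    def _release(self) -> None:
        self._refs -= 1
        if self._refs == 0:
            self._idle.set()

    async def stop(self) -> None:
        # Does not return while users still hold references, so services
        # stopped after this one remain alive until the users are done.
        await self._idle.wait()


class ServiceRef:
    def __init__(self, svc: KeepAliveService) -> None:
        self._svc = svc

    def release(self) -> None:
        if self._svc is not None:  # releasing twice is a no-op
            self._svc._release()
            self._svc = None
```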

Some more improvements should be possible through some refactoring:
1. Make augment_mutation_call a free function, not a member of
   cdc_service: it doesn't need any state that cdc_service has.
   db_context can be passed down from storage_proxy when it calls the
   function.
2. Remove the storage_proxy -> cdc_service reference. storage_proxy
   only needs augment_mutation_call, which would not be a part of the
   service. This would also get rid of the proxy -> cdc -> proxy
   reference cycle that we have now, and would allow storage_proxy to be
   safely deinitialized after cdc_service.
3. Maybe we could even remove the cdc_service -> storage_proxy
   reference. Is it really needed?
2020-06-08 13:25:51 +03:00
Takuya ASADA
969c4258cf aws: update enhanced networking supported instance list
Sync the list of instance types that support enhanced networking with the latest one.

Reference: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html

Fixes #6540
2020-06-08 12:48:36 +03:00
Takuya ASADA
bebaaa038f dist/debian: fix node-exporter.service file name
Since 287d6e5, we have mistakenly packaged node-exporter.service under the wrong
name in the .deb; rename it to the correct name.

Fixes #6604
2020-06-08 12:39:18 +03:00
Asias He
dddde33512 gossip: Do not send shutdown message when a node is in unknown status
When a replacing node is in early boot and not yet in the HIBERNATE state,
and the node is killed by a user, it wrongly sends a shutdown message to
other nodes. This is because UNKNOWN is not in SILENT_SHUTDOWN_STATES, so
gossiper::do_stop_gossiping sends the shutdown message. Other nodes in the
cluster then call storage_service::handle_state_normal for this node, since
NORMAL and SHUTDOWN status share the same status handler. As a result, other
nodes incorrectly think the node is part of the cluster and that the replace
operation has finished.

This problem was seen in the replace_node_no_hibernate_state_test dtest:

   n1, n2 are in the cluster
   n2 is dead
   n3 is started to replace n2, but n3 is killed in the middle
   n3 announces SHUTDOWN status wrongly
   n1 runs storage_service::handle_state_normal for n3
   n1 gets the tokens for n3, which are empty because n3 hasn't gossiped its tokens yet
   n1 skips updating normal tokens for n3, but thinks n3 has replaced n2
   n4 starts to replace n2
   n4 checks the tokens for n2 in storage_service::join_token_ring (Cannot
      replace token {} which does not exist!) or
      storage_service::prepare_replacement_info (Cannot replace_address {}
      because it doesn't exist in gossip)

To fix this, add UNKNOWN to SILENT_SHUTDOWN_STATES so that no shutdown
message is sent.

Tests: replace_address_test.py:TestReplaceAddress.replace_node_no_hibernate_state_test
Fixes: #6436
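A minimal model of the decision this fix changes. The state names and the contents of the sets are illustrative assumptions (the real `SILENT_SHUTDOWN_STATES` in the gossiper may contain other states); only the addition of UNKNOWN comes from the message above.

```python
# Illustrative state sets; the real gossiper set may differ.
OLD_SILENT_SHUTDOWN_STATES = frozenset({"LEFT", "HIBERNATE", "BOOT"})
NEW_SILENT_SHUTDOWN_STATES = OLD_SILENT_SHUTDOWN_STATES | {"UNKNOWN"}


def announces_shutdown(status: str, silent_states: frozenset) -> bool:
    # Model of the check made while stopping gossip: a node whose status is
    # in the silent set stops without broadcasting SHUTDOWN to its peers,
    # so peers never run handle_state_normal for it.
    return status not in silent_states
```

A replacing node killed before it gossips any status is UNKNOWN; with the new set it stops silently, while a NORMAL node still announces its shutdown.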
2020-06-08 11:32:23 +02:00
Pavel Solodovnikov
6f6e6762ba cql: remove unused functions
It seems that the following functions are never used, delete them:
 * `function::has_reference_to`
 * `functions::get_overload_count`
 * `to_identifiers` in column_identifier.hh
 * `single_column_relation::get_map_key`

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200606115149.1770453-1-pa.solodovnikov@scylladb.com>
2020-06-08 11:28:57 +03:00
Piotr Sarna
3458bd2e32 db,view: fix outdated comments
Some comments still referred to variable names which are no longer
up-to-date.

Follow-up for #6560.
Message-Id: <2b857ccc900dd64f0d9379f5d6c87fd3aaa5d902.1591594042.git.sarna@scylladb.com>
2020-06-08 09:02:10 +03:00
Nadav Har'El
d6626c217a merge: add error injection to mv
Merged pull request https://github.com/scylladb/scylla/pull/6516 from
Piotr Sarna:

This series adds error injection points to materialized view paths:

	view update generation from staging sstables;
	view building;
	generating view updates from user writes.

This series comes with a corresponding dtest pull request which adds some
test cases based on error injection.

Fixes #6488
2020-06-07 19:23:23 +03:00
Avi Kivity
53a19fc1f2 Merge 'Debian version number fix' from Takuya
"
Now that we generate dist/changelog at relocatable package generation time,
we cannot run the '.rc' fixup at .deb package build time; we need to do it
in debian_files_gen.py.

Also, we use '_' in the version number of some test version packages,
which is not supported by the .deb packaging system, so it needs to be
replaced with '-'.
"

* syuu1228-debian_version_number_fix:
  dist/debian: support version number containing '_'
  dist/debian: move version number fixup to debian_files_gen.py
2020-06-07 19:14:24 +03:00
Piotr Sarna
b3a6a33487 db,view: ensure that local updates are applied locally
In the current mutate_MV() code, a local endpoint can
become a target for a network operation. That is the source
of the occasional, benign `broken promise` error messages:
the mutation is actually applied locally, so there is no point
in creating a write response handler, because the node will not send
a response to itself over the network.
While at it, the code is deduplicated a little: with the paths
simplified, it is easier to ensure that a local endpoint is never
listed as a target for remote network operations.

Fixes #5459
Tests: unit(dev),
       dtest(materialized_views_test.TestMaterializedViews.add_dc_during_mv_insert_test)
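A hedged sketch of the target selection described above: decide whether to apply the update locally, and build a deduplicated remote list that never contains the local node. `split_view_targets` and its `paired`/`pending` parameters are hypothetical names, not the actual `mutate_MV()` signature.

```python
def split_view_targets(local, paired, pending):
    """Return (apply_locally, remote_targets) for one view update.

    paired:  the view replica paired with this base replica, or None.
    pending: view replicas that are being added (e.g. during topology changes).
    """
    candidates = ([paired] if paired is not None else []) + list(pending)
    apply_locally = local in candidates
    # Deduplicate (preserving order) and never target ourselves over the
    # network, so no write response handler is created for the local node.
    remote = [ep for ep in dict.fromkeys(candidates) if ep != local]
    return apply_locally, remote
```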
2020-06-07 19:10:03 +03:00
Kamil Braun
a1e235b1a4 CDC: Don't split collection tombstone away from base update
Overwriting a collection cell using timestamp T consists of the
following steps:
1. inserting a row marker (if applicable) with timestamp T;
2. writing a collection tombstone with timestamp T-1;
3. writing the new collection value with timestamp T.
Since CDC groups operations by timestamp, this would result in
3 separate calls to `transform` (in the case of INSERT, or 2 in the
case of UPDATE), which seems excessive, especially when
pre-/postimage is enabled. This patch treats collection tombstones
as if they had the same TS as the base write, so they are processed
in one call to `transform` (as long as TTLs are not used).

Also, `cdc_test` had to be updated in places that relied on the
former splitting strategy.

Fixes #6084
2020-06-07 17:09:05 +03:00
Tomasz Grabiec
c1df00859e sstables: Make deletion_time printable
Message-Id: <1591387901-7974-12-git-send-email-tgrabiec@scylladb.com>
2020-06-07 13:55:34 +03:00
Raphael S. Carvalho
8e47f61df7 compaction: Enable tombstone expiration based on the presence of the sstable set
For tombstone expiration to proceed correctly without the risk of resurrecting
data, the sstable set must be present.
Regular compaction and its derivatives provide the sstable set, so they are able
to expire tombstones with no resurrection risk.
Resharding, on the other hand, can run on any shard, not necessarily on the
same shard that one of the input sstables belongs to, so it currently cannot
provide an sstable set for tombstone expiration to proceed safely.
Therefore, only do expiration when the set is present.
This makes room for the sstable set to be fed to compaction via the descriptor,
allowing even resharding to do expiration. Currently, compaction assumes that
the sstable set can only come from the table, and that also needs to be changed
for further flexibility.

It is theoretically possible for a given resharding job to resurrect data if
a fully expired SSTable is resharded on a shard it doesn't belong to.
Resharding has no way to tell that expiring all that data will lead to
resurrection, because the relevant SSTables are on different shards.
This is fixed by checking for fully expired sstables only when the
sstable set is present.

Fixes #6600.
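The gating rule can be sketched as follows, assuming a hypothetical `fully_expired_sstables()` helper and a caller-supplied `is_fully_expired` predicate; the real compaction code and its types are not shown in this commit.

```python
from typing import Callable, Iterable, List, Optional, Set


def fully_expired_sstables(
    inputs: Iterable[str],
    sstable_set: Optional[Set[str]],
    is_fully_expired: Callable[[str, Set[str]], bool],
) -> List[str]:
    # Without the sstable set there is no way to see whether other sstables
    # (e.g. those owned by other shards) hold data that a tombstone shadows,
    # so nothing is treated as expired and no data can be resurrected.
    if sstable_set is None:
        return []
    return [s for s in inputs if is_fully_expired(s, sstable_set)]
```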

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200605200954.24696-1-raphaelsc@scylladb.com>
2020-06-07 11:46:48 +03:00
Pavel Solodovnikov
5b1b6b1395 cql: pass cql3::operation::raw_deletion by unique_ptr
Another small step towards shared_ptr usage reduction in cql3
code. Also make `raw_deletion` dtor virtual to make address
sanitizer happy in debug builds.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200606104528.1732241-1-pa.solodovnikov@scylladb.com>
2020-06-06 21:04:06 +03:00
Takuya ASADA
9de65f26de dist/debian: support version number containing '_'
The .deb packaging system does not support version numbers containing '_',
so it should be replaced with '-'.
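A sketch of the replacement plus a character check based on the Debian Policy rule that `upstream_version` may contain only alphanumerics and `. + - ~`. `fix_version` is a hypothetical helper, not the code in `debian_files_gen.py`.

```python
import re

# Debian Policy: upstream_version may contain only alphanumerics and . + - ~
# (extra conditions on where '-' may appear are not checked here).
_UPSTREAM_OK = re.compile(r"^[A-Za-z0-9.+~-]+$")


def fix_version(version: str) -> str:
    fixed = version.replace("_", "-")
    if not _UPSTREAM_OK.match(fixed):
        raise ValueError(f"not a valid Debian upstream version: {fixed!r}")
    return fixed
```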
2020-06-05 21:35:02 +09:00
Takuya ASADA
509ad875aa dist/debian: move version number fixup to debian_files_gen.py
Now that we generate dist/changelog at relocatable package generation time,
we cannot run the '.rc' fixup at .deb package build time; we need to do it
in debian_files_gen.py.
2020-06-05 21:34:55 +09:00
Kamil Braun
1b7f1806ac test: improve comments on test_schema_digest_does_not_change
This test tends to cause a lot of discussion resulting from
not understanding what is actually being tested.

Closes https://github.com/scylladb/scylla/issues/6582.
2020-06-05 14:30:02 +02:00
Kamil Braun
d89b7a0548 cdc: rename CDC description tables
Commit 968177da04 changed the schema
of the cdc_topology_description and cdc_description tables in the
system_distributed keyspace.

Unfortunately this was a backwards-incompatible change: these tables
would always be created, irrespective of whether or not "experimental"
was enabled. They just wouldn't be populated with experimental=off.

If the user now tries to upgrade Scylla from a version before this change
to a version after this change, it will work as long as CDC is protected
by the experimental flag and the flag is off.

However, if we drop the flag, or if the user turns experimental on,
weird things will happen, such as nodes refusing to start because they
try to populate cdc_topology_description while assuming a different schema
for this table.

The simplest fix for this problem is to rename the tables. This fix must
get merged before CDC goes out of experimental.
If users upgrade their cluster from a pre-rename version, they will simply
have two garbage tables that they are free to delete after upgrading.

sstables and digests need to be regenerated for schema_digest_test since
this commit effectively adds new tables to the system_distributed keyspace.
This doesn't result in schema disagreement because the tables are
announced to all nodes through the migration manager.
2020-06-05 09:59:16 +02:00
Piotr Sarna
64b8b77ac2 table: add error injection points to the materialized view path
... in order to be able to test scenarios with failures.
2020-06-05 09:39:58 +02:00
Piotr Sarna
76e89efc1a db,view: add error injection points to view building
... in order to be able to test scenarios with failures.
2020-06-05 09:39:58 +02:00
Piotr Sarna
9d524a7a7e db,view: add error injection points to view update generator
... in order to be able to test scenarios with failures.
2020-06-05 09:39:58 +02:00
Piotr Sarna
9a4394327a Merge 'CDC: Disallowed CDC for tables with counter column(s)'
from Juliusz.

CDC for counters is not implemented yet,
therefore any attempt to enable the CDC log on a counter
table needs to be clearly disallowed. This patch does
exactly that.

The check for whether the schema has counter columns
is performed in `cdc_service::impl` in:
- `on_before_create_column_family`,
- `on_before_update_column_family`
and, if it does, an `invalid_request_exception` is thrown.

Fixes #6553
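A Python sketch of the validation rule: when CDC is being enabled and any column is a counter, the request is rejected. `check_cdc_allowed`, the `column_types` dict, and the error text are hypothetical; the real check lives in the C++ hooks listed above.

```python
class InvalidRequestException(Exception):
    """Stand-in for the CQL invalid_request_exception."""


def check_cdc_allowed(column_types: dict, cdc_enabled: bool) -> None:
    # Hypothetical analogue of the check run before creating or updating a
    # table: enabling CDC on a schema with a counter column is rejected.
    if cdc_enabled and any(t == "counter" for t in column_types.values()):
        raise InvalidRequestException(
            "CDC is not supported for tables with counter columns")
```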

* jul-stas-6553-disallow-cdc-for-counters:
  test/cql: Check that CDC for counters is disallowed
  CDC: Disallowed CDC for tables with counter column(s)
2020-06-05 07:46:53 +02:00
Nadav Har'El
ace1697aa9 alternator test: reproducer for unjustly refused condition expression
This patch adds a test reproducing issue #6572, where the perfectly
good condition expression:

   #name1 = :val1 OR #name2 = :val2

gets refused because of the following combination in our implementation:

  1. Short-circuit evaluation, i.e., after we discover that #name1 = :val1
     holds, we don't evaluate the second half of the expression.

  2. The list of "used" references is collected at evaluation time,
     instead of at parsing time. Because evaluation never reaches
     #name2 (or :val2) our implementation complains that they are not
     used, and refuses the request - which should have been allowed.

This test xfails on Alternator. It passes on DynamoDB.
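The following toy evaluator reproduces the problem and shows why collecting references at parse time avoids it. The grammar handles only `name = value` terms joined by `OR`, and all function names are hypothetical, not Alternator's.

```python
def parse(expr: str):
    """Parse '#n = :v OR #n2 = :v2 ...' into [(name_ref, value_ref), ...]."""
    terms = []
    for part in expr.split(" OR "):
        lhs, rhs = (s.strip() for s in part.split("="))
        terms.append((lhs, rhs))
    return terms


def evaluate(terms, names, values, item, used):
    for name_ref, value_ref in terms:
        used.update((name_ref, value_ref))  # eval-time collection
        if item.get(names[name_ref]) == values[value_ref]:
            return True  # short-circuit: later terms are never visited
    return False


def refs_at_parse_time(terms):
    return {ref for term in terms for ref in term}


def check_unused(names, values, used):
    unused = (set(names) | set(values)) - used
    if unused:
        raise ValueError(f"unused references: {sorted(unused)}")
```

With an item where the first comparison is true, eval-time collection never sees `#name2` and `:val2`, so `check_unused` refuses the request; the parse-time set contains every reference and the same request is accepted.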

Refs #6572

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200604171954.444291-1-nyh@scylladb.com>
2020-06-05 07:43:50 +02:00
Piotr Sarna
0ba23d2b40 test: add manual test for tagging return value
While not very interesting by itself, the test case shows
that for TagResource and UntagResource it is actually correct
to return an empty HTTP body instead of an empty JSON object,
which is what PutItem returns.
Message-Id: <6331963179c5174a695f0e9eeed17de6c9f9a3be.1591269516.git.sarna@scylladb.com>
2020-06-04 16:17:24 +03:00
Nadav Har'El
db45ff2733 alternator: clean up usage of describe_item()
The DynamoDB GetItem request returns the requested item in a specific way,
wrapped in a map with an "Item" member. For historic reasons, we used the
same function that returns this (describe_item()) also in other code which
reads items - e.g. for checking conditional operations. The result is
wasteful - after adding this "Item" member we had other code to extract it,
all for no good reason.  It is also ugly and confusing.

Importantly, this situation also makes it harder for me to add support for
FilterExpression. The issue is that the expression evaluator got the item
with the wrapper (from the existing ConditionExpression code) but the
filtering code had it without this wrapper, as it didn't use describe_item().

So this patch uses describe_single_item(), which doesn't add the wrapper
map, instead of describe_item(). The latter function is used just once -
to implement GetItem. The unnecessary code to unwrap the item in multiple
places was then dropped.

All the tests still pass. I also tested test_expected.py in the unsafe_rmw write
isolation mode, because code specific to this mode had to be modified as well.

Refs #5038.
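A sketch of the split between the two functions, using DynamoDB-style attribute maps. The Python signatures are assumptions and do not mirror the C++ code. The wrapper is added only for the GetItem response, and DynamoDB returns no `Item` member when nothing matches.

```python
def describe_single_item(row: dict, attrs_to_get=None) -> dict:
    """The item itself, optionally projected; no wrapper map."""
    if attrs_to_get is None:
        return dict(row)
    return {k: v for k, v in row.items() if k in attrs_to_get}


def describe_item(row, attrs_to_get=None) -> dict:
    """GetItem response: the item wrapped in an "Item" member."""
    if row is None:
        return {}  # no match: the response has no "Item" member
    return {"Item": describe_single_item(row, attrs_to_get)}
```

Condition and filter evaluation can then read `describe_single_item(row)["x"]` directly, with no wrapper to strip.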

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200604092050.422092-1-nyh@scylladb.com>
2020-06-04 12:33:48 +02:00
Nadav Har'El
3d26bde4c1 alternator doc: correct state of filtering support
Correct the compatibility section in docs/alternator/alternator.md:
Filtering of Scan/Query results using the older syntax (ScanFilter,
QueryFilter) is now fully supported, after commit bea9629031.
The newer syntax (FilterExpression) is not supported yet.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200604073207.416860-1-nyh@scylladb.com>
2020-06-04 12:33:10 +02:00
Avi Kivity
5b92a6d9e4 build: drop __pycache__ directories from python3 relocatable package
Recently ./reloc/build_deb.sh started failing with

dpkg-source: info: using source format '1.0'
dpkg-source: info: building scylla-python3 using existing scylla-python3_3.8.3-0.20200604.77dfa4f15.orig.tar.gz
dpkg-source: info: building scylla-python3 in scylla-python3_3.8.3-0.20200604.77dfa4f15-1.diff.gz
dpkg-source: error: cannot represent change to scylla-python3/lib64/python3.8/site-packages/urllib3/packages/backports/__pycache__/__init__.cpython-38.pyc:
dpkg-source: error:   new version is plain file
dpkg-source: error:   old version is symlink to /usr/lib/python3.8/site-packages/__pycache__/six.cpython-38.pyc
dpkg-source: error: unrepresentable changes to source
dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 1
debuild: fatal error at line 1182:

Those files are not in fact symlinks, so it's clear that dpkg is confused
about something. Rather than debug dpkg, however, it's easier to just
drop __pycache__ directories. These hold the result of bytecode
compilation and are therefore optional, as Python will compile the sources
if the cache is not populated.

Fixes #6584.
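One way to drop `__pycache__` directories from a tree before packaging is sketched below. `drop_pycache` is a hypothetical helper; how the relocatable package build actually excludes these directories is not shown in this commit.

```python
import os
import shutil


def drop_pycache(root: str) -> int:
    """Delete every __pycache__ entry under root; return how many were removed."""
    removed = 0
    for dirpath, dirnames, _ in os.walk(root):
        if "__pycache__" in dirnames:
            path = os.path.join(dirpath, "__pycache__")
            if os.path.islink(path):
                os.unlink(path)       # a symlink: remove the link only
            else:
                shutil.rmtree(path)   # bytecode only; Python regenerates it
            dirnames.remove("__pycache__")  # do not descend into it
            removed += 1
    return removed
```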
2020-06-04 13:04:34 +03:00
Israel Fruchter
a2bb48f44b fix "scylla_coredump_setup: Remove the coredump create by the check"
In 28c3d4, `out()` was used without `shell=True`, so splitting the arguments
failed because of the complex commands in cmd (pipes and such).

Fixes #6159
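The failure mode can be reproduced with the standard `subprocess` and `shlex` modules: when a command containing a pipe is split into an argument list, `|` becomes a literal argument instead of a shell operator. This sketch does not use the script's own `out()` helper, whose signature is not shown here, and it assumes a POSIX system with `echo` and `tr` available.

```python
import shlex
import subprocess

cmd = "echo hello | tr a-z A-Z"

# Without a shell, the pipe and everything after it become arguments to echo.
argv = shlex.split(cmd)
no_shell = subprocess.run(argv, capture_output=True, text=True).stdout

# With shell=True, /bin/sh parses the whole string and sets up the pipe.
with_shell = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
```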
2020-06-04 12:55:10 +03:00
Raphael S. Carvalho
77dfa4f151 sstables: kill unused resharding code
output_sstables is no longer needed after we made resharding
use a special interposer.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200603165324.176665-1-raphaelsc@scylladb.com>
2020-06-03 23:20:15 +03:00
Avi Kivity
0c34e114e2 Merge "Upgrade to seastar api version 3" (make_file_output_stream returns future) from Rafael
"
The new seastar api changes make_file_output_stream and
make_file_data_sink to return futures. This series includes a few
refactoring patches and the actual transition.
"

* 'espindola/api-v3-v3' of https://github.com/espindola/scylla:
  table: Fix indentation
  everywhere: Move to seastar api level 3
  sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream
  sstables: Pass a data_sink to checksummed_file_writer's constructor
  sstables: Convert a file_writer constructor to a static make
  sstables: Move file_writer constructor out of line
2020-06-03 23:09:49 +03:00
Rafael Ávila de Espíndola
686f9220c1 table: Fix indentation
It was broken by the previous commit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola
e5876f6696 everywhere: Move to seastar api level 3
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola
13282b3d4c sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream
This is a bit simpler as we don't have to pass in the options and
moves the calls to make_file_output_stream to places where we can
handle futures.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola
f6ec7364a7 sstables: Pass a data_sink to checksummed_file_writer's constructor
checksummed_file_writer cannot be moved, so we can't have a
checksummed_file_writer::make that returns a future. So instead we
pass in a data_sink and let the callers call make_file_data_sink.

This is in preparation for make_file_data_sink returning a future in
the seastar api v3.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:46 -07:00
Rafael Ávila de Espíndola
c1f37db72b sstables: Convert a file_writer constructor to a static make
For now it always returns a ready future. This is in preparation for
using seastar v3 api where make_file_output_stream returns a future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:45 -07:00