22768 Commits

Author SHA1 Message Date
Pekka Enberg
7ef50d7c71 configure.py: Don't install dependencies when building submodules
Let's pass the "--nodeps" option to "build_reloc.sh" script of the
submodules to avoid the build system running "sudo"...

Reported-by: Piotr Sarna <sarna@scylladb.com>
Reported-by: Pavel Emelyanov <xemul@scylladb.com>
Tested-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200714114340.440781-1-penberg@scylladb.com>
2020-07-14 14:50:59 +03:00
Tomasz Grabiec
f20c77d0f8 Merge "Make handle_state_left more robust when tokens are empty" from Asias
1. storage_service: Make handle_state_left more robust when tokens are empty

In case the tokens for the node to be removed from the cluster are
empty, log the application_state of the leaving node to help understand
why the tokens are empty and try to get the tokens from token_metadata.

2. token_metadata: Do not throw if empty tokens are passed to remove_bootstrap_tokens

Gossip on_change callback calls storage_service::excise which calls
remove_bootstrap_tokens to remove the tokens of the leaving node from
bootstrap tokens. If empty tokens are passed to remove_bootstrap_tokens,
e.g., due to a gossip propagation issue as we saw in #6468, it will
throw. Since the on_change callback is marked as noexcept, such a throw
will cause the node to terminate, which is overkill.

To avoid such an error bringing the whole cluster down in the worst case,
just log that empty tokens were passed to remove_bootstrap_tokens.

Refs #6468
2020-07-14 13:19:45 +02:00
Asias He
116f6141d5 token_metadata: Fix incorrect log in update_normal_tokens
Currently, when update_normal_tokens is called, a warning is logged:

   Token X changing ownership from A to B

It is not correct to log so because we can call update_normal_tokens
against a temporary token_metadata object during topology calculation.

Refs: #6437
2020-07-14 14:13:37 +03:00
Pekka Enberg
f0ae550553 configure.py: Add 'build' target for building artifacts
The default ninja build target now builds artifacts and packages. Let's
add a 'build' target that only builds the artifacts.

Message-Id: <20200714105042.416698-1-penberg@scylladb.com>
2020-07-14 13:55:32 +03:00
Asias He
38d964352d repair: Relax node selection in bootstrap when nodes are less than RF
Consider a cluster with two nodes:

 - n1 (dc1)
 - n2 (dc2)

A third node is bootstrapped:

 - n3 (dc2)

The n3 fails to bootstrap as follows:

 [shard 0] init - Startup failed: std::runtime_error
 (bootstrap_with_repair: keyspace=system_distributed,
 range=(9183073555191895134, 9196226903124807343], no existing node in
 local dc)

The system_distributed keyspace is using SimpleStrategy with RF 3. For
the keyspace that does not use NetworkTopologyStrategy, we should not
require the source node to be in the same DC.

Fixes: #6744
Backports: 4.0, 4.1, 4.2
2020-07-14 11:54:34 +02:00
Pekka Enberg
16baf98d67 README.md: Add project description
This adds a short project description to README to make the git
repository more discoverable. The text is an edited version of a Scylla
blurb provided by Peter Corless.

Message-Id: <20200714065726.143147-1-penberg@scylladb.com>
2020-07-14 11:28:43 +03:00
Asias He
271fac56a3 repair: Add synchronous API to query repair status
This new API blocks until the repair job has finished, failed, or timed out.

E.g.,

- Without timeout
curl -X GET http://127.0.0.1:10000/storage_service/repair_status/?id=123

- With timeout
curl -X GET "http://127.0.0.1:10000/storage_service/repair_status/?id=123&timeout=5"

The timeout is in seconds.

The current asynchronous api returns immediately even if the repair is in progress.

E.g., curl -X GET http://127.0.0.1:10000/storage_service/repair_async/ks?id=123

Users can use the new synchronous API to avoid repeatedly polling to
check whether the repair job has finished.

Fixes #6445
2020-07-14 11:20:15 +03:00
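A minimal sketch of how a client might build the two request URLs shown in 271fac56a3. The host, port, path, and the id/timeout parameters come from the commit message; the helper name repair_status_url is hypothetical and not part of Scylla.

```python
from urllib.parse import urlencode

# Base URL as shown in the commit message; adjust for a real node.
BASE = "http://127.0.0.1:10000/storage_service/repair_status/"


def repair_status_url(repair_id, timeout=None):
    """Build the synchronous repair-status URL; timeout is in seconds."""
    params = {"id": repair_id}
    if timeout is not None:
        params["timeout"] = timeout
    return BASE + "?" + urlencode(params)


print(repair_status_url(123))
print(repair_status_url(123, timeout=5))
```

Building the query string with urlencode also sidesteps shell quoting problems with the "&" separator.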
Amnon Heiman
186301aff8 per table metrics: change estimated_histogram to time_estimated_histogram
This patch changes the per table latencies histograms: read, write,
cas_prepare, cas_accept, and cas_learn.

Besides changing the definition type and the insertion method, the API
was changed to support the new metrics.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Amnon Heiman
ea8d52b11c row_locking: change estimated histogram with time_estimated_histogram
This patch changes the row locking latencies to use
time_estimated_histogram.

The change consists of changing the histogram definition and changing how
values are inserted into the histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Amnon Heiman
edd3c97364 alternator: change estimated_histogram to time_estimated_histogram
This patch moves the alternator latency histograms to use the time_estimated_histogram.
The change requires changing the defined type and using the simpler
insertion method.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-07-14 11:17:43 +03:00
Takuya ASADA
a233b0ab3b redis: add strlen command
Add strlen command that returns string length of the key.

see: https://redis.io/commands/strlen
2020-07-14 10:56:23 +03:00
Asias He
a00ab8688f repair: Relax size check of get_row_diff and set_diff
In case of a row hash conflict, a hash in set_diff will get more than one
row from get_row_diff.

For example,

Node1 (Repair master):
row1  -> hash1
row2  -> hash2
row3  -> hash3
row3' -> hash3

Node2 (Repair follower):
row1  -> hash1
row2  -> hash2

We will have set_diff = {hash3} between node1 and node2, while
get_row_diff({hash3}) will return two rows: row3 and row3'. And the
error below was observed:

   repair - Got error in row level repair: std::runtime_error
   (row_diff.size() != set_diff.size())

In this case, node1 should send both row3 and row3' to the peer node
instead of failing the whole repair, because node2 has neither row3 nor
row3'; otherwise node1 would not send rows with hash3 to node2 in the
first place.

Refs: #6252
2020-07-14 10:39:30 +03:00
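The hash-conflict case from a00ab8688f can be reproduced with a small model. This is only an illustrative sketch: the dictionaries stand in for the repair master's and follower's row sets, and this get_row_diff is a simplified stand-in, not the Scylla function of the same name.

```python
from collections import defaultdict


def get_row_diff(rows_by_hash, set_diff):
    """Return every row whose hash is in set_diff.

    With hash conflicts the result can hold more rows than set_diff has hashes.
    """
    out = []
    for h in sorted(set_diff):
        out.extend(rows_by_hash[h])
    return out


# Node1 (repair master) holds two rows that collide on hash3.
node1 = defaultdict(list)
for row, h in [("row1", "hash1"), ("row2", "hash2"),
               ("row3", "hash3"), ("row3'", "hash3")]:
    node1[h].append(row)

# Node2 (repair follower) only knows hash1 and hash2.
node2_hashes = {"hash1", "hash2"}

set_diff = set(node1) - node2_hashes
row_diff = get_row_diff(node1, set_diff)
print(set_diff, row_diff)
```

A strict check of len(row_diff) == len(set_diff) fails here, which is why the relaxed check only needs row_diff to be at least as large as set_diff.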
Nadav Har'El
8e3be5e7d6 alternator test: configurable temporary directory
The test/alternator/run script creates a temporary directory for the Scylla
database in /tmp. The assumption was that this is the fastest disk (usually
even a ramdisk) on the test machine, and we didn't need anything else from
it.

But it turns out that on some systems, /tmp is actually a slow disk, so
this patch adds a way to configure the temporary directory - if the TMPDIR
environment variable exists, it is used instead of /tmp. As before this
patch, a temporary subdirectory is created in $TMPDIR, and this subdirectory
is automatically deleted when the test ends.

The test.py script already passes an appropriate TMPDIR (testlog/$mode),
which after this patch the Alternator test will use instead of /tmp.

Fixes #6750

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200713193023.788634-1-nyh@scylladb.com>
2020-07-14 08:52:22 +03:00
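The TMPDIR fallback described in 8e3be5e7d6 might look roughly like the sketch below. The actual test/alternator/run script may be structured differently; make_test_dir and the "scylla-test-" prefix are hypothetical names chosen for illustration.

```python
import os
import shutil
import tempfile


def make_test_dir(environ=os.environ):
    """Create a temporary subdirectory under $TMPDIR, falling back to /tmp."""
    base = environ.get("TMPDIR", "/tmp")
    return tempfile.mkdtemp(prefix="scylla-test-", dir=base)


tmp = make_test_dir()
try:
    print("running test in", tmp)
finally:
    # The subdirectory is deleted when the test ends.
    shutil.rmtree(tmp)
```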
Konstantin Osipov
e628da863d Export TMPDIR pointing at subdir of testlog/
Export TMPDIR environment variable pointing at a subdir of testlog.
This variable is used by seastar/scylla tests to create a
subdirectory with temporary test data. Normally a test cleans
up the temporary directory, but if it crashes or is killed the
directory remains.

By resetting the default location from /tmp to testlog/{mode}
we allow test.py to consolidate all test artefacts in a single
place.

Fixes #6062, "test.py uses tmpfs"
2020-07-13 22:22:43 +03:00
Avi Kivity
60c115add2 Update seastar submodule
* seastar 5632cf2146...0fe32ec596 (11):
  > futures: Add a test for a broken promise in a parallel_for_each
  > future: Simplify finally_body implementation
  > futures_test: Extend nested_exception test
  > Merge "make gate methods noexcept" from Benny
  > tutorial: fix service_loop example
  > sharded: fix doxygen \example clause for sharded_parameter
  > Merge "future: Don't call need_preempt in 'then' and 'then_impl'" from Rafael
  > future: Refactor a bit of duplicated code
  > Merge "Add with_file helpers" from Benny
  > Merge "Fix doxygen warnings" from Benny
  > build: add doxygen to install-dependencies.sh
2020-07-13 20:19:42 +03:00
Juliusz Stasiewicz
d1dec3fcd7 cdc: Retry generation fetching after read_failure_exception
While fetching CDC generations, various exceptions can occur. They
are divided into "fatal" and "nonfatal", where "fatal" ones prevent
retrying of the fetch operation.

This patch makes `read_failure_exception` "non-fatal", because such
error may appear during restart. In general this type of error can
mean a few different things (e.g. an error code in a response from
replica, but also a broken connection) so retrying seems reasonable.

Fixes #6804
2020-07-13 18:17:45 +03:00
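The fatal/non-fatal split described in d1dec3fcd7 can be sketched as a retry loop. The exception classes below are stand-ins invented for this example (the real code is C++), and the attempt count and delay are arbitrary assumptions.

```python
import time


class ReadFailure(Exception):
    """Stand-in for read_failure_exception; treated as non-fatal."""


class FatalError(Exception):
    """Stand-in for any error classified as fatal."""


# Only errors listed here are retried.
NONFATAL = (ReadFailure,)


def fetch_with_retry(fetch, attempts=3, delay=0.0):
    """Call fetch(); retry on non-fatal errors, let anything else propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except NONFATAL:
            if attempt == attempts:
                raise  # out of attempts, give up
            time.sleep(delay)
```

Making an error retryable then amounts to adding its type to the NONFATAL tuple.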
Pekka Enberg
d67f4dba1e README.md: Consolidate Docker image build instructions
Consolidate the Docker image build instructions into the "Building Scylla"
section of the README instead of having it in a separate section in a different
place of the file.

Message-Id: <20200713132600.126360-1-penberg@scylladb.com>
2020-07-13 17:14:44 +03:00
Nadav Har'El
35f7048228 alternator: CreateTable with bad Tags shouldn't create a table
Currently, if a user tries to CreateTable with a forbidden set of tags,
e.g., the Tags list is too long or contains an invalid value for
system:write_isolation, then the CreateTable request fails but the table
is still created. Without the tag of course.

This patch fixes this bug, and adds two test cases for it that fail
before this patch, and succeed with it. One of the test cases is
scylla_only because it checks the Scylla-specific system:write_isolation
tag, but the second test case works on DynamoDB as well.

What this patch does is to split the update_tags() function into two
parts - the first part just parses the Tags, validates them, and builds
a map. Only the second part actually writes the tags to the schema.
CreateTable now does the first part early, before creating the table,
so failure in parsing or validating the Tags will not leave a created
table behind.

Fixes #6809.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200713120611.767736-1-nyh@scylladb.com>
2020-07-13 17:14:44 +03:00
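The two-phase split described in 35f7048228 (validate the Tags first, write them only after the table exists) can be illustrated as follows. This is a sketch, not Alternator code: MAX_TAGS, ALLOWED_ISOLATION, ValidationError and the in-memory tables dictionary are all hypothetical placeholders.

```python
MAX_TAGS = 50  # hypothetical limit, for illustration only
ALLOWED_ISOLATION = {"mode_a", "mode_b"}  # placeholder values, not the real ones


class ValidationError(Exception):
    pass


def parse_tags(tags):
    """Phase 1: parse and validate only; nothing is written."""
    if len(tags) > MAX_TAGS:
        raise ValidationError("too many tags")
    tag_map = {}
    for tag in tags:
        key, value = tag["Key"], tag["Value"]
        if key == "system:write_isolation" and value not in ALLOWED_ISOLATION:
            raise ValidationError("bad write isolation: " + value)
        tag_map[key] = value
    return tag_map


def create_table(tables, name, tags):
    """Validate tags before creating anything, then create and tag the table."""
    tag_map = parse_tags(tags)         # may raise before the table exists
    tables[name] = {"tags": tag_map}   # phase 2: create the table, write tags
```

Because parse_tags runs before the table is created, a validation failure leaves no table behind.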
Pekka Enberg
c6116c36e0 configure.py: Remove obsolete "--with-osv" option
The "--with-osv" option has been a no-op since commit cc17c44640
("Move seastar to a submodule"). Let's remove it as obsolete.

Message-Id: <20200713131333.125634-1-penberg@scylladb.com>
2020-07-13 17:14:44 +03:00
Nadav Har'El
21ae457e8a test.py: print test durations
When tests are run in parallel, it is hard to tell how much time each test
ran. The time difference between consecutive printouts (indicating a test's
end) says nothing about the test's duration.

This patch adds in "--verbose" mode, at the end of each test result, the
duration in seconds (in wall-clock time) of the test. For example,

$ ./test.py --mode dev --verbose alternator
================================================================================
[N/TOTAL] TEST                                                 MODE   RESULT
------------------------------------------------------------------------------
[1/2]     boost/alternator_base64_test                         dev    [ PASS ] 0.02s
[2/2]     alternator/run                                       dev    [ PASS ] 26.57s

These durations are useful for recognizing tests which are especially slow,
or runs where all the tests are unusually slow (which might indicate some
sort of misconfiguration of the test machine).

Fixes #6759

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200706142109.438905-1-nyh@scylladb.com>
2020-07-13 17:14:44 +03:00
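Per-test wall-clock timing as described in 21ae457e8a could be measured along these lines. The function names, the column widths and the "[ FAIL ]" label are assumptions made for this sketch; only the "[ PASS ]" label and the seconds suffix come from the sample output above.

```python
import time


def run_timed(test):
    """Run test() and return (passed, wall-clock seconds)."""
    start = time.monotonic()
    try:
        test()
        passed = True
    except Exception:
        passed = False
    return passed, time.monotonic() - start


def format_result(name, mode, passed, seconds):
    """Format one result line with the duration appended."""
    status = "[ PASS ]" if passed else "[ FAIL ]"
    return f"{name:<50} {mode:<6} {status} {seconds:.2f}s"


passed, seconds = run_timed(lambda: None)
print(format_result("boost/alternator_base64_test", "dev", passed, seconds))
```

Timing each test individually is what makes the number meaningful when tests run in parallel, since the gap between printouts no longer reflects any single test.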
Pekka Enberg
ace1b15ed6 configure.py: Make "dist" part of default target
This adds a new "dist-<mode>" target, which builds the server package in
selected build mode together with the other packages, and wires it to
the "<mode>" target, which is built as part of default "ninja"
invocation.

This allows us to perform a full build, package, and test cycle across
all build modes with:

  ./configure.py && ninja && ./test.py

Message-Id: <20200713101918.117692-1-penberg@scylladb.com>
2020-07-13 17:14:44 +03:00
Takuya ASADA
e6e4359414 scylla_raid_setup: switch to systemd mount unit
Since we already use systemd unit file for coredump bind mount and swapfile,
we should move to systemd mount unit for data partition as well.
2020-07-13 17:14:44 +03:00
Pekka Enberg
c807c903ab pull_github_pr.sh: Use "cherry-pick" for single-commit pull requests
Improve the "pull_github_pr.sh" to detect the number of commits in a
pull request, and use "git cherry-pick" to merge single-commit pull
requests.
Message-Id: <20200713093044.96764-1-penberg@scylladb.com>
2020-07-13 17:14:44 +03:00
Avi Kivity
d74582fbc5 move jmx/tools submodules to tools directory
Move all package repositories to tools directory.
2020-07-13 17:14:14 +03:00
Avi Kivity
06341d2528 dist: fix debian generated files for non-default PRODUCT setting
There are a bunch of renames that are done if PRODUCT is not the
default, but the Python code for them is incorrect. Path.glob()
is not a static method, and Path does not support .endswith().

Fix by constructing a Path object, and later casting to str.
2020-07-13 11:51:31 +03:00
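The two pathlib pitfalls named in 06341d2528 are easy to show in isolation. The sketch below is not the actual dist code; rename_product_files and the backup-file filter are hypothetical, chosen only to exercise Path(...).glob() and the str() cast.

```python
from pathlib import Path
import tempfile


def rename_product_files(directory, old, new):
    """Rename files named '<old>*' to '<new>*' inside directory."""
    renamed = []
    # glob() is an instance method, so build a Path object first.
    # sorted() materializes the matches before any file is renamed.
    for p in sorted(Path(directory).glob(old + "*")):
        # Path has no .endswith(); cast to str for string checks.
        if str(p).endswith("~"):
            continue  # skip editor backup files (illustrative filter)
        target = p.with_name(new + p.name[len(old):])
        p.rename(target)
        renamed.append(target.name)
    return renamed


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "scylla-server.install").touch()
    print(rename_product_files(d, "scylla", "myproduct"))
```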
Pekka Enberg
f2b4c1a212 scylla_prepare: Improve error message on missing CPU features
Let's report each missing CPU feature individually, and improve the
error message a bit. For example, if the "clmul" instruction is missing,
the report looks as follows:

  ERROR: You will not be able to run Scylla on this machine because its CPU lacks the following features: pclmulqdq

  If this is a virtual machine, please update its CPU feature configuration or upgrade to a newer hypervisor.

Fixes #6528
2020-07-13 11:39:29 +03:00
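Reporting each missing CPU feature individually, as in f2b4c1a212, might be done as sketched below. How scylla_prepare actually detects features is not stated here; parsing /proc/cpuinfo-style text and the REQUIRED list are assumptions for illustration, while the error wording follows the example message above.

```python
REQUIRED = ["sse4_2", "pclmulqdq"]  # illustrative list, not Scylla's real one


def missing_cpu_features(cpuinfo_text, required=REQUIRED):
    """Return the required features absent from the 'flags' lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return [f for f in required if f not in flags]


def error_message(missing):
    """Build the error text listing every missing feature."""
    return ("ERROR: You will not be able to run Scylla on this machine "
            "because its CPU lacks the following features: " + " ".join(missing))


sample = "processor\t: 0\nflags\t\t: fpu sse4_2 avx\n"
print(error_message(missing_cpu_features(sample)))
```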
Pekka Enberg
bc053b3cfa README.md: Add links to mailing lists and Slack
Add links to the users and developers mailing lists, and the Slack
channel in README.md to make them more discoverable.

Message-Id: <20200713074654.90204-1-penberg@scylladb.com>
2020-07-13 10:48:55 +03:00
Pekka Enberg
df6a0ec5e5 README.md: Update build and run instructions
Simplify the build and run instructions by splitting the text in three
sections (prerequisites, building, and running) and streamlining the
steps a bit.

Message-Id: <20200713065910.84582-1-penberg@scylladb.com>
2020-07-13 10:04:12 +03:00
Pekka Enberg
5476efabb3 configure.py: Make output less verbose by default
The configure.py script outputs the Seastar build command it executes:

['./cooking.sh', '-i', 'dpdk', '-d', '../build/release/seastar', '--', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DCMAKE_C_COMPILER=gcc', '-DCMAKE_CXX_COMPILER=g++', '-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON', '-DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300', '-DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 ', '-DSeastar_CXX_DIALECT=gnu++20', '-DSeastar_API_LEVEL=4', '-DSeastar_UNUSED_RESULT_ERROR=ON', '-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']

The output is mostly useful for debugging the build process itself, so
hide it behind a "--verbose" flag, and make it more human-readable while
at it:

./cooking.sh \
  -i \
  dpdk \
  -d \
  ../build/release/seastar \
  -- \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_C_COMPILER=gcc \
  -DCMAKE_CXX_COMPILER=g++ \
  -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON \
  -DSeastar_CXX_FLAGS=;-Wno-error=stack-usage=-ffile-prefix-map=/home/penberg/src/scylla/scylla=.;-march=westmere;-O3;-Wstack-usage=13312;--param;inline-unit-growth=300 \
  -DSeastar_LD_FLAGS=-Wl,--build-id=sha1,--dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2  \
  -DSeastar_CXX_DIALECT=gnu++20 \
  -DSeastar_API_LEVEL=4 \
  -DSeastar_UNUSED_RESULT_ERROR=ON \
  -DSeastar_DPDK=ON \
  -DSeastar_DPDK_MACHINE=wsm
Message-Id: <20200713065509.83184-1-penberg@scylladb.com>
2020-07-13 09:57:38 +03:00
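Turning an argument list into the backslash-continued form shown in 5476efabb3 takes a single join. This is a sketch of the idea rather than the code in configure.py; format_command is a hypothetical name, and no shell quoting is applied, matching the unquoted output above.

```python
def format_command(args):
    """Join an argv list into one command, one argument per continued line."""
    return " \\\n  ".join(args)


cmd = ["./cooking.sh", "-i", "dpdk", "-d", "../build/release/seastar"]
print(format_command(cmd))
```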
Botond Dénes
ef2c8f563b scylla-gdb.py: scylla fiber: add suggestion for further investigation
scylla fiber often fails to really unwind the entire fiber, stopping
sooner than expected. This is expected as scylla fiber only recognizes
the most standard continuations but can drop the ball as soon as there
is an unusual transmission.
This commit adds a message below the found tasks explaining that the
list might not be exhaustive and prints a command which can be used to
explain why the unwinding stopped at the last task.

While at it also rephrase an out-of-date comment.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200710120813.100009-1-bdenes@scylladb.com>
2020-07-12 15:43:21 +03:00
Dejan Mircevski
29fccd76ea cql/restrictions: Rename find_if to find_atom
As requested in #5763 feedback, rename to avoid clashes with
std::find_if and boost::find_if.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-07-12 14:12:30 +03:00
Dejan Mircevski
9dac9a25e5 cql/restrictions: Constrain find_if and count_if
As requested in #5763 feedback, require that Fn be callable with
binary_operator in the functions mentioned above.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-07-12 14:11:39 +03:00
Pavel Emelyanov
1331623465 test.py: Don't feed fail-on-abandoned-failed-futures to unit tests
The problem is that this option is defined in seastar testing wrapper,
while no unit tests use it, all just start themselves with app.run() and
would complain on unknown option.

"Would", because nowadays every single test in it declares its own options
in suite.yaml, that override test.py's defaults. Once an option-less unit
test is added (B+ tree ones) it will complain.

The proposal is to remove this option from the defaults. If any unit test
uses the seastar testing wrappers and needs this option, it can add it
to suite.yaml.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200709084602.8386-1-xemul@scylladb.com>
2020-07-10 16:21:14 +02:00
Tomasz Grabiec
883ac4a78c Merge "Some selective noexcept bombing" from Pavel E.
The goal is to make the lambdas, that are fed into partition cache's
clear_and_dispose() and erase_in_dispose(), to be noexcept.

This is to satisfy B+, which strictly requires those to be noexcept
(currently used collections don't care).

The set covers not only the strictly required minimum, but also some
other methods that happened to be nearby.

* https://github.com/xemul/scylla/tree/br-noexcepts-over-the-row-cache:
  row_cache: Mark invalidation lambda as noexcept
  cache_tracker: Mark methods noexcept
  cache_entry: Mark methods noexcept
  region: Mark trivial noexcept methods as such
  allocation_strategy: Mark returning lambda as noexcept
  allocation_strategy: Mark trivial noexcept methods as such
  dht: Mark noexcept methods
2020-07-10 15:02:52 +02:00
Nadav Har'El
f549d147ea alternator: fix Expected's "NULL" operator with missing AttributeValueList
The "NULL" operator in Expected (old-style conditional operations) doesn't
have any parameters, so we insisted that the AttributeValueList be empty.
However, we forgot to allow it to also be missing - a possibility which
DynamoDB allows.

This patch adds a test to reproduce this case (the test passes on DynamoDB,
fails on Alternator before this patch, and succeeds after this patch), and
a fix.

Fixes #6816.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200709161254.618755-1-nyh@scylladb.com>
2020-07-10 07:45:02 +02:00
Benny Halevy
3ce86a7160 test: restrictions_test: set_contains: uncomment check depending on #6797
Now that #6797 is fixed.

Refs #5763

Cc: Dejan Mircevski <dejan@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Test: restrictions_test(debug)
Message-Id: <20200709123703.955897-1-bhalevy@scylladb.com>
2020-07-09 17:56:09 +03:00
Benny Halevy
ec77777bda bytes: compare_unsigned: do not pass nullptr to memcmp
If either of the compared bytes_views is empty,
consider the empty prefixes equal and proceed to compare
the size of the suffix.

A similar issue exists in legacy_compound_view::tri_comparator::operator().
It too must not pass nullptr to memcmp if any of the compared byte_view's
is empty.

Fixes #6797
Refs #6814

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Test: unit(dev)
Branches: all
Message-Id: <20200709123453.955569-1-bhalevy@scylladb.com>
2020-07-09 17:54:46 +03:00
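The comparison order described in ec77777bda (common prefix first, then sizes, with the prefix step skipped when either side is empty) can be modelled in a few lines. Python has no nullptr hazard, so this sketch only illustrates the logic; the real fix is in the C++ compare_unsigned.

```python
def compare_unsigned(a: bytes, b: bytes) -> int:
    """Three-way compare: common prefix first, then length."""
    n = min(len(a), len(b))
    if n:  # skip the prefix comparison entirely when either side is empty
        pa, pb = a[:n], b[:n]
        if pa != pb:
            # bytes objects compare as sequences of unsigned byte values
            return -1 if pa < pb else 1
    # Equal (or empty) prefix: the shorter input sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


print(compare_unsigned(b"", b"a"), compare_unsigned(b"\x80", b"\x7f"))
```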
Nadav Har'El
9042161ba3 merge: cdc: better pre/postimages for complicated batches
Merged pull request https://github.com/scylladb/scylla/pull/6741
by Piotr Dulikowski:

This PR changes the algorithm used to generate preimages and postimages
in CDC log. While its behavior is the same for non-batch operations
(with one exception described later), it generates pre/postimages that
are organized more nicely, and account for multiple updates to the same
row in one CQL batch.

Fixes #6597, #6598

Tests:
- unit(dev), for each consecutive commit
- unit(debug), for the last commit

Previous method

The previous method worked on a per delta row basis. First, the base
table is queried for the current state of the rows being modified in
the processed mutation (this is called the "preimage query"). Then,
for each delta row (representing a modification of a row):

- If preimage is enabled and the row was already present in the table,
  a corresponding preimage row is inserted before the delta row.
  The preimage row contains data taken directly from the preimage
  query result. Only columns that are modified by the delta are
  included in the preimage.
- If postimage is enabled, then a postimage row is inserted after the
  delta row. The postimage row contains data which was a result of
  taking row data directly from the preimage query result and applying
  the change the corresponding delta row represented. All columns
  of the row are included in the postimage.

The above works well for simple cases such as singular CQL INSERT,
UPDATE, DELETE, or simple CQL BATCH-es. An example:

cqlsh:ks> BEGIN UNLOGGED BATCH
			INSERT INTO tbl (pk, ck, v) VALUES (0, 1, 111);
			INSERT INTO tbl (pk, ck, v) VALUES (0, 2, 222);
			APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl",
			pk, ck, v from ks.tbl_scylla_cdc_log ;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v
------------------+---------------+---------+----+----+-----
...snip...
                0 |             0 |    null |  0 |  1 | 100
                1 |             2 |    null |  0 |  1 | 111
                2 |             9 |    null |  0 |  1 | 111
                3 |             0 |    null |  0 |  2 | 200
                4 |             2 |    null |  0 |  2 | 222
                5 |             9 |    null |  0 |  2 | 222

Preimage rows are represented by cdc operation 0, and postimage by 9.
Please note that all rows presented above share the same value of
cdc$time column, which was not shown here for brevity.

Problems with previous approach

This simple algorithm has some conceptual and implementational problems
which arise when processing more complicated CQL BATCH-es. Consider
the following example:

cqlsh:ks> BEGIN UNLOGGED BATCH
			INSERT INTO tbl (pk, ck, v1) VALUES (0, 0, 1) USING TTL 1000;
			INSERT INTO tbl (pk, ck, v2) VALUES (0, 0, 2) USING TTL 2000;
			APPLY BATCH;
cqlsh:ks> SELECT "cdc$batch_seq_no", "cdc$operation", "cdc$ttl",
			pk, ck, v1, v2 FROM tbl_scylla_cdc_log;

 cdc$batch_seq_no | cdc$operation | cdc$ttl | pk | ck | v1   | v2
------------------+---------------+---------+----+----+------+------
...snip...
                0 |             0 |    null |  0 |  0 | null |    0
                1 |             2 |    2000 |  0 |  0 | null |    2
                2 |             9 |    null |  0 |  0 |    0 |    2
                3 |             0 |    null |  0 |  0 |    0 | null
                4 |             1 |    1000 |  0 |  0 |    1 | null
                5 |             9 |    null |  0 |  0 |    1 |    0

A single cdc group (corresponding to rows sharing the same cdc$time)
might have more than one delta that modify the same row. For example,
this happens when modifying two columns of the same row with
different TTLs - due to our choice of CDC log schema, we must
represent such change with two delta rows.

It does not make sense to present a postimage after the first delta
and preimage before the second - both deltas are applied
simultaneously by the same CQL BATCH, so the middle "image" is purely
imaginary and does not appear at any point in the table.

Moreover, in this example, the last postimage is wrong - v1 is updated,
but v2 is not. None of the postimages presented above represent the
final state of the row.

New algorithm

The new algorithm now works on a per cdc group basis, not per delta row.
When starting to process a CQL BATCH:

    Load preimage query results into a data structure representing
    current state of the affected rows.

For each cdc group:

- For each row modified within the group, a preimage is produced,
  regardless if the row was present in the table. The preimage
  is calculated based on the current state. Only include columns
  that are modified for this row within the group.
- For each delta, produce a delta row and update the current state
  accordingly.
- Produce postimages in the same way as preimages - but include all
  columns for each row in the postimage.

The new algorithm produces the postimage correctly when multiple deltas
affect one row, because the state of the row is updated on the fly.

This algorithm moves preimage and postimage rows to the beginning and
the end of the cdc group, respectively. This solves the problem of
imaginary preimages and postimages appearing inside a cdc group.

Unfortunately, it is possible for one CQL BATCH to contain changes that
use multiple timestamps. This will result in one CQL BATCH creating
multiple cdc groups, with different cdc$time. As it is impossible, with
our choice of schema, to tell that those cdc groups were created from
one CQL BATCH, instead we pretend as if those groups were separate CQL
operations. By tracking the state of the affected rows, we make sure
that preimage in later groups will reflect changes introduced in
previous groups.

One more thing - this algorithm should have the same results for
singular CQL operations and simple CQL BATCH-es, with one exception.
Previously, a preimage was not produced if a row was not present in the
table. Now, the preimage row will appear unconditionally - it will have
nulls in place of column values.

* 'cdc-pre-postimage-persistence' of github.com:piodul/scylla:
  cdc: fix indentation
  cdc: don't update partition state when not needed
  cdc: implement pre/postimage persistence
  cdc: add interface for producing pre/postimages
  cdc: load preimage query result into partition state fields
  cdc: introduce fields for keeping partition state
  cdc: rename set_pk_columns -> allocate_new_log_row
  cdc: track batch_no inside transformer
  cdc: move cdc$time generation to transformer
  cdc: move find_timestamp to split.cc
  cdc: introduce change_processor interface
  cdc: remove redundant schema arguments from cdc functions
  cdc: move management of generated mutations inside transformer
  cdc: move preimage result set into a field of transformer
  cdc: keep ts and tuuid inside transformer
  cdc: track touched parts of mutations inside transformer
  cdc: always include preimage for affected rows
2020-07-09 16:55:55 +03:00
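The per-group algorithm from the merge message of 9042161ba3 can be sketched on plain dictionaries. This is a simplified model, not the C++ implementation: TTLs, deletions and the cdc$ bookkeeping columns are ignored, and process_group and the row labels are hypothetical names.

```python
def process_group(state, deltas, all_columns):
    """Emit log rows for one cdc group and update state in place.

    state:   {row_key: {column: value}}, seeded from the preimage query
    deltas:  list of (row_key, {column: new_value}) sharing one cdc$time
    """
    # Columns touched per row within this group, in first-seen order.
    touched = {}
    for key, changes in deltas:
        cols = touched.setdefault(key, [])
        cols.extend(c for c in changes if c not in cols)

    log = []
    # 1. Preimages first: current state, restricted to touched columns.
    #    A row absent from the table still gets a preimage, with nulls.
    for key, cols in touched.items():
        current = state.get(key, {})
        log.append(("preimage", key, {c: current.get(c) for c in cols}))
    # 2. Deltas: emit each one and update the tracked state on the fly.
    for key, changes in deltas:
        log.append(("delta", key, dict(changes)))
        state.setdefault(key, {}).update(changes)
    # 3. Postimages last: all columns of each affected row.
    for key in touched:
        log.append(("postimage", key,
                    {c: state[key].get(c) for c in all_columns}))
    return log


# The two-delta batch from the example above: v1 and v2 of row (0, 0).
state = {(0, 0): {"v1": 0, "v2": 0}}
deltas = [((0, 0), {"v1": 1}), ((0, 0), {"v2": 2})]
for row in process_group(state, deltas, ["v1", "v2"]):
    print(row)
```

Because state is updated after every delta, the single postimage reflects both changes, and no intermediate image appears between the two deltas.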
Pavel Emelyanov
bb32cff23d row_cache: Mark invalidation lambda as noexcept
It calls noexcept functions inside and handles the exception from throwing one itself

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:46:38 +03:00
Pavel Emelyanov
1346289151 cache_tracker: Mark methods noexcept
All but a few are trivially such.

The clear_continuity() calls cache_entry::set_continuous() that had become noexcept
a patch ago.

The allocator() calls region.allocator() which had been marked noexcept few patches
back.

The on_partition_erase() calls allocator().invalidate_references(), both had
been marked noexcept few patches back.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:44:17 +03:00
Pavel Emelyanov
d4ef845136 cache_entry: Mark methods noexcept
All but one are trivially such, the position() one calls is_dummy_entry()
which has become noexcept right now.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:43 +03:00
Pavel Emelyanov
3237796e00 region: Mark trivial noexcept methods as such
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:37 +03:00
Pavel Emelyanov
2c4a94aeab allocation_strategy: Mark returning lambda as noexcept
It just calls current_allocator().destroy() which is noexcept

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:23 +03:00
Pavel Emelyanov
a497dfdd0b allocation_strategy: Mark trivial noexcept methods as such
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:03 +03:00
Pavel Emelyanov
6d7ae4ead1 dht: Mark noexcept methods
These are either trivially noexcept already, or call each other, thus becoming noexcept too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:03 +03:00
Piotr Sarna
7ae3b25d8e alternator: cleanup raw GetString() calls
Instead of using raw GetString() from rapidjson, it's neater
to use a helper for creating string views: rjson::to_string_view().
Message-Id: <3afda97403d4601c9600f6838f2028bfabd2f2f9.1594289250.git.sarna@scylladb.com>
2020-07-09 13:58:40 +03:00
Piotr Sarna
75dbaa0834 test: add alternator test for incorrect numeric values
The test case is put inside test_manual_requests suite, because
boto3 validates numeric inputs and does not allow passing arbitrary
incorrect values.

Tests: unit(dev), alternator(local, remote)

Message-Id: <ac2baedc2ea61f0d857e7c01839f34cd15f7e02d.1594289250.git.sarna@scylladb.com>
2020-07-09 13:58:33 +03:00
Piotr Sarna
96426df72e alternator: translate number errors to ValidationException
In order to be consistent with returned error types, marshaling
exceptions thrown from parsing big decimals are translated
to ValidationException.

Message-Id: <1446878cd63ad8291327a399cf700e4f402d108c.1594289250.git.sarna@scylladb.com>
2020-07-09 13:58:25 +03:00
Dejan Mircevski
d956233a80 cql_query_test: Drop get() on cquery_nofail result
cquery_nofail returns the query result, not a future.  Invoking .get()
on its result is unnecessary.  This just happened to compile because
shared_ptr has a get() method with the same signature as future::get.

Tests: cql_query_test unit test (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-07-09 13:52:52 +03:00
Nadav Har'El
8b3dac040a alternator: add request headers to trace-level logging
When "trace"-level logging is enabled for Alternator, we log every request,
but currently only the request's body. For debugging, it is sometimes useful
to also see the headers - which are important to debug authentication,
for example. So let's print the headers as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200709103414.599883-1-nyh@scylladb.com>
2020-07-09 12:38:45 +02:00