Compare commits

..

173 Commits

Author SHA1 Message Date
Piotr Sarna
b764db3f1c cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
if the target column is not used as an index - in which case there's
no need of fetching it. However, the check was incorrectly assuming
that any restriction is eligible for indexing, while it's currently
only true for EQ. The fix makes a more specific check and contains
many dynamic casts, but these will hopefully we gone once our
long planned "restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 10:08:48 +01:00
Konstantin Osipov
304d339193 locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:10:45 +02:00
Avi Kivity
9f7ba4203d logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:50 +02:00
Benny Halevy
8b6a792f81 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will temper with
the binary's build-id when stripping its debug info as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that we need to set the following macros as follows:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:22:58 +02:00
Benny Halevy
8a94f6b180 gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:33 +02:00
Benny Halevy
27209a5b2e storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:40:31 +02:00
Hagit Segev
c25f627a6e release: prepare for 3.1.4 2020-02-24 23:53:03 +02:00
Avi Kivity
58b1bdc20c Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit 4791b726a0. It exposes a data
resurrection bug (#5858).
2020-02-24 11:41:37 +02:00
Piotr Dulikowski
e1a7df174c hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer its replica, it is attempted
to be sent to all its current replicas. Previously,
storage_proxy::mutate was used for that purpose. It was incorrect
because that function treats mutations for counter tables as mutations
containing only a delta (by how much to increase/decrease the counter).
These two types of mutations have different serialization format, so in
this case a "shards" mutation is reinterpreted as "delta" mutation,
which can cause data corruption to occur.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 18:04:06 +02:00
Avi Kivity
6c39e17838 Merge "cql3: time_uuid_fcts: validate time UUID" from Benny
"
Throw an error in case we hit an invalid time UUID
rather than hitting an assert.

Fixes #5552

(Ref #5588 that was dequeued and fixed here)

Test: UUID_test, cql_query_test(debug)
"

* 'validate-time-uuid' of https://github.com/bhalevy/scylla:
  cql3: abstract_function_selector: provide assignment_testable_source_context
  test: cql_query_test: add time uuid validation tests
  cql3: time_uuid_fcts: validate timestamp arg
  cql3: make_max_timeuuid_fct: delete outdated FIXME comment
  cql3: time_uuid_fcts: validate time UUID
  test: UUID_test: add tests for time uuid
  utils: UUID: create_time assert nanos_since validity
  utils/UUID_gen: make_nanos_since
  utils: UUID: assert UUID.is_timestamp

(cherry picked from commit 3343baf159)

Conflicts:
	cql3/functions/time_uuid_fcts.hh
	tests/cql_query_test.cc
2020-02-17 20:09:09 +02:00
Avi Kivity
507d763f45 Update seastar submodule
* seastar a5312ab85a...a51bd8b91a (1):
  > config: Do not allow zero rates

Fixes #5360.
2020-02-16 17:02:54 +02:00
Benny Halevy
b042e27f0a repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 474ffb6e54)
2020-02-16 16:11:36 +02:00
Rafael Ávila de Espíndola
375ce345a3 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
(cherry picked from commit b80852c447)
2020-02-16 16:04:24 +02:00
Avi Kivity
493b821dfa Update seastar submodule
* seastar cfc082207c...a5312ab85a (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:55:05 +02:00
Avi Kivity
e1999c76b2 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-02-16 15:42:12 +02:00
Asias He
4791b726a0 streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
The table::flush_streaming_mutations is used in the days when streaming
data goes to memtable. After switching to the new streaming, data goes
to sstables directly in streaming, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid unnecessary cache invalidating which pokes hole in the cache, skip
calling _cache.invalidate() if the sstables is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:50 +02:00
Botond Dénes
d7354a5b8d row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8 is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:14:03 +02:00
Takuya ASADA
6815b72b06 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles when it removed by user,
currently we are missing dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:29:30 +02:00
Rafael Ávila de Espíndola
efc2df8ca3 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both a efficiency and correctness issue, as using varint in a
key could produce different tokens.

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 17:35:02 +02:00
Avi Kivity
dbe90f131f test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:22:03 +02:00
Takuya ASADA
31d5d16c3d dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX version correctly on Debian variants
(merged at ae33e9f), but when we moved to relocated package we mistakenly
dropped the code, so add the code again.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:35:40 +02:00
Hagit Segev
b0d122f9c5 release: prepare for 3.1.3 2020-01-28 14:09:57 +02:00
Asias He
9a10e4a245 repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down which calls
   wait_for_writer_done(). A duplicate partition_end is written.

To fix, track the partition_start and partition_end written, avoid
unpaired writes.

Backports: 3.1 and 3.2
Fixes: #5527
(cherry picked from commit 401854dbaf)
2020-01-21 13:39:19 +02:00
Piotr Sarna
871d1ebdd5 view: ignore duplicated key entries in progress virtual reader
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent
legacy views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before returning
it to the user. That may cause duplicated clustering row fragments to be
emitted by the reader, which may cause undefined behaviour in consumers.
The solution is to keep track of previous clustering keys for each
partition and drop fragments that would cause duplication. That way if
any shard is still building a view, its progress will be returned,
and if many shards are still building, the returned value will indicate
the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>

(cherry picked from commit 85a3a4b458)
2020-01-16 12:07:40 +01:00
Tomasz Grabiec
bff996959d cql: alter type: Format field name as text instead of hex
Fixes #4841

Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 64ff1b6405)
2020-01-05 18:51:53 +02:00
Gleb Natapov
1bdc83540b cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state
We should wait for a future returned from add_local_application_state() to
resolve before issuing new calculation, otherwise two
add_local_application_state() may run simultaneously for the same state.

Fixes #4838.

Message-Id: <20190812082158.GE17984@scylladb.com>
(cherry picked from commit 00c4078af3)
2020-01-05 18:50:13 +02:00
Takuya ASADA
478c35e07a dist/debian: fix missing scyllatop files
Debian package build script does runs relocate_python_scripts.py for scyllatop,
but mistakenly forgetting to install tools/scyllatop/*.py.
We need install them by using scylla-server.install.

Fixes #5518

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191227025750.434407-1-syuu@scylladb.com>
2019-12-30 19:38:34 +02:00
Benny Halevy
ba968ab9ec tracing: one_session_records: keep local tracing ptr
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 7aef39e400)
2019-12-24 18:42:21 +02:00
Avi Kivity
883b5e8395 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.

(cherry picked from commit 85822c7786)
2019-12-24 18:35:01 +02:00
Rafael Ávila de Espíndola
b47033676a types: recreate dependent user types.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has direct pointer to the types it uses. To
handle the discrepancy we need to recreate any dependent types when a
entry in system.types changes.

Fixes #5049

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit 5af8b1e4a3)
2019-12-23 17:58:26 +02:00
Tomasz Grabiec
67e45b73f0 types: Fix abort on type alter which affects a compact storage table with no regular columns
Fixes #4837

Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 34cff6ed6b)
2019-12-23 17:34:06 +02:00
Rafael Ávila de Espíndola
37eac75b6f cql: Fix use of UDT in reversed columns
We were missing calls to underlying_type in a few locations and so the
insert would think the given literal was invalid and the select would
refuse to fetch a UDT field.

Fixes #4672

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190708200516.59841-1-espindola@scylladb.com>
(cherry picked from commit 4e7ffb80c0)
2019-12-23 15:54:36 +02:00
Piotr Sarna
e8431a3474 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418

(cherry picked from commit 79c3a508f4)
2019-12-05 22:36:20 +02:00
Yaron Kaikov
9d78d848e6 release: prepare for 3.1.2 2019-11-27 10:24:43 +02:00
Rafael Ávila de Espíndola
32aa6ddd7e commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
(cherry picked from commit 6160b9017d)
2019-11-24 17:48:24 +02:00
Tomasz Grabiec
74cc9477af row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e3d025d014)
2019-11-24 17:44:07 +02:00
Nadav Har'El
95acf71680 merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290

(cherry picked from commit 9b9609c65b)
2019-11-20 21:40:11 +02:00
Asias He
921f8baf00 gossip: Fix max generation drift measure
Assume n1 and n2 in a cluster with generation number g1, g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1' which is time based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift with generation value this node would
get if this node were restarted.

This is a backport of CASSANDRA-10969.

Fixes #5164

(cherry picked from commit 0a52ecb6df)
2019-11-20 11:39:16 +02:00
Avi Kivity
071d7d9210 reloc: do not install dependencies when building the relocatable package
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).

Luckily "dnf install" does not update an already-installed package. Had
that been a case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.

So, to prevent missing packages in the frozen toolchain translating to
an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.

Fixes #5222.

Tests: reloc/build_reloc.sh.
(cherry picked from commit cd075e9132)
2019-11-18 14:58:24 +02:00
Kamil Braun
769b9bbe59 view: fix bug in virtual columns.
When creating a virtual column of non-frozen map type,
the wrong type was used for the map's keys.

Fixes #5165.

(cherry picked from commit ef9d5750c8)
2019-11-18 14:55:17 +02:00
Benny Halevy
d4e553c153 sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message
Fixes #4861.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>
(cherry picked from commit 20083be9f6)
2019-11-17 17:57:00 +02:00
Avi Kivity
d983411488 build: adjust libthread_db file name to match gdb expectations
gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather
than the file name of libthread_db-1.0.so, so use that name to store the file in the
archive.

Fixes #4996.

(cherry picked from commit d77171e10e)
2019-11-17 17:57:00 +02:00
Avi Kivity
27de1bb8e6 reconcilable_result: use chunked_vector to hold partitions
Usually, a reconcilable_result holds very few partitions (1 is common),
since the page size is limited by 1MB. But if we have paging disabled or
if we are reconciling a range full of tombstones, we may see many more.
This can cause large allocations.

Change to chunked_vector to prevent those large allocations, as they
can be quite expensive.

Fixes #4780.

(cherry picked from commit 093d2cd7e5)
2019-11-17 17:57:00 +02:00
Avi Kivity
854f8ccb40 utils::chunked_vector: add rbegin() and related iterators
Needed as an std::vector replacement.

(cherry picked from commit eaa9a5b0d7)

Prerequisite for #4780.
2019-11-17 17:57:00 +02:00
Avi Kivity
a68170c9a3 utils: chunked_vector: make begin()/end() const correct
begin() of a const vector should return a const_iterator, to avoid
giving the caller the ability to mutate it.

This slipped through since iterator's constructor does a const_cast.

Noticed by code inspection.

(cherry picked from commit df6faae980)

Prerequisite for #4780.
2019-11-17 17:57:00 +02:00
Glauber Costa
7e4bcf2c0f do not crash in user-defined operations if the controller is disabled
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.

The reason for the crash is that manual operations will invoke
_backlog_of_shares, which returns what is the backlog needed to
create a certain number of shares. That scan the existing control
points, but when we run without the controller there are no control
points and we crash.

Backlog doesn't matter if the controller is disabled, and the return
value of this function will be immaterial in this case. So to avoid the
crash, we return something right away if the controller is disabled.

Fixes #5016

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit c9f2d1d105)
2019-11-17 12:33:23 +02:00
Avi Kivity
a74b3a182e Merge "Add proper aggregation for paged indexing" from Piotr
"
Fixes #4540

This series adds proper handling of aggregation for paged indexed queries.
Before this series returned results were presented to the user in per-page
partial manner, while they should have been returned as a single aggregated
value.

Tests: unit(dev)
"

* 'add_proper_aggregation_for_paged_indexing_for_3.1' of https://github.com/psarna/scylla:
  test: add 'eventually' block to index paging test
  tests: add indexing+paging test case for clustering keys
  tests: add indexing + paging + aggregation test case
  tests: add query_options to cquery_nofail
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  cql3: add proper aggregation to paged indexing
  cql3: add a query options constructor with explicit page size
  cql3: enable explicit copying of query_options
  cql3: split execute_base_query implementation
2019-11-17 12:23:30 +02:00
Piotr Sarna
e9bc579565 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index can still
be not up to date while checking its conditions.

Fixes #4670

(cherry picked from commit ebbe038d19)
2019-11-15 09:12:40 +01:00
Piotr Sarna
ad46bf06a7 tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-11-14 10:26:49 +01:00
Piotr Sarna
1ff21a28b7 tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now ensures that there would be no regressions.

Refs #4540
2019-11-14 10:26:49 +01:00
Piotr Sarna
fb3dfaa736 tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-11-14 10:26:49 +01:00
Piotr Sarna
5a02e6976f cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-11-14 10:26:49 +01:00
Piotr Sarna
5202eea7a7 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.

Fixes #4540
2019-11-14 10:26:48 +01:00
Gleb Natapov
038733f1a5 storage_proxy: do not release mutation if not all replies were received
MV backpressure code frees mutation for delayed client replies earlier
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this is not the case. Fix the code to free only when all replies
are received for real.

Fixes #5242

Message-Id: <20191113142117.GA14484@scylladb.com>
(cherry picked from commit 552c56633e)
2019-11-14 11:04:27 +02:00
Piotr Sarna
0ed2e90925 cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-11-14 09:58:35 +01:00
Piotr Sarna
9ee6d2bc15 cql3: enable explicit copying of query_options 2019-11-14 09:58:28 +01:00
Piotr Sarna
23582a2ce9 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-11-14 09:58:05 +01:00
Takuya ASADA
5ddf0ec1df dist/common/scripts/scylla_setup: don't proceed with empty NIC name
Currently NIC selection prompt on scylla_setup just proceed setup when
user just pressed Enter key on the prompt.
The prompt should ask NIC name again until user input correct NIC name.

Fixes #4517
Message-Id: <20190617124925.11559-1-syuu@scylladb.com>

(cherry picked from commit 7320c966bc)
2019-11-13 17:27:21 +02:00
Avi Kivity
e6eb54af90 Update seastar submodule
* seastar 75488f6ef2...cfc082207c (2):
  > core: fix a race in execution stages
  > execution_stage: prevent unbounded growth

Fixes #4749.
Fixes #4856.
2019-11-13 13:14:27 +02:00
Piotr Sarna
f5a869966a view: fix view_info select statement for local indexes
Calculating the select statement for given view_info structure
used to work fine, but once local indexes were introduced, a subtle
bug appeared: the legacy token column does not exist in local indexes
and a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.

cherry picked from commit 9e98b51aaa

Fixes #5241
Message-Id: <cb2e863e8e993e00ec7329505f737a9ce4b752ae.1572432826.git.sarna@scylladb.com>
2019-11-01 08:06:30 +02:00
Piotr Sarna
0c70cd626b index: add is_global_index() utility
The helper function is useful for determining if given schema
represents a global index.

cherry picked from commit 2ee8c6f595
Message-Id: <db5c9383e426fb2e55e5dbeebc7b8127afc91158.1572432826.git.sarna@scylladb.com>
2019-11-01 08:06:25 +02:00
Botond Dénes
0928aa4791 repair: repair_cf_range(): extract result of local checksum calculation only once
The loop that collects the result of the checksum calculations and logs
any errors. The error logging includes `checksums[0]` which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However this is
only true when the checksum calculation is successful and is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of failed checksum calculation,
the result of `checksum[0]` is extracted only once.

Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
(cherry picked from commit e48f301e95)
2019-10-29 20:43:30 +02:00
Yaron Kaikov
f32ec885c4 release: prepare for 3.1.1 2019-10-24 21:55:50 +03:00
Tomasz Grabiec
762eec2bc6 Merge "Fix TTL serialization breakage" from Avi
ommit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.

This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.

Fixes #4855

Tests: unit (dev)
(cherry picked from commit e621db591e)
2019-10-24 08:55:34 +03:00
Avi Kivity
3f4d9f210f Merge "Fix handling of schema alters and eviction in cache" from Tomasz
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()

(cherry picked from commit 8ed6f94a16)
2019-10-18 13:59:40 +02:00
Yaron Kaikov
9c3cdded9e release: prepare for 3.1.0 2019-10-12 08:45:49 +03:00
yaronkaikov
05272c53ed release: prepare for 3.1.0.rc9 2019-10-06 10:51:37 +03:00
Botond Dénes
393b2abdc9 querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert, however the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
(cherry picked from commit 00b432b61d)
2019-10-05 12:36:21 +03:00
Avi Kivity
d9dc8f92cc Merge " hinted handoff: fix races during shutdown and draining" from Vlad
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).

Issues that are fixed are #4685 and #4836.

Tested as follows:
 1) Patched the code in order to trigger the race with (a lot) higher
    probability and running slightly modified hinted handoff replace
    dtest with a debug binary for 100 times. Side effect of this
    testing was discovering of #4836.
 2) Using the same patch as above tested that there are no crashes and
    nodes survive stop/start sequences (they were not without this series)
    in the context of all hinted handoff dtests. Ran the whole set of
    tests with dev binary for 10 times.
"

Fixes #4685
Fixes #4836

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
  hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
  hinted handoff: make taking file_update_mutex safe
  db::hints::manager::drain_for(): fix alignment
  db::hints::manager: serialize calls to drain_for()
  db::hints: cosmetics: identation and missing method qualifier

(cherry picked from commit 3cb081eb84)
2019-10-05 09:50:05 +03:00
Gleb Natapov
c009f7b182 messaging_service: enable reuseaddr on messaging service rpc
Fixes #4943

Message-Id: <20190918152405.GV21540@scylladb.com>
(cherry picked from commit 73e3d0a283)
2019-10-03 14:42:38 +03:00
Avi Kivity
303a56f2bd Update seastar submodule
* seastar 7dfcf334c4...75488f6ef2 (2):
  > net: socket::{set,get}_reuseaddr() should not be virtual
  > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb

Prerequisite for #4943.
2019-10-03 14:41:34 +03:00
Tomasz Grabiec
57512d3df9 db: read: Filter-out sstables using its first and last keys
Affects single-partition reads only.

Refs #5113

When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.

For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.

The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.

At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.

Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).

This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.

It's also faster to drop sstables based on the keys than the bloom
filter.

Tests:
  - unit (dev)
  - manual using cqlsh

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
(cherry picked from commit b0e0f29b06)
2019-09-29 10:57:58 +03:00
Tomasz Grabiec
a894868298 sstables: Fix partition key count estimation for a range
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.

The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.

A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.

Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:

   (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
     / _components->summary.header.sampling_level

Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.

Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.

The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.

Fixes #5112.
Refs #4994.

The output of test_key_count_estimation:

Before:

count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1

After:

count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
(cherry picked from commit b93cc21a94)
2019-09-28 22:12:04 +03:00
Raphael S. Carvalho
a5d385d702 sstables/compaction_manager: Don't perform upgrade on shared SSTables
compaction_manager::perform_sstable_upgrade() fails when it feeds
compaction mechanism with shared sstables. Shared sstables should
be ignored when performing upgrade and so wait for reshard to pick
them up in parallel. Whenever a shared sstable is brought up either
on restart or via refresh, reshard procedure kicks in.
Reshard picks the highest supported format so the upgrade for
shared sstable will naturally take place.

Fixes #5056.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
(cherry picked from commit 571fa94eb5)
2019-09-28 17:39:09 +03:00
Avi Kivity
6413063b1b Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099)" from Tomasz
"
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
"

* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
  mvcc: Fix incorrect schema verison being used to copy the mutation when applying
  mutation_partition: Track and validate schema version in debug builds
  tests: Use the correct schema to access mutation_partition

(cherry picked from commit 83bc59a89f)
2019-09-28 17:38:04 +03:00
Tomasz Grabiec
0d31c6da62 Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Lookup the view_update_write_response_handler id before calling  timeout_cb and tolerate it not found.
   Just log a warning if this happened.

Fixes #5032

(cherry picked from commit 06b9818e98)
2019-09-28 17:37:40 +03:00
Tomasz Grabiec
b62bb036ed Merge "toppartitions: don't transport schema_ptr across shards" from Avi
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copies
or destroyed, they can cause leaks or premature frees of the schema
in its original shard since reference count operations in are not atomic.

Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.

Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)

Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test

(cherry picked from commit 5b0e48f25b)
2019-09-28 17:35:19 +03:00
Glauber Costa
bdabd2e7a4 toppartitions: fix typo
toppartitons -> toppartitions

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190627160937.7842-1-glauber@scylladb.com>
(cherry picked from commit d916601ea4)

Ref #5104 (prerequisite for patch)
2019-09-28 17:34:24 +03:00
Benny Halevy
d7fc7bcf9f commitlog: descriptor: skip leading path from filename
std::regex_match of the leading path may run out of stack
with long paths in debug build.

Using rfind instead to lookup the last '/' in in pathname
and skip it if found.

Fixes #4464

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>
(cherry picked from commit d9136f96f3)
2019-09-23 11:29:26 +03:00
Hagit Segev
21aec9c7ef release: prepare for 3.1.0.rc8 2019-09-23 07:01:02 +03:00
Asias He
02ce19e851 storage_service: Replicate and advertise tokens early in the boot up process
When a node is restarted, there is a race between gossip starts (other
nodes will mark this node up again and send requests) and the tokens are
replicated to other shards. Here is an example:

- n1, n2
- n2 is down, n1 think n2 is down
- n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends
  reads/writes to n2, but n2 hasn't replicated the token_metadata to all
  the shards.
- n2 complains:
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error
  (sorted_tokens is empty in first_token_index!)

The code path looks like below:

0 stoarge_service::init_server
1    prepare_to_join()
2          add gossip application state of NET_VERSION, SCHEMA and so on.
3         _gossiper.start_gossiping().get()
4    join_token_ring()
5           _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
6           replicate_to_all_cores().get()
7           storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS

The race talked above is at line 3 and line 6.

To fix, we can replicate the token_metadata early after it is filled
with the tokens read from system table before gossip starts. So that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.

In addition, this patch also fixes the issue that other nodes might see
a node miss the TOKENS and STATUS application state in gossip if that
node failed in the middle of a restarting process, i.e., it is killed
after line 3 and before line 7. As a result we could not replace the
node.

Tests: update_cluster_layout_tests.py
Fixes: #4709
Fixes: #4723
(cherry picked from commit 3b39a59135)
2019-09-22 12:45:22 +03:00
Eliran Sinvani
37c4be5e74 Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This with
the combination of a large amount of token ranges can lead to
an infinite recursion. The algorithm multiplies by factor of
2 (actualy a shift left by one)  the amount of requested
tokens in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call
thus, through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).

This patch prevent the problem from happening, it limits the growth of
the concurrency counter beyond twice the last amount of tokens returned
by the query_ranges_to_vnodes_generator.And also makes sure it is not
get stuck at zero.

Testing: * Unit test in dev mode.
         * Modified add 50 dtest that reproduce the problem

Fixes #4944

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
(cherry picked from commit 280715ad45)
2019-09-22 11:59:19 +03:00
Avi Kivity
d81ac93728 Update seastar submodule
* seastar b314eb21b1...7dfcf334c4 (1):
  > iotune: fix exception handling in case test file creation fails

Fixes #5001.
2019-09-18 18:36:13 +03:00
Tomasz Grabiec
024d1563ad Revert "Simplify db::cql_type_parser::parse"
This reverts commit 7f64a6ec4b.

Fixes #5011

The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.

The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.

(cherry picked from commit 8517eecc28)
2019-09-12 20:17:39 +03:00
yaronkaikov
4a1a281e84 release: prepare for3.1.0.rc7 2019-09-11 15:15:38 +03:00
Piotr Sarna
d61dd1a933 main: make sure view_builder doesn't propagate semaphore errors
Stopping services which occurs in a destructor of deferred_action
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.

Fixes #4875
Fixes #4995.

(cherry picked from commit 23c891923e)
2019-09-10 16:34:46 +03:00
Gleb Natapov
447c1e3bcc messaging_service: configure different streaming domain for each rpc server
A streaming domain identifies a server across shards. Each server should
have different one.

Fixes: #4953

Message-Id: <20190908085327.GR21540@scylladb.com>
(cherry picked from commit 9e9f64d90e)
2019-09-09 20:36:11 +03:00
Botond Dénes
834b92b3d7 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
(cherry picked from commit 783277fb02)
2019-09-09 14:34:33 +03:00
Hagit Segev
2ec036f50c release: prepare for3.1.0.rc6 2019-09-08 10:32:22 +03:00
Avi Kivity
958fe2024f Update seastar submodule
* seastar c59d019d6b...b314eb21b1 (2):
  > reactor: fix false positives in the stall detector due to large task queue
  > reactor: remove unused _tasks_processed variable

Ref #4955, #4951, #4899, #4898.
2019-09-05 14:39:53 +03:00
Rafael Ávila de Espíndola
cd998b949a sstable: close file_writer if an exception in thrown
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.

Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.

Fixes #4948

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
(cherry picked from commit 000514e7cc)
2019-09-05 10:14:36 +03:00
Avi Kivity
2e1e1392ea storage_proxy: protect _view_update_handlers_list iterators from invalidation
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.

Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).

Fixes #4912.

Tests: unit (dev)
(cherry picked from commit 301246f6c0)
2019-09-05 09:42:00 +03:00
yaronkaikov
623ea5e3d9 release: prepare for3.1.0.rc5 2019-09-02 14:42:47 +03:00
Avi Kivity
f92a7ca2bf tools: toolchain: fix dbuild in interactive mode regression
Before ede1d248af, running "tools/toolchain/dbuild -it -- bash" was
a nice way to play in the toolchain environment, for example to start
a debugger. But that commit caused containers to run in detached mode,
which is incompatible with interactive mode.

To restore the old behavior, detect that the user wants interactive mode,
and run the container in non-detached mode instead. Add the --rm flag
so the container is removed after execution (as it was before ede1d248af).

Fixes #4930.

Message-Id: <20190506175942.27361-1-avi@scylladb.com>

(cherry picked from commit db536776d9)
2019-08-29 18:33:44 +03:00
Tomasz Grabiec
d70c2db09c service: Announce the new schema version when features are enabled
Introduced in c96ee98.

We call update_schema_version() after features are enabled and we
recalculate the schema version. This method is not updating gossip
though. The node will still use it's database::version() to decide on
syncing, so it will not sync and stay inconsistent in gossip until the
next schema change.

We should call updatE_schema_version_and_announce() instead so that
the gossip state is also updated.

There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unbock it.

Fixes #4647.

Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit ac5ff4994a)
2019-08-27 08:35:58 +03:00
Paweł Dziepak
e4a39ed319 mutation_partition: verify row::append_cell() precondition
row::append_cell() has a precondition that the new cell column id needs
to be larger than that of any other already existing cell. If this
precondition is violated the row will end up in an invalid state. This
patch adds assertion to make sure we fail early in such cases.

(cherry picked from commit 060e3f8ac2)
2019-08-23 15:05:59 +02:00
Hagit Segev
bb70b9ed56 release: prepare for 3.1.0.rc4 2019-08-22 21:12:42 +03:00
Avi Kivity
e06e795031 Merge "database: assign proper io priority for streaming view updates" from Piotr
"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

Fixes #4615.

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates

(cherry picked from commit 2c7435418a)
2019-08-22 16:20:19 +03:00
Piotr Sarna
7d56e8e5bb storage_proxy: fix iterator liveness issue in on_down (#4876)
The loop over view update handlers used a guard in order to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacinga 'for' loop with a 'while' loop, which moves
the iterator incrementation inside the scope in which it's still
guarded and valid.

Fixes #4866

(cherry picked from commit 526f4c42aa)
2019-08-21 19:04:56 +03:00
Avi Kivity
417250607b relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.

(cherry-picked from commit 698b72b501)

Backport notes:
 - 3.1 doesn't call install.sh from the debian packager, so add an adjust_bin
   and call it from the debian rules file directly
 - adjusted install.sh for 3.1 prefix (/usr) compared to master prefix (/opt/scylladb)
2019-08-20 17:08:49 +03:00
Pekka Enberg
d06bcef3b7 Merge "docker: relax permission checks" from Avi
"Commit e3f7fe4 added file owner validation to prevent Scylla from
 crashing when it tries to touch a file it doesn't own. However, under
 docker, we cannot expect to pass this check since user IDs are from
 different namespaces: the process runs in a container namespace, but the
 data files usually come from a mounted volume, and so their uids are
 from the host namespace.

 So we need to relax the check. We do this by reverting b1226fb, which
 causes Scylla to run as euid 0 in docker, and by special-casing euid 0
 in the ownership verification step.

 Fixes #4823."

* 'docker-euid-0' of git://github.com/avikivity/scylla:
  main: relax file ownership checks if running under euid 0
  Revert "dist/docker/redhat: change user of scylla services to 'scylla'"

(cherry picked from commit 595434a554)
2019-08-14 08:31:10 +03:00
Tomasz Grabiec
50c5cb6861 Merge "Multishard combining reader more robust reader recreation" from Botond
Make the reader recreation logic more robust, by moving away from
deciding which fragments have to be dropped based on a bunch of
special cases, instead replacing this with a general logic which just
drops all already seen fragments (based on their position).  Special
handling is added for the case when the last position is a range
tombstone with a non full prefix starting position.  Reproducer unit
tests are added for both cases.

Refs #4695
Fixes #4733

(cherry picked from commit 0cf4fab2ca)
2019-08-14 08:30:53 +03:00
Kamil Braun
70f5154109 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exists.  The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code until after the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit f155a2d334)
2019-08-13 20:11:26 +03:00
Rafael Ávila de Espíndola
329c419c30 Always close commitlog files
We were using segment::_closed to decide whether _file was already
closed. Unfortunately they are not exactly the same thing. As far as
I understand it, segments can be closed and reused without actually
closing the file.

Found with a seastar patch that asserts on destroying an open
append_challenged_posix_file_impl.

Fixes #4745.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190721171332.7995-1-espindola@scylladb.com>
(cherry picked from commit 636e2470b1)
2019-08-13 19:59:13 +03:00
Avi Kivity
062d43c76e Merge "Unbreak the Unbreakable Linux" from Glauber
"
scylla_setup is currently broken for OEL. This happens because the
OS detection code checks for RHEL and Fedora. CentOS returns itself
as RHEL, but OEL does not.
"

Fixes #4842.

* 'unbreakable' of github.com:glommer/scylla:
  scylla_setup: be nicer about unrecognized OS
  scylla_util: recognize OEL as part of the RHEL family

(cherry picked from commit 1cf72b39a5)
2019-08-13 16:52:05 +03:00
Avi Kivity
cf4c238b28 Merge "Catch unclosed partition sstable write #4794" from Tomasz
"
Not emitting partition_end for a partition is incorrect. SStable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which will result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.

Enabled for both mc and la/ka.

Passing --abort-on-internal-error on the command line will switch to
aborting instead of throwing an exception.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
"

* 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla:
  sstables: writer: Validate that partition is closed when the input mutation stream ends
  config, exceptions: Add helper for handling internal errors
  utils: config_file: Introduce named_value::observe()

(cherry picked from commit 95c0804731)
2019-08-08 13:13:42 +02:00
Amnon Heiman
20090c1992 init: do not allow replace-address for seeds
If a node is a seed node, it can not be started with
replace-address-first-boot or the replace-address flag.

The issue is that as a seed node it will generate new tokens instead of
replacing the existing one the user expect it to replaec when supplying
the flags.

This patch will throw a bad_configuration_error exception
in this case.

Fixes #3889

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 399d79fc6f)
2019-08-07 22:04:58 +03:00
Raphael S. Carvalho
8ffb567474 table: do not rely on undefined behavior in cleanup_sstables
It shouldn't rely on argument evaluation order, which is ub.

Fixes #4718.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 0e732ed1cf)
2019-08-07 21:48:44 +03:00
Tomasz Grabiec
710ec83d12 Merge "Fix the system.size_estimates table" from Kamil
Fixes a segfault when querying for an empty keyspace.

Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combinind_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.

Fixes #4689.

(cherry picked from commit 14700c2ac4)
2019-08-07 21:38:38 +03:00
Asias He
8d7c489436 streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build because currently messaging_service.cc
and messaging_service.hh does not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
the "streaming: Send error code from the sender to receiver" to 3.0
branch.

Refs: #4789
(cherry picked from commit 49a73aa2fc)
2019-08-07 19:11:51 +02:00
Asias He
6ec558e3a0 streaming: Send error code from the sender to receiver
In case of error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violates the mutation_fragment
stream rule.

To fix, we need to propagate the error. However RPC streaming does not
support propagate the error in the framework. User has to send an error
code explicitly.

Fixes: #4789
(cherry picked from commit bac987e32a)
(cherry picked from commit 288371ce75)
2019-08-07 19:11:33 +02:00
Tomasz Grabiec
b1e2842c8c sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
Currently, if there is a fragment in _ready and _out_of_range was set
after row end was consumer, push_ready_fragments() would return
without emitting partition_end.

This is problematic once we make consume_row_start() emit
partiton_start directly, because we will want to assume that all
fragments for the previous partition are emitted by then. If they're
not, then we'd emit partition_start before partition_end for the
previous partition. The fix is to make sure that
push_ready_fragments() emits everything.

Fixes #4786

(cherry picked from commit 9b8ac5ecbc)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-08-01 13:06:08 +03:00
Avi Kivity
5a273737e3 Update seastar submodule
* seastar 82637dcab4...c59d019d6b (1):
  > reactor: fix deadlock of stall detector vs dlopen

Fixes #4759.
2019-07-31 18:31:40 +03:00
Avi Kivity
b0d2312623 toppartitions: fix race between listener removal and reads
Data listener reads are implemented as flat_mutation_readers, which
take a reference to the listener and then execute asynchronously.
The listener can be removed between the time when the reference is
taken and actual execution, resulting in a dangling pointer
dereference.

Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes
don't need protection because they execute atomically.

Fixes #4661.

Tests: unit (dev)
(cherry picked from commit e03c7003f1)
2019-07-28 13:53:40 +03:00
Avi Kivity
2f007d8e6b sstable: index_reader: close index_reader::reader more robustly
If we had an error while reading, then we would have failed to close
the reader, which in turn can cause memory corruption. Make the
closing more robust by using then_wrapped (that doesn't skip on
exception) and log the error for analysis.

Fixes #4761.

(cherry picked from commit b272db368f)
2019-07-27 18:19:57 +03:00
yaronkaikov
bebfd7b26c release: prepare for 3.1.0.rc3 2019-07-25 12:15:55 +03:00
Tomasz Grabiec
03b48b2caf database: Add missing partition slicing on streaming reader recreation
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.

That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of
partition we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of ordre, violating
assumptions of the consumers.

This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.

Fixes #4659.

In v2:

  - Added an overload without partition_slice to avoid changing existing users which never slice

Tests:

  - unit (dev)
  - manual (3 node ccm + repair)

Backport: 3.1
Reviewd-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 7604980d63)
2019-07-22 15:08:09 +03:00
Avi Kivity
95362624bc Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny
"
disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize
with background deletions done by on_compaction_completion to ensure no sstables will
be created or deleted during reshuffle_sstables after
storage_service::load_new_sstables disables sstable writes.

Fixes #4622

Test: unit(dev), nodetool_additional_test.py migration_test.py
"

* 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla:
  table: document _sstables_lock/_sstable_deletion_sem locking order
  table: disable_sstable_write: acquire _sstable_deletion_sem
  table: uninline enable_sstable_write
  table: reshuffle_sstables: add log message

(cherry picked from commit 43690ecbdf)
2019-07-22 13:47:25 +03:00
Asias He
7865c314a5 repair: Avoid deadlock in remove_repair_meta
Start n1, n2
Create ks with rf = 2
Run repair on n2
Stop n2 in the middle of repair
n1 will notice n2 is DOWN, gossip handler will remove repair instance
with n2 which calls remove_repair_meta().

Inside remove_repair_meta(), we have:

```
1        return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) {
2            return rm->stop();
3        }).then([repair_metas, from] {
4            rlogger.debug("Removed all repair_meta for single node {}", from);
5        });
```

Since 3.1, we start 16 repair instances in parallel which will create 16
readers.The reader semaphore is 10.

At line 2, it calls

```
6    future<> stop() {
7       auto gate_future = _gate.close();
8       auto writer_future = _repair_writer.wait_for_writer_done();
9       return when_all_succeed(std::move(gate_future), std::move(writer_future));
10    }
```

The gate protects the reader to read data from disk:

```
11 with_gate(_gate, [] {
12   read_rows_from_disk
13        return _repair_reader.read_mutation_fragment() --> calls reader() to read data
14 })
```

So line 7 won't return until all the 16 readers return from the call of
reader().

The problem is, the reader won't release the reader semaphore until the
reader is destroyed!
So, even if 10 out of the 16 readers have finished reading, they won't
release the semaphore. As a result, the stop() hangs forever.

To fix in short term, we can delete the reader, aka, drop the the
repair_meta object once it is stopped.

Refs: #4693
(cherry picked from commit 8774adb9d0)
2019-07-21 13:31:08 +03:00
Asias He
0e6b62244c streaming: Do not open rpc stream connection if ranges are not relevant to a shard
Given a list of ranges to stream, stream_transfer_task will create an
reader with the ranges and create a rpc stream connection on all the shards.

When user provides ranges to repair with -st -et options, e.g.,
using scylla-manger, such ranges can belong to only one shard, repair
will pass such ranges to streaming.

As a result, only one shard will have data to send while the rpc stream
connections are created on all the shards, which can cause the kernel
run out of ports in some systems.

To mitigate the problem, do not open the connection if the ranges do not
belong to the shard at all.

Refs: #4708
(cherry picked from commit 64a4c0ede2)
2019-07-21 10:23:49 +03:00
Kamil Braun
9d722a56b3 Fix timestamp_type_impl::timestamp_from_string.
Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00.
Fixes #4641.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit 4417e78125)
2019-07-17 21:54:47 +03:00
Eliran Sinvani
7009d5fb23 auth: Prevent race between role_manager and pasword_authenticator
When scylla is started for the first time with PasswordAuthenticator
enabled, it can be that a record of the default superuser
will be created in the table with the can_login and is_superuser
set to null. It happens because the module in charge of creating
the row is the role manger and the module in charge of setting the
default password salted hash value is the password authenticator.
Those two modules are started together, it the case when the
password authenticator finish the initialization first, in the
period until the role manager completes it initialization, the row
contains those null columns and any loging attempt in this period
will cause a memory access violation since those columns are not
expected to ever be null. This patch removes the race by starting
the password authenticator and autorizer only after the role manger
finished its initialization.

Tests:
  1. Unit tests (release)
  2. Auth and cqlsh auth related dtests.

Fixes #4226

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>
(cherry picked from commit 997a146c7f)
2019-07-15 21:18:05 +03:00
Takuya ASADA
eb49fae020 reloc: provide libthread_db.so.1 to debug thread on gdb
In scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug
but we actually does not have libthread_db.so.1 in /opt/scylladb/libreloc
since it's not available on ldd result with scylla binary.

To debug thread, we need to add the library in a relocatable package manually.

Fixes #4673

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190711111058.7454-1-syuu@scylladb.com>
(cherry picked from commit 842f75d066)
2019-07-15 14:50:56 +03:00
Asias He
92bf928170 repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling down node during repair has been
changed.  That is, if a repair follower is down, it will fail to sync
schema with it and the repair of the range will be skipped. This means
a range can not be repaired unless all the nodes for the replicas are up.

To fix, we filter out the nodes that is down and mark the repair is
partial and repair with the nodes that are still up.

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
(cherry picked from commit 39ca044dab)
2019-07-11 11:44:49 +03:00
Rafael Ávila de Espíndola
deac0b0e94 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>

(cherry picked from commit 281f3a69f8)
2019-07-11 11:42:48 +03:00
kbr-
c294000113 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit 8995945052)
2019-07-08 11:08:10 +03:00
Avi Kivity
18bb2045aa Update seastar submodule
* seastar 4cdccae53b...82637dcab4 (1):
  > perftune.py: fix the i3 metal detection pattern

Ref #4057.
2019-07-02 13:49:21 +03:00
Avi Kivity
5e3276d08f Update seastar submodule to point to scylla-seastar.git
This allows us to add 3.1 specific patches to Seastar.
2019-07-02 13:47:50 +03:00
Piotr Sarna
acff367ea8 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589

(cherry picked from commit efa7951ea5)
2019-06-26 10:45:50 +03:00
Tomasz Grabiec
e39724a343 Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair

(cherry picked from commit 269e65a8db)
2019-06-26 09:35:42 +02:00
Asias He
31c4db83d8 repair: Avoid searching all the rows in to_repair_rows_on_wire
The repair_rows in row_list are sorted. It is only possible for the
current repair_row to share the same partition key with the last
repair_row inserted into repair_row_on_wire. So, no need to search from
the beginning of the repair_rows_on_wire to avoid quadratic complexity.
To fix, look at the last item in repair_rows_on_wire.

Fixes #4580
Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>

(cherry picked from commit b99c75429a)
2019-06-25 12:48:37 +02:00
Tomasz Grabiec
433cb93f7a Merge "Use same schema version for repair nodes" from Asias
This patch set fixes repair nodes using different schema version and
optimizes the hashing thanks to the fact now all nodes uses same schema
version.

Fixes: #4549

* seastar-dev.git asias/repair_use_same_schema.v3:
  repair: Use the same schema version for repair master and followers
  repair: Hash column kind and id instead of column name and type name

(cherry picked from commit cd1ff1fe02)
2019-06-23 20:57:16 +03:00
Avi Kivity
f553819919 Merge "Fix infinite paging for indexed queries" from Piotr
"
Fixes #4569

This series fixes the infinite paging for indexed queries issue.
Before this fix, paging indexes tended to end up in an infinite loop
of returning pages with 0 results, but has_more_pages flag set to true,
which confused the drivers.

Tests: unit(dev)
Branches: 3.0, 3.1
"

* 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla:
  tests: add test case for finishing index paging
  cql3: fix infinite paging for indexed queries

(cherry picked from commit 9229afe64f)
2019-06-23 20:54:27 +03:00
Nadav Har'El
48c34e7635 storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90448,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not do such invasive changes and just fix bug that
we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>
(cherry picked from commit 6e87bca65d)
2019-06-23 20:53:01 +03:00
Nadav Har'El
7f85b30941 Fix deciding whether a query uses indexing
The code that decides whether a query should used indexing was buggy - a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing it is not necessary).

Fixes #4539

Closes https://github.com/scylladb/scylla/pull/4544

Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla
  tests: add case for partition key index and filtering
  cql3: fix deciding if a query uses indexing

(cherry picked from commit 6aab1a61be)
2019-06-18 13:25:18 +03:00
Hagit Segev
7d14514b8a release: prepare for 3.1.0.rc2 2019-06-16 20:28:31 +03:00
Piotr Sarna
35f906f06f tests: add a test case for filtering clustering key
The test cases makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix,
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>

(cherry picked from commit 2c2122e057)
2019-06-16 14:36:52 +03:00
Piotr Sarna
2c50a484f5 cql3: fix qualifying clustering key restrictions for filtering
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>

(cherry picked from commit c4b935780b)
2019-06-16 14:36:52 +03:00
Piotr Sarna
24ddb46707 cql3: fix fetching clustering key columns for filtering
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes #4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>

(cherry picked from commit adeea0a022)
2019-06-16 14:36:52 +03:00
Dejan Mircevski
f2fc3f32af tests: Add cquery_nofail() utility
Most tests await the result of cql_test_env::execute_cql().  Most
would also benefit from reporting errors with top-level location
included.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit a9849ecba7)
2019-06-16 14:36:52 +03:00
Asias He
c9f488ddc2 repair: Avoid writing row with same partition key and clustering key more than once
Consider

   master: row(pk=1, ck=1, col=10)
follower1: row(pk=1, ck=1, col=20)
follower2: row(pk=1, ck=1, col=30)

When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1,
ck=1, col=30) from follower1 and follower2.

Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1,
col=30) to follower1, follower1 will write the row with the same
pk=1, ck=1 twice, which violates uniqueness constraints.

To fix, we apply the row with same pk and ck into the previous row.
We only needs this on repair follower because the rows can come from
multiple nodes. While on repair master, we have a sstable writer per
follower, so the rows feed into sstable writer can come from only a
single node.

Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test
Fixes: #4510
Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>
(cherry picked from commit 9079790f85)
2019-06-16 10:23:58 +03:00
Asias He
46498e77b8 repair: Allow repair_row to initialize partially
On repair follower node, only decorated_key_with_hash and the
mutation_fragment inside repair_row are used in apply_rows() to apply
the rows to disk. Allow repair_row to initialize partially and throw if
the uninitialized member is accessed to be safe.
Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>

(cherry picked from commit 912ce53fc5)
2019-06-16 10:23:50 +03:00
Piotr Jastrzebski
440f33709e sstables: distinguish empty and missing cellpath
Before this patch mc sstables writer was ignoring
empty cellpaths. This is a wrong behaviour because
it is possible to have empty key in a map. In such case,
our writer creats a wrong sstable that we can't read back.
This is becaus a complex cell expects cellpath for each
simple cell it has. When writer ignores empty cellpath
it writes nothing and instead it should write a length
of zero to the file so that we know there's an empty cellpath.

Fixes #4533

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>
(cherry picked from commit a41c9763a9)
2019-06-16 09:04:24 +03:00
Pekka Enberg
34696e1582 dist/docker: Switch to 3.1 release repository 2019-06-14 08:10:02 +03:00
Takuya ASADA
43bb290705 dist/docker/redhat: change user of scylla services to 'scylla'
On branch-3.1 / master, we are getting following error:

ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)

It seems like owner verification of data directory fails because
scylla-server process is running in root but data directory owned by
scylla, so we should run services as scylla user.

Fixes #4536
Message-Id: <20190611113142.23599-1-syuu@scylladb.com>

(cherry picked from commit b1226fb15a)
2019-06-14 08:02:45 +03:00
Calle Wilund
53980816de api.hh: Fix bool parsing in req_param
Fixes #4525

req_param uses boost::lexical cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.

Message-Id: <20190610140511.15478-1-calle@scylladb.com>
(cherry picked from commit 26702612f3)
2019-06-13 11:56:27 +03:00
Vlad Zolotarov
c1f4617530 fix_system_distributed_tables.py: declare the 'port' argument as 'int'
If a port value passed as a string this makes the cluster.connect() to
fail with Python3.4.

Let's fix this by explicitly declaring a 'port' argument as 'int'.

Fixes #4527

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190606133321.28225-1-vladz@scylladb.com>
(cherry picked from commit 20a610f6bc)
2019-06-13 11:45:54 +03:00
Raphael S. Carvalho
efde9416ed sstables: fix log of failure on large data entry deletion by fixing use-after-move
Fixes #4532.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>
(cherry picked from commit 62aa0ea3fa)
2019-06-13 11:44:22 +03:00
Hagit Segev
224f9cee7e release: prepare for 3.1.0.rc1 2019-06-06 18:16:06 +03:00
Hagit Segev
cd1d13f805 release: prepare for 3.1.rc1 2019-06-06 15:32:54 +03:00
Pekka Enberg
899291bc9b relocate_python_scripts.py: Fix node-exporter install on Debian variants
The relocatable Python is built from Fedora packages. Unfortunately TLS
certificates are in a different location on Debian variants, which
causes "node_exporter_install" to fail as follows:

  Traceback (most recent call last):
    File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module>
      data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
    File "/usr/lib/scylla/scylla_util.py", line 40, in curl
      with urllib.request.urlopen(req) as res:
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
      return opener.open(url, data, timeout)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
      response = self._open(req, data)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
      '_open', req)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
      result = func(*args)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open
      context=self._context, check_hostname=self._check_hostname)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
  Unable to retrieve version information
  node exporter setup failed.

Fix the problem by overriding the SSL_CERT_FILE environment variable to
point to the correct location of the TLS bundle.

Message-Id: <20190604175434.24534-1-penberg@scylladb.com>
(cherry picked from commit eb00095bca)
2019-06-05 22:20:06 +03:00
Paweł Dziepak
4130973f51 tests/perf_fast_forward: report average number of aio operations
perf_fast_forward is used to detect performance regressions. The two
main metrics used for this are fargments per second and the number of
the IO operations. The former is a median of a several runs, but the
latter is just the actual number of asynchronous IO operations performed
in the run that happened to be picked as a median frag/s-wise. There's
no always a direct correlation between frag/s and aio and the latter can
vary which makes the latter hard to compare.

In order to make this easier a new metric was introduced: "average aio"
which reports the average number of asynchronous IO operations performed
in a run. This should produce much more stable results and therefore
make the comparison more meaningful.
Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>

(cherry picked from commit 51e98e0e11)
2019-06-05 16:36:09 +03:00
Takuya ASADA
24e2c72888 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants has python3 package on official repository,
so we don't have to use relocatable python3 on these distributions.
However, official python3 version is different on each distribution, we may
have issue because of that.
Also, our scripts and packaging implementation are becoming presuppose
existence of relocatable python3, it is causing issue on Debian
variants.

Switching to relocatable python3 on Debian variants avoid these issues,
it will easier to manage Scylla python3 environments accross multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190531112707.20082-1-syuu@scylladb.com>
(cherry picked from commit 25112408a7)
2019-06-03 17:42:26 +03:00
Raphael S. Carvalho
69cc7d89c8 compaction: do not unconditionally delete a new sstable in interrupted compaction
After incremental compaction, new sstables may have already replaced old
sstables at any point. Meaning that a new sstable is in-use by table and
a old sstable is already deleted when compaction itself is UNFINISHED.
Therefore, we should *NEVER* delete a new sstable unconditionally for an
interrupted compaction, or data loss could happen.
To fix it, we'll only delete new sstables that didn't replace anything
in the table, meaning they are unused.

Found the problem while auditting the code.

Fixes #4479.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>
(cherry picked from commit ef5681486f)
2019-06-03 16:00:20 +03:00
Avi Kivity
5f6c5d566a Revert "dist/debian: support relocatable python3 on Debian variants"
This reverts commit 1fbab82553. Breaks build_deb.sh:

18:39:56 +	seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py
18:39:56 Traceback (most recent call last):
18:39:56   File "./relocate_python_scripts.py", line 116, in <module>
18:39:56     fixup_scripts(archive, args.scripts)
18:39:56   File "./relocate_python_scripts.py", line 104, in fixup_scripts
18:39:56     fixup_script(output, script)
18:39:56   File "./relocate_python_scripts.py", line 79, in fixup_script
18:39:56     orig_stat = os.stat(script)
18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+'
18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1
2019-05-29 14:00:29 +03:00
Takuya ASADA
f32aea3834 reloc/python3: add license files on relocatable python3 package
It's better to have license files on our python3 distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190516094329.13273-1-syuu@scylladb.com>
(cherry picked from commit 4b08a3f906)
2019-05-29 13:59:38 +03:00
Takuya ASADA
933260cb53 dist/ami: output scylla version information to AMI tags and description
Users may want to know which version of packages are used for the AMI,
it's good to have it on AMI tags and description.

To do this, we need to download .rpm from specified .repo, extract
version information from .rpm.

Fixes #4499

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-2-syuu@scylladb.com>
(cherry picked from commit a55330a10b)
2019-05-29 13:59:38 +03:00
Takuya ASADA
f8ff0e1993 dist/ami: build scylla-python3 when specified --localrpm
Since we switched to relocatable python3, we need to build it for AMI too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-1-syuu@scylladb.com>
(cherry picked from commit abe44c28c5)
2019-05-29 13:59:38 +03:00
Takuya ASADA
1fbab82553 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants has python3 package on official repository,
so we don't have to use relocatable python3 on these distributions.
However, official python3 version is different on each distribution, we may
have issue because of that.
Also, our scripts and packaging implementation are becoming presuppose
existence of relocatable python3, it is causing issue on Debian
variants.

Switching to relocatable python3 on Debian variants avoid these issues,
it will easier to manage Scylla python3 environments accross multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190526105138.677-1-syuu@scylladb.com>
(cherry picked from commit 4d119cbd6d)
2019-05-26 15:40:56 +03:00
Paweł Dziepak
c664615960 Merge "Fix empty counters handling in MC" from Piotr
"
Before this patchset empty counters were incorrectly persisted for
MC format. No value was written to disk for them. The correct way
is to still write a header that informs the counter is empty.

We also need to make sure that reading wrongly persisted empty
counters works because customers may have sstables with wrongly
persisted empty counters.

Fixes #4363
"

* 'haaawk/4363/v3' of github.com:scylladb/seastar-dev:
  sstables: add test for empty counters
  docs: add CorrectEmptyCounters to sstable-scylla-format
  sstables: Add a feature for empty counters in Scylla.db.
  sstables: Write header for empty counters
  sstables: Remove unused variables in make_counter_cell
  sstables: Handle empty counter value in read path

(cherry picked from commit 899ebe483a)
2019-05-23 22:15:00 +03:00
Benny Halevy
6a682dc5a2 cql3: select_statement: provide default initializer for parameters::_bypass_cache
Fixes #4503

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>
(cherry picked from commit fae4ca756c)
2019-05-23 08:28:01 +03:00
Gleb Natapov
c1271d08d3 cache_hitrate_calculator: make cache hitrate calculation preemptable
The calculation is done in a non preemptable loop over all tables, so if
numbers of tables is very large it may take a while since we also build
a string for gossiper state. Make the loop preemtable and also make
the string calculation more efficient by preallocating memory for it.
Message-Id: <20190516132748.6469-3-gleb@scylladb.com>

(cherry picked from commit 31bf4cfb5e)
2019-05-17 12:38:34 +02:00
Gleb Natapov
0d5c2501b3 cache_hitrate_calculator: do not copy stats map for each cpu
invoke_on_all() copies provided function for each shard it is executed
on, so by moving stats map into the capture we copy it for each shard
too. Avoid it by putting it into the top level object which is already
captured by reference.
Message-Id: <20190516132748.6469-2-gleb@scylladb.com>

(cherry picked from commit 4517c56a57)
2019-05-17 12:38:30 +02:00
Asias He
0dd84898ee repair: Fix use after free in remove_repair_meta for repair_metas
We should capture repair_metas so that it will not be freed until the
parallel_for_each is finished.

Fixes: #4333
Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>
(cherry picked from commit 51c4f8cc47)
2019-05-16 11:12:09 +03:00
Avi Kivity
d568270d7f Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
Refs #4485.
"

* tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
  db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
  hashing: Introduce type-erased interface for the hasher
  hashing: Introduce C++ concept for the hasher
  hashers: Rename hasher to cryptopp_hasher
  gc_clock: Fix hashing to be backwards-compatible

(cherry picked from commit 82b91c1511)
2019-05-15 09:48:05 +03:00
Takuya ASADA
78c57f18c4 dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE
Since other build_*.sh are for running inside extracted relocatable
package, they have SCYLLA-PRODUCT-FILE on top of the directory,
but build_ami.sh is not running in such condition, we need to run
SCYLLA-VERSION-GEN first, then refer to build/SCYLLA-PRODUCT-FILE.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190509110621.27468-1-syuu@scylladb.com>
(cherry picked from commit 19a973cd05)
2019-05-13 16:45:25 +03:00
Glauber Costa
ce27949797 Support AWS i3en instances
AWS just released their new instances, the i3en instances.  The instance
is verified already to work well with scylla, the only adjustments that
we need is advertise that we support it, and pre-fill the disk
information according to the performance numbers obtained by running the
instance.

Fixes #4486
Branches: 3.1

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
(cherry picked from commit a23531ebd5)
2019-05-13 16:45:25 +03:00
Hagit Segev
6b47e23d29 release: prepare for 3.1.0.rc0 2019-05-13 15:03:34 +03:00
Piotr Sarna
1cb6cc0ac4 Revert "view: cache is_index for view pointer"
This reverts commit dbe8491655.
Caching the value was not done in a correct manner, which resulted
in longevity tests failures.

Fixes #4478

Branches: 3.1

Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>
(cherry picked from commit cf8d2a5141)
2019-05-08 11:14:11 +03:00
Benny Halevy
67435eff15 time_window_backlog_tracker: fix use after free
Fixes #4465

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>
(cherry picked from commit 3a2fa82d6e)
2019-05-06 09:38:08 +03:00
Gleb Natapov
086ce13fb9 batchlog_manager: fix array out of bound access
endpoint_filter() function assumes that each bucket of
std::unordered_multimap contains elements with the same key only, so
its size can be used to know how many elements with a particular key
are there.  But this is not the case, elements with multiple keys may
share a bucket. Fix it by counting keys in other way.

Fixes #3229

Message-Id: <20190501133127.GE21208@scylladb.com>
(cherry picked from commit 95c6d19f6c)
2019-05-03 11:59:09 +03:00
Glauber Costa
eb9a8f4442 scylla_setup: respect user's decision not to call housekeeping
The setup script asks the user whether or not housekeeping should
be called, and in the first time the script is executed this decision
is respected.

However if the script is invoked again, that decision is not respected.

This is because the check has the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
 }
 if (version_check) { do_version_check() } else { dont_do_it() }

When it should have the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
    if (version_check) { do_version_check() } else { dont_do_it() }
 }

(Thanks python)

This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.

Fixes #4462

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
(cherry picked from commit 47d04e49e8)
2019-05-03 09:57:31 +03:00
Glauber Costa
178fb5fe5f make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with newlines in the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.

This patch makes our OS detection robust against that.

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
(cherry picked from commit 99c00547ad)
2019-05-03 09:57:21 +03:00
1150 changed files with 12834 additions and 43786 deletions

4
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,4 @@
Scylla doesn't use pull-requests, please send a patch to the [mailing list](mailto:scylladb-dev@googlegroups.com) instead.
See our [contributing guidelines](../CONTRIBUTING.md) and our [Scylla development guidelines](../HACKING.md) for more information.
If you have any questions please don't hesitate to send a mail to the [dev list](mailto:scylladb-dev@googlegroups.com).

3
.gitignore vendored
View File

@@ -19,6 +19,3 @@ CMakeLists.txt.user
__pycache__CMakeLists.txt.user
.gdbinit
resources
.pytest_cache
/expressions.tokens
tags

3
.gitmodules vendored
View File

@@ -12,6 +12,3 @@
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate
[submodule "zstd"]
path = zstd
url = ../zstd

View File

@@ -1,6 +1,6 @@
# Asking questions or requesting help
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) or the [Slack workspace](http://slack.scylladb.com) for general questions and help.
Use the [ScyllaDB user mailing list](https://groups.google.com/forum/#!forum/scylladb-users) for general questions and help.
# Reporting an issue

View File

@@ -22,15 +22,6 @@ Scylla depends on the system package manager for its development dependencies.
Running `./install-dependencies.sh` (as root) installs the appropriate packages based on your Linux distribution.
On Ubuntu and Debian based Linux distributions, some packages
required to build Scylla are missing in the official upstream:
- libthrift-dev and libthrift
- antlr3-c++-dev
Try running ```sudo ./scripts/scylla_current_repo``` to add Scylla upstream,
and get the missing packages from it.
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
@@ -56,7 +47,7 @@ $ ./configure.py --help
The most important option is:
- `--enable-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
Source files and build targets are tracked manually in `configure.py`, so the script needs to be updated when new files or targets are added or removed.
@@ -64,7 +55,6 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also
```bash
$ ninja-build build/release/tests/schema_change_test
$ ninja-build build/release/service/storage_proxy.o
```
You can also specify a single mode. For example
@@ -204,7 +194,7 @@ cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ sudo ./install-dependencies.sh
$ ./install-dependencies.sh
$ ant jar
```

View File

29
README-DPDK.md Normal file
View File

@@ -0,0 +1,29 @@
Seastar and DPDK
================
Seastar uses the Data Plane Development Kit to drive NIC hardware directly. This
provides an enormous performance boost.
To enable DPDK, specify `--enable-dpdk` to `./configure.py`, and `--dpdk-pmd` as a
run-time parameter. This will use the DPDK package provided as a git submodule with the
seastar sources.
To use your own self-compiled DPDK package, follow this procedure:
1. Setup host to compile DPDK:
- Ubuntu
`sudo apt-get install -y build-essential linux-image-extra-$(uname -r)`
2. Prepare a DPDK SDK:
- Download the latest DPDK release: `wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.8.0.tar.gz`
- Untar it.
- Edit config/common_linuxapp: set CONFIG_RTE_MBUF_REFCNT and CONFIG_RTE_LIBRTE_KNI to 'n'.
- For DPDK 1.7.x: edit config/common_linuxapp:
- Set CONFIG_RTE_LIBRTE_PMD_BOND to 'n'.
- Set CONFIG_RTE_MBUF_SCATTER_GATHER to 'n'.
- Set CONFIG_RTE_LIBRTE_IP_FRAG to 'n'.
- Start the tools/setup.sh script as root.
- Compile a linuxapp target (option 9).
- Install IGB_UIO module (option 11).
- Bind some physical port to IGB_UIO (option 17).
- Configure hugepage mappings (option 14/15).
3. Run a configure.py: `./configure.py --dpdk-target <Path to untared dpdk-1.8.0 above>/x86_64-native-linuxapp-gcc`.

View File

@@ -2,22 +2,21 @@
## Quick-start
To get the build going quickly, Scylla offers a [frozen toolchain](tools/toolchain/README.md)
which would build and run Scylla using a pre-configured Docker image.
Using the frozen toolchain will also isolate all of the installed
dependencies in a Docker container.
Assuming you have met the toolchain prerequisites, which is running
Docker in user mode, building and running is as easy as:
```bash
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!
```
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
**Note**: GCC >= 8.1.1 is required to compile Scylla.
**Note**: GCC >= 8.1.1 is require to compile Scylla.
**Note**: See [frozen toolchain](tools/toolchain/README.md) for a way to build and run
on an older distribution.
## Running Scylla
@@ -38,24 +37,6 @@ Please see [HACKING.md](HACKING.md) for detailed information on building and dev
./build/release/scylla --help
```
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also experimental support for the API of Amazon DynamoDB,
but being experimental it needs to be explicitly enabled to be used. For more
information on how to enable the experimental DynamoDB compatibility in Scylla,
and the current limitations of this feature, see
[Alternator](docs/alternator/alternator.md) and
[Getting started with Alternator](docs/alternator/getting-started.md).
## Documentation
Documentation can be found in [./docs](./docs) and on the
[wiki](https://github.com/scylladb/scylla/wiki). There is currently no clear
definition of what goes where, so when looking for something be sure to check
both.
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Building Fedora RPM
As a pre-requisite, you need to install [Mock](https://fedoraproject.org/wiki/Mock) on your machine:
@@ -99,5 +80,4 @@ docker run -p $(hostname -i):9042:9042 -i -t <image name>
## Contributing to Scylla
[Hacking howto](HACKING.md)
[Guidelines for contributing](CONTRIBUTING.md)

View File

@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=3.2.5
VERSION=3.1.4
if test -f version
then

View File

@@ -1,78 +0,0 @@
Tests for Alternator that should also pass, identically, against DynamoDB.
Tests use the boto3 library for AWS API, and the pytest frameworks
(both are available from Linux distributions, or with "pip install").
To run all tests against the local installation of Alternator on
http://localhost:8000, just run `pytest`.
Some additional pytest options:
* To run all tests in a single file, do `pytest test_table.py`.
* To run a single specific test, do `pytest test_table.py::test_create_table_unsupported_names`.
* Additional useful pytest options, especially useful for debugging tests:
* -v: show the names of each individual test running instead of just dots.
* -s: show the full output of running tests (by default, pytest captures the test's output and only displays it if a test fails)
Add the `--aws` option to test against AWS instead of the local installation.
For example - `pytest --aws test_item.py` or `pytest --aws`.
If you plan to run tests against AWS and not just a local Scylla installation,
the files ~/.aws/credentials should be configured with your AWS key:
```
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
and ~/.aws/config with the default region to use in the test:
```
[default]
region = us-east-1
```
## HTTPS support
In order to run tests with HTTPS, run pytest with `--https` parameter. Note that the Scylla cluster needs to be provided
with alternator\_https\_port configuration option in order to initialize a HTTPS server.
Moreover, running an instance of a HTTPS server requires a certificate. Here's how to easily generate
a key and a self-signed certificate, which is sufficient to run `--https` tests:
```
openssl genrsa 2048 > scylla.key
openssl req -new -x509 -nodes -sha256 -days 365 -key scylla.key -out scylla.crt
```
If this pair is put into `conf/` directory, it will be enough
to allow the alternator HTTPS server to think it's been authorized and properly certified.
Still, boto3 library issues warnings that the certificate used for communication is self-signed,
and thus should not be trusted. For the sake of running local tests this warning is explicitly ignored.
## Authorization
By default, boto3 prepares a properly signed Authorization header with every request.
In order to confirm the authorization, the server recomputes the signature by using
user credentials (user-provided username + a secret key known by the server),
and then checks if it matches the signature from the header.
Early alternator code did not verify signatures at all, which is also allowed by the protocol.
A partial implementation of the authorization verification can be allowed by providing a Scylla
configuration parameter:
```yaml
alternator_enforce_authorization: true
```
The implementation is currently coupled with Scylla's system\_auth.roles table,
which means that an additional step needs to be performed when setting up Scylla
as the test environment. Tests will use the following credentials:
Username: `alternator`
Secret key: `secret_pass`
With CQLSH, it can be achieved by executing this snipped:
```bash
cqlsh -x "INSERT INTO system_auth.roles (role, salted_hash) VALUES ('alternator', 'secret_pass')"
```
Most tests expect the authorization to succeed, so they will pass even with `alternator_enforce_authorization`
turned off. However, test cases from `test_authorization.py` may require this option to be turned on,
so it's advised.

View File

@@ -1,179 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# This file contains "test fixtures", a pytest concept described in
# https://docs.pytest.org/en/latest/fixture.html.
# A "fixture" is some sort of setup which an invididual test requires to run.
# The fixture has setup code and teardown code, and if multiple tests
# require the same fixture, it can be set up only once - while still allowing
# the user to run individual tests and automatically set up the fixtures they need.
import pytest
import boto3
from util import create_test_table
# Test that the Boto libraries are new enough. These tests want to test a
# large variety of DynamoDB API features, and to do this we need a new-enough
# version of the the Boto libraries (boto3 and botocore) so that they can
# access all these API features.
# In particular, the BillingMode feature was added in botocore 1.12.54.
import botocore
import sys
from distutils.version import LooseVersion
if (LooseVersion(botocore.__version__) < LooseVersion('1.12.54')):
pytest.exit("Your Boto library is too old. Please upgrade it,\ne.g. using:\n sudo pip{} install --upgrade boto3".format(sys.version_info[0]))
# By default, tests run against a local Scylla installation on localhost:8080/.
# The "--aws" option can be used to run against Amazon DynamoDB in the us-east-1
# region.
def pytest_addoption(parser):
parser.addoption("--aws", action="store_true",
help="run against AWS instead of a local Scylla installation")
parser.addoption("--https", action="store_true",
help="communicate via HTTPS protocol on port 8043 instead of HTTP when"
" running against a local Scylla installation")
# "dynamodb" fixture: set up client object for communicating with the DynamoDB
# API. Currently this chooses either Amazon's DynamoDB in the default region
# or a local Alternator installation on http://localhost:8080 - depending on the
# existence of the "--aws" option. In the future we should provide options
# for choosing other Amazon regions or local installations.
# We use scope="session" so that all tests will reuse the same client object.
@pytest.fixture(scope="session")
def dynamodb(request):
if request.config.getoption('aws'):
return boto3.resource('dynamodb')
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
local_url = 'https://localhost:8043' if request.config.getoption('https') else 'http://localhost:8000'
# Disable verifying in order to be able to use self-signed TLS certificates
verify = not request.config.getoption('https')
# Silencing the 'Unverified HTTPS request warning'
if request.config.getoption('https'):
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
return boto3.resource('dynamodb', endpoint_url=local_url, verify=verify,
region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass')
# "test_table" fixture: Create and return a temporary table to be used in tests
# that need a table to work on. The table is automatically deleted at the end.
# We use scope="session" so that all tests will reuse the same client object.
# This "test_table" creates a table which has a specific key schema: both a
# partition key and a sort key, and both are strings. Other fixtures (below)
# can be used to create different types of tables.
#
# TODO: Although we are careful about deleting temporary tables when the
# fixture is torn down, in some cases (e.g., interrupted tests) we can be left
# with some tables not deleted, and they will never be deleted. Because all
# our temporary tables have the same test_table_prefix, we can actually find
# and remove these old tables with this prefix. We can have a fixture, which
# test_table will require, which on teardown will delete all remaining tables
# (possibly from an older run). Because the table's name includes the current
# time, we can also remove just tables older than a particular age. Such
# mechanism will allow running tests in parallel, without the risk of deleting
# a parallel run's temporary tables.
@pytest.fixture(scope="session")
def test_table(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
])
yield table
# We get back here when this fixture is torn down. We ask Dynamo to delete
# this table, but not wait for the deletion to complete. The next time
# we create a test_table fixture, we'll choose a different table name
# anyway.
table.delete()
# The following fixtures test_table_* are similar to test_table but create
# tables with different key schemas.
@pytest.fixture(scope="session")
def test_table_s(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_b(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'B' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_sb(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'B' } ])
yield table
table.delete()
@pytest.fixture(scope="session")
def test_table_sn(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'N' } ])
yield table
table.delete()
# "filled_test_table" fixture: Create a temporary table to be used in tests
# that involve reading data - GetItem, Scan, etc. The table is filled with
# 328 items - each consisting of a partition key, clustering key and two
# string attributes. 164 of the items are in a single partition (with the
# partition key 'long') and the 164 other items are each in a separate
# partition. Finally, a 329th item is added with different attributes.
# This table is supposed to be read from, not updated nor overwritten.
# This fixture returns both a table object and the description of all items
# inserted into it.
@pytest.fixture(scope="session")
def filled_test_table(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
])
count = 164
items = [{
'p': str(i),
'c': str(i),
'attribute': "x" * 7,
'another': "y" * 16
} for i in range(count)]
items = items + [{
'p': 'long',
'c': str(i),
'attribute': "x" * (1 + i % 7),
'another': "y" * (1 + i % 16)
} for i in range(count)]
items.append({'p': 'hello', 'c': 'world', 'str': 'and now for something completely different'})
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
yield table, items
table.delete()

View File

@@ -1,74 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for authorization
import pytest
import botocore
from botocore.exceptions import ClientError
import boto3
import requests
# Test that trying to perform an operation signed with a wrong key
# will not succeed
def test_wrong_key_access(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='wrong_id', aws_secret_access_key='').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='whatever', aws_secret_access_key='', verify=verify).describe_endpoints()
# A similar test, but this time the user is expected to exist in the database (for local tests)
def test_wrong_password(request, dynamodb):
print("Please make sure authorization is enforced in your Scylla installation: alternator_enforce_authorization: true")
url = dynamodb.meta.client._endpoint.host
with pytest.raises(ClientError, match='UnrecognizedClientException'):
if url.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, aws_access_key_id='alternator', aws_secret_access_key='wrong_key').describe_endpoints()
else:
verify = not url.startswith('https')
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='wrong_key', verify=verify).describe_endpoints()
# A test ensuring that expired signatures are not accepted
def test_expired_signature(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '20170101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature expired" in response.text
# A test ensuring that signatures that exceed current time too much are not accepted.
# Watch out - this test is valid only for around next 1000 years, it needs to be updated later.
def test_signature_too_futuristic(dynamodb, test_table):
url = dynamodb.meta.client._endpoint.host
print(url)
headers = {'Content-Type': 'application/x-amz-json-1.0',
'X-Amz-Date': '30200101T010101Z',
'X-Amz-Target': 'DynamoDB_20120810.DescribeEndpoints',
'Authorization': 'AWS4-HMAC-SHA256 Credential=alternator/2/3/4/aws4_request SignedHeaders=x-amz-date;host Signature=123'
}
response = requests.post(url, headers=headers)
assert not response.ok
assert "InvalidSignatureException" in response.text and "Signature not yet current" in response.text

View File

@@ -1,253 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for batch operations - BatchWriteItem, BatchReadItem.
# Note that various other tests in other files also use these operations,
# so they are actually tested by other tests as well.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Test ensuring that items inserted by a batched statement can be properly extracted
# via GetItem. Schema has both hash and sort keys.
def test_basic_batch_write_item(test_table):
count = 7
with test_table.batch_writer() as batch:
for i in range(count):
batch.put_item(Item={
'p': "batch{}".format(i),
'c': "batch_ck{}".format(i),
'attribute': str(i),
'another': 'xyz'
})
for i in range(count):
item = test_table.get_item(Key={'p': "batch{}".format(i), 'c': "batch_ck{}".format(i)}, ConsistentRead=True)['Item']
assert item['p'] == "batch{}".format(i)
assert item['c'] == "batch_ck{}".format(i)
assert item['attribute'] == str(i)
assert item['another'] == 'xyz'
# Test batch write to a table with only a hash key
def test_batch_write_hash_only(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
# Test batch delete operation (DeleteRequest): We create a bunch of items, and
# then delete them all.
def test_batch_write_delete(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for item in items:
assert test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)['Item'] == item
with test_table_s.batch_writer() as batch:
for item in items:
batch.delete_item(Key={'p': item['p']})
# Verify that all items are now missing:
for item in items:
assert not 'Item' in test_table_s.get_item(Key={'p': item['p']}, ConsistentRead=True)
# Test the same batch including both writes and delete. Should be fine.
def test_batch_write_and_delete(test_table_s):
p1 = random_string()
p2 = random_string()
test_table_s.put_item(Item={'p': p1})
assert 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert not 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p2})
batch.delete_item(Key={'p': p1})
assert not 'Item' in test_table_s.get_item(Key={'p': p1}, ConsistentRead=True)
assert 'Item' in test_table_s.get_item(Key={'p': p2}, ConsistentRead=True)
# It is forbidden to update the same key twice in the same batch.
# DynamoDB says "Provided list of item keys contains duplicates".
def test_batch_write_duplicate_write(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.put_item({'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.put_item({'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
def test_batch_write_duplicate_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.delete_item(Key={'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.delete_item(Key={'p': p, 'c': other})
batch.delete_item(Key={'p': other, 'c': c})
def test_batch_write_duplicate_write_and_delete(test_table_s, test_table):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table_s.batch_writer() as batch:
batch.delete_item(Key={'p': p})
batch.put_item({'p': p})
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*duplicates'):
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': c})
# But it is fine to touch items with one component the same, but the other not.
other = random_string()
with test_table.batch_writer() as batch:
batch.delete_item(Key={'p': p, 'c': c})
batch.put_item({'p': p, 'c': other})
batch.put_item({'p': other, 'c': c})
# Test that BatchWriteItem's PutRequest completely replaces an existing item.
# It shouldn't merge it with a previously existing value. See also the same
# test for PutItem - test_put_item_replace().
def test_batch_put_item_replace(test_table_s, test_table):
p = random_string()
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
with test_table_s.batch_writer() as batch:
batch.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
with test_table.batch_writer() as batch:
batch.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}
# Test that if one of the batch's operations is invalid, because a key
# column is missing or has the wrong type, the entire batch is rejected
# before any write is done.
def test_batch_write_invalid_operation(test_table_s):
# test key attribute with wrong type:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'p': 3}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# test missing key attribute:
p1 = random_string()
p2 = random_string()
items = [{'p': p1}, {'x': 'whatever'}, {'p': p2}]
with pytest.raises(ClientError, match='ValidationException'):
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2]:
assert not 'item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for BatchGetItem, reading several entire items.
# Schema has both hash and sort keys.
def test_batch_get_item(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
# We use the low-level batch_get_item API for lack of a more convenient
# API. At least it spares us the need to encode the key's types...
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ConsistentRead': True}})
print(reply)
got_items = reply['Responses'][test_table.name]
assert multiset(got_items) == multiset(items)
# Same, with schema has just hash key.
def test_batch_get_item_hash(test_table_s):
items = [{'p': random_string(), 'val': random_string()} for i in range(10)]
with test_table_s.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p')} for x in items]
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': keys, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert multiset(got_items) == multiset(items)
# Test what do we get if we try to read two *missing* values in addition to
# an existing one. It turns out the missing items are simply not returned,
# with no sign they are missing.
def test_batch_get_item_missing(test_table_s):
p = random_string();
test_table_s.put_item(Item={'p': p})
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}, {'p': random_string()}, {'p': p}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == [{'p' : p}]
# If all the keys requested from a particular table are missing, we still
# get a response array for that table - it's just empty.
def test_batch_get_item_completely_missing(test_table_s):
reply = test_table_s.meta.client.batch_get_item(RequestItems = {test_table_s.name: {'Keys': [{'p': random_string()}], 'ConsistentRead': True}})
got_items = reply['Responses'][test_table_s.name]
assert got_items == []
# Test GetItem with AttributesToGet
def test_batch_get_item_attributes_to_get(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'AttributesToGet': wanted, 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)
# Test GetItem with ProjectionExpression (just a simple one, with
# top-level attributes)
def test_batch_get_item_projection_expression(test_table):
items = [{'p': random_string(), 'c': random_string(), 'val1': random_string(), 'val2': random_string()} for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
keys = [{k: x[k] for k in ('p', 'c')} for x in items]
for wanted in [['p'], ['p', 'c'], ['val1'], ['p', 'val2']]:
reply = test_table.meta.client.batch_get_item(RequestItems = {test_table.name: {'Keys': keys, 'ProjectionExpression': ",".join(wanted), 'ConsistentRead': True}})
got_items = reply['Responses'][test_table.name]
expected_items = [{k: item[k] for k in wanted if k in item} for item in items]
assert multiset(got_items) == multiset(expected_items)

View File

@@ -1,40 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the ConditionExpression parameter
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test that ConditionExpression works as expected
@pytest.mark.xfail(reason="ConditionExpression not yet implemented")
def test_update_condition_expression(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ConditionExpression='b = :oldval',
ExpressionAttributeValues={':val1': 6, ':oldval': 4})
with pytest.raises(ClientError, match='ConditionalCheckFailedException.*'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ConditionExpression='b = :oldval',
ExpressionAttributeValues={':val1': 8, ':oldval': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 6}

View File

@@ -1,49 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for the DescribeEndpoints operation
import boto3
# Test that the DescribeEndpoints operation works as expected: that it
# returns one endpoint (it may return more, but it never does this in
# Amazon), and this endpoint can be used to make more requests.
def test_describe_endpoints(request, dynamodb):
endpoints = dynamodb.meta.client.describe_endpoints()['Endpoints']
# It is not strictly necessary that only a single endpoint be returned,
# but this is what Amazon DynamoDB does today (and so does Alternator).
assert len(endpoints) == 1
for endpoint in endpoints:
assert 'CachePeriodInMinutes' in endpoint.keys()
address = endpoint['Address']
# Check that the address is a valid endpoint by checking that we can
# send it another describe_endpoints() request ;-) Note that the
# address does not include the "http://" or "https://" prefix, and
# we need to choose one manually.
prefix = "https://" if request.config.getoption('https') else "http://"
verify = not request.config.getoption('https')
url = prefix + address
if address.endswith('.amazonaws.com'):
boto3.client('dynamodb',endpoint_url=url, verify=verify).describe_endpoints()
else:
# Even though we connect to the local installation, Boto3 still
# requires us to specify dummy region and credential parameters,
# otherwise the user is forced to properly configure ~/.aws even
# for local runs.
boto3.client('dynamodb',endpoint_url=url, region_name='us-east-1', aws_access_key_id='alternator', aws_secret_access_key='secret_pass', verify=verify).describe_endpoints()
# Nothing to check here - if the above call failed with an exception,
# the test would fail.

View File

@@ -1,170 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the DescribeTable operation.
# Some attributes used only by a specific major feature will be tested
# elsewhere:
# 1. Tests for describing tables with global or local secondary indexes
# (the GlobalSecondaryIndexes and LocalSecondaryIndexes attributes)
# are in test_gsi.py and test_lsi.py.
# 2. Tests for the stream feature (LatestStreamArn, LatestStreamLabel,
# StreamSpecification) will be in the tests devoted to the stream
# feature.
# 3. Tests for describing a restored table (RestoreSummary, TableId)
# will be together with tests devoted to the backup/restore feature.
import pytest
from botocore.exceptions import ClientError
import re
import time
from util import multiset
# Test that DescribeTable correctly returns the table's name and state
def test_describe_table_basic(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['TableName'] == test_table.name
assert got['TableStatus'] == 'ACTIVE'
# Test that DescribeTable correctly returns the table's schema, in
# AttributeDefinitions and KeySchema attributes
@pytest.mark.xfail(reason="DescribeTable does not yet return schema")
def test_describe_table_schema(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
expected = { # Copied from test_table()'s fixture
'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
'AttributeDefinitions': [
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
]
}
assert got['KeySchema'] == expected['KeySchema']
# The list of attribute definitions may be arbitrarily reordered
assert multiset(got['AttributeDefinitions']) == multiset(expected['AttributeDefinitions'])
# Test that DescribeTable correctly returns the table's billing mode,
# in the BillingModeSummary attribute.
def test_describe_table_billing(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['BillingModeSummary']['BillingMode'] == 'PAY_PER_REQUEST'
# The BillingModeSummary should also contain a
# LastUpdateToPayPerRequestDateTime attribute, which is a date.
# We don't know what date this is supposed to be, but something we
# do know is that the test table was created already with this billing
# mode, so the table creation date should be the same as the billing
# mode setting date.
assert 'LastUpdateToPayPerRequestDateTime' in got['BillingModeSummary']
assert got['BillingModeSummary']['LastUpdateToPayPerRequestDateTime'] == got['CreationDateTime']
# Test that DescribeTable correctly returns the table's creation time.
# We don't know what this creation time is supposed to be, so this test
# cannot be very thorough... We currently just tests against something we
# know to be wrong - returning the *current* time, which changes on every
# call.
@pytest.mark.xfail(reason="DescribeTable does not return table creation time")
def test_describe_table_creation_time(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'CreationDateTime' in got
time1 = got['CreationDateTime']
time.sleep(1)
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
time2 = got['CreationDateTime']
assert time1 == time2
# Test that DescribeTable returns the table's estimated item count
# in the ItemCount attribute. Unfortunately, there's not much we can
# really test here... The documentation says that the count can be
# delayed by six hours, so the number we get here may have no relation
# to the current number of items in the test table. The attribute should exist,
# though. This test does NOT verify that ItemCount isn't always returned as
# zero - such stub implementation will pass this test.
@pytest.mark.xfail(reason="DescribeTable does not return table item count")
def test_describe_table_item_count(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'ItemCount' in got
# Similar test for estimated size in bytes - TableSizeBytes - which again,
# may reflect the size as long as six hours ago.
@pytest.mark.xfail(reason="DescribeTable does not return table size")
def test_describe_table_size(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableSizeBytes' in got
# Test the ProvisionedThroughput attribute returned by DescribeTable.
# This is a very partial test: Our test table is configured without
# provisioned throughput, so obviously it will not have interesting settings
# for it. DynamoDB returns zeros for some of the attributes, even though
# the documentation suggests missing values should have been fine too.
@pytest.mark.xfail(reason="DescribeTable does not return provisioned throughput")
def test_describe_table_provisioned_throughput(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert got['ProvisionedThroughput']['NumberOfDecreasesToday'] == 0
assert got['ProvisionedThroughput']['WriteCapacityUnits'] == 0
assert got['ProvisionedThroughput']['ReadCapacityUnits'] == 0
# This is a silly test for the RestoreSummary attribute in DescribeTable -
# it should not exist in a table not created by a restore. When testing
# the backup/restore feature, we will have more meaninful tests for the
# value of this attribute in that case.
def test_describe_table_restore_summary(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'RestoreSummary' in got
# This is a silly test for the SSEDescription attribute in DescribeTable -
# by default, a table is encrypted with AWS-owned keys, not using client-
# owned keys, and the SSEDescription attribute is not returned at all.
def test_describe_table_encryption(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'SSEDescription' in got
# This is a silly test for the StreamSpecification attribute in DescribeTable -
# when there are no streams, this attribute should be missing.
def test_describe_table_stream_specification(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert not 'StreamSpecification' in got
# Test that the table has an ARN, a unique identifier for the table which
# includes which zone it is on, which account, and of course the table's
# name. The ARN format is described in
# https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#genref-arns
@pytest.mark.xfail(reason="DescribeTable does not return ARN")
def test_describe_table_arn(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableArn' in got and got['TableArn'].startswith('arn:')
# Test that the table has a TableId.
# TODO: Figure out what is this TableId supposed to be, it is just a
# unique id that is created with the table and never changes? Or anything
# else?
@pytest.mark.xfail(reason="DescribeTable does not return TableId")
def test_describe_table_id(test_table):
got = test_table.meta.client.describe_table(TableName=test_table.name)['Table']
assert 'TableId' in got
# DescribeTable error path: trying to describe a non-existent table should
# result in a ResourceNotFoundException.
def test_describe_table_non_existent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException') as einfo:
dynamodb.meta.client.describe_table(TableName='non_existent_table')
# As one of the first error-path tests that we wrote, let's test in more
# detail that the error reply has the appropriate fields:
response = einfo.value.response
print(response)
err = response['Error']
assert err['Code'] == 'ResourceNotFoundException'
assert re.match(err['Message'], 'Requested resource not found: Table: non_existent_table not found')

File diff suppressed because it is too large Load Diff

View File

@@ -1,801 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of GSI (Global Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new GSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# GSIs only support eventually consistent reads, so tests that involve
# writing to a table and then expect to read something from it cannot be
# guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
# Note that in practice, there repeated reads are almost never necessary:
# Amazon claims that "Changes to the table data are propagated to the global
# secondary indexes within a fraction of a second, under normal conditions"
# and indeed, in practice, the tests here almost always succeed without a
# retry.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_gsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existant index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# One of the simplest forms of a non-trivial GSI: The base table has a hash
# and sort key, and the index reverses those roles. Other attributes are just
# copied.
@pytest.fixture(scope="session")
def test_table_gsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_simple(test_table_gsi_1):
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
# The index allows a query on just a specific sort key, which isn't
# allowed on the base table.
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1, 'hello', full_scan(test_table_gsi_1))
def test_gsi_same_key(test_table_gsi_1):
c = random_string();
# All these items have the same sort key 'c' but different hash key 'p'
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Check we get an appropriate error when trying to read a non-existing index
# of an existing table. Although the documentation specifies that a
# ResourceNotFoundException should be returned if "The operation tried to
# access a nonexistent table or index", in fact in the specific case that
# the table does exist but an index does not - we get a ValidationException.
def test_gsi_missing_index(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_query(test_table_gsi_1, IndexName='wrong_name',
KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ValidationException.*wrong_name'):
full_scan(test_table_gsi_1, IndexName='wrong_name')
# Nevertheless, if the table itself does not exist, a query should return
# a ResourceNotFoundException, not ValidationException:
def test_gsi_missing_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.query(TableName='nonexistent_table', IndexName='any_name', KeyConditions={'x': {'AttributeValueList': [1], 'ComparisonOperator': 'EQ'}})
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.scan(TableName='nonexistent_table', IndexName='any_name')
# Verify that strongly-consistent reads on GSI are *not* allowed.
@pytest.mark.xfail(reason="GSI strong consistency not checked")
def test_gsi_strong_consistency(test_table_gsi_1):
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_query(test_table_gsi_1, KeyConditions={'c': {'AttributeValueList': ['hi'], 'ComparisonOperator': 'EQ'}}, IndexName='hello', ConsistentRead=True)
with pytest.raises(ClientError, match='ValidationException.*Consistent'):
full_scan(test_table_gsi_1, IndexName='hello', ConsistentRead=True)
# Verify that a GSI is correctly listed in describe_table
@pytest.mark.xfail(reason="DescribeTable provides index names only, no size or item count")
def test_gsi_describe(test_table_gsi_1):
desc = test_table_gsi_1.meta.client.describe_table(TableName=test_table_gsi_1.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 1
gsi = gsis[0]
assert gsi['IndexName'] == 'hello'
assert 'IndexSizeBytes' in gsi # actual size depends on content
assert 'ItemCount' in gsi
assert gsi['Projection'] == {'ProjectionType': 'ALL'}
assert gsi['IndexStatus'] == 'ACTIVE'
assert gsi['KeySchema'] == [{'KeyType': 'HASH', 'AttributeName': 'c'},
{'KeyType': 'RANGE', 'AttributeName': 'p'}]
# TODO: check also ProvisionedThroughput, IndexArn
# When a GSI's key includes an attribute not in the base table's key, we
# need to remember to add its type to AttributeDefinitions.
def test_gsi_missing_attribute_definition(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*AttributeDefinitions'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [ { 'AttributeName': 'c', 'KeyType': 'HASH' } ],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# test_table_gsi_1_hash_only is a variant of test_table_gsi_1: It's another
# case where the index doesn't involve non-key attributes. Again the base
# table has a hash and sort key, but in this case the index has *only* a
# hash key (which is the base's hash key). In the materialized-view-based
# implementation, we need to remember the other part of the base key as a
# clustering key.
@pytest.fixture(scope="session")
def test_table_gsi_1_hash_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield table
table.delete()
def test_gsi_key_not_in_index(test_table_gsi_1_hash_only):
# Test with items with different 'c' values:
items = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
c = items[0]['c']
expected_items = [x for x in items if x['c'] == c]
assert_index_query(test_table_gsi_1_hash_only, 'hello', expected_items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Test items with the same sort key 'c' but different hash key 'p'
c = random_string();
items = [{'p': random_string(), 'c': c, 'x': random_string()} for i in range(10)]
with test_table_gsi_1_hash_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_1_hash_only, 'hello', items,
KeyConditions={'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}})
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert_index_scan(test_table_gsi_1_hash_only, 'hello', full_scan(test_table_gsi_1_hash_only))
# A second scenario of GSI. Base table has just hash key, Index has a
# different hash key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_gsi_2(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_2(test_table_gsi_2):
items1 = [{'p': random_string(), 'x': random_string()} for i in range(10)]
x1 = items1[0]['x']
x2 = random_string()
items2 = [{'p': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['x'] == x1]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['x'] == x2]
assert_index_query(test_table_gsi_2, 'hello', expected_items,
KeyConditions={'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# Test that when a table has a GSI, if the indexed attribute is missing, the
# item is added to the base table but not the index.
def test_gsi_missing_attribute(test_table_gsi_2):
p1 = random_string()
x1 = random_string()
test_table_gsi_2.put_item(Item={'p': p1, 'x': x1})
p2 = random_string()
test_table_gsi_2.put_item(Item={'p': p2})
# Both items are now in the base table:
assert test_table_gsi_2.get_item(Key={'p': p1})['Item'] == {'p': p1, 'x': x1}
assert test_table_gsi_2.get_item(Key={'p': p2})['Item'] == {'p': p2}
# But only the first item is in the index: It can be found using a
# Query, and a scan of the index won't find it (but a scan on the base
# will).
assert_index_query(test_table_gsi_2, 'hello', [{'p': p1, 'x': x1}],
KeyConditions={'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
assert any([i['p'] == p1 for i in full_scan(test_table_gsi_2)])
# Note: with eventually consistent read, we can't really be sure that
# and item will "never" appear in the index. We do this test last,
# so if we had a bug and such item did appear, hopefully we had enough
# time for the bug to become visible. At least sometimes.
assert not any([i['p'] == p2 for i in full_scan(test_table_gsi_2, IndexName='hello')])
# Test when a table has a GSI, if the indexed attribute has the wrong type,
# the update operation is rejected, and is added to neither base table nor
# index. This is different from the case of a *missing* attribute, where
# the item is added to the base table but not index.
# The following three tests test_gsi_wrong_type_attribute_{put,update,batch}
# test updates using PutItem, UpdateItem, and BatchWriteItem respectively.
def test_gsi_wrong_type_attribute_put(test_table_gsi_2):
# PutItem with wrong type for 'x' is rejected, item isn't created even
# in the base table.
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.put_item(Item={'p': p, 'x': 3})
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
def test_gsi_wrong_type_attribute_update(test_table_gsi_2):
# An UpdateItem with wrong type for 'x' is also rejected, but naturally
# if the item already existed, it remains as it was.
p = random_string()
x = random_string()
test_table_gsi_2.put_item(Item={'p': p, 'x': x})
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
test_table_gsi_2.update_item(Key={'p': p}, AttributeUpdates={'x': {'Value': 3, 'Action': 'PUT'}})
assert test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': x}
def test_gsi_wrong_type_attribute_batch(test_table_gsi_2):
# In a BatchWriteItem, if any update is forbidden, the entire batch is
# rejected, and none of the updates happen at all.
p1 = random_string()
p2 = random_string()
p3 = random_string()
items = [{'p': p1, 'x': random_string()},
{'p': p2, 'x': 3},
{'p': p3, 'x': random_string()}]
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
with test_table_gsi_2.batch_writer() as batch:
for item in items:
batch.put_item(item)
for p in [p1, p2, p3]:
assert not 'Item' in test_table_gsi_2.get_item(Key={'p': p}, ConsistentRead=True)
# A third scenario of GSI. Index has a hash key and a sort key, both are
# non-key attributes from the base table. This scenario may be very
# difficult to implement in Alternator because Scylla's materialized-views
# implementation only allows one new key column in the view, and here
# we need two (which, also, aren't actual columns, but map items).
@pytest.fixture(scope="session")
def test_table_gsi_3(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_3(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
@pytest.mark.xfail(reason="GSI in alternator currently have a bug on updating the second regular base column")
def test_gsi_update_second_regular_base_column(test_table_gsi_3):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
with test_table_gsi_3.batch_writer() as batch:
for item in items:
batch.put_item(item)
items[3]['b'] = 'updated'
test_table_gsi_3.update_item(Key={'p': items[3]['p']}, AttributeUpdates={'b': {'Value': 'updated', 'Action': 'PUT'}})
assert_index_query(test_table_gsi_3, 'hello', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# A fourth scenario of GSI. Two GSIs on a single base table.
@pytest.fixture(scope="session")
def test_table_gsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_a',
'KeySchema': [
{ 'AttributeName': 'a', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
},
{ 'IndexName': 'hello_b',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
# Test that a base table with two GSIs updates both as expected.
def test_gsi_4(test_table_gsi_4):
items = [{'p': random_string(), 'a': random_string(), 'b': random_string()} for i in range(10)]
with test_table_gsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert_index_query(test_table_gsi_4, 'hello_a', [items[3]],
KeyConditions={'a': {'AttributeValueList': [items[3]['a']], 'ComparisonOperator': 'EQ'}})
assert_index_query(test_table_gsi_4, 'hello_b', [items[3]],
KeyConditions={'b': {'AttributeValueList': [items[3]['b']], 'ComparisonOperator': 'EQ'}})
# Verify that describe_table lists the two GSIs.
def test_gsi_4_describe(test_table_gsi_4):
desc = test_table_gsi_4.meta.client.describe_table(TableName=test_table_gsi_4.name)
assert 'Table' in desc
assert 'GlobalSecondaryIndexes' in desc['Table']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert len(gsis) == 2
assert multiset([g['IndexName'] for g in gsis]) == multiset(['hello_a', 'hello_b'])
# A scenario for GSI in which the table has both hash and sort key
@pytest.fixture(scope="session")
def test_table_gsi_5(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_gsi_5(test_table_gsi_5):
items1 = [{'p': random_string(), 'c': random_string(), 'x': random_string()} for i in range(10)]
p1, x1 = items1[0]['p'], items1[0]['x']
p2, x2 = random_string(), random_string()
items2 = [{'p': p2, 'c': random_string(), 'x': x2} for i in range(10)]
items = items1 + items2
with test_table_gsi_5.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['x'] == x1]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['x'] == x2]
assert_index_query(test_table_gsi_5, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'x': {'AttributeValueList': [x2], 'ComparisonOperator': 'EQ'}})
# All tests above involved "ProjectionType: ALL". This test checks how
# "ProjectionType:: KEYS_ONLY" works. We note that it projects both
# the index's key, *and* the base table's key. So items which had different
# base-table keys cannot suddenly become the same item in the index.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
items = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
table.delete()
# Test for "ProjectionType:: INCLUDE". The secondary table includes the
# its own and the base's keys (as in KEYS_ONLY) plus the extra keys given
# in NonKeyAttributes.
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_projection_include(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'INCLUDE',
'NonKeyAttributes': ['a', 'b'] }
}
])
# Some items have the projected attributes a,b and some don't:
items = [{'p': random_string(), 'x': random_string(), 'a': random_string(), 'b': random_string(), 'y': random_string()} for i in range(10)]
items = items + [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
wanted = ['p', 'x', 'a', 'b']
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert_index_scan(table, 'hello', expected_items)
print(len(expected_items))
table.delete()
# DynamoDB's says the "Projection" argument of GlobalSecondaryIndexes is
# mandatory, and indeed Boto3 enforces that it must be passed. The
# documentation then goes on to claim that the "ProjectionType" member of
# "Projection" is optional - and Boto3 allows it to be missing. But in
# fact, it is not allowed to be missing: DynamoDB complains: "Unknown
# ProjectionType: null".
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_missing_projection_type(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*ProjectionType'):
create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': {}
}
])
# update_table() for creating a GSI is an asynchronous operation.
# The table's TableStatus changes from ACTIVE to UPDATING for a short while
# and then goes back to ACTIVE, but the new GSI's IndexStatus appears as
# CREATING, until eventually (after a *long* time...) it becomes ACTIVE.
# During the CREATING phase, at some point the Backfilling attribute also
# appears, until it eventually disappears. We need to wait until all three
# markers indicate completion.
# Unfortunately, while boto3 has a client.get_waiter('table_exists') to
# wait for a table to exists, there is no such function to wait for an
# index to come up, so we need to code it ourselves.
def wait_for_gsi(table, gsi_name):
start_time = time.time()
# Surprisingly, even for tiny tables this can take a very long time
# on DynamoDB - often many minutes!
for i in range(300):
time.sleep(1)
desc = table.meta.client.describe_table(TableName=table.name)
table_status = desc['Table']['TableStatus']
if table_status != 'ACTIVE':
print('%d Table status still %s' % (i, table_status))
continue
index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
assert len(index_desc) == 1
index_status = index_desc[0]['IndexStatus']
if index_status != 'ACTIVE':
print('%d Index status still %s' % (i, index_status))
continue
# When the index is ACTIVE, this must be after backfilling completed
assert not 'Backfilling' in index_desc[0]
print('wait_for_gsi took %d seconds' % (time.time() - start_time))
return
raise AssertionError("wait_for_gsi did not complete")
# Similarly to how wait_for_gsi() waits for a GSI to finish adding,
# this function waits for a GSI to be finally deleted.
def wait_for_gsi_gone(table, gsi_name):
start_time = time.time()
for i in range(300):
time.sleep(1)
desc = table.meta.client.describe_table(TableName=table.name)
table_status = desc['Table']['TableStatus']
if table_status != 'ACTIVE':
print('%d Table status still %s' % (i, table_status))
continue
if 'GlobalSecondaryIndexes' in desc['Table']:
index_desc = [x for x in desc['Table']['GlobalSecondaryIndexes'] if x['IndexName'] == gsi_name]
if len(index_desc) != 0:
index_status = index_desc[0]['IndexStatus']
print('%d Index status still %s' % (i, index_status))
continue
print('wait_for_gsi_gone took %d seconds' % (time.time() - start_time))
return
raise AssertionError("wait_for_gsi_gone did not complete")
# All tests above involved creating a new table with a GSI up-front. This
# test will test creating a base table *without* a GSI, putting data in
# it, and then adding a GSI with the UpdateTable operation. This starts
# a backfilling stage - where data is copied to the index - and when this
# stage is done, the index is usable. Items whose indexed column contains
# the wrong type are silently ignored and not added to the index (it would
# not have been possible to add such items if the GSI was already configured
# when they were added).
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_backfill(dynamodb):
# First create, and fill, a table without GSI. The items in items1
# will have the appropriate string type for 'x' and will later get
# indexed. Items in item2 have no value for 'x', and in item3 'x' is in
# not a string; So the items in items2 and items3 will be missing
# in the index we'll create later.
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[ { 'AttributeName': 'p', 'AttributeType': 'S' } ])
items1 = [{'p': random_string(), 'x': random_string(), 'y': random_string()} for i in range(10)]
items2 = [{'p': random_string(), 'y': random_string()} for i in range(10)]
items3 = [{'p': random_string(), 'x': i} for i in range(10)]
items = items1 + items2 + items3
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
assert multiset(items) == multiset(full_scan(table))
# Now use UpdateTable to create the GSI
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'x', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'x', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the table is backfilled.
wait_for_gsi(table, 'hello')
# As explained above, only items in items1 got copied to the gsi,
# and Scan on them works as expected.
# Note that we don't need to retry the reads here (i.e., use the
# assert_index_scan() or assert_index_query() functions) because after
# we waited for backfilling to complete, we know all the pre-existing
# data is already in the index.
assert multiset(items1) == multiset(full_scan(table, IndexName='hello'))
# We can also use Query on the new GSI, to search on the attribute x:
assert multiset([items1[3]]) == multiset(full_query(table,
IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items1[3]['x']], 'ComparisonOperator': 'EQ'}}))
# Let's also test that we cannot add another index with the same name
# that already exists
with pytest.raises(ClientError, match='ValidationException.*already exists'):
dynamodb.meta.client.update_table(TableName=table.name,
AttributeDefinitions=[{ 'AttributeName': 'y', 'AttributeType': 'S' }],
GlobalSecondaryIndexUpdates=[ { 'Create':
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'y', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}}])
table.delete()
# Test deleting an existing GSI using UpdateTable
@pytest.mark.xfail(reason="GSI not supported")
def test_gsi_delete(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'x', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'x', 'KeyType': 'HASH' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'x': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# So far, we have the index for "x" and can use it:
assert_index_query(table, 'hello', [items[3]],
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
# Now use UpdateTable to delete the GSI for "x"
dynamodb.meta.client.update_table(TableName=table.name,
GlobalSecondaryIndexUpdates=[{ 'Delete':
{ 'IndexName': 'hello' } }])
# update_table is an asynchronous operation. We need to wait until it
# finishes and the GSI is removed.
wait_for_gsi_gone(table, 'hello')
# Now index is gone. We cannot query using it.
with pytest.raises(ClientError, match='ValidationException.*hello'):
full_query(table, IndexName='hello',
KeyConditions={'x': {'AttributeValueList': [items[3]['x']], 'ComparisonOperator': 'EQ'}})
table.delete()
# Utility function for creating a new table a GSI with the given name,
# and, if creation was successful, delete it. Useful for testing which
# GSI names work.
def create_gsi(dynamodb, index_name):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }],
GlobalSecondaryIndexes=[
{ 'IndexName': index_name,
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
# Verify that the GSI wasn't just ignored, as Scylla originally did ;-)
assert 'GlobalSecondaryIndexes' in table.meta.client.describe_table(TableName=table.name)['Table']
table.delete()
# Like table names (tested in test_table.py), index names must must also
# be 3-255 characters and match the regex [a-zA-Z0-9._-]+. This test
# is similar to test_create_table_unsupported_names(), but for GSI names.
# Note that Scylla is actually more limited in the length of the index
# names, because both table name and index name, together, have to fit in
# 221 characters. But we don't verify here this specific limitation.
def test_gsi_unsupported_names(dynamodb):
# Unfortunately, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# CientError exception.
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_gsi(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException.*nnnnn'):
create_gsi(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException.*nyh'):
create_gsi(dynamodb, 'nyh@test')
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_gsi_non_scylla_name(dynamodb):
create_gsi(dynamodb, '.alternator_test')
# Index names with 255 characters are allowed in Dynamo. In Scylla, the
# limit is different - the sum of both table and index length cannot
# exceed 211 characters. So we test a much shorter limit.
# (compare test_create_and_delete_table_very_long_name()).
def test_gsi_very_long_name(dynamodb):
#create_gsi(dynamodb, 'n' * 255) # works on DynamoDB, but not on Scylla
create_gsi(dynamodb, 'n' * 190)
# Verify that ListTables does not list materialized views used for indexes.
# This is hard to test, because we don't really know which table names
# should be listed beyond those we created, and don't want to assume that
# no other test runs in parallel with us. So the method we chose is to use a
# unique random name for an index, and check that no table contains this
# name. This assumes that materialized-view names are composed using the
# index's name (which is currently what we do).
@pytest.fixture(scope="session")
def test_table_gsi_random_name(dynamodb):
index_name = random_string()
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': index_name,
'KeySchema': [
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
],
'Projection': { 'ProjectionType': 'ALL' }
}
],
)
yield [table, index_name]
table.delete()
def test_gsi_list_tables(dynamodb, test_table_gsi_random_name):
table, index_name = test_table_gsi_random_name
# Check that the random "index_name" isn't a substring of any table name:
tables = list_tables(dynamodb)
for name in tables:
assert not index_name in name
# But of course, the table's name should be in the list:
assert table.name in tables

View File

@@ -1,34 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the health check
import requests
# Test that a health check can be performed with a GET packet
def test_health_works(dynamodb):
url = dynamodb.meta.client._endpoint.host
response = requests.get(url)
assert response.ok
assert response.content.decode('utf-8').strip() == 'healthy: {}'.format(url.replace('https://', '').replace('http://', ''))
# Test that a health check only works for the root URL ('/')
def test_health_only_works_for_root_path(dynamodb):
url = dynamodb.meta.client._endpoint.host
for suffix in ['/abc', '/..', '/-', '/index.htm', '/health']:
response = requests.get(url + suffix)
assert response.status_code in range(400, 405)

View File

@@ -1,402 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the CRUD item operations: PutItem, GetItem, UpdateItem, DeleteItem
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string, random_bytes
# Basic test for creating a new item with a random name, and reading it back
# with strong consistency.
# Only the string type is used for keys and attributes. None of the various
# optional PutItem features (Expected, ReturnValues, ReturnConsumedCapacity,
# ReturnItemCollectionMetrics, ConditionalOperator, ConditionExpression,
# ExpressionAttributeNames, ExpressionAttributeValues) are used, and
# for GetItem strong consistency is requested as well as all attributes,
# but no other optional features (AttributesToGet, ReturnConsumedCapacity,
# ProjectionExpression, ExpressionAttributeNames)
def test_basic_string_put_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'attribute': val, 'another': val2})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Similar to test_basic_string_put_and_get, just uses UpdateItem instead of
# PutItem. Because the item does not yet exist, it should work the same.
def test_basic_string_update_and_get(test_table):
p = random_string()
c = random_string()
val = random_string()
val2 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'PUT'}, 'another': {'Value': val2, 'Action': 'PUT'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['attribute'] == val
assert item['another'] == val2
# Test put_item and get_item of various types for the *attributes*,
# including both scalars as well as nested documents, lists and sets.
# The full list of types tested here:
# number, boolean, bytes, null, list, map, string set, number set,
# binary set.
# The keys are still strings.
# Note that only top-level attributes are written and read in this test -
# this test does not attempt to modify *nested* attributes.
# See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/dynamodb.html
# on how to pass these various types to Boto3's put_item().
def test_put_and_get_attribute_types(test_table):
key = {'p': random_string(), 'c': random_string()}
test_items = [
Decimal("12.345"),
42,
True,
False,
b'xyz',
None,
['hello', 'world', 42],
{'hello': 'world', 'life': 42},
{'hello': {'test': 'hi', 'hello': True, 'list': [1, 2, 'hi']}},
set(['hello', 'world', 'hi']),
set([1, 42, Decimal("3.14")]),
set([b'xyz', b'hi']),
]
item = { str(i) : test_items[i] for i in range(len(test_items)) }
item.update(key)
test_table.put_item(Item=item)
got_item = test_table.get_item(Key=key, ConsistentRead=True)['Item']
assert item == got_item
# The test_empty_* tests below verify support for empty items, with no
# attributes except the key. This is a difficult case for Scylla, because
# for an empty row to exist, Scylla needs to add a "CQL row marker".
# There are several ways to create empty items - via PutItem, UpdateItem
# and deleting attributes from non-empty items, and we need to check them
# all, in several test_empty_* tests:
def test_empty_put(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_put_delete(test_table):
p = random_string()
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'hello': 'world'})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
def test_empty_update_delete(test_table):
p = random_string()
c = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Value': 'world', 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'hello': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item == {'p': p, 'c': c}
# Test error handling of UpdateItem passed a bad "Action" field.
def test_update_bad_action(test_table):
p = random_string()
c = random_string()
val = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'attribute': {'Value': val, 'Action': 'NONEXISTENT'}})
# A more elaborate UpdateItem test, updating different attributes at different
# times. Includes PUT and DELETE operations.
def test_basic_string_more_update(test_table):
p = random_string()
c = random_string()
val1 = random_string()
val2 = random_string()
val3 = random_string()
val4 = random_string()
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val1, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a2': {'Value': val2, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a1': {'Value': val3, 'Action': 'PUT'}})
test_table.update_item(Key={'p': p, 'c': c}, AttributeUpdates={'a3': {'Action': 'DELETE'}})
item = test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item']
assert item['p'] == p
assert item['c'] == c
assert item['a1'] == val3
assert item['a2'] == val2
assert not 'a3' in item
# Test that item operations on a non-existant table name fail with correct
# error code.
def test_item_operations_nonexistent_table(dynamodb):
with pytest.raises(ClientError, match='ResourceNotFoundException'):
dynamodb.meta.client.put_item(TableName='non_existent_table',
Item={'a':{'S':'b'}})
# Fetching a non-existant item. According to the DynamoDB doc, "If there is no
# matching item, GetItem does not return any data and there will be no Item
# element in the response."
def test_get_item_missing_item(test_table):
p = random_string()
c = random_string()
assert not "Item" in test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)
# Test that if we have a table with string hash and sort keys, we can't read
# or write items with other key types to it.
def test_put_item_wrong_key_type(test_table):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.put_item(Item={'p': s, 'c': s})
assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.put_item(Item={'p': s})
def test_update_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.update_item(Key={'p': s, 'c': s}, AttributeUpdates={})
assert test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)['Item'] == {'p': s, 'c': s}
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': b, 'c': s}, AttributeUpdates={})
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': n, 'c': s}, AttributeUpdates={})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s, 'c': b}, AttributeUpdates={})
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s, 'c': n}, AttributeUpdates={})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'c': s}, AttributeUpdates={})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.update_item(Key={'p': s}, AttributeUpdates={})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': s, 'c': s})
def test_get_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types) but have empty result
assert not "Item" in test_table.get_item(Key={'p': s, 'c': s}, ConsistentRead=True)
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.get_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': s, 'c': s})
def test_delete_item_wrong_key_type(test_table, test_table_s):
b = random_bytes()
s = random_string()
n = Decimal("3.14")
# Should succeed (correct key types)
test_table.delete_item(Key={'p': s, 'c': s})
# Should fail (incorrect hash key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': b, 'c': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': n, 'c': s})
# Should fail (incorrect sort key types)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': b})
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': n})
# Should fail (missing hash key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'c': s})
# Should fail (missing sort key)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s})
# Should fail (spurious key columns)
with pytest.raises(ClientError, match='ValidationException'):
test_table.delete_item(Key={'p': s, 'c': s, 'spurious': s})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.delete_item(Key={'p': s, 'c': s})
# Most of the tests here arbitrarily used a table with both hash and sort keys
# (both strings). Let's check that a table with *only* a hash key works ok
# too, for PutItem, GetItem, and UpdateItem.
def test_only_hash_key(test_table_s):
s = random_string()
test_table_s.put_item(Item={'p': s, 'hello': 'world'})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world'}
test_table_s.update_item(Key={'p': s}, AttributeUpdates={'hi': {'Value': 'there', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': s}, ConsistentRead=True)['Item'] == {'p': s, 'hello': 'world', 'hi': 'there'}
# Tests for item operations in tables with non-string hash or sort keys.
# These tests focus only on the type of the key - everything else is as
# simple as we can (string attributes, no special options for GetItem
# and PutItem). These tests also focus on individual items only, and
# not about the sort order of sort keys - this should be verified in
# test_query.py, for example.
def test_bytes_hash_key(test_table_b):
# Bytes values are passed using base64 encoding, which has weird cases
# depending on len%3 and len%4. So let's try various lengths.
for len in range(10,18):
p = random_bytes(len)
val = random_string()
test_table_b.put_item(Item={'p': p, 'attribute': val})
assert test_table_b.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'attribute': val}
def test_bytes_sort_key(test_table_sb):
p = random_string()
c = random_bytes()
val = random_string()
test_table_sb.put_item(Item={'p': p, 'c': c, 'attribute': val})
assert test_table_sb.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': val}
# Tests for using a large binary blob as hash key, sort key, or attribute.
# DynamoDB strictly limits the size of the binary hash key to 2048 bytes,
# and binary sort key to 1024 bytes, and refuses anything larger. The total
# size of an item is limited to 400KB, which also limits the size of the
# largest attributes. For more details on these limits, see
# https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
# Alternator currently does *not* have these limitations, and can accept much
# larger keys and attributes, but what we do in the following tests is to verify
# that items up to DynamoDB's maximum sizes also work well in Alternator.
def test_large_blob_hash_key(test_table_b):
b = random_bytes(2048)
test_table_b.put_item(Item={'p': b})
assert test_table_b.get_item(Key={'p': b}, ConsistentRead=True)['Item'] == {'p': b}
def test_large_blob_sort_key(test_table_sb):
s = random_string()
b = random_bytes(1024)
test_table_sb.put_item(Item={'p': s, 'c': b})
assert test_table_sb.get_item(Key={'p': s, 'c': b}, ConsistentRead=True)['Item'] == {'p': s, 'c': b}
def test_large_blob_attribute(test_table):
p = random_string()
c = random_string()
b = random_bytes(409500) # a bit less than 400KB
test_table.put_item(Item={'p': p, 'c': c, 'attribute': b })
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'attribute': b}
# Checks what it is not allowed to use in a single UpdateItem request both
# old-style AttributeUpdates and new-style UpdateExpression.
def test_update_item_two_update_methods(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
AttributeUpdates={'a': {'Value': 3, 'Action': 'PUT'}},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
# Verify that having neither AttributeUpdates nor UpdateExpression is
# allowed, and results in creation of an empty item.
def test_update_item_no_update_method(test_table_s):
p = random_string()
assert not "Item" in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.update_item(Key={'p': p})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p}
# Test GetItem with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_getitem_attributes_to_get(dynamodb, test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, AttributesToGet=wanted, ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Basic test for DeleteItem, with hash key only
def test_delete_item_hash(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p})
assert 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
test_table_s.delete_item(Key={'p': p})
assert not 'Item' in test_table_s.get_item(Key={'p': p}, ConsistentRead=True)
# Basic test for DeleteItem, with hash and sort key
def test_delete_item_sort(test_table):
p = random_string()
c = random_string()
key = {'p': p, 'c': c}
test_table.put_item(Item=key)
assert 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
test_table.delete_item(Key=key)
assert not 'Item' in test_table.get_item(Key=key, ConsistentRead=True)
# Test that PutItem completely replaces an existing item. It shouldn't merge
# it with a previously existing value, as UpdateItem does!
# We test for a table with just hash key, and for a table with both hash and
# sort keys.
def test_put_item_replace(test_table_s, test_table):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi'}
test_table_s.put_item(Item={'p': p, 'b': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
c = random_string()
test_table.put_item(Item={'p': p, 'c': c, 'a': 'hi'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 'hi'}
test_table.put_item(Item={'p': p, 'c': c, 'b': 'hello'})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'b': 'hello'}

View File

@@ -1,365 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests of LSI (Local Secondary Indexes)
#
# Note that many of these tests are slower than usual, because many of them
# need to create new tables and/or new LSIs of different types, operations
# which are extremely slow in DynamoDB, often taking minutes (!).
import pytest
import time
from botocore.exceptions import ClientError, ParamValidationError
from util import create_test_table, random_string, full_scan, full_query, multiset, list_tables
# Currently, Alternator's LSIs only support eventually consistent reads, so tests
# that involve writing to a table and then expect to read something from it cannot
# be guaranteed to succeed without retrying the read. The following utility
# functions make it easy to write such tests.
def assert_index_query(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs)):
return
print('assert_index_query retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_query(table, IndexName=index_name, **kwargs))
def assert_index_scan(table, index_name, expected_items, **kwargs):
for i in range(3):
if multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs)):
return
print('assert_index_scan retrying')
time.sleep(1)
assert multiset(expected_items) == multiset(full_scan(table, IndexName=index_name, **kwargs))
# Although quite silly, it is actually allowed to create an index which is
# identical to the base table.
def test_lsi_identical(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }, { 'AttributeName': 'c', 'AttributeType': 'S' }],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [{ 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' }],
'Projection': { 'ProjectionType': 'ALL' }
}
])
items = [{'p': random_string(), 'c': random_string()} for i in range(10)]
with table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Scanning the entire table directly or via the index yields the same
# results (in different order).
assert multiset(items) == multiset(full_scan(table))
assert_index_scan(table, 'hello', items)
# We can't scan a non-existant index
with pytest.raises(ClientError, match='ValidationException'):
full_scan(table, IndexName='wrong')
table.delete()
# Checks that providing a hash key different than the base table is not allowed,
# and so is providing duplicated keys or no sort key at all
def test_lsi_wrong(dynamodb):
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'b', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
with pytest.raises(ClientError, match='ValidationException.*'):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'a', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
table.delete()
# A simple scenario for LSI. Base table has just hash key, Index has an
# additional sort key - one of the non-key attributes from the base table.
@pytest.fixture(scope="session")
def test_table_lsi_1(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
}
])
yield table
table.delete()
def test_lsi_1(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}})
# A second scenario of LSI. Base table has both hash and sort keys,
# a local index is created on each non-key parameter
@pytest.fixture(scope="session")
def test_table_lsi_4(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
{ 'AttributeName': 'x2', 'AttributeType': 'S' },
{ 'AttributeName': 'x3', 'AttributeType': 'S' },
{ 'AttributeName': 'x4', 'AttributeType': 'S' },
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_' + column,
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': column, 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'ALL' }
} for column in ['x1','x2','x3','x4']
])
yield table
table.delete()
def test_lsi_4(test_table_lsi_4):
items1 = [{'p': random_string(), 'c': random_string(),
'x1': random_string(), 'x2': random_string(), 'x3': random_string(), 'x4': random_string()} for i in range(10)]
i_values = items1[0]
i5 = random_string()
items2 = [{'p': i5, 'c': i5, 'x1': i5, 'x2': i5, 'x3': i5, 'x4': i5}]
items = items1 + items2
with test_table_lsi_4.batch_writer() as batch:
for item in items:
batch.put_item(item)
for column in ['x1', 'x2', 'x3', 'x4']:
expected_items = [i for i in items if (i['p'], i[column]) == (i_values['p'], i_values[column])]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i_values['p']], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i_values[column]], 'ComparisonOperator': 'EQ'}})
expected_items = [i for i in items if (i['p'], i[column]) == (i5, i5)]
assert_index_query(test_table_lsi_4, 'hello_' + column, expected_items,
KeyConditions={'p': {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'},
column: {'AttributeValueList': [i5], 'ComparisonOperator': 'EQ'}})
def test_lsi_describe(test_table_lsi_4):
desc = test_table_lsi_4.meta.client.describe_table(TableName=test_table_lsi_4.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_x1', 'hello_x2', 'hello_x3', 'hello_x4'])
# TODO: check projection and key params
# TODO: check also ProvisionedThroughput, IndexArn
# A table with selective projection - only keys are projected into the index
@pytest.fixture(scope="session")
def test_table_lsi_keys_only(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'b', 'AttributeType': 'S' }
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'b', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Check that it's possible to extract a non-projected attribute from the index,
# as the documentation promises
def test_lsi_get_not_projected_attribute(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1 and i['d'] == d1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='ALL_ATTRIBUTES')
expected_items = [{'d': i['d']} for i in items if i['p'] == p2 and i['b'] == b2 and i['d'] == d2]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
Select='SPECIFIC_ATTRIBUTES', AttributesToGet=['d'])
# Check that only projected attributes can be extracted
@pytest.mark.xfail(reason="LSI in alternator currently only implement full projections")
def test_lsi_get_all_projected_attributes(test_table_lsi_keys_only):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string(), 'd': random_string()} for i in range(10)]
p1, b1, d1 = items1[0]['p'], items1[0]['b'], items1[0]['d']
p2, b2, d2 = random_string(), random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2, 'd': d2}]
items = items1 + items2
with test_table_lsi_keys_only.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [{'p': i['p'], 'c': i['c'],'b': i['b']} for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_keys_only, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}})
# Check that strongly consistent reads are allowed for LSI
def test_lsi_consistent_read(test_table_lsi_1):
items1 = [{'p': random_string(), 'c': random_string(), 'b': random_string()} for i in range(10)]
p1, b1 = items1[0]['p'], items1[0]['b']
p2, b2 = random_string(), random_string()
items2 = [{'p': p2, 'c': p2, 'b': b2}]
items = items1 + items2
with test_table_lsi_1.batch_writer() as batch:
for item in items:
batch.put_item(item)
expected_items = [i for i in items if i['p'] == p1 and i['b'] == b1]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b1], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
expected_items = [i for i in items if i['p'] == p2 and i['b'] == b2]
assert_index_query(test_table_lsi_1, 'hello', expected_items,
KeyConditions={'p': {'AttributeValueList': [p2], 'ComparisonOperator': 'EQ'},
'b': {'AttributeValueList': [b2], 'ComparisonOperator': 'EQ'}},
ConsistentRead=True)
# A table with both gsi and lsi present
@pytest.fixture(scope="session")
def test_table_lsi_gsi(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[ { 'AttributeName': 'p', 'KeyType': 'HASH' }, { 'AttributeName': 'c', 'KeyType': 'RANGE' } ],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'x1', 'AttributeType': 'S' },
],
GlobalSecondaryIndexes=[
{ 'IndexName': 'hello_g1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
],
LocalSecondaryIndexes=[
{ 'IndexName': 'hello_l1',
'KeySchema': [
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'x1', 'KeyType': 'RANGE' }
],
'Projection': { 'ProjectionType': 'KEYS_ONLY' }
}
])
yield table
table.delete()
# Test that GSI and LSI can coexist, even if they're identical
def test_lsi_and_gsi(test_table_lsi_gsi):
desc = test_table_lsi_gsi.meta.client.describe_table(TableName=test_table_lsi_gsi.name)
assert 'Table' in desc
assert 'LocalSecondaryIndexes' in desc['Table']
assert 'GlobalSecondaryIndexes' in desc['Table']
lsis = desc['Table']['LocalSecondaryIndexes']
gsis = desc['Table']['GlobalSecondaryIndexes']
assert(sorted([lsi['IndexName'] for lsi in lsis]) == ['hello_l1'])
assert(sorted([gsi['IndexName'] for gsi in gsis]) == ['hello_g1'])
items = [{'p': random_string(), 'c': random_string(), 'x1': random_string()} for i in range(17)]
p1, c1, x1 = items[0]['p'], items[0]['c'], items[0]['x1']
with test_table_lsi_gsi.batch_writer() as batch:
for item in items:
batch.put_item(item)
for index in ['hello_g1', 'hello_l1']:
expected_items = [i for i in items if i['p'] == p1 and i['x1'] == x1]
assert_index_query(test_table_lsi_gsi, index, expected_items,
KeyConditions={'p': {'AttributeValueList': [p1], 'ComparisonOperator': 'EQ'},
'x1': {'AttributeValueList': [x1], 'ComparisonOperator': 'EQ'}})

View File

@@ -1,60 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Test for operations on items with *nested* attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string
# Test that we can write a top-level attribute that is a nested document, and
# read it back correctly.
def test_nested_document_attribute_write(test_table_s):
nested_value = {
'a': 3,
'b': {'c': 'hello', 'd': ['hi', 'there', {'x': 'y'}, '42']},
}
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': nested_value})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': nested_value}
# Test that if we have a top-level attribute that is a nested document (i.e.,
# a dictionary), updating this attribute will replace it entirely by a new
# nested document - not merge into the old content with the new content.
def test_nested_document_attribute_overwrite(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': {'c': 5}, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'c': 5}, 'd': 5}
# Moreover, we can overwrite an entire nested document by, say, a string,
# and that's also fine.
def test_nested_document_attribute_overwrite_2(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a': {'Value': 'hi', 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hi', 'd': 5}
# Verify that AttributeUpdates cannot be used to update a nested attribute -
# trying to use a dot in the name of the attribute, will just create one with
# an actual dot in its name.
def test_attribute_updates_dot(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, AttributeUpdates={'a.b': {'Value': 3, 'Action': 'PUT'}})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}

View File

@@ -1,201 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the various operations (GetItem, Query, Scan) with a
# ProjectionExpression parameter.
#
# ProjectionExpression is an expension of the legacy AttributesToGet
# parameter. Both parameters request that only a subset of the attributes
# be fetched for each item, instead of all of them. But while AttributesToGet
# was limited to top-level attributes, ProjectionExpression can request also
# nested attributes.
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, full_query, multiset
# Basic test for ProjectionExpression, requesting only top-level attributes.
# Result should include the selected attributes only - if one wants the key
# attributes as well, one needs to select them explicitly. When no key
# attributes are selected, an item may have *none* of the selected
# attributes, and returned as an empty item.
def test_projection_expression_toplevel(test_table):
p = random_string()
c = random_string()
item = {'p': p, 'c': c, 'a': 'hello', 'b': 'hi'}
test_table.put_item(Item=item)
for wanted in [ ['a'], # only non-key attribute
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # Our item doesn't have this
]:
got_item = test_table.get_item(Key={'p': p, 'c': c}, ProjectionExpression=",".join(wanted), ConsistentRead=True)['Item']
expected_item = {k: item[k] for k in wanted if k in item}
assert expected_item == got_item
# Various simple tests for ProjectionExpression's syntax, using only top-evel
# attributes.
def test_projection_expression_toplevel_syntax(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a'})['Item'] == {'a': 'hello'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,b')['Item'] == {'a': 'hello', 'b': 'hi'}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=' a , b ')['Item'] == {'a': 'hello', 'b': 'hi'}
# Missing or unused names in ExpressionAttributeNames are errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#wrong': 'a'})['Item'] == {'a': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='#name', ExpressionAttributeNames={'#name': 'a', '#unused': 'b'})['Item'] == {'a': 'hello'}
# It is not allowed to fetch the same top-level attribute twice (or in
# general, list two overlapping attributes). We get an error like
# "Invalid ProjectionExpression: Two document paths overlap with each
# other; must remove or rewrite one of these paths; path one: [a], path
# two: [a]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a')['Item']
# A comma with nothing after it is a syntax error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression=',a')['Item']
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,,b')['Item']
# An empty ProjectionExpression is not allowed. DynamoDB recognizes its
# syntax, but then writes: "Invalid ProjectionExpression: The expression
# can not be empty".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='')['Item']
# The following two tests are similar to test_projection_expression_toplevel()
# which tested the GetItem operation - but these test Scan and Query.
# Both test ProjectionExpression with only top-level attributes.
def test_projection_expression_scan(filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_scan(table, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_projection_expression_query(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression=",".join(wanted))
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# The previous tests all fetched only top-level attributes. They could all
# be written using AttributesToGet instead of ProjectionExpression (and,
# in fact, we do have similar tests with AttributesToGet in other files),
# but the previous test checked that the alternative syntax works correctly.
# The following test checks fetching more elaborate attribute paths from
# nested documents.
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_projection_expression_path(test_table_s):
p = random_string()
test_table_s.put_item(Item={
'p': p,
'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5},
'b': 'hello'
})
# Fetching the entire nested document "a" works, of course:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}], 'c': 5}}
# If we fetch a.b, we get only the content of b - but it's still inside
# the a dictionary:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b')['Item'] == {'a': {'b': [2, 4, {'x': 'hi', 'y': 'yo'}]}}
# Similarly, fetching a.b[0] gives us a one-element array in a dictionary.
# Note that [0] is the first element of an array.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0]')['Item'] == {'a': {'b': [2]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2]')['Item'] == {'a': {'b': [{'x': 'hi', 'y': 'yo'}]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[2].y')['Item'] == {'a': {'b': [{'y': 'yo'}]}}
# Trying to read any sort of non-existant attribute returns an empty item.
# This includes a non-existing top-level attribute, an attempt to read
# beyond the end of an array or a non-existant member of a dictionary, as
# well as paths which begin with a non-existant prefix.
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3]')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.x.y')['Item'] == {}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[3].x')['Item'] == {}
# We can read multiple paths - the result are merged into one object
# structured the same was as in the original item:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[1]')['Item'] == {'a': {'b': [2, 4]}}
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.c')['Item'] == {'a': {'b': [2], 'c': 5}}
# It is not allowed to read the same path multiple times. The error from
# DynamoDB looks like: "Invalid ProjectionExpression: Two document paths
# overlap with each other; must remove or rewrite one of these paths;
# path one: [a, b, [0]], path two: [a, b, [0]]".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a.b[0],a.b[0]')['Item']
# Two paths are considered to "overlap" if the content of one path
# contains the content of the second path. So requesting both "a" and
# "a.b[0]" is not allowed.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a,a.b[0]')['Item']
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_query_projection_expression_path(test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression="a.x")
expected_items = [{'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
@pytest.mark.xfail(reason="ProjectionExpression does not yet support attribute paths")
def test_scan_projection_expression_path(test_table):
# This test is similar to test_query_projection_expression_path above,
# but uses a scan instead of a query. The scan will generate unrelated
# partitions created by other tests (hopefully not too many...) that we
# need to ignore. We also need to ask for "p" too, so we can filter by it.
p = random_string()
items = [{'p': p, 'c': str(i), 'a': {'x': str(i*10), 'y': 'hi'}, 'b': 'hello' } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = [ x for x in full_scan(test_table, ProjectionExpression="p, a.x") if x['p'] == p]
expected_items = [{'p': p, 'a': {'x': x['a']['x']}} for x in items]
assert multiset(expected_items) == multiset(got_items)
# It is not allowed to use both ProjectionExpression and its older cousin,
# AttributesToGet, together. If trying to do this, DynamoDB produces an error
# like "Can not use both expression and non-expression parameters in the same
# request: Non-expression parameters: {AttributesToGet} Expression
# parameters: {ProjectionExpression}
def test_projection_expression_and_attributes_to_get(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
with pytest.raises(ClientError, match='ValidationException.*both'):
test_table_s.get_item(Key={'p': p}, ConsistentRead=True, ProjectionExpression='a', AttributesToGet=['b'])['Item']
with pytest.raises(ClientError, match='ValidationException.*both'):
full_scan(test_table_s, ProjectionExpression='a', AttributesToGet=['a'])
with pytest.raises(ClientError, match='ValidationException.*both'):
full_query(test_table_s, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, ProjectionExpression='a', AttributesToGet=['a'])

View File

@@ -1,358 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Query operation
import random
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string, random_bytes, full_query, multiset
from boto3.dynamodb.conditions import Key, Attr
# Test that scanning works fine with in-stock paginator
def test_query_basic_restrictions(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
# EQ
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long']) == multiset(got_items)
# LT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['12'], 'ComparisonOperator': 'LT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
# LE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'LE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] <= '14']) == multiset(got_items)
# GT
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['15'], 'ComparisonOperator': 'GT'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] > '15']) == multiset(got_items)
# GE
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['14'], 'ComparisonOperator': 'GE'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '14']) == multiset(got_items)
# BETWEEN
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['155', '164'], 'ComparisonOperator': 'BETWEEN'}
}):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] >= '155' and item['c'] <= '164']) == multiset(got_items)
# BEGINS_WITH
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': ['11'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
print([item for item in items if item['p'] == 'long' and item['c'].startswith('11')])
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'].startswith('11')]) == multiset(got_items)
# Test that KeyConditionExpression parameter is supported
@pytest.mark.xfail(reason="KeyConditionExpression not supported yet")
def test_query_key_condition_expression(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('query')
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditionExpression=Key("p").eq("long") & Key("c").lt("12")):
got_items += page['Items']
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['c'] < '12']) == multiset(got_items)
def test_begins_with(dynamodb, test_table):
paginator = dynamodb.meta.client.get_paginator('query')
items = [{'p': 'unorthodox_chars', 'c': sort_key, 'str': 'a'} for sort_key in [u'ÿÿÿ', u'cÿbÿ', u'cÿbÿÿabg'] ]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
# TODO(sarna): Once bytes type is supported, /xFF character should be tested
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'ÿÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'ÿÿ')])
got_items = []
for page in paginator.paginate(TableName=test_table.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [u'cÿbÿ'], 'ComparisonOperator': 'BEGINS_WITH'}
}):
got_items += page['Items']
print(got_items)
assert sorted([d['c'] for d in got_items]) == sorted([d['c'] for d in items if d['c'].startswith(u'cÿbÿ')])
def test_begins_with_wrong_type(dynamodb, test_table_sn):
paginator = dynamodb.meta.client.get_paginator('query')
with pytest.raises(ClientError, match='ValidationException'):
for page in paginator.paginate(TableName=test_table_sn.name, KeyConditions={
'p' : {'AttributeValueList': ['unorthodox_chars'], 'ComparisonOperator': 'EQ'},
'c' : {'AttributeValueList': [17], 'ComparisonOperator': 'BEGINS_WITH'}
}):
pass
# Items returned by Query should be sorted by the sort key. The following
# tests verify that this is indeed the case, for the three allowed key types:
# strings, binary, and numbers. These tests test not just the Query operation,
# but inherently that the sort-key sorting works.
def test_query_sort_order_string(test_table):
# Insert a lot of random items in one new partition:
# str(i) has a non-obvious sort order (e.g., "100" comes before "2") so is a nice test.
p = random_string()
items = [{'p': p, 'c': str(i)} for i in range(128)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the items
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Verify that got_sort_keys are already sorted (in string order)
assert sorted(got_sort_keys) == got_sort_keys
# Verify that got_sort_keys are a sorted version of the expected sort_keys
assert sorted(sort_keys) == got_sort_keys
def test_query_sort_order_bytes(test_table_sb):
# Insert a lot of random items in one new partition:
# We arbitrarily use random_bytes with a random length.
p = random_string()
items = [{'p': p, 'c': random_bytes(10)} for i in range(128)]
with test_table_sb.batch_writer() as batch:
for item in items:
batch.put_item(item)
got_items = full_query(test_table_sb, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
assert len(items) == len(got_items)
sort_keys = [x['c'] for x in items]
got_sort_keys = [x['c'] for x in got_items]
# Boto3's "Binary" objects are sorted as if bytes are signed integers.
# This isn't the order that DynamoDB itself uses (byte 0 should be first,
# not byte -128). Sorting the byte array ".value" works.
assert sorted(got_sort_keys, key=lambda x: x.value) == got_sort_keys
assert sorted(sort_keys) == got_sort_keys
def test_query_sort_order_number(test_table_sn):
# This is a list of numbers, sorted in correct order, and each suitable
# for accurate representation by Alternator's number type.
numbers = [
Decimal("-2e10"),
Decimal("-7.1e2"),
Decimal("-4.1"),
Decimal("-0.1"),
Decimal("-1e-5"),
Decimal("0"),
Decimal("2e-5"),
Decimal("0.15"),
Decimal("1"),
Decimal("1.00000000000000000000000001"),
Decimal("3.14159"),
Decimal("3.1415926535897932384626433832795028841"),
Decimal("31.4"),
Decimal("1.4e10"),
]
# Insert these numbers, in random order, into one partition:
p = random_string()
items = [{'p': p, 'c': num} for num in random.sample(numbers, len(numbers))]
with test_table_sn.batch_writer() as batch:
for item in items:
batch.put_item(item)
# Finally, verify that we get back exactly the same numbers (with identical
# precision), and in their original sorted order.
got_items = full_query(test_table_sn, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
got_sort_keys = [x['c'] for x in got_items]
assert got_sort_keys == numbers
def test_query_filtering_attributes_equality(filled_test_table):
test_table, items = filled_test_table
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
query_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxx" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "yy" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_query_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if item['p'] == 'long' and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
# QueryFilter can only contain non-key attributes in order to be compatible
def test_query_filtering_key_equality(filled_test_table):
test_table, items = filled_test_table
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"c" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
with pytest.raises(ClientError, match='ValidationException'):
query_filter = {
"attribute" : {
"AttributeValueList" : [ "x" ],
"ComparisonOperator": "EQ"
},
"p" : {
"AttributeValueList" : [ "5" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': ['long'], 'ComparisonOperator': 'EQ'}}, QueryFilter=query_filter)
print(got_items)
# Test Query with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_query_attributes_to_get(dynamodb, test_table):
p = random_string()
items = [{'p': p, 'c': str(i), 'a': str(i*10), 'b': str(i*100) } for i in range(10)]
with test_table.batch_writer() as batch:
for item in items:
batch.put_item(item)
for wanted in [ ['a'], # only non-key attributes
['c', 'a'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}}, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
# Test that in a table with both hash key and sort key, which keys we can
# Query by: We can Query by the hash key, by a combination of both hash and
# sort keys, but *cannot* query by just the sort key, and obviously not
# by any non-key column.
def test_query_which_key(test_table):
p = random_string()
c = random_string()
p2 = random_string()
c2 = random_string()
item1 = {'p': p, 'c': c}
item2 = {'p': p, 'c': c2}
item3 = {'p': p2, 'c': c}
for i in [item1, item2, item3]:
test_table.put_item(Item=i)
# Query by hash key only:
got_items = full_query(test_table, KeyConditions={'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'}})
expected_items = [item1, item2]
assert multiset(expected_items) == multiset(got_items)
# Query by hash key *and* sort key (this is basically a GetItem):
got_items = full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
expected_items = [item1]
assert multiset(expected_items) == multiset(got_items)
# Query by sort key alone is not allowed. DynamoDB reports:
# "Query condition missed key schema element: p".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# Query by a non-key isn't allowed, for the same reason - that the
# actual hash key (p) is missing in the query:
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try both p and a non-key we get a complaint that the sort
# key is missing: "Query condition missed key schema element: c"
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})
# If we try p, c and another key, we get an error that
# "Conditions can be of length 1 or 2 only".
with pytest.raises(ClientError, match='ValidationException'):
full_query(test_table, KeyConditions={
'p': {'AttributeValueList': [p], 'ComparisonOperator': 'EQ'},
'c': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'},
'z': {'AttributeValueList': [c], 'ComparisonOperator': 'EQ'}
})

View File

@@ -1,191 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the Scan operation
import pytest
from botocore.exceptions import ClientError
from util import random_string, full_scan, multiset
from boto3.dynamodb.conditions import Attr
# Test that scanning works fine with/without pagination
def test_scan_basic(filled_test_table):
test_table, items = filled_test_table
for limit in [None,1,2,4,33,50,100,9007,16*1024*1024]:
pos = None
got_items = []
while True:
if limit:
response = test_table.scan(Limit=limit, ExclusiveStartKey=pos) if pos else test_table.scan(Limit=limit)
assert len(response['Items']) <= limit
else:
response = test_table.scan(ExclusiveStartKey=pos) if pos else test_table.scan()
pos = response.get('LastEvaluatedKey', None)
got_items += response['Items']
if not pos:
break
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
def test_scan_with_paginator(dynamodb, filled_test_table):
test_table, items = filled_test_table
paginator = dynamodb.meta.client.get_paginator('scan')
got_items = []
for page in paginator.paginate(TableName=test_table.name):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
for page_size in [1, 17, 1234]:
got_items = []
for page in paginator.paginate(TableName=test_table.name, PaginationConfig={'PageSize': page_size}):
got_items += page['Items']
assert len(items) == len(got_items)
assert multiset(items) == multiset(got_items)
# Although partitions are scanned in seemingly-random order, inside a
# partition items must be returned by Scan sorted in sort-key order.
# This test verifies this, for string sort key. We'll need separate
# tests for the other sort-key types (number and binary)
def test_scan_sort_order_string(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table)
assert len(items) == len(got_items)
# Extract just the sort key ("c") from the partition "long"
items_long = [x['c'] for x in items if x['p'] == 'long']
got_items_long = [x['c'] for x in got_items if x['p'] == 'long']
# Verify that got_items_long are already sorted (in string order)
assert sorted(got_items_long) == got_items_long
# Verify that got_items_long are a sorted version of the expected items_long
assert sorted(items_long) == got_items_long
# Test Scan with the AttributesToGet parameter. Result should include the
# selected attributes only - if one wants the key attributes as well, one
# needs to select them explicitly. When no key attributes are selected,
# some items may have *none* of the selected attributes. Those items are
# returned too, as empty items - they are not outright missing.
def test_scan_attributes_to_get(dynamodb, filled_test_table):
table, items = filled_test_table
for wanted in [ ['another'], # only non-key attributes (one item doesn't have it!)
['c', 'another'], # a key attribute (sort key) and non-key
['p', 'c'], # entire key
['nonexistent'] # none of the items have this attribute!
]:
print(wanted)
got_items = full_scan(table, AttributesToGet=wanted)
expected_items = [{k: x[k] for k in wanted if k in x} for x in items]
assert multiset(expected_items) == multiset(got_items)
def test_scan_with_attribute_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter = {
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" ]
assert multiset(expected_items) == multiset(got_items)
scan_filter = {
"another" : {
"AttributeValueList" : [ "y" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "xxxxx" ],
"ComparisonOperator": "EQ"
}
}
got_items = full_scan(table, ScanFilter=scan_filter)
expected_items = [item for item in items if "attribute" in item.keys() and item["attribute"] == "xxxxx" and item["another"] == "y" ]
assert multiset(expected_items) == multiset(got_items)
# Test that FilterExpression works as expected
@pytest.mark.xfail(reason="FilterExpression not supported yet")
def test_scan_filter_expression(filled_test_table):
test_table, items = filled_test_table
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and item['attribute'] == 'xxxx']) == multiset(got_items)
got_items = full_scan(test_table, FilterExpression=Attr("attribute").eq("xxxx") & Attr("another").eq("yy"))
print(got_items)
assert multiset([item for item in items if 'attribute' in item.keys() and 'another' in item.keys() and item['attribute'] == 'xxxx' and item['another'] == 'yy']) == multiset(got_items)
def test_scan_with_key_equality_filtering(dynamodb, filled_test_table):
table, items = filled_test_table
scan_filter_p = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
}
}
scan_filter_p_and_attribute = {
"p" : {
"AttributeValueList" : [ "7" ],
"ComparisonOperator": "EQ"
},
"attribute" : {
"AttributeValueList" : [ "x"*7 ],
"ComparisonOperator": "EQ"
}
}
scan_filter_c_and_another = {
"c" : {
"AttributeValueList" : [ "9" ],
"ComparisonOperator": "EQ"
},
"another" : {
"AttributeValueList" : [ "y"*16 ],
"ComparisonOperator": "EQ"
}
}
# Filtering on the hash key
got_items = full_scan(table, ScanFilter=scan_filter_p)
expected_items = [item for item in items if "p" in item.keys() and item["p"] == "7" ]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key
got_items = full_scan(table, ScanFilter=scan_filter_c)
expected_items = [item for item in items if "c" in item.keys() and item["c"] == "9"]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the hash key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_p_and_attribute)
expected_items = [item for item in items if "p" in item.keys() and "another" in item.keys() and item["p"] == "7" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)
# Filtering on the sort key and an attribute
got_items = full_scan(table, ScanFilter=scan_filter_c_and_another)
expected_items = [item for item in items if "c" in item.keys() and "another" in item.keys() and item["c"] == "9" and item["another"] == "y"*16]
assert multiset(expected_items) == multiset(got_items)

View File

@@ -1,276 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for basic table operations: CreateTable, DeleteTable, ListTables.
import pytest
from botocore.exceptions import ClientError
from util import list_tables, test_table_name, create_test_table, random_string
# Utility function for create a table with a given name and some valid
# schema.. This function initiates the table's creation, but doesn't
# wait for the table to actually become ready.
def create_table(dynamodb, name, BillingMode='PAY_PER_REQUEST', **kwargs):
return dynamodb.create_table(
TableName=name,
BillingMode=BillingMode,
KeySchema=[
{
'AttributeName': 'p',
'KeyType': 'HASH'
},
{
'AttributeName': 'c',
'KeyType': 'RANGE'
}
],
AttributeDefinitions=[
{
'AttributeName': 'p',
'AttributeType': 'S'
},
{
'AttributeName': 'c',
'AttributeType': 'S'
},
],
**kwargs
)
# Utility function for creating a table with a given name, and then deleting
# it immediately, waiting for these operations to complete. Since the wait
# uses DescribeTable, this function requires all of CreateTable, DescribeTable
# and DeleteTable to work correctly.
# Note that in DynamoDB, table deletion takes a very long time, so tests
# successfully using this function are very slow.
def create_and_delete_table(dynamodb, name, **kwargs):
table = create_table(dynamodb, name, **kwargs)
table.meta.client.get_waiter('table_exists').wait(TableName=name)
table.delete()
table.meta.client.get_waiter('table_not_exists').wait(TableName=name)
##############################################################################
# Test creating a table, and then deleting it, waiting for each operation
# to have completed before proceeding. Since the wait uses DescribeTable,
# this tests requires all of CreateTable, DescribeTable and DeleteTable to
# function properly in their basic use cases.
# Unfortunately, this test is extremely slow with DynamoDB because deleting
# a table is extremely slow until it really happens.
def test_create_and_delete_table(dynamodb):
create_and_delete_table(dynamodb, 'alternator_test')
# DynamoDB documentation specifies that table names must be 3-255 characters,
# and match the regex [a-zA-Z0-9._-]+. Names not matching these rules should
# be rejected, and no table be created.
def test_create_table_unsupported_names(dynamodb):
from botocore.exceptions import ParamValidationError, ClientError
# Intererstingly, the boto library tests for names shorter than the
# minimum length (3 characters) immediately, and failure results in
# ParamValidationError. But the other invalid names are passed to
# DynamoDB, which returns an HTTP response code, which results in a
# CientError exception.
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'n')
with pytest.raises(ParamValidationError):
create_table(dynamodb, 'nn')
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'n' * 256)
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, 'nyh@test')
# On the other hand, names following the above rules should be accepted. Even
# names which the Scylla rules forbid, such as a name starting with .
def test_create_and_delete_table_non_scylla_name(dynamodb):
create_and_delete_table(dynamodb, '.alternator_test')
# names with 255 characters are allowed in Dynamo, but they are not currently
# supported in Scylla because we create a directory whose name is the table's
# name followed by 33 bytes (underscore and UUID). So currently, we only
# correctly support names with length up to 222.
def test_create_and_delete_table_very_long_name(dynamodb):
# In the future, this should work:
#create_and_delete_table(dynamodb, 'n' * 255)
# But for now, only 222 works:
create_and_delete_table(dynamodb, 'n' * 222)
# We cannot test the following on DynamoDB because it will succeed
# (DynamoDB allows up to 255 bytes)
#with pytest.raises(ClientError, match='ValidationException'):
# create_table(dynamodb, 'n' * 223)
# Tests creating a table with an invalid schema should return a
# ValidationException error.
def test_create_table_invalid_schema(dynamodb):
# The name of the table "created" by this test shouldn't matter, the
# creation should not succeed anyway.
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'HASH' },
{ 'AttributeName': 'c', 'KeyType': 'HASH' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
{ 'AttributeName': 'p', 'KeyType': 'RANGE' },
{ 'AttributeName': 'z', 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': 'c', 'AttributeType': 'S' },
{ 'AttributeName': 'p', 'AttributeType': 'S' },
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'c', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'z', 'AttributeType': 'S' }
],
)
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(
TableName='name_doesnt_matter',
BillingMode='PAY_PER_REQUEST',
KeySchema=[
{ 'AttributeName': 'k', 'KeyType': 'HASH' },
],
AttributeDefinitions=[
{ 'AttributeName': 'k', 'AttributeType': 'Q' }
],
)
# Test that trying to create a table that already exists fails in the
# appropriate way (ResourceInUseException)
def test_create_table_already_exists(dynamodb, test_table):
with pytest.raises(ClientError, match='ResourceInUseException'):
create_table(dynamodb, test_table.name)
# Test that BillingMode error path works as expected - only the values
# PROVISIONED or PAY_PER_REQUEST are allowed. The former requires
# ProvisionedThroughput to be set, the latter forbids it.
# If BillingMode is outright missing, it defaults (as original
# DynamoDB did) to PROVISIONED so ProvisionedThroughput is allowed.
def test_create_table_billing_mode_errors(dynamodb, test_table):
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='unknown')
# billing mode is case-sensitive
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='pay_per_request')
# PAY_PER_REQUEST cannot come with a ProvisionedThroughput:
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(),
BillingMode='PAY_PER_REQUEST', ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10})
# On the other hand, PROVISIONED requires ProvisionedThroughput:
# By the way, ProvisionedThroughput not only needs to appear, it must
# have both ReadCapacityUnits and WriteCapacityUnits - but we can't test
# this with boto3, because boto3 has its own verification that if
# ProvisionedThroughput is given, it must have the correct form.
with pytest.raises(ClientError, match='ValidationException'):
create_table(dynamodb, test_table_name(), BillingMode='PROVISIONED')
# If BillingMode is completely missing, it defaults to PROVISIONED, so
# ProvisionedThroughput is required
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.create_table(TableName=test_table_name(),
KeySchema=[{ 'AttributeName': 'p', 'KeyType': 'HASH' }],
AttributeDefinitions=[{ 'AttributeName': 'p', 'AttributeType': 'S' }])
# Our first implementation had a special column name called "attrs" where
# we stored a map for all non-key columns. If the user tried to name one
# of the key columns with this same name, the result was a disaster - Scylla
# goes into a bad state after trying to write data with two updates to same-
# named columns.
special_column_name1 = 'attrs'
special_column_name2 = ':attrs'
@pytest.fixture(scope="session")
def test_table_special_column_name(dynamodb):
table = create_test_table(dynamodb,
KeySchema=[
{ 'AttributeName': special_column_name1, 'KeyType': 'HASH' },
{ 'AttributeName': special_column_name2, 'KeyType': 'RANGE' }
],
AttributeDefinitions=[
{ 'AttributeName': special_column_name1, 'AttributeType': 'S' },
{ 'AttributeName': special_column_name2, 'AttributeType': 'S' },
],
)
yield table
table.delete()
@pytest.mark.xfail(reason="special attrs column not yet hidden correctly")
def test_create_table_special_column_name(test_table_special_column_name):
s = random_string()
c = random_string()
h = random_string()
expected = {special_column_name1: s, special_column_name2: c, 'hello': h}
test_table_special_column_name.put_item(Item=expected)
got = test_table_special_column_name.get_item(Key={special_column_name1: s, special_column_name2: c}, ConsistentRead=True)['Item']
assert got == expected
# Test that all tables we create are listed, and pagination works properly.
# Note that the DyanamoDB setup we run this against may have hundreds of
# other tables, for all we know. We just need to check that the tables we
# created are indeed listed.
def test_list_tables_paginated(dynamodb, test_table, test_table_s, test_table_b):
my_tables_set = {table.name for table in [test_table, test_table_s, test_table_b]}
for limit in [1, 2, 3, 4, 50, 100]:
print("testing limit={}".format(limit))
list_tables_set = set(list_tables(dynamodb, limit))
assert my_tables_set.issubset(list_tables_set)
# Test that pagination limit is validated
def test_list_tables_wrong_limit(dynamodb):
# lower limit (min. 1) is imposed by boto3 library checks
with pytest.raises(ClientError, match='ValidationException'):
dynamodb.meta.client.list_tables(Limit=101)

View File

@@ -1,854 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Tests for the UpdateItem operations with an UpdateExpression parameter
import random
import string
import pytest
from botocore.exceptions import ClientError
from decimal import Decimal
from util import random_string
# The simplest test of using UpdateExpression to set a top-level attribute,
# instead of the older AttributeUpdates parameter.
# Checks only one "SET" action in an UpdateExpression.
def test_update_expression_set(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
# An empty UpdateExpression is NOT allowed, and generates a "The expression
# can not be empty" error. This contrasts with an empty AttributeUpdates which
# is allowed, and results in the creation of an empty item if it didn't exist
# yet (see test_empty_update()).
def test_update_expression_empty(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='')
# A basic test with multiple SET actions in one expression
def test_update_expression_set_multi(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET x = :val1, y = :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'x': 4, 'y': 4}
# SET can be used to copy an existing attribute to a new one
def test_update_expression_set_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
# Copying an non-existing attribute generates an error
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = z')
# It turns out that attributes to be copied are read before the SET
# starts to write, so "SET x = :val1, y = x" does not work...
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET x = :val1, y = x', ExpressionAttributeValues={':val1': 4})
# SET z=z does nothing if z exists, or fails if it doesn't
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET z = z')
# We can also use name references in either LHS or RHS of SET, e.g.,
# SET #one = #two. We need to also take the references used in the RHS
# when we want to complain about unused names in ExpressionAttributeNames.
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hello', 'c': 'hello'}
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #one = #two',
ExpressionAttributeNames={'#one': 'c', '#two': 'a', '#three': 'z'})
# Test for read-before-write action where the value to be read is nested inside a - operator
def test_update_expression_set_nested_copy(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #n = :two',
ExpressionAttributeNames={'#n': 'n'}, ExpressionAttributeValues={':two': 2})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nn = :seven - #n',
ExpressionAttributeNames={'#nn': 'nn', '#n': 'n'}, ExpressionAttributeValues={':seven': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnn = :nnn',
ExpressionAttributeNames={'#nnn': 'nnn'}, ExpressionAttributeValues={':nnn': [2,4]})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #nnnn = list_append(:val1, #nnn)',
ExpressionAttributeNames={'#nnnn': 'nnnn', '#nnn': 'nnn'}, ExpressionAttributeValues={':val1': [1,3]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'n': 2, 'nn': 5, 'nnn': [2,4], 'nnnn': [1,3,2,4]}
# Test for getting a key value with read-before-write
def test_update_expression_set_key(test_table_sn):
p = random_string()
test_table_sn.update_item(Key={'p': p, 'c': 7});
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #n = #p',
ExpressionAttributeNames={'#n': 'n', '#p': 'p'})
test_table_sn.update_item(Key={'p': p, 'c': 7}, UpdateExpression='SET #nn = #c + #c',
ExpressionAttributeNames={'#nn': 'nn', '#c': 'c'})
assert test_table_sn.get_item(Key={'p': p, 'c': 7}, ConsistentRead=True)['Item'] == {'p': p, 'c': 7, 'n': p, 'nn': 14}
# Simple test for the "REMOVE" action
def test_update_expression_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': 'hi'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello', 'b': 'hi'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hi'}
# Demonstrate that although all DynamoDB examples give UpdateExpression
# action names in uppercase - e.g., "SET", it can actually be any case.
def test_update_expression_action_case(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression='sEt b = :val1', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# Demonstrate that whitespace is ignored in UpdateExpression parsing.
def test_update_expression_action_whitespace(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='set b = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 4}
test_table_s.update_item(Key={'p': p}, UpdateExpression=' set b=:val1 ', ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 5}
# In UpdateExpression, the attribute name can appear directly in the expression
# (without a "#placeholder" notation) only if it is a single "token" as
# determined by DynamoDB's lexical analyzer rules: Such token is composed of
# alphanumeric characters whose first character must be alphabetic. Other
# names cause the parser to see multiple tokens, and produce syntax errors.
def test_update_expression_name_token(test_table_s):
p = random_string()
# Alphanumeric names starting with an alphabetical character work
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET alnum = :val1', ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['alnum'] == 1
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['Alpha_Numeric_123'] == 2
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET A123_ = :val1', ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['A123_'] == 3
# But alphanumeric names cannot start with underscore or digits.
# DynamoDB's lexical analyzer doesn't recognize them, and produces
# a ValidationException looking like:
# Invalid UpdateExpression: Syntax error; token: "_", near: "SET _123"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _123 = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET _abc = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123a = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET 123 = :val1', ExpressionAttributeValues={':val1': 3})
# Various other non-alpha-numeric characters, split a token and NOT allowed
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi-there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET hi$there = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET "hithere" = :val1', ExpressionAttributeValues={':val1': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET !hithere = :val1', ExpressionAttributeValues={':val1': 3})
# In addition to the literal names, DynamoDB also allows references to any
# name, using the "#reference" syntax. It turns out the reference name is
# also a token following the rules as above, with one interesting point:
# since "#" already started the token, the next character may be any
# alphanumeric and doesn't need to be only alphabetical.
# Note that the reference target - the actual attribute name - can include
# absolutely any characters, and we use silly_name below as an example
silly_name = '3can include any character!.#='
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #Alpha_Numeric_123 = :val1', ExpressionAttributeValues={':val1': 4}, ExpressionAttributeNames={'#Alpha_Numeric_123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 4
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123a = :val1', ExpressionAttributeValues={':val1': 5}, ExpressionAttributeNames={'#123a': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 5
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #123 = :val1', ExpressionAttributeValues={':val1': 6}, ExpressionAttributeNames={'#123': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 6
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #_ = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#_': silly_name})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'][silly_name] == 7
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #hi-there = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#hi-there': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #!hi = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#!hi': silly_name})
# Just a "#" is not enough as a token. Interestingly, DynamoDB will
# find the bad name in ExpressionAttributeNames before it actually tries
# to parse UpdateExpression, but we can verify the parse fails too by
# using a valid but irrelevant name in ExpressionAttributeNames:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#': silly_name})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET # = :val1', ExpressionAttributeValues={':val1': 7}, ExpressionAttributeNames={'#a': silly_name})
# There is also the value references, ":reference", for the right-hand
# side of an assignment. These have similar naming rules like "#reference".
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :Alpha_Numeric_123', ExpressionAttributeValues={':Alpha_Numeric_123': 8})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 8
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123a', ExpressionAttributeValues={':123a': 9})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :123', ExpressionAttributeValues={':123': 10})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 10
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :_', ExpressionAttributeValues={':_': 11})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 11
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :hi!there', ExpressionAttributeValues={':hi!there': 12})
# Just a ":" is not enough as a token.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':': 7})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :', ExpressionAttributeValues={':a': 7})
# Trying to use a :reference on the left-hand side of an assignment will
# not work. In DynamoDB, it's a different type of token (and generates
# syntax error).
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET :a = :b', ExpressionAttributeValues={':a': 1, ':b': 2})
# Multiple actions are allowed in one expression, but actions are divided
# into clauses (SET, REMOVE, DELETE, ADD) and each of those can only appear
# once.
def test_update_expression_multi(test_table_s):
p = random_string()
# We can have two SET actions in one SET clause:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 1, 'b': 2}
# But not two SET clauses - we get error "The "SET" section can only be used once in an update expression"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# We can have a REMOVE and a SET clause (note no comma between clauses):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = :val2', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = :val2 REMOVE b', ExpressionAttributeValues={':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 3}
# The same clause (e.g., SET) cannot be used twice, even if interleaved with something else
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1 REMOVE a SET b = :val2', ExpressionAttributeValues={':val1': 1, ':val2': 2})
# Trying to modify the same item twice in the same update is forbidden.
# For "SET a=:v REMOVE a" DynamoDB says: "Invalid UpdateExpression: Two
# document paths overlap with each other; must remove or rewrite one of
# these paths; path one: [a], path two: [a]".
# It is actually good for Scylla that such updates are forbidden, because had
# we allowed "SET a=:v REMOVE a" the result would be surprising - because data
# wins over a delete with the same timestamp, so "a" would be set despite the
# REMOVE command appearing later in the command line.
def test_update_expression_multi_overlap(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# Neither "REMOVE a SET a = :v" nor "SET a = :v REMOVE a" are allowed:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET a = :v', ExpressionAttributeValues={':v': 'hi'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v REMOVE a', ExpressionAttributeValues={':v': 'yo'})
# It's also not allowed to set a twice in the same clause
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :v1, a = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'})
# Obviously, the paths are compared after the name references are evaluated
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a1 = :v1, #a2 = :v2', ExpressionAttributeValues={':v1': 'yo', ':v2': 'he'}, ExpressionAttributeNames={'#a1': 'a', '#a2': 'a'})
# The problem isn't just with identical paths - we can't modify two paths that
# "overlap" in the sense that one is the ancestor of the other.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_multi_overlap_nested(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1, a.b = :val2',
ExpressionAttributeValues={':val1': {'b': 7}, ':val2': 'there'})
test_table_s.put_item(Item={'p': p, 'a': {'b': {'c': 2}}})
with pytest.raises(ClientError, match='ValidationException.*overlap'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.b = :val1, a.b.c = :val2',
ExpressionAttributeValues={':val1': 'hi', ':val2': 'there'})
# In the previous test we saw that *modifying* the same item twice in the same
# update is forbidden; But it is allowed to *read* an item in the same update
# that also modifies it, and we check this here.
def test_update_expression_multi_with_copy(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': 'hello'}
# "REMOVE a SET b = a" works: as noted in test_update_expression_set_copy()
# the value of 'a' is read before the actual REMOVE operation happens.
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a SET b = a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 'hello'}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c = b REMOVE b')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'c': 'hello'}
# Test case where a :val1 is referenced, without being defined
def test_update_expression_set_missing_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1',
ExpressionAttributeValues={':val2': 4})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1')
# It is forbidden for ExpressionAttributeValues to contain values not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeValues unused in expressions: keys: {:val1}"
def test_update_expression_spurious_value(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test case where a #name is referenced, without being defined
def test_update_expression_set_missing_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4},
ExpressionAttributeNames={'#wrongname': 'hello'})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name = :val1',
ExpressionAttributeValues={':val2': 4})
# It is forbidden for ExpressionAttributeNames to contain names not used
# by the expression. DynamoDB produces an error like: "Value provided in
# ExpressionAttributeNames unused in expressions: keys: {#b}"
def test_update_expression_spurious_name(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeNames={'#a': 'hello', '#b': 'hi'},
ExpressionAttributeValues={':val1': 3, ':val2': 4})
# Test that the key attributes (hash key or sort key) cannot be modified
# by an update
def test_update_expression_cannot_modify_key(test_table):
p = random_string()
c = random_string()
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET p = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='SET c = :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE p')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE c')
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD p :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='ADD c :val1', ExpressionAttributeValues={':val1': 4})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE p :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
with pytest.raises(ClientError, match='ValidationException.*key'):
test_table.update_item(Key={'p': p, 'c': c},
UpdateExpression='DELETE c :val1', ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
# As sanity check, verify we *can* modify a non-key column
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='SET a = :val1', ExpressionAttributeValues={':val1': 4})
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c, 'a': 4}
test_table.update_item(Key={'p': p, 'c': c}, UpdateExpression='REMOVE a')
assert test_table.get_item(Key={'p': p, 'c': c}, ConsistentRead=True)['Item'] == {'p': p, 'c': c}
# Test that trying to start an expression with some nonsense like HELLO
# instead of SET, REMOVE, ADD or DELETE, fails.
def test_update_expression_non_existant_clause(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='HELLO b = :val1',
ExpressionAttributeValues={':val1': 4})
# Test support for "SET a = :val1 + :val2", "SET a = :val1 - :val2"
# Only exactly these combinations work - e.g., it's a syntax error to
# try to add three. Trying to add a string fails.
def test_update_expression_plus_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 4, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 7}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 - :val2',
ExpressionAttributeValues={':val1': 5, ':val2': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': 3}
# Only the addition of exactly two values is supported!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2 + :val3',
ExpressionAttributeValues={':val1': 4, ':val2': 3, ':val3': 2})
# Only numeric values can be added - other things like strings or lists
# cannot be added, and we get an error like "Incorrect operand type for
# operator or function; operator or function: +, operand type: S".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': 'dog', ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
# While most of the Alternator code just saves high-precision numbers
# unchanged, the "+" and "-" operations need to calculate with them, and
# we should check the calculation isn't done with some lower-precision
# representation, e.g., double
def test_update_expression_plus_precision(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + :val2',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("10000000000000000000001")}
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val2 - :val1',
ExpressionAttributeValues={':val1': Decimal("1"), ':val2': Decimal("10000000000000000000000")})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'b': Decimal("9999999999999999999999")}
# Test support for "SET a = b + :val2" et al., i.e., a version of the
# above test_update_expression_plus_basic with read before write.
def test_update_expression_plus_rmw(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = a + :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1 + a',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 9
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = :val1 + a',
ExpressionAttributeValues={':val1': 1})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 10
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = b + a')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 19
# Test the list_append() function in SET, for the most basic use case of
# concatenating two value references. Because this is the first test of
# functions in SET, we also test some generic features of how functions
# are parsed.
def test_update_expression_list_append_basic(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': [4, 'hello', 'hi', 7]}
# Unlike the operation name "SET", function names are case-sensitive!
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = LIST_APPEND(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': ['hi', 7]})
# As usual, spaces are ignored by the parser
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['b']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['a', 'b']}
# The list_append function only allows two parameters. The parser can
# correctly parse fewer or more, but then an error is generated: "Invalid
# UpdateExpression: Incorrect number of operands for operator or function;
# operator or function: list_append, number of operands: 1".
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1)',
ExpressionAttributeValues={':val1': ['a']})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2, :val3)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': [7], ':val3': ['a']})
# If list_append is used on value which isn't a list, we get
# error: "Invalid UpdateExpression: Incorrect operand type for operator
# or function; operator or function: list_append, operand type: S"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, :val2)',
ExpressionAttributeValues={':val1': [4, 'hello'], ':val2': 'hi'})
# Additional list_append() tests, also using attribute paths as parameters
# (i.e., read-modify-write).
def test_update_expression_list_append(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = :val1',
ExpressionAttributeValues={':val1': ['hi', 2]})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] ==['hi', 2]
# Often, list_append is used to append items to a list attribute
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(a, :val1)',
ExpressionAttributeValues={':val1': [4, 'hello']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['hi', 2, 4, 'hello']
# But it can also be used to just concatenate in other ways:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(:val1, a)',
ExpressionAttributeValues={':val1': ['dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = list_append(a, :val1)',
ExpressionAttributeValues={':val1': ['cat']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == ['dog', 'hi', 2, 4, 'hello', 'cat']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = list_append(a, b)')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['c'] == ['dog', 'hi', 2, 4, 'hello', 'dog', 'hi', 2, 4, 'hello', 'cat']
# As usual, #references are allowed instead of inline names:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET #name1 = list_append(#name2,:val1)',
ExpressionAttributeValues={':val1': [8]},
ExpressionAttributeNames={'#name1': 'a', '#name2': 'a'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['dog', 'hi', 2, 4, 'hello', 8]
# Test the "if_not_exists" function in SET
# The test also checks additional features of function-call parsing.
def test_update_expression_if_not_exists(test_table_s):
p = random_string()
# Since attribute a doesn't exist, set it:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 2})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# Now the attribute does exist, so set does nothing:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
# if_not_exists can also be used to check one attribute and set another,
# but note that if_not_exists(a, :val) means a's value if it exists,
# otherwise :val!
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 4
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 2
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(c, :val1)',
ExpressionAttributeValues={':val1': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(a, :val1)',
ExpressionAttributeValues={':val1': 6})
# note how because 'a' does exist, its value is copied, overwriting b's
# value:
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 2
# The parser expects function parameters to be value references, paths,
# or nested call to functions. Other crap will cause syntax errors:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(non@sense, :val1)',
ExpressionAttributeValues={':val1': 6})
# if_not_exists() requires that the first parameter be a path. However,
# the parser doesn't know this, and allows for a function parameter
# also a value reference or a function call. If try one of these other
# things the parser succeeds, but we get a later error, looking like:
# "Invalid UpdateExpression: Operator or function requires a document
# path; operator or function: if_not_exists"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(if_not_exists(a, :val2), :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6, ':val2': 3})
# Surprisingly, if the wrong argument is a :val value reference, the
# parser first tries to look it up in ExpressionAttributeValues (and
# fails if it's missing), before realizing any value reference would be
# wrong... So the following fails like the above does - but with a
# different error message (which we do not check here): "Invalid
# UpdateExpression: An expression attribute value used in expression
# is not defined; attribute value: :val2"
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(:val2, :val1)',
ExpressionAttributeValues={':val1': 6})
# When the expression parser parses a function call f(value, value), each
# value may itself be a function call - ad infinitum. So expressions like
# list_append(if_not_exists(a, :val1), :val2) are legal and so is deeper
# nesting.
@pytest.mark.xfail(reason="for unknown reason, DynamoDB does not allow nesting list_append")
def test_update_expression_function_nesting(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['cat', 'dog']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog']
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(if_not_exists(a, :val1), :val2)',
ExpressionAttributeValues={':val1': ['a', 'b'], ':val2': ['1', '2']})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == ['a', 'b', 'cat', 'dog', '1', '2']
# I don't understand why the following expression isn't accepted, but it
# isn't! It produces a "Invalid UpdateExpression: The function is not
# allowed to be used this way in an expression; function: list_append".
# I don't know how to explain it. In any case, the *parsing* works -
# this is not a syntax error - the failure is in some verification later.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(:val1, :val2), :val3)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi']})
# Ditto, the following passes the parser but fails some later check with
# the same error message as above.
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = list_append(list_append(list_append(:val1, :val2), :val3), :val4)',
ExpressionAttributeValues={':val1': ['a'], ':val2': ['1'], ':val3': ['hi'], ':val4': ['yo']})
# Verify how in SET expressions, "+" (or "-") nests with functions.
# We discover that f(x)+f(y) works but f(x+y) does NOT (results in a syntax
# error on the "+"). This means that the parser has two separate rules:
# 1. set_action: SET path = value + value
# 2. value: VALREF | NAME | NAME (value, ...)
def test_update_expression_function_plus_nesting(test_table_s):
p = random_string()
# As explained above, this - with "+" outside the expression, works:
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET b = if_not_exists(b, :val1)+:val2',
ExpressionAttributeValues={':val1': 2, ':val2': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['b'] == 5
# ...but this - with the "+" inside an expression parameter, is a syntax
# error:
with pytest.raises(ClientError, match='ValidationException'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET c = if_not_exists(c, :val1+:val2)',
ExpressionAttributeValues={':val1': 5, ':val2': 4})
# This test tries to use an undefined function "f". This, obviously, fails,
# but where we to actually print the error we would see "Invalid
# UpdateExpression: Invalid function name; function: f". Not a syntax error.
# This means that the parser accepts any alphanumeric name as a function
# name, and only later use of this function fails because it's not one of
# the supported file.
def test_update_expression_unknown_function(test_table_s):
p = random_string()
with pytest.raises(ClientError, match='ValidationException.*f'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f(b,c,d)')
with pytest.raises(ClientError, match='ValidationException.*f123_hi'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = f123_hi(b,c,d)')
# Just like unreferenced column names parsed by the DynamoDB parser,
# function names must also start with an alphabetic character. Trying
# to use _f as a function name will result with an actual syntax error,
# on the "_" token.
with pytest.raises(ClientError, match='ValidationException.*yntax error'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='SET a = _f(b,c,d)')
# Test "ADD" operation for numbers
def test_update_expression_add_numbers(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 3, 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 4})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == 7
# If the value to be added isn't a number, we get an error like "Invalid
# UpdateExpression: Incorrect operand type for operator or function;
# operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Similarly, if the attribute we're adding to isn't a number, we get an
# error like "An operand in the update expression has an incorrect data
# type"
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD b :val1',
ExpressionAttributeValues={':val1': 1})
# Test "ADD" operation for sets
def test_update_expression_add_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog', 'cat', 'mouse', 'pig'])
# TODO: right now this test won't detect duplicated values in the returned result,
# because boto3 parses a set out of the returned JSON anyway. This check should leverage
# lower level API (if exists) to ensure that the JSON contains no duplicates
# in the set representation. It has been verified manually.
test_table_s.put_item(Item={'p': p, 'a': set(['beaver', 'lynx', 'coati']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': set(['coati', 'beaver', 'badger'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['beaver', 'badger', 'lynx', 'coati'])
# The value to be added needs to be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: ADD, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='ADD a :val1',
ExpressionAttributeValues={':val1': 'hello'})
# Test "DELETE" operation for sets
def test_update_expression_delete_sets(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': set(['dog', 'cat', 'mouse']), 'b': 'hi'})
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['cat', 'mouse'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# Deleting an element not present in the set is not an error - it just
# does nothing
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': set(['pig'])})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == set(['dog'])
# The value to be deleted must be a set of the same type - it can't
# be a single element or anything else. If the value has the wrong type,
# we get an error like "Invalid UpdateExpression: Incorrect operand type
# for operator or function; operator: DELETE, operand type: STRING".
with pytest.raises(ClientError, match='ValidationException.*type'):
test_table_s.update_item(Key={'p': p},
UpdateExpression='DELETE a :val1',
ExpressionAttributeValues={':val1': 'hello'})
######## Tests for paths and nested attribute updates:
# A dot inside a name in ExpressionAttributeNames is a literal dot, and
# results in a top-level attribute with an actual dot in its name - not
# a nested attribute path.
def test_update_expression_dot_in_name(test_table_s):
p = random_string()
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET #a = :val1',
ExpressionAttributeValues={':val1': 3},
ExpressionAttributeNames={'#a': 'a.b'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a.b': 3}
# A basic test for direct update of a nested attribute: One of the top-level
# attributes is itself a document, and we update only one of that document's
# nested attributes.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 4}, 'd': 5}
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}, 'd': 5}
# Of course we can also add new nested attributes, not just modify
# existing ones:
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.d = :val1',
ExpressionAttributeValues={':val1': 3})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7, 'd': 3}, 'd': 5}
# Similar test, for a list: one of the top-level attributes is a list, we
# can update one of its items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_attribute_index(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[1] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'hello', 'three']}
# Test that just like happens in top-level attributes, also in nested
# attributes, setting them replaces the old value - potentially an entire
# nested document, by the whole value (which may have a different type)
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_different_type(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': {'one': 1, 'two': 2}}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': {'b': 3, 'c': 7}}
# Yet another test of a nested attribute update. This one uses deeper
# level of nesting (dots and indexes), adds #name references to the mix.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_deep(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c[1].#name.y[1] = :val1',
ExpressionAttributeValues={':val1': 9}, ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 9, 7]}}]}
# A deep path can also appear on the right-hand-side of an assignment
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.z = a.c[1].#name.y[1]',
ExpressionAttributeNames={'#name': 'x'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a']['z'] == 9
# A REMOVE operation can be used to remove nested attributes, and also
# individual list items.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_update_expression_nested_remove(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': {'b': 3, 'c': ['hi', {'x': {'y': [3, 5, 7]}, 'q': 2}]}})
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.c[1].x.y[1], a.c[1].q')
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item']['a'] == {'b': 3, 'c': ['hi', {'x': {'y': [3, 7]}}]}
# The DynamoDB documentation specifies: "When you use SET to update a list
# element, the contents of that element are replaced with the new data that
# you specify. If the element does not already exist, SET will append the
# new element at the end of the list."
# So if we take a three-element list a[7], and set a[7], the new element
# will be put at the end of the list, not position 7 specifically.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_array_out_of_bounds(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': ['one', 'two', 'three']})
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[7] = :val1',
ExpressionAttributeValues={':val1': 'hello'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello']}
# The DynamoDB documentation also says: "If you add multiple elements
# in a single SET operation, the elements are sorted in order by element
# number.
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[84] = :val1, a[37] = :val2',
ExpressionAttributeValues={':val1': 'a1', ':val2': 'a2'})
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == {'p': p, 'a': ['one', 'two', 'three', 'hello', 'a2', 'a1']}
# Test what happens if we try to write to a.b, which would only make sense if
# a were a nested document, but a doesn't exist, or exists and is NOT a nested
# document but rather a scalar or list or something.
# DynamoDB actually detects this case and prints an error:
# ClientError: An error occurred (ValidationException) when calling the
# UpdateItem operation: The document path provided in the update expression
# is invalid for update
# Because Scylla doesn't read before write, it cannot detect this as an error,
# so we'll probably want to allow for that possibility as well.
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_dot(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello', 'b': ['hi']})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET b.c = :val1',
ExpressionAttributeValues={':val1': 7})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET c.c = :val1',
ExpressionAttributeValues={':val1': 7})
# Similarly for other types of bad paths - using [0] on something which
# isn't an array,
@pytest.mark.xfail(reason="nested updates not yet implemented")
def test_nested_attribute_update_bad_path_array(test_table_s):
p = random_string()
test_table_s.put_item(Item={'p': p, 'a': 'hello'})
with pytest.raises(ClientError, match='ValidationException.*path'):
test_table_s.update_item(Key={'p': p}, UpdateExpression='SET a[0] = :val1',
ExpressionAttributeValues={':val1': 7})

View File

@@ -1,121 +0,0 @@
# Copyright 2019 ScyllaDB
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
# Various utility functions which are useful for multiple tests
import string
import random
import collections
import time
def random_string(length=10, chars=string.ascii_uppercase + string.digits):
return ''.join(random.choice(chars) for x in range(length))
def random_bytes(length=10):
return bytearray(random.getrandbits(8) for _ in range(length))
# Utility functions for scan and query into an array of items:
# TODO: add to full_scan and full_query by default ConsistentRead=True, as
# it's not useful for tests without it!
def full_scan(table, **kwargs):
response = table.scan(**kwargs)
items = response['Items']
while 'LastEvaluatedKey' in response:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
items.extend(response['Items'])
return items
# Utility function for fetching the entire results of a query into an array of items
def full_query(table, **kwargs):
response = table.query(**kwargs)
items = response['Items']
while 'LastEvaluatedKey' in response:
response = table.query(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
items.extend(response['Items'])
return items
# To compare two lists of items (each is a dict) without regard for order,
# "==" is not good enough because it will fail if the order is different.
# The following function, multiset() converts the list into a multiset
# (set with duplicates) where order doesn't matter, so the multisets can
# be compared.
def freeze(item):
if isinstance(item, dict):
return frozenset((key, freeze(value)) for key, value in item.items())
elif isinstance(item, list):
return tuple(freeze(value) for value in item)
return item
def multiset(items):
return collections.Counter([freeze(item) for item in items])
test_table_prefix = 'alternator_test_'
def test_table_name():
current_ms = int(round(time.time() * 1000))
# In the off chance that test_table_name() is called twice in the same millisecond...
if test_table_name.last_ms >= current_ms:
current_ms = test_table_name.last_ms + 1
test_table_name.last_ms = current_ms
return test_table_prefix + str(current_ms)
test_table_name.last_ms = 0
def create_test_table(dynamodb, **kwargs):
name = test_table_name()
print("fixture creating new table {}".format(name))
table = dynamodb.create_table(TableName=name,
BillingMode='PAY_PER_REQUEST', **kwargs)
waiter = table.meta.client.get_waiter('table_exists')
# recheck every second instead of the default, lower, frequency. This can
# save a few seconds on AWS with its very slow table creation, but can
# more on tests on Scylla with its faster table creation turnaround.
waiter.config.delay = 1
waiter.config.max_attempts = 200
waiter.wait(TableName=name)
return table
# DynamoDB's ListTables request returns up to a single page of table names
# (e.g., up to 100) and it is up to the caller to call it again and again
# to get the next page. This is a utility function which calls it repeatedly
# as much as necessary to get the entire list.
# We deliberately return a list and not a set, because we want the caller
# to be able to recognize bugs in ListTables which causes the same table
# to be returned twice.
def list_tables(dynamodb, limit=100):
ret = []
pos = None
while True:
if pos:
page = dynamodb.meta.client.list_tables(Limit=limit, ExclusiveStartTableName=pos);
else:
page = dynamodb.meta.client.list_tables(Limit=limit);
results = page.get('TableNames', None)
assert(results)
ret = ret + results
newpos = page.get('LastEvaluatedTableName', None)
if not newpos:
break;
# It doesn't make sense for Dynamo to tell us we need more pages, but
# not send anything in *this* page!
assert len(results) > 0
assert newpos != pos
# Note that we only checked that we got back tables, not that we got
# any new tables not already in ret. So a buggy implementation might
# still cause an endless loop getting the same tables again and again.
pos = newpos
return ret

View File

@@ -1,146 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/error.hh"
#include "log.hh"
#include <string>
#include <string_view>
#include <gnutls/crypto.h>
#include <seastar/util/defer.hh>
#include "hashers.hh"
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
namespace alternator {
static logging::logger alogger("alternator-auth");
static hmac_sha256_digest hmac_sha256(std::string_view key, std::string_view msg) {
hmac_sha256_digest digest;
int ret = gnutls_hmac_fast(GNUTLS_MAC_SHA256, key.data(), key.size(), msg.data(), msg.size(), digest.data());
if (ret) {
throw std::runtime_error(fmt::format("Computing HMAC failed ({}): {}", ret, gnutls_strerror(ret)));
}
return digest;
}
static hmac_sha256_digest get_signature_key(std::string_view key, std::string_view date_stamp, std::string_view region_name, std::string_view service_name) {
auto date = hmac_sha256("AWS4" + std::string(key), date_stamp);
auto region = hmac_sha256(std::string_view(date.data(), date.size()), region_name);
auto service = hmac_sha256(std::string_view(region.data(), region.size()), service_name);
auto signing = hmac_sha256(std::string_view(service.data(), service.size()), "aws4_request");
return signing;
}
static std::string apply_sha256(std::string_view msg) {
sha256_hasher hasher;
hasher.update(msg.data(), msg.size());
return to_hex(hasher.finalize());
}
static std::string format_time_point(db_clock::time_point tp) {
time_t time_point_repr = db_clock::to_time_t(tp);
std::string time_point_str;
time_point_str.resize(17);
// strftime prints the terminating null character as well
std::strftime(time_point_str.data(), time_point_str.size(), "%Y%m%dT%H%M%SZ", std::gmtime(&time_point_repr));
time_point_str.resize(16);
return time_point_str;
}
void check_expiry(std::string_view signature_date) {
//FIXME: The default 15min can be changed with X-Amz-Expires header - we should honor it
std::string expiration_str = format_time_point(db_clock::now() - 15min);
std::string validity_str = format_time_point(db_clock::now() + 15min);
if (signature_date < expiration_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature expired: {} is now earlier than {} (current time - 15 min.)",
signature_date, expiration_str));
}
if (signature_date > validity_str) {
throw api_error("InvalidSignatureException",
fmt::format("Signature not yet current: {} is still later than {} (current time + 15 min.)",
signature_date, validity_str));
}
}
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string) {
auto amz_date_it = signed_headers_map.find("x-amz-date");
if (amz_date_it == signed_headers_map.end()) {
throw api_error("InvalidSignatureException", "X-Amz-Date header is mandatory for signature verification");
}
std::string_view amz_date = amz_date_it->second;
check_expiry(amz_date);
std::string_view datestamp = amz_date.substr(0, 8);
if (datestamp != orig_datestamp) {
throw api_error("InvalidSignatureException",
format("X-Amz-Date date does not match the provided datestamp. Expected {}, got {}",
orig_datestamp, datestamp));
}
std::string_view canonical_uri = "/";
std::stringstream canonical_headers;
for (const auto& header : signed_headers_map) {
canonical_headers << fmt::format("{}:{}", header.first, header.second) << '\n';
}
std::string payload_hash = apply_sha256(body_content);
std::string canonical_request = fmt::format("{}\n{}\n{}\n{}\n{}\n{}", method, canonical_uri, query_string, canonical_headers.str(), signed_headers_str, payload_hash);
std::string_view algorithm = "AWS4-HMAC-SHA256";
std::string credential_scope = fmt::format("{}/{}/{}/aws4_request", datestamp, region, service);
std::string string_to_sign = fmt::format("{}\n{}\n{}\n{}", algorithm, amz_date, credential_scope, apply_sha256(canonical_request));
hmac_sha256_digest signing_key = get_signature_key(secret_access_key, datestamp, region, service);
hmac_sha256_digest signature = hmac_sha256(std::string_view(signing_key.data(), signing_key.size()), string_to_sign);
return to_hex(bytes_view(reinterpret_cast<const int8_t*>(signature.data()), signature.size()));
}
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username) {
static const sstring query = format("SELECT salted_hash FROM {} WHERE {} = ?",
auth::meta::roles_table::qualified_name(), auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
auto timeout = auth::internal_distributed_timeout_config();
return qp.process(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {
throw api_error("UnrecognizedClientException", fmt::format("User not found: {}", username));
}
salted_hash = res->one().get_opt<sstring>("salted_hash");
if (!salted_hash) {
throw api_error("UnrecognizedClientException", fmt::format("No password found for user: {}", username));
}
return make_ready_future<std::string>(*salted_hash);
});
}
}

View File

@@ -1,46 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include <array>
#include "gc_clock.hh"
#include "utils/loading_cache.hh"
namespace cql3 {
class query_processor;
}
namespace alternator {
using hmac_sha256_digest = std::array<char, 32>;
using key_cache = utils::loading_cache<std::string, std::string>;
std::string get_signature(std::string_view access_key_id, std::string_view secret_access_key, std::string_view host, std::string_view method,
std::string_view orig_datestamp, std::string_view signed_headers_str, const std::map<std::string_view, std::string_view>& signed_headers_map,
std::string_view body_content, std::string_view region, std::string_view service, std::string_view query_string);
future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string username);
}

View File

@@ -1,111 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
// The DynamoAPI dictates that "binary" (a.k.a. "bytes" or "blob") values
// be encoded in the JSON API as base64-encoded strings. This is code to
// convert byte arrays to base64-encoded strings, and back.
#include "base64.hh"
#include <ctype.h>
// Arrays for quickly converting to and from an integer between 0 and 63,
// and the character used in base64 encoding to represent it.
static class base64_chars {
public:
static constexpr const char* to =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
int8_t from[255];
base64_chars() {
static_assert(strlen(to) == 64);
for (int i = 0; i < 255; i++) {
from[i] = 255; // signal invalid character
}
for (int i = 0; i < 64; i++) {
from[(unsigned) to[i]] = i;
}
}
} base64_chars;
std::string base64_encode(bytes_view in) {
std::string ret;
ret.reserve(((4 * in.size() / 3) + 3) & ~3);
int i = 0;
unsigned char chunk3[3]; // chunk of input
for (auto byte : in) {
chunk3[i++] = byte;
if (i == 3) {
ret += base64_chars.to[ (chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
ret += base64_chars.to[ chunk3[2] & 0x3f ];
i = 0;
}
}
if (i) {
// i can be 1 or 2.
for(int j = i; j < 3; j++)
chunk3[j] = '\0';
ret += base64_chars.to[ ( chunk3[0] & 0xfc) >> 2 ];
ret += base64_chars.to[ ((chunk3[0] & 0x03) << 4) + ((chunk3[1] & 0xf0) >> 4) ];
if (i == 2) {
ret += base64_chars.to[ ((chunk3[1] & 0x0f) << 2) + ((chunk3[2] & 0xc0) >> 6) ];
} else {
ret += '=';
}
ret += '=';
}
return ret;
}
bytes base64_decode(std::string_view in) {
int i = 0;
int8_t chunk4[4]; // chunk of input, each byte converted to 0..63;
std::string ret;
ret.reserve(in.size() * 3 / 4);
for (unsigned char c : in) {
uint8_t dc = base64_chars.from[c];
if (dc == 255) {
// Any unexpected character, include the "=" character usually
// used for padding, signals the end of the decode.
break;
}
chunk4[i++] = dc;
if (i == 4) {
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
ret += ((chunk4[2] & 0x3) << 6) + chunk4[3];
i = 0;
}
}
if (i) {
// i can be 2 or 3, meaning 1 or 2 more output characters
if (i>=2)
ret += (chunk4[0] << 2) + ((chunk4[1] & 0x30) >> 4);
if (i==3)
ret += ((chunk4[1] & 0xf) << 4) + ((chunk4[2] & 0x3c) >> 2);
}
// FIXME: This copy is sad. The problem is we need back "bytes"
// but "bytes" doesn't have efficient append and std::string.
// To fix this we need to use bytes' "uninitialized" feature.
return bytes(ret.begin(), ret.end());
}

View File

@@ -1,34 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string_view>
#include "bytes.hh"
#include "rjson.hh"
std::string base64_encode(bytes_view);
bytes base64_decode(std::string_view);
inline bytes base64_decode(const rjson::value& v) {
return base64_decode(std::string_view(v.GetString(), v.GetStringLength()));
}

View File

@@ -1,387 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <list>
#include <map>
#include <string_view>
#include "alternator/conditions.hh"
#include "alternator/error.hh"
#include "cql3/constants.hh"
#include <unordered_map>
#include "rjson.hh"
#include "serialization.hh"
#include "base64.hh"
namespace alternator {
static logging::logger clogger("alternator-conditions");
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator) {
static std::unordered_map<std::string, comparison_operator_type> ops = {
{"EQ", comparison_operator_type::EQ},
{"NE", comparison_operator_type::NE},
{"LE", comparison_operator_type::LE},
{"LT", comparison_operator_type::LT},
{"GE", comparison_operator_type::GE},
{"GT", comparison_operator_type::GT},
{"IN", comparison_operator_type::IN},
{"NULL", comparison_operator_type::IS_NULL},
{"NOT_NULL", comparison_operator_type::NOT_NULL},
{"BETWEEN", comparison_operator_type::BETWEEN},
{"BEGINS_WITH", comparison_operator_type::BEGINS_WITH},
}; //TODO: CONTAINS
if (!comparison_operator.IsString()) {
throw api_error("ValidationException", format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error("ValidationException", format("Unsupported comparison operator {}", op));
}
return it->second;
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::contains> make_map_element_restriction(const column_definition& cdef, std::string_view key, const rjson::value& value) {
bytes raw_key = utf8_type->from_string(sstring_view(key.data(), key.size()));
auto key_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_key)));
bytes raw_value = serialize_item(value);
auto entry_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::contains>(cdef, std::move(key_value), std::move(entry_value));
}
static ::shared_ptr<cql3::restrictions::single_column_restriction::EQ> make_key_eq_restriction(const column_definition& cdef, const rjson::value& value) {
bytes raw_value = get_key_from_typed_value(value, cdef, type_to_string(cdef.type));
auto restriction_value = ::make_shared<cql3::constants::value>(cql3::raw_value::make_value(std::move(raw_value)));
return make_shared<cql3::restrictions::single_column_restriction::EQ>(cdef, std::move(restriction_value));
}
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter) {
clogger.trace("Getting filtering restrictions for: {}", rjson::print(query_filter));
auto filtering_restrictions = ::make_shared<cql3::restrictions::statement_restrictions>(schema, true);
for (auto it = query_filter.MemberBegin(); it != query_filter.MemberEnd(); ++it) {
std::string_view column_name(it->name.GetString(), it->name.GetStringLength());
const rjson::value& condition = it->value;
const rjson::value& comp_definition = rjson::get(condition, "ComparisonOperator");
const rjson::value& attr_list = rjson::get(condition, "AttributeValueList");
comparison_operator_type op = get_comparison_operator(comp_definition);
if (op != comparison_operator_type::EQ) {
throw api_error("ValidationException", "Filtering is currently implemented for EQ operator only");
}
if (attr_list.Size() != 1) {
throw api_error("ValidationException", format("EQ restriction needs exactly 1 attribute value: {}", rjson::print(attr_list)));
}
if (const column_definition* cdef = schema->get_column_definition(to_bytes(column_name.data()))) {
// Primary key restriction
filtering_restrictions->add_restriction(make_key_eq_restriction(*cdef, attr_list[0]), false, true);
} else {
// Regular column restriction
filtering_restrictions->add_restriction(make_map_element_restriction(attrs_col, column_name, attr_list[0]), false, true);
}
}
return filtering_restrictions;
}
namespace {
struct size_check {
// True iff size passes this check.
virtual bool operator()(rapidjson::SizeType size) const = 0;
// Check description, such that format("expected array {}", check.what()) is human-readable.
virtual sstring what() const = 0;
};
class exact_size : public size_check {
rapidjson::SizeType _expected;
public:
explicit exact_size(rapidjson::SizeType expected) : _expected(expected) {}
bool operator()(rapidjson::SizeType size) const override { return size == _expected; }
sstring what() const override { return format("of size {}", _expected); }
};
struct empty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size < 1; }
sstring what() const override { return "to be empty"; }
};
struct nonempty : public size_check {
bool operator()(rapidjson::SizeType size) const override { return size > 0; }
sstring what() const override { return "to be non-empty"; }
};
} // anonymous namespace
// Check that array has the expected number of elements
static void verify_operand_count(const rjson::value* array, const size_check& expected, const rjson::value& op) {
if (!array || !array->IsArray()) {
throw api_error("ValidationException", "With ComparisonOperator, AttributeValueList must be given and an array");
}
if (!expected(array->Size())) {
throw api_error("ValidationException",
format("{} operator requires AttributeValueList {}, instead found list size {}",
op, expected.what(), array->Size()));
}
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
return v1 && *v1 == v2;
}
// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
return !v1 || *v1 != v2; // null is unequal to anything.
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException", format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error("ValidationException", format("BEGINS_WITH operator requires String or Binary in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
if (it1->name != it2->name) {
return false;
}
std::string_view val1(it1->value.GetString(), it1->value.GetStringLength());
std::string_view val2(it2->value.GetString(), it2->value.GetStringLength());
return val1.substr(0, val2.size()) == val2;
}
// Check if a JSON-encoded value equals any element of an array, which must have at least one element.
static bool check_IN(const rjson::value* val, const rjson::value& array) {
if (!array[0].IsObject() || array[0].MemberCount() != 1) {
throw api_error("ValidationException",
format("IN operator encountered malformed AttributeValue: {}", array[0]));
}
const auto& type = array[0].MemberBegin()->name;
if (type != "S" && type != "N" && type != "B") {
throw api_error("ValidationException",
"IN operator requires AttributeValueList elements to be of type String, Number, or Binary ");
}
if (!val) {
return false;
}
bool have_match = false;
for (const auto& elem : array.GetArray()) {
if (!elem.IsObject() || elem.MemberCount() != 1 || elem.MemberBegin()->name != type) {
throw api_error("ValidationException",
"IN operator requires all AttributeValueList elements to have the same type ");
}
if (!have_match && *val == elem) {
// Can't return yet, must check types of all array elements. <sigh>
have_match = true;
}
}
return have_match;
}
static bool check_NULL(const rjson::value* val) {
return val == nullptr;
}
static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic()));
}
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error("ValidationException",
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic()));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
if (kv1.name == "N") {
return cmp(unwrap_number(*v1, cmp.diagnostic()), unwrap_number(v2, cmp.diagnostic()));
}
if (kv1.name == "S") {
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
struct cmp_lt {
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return lhs < rhs; }
const char* diagnostic() const { return "LT operator"; }
};
struct cmp_gt {
// bytes only has <
template <typename T> bool operator()(const T& lhs, const T& rhs) const { return rhs < lhs; }
const char* diagnostic() const { return "GT operator"; }
};
// Verify one Expect condition on one attribute (whose content is "got")
// for the verify_expected() below.
// This function returns true or false depending on whether the condition
// succeeded - it does not throw ConditionalCheckFailedException.
// However, it may throw ValidationException on input validation errors.
static bool verify_expected_one(const rjson::value& condition, const rjson::value* got) {
const rjson::value* comparison_operator = rjson::find(condition, "ComparisonOperator");
const rjson::value* attribute_value_list = rjson::find(condition, "AttributeValueList");
const rjson::value* value = rjson::find(condition, "Value");
const rjson::value* exists = rjson::find(condition, "Exists");
// There are three types of conditions that Expected supports:
// A value, not-exists, and a comparison of some kind. Each allows
// and requires a different combinations of parameters in the request
if (value) {
if (exists && (!exists->IsBool() || exists->GetBool() != true)) {
throw api_error("ValidationException", "Cannot combine Value with Exists!=true");
}
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Value with ComparisonOperator");
}
return check_EQ(got, *value);
} else if (exists) {
if (comparison_operator) {
throw api_error("ValidationException", "Cannot combine Exists with ComparisonOperator");
}
if (!exists->IsBool() || exists->GetBool() != false) {
throw api_error("ValidationException", "Exists!=false requires Value");
}
// Remember Exists=false, so we're checking that the attribute does *not* exist:
return !got;
} else {
if (!comparison_operator) {
throw api_error("ValidationException", "Missing ComparisonOperator, Value or Exists");
}
comparison_operator_type op = get_comparison_operator(*comparison_operator);
switch (op) {
case comparison_operator_type::EQ:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_EQ(got, (*attribute_value_list)[0]);
case comparison_operator_type::NE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
case comparison_operator_type::IS_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NULL(got);
case comparison_operator_type::NOT_NULL:
verify_operand_count(attribute_value_list, empty(), *comparison_operator);
return check_NOT_NULL(got);
default:
// FIXME: implement all the missing types, so there will be no default here.
throw api_error("ValidationException", format("ComparisonOperator {} is not yet supported", *comparison_operator));
}
}
}
// Verify that the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function will throw a ConditionalCheckFailedException API error
// if the values do not match the condition, or ValidationException if there
// are errors in the format of the condition itself.
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");
if (!expected) {
return;
}
if (!expected->IsObject()) {
throw api_error("ValidationException", "'Expected' parameter, if given, must be an object");
}
// ConditionalOperator can be "AND" for requiring all conditions, or
// "OR" for requiring one condition, and defaults to "AND" if missing.
const rjson::value* conditional_operator = rjson::find(req, "ConditionalOperator");
bool require_all = true;
if (conditional_operator) {
if (!conditional_operator->IsString()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter, if given, must be a string");
}
std::string_view s(conditional_operator->GetString(), conditional_operator->GetStringLength());
if (s == "AND") {
// require_all is already true
} else if (s == "OR") {
require_all = false;
} else {
throw api_error("ValidationException", "'ConditionalOperator' parameter must be AND, OR or missing");
}
if (expected->GetObject().ObjectEmpty()) {
throw api_error("ValidationException", "'ConditionalOperator' parameter cannot be specified for empty Expression");
}
}
for (auto it = expected->MemberBegin(); it != expected->MemberEnd(); ++it) {
const rjson::value* got = nullptr;
if (previous_item && previous_item->IsObject() && previous_item->HasMember("Item")) {
got = rjson::find((*previous_item)["Item"], rjson::string_ref_type(it->name.GetString()));
}
bool success = verify_expected_one(it->value, got);
if (success && !require_all) {
// When !require_all, one success is enough!
return;
} else if (!success && require_all) {
// When require_all, one failure is enough!
throw api_error("ConditionalCheckFailedException", "Failed condition.");
}
}
// If we got here and require_all, none of the checks failed, so succeed.
// If we got here and !require_all, all of the checks failed, so fail.
if (!require_all) {
throw api_error("ConditionalCheckFailedException", "None of ORed Expect conditions were successful.");
}
}
}

View File

@@ -1,49 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* This file contains definitions and functions related to placing conditions
* on Alternator queries (equivalent of CQL's restrictions).
*
* With conditions, it's possible to add criteria to selection requests (Scan, Query)
* and use them for narrowing down the result set, by means of filtering or indexing.
*
* Ref: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html
*/
#pragma once
#include "cql3/restrictions/statement_restrictions.hh"
#include "serialization.hh"
namespace alternator {
enum class comparison_operator_type {
EQ, NE, LE, LT, GE, GT, IN, BETWEEN, CONTAINS, IS_NULL, NOT_NULL, BEGINS_WITH
};
comparison_operator_type get_comparison_operator(const rjson::value& comparison_operator);
::shared_ptr<cql3::restrictions::statement_restrictions> get_filtering_restrictions(schema_ptr schema, const column_definition& attrs_col, const rjson::value& query_filter);
void verify_expected(const rjson::value& req, const std::unique_ptr<rjson::value>& previous_item);
}

View File

@@ -1,50 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
namespace alternator {
// DynamoDB's error messages are described in detail in
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Ah An error message has a "type", e.g., "ResourceNotFoundException", a coarser
// HTTP code (almost always, 400), and a human readable message. Eventually these
// will be wrapped into a JSON object returned to the client.
class api_error : public std::exception {
public:
using status_type = httpd::reply::status_type;
status_type _http_code;
std::string _type;
std::string _msg;
api_error(std::string type, std::string msg, status_type http_code = status_type::bad_request)
: _http_code(std::move(http_code))
, _type(std::move(type))
, _msg(std::move(msg))
{ }
api_error() = default;
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,71 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
#include "service/client_state.hh"
#include "stats.hh"
namespace alternator {
class executor {
service::storage_proxy& _proxy;
service::migration_manager& _mm;
public:
using client_state = service::client_state;
stats _stats;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME = "alternator";
executor(service::storage_proxy& proxy, service::migration_manager& mm) : _proxy(proxy), _mm(mm) {}
future<json::json_return_type> create_table(client_state& client_state, std::string content);
future<json::json_return_type> describe_table(client_state& client_state, std::string content);
future<json::json_return_type> delete_table(client_state& client_state, std::string content);
future<json::json_return_type> put_item(client_state& client_state, std::string content);
future<json::json_return_type> get_item(client_state& client_state, std::string content);
future<json::json_return_type> delete_item(client_state& client_state, std::string content);
future<json::json_return_type> update_item(client_state& client_state, std::string content);
future<json::json_return_type> list_tables(client_state& client_state, std::string content);
future<json::json_return_type> scan(client_state& client_state, std::string content);
future<json::json_return_type> describe_endpoints(client_state& client_state, std::string content, std::string host_header);
future<json::json_return_type> batch_write_item(client_state& client_state, std::string content);
future<json::json_return_type> batch_get_item(client_state& client_state, std::string content);
future<json::json_return_type> query(client_state& client_state, std::string content);
future<> start();
future<> stop() { return make_ready_future<>(); }
future<> maybe_create_keyspace();
static void maybe_trace_query(client_state& client_state, sstring_view op, sstring_view query);
};
}

View File

@@ -1,98 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "expressions.hh"
#include "alternator/expressionsLexer.hpp"
#include "alternator/expressionsParser.hpp"
#include <seastarx.hh>
#include <seastar/core/print.hh>
#include <seastar/util/log.hh>
#include <functional>
namespace alternator {
template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>
Result do_with_parser(std::string input, Func&& f) {
expressionsLexer::InputStreamType input_stream{
reinterpret_cast<const ANTLR_UINT8*>(input.data()),
ANTLR_ENC_UTF8,
static_cast<ANTLR_UINT32>(input.size()),
nullptr };
expressionsLexer lexer(&input_stream);
expressionsParser::TokenStreamType tstream(ANTLR_SIZE_HINT, lexer.get_tokSource());
expressionsParser parser(&tstream);
auto result = f(parser);
return result;
}
parsed::update_expression
parse_update_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::update_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing UpdateExpression '{}': {}", query, std::current_exception()));
}
}
std::vector<parsed::path>
parse_projection_expression(std::string query) {
try {
return do_with_parser(query, std::mem_fn(&expressionsParser::projection_expression));
} catch (...) {
throw expressions_syntax_error(format("Failed parsing ProjectionExpression '{}': {}", query, std::current_exception()));
}
}
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;
namespace parsed {
void update_expression::add(update_expression::action a) {
std::visit(overloaded {
[&] (action::set&) { seen_set = true; },
[&] (action::remove&) { seen_remove = true; },
[&] (action::add&) { seen_add = true; },
[&] (action::del&) { seen_del = true; }
}, a._action);
_actions.push_back(std::move(a));
}
void update_expression::append(update_expression other) {
if ((seen_set && other.seen_set) ||
(seen_remove && other.seen_remove) ||
(seen_add && other.seen_add) ||
(seen_del && other.seen_del)) {
throw expressions_syntax_error("Each of SET, REMOVE, ADD, DELETE may only appear once in UpdateExpression");
}
std::move(other._actions.begin(), other._actions.end(), std::back_inserter(_actions));
seen_set |= other.seen_set;
seen_remove |= other.seen_remove;
seen_add |= other.seen_add;
seen_del |= other.seen_del;
}
} // namespace parsed
} // namespace alternator

View File

@@ -1,214 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*
* This file is part of Scylla. See the LICENSE.PROPRIETARY file in the
* top-level directory for licensing information.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
/*
* The DynamoDB protocol is based on JSON, and most DynamoDB requests
* describe the operation and its parameters via JSON objects such as maps
* and lists. Nevertheless, in some types of requests an "expression" is
* passed as a single string, and we need to parse this string. These
* cases include:
* 1. Attribute paths, such as "a[3].b.c", are used in projection
* expressions as well as inside other expressions described below.
* 2. Condition expressions, such as "(NOT (a=b OR c=d)) AND e=f",
* used in conditional updates, filters, and other places.
* 3. Update expressions, such as "SET #a.b = :x, c = :y DELETE d"
*
* All these expression syntaxes are very simple: Most of them could be
* parsed as regular expressions, and the parenthesized condition expression
* could be done with a simple hand-written lexical analyzer and recursive-
* descent parser. Nevertheless, we decided to specify these parsers in the
* ANTLR3 language already used in the Scylla project, hopefully making these
* parsers easier to reason about, and easier to change if needed - and
* reducing the amount of boiler-plate code.
*/
grammar expressions;
options {
language = Cpp;
}
@parser::namespace{alternator}
@lexer::namespace{alternator}
/* TODO: explain what these traits things are. I haven't seen them explained
* in any document... Compilation fails without these fail because a definition
* of "expressionsLexerTraits" and "expressionParserTraits" is needed.
*/
@lexer::traits {
class expressionsLexer;
class expressionsParser;
typedef antlr3::Traits<expressionsLexer, expressionsParser> expressionsLexerTraits;
}
@parser::traits {
typedef expressionsLexerTraits expressionsParserTraits;
}
@lexer::header {
#include "alternator/expressions.hh"
// ANTLR generates a bunch of unused variables and functions. Yuck...
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic ignored "-Wunused-function"
}
@parser::header {
#include "expressionsLexer.hpp"
}
/* By default, ANTLR3 composes elaborate syntax-error messages, saying which
* token was unexpected, where, and so on on, but then dutifully writes these
* error messages to the standard error, and returns from the parser as if
* everything was fine, with a half-constructed output object! If we define
* the "displayRecognitionError" method, it will be called upon to build this
* error message, and we can instead throw an exception to stop the parsing
* immediately. This is good enough for now, for our simple needs, but if
* we ever want to show more information about the syntax error, Cql3.g
* contains an elaborate implementation (it would be nice if we could reuse
* it, not duplicate it).
* Unfortunately, we have to repeat the same definition twice - once for the
* parser, and once for the lexer.
*/
@parser::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
throw expressions_syntax_error("syntax error");
}
}
/*
* Lexical analysis phase, i.e., splitting the input up to tokens.
* Lexical analyzer rules have names starting in capital letters.
* "fragment" rules do not generate tokens, and are just aliases used to
* make other rules more readable.
* Characters *not* listed here, e.g., '=', '(', etc., will be handled
* as individual tokens on their own right.
* Whitespace spans are skipped, so do not generate tokens.
*/
WHITESPACE: (' ' | '\t' | '\n' | '\r')+ { skip(); };
/* shortcuts for case-insensitive keywords */
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
/* These keywords must be appear before the generic NAME token below,
* because NAME matches too, and the first to match wins.
*/
SET: S E T;
REMOVE: R E M O V E;
ADD: A D D;
DELETE: D E L E T E;
fragment ALPHA: 'A'..'Z' | 'a'..'z';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA | DIGIT | '_';
INTEGER: DIGIT+;
NAME: ALPHA ALNUM*;
NAMEREF: '#' ALNUM+;
VALREF: ':' ALNUM+;
/*
* Parsing phase - parsing the string of tokens generated by the lexical
* analyzer defined above.
*/
path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
update_expression_set_value returns [parsed::value v]:
VALREF { $v.set_valref($VALREF.text); }
| path { $v.set_path($path.p); }
| NAME { $v.set_func_name($NAME.text); }
'(' x=update_expression_set_value { $v.add_func_parameter($x.v); }
(',' x=update_expression_set_value { $v.add_func_parameter($x.v); })*
')'
;
update_expression_set_rhs returns [parsed::set_rhs rhs]:
v=update_expression_set_value { $rhs.set_value(std::move($v.v)); }
( '+' v=update_expression_set_value { $rhs.set_plus(std::move($v.v)); }
| '-' v=update_expression_set_value { $rhs.set_minus(std::move($v.v)); }
)?
;
update_expression_set_action returns [parsed::update_expression::action a]:
path '=' rhs=update_expression_set_rhs { $a.assign_set($path.p, $rhs.rhs); };
update_expression_remove_action returns [parsed::update_expression::action a]:
path { $a.assign_remove($path.p); };
update_expression_add_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_add($path.p, $VALREF.text); };
update_expression_delete_action returns [parsed::update_expression::action a]:
path VALREF { $a.assign_del($path.p, $VALREF.text); };
update_expression_clause returns [parsed::update_expression e]:
SET s=update_expression_set_action { $e.add(s); }
(',' s=update_expression_set_action { $e.add(s); })*
| REMOVE r=update_expression_remove_action { $e.add(r); }
(',' r=update_expression_remove_action { $e.add(r); })*
| ADD a=update_expression_add_action { $e.add(a); }
(',' a=update_expression_add_action { $e.add(a); })*
| DELETE d=update_expression_delete_action { $e.add(d); }
(',' d=update_expression_delete_action { $e.add(d); })*
;
// Note the "EOF" token at the end of the update expression. We want to the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
(',' p=path { $v.push_back(std::move($p.p)); } )* EOF;

View File

@@ -1,41 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <stdexcept>
#include <vector>
#include "expressions_types.hh"
namespace alternator {
class expressions_syntax_error : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
parsed::update_expression parse_update_expression(std::string query);
std::vector<parsed::path> parse_projection_expression(std::string query);
} /* namespace alternator */

View File

@@ -1,166 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <vector>
#include <string>
#include <variant>
/*
* Parsed representation of expressions and their components.
*
* Types in alternator::parse namespace are used for holding the parse
* tree - objects generated by the Antlr rules after parsing an expression.
* Because of the way Antlr works, all these objects are default-constructed
* first, and then assigned when the rule is completed, so all these types
* have only default constructors - but setter functions to set them later.
*/
namespace alternator {
namespace parsed {
// "path" is an attribute's path in a document, e.g., a.b[3].c.
class path {
// All paths have a "root", a top-level attribute, and any number of
// "dereference operators" - each either an index (e.g., "[2]") or a
// dot (e.g., ".xyz").
std::string _root;
std::vector<std::variant<std::string, unsigned>> _operators;
public:
void set_root(std::string root) {
_root = std::move(root);
}
void add_index(unsigned i) {
_operators.emplace_back(i);
}
void add_dot(std::string(name)) {
_operators.emplace_back(std::move(name));
}
const std::string& root() const {
return _root;
}
bool has_operators() const {
return !_operators.empty();
}
};
// "value" is is a value used in the right hand side of an assignment
// expression, "SET a = ...". It can be a reference to a value included in
// the request (":val"), a path to an attribute from the existing item
// (e.g., "a.b[3].c"), or a function of other such values.
// Note that the real right-hand-side of an assignment is actually a bit
// more general - it allows either a value, or a value+value or value-value -
// see class set_rhs below.
struct value {
struct function_call {
std::string _function_name;
std::vector<value> _parameters;
};
std::variant<std::string, path, function_call> _value;
void set_valref(std::string s) {
_value = std::move(s);
}
void set_path(path p) {
_value = std::move(p);
}
void set_func_name(std::string s) {
_value = function_call {std::move(s), {}};
}
void add_func_parameter(value v) {
std::get<function_call>(_value)._parameters.emplace_back(std::move(v));
}
};
// The right-hand-side of a SET in an update expression can be either a
// single value (see above), or value+value, or value-value.
class set_rhs {
public:
char _op; // '+', '-', or 'v''
value _v1;
value _v2;
void set_value(value&& v1) {
_op = 'v';
_v1 = std::move(v1);
}
void set_plus(value&& v2) {
_op = '+';
_v2 = std::move(v2);
}
void set_minus(value&& v2) {
_op = '-';
_v2 = std::move(v2);
}
};
class update_expression {
public:
struct action {
path _path;
struct set {
set_rhs _rhs;
};
struct remove {
};
struct add {
std::string _valref;
};
struct del {
std::string _valref;
};
std::variant<set, remove, add, del> _action;
void assign_set(path p, set_rhs rhs) {
_path = std::move(p);
_action = set { std::move(rhs) };
}
void assign_remove(path p) {
_path = std::move(p);
_action = remove { };
}
void assign_add(path p, std::string v) {
_path = std::move(p);
_action = add { std::move(v) };
}
void assign_del(path p, std::string v) {
_path = std::move(p);
_action = del { std::move(v) };
}
};
private:
std::vector<action> _actions;
bool seen_set = false;
bool seen_remove = false;
bool seen_add = false;
bool seen_del = false;
public:
void add(action a);
void append(update_expression other);
bool empty() const {
return _actions.empty();
}
const std::vector<action>& actions() const {
return _actions;
}
};
} // namespace parsed
} // namespace alternator

View File

@@ -1,120 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "rjson.hh"
#include "error.hh"
#include <seastar/core/print.hh>
namespace rjson {
static allocator the_allocator;
std::string print(const rjson::value& value) {
string_buffer buffer;
writer writer(buffer);
value.Accept(writer);
return std::string(buffer.GetString());
}
rjson::value copy(const rjson::value& value) {
return rjson::value(value, the_allocator);
}
rjson::value parse(const std::string& str) {
return parse_raw(str.c_str(), str.size());
}
rjson::value parse_raw(const char* c_str, size_t size) {
rjson::document d;
d.Parse(c_str, size);
if (d.HasParseError()) {
throw rjson::error(format("Parsing JSON failed: {}", GetParseError_En(d.GetParseError())));
}
rjson::value& v = d;
return std::move(v);
}
rjson::value& get(rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name) {
auto member_it = value.FindMember(name);
if (member_it != value.MemberEnd())
return member_it->value;
else {
throw rjson::error(format("JSON parameter {} not found", name));
}
}
rjson::value from_string(const std::string& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const sstring& str) {
return rjson::value(str.c_str(), str.size(), the_allocator);
}
rjson::value from_string(const char* str, size_t size) {
return rjson::value(str, size, the_allocator);
}
const rjson::value* find(const rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
rjson::value* find(rjson::value& value, string_ref_type name) {
auto member_it = value.FindMember(name);
return member_it != value.MemberEnd() ? &member_it->value : nullptr;
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), std::move(member), the_allocator);
}
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member) {
base.AddMember(rjson::value(name.c_str(), name.size(), the_allocator), rjson::value(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member) {
base.AddMember(name, std::move(member), the_allocator);
}
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member) {
base.AddMember(name, rjson::value(member), the_allocator);
}
void push_back(rjson::value& base_array, rjson::value&& item) {
base_array.PushBack(std::move(item), the_allocator);
}
} // end namespace rjson
std::ostream& std::operator<<(std::ostream& os, const rjson::value& v) {
return os << rjson::print(v);
}

View File

@@ -1,159 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
/*
* rjson is a wrapper over rapidjson library, providing fast JSON parsing and generation.
*
* rapidjson has strict copy elision policies, which, among other things, involves
* using provided char arrays without copying them and allows copying objects only explicitly.
* As such, one should be careful when passing strings with limited liveness
* (e.g. data underneath local std::strings) to rjson functions, because created JSON objects
* may end up relying on dangling char pointers. All rjson functions that create JSONs from strings
* by rjson have both APIs for string_ref_type (more optimal, used when the string is known to live
* at least as long as the object, e.g. a static char array) and for std::strings. The more optimal
* variants should be used *only* if the liveness of the string is guaranteed, otherwise it will
* result in undefined behaviour.
* Also, bear in mind that methods exposed by rjson::value are generic, but some of them
* work fine only for specific types. In case the type does not match, an rjson::error will be thrown.
* Examples of such mismatched usages is calling MemberCount() on a JSON value not of object type
* or calling Size() on a non-array value.
*/
#include <string>
#include <stdexcept>
namespace rjson {
class error : public std::exception {
std::string _msg;
public:
error() = default;
error(const std::string& msg) : _msg(msg) {}
virtual const char* what() const noexcept override { return _msg.c_str(); }
};
}
// rapidjson configuration macros
#define RAPIDJSON_HAS_STDSTRING 1
// Default rjson policy is to use assert() - which is dangerous for two reasons:
// 1. assert() can be turned off with -DNDEBUG
// 2. assert() crashes a program
// Fortunately, the default policy can be overridden, and so rapidjson errors will
// throw an rjson::error exception instead.
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw rjson::error(std::string("JSON error: condition not met: ") + #x); } while (0)
#include <rapidjson/document.h>
#include <rapidjson/writer.h>
#include <rapidjson/stringbuffer.h>
#include <rapidjson/error/en.h>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
namespace rjson {
using allocator = rapidjson::CrtAllocator;
using encoding = rapidjson::UTF8<>;
using document = rapidjson::GenericDocument<encoding, allocator>;
using value = rapidjson::GenericValue<encoding, allocator>;
using string_ref_type = value::StringRefType;
using string_buffer = rapidjson::GenericStringBuffer<encoding>;
using writer = rapidjson::Writer<string_buffer, encoding>;
using type = rapidjson::Type;
// Returns an object representing JSON's null
inline rjson::value null_value() {
return rjson::value(rapidjson::kNullType);
}
// Returns an empty JSON object - {}
inline rjson::value empty_object() {
return rjson::value(rapidjson::kObjectType);
}
// Returns an empty JSON array - []
inline rjson::value empty_array() {
return rjson::value(rapidjson::kArrayType);
}
// Returns an empty JSON string - ""
inline rjson::value empty_string() {
return rjson::value(rapidjson::kStringType);
}
// Convert the JSON value to a string with JSON syntax, the opposite of parse().
// The representation is dense - without any redundant indentation.
std::string print(const rjson::value& value);
// Copies given JSON value - involves allocation
rjson::value copy(const rjson::value& value);
// Parses a JSON value from given string or raw character array.
// The string/char array liveness does not need to be persisted,
// as both parse() and parse_raw() will allocate member names and values.
// Throws rjson::error if parsing failed.
rjson::value parse(const std::string& str);
rjson::value parse_raw(const char* c_str, size_t size);
// Creates a JSON value (of JSON string type) out of internal string representations.
// The string value is copied, so str's liveness does not need to be persisted.
rjson::value from_string(const std::string& str);
rjson::value from_string(const sstring& str);
rjson::value from_string(const char* str, size_t size);
// Returns a pointer to JSON member if it exists, nullptr otherwise
rjson::value* find(rjson::value& value, rjson::string_ref_type name);
const rjson::value* find(const rjson::value& value, rjson::string_ref_type name);
// Returns a reference to JSON member if it exists, throws otherwise
rjson::value& get(rjson::value& value, rjson::string_ref_type name);
const rjson::value& get(const rjson::value& value, rjson::string_ref_type name);
// Sets a member in given JSON object by moving the member - allocates the name.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference - allocates the name.
// NOTICE: member string liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set_with_string_name(rjson::value& base, const std::string& name, rjson::string_ref_type member);
// Sets a member in given JSON object by moving the member.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::value&& member);
// Sets a string member in given JSON object by assigning its reference.
// NOTICE: name liveness must be ensured to be at least as long as base's.
// NOTICE: member liveness must be ensured to be at least as long as base's.
// Throws if base is not a JSON object.
void set(rjson::value& base, rjson::string_ref_type name, rjson::string_ref_type member);
// Adds a value to a JSON list by moving the item to its end.
// Throws if base_array is not a JSON array.
void push_back(rjson::value& base_array, rjson::value&& item);
} // end namespace rjson
namespace std {
std::ostream& operator<<(std::ostream& os, const rjson::value& v);
}

View File

@@ -1,248 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "base64.hh"
#include "log.hh"
#include "serialization.hh"
#include "error.hh"
#include "rapidjson/writer.h"
#include "concrete_types.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
type_info type_info_from_string(std::string type) {
static thread_local const std::unordered_map<std::string, type_info> type_infos = {
{"S", {alternator_type::S, utf8_type}},
{"B", {alternator_type::B, bytes_type}},
{"BOOL", {alternator_type::BOOL, boolean_type}},
{"N", {alternator_type::N, decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_infos.find(type);
if (it == type_infos.end()) {
return {alternator_type::NOT_SUPPORTED_YET, utf8_type};
}
return it->second;
}
type_representation represent_type(alternator_type atype) {
static thread_local const std::unordered_map<alternator_type, type_representation> type_representations = {
{alternator_type::S, {"S", utf8_type}},
{alternator_type::B, {"B", bytes_type}},
{alternator_type::BOOL, {"BOOL", boolean_type}},
{alternator_type::N, {"N", decimal_type}}, //FIXME: Replace with custom Alternator type when implemented
};
auto it = type_representations.find(atype);
if (it == type_representations.end()) {
throw std::runtime_error(format("Unknown alternator type {}", int8_t(atype)));
}
return it->second;
}
struct from_json_visitor {
const rjson::value& v;
bytes_ostream& bo;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), from_json_visitor{v, bo}); };
void operator()(const string_type_impl& t) {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
void operator()(const bytes_type_impl& t) const {
bo.write(base64_decode(v));
}
void operator()(const boolean_type_impl& t) const {
bo.write(boolean_type->decompose(v.GetBool()));
}
void operator()(const decimal_type_impl& t) const {
bo.write(t.from_string(sstring_view(v.GetString(), v.GetStringLength())));
}
// default
void operator()(const abstract_type& t) const {
bo.write(t.from_json_object(Json::Value(rjson::print(v)), cql_serialization_format::internal()));
}
};
bytes serialize_item(const rjson::value& item) {
if (item.IsNull() || item.MemberCount() != 1) {
throw api_error("ValidationException", format("An item can contain only one attribute definition: {}", item));
}
auto it = item.MemberBegin();
type_info type_info = type_info_from_string(it->name.GetString()); // JSON keys are guaranteed to be strings
if (type_info.atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal serialization of type {}", it->name.GetString());
return bytes{int8_t(type_info.atype)} + to_bytes(rjson::print(item));
}
bytes_ostream bo;
bo.write(bytes{int8_t(type_info.atype)});
visit(*type_info.dtype, from_json_visitor{it->value, bo});
return bytes(bo.linearize());
}
struct to_json_visitor {
rjson::value& deserialized;
const std::string& type_ident;
bytes_view bv;
void operator()(const reversed_type_impl& t) const { visit(*t.underlying_type(), to_json_visitor{deserialized, type_ident, bv}); };
void operator()(const decimal_type_impl& t) const {
auto s = decimal_type->to_json_string(bytes(bv));
//FIXME(sarna): unnecessary copy
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(s));
}
void operator()(const string_type_impl& t) {
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(reinterpret_cast<const char *>(bv.data()), bv.size()));
}
void operator()(const bytes_type_impl& t) const {
std::string b64 = base64_encode(bv);
rjson::set_with_string_name(deserialized, type_ident, rjson::from_string(b64));
}
// default
void operator()(const abstract_type& t) const {
rjson::set_with_string_name(deserialized, type_ident, rjson::parse(t.to_string(bytes(bv))));
}
};
rjson::value deserialize_item(bytes_view bv) {
rjson::value deserialized(rapidjson::kObjectType);
if (bv.empty()) {
throw api_error("ValidationException", "Serialized value empty");
}
alternator_type atype = alternator_type(bv[0]);
bv.remove_prefix(1);
if (atype == alternator_type::NOT_SUPPORTED_YET) {
slogger.trace("Non-optimal deserialization of alternator type {}", int8_t(atype));
return rjson::parse_raw(reinterpret_cast<const char *>(bv.data()), bv.size());
}
type_representation type_representation = represent_type(atype);
visit(*type_representation.dtype, to_json_visitor{deserialized, type_representation.ident, bv});
return deserialized;
}
std::string type_to_string(data_type type) {
static thread_local std::unordered_map<data_type, std::string> types = {
{utf8_type, "S"},
{bytes_type, "B"},
{boolean_type, "BOOL"},
{decimal_type, "N"}, // FIXME: use a specialized Alternator number type instead of the general decimal_type
};
auto it = types.find(type);
if (it == types.end()) {
throw std::runtime_error(format("Unknown type {}", type->name()));
}
return it->second;
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
std::string expected_type = type_to_string(column.type);
const rjson::value& key_typed_value = rjson::get(item, rjson::value::StringRefType(column_name.c_str()));
if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {
throw api_error("ValidationException",
format("Missing or invalid value object for key column {}: {}", column_name, item));
}
return get_key_from_typed_value(key_typed_value, column, expected_type);
}
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type) {
auto it = key_typed_value.MemberBegin();
if (it->name.GetString() != expected_type) {
throw api_error("ValidationException",
format("Type mismatch: expected type {} for key column {}, got type {}",
expected_type, column.name_as_text(), it->name.GetString()));
}
if (column.type == bytes_type) {
return base64_decode(it->value);
} else {
return column.type->from_string(it->value.GetString());
}
}
rjson::value json_key_column_value(bytes_view cell, const column_definition& column) {
if (column.type == bytes_type) {
std::string b64 = base64_encode(cell);
return rjson::from_string(b64);
} if (column.type == utf8_type) {
return rjson::from_string(std::string(reinterpret_cast<const char*>(cell.data()), cell.size()));
} else if (column.type == decimal_type) {
// FIXME: use specialized Alternator number type, not the more
// general "decimal_type". A dedicated type can be more efficient
// in storage space and in parsing speed.
auto s = decimal_type->to_json_string(bytes(cell));
return rjson::from_string(s);
} else {
// We shouldn't get here, we shouldn't see such key columns.
throw std::runtime_error(format("Unexpected key type: {}", column.type->name()));
}
}
partition_key pk_from_json(const rjson::value& item, schema_ptr schema) {
std::vector<bytes> raw_pk;
// FIXME: this is a loop, but we really allow only one partition key column.
for (const column_definition& cdef : schema->partition_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_pk.push_back(std::move(raw_value));
}
return partition_key::from_exploded(raw_pk);
}
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {
if (!v.IsObject() || v.MemberCount() != 1) {
throw api_error("ValidationException", format("{}: invalid number object", diagnostic));
}
auto it = v.MemberBegin();
if (it->name != "N") {
throw api_error("ValidationException", format("{}: expected number, found type '{}'", diagnostic, it->name));
}
if (it->value.IsNumber()) {
// FIXME(sarna): should use big_decimal constructor with numeric values directly:
return big_decimal(rjson::print(it->value));
}
if (!it->value.IsString()) {
throw api_error("ValidationException", format("{}: improperly formatted number constant", diagnostic));
}
return big_decimal(it->value.GetString());
}
}

View File

@@ -1,66 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <string>
#include <string_view>
#include "types.hh"
#include "schema.hh"
#include "keys.hh"
#include "rjson.hh"
#include "utils/big_decimal.hh"
namespace alternator {
enum class alternator_type : int8_t {
S, B, BOOL, N, NOT_SUPPORTED_YET
};
struct type_info {
alternator_type atype;
data_type dtype;
};
struct type_representation {
std::string ident;
data_type dtype;
};
type_info type_info_from_string(std::string type);
type_representation represent_type(alternator_type atype);
bytes serialize_item(const rjson::value& item);
rjson::value deserialize_item(bytes_view bv);
std::string type_to_string(data_type type);
bytes get_key_column_value(const rjson::value& item, const column_definition& column);
bytes get_key_from_typed_value(const rjson::value& key_typed_value, const column_definition& column, const std::string& expected_type);
rjson::value json_key_column_value(bytes_view cell, const column_definition& column);
partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": [...]}, returns an object representing it. Otherwise,
// raises ValidationException with diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);
}

View File

@@ -1,311 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alternator/server.hh"
#include "log.hh"
#include <seastar/http/function_handlers.hh>
#include <seastar/json/json_elements.hh>
#include <seastarx.hh>
#include "error.hh"
#include "rjson.hh"
#include "auth.hh"
#include <cctype>
#include "cql3/query_processor.hh"
static logging::logger slogger("alternator-server");
using namespace httpd;
namespace alternator {
static constexpr auto TARGET = "X-Amz-Target";
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
// DynamoDB HTTP error responses are structured as follows
// https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html
// Our handlers throw an exception to report an error. If the exception
// is of type alternator::api_error, it unwrapped and properly reported to
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
public:
api_handler(const future_json_function& _handle) : _f_handle(
[_handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
return seastar::futurize_apply(_handle, std::move(req)).then_wrapped([rep = std::move(rep)](future<json::json_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
// exceptions are unexpected, and returned to the user
// as an internal server error:
api_error ret;
try {
resf.get();
} catch (api_error &ae) {
ret = ae;
} catch (rjson::error & re) {
ret = api_error("ValidationException", re.what());
} catch (...) {
ret = api_error(
"Internal Server Error",
format("Internal server error: {}", std::current_exception()),
reply::status_type::internal_server_error);
}
// FIXME: what is this version number?
rep->_content += "{\"__type\":\"com.amazonaws.dynamodb.v20120810#" + ret._type + "\"," +
"\"message\":\"" + ret._msg + "\"}";
rep->_status = ret._http_code;
slogger.trace("api_handler error case: {}", rep->_content);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
slogger.trace("api_handler success case");
auto res = resf.get0();
if (res._body_writer) {
rep->write_body("json", std::move(res._body_writer));
} else {
rep->_content += res._res;
}
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}), _type("json") { }
api_handler(const api_handler&) = default;
future<std::unique_ptr<reply>> handle(const sstring& path,
std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
return _f_handle(std::move(req), std::move(rep)).then(
[this](std::unique_ptr<reply> rep) {
rep->done(_type);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}
protected:
future_handler_function _f_handle;
sstring _type;
};
class health_handler : public handler_base {
virtual future<std::unique_ptr<reply>> handle(const sstring& path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) override {
rep->set_status(reply::status_type::ok);
rep->write_body("txt", format("healthy: {}", req->get_header("Host")));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
};
future<> server::verify_signature(const request& req) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (host_it == req._headers.end()) {
throw api_error("InvalidSignatureException", "Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::vector<std::string_view> credentials_raw = split(authorization_it->second, ' ');
std::string credential;
std::string user_signature;
std::string signed_headers_str;
std::vector<std::string_view> signed_headers;
for (std::string_view entry : credentials_raw) {
std::vector<std::string_view> entry_split = split(entry, '=');
if (entry_split.size() != 2) {
if (entry != "AWS4-HMAC-SHA256") {
throw api_error("InvalidSignatureException", format("Only AWS4-HMAC-SHA256 algorithm is supported. Found: {}", entry));
}
continue;
}
std::string_view auth_value = entry_split[1];
// Commas appear as an additional (quite redundant) delimiter
if (auth_value.back() == ',') {
auth_value.remove_suffix(1);
}
if (entry_split[0] == "Credential") {
credential = std::string(auth_value);
} else if (entry_split[0] == "Signature") {
user_signature = std::string(auth_value);
} else if (entry_split[0] == "SignedHeaders") {
signed_headers_str = std::string(auth_value);
signed_headers = split(auth_value, ';');
std::sort(signed_headers.begin(), signed_headers.end());
}
}
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
throw api_error("ValidationException", format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
std::string region(credential_split[2]);
std::string service(credential_split[3]);
std::map<std::string_view, std::string_view> signed_headers_map;
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
it->second = std::string_view(header.second);
}
}
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature = get_signature(user, *key_ptr, std::string_view(host), req._method,
datestamp, signed_headers_str, signed_headers_map, req.content, region, service, "");
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
throw api_error("UnrecognizedClientException", "The security token included in the request is invalid.");
}
});
}
future<json::json_return_type> server::handle_api_request(std::unique_ptr<request>&& req) {
sstring target = req->get_header(TARGET);
std::vector<std::string_view> split_target = split(target, '.');
//NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)
std::string op = split_target.empty() ? std::string() : std::string(split_target.back());
slogger.trace("Request: {} {}", op, req->content);
return verify_signature(*req).then([this, op, req = std::move(req)] () mutable {
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor.local()._stats.unsupported_operations++;
throw api_error("UnknownOperationException",
format("Unsupported operation {}", op));
}
//FIXME: Client state can provide more context, e.g. client's endpoint address
// We use unique_ptr because client_state cannot be moved or copied
return do_with(std::make_unique<executor::client_state>(executor::client_state::internal_tag()), [this, callback_it = std::move(callback_it), op = std::move(op), req = std::move(req)] (std::unique_ptr<executor::client_state>& client_state) mutable {
client_state->set_raw_keyspace(executor::KEYSPACE_NAME);
executor::maybe_trace_query(*client_state, op, req->content);
tracing::trace(client_state->get_trace_state(), op);
return callback_it->second(_executor.local(), *client_state, std::move(req));
});
});
}
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
});
r.add(operation_type::POST, url("/"), req_handler);
r.add(operation_type::GET, url("/"), new health_handler);
}
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(seastar::sharded<executor>& e)
: _executor(e), _key_cache(1024, 1min, slogger), _enforce_authorization(false)
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) {
return e.maybe_create_keyspace().then([&e, &client_state, req = std::move(req)] { return e.create_table(client_state, req->content); }); }
},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_table(client_state, req->content); }},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_table(client_state, req->content); }},
{"PutItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.put_item(client_state, req->content); }},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.update_item(client_state, req->content); }},
{"GetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.get_item(client_state, req->content); }},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.delete_item(client_state, req->content); }},
{"ListTables", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.list_tables(client_state, req->content); }},
{"Scan", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.scan(client_state, req->content); }},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.describe_endpoints(client_state, req->content, req->get_header("Host")); }},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_write_item(client_state, req->content); }},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.batch_get_item(client_state, req->content); }},
{"Query", [] (executor& e, executor::client_state& client_state, std::unique_ptr<request> req) { return e.query(client_state, req->content); }},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization) {
_enforce_authorization = enforce_authorization;
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, creds] {
try {
_executor.invoke_on_all([] (executor& e) {
return e.start();
}).get();
if (port) {
_control.start().get();
_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_control.listen(socket_address{addr, *port}).get();
slogger.info("Alternator HTTP server listening on {} port {}", addr, *port);
}
if (https_port) {
_https_control.start().get();
_https_control.set_routes(std::bind(&server::set_routes, this, std::placeholders::_1)).get();
_https_control.server().invoke_on_all([creds] (http_server& serv) {
return serv.set_tls_credentials(creds->build_server_credentials());
}).get();
_https_control.listen(socket_address{addr, *https_port}).get();
slogger.info("Alternator HTTPS server listening on {} port {}", addr, *https_port);
}
} catch (...) {
slogger.warn("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, port ? std::to_string(*port) : "OFF", https_port ? std::to_string(*https_port) : "OFF", std::current_exception());
throw;
}
});
}
}

View File

@@ -1,54 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/core/future.hh>
#include <seastar/http/httpd.hh>
#include <seastar/net/tls.hh>
#include <optional>
#include <alternator/auth.hh>
namespace alternator {
class server {
using alternator_callback = std::function<future<json::json_return_type>(executor&, executor::client_state&, std::unique_ptr<request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
seastar::httpd::http_server_control _control;
seastar::httpd::http_server_control _https_control;
seastar::sharded<executor>& _executor;
key_cache _key_cache;
bool _enforce_authorization;
alternator_callbacks_map _callbacks;
public:
server(seastar::sharded<executor>& executor);
seastar::future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds, bool enforce_authorization);
private:
void set_routes(seastar::httpd::routes& r);
future<> verify_signature(const seastar::httpd::request& r);
future<json::json_return_type> handle_api_request(std::unique_ptr<request>&& req);
};
}

View File

@@ -1,98 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "stats.hh"
#include <seastar/core/metrics.hh>
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
stats::stats() : api_operations{} {
// Register the
seastar::metrics::label op("op");
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return api_operations.name.get_histogram(1,20);}),
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
OPERATION(create_table, "CreateTable")
OPERATION(delete_backup, "DeleteBackup")
OPERATION(delete_item, "DeleteItem")
OPERATION(delete_table, "DeleteTable")
OPERATION(describe_backup, "DescribeBackup")
OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
OPERATION(describe_endpoints, "DescribeEndpoints")
OPERATION(describe_global_table, "DescribeGlobalTable")
OPERATION(describe_global_table_settings, "DescribeGlobalTableSettings")
OPERATION(describe_limits, "DescribeLimits")
OPERATION(describe_table, "DescribeTable")
OPERATION(describe_time_to_live, "DescribeTimeToLive")
OPERATION(get_item, "GetItem")
OPERATION(list_backups, "ListBackups")
OPERATION(list_global_tables, "ListGlobalTables")
OPERATION(list_tables, "ListTables")
OPERATION(list_tags_of_resource, "ListTagsOfResource")
OPERATION(put_item, "PutItem")
OPERATION(query, "Query")
OPERATION(restore_table_from_backup, "RestoreTableFromBackup")
OPERATION(restore_table_to_point_in_time, "RestoreTableToPointInTime")
OPERATION(scan, "Scan")
OPERATION(tag_resource, "TagResource")
OPERATION(transact_get_items, "TransactGetItems")
OPERATION(transact_write_items, "TransactWriteItems")
OPERATION(untag_resource, "UntagResource")
OPERATION(update_continuous_backups, "UpdateContinuousBackups")
OPERATION(update_global_table, "UpdateGlobalTable")
OPERATION(update_global_table_settings, "UpdateGlobalTableSettings")
OPERATION(update_item, "UpdateItem")
OPERATION(update_table, "UpdateTable")
OPERATION(update_time_to_live, "UpdateTimeToLive")
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
});
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
});
}
}

View File

@@ -1,95 +0,0 @@
/*
* Copyright 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <cstdint>
#include <seastar/core/metrics_registration.hh>
#include "seastarx.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
stats();
// Count of DynamoDB API operations by types
struct {
uint64_t batch_get_item = 0;
uint64_t batch_write_item = 0;
uint64_t create_backup = 0;
uint64_t create_global_table = 0;
uint64_t create_table = 0;
uint64_t delete_backup = 0;
uint64_t delete_item = 0;
uint64_t delete_table = 0;
uint64_t describe_backup = 0;
uint64_t describe_continuous_backups = 0;
uint64_t describe_endpoints = 0;
uint64_t describe_global_table = 0;
uint64_t describe_global_table_settings = 0;
uint64_t describe_limits = 0;
uint64_t describe_table = 0;
uint64_t describe_time_to_live = 0;
uint64_t get_item = 0;
uint64_t list_backups = 0;
uint64_t list_global_tables = 0;
uint64_t list_tables = 0;
uint64_t list_tags_of_resource = 0;
uint64_t put_item = 0;
uint64_t query = 0;
uint64_t restore_table_from_backup = 0;
uint64_t restore_table_to_point_in_time = 0;
uint64_t scan = 0;
uint64_t tag_resource = 0;
uint64_t transact_get_items = 0;
uint64_t transact_write_items = 0;
uint64_t untag_resource = 0;
uint64_t update_continuous_backups = 0;
uint64_t update_global_table = 0;
uint64_t update_global_table_settings = 0;
uint64_t update_item = 0;
uint64_t update_table = 0;
uint64_t update_time_to_live = 0;
utils::estimated_histogram put_item_latency;
utils::estimated_histogram get_item_latency;
utils::estimated_histogram delete_item_latency;
utils::estimated_histogram update_item_latency;
} api_operations;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
uint64_t reads_before_write = 0;
// CQL-derived stats
cql3::cql_stats cql_stats;
private:
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
};
}

View File

@@ -127,24 +127,6 @@
}
]
},
{
"path": "/compaction_manager/metrics/pending_tasks_by_table",
"operations": [
{
"method": "GET",
"summary": "Get pending tasks by table name",
"type": "array",
"items": {
"type": "pending_compaction"
},
"nickname": "get_pending_tasks_by_table",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/compaction_manager/metrics/completed_tasks",
"operations": [
@@ -262,23 +244,6 @@
}
}
},
"pending_compaction": {
"id": "pending_compaction",
"properties": {
"cf": {
"type": "string",
"description": "The column family name"
},
"ks": {
"type":"string",
"description": "The keyspace name"
},
"task": {
"type":"long",
"description": "The number of pending tasks"
}
}
},
"history": {
"id":"history",
"description":"Compaction history information",

View File

@@ -671,6 +671,21 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/condition_not_met",
"operations": [
{
"method": "GET",
"summary": "Get cas read metrics",
"type": "int",
"nickname": "get_cas_read_metrics_condition_not_met",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts",
"operations": [
@@ -776,36 +791,6 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_read/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS read rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/view_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get view write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_view_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_average_histogram",
"operations": [
@@ -971,21 +956,6 @@
}
]
},
{
"path": "/storage_proxy/metrics/cas_write/moving_average_histogram",
"operations": [
{
"method": "GET",
"summary": "Get CAS write rate and latency histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_cas_write_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path":"/storage_proxy/metrics/read/estimated_histogram/",
"operations":[

View File

@@ -2164,42 +2164,7 @@
]
}
]
},
{
"path":"/storage_service/sstable_info",
"operations":[
{
"method":"GET",
"summary":"SSTable information",
"type":"array",
"items":{
"type":"table_sstables"
},
"nickname":"sstable_info",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
}
],
"models":{
"mapper":{
@@ -2359,92 +2324,6 @@
"description":"The endpoint details"
}
}
},
"named_maps":{
"id":"named_maps",
"properties":{
"group":{
"type":"string"
},
"attributes":{
"type":"array",
"items":{
"type":"mapper"
}
}
}
},
"sstable":{
"id":"sstable",
"properties":{
"size":{
"type":"long",
"description":"Total size in bytes of sstable"
},
"data_size":{
"type":"long",
"description":"The size in bytes on disk of data"
},
"index_size":{
"type":"long",
"description":"The size in bytes on disk of index"
},
"filter_size":{
"type":"long",
"description":"The size in bytes on disk of filter"
},
"timestamp":{
"type":"datetime",
"description":"File creation time"
},
"generation":{
"type":"long",
"description":"SSTable generation"
},
"level":{
"type":"long",
"description":"SSTable level"
},
"version":{
"type":"string",
"enum":[
"ka", "la", "mc"
],
"description":"SSTable version"
},
"properties":{
"type":"array",
"description":"SSTable attributes",
"items":{
"type":"mapper"
}
},
"extended_properties":{
"type":"array",
"description":"SSTable extended attributes",
"items":{
"type":"named_maps"
}
}
}
},
"table_sstables":{
"id":"table_sstables",
"description":"Per-table SSTable info and attributes",
"properties":{
"keyspace":{
"type":"string"
},
"table":{
"type":"string"
},
"sstables":{
"type":"array",
"items":{
"$ref":"sstable"
}
}
}
}
}
}

View File

@@ -26,7 +26,7 @@
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/system_keyspace_view_types.hh"
#include "db/data_listeners.hh"
extern logging::logger apilog;
@@ -53,7 +53,8 @@ std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database& db) {
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
try {
return db.find_uuid(ks, cf);
} catch (std::out_of_range& e) {
@@ -61,11 +62,6 @@ const utils::UUID& get_uuid(const sstring& ks, const sstring& cf, const database
}
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
return get_uuid(ks, cf, db);
}
future<> foreach_column_family(http_context& ctx, const sstring& name, function<void(column_family&)> f) {
auto uuid = get_uuid(name, ctx.db.local());
@@ -75,28 +71,28 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family_stats::*f) {
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family_stats::*f) {
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -112,14 +108,14 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
@@ -130,7 +126,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, const
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
@@ -146,7 +142,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
@@ -157,7 +153,7 @@ static future<json::json_return_type> get_cf_rate_and_histogram(http_context& c
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family_stats::*f) {
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
@@ -254,11 +250,12 @@ class sum_ratio {
uint64_t _n = 0;
T _total = 0;
public:
void operator()(T value) {
future<> operator()(T value) {
if (value > 0) {
_total += value;
_n++;
}
return make_ready_future<>();
}
// Returns average value of all registered ratios.
T get() && {
@@ -407,11 +404,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::memtable_switch_count);
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::memtable_switch_count);
});
cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::memtable_switch_count);
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
@@ -456,67 +453,67 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx,req->param["name"] ,&column_family_stats::pending_flushes);
return get_cf_stats(ctx,req->param["name"] ,&column_family::stats::pending_flushes);
});
cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::pending_flushes);
return get_cf_stats(ctx, &column_family::stats::pending_flushes);
});
cf::get_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx,req->param["name"] ,&column_family_stats::reads);
return get_cf_stats_count(ctx,req->param["name"] ,&column_family::stats::reads);
});
cf::get_all_read.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family_stats::reads);
return get_cf_stats_count(ctx, &column_family::stats::reads);
});
cf::get_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, req->param["name"] ,&column_family_stats::writes);
return get_cf_stats_count(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_write.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_count(ctx, &column_family_stats::writes);
return get_cf_stats_count(ctx, &column_family::stats::writes);
});
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::reads);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::reads);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family_stats::reads);
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
});
cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family_stats::writes);
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family_stats::writes);
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::writes);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family_stats::writes);
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family_stats::writes);
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -532,11 +529,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family_stats::live_sstable_count);
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_sstable_count);
});
cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_sstable_count);
return get_cf_stats(ctx, &column_family::stats::live_sstable_count);
});
cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -795,25 +792,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_prepare;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
cf::get_cas_prepare.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_propose;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
cf::get_cas_propose.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
return cf.get_stats().estimated_cas_commit;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
cf::get_cas_commit.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
});
cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -824,11 +821,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::tombstone_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::tombstone_scanned);
});
cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family_stats::live_scanned);
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::live_scanned);
});
cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<request> req) {
@@ -846,28 +843,13 @@ void set_column_family(http_context& ctx, routes& r) {
return true;
});
cf::get_built_indexes.set(r, [&ctx](std::unique_ptr<request> req) {
auto [ks, cf_name] = parse_fully_qualified_cf_name(req->param["name"]);
return db::system_keyspace::load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace::view_build_progress>& vb) mutable {
std::set<sstring> vp;
for (auto b : vb) {
if (b.view.first == ks) {
vp.insert(b.view.second);
}
}
std::vector<sstring> res;
auto uuid = get_uuid(ks, cf_name, ctx.db.local());
column_family& cf = ctx.db.local().find_column_family(uuid);
res.reserve(cf.get_index_manager().list_indexes().size());
for (auto&& i : cf.get_index_manager().list_indexes()) {
if (vp.find(secondary_index::index_table_name(i.metadata().name())) == vp.end()) {
res.emplace_back(i.metadata().name());
}
}
return make_ready_future<json::json_return_type>(res);
});
cf::get_built_indexes.set(r, [](const_req) {
// FIXME
// Currently there are no index support
return std::vector<sstring>();
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [](const_req) {
// FIXME
// Currently there are no information on the compression

View File

@@ -39,14 +39,14 @@ template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::unique_ptr<std::any>(database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
using mapper_type = std::function<std::any (database&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
return ctx.db.map_reduce0(mapper_type([mapper, uuid](database& db) {
return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));
}), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
})).then([] (std::unique_ptr<std::any> r) {
return std::any_cast<I>(std::move(*r));
return I(mapper(db.find_column_family(uuid)));
}), std::any(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
})).then([] (std::any r) {
return std::any_cast<I>(std::move(r));
});
}
@@ -70,14 +70,14 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& n
struct map_reduce_column_families_locally {
std::any init;
std::function<std::unique_ptr<std::any>(column_family&)> mapper;
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
std::function<std::any (column_family&)> mapper;
std::function<std::any (std::any, std::any)> reducer;
future<std::any> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::any>(init);
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
*res = std::move(reducer(std::move(*res), mapper(*i.second.get())));
*res = reducer(*res.get(), mapper(*i.second.get()));
}).then([res] {
return std::move(*res);
return *res;
});
}
};
@@ -85,17 +85,16 @@ struct map_reduce_column_families_locally {
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<std::unique_ptr<std::any>(column_family&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
using mapper_type = std::function<std::any (column_family&)>;
using reducer_type = std::function<std::any (std::any, std::any)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (column_family& cf) mutable {
return std::make_unique<std::any>(I(mapper(cf)));
return I(mapper(cf));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::any a, std::any b) mutable {
return I(reducer(std::any_cast<I>(std::move(a)), std::any_cast<I>(std::move(b))));
});
return ctx.db.map_reduce0(map_reduce_column_families_locally{init,
std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {
return std::any_cast<I>(std::move(*res));
return ctx.db.map_reduce0(map_reduce_column_families_locally{init, std::move(wrapped_mapper), wrapped_reducer}, std::any(init), wrapped_reducer).then([] (std::any res) {
return std::any_cast<I>(std::move(res));
});
}
@@ -109,9 +108,9 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family_stats::*f);
int64_t column_family::stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family_stats::*f);
int64_t column_family::stats::*f);
}

View File

@@ -24,7 +24,6 @@
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include <utility>
namespace api {
@@ -39,16 +38,6 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
return make_ready_future<json::json_return_type>(res);
});
}
static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash> sum_pending_tasks(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>&& a,
const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& b) {
for (auto&& i : b) {
if (i.second) {
a[i.first] += i.second;
}
}
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -72,32 +61,6 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
});
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([&ctx](database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&ctx, &db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return do_for_each(db.get_column_families(), [&tasks](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
table& cf = *i.second.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.get_compaction_strategy().estimated_pending_compactions(cf);
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
});
}, std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), sum_pending_tasks).then(
[](const std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& task_map) {
std::vector<cm::pending_compaction> res;
res.reserve(task_map.size());
for (auto i : task_map) {
cm::pending_compaction task;
task.ks = i.first.first;
task.cf = i.first.second;
task.task = i.second;
res.emplace_back(std::move(task));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME

View File

@@ -44,14 +44,14 @@ json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string_view format_type(std::string_view type) {
std::string format_type(const std::string& type) {
if (type == "int") {
return "integer";
}
return type;
}
future<> get_config_swagger_entry(std::string_view name, const std::string& description, std::string_view type, bool& first, output_stream<char>& os) {
future<> get_config_swagger_entry(const std::string& name, const std::string& description, const std::string& type, bool& first, output_stream<char>& os) {
std::stringstream ss;
if (first) {
first=false;
@@ -88,29 +88,23 @@ future<> get_config_swagger_entry(std::string_view name, const std::string& desc
}
namespace cs = httpd::config_json;
#define _get_config_value(name, type, deflt, status, desc, ...) if (id == #name) {return get_json_return_type(ctx.db.local().get_config().name());}
#define _get_config_description(name, type, deflt, status, desc, ...) f = f.then([&os, &first] {return get_config_swagger_entry(#name, desc, #type, first, os);});
void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx, routes& r) {
rb->register_function(r, [&ctx] (output_stream<char>& os) {
return do_with(true, [&os, &ctx] (bool& first) {
rb->register_function(r, [] (output_stream<char>& os) {
return do_with(true, [&os] (bool& first) {
auto f = make_ready_future();
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
f = f.then([&os, &first, &cfg] {
return get_config_swagger_entry(cfg.name(), std::string(cfg.desc()), cfg.type_name(), first, os);
});
}
_make_config_values(_get_config_description)
return f;
});
});
cs::find_config_id.set(r, [&ctx] (const_req r) {
auto id = r.param["id"];
for (auto&& cfg_ref : ctx.db.local().get_config().values()) {
auto&& cfg = cfg_ref.get();
if (id == cfg.name()) {
return cfg.value_as_json();
}
}
_make_config_values(_get_config_value)
throw bad_param_exception(sstring("No such config entry: ") + id);
});
}

View File

@@ -76,7 +76,7 @@ future_json_function get_server_getter(std::function<uint64_t(const rpc::stats&)
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_server_connection_stats([&map, f] (const rpc::client_info& info, const rpc::stats& stats) mutable {
map[gms::inet_address(info.addr.addr())] = f(stats);
map[gms::inet_address(net::ipv4_address(info.addr))] = f(stats);
});
return map;
};

View File

@@ -47,10 +47,6 @@ static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>&
});
}
httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average() {
return timer_to_json(utils::rate_moving_average_and_histogram());
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
@@ -81,9 +77,12 @@ void set_storage_proxy(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(0);
});
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
sp::get_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// hinted handoff is not supported currently,
// so we should return false
return make_ready_future<json::json_return_type>(false);
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
@@ -247,40 +246,68 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
});
sp::get_cas_read_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_timeouts);
sp::get_cas_read_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_read_unavailables);
sp::get_cas_read_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_timeouts);
sp::get_cas_write_timeouts.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::cas_write_unavailables);
sp::get_cas_write_unavailables.set(r, [](std::unique_ptr<request> req) {
//TBD
// FIXME
// cas is not supported yet, so just return 0
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_unfinished_commit);
sp::get_cas_write_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_write_contention);
sp::get_cas_write_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_write_metrics_condition_not_met.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_write_condition_not_met);
sp::get_cas_write_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_unfinished_commit.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::cas_read_unfinished_commit);
sp::get_cas_read_metrics_unfinished_commit.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_contention.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::cas_read_contention);
sp::get_cas_read_metrics_contention.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_cas_read_metrics_condition_not_met.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -350,21 +377,6 @@ void set_storage_proxy(http_context& ctx, routes& r) {
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_cas_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_write);
});
sp::get_cas_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::cas_read);
});
sp::get_view_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// No View metrics are available, so just return empty moving average
return make_ready_future<json::json_return_type>(get_empty_moving_average());
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::read);

View File

@@ -22,15 +22,13 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/filtered.hpp>
#include "service/storage_service.hh"
#include "db/commitlog/commitlog.hh"
#include "gms/gossiper.hh"
#include "db/system_keyspace.hh"
#include "seastar/http/exception.hh"
#include <service/storage_service.hh>
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
#include <db/system_keyspace.hh>
#include <seastar/http/exception.hh>
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
@@ -39,7 +37,6 @@
#include "sstables/compaction_manager.hh"
#include "sstables/sstables.hh"
#include "database.hh"
#include "db/extensions.hh"
sstables::sstable::version_types get_highest_supported_format();
@@ -55,6 +52,7 @@ static sstring validate_keyspace(http_context& ctx, const parameters& param) {
throw bad_param_exception("Keyspace " + param["keyspace"] + " Does not exist");
}
static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
std::vector<ss::token_range> res;
for (auto d : service::get_local_storage_service().describe_ring(keyspace)) {
@@ -192,7 +190,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
});
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
@@ -254,9 +252,6 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_family.empty()) {
resp = service::get_local_storage_service().take_snapshot(tag, keynames);
} else {
if (keynames.empty()) {
throw httpd::bad_param_exception("The keyspace of column families must be specified");
}
if (keynames.size() > 1) {
throw httpd::bad_param_exception("Only one keyspace allowed when specifying a column family");
}
@@ -307,24 +302,17 @@ void set_storage_service(http_context& ctx, routes& r) {
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return service::get_local_storage_service().is_cleanup_allowed(keyspace).then([&ctx, keyspace,
column_families = std::move(column_families)] (bool is_cleanup_allowed) mutable {
if (!is_cleanup_allowed) {
return make_exception_future<json::json_return_type>(
std::runtime_error("Can not perform cleanup operation when topology changes"));
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto cf : column_families) {
column_families_vec.push_back(&db.find_column_family(keyspace, cf));
}
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
return parallel_for_each(column_families_vec, [&cm] (column_family* cf) {
return cm.perform_cleanup(cf);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
@@ -870,7 +858,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_metrics_load.set(r, [&ctx](std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family_stats::live_disk_space_used);
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
});
ss::get_exceptions.set(r, [](const_req req) {
@@ -912,133 +900,6 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));
});
});
ss::sstable_info.set(r, [&ctx] (std::unique_ptr<request> req) {
auto ks = api::req_param<sstring>(*req, "keyspace", {}).value;
auto cf = api::req_param<sstring>(*req, "cf", {}).value;
// The size of this vector is bound by ks::cf. I.e. it is as most Nks + Ncf long
// which is not small, but not huge either.
using table_sstables_list = std::vector<ss::table_sstables>;
return do_with(table_sstables_list{}, [ks, cf, &ctx](table_sstables_list& dst) {
return service::get_local_storage_service().db().map_reduce([&dst](table_sstables_list&& res) {
for (auto&& t : res) {
auto i = std::find_if(dst.begin(), dst.end(), [&t](const ss::table_sstables& t2) {
return t.keyspace() == t2.keyspace() && t.table() == t2.table();
});
if (i == dst.end()) {
dst.emplace_back(std::move(t));
continue;
}
auto& ssd = i->sstables;
for (auto&& sd : t.sstables._elements) {
auto j = std::find_if(ssd._elements.begin(), ssd._elements.end(), [&sd](const ss::sstable& s) {
return s.generation() == sd.generation();
});
if (j == ssd._elements.end()) {
i->sstables.push(std::move(sd));
}
}
}
}, [ks, cf](const database& db) {
// see above
table_sstables_list res;
auto& ext = db.get_config().extensions();
for (auto& t : db.get_column_families() | boost::adaptors::map_values) {
auto& schema = t->schema();
if ((ks.empty() || ks == schema->ks_name()) && (cf.empty() || cf == schema->cf_name())) {
// at most Nsstables long
ss::table_sstables tst;
tst.keyspace = schema->ks_name();
tst.table = schema->cf_name();
for (auto sstable : *t->get_sstables_including_compacted_undeleted()) {
auto ts = db_clock::to_time_t(sstable->data_file_write_time());
::tm t;
::gmtime_r(&ts, &t);
ss::sstable info;
info.timestamp = t;
info.generation = sstable->generation();
info.level = sstable->get_sstable_level();
info.size = sstable->bytes_on_disk();
info.data_size = sstable->ondisk_data_size();
info.index_size = sstable->index_size();
info.filter_size = sstable->filter_size();
info.version = sstable->get_version();
if (sstable->has_component(sstables::component_type::CompressionInfo)) {
auto& c = sstable->get_compression();
auto cp = sstables::get_sstable_compressor(c);
ss::named_maps nm;
nm.group = "compression_parameters";
for (auto& p : cp->options()) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
if (!cp->options().count(compression_parameters::SSTABLE_COMPRESSION)) {
ss::mapper e;
e.key = compression_parameters::SSTABLE_COMPRESSION;
e.value = cp->name();
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
sstables::file_io_extension::attr_value_map map;
for (auto* ep : ext.sstable_file_io_extensions()) {
map.merge(ep->get_attributes(*sstable));
}
for (auto& p : map) {
struct {
const sstring& key;
ss::sstable& info;
void operator()(const std::map<sstring, sstring>& map) const {
ss::named_maps nm;
nm.group = key;
for (auto& p : map) {
ss::mapper e;
e.key = p.first;
e.value = p.second;
nm.attributes.push(std::move(e));
}
info.extended_properties.push(std::move(nm));
}
void operator()(const sstring& value) const {
ss::mapper e;
e.key = key;
e.value = value;
info.properties.push(std::move(e));
}
} v{p.first, info};
std::visit(v, p.second);
}
tst.sstables.push(std::move(info));
}
res.emplace_back(std::move(tst));
}
}
std::sort(res.begin(), res.end(), [](const ss::table_sstables& t1, const ss::table_sstables& t2) {
return t1.keyspace() < t2.keyspace() || (t1.keyspace() == t2.keyspace() && t1.table() < t2.table());
});
return res;
}).then([&dst] {
return make_ready_future<json::json_return_type>(stream_object(dst));
});
});
});
}
}

View File

@@ -22,6 +22,7 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "types.hh"
#include "types/collection.hh"
/// LSA mirator for cells with irrelevant type
///
@@ -147,6 +148,35 @@ atomic_cell_or_collection::atomic_cell_or_collection(const abstract_type& type,
{
}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
collection_mutation::collection_mutation(const collection_type_impl& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::collection_mutation(const collection_type_impl& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator()))
{
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_cell_or_collection& other) const
{
auto ptr_a = _data.get();
@@ -201,7 +231,7 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
size_t external_value_size = 0;
if (flags.get<data::cell::tags::external_data>()) {
if (flags.get<data::cell::tags::collection>()) {
external_value_size = as_collection_mutation().data.size_bytes();
external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
} else {
auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
external_value_size = cell_view.value_size();

View File

@@ -221,6 +221,30 @@ public:
friend std::ostream& operator<<(std::ostream& os, const atomic_cell& ac);
};
class collection_mutation_view;
// Represents a mutation of a collection. Actual format is determined by collection type,
// and is:
// set: list of atomic_cell
// map: list of pair<atomic_cell, bytes> (for key/value)
// list: tbd, probably ugly
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const collection_type_impl&, collection_mutation_view v);
collection_mutation(const collection_type_impl&, bytes_view bv);
operator collection_mutation_view() const;
};
class collection_mutation_view {
public:
atomic_cell_value_view data;
};
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);

View File

@@ -34,12 +34,14 @@ template<>
struct appending_hash<collection_mutation_view> {
template<typename Hasher>
void operator()(Hasher& h, collection_mutation_view cell, const column_definition& cdef) const {
cell.with_deserialized(*cdef.type, [&] (collection_mutation_view_description m_view) {
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto ctype = static_pointer_cast<const collection_type_impl>(cdef.type);
auto m_view = ctype->deserialize_mutation_form(cell_bv);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second, cdef);
}
});
}
};

View File

@@ -22,7 +22,6 @@
#pragma once
#include "atomic_cell.hh"
#include "collection_mutation.hh"
#include "schema.hh"
#include "hashing.hh"

View File

@@ -77,23 +77,17 @@ private:
void on_update_view(const sstring& ks_name, const sstring& view_name, bool columns_changed) override {}
void on_drop_keyspace(const sstring& ks_name) override {
// Do it in the background.
(void)_authorizer.revoke_all(
_authorizer.revoke_all(
auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);
});
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
// Do it in the background.
(void)_authorizer.revoke_all(
_authorizer.revoke_all(
auth::make_data_resource(
ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {
// Nothing.
}).handle_exception([] (std::exception_ptr e) {
log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);
});
}

View File

@@ -101,8 +101,8 @@ static future<std::optional<record>> find_record(cql3::query_processor& qp, std:
return std::make_optional(
record{
row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
row.get_or<bool>("is_superuser", false),
row.get_or<bool>("can_login", false),
row.get_as<bool>("is_superuser"),
row.get_as<bool>("can_login"),
(row.has("member_of")
? row.get_set<sstring>("member_of")
: role_set())});
@@ -203,7 +203,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_or<bool>("super", false);
config.is_superuser = row.get_as<bool>("super");
config.can_login = true;
return do_with(

View File

@@ -79,8 +79,7 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
if (version == m.schema()->version()) {
auto partition_view = mutation_partition_view::from_view(mv.partition());
mutation_application_stats app_stats;
m.partition().apply(*m.schema(), partition_view, *m.schema(), app_stats);
m.partition().apply(*m.schema(), partition_view, *m.schema());
} else {
column_mapping cm = mv.mapping();
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());

View File

@@ -1,604 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <utility>
#include <algorithm>
#include <seastar/util/defer.hh>
#include <seastar/core/thread.hh>
#include "cdc/cdc.hh"
#include "bytes.hh"
#include "database.hh"
#include "db/config.hh"
#include "dht/murmur3_partitioner.hh"
#include "partition_slice_builder.hh"
#include "schema.hh"
#include "schema_builder.hh"
#include "service/migration_manager.hh"
#include "service/storage_service.hh"
#include "types/tuple.hh"
#include "cql3/statements/select_statement.hh"
#include "cql3/multi_column_relation.hh"
#include "cql3/tuples.hh"
#include "log.hh"
using locator::snitch_ptr;
using locator::token_metadata;
using locator::topology;
using seastar::sstring;
using service::migration_manager;
using service::storage_proxy;
namespace std {
template<> struct hash<std::pair<net::inet_address, unsigned int>> {
std::size_t operator()(const std::pair<net::inet_address, unsigned int> &p) const {
return std::hash<net::inet_address>{}(p.first) ^ std::hash<int>{}(p.second);
}
};
}
using namespace std::chrono_literals;
static logging::logger cdc_log("cdc");
namespace cdc {
using operation_native_type = std::underlying_type_t<operation>;
using column_op_native_type = std::underlying_type_t<column_op>;
sstring log_name(const sstring& table_name) {
static constexpr auto cdc_log_suffix = "_scylla_cdc_log";
return table_name + cdc_log_suffix;
}
sstring desc_name(const sstring& table_name) {
static constexpr auto cdc_desc_suffix = "_scylla_cdc_desc";
return table_name + cdc_desc_suffix;
}
static future<>
remove_log(db_context ctx, const sstring& ks_name, const sstring& table_name) {
try {
return ctx._migration_manager.announce_column_family_drop(
ks_name, log_name(table_name), false);
} catch (exceptions::configuration_exception& e) {
// It's fine if the table does not exist.
return make_ready_future<>();
} catch (...) {
return make_exception_future<>(std::current_exception());
}
}
static future<>
remove_desc(db_context ctx, const sstring& ks_name, const sstring& table_name) {
try {
return ctx._migration_manager.announce_column_family_drop(
ks_name, desc_name(table_name), false);
} catch (exceptions::configuration_exception& e) {
// It's fine if the table does not exist.
return make_ready_future<>();
} catch (...) {
return make_exception_future<>(std::current_exception());
}
}
future<>
remove(db_context ctx, const sstring& ks_name, const sstring& table_name) {
return when_all(remove_log(ctx, ks_name, table_name),
remove_desc(ctx, ks_name, table_name)).discard_result();
}
static future<> setup_log(db_context ctx, const schema& s) {
schema_builder b(s.ks_name(), log_name(s.cf_name()));
b.set_default_time_to_live(gc_clock::duration{s.cdc_options().ttl()});
b.set_comment(sprint("CDC log for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("stream_id", uuid_type, column_kind::partition_key);
b.with_column("time", timeuuid_type, column_kind::clustering_key);
b.with_column("batch_seq_no", int32_type, column_kind::clustering_key);
b.with_column("operation", data_type_for<operation_native_type>());
b.with_column("ttl", long_type);
auto add_columns = [&] (const schema::const_iterator_range_type& columns, bool is_data_col = false) {
for (const auto& column : columns) {
auto type = column.type;
if (is_data_col) {
type = tuple_type_impl::get_instance({ /* op */ data_type_for<column_op_native_type>(), /* value */ type, /* ttl */long_type});
}
b.with_column("_" + column.name(), type);
}
};
add_columns(s.partition_key_columns());
add_columns(s.clustering_key_columns());
add_columns(s.static_columns(), true);
add_columns(s.regular_columns(), true);
return ctx._migration_manager.announce_new_column_family(b.build(), false);
}
static future<> setup_stream_description_table(db_context ctx, const schema& s) {
schema_builder b(s.ks_name(), desc_name(s.cf_name()));
b.set_comment(sprint("CDC description for %s.%s", s.ks_name(), s.cf_name()));
b.with_column("node_ip", inet_addr_type, column_kind::partition_key);
b.with_column("shard_id", int32_type, column_kind::partition_key);
b.with_column("created_at", timestamp_type, column_kind::clustering_key);
b.with_column("stream_id", uuid_type);
return ctx._migration_manager.announce_new_column_family(b.build(), false);
}
// This function assumes setup_stream_description_table was called on |s| before the call to this
// function.
static future<> populate_desc(db_context ctx, const schema& s) {
auto& db = ctx._proxy.get_db().local();
auto desc_schema =
db.find_schema(s.ks_name(), desc_name(s.cf_name()));
auto log_schema =
db.find_schema(s.ks_name(), log_name(s.cf_name()));
auto belongs_to = [&](const gms::inet_address& endpoint,
const unsigned int shard_id,
const int shard_count,
const unsigned int ignore_msb_bits,
const utils::UUID& stream_id) {
const auto log_pk = partition_key::from_singular(*log_schema,
data_value(stream_id));
const auto token = ctx._partitioner.decorate_key(*log_schema, log_pk).token();
if (ctx._token_metadata.get_endpoint(ctx._token_metadata.first_token(token)) != endpoint) {
return false;
}
const auto owning_shard_id = dht::murmur3_partitioner(shard_count, ignore_msb_bits).shard_of(token);
return owning_shard_id == shard_id;
};
std::vector<mutation> mutations;
const auto ts = api::new_timestamp();
const auto ck = clustering_key::from_single_value(
*desc_schema, timestamp_type->decompose(ts));
auto cdef = desc_schema->get_column_definition(to_bytes("stream_id"));
for (const auto& dc : ctx._token_metadata.get_topology().get_datacenter_endpoints()) {
for (const auto& endpoint : dc.second) {
const auto decomposed_ip = inet_addr_type->decompose(endpoint.addr());
const unsigned int shard_count = ctx._snitch->get_shard_count(endpoint);
const unsigned int ignore_msb_bits = ctx._snitch->get_ignore_msb_bits(endpoint);
for (unsigned int shard_id = 0; shard_id < shard_count; ++shard_id) {
const auto pk = partition_key::from_exploded(
*desc_schema, { decomposed_ip, int32_type->decompose(static_cast<int>(shard_id)) });
mutations.emplace_back(desc_schema, pk);
auto stream_id = utils::make_random_uuid();
while (!belongs_to(endpoint, shard_id, shard_count, ignore_msb_bits, stream_id)) {
stream_id = utils::make_random_uuid();
}
auto value = atomic_cell::make_live(*uuid_type,
ts,
uuid_type->decompose(stream_id));
mutations.back().set_cell(ck, *cdef, std::move(value));
}
}
}
return ctx._proxy.mutate(std::move(mutations),
db::consistency_level::QUORUM,
db::no_timeout,
nullptr,
empty_service_permit());
}
future<> setup(db_context ctx, schema_ptr s) {
return seastar::async([ctx = std::move(ctx), s = std::move(s)] {
setup_log(ctx, *s).get();
auto log_guard = seastar::defer([&] { remove_log(ctx, s->ks_name(), s->cf_name()).get(); });
setup_stream_description_table(ctx, *s).get();
auto desc_guard = seastar::defer([&] { remove_desc(ctx, s->ks_name(), s->cf_name()).get(); });
populate_desc(ctx, *s).get();
desc_guard.cancel();
log_guard.cancel();
});
}
db_context db_context::builder::build() {
return db_context{
_proxy,
_migration_manager ? _migration_manager->get() : service::get_local_migration_manager(),
_token_metadata ? _token_metadata->get() : service::get_local_storage_service().get_token_metadata(),
_snitch ? _snitch->get() : locator::i_endpoint_snitch::get_local_snitch_ptr(),
_partitioner ? _partitioner->get() : dht::global_partitioner()
};
}
class transformer final {
public:
using streams_type = std::unordered_map<std::pair<net::inet_address, unsigned int>, utils::UUID>;
private:
db_context _ctx;
schema_ptr _schema;
schema_ptr _log_schema;
utils::UUID _time;
bytes _decomposed_time;
::shared_ptr<const transformer::streams_type> _streams;
const column_definition& _op_col;
clustering_key set_pk_columns(const partition_key& pk, int batch_no, mutation& m) const {
const auto log_ck = clustering_key::from_exploded(
*m.schema(), { _decomposed_time, int32_type->decompose(batch_no) });
auto pk_value = pk.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->partition_key_columns()) {
assert (pos < pk_value.size());
auto cdef = m.schema()->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(pk_value[pos]));
m.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
return log_ck;
}
void set_operation(const clustering_key& ck, operation op, mutation& m) const {
m.set_cell(ck, _op_col, atomic_cell::make_live(*_op_col.type, _time.timestamp(), _op_col.type->decompose(operation_native_type(op))));
}
partition_key stream_id(const net::inet_address& ip, unsigned int shard_id) const {
auto it = _streams->find(std::make_pair(ip, shard_id));
if (it == std::end(*_streams)) {
throw std::runtime_error(format("No stream found for node {} and shard {}", ip, shard_id));
}
return partition_key::from_exploded(*_log_schema, { uuid_type->decompose(it->second) });
}
public:
transformer(db_context ctx, schema_ptr s, ::shared_ptr<const transformer::streams_type> streams)
: _ctx(ctx)
, _schema(std::move(s))
, _log_schema(ctx._proxy.get_db().local().find_schema(_schema->ks_name(), log_name(_schema->cf_name())))
, _time(utils::UUID_gen::get_time_UUID())
, _decomposed_time(timeuuid_type->decompose(_time))
, _streams(std::move(streams))
, _op_col(*_log_schema->get_column_definition(to_bytes("operation")))
{}
// TODO: is pre-image data based on query enough. We only have actual column data. Do we need
// more details like tombstones/ttl? Probably not but keep in mind.
mutation transform(const mutation& m, const cql3::untyped_result_set* rs = nullptr) const {
auto& t = m.token();
auto&& ep = _ctx._token_metadata.get_endpoint(
_ctx._token_metadata.first_token(t));
if (!ep) {
throw std::runtime_error(format("No owner found for key {}", m.decorated_key()));
}
auto shard_id = dht::murmur3_partitioner(_ctx._snitch->get_shard_count(*ep), _ctx._snitch->get_ignore_msb_bits(*ep)).shard_of(t);
mutation res(_log_schema, stream_id(ep->addr(), shard_id));
auto& p = m.partition();
if (p.partition_tombstone()) {
// Partition deletion
auto log_ck = set_pk_columns(m.key(), 0, res);
set_operation(log_ck, operation::partition_delete, res);
} else if (!p.row_tombstones().empty()) {
// range deletion
int batch_no = 0;
for (auto& rt : p.row_tombstones()) {
auto set_bound = [&] (const clustering_key& log_ck, const clustering_key_prefix& ckp) {
auto exploded = ckp.explode(*_schema);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
if (pos >= exploded.size()) {
break;
}
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
auto value = atomic_cell::make_live(*column.type,
_time.timestamp(),
bytes_view(exploded[pos]));
res.set_cell(log_ck, *cdef, std::move(value));
++pos;
}
};
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.start);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_start, res);
++batch_no;
}
{
auto log_ck = set_pk_columns(m.key(), batch_no, res);
set_bound(log_ck, rt.end);
// TODO: separate inclusive/exclusive range
set_operation(log_ck, operation::range_delete_end, res);
++batch_no;
}
}
} else {
// should be update or deletion
int batch_no = 0;
for (const rows_entry& r : p.clustered_rows()) {
auto ck_value = r.key().explode(*_schema);
std::optional<clustering_key> pikey;
const cql3::untyped_result_set_row * pirow = nullptr;
if (rs) {
for (auto& utr : *rs) {
bool match = true;
for (auto& c : _schema->clustering_key_columns()) {
auto rv = utr.get_view(c.name_as_text());
auto cv = r.key().get_component(*_schema, c.component_index());
if (rv != cv) {
match = false;
break;
}
}
if (match) {
pikey = set_pk_columns(m.key(), batch_no, res);
set_operation(*pikey, operation::pre_image, res);
pirow = &utr;
++batch_no;
break;
}
}
}
auto log_ck = set_pk_columns(m.key(), batch_no, res);
size_t pos = 0;
for (const auto& column : _schema->clustering_key_columns()) {
assert (pos < ck_value.size());
auto cdef = _log_schema->get_column_definition(to_bytes("_" + column.name()));
res.set_cell(log_ck, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos])));
if (pirow) {
assert(pirow->has(column.name_as_text()));
res.set_cell(*pikey, *cdef, atomic_cell::make_live(*column.type, _time.timestamp(), bytes_view(ck_value[pos])));
}
++pos;
}
std::vector<bytes_opt> values(3);
auto process_cells = [&](const row& r, column_kind ckind) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection& cell) {
auto& cdef = _schema->column_at(ckind, id);
auto* dst = _log_schema->get_column_definition(to_bytes("_" + cdef.name()));
// todo: collections.
if (cdef.is_atomic()) {
column_op op;
values[1] = values[2] = std::nullopt;
auto view = cell.as_atomic_cell(cdef);
if (view.is_live()) {
op = column_op::set;
values[1] = view.value().linearize();
if (view.is_live_and_has_ttl()) {
values[2] = long_type->decompose(data_value(view.ttl().count()));
}
} else {
op = column_op::del;
}
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(op)));
res.set_cell(log_ck, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values)));
if (pirow && pirow->has(cdef.name_as_text())) {
values[0] = data_type_for<column_op_native_type>()->decompose(data_value(static_cast<column_op_native_type>(column_op::set)));
values[1] = pirow->get_blob(cdef.name_as_text());
values[2] = std::nullopt;
assert(std::addressof(res.partition().clustered_row(*_log_schema, *pikey)) != std::addressof(res.partition().clustered_row(*_log_schema, log_ck)));
assert(pikey->explode() != log_ck.explode());
res.set_cell(*pikey, *dst, atomic_cell::make_live(*dst->type, _time.timestamp(), tuple_type_impl::build_value(values)));
}
} else {
cdc_log.warn("Non-atomic cell ignored {}.{}:{}", _schema->ks_name(), _schema->cf_name(), cdef.name_as_text());
}
});
};
process_cells(r.row().cells(), column_kind::regular_column);
process_cells(p.static_row().get(), column_kind::static_column);
set_operation(log_ck, operation::update, res);
++batch_no;
}
}
return res;
}
static db::timeout_clock::time_point default_timeout() {
return db::timeout_clock::now() + 10s;
}
future<lw_shared_ptr<cql3::untyped_result_set>> pre_image_select(
service::storage_proxy& proxy,
service::client_state& client_state,
db::consistency_level cl,
const mutation& m)
{
auto& p = m.partition();
if (p.partition_tombstone() || !p.row_tombstones().empty() || p.clustered_rows().empty()) {
return make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>();
}
dht::partition_range_vector partition_ranges{dht::partition_range(m.decorated_key())};
auto&& pc = _schema->partition_key_columns();
auto&& cc = _schema->clustering_key_columns();
std::vector<query::clustering_range> bounds;
if (cc.empty()) {
bounds.push_back(query::clustering_range::make_open_ended_both_sides());
} else {
for (const rows_entry& r : p.clustered_rows()) {
auto& ck = r.key();
bounds.push_back(query::clustering_range::make_singular(ck));
}
}
std::vector<const column_definition*> columns;
columns.reserve(_schema->all_columns().size());
std::transform(pc.begin(), pc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
std::transform(cc.begin(), cc.end(), std::back_inserter(columns), [](auto& c) { return &c; });
query::column_id_vector static_columns, regular_columns;
auto sk = column_kind::static_column;
auto rk = column_kind::regular_column;
// TODO: this assumes all mutations touch the same set of columns. This might not be true, and we may need to do more horrible set operation here.
for (auto& [r, cids, kind] : { std::tie(p.static_row().get(), static_columns, sk), std::tie(p.clustered_rows().begin()->row().cells(), regular_columns, rk) }) {
r.for_each_cell([&](column_id id, const atomic_cell_or_collection&) {
auto& cdef =_schema->column_at(kind, id);
cids.emplace_back(id);
columns.emplace_back(&cdef);
});
}
auto selection = cql3::selection::selection::for_columns(_schema, std::move(columns));
auto partition_slice = query::partition_slice(std::move(bounds), std::move(static_columns), std::move(regular_columns), selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(), partition_slice, query::max_partitions);
return proxy.query(_schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), empty_service_permit(), client_state)).then(
[this, partition_slice = std::move(partition_slice), selection = std::move(selection)] (service::storage_proxy::coordinator_query_result qr) -> lw_shared_ptr<cql3::untyped_result_set> {
cql3::selection::result_set_builder builder(*selection, gc_clock::now(), cql_serialization_format::latest());
query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *_schema, *selection));
auto result_set = builder.build();
if (!result_set || result_set->empty()) {
return {};
}
return make_lw_shared<cql3::untyped_result_set>(*result_set);
});
}
};
// This class is used to build a mapping from <node ip, shard id> to stream_id
// It is used as a consumer for rows returned by the query to CDC Description Table
class streams_builder {
const schema& _schema;
transformer::streams_type _streams;
net::inet_address _node_ip = net::inet_address();
unsigned int _shard_id = 0;
api::timestamp_type _latest_row_timestamp = api::min_timestamp;
utils::UUID _latest_row_stream_id = utils::UUID();
public:
streams_builder(const schema& s) : _schema(s) {}
void accept_new_partition(const partition_key& key, uint32_t row_count) {
auto exploded = key.explode(_schema);
_node_ip = value_cast<net::inet_address>(inet_addr_type->deserialize(exploded[0]));
_shard_id = static_cast<unsigned int>(value_cast<int>(int32_type->deserialize(exploded[1])));
_latest_row_timestamp = api::min_timestamp;
_latest_row_stream_id = utils::UUID();
}
void accept_new_partition(uint32_t row_count) {
assert(false);
}
void accept_new_row(
const clustering_key& key,
const query::result_row_view& static_row,
const query::result_row_view& row) {
auto row_iterator = row.iterator();
api::timestamp_type timestamp = value_cast<db_clock::time_point>(
timestamp_type->deserialize(key.explode(_schema)[0])).time_since_epoch().count();
if (timestamp <= _latest_row_timestamp) {
return;
}
_latest_row_timestamp = timestamp;
for (auto&& cdef : _schema.regular_columns()) {
if (cdef.name_as_text() != "stream_id") {
row_iterator.skip(cdef);
continue;
}
auto val_opt = row_iterator.next_atomic_cell();
assert(val_opt);
val_opt->value().with_linearized([&] (bytes_view bv) {
_latest_row_stream_id = value_cast<utils::UUID>(uuid_type->deserialize(bv));
});
}
}
void accept_new_row(const query::result_row_view& static_row, const query::result_row_view& row) {
assert(false);
}
void accept_partition_end(const query::result_row_view& static_row) {
_streams.emplace(std::make_pair(_node_ip, _shard_id), _latest_row_stream_id);
}
transformer::streams_type build() {
return std::move(_streams);
}
};
static future<::shared_ptr<transformer::streams_type>> get_streams(
db_context ctx,
const sstring& ks_name,
const sstring& cf_name,
lowres_clock::time_point timeout,
service::query_state& qs) {
auto s =
ctx._proxy.get_db().local().find_schema(ks_name, desc_name(cf_name));
query::read_command cmd(
s->id(),
s->version(),
partition_slice_builder(*s).with_no_static_columns().build());
return ctx._proxy.query(
s,
make_lw_shared(std::move(cmd)),
{dht::partition_range::make_open_ended_both_sides()},
db::consistency_level::QUORUM,
{timeout, qs.get_permit(), qs.get_client_state()}).then([s = std::move(s)] (auto qr) mutable {
return query::result_view::do_with(*qr.query_result,
[s = std::move(s)] (query::result_view v) {
auto slice = partition_slice_builder(*s)
.with_no_static_columns()
.build();
streams_builder builder{ *s };
v.consume(slice, builder);
return ::make_shared<transformer::streams_type>(builder.build());
});
});
}
future<std::vector<mutation>> append_log_mutations(
db_context ctx,
schema_ptr s,
service::storage_proxy::clock_type::time_point timeout,
service::query_state& qs,
std::vector<mutation> muts) {
auto mp = ::make_lw_shared<std::vector<mutation>>(std::move(muts));
return get_streams(ctx, s->ks_name(), s->cf_name(), timeout, qs).then([ctx, s = std::move(s), mp, &qs](::shared_ptr<transformer::streams_type> streams) mutable {
mp->reserve(2 * mp->size());
auto trans = make_lw_shared<transformer>(ctx, s, std::move(streams));
auto i = mp->begin();
auto e = mp->end();
return parallel_for_each(i, e, [ctx, &qs, trans, mp](mutation& m) {
return trans->pre_image_select(ctx._proxy, qs.get_client_state(), db::consistency_level::LOCAL_QUORUM, m).then([trans, mp, &m](lw_shared_ptr<cql3::untyped_result_set> rs) {
mp->push_back(trans->transform(m, rs.get()));
});
}).then([mp] {
return std::move(*mp);
});
});
}
} // namespace cdc

View File

@@ -1,233 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <optional>
#include <map>
#include <string>
#include <vector>
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "json.hh"
#include "timestamp.hh"
class schema;
using schema_ptr = seastar::lw_shared_ptr<const schema>;
namespace locator {
class snitch_ptr;
class token_metadata;
} // namespace locator
namespace service {
class migration_manager;
class storage_proxy;
class query_state;
} // namespace service
namespace dht {
class i_partitioner;
} // namespace dht
class mutation;
class partition_key;
namespace cdc {
class options final {
bool _enabled = false;
bool _preimage = false;
bool _postimage = false;
int _ttl = 86400; // 24h in seconds
public:
options() = default;
options(const std::map<sstring, sstring>& map) {
if (map.find("enabled") == std::end(map)) {
return;
}
for (auto& p : map) {
if (p.first == "enabled") {
_enabled = p.second == "true";
} else if (p.first == "preimage") {
_preimage = p.second == "true";
} else if (p.first == "postimage") {
_postimage = p.second == "true";
} else if (p.first == "ttl") {
_ttl = std::stoi(p.second);
} else {
throw exceptions::configuration_exception("Invalid CDC option: " + p.first);
}
}
}
std::map<sstring, sstring> to_map() const {
if (!_enabled) {
return {};
}
return {
{ "enabled", _enabled ? "true" : "false" },
{ "preimage", _preimage ? "true" : "false" },
{ "postimage", _postimage ? "true" : "false" },
{ "ttl", std::to_string(_ttl) },
};
}
sstring to_sstring() const {
return json::to_json(to_map());
}
bool enabled() const { return _enabled; }
bool preimage() const { return _preimage; }
bool postimage() const { return _postimage; }
int ttl() const { return _ttl; }
bool operator==(const options& o) const {
return _enabled == o._enabled && _preimage == o._preimage && _postimage == o._postimage && _ttl == o._ttl;
}
bool operator!=(const options& o) const {
return !(*this == o);
}
};
struct db_context final {
service::storage_proxy& _proxy;
service::migration_manager& _migration_manager;
locator::token_metadata& _token_metadata;
locator::snitch_ptr& _snitch;
dht::i_partitioner& _partitioner;
class builder final {
service::storage_proxy& _proxy;
std::optional<std::reference_wrapper<service::migration_manager>> _migration_manager;
std::optional<std::reference_wrapper<locator::token_metadata>> _token_metadata;
std::optional<std::reference_wrapper<locator::snitch_ptr>> _snitch;
std::optional<std::reference_wrapper<dht::i_partitioner>> _partitioner;
public:
builder(service::storage_proxy& proxy) : _proxy(proxy) { }
builder& with_migration_manager(service::migration_manager& migration_manager) {
_migration_manager = migration_manager;
return *this;
}
builder& with_token_metadata(locator::token_metadata& token_metadata) {
_token_metadata = token_metadata;
return *this;
}
builder& with_snitch(locator::snitch_ptr& snitch) {
_snitch = snitch;
return *this;
}
builder& with_partitioner(dht::i_partitioner& partitioner) {
_partitioner = partitioner;
return *this;
}
db_context build();
};
};
/// \brief Sets up CDC related tables for a given table
///
/// This function not only creates CDC Log and CDC Description for a given table
/// but also populates CDC Description with a list of change streams.
///
/// param[in] ctx object with references to database components
/// param[in] schema schema of a table for which CDC tables are being created
seastar::future<> setup(db_context ctx, schema_ptr schema);
// cdc log table operation
enum class operation : int8_t {
// note: these values will eventually be read by a third party, probably not privvy to this
// enum decl, so don't change the constant values (or the datatype).
pre_image = 0, update = 1, row_delete = 2, range_delete_start = 3, range_delete_end = 4, partition_delete = 5
};
// cdc log data column operation
enum class column_op : int8_t {
// same as "operation". Do not edit values or type/type unless you _really_ want to.
set = 0, del = 1, add = 2,
};
/// \brief Deletes CDC Log and CDC Description tables for a given table
///
/// This function cleans up all CDC related tables created for a given table.
/// At the moment, CDC Log and CDC Description are the only affected tables.
/// It's ok if some/all of them don't exist.
///
/// \param[in] ctx object with references to database components
/// \param[in] ks_name keyspace name of a table for which CDC tables are removed
/// \param[in] table_name name of a table for which CDC tables are removed
///
/// \pre This function works correctly no matter if CDC Log and/or CDC Description
/// exist.
seastar::future<>
remove(db_context ctx, const seastar::sstring& ks_name, const seastar::sstring& table_name);
seastar::sstring log_name(const seastar::sstring& table_name);
seastar::sstring desc_name(const seastar::sstring& table_name);
/// \brief For each mutation in the set appends related CDC Log mutation
///
/// This function should be called with a set of mutations of a table
/// with CDC enabled. Returned set of mutations contains all original mutations
/// and for each original mutation appends a mutation to CDC Log that reflects
/// the change.
///
/// \param[in] ctx object with references to database components
/// \param[in] s schema of a CDC enabled table which is being modified
/// \param[in] timeout period of time after which a request is considered timed out
/// \param[in] qs the state of the query that's being executed
/// \param[in] mutations set of changes of a CDC enabled table
///
/// \return set of mutations from input parameter with relevant CDC Log mutations appended
///
/// \pre CDC Log and CDC Description have to exist
/// \pre CDC Description has to be in sync with cluster topology
///
/// \note At the moment, cluster topology changes are not supported
// so the assumption that CDC Description is in sync with cluster topology
// is easy to enforce. When support for cluster topology changes is added
// it has to make sure the assumption holds.
seastar::future<std::vector<mutation>>append_log_mutations(
db_context ctx,
schema_ptr s,
lowres_clock::time_point timeout,
service::query_state& qs,
std::vector<mutation> mutations);
} // namespace cdc

View File

@@ -68,7 +68,7 @@ public:
public:
explicit iterator(const mutation_partition& mp)
: _mp(mp)
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row().get())
, _current(position_in_partition_view(position_in_partition_view::static_row_tag_t()), mp.static_row())
{ }
iterator(const mutation_partition& mp, mutation_partition::rows_type::const_iterator it)

3
coding-style.md Normal file
View File

@@ -0,0 +1,3 @@
# Scylla Coding Style
Please see the [Seastar style document](https://github.com/scylladb/seastar/blob/master/coding-style.md).

View File

@@ -1,440 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "types/collection.hh"
#include "types/user.hh"
#include "concrete_types.hh"
#include "atomic_cell_or_collection.hh"
#include "mutation_partition.hh"
#include "compaction_garbage_collector.hh"
#include "combine.hh"
#include "collection_mutation.hh"
collection_mutation::collection_mutation(const abstract_type& type, collection_mutation_view v)
: _data(imr_object_type::make(data::cell::make_collection(v.data), &type.imr_state().lsa_migrator())) {}
collection_mutation::collection_mutation(const abstract_type& type, bytes_view v)
: _data(imr_object_type::make(data::cell::make_collection(v), &type.imr_state().lsa_migrator())) {}
static collection_mutation_view get_collection_mutation_view(const uint8_t* ptr)
{
auto f = data::cell::structure::get_member<data::cell::tags::flags>(ptr);
auto ti = data::type_info::make_collection();
data::cell::context ctx(f, ti);
auto view = data::cell::structure::get_member<data::cell::tags::cell>(ptr).as<data::cell::tags::collection>(ctx);
auto dv = data::cell::variable_value::make_view(view, f.get<data::cell::tags::external_data>());
return collection_mutation_view { dv };
}
collection_mutation::operator collection_mutation_view() const
{
return get_collection_mutation_view(_data.get());
}
collection_mutation_view atomic_cell_or_collection::as_collection_mutation() const {
return get_collection_mutation_view(_data.get());
}
bool collection_mutation_view::is_empty() const {
return data.with_linearized([&] (bytes_view in) { // FIXME: we can guarantee that this is in the first fragment
auto has_tomb = read_simple<bool>(in);
return !has_tomb && read_simple<uint32_t>(in) == 0;
});
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, bytes_view&>)
static bool is_any_live(const atomic_cell_value_view& data, tombstone tomb, gc_clock::time_point now, F&& read_cell_type_info) {
return data.with_linearized([&] (bytes_view in) {
auto has_tomb = read_simple<bool>(in);
if (has_tomb) {
auto ts = read_simple<api::timestamp_type>(in);
auto ttl = read_simple<gc_clock::duration::rep>(in);
tomb.apply(tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))});
}
auto nr = read_simple<uint32_t>(in);
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
if (value.is_live(tomb, now, false)) {
return true;
}
}
return false;
});
}
bool collection_mutation_view::is_any_live(const abstract_type& type, tombstone tomb, gc_clock::time_point now) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::is_any_live(data, tomb, now, [&type_info] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
in.remove_prefix(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::is_any_live(data, tomb, now, [&utype] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> bool {
throw std::runtime_error(format("collection_mutation_view::is_any_live: unknown type {}", o.name()));
}
));
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<const data::type_info&, F, bytes_view&>)
static api::timestamp_type last_update(const atomic_cell_value_view& data, F&& read_cell_type_info) {
return data.with_linearized([&] (bytes_view in) {
api::timestamp_type max = api::missing_timestamp;
auto has_tomb = read_simple<bool>(in);
if (has_tomb) {
max = std::max(max, read_simple<api::timestamp_type>(in));
(void)read_simple<gc_clock::duration::rep>(in);
}
auto nr = read_simple<uint32_t>(in);
for (uint32_t i = 0; i != nr; ++i) {
auto& type_info = read_cell_type_info(in);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
max = std::max(value.timestamp(), max);
}
return max;
});
}
api::timestamp_type collection_mutation_view::last_update(const abstract_type& type) const {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return ::last_update(data, [&type_info] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
in.remove_prefix(key_size);
return type_info;
});
},
[&] (const user_type_impl& utype) {
return ::last_update(data, [&utype] (bytes_view& in) -> const data::type_info& {
auto key_size = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, key_size);
return utype.type(deserialize_field_index(key))->imr_state().type_info();
});
},
[&] (const abstract_type& o) -> api::timestamp_type {
throw std::runtime_error(format("collection_mutation_view::last_update: unknown type {}", o.name()));
}
));
}
collection_mutation_description
collection_mutation_view_description::materialize(const abstract_type& type) const {
collection_mutation_description m;
m.tomb = tomb;
m.cells.reserve(cells.size());
visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
auto& value_type = *ctype.value_comparator();
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(value_type, e.second));
}
},
[&] (const user_type_impl& utype) {
for (auto&& e : cells) {
m.cells.emplace_back(to_bytes(e.first), atomic_cell(*utype.type(deserialize_field_index(e.first)), e.second));
}
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("attempted to materialize collection_mutation_view_description with type {}", o.name()));
}
));
return m;
}
bool collection_mutation_description::compact_and_expire(column_id id, row_tombstone base_tomb, gc_clock::time_point query_time,
can_gc_fn& can_gc, gc_clock::time_point gc_before, compaction_garbage_collector* collector)
{
bool any_live = false;
auto t = tomb;
tombstone purged_tomb;
if (tomb <= base_tomb.regular()) {
tomb = tombstone();
} else if (tomb.deletion_time < gc_before && can_gc(tomb)) {
purged_tomb = tomb;
tomb = tombstone();
}
t.apply(base_tomb.regular());
utils::chunked_vector<std::pair<bytes, atomic_cell>> survivors;
utils::chunked_vector<std::pair<bytes, atomic_cell>> losers;
for (auto&& name_and_cell : cells) {
atomic_cell& cell = name_and_cell.second;
auto cannot_erase_cell = [&] {
return cell.deletion_time() >= gc_before || !can_gc(tombstone(cell.timestamp(), cell.deletion_time()));
};
if (cell.is_covered_by(t, false) || cell.is_covered_by(base_tomb.shadowable().tomb(), false)) {
continue;
}
if (cell.has_expired(query_time)) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::make_pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
} else if (collector) {
losers.emplace_back(std::pair(
std::move(name_and_cell.first), atomic_cell::make_dead(cell.timestamp(), cell.deletion_time())));
}
} else if (!cell.is_live()) {
if (cannot_erase_cell()) {
survivors.emplace_back(std::move(name_and_cell));
} else if (collector) {
losers.emplace_back(std::move(name_and_cell));
}
} else {
any_live |= true;
survivors.emplace_back(std::move(name_and_cell));
}
}
if (collector) {
collector->collect(id, collection_mutation_description{purged_tomb, std::move(losers)});
}
cells = std::move(survivors);
return any_live;
}
template <typename Iterator>
static collection_mutation serialize_collection_mutation(
const abstract_type& type,
const tombstone& tomb,
boost::iterator_range<Iterator> cells) {
auto element_size = [] (size_t c, auto&& e) -> size_t {
return c + 8 + e.first.size() + e.second.serialize().size();
};
auto size = accumulate(cells, (size_t)4, element_size);
size += 1;
if (tomb) {
size += sizeof(tomb.timestamp) + sizeof(tomb.deletion_time);
}
bytes ret(bytes::initialized_later(), size);
bytes::iterator out = ret.begin();
*out++ = bool(tomb);
if (tomb) {
write(out, tomb.timestamp);
write(out, tomb.deletion_time.time_since_epoch().count());
}
auto writeb = [&out] (bytes_view v) {
serialize_int32(out, v.size());
out = std::copy_n(v.begin(), v.size(), out);
};
// FIXME: overflow?
serialize_int32(out, boost::distance(cells));
for (auto&& kv : cells) {
auto&& k = kv.first;
auto&& v = kv.second;
writeb(k);
writeb(v.serialize());
}
return collection_mutation(type, ret);
}
collection_mutation collection_mutation_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
collection_mutation collection_mutation_view_description::serialize(const abstract_type& type) const {
return serialize_collection_mutation(type, tomb, boost::make_iterator_range(cells.begin(), cells.end()));
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
merge(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type) {
using element_type = std::pair<bytes_view, atomic_cell_view>;
auto compare = [&] (const element_type& e1, const element_type& e2) {
return key_type.less(e1.first, e2.first);
};
auto merge = [] (const element_type& e1, const element_type& e2) {
// FIXME: use std::max()?
return std::make_pair(e1.first, compare_atomic_cell_for_merge(e1.second, e2.second) > 0 ? e1.second : e2.second);
};
// applied to a tombstone, returns a predicate checking whether a cell is killed by
// the tombstone
auto cell_killed = [] (const std::optional<tombstone>& t) {
return [&t] (const element_type& e) {
if (!t) {
return false;
}
// tombstone wins if timestamps equal here, unlike row tombstones
if (t->timestamp < e.second.timestamp()) {
return false;
}
return true;
// FIXME: should we consider TTLs too?
};
};
collection_mutation_view_description merged;
merged.cells.reserve(a.cells.size() + b.cells.size());
combine(a.cells.begin(), std::remove_if(a.cells.begin(), a.cells.end(), cell_killed(b.tomb)),
b.cells.begin(), std::remove_if(b.cells.begin(), b.cells.end(), cell_killed(a.tomb)),
std::back_inserter(merged.cells),
compare,
merge);
merged.tomb = std::max(a.tomb, b.tomb);
return merged;
}
collection_mutation merge(const abstract_type& type, collection_mutation_view a, collection_mutation_view b) {
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return merge(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return merge(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation merge: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename C>
GCC6_CONCEPT(requires std::is_base_of_v<abstract_type, std::remove_reference_t<C>>)
static collection_mutation_view_description
difference(collection_mutation_view_description a, collection_mutation_view_description b, C&& key_type)
{
collection_mutation_view_description diff;
diff.cells.reserve(std::max(a.cells.size(), b.cells.size()));
auto it = b.cells.begin();
for (auto&& c : a.cells) {
while (it != b.cells.end() && key_type.less(it->first, c.first)) {
++it;
}
if (it == b.cells.end() || !key_type.equal(it->first, c.first)
|| compare_atomic_cell_for_merge(c.second, it->second) > 0) {
auto cell = std::make_pair(c.first, c.second);
diff.cells.emplace_back(std::move(cell));
}
}
if (a.tomb > b.tomb) {
diff.tomb = a.tomb;
}
return diff;
}
collection_mutation difference(const abstract_type& type, collection_mutation_view a, collection_mutation_view b)
{
return a.with_deserialized(type, [&] (collection_mutation_view_description a_view) {
return b.with_deserialized(type, [&] (collection_mutation_view_description b_view) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
return difference(std::move(a_view), std::move(b_view), *ctype.name_comparator());
},
[&] (const user_type_impl& utype) {
return difference(std::move(a_view), std::move(b_view), *short_type);
},
[] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("collection_mutation difference: unknown type: {}", o.name()));
}
)).serialize(type);
});
});
}
template <typename F>
GCC6_CONCEPT(requires std::is_invocable_r_v<std::pair<bytes_view, atomic_cell_view>, F, bytes_view&>)
static collection_mutation_view_description
deserialize_collection_mutation(bytes_view in, F&& read_kv) {
collection_mutation_view_description ret;
auto has_tomb = read_simple<bool>(in);
if (has_tomb) {
auto ts = read_simple<api::timestamp_type>(in);
auto ttl = read_simple<gc_clock::duration::rep>(in);
ret.tomb = tombstone{ts, gc_clock::time_point(gc_clock::duration(ttl))};
}
auto nr = read_simple<uint32_t>(in);
ret.cells.reserve(nr);
for (uint32_t i = 0; i != nr; ++i) {
ret.cells.push_back(read_kv(in));
}
assert(in.empty());
return ret;
}
collection_mutation_view_description
deserialize_collection_mutation(const abstract_type& type, bytes_view in) {
return visit(type, make_visitor(
[&] (const collection_type_impl& ctype) {
// value_comparator(), ugh
auto& type_info = ctype.value_comparator()->imr_state().type_info();
return deserialize_collection_mutation(in, [&type_info] (bytes_view& in) {
// FIXME: we could probably avoid the need for size
auto ksize = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, ksize);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(type_info, read_simple_bytes(in, vsize));
return std::make_pair(key, value);
});
},
[&] (const user_type_impl& utype) {
return deserialize_collection_mutation(in, [&utype] (bytes_view& in) {
// FIXME: we could probably avoid the need for size
auto ksize = read_simple<uint32_t>(in);
auto key = read_simple_bytes(in, ksize);
auto vsize = read_simple<uint32_t>(in);
auto value = atomic_cell_view::from_bytes(
utype.type(deserialize_field_index(key))->imr_state().type_info(), read_simple_bytes(in, vsize));
return std::make_pair(key, value);
});
},
[&] (const abstract_type& o) -> collection_mutation_view_description {
throw std::runtime_error(format("deserialize_collection_mutation: unknown type {}", o.name()));
}
));
}

View File

@@ -1,124 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "utils/chunked_vector.hh"
#include "schema_fwd.hh"
#include "gc_clock.hh"
#include "atomic_cell.hh"
#include "cql_serialization_format.hh"
class abstract_type;
class compaction_garbage_collector;
class row_tombstone;
class collection_mutation;
// An auxiliary struct used to (de)construct collection_mutations.
// Unlike collection_mutation which is a serialized blob, this struct allows to inspect logical units of information
// (tombstone and cells) inside the mutation easily.
struct collection_mutation_description {
tombstone tomb;
// FIXME: use iterators?
// we never iterate over `cells` more than once, so there is no need to store them in memory.
// In some cases instead of constructing the `cells` vector, it would be more efficient to provide
// a one-time-use forward iterator which returns the cells.
utils::chunked_vector<std::pair<bytes, atomic_cell>> cells;
// Expires cells based on query_time. Expires tombstones based on max_purgeable and gc_before.
// Removes cells covered by tomb or this->tomb.
bool compact_and_expire(column_id id, row_tombstone tomb, gc_clock::time_point query_time,
can_gc_fn&, gc_clock::time_point gc_before, compaction_garbage_collector* collector = nullptr);
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
// Similar to collection_mutation_description, except that it doesn't store the cells' data, only observes it.
struct collection_mutation_view_description {
tombstone tomb;
// FIXME: use iterators? See the fixme in collection_mutation_description; the same considerations apply here.
utils::chunked_vector<std::pair<bytes_view, atomic_cell_view>> cells;
// Copies the observed data, storing it in a collection_mutation_description.
collection_mutation_description materialize(const abstract_type&) const;
// Packs the data to a serialized blob.
collection_mutation serialize(const abstract_type&) const;
};
// Given a linearized collection_mutation_view, returns an auxiliary struct allowing the inspection of each cell.
// The struct is an observer of the data given by the collection_mutation_view and doesn't extend its lifetime.
// The function needs to be given the type of stored data to reconstruct the structural information.
collection_mutation_view_description deserialize_collection_mutation(const abstract_type&, bytes_view);
class collection_mutation_view {
public:
atomic_cell_value_view data;
// Is this a noop mutation?
bool is_empty() const;
// Is any of the stored cells live (not deleted nor expired) at the time point `tp`,
// given the later of the tombstones `t` and the one stored in the mutation (if any)?
// Requires a type to reconstruct the structural information.
bool is_any_live(const abstract_type&, tombstone t = tombstone(), gc_clock::time_point tp = gc_clock::time_point::min()) const;
// The maximum of timestamps of the mutation's cells and tombstone.
api::timestamp_type last_update(const abstract_type&) const;
// Given a function that operates on a collection_mutation_view_description,
// calls it on the corresponding description of `this`.
template <typename F>
inline decltype(auto) with_deserialized(const abstract_type& type, F f) const {
return data.with_linearized([&] (bytes_view bv) {
return f(deserialize_collection_mutation(type, std::move(bv)));
});
}
};
// A serialized mutation of a collection of cells.
// Used to represent mutations of collections (lists, maps, sets) or non-frozen user defined types.
// It contains a sequence of cells, each representing a mutation of a single entry (element or field) of the collection.
// Each cell has an associated 'key' (or 'path'). The meaning of each (key, cell) pair is:
// for sets: the key is the serialized set element, the cell contains no data (except liveness information),
// for maps: the key is the serialized map element's key, the cell contains the serialized map element's value,
// for lists: the key is a timeuuid identifying the list entry, the cell contains the serialized value,
// for user types: the key is an index identifying the field, the cell contains the value of the field.
// The mutation may also contain a collection-wide tombstone.
class collection_mutation {
public:
using imr_object_type = imr::utils::object<data::cell::structure>;
imr_object_type _data;
collection_mutation() {}
collection_mutation(const abstract_type&, collection_mutation_view);
collection_mutation(const abstract_type&, bytes_view);
operator collection_mutation_view() const;
};
collection_mutation merge(const abstract_type&, collection_mutation_view, collection_mutation_view);
collection_mutation difference(const abstract_type&, collection_mutation_view, collection_mutation_view);
// Serializes the given collection of cells to a sequence of bytes ready to be sent over the CQL protocol.
bytes serialize_for_cql(const abstract_type&, collection_mutation_view, cql_serialization_format);

View File

@@ -1,63 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
class schema;
class partition_key;
class clustering_row;
class column_computation;
using column_computation_ptr = std::unique_ptr<column_computation>;
/*
* Column computation represents a computation performed in order to obtain a value for a computed column.
* Computed columns description is also available at docs/system_schema_keyspace.md. They hold values
* not provided directly by the user, but rather computed: from other column values and possibly other sources.
* This class is able to serialize/deserialize column computations and perform the computation itself,
* based on given schema, partition key and clustering row. Responsibility for providing enough data
* in the clustering row in order for computation to succeed belongs to the caller. In particular,
* generating a value might involve performing a read-before-write if the computation is performed
* on more values than are present in the update request.
*/
class column_computation {
public:
virtual ~column_computation() = default;
static column_computation_ptr deserialize(bytes_view raw);
static column_computation_ptr deserialize(const Json::Value& json);
virtual column_computation_ptr clone() const = 0;
virtual bytes serialize() const = 0;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const = 0;
};
class token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {
return std::make_unique<token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};

View File

@@ -1,36 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "schema.hh"
#include "collection_mutation.hh"
class atomic_cell;
class row_marker;
class compaction_garbage_collector {
public:
virtual ~compaction_garbage_collector() = default;
virtual void collect(column_id id, atomic_cell) = 0;
virtual void collect(column_id id, collection_mutation_description) = 0;
virtual void collect(row_marker) = 0;
};

View File

@@ -21,19 +21,14 @@
#pragma once
#include <seastar/core/future.hh>
#include <seastar/util/noncopyable_function.hh>
#include "schema_fwd.hh"
#include "sstables/shared_sstable.hh"
#include "exceptions/exceptions.hh"
#include "sstables/compaction_backlog_manager.hh"
class table;
using column_family = table;
class flat_mutation_reader;
struct mutation_source_metadata;
class schema;
using schema_ptr = lw_shared_ptr<const schema>;
namespace sstables {
@@ -52,8 +47,6 @@ class sstable_set;
struct compaction_descriptor;
struct resharding_descriptor;
using reader_consumer = noncopyable_function<future<> (flat_mutation_reader)>;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
public:
@@ -82,8 +75,8 @@ public:
// Return if optimization to rule out sstables based on clustering key filter should be applied.
bool use_clustering_key_filter() const;
// Return true if compaction strategy doesn't care if a sstable belonging to partial sstable run is compacted.
bool can_compact_partial_runs() const;
// Return true if compaction strategy ignores sstables coming from partial runs.
bool ignore_partial_runs() const;
// An estimation of number of compaction for strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;
@@ -136,10 +129,6 @@ public:
sstable_set make_sstable_set(schema_ptr schema) const;
compaction_backlog_tracker& get_backlog_tracker();
uint64_t adjust_partition_estimate(const mutation_source_metadata& ms_meta, uint64_t partition_estimate);
reader_consumer make_interposer_consumer(const mutation_source_metadata& ms_meta, reader_consumer end_consumer);
};
// Creates a compaction_strategy object from one of the strategies available.

View File

@@ -1,165 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <optional>
#include <variant>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const ::schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
const ::schema& schema() const {
return *_schema;
}
friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position {
schema_ptr _schema;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position> _rp;
public:
constexpr compatible_ring_position() = default;
compatible_ring_position(schema_ptr s, dht::ring_position rp)
: _schema(std::move(s)), _rp(std::move(rp)) {
}
dht::ring_position_view position() const {
return *_rp;
}
const ::schema& schema() const {
return *_schema;
}
friend int tri_compare(const compatible_ring_position& x, const compatible_ring_position& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rp, *y._rp);
}
friend bool operator<(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position& x, const compatible_ring_position& y) {
return tri_compare(x, y) != 0;
}
};
// Wraps ring_position or ring_position_view so either is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
// The motivations for supporting both types are to make containers self-sufficient by not relying
// on callers to keep ring position alive, allow lookup on containers that don't support different
// key types, and also avoiding unnecessary copies.
class compatible_ring_position_or_view {
// Optional to supply a default constructor, no more.
std::optional<std::variant<compatible_ring_position, compatible_ring_position_view>> _crp_or_view;
public:
constexpr compatible_ring_position_or_view() = default;
explicit compatible_ring_position_or_view(schema_ptr s, dht::ring_position rp)
: _crp_or_view(compatible_ring_position(std::move(s), std::move(rp))) {
}
explicit compatible_ring_position_or_view(const schema& s, dht::ring_position_view rpv)
: _crp_or_view(compatible_ring_position_view(s, rpv)) {
}
dht::ring_position_view position() const {
struct rpv_accessor {
dht::ring_position_view operator()(const compatible_ring_position& crp) {
return crp.position();
}
dht::ring_position_view operator()(const compatible_ring_position_view& crpv) {
return crpv.position();
}
};
return std::visit(rpv_accessor{}, *_crp_or_view);
}
friend int tri_compare(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
struct schema_accessor {
const ::schema& operator()(const compatible_ring_position& crp) {
return crp.schema();
}
const ::schema& operator()(const compatible_ring_position_view& crpv) {
return crpv.schema();
}
};
return dht::ring_position_tri_compare(std::visit(schema_accessor{}, *x._crp_or_view), x.position(), y.position());
}
friend bool operator<(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_or_view& x, const compatible_ring_position_or_view& y) {
return tri_compare(x, y) != 0;
}
};

View File

@@ -0,0 +1,64 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "query-request.hh"
#include <optional>
// Wraps ring_position_view so it is compatible with old-style C++: default
// constructor, stateless comparators, yada yada.
class compatible_ring_position_view {
const schema* _schema = nullptr;
// Optional to supply a default constructor, no more.
std::optional<dht::ring_position_view> _rpv;
public:
constexpr compatible_ring_position_view() = default;
compatible_ring_position_view(const schema& s, dht::ring_position_view rpv)
: _schema(&s), _rpv(rpv) {
}
const dht::ring_position_view& position() const {
return *_rpv;
}
friend int tri_compare(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return dht::ring_position_tri_compare(*x._schema, *x._rpv, *y._rpv);
}
friend bool operator<(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) < 0;
}
friend bool operator<=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) <= 0;
}
friend bool operator>(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) > 0;
}
friend bool operator>=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) >= 0;
}
friend bool operator==(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) == 0;
}
friend bool operator!=(const compatible_ring_position_view& x, const compatible_ring_position_view& y) {
return tri_compare(x, y) != 0;
}
};

View File

@@ -1,243 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <seastar/net/inet_address.hh>
#include "types.hh"
#include "types/list.hh"
#include "types/map.hh"
#include "types/set.hh"
#include "types/tuple.hh"
#include "types/user.hh"
#include "utils/big_decimal.hh"
struct empty_type_impl final : public abstract_type {
empty_type_impl();
};
struct counter_type_impl final : public abstract_type {
counter_type_impl();
};
template <typename T>
struct simple_type_impl : public concrete_type<T> {
simple_type_impl(abstract_type::kind k, sstring name, std::optional<uint32_t> value_length_if_fixed);
};
template<typename T>
struct integer_type_impl : public simple_type_impl<T> {
integer_type_impl(abstract_type::kind k, sstring name, std::optional<uint32_t> value_length_if_fixed);
};
struct byte_type_impl final : public integer_type_impl<int8_t> {
byte_type_impl();
};
struct short_type_impl final : public integer_type_impl<int16_t> {
short_type_impl();
};
struct int32_type_impl final : public integer_type_impl<int32_t> {
int32_type_impl();
};
struct long_type_impl final : public integer_type_impl<int64_t> {
long_type_impl();
};
struct boolean_type_impl final : public simple_type_impl<bool> {
boolean_type_impl();
};
template <typename T>
struct floating_type_impl : public simple_type_impl<T> {
floating_type_impl(abstract_type::kind k, sstring name, std::optional<uint32_t> value_length_if_fixed);
};
struct double_type_impl final : public floating_type_impl<double> {
double_type_impl();
};
struct float_type_impl final : public floating_type_impl<float> {
float_type_impl();
};
struct decimal_type_impl final : public concrete_type<big_decimal> {
decimal_type_impl();
};
struct duration_type_impl final : public concrete_type<cql_duration> {
duration_type_impl();
};
struct timestamp_type_impl final : public simple_type_impl<db_clock::time_point> {
timestamp_type_impl();
};
struct simple_date_type_impl final : public simple_type_impl<uint32_t> {
simple_date_type_impl();
};
struct time_type_impl final : public simple_type_impl<int64_t> {
time_type_impl();
};
struct string_type_impl : public concrete_type<sstring> {
string_type_impl(kind k, sstring name);
};
struct ascii_type_impl final : public string_type_impl {
ascii_type_impl();
};
struct utf8_type_impl final : public string_type_impl {
utf8_type_impl();
};
struct bytes_type_impl final : public concrete_type<bytes> {
bytes_type_impl();
};
// This is the old version of timestamp_type_impl, but has been replaced as it
// wasn't comparing pre-epoch timestamps correctly. This is kept for backward
// compatibility but shouldn't be used in new code.
struct date_type_impl final : public concrete_type<db_clock::time_point> {
date_type_impl();
};
using timestamp_date_base_class = concrete_type<db_clock::time_point>;
struct timeuuid_type_impl final : public concrete_type<utils::UUID> {
timeuuid_type_impl();
};
struct varint_type_impl final : public concrete_type<boost::multiprecision::cpp_int> {
varint_type_impl();
};
struct inet_addr_type_impl final : public concrete_type<seastar::net::inet_address> {
inet_addr_type_impl();
};
struct uuid_type_impl final : public concrete_type<utils::UUID> {
uuid_type_impl();
};
template <typename Func> using visit_ret_type = std::invoke_result_t<Func, const ascii_type_impl&>;
GCC6_CONCEPT(
template <typename Func> concept bool CanHandleAllTypes = requires(Func f) {
{ f(*static_cast<const ascii_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const boolean_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const byte_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const bytes_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const counter_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const date_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const decimal_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const double_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const duration_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const empty_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const float_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const inet_addr_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const int32_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const list_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const long_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const map_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const reversed_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const set_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const short_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const simple_date_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const time_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const timestamp_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const timeuuid_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const tuple_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const user_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const utf8_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const uuid_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
{ f(*static_cast<const varint_type_impl*>(nullptr)) } -> visit_ret_type<Func>;
};
)
template<typename Func>
GCC6_CONCEPT(requires CanHandleAllTypes<Func>)
static inline visit_ret_type<Func> visit(const abstract_type& t, Func&& f) {
switch (t.get_kind()) {
case abstract_type::kind::ascii:
return f(*static_cast<const ascii_type_impl*>(&t));
case abstract_type::kind::boolean:
return f(*static_cast<const boolean_type_impl*>(&t));
case abstract_type::kind::byte:
return f(*static_cast<const byte_type_impl*>(&t));
case abstract_type::kind::bytes:
return f(*static_cast<const bytes_type_impl*>(&t));
case abstract_type::kind::counter:
return f(*static_cast<const counter_type_impl*>(&t));
case abstract_type::kind::date:
return f(*static_cast<const date_type_impl*>(&t));
case abstract_type::kind::decimal:
return f(*static_cast<const decimal_type_impl*>(&t));
case abstract_type::kind::double_kind:
return f(*static_cast<const double_type_impl*>(&t));
case abstract_type::kind::duration:
return f(*static_cast<const duration_type_impl*>(&t));
case abstract_type::kind::empty:
return f(*static_cast<const empty_type_impl*>(&t));
case abstract_type::kind::float_kind:
return f(*static_cast<const float_type_impl*>(&t));
case abstract_type::kind::inet:
return f(*static_cast<const inet_addr_type_impl*>(&t));
case abstract_type::kind::int32:
return f(*static_cast<const int32_type_impl*>(&t));
case abstract_type::kind::list:
return f(*static_cast<const list_type_impl*>(&t));
case abstract_type::kind::long_kind:
return f(*static_cast<const long_type_impl*>(&t));
case abstract_type::kind::map:
return f(*static_cast<const map_type_impl*>(&t));
case abstract_type::kind::reversed:
return f(*static_cast<const reversed_type_impl*>(&t));
case abstract_type::kind::set:
return f(*static_cast<const set_type_impl*>(&t));
case abstract_type::kind::short_kind:
return f(*static_cast<const short_type_impl*>(&t));
case abstract_type::kind::simple_date:
return f(*static_cast<const simple_date_type_impl*>(&t));
case abstract_type::kind::time:
return f(*static_cast<const time_type_impl*>(&t));
case abstract_type::kind::timestamp:
return f(*static_cast<const timestamp_type_impl*>(&t));
case abstract_type::kind::timeuuid:
return f(*static_cast<const timeuuid_type_impl*>(&t));
case abstract_type::kind::tuple:
return f(*static_cast<const tuple_type_impl*>(&t));
case abstract_type::kind::user:
return f(*static_cast<const user_type_impl*>(&t));
case abstract_type::kind::utf8:
return f(*static_cast<const utf8_type_impl*>(&t));
case abstract_type::kind::uuid:
return f(*static_cast<const uuid_type_impl*>(&t));
case abstract_type::kind::varint:
return f(*static_cast<const varint_type_impl*>(&t));
}
__builtin_unreachable();
}

View File

@@ -112,9 +112,6 @@ read_request_timeout_in_ms: 5000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 2000
# how long a coordinator should continue to retry a CAS operation
# that contends with other proposals for the same row
cas_contention_timeout_in_ms: 1000
# phi value that must be reached for a host to be marked down.
# most users should never need to adjust this.
@@ -241,9 +238,7 @@ batch_size_fail_threshold_in_kb: 50
# broadcast_rpc_address: 1.2.3.4
# Uncomment to enable experimental features
# experimental_features:
# - cdc
# - lwt
# experimental: true
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints
@@ -618,16 +613,10 @@ commitlog_total_space_in_mb: -1
# compaction_throughput_mb_per_sec: 16
# Log a warning when writing partitions larger than this value
# compaction_large_partition_warning_threshold_mb: 1000
# compaction_large_partition_warning_threshold_mb: 100
# Log a warning when writing rows larger than this value
# compaction_large_row_warning_threshold_mb: 10
# Log a warning when writing cells larger than this value
# compaction_large_cell_warning_threshold_mb: 1
# Log a warning when row number is larger than this value
# compaction_rows_count_warning_threshold: 100000
# compaction_large_row_warning_threshold_mb: 50
# When compacting, the replacement sstable(s) can be opened before they
# are completely written, and used in place of the prior sstables for

View File

@@ -240,7 +240,8 @@ def find_headers(repodir, excluded_dirs):
modes = {
'debug': {
'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER',
'cxx_ld_flags': '',
# Disable vptr because of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684
'cxx_ld_flags': '-O0 -fsanitize=address -fsanitize=leak -fsanitize=undefined -fno-sanitize=vptr',
},
'release': {
'cxxflags': '',
@@ -250,10 +251,6 @@ modes = {
'cxxflags': '',
'cxx_ld_flags': '-O1',
},
'sanitize': {
'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER',
'cxx_ld_flags': '-Os',
}
}
scylla_tests = [
@@ -285,9 +282,7 @@ scylla_tests = [
'tests/row_cache_stress_test',
'tests/memory_footprint',
'tests/perf/perf_sstable',
'tests/cdc_test',
'tests/cql_query_test',
'tests/user_types_test',
'tests/secondary_index_test',
'tests/json_cql_query_test',
'tests/filtering_test',
@@ -368,7 +363,7 @@ scylla_tests = [
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
'tests/mutation_writer_test',
'tests/multishard_writer_test',
'tests/observable_test',
'tests/transport_test',
'tests/fragmented_temporary_buffer_test',
@@ -380,8 +375,6 @@ scylla_tests = [
'tests/small_vector_test',
'tests/data_listeners_test',
'tests/truncation_migration_test',
'tests/like_matcher_test',
'tests/enum_option_test',
]
perf_tests = [
@@ -469,7 +462,6 @@ cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassa
scylla_core = (['database.cc',
'table.cc',
'atomic_cell.cc',
'collection_mutation.cc',
'hashers.cc',
'schema.cc',
'frozen_schema.cc',
@@ -488,7 +480,6 @@ scylla_core = (['database.cc',
'utils/large_bitset.cc',
'utils/buffer_input_stream.cc',
'utils/limiting_data_source.cc',
'utils/updateable_value.cc',
'mutation_partition.cc',
'mutation_partition_view.cc',
'mutation_partition_serializer.cc',
@@ -499,7 +490,6 @@ scylla_core = (['database.cc',
'keys.cc',
'counters.cc',
'compress.cc',
'zstd.cc',
'sstables/mp_row_consumer.cc',
'sstables/sstables.cc',
'sstables/sstables_manager.cc',
@@ -509,7 +499,6 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/leveled_compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/integrity_checked_file_impl.cc',
'sstables/prepended_input_stream.cc',
@@ -518,7 +507,6 @@ scylla_core = (['database.cc',
'transport/event_notifier.cc',
'transport/server.cc',
'transport/messages/result_message.cc',
'cdc/cdc.cc',
'cql3/abstract_marker.cc',
'cql3/attributes.cc',
'cql3/cf_name.cc',
@@ -547,7 +535,6 @@ scylla_core = (['database.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/cas_request.cc',
'cql3/statements/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
@@ -585,10 +572,6 @@ scylla_core = (['database.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'service/paxos/proposal.cc',
'service/paxos/prepare_response.cc',
'service/paxos/paxos_state.cc',
'service/paxos/prepare_summary.cc',
'cql3/operator.cc',
'cql3/relation.cc',
'cql3/column_identifier.cc',
@@ -727,7 +710,6 @@ scylla_core = (['database.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'tracing/tracing_backend_registry.cc',
'tracing/traced_file.cc',
'table_helper.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
@@ -737,14 +719,12 @@ scylla_core = (['database.cc',
'utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc',
'querier.cc',
'data/cell.cc',
'mutation_writer/multishard_writer.cc',
'multishard_writer.cc',
'multishard_mutation_query.cc',
'reader_concurrency_semaphore.cc',
'distributed_loader.cc',
'utils/utf8.cc',
'utils/ascii.cc',
'utils/like_matcher.cc',
'mutation_writer/timestamp_based_splitting_writer.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
)
@@ -784,19 +764,6 @@ api = ['api/api.cc',
'api/api-doc/config.json',
]
alternator = [
'alternator/server.cc',
'alternator/executor.cc',
'alternator/stats.cc',
'alternator/base64.cc',
'alternator/serialization.cc',
'alternator/expressions.cc',
Antlr3Grammar('alternator/expressions.g'),
'alternator/conditions.cc',
'alternator/rjson.cc',
'alternator/auth.cc',
]
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
@@ -821,8 +788,6 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/consistency_level.idl.hh',
'idl/cache_temperature.idl.hh',
'idl/view.idl.hh',
'idl/messaging_service.idl.hh',
'idl/paxos.idl.hh',
]
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
@@ -837,12 +802,10 @@ scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependenci
'tests/result_set_assertions.cc',
'tests/mutation_source_test.cc',
'tests/data_model.cc',
'tests/exception_utils.cc',
'tests/random_schema.cc',
]
deps = {
'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api + alternator,
'scylla': idls + ['main.cc', 'release.cc'] + scylla_core + api,
}
pure_boost_tests = set([
@@ -856,6 +819,7 @@ pure_boost_tests = set([
'tests/range_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
'tests/cartesian_product_test',
@@ -870,13 +834,14 @@ pure_boost_tests = set([
'tests/enum_set_test',
'tests/cql_auth_syntax_test',
'tests/meta_test',
'tests/imr_test',
'tests/partition_data_test',
'tests/reusable_buffer_test',
'tests/observable_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/top_k_test',
'tests/small_vector_test',
'tests/like_matcher_test',
'tests/enum_option_test',
])
tests_not_using_seastar_test_framework = set([
@@ -888,6 +853,8 @@ tests_not_using_seastar_test_framework = set([
'tests/perf/perf_hash',
'tests/perf/perf_cql_parser',
'tests/message',
'tests/perf/perf_simple_query',
'tests/perf/perf_fast_forward',
'tests/perf/perf_cache_eviction',
'tests/row_cache_stress_test',
'tests/memory_footprint',
@@ -935,8 +902,6 @@ deps['tests/small_vector_test'] = ['tests/small_vector_test.cc']
deps['tests/multishard_mutation_query_test'] += ['tests/test_table.cc']
deps['tests/vint_serialization_test'] = ['tests/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['tests/duration_test'] += ['tests/exception_utils.cc']
deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']
warnings = [
@@ -1071,7 +1036,6 @@ total_memory = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
link_pool_depth = max(int(total_memory / 7e9), 1)
selected_modes = args.selected_modes or modes.keys()
default_modes = args.selected_modes or ['debug', 'release', 'dev']
build_modes = {m: modes[m] for m in selected_modes}
build_artifacts = all_artifacts if not args.artifacts else args.artifacts
@@ -1087,70 +1051,51 @@ scylla_release = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
seastar_flags = []
if args.dpdk:
# fake dependencies on dpdk, so that it is built before anything else
seastar_flags += ['--enable-dpdk']
if args.gcc6_concepts:
seastar_flags += ['--enable-gcc6-concepts']
if args.alloc_failure_injector:
seastar_flags += ['--enable-alloc-failure-injector']
if args.split_dwarf:
seastar_flags += ['--split-dwarf']
# We never compress debug info in debug mode
modes['debug']['cxxflags'] += ' -gz'
# We compress it by default in release mode
flag_dest = 'cxx_ld_flags' if args.compress_exec_debuginfo else 'cxxflags'
modes['release'][flag_dest] += ' -gz'
for m in ['debug', 'release', 'sanitize']:
modes[m]['cxxflags'] += ' ' + dbgflag
modes['debug']['cxxflags'] += ' ' + dbgflag
modes['release']['cxxflags'] += ' ' + dbgflag
seastar_cflags = args.user_cflags
seastar_cflags += ' -Wno-error'
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' % (seastar_ldflags),
'--c++-dialect=gnu++17', '--use-std-optional-variant-stringview=1', '--optflags=%s' % (modes['release']['cxx_ld_flags']), ]
libdeflate_cflags = seastar_cflags
zstd_cflags = seastar_cflags + ' -Wno-implicit-fallthrough'
MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
status = subprocess.call([args.python, './configure.py'] + seastar_flags, cwd='seastar')
def configure_seastar(build_dir, mode):
seastar_build_dir = os.path.join(build_dir, mode, 'seastar')
if status != 0:
print('Seastar configuration failed')
sys.exit(1)
seastar_cmake_args = [
'-DCMAKE_BUILD_TYPE={}'.format(MODE_TO_CMAKE_BUILD_TYPE[mode]),
'-DCMAKE_C_COMPILER={}'.format(args.cc),
'-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
'-DSeastar_CXX_DIALECT=gnu++17',
'-DSeastar_STD_OPTIONAL_VARIANT_STRINGVIEW=ON',
'-DSeastar_UNUSED_RESULT_ERROR=ON',
]
if args.dpdk:
seastar_cmake_args += ['-DSeastar_DPDK=ON', '-DSeastar_DPDK_MACHINE=wsm']
if args.gcc6_concepts:
seastar_cmake_args += ['-DSeastar_GCC6_CONCEPTS=ON']
if args.split_dwarf:
seastar_cmake_args += ['-DSeastar_SPLIT_DWARF=ON']
if args.alloc_failure_injector:
seastar_cmake_args += ['-DSeastar_ALLOC_FAILURE_INJECTION=ON']
seastar_cmd = ['cmake', '-G', 'Ninja', os.path.relpath('seastar', seastar_build_dir)] + seastar_cmake_args
cmake_dir = seastar_build_dir
if args.dpdk:
# need to cook first
cmake_dir = 'seastar' # required by cooking.sh
relative_seastar_build_dir = os.path.join('..', seastar_build_dir) # relative to seastar/
seastar_cmd = ['./cooking.sh', '-i', 'dpdk', '-d', relative_seastar_build_dir, '--'] + seastar_cmd[4:]
print(seastar_cmd)
os.makedirs(seastar_build_dir, exist_ok=True)
subprocess.check_call(seastar_cmd, shell=False, cwd=cmake_dir)
for mode in build_modes:
configure_seastar('build', mode)
pc = {mode: 'build/{}/seastar/seastar.pc'.format(mode) for mode in build_modes}
pc = {mode: 'build/{}/seastar.pc'.format(mode) for mode in build_modes}
ninja = find_executable('ninja') or find_executable('ninja-build')
if not ninja:
print('Ninja executable (ninja or ninja-build) not found on PATH\n')
sys.exit(1)
def query_seastar_flags(pc_file, link_static_cxx=False):
def query_seastar_flags(seastar_pc_file, link_static_cxx=False):
pc_file = os.path.join('seastar', seastar_pc_file)
cflags = pkg_config(pc_file, '--cflags', '--static')
libs = pkg_config(pc_file, '--libs', '--static')
@@ -1164,30 +1109,11 @@ for mode in build_modes:
modes[mode]['seastar_cflags'] = seastar_cflags
modes[mode]['seastar_libs'] = seastar_libs
# We need to use experimental features of the zstd library (to use our own allocators for the (de)compression context),
# which are available only when the library is linked statically.
def configure_zstd(build_dir, mode):
zstd_build_dir = os.path.join(build_dir, mode, 'zstd')
zstd_cmake_args = [
'-DCMAKE_BUILD_TYPE={}'.format(MODE_TO_CMAKE_BUILD_TYPE[mode]),
'-DCMAKE_C_COMPILER={}'.format(args.cc),
'-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
'-DCMAKE_C_FLAGS={}'.format(zstd_cflags),
'-DZSTD_BUILD_PROGRAMS=OFF'
]
zstd_cmd = ['cmake', '-G', 'Ninja', os.path.relpath('zstd/build/cmake', zstd_build_dir)] + zstd_cmake_args
print(zstd_cmd)
os.makedirs(zstd_build_dir, exist_ok=True)
subprocess.check_call(zstd_cmd, shell=False, cwd=zstd_build_dir)
args.user_cflags += " " + pkg_config('jsoncpp', '--cflags')
args.user_cflags += ' -march=' + args.target
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-llz4', '-lz', '-lsnappy', pkg_config('jsoncpp', '--libs'),
' -lstdc++fs', ' -lcrypt', ' -lcryptopp', ' -lpthread',
maybe_static(args.staticboost, '-lboost_date_time -lboost_regex -licuuc'), ])
maybe_static(args.staticboost, '-lboost_date_time'), ])
xxhash_dir = 'xxHash'
@@ -1197,17 +1123,6 @@ if not os.path.exists(xxhash_dir) or not os.listdir(xxhash_dir):
if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
# thrift version detection, see #4538
proc_res = subprocess.run(["thrift", "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
proc_res_output = proc_res.stdout.decode("utf-8")
if proc_res.returncode != 0 and not re.search(r'^Thrift version', proc_res_output):
raise Exception("Thrift compiler must be missing: {}".format(proc_res_output))
thrift_version = proc_res_output.split(" ")[-1]
thrift_boost_versions = ["0.{}.".format(n) for n in range(1, 11)]
if any(filter(thrift_version.startswith, thrift_boost_versions)):
args.user_cflags += ' -DTHRIFT_USES_BOOST'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config(pkg, '--cflags')
libs += ' ' + pkg_config(pkg, '--libs')
@@ -1233,16 +1148,7 @@ if args.antlr3_exec:
else:
antlr3_exec = "antlr3"
for mode in build_modes:
configure_zstd(outdir, mode)
# configure.py may run automatically from an already-existing build.ninja.
# If the user interrupts configure.py in the middle, we need build.ninja
# to remain in a valid state. So we write our output to a temporary
# file, and only when done we rename it atomically to build.ninja.
buildfile_tmp = buildfile + ".tmp"
with open(buildfile_tmp, 'w') as f:
with open(buildfile, 'w') as f:
f.write(textwrap.dedent('''\
configure_args = {configure_args}
builddir = {outdir}
@@ -1252,7 +1158,7 @@ with open(buildfile_tmp, 'w') as f:
libs = {libs}
pool link_pool
depth = {link_pool_depth}
pool submodule_pool
pool seastar_pool
depth = 1
rule gen
command = echo -e $text > $out
@@ -1332,7 +1238,7 @@ with open(buildfile_tmp, 'w') as f:
serializers = {}
thrifts = set()
antlr3_grammars = set()
seastar_dep = 'build/{}/seastar/libseastar.a'.format(mode)
seastar_dep = 'seastar/build/{}/libseastar.a'.format(mode)
for binary in build_artifacts:
if binary in other:
continue
@@ -1352,8 +1258,7 @@ with open(buildfile_tmp, 'w') as f:
f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
else:
objs.extend(['$builddir/' + mode + '/' + artifact for artifact in [
'libdeflate/libdeflate.a',
'zstd/lib/libzstd.a',
'libdeflate/libdeflate.a'
]])
objs.append('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o')
if binary.startswith('tests/'):
@@ -1361,7 +1266,7 @@ with open(buildfile_tmp, 'w') as f:
if binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
if binary not in tests_not_using_seastar_test_framework:
pc_path = pc[mode].replace('seastar.pc', 'seastar-testing.pc')
pc_path = os.path.join('seastar', pc[mode].replace('seastar.pc', 'seastar-testing.pc'))
local_libs += ' ' + pkg_config(pc_path, '--libs', '--static')
if has_thrift:
local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
@@ -1400,7 +1305,6 @@ with open(buildfile_tmp, 'w') as f:
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table'))
f.write('build {}: link.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'))
f.write(' libs = $seastar_libs_{}\n'.format(mode))
f.write(
'build {mode}-objects: phony {objs}\n'.format(
mode=mode,
@@ -1449,30 +1353,19 @@ with open(buildfile_tmp, 'w') as f:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} || {gen_headers_dep}\n'.format(
mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))
f.write('build build/{mode}/seastar/libseastar.a: ninja | always\n'
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune: ninja | always\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar seastar_testing\n'.format(**locals()))
f.write('build build/{mode}/seastar/apps/iotune/iotune: ninja\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/seastar\n'.format(**locals()))
f.write(' target = iotune\n'.format(**locals()))
f.write(' pool = seastar_pool\n')
f.write(' subdir = seastar/build/{mode}\n'.format(**locals()))
f.write(' target = seastar seastar_testing app_iotune\n'.format(**locals()))
f.write(textwrap.dedent('''\
build build/{mode}/iotune: copy build/{mode}/seastar/apps/iotune/iotune
build build/{mode}/iotune: copy seastar/build/{mode}/apps/iotune/iotune
''').format(**locals()))
f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE | always\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' mode = {mode}\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('rule libdeflate.{mode}\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../build/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../build/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../build/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../build/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
f.write('build build/{mode}/libdeflate/libdeflate.a: libdeflate.{mode}\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write('build build/{mode}/zstd/lib/libzstd.a: ninja\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = build/{mode}/zstd\n'.format(**locals()))
f.write(' target = libzstd.a\n'.format(**locals()))
mode = 'dev' if 'dev' in modes else modes[0]
f.write('build checkheaders: phony || {}\n'.format(' '.join(['$builddir/{}/{}.o'.format(mode, hh) for hh in headers])))
@@ -1481,7 +1374,7 @@ with open(buildfile_tmp, 'w') as f:
rule configure
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
build build.ninja: configure | configure.py seastar/configure.py
rule cscope
command = find -name '*.[chS]' -o -name "*.cc" -o -name "*.hh" | cscope -bq -i-
description = CSCOPE
@@ -1491,12 +1384,10 @@ with open(buildfile_tmp, 'w') as f:
description = CLEAN
build clean: clean
default {modes_list}
''').format(modes_list=' '.join(default_modes), **globals()))
''').format(modes_list=' '.join(build_modes), **globals()))
f.write(textwrap.dedent('''\
build always: phony
rule scylla_version_gen
command = ./SCYLLA-VERSION-GEN
build build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE: scylla_version_gen
''').format(modes_list=' '.join(build_modes), **globals()))
os.rename(buildfile_tmp, buildfile)

View File

@@ -21,9 +21,6 @@
#pragma once
#include "types/user.hh"
#include "concrete_types.hh"
#include "mutation_partition_view.hh"
#include "mutation_partition.hh"
#include "schema.hh"
@@ -38,8 +35,8 @@ class converting_mutation_partition_applier : public mutation_partition_visitor
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const abstract_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(old_type);
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
static atomic_cell upgrade_cell(const abstract_type& new_type, const abstract_type& old_type, atomic_cell_view cell,
atomic_cell::collection_member cm = atomic_cell::collection_member::no) {
@@ -52,59 +49,32 @@ private:
return atomic_cell(new_type, cell);
}
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, atomic_cell_view cell) {
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
return;
}
dst.apply(new_def, upgrade_cell(*new_def.type, old_type, cell));
dst.apply(new_def, upgrade_cell(*new_def.type, *old_type, cell));
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const abstract_type& old_type, collection_mutation_view cell) {
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto new_ctype = static_pointer_cast<const collection_type_impl>(new_def.type);
auto old_ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = old_ctype->deserialize_mutation_form(cell_bv);
cell.with_deserialized(old_type, [&] (collection_mutation_view_description old_view) {
collection_mutation_description new_view;
collection_type_impl::mutation new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
visit(old_type, make_visitor(
[&] (const collection_type_impl& old_ctype) {
assert(new_def.type->is_collection()); // because is_compatible
auto& new_ctype = static_cast<const collection_type_impl&>(*new_def.type);
auto& new_value_type = *new_ctype.value_comparator();
auto& old_value_type = *old_ctype.value_comparator();
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(c.first, upgrade_cell(
new_value_type, old_value_type, c.second, atomic_cell::collection_member::yes));
}
}
},
[&] (const user_type_impl& old_utype) {
assert(new_def.type->is_user_type()); // because is_compatible
auto& new_utype = static_cast<const user_type_impl&>(*new_def.type);
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
auto idx = deserialize_field_index(c.first);
assert(idx < new_utype.size() && idx < old_utype.size());
new_view.cells.emplace_back(c.first, upgrade_cell(
*new_utype.type(idx), *old_utype.type(idx), c.second, atomic_cell::collection_member::yes));
}
}
},
[&] (const abstract_type& o) {
throw std::runtime_error(format("not a multi-cell type: {}", o.name()));
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(c.first, upgrade_cell(*new_ctype->value_comparator(), *old_ctype->value_comparator(), c.second, atomic_cell::collection_member::yes));
}
));
}
if (new_view.tomb || !new_view.cells.empty()) {
dst.apply(new_def, new_view.serialize(*new_def.type));
dst.apply(new_def, new_ctype->serialize_mutation_form(std::move(new_view)));
}
});
}
@@ -130,7 +100,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), cell);
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
}
}
@@ -138,7 +108,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row.maybe_create(), column_kind::static_column, *def, *col.type(), collection);
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
}
}
@@ -161,7 +131,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), cell);
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
}
}
@@ -169,7 +139,7 @@ public:
const column_mapping_entry& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, *col.type(), collection);
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
@@ -177,9 +147,9 @@ public:
// Cells must have monotonic names.
static void append_cell(row& dst, column_kind kind, const column_definition& new_def, const column_definition& old_def, const atomic_cell_or_collection& cell) {
if (new_def.is_atomic()) {
accept_cell(dst, kind, new_def, *old_def.type, cell.as_atomic_cell(old_def));
accept_cell(dst, kind, new_def, old_def.type, cell.as_atomic_cell(old_def));
} else {
accept_cell(dst, kind, new_def, *old_def.type, cell.as_collection_mutation());
accept_cell(dst, kind, new_def, old_def.type, cell.as_collection_mutation());
}
}
};

View File

@@ -309,8 +309,7 @@ class basic_counter_cell_view {
protected:
using linearized_value_view = std::conditional_t<is_mutable == mutable_view::no,
bytes_view, bytes_mutable_view>;
using pointer_type = std::conditional_t<is_mutable == mutable_view::no,
bytes_view::const_pointer, bytes_mutable_view::pointer>;
using pointer_type = typename linearized_value_view::pointer;
basic_atomic_cell_view<is_mutable> _cell;
linearized_value_view _value;
private:

View File

@@ -125,7 +125,6 @@ struct uninitialized {
uninitialized& operator=(uninitialized&&) = default;
operator const T&() const & { return check(), *_val; }
operator T&&() && { return check(), std::move(*_val); }
operator std::optional<T>&&() && { return check(), std::move(_val); }
void check() const { if (!_val) { throw std::runtime_error("not intitialized"); } }
};
@@ -395,7 +394,6 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
)
K_FROM cf=columnFamilyName
( K_WHERE wclause=whereClause )?
( K_GROUP K_BY gbcolumns=listOfIdentifiers)?
( K_ORDER K_BY orderByClause[orderings] ( ',' orderByClause[orderings] )* )?
( K_PER K_PARTITION K_LIMIT rows=intValue { per_partition_limit = rows; } )?
( K_LIMIT rows=intValue { limit = rows; } )?
@@ -404,8 +402,7 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
{
auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, is_json, bypass_cache);
$expr = ::make_shared<raw::select_statement>(std::move(cf), std::move(params),
std::move(sclause), std::move(wclause), std::move(limit), std::move(per_partition_limit),
std::move(gbcolumns));
std::move(sclause), std::move(wclause), std::move(limit), std::move(per_partition_limit));
}
;
@@ -524,7 +521,6 @@ usingClauseObjective[::shared_ptr<cql3::attributes::raw> attrs]
*/
updateStatement returns [::shared_ptr<raw::update_statement> expr]
@init {
bool if_exists = false;
auto attrs = ::make_shared<cql3::attributes::raw>();
std::vector<std::pair<::shared_ptr<cql3::column_identifier::raw>, ::shared_ptr<cql3::operation::raw_update>>> operations;
}
@@ -532,14 +528,13 @@ updateStatement returns [::shared_ptr<raw::update_statement> expr]
( usingClause[attrs] )?
K_SET columnOperation[operations] (',' columnOperation[operations])*
K_WHERE wclause=whereClause
( K_IF (K_EXISTS{ if_exists = true; } | conditions=updateConditions) )?
( K_IF conditions=updateConditions )?
{
return ::make_shared<raw::update_statement>(std::move(cf),
std::move(attrs),
std::move(operations),
std::move(wclause),
std::move(conditions),
if_exists);
std::move(conditions));
}
;
@@ -583,7 +578,6 @@ deleteSelection returns [std::vector<::shared_ptr<cql3::operation::raw_deletion>
deleteOp returns [::shared_ptr<cql3::operation::raw_deletion> op]
: c=cident { $op = ::make_shared<cql3::operation::column_deletion>(std::move(c)); }
| c=cident '[' t=term ']' { $op = ::make_shared<cql3::operation::element_deletion>(std::move(c), std::move(t)); }
| c=cident '.' field=ident { $op = ::make_shared<cql3::operation::field_deletion>(std::move(c), std::move(field)); }
;
usingClauseDelete[::shared_ptr<cql3::attributes::raw> attrs]
@@ -1131,10 +1125,10 @@ createUserStatement returns [::shared_ptr<create_role_statement> stmt]
bool ifNotExists = false;
}
: K_CREATE K_USER (K_IF K_NOT K_EXISTS { ifNotExists = true; })? u=username
: K_CREATE K_USER (K_IF K_NOT K_EXISTS { ifNotExists = true; })? username
( K_WITH K_PASSWORD v=STRING_LITERAL { opts.password = $v.text; })?
( K_SUPERUSER { opts.is_superuser = true; } | K_NOSUPERUSER { opts.is_superuser = false; } )?
{ $stmt = ::make_shared<create_role_statement>(cql3::role_name(u, cql3::preserve_role_case::yes), std::move(opts), ifNotExists); }
{ $stmt = ::make_shared<create_role_statement>(cql3::role_name($username.text, cql3::preserve_role_case::yes), std::move(opts), ifNotExists); }
;
/**
@@ -1144,10 +1138,10 @@ alterUserStatement returns [::shared_ptr<alter_role_statement> stmt]
@init {
cql3::role_options opts;
}
: K_ALTER K_USER u=username
: K_ALTER K_USER username
( K_WITH K_PASSWORD v=STRING_LITERAL { opts.password = $v.text; })?
( K_SUPERUSER { opts.is_superuser = true; } | K_NOSUPERUSER { opts.is_superuser = false; } )?
{ $stmt = ::make_shared<alter_role_statement>(cql3::role_name(u, cql3::preserve_role_case::yes), std::move(opts)); }
{ $stmt = ::make_shared<alter_role_statement>(cql3::role_name($username.text, cql3::preserve_role_case::yes), std::move(opts)); }
;
/**
@@ -1155,8 +1149,8 @@ alterUserStatement returns [::shared_ptr<alter_role_statement> stmt]
*/
dropUserStatement returns [::shared_ptr<drop_role_statement> stmt]
@init { bool ifExists = false; }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? u=username
{ $stmt = ::make_shared<drop_role_statement>(cql3::role_name(u, cql3::preserve_role_case::yes), ifExists); }
: K_DROP K_USER (K_IF K_EXISTS { ifExists = true; })? username
{ $stmt = ::make_shared<drop_role_statement>(cql3::role_name($username.text, cql3::preserve_role_case::yes), ifExists); }
;
/**
@@ -1399,9 +1393,8 @@ columnOperation[operations_type& operations]
columnOperationDifferentiator[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
: '=' normalColumnOperation[operations, key]
| '[' k=term ']' collectionColumnOperation[operations, key, k, false]
| '.' field=ident udtColumnOperation[operations, key, field]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' collectionColumnOperation[operations, key, k, true]
| '[' k=term ']' specializedColumnOperation[operations, key, k, false]
| '[' K_SCYLLA_TIMEUUID_LIST_INDEX '(' k=term ')' ']' specializedColumnOperation[operations, key, k, true]
;
normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_identifier::raw> key]
@@ -1444,38 +1437,31 @@ normalColumnOperation[operations_type& operations, ::shared_ptr<cql3::column_ide
}
;
collectionColumnOperation[operations_type& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k,
bool by_uuid]
specializedColumnOperation[std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>,
shared_ptr<cql3::operation::raw_update>>>& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::term::raw> k,
bool by_uuid]
: '=' t=term
{
add_raw_update(operations, key, make_shared<cql3::operation::set_element>(k, t, by_uuid));
}
;
udtColumnOperation[operations_type& operations,
shared_ptr<cql3::column_identifier::raw> key,
shared_ptr<cql3::column_identifier> field]
: '=' t=term
{
add_raw_update(operations, std::move(key), make_shared<cql3::operation::set_field>(std::move(field), std::move(t)));
}
;
columnCondition[conditions_type& conditions]
// Note: we'll reject duplicates later
: key=cident
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, {}, *op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition({}, marker, {})); }
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::simple_in_condition(marker)); }
)
| '[' element=term ']'
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::simple_condition(t, element, *op)); }
( op=relationType t=term { conditions.emplace_back(key, cql3::column_condition::raw::collection_condition(t, element, *op)); }
| K_IN
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, {}, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::in_condition(element, marker, {})); }
( values=singleColumnInValues { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, values)); }
| marker=inMarker { conditions.emplace_back(key, cql3::column_condition::raw::collection_in_condition(element, marker)); }
)
)
)
@@ -1502,7 +1488,6 @@ relationType returns [const cql3::operator_type* op = nullptr]
| '>' { $op = &cql3::operator_type::GT; }
| '>=' { $op = &cql3::operator_type::GTE; }
| '!=' { $op = &cql3::operator_type::NEQ; }
| K_LIKE { $op = &cql3::operator_type::LIKE; }
;
relation[std::vector<cql3::relation_ptr>& clauses]
@@ -1552,10 +1537,6 @@ tupleOfIdentifiers returns [std::vector<::shared_ptr<cql3::column_identifier::ra
: '(' n1=cident { $ids.push_back(n1); } (',' ni=cident { $ids.push_back(ni); })* ')'
;
listOfIdentifiers returns [std::vector<::shared_ptr<cql3::column_identifier::raw>> ids]
: n1=cident { $ids.push_back(n1); } (',' ni=cident { $ids.push_back(ni); })*
;
singleColumnInValues returns [std::vector<::shared_ptr<cql3::term::raw>> terms]
: '(' ( t1 = term { $terms.push_back(t1); } (',' ti=term { $terms.push_back(ti); })* )? ')'
;
@@ -1676,10 +1657,9 @@ tuple_type [bool internal] returns [shared_ptr<cql3::cql3_type::raw> t]
'>' { $t = cql3::cql3_type::raw::tuple(std::move(types)); }
;
username returns [sstring str]
: t=IDENT { $str = $t.text; }
| t=STRING_LITERAL { $str = $t.text; }
| s=unreserved_keyword { $str = s; }
username
: IDENT
| STRING_LITERAL
| QUOTED_NAME { add_recognition_error("Quoted strings are not supported for user names"); }
;
@@ -1748,10 +1728,8 @@ basic_unreserved_keyword returns [sstring str]
| K_JSON
| K_CACHE
| K_BYPASS
| K_LIKE
| K_PER
| K_PARTITION
| K_GROUP
) { $str = $k.text; }
;
@@ -1903,10 +1881,6 @@ K_PARTITION: P A R T I T I O N;
K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
K_SCYLLA_COUNTER_SHARD_LIST: S C Y L L A '_' C O U N T E R '_' S H A R D '_' L I S T;
K_GROUP: G R O U P;
K_LIKE: L I K E;
// Case-insensitive alpha characters
fragment A: ('a'|'A');
fragment B: ('b'|'B');

View File

@@ -45,7 +45,6 @@
#include "cql3/lists.hh"
#include "cql3/maps.hh"
#include "cql3/sets.hh"
#include "cql3/user_types.hh"
#include "types/list.hh"
namespace cql3 {
@@ -69,22 +68,19 @@ abstract_marker::raw::raw(int32_t bind_index)
::shared_ptr<term> abstract_marker::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
{
if (receiver->type->is_collection()) {
if (receiver->type->get_kind() == abstract_type::kind::list) {
return ::make_shared<lists::marker>(_bind_index, receiver);
} else if (receiver->type->get_kind() == abstract_type::kind::set) {
return ::make_shared<sets::marker>(_bind_index, receiver);
} else if (receiver->type->get_kind() == abstract_type::kind::map) {
return ::make_shared<maps::marker>(_bind_index, receiver);
}
assert(0);
auto receiver_type = ::dynamic_pointer_cast<const collection_type_impl>(receiver->type);
if (receiver_type == nullptr) {
return ::make_shared<constants::marker>(_bind_index, receiver);
}
if (receiver->type->is_user_type()) {
return ::make_shared<user_types::marker>(_bind_index, receiver);
if (&receiver_type->_kind == &collection_type_impl::kind::list) {
return ::make_shared<lists::marker>(_bind_index, receiver);
} else if (&receiver_type->_kind == &collection_type_impl::kind::set) {
return ::make_shared<sets::marker>(_bind_index, receiver);
} else if (&receiver_type->_kind == &collection_type_impl::kind::map) {
return ::make_shared<maps::marker>(_bind_index, receiver);
}
return ::make_shared<constants::marker>(_bind_index, receiver);
assert(0);
return shared_ptr<term>();
}
assignment_testable::test_result abstract_marker::raw::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {

View File

@@ -45,8 +45,6 @@
#include "lists.hh"
#include "maps.hh"
#include <boost/range/algorithm_ext/push_back.hpp>
#include "types/map.hh"
#include "types/list.hh"
namespace {
@@ -63,54 +61,8 @@ void validate_operation_on_durations(const abstract_type& type, const cql3::oper
}
}
int is_satisfied_by(const cql3::operator_type &op, const abstract_type& cell_type,
const abstract_type& param_type, const data_value& cell_value, const bytes& param) {
int rc;
// For multi-cell sets and lists, cell value is represented as a map,
// thanks to collections_as_maps flag in partition_slice. param, however,
// is represented as a set or list type.
// We must implement an own compare of two different representations
// to compare the two.
if (cell_type.is_map() && cell_type.is_multi_cell() && param_type.is_listlike()) {
const listlike_collection_type_impl& list_type = static_cast<const listlike_collection_type_impl&>(param_type);
const map_type_impl& map_type = static_cast<const map_type_impl&>(cell_type);
assert(list_type.is_multi_cell());
// Inverse comparison result since the order of arguments is inverse.
rc = -list_type.compare_with_map(map_type, param, map_type.decompose(cell_value));
} else {
rc = cell_type.compare(cell_type.decompose(cell_value), param);
}
if (op == cql3::operator_type::EQ) {
return rc == 0;
} else if (op == cql3::operator_type::NEQ) {
return rc != 0;
} else if (op == cql3::operator_type::GTE) {
return rc >= 0;
} else if (op == cql3::operator_type::LTE) {
return rc <= 0;
} else if (op == cql3::operator_type::GT) {
return rc > 0;
} else if (op == cql3::operator_type::LT) {
return rc < 0;
}
assert(false);
return false;
}
// Read the list index from key and check that list index is not
// negative. The negative range check repeats Cassandra behaviour.
uint32_t read_and_check_list_index(const cql3::raw_value_view& key) {
// The list element type is always int32_type, see lists::index_spec_of
int32_t idx = read_simple_exactly<int32_t>(to_bytes(key));
if (idx < 0) {
throw exceptions::invalid_request_exception(format("Invalid negative list index {}", idx));
}
return static_cast<uint32_t>(idx);
}
} // end of anonymous namespace
namespace cql3 {
bool
@@ -140,134 +92,7 @@ void column_condition::collect_marker_specificaton(::shared_ptr<variable_specifi
value->collect_marker_specification(bound_names);
}
}
if (_value) {
_value->collect_marker_specification(bound_names);
}
}
bool column_condition::applies_to(const data_value* cell_value, const query_options& options) const {
// Cassandra condition support has a few quirks:
// - only a simple conjunct of predicates is supported "predicate AND predicate AND ..."
// - a predicate can operate on a column or a collection element, which must always be
// on the right side: "a = 3" or "collection['key'] IN (1,2,3)"
// - parameter markers are allowed on the right hand side only
// - only <, >, >=, <=, != and IN predicates are supported.
// - NULLs and missing values are treated differently from the WHERE clause:
// a term or cell in IF clause is allowed to be NULL or compared with NULL,
// and NULL value is treated just like any other value in the domain (there is no
// three-value logic or UNKNOWN like in SQL).
// - empty sets/lists/maps are treated differently when comparing with NULLs depending on
// whether the object is frozen or not. An empty *frozen* set/map/list is not equal to NULL.
// An empty *multi-cell* set/map/list is identical to NULL.
// The code below implements these rules in a way compatible with Cassandra.
// Use a map/list value instead of entire collection if a key is present in the predicate.
if (_collection_element != nullptr && cell_value != nullptr) {
// Checked in column_condition::raw::prepare()
assert(cell_value->type()->is_collection());
const collection_type_impl& cell_type = static_cast<const collection_type_impl&>(*cell_value->type());
cql3::raw_value_view key = _collection_element->bind_and_get(options);
if (key.is_unset_value()) {
throw exceptions::invalid_request_exception(
format("Invalid 'unset' value in {} element access", cell_type.cql3_type_name()));
}
if (key.is_null()) {
throw exceptions::invalid_request_exception(
format("Invalid null value for {} element access", cell_type.cql3_type_name()));
}
if (cell_type.is_map()) {
// If a collection is multi-cell and not frozen, it is returned as a map even if the
// underlying data type is "set" or "list". This is controlled by
// partition_slice::collections_as_maps enum, which is set when preparing a read command
// object. Representing a list as a map<timeuuid, listval> is necessary to identify the list field
// being updated, e.g. in case of UPDATE t SET list[3] = null WHERE a = 1 IF list[3]
// = 'key'
const map_type_impl& map_type = static_cast<const map_type_impl&>(cell_type);
// A map is serialized as a vector of data value pairs.
const std::vector<std::pair<data_value, data_value>>& map = map_type.from_value(*cell_value);
if (column.type->is_map()) {
// We're working with a map *type*, not only map *representation*.
with_linearized(*key, [&map, &map_type, &cell_value] (bytes_view key) {
auto end = map.end();
const auto& map_key_type = *map_type.get_keys_type();
auto less = [&map_key_type](const std::pair<data_value, data_value>& value, bytes_view key) {
return map_key_type.less(map_key_type.decompose(value.first), key);
};
// Map elements are sorted by key.
auto it = std::lower_bound(map.begin(), end, key, less);
if (it != end && map_key_type.equal(map_key_type.decompose(it->first), key)) {
cell_value = &it->second;
} else {
cell_value = nullptr;
}
});
} else if (column.type->is_list()) {
// We're working with a list type, represented as map.
uint32_t idx = read_and_check_list_index(key);
cell_value = idx >= map.size() ? nullptr : &map[idx].second;
} else {
// Syntax like "set_column['key'] = constant" is invalid.
assert(false);
}
} else if (cell_type.is_list()) {
// This is a *frozen* list.
const list_type_impl& list_type = static_cast<const list_type_impl&>(cell_type);
const std::vector<data_value>& list = list_type.from_value(*cell_value);
uint32_t idx = read_and_check_list_index(key);
cell_value = idx >= list.size() ? nullptr : &list[idx];
} else {
assert(false);
}
}
if (_op.is_compare()) {
// <, >, >=, <=, !=
cql3::raw_value_view param = _value->bind_and_get(options);
if (param.is_unset_value()) {
throw exceptions::invalid_request_exception("Invalid 'unset' value in condition");
}
if (param.is_null()) {
if (_op == operator_type::EQ) {
return cell_value == nullptr;
} else if (_op == operator_type::NEQ) {
return cell_value != nullptr;
} else {
throw exceptions::invalid_request_exception(format("Invalid comparison with null for operator \"{}\"", _op));
}
} else if (cell_value == nullptr) {
// The condition parameter is not null, so only NEQ can return true
return _op == operator_type::NEQ;
}
// type::validate() is called by bind_and_get(), so it's safe to pass to_bytes() result
// directly to compare.
return is_satisfied_by(_op, *cell_value->type(), *column.type, *cell_value, to_bytes(param));
}
assert(_op == operator_type::IN);
std::vector<bytes_opt> in_values;
if (_value) {
auto&& lval = dynamic_pointer_cast<multi_item_terminal>(_value->bind(options));
if (!lval) {
throw exceptions::invalid_request_exception("Invalid null value for IN condition");
}
in_values = std::move(lval->get_elements());
} else {
for (auto&& v : _in_values) {
in_values.emplace_back(to_bytes_opt(v->bind_and_get(options)));
}
}
// If cell value is NULL, IN list must contain NULL or an empty set/list. Otherwise it must contain cell value.
if (cell_value) {
return std::any_of(in_values.begin(), in_values.end(), [this, cell_value] (const bytes_opt& value) {
return value.has_value() && is_satisfied_by(operator_type::EQ, *cell_value->type(), *column.type, *cell_value, *value);
});
} else {
return std::any_of(in_values.begin(), in_values.end(), [] (const bytes_opt& value) { return !value.has_value() || value->empty(); });
}
_value->collect_marker_specification(bound_names);
}
::shared_ptr<column_condition>
@@ -275,54 +100,61 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
if (receiver.type->is_counter()) {
throw exceptions::invalid_request_exception("Conditions on counters are not supported");
}
shared_ptr<term> collection_element_term;
shared_ptr<column_specification> value_spec = receiver.column_specification;
if (_collection_element) {
if (!receiver.type->is_collection()) {
throw exceptions::invalid_request_exception(format("Invalid element access syntax for non-collection column {}",
receiver.name_as_text()));
}
// Pass a correct type specification to the collection_element->prepare(), so that it can
// later be used to validate the parameter type is compatible with receiver type.
shared_ptr<column_specification> element_spec;
auto ctype = static_cast<const collection_type_impl*>(receiver.type.get());
if (ctype->get_kind() == abstract_type::kind::list) {
element_spec = lists::index_spec_of(receiver.column_specification);
value_spec = lists::value_spec_of(receiver.column_specification);
} else if (ctype->get_kind() == abstract_type::kind::map) {
element_spec = maps::key_spec_of(*receiver.column_specification);
value_spec = maps::value_spec_of(*receiver.column_specification);
} else if (ctype->get_kind() == abstract_type::kind::set) {
throw exceptions::invalid_request_exception(format("Invalid element access syntax for set column {}",
receiver.name_as_text()));
if (!_collection_element) {
if (_op == operator_type::IN) {
if (_in_values.empty()) { // ?
return column_condition::in_condition(receiver, _in_marker->prepare(db, keyspace, receiver.column_specification));
}
std::vector<::shared_ptr<term>> terms;
for (auto&& value : _in_values) {
terms.push_back(value->prepare(db, keyspace, receiver.column_specification));
}
return column_condition::in_condition(receiver, std::move(terms));
} else {
throw exceptions::invalid_request_exception(
format("Unsupported collection type {} in a condition with element access", ctype->cql3_type_name()));
validate_operation_on_durations(*receiver.type, _op);
return column_condition::condition(receiver, _value->prepare(db, keyspace, receiver.column_specification), _op);
}
collection_element_term = _collection_element->prepare(db, keyspace, element_spec);
}
if (_op.is_compare()) {
if (!receiver.type->is_collection()) {
throw exceptions::invalid_request_exception(format("Invalid element access syntax for non-collection column {}", receiver.name_as_text()));
}
shared_ptr<column_specification> element_spec, value_spec;
auto ctype = static_cast<const collection_type_impl*>(receiver.type.get());
if (&ctype->_kind == &collection_type_impl::kind::list) {
element_spec = lists::index_spec_of(receiver.column_specification);
value_spec = lists::value_spec_of(receiver.column_specification);
} else if (&ctype->_kind == &collection_type_impl::kind::map) {
element_spec = maps::key_spec_of(*receiver.column_specification);
value_spec = maps::value_spec_of(*receiver.column_specification);
} else if (&ctype->_kind == &collection_type_impl::kind::set) {
throw exceptions::invalid_request_exception(format("Invalid element access syntax for set column {}", receiver.name()));
} else {
abort();
}
if (_op == operator_type::IN) {
if (_in_values.empty()) {
return column_condition::in_condition(receiver,
_collection_element->prepare(db, keyspace, element_spec),
_in_marker->prepare(db, keyspace, value_spec));
}
std::vector<shared_ptr<term>> terms;
terms.reserve(_in_values.size());
boost::push_back(terms, _in_values
| boost::adaptors::transformed(std::bind(&term::raw::prepare, std::placeholders::_1, std::ref(db), std::ref(keyspace), value_spec)));
return column_condition::in_condition(receiver, _collection_element->prepare(db, keyspace, element_spec), terms);
} else {
validate_operation_on_durations(*receiver.type, _op);
return column_condition::condition(receiver, collection_element_term, _value->prepare(db, keyspace, value_spec), _op);
}
if (_op != operator_type::IN) {
throw exceptions::invalid_request_exception(format("Unsupported operator type {} in a condition ", _op));
}
if (_in_marker) {
assert(_in_values.empty());
shared_ptr<term> multi_item_term = _in_marker->prepare(db, keyspace, value_spec);
return column_condition::in_condition(receiver, collection_element_term, multi_item_term, {});
return column_condition::condition(receiver,
_collection_element->prepare(db, keyspace, element_spec),
_value->prepare(db, keyspace, value_spec),
_op);
}
// Both _in_values and in _in_marker can be missing in case of empty IN list: "a IN ()"
std::vector<::shared_ptr<term>> terms;
terms.reserve(_in_values.size());
for (auto&& value : _in_values) {
terms.push_back(value->prepare(db, keyspace, value_spec));
}
return column_condition::in_condition(receiver, collection_element_term, {}, std::move(terms));
}
} // end of namespace cql3
}

View File

@@ -52,18 +52,11 @@ namespace cql3 {
*/
class column_condition final {
public:
// If _collection_element is not zero, this defines the receiver cell, not the entire receiver
// column.
// E.g. if column type is list<string> and expression is "a = ['test']", then the type of the
// column definition below is list<string>. If expression is "a[0] = 'test'", then the column
// object stands for the string cell. See column_condition::raw::prepare() for details.
const column_definition& column;
private:
// For collection, when testing the equality of a specific element, nullptr otherwise.
::shared_ptr<term> _collection_element;
// A literal value for comparison predicates or a multi item terminal for "a IN ?"
::shared_ptr<term> _value;
// List of terminals for "a IN (value, value, ...)"
std::vector<::shared_ptr<term>> _in_values;
const operator_type& _op;
public:
@@ -79,6 +72,41 @@ public:
assert(_in_values.empty());
}
}
static ::shared_ptr<column_condition> condition(const column_definition& def, ::shared_ptr<term> value, const operator_type& op) {
return ::make_shared<column_condition>(def, ::shared_ptr<term>{}, std::move(value), std::vector<::shared_ptr<term>>{}, op);
}
static ::shared_ptr<column_condition> condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> value, const operator_type& op) {
return ::make_shared<column_condition>(def, std::move(collection_element), std::move(value),
std::vector<::shared_ptr<term>>{}, op);
}
static ::shared_ptr<column_condition> in_condition(const column_definition& def, std::vector<::shared_ptr<term>> in_values) {
return ::make_shared<column_condition>(def, ::shared_ptr<term>{}, ::shared_ptr<term>{},
std::move(in_values), operator_type::IN);
}
static ::shared_ptr<column_condition> in_condition(const column_definition& def, ::shared_ptr<term> collection_element,
std::vector<::shared_ptr<term>> in_values) {
return ::make_shared<column_condition>(def, std::move(collection_element), ::shared_ptr<term>{},
std::move(in_values), operator_type::IN);
}
static ::shared_ptr<column_condition> in_condition(const column_definition& def, ::shared_ptr<term> in_marker) {
return ::make_shared<column_condition>(def, ::shared_ptr<term>{}, std::move(in_marker),
std::vector<::shared_ptr<term>>{}, operator_type::IN);
}
static ::shared_ptr<column_condition> in_condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> in_marker) {
return ::make_shared<column_condition>(def, std::move(collection_element), std::move(in_marker),
std::vector<::shared_ptr<term>>{}, operator_type::IN);
}
bool uses_function(const sstring& ks_name, const sstring& function_name);
public:
/**
* Collects the column specification for the bind variables of this operation.
*
@@ -87,34 +115,592 @@ public:
*/
void collect_marker_specificaton(::shared_ptr<variable_specifications> bound_names);
bool uses_function(const sstring& ks_name, const sstring& function_name);
// Retrieve parameter marker values, if any, find the appropriate collection
// element if the cell is a collection and an element access is used in the expression,
// and evaluate the condition.
bool applies_to(const data_value* cell_value, const query_options& options) const;
// Helper constructor wrapper for "IF col['key'] = 'foo'" or "IF col = 'foo'" */
static ::shared_ptr<column_condition> condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> value, const operator_type& op) {
return ::make_shared<column_condition>(def, std::move(collection_element), std::move(value),
std::vector<::shared_ptr<term>>{}, op);
#if 0
public ColumnCondition.Bound bind(QueryOptions options) throws InvalidRequestException
{
boolean isInCondition = operator == Operator.IN;
if (column.type instanceof CollectionType)
{
if (collectionElement == null)
return isInCondition ? new CollectionInBound(this, options) : new CollectionBound(this, options);
else
return isInCondition ? new ElementAccessInBound(this, options) : new ElementAccessBound(this, options);
}
return isInCondition ? new SimpleInBound(this, options) : new SimpleBound(this, options);
}
// Helper constructor wrapper for "IF col IN ... and IF col['key'] IN ... */
static ::shared_ptr<column_condition> in_condition(const column_definition& def, ::shared_ptr<term> collection_element,
::shared_ptr<term> in_marker, std::vector<::shared_ptr<term>> in_values) {
return ::make_shared<column_condition>(def, std::move(collection_element), std::move(in_marker),
std::move(in_values), operator_type::IN);
public static abstract class Bound
{
public final ColumnDefinition column;
public final Operator operator;
protected Bound(ColumnDefinition column, Operator operator)
{
this.column = column;
this.operator = operator;
}
/**
* Validates whether this condition applies to {@code current}.
*/
public abstract boolean appliesTo(Composite rowPrefix, ColumnFamily current, long now) throws InvalidRequestException;
public ByteBuffer getCollectionElementValue()
{
return null;
}
protected boolean isSatisfiedByValue(ByteBuffer value, Cell c, AbstractType<?> type, Operator operator, long now) throws InvalidRequestException
{
ByteBuffer columnValue = (c == null || !c.isLive(now)) ? null : c.value();
return compareWithOperator(operator, type, value, columnValue);
}
/** Returns true if the operator is satisfied (i.e. "value operator otherValue == true"), false otherwise. */
protected boolean compareWithOperator(Operator operator, AbstractType<?> type, ByteBuffer value, ByteBuffer otherValue) throws InvalidRequestException
{
if (value == null)
{
switch (operator)
{
case EQ:
return otherValue == null;
case NEQ:
return otherValue != null;
default:
throw new InvalidRequestException(String.format("Invalid comparison with null for operator \"%s\"", operator));
}
}
else if (otherValue == null)
{
// the condition value is not null, so only NEQ can return true
return operator == Operator.NEQ;
}
int comparison = type.compare(otherValue, value);
switch (operator)
{
case EQ:
return comparison == 0;
case LT:
return comparison < 0;
case LTE:
return comparison <= 0;
case GT:
return comparison > 0;
case GTE:
return comparison >= 0;
case NEQ:
return comparison != 0;
default:
// we shouldn't get IN, CONTAINS, or CONTAINS KEY here
throw new AssertionError();
}
}
protected Iterator<Cell> collectionColumns(CellName collection, ColumnFamily cf, final long now)
{
// We are testing for collection equality, so we need to have the expected values *and* only those.
ColumnSlice[] collectionSlice = new ColumnSlice[]{ collection.slice() };
// Filter live columns, this makes things simpler afterwards
return Iterators.filter(cf.iterator(collectionSlice), new Predicate<Cell>()
{
public boolean apply(Cell c)
{
// we only care about live columns
return c.isLive(now);
}
});
}
}
/**
* A condition on a single non-collection column. This does not support IN operators (see SimpleInBound).
*/
static class SimpleBound extends Bound
{
public final ByteBuffer value;
private SimpleBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert !(column.type instanceof CollectionType) && condition.collectionElement == null;
assert condition.operator != Operator.IN;
this.value = condition.value.bindAndGet(options);
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, long now) throws InvalidRequestException
{
CellName name = current.metadata().comparator.create(rowPrefix, column);
return isSatisfiedByValue(value, current.getColumn(name), column.type, operator, now);
}
}
/**
* An IN condition on a single non-collection column.
*/
static class SimpleInBound extends Bound
{
public final List<ByteBuffer> inValues;
private SimpleInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert !(column.type instanceof CollectionType) && condition.collectionElement == null;
assert condition.operator == Operator.IN;
if (condition.inValues == null)
this.inValues = ((Lists.Marker) condition.value).bind(options).getElements();
else
{
this.inValues = new ArrayList<>(condition.inValues.size());
for (Term value : condition.inValues)
this.inValues.add(value.bindAndGet(options));
}
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, long now) throws InvalidRequestException
{
CellName name = current.metadata().comparator.create(rowPrefix, column);
for (ByteBuffer value : inValues)
{
if (isSatisfiedByValue(value, current.getColumn(name), column.type, Operator.EQ, now))
return true;
}
return false;
}
}
/** A condition on an element of a collection column. IN operators are not supported here, see ElementAccessInBound. */
static class ElementAccessBound extends Bound
{
public final ByteBuffer collectionElement;
public final ByteBuffer value;
private ElementAccessBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert column.type instanceof CollectionType && condition.collectionElement != null;
assert condition.operator != Operator.IN;
this.collectionElement = condition.collectionElement.bindAndGet(options);
this.value = condition.value.bindAndGet(options);
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, final long now) throws InvalidRequestException
{
if (collectionElement == null)
throw new InvalidRequestException("Invalid null value for " + (column.type instanceof MapType ? "map" : "list") + " element access");
if (column.type instanceof MapType)
{
MapType mapType = (MapType) column.type;
if (column.type.isMultiCell())
{
Cell cell = current.getColumn(current.metadata().comparator.create(rowPrefix, column, collectionElement));
return isSatisfiedByValue(value, cell, mapType.getValuesType(), operator, now);
}
else
{
Cell cell = current.getColumn(current.metadata().comparator.create(rowPrefix, column));
ByteBuffer mapElementValue = cell.isLive(now) ? mapType.getSerializer().getSerializedValue(cell.value(), collectionElement, mapType.getKeysType())
: null;
return compareWithOperator(operator, mapType.getValuesType(), value, mapElementValue);
}
}
// sets don't have element access, so it's a list
ListType listType = (ListType) column.type;
if (column.type.isMultiCell())
{
ByteBuffer columnValue = getListItem(
collectionColumns(current.metadata().comparator.create(rowPrefix, column), current, now),
getListIndex(collectionElement));
return compareWithOperator(operator, listType.getElementsType(), value, columnValue);
}
else
{
Cell cell = current.getColumn(current.metadata().comparator.create(rowPrefix, column));
ByteBuffer listElementValue = cell.isLive(now) ? listType.getSerializer().getElement(cell.value(), getListIndex(collectionElement))
: null;
return compareWithOperator(operator, listType.getElementsType(), value, listElementValue);
}
}
static int getListIndex(ByteBuffer collectionElement) throws InvalidRequestException
{
int idx = ByteBufferUtil.toInt(collectionElement);
if (idx < 0)
throw new InvalidRequestException(String.format("Invalid negative list index %d", idx));
return idx;
}
static ByteBuffer getListItem(Iterator<Cell> iter, int index)
{
int adv = Iterators.advance(iter, index);
if (adv == index && iter.hasNext())
return iter.next().value();
else
return null;
}
public ByteBuffer getCollectionElementValue()
{
return collectionElement;
}
}
static class ElementAccessInBound extends Bound
{
public final ByteBuffer collectionElement;
public final List<ByteBuffer> inValues;
private ElementAccessInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert column.type instanceof CollectionType && condition.collectionElement != null;
this.collectionElement = condition.collectionElement.bindAndGet(options);
if (condition.inValues == null)
this.inValues = ((Lists.Marker) condition.value).bind(options).getElements();
else
{
this.inValues = new ArrayList<>(condition.inValues.size());
for (Term value : condition.inValues)
this.inValues.add(value.bindAndGet(options));
}
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, final long now) throws InvalidRequestException
{
if (collectionElement == null)
throw new InvalidRequestException("Invalid null value for " + (column.type instanceof MapType ? "map" : "list") + " element access");
CellNameType nameType = current.metadata().comparator;
if (column.type instanceof MapType)
{
MapType mapType = (MapType) column.type;
AbstractType<?> valueType = mapType.getValuesType();
if (column.type.isMultiCell())
{
CellName name = nameType.create(rowPrefix, column, collectionElement);
Cell item = current.getColumn(name);
for (ByteBuffer value : inValues)
{
if (isSatisfiedByValue(value, item, valueType, Operator.EQ, now))
return true;
}
return false;
}
else
{
Cell cell = current.getColumn(nameType.create(rowPrefix, column));
ByteBuffer mapElementValue = null;
if (cell != null && cell.isLive(now))
mapElementValue = mapType.getSerializer().getSerializedValue(cell.value(), collectionElement, mapType.getKeysType());
for (ByteBuffer value : inValues)
{
if (value == null)
{
if (mapElementValue == null)
return true;
continue;
}
if (valueType.compare(value, mapElementValue) == 0)
return true;
}
return false;
}
}
ListType listType = (ListType) column.type;
AbstractType<?> elementsType = listType.getElementsType();
if (column.type.isMultiCell())
{
ByteBuffer columnValue = ElementAccessBound.getListItem(
collectionColumns(nameType.create(rowPrefix, column), current, now),
ElementAccessBound.getListIndex(collectionElement));
for (ByteBuffer value : inValues)
{
if (compareWithOperator(Operator.EQ, elementsType, value, columnValue))
return true;
}
}
else
{
Cell cell = current.getColumn(nameType.create(rowPrefix, column));
ByteBuffer listElementValue = null;
if (cell != null && cell.isLive(now))
listElementValue = listType.getSerializer().getElement(cell.value(), ElementAccessBound.getListIndex(collectionElement));
for (ByteBuffer value : inValues)
{
if (value == null)
{
if (listElementValue == null)
return true;
continue;
}
if (elementsType.compare(value, listElementValue) == 0)
return true;
}
}
return false;
}
}
/** A condition on an entire collection column. IN operators are not supported here, see CollectionInBound. */
static class CollectionBound extends Bound
{
private final Term.Terminal value;
private CollectionBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert column.type.isCollection() && condition.collectionElement == null;
assert condition.operator != Operator.IN;
this.value = condition.value.bind(options);
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, final long now) throws InvalidRequestException
{
CollectionType type = (CollectionType)column.type;
if (type.isMultiCell())
{
Iterator<Cell> iter = collectionColumns(current.metadata().comparator.create(rowPrefix, column), current, now);
if (value == null)
{
if (operator == Operator.EQ)
return !iter.hasNext();
else if (operator == Operator.NEQ)
return iter.hasNext();
else
throw new InvalidRequestException(String.format("Invalid comparison with null for operator \"%s\"", operator));
}
return valueAppliesTo(type, iter, value, operator);
}
// frozen collections
Cell cell = current.getColumn(current.metadata().comparator.create(rowPrefix, column));
if (value == null)
{
if (operator == Operator.EQ)
return cell == null || !cell.isLive(now);
else if (operator == Operator.NEQ)
return cell != null && cell.isLive(now);
else
throw new InvalidRequestException(String.format("Invalid comparison with null for operator \"%s\"", operator));
}
// make sure we use v3 serialization format for comparison
ByteBuffer conditionValue;
if (type.kind == CollectionType.Kind.LIST)
conditionValue = ((Lists.Value) value).getWithProtocolVersion(Server.VERSION_3);
else if (type.kind == CollectionType.Kind.SET)
conditionValue = ((Sets.Value) value).getWithProtocolVersion(Server.VERSION_3);
else
conditionValue = ((Maps.Value) value).getWithProtocolVersion(Server.VERSION_3);
return compareWithOperator(operator, type, conditionValue, cell.value());
}
static boolean valueAppliesTo(CollectionType type, Iterator<Cell> iter, Term.Terminal value, Operator operator)
{
if (value == null)
return !iter.hasNext();
switch (type.kind)
{
case LIST: return listAppliesTo((ListType)type, iter, ((Lists.Value)value).elements, operator);
case SET: return setAppliesTo((SetType)type, iter, ((Sets.Value)value).elements, operator);
case MAP: return mapAppliesTo((MapType)type, iter, ((Maps.Value)value).map, operator);
}
throw new AssertionError();
}
private static boolean setOrListAppliesTo(AbstractType<?> type, Iterator<Cell> iter, Iterator<ByteBuffer> conditionIter, Operator operator, boolean isSet)
{
while(iter.hasNext())
{
if (!conditionIter.hasNext())
return (operator == Operator.GT) || (operator == Operator.GTE) || (operator == Operator.NEQ);
// for lists we use the cell value; for sets we use the cell name
ByteBuffer cellValue = isSet? iter.next().name().collectionElement() : iter.next().value();
int comparison = type.compare(cellValue, conditionIter.next());
if (comparison != 0)
return evaluateComparisonWithOperator(comparison, operator);
}
if (conditionIter.hasNext())
return (operator == Operator.LT) || (operator == Operator.LTE) || (operator == Operator.NEQ);
// they're equal
return operator == Operator.EQ || operator == Operator.LTE || operator == Operator.GTE;
}
private static boolean evaluateComparisonWithOperator(int comparison, Operator operator)
{
// called when comparison != 0
switch (operator)
{
case EQ:
return false;
case LT:
case LTE:
return comparison < 0;
case GT:
case GTE:
return comparison > 0;
case NEQ:
return true;
default:
throw new AssertionError();
}
}
static boolean listAppliesTo(ListType type, Iterator<Cell> iter, List<ByteBuffer> elements, Operator operator)
{
return setOrListAppliesTo(type.getElementsType(), iter, elements.iterator(), operator, false);
}
static boolean setAppliesTo(SetType type, Iterator<Cell> iter, Set<ByteBuffer> elements, Operator operator)
{
ArrayList<ByteBuffer> sortedElements = new ArrayList<>(elements.size());
sortedElements.addAll(elements);
Collections.sort(sortedElements, type.getElementsType());
return setOrListAppliesTo(type.getElementsType(), iter, sortedElements.iterator(), operator, true);
}
static boolean mapAppliesTo(MapType type, Iterator<Cell> iter, Map<ByteBuffer, ByteBuffer> elements, Operator operator)
{
Iterator<Map.Entry<ByteBuffer, ByteBuffer>> conditionIter = elements.entrySet().iterator();
while(iter.hasNext())
{
if (!conditionIter.hasNext())
return (operator == Operator.GT) || (operator == Operator.GTE) || (operator == Operator.NEQ);
Map.Entry<ByteBuffer, ByteBuffer> conditionEntry = conditionIter.next();
Cell c = iter.next();
// compare the keys
int comparison = type.getKeysType().compare(c.name().collectionElement(), conditionEntry.getKey());
if (comparison != 0)
return evaluateComparisonWithOperator(comparison, operator);
// compare the values
comparison = type.getValuesType().compare(c.value(), conditionEntry.getValue());
if (comparison != 0)
return evaluateComparisonWithOperator(comparison, operator);
}
if (conditionIter.hasNext())
return (operator == Operator.LT) || (operator == Operator.LTE) || (operator == Operator.NEQ);
// they're equal
return operator == Operator.EQ || operator == Operator.LTE || operator == Operator.GTE;
}
}
public static class CollectionInBound extends Bound
{
private final List<Term.Terminal> inValues;
private CollectionInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
{
super(condition.column, condition.operator);
assert column.type instanceof CollectionType && condition.collectionElement == null;
assert condition.operator == Operator.IN;
inValues = new ArrayList<>();
if (condition.inValues == null)
{
// We have a list of serialized collections that need to be deserialized for later comparisons
CollectionType collectionType = (CollectionType) column.type;
Lists.Marker inValuesMarker = (Lists.Marker) condition.value;
if (column.type instanceof ListType)
{
ListType deserializer = ListType.getInstance(collectionType.valueComparator(), false);
for (ByteBuffer buffer : inValuesMarker.bind(options).elements)
{
if (buffer == null)
this.inValues.add(null);
else
this.inValues.add(Lists.Value.fromSerialized(buffer, deserializer, options.getProtocolVersion()));
}
}
else if (column.type instanceof MapType)
{
MapType deserializer = MapType.getInstance(collectionType.nameComparator(), collectionType.valueComparator(), false);
for (ByteBuffer buffer : inValuesMarker.bind(options).elements)
{
if (buffer == null)
this.inValues.add(null);
else
this.inValues.add(Maps.Value.fromSerialized(buffer, deserializer, options.getProtocolVersion()));
}
}
else if (column.type instanceof SetType)
{
SetType deserializer = SetType.getInstance(collectionType.valueComparator(), false);
for (ByteBuffer buffer : inValuesMarker.bind(options).elements)
{
if (buffer == null)
this.inValues.add(null);
else
this.inValues.add(Sets.Value.fromSerialized(buffer, deserializer, options.getProtocolVersion()));
}
}
}
else
{
for (Term value : condition.inValues)
this.inValues.add(value.bind(options));
}
}
public boolean appliesTo(Composite rowPrefix, ColumnFamily current, final long now) throws InvalidRequestException
{
CollectionType type = (CollectionType)column.type;
CellName name = current.metadata().comparator.create(rowPrefix, column);
if (type.isMultiCell())
{
// copy iterator contents so that we can properly reuse them for each comparison with an IN value
List<Cell> cells = newArrayList(collectionColumns(name, current, now));
for (Term.Terminal value : inValues)
{
if (CollectionBound.valueAppliesTo(type, cells.iterator(), value, Operator.EQ))
return true;
}
return false;
}
else
{
Cell cell = current.getColumn(name);
for (Term.Terminal value : inValues)
{
if (value == null)
{
if (cell == null || !cell.isLive(now))
return true;
}
else if (type.compare(((Term.CollectionTerminal)value).getWithProtocolVersion(Server.VERSION_3), cell.value()) == 0)
{
return true;
}
}
return false;
}
}
}
#endif
class raw final {
private:
::shared_ptr<term::raw> _value;
std::vector<::shared_ptr<term::raw>> _in_values;
::shared_ptr<abstract_marker::in_raw> _in_marker;
// Can be nullptr, used with the syntax "IF m[e] = ..." (in which case it's 'e')
// Can be nullptr, only used with the syntax "IF m[e] = ..." (in which case it's 'e')
::shared_ptr<term::raw> _collection_element;
const operator_type& _op;
public:
@@ -130,29 +716,46 @@ public:
, _op(op)
{ }
/** A condition on a column or collection element. For example: "IF col['key'] = 'foo'" or "IF col = 'foo'" */
static ::shared_ptr<raw> simple_condition(::shared_ptr<term::raw> value, ::shared_ptr<term::raw> collection_element,
const operator_type& op) {
/** A condition on a column. For example: "IF col = 'foo'" */
static ::shared_ptr<raw> simple_condition(::shared_ptr<term::raw> value, const operator_type& op) {
return ::make_shared<raw>(std::move(value), std::vector<::shared_ptr<term::raw>>{},
::shared_ptr<abstract_marker::in_raw>{}, std::move(collection_element), op);
::shared_ptr<abstract_marker::in_raw>{}, ::shared_ptr<term::raw>{}, op);
}
/**
* An IN condition on a column or a collection element. IN may contain a list of values or a single marker.
* For example:
* "IF col IN ('foo', 'bar', ...)"
* "IF col IN ?"
* "IF col['key'] IN * ('foo', 'bar', ...)"
* "IF col['key'] IN ?"
*/
static ::shared_ptr<raw> in_condition(::shared_ptr<term::raw> collection_element,
::shared_ptr<abstract_marker::in_raw> in_marker, std::vector<::shared_ptr<term::raw>> in_values) {
return ::make_shared<raw>(::shared_ptr<term::raw>{}, std::move(in_values), std::move(in_marker),
std::move(collection_element), operator_type::IN);
/** An IN condition on a column. For example: "IF col IN ('foo', 'bar', ...)" */
static ::shared_ptr<raw> simple_in_condition(std::vector<::shared_ptr<term::raw>> in_values) {
return ::make_shared<raw>(::shared_ptr<term::raw>{}, std::move(in_values),
::shared_ptr<abstract_marker::in_raw>{}, ::shared_ptr<term::raw>{}, operator_type::IN);
}
/** An IN condition on a column with a single marker. For example: "IF col IN ?" */
static ::shared_ptr<raw> simple_in_condition(::shared_ptr<abstract_marker::in_raw> in_marker) {
return ::make_shared<raw>(::shared_ptr<term::raw>{}, std::vector<::shared_ptr<term::raw>>{},
std::move(in_marker), ::shared_ptr<term::raw>{}, operator_type::IN);
}
/** A condition on a collection element. For example: "IF col['key'] = 'foo'" */
static ::shared_ptr<raw> collection_condition(::shared_ptr<term::raw> value, ::shared_ptr<term::raw> collection_element,
const operator_type& op) {
return ::make_shared<raw>(std::move(value), std::vector<::shared_ptr<term::raw>>{}, ::shared_ptr<abstract_marker::in_raw>{}, std::move(collection_element), op);
}
/** An IN condition on a collection element. For example: "IF col['key'] IN ('foo', 'bar', ...)" */
static ::shared_ptr<raw> collection_in_condition(::shared_ptr<term::raw> collection_element,
std::vector<::shared_ptr<term::raw>> in_values) {
return ::make_shared<raw>(::shared_ptr<term::raw>{}, std::move(in_values), ::shared_ptr<abstract_marker::in_raw>{},
std::move(collection_element), operator_type::IN);
}
/** An IN condition on a collection element with a single marker. For example: "IF col['key'] IN ?" */
static ::shared_ptr<raw> collection_in_condition(::shared_ptr<term::raw> collection_element,
::shared_ptr<abstract_marker::in_raw> in_marker) {
return ::make_shared<raw>(::shared_ptr<term::raw>{}, std::vector<::shared_ptr<term::raw>>{}, std::move(in_marker),
std::move(collection_element), operator_type::IN);
}
::shared_ptr<column_condition> prepare(database& db, const sstring& keyspace, const column_definition& receiver);
};
};
} // end of namespace cql3
}

View File

@@ -85,7 +85,7 @@ assignment_testable::test_result
constants::literal::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
{
auto receiver_type = receiver->type->as_cql3_type();
if (receiver_type.is_collection() || receiver_type.is_user_type()) {
if (receiver_type.is_collection()) {
return test_result::NOT_ASSIGNABLE;
}
if (!receiver_type.is_native()) {
@@ -166,10 +166,10 @@ constants::literal::prepare(database& db, const sstring& keyspace, ::shared_ptr<
void constants::deleter::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
if (column.type->is_multi_cell()) {
collection_mutation_description coll_m;
collection_type_impl::mutation coll_m;
coll_m.tomb = params.make_tombstone();
m.set_cell(prefix, column, coll_m.serialize(*column.type));
auto ctype = static_pointer_cast<const collection_type_impl>(column.type);
m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(ctype->serialize_mutation_form(coll_m)));
} else {
m.set_cell(prefix, column, make_dead_cell(params));
}

View File

@@ -173,7 +173,7 @@ public:
marker(int32_t bind_index, ::shared_ptr<column_specification> receiver)
: abstract_marker{bind_index, std::move(receiver)}
{
assert(!_receiver->type->is_collection() && !_receiver->type->is_user_type());
assert(!_receiver->type->is_collection());
}
virtual cql3::raw_value_view bind_and_get(const query_options& options) override {

View File

@@ -37,7 +37,7 @@ namespace cql3 {
cql3_type cql3_type::raw::prepare(database& db, const sstring& keyspace) {
try {
auto&& ks = db.find_keyspace(keyspace);
return prepare_internal(keyspace, *ks.metadata()->user_types());
return prepare_internal(keyspace, ks.metadata()->user_types());
} catch (no_such_keyspace& nsk) {
throw exceptions::invalid_request_exception("Unknown keyspace " + keyspace);
}
@@ -54,10 +54,6 @@ bool cql3_type::raw::references_user_type(const sstring& name) const {
class cql3_type::raw_type : public raw {
private:
cql3_type _type;
virtual sstring to_string() const override {
return _type.to_string();
}
public:
raw_type(cql3_type type)
: _type{type}
@@ -66,7 +62,7 @@ public:
virtual cql3_type prepare(database& db, const sstring& keyspace) {
return _type;
}
cql3_type prepare_internal(const sstring&, user_types_metadata&) override {
cql3_type prepare_internal(const sstring&, lw_shared_ptr<user_types_metadata>) override {
return _type;
}
@@ -78,30 +74,21 @@ public:
return _type.is_counter();
}
virtual sstring to_string() const {
return _type.to_string();
}
virtual bool is_duration() const override {
return _type.get_type() == duration_type;
return _type.get_type()->equals(duration_type);
}
};
class cql3_type::raw_collection : public raw {
const abstract_type::kind _kind;
const collection_type_impl::kind* _kind;
shared_ptr<raw> _keys;
shared_ptr<raw> _values;
virtual sstring to_string() const override {
sstring start = is_frozen() ? "frozen<" : "";
sstring end = is_frozen() ? ">" : "";
if (_kind == abstract_type::kind::list) {
return format("{}list<{}>{}", start, _values, end);
} else if (_kind == abstract_type::kind::set) {
return format("{}set<{}>{}", start, _values, end);
} else if (_kind == abstract_type::kind::map) {
return format("{}map<{}, {}>{}", start, _keys, _values, end);
}
abort();
}
public:
raw_collection(const abstract_type::kind kind, shared_ptr<raw> keys, shared_ptr<raw> values)
raw_collection(const collection_type_impl::kind* kind, shared_ptr<raw> keys, shared_ptr<raw> values)
: _kind(kind), _keys(std::move(keys)), _values(std::move(values)) {
}
@@ -123,37 +110,35 @@ public:
return true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
assert(_values); // "Got null values type for a collection";
if (!is_frozen() && _values->supports_freezing() && !_values->is_frozen()) {
throw exceptions::invalid_request_exception(
format("Non-frozen user types or collections are not allowed inside collections: {}", *this));
if (!_frozen && _values->supports_freezing() && !_values->_frozen) {
throw exceptions::invalid_request_exception(format("Non-frozen collections are not allowed inside collections: {}", *this));
}
if (_values->is_counter()) {
throw exceptions::invalid_request_exception(format("Counters are not allowed inside collections: {}", *this));
}
if (_keys) {
if (!is_frozen() && _keys->supports_freezing() && !_keys->is_frozen()) {
throw exceptions::invalid_request_exception(
format("Non-frozen user types or collections are not allowed inside collections: {}", *this));
if (!_frozen && _keys->supports_freezing() && !_keys->_frozen) {
throw exceptions::invalid_request_exception(format("Non-frozen collections are not allowed inside collections: {}", *this));
}
}
if (_kind == abstract_type::kind::list) {
return cql3_type(list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !is_frozen()));
} else if (_kind == abstract_type::kind::set) {
if (_kind == &collection_type_impl::kind::list) {
return cql3_type(list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
} else if (_kind == &collection_type_impl::kind::set) {
if (_values->is_duration()) {
throw exceptions::invalid_request_exception(format("Durations are not allowed inside sets: {}", *this));
}
return cql3_type(set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !is_frozen()));
} else if (_kind == abstract_type::kind::map) {
return cql3_type(set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
} else if (_kind == &collection_type_impl::kind::map) {
assert(_keys); // "Got null keys type for a collection";
if (_keys->is_duration()) {
throw exceptions::invalid_request_exception(format("Durations are not allowed as map keys: {}", *this));
}
return cql3_type(map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types).get_type(), _values->prepare_internal(keyspace, user_types).get_type(), !is_frozen()));
return cql3_type(map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types).get_type(), _values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
}
abort();
}
@@ -165,18 +150,23 @@ public:
bool is_duration() const override {
return false;
}
virtual sstring to_string() const override {
sstring start = _frozen ? "frozen<" : "";
sstring end = _frozen ? ">" : "";
if (_kind == &collection_type_impl::kind::list) {
return format("{}list<{}>{}", start, _values, end);
} else if (_kind == &collection_type_impl::kind::set) {
return format("{}set<{}>{}", start, _values, end);
} else if (_kind == &collection_type_impl::kind::map) {
return format("{}map<{}, {}>{}", start, _keys, _values, end);
}
abort();
}
};
class cql3_type::raw_ut : public raw {
ut_name _name;
virtual sstring to_string() const override {
if (is_frozen()) {
return format("frozen<{}>", _name.to_string());
}
return _name.to_string();
}
public:
raw_ut(ut_name name)
: _name(std::move(name)) {
@@ -190,7 +180,7 @@ public:
_frozen = true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
if (_name.has_keyspace()) {
// The provided keyspace is the one of the current statement this is part of. If it's different from the keyspace of
// the UTName, we reject since we want to limit user types to their own keyspace (see #6643)
@@ -202,10 +192,14 @@ public:
} else {
_name.set_keyspace(keyspace);
}
if (!user_types) {
// bootstrap mode.
throw exceptions::invalid_request_exception(format("Unknown type {}", _name));
}
try {
data_type type = user_types.get_type(_name.get_user_type_name());
if (is_frozen()) {
type = type->freeze();
auto&& type = user_types->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
}
return cql3_type(std::move(type));
} catch (std::out_of_range& e) {
@@ -219,18 +213,14 @@ public:
return true;
}
virtual bool is_user_type() const override {
return true;
virtual sstring to_string() const override {
return _name.to_string();
}
};
class cql3_type::raw_tuple : public raw {
std::vector<shared_ptr<raw>> _types;
virtual sstring to_string() const override {
return format("tuple<{}>", join(", ", _types));
}
public:
raw_tuple(std::vector<shared_ptr<raw>> types)
: _types(std::move(types)) {
@@ -249,8 +239,8 @@ public:
}
_frozen = true;
}
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata& user_types) override {
if (!is_frozen()) {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
if (!_frozen) {
freeze();
}
std::vector<data_type> ts;
@@ -268,6 +258,10 @@ public:
return t->references_user_type(name);
});
}
virtual sstring to_string() const override {
return format("tuple<{}>", join(", ", _types));
}
};
bool
@@ -280,16 +274,6 @@ cql3_type::raw::is_counter() const {
return false;
}
bool
cql3_type::raw::is_user_type() const {
return false;
}
bool
cql3_type::raw::is_frozen() const {
return _frozen;
}
std::optional<sstring>
cql3_type::raw::keyspace() const {
return std::nullopt;
@@ -313,17 +297,17 @@ cql3_type::raw::user_type(ut_name name) {
shared_ptr<cql3_type::raw>
cql3_type::raw::map(shared_ptr<raw> t1, shared_ptr<raw> t2) {
return make_shared(raw_collection(abstract_type::kind::map, std::move(t1), std::move(t2)));
return make_shared(raw_collection(&collection_type_impl::kind::map, std::move(t1), std::move(t2)));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::list(shared_ptr<raw> t) {
return make_shared(raw_collection(abstract_type::kind::list, {}, std::move(t)));
return make_shared(raw_collection(&collection_type_impl::kind::list, {}, std::move(t)));
}
shared_ptr<cql3_type::raw>
cql3_type::raw::set(shared_ptr<raw> t) {
return make_shared(raw_collection(abstract_type::kind::set, {}, std::move(t)));
return make_shared(raw_collection(&collection_type_impl::kind::set, {}, std::move(t)));
}
shared_ptr<cql3_type::raw>

View File

@@ -60,28 +60,23 @@ public:
bool is_collection() const { return _type->is_collection(); }
bool is_counter() const { return _type->is_counter(); }
bool is_native() const { return _type->is_native(); }
bool is_user_type() const { return _type->is_user_type(); }
data_type get_type() const { return _type; }
const sstring& to_string() const { return _type->cql3_type_name(); }
// For UserTypes, we need to know the current keyspace to resolve the
// actual type used, so Raw is a "not yet prepared" CQL3Type.
class raw {
virtual sstring to_string() const = 0;
protected:
bool _frozen = false;
public:
virtual ~raw() {}
bool _frozen = false;
virtual bool supports_freezing() const = 0;
virtual bool is_collection() const;
virtual bool is_counter() const;
virtual bool is_duration() const;
virtual bool is_user_type() const;
bool is_frozen() const;
virtual bool references_user_type(const sstring&) const;
virtual std::optional<sstring> keyspace() const;
virtual void freeze();
virtual cql3_type prepare_internal(const sstring& keyspace, user_types_metadata&) = 0;
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata>) = 0;
virtual cql3_type prepare(database& db, const sstring& keyspace);
static shared_ptr<raw> from(cql3_type type);
static shared_ptr<raw> user_type(ut_name name);
@@ -90,6 +85,7 @@ public:
static shared_ptr<raw> set(shared_ptr<raw> t);
static shared_ptr<raw> tuple(std::vector<shared_ptr<raw>> ts);
static shared_ptr<raw> frozen(shared_ptr<raw> t);
virtual sstring to_string() const = 0;
friend std::ostream& operator<<(std::ostream& os, const raw& r);
};

View File

@@ -1,36 +0,0 @@
/*
* Copyright (C) 2019 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "restrictions/restrictions_config.hh"
namespace cql3 {
struct cql_config {
restrictions::restrictions_config restrictions;
};
extern const cql_config default_cql_config;
}

View File

@@ -115,21 +115,4 @@ public:
}
};
// Conditional modification statements and batches
// return a result set and have metadata, while same
// statements without conditions do not.
class cql_statement_opt_metadata : public cql_statement {
protected:
// Result set metadata, may be empty for simple updates and batches
shared_ptr<metadata> _metadata;
public:
using cql_statement::cql_statement;
virtual shared_ptr<const metadata> get_result_metadata() const override {
if (_metadata) {
return _metadata;
}
return make_empty_metadata();
}
};
}

View File

@@ -233,13 +233,13 @@ struct aggregate_type_for {
};
template<>
struct aggregate_type_for<ascii_native_type> {
using type = ascii_native_type::primary_type;
struct aggregate_type_for<simple_date_native_type> {
using type = simple_date_native_type::primary_type;
};
template<>
struct aggregate_type_for<simple_date_native_type> {
using type = simple_date_native_type::primary_type;
struct aggregate_type_for<timestamp_native_type> {
using type = timestamp_native_type::primary_type;
};
template<>

View File

@@ -27,13 +27,10 @@
#include "cql3/sets.hh"
#include "cql3/lists.hh"
#include "cql3/constants.hh"
#include "cql3/user_types.hh"
#include "database.hh"
#include "types/map.hh"
#include "types/set.hh"
#include "types/list.hh"
#include "types/user.hh"
#include "concrete_types.hh"
namespace cql3 {
namespace functions {
@@ -114,17 +111,13 @@ functions::init() {
declare(aggregate_fcts::make_max_function<sstring>());
declare(aggregate_fcts::make_min_function<sstring>());
declare(aggregate_fcts::make_count_function<ascii_native_type>());
declare(aggregate_fcts::make_max_function<ascii_native_type>());
declare(aggregate_fcts::make_min_function<ascii_native_type>());
declare(aggregate_fcts::make_count_function<simple_date_native_type>());
declare(aggregate_fcts::make_max_function<simple_date_native_type>());
declare(aggregate_fcts::make_min_function<simple_date_native_type>());
declare(aggregate_fcts::make_count_function<db_clock::time_point>());
declare(aggregate_fcts::make_max_function<db_clock::time_point>());
declare(aggregate_fcts::make_min_function<db_clock::time_point>());
declare(aggregate_fcts::make_count_function<timestamp_native_type>());
declare(aggregate_fcts::make_max_function<timestamp_native_type>());
declare(aggregate_fcts::make_min_function<timestamp_native_type>());
declare(aggregate_fcts::make_count_function<timeuuid_native_type>());
declare(aggregate_fcts::make_max_function<timeuuid_native_type>());
@@ -138,10 +131,6 @@ functions::init() {
declare(aggregate_fcts::make_max_function<bytes>());
declare(aggregate_fcts::make_min_function<bytes>());
declare(aggregate_fcts::make_count_function<bool>());
declare(aggregate_fcts::make_max_function<bool>());
declare(aggregate_fcts::make_min_function<bool>());
// FIXME: more count/min/max
declare(make_varchar_as_blob_fct());
@@ -205,7 +194,7 @@ make_from_json_function(database& db, const sstring& keyspace, data_type t) {
if (!json_value.isNull()) {
parsed_json_value.emplace(t->from_json_object(json_value, sf));
}
return parsed_json_value;
return std::move(parsed_json_value);
});
}
@@ -470,34 +459,23 @@ function_call::contains_bind_marker() const {
shared_ptr<terminal>
function_call::make_terminal(shared_ptr<function> fun, cql3::raw_value result, cql_serialization_format sf) {
static constexpr auto to_buffer = [] (const cql3::raw_value& v) {
if (v) {
return fragmented_temporary_buffer::view{bytes_view{*v}};
}
return fragmented_temporary_buffer::view{};
};
return visit(*fun->return_type(), make_visitor(
[&] (const list_type_impl& ltype) -> shared_ptr<terminal> {
return make_shared(lists::value::from_serialized(to_buffer(result), ltype, sf));
},
[&] (const set_type_impl& stype) -> shared_ptr<terminal> {
return make_shared(sets::value::from_serialized(to_buffer(result), stype, sf));
},
[&] (const map_type_impl& mtype) -> shared_ptr<terminal> {
return make_shared(maps::value::from_serialized(to_buffer(result), mtype, sf));
},
[&] (const user_type_impl& utype) -> shared_ptr<terminal> {
// TODO (kbraun): write a test for this case when User Defined Functions are implemented
return make_shared(user_types::value::from_serialized(to_buffer(result), utype));
},
[&] (const abstract_type& type) -> shared_ptr<terminal> {
if (type.is_collection()) {
throw std::runtime_error(format("function_call::make_terminal: unhandled collection type {}", type.name()));
}
return make_shared<constants::value>(std::move(result));
if (!dynamic_pointer_cast<const collection_type_impl>(fun->return_type())) {
return ::make_shared<constants::value>(std::move(result));
}
));
auto ctype = static_pointer_cast<const collection_type_impl>(fun->return_type());
fragmented_temporary_buffer::view res;
if (result) {
res = fragmented_temporary_buffer::view(bytes_view(*result));
}
if (&ctype->_kind == &collection_type_impl::kind::list) {
return make_shared(lists::value::from_serialized(std::move(res), static_pointer_cast<const list_type_impl>(ctype), sf));
} else if (&ctype->_kind == &collection_type_impl::kind::set) {
return make_shared(sets::value::from_serialized(std::move(res), static_pointer_cast<const set_type_impl>(ctype), sf));
} else if (&ctype->_kind == &collection_type_impl::kind::map) {
return make_shared(maps::value::from_serialized(std::move(res), static_pointer_cast<const map_type_impl>(ctype), sf));
}
abort();
}
::shared_ptr<term>
@@ -573,7 +551,7 @@ function_call::raw::test_assignment(database& db, const sstring& keyspace, share
// later with a more helpful error message that if we were to return false here.
try {
auto&& fun = functions::get(db, keyspace, _name, _terms, receiver->ks_name, receiver->cf_name, receiver);
if (fun && receiver->type == fun->return_type()) {
if (fun && receiver->type->equals(fun->return_type())) {
return assignment_testable::test_result::EXACT_MATCH;
} else if (!fun || receiver->type->is_value_compatible_with(*fun->return_type())) {
return assignment_testable::test_result::WEAKLY_ASSIGNABLE;

View File

@@ -151,7 +151,7 @@ inline shared_ptr<function>
make_currenttimestamp_fct() {
return make_native_scalar_function<true>("currenttimestamp", timestamp_type, {},
[] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
return {timestamp_type->decompose(db_clock::now())};
return {timestamp_type->decompose(timestamp_native_type{db_clock::now()})};
});
}
@@ -171,7 +171,7 @@ make_currentdate_fct() {
return make_native_scalar_function<true>("currentdate", simple_date_type, {},
[] (cql_serialization_format sf, const std::vector<bytes_opt>& values) -> bytes_opt {
auto to_simple_date = get_castas_fctn(simple_date_type, timestamp_type);
return {simple_date_type->decompose(to_simple_date(db_clock::now()))};
return {simple_date_type->decompose(to_simple_date(timestamp_native_type{db_clock::now()}))};
});
}

View File

@@ -54,14 +54,6 @@ shared_ptr<term>
lists::literal::prepare(database& db, const sstring& keyspace, shared_ptr<column_specification> receiver) {
validate_assignable_to(db, keyspace, receiver);
// In Cassandra, an empty (unfrozen) map/set/list is equivalent to the column being null. In
// other words a non-frozen collection only exists if it has elements. Return nullptr right
// away to simplify predicate evaluation. See also
// https://issues.apache.org/jira/browse/CASSANDRA-5141
if (receiver->type->is_multi_cell() && _elements.empty()) {
return cql3::constants::null_literal::NULL_VALUE;
}
auto&& value_spec = value_spec_of(receiver);
std::vector<shared_ptr<term>> values;
values.reserve(_elements.size());
@@ -124,24 +116,24 @@ lists::literal::to_string() const {
}
lists::value
lists::value::from_serialized(const fragmented_temporary_buffer::view& val, const list_type_impl& type, cql_serialization_format sf) {
lists::value::from_serialized(const fragmented_temporary_buffer::view& val, list_type type, cql_serialization_format sf) {
return with_linearized(val, [&] (bytes_view v) {
return from_serialized(v, type, sf);
});
}
lists::value
lists::value::from_serialized(bytes_view v, const list_type_impl& type, cql_serialization_format sf) {
lists::value::from_serialized(bytes_view v, list_type type, cql_serialization_format sf) {
try {
// Collections have this small hack that validate cannot be called on a serialized object,
// but compose does the validation (so we're fine).
// FIXME: deserializeForNativeProtocol()?!
auto l = value_cast<list_type_impl::native_type>(type.deserialize(v, sf));
auto l = value_cast<list_type_impl::native_type>(type->deserialize(v, sf));
std::vector<bytes_opt> elements;
elements.reserve(l.size());
for (auto&& element : l) {
// elements can be null in lists that represent a set of IN values
elements.push_back(element.is_null() ? bytes_opt() : bytes_opt(type.get_elements_type()->decompose(element)));
elements.push_back(element.is_null() ? bytes_opt() : bytes_opt(type->get_elements_type()->decompose(element)));
}
return value(std::move(elements));
} catch (marshal_exception& e) {
@@ -174,7 +166,7 @@ lists::value::equals(shared_ptr<list_type_impl> lt, const value& v) {
[t = lt->get_elements_type()] (const bytes_opt& e1, const bytes_opt& e2) { return t->equal(*e1, *e2); });
}
const std::vector<bytes_opt>&
std::vector<bytes_opt>
lists::value::get_elements() {
return _elements;
}
@@ -227,7 +219,7 @@ lists::delayed_value::bind(const query_options& options) {
::shared_ptr<terminal>
lists::marker::bind(const query_options& options) {
const auto& value = options.get_value_at(_bind_index);
auto& ltype = static_cast<const list_type_impl&>(*_receiver->type);
auto ltype = static_pointer_cast<const list_type_impl>(_receiver->type);
if (value.is_null()) {
return nullptr;
} else if (value.is_unset_value()) {
@@ -235,8 +227,8 @@ lists::marker::bind(const query_options& options) {
} else {
try {
return with_linearized(*value, [&] (bytes_view v) {
ltype.validate(v, options.get_cql_serialization_format());
return make_shared(value::from_serialized(v, ltype, options.get_cql_serialization_format()));
ltype->validate(v, options.get_cql_serialization_format());
return make_shared(value::from_serialized(v, std::move(ltype), options.get_cql_serialization_format()));
});
} catch (marshal_exception& e) {
throw exceptions::invalid_request_exception(e.what());
@@ -270,11 +262,12 @@ lists::setter::execute(mutation& m, const clustering_key_prefix& prefix, const u
return;
}
if (column.type->is_multi_cell()) {
// Delete all cells first, then append new ones
collection_mutation_view_description mut;
// delete + append
collection_type_impl::mutation mut;
mut.tomb = params.make_tombstone_just_before();
m.set_cell(prefix, column, mut.serialize(*column.type));
auto ctype = static_pointer_cast<const list_type_impl>(column.type);
auto col_mut = ctype->serialize_mutation_form(std::move(mut));
m.set_cell(prefix, column, std::move(col_mut));
}
do_append(value, m, prefix, column, params);
}
@@ -310,10 +303,11 @@ lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix
auto idx = with_linearized(*index, [] (bytes_view v) {
return value_cast<int32_t>(data_type_for<int32_t>()->deserialize(v));
});
auto&& existing_list_opt = params.get_prefetched_list(m.key(), prefix, column);
auto&& existing_list_opt = params.get_prefetched_list(m.key().view(), prefix.view(), column);
if (!existing_list_opt) {
throw exceptions::invalid_request_exception("Attempted to set an element on a list which is null");
}
auto ltype = dynamic_pointer_cast<const list_type_impl>(column.type);
auto&& existing_list = *existing_list_opt;
// we verified that index is an int32_type
if (idx < 0 || size_t(idx) >= existing_list.size()) {
@@ -321,19 +315,16 @@ lists::setter_by_index::execute(mutation& m, const clustering_key_prefix& prefix
idx, existing_list.size()));
}
auto ltype = static_cast<const list_type_impl*>(column.type.get());
const data_value& eidx_dv = existing_list[idx].first;
bytes eidx = eidx_dv.type()->decompose(eidx_dv);
collection_mutation_description mut;
const bytes& eidx = existing_list[idx].key;
list_type_impl::mutation mut;
mut.cells.reserve(1);
if (!value) {
mut.cells.emplace_back(std::move(eidx), params.make_dead_cell());
mut.cells.emplace_back(eidx, params.make_dead_cell());
} else {
mut.cells.emplace_back(std::move(eidx),
params.make_cell(*ltype->value_comparator(), *value, atomic_cell::collection_member::yes));
mut.cells.emplace_back(eidx, params.make_cell(*ltype->value_comparator(), *value, atomic_cell::collection_member::yes));
}
m.set_cell(prefix, column, mut.serialize(*ltype));
auto smut = ltype->serialize_mutation_form(mut);
m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(std::move(smut)));
}
bool
@@ -353,13 +344,15 @@ lists::setter_by_uuid::execute(mutation& m, const clustering_key_prefix& prefix,
throw exceptions::invalid_request_exception("Invalid null value for list index");
}
auto ltype = static_cast<const list_type_impl*>(column.type.get());
auto ltype = dynamic_pointer_cast<const list_type_impl>(column.type);
collection_mutation_description mut;
list_type_impl::mutation mut;
mut.cells.reserve(1);
mut.cells.emplace_back(to_bytes(*index), params.make_cell(*ltype->value_comparator(), *value, atomic_cell::collection_member::yes));
m.set_cell(prefix, column, mut.serialize(*ltype));
auto smut = ltype->serialize_mutation_form(mut);
m.set_cell(prefix, column,
atomic_cell_or_collection::from_collection_mutation(
std::move(smut)));
}
void
@@ -379,6 +372,7 @@ lists::do_append(shared_ptr<term> value,
const column_definition& column,
const update_parameters& params) {
auto&& list_value = dynamic_pointer_cast<lists::value>(value);
auto&& ltype = dynamic_pointer_cast<const list_type_impl>(column.type);
if (column.type->is_multi_cell()) {
// If we append null, do nothing. Note that for Setter, we've
// already removed the previous value so we're good here too
@@ -386,10 +380,8 @@ lists::do_append(shared_ptr<term> value,
return;
}
auto ltype = static_cast<const list_type_impl*>(column.type.get());
auto&& to_add = list_value->_elements;
collection_mutation_description appended;
collection_type_impl::mutation appended;
appended.cells.reserve(to_add.size());
for (auto&& e : to_add) {
auto uuid1 = utils::UUID_gen::get_time_UUID_bytes();
@@ -397,7 +389,7 @@ lists::do_append(shared_ptr<term> value,
// FIXME: can e be empty?
appended.cells.emplace_back(std::move(uuid), params.make_cell(*ltype->value_comparator(), *e, atomic_cell::collection_member::yes));
}
m.set_cell(prefix, column, appended.serialize(*ltype));
m.set_cell(prefix, column, ltype->serialize_mutation_form(appended));
} else {
// for frozen lists, we're overwriting the whole cell value
if (!value) {
@@ -421,11 +413,11 @@ lists::prepender::execute(mutation& m, const clustering_key_prefix& prefix, cons
assert(lvalue);
auto time = precision_time::REFERENCE_TIME - (db_clock::now() - precision_time::REFERENCE_TIME);
collection_mutation_description mut;
collection_type_impl::mutation mut;
mut.cells.reserve(lvalue->get_elements().size());
// We reverse the order of insertion, so that the last element gets the lastest time
// (lists are sorted by time)
auto ltype = static_cast<const list_type_impl*>(column.type.get());
auto&& ltype = static_cast<const list_type_impl*>(column.type.get());
for (auto&& v : lvalue->_elements | boost::adaptors::reversed) {
auto&& pt = precision_time::get_next(time);
auto uuid = utils::UUID_gen::get_time_UUID_bytes(pt.millis.time_since_epoch().count(), pt.nanos);
@@ -433,7 +425,7 @@ lists::prepender::execute(mutation& m, const clustering_key_prefix& prefix, cons
}
// now reverse again, to get the original order back
std::reverse(mut.cells.begin(), mut.cells.end());
m.set_cell(prefix, column, mut.serialize(*ltype));
m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(ltype->serialize_mutation_form(std::move(mut))));
}
bool
@@ -445,10 +437,12 @@ void
lists::discarder::execute(mutation& m, const clustering_key_prefix& prefix, const update_parameters& params) {
assert(column.type->is_multi_cell()); // "Attempted to delete from a frozen list";
auto&& existing_list = params.get_prefetched_list(m.key(), prefix, column);
auto&& existing_list = params.get_prefetched_list(m.key().view(), prefix.view(), column);
// We want to call bind before possibly returning to reject queries where the value provided is not a list.
auto&& value = _t->bind(params._options);
auto&& ltype = static_pointer_cast<const list_type_impl>(column.type);
if (!existing_list) {
return;
}
@@ -466,27 +460,24 @@ lists::discarder::execute(mutation& m, const clustering_key_prefix& prefix, cons
auto lvalue = dynamic_pointer_cast<lists::value>(value);
assert(lvalue);
auto ltype = static_cast<const list_type_impl*>(column.type.get());
// Note: below, we will call 'contains' on this toDiscard list for each element of existingList.
// Meaning that if toDiscard is big, converting it to a HashSet might be more efficient. However,
// the read-before-write this operation requires limits its usefulness on big lists, so in practice
// toDiscard will be small and keeping a list will be more efficient.
auto&& to_discard = lvalue->_elements;
collection_mutation_description mnew;
collection_type_impl::mutation mnew;
for (auto&& cell : elist) {
auto has_value = [&] (bytes_view value) {
auto have_value = [&] (bytes_view value) {
return std::find_if(to_discard.begin(), to_discard.end(),
[ltype, value] (auto&& v) { return ltype->get_elements_type()->equal(*v, value); })
!= to_discard.end();
};
bytes eidx = cell.first.type()->decompose(cell.first);
bytes value = cell.second.type()->decompose(cell.second);
if (has_value(value)) {
mnew.cells.emplace_back(std::move(eidx), params.make_dead_cell());
if (have_value(cell.value)) {
mnew.cells.emplace_back(cell.key, params.make_dead_cell());
}
}
m.set_cell(prefix, column, mnew.serialize(*ltype));
auto mnew_ser = ltype->serialize_mutation_form(mnew);
m.set_cell(prefix, column, atomic_cell_or_collection::from_collection_mutation(std::move(mnew_ser)));
}
bool
@@ -505,10 +496,11 @@ lists::discarder_by_index::execute(mutation& m, const clustering_key_prefix& pre
return;
}
auto ltype = static_pointer_cast<const list_type_impl>(column.type);
auto cvalue = dynamic_pointer_cast<constants::value>(index);
assert(cvalue);
auto&& existing_list_opt = params.get_prefetched_list(m.key(), prefix, column);
auto&& existing_list_opt = params.get_prefetched_list(m.key().view(), prefix.view(), column);
int32_t idx = read_simple_exactly<int32_t>(*cvalue->_bytes);
if (!existing_list_opt) {
throw exceptions::invalid_request_exception("Attempted to delete an element from a list which is null");
@@ -517,11 +509,9 @@ lists::discarder_by_index::execute(mutation& m, const clustering_key_prefix& pre
if (idx < 0 || size_t(idx) >= existing_list.size()) {
throw exceptions::invalid_request_exception(format("List index {:d} out of bound, list has size {:d}", idx, existing_list.size()));
}
collection_mutation_description mut;
const data_value& eidx_dv = existing_list[idx].first;
bytes eidx = eidx_dv.type()->decompose(eidx_dv);
mut.cells.emplace_back(std::move(eidx), params.make_dead_cell());
m.set_cell(prefix, column, mut.serialize(*column.type));
collection_type_impl::mutation mut;
mut.cells.emplace_back(existing_list[idx].key, params.make_dead_cell());
m.set_cell(prefix, column, ltype->serialize_mutation_form(mut));
}
}

View File

@@ -73,18 +73,18 @@ public:
};
class value : public multi_item_terminal, collection_terminal {
static value from_serialized(bytes_view v, const list_type_impl& type, cql_serialization_format sf);
static value from_serialized(bytes_view v, list_type type, cql_serialization_format sf);
public:
std::vector<bytes_opt> _elements;
public:
explicit value(std::vector<bytes_opt> elements)
: _elements(std::move(elements)) {
}
static value from_serialized(const fragmented_temporary_buffer::view& v, const list_type_impl& type, cql_serialization_format sf);
static value from_serialized(const fragmented_temporary_buffer::view& v, list_type type, cql_serialization_format sf);
virtual cql3::raw_value get(const query_options& options) override;
virtual bytes get_with_protocol_version(cql_serialization_format sf) override;
bool equals(shared_ptr<list_type_impl> lt, const value& v);
virtual const std::vector<bytes_opt>& get_elements() override;
virtual std::vector<bytes_opt> get_elements() override;
virtual sstring to_string() const;
friend class lists;
};

View File

@@ -153,17 +153,17 @@ maps::literal::to_string() const {
}
maps::value
maps::value::from_serialized(const fragmented_temporary_buffer::view& fragmented_value, const map_type_impl& type, cql_serialization_format sf) {
maps::value::from_serialized(const fragmented_temporary_buffer::view& fragmented_value, map_type type, cql_serialization_format sf) {
try {
// Collections have this small hack that validate cannot be called on a serialized object,
// but compose does the validation (so we're fine).
// FIXME: deserialize_for_native_protocol?!
return with_linearized(fragmented_value, [&] (bytes_view value) {
auto m = value_cast<map_type_impl::native_type>(type.deserialize(value, sf));
std::map<bytes, bytes, serialized_compare> map(type.get_keys_type()->as_less_comparator());
auto m = value_cast<map_type_impl::native_type>(type->deserialize(value, sf));
std::map<bytes, bytes, serialized_compare> map(type->get_keys_type()->as_less_comparator());
for (auto&& e : m) {
map.emplace(type.get_keys_type()->decompose(e.first),
type.get_values_type()->decompose(e.second));
map.emplace(type->get_keys_type()->decompose(e.first),
type->get_values_type()->decompose(e.second));
}
return maps::value { std::move(map) };
});
@@ -269,7 +269,8 @@ maps::marker::bind(const query_options& options) {
} catch (marshal_exception& e) {
throw exceptions::invalid_request_exception(e.what());
}
return ::make_shared(maps::value::from_serialized(*val, static_cast<const map_type_impl&>(*_receiver->type), options.get_cql_serialization_format()));
return ::make_shared<maps::value>(maps::value::from_serialized(*val, static_pointer_cast<const map_type_impl>(_receiver->type),
options.get_cql_serialization_format()));
}
void
@@ -284,10 +285,12 @@ maps::setter::execute(mutation& m, const clustering_key_prefix& row_key, const u
return;
}
if (column.type->is_multi_cell()) {
// Delete all cells first, then put new ones
collection_mutation_description mut;
// delete + put
collection_type_impl::mutation mut;
mut.tomb = params.make_tombstone_just_before();
m.set_cell(row_key, column, mut.serialize(*column.type));
auto ctype = static_pointer_cast<const map_type_impl>(column.type);
auto col_mut = ctype->serialize_mutation_form(std::move(mut));
m.set_cell(row_key, column, std::move(col_mut));
}
do_put(m, row_key, params, value, column);
}
@@ -307,12 +310,13 @@ maps::setter_by_key::execute(mutation& m, const clustering_key_prefix& prefix, c
if (!key) {
throw invalid_request_exception("Invalid null map key");
}
auto ctype = static_cast<const map_type_impl*>(column.type.get());
auto ctype = static_pointer_cast<const map_type_impl>(column.type);
auto avalue = value ? params.make_cell(*ctype->get_values_type(), *value, atomic_cell::collection_member::yes) : params.make_dead_cell();
collection_mutation_description update;
map_type_impl::mutation update;
update.cells.emplace_back(std::move(to_bytes(*key)), std::move(avalue));
m.set_cell(prefix, column, update.serialize(*ctype));
// should have been verified as map earlier?
auto col_mut = ctype->serialize_mutation_form(std::move(update));
m.set_cell(prefix, column, std::move(col_mut));
}
void
@@ -329,18 +333,18 @@ maps::do_put(mutation& m, const clustering_key_prefix& prefix, const update_para
shared_ptr<term> value, const column_definition& column) {
auto map_value = dynamic_pointer_cast<maps::value>(value);
if (column.type->is_multi_cell()) {
collection_type_impl::mutation mut;
if (!value) {
return;
}
collection_mutation_description mut;
auto ctype = static_cast<const map_type_impl*>(column.type.get());
auto ctype = static_pointer_cast<const map_type_impl>(column.type);
for (auto&& e : map_value->map) {
mut.cells.emplace_back(e.first, params.make_cell(*ctype->get_values_type(), fragmented_temporary_buffer::view(e.second), atomic_cell::collection_member::yes));
}
m.set_cell(prefix, column, mut.serialize(*ctype));
auto col_mut = ctype->serialize_mutation_form(std::move(mut));
m.set_cell(prefix, column, std::move(col_mut));
} else {
// for frozen maps, we're overwriting the whole cell
if (!value) {
@@ -363,10 +367,10 @@ maps::discarder_by_key::execute(mutation& m, const clustering_key_prefix& prefix
if (key == constants::UNSET_VALUE) {
throw exceptions::invalid_request_exception("Invalid unset map key");
}
collection_mutation_description mut;
collection_type_impl::mutation mut;
mut.cells.emplace_back(*key->get(params._options), params.make_dead_cell());
m.set_cell(prefix, column, mut.serialize(*column.type));
auto mtype = static_cast<const map_type_impl*>(column.type.get());
m.set_cell(prefix, column, mtype->serialize_mutation_form(mut));
}
}

View File

@@ -81,7 +81,7 @@ public:
value(std::map<bytes, bytes, serialized_compare> map)
: map(std::move(map)) {
}
static value from_serialized(const fragmented_temporary_buffer::view& value, const map_type_impl& type, cql_serialization_format sf);
static value from_serialized(const fragmented_temporary_buffer::view& value, map_type type, cql_serialization_format sf);
virtual cql3::raw_value get(const query_options& options) override;
virtual bytes get_with_protocol_version(cql_serialization_format sf);
bool equals(map_type mt, const value& v);

View File

@@ -188,11 +188,6 @@ protected:
throw exceptions::invalid_request_exception(format("{} cannot be used for Multi-column relations", get_operator()));
}
virtual ::shared_ptr<restrictions::restriction> new_LIKE_restriction(
database& db, schema_ptr schema, ::shared_ptr<variable_specifications> bound_names) override {
throw exceptions::invalid_request_exception("LIKE cannot be used for Multi-column relations");
}
virtual ::shared_ptr<relation> maybe_rename_identifier(const column_identifier::raw& from, column_identifier::raw to) override {
auto new_entities = boost::copy_range<decltype(_entities)>(_entities | boost::adaptors::transformed([&] (auto&& entity) {
return *entity == from ? ::make_shared<column_identifier::raw>(to) : entity;

Some files were not shown because too many files have changed in this diff Show More