Compare commits

26 Commits

Author SHA1 Message Date
Yaron Kaikov
f92622e0de release: prepare for 5.0.rc3 2022-04-06 14:31:03 +03:00
Takuya ASADA
3bca608db5 docker: run scylla as root
Previous versions of the Docker image ran scylla as root, but cb19048
accidentally changed it to the scylla user.
To keep compatibility we need to revert this to root.

Fixes #10261

Closes #10325

(cherry picked from commit f95a531407)
2022-04-05 12:46:25 +03:00
Takuya ASADA
a93b72d5dd docker: revert scylla-server.conf service name change
We changed the supervisor service name in cb19048, but this breaks
compatibility with scylla-operator.
To fix the issue we need to revert the service name to the previous one.

Fixes #10269

Closes #10323

(cherry picked from commit 41edc045d9)
2022-04-05 12:40:59 +03:00
Benny Halevy
d58ca2edbd range_tombstone_list: insert_from: correct rev.update range_tombstone in not overlapping case
2nd std::move(start) looks like a typo
in fe2fa3f20d.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220404124741.1775076-1-bhalevy@scylladb.com>
(cherry picked from commit 2d80057617)

Fixes #10326
2022-04-05 12:39:13 +03:00
Alexey Kartashov
75740ace2a dist/docker: fix incorrect locale value
The Docker build script contains an incorrect locale specification for the LC_ALL setting;
this commit fixes that.

Fixes #10310

Closes #10321

(cherry picked from commit d86c3a8061)
2022-04-04 12:51:02 +03:00
Piotr Sarna
d7a1bf6331 cql3: fix qualifying restrictions with IN for indexing
When a query contains an IN restriction on its partition key,
it's currently not eligible for indexing. It was however
erroneously qualified as such, which led to fetching incorrect
results. This commit fixes the issue by not allowing such queries
to undergo indexing, and comes with a regression test.

Fixes #10300

Closes #10302

(cherry picked from commit c0fd53a9d7)
2022-04-03 11:20:49 +03:00
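
The decision described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `statement_restrictions` code: `query_plan` and `choose_plan` are hypothetical names, and it assumes that an IN restriction on the partition key is what makes the partition range "not simple".

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical outcome of planning a query with non-primary-key restrictions.
enum class query_plan { use_index, filter };

// Sketch of the rule: an index is used only when one is available AND the
// partition range is simple. Otherwise the query needs filtering, which is
// rejected unless the caller explicitly allowed it.
inline query_plan choose_plan(bool has_queriable_index,
                              bool partition_range_is_simple,
                              bool allow_filtering) {
    if (has_queriable_index && partition_range_is_simple) {
        return query_plan::use_index;
    }
    if (!allow_filtering) {
        throw std::invalid_argument("query might involve data filtering");
    }
    return query_plan::filter;
}
```

With this shape, a query that has a usable index but a non-simple partition range falls through to the filtering branch instead of being planned as an indexed read.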
Avi Kivity
bbd7d657cc Update seastar submodule (pidof command not installed)
* seastar 1c0d622ba0...4a30c44c4c (1):
  > seastar-cpu-map.sh: switch from pidof to pgrep
Fixes #10238.
2022-03-29 12:36:06 +03:00
Avi Kivity
f5bf4c81d1 Merge 'replica/database: truncate: temporarily disable compaction on table and views before flush' from Benny Halevy
Flushing the base table triggers view building
and corresponding compactions on the view tables.

Temporarily disable compaction on both the base
table and all its views before flush and snapshot
since those flushed sstables are about to be truncated
anyway right after the snapshot is taken.

This should make truncate go faster.

In the process, this series also embeds `database::truncate_views`
into `truncate` and coroutinizes both

Refs #6309

Test: unit(dev)

Closes #10203

* github.com:scylladb/scylla:
  replica/database: truncate: fixup indentation
  replica/database: truncate: temporarily disable compaction on table and views before flush
  replica/database: truncate: coroutinize per-view logic
  replica/database: open-code truncate_view in truncate
  replica/database: truncate: coroutinize run_with_compaction_disabled lambda
  replica/database: coroutinize truncate
  compaction_manager: add disable_compaction method

(cherry picked from commit aab052c0d5)
2022-03-28 15:40:40 +03:00
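
The "temporarily disable, then re-enable" pattern in this series can be sketched as a move-only RAII guard around a per-table counter. This is a simplified illustration under stated assumptions: `table_state` and `compaction_disabler` are hypothetical stand-ins, and incrementing `submitted` stands in for resubmitting the table for compaction.

```cpp
#include <cassert>
#include <utility>

// Hypothetical per-table bookkeeping.
struct table_state {
    int compaction_disabled_counter = 0;
    int submitted = 0; // how many times compaction was resubmitted
};

// Move-only guard: compaction stays disabled while at least one guard is alive.
class compaction_disabler {
    table_state* _state;
public:
    explicit compaction_disabler(table_state& s) : _state(&s) {
        ++_state->compaction_disabled_counter;
    }
    // A moved-from guard no longer owns the "disabled" reference.
    compaction_disabler(compaction_disabler&& o) noexcept
        : _state(std::exchange(o._state, nullptr)) {}
    compaction_disabler(const compaction_disabler&) = delete;
    compaction_disabler& operator=(const compaction_disabler&) = delete;
    compaction_disabler& operator=(compaction_disabler&&) = delete;
    ~compaction_disabler() {
        // Only the last live guard re-enables (resubmits) compaction.
        if (_state && --_state->compaction_disabled_counter == 0) {
            ++_state->submitted;
        }
    }
};
```

Holding one guard for the base table and one per view across the flush and snapshot keeps all of them quiet until the guards go out of scope.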
Benny Halevy
02e8336659 atomic_cell: compare_atomic_cell_for_merge: compare ttl if expiry is equal
Following up on a57c087c89,
compare_atomic_cell_for_merge should compare the ttl value in the
reverse order since, when comparing two cells that are identical
in all attributes but their ttl, we want to keep the cell with the
smaller ttl value rather than the larger ttl, since it was written
at a later (wall-clock) time, and so would remain longer after it
expires, until purged after gc_grace seconds.

Fixes #10173

Test: mutation_test.test_cell_ordering, unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302154328.2400717-1-bhalevy@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220306091913.106508-1-bhalevy@scylladb.com>
(cherry picked from commit a085ef74ff)
2022-03-24 18:00:11 +02:00
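
The reverse-order ttl comparison can be sketched as below. This is a simplified model, not the real `compare_atomic_cell_for_merge`: `expiring_cell`, `three_way` and `compare_for_merge` are hypothetical names, and a plain `int` result is used instead of `std::strong_ordering` so the sketch does not depend on C++20.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a live cell that carries a ttl.
struct expiring_cell {
    std::int64_t expiry; // absolute expiry time, in seconds
    std::int64_t ttl;    // time-to-live the cell was written with, in seconds
};

// Returns a negative, zero, or positive value, like strcmp().
inline int three_way(std::int64_t a, std::int64_t b) {
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

// A positive result means `left` wins the merge.
inline int compare_for_merge(const expiring_cell& left, const expiring_cell& right) {
    if (left.expiry != right.expiry) {
        return three_way(left.expiry, right.expiry);
    }
    // Same expiry: a smaller ttl implies a later write time
    // (write time = expiry - ttl), so the ttl is compared in reverse order.
    return three_way(right.ttl, left.ttl);
}
```

For two cells with the same expiry, the one with the smaller ttl compares greater and is therefore the one kept.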
Benny Halevy
601812e11b atomic_cell: compare_atomic_cell_for_merge: compare ttl if expiry is equal
Unlike atomic_cell_or_collection::equals, compare_atomic_cell_for_merge
currently returns std::strong_ordering::equal if two cells are equal in
every way except their ttl:s.

The problem with that is that the cells' hashes are different and this
will cause repair to keep trying to repair discrepancies caused by the
ttl being different.

This may be triggered by e.g. the spark migrator, which computes the ttl
from the expiry time by subtracting the current time from the expiry
time.

If the cell is migrated multiple times at different times, it will generate
cells that have the same expiry (by design) but different ttl values.

Fixes #10156

Test: mutation_test.test_cell_ordering, unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302154328.2400717-1-bhalevy@scylladb.com>
(cherry picked from commit a57c087c89)
2022-03-24 18:00:11 +02:00
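
The arithmetic behind this discrepancy can be sketched as follows. The model is an assumption made for illustration: `migrated_cell`, `migrate`, `equal_for_merge_before_fix` and `equal_content` are hypothetical names, and the ttl is taken to be the expiry minus the time of migration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a cell produced by a migration tool.
struct migrated_cell {
    std::int64_t expiry; // absolute expiry time, in seconds
    std::int64_t ttl;    // derived time-to-live, in seconds
};

// Assumed derivation: the expiry is preserved and the ttl is whatever
// remains until expiry at the moment of migration.
inline migrated_cell migrate(std::int64_t expiry, std::int64_t now) {
    return migrated_cell{expiry, expiry - now};
}

// Ordering before the fix (sketch): the ttl did not participate.
inline bool equal_for_merge_before_fix(const migrated_cell& a, const migrated_cell& b) {
    return a.expiry == b.expiry;
}

// Content equality (and hence hashing) does take the ttl into account.
inline bool equal_content(const migrated_cell& a, const migrated_cell& b) {
    return a.expiry == b.expiry && a.ttl == b.ttl;
}
```

Migrating the same cell at two different times yields cells that the old merge ordering treats as equal while their content still differs, which is the mismatch repair keeps detecting.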
Benny Halevy
ea466320d2 atomic_cell: compare_atomic_cell_for_merge: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302113833.2308533-2-bhalevy@scylladb.com>
(cherry picked from commit d43da5d6dc)
2022-03-24 18:00:11 +02:00
Benny Halevy
25ea831a15 atomic_cell: compare_atomic_cell_for_merge: simplify expiry/deletion_time comparison
No need to first check that the cells' expiry is different
or that deletion_time is different before comparing them
with `<=>`.

If they are the same the function returns std::strong_ordering::equal
anyhow and that is the same as `<=>` comparing identical values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220302113833.2308533-1-bhalevy@scylladb.com>
(cherry picked from commit be865a29b8)
2022-03-24 18:00:11 +02:00
Benny Halevy
8648c79c9e main: shutdown: do not abort on certain system errors
Currently any unhandled error during deferred shutdown
is rethrown in a noexcept context (in ~deferred_action),
generating a core dump.

The core dump is not helpful if the cause of the
error is "environmental", i.e. in the system, rather
than in scylla itself.

This change detects several such errors and calls
_Exit(255) to exit the process early, without leaving
a coredump behind.  Otherwise, call abort() explicitly,
rather than letting terminate() be called implicitly
by the destructor exception handling code.

Fixes #9573

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220227101054.1294368-1-bhalevy@scylladb.com>
(cherry picked from commit 132c9d5933)
2022-03-24 14:48:52 +02:00
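
The classification step described here could look roughly like this. It is a sketch under assumptions: the commit message does not enumerate which errors count as "environmental", so the list below is purely illustrative, and `is_environmental_error`, `shutdown_action` and `classify_shutdown_failure` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <exception>
#include <stdexcept>
#include <system_error>

// Assumption: an illustrative list only; the real set of errors may differ.
inline bool is_environmental_error(const std::system_error& e) {
    const std::error_code& ec = e.code();
    return ec == std::errc::no_space_on_device
        || ec == std::errc::io_error
        || ec == std::errc::permission_denied;
}

enum class shutdown_action { exit_without_core, abort_with_core };

// Decide what to do with an error caught during deferred shutdown.
// The caller would then invoke _Exit(255) or abort() accordingly.
inline shutdown_action classify_shutdown_failure(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const std::system_error& e) {
        return is_environmental_error(e) ? shutdown_action::exit_without_core
                                         : shutdown_action::abort_with_core;
    } catch (...) {
        return shutdown_action::abort_with_core;
    }
}
```

Keeping the decision separate from the exit call makes the policy easy to exercise without terminating the process.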
Nadav Har'El
7ae4d0e6f8 Seastar: backport Seastar fix for missing string escape in JSON output
Backported Seastar fix:
  > Merge 'json/formatter: Escape strings' from Juliusz Stasiewicz

Fixes #9061

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-03-23 20:29:50 +02:00
Piotr Sarna
f3564db941 expression: fix get_value for mismatched column definitions
As observed in #10026, after schema changes it somehow happened
that a column definition that does not match any of the base table
columns was passed to expression verification code.
The function that looks up the index of a column happens to return
-1 when it doesn't find anything, so using this returned index
without checking if it's nonnegative results in accessing invalid
vector data, and a segfault or silent memory corruption.
Therefore, an explicit check is added to see if the column was actually
found. This serves two purposes:
 - avoiding segfaults/memory corruption
 - making it easier to investigate the root cause of #10026

Closes #10039

(cherry picked from commit 7b364fec9849e9a342af1c240e3a7185bf5401ef)
2022-03-21 10:37:48 +01:00
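
The guard added by this commit follows a common pattern: check the "not found" sentinel before indexing. A minimal sketch, using hypothetical free functions over plain vectors rather than the real selection and column types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical lookup: returns the position of `name`, or -1 when it is absent.
inline std::int32_t index_of(const std::vector<std::string>& columns,
                             const std::string& name) {
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (columns[i] == name) {
            return static_cast<std::int32_t>(i);
        }
    }
    return -1;
}

// Checks the sentinel explicitly instead of using -1 as a vector index.
inline const std::string& get_value(const std::vector<std::string>& columns,
                                    const std::vector<std::string>& values,
                                    const std::string& name) {
    const std::int32_t index = index_of(columns, name);
    if (index == -1) {
        throw std::runtime_error("Column definition " + name +
                                 " does not match any column in the query selection");
    }
    return values.at(static_cast<std::size_t>(index));
}
```

Turning the invalid access into an exception also leaves a message that names the offending column, which helps when chasing the root cause.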
Pavel Emelyanov
97caf12836 Update seastar submodule (IO preemption overlap)
* seastar 47573503...8ef87d48 (3):
  > io_queue: Don't let preemption overlap requests
  > io_queue: Pending needs to keep capacity instead of ticket
  > io_queue: Extend grab_capacity() return codes

Fixes #10233
2022-03-17 11:26:38 +03:00
Yaron Kaikov
839d9ef41a release: prepare for 5.0.rc2 2022-03-16 14:35:52 +02:00
Benny Halevy
782bd50f92 compaction_manager: rewrite_sstables: do not acquire table write lock
Since regular compaction may run in parallel no lock
is required per-table.

We still acquire a read lock in this patch, for backporting
purposes, in case the branch doesn't contain
6737c88045.
But it can be removed entirely in master in a follow-up patch.

This should solve some of the slowness in cleanup compaction (and
likely in upgrade sstables) seen in #10060, and
possibly #10166.

Fixes #10175

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10177

(cherry picked from commit 11ea2ffc3c)
2022-03-14 13:13:48 +02:00
Avi Kivity
0a4d971b4a Merge 'utils: cached_file: Fix alloc-dealloc mismatch during eviction' from Tomasz Grabiec
cached_page::on_evicted() is invoked in the LSA allocator context, set in the
reclaimer callback installed by the cache_tracker. However,
cached_pages are allocated in the standard allocator context (note:
page content is allocated inside LSA via lsa_buffer). The LSA region
will happily deallocate these, thinking that these are large
objects which were delegated to the standard allocator. But the
_non_lsa_memory_in_use metric will underflow. When it underflows
enough, shard_segment_pool.total_memory() will become 0 and memory
reclamation will stop doing anything, leading to apparent OOM.

The fix is to switch to the standard allocator context inside
cached_page::on_evicted(). evict_range() was also given the same
treatment as a precaution, although it is currently only invoked in the
standard allocator context.

The series also adds two safety checks to LSA to catch such problems earlier.

Fixes #10056

\cc @slivne @bhalevy

Closes #10130

* github.com:scylladb/scylla:
  lsa: Abort when trying to free a standard allocator object not allocated through the region
  lsa: Abort when _non_lsa_memory_in_use goes negative
  tests: utils: cached_file: Validate occupancy after eviction
  test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch
  utils: cached_file: Fix alloc-dealloc mismatch during eviction

(cherry picked from commit ff2cd72766)
2022-02-26 11:28:36 +02:00
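
The accounting problem and the fix can be illustrated with a toy model. Everything here is a hypothetical simplification: `accounting_allocator`, `current_allocator`, `allocator_context_switch` and `on_evicted` only track byte counters and do not reflect the real LSA or Seastar allocator interfaces.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accounting-only allocator. The counter is signed so that
// an underflow is observable as a negative value.
struct accounting_allocator {
    long long bytes_in_use = 0;
    void note_alloc(std::size_t n) { bytes_in_use += static_cast<long long>(n); }
    void note_free(std::size_t n) { bytes_in_use -= static_cast<long long>(n); }
};

// The allocator context that is active on the current thread.
inline accounting_allocator*& current_allocator() {
    static thread_local accounting_allocator* current = nullptr;
    return current;
}

// RAII guard that switches the allocator context for the enclosing scope.
class allocator_context_switch {
    accounting_allocator* _prev;
public:
    explicit allocator_context_switch(accounting_allocator& a)
        : _prev(current_allocator()) {
        current_allocator() = &a;
    }
    allocator_context_switch(const allocator_context_switch&) = delete;
    allocator_context_switch& operator=(const allocator_context_switch&) = delete;
    ~allocator_context_switch() { current_allocator() = _prev; }
};

// Eviction callback: frees a page that was allocated in `owner`'s context.
// With apply_fix == false the free is charged to whatever context is active.
inline void on_evicted(std::size_t page_size, accounting_allocator& owner, bool apply_fix) {
    if (apply_fix) {
        allocator_context_switch guard(owner);
        current_allocator()->note_free(page_size);
    } else {
        current_allocator()->note_free(page_size);
    }
}
```

In this model, freeing in the wrong context drives the active allocator's counter negative while the owner's counter never drops, which mirrors the underflow described in the commit message.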
Benny Halevy
22562f767f cql3: result_set: remove std::ref from comparator&
Applying std::ref on `RowComparator& cmp` hits the
following compilation error on Fedora 34 with
libstdc++-devel-11.2.1-9.fc34.x86_64

```
FAILED: build/dev/cql3/statements/select_statement.o
clang++ -MD -MT build/dev/cql3/statements/select_statement.o -MF build/dev/cql3/statements/select_statement.o.d -I/home/bhalevy/dev/scylla/seastar/include -I/home/bhalevy/dev/scylla/build/dev/seastar/gen/include -std=gnu++20 -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Werror=unused-result -fstack-clash-protection -DSEASTAR_API_LEVEL=6 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_TYPE_ERASE_MORE -DFMT_LOCALE -DFMT_SHARED -I/usr/include/p11-kit-1  -DDEVEL -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION -O2 -DSCYLLA_ENABLE_WASMTIME -iquote. -iquote build/dev/gen --std=gnu++20  -ffile-prefix-map=/home/bhalevy/dev/scylla=.  -march=westmere -DBOOST_TEST_DYN_LINK   -Iabseil -fvisibility=hidden  -Wall -Werror -Wno-mismatched-tags -Wno-tautological-compare -Wno-parentheses-equality -Wno-c++11-narrowing -Wno-sometimes-uninitialized -Wno-return-stack-address -Wno-missing-braces -Wno-unused-lambda-capture -Wno-overflow -Wno-noexcept-type -Wno-error=cpp -Wno-ignored-attributes -Wno-overloaded-virtual -Wno-unused-command-line-argument -Wno-defaulted-function-deleted -Wno-redeclared-class-member -Wno-unsupported-friend -Wno-unused-variable -Wno-delete-non-abstract-non-virtual-dtor -Wno-braced-scalar-init -Wno-implicit-int-float-conversion -Wno-delete-abstract-non-virtual-dtor -Wno-uninitialized-const-reference -Wno-psabi -Wno-narrowing -Wno-array-bounds -Wno-nonnull -Wno-error=deprecated-declarations -DXXH_PRIVATE_API -DSEASTAR_TESTING_MAIN -DHAVE_LZ4_COMPRESS_DEFAULT  -c -o build/dev/cql3/statements/select_statement.o cql3/statements/select_statement.cc
In file included from cql3/statements/select_statement.cc:14:
In file included from ./cql3/statements/select_statement.hh:16:
In file included from ./cql3/statements/raw/select_statement.hh:16:
In file included from ./cql3/statements/raw/cf_statement.hh:16:
In file included from ./cql3/cf_name.hh:16:
In file included from ./cql3/keyspace_element_name.hh:16:
In file included from /home/bhalevy/dev/scylla/seastar/include/seastar/core/sstring.hh:25:
In file included from /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/algorithm:74:
In file included from /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/pstl/glue_algorithm_defs.h:13:
In file included from /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/functional:58:
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/refwrap.h:319:40: error: exception specification of 'function<__gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>, void>' uses itself
                = decltype(reference_wrapper::_S_fun(std::declval<_Up>()))>
                                                     ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/refwrap.h:319:40: note: in instantiation of exception specification for 'function<__gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>, void>' requested here
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/refwrap.h:321:2: note: in instantiation of default argument for 'reference_wrapper<__gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>, void>' required here
        reference_wrapper(_Up&& __uref)
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/type_traits:1017:57: note: while substituting deduced template arguments into function template 'reference_wrapper' [with _Up = __gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>, $1 = (no value), $2 = (no value)]
      = __bool_constant<__is_nothrow_constructible(_Tp, _Args...)>;
                                                        ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/type_traits:1023:14: note: in instantiation of template type alias '__is_nothrow_constructible_impl' requested here
    : public __is_nothrow_constructible_impl<_Tp, _Args...>::type
             ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/type_traits:153:14: note: in instantiation of template class 'std::is_nothrow_constructible<__gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>, __gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>>' requested here
    : public conditional<_B1::value, _B2, _B1>::type
             ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/std_function.h:298:11: note: (skipping 8 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
          return __and_<typename _Base::_Local_storage,
                 ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_algo.h:1933:13: note: in instantiation of function template specialization 'std::__partial_sort<utils::chunked_vector<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, 131072>::iterator_type<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>>, __gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>>' requested here
              std::__partial_sort(__first, __last, __last, __comp);
                   ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_algo.h:1954:9: note: in instantiation of function template specialization 'std::__introsort_loop<utils::chunked_vector<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, 131072>::iterator_type<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>>, long, __gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>>' requested here
          std::__introsort_loop(__first, __last,
               ^
/usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_algo.h:4875:12: note: in instantiation of function template specialization 'std::__sort<utils::chunked_vector<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, 131072>::iterator_type<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>>, __gnu_cxx::__ops::_Iter_comp_iter<std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>>' requested here
      std::__sort(__first, __last, __gnu_cxx::__ops::__iter_comp_iter(__comp));
           ^
./cql3/result_set.hh:168:14: note: in instantiation of function template specialization 'std::sort<utils::chunked_vector<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, 131072>::iterator_type<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>>, std::reference_wrapper<const std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>>' requested here
        std::sort(_rows.begin(), _rows.end(), std::ref(cmp));
             ^
cql3/statements/select_statement.cc:773:21: note: in instantiation of function template specialization 'cql3::result_set::sort<std::function<bool (const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &)>>' requested here
                rs->sort(_ordering_comparator);
                    ^
1 error generated.
ninja: build stopped: subcommand failed.
```

Fixes #10079.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220215071955.316895-3-bhalevy@scylladb.com>
(cherry picked from commit 3e20fee070)

[avi: backport for developer quality-of-life rather than as a bug fix]
2022-02-16 10:07:11 +02:00
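
The shape of the change, reduced to a self-contained sketch: the comparator is handed to `std::sort` directly instead of being wrapped in `std::ref`. The `row` alias and `sort_rows` are hypothetical; the real code sorts a `chunked_vector` of rows inside `cql3::result_set`.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using row = std::vector<int>; // hypothetical row type for the sketch

template <typename RowComparator>
void sort_rows(std::vector<row>& rows, const RowComparator& cmp) {
    // std::sort takes the comparator by value, so passing `cmp` directly
    // copies it once instead of wrapping it in std::reference_wrapper.
    std::sort(rows.begin(), rows.end(), cmp);
}
```

The trade-off is a copy of the comparator per call, which for a `std::function` may be more expensive than copying a reference wrapper.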
Raphael S. Carvalho
eb80dd1db5 Revert "sstables/compaction_manager: rewrite_sstables(): resolve maintenance group FIXME"
This reverts commit 4c05e5f966.

Moving cleanup to the maintenance group made its operation time up to
10x slower than in the previous release. It's a blocker for the 4.6 release,
so let's revert it until we figure this all out.

Probably this happens because maintenance group is fixed at a
relatively small constant, and cleanup may be incrementally
generating backlog for regular compaction, where the former is
fighting for resources against the latter.

Fixes #10060.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220213184306.91585-1-raphaelsc@scylladb.com>
(cherry picked from commit a9427f150a)
2022-02-14 18:05:43 +02:00
Avi Kivity
51d699ee21 Update seastar submodule (overzealous log silencer)
* seastar 0d250d15ac...47573503cd (1):
  > log: Fix silencer to be shard-local and logger-global
Fixes #9784.
2022-02-14 17:54:54 +02:00
Avi Kivity
83a33bff8c Point seastar submodule at scylla-seastar.git
This allows us to backport Seastar fixes to this branch.
2022-02-14 17:54:16 +02:00
Nadav Har'El
273563b9ad alternator: allow REMOVE of non-existent nested attribute
DynamoDB allows an UpdateItem operation "REMOVE x.y" when a map x
exists in the item, but x.y doesn't - the removal silently does
nothing. Alternator incorrectly generated an error in this case,
and unfortunately we didn't have a test for this case.

So in this patch we add the missing test (which fails on Alternator
before this patch - and passes on DynamoDB) and then fix the behavior.
After this patch, "REMOVE x.y" will remain an error if "x" doesn't
exist (saying "document paths not valid for this item"), but if "x"
exists and is a map, but "x.y" doesn't, the removal will silently
do nothing and will not be an error.

Fixes #10043.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220207133652.181994-1-nyh@scylladb.com>
(cherry picked from commit 9982a28007)
2022-02-08 11:37:31 +02:00
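
The semantics described above can be sketched with nested maps. This is an illustration only: `item`, `attribute_map` and `remove_nested` are hypothetical, and the real implementation operates on JSON values rather than `std::map`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using attribute_map = std::map<std::string, std::string>;
using item = std::map<std::string, attribute_map>; // top-level attributes that are maps

// "REMOVE x.y": an error if x is missing, a silent no-op if only x.y is missing.
inline void remove_nested(item& it, const std::string& x, const std::string& y) {
    auto top = it.find(x);
    if (top == it.end()) {
        throw std::invalid_argument("document paths not valid for this item: " + x + "." + y);
    }
    top->second.erase(y); // erasing an absent key does nothing
}
```

Only the missing top-level attribute is treated as an error; a missing nested key leaves the item untouched.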
Yaron Kaikov
891990ec09 release: prepare for 5.0.rc1 2022-02-06 16:41:05 +02:00
Yaron Kaikov
da0cd2b107 release: prepare for 5.0.rc0 2022-02-03 08:10:30 +02:00
25 changed files with 286 additions and 138 deletions

.gitmodules vendored
@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
-url = ../seastar
+url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

@@ -60,7 +60,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
-VERSION=5.0.dev
+VERSION=5.0.rc3
if test -f version
then

@@ -2577,8 +2577,8 @@ static bool hierarchy_actions(
// attr member so we can use add()
rjson::add_with_string_name(v, attr, std::move(*newv));
} else {
-throw api_error::validation(format("Can't remove document path {} - not present in item",
-subh.get_value()._path));
+// Removing a.b when a is a map but a.b doesn't exist
+// is silently ignored. It's not considered an error.
}
} else {
throw api_error::validation(format("UpdateExpression: document paths not valid for this item:{}", h));

@@ -87,19 +87,24 @@ compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
// prefer expiring cells.
return left.is_live_and_has_ttl() ? std::strong_ordering::greater : std::strong_ordering::less;
}
-if (left.is_live_and_has_ttl() && left.expiry() != right.expiry()) {
-return left.expiry() <=> right.expiry();
+if (left.is_live_and_has_ttl()) {
+if (left.expiry() != right.expiry()) {
+return left.expiry() <=> right.expiry();
+} else {
+// prefer the cell that was written later,
+// so it survives longer after it expires, until purged.
+return right.ttl() <=> left.ttl();
+}
}
} else {
// Both are deleted
-if (left.deletion_time() != right.deletion_time()) {
-// Origin compares big-endian serialized deletion time. That's because it
-// delegates to AbstractCell.reconcile() which compares values after
-// comparing timestamps, which in case of deleted cells will hold
-// serialized expiry.
-return (uint64_t) left.deletion_time().time_since_epoch().count()
-<=> (uint64_t) right.deletion_time().time_since_epoch().count();
-}
+// Origin compares big-endian serialized deletion time. That's because it
+// delegates to AbstractCell.reconcile() which compares values after
+// comparing timestamps, which in case of deleted cells will hold
+// serialized expiry.
+return (uint64_t) left.deletion_time().time_since_epoch().count()
+<=> (uint64_t) right.deletion_time().time_since_epoch().count();
}
return std::strong_ordering::equal;
}

@@ -353,32 +353,50 @@ future<> compaction_manager::run_custom_job(replica::table* t, sstables::compact
return task->compaction_done.get_future().then([task] {});
}
+compaction_manager::compaction_reenabler::compaction_reenabler(compaction_manager& cm, replica::table* t)
+: _cm(cm)
+, _table(t)
+, _compaction_state(cm.get_compaction_state(_table))
+, _holder(_compaction_state.gate.hold())
+{
+_compaction_state.compaction_disabled_counter++;
+cmlog.debug("Temporarily disabled compaction for {}.{}. compaction_disabled_counter={}",
+_table->schema()->ks_name(), _table->schema()->cf_name(), _compaction_state.compaction_disabled_counter);
+}
+compaction_manager::compaction_reenabler::compaction_reenabler(compaction_reenabler&& o) noexcept
+: _cm(o._cm)
+, _table(std::exchange(o._table, nullptr))
+, _compaction_state(o._compaction_state)
+, _holder(std::move(o._holder))
+{}
+compaction_manager::compaction_reenabler::~compaction_reenabler() {
+// submit compaction request if we're the last holder of the gate which is still opened.
+if (_table && --_compaction_state.compaction_disabled_counter == 0 && !_compaction_state.gate.is_closed()) {
+cmlog.debug("Reenabling compaction for {}.{}",
+_table->schema()->ks_name(), _table->schema()->cf_name());
+try {
+_cm.submit(_table);
+} catch (...) {
+cmlog.warn("compaction_reenabler could not reenable compaction for {}.{}: {}",
+_table->schema()->ks_name(), _table->schema()->cf_name(), std::current_exception());
+}
+}
+}
+future<compaction_manager::compaction_reenabler>
+compaction_manager::stop_and_disable_compaction(replica::table* t) {
+compaction_reenabler cre(*this, t);
+co_await stop_ongoing_compactions("user-triggered operation", t);
+co_return cre;
+}
future<>
compaction_manager::run_with_compaction_disabled(replica::table* t, std::function<future<> ()> func) {
-auto& c_state = _compaction_state[t];
-auto holder = c_state.gate.hold();
+compaction_reenabler cre = co_await stop_and_disable_compaction(t);
-c_state.compaction_disabled_counter++;
-std::exception_ptr err;
-try {
-co_await stop_ongoing_compactions("user-triggered operation", t);
-co_await func();
-} catch (...) {
-err = std::current_exception();
-}
-#ifdef DEBUG
-assert(_compaction_state.contains(t));
-#endif
-// submit compaction request if we're the last holder of the gate which is still opened.
-if (--c_state.compaction_disabled_counter == 0 && !c_state.gate.is_closed()) {
-submit(t);
-}
-if (err) {
-std::rethrow_exception(err);
-}
-co_return;
+co_await func();
}
void compaction_manager::task::setup_new_compaction() {
@@ -810,7 +828,8 @@ future<> compaction_manager::rewrite_sstables(replica::table* t, sstables::compa
auto sstable_level = sst->get_sstable_level();
auto run_identifier = sst->run_identifier();
auto sstable_set_snapshot = can_purge ? std::make_optional(t.get_sstable_set()) : std::nullopt;
-auto descriptor = sstables::compaction_descriptor({ sst }, std::move(sstable_set_snapshot), _maintenance_sg.io,
+// FIXME: this compaction should run with maintenance priority.
+auto descriptor = sstables::compaction_descriptor({ sst }, std::move(sstable_set_snapshot), service::get_local_compaction_priority(),
sstable_level, sstables::compaction_descriptor::default_max_sstable_bytes, run_identifier, options);
// Releases reference to cleaned sstable such that respective used disk space can be freed.
@@ -819,8 +838,9 @@ future<> compaction_manager::rewrite_sstables(replica::table* t, sstables::compa
};
auto maintenance_permit = co_await seastar::get_units(_maintenance_ops_sem, 1);
-// Take write lock for table to serialize cleanup/upgrade sstables/scrub with major compaction/reshape/reshard.
-auto write_lock_holder = co_await _compaction_state[&t].lock.hold_write_lock();
+// FIXME: acquiring the read lock is not needed after acquiring the _maintenance_ops_sem
+// only major compaction needs to acquire the write lock to synchronize with regular compaction.
+auto lock_holder = co_await _compaction_state[&t].lock.hold_read_lock();
_stats.pending_tasks--;
_stats.active_tasks++;
@@ -852,7 +872,7 @@ future<> compaction_manager::rewrite_sstables(replica::table* t, sstables::compa
};
compaction_backlog_tracker user_initiated(std::make_unique<user_initiated_backlog_tracker>(_compaction_controller.backlog_of_shares(200), _available_memory));
-completed = co_await with_scheduling_group(_maintenance_sg.cpu, std::ref(perform_rewrite));
+completed = co_await with_scheduling_group(_compaction_controller.sg(), std::ref(perform_rewrite));
} while (!completed);
};

@@ -269,6 +269,31 @@ public:
// parameter job is a function that will carry the operation
future<> run_custom_job(replica::table* t, sstables::compaction_type type, noncopyable_function<future<>(sstables::compaction_data&)> job);
+class compaction_reenabler {
+compaction_manager& _cm;
+replica::table* _table;
+compaction_state& _compaction_state;
+gate::holder _holder;
+public:
+compaction_reenabler(compaction_manager&, replica::table*);
+compaction_reenabler(compaction_reenabler&&) noexcept;
+~compaction_reenabler();
+replica::table* compacting_table() const noexcept {
+return _table;
+}
+const compaction_state& compaction_state() const noexcept {
+return _compaction_state;
+}
+};
+// Disable compaction temporarily for a table t.
+// Caller should call the compaction_reenabler::reenable
+future<compaction_reenabler> stop_and_disable_compaction(replica::table* t);
// Run a function with compaction temporarily disabled for a table T.
future<> run_with_compaction_disabled(replica::table* t, std::function<future<> ()> func);

@@ -103,7 +103,13 @@ managed_bytes_opt get_value(const column_value& col, const column_value_eval_bag
if (!col_type->is_map()) {
throw exceptions::invalid_request_exception(format("subscripting non-map column {}", cdef->name_as_text()));
}
-const auto deserialized = cdef->type->deserialize(managed_bytes_view(*data.other_columns[data.sel.index_of(*cdef)]));
+int32_t index = data.sel.index_of(*cdef);
+if (index == -1) {
+throw std::runtime_error(
+format("Column definition {} does not match any column in the query selection",
+cdef->name_as_text()));
+}
+const auto deserialized = cdef->type->deserialize(managed_bytes_view(*data.other_columns[index]));
const auto& data_map = value_cast<map_type_impl::native_type>(deserialized);
const auto key = evaluate(*col.sub, options);
auto&& key_type = col_type->name_comparator();
@@ -121,8 +127,16 @@ managed_bytes_opt get_value(const column_value& col, const column_value_eval_bag
case column_kind::clustering_key:
return managed_bytes(data.clustering_key[cdef->id]);
case column_kind::static_column:
-case column_kind::regular_column:
-return managed_bytes_opt(data.other_columns[data.sel.index_of(*cdef)]);
+[[fallthrough]];
+case column_kind::regular_column: {
+int32_t index = data.sel.index_of(*cdef);
+if (index == -1) {
+throw std::runtime_error(
+format("Column definition {} does not match any column in the query selection",
+cdef->name_as_text()));
+}
+return managed_bytes_opt(data.other_columns[index]);
+}
default:
throw exceptions::unsupported_operation_exception("Unknown column kind");
}

@@ -514,7 +514,7 @@ statement_restrictions::statement_restrictions(data_dictionary::database db,
}
if (!_nonprimary_key_restrictions->empty()) {
if (_has_queriable_regular_index) {
if (_has_queriable_regular_index && _partition_range_is_simple) {
_uses_secondary_indexing = true;
} else if (!allow_filtering) {
throw exceptions::invalid_request_exception("Cannot execute this query as it might involve data filtering and "

@@ -165,7 +165,7 @@ public:
template<typename RowComparator>
void sort(const RowComparator& cmp) {
std::sort(_rows.begin(), _rows.end(), std::ref(cmp));
std::sort(_rows.begin(), _rows.end(), cmp);
}
metadata& get_metadata();

@@ -6,12 +6,16 @@ is_nonroot() {
[ -f "$scylladir"/SCYLLA-NONROOT-FILE ]
}
is_container() {
[ -f "$scylladir"/SCYLLA-CONTAINER-FILE ]
}
is_privileged() {
[ ${EUID:-${UID}} = 0 ]
}
execsudo() {
if is_nonroot; then
if is_nonroot || is_container; then
exec "$@"
else
exec sudo -u scylla -g scylla "$@"

@@ -82,7 +82,7 @@ run bash -ec "echo 'debconf debconf/frontend select Noninteractive' | debconf-se
run bash -ec "rm -rf /etc/rsyslog.conf"
run apt-get -y install hostname supervisor openssh-server openssh-client openjdk-11-jre-headless python python-yaml curl rsyslog locales sudo
run locale-gen en_US.UTF-8
run update-locale LANG=en_US.UTF-8 LANGUAGE=en_US:en LC_ALL=en_US.UTF_8
run update-locale LANG=en_US.UTF-8 LANGUAGE=en_US:en LC_ALL=en_US.UTF-8
run bash -ec "dpkg -i packages/*.deb"
run apt-get -y clean all
run bash -ec "cat /scylla_bashrc >> /etc/bash.bashrc"
@@ -91,6 +91,7 @@ run mkdir -p /var/log/scylla
run chown -R scylla:scylla /var/lib/scylla
run mkdir -p /opt/scylladb/supervisor
run touch /opt/scylladb/SCYLLA-CONTAINER-FILE
bcp dist/common/supervisor/scylla-server.sh /opt/scylladb/supervisor/scylla-server.sh
bcp dist/common/supervisor/scylla-jmx.sh /opt/scylladb/supervisor/scylla-jmx.sh
bcp dist/common/supervisor/scylla-node-exporter.sh /opt/scylladb/supervisor/scylla-node-exporter.sh

@@ -1,4 +1,4 @@
[program:scylla-server]
[program:scylla]
command=/opt/scylladb/supervisor/scylla-server.sh
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0

@@ -508,8 +508,13 @@ relocate_python3 "$rprefix"/scripts fix_system_distributed_tables.py
if $supervisor; then
install -d -m755 `supervisor_dir $retc`
for service in scylla-server scylla-jmx scylla-node-exporter; do
if [ "$service" = "scylla-server" ]; then
program="scylla"
else
program=$service
fi
cat << EOS > `supervisor_conf $retc $service`
[program:$service]
[program:$program]
directory=$rprefix
command=/bin/bash -c './supervisor/$service.sh'
EOS

main.cc

@@ -367,11 +367,38 @@ static auto defer_verbose_shutdown(const char* what, Func&& func) {
startlog.info("Shutting down {}", what);
try {
func();
startlog.info("Shutting down {} was successful", what);
} catch (...) {
startlog.error("Unexpected error shutting down {}: {}", what, std::current_exception());
throw;
auto ex = std::current_exception();
bool do_abort = true;
try {
std::rethrow_exception(ex);
} catch (const std::system_error& e) {
// System error codes we consider "environmental",
// i.e. not scylla's fault, therefore there is no point in
// aborting and dumping core.
for (int i : {EIO, EACCES, ENOSPC}) {
if (e.code() == std::error_code(i, std::system_category())) {
do_abort = false;
break;
}
}
} catch (...) {
}
auto msg = fmt::format("Unexpected error shutting down {}: {}", what, ex);
if (do_abort) {
startlog.error("{}: aborting", msg);
abort();
} else {
startlog.error("{}: exiting, at {}", msg, current_backtrace());
// Call _exit() rather than exit() to exit immediately
// without calling exit handlers, avoiding
// boost::intrusive::detail::destructor_impl assert failure
// from ~segment_pool exit handler.
_exit(255);
}
}
startlog.info("Shutting down {} was successful", what);
};
auto ret = deferred_action(std::move(vfunc));
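
Reviewer note: the classification step in the hunk above (abort unless the failure is a std::system_error carrying EIO, EACCES or ENOSPC) can be isolated as a small predicate. This is a sketch only. is_environmental_error is a hypothetical helper name that does not appear in main.cc, and the logging, abort() and _exit() calls are left out.

```cpp
#include <cassert>
#include <cerrno>
#include <exception>
#include <initializer_list>
#include <system_error>

// Hypothetical helper: returns true when ex holds a std::system_error whose
// code matches one of the errno values the hunk treats as environmental.
// Precondition: ex is not a null exception_ptr.
inline bool is_environmental_error(std::exception_ptr ex) {
    try {
        std::rethrow_exception(ex);
    } catch (const std::system_error& e) {
        for (int code : {EIO, EACCES, ENOSPC}) {
            if (e.code() == std::error_code(code, std::system_category())) {
                return true;
            }
        }
    } catch (...) {
        // Any other exception type is not considered environmental.
    }
    return false;
}
```

A caller would pass std::current_exception() from inside a catch block and choose between aborting and exiting based on the result.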

@@ -96,7 +96,7 @@ void range_tombstone_list::insert_from(const schema& s,
if (cmp(end, it->position()) < 0) {
// not overlapping
if (it->tombstone().tomb == tomb && cmp(end, it->position()) == 0) {
rev.update(it, {std::move(start), std::move(start), tomb});
rev.update(it, {std::move(start), std::move(end), tomb});
} else {
auto rt = construct_range_tombstone_entry(std::move(start), std::move(end), tomb);
rev.insert(it, *rt);
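
Reviewer note: the one-word fix above replaces a second std::move(start) with std::move(end). The sketch below shows why moving the same object twice is a problem, using std::unique_ptr because its moved-from state is guaranteed to be null. toy_range and both helpers are hypothetical, and the real bound type in range_tombstone_list may leave a different moved-from state.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a range with two owning bounds.
struct toy_range {
    std::unique_ptr<int> start;
    std::unique_ptr<int> end;
};

// Mirrors the typo: `start` is moved twice, so the second member is built
// from an already moved-from object and `end` is never used.
inline toy_range make_range_buggy(std::unique_ptr<int> start,
                                  [[maybe_unused]] std::unique_ptr<int> end) {
    return toy_range{std::move(start), std::move(start)};
}

// Mirrors the fix: each bound is moved exactly once.
inline toy_range make_range_fixed(std::unique_ptr<int> start,
                                  std::unique_ptr<int> end) {
    return toy_range{std::move(start), std::move(end)};
}
```

With these helpers the buggy variant yields a range whose end is null, while the fixed variant keeps both bounds.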

@@ -2062,80 +2062,77 @@ future<> database::truncate(sstring ksname, sstring cfname, timestamp_func tsf)
future<> database::truncate(const keyspace& ks, column_family& cf, timestamp_func tsf, bool with_snapshot) {
dblog.debug("Truncating {}.{}", cf.schema()->ks_name(), cf.schema()->cf_name());
return with_gate(cf.async_gate(), [this, &ks, &cf, tsf = std::move(tsf), with_snapshot] () mutable -> future<> {
const auto auto_snapshot = with_snapshot && get_config().auto_snapshot();
const auto should_flush = auto_snapshot;
auto holder = cf.async_gate().hold();
// Force mutations coming in to re-acquire higher rp:s
// This creates a "soft" ordering, in that we will guarantee that
// any sstable written _after_ we issue the flush below will
// only have higher rp:s than we will get from the discard_sstable
// call.
auto low_mark = cf.set_low_replay_position_mark();
const auto auto_snapshot = with_snapshot && get_config().auto_snapshot();
const auto should_flush = auto_snapshot;
const auto uuid = cf.schema()->id();
// Force mutations coming in to re-acquire higher rp:s
// This creates a "soft" ordering, in that we will guarantee that
// any sstable written _after_ we issue the flush below will
// only have higher rp:s than we will get from the discard_sstable
// call.
auto low_mark = cf.set_low_replay_position_mark();
return _compaction_manager->run_with_compaction_disabled(&cf, [this, &cf, should_flush, auto_snapshot, tsf = std::move(tsf), low_mark]() mutable {
future<> f = make_ready_future<>();
bool did_flush = false;
if (should_flush && cf.can_flush()) {
// TODO:
// this is not really a guarantee at all that we've actually
// gotten all things to disk. Again, need queue-ish or something.
f = cf.flush();
did_flush = true;
} else {
f = cf.clear();
}
return f.then([this, &cf, auto_snapshot, tsf = std::move(tsf), low_mark, should_flush, did_flush] {
dblog.debug("Discarding sstable data for truncated CF + indexes");
// TODO: notify truncation
const auto uuid = cf.schema()->id();
return tsf().then([this, &cf, auto_snapshot, low_mark, should_flush, did_flush](db_clock::time_point truncated_at) {
future<> f = make_ready_future<>();
if (auto_snapshot) {
auto name = format("{:d}-{}", truncated_at.time_since_epoch().count(), cf.schema()->cf_name());
f = cf.snapshot(*this, name);
}
return f.then([this, &cf, truncated_at, low_mark, should_flush, did_flush] {
return cf.discard_sstables(truncated_at).then([this, &cf, truncated_at, low_mark, should_flush, did_flush](db::replay_position rp) {
// TODO: indexes.
// Note: since discard_sstables was changed to only count tables owned by this shard,
// we can get zero rp back. Changed assert, and ensure we save at least low_mark.
// #6995 - the assert below was broken in c2c6c71 and remained so for many years.
// We nowadays do not flush tables with sstables but autosnapshot=false. This means
// the low_mark assertion does not hold, because we maybe/probably never got around to
// creating the sstables that would create them.
assert(!did_flush || low_mark <= rp || rp == db::replay_position());
rp = std::max(low_mark, rp);
return truncate_views(cf, truncated_at, should_flush).then([&cf, truncated_at, rp] {
// save_truncation_record() may actually fail after we cached the truncation time,
// but this is no worse than failing without caching: at least the correct time
// will be available until the next reboot, and a client will have to retry truncation anyway.
cf.cache_truncation_record(truncated_at);
return db::system_keyspace::save_truncation_record(cf, truncated_at, rp);
});
});
});
});
});
}).then([this, uuid] {
drop_repair_history_map_for_table(uuid);
});
});
}
std::vector<compaction_manager::compaction_reenabler> cres;
cres.reserve(1 + cf.views().size());
future<> database::truncate_views(const column_family& base, db_clock::time_point truncated_at, bool should_flush) {
return parallel_for_each(base.views(), [this, truncated_at, should_flush] (view_ptr v) {
cres.emplace_back(co_await _compaction_manager->stop_and_disable_compaction(&cf));
co_await parallel_for_each(cf.views(), [&, this] (view_ptr v) -> future<> {
auto& vcf = find_column_family(v);
return _compaction_manager->run_with_compaction_disabled(&vcf, [&vcf, truncated_at, should_flush] {
return (should_flush ? vcf.flush() : vcf.clear()).then([&vcf, truncated_at, should_flush] {
return vcf.discard_sstables(truncated_at).then([&vcf, truncated_at, should_flush](db::replay_position rp) {
return db::system_keyspace::save_truncation_record(vcf, truncated_at, rp);
});
});
});
cres.emplace_back(co_await _compaction_manager->stop_and_disable_compaction(&vcf));
});
bool did_flush = false;
if (should_flush && cf.can_flush()) {
// TODO:
// this is not really a guarantee at all that we've actually
// gotten all things to disk. Again, need queue-ish or something.
co_await cf.flush();
did_flush = true;
} else {
co_await cf.clear();
}
dblog.debug("Discarding sstable data for truncated CF + indexes");
// TODO: notify truncation
db_clock::time_point truncated_at = co_await tsf();
if (auto_snapshot) {
auto name = format("{:d}-{}", truncated_at.time_since_epoch().count(), cf.schema()->cf_name());
co_await cf.snapshot(*this, name);
}
db::replay_position rp = co_await cf.discard_sstables(truncated_at);
// TODO: indexes.
// Note: since discard_sstables was changed to only count tables owned by this shard,
// we can get zero rp back. Changed assert, and ensure we save at least low_mark.
// #6995 - the assert below was broken in c2c6c71 and remained so for many years.
// We nowadays do not flush tables with sstables but autosnapshot=false. This means
// the low_mark assertion does not hold, because we maybe/probably never got around to
// creating the sstables that would create them.
assert(!did_flush || low_mark <= rp || rp == db::replay_position());
rp = std::max(low_mark, rp);
co_await parallel_for_each(cf.views(), [this, truncated_at, should_flush] (view_ptr v) -> future<> {
auto& vcf = find_column_family(v);
if (should_flush) {
co_await vcf.flush();
} else {
co_await vcf.clear();
}
db::replay_position rp = co_await vcf.discard_sstables(truncated_at);
co_await db::system_keyspace::save_truncation_record(vcf, truncated_at, rp);
});
// save_truncation_record() may actually fail after we cached the truncation time,
// but this is no worse than failing without caching: at least the correct time
// will be available until the next reboot, and a client will have to retry truncation anyway.
cf.cache_truncation_record(truncated_at);
co_await db::system_keyspace::save_truncation_record(cf, truncated_at, rp);
drop_repair_history_map_for_table(uuid);
}
const sstring& database::get_snitch_name() const {

@@ -1568,7 +1568,6 @@ public:
/** Truncates the given column family */
future<> truncate(sstring ksname, sstring cfname, timestamp_func);
future<> truncate(const keyspace& ks, column_family& cf, timestamp_func, bool with_snapshot = true);
future<> truncate_views(const column_family& base, db_clock::time_point truncated_at, bool should_flush);
bool update_column_family(schema_ptr s);
future<> drop_column_family(const sstring& ks_name, const sstring& cf_name, timestamp_func, bool with_snapshot = true);

Submodule seastar updated: 0d250d15ac...4a30c44c4c

@@ -1030,6 +1030,20 @@ def test_nested_attribute_remove_from_missing_item(test_table_s):
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE x.y')
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE x[0]')
# Though in an above test (test_nested_attribute_update_bad_path_dot) we
# showed that DynamoDB does not allow REMOVE x.y if attribute x doesn't
# exist - and generates a ValidationException, if x *does* exist but y
# doesn't, it's fine and the removal should just be silently ignored.
def test_nested_attribute_remove_missing_leaf(test_table_s):
p = random_string()
item = {'p': p, 'a': {'x': 3}, 'b': ['hi']}
test_table_s.put_item(Item=item)
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE a.y')
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE b[7]')
test_table_s.update_item(Key={'p': p}, UpdateExpression='REMOVE c')
# The above UpdateItem calls didn't change anything...
assert test_table_s.get_item(Key={'p': p}, ConsistentRead=True)['Item'] == item
# Similarly for other types of bad paths - using [0] on something which
# doesn't exist or isn't an array.
def test_nested_attribute_update_bad_path_array(test_table_s):

@@ -207,7 +207,9 @@ SEASTAR_THREAD_TEST_CASE(test_eviction_via_lru) {
}
{
cf_lru.evict_all();
with_allocator(region.allocator(), [] {
cf_lru.evict_all();
});
BOOST_REQUIRE_EQUAL(0, metrics.cached_bytes); // change here
BOOST_REQUIRE_EQUAL(0, cf.cached_bytes()); // change here
@@ -215,6 +217,8 @@ SEASTAR_THREAD_TEST_CASE(test_eviction_via_lru) {
BOOST_REQUIRE_EQUAL(3, metrics.page_evictions); // change here
BOOST_REQUIRE_EQUAL(0, metrics.page_hits);
BOOST_REQUIRE_EQUAL(3, metrics.page_populations);
BOOST_REQUIRE_EQUAL(region.occupancy().used_space(), 0);
}
{

@@ -690,6 +690,7 @@ SEASTAR_TEST_CASE(test_cell_ordering) {
};
auto assert_equal = [] (atomic_cell_view c1, atomic_cell_view c2) {
testlog.trace("Expected {} == {}", c1, c2);
BOOST_REQUIRE(compare_atomic_cell_for_merge(c1, c2) == 0);
BOOST_REQUIRE(compare_atomic_cell_for_merge(c2, c1) == 0);
};
@@ -711,9 +712,11 @@ SEASTAR_TEST_CASE(test_cell_ordering) {
atomic_cell::make_live(*bytes_type, 1, bytes(), expiry_2, ttl_2));
// Origin doesn't compare ttl (is it wise?)
assert_equal(
atomic_cell::make_live(*bytes_type, 1, bytes("value"), expiry_1, ttl_1),
atomic_cell::make_live(*bytes_type, 1, bytes("value"), expiry_1, ttl_2));
// But we do. See https://github.com/scylladb/scylla/issues/10156
// and https://github.com/scylladb/scylla/issues/10173
assert_order(
atomic_cell::make_live(*bytes_type, 1, bytes("value"), expiry_1, ttl_2),
atomic_cell::make_live(*bytes_type, 1, bytes("value"), expiry_1, ttl_1));
assert_order(
atomic_cell::make_live(*bytes_type, 0, bytes("value1")),

@@ -24,11 +24,13 @@ static void add_entry(logalloc::region& r,
{
logalloc::allocating_section as;
as(r, [&] {
sstables::key sst_key = sstables::key::from_partition_key(s, key);
page._entries.push_back(make_managed<index_entry>(
managed_bytes(sst_key.get_bytes()),
position,
managed_ref<promoted_index>()));
with_allocator(r.allocator(), [&] {
sstables::key sst_key = sstables::key::from_partition_key(s, key);
page._entries.push_back(make_managed<index_entry>(
managed_bytes(sst_key.get_bytes()),
position,
managed_ref<promoted_index>()));
});
});
}

@@ -115,3 +115,16 @@ def test_operator_ne_not_supported(cql, table1):
cql.execute(f'SELECT a FROM {table1} WHERE a != 0')
with pytest.raises(InvalidRequest, match='Unsupported.*!='):
cql.execute(f'SELECT a FROM {table1} WHERE token(a) != 0')
# Test that the fact that a column is indexed does not cause us to fetch
# incorrect results from a filtering query (issue #10300).
def test_index_with_in_relation(scylla_only, cql, test_keyspace):
schema = 'p int, c int, v boolean, primary key (p,c)'
with new_test_table(cql, test_keyspace, schema) as table:
cql.execute(f"create index on {table}(v)")
for p, c, v in [(0,0,True),(0,1,False),(0,2,True),(0,3,False),
(1,0,True),(1,1,False),(1,2,True),(1,3,False),
(2,0,True),(2,1,False),(2,2,True),(2,3,False)]:
cql.execute(f"insert into {table} (p,c,v) values ({p}, {c}, {v})")
res = cql.execute(f"select * from {table} where p in (0,1) and v = False ALLOW FILTERING")
assert set(res) == set([(0,1,False),(0,3,False),(1,1,False), (1,3,False)])

@@ -326,6 +326,7 @@ public:
}
size_t evict_range(cache_type::iterator start, cache_type::iterator end) noexcept {
return with_allocator(standard_allocator(), [&] {
size_t count = 0;
auto disposer = [] (auto* p) noexcept {};
while (start != end) {
@@ -338,6 +339,7 @@ public:
}
}
return count;
});
}
public:
/// \brief Constructs a cached_file.
@@ -464,8 +466,10 @@ public:
inline
void cached_file::cached_page::on_evicted() noexcept {
parent->on_evicted(*this);
cached_file::cache_type::iterator it(this);
it.erase(page_idx_less_comparator());
with_allocator(standard_allocator(), [this] {
cached_file::cache_type::iterator it(this);
it.erase(page_idx_less_comparator());
});
}
class cached_file_impl : public file_impl {

@@ -584,6 +584,10 @@ static constexpr auto max_used_space_ratio_for_compaction = 0.85;
static constexpr size_t max_used_space_for_compaction = segment_size * max_used_space_ratio_for_compaction;
static constexpr size_t min_free_space_for_compaction = segment_size - max_used_space_for_compaction;
struct [[gnu::packed]] non_lsa_object_cookie {
uint64_t value = 0xbadcaffe;
};
static_assert(min_free_space_for_compaction >= max_managed_object_size,
"Segments which cannot fit max_managed_object_size must not be considered compactible for the sake of forward progress of compaction");
@@ -827,9 +831,13 @@ public:
void clear_allocation_failure_flag() { _allocation_failure_flag = false; }
bool allocation_failure_flag() { return _allocation_failure_flag; }
void refill_emergency_reserve();
void update_non_lsa_memory_in_use(ssize_t n) {
void add_non_lsa_memory_in_use(size_t n) {
_non_lsa_memory_in_use += n;
}
void subtract_non_lsa_memory_in_use(size_t n) {
assert(_non_lsa_memory_in_use >= n);
_non_lsa_memory_in_use -= n;
}
size_t non_lsa_memory_in_use() const {
return _non_lsa_memory_in_use;
}
@@ -1630,17 +1638,18 @@ public:
memory::on_alloc_point();
shard_segment_pool.on_memory_allocation(size);
if (size > max_managed_object_size) {
auto ptr = standard_allocator().alloc(migrator, size, alignment);
auto ptr = standard_allocator().alloc(migrator, size + sizeof(non_lsa_object_cookie), alignment);
// This isn't very accurate, the correct free_space value would be
// malloc_usable_size(ptr) - size, but there is no way to get
// the exact object size at free.
auto allocated_size = malloc_usable_size(ptr);
new ((char*)ptr + allocated_size - sizeof(non_lsa_object_cookie)) non_lsa_object_cookie();
_non_lsa_occupancy += occupancy_stats(0, allocated_size);
if (_group) {
_evictable_space += allocated_size;
_group->increase_usage(_heap_handle, allocated_size);
}
shard_segment_pool.update_non_lsa_memory_in_use(allocated_size);
shard_segment_pool.add_non_lsa_memory_in_use(allocated_size);
return ptr;
} else {
auto ptr = alloc_small(object_descriptor(migrator), (segment::size_type) size, alignment);
@@ -1652,12 +1661,14 @@ public:
private:
void on_non_lsa_free(void* obj) noexcept {
auto allocated_size = malloc_usable_size(obj);
auto cookie = (non_lsa_object_cookie*)((char*)obj + allocated_size) - 1;
assert(cookie->value == non_lsa_object_cookie().value);
_non_lsa_occupancy -= occupancy_stats(0, allocated_size);
if (_group) {
_evictable_space -= allocated_size;
_group->decrease_usage(_heap_handle, allocated_size);
}
shard_segment_pool.update_non_lsa_memory_in_use(-allocated_size);
shard_segment_pool.subtract_non_lsa_memory_in_use(allocated_size);
}
public:
virtual void free(void* obj) noexcept override {