Commit Graph

28815 Commits

Author SHA1 Message Date
Takuya ASADA
201a97e4a4 dist/docker: fix bashrc filename for Ubuntu
For Debian variants, correct filename is /etc/bash.bashrc.

Fixes #9588

Closes #9589
2021-11-07 17:01:13 +02:00
Avi Kivity
7a3930f7cf Merge 'More nodetool-replacing virtual tables' from Botond Dénes
This PR introduces 4 new virtual tables aimed at replacing nodetool commands, working towards the long-term goal of replacing nodetool completely at least for cluster information retrieval purposes.
As you may have noticed, most of these replacement are not exact matches. This is on purpose. I feel that the nodetool commands are somewhat chaotic: they might have had a clear plan on what command prints what but after years of organic development they are a mess of fields that feel like don't belong. In addition to this, they are centered on C* terminology which often sounds strange or doesn't make any sense for scylla (off-heap memory, counter cache, etc.).
So in this PR I tried to do a few things:
* Drop all fields that don't make sense for scylla;
* Rename/reformat/rephrase fields that have a corresponding concept in scylla, so that it uses the scylla terminology;
* Group information in tables based on some common theme;

With these guidelines in mind lets look at the virtual tables introduced in this PR:
* `system.snapshots` - replacement for `nodetool listnapshots`;
* `system.protocol_servers`- replacement for `nodetool statusbinary` as well as `Thrift active` and `Native Transport active` from `nodetool info`;
* `system.runtime_info` - replacement for `nodetool info`, not an exact match: some fields were removed, some were refactored to make sense for scylla;
* `system.versions` - replacement for `nodetool version`, prints all versions, including build-id;

Closes #9517

* github.com:scylladb/scylla:
  test/cql-pytest: add virtual_tables.py
  test/cql-pytest: nodetool.py: add take_snapshot()
  db/system_keyspace: add versions table
  configure.py: move release.cc and build_id.cc to scylla_core
  db/system_keyspace: add runtime_info table
  db/system_keyspace: add protocol_servers table
  service: storage_service: s/client_shutdown_hooks/protocol_servers/
  service: storage_service: remove unused unregister_client_shutdown_hook
  redis: redis_service: implement the protocol_server interface
  alternator: controller: implement the protocol_server interface
  transport: controller: implement the protocol_server interface
  thrift: controller: implement the protocol_server interface
  Add protocol_server interface
  db/system_keyspace: add snapshots virtual table
  db/virtual_table: remove _db member
  db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables()
  docs/design-notes/system_keyspace.md: add listing of existing virtual tables
  docs/guides: add virtual-tables.md
2021-11-07 16:55:31 +02:00
Avi Kivity
c6ac1462c2 build, submodules: use utc for build datestamp
This helps keep packages built on different machines have the
same datestamp, if started on the same time.

* tools/java 05ec511bbb...fd10821045 (1):
  > build: use utc for build datestamp

* tools/jmx 48d37f3...d6225c5 (1):
  > build: use utc for build datestamp

* tools/python3 c51db54...8a77e76 (1):
  > build: use utc for build datestamp

[avi: commit own patches as this one requires excessive coordination
      across submodules, for something quite innocuous]

Ref #9563 (doesn't really fix it, but helps a little)
2021-11-07 15:58:48 +02:00
Avi Kivity
1d4f6498c8 Update tools/python3 submodule for .orig cleanup
* tools/python3 279aae1...c51db54 (1):
  > reloc: clean up '.orig' temporary directory before building deb package
2021-11-07 15:55:49 +02:00
Botond Dénes
e991604918 schema: make private constructor invokable via make_lw_shared
The schema has a private constructor, which means it can't be
constructed with `make_lw_shared()` even by classes which are otherwise
able to invoke the private constructor themselves.
This results in such classes (`schema_builder`) resorting to building a
local schema object, then invoking `make_lw_shared()` with the schema's
public move constructor. Moving a schema is not cheap at all however, so
each `schema_builder::build()` call results in two expensive schema
construction operations.
We could make `make_lw_shared()` a friend of `schema` to resolve this,
but then we'd de-facto open the private consctructor to the world.
Instead this patch introduces a private tag type, which is added to the
private constructor, which is then made public. Everybody can invoke the
constructor but only friends can create the private tag instance
required to actually call it.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211105085940.359708-1-bdenes@scylladb.com>
2021-11-07 12:51:09 +02:00
Tomasz Grabiec
31bc1eb681 Merge 'Memtable reversing reader: fix computing rt slice, if there was previously emitted range tombstone.' from Michał Radwański
This PR started by realizing that in the memtable reversing reader, it
never happened on tests that `do_refresh_state` was called with
`last_row` and `last_rts` which are not `std::nullopt`.

Changes
- fix memtable test (`tesst_memtable_with_many_versions_conforms_to_mutation_source`), so that there is a background job forcing state refreshes,
- fix the way rt_slice is computed (was `(last_rts, cr_range_snapshot.end]`, now is `[cr_range_snapshot.start, last_rts)`).

Fixes #9486

Closes #9572

* github.com:scylladb/scylla:
  partition_snapshot_reader: fix indentation in fill_buffer
  range_tombstone_list: {lower,upper,}slice share comparator implementation
  test: memtable: add full_compaction in background
  partition_snapshot_reader: fix obtaining rt_slice, if Reversing and _last_rts was set
  range_tombstone_list: add lower_slice
2021-11-05 15:27:03 +01:00
Botond Dénes
6993a55ff3 test/cql-pytest: add virtual_tables.py
Presence and column check for virtual tables. Where possible (and
simple) more is checked.
2021-11-05 16:26:21 +02:00
Botond Dénes
18f9d329ed test/cql-pytest: nodetool.py: add take_snapshot() 2021-11-05 16:26:01 +02:00
Botond Dénes
d51aa66a8a db/system_keyspace: add versions table
Contains all version related information (`nodetool version` and more).
Example printout:

    (cqlsh) select * from system.versions;

     key   | build_id                                 | build_mode | version
    -------+------------------------------------------+------------+-------------------------------
     local | aaecce2f5068b0160efd04a09b0e28e100b9cd9e |        dev | 4.6.dev-0.20211021.0d744fd3fa
2021-11-05 15:42:42 +02:00
Botond Dénes
5c87263ff8 configure.py: move release.cc and build_id.cc to scylla_core
These two files were only added to the scylla executable and some
specific unit tests. As we are about to use the symbols defined in these
files in some scylla_core code move them there.
2021-11-05 15:42:42 +02:00
Botond Dénes
89cc016f07 db/system_keyspace: add runtime_info table
Loosly contains the equivalent of the `nodetool info` command, with some
notable differences:
* Protocol server related information is in `system.protocol_servers`;
* Information about memory, memtable and cache is reformatted to be
  tailored to scylla: C* specific terminology and metrics are dropped;
* Information that doesn't change and is already in `system.local` is
  not contained;
* Added trace-probability too (`nodetool gettraceprobability`);

TODO(follow-up): exceptions.
2021-11-05 15:42:42 +02:00
Botond Dénes
78adda197f db/system_keyspace: add protocol_servers table
Lists all the client protocol server and their status. Example output:

    (cqlsh) select * from system.protocol_servers;

      name             | is_running | listen_addresses                      | protocol | protocol_version
    ------------------+------------+---------------------------------------+----------+------------------
     native transport |       True | ['127.0.0.1:9042', '127.0.0.1:19042'] |      cql |            3.3.1
	   alternator |      False |                                    [] | dynamodb |
		  rpc |      False |                                    [] |   thrift |           20.1.0
		redis |      False |                                    [] |    redis |

This prints the equivalent of `nodetool statusbinary` and the "Thrift
active" and "Native Transport active" fields from the `nodetool info`
output with some additional information:
* It contains alternator and redis status;
* It contains the protocol version;
* It contains the listen addresses (if respective server is running);
2021-11-05 15:42:42 +02:00
Botond Dénes
3f56c49a9e service: storage_service: s/client_shutdown_hooks/protocol_servers/
Replace the simple client shutdown hook registry mechanism with a more
powerful registry of the protocol servers themselves. This allows
enumerating the protocol servers at runtime, checking whether they are
running or not and starting/stopping them.
2021-11-05 15:42:42 +02:00
Botond Dénes
e9c9a39c06 service: storage_service: remove unused unregister_client_shutdown_hook
Nobody seems to unregister client shutdown hooks ever. We are about
to refactor the client shutdown hook machinery so remove this unused
code to make this easier.
2021-11-05 15:42:41 +02:00
Botond Dénes
f56f4ade22 redis: redis_service: implement the protocol_server interface
In the process de-globalize redis service and pass dependencies in the
constructor.
2021-11-05 15:42:41 +02:00
Botond Dénes
8ddfdd8aa9 alternator: controller: implement the protocol_server interface 2021-11-05 15:42:41 +02:00
Botond Dénes
134fa98ff4 transport: controller: implement the protocol_server interface 2021-11-05 15:42:41 +02:00
Botond Dénes
bda0d0ccba thrift: controller: implement the protocol_server interface 2021-11-05 15:42:41 +02:00
Botond Dénes
3ff8ba9146 Add protocol_server interface
We want to replace the current
`storage_service::register_client_shutdown_hook()` machinery with
something more powerful. We want to register all running client protocol
servers with the storage service, allowing enumerating these at runtime,
checking whether they are running or not and starting/stopping them.
As the first step towards this, we introduce an abstract interface that
we are going to implement at the controllers of the various protocol
servers we have. Then we will switch storage service to collect pointers
to this interface instead of simple stop functors.
2021-11-05 15:42:41 +02:00
Botond Dénes
64f658aea4 db/system_keyspace: add snapshots virtual table
Lists the equivalent of the `nodetool listsnapshots` command.
2021-11-05 15:42:41 +02:00
Botond Dénes
f0281eaa98 db/virtual_table: remove _db member
This member is potentially dangerous as it only becomes non-null
sometimes after the virtual table object is constructed. This is asking
for nullptr dereference.
Instead, remove this member and have virtual table implementations that
need a db, ask for it in the constructor, it is available in
`register_virtual_tables()` now.
2021-11-05 15:42:41 +02:00
Botond Dénes
200e2fad4d db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables()
As some virtual tables will need the distributed versions of these.
2021-11-05 15:42:41 +02:00
Botond Dénes
185c5f1f5b docs/design-notes/system_keyspace.md: add listing of existing virtual tables
As well as a link to the newly added docs/guides/virtual-tables.md
2021-11-05 15:42:39 +02:00
Michał Radwański
ee601b7d87 partition_snapshot_reader: fix indentation in fill_buffer 2021-11-05 10:51:58 +01:00
Michał Radwański
35b1c3ff52 range_tombstone_list: {lower,upper,}slice share comparator
implementation

slice (2 overloads), upper_slice, lower_slice previously had
implementations of a comparator. Move out the common structs, so that
all 4 of them can share implementation.
2021-11-05 10:51:58 +01:00
Botond Dénes
b8c156d4f7 docs/guides: add virtual-tables.md
Explaining what virtual tables are, what are good candidates for virtual
tables and how you can write one.
2021-11-05 11:49:27 +02:00
Michael Livshin
60f76155a7 build: have configure.py create compile_commands.json
compile_commands.json (a.k.a. "compdb",
https://clang.llvm.org/docs/JSONCompilationDatabase.html) is intended
to help stand-alone C-family LSP servers index the codebase as
precisely as possible.

The actively maintained LSP servers with good C++ support are:
- Clangd (https://clangd.llvm.org/)
- CCLS (https://github.com/MaskRay/ccls)

This change causes a successful invocation of configure.py to create a
unified Scylla+Seastar+Abseil compdb for every selected build mode,
and to leave a valid symlink in the source root (if a valid symlink
already exists, it will be left alone).

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #9558
2021-11-05 11:28:37 +02:00
Raphael S. Carvalho
4950ce539c schema: replace outdated comment on default compaction strategy
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211104210043.199156-1-raphaelsc@scylladb.com>
2021-11-05 00:35:41 +02:00
Nadav Har'El
5e52858295 rjson, alternator: rename set() functions add()
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJson function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.

So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.

After this rename, I was left with dozens of calls to the set() functions
that need to changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.

The vast majority of the set() calls were starting with an empty item
and adding members with fixed (string constant) names, so these can
be trivially changed to add().

It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious uses of
add() which are safe.

Almost all uses of rjson except a handful are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.

Fixes #9583
Refs #9542

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
2021-11-04 16:35:38 +01:00
Nadav Har'El
b95e431228 alternator: fix bug in ReturnValues=ALL_NEW
This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in
some cases returned the OLD (pre-modification) value of some of the
attributes, instead of its NEW value.

The bug was caused by a confusion in our JSON utility function,
rjson::set(), which sounds like it can set any member of a map, but in
fact may only be used to add a *new* member - if a member with the same
name (key) already existed, the result is undefined (two values for the
same key). In ReturnValues=ALL_NEW we did exactly this: we started with
a copy of the original item, and then used set() to override some of the
members. This is not allowed.

So in this patch, we introduce a new function, rjson::replace(), which
does what we previously thought that rjson::set() does - i.e., replace a
member if it exists, or if not, add it. We call this function in
the ReturnValues=ALL_NEW code.

This patch also adds a test case that reproduces the incorrect ALL_NEW
results - and gets fixed by this patch.

In an upcoming patch, we should rename the confusingly-named set()
functions and audit all their uses. But we don't do this in this patch
yet. We just add some comments to clarify what set() does - but don't
change it, and just add one new function for replace().

Fixes #9542

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104134937.40797-1-nyh@scylladb.com>
2021-11-04 16:34:58 +01:00
Michał Radwański
cac9ac5126 test: memtable: add full_compaction in background
Add full compaction in test_memtable_with_many_versions_conforms_to_mutation_source
in background. Without it, some paths in the partition snapshot reader
weren't covered, as the tests always managed to read all range
tombstones and rows which cover a given clustering range from just a
single snapshot. Now, when full_compaction happens in process of reading
from a clustering range, we can force state refresh with non-nullopt
positions of last row and last range tombstone.

Note: this inability to test affected only the reversing reader.
2021-11-04 16:19:54 +01:00
Michał Radwański
94b263e356 partition_snapshot_reader: fix obtaining rt_slice, if Reversing and
_last_rts was set

If Reversing and _last_rts was set, the created rt_slice still contained
range tombstones between *_last_rts and (snapshot) clustering range end.
This is wrong - the correct range is between (snapshot) clustering range
begin and *_last_rts.
2021-11-04 16:10:07 +01:00
Pavel Emelyanov
6e97d2ce87 Merge branch 'compaction_cleanup_and_improvements_v2' from Raphael S. Carvalho
Cleanup and improvements for compaction

* 'compaction_cleanup_and_improvements_v2' of https://github.com/raphaelsc/scylla:
  compaction: fix outdated doc of compact_sstables()
  table: fix indentation in compact_sstables()
  table: give a more descriptive name to compaction_data in compact_sstables()
  compaction_manager: rename submit_major_compaction to perform_major_compaction
  compaction: fix indentantion in compaction.hh
  compaction: move incremental_owned_ranges_checker into cleanup_compaction
  compaction: make owned ranges const in cleanup_compaction
  compaction: replace outdated comment in regular_compaction
  compaction: give a more descriptive name to compaction_data
  compaction_manager: simplify creation of compaction_data
2021-11-04 17:27:07 +03:00
Raphael S. Carvalho
132a840ed5 compaction: fix outdated doc of compact_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 11:09:24 -03:00
Raphael S. Carvalho
98dd57113f table: fix indentation in compact_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 11:09:24 -03:00
Raphael S. Carvalho
51aa79e267 table: give a more descriptive name to compaction_data in compact_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 11:09:24 -03:00
Raphael S. Carvalho
8ce9cda391 compaction_manager: rename submit_major_compaction to perform_major_compaction
for symmetry, let's call it perform_* as it doesn't work like submission
functions which doesn't wait for result, like the one for minor
compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:54:00 -03:00
Raphael S. Carvalho
0d745912d0 compaction: fix indentantion in compaction.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:50:46 -03:00
Raphael S. Carvalho
5af9a690c1 compaction: move incremental_owned_ranges_checker into cleanup_compaction
let's move checker into cleanup as it's not needed elsewhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:49:44 -03:00
Raphael S. Carvalho
04ef2124c6 compaction: make owned ranges const in cleanup_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:47:12 -03:00
Raphael S. Carvalho
d86c2491d4 compaction: replace outdated comment in regular_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:45:34 -03:00
Raphael S. Carvalho
b344db1696 compaction: give a more descriptive name to compaction_data
info is no longer descriptive, as compaction now works with
compaction_data instead of compaction_info.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:43:08 -03:00
Raphael S. Carvalho
63dc4e2107 compaction_manager: simplify creation of compaction_data
there's no need for wrapping compaction_data in shared_ptr, also
let's kill unused params in create_compaction_data to simplify
its creation.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:35:49 -03:00
Takuya ASADA
9b4cf8c532 scylla_util.py: On is_gce(), return False when it's on GKE
GKE metadata server does not provide same metadata as GCE, we should not
return True on is_gce().
So try to fetch machine-type from metadata server, return False if it
404 not found.

Fixes #9471

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9582
2021-11-04 12:49:06 +02:00
Avi Kivity
a64458e71a Merge "Run test cases in parallel by default" from Pavel E
"
Some time ago there was introduced the --parallel-cases option
that was set to False by default. Now everything is ready for
making it True.

Running in a BYO job shows that it takes 30 minutes less to
complete the debug tests. Other timings remain almost the same.

tests: unit(dev), unit(debug)
"

* 'br-parallel-cases-by-default' of https://github.com/xemul/scylla:
  test.py: Run parallel cases by default
  test, raft: Keep many-400 case out of debug mode
  test.py: Cache collected test-cases
2021-11-04 10:10:08 +02:00
Pavel Emelyanov
d1679b66f2 test.py: Run parallel cases by default
There were few missing bits before making this the default.

- default max number of AIOs, now tests are run with the greatly
  reduced value

- 1.5 hours single case from database_test, now it's split and
  scales with --parallel-cases

- suite add_test methods called in a loop for --repeat options,
  patch #1 from this set fixes it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-04 10:47:13 +03:00
Pavel Emelyanov
12cf69e5f5 test, raft: Keep many-400 case out of debug mode
This case takes 45+ minutes which is 1.5 times longer then the
second longest case out there. I propose to keep the many-400
case out of debug runs, there's many-100 one which is configured
the same way but uses 4x times less "nodes".

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-04 10:47:13 +03:00
Pavel Emelyanov
0d0ccd50b5 test.py: Cache collected test-cases
The add_test method of a siute can be called several times in a
row e.g. in case of --repeat option or because there are more
than one custom_args entries in the suite.yaml file. In any case
it's pointless to re-collect the test cases by launching the
test binary again, it's much faster (and 100% safe) to keep the
list of cases from the previous call and re-use it if the test
name matches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-04 10:47:13 +03:00
Avi Kivity
e1817b536f build: clobber user/group info from node_exporter tarball
node_exporter is packaged with some random uid/gid in the tarball.
When extracting it as an ordinary user this isn't a problem, since
the uid/gid are reset to the current user, but that doesn't happen
under dbuild since `tar` thinks the current user is root. This causes
a problem if one wants to delete the build directory later, since it
becomes owned by some random user (see /etc/subuid)

Reset the uid/gid infomation so this doesn't happen.

Closes #9579
2021-11-04 09:27:13 +02:00
Raphael S. Carvalho
ab0217e30e compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs
Compaction efficiency can be defined as how much backlog is reduced per
byte read or written.
We know a few facts about efficiency:
1) the more files are compacted together (the fan-in) the higher the
efficiency will be, however...
2) the bigger the size difference of input files the worse the
efficiency, i.e. higher write amplification.

so compactions with similar-sized files are the most efficient ones,
and its efficiency increases with a higher number of files.

However, in order to not have bad read amplification, number of files
cannot grow out of bounds. So we have to allow parallel compaction
on different tiers, but to avoid "dilution" of overall efficiency,
we will only allow a compaction to proceed if its efficiency is
greater than or equal to the efficiency of ongoing compactions.

By the time being, we'll assume that strategies don't pick candidates
with wildly different sizes, so efficiency is only calculated as a
function of compaction fan-in.

Now when system is under heavy load, then fan-in threshold will
automatically grow to guarantee that overall efficiency remains
stable.

Please note that fan-in is defined in number of runs. LCS compaction
on higher levels will have a fan-in of 2. Under heavy load, it
may happen that LCS will temporarily switch to size-tiered mode
for compaction to keep up with amount of data being produced.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>
2021-11-03 20:03:23 +02:00