Commit Graph

24422 Commits

Author SHA1 Message Date
Pavel Solodovnikov
4ab1f7f55d idl: Decouple idl-compiler data structures from grammar structure
Instead of operating on the raw lists of tokens, transform them into
typed structures representation, which makes the code by many orders of
magnitude simpler to read, understand and extend.

This includes sweeping changes throughout the whole source code of the
tool, because almost every function was tightly coupled to the way
data was passed down from the parser right to the code generation
routines.

Tested manually by checking that old generated sources are precisely
the same as the new generated sources.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-12-15 15:59:17 +03:00
Tomasz Grabiec
0aa1f7c70a Merge "raft: fix replication if existing log on leader" from Gleb
* scylla-dev/add_dummy_v2:
  raft: test: replication works on leader change without adding an entry
  raft: commit a dummy entry after leader change
  raft: test: fix snapshot correctness check
  sstables: add `may_have_partition_tombstones` method
2020-11-24 11:35:18 +01:00
Gleb Natapov
51d1d20687 raft: test: replication works on leader change without adding an entry
Check that a newly elected leader commits all the entries in its log
without waiting for more entries to be submitted.
2020-11-24 11:35:18 +01:00
Gleb Natapov
6130fb8b39 raft: commit a dummy entry after leader change
After a node becomes leader it needs to do two things: send an append
message to establish its leadership and commit one entry to make sure
all previous entries with smaller terms are committed as well.
2020-11-24 11:35:18 +01:00
Gleb Natapov
e3a886738b raft: test: fix snapshot correctness check
Snapshot index cannot be used to check snapshot correctness since some
entries may not be command and thus do not affect snapshot value. Lest
use applied entries count instead.
2020-11-24 11:35:18 +01:00
Asias He
1b2155eb1d repair: Use same description for the same metric
In commit 9b28162f88 (repair: Use label
for node ops metrics), we switched to use label for different node
operations. We should use the same description for the same metric name.

Fixes #7681

Closes #7682
2020-11-24 09:35:39 +02:00
Avi Kivity
e8ff77c05f Merge 'sstables: a bunch of refactors' from Kamil Braun
1. sstables: move `sstable_set` implementations to a separate module

    All the implementations were kept in sstables/compaction_strategy.cc
    which is quite large even without them. `sstable_set` already had its
    own header file, now it gets its own implementation file.

    The declarations of implementation classes and interfaces (`sstable_set_impl`,
    `bag_sstable_set`, and so on) were also exposed in a header file,
    sstable_set_impl.hh, for the purposes of potential unit testing.

2. mutation_reader: move `mutation_reader::forwarding` to flat_mutation_reader.hh

    Files which need this definition won't have to include
    mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are
    in total smaller; mutation_reader.hh includes flat_mutation_reader.hh).

3. sstables: move sstable reader creation functions to `sstable_set`

    Lower level functions such as `create_single_key_sstable_reader`
    were made methods of `sstable_set`.

    The motivation is that each concrete sstable_set
    may decide to use a better sstable reading algorithm specific to the
    data structures used by this sstable_set. For this it needs to access
    the set's internals.

    A nice side effect is that we moved some code out of table.cc
    and database.hh which are huge files.

4. sstables: pass `ring_position` to `create_single_key_sstable_reader`

    instead of `partition_range`.

    It would be best to pass `partition_key` or `decorated_key` here.
    However, the implementation of this function needs a `partition_range`
    to pass into `sstable_set::select`, and `partition_range` must be
    constructed from `ring_position`s. We could create the `ring_position`
    internally from the key but that would involve a copy which we want to
    avoid.

5. sstable_set: refactor `filter_sstable_for_reader_by_pk`

    Introduce a `make_pk_filter` function, which given a ring position,
    returns a boolean function (a filter) that given a sstable, tells
    whether the sstable may contain rows with the given position.

    The logic has been extracted from `filter_sstable_for_reader_by_pk`.

Split from #7437.

Closes #7655

* github.com:scylladb/scylla:
  sstable_set: refactor filter_sstable_for_reader_by_pk
  sstables: pass ring_position to create_single_key_sstable_reader
  sstables: move sstable reader creation functions to `sstable_set`
  mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh
  sstables: move sstable_set implementations to a separate module
2020-11-24 09:23:57 +02:00
Kamil Braun
d158921966 sstables: add may_have_partition_tombstones method
For sstable versions greater or equal than md, the `min_max_column_names`
sstable metadata gives a range of position-in-partitions such that all
clustering rows stored in this sstable have positions in this range.

Partition tombstones in this context are understood as covering the
entire range of clustering keys; thus, if the sstable contains at least
one partition tombstone, the sstable position range is set to be the
range of all clustered rows.

Therefore, by checking that the position range is *not* the range of all
clustered rows we know that the sstable cannot have any partition tombstones.

Closes #7678
2020-11-23 23:30:19 +02:00
Kamil Braun
72c59e8000 flat_mutation_reader: document assumption about fast_forward_to
It is not legal to fast forward a reader before it enters a partition.
One must ensure that there even is a partition in the first place. For
this one must fetch a `partition_start` fragment.

Closes #7679
2020-11-23 17:39:46 +01:00
Pavel Emelyanov
fea4a5492f system-keyspace: Remove dead code
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201123151453.27341-1-xemul@scylladb.com>
2020-11-23 17:16:15 +02:00
Tomasz Grabiec
36f9da6420 Merge "raft: testing: snapshots and partitioning elections" from Alejo
Fixes, features needed for testing, snapshot testing.
Free election after partitioning (replication test) .

* https://github.com/alecco/scylla/tree/raft-ale-tests-05e:
  raft: replication test: partitioning with leader
  raft: replication test: run free election after partitioning
  raft: expose fsm tick() to server for testing
  raft: expose is_leader() for testing
  raft: replication test: test take and load snapshot
  raft: fix a bug in leader election
  raft: fix default randomized timeout
  raft: replication test: fix custom next leader
  raft: replication test: custom next leader noop for same
  raft: replication test: fix failure detector for disconnected
2020-11-23 14:36:39 +01:00
Kamil Braun
6c8b0af505 sstable_set: refactor filter_sstable_for_reader_by_pk
Introduce a `make_pk_filter` function, which given a ring position,
returns a boolean function (a filter) that given a sstable, tells
whether the sstable may contain rows with the given position.

The logic has been extracted from `filter_sstable_for_reader_by_pk`.
2020-11-23 12:35:10 +01:00
Kamil Braun
68663d0de0 sstables: pass ring_position to create_single_key_sstable_reader
instead of partition_range.

It would be best to pass `partition_key` or `decorated_key` here.
However, the implementation of this function needs a `partition_range`
to pass into `sstable_set::select`, and `partition_range` must be
constructed from `ring_position`s. We could create the `ring_position`
internally from the key but that would involve a copy which we want to
avoid.
2020-11-23 12:33:24 +01:00
Takuya ASADA
b90ddc12c9 scylla_prepare: add --tune system when SET_CLOCKSOURCE=yes
perftune.py only run clocksource setup when --tune system specified,
so we need to add it on the parameter when SET_CLOCKSOURCE=yes.

Fixes #7672
2020-11-23 10:51:16 +02:00
Avi Kivity
f8e0517bc7 cql: do not advance timeouts on internal pages
Currently, each internal page fetched during aggregating
gets a timeout based on the time the page fetch was started,
rather than the query start time. This means the query can
continue processing long after the client has abandoned it
due to its own timeout, which is based on the query start time.

Fix by establishing the timeout once when the query starts, and
not advancing it.

Test: manual (SELECT count(*) FROM a large table).

Fixes #1175.

Closes #7662
2020-11-23 08:14:18 +01:00
Alejo Sanchez
1f8ca4e06d raft: replication test: partitioning with leader
For test simplicity support

    partition{leader{A},B,C,D}

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 22:39:00 -04:00
Avi Kivity
3eac976e24 build: remove non-C/C++ jobs from submodule_pools
The C and C++ sub-builds were placed in submodule_pool to
reduce concurrency, as they are memory intensive (well, at least
the C++ jobs are), and we choose build concurrency based on memory.
But the other submodules are not memory intensives, and certainly
the packaging jobs are not (and they are single-threaded too).

To allow these simple jobs to utilize multicores more efficiently,
remove them from submodule_pool so they can run in parallel.

Closes #7671
2020-11-23 00:32:41 +02:00
Avi Kivity
bcced9f56b build: compress unified package faster
The unified package is quite large (1GB compressed), and it
is the last step in the build so its build time cannot be
parallized with other tasks. Compress it with pigz to take
advantage of multiple cores and speed up the build a little.

Closes #7670
2020-11-23 00:31:04 +02:00
Takuya ASADA
3fefa520bd dist/common/scripts: drop run() and out(), swtich to subprocess.run()
We initially implemented run() and out() functions because we couldn't use
subprocess.run() since we were on Python 3.4.
But since we moved to relocatable python3, we don't need to implement it ourselves.
Why we keep using these functions are, because we needed to set environemnt variable to set PATH.
Since we recently moved away these codes to python thunk, we finally able to
drop run() and out(), switch to subprocess.run().
2020-11-22 17:59:27 +02:00
Alejo Sanchez
f12fed0809 raft: replication test: run free election after partitioning
When partitioning without keeping the existing leader, run an election
without forcing a particular leader.

To force a leader after partitioning, a test can just set it with new_leader{X}.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
d610d5a7b8 raft: expose fsm tick() to server for testing
For tests to advance servers they need to invoke tick().

This is needed to advance free elections.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
9e7e14fc50 raft: expose is_leader() for testing
Expose fsm leader check to allow tests to find out the leader after an
election.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
f4d0131f02 raft: replication test: test take and load snapshot
Through configuration trigger automatic snapshotting.

For now, handle expected log index within the test's state machine and
pass it with snapshot_value (within the test file).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Konstantin Osipov
bce8cb11a7 raft: fix a bug in leader election
If a server responds favourably to RequestVote RPC, it should
reset its election timer, otherwise it has very high chances of becoming
a candidate with an even newer term, despite successful elections.
A candidate with a term larger than the leader rejects AppendEntries
RPCs and can not become a leader itself (because of protection
against of disruptive leaders), so is stuck in this state.
2020-11-22 10:32:34 -04:00
Alejo Sanchez
08f8c418df raft: fix default randomized timeout
Range after election timeout should start at +1.
This matches existing update_current_term() code adding dist(1, 2*n).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
ab3a8b7bcd raft: replication test: fix custom next leader
Adjustments after changes due to free election in partitioning and changes in
the code.

Elapse previous leader after isolating it.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:22 -04:00
Alejo Sanchez
3bff7d1d21 raft: replication test: custom next leader noop for same
If custom specified leader is same do nothing.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:15:20 -04:00
Avi Kivity
1e170ebfc1 Merge 'Changing hints configuration followup' from Piotr Dulikowski
Follow-up to https://github.com/scylladb/scylla/pull/6916.

- Fixes wrong usage of `resource_manager::prepare_per_device_limits`,
- Improves locking in `resource_manager` so that it is more safe to call its methods concurrently,
- Adds comments around `resource_manager::register_manager` so that it's more clear what this method does and why.

Closes #7660

* github.com:scylladb/scylla:
  hints/resource_manager: add comments to register_manager
  hints/resource_manager: fix indentation
  hints/resource_manager: improve mutual exclusion
  hints/resource_manager: correct prepare_per_device_limits usage
2020-11-22 15:06:35 +02:00
Alejo Sanchez
1436e4a323 raft: replication test: fix failure detector for disconnected
For a disconnected server all other servers is_alive() is false.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 09:04:58 -04:00
Pekka Enberg
2c8dcbe5c5 reloc: Remove "build_reloc.sh" script as obsolete
The "ninja dist-server-tar" command is a full replacement for
"build_reloc.sh" script. We release engineering infrastructure has been
switched to ninja, so let's remove "build_reloc.sh" as obsolete.
2020-11-20 22:41:26 +02:00
Piotr Sarna
5a9dc6a3cc Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski
Now that CDC is GA, it should be enabled in all the tests by default.
To achieve that the PR adds a special db::config::add_cdc_extension()
helper which is used in cql_test_envm to make sure CDC is usable in
all the tests that use cql_test_env.m As a result, cdc_tests can be
simplified.
Finally, some trailing whitespaces are removed from cdc_tests.

Tests: unit(dev)

Closes #7657

* github.com:scylladb/scylla:
  cdc: Remove trailing whitespaces from cdc_tests
  cdc: Remove mk_cdc_test_config from tests
  config: Add add_cdc_extension function for testing
  cdc: Add missing includes to cdc_extension.hh
2020-11-20 13:56:29 +01:00
Konstantin Osipov
269c049a16 test.py: enable back CQL based tests
The patch which introduces build-dependent testing
has a regression: it quietly filters out all tests
which are not part of ninja output. Since ninja
doesn't build any CQL tests (including CQL-pytest),
all such tests were quietly disabled.

Fix the regression by only doing the filtering
in unit and boost test suites.

test: dev (unit), dev + --build-raft
Message-Id: <20201119224008.185250-1-kostja@scylladb.com>
2020-11-20 11:45:15 +02:00
Pekka Enberg
6a04ae69a2 Update seastar submodule
* seastar c861dbfb...010fb0df (3):
  > build: clean up after failed -fconcepts detection
  > logger: issue std::endl to output stream
  > util/log: improve discoverability of log rate-limiting
2020-11-20 11:43:11 +02:00
Avi Kivity
82b508250e tools: toolchain: dbuild: don't confine with seccomp
Some systems (at least, Centos 7, aarch64) block the membarrier()
syscall via seccomp. This causes Scylla or unit tests to burn cpu
instead of sleeping when there is nothing to do.

Fix by instructing podman/docker not to block any syscalls. I
tested this with podman, and it appears [1] to be supported on
docker.

[1] https://docs.docker.com/engine/security/seccomp/#run-without-the-default-seccomp-profile

Closes #7661
2020-11-20 09:11:52 +02:00
Kamil Braun
40d8bfa394 sstables: move sstable reader creation functions to sstable_set
Lower level functions such as `create_single_key_sstable_reader`
were made methods of `sstable_set`.

The motivation is that each concrete sstable_set
may decide to use a better sstable reading algorithm specific to the
data structures used by this sstable_set. For this it needs to access
the set's internals.

A nice side effect is that we moved some code out of table.cc
and database.hh which are huge files.
2020-11-19 17:52:39 +01:00
Kamil Braun
708093884c mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh
Files which need this definition won't have to include
mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are
in total smaller; mutation_reader.hh includes flat_mutation_reader.hh).
2020-11-19 17:52:39 +01:00
Kamil Braun
b02b441c2e sstables: move sstable_set implementations to a separate module
All the implementations were kept in sstables/compaction_strategy.cc
which is quite large even without them. `sstable_set` already had its
own header file, now it gets its own implementation file.

The declarations of implementation classes and interfaces (`sstable_set_impl`,
`bag_sstable_set`, and so on) were also exposed in a header file,
sstable_set_impl.hh, for the purposes of potential unit testing.
2020-11-19 17:52:37 +01:00
Avi Kivity
70689088fd Merge "Remove reference on database from global qctx" from Pavel E
"
The qctx is global object that references query processor and
database to let the rest of the code query system keyspace.

As the first step of de-globalizing it -- remove the database
reference from it. After the set the qctx remains a simple
wrapper over the query processor (which is already de-globalized)
and the query processor in turn is mostly needed only to parse
the query string into prepared statement only. This, in turn,
makes it possible to remove the qctx later by parsing the
query strings on boot and carrying _them_ around, not the qctx
itself.

tests: unit(dev), dtest(simple_cluster_driver_test:dev), manual start/stop
"

* 'br-remove-database-from-qctx' of https://github.com/xemul/scylla:
  query-context: Remove database from qctx
  schema-tables: Use query processor referece in save_system(_keyspace)?_schema
  system-keyspace: Rewrite force_blocking_flush
  system-keyspace: Use cluster_name string in check_health
  system-keyspace: Use db::config in setup_version
  query-context: Kill global helpers
  test: Use cql_test_env::evecute_cql instead of qctx version
  code: Use qctx::evecute_cql methods, not global ones
  system-keyspace: Do not call minimal_setup for the 2nd time
  system-keyspace: Fix indentation after previous patch
  system-keyspace: Do not do invoke_on_all by hands
  system-keyspace: Remove dead code
2020-11-19 18:31:51 +02:00
Pavel Emelyanov
689fd029a1 query-context: Remove database from qctx
No users of qctx::db are left.  One global database reference less.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
464c8990d4 schema-tables: Use query processor referece in save_system(_keyspace)?_schema
The save_system_schema and save_system_keyspace_schema are both
called on start and can the needed get query processor reference
from arguments.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
66dcc47571 system-keyspace: Rewrite force_blocking_flush
The method is called after query_processor::execute_internal
to flush the cf. Encapsulating this flush inside database and
getting the database from query_processor lets removing
database reference from global qctx object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
6cad18ad33 system-keyspace: Use cluster_name string in check_health
The check_help needs global qctx to get db.config.cluster_name,
which is already available at the caller side.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
36a3ee6ad4 system-keyspace: Use db::config in setup_version
This is the beginning of de-globalizing global qctx thing.

The setup_version() needs global qctx to get config from.
It's possible to get the config from the caller instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
43039a0812 query-context: Kill global helpers
Now the db::execute_cql* callers are patched, the global
helpers can be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
64eef0a4f7 test: Use cql_test_env::evecute_cql instead of qctx version
Similar to previous patch, but for tests. Since cql_test_env
does't have qctx on board, the patch makes one step forward
and calls what is called by qctx::execute_cql.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
303ebe4a36 code: Use qctx::evecute_cql methods, not global ones
There are global db::execute_cql() helpers that just forward
the args into qctx::execute_cql(). The former are going away,
so patch all callers to use qctx themselves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
8bf6b1298c system-keyspace: Do not call minimal_setup for the 2nd time
THe system_keyspace::minimal_setup is called by main.cc by hands
already, some steps before the regular ::setup().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
7b82ec2f9e system-keyspace: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
1773dadc72 system-keyspace: Do not do invoke_on_all by hands
The cache_truncation_record needs to run cf.cache_truncation_record
on each shard's DB, so the invoke_on_all can be used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
fb20d9cd1e system-keyspace: Remove dead code
Not called anywhare.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00