Previous patches switched all the places that called
i_partitioner::shard_of to use sharding_info::shard_of
so i_partitioner::shard_of is no longer used and can
be removed.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Create sharding_info with the same parameters as
the partitioner and use it instead of the partitioner.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This patch replaces all the uses of i_partitioner:shard_of
with sharding_info::shard_of in read_context.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Previous patches have switched all the places that was
using i_partitioner::token_for_next_shard to
sharding_info::token_for_next_shard. Now the function
can be removed from i_partitioner.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The function relies only on i_partitioner::shard_count
and i_partitioner::token_fon_next_shard. Both are really implemented
in sharding_info so the method can use them directly.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
After previous patches that switched some tests to use sharding_info
instead of i_partitioner, we now don't need with_partitioner_for_tests_only
and the function can be removed.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
dummy_partitioner was renamed to dummy_sharding_info in
the previous patch. This patch cleans up the names of
files. It's done in a separate patch to not obstruct
the diff of previous patch.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Previously this function was accessing sharding logic
through partitioner obtained from the schema.
While converting tests, dummy_partitioner is turned into
dummy_sharding_info.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
partitioner_test contains test_partitioner_sharding function
which this patch renames to test_sharding and makes it
use sharding_info instead of the partitioner.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
repair does not use partitioner and only uses sharding logic.
This means it does not have to depend on i_partitioner and can
instead operate on sharding_info.
This has an important consequence of allowing the repair of
multiple tables having different partitioners at the same time.
All tables repaired together still have to use the same
sharding logic.
To achieve this the change:
1. Removes partitioner field from repair_info
2. repair_info has access to sharding_info through schema
objects of repaired tables
3. partitioner name is removed from shard_config
4. local and remote partitioners are removed from repair_meta.
Remote sharding_info is used instead.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This method is not implemented anywhere not to mention the usage.
It is the only resonable thing to remove it instead of keeping
an unused and unimplemented declaration.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The class does not depend on partitioning logic but only uses
sharding logic. This means it is possible and desirable to limit its
dependency to only sharding_info.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
ring_position_range_vector_sharder does not depend on partitioning logic.
It only uses sharding logic so it is not necessary to store i_partitioner
in the class. Reference to sharding_info is enough.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
ring_position_range_sharder does not depend on partitioning at all.
It only uses sharding so it is enough for the class to take sharding_info
instead of a whole i_partitioner. This patch changes ring_position_range_sharder
class to contain const sharding_info& instead of const i_partitioner&.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
At the moment, we have a single sharding logic per node
but we want to be able to set it per table in the future.
To make it easy to change in the future sharding_info
will be managed inside schema and all the other code
will access it through schema::get_sharding_info function.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
This patch creates a new class called sharding_info.
This new class will now be responsible for all
the sharding logic that before was a part of the partitioner.
In the end, sharding and partitioning logic will be fully
separated but this patch starts with just extracting sharding
logic to sharding_info and embedding it into i_partitioner class.
All sharding functions are still present in i_partitioner but now
they just delegate to the corresponding functions of the embedded
sharding_info object.
Following patches will gradually switch all uses of the following
i_partitioner member functions to their equivalents in sharding_info:
1. shard_of
2. token_for_next_shard
3. sharding_ignore_msb
4. shard_count
After that, sharding_info will be removed from i_partitioner and
the two classes will be totally independent.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
* seastar c7b6b84e5...06a8c8f6e (12):
> scheduling_group_specific: remove inclusion of reactor.hh
> future: Delete void_futurize_helper
> future: Delete unused do_void_futurize_helper instantiation
> core: remove io_queue queued requests metric
> future: Add assert to set_urgent_state
> future: Add a comment to set_urgent_state
> future: Use placement new instead of operator= in set_urgent_state
> file: use correct io_queue in dup()d files
> io_queue: fix miscalculation of sizes when I/O queue is not configured.
> merge: Add log levels to RPC loggers
> reactor: Replace a call to cpu_id with this_shard_id()
> reactor: Drop a few redundant calls to engine()
The column-family is already looked up as the first line in the method.
No need to repeat that lookup in the lambda passed to
`run_when_memory_available()`, we can just capture the reference to the
already obtained column-family object. These objects are safe to
reference, they don't just disappear in the middle of an operation.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200327140827.128647-1-bdenes@scylladb.com>
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.
n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)
One year later, user wants the upgrade n1,n2,n3 to a new version
when n3 does a rolling restart with a new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, so g1 and g2 will reject n3's
gossip update and mark g3 as down.
Such unnecessary marking of node down can cause availability issues.
For example:
DC1: n1, n2
DC2: n3, n4
When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.
To fix, we can start the node with a gossip generation within
MAX_GENERATION_DIFFERENCE difference for the new node.
Once all the nodes run the version with commit
0a52ecb6df, the option is no logger
needed.
Fixes#5164
Most of Scylla code runs with a user-supplied query timeout, expressed as
absolute clock (deadline). When injecting test sleeps into such code, we most
often want to not sleep beyond the user supplied deadline. Extend error
injection API to optionally accept a deadline, and, if it is provided,
sleep no more than up to the deadline. If current time is beyond deadline,
sleep injection is skipped altogether.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20200326091600.1037717-2-alejo.sanchez@scylladb.com>
The previous patch made the LA format the default. We no longer need to
choose between writing the older KA format or LA, so the LA_SSTABLE
cluster feature has became unnecessary.
Unfortunately, we cannot completely remove this feature: Since commit
4f3ce42163 we cannot remove cluster features
because this node will refuse to join a cluster which already agreed on
features that it lacks - thinking it is an old node trying to join a
new cluster.
So the LA_SSTABLE feature flag remains, and we continue to advertise
that our node supports it. We just no longer care about what other
nodes advertised for it, so we can remove a bit of code that cared.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200324232607.4215-3-nyh@scylladb.com>
Over the years, Scylla updated the sstable format from the KA format to
the LA format, and most recently to the MC format. On a mixed cluster -
as occurs during a rolling upgrade - we want all the nodes, even new ones,
to write sstables in the format preferred by the old version. The thinking
is that if the upgrade fails, and we want to downgrade all nodes back to
the older version, we don't want to lose data because we already have
too-new sstables.
So the current code starts by selecting the oldest format we ever had - KA,
and only switching this choice to LA and MC after we verify that all the
nodes in the cluster support these newer formats.
But before an agreement is reached on the new format, sstables may already
be created in the antique KA format. This is usually harmless - we can
read this format just fine. However, the KA format has a problem that it is
unable to represent table names or keyspaces with the "-" character in them,
because this character is used to separate the keyspace and table names in
the file name. For CQL, a "-" is not allowed anyway in keyspace or table
names; But for Alternator, this character is allowed - and if a KA table
happens to be created by accident (before the LA or MC formats are chosen),
it cannot be read again during boot, and Scylla cannot reboot.
The solution that this patch takes is to change Scylla's default sstable
format to LA (and, as before, if the entire cluster agrees, the newer MC
format will be used). From now on, new KA tables will never be written.
But we still fully support *reading* the KA format - this is important in
case some very old sstables never underwent compaction.
The old code had, confusingly, two places where the default KA format
was chosen. This patch fixes is so the new default (LA) is specified in
only one place.
Fixes#6071.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200324232607.4215-2-nyh@scylladb.com>
* seastar 92c488706...c7b6b84e5 (6):
> semaphore: Use futurize_invoke instead of futurize_apply
> future: specify futurize::make_exception_future as noexcept
> future: Move ignore out of line
> future: Split then and then_impl to enable NRVO
> semaphore_units: allow getting the number of units held
> Merge "Split futurize::apply into invoke(...) and apply(tuple)" from Rafael
Fixes#6073
The logic with pre/post image was tangled into looking at "rs"
and would cause pre-image info to be stored even if only post-image
data was enabled.
Now only generate keys (and rows for them) iff explicitly enabled.
And only generate pre-image key iff we have pre-image data.
Fixes#6070
When mutation splitting was added, non-atomic column assignments were broken
into two invocation of transform. This means the second (actual data assignment)
does not know about the tombstone in first one -> postimage is created as if
we were _adding_ to the collection, not replacing it.
While not pretty, we can handle this knowing that we always get
invoked in timestamp order -> tombstone first, then assign.
So we simply keep track of non-atomic columns deleted across calls
and filter out preimage data post this.
Added test cases for all non-atomics
Now that #3158 is fixed, we can move the consumer to its place after
the `compaction_mutation_state::start_new_page()` call. No need to keep
it as `std::unique_ptr<>`.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200310185147.207665-1-bdenes@scylladb.com>
"
This small series consists of several changes that aim to
reduce the number of shared_ptr's in cql3 code.
Also it contains a patch that makes CqlParser::query to return
std::unique_ptr<> instead of seastar::shared_ptr<>, which leads
to more understandable code and lays foundation for further
optimizations (e.g. possibly eliminating shared_ptr's in
`prepared_statement` and just moving raw statements in `prepare`
without copying them).
Tests: unit(dev, debug)
"
* 'feature/cql_cleanups_9' of https://github.com/ManManson/scylla:
cql3: return raw::parsed_statement as unique_ptr
cql3: de-pointerize arguments to some of CQL grammar rules and definitions.
cql3: make abstract_marker::make_in_receiver accept cref to column_specification
Fixes#5899
When terminating (closing) a segment, we write a trailing block
of zero so reader can have an empty region after last used chunk
as end marker. This is due to using recycled, pre-allocated
segments with potentially non-zero data extending over the point
where we are ending the segment (i.e. we are not fully filling
the segment due to a huge mutation or similar).
However, if we reach end of segment writing the final block
(typically many small mutations), the file will end naturally
after the data written, and any trailing zero block would in fact
just extend the file further. While this will only happen once per
segment recycled (independent on how many times it is recycled),
it is still both slightly breaking the disk usage contract and
also potentially causing some disk stalls due to metadata changes
(though of course very infrequent).
We should only write trailing zero if we are below the max_size
file size when terminating
Adds a small size check to commitlog test to verify size bounds.
(Which breaks without the patch)
v2:
- Fix test to take into account that files might be deleted
behind our backs.
v3:
- Fix test better, by doing verification _before_ segments are
queued for delete.
Message-Id: <20200226121601.15347-2-calle@scylladb.com>
Message-Id: <20200324100235.23982-1-calle@scylladb.com>
Change CQL parsing routine to return std::unique_ptr
instead of seastar::shared_ptr.
This can help reduce redundant shared_ptr copies even further.
Make some supplementary changes necessary for this transition:
* Remove enabled_shared_from_this base class from the following
classes: truncate_statement, authorization_statement,
authentication_statement: these were previously constructing
prepared_statement instance in `prepare` method using
`shared_from_this`.
Make `prepare` methods implementation of inheriting classes
mirror implementation from other statements (i.e.
create a shallow copy of the object when prepairing into
`prepared_statement`; this could be further refactored
to avoid copies as much as possible).
* Remove unused fields in create_role_statement which led to
error while using compiler-generated copy ctor (copying
uninitialied bool values via ctor).
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Make the following rules and definitions accept a reference
instead of shared_ptr's:
* cfamDefinition
* cfamColumns
* pkDef
* typeColumns
* ksName
* cfName
* idxName
* properties
* property
This will reduce a bit the number of countless shared_ptr copies
and moves all over the place in cql3 code.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
These methods just extract some info out of
column_specification, so no need have another copy of
shared_ptr since it's not stored anywhere inside.
Transform abstract_marker::in_raw::make_in_receiver as well
following the call chain.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/6030 from
Piotr Dulikowski:
Adds CDC-related metrics.
Following counters are added, both for total and failed operations:
Total number of CDC operations that did/did not perform splitting,
Total number of CDC operations that touched a particular mutation part.
Total number of preimage selects.
Fixes#6002.
Tests: unit(dev, debug)
* 'cdc-metrics' of github.com:piodul/scylla:
storage_proxy: track CDC operations in LWT flow
storage_proxy: track CDC operations in logged batches
storage_proxy: track CDC operations in standard flow
storage_proxy: add cdc tracker hooks to write response handlers
storage_proxy: move "else if" remainder into "else" block
cdc: create an operation_result_tracker object
cdc: add an object for tracking progress of cdc mutations
cdc: count touched mutation parts in transformer::transform
cdc: track preimage selects in metrics
cdc: register metric counters
cdc: fix non-atomic updates in splitting
Compaction automatically adds gc grace period to expiry times already,
no need to add it when creating the tombstones. Remove the redundant
additions form the code. The direct impact is really minor as this is
only used in tests, but it might confuse readers who are looking at how
tombstones are created across the codebase.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200323120948.92104-1-bdenes@scylladb.com>
Adds a field to abstract_write_response_handler that points to the cdc
operation result tracker, and a function for registering the tracker in
the handlers that currently write to a CDC log table.