We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.
This bug has been fixed in the previous commit, so we remove the
filter.
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.
We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
We add a sanity check to ensure there is at most one transitioning node at
a time. If there are more, something must have gone wrong.
In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.
We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
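A minimal sketch of the target selection and sanity check described above. It assumes a simplified, hypothetical model: `node_state`, `topology_view` and `barrier_targets` are illustrative names, not Scylla's real types, and the real `exec_global_command` sends RPCs rather than returning a list. The sketch only shows the rule that a decommissioning node (possibly already in `left_token_ring`) is included alongside normal nodes, and that more than one transitioning node is treated as a bug.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified states of a transitioning node.
enum class node_state { decommissioning, left_token_ring };

struct topology_view {
    std::vector<std::string> normal_nodes;
    // Nodes in the middle of a topology transition, keyed by node id.
    std::map<std::string, node_state> transition_nodes;
};

// Returns the nodes that should receive `barrier_and_drain`: all normal
// nodes plus a decommissioning node, which may still be coordinating
// requests with an old topology version.
std::vector<std::string> barrier_targets(const topology_view& t) {
    // Sanity check: concurrent topology operations are not supported yet.
    if (t.transition_nodes.size() > 1) {
        throw std::logic_error("more than one transitioning node");
    }
    std::vector<std::string> targets = t.normal_nodes;
    for (const auto& [id, state] : t.transition_nodes) {
        if (state == node_state::decommissioning ||
            state == node_state::left_token_ring) {
            targets.push_back(id);
        }
    }
    return targets;
}
```

Once concurrent topology operations exist, the size check would go away and the loop would naturally handle several entries in the map.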
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for db::schema_tables::table_kind.
its operator<<() is preserved, because it is still used by the homebrew
generic formatter for std::map<>.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16972
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.
Tests are also added.
Fixes: #16492
Closes scylladb/scylladb#16517
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
test/cql-pytest: test_tools.py: extract some fixture logic to functions
test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
test/boost/schema_loader_test: add test for mvs and indexes
tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
replica/database: extract existing_index_names and get_available_index_name
tools/schema_loader: make real_db.tables the only source of truth on existing tables
tools/schema_loader: table(): store const keyspace&
tools/schema_loader: make database,keyspace,table non-movable
cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
cql3/statements/create_index_statement: make build_index_schema() public
cql3/statements/create_index_statement: relax some method's dependence on qp
cql3/statements/create_view_statement: make prepare_view() public
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms offer high resolution at a lower resource cost.
This series allows sending histograms in the native histogram format over protobuf.
Protobuf support is disabled by default. To use protobuf with native histograms, set the `prometheus_allow_protobuf` command-line flag to true; the Prometheus server must also send an Accept header that includes protobuf.
Fixes #12931
Closes scylladb/scylladb#16737
* github.com:scylladb/scylladb:
main.cc: Add prometheus_allow_protobuf command line
histogram_metrics_helper: support native histogram
config: Add prometheus_allow_protobuf flag
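The condition for replying in protobuf, as described in this series (the flag is enabled and the scraper's Accept header asks for protobuf), could be sketched as follows. `should_reply_with_protobuf` is a hypothetical helper. Matching on the `application/vnd.google.protobuf` media type is an assumption based on the Prometheus exposition format, not a copy of the actual server code.

```cpp
#include <string>

// Hypothetical helper: reply in protobuf only if the operator opted in via
// prometheus_allow_protobuf AND the scraper advertises protobuf support.
bool should_reply_with_protobuf(bool prometheus_allow_protobuf,
                                const std::string& accept_header) {
    return prometheus_allow_protobuf &&
           accept_header.find("application/vnd.google.protobuf") != std::string::npos;
}
```

Requiring both conditions keeps the default text format for existing scrapers, even when the flag is turned on.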
Add an empty line before the list of different checksums in
validate-checksums's description. Otherwise, the list is not rendered.
Closes scylladb/scylladb#16401
we deduce the paths to other SSTable components from the one
specified on the command line. for instance, if
/a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool
would try to read /a/b/c/me-really-big-TOC.txt for the list of
other components. this works fine if the full path is specified
on the command line.
but if a relative path like "me-really-big-Data.db" is specified,
this no longer works. before this change, the tool
would read "/me-really-big-TOC.txt", which does not exist
under most circumstances, whereas $PWD/me-really-big-TOC.txt should
exist if the SSTable is sane.
after this change, we always convert the specified path to
its canonical representation, whether it is relative or absolute.
this enables us to get the correct parent path when trying
to read, for instance, the TOC component.
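A minimal sketch of the idea with `std::filesystem`. `sibling_component` is a hypothetical helper, and the "strip after the last `-`" naming rule is a simplification of real SSTable file names. The sketch calls `fs::absolute()` before `fs::weakly_canonical()` because `weakly_canonical()` on a relative path with no existing leading component stays relative. `weakly_canonical()` is used so the sketch also works for files that do not exist; whether the tool itself uses `canonical()` or `weakly_canonical()` is not stated here.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Derives another component's path from the given one, e.g.
// ".../me-really-big-Data.db" -> ".../me-really-big-TOC.txt".
fs::path sibling_component(const fs::path& given, const std::string& component) {
    // Anchor relative paths at the current directory, then normalize.
    const fs::path full = fs::weakly_canonical(fs::absolute(given));
    const std::string name = full.filename().string();
    // Keep everything up to and including the last '-' ("me-really-big-").
    const std::string prefix = name.substr(0, name.rfind('-') + 1);
    return full.parent_path() / (prefix + component);
}
```

For comparison, `fs::path("me-really-big-Data.db").parent_path()` is empty. That is why naively joining it with "/" and the TOC name produces "/me-really-big-TOC.txt".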
Fixes #16955
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16964
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for rjson::value and drop its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16956
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.
As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own new header/source files, as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.
Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started, and waited on
until it finishes, via the new `run_topology_coordinator` function.
Fixes: scylladb/scylladb#16605
Closes scylladb/scylladb#16609
* github.com:scylladb/scylladb:
service: move topology coordinator to a separate file
storage_service: introduce run_topology_coordinator function
service: move topology mutation builder out of storage_service
storage_service: detemplate topology_node_mutation_builder::set
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
cql3::statements::statement_type and drop its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16948
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.
The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:
- wait_for_ip function - it's used both by storage_service and
topology_coordinator, so in order for the new topology_coordinator.cc
not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
topology coordinator. The topology coordinator class is not exposed at
all; the only way to use it is to start the coordinator via that
function and wait until it shuts itself down.
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.
The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
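A synchronous sketch of the resulting shape, under stated assumptions. The real code is a Seastar coroutine with a different signature. Here `run_topology_coordinator` takes a hypothetical `has_work` predicate and returns how many work items were handled, just to make the control flow observable. The class has internal linkage through an anonymous namespace, and the object lives in the function's own frame, so no `shared_ptr` is needed to keep it alive.

```cpp
#include <functional>

// Public entry point (would be declared in the new header); the signature
// is hypothetical. Runs the coordinator until it shuts itself down.
unsigned run_topology_coordinator(const std::function<bool()>& has_work);

namespace {
// Internal linkage: no other translation unit can name this class.
class topology_coordinator {
    const std::function<bool()>& _has_work;
    unsigned _handled = 0;
public:
    explicit topology_coordinator(const std::function<bool()>& has_work)
        : _has_work(has_work) {}
    // Main loop: process work until there is none left, then return.
    unsigned run() {
        while (_has_work()) {
            ++_handled;
        }
        return _handled;
    }
};
} // anonymous namespace

unsigned run_topology_coordinator(const std::function<bool()>& has_work) {
    // Constructed directly in this frame and alive for the whole run.
    topology_coordinator coordinator{has_work};
    return coordinator.run();
}
```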
Before PR #15524 was introduced, the removal had always been invoked
via a finally() continuation. Although it made flush() noexcept, the
mentioned PR also modified the logic: if flush() returns an exceptional
future, the removal is not performed.
This change restores the old behavior: the removal operation is always called.
From now on, the logic of compaction_group::stop() is as follows:
- first, it waits for completion of flush() via
seastar::coroutine::as_future() to avoid a premature exception
- then it executes compaction_manager.remove()
- finally, it inspects the future returned from flush()
to re-throw the exception if the operation failed
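The ordering above can be illustrated with a synchronous analogue. This sketch captures the flush failure in a `std::exception_ptr` instead of using `seastar::coroutine::as_future()`. `stop_compaction_group`, `flush` and `remove` are hypothetical stand-ins, not the real compaction_group API.

```cpp
#include <exception>
#include <functional>

// Synchronous analogue of compaction_group::stop(): the removal always
// runs, and a flush failure is re-thrown only afterwards.
void stop_compaction_group(const std::function<void()>& flush,
                           const std::function<void()>& remove) {
    std::exception_ptr flush_error;
    try {
        flush();                                  // may fail
    } catch (...) {
        flush_error = std::current_exception();   // remember, don't propagate yet
    }
    remove();                                     // always performed
    if (flush_error) {
        std::rethrow_exception(flush_error);      // surface the flush failure
    }
}
```

Deferring the re-throw is what keeps a failing flush from skipping the removal step.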
Fixed: scylladb#16751
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16940
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case and adds additional tests for TagResource, which cover this case and other cases we forgot to cover in tests.
Fixes #16904.
Closes scylladb/scylladb#16910
* github.com:scylladb/scylladb:
test/alternator: add more tests for TagResource
alternator: allow empty tag value
There are currently two ways to "request" the initial number of tablets for a table:
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Neither is ideal. The former doesn't take the cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests: the --tablets-initial-scale-factor option, which enhances option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla; with the option, it is going to be more convenient.
Closes scylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
This patch adds the prometheus_allow_protobuf command line support.
When set to true, the Prometheus endpoint will accept protobuf requests
and will reply using the protobuf protocol.
This also enables the experimental Prometheus Native Histograms.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
approx_exponential_histogram uses logic similar to Prometheus native
histograms. To allow Prometheus to send its data in the native histogram
format, it needs to report the schema and the min id (the id of the first
bucket).
This patch updates to_metrics_histogram to set those optional parameters,
leaving it to Prometheus to decide in which format the histogram will be
reported.
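For context on what "schema" and "min id" refer to, here is a sketch of the bucket indexing used by Prometheus native histograms, as we understand the Prometheus specification. A schema `s` defines a growth factor `base = 2^(2^-s)`, and a positive bucket with index `i` covers `(base^(i-1), base^i]`. The min id would be the index of the first reported bucket. The function names are hypothetical, and how approx_exponential_histogram's own buckets map onto a schema is not shown here.

```cpp
#include <cmath>

// Index of the native-histogram bucket containing a positive value v.
// Bucket i covers (base^(i-1), base^i] with base = 2^(2^-schema),
// so i = ceil(log2(v) * 2^schema).
int native_bucket_index(double v, int schema) {
    return static_cast<int>(std::ceil(std::log2(v) * std::ldexp(1.0, schema)));
}

// Upper bound of bucket i: base^i = 2^(i * 2^-schema).
double native_bucket_upper_bound(int i, int schema) {
    return std::exp2(i * std::ldexp(1.0, -schema));
}
```

A higher schema gives finer buckets. For example, schema 1 splits each power-of-two range into two buckets.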
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer. The
prometheus_allow_protobuf flag allows the user to enable the protobuf
protocol. When this flag is set to true and the Prometheus server
indicates in its request that it accepts protobuf, the result will be in
the protobuf protocol.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.
De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.
These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.
As usual, all the new tests also pass on DynamoDB.
Refs #16904
Refs #16908
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing code incorrectly forbade setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.
While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).
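A minimal sketch of the tag-length rules the fix and the new tests revolve around. It assumes DynamoDB's documented limits of 1 to 128 characters for a tag key and 0 to 256 characters for a tag value, counted in Unicode characters rather than bytes (the concern of #16908). `utf8_length` and `validate_tag` are hypothetical helpers, not Alternator's actual code.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (10xxxxxx), so a multi-byte letter counts as one character.
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Returns an error message, or std::nullopt if the tag is acceptable.
// Limits are assumed from DynamoDB's documentation (key 1..128, value 0..256).
std::optional<std::string> validate_tag(const std::string& key, const std::string& value) {
    const std::size_t key_len = utf8_length(key);
    if (key_len < 1 || key_len > 128) {
        return "tag key must be 1 to 128 characters long";
    }
    if (utf8_length(value) > 256) {
        return "tag value must be at most 256 characters long";
    }
    return std::nullopt;  // note: an empty value is allowed
}
```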
The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.
Fixes #16904.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR fixes test_tablet_missing_data_repair and enables the test again.
If a node is not UP yet, repair in the test will be a partial repair. The partial repair will not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.
Closes scylladb/scylladb#16930
* github.com:scylladb/scylladb:
test: Enable test_tablet_missing_data_repair again
test: Wait for nodes to be up when repair
test: Check repair status in ScyllaRESTAPIClient
This commit improves the developer-oriented section
of the core documentation:
- Added links to the developer sections in the new
Get Started guide (Develop with ScyllaDB and
Tutorials and Example Projects) for ease of access.
- Replaced the outdated Learn to Use ScyllaDB page with
a link to the up-to-date page in the Get Started guide.
This involves removing the learn.rst file and adding
an appropriate redirection.
- Removed the Apache copyright notice, as this page does not
need it.
- Removed the Features panel box as there was only one
feature listed, which looked weird. Also, we are in
the process of removing the Features section.
Closes scylladb/scylladb#16800
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16917
before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
cql3::authorized_prepared_statements_cache_key and remove its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16924
it seems that the tree builds just fine with this warning enabled,
and narrowing is a potentially unsafe numeric conversion, so let's
enable this warning option.
this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16929
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablets count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may not be ideal either.
As a temporary solution (e.g. for perf tests), we may add a configuration
option that scales the initial number of calculated tablets by some factor.
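The arithmetic described above, one tablet per shard multiplied by a scale factor, can be sketched as follows. `initial_tablet_count` is a hypothetical function, the factor is assumed to be an integer, and any rounding the real tablet allocator may apply to the result is deliberately left out.

```cpp
#include <cstddef>
#include <vector>

// One tablet per shard across the cluster, multiplied by the configured
// initial scale factor (assumed integral here).
std::size_t initial_tablet_count(const std::vector<unsigned>& shards_per_node,
                                 unsigned initial_scale) {
    std::size_t total_shards = 0;
    for (unsigned shards : shards_per_node) {
        total_shards += shards;
    }
    return total_shards * initial_scale;
}
```

For example, three nodes with 8 shards each get 24 tablets by default, and 96 with a factor of 4.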
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablet allocator is a sharded service that starts in main, so it's worth
equipping it with a config. The next patches will fill it with some payload.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this change addresses the possible data resurrection after the
"nodetool compact" and "nodetool flush" commands, and prepares for
the fix of a similar data resurrection issue after "nodetool cleanup".
active commitlog segments are recycled in the background once they are
discarded.
there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could affect the data that "nodetool cleanup" is supposed to remove. as a
solution to this problem in the pre-tablets era, we force new active
segments of commitlog and flush the involved memtables. but since the
active segments are discarded in the background, the completion of
"nodetool cleanup" does not guarantee that these mutations won't be
applied to memtables when the server restarts, if it is killed right away.
the same applies to the "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls, which are used by nodetool as
well. to quote Benny's comment:
> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.
so to ensure that the active segments are reclaimed upon completion of
the "nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so that
the caller waits until the reclamation of deleted active segments completes.
Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16915
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).
By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.
Closes scylladb/scylladb#16311
The name of the keyspace being part of the partition key is not useful;
the table_id already uniquely identifies the table. The keyspace name
being part of the key means that code wanting to interact with this
table often has to resolve the table id just to be able to provide the
keyspace name. This is counterproductive, so make keyspace_name
just a static column instead, just like table_name already is.
Fixes: #16377
Closes scylladb/scylladb#16881