This configuration entry is expected to be used as a quick fix
for an overloaded node, so it should be possible to reload this value
without having to restart the server.
The newly introduced parameter, max_concurrent_requests_per_shard,
can be used to limit the number of in-flight requests a single
coordinator shard can handle. Each surplus request will be
immediately refused by returning an OverloadedException error to the client.
The default value for this parameter is large enough to never
actually shed any requests.
Currently, the limit is only applied to CQL requests - other frontends
like alternator and redis are not throttled yet.
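A minimal sketch of the shedding logic described above, not the actual
coordinator code. The class and method names (ShardRequestLimiter, admit,
release, update_limit) and the OverloadedError type are hypothetical; only
the parameter name and the "refuse surplus requests immediately" behaviour
come from this message.

```python
class OverloadedError(Exception):
    """Stand-in for the OverloadedException error returned to the client."""


class ShardRequestLimiter:
    """Counts in-flight requests on one (hypothetical) coordinator shard."""

    def __init__(self, max_concurrent_requests_per_shard):
        self.limit = max_concurrent_requests_per_shard
        self.in_flight = 0

    def admit(self):
        # Surplus requests are refused immediately instead of being queued.
        if self.in_flight >= self.limit:
            raise OverloadedError("too many in-flight requests on this shard")
        self.in_flight += 1

    def release(self):
        # Called when a request finishes, freeing one slot.
        self.in_flight -= 1

    def update_limit(self, new_limit):
        # The value is meant to be reloadable without a restart, so the
        # limit can change while requests are in flight.
        self.limit = new_limit
```

Lowering the limit through update_limit() does not cancel requests that
were already admitted; it only affects later admit() calls.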
Introduce new database config option `schema_registry_grace_period`
describing the amount of time in seconds after which unused schema
versions will be cleaned up from the schema registry cache.
Default value is 1 second, the same value as was hardcoded before.
Tests: unit(debug)
Refs: #7225
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200915131957.446455-1-pa.solodovnikov@scylladb.com>
MD format is disabled by default at this point.
The option extends enable_sstables_mc_format
so that both are needed to be set for supporting
the md format.
The MD_FORMAT cluster feature will be added in
a following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
This is inspired by #6781. The idea is to make Scylla listen for CQL connections on port 9042 (where both old shard-aware and shard-unaware clients can still connect the traditional way). On top of that I added a new port, where everything works the same way, except that the port of the client's socket is used to determine the shard number to connect to. The desired shard number is the result of `clientside_port % num_shards`.
The new port is configurable from scylla.yaml and defaults to 19042 (unencrypted, unless user configures encryption options and omits `native_shard_aware_transport_port_ssl` in DB config).
Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and "SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility, "SCYLLA_SHARDING_ALGORITHM" is still kept.
Fixes #5239
"
* jul-stas-shard-aware-listener:
docs: Info about shard-aware listeners in protocol-extensions
transport: Added listener with port-based load balancing
The new port is configurable from scylla.yaml and defaults to 19042
(unencrypted, unless client configures encryption options and omits
`native_shard_aware_transport_port_ssl`).
Two "SUPPORTED" tags are added: "SCYLLA_SHARD_AWARE_PORT" and
"SCYLLA_SHARD_AWARE_PORT_SSL". For compatibility,
"SCYLLA_SHARDING_ALGORITHM" is still kept.
Fixes #5239
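To illustrate the `clientside_port % num_shards` rule used by the
shard-aware port, here is a hedged client-side sketch. The helper names
and the ephemeral port range (49152-65535) are assumptions made for the
example; a real driver would bind its socket to the chosen local port
before connecting to the shard-aware port.

```python
def shard_for_port(client_port, num_shards):
    """Shard that a connection from client_port lands on."""
    return client_port % num_shards


def pick_source_port(target_shard, num_shards, low=49152, high=65535):
    """Return the lowest local port in [low, high] that maps to target_shard.

    The port range is an assumption; it is not part of the protocol.
    """
    if not 0 <= target_shard < num_shards:
        raise ValueError("target_shard out of range")
    # Smallest port >= low with port % num_shards == target_shard.
    port = low + (target_shard - low) % num_shards
    if port > high:
        raise ValueError("no port in range maps to the requested shard")
    return port
```

If the chosen port is already in use, a client can simply retry with
port + num_shards, which maps to the same shard.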
This pair of limits replaces the old max_memory_for_unlimited_query one,
which remains as an alias to the hard limit. The soft limit inherits the
previous value of the limit (1MB). When this limit is reached, a warning
will be logged, allowing users to adjust their client code without
downtime. The hard limit starts out with a more permissive default of
100MB. When this is reached, queries are aborted, the same behaviour as
with the previous single limit.
The idea is to allow clients a grace period for fixing their code, while
at the same time protecting the database from the really bad queries.
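The soft/hard behaviour can be sketched as follows. This is an
illustration only: the function name, the QueryAborted exception, the
return values and the logger name are hypothetical; the 1MB and 100MB
defaults are the ones quoted above, with 1MB taken as 1024 * 1024 bytes.

```python
import logging

SOFT_LIMIT = 1 * 1024 * 1024      # 1MB, the previous single limit
HARD_LIMIT = 100 * 1024 * 1024    # 100MB, the new permissive default


class QueryAborted(Exception):
    """Stand-in for aborting a query that exceeded the hard limit."""


def check_query_memory(used_bytes, soft_limit=SOFT_LIMIT, hard_limit=HARD_LIMIT):
    """Return "ok" or "warn"; raise QueryAborted at the hard limit."""
    if used_bytes >= hard_limit:
        raise QueryAborted(f"query used {used_bytes} bytes (hard limit {hard_limit})")
    if used_bytes >= soft_limit:
        # Only a warning: the query keeps running, giving users time
        # to fix their client code.
        logging.getLogger("query").warning(
            "query used %d bytes (soft limit %d)", used_bytes, soft_limit)
        return "warn"
    return "ok"
```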
Now that there are no ad-hoc aliases needing to overwrite the name and
description parameter of this method, we can drop these and have each
config item just use `name()` and `desc()` to access these.
We already use aliases for some configuration items, although these are
created with an ad-hoc mechanism that only registers them on the command
line. Replace this with the built-in alias mechanism in the previous
patch, which has the benefit of conflict resolution and also working
with YAML.
Option to control the alternator streams CDC query/shard range time
confidence interval, i.e. the period we enforce as timestamp threshold
when reading. The default, 10s, should be sufficient on a normal
cluster, but across DCs, or with client timestamps, one might
need a larger window.
This patch moves the resharding process to use the new
directory_with_sstables_handler infrastructure. There is no longer
a clear reshard step, and that just becomes a natural part of
populate_column_family.
In main.cc, a couple of changes are necessary to make that happen.
The first one obviously is to stop calling reshard. We also need to
make sure that:
- The compaction manager is started much earlier, so we can register
resharding jobs with it.
- auto compactions are disabled in the populate method, so resharding
doesn't have to fight for bandwidth with auto compactions.
Now that we are resharding through the sstable_directory, the old
resharding code can be deleted. There is also no need to deal with
the resharding backlog either, because the SSTables are not yet
added to the sstable set at this point.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Alternator supports four ways in which write operations can use quorum
writes or LWT or both, which we called "write isolation policies".
Until this patch, Alternator defaulted to the most generally safe policy,
"always_use_lwt". This default could have been overridden for each table
separately, but there was no way to change this default for all tables.
This patch adds a "--alternator-write-isolation" configuration option which
allows changing the default.
Moreover, @dorlaor asked that users must *explicitly* choose this default
mode, and not get "always_use_lwt" without noticing. The previous default,
"always_use_lwt" supports any workload correctly but because it uses LWT
for all writes it may be disappointingly slow for users who run write-only
workloads (including most benchmarks) - such users might find the slow
writes so disappointing that they will drop Scylla. Conversely, a default
of "forbid_rmw" will be faster and still correct, but will fail on workloads
which need read-modify-write operations - and surprise users that need these
operations. So Dor asked that *none* of the write modes be made the
default, and users must make an informed choice between the different write
modes, rather than being disappointed by a default choice they weren't
aware of.
So after this patch, Scylla refuses to boot if Alternator is enabled but
a "--alternator-write-isolation" option is missing.
The patch also modifies the relevant documentation, adds the same option to
our docker image, and modifies the test-running script
test/alternator/run to run Scylla with the old default mode (always_use_lwt),
which we need because we want to test RMW operations as well.
Fixes #6452
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200524160338.108417-1-nyh@scylladb.com>
In commit da3bf20e71 we supposedly enabled
support for Cassandra's "start_native_transport" option which can be set to
0 to run Scylla without listening on the CQL port. This can be useful, for
example, if a user only wants the DynamoDB or Redis APIs but not CQL.
Unfortunately, the option was still marked "Unused", so it wasn't really
enabled as a valid command line option. This patch fixes that, and
documents the start_native_transport option in docs/protocols.md, where
we document the different protocols, ports, and options to configure them.
Fixes #6387.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200506174850.13616-1-nyh@scylladb.com>
Always enable lightweight transactions. Remove the check for the command
line switch from the feature service, assuming LWT is always enabled.
Remove the check for LWT from Alternator.
Note that in order for the cluster to work with LWT, all nodes need
to support it.
Rename LWT to UNUSED in db/config.hh, to keep accepting lwt keyword in
--experimental-features command line option, but do nothing with it.
Changes in v2:
* remove enable_lwt feature flag, it's always there
Closes #6102
test: unit (dev, debug)
Message-Id: <20200401071149.41921-1-kostja@scylladb.com>
Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation
number g1, g2, g3.
n1, n2, n3 running scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure)
One year later, the user wants to upgrade n1, n2, n3 to a new version.
When n3 does a rolling restart with the new version, n3 will use a
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.
Such unnecessary marking of node down can cause availability issues.
For example:
DC1: n1, n2
DC2: n3, n4
When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.
To fix, we can start the node with a gossip generation within
MAX_GENERATION_DIFFERENCE difference for the new node.
Once all the nodes run the version with commit
0a52ecb6df, the option is no longer
needed.
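The rejection condition from the scenario above can be sketched like
this. It is an illustration under assumptions, not the gossiper code:
the value of MAX_GENERATION_DIFFERENCE (one year in seconds), the idea
that generations are start times in seconds, and the function names are
all assumed for the example.

```python
# Assumed value: one year, expressed in seconds.
MAX_GENERATION_DIFFERENCE = 86400 * 365


def peer_accepts(peer_generation, new_generation):
    """True if a peer with peer_generation accepts new_generation."""
    return new_generation - peer_generation <= MAX_GENERATION_DIFFERENCE


def all_peers_accept(peer_generations, new_generation):
    """True if every peer would accept the restarted node's generation."""
    return all(peer_accepts(g, new_generation) for g in peer_generations)
```

For example, with g1 = g2 = 1_600_000_000 and a restart 400 days later,
all_peers_accept([g1, g2], g1 + 400 * 86400) is False, while a forced
generation of g1 + MAX_GENERATION_DIFFERENCE is still accepted.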
Fixes #5164
We have a few kinds of queries whose memory consumption is not limited at
all. One of these is reverse queries, which read entire partitions into
memory before reversing them. These partitions can be larger than
memory and thus such a query can single-handedly cause OOM.
This patch introduces a configuration for a memory limit for such
queries. This will serve as a hard limit and queries which attempt to
use more memory than this, will be aborted.
The limit is propagated to table objects, with the intention of keeping
system tables unlimited. These tables are usually small and initiators
of system queries are not prepared for failures.
Refs #817
Truncation is potentially long. It has its own timeout in storage
proxy/rpc. This value should probably also be higher than default
timeout.
Message-Id: <20200218135926.26522-1-calle@scylladb.com>
RandomPartitioner has been deprecated for 2.5 years.
Now we drop the support for it. There are two reasons for this.
First, this partitioner can lead to uneven distribution of partitions
among the nodes in the cluster which leads to hot nodes.
Second, we're planning to unify the representation of tokens and
fix it as int64_t. RandomPartitioner does not comply with this.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
ByteOrderedPartitioner has been deprecated for 2.5 years.
Now we drop the support for it. There are two reasons for this.
First, this partitioner can lead to uneven distribution of partitions
among the nodes in the cluster which leads to hot nodes.
Second, we're planning to unify the representation of tokens and
fix it as int64_t. ByteOrderedPartitioner does not comply with this.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
OrderPreservingPartitioner seems to be long gone and not supported
so remove all the places it's still mentioned.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Even though we configure the set of accepted values for
some config flags, named_value ignores them.
This patch implements the checks that verify that a flag is
not set to a value that's not on the list.
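A rough Python analogue of the check, for illustration only. The real
named_value is a C++ template; the class below, its constructor
arguments and the example option are hypothetical. Formatting the
rejected value into the error message is the reason every value type
needs to be printable.

```python
class NamedValue:
    """Hypothetical analogue of a config item with a list of allowed values."""

    def __init__(self, name, default, allowed_values=None):
        self.name = name
        self.allowed_values = allowed_values  # None means "anything goes"
        self.value = default

    def set(self, value):
        # Reject values that are not on the configured list.
        if self.allowed_values is not None and value not in self.allowed_values:
            allowed = ", ".join(str(v) for v in self.allowed_values)
            raise ValueError(
                f"Invalid value {value} for option {self.name}; allowed: {allowed}")
        self.value = value
```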
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The following patch will start checking allowed_values
in named_value and print errors for wrong values.
This will require all the types used with named_value
to have operator<< implemented. seed_provider_type
is one such type.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Murmur3 is the default partitioner.
ByteOrder and Random are the deprecated ones
and should be mentioned in the description.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The option in question apparently does not work: several sharded objects
are start()-ed (and thus instantiated) in join_token_ring, while instances
of these objects are used during init of other stuff.
This leads to broken seastar local_is_initialized assertion on sys_dist_ks,
but reading the code shows more examples, e.g. the auth_service is started
on join, but is used for thrift and cql servers initialization.
The suggestion is to remove the option instead of fixing. The is_joined
logic is kept since on-start joining still can take some time and it's safer
to report real status from the API.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191203140717.14521-1-xemul@scylladb.com>
"
Add --experimental-features -- a vector of features to unlock. Make corresponding changes in the YAML parser.
Fixes #5338
"
* 'vecexper' of https://github.com/dekimir/scylla:
config: Add `experimental_features` option
utils: Add enum_option
Provide some explanation on prio strings + direction to gnutls manual.
Document client auth option.
Remove confusing/misleading statement on "custom options"
Message-Id: <20191210123714.12278-1-calle@scylladb.com>
When the user wants to turn on only some experimental features, they
can use this new option. The existing `experimental` option is
preserved for backwards compatibility.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Rename option redis_transport_port to redis_port, which the redis transport listens on for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, which the redis TLS transport listens on for clients.
Rename option redis_database_count. Set the redis database count.
Rename option redis_keyspace_opitons to redis_keyspace_replication_strategy_options. Set the replication strategy for redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.
Fixes: #5335
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
When starting scylla daemon as non-root the initialization fails
because standard /var/lib/scylla is not accessible by regular users.
Making the default dir accessible for user is not very convenient
either, as it will cause conflicts if two or more instances of scylla
are in use.
This problem can be resolved by specifying --commitlog-directory,
--data-file-directories, etc on start, but it's too much typing. I
propose to revive Nadav's --home option that allows to move all the
directories under the same prefix in one go.
Unlike Nadav's approach the --workdir option doesn't do any tricky
manipulations with existing directories. Instead, as Pekka suggested,
the individual directories are placed under the workdir if and only
if the respective option is NOT provided. Otherwise the directory
configuration is taken as is regardless of whether it's an absolute or
relative path.
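The substitution rule can be sketched as below. This is an illustration,
not the patch itself: the function name, the subdirectory names and the
treatment of each option as a single path are assumptions made for the
example.

```python
import posixpath

# Hypothetical mapping from directory options to subdirectories of workdir.
DEFAULT_SUBDIRS = {
    "commitlog_directory": "commitlog",
    "data_file_directories": "data",
}


def resolve_directories(workdir, configured):
    """Place each directory under workdir only if its option is not set."""
    resolved = {}
    for option, subdir in DEFAULT_SUBDIRS.items():
        if option in configured:
            # Taken as is, whether it is an absolute or a relative path.
            resolved[option] = configured[option]
        else:
            resolved[option] = posixpath.join(workdir, subdir)
    return resolved
```

For example, resolve_directories("/tmp/scylla", {"commitlog_directory":
"rel/cl"}) keeps "rel/cl" untouched and places the data directory at
"/tmp/scylla/data".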
The value substitution is done early on start. Avi suggested that
this is unsafe wrt HUP config re-read and proper paths must be
resolved on the fly, but this patch doesn't address that yet, here's
why.
First of all, the respective options are MustRestart now and the
substitution is done before HUP handler is installed.
Next, commitlog and data_file values are copied on start, so marking
the options as LiveUpdate won't have any effect.
Finally, the existing named_value::operator() returns a reference,
so returning a calculated (and thus temporary) value is not possible
(from my current understanding, correct me if I'm wrong). Thus if we
want the *_directory() to return calculated value all callers of them
must be patched to call something different (e.g. *_directory.get() ?)
which will lead to more confusion and errors.
Changes v3:
- the option is --workdir back again
- the existing *directory are only affected if unset
- default config doesn't have any of these set
- added the short -W alias
Changes v2:
- the option is --home now
- all other paths are changed to be relative
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20191119130059.18066-1-xemul@scylladb.com>
In this document, the detailed design and implementation of Redis API in
Scylla is provided.
v2: build: work around ragel 7 generated code bug (suggested by Avi)
Ragel 7 incorrectly emits some unused variables that don't compile.
As a workaround, sed them away.
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
Those are typically symptoms of use-after-free or memory corruption in
the program. It's better to catch such error sooner than later.
That situation is also dangerous since if a valid descriptor would
land under the invalid access, not the one which was intended for the
operation, then the operation may be performed on the wrong file and
result in corruption.
Message-Id: <1565206788-31254-1-git-send-email-tgrabiec@scylladb.com>
"
This patch series adds only UDF support, UDA will be in the next patch series.
With this all CQL types are mapped to Lua. Right now we setup a new
lua state and copy the values for each argument and return. This will
be optimized once profiled.
We require --experimental to enable UDF in case there is some change
to the table format.
"
* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
Lua: Document the conversions between Lua and CQL
Lua: Implement decimal subtraction
Lua: Implement decimal addition
Lua: Implement support for returning decimal
Lua: Implement decimal to string conversion
Lua: Implement decimal to floating point conversion
Lua: Implement support for decimal arguments
Lua: Implement support for returning varint
Lua: Implement support for returning duration
Lua: Implement support for duration arguments
Lua: Implement support for returning inet
Lua: Implement support for inet arguments
Lua: Implement support for returning time
Lua: Implement support for time arguments
Lua: Implement support for returning timeuuid
Lua: Implement support for returning uuid
Lua: Implement support for uuid and timeuuid arguments
Lua: Implement support for returning date
Lua: Implement support for date arguments
Lua: Implement support for returning timestamp
...
Scylla 3.1.0 broke the serialization format for TTLs. Later versions
corrected it, but if a cluster was originally installed as 3.1.0,
it will use the broken serialization forever. This configuration option
allows upgrades from 3.1.0 to succeed, by enabling the broken format
even for later versions.
The config entry will be used to turn authorization for alternator
requests on and off. The default is currently off, since the key store
is not implemented yet.
So far we had the "--alternator-port" option allowing to configure the port
on which the Alternator server listens, but the server always listened
on any address. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.
So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
Until now, we always opened the Alternator port along with Scylla's
regular ports (CQL etc.). This should really be made optional.
With this patch, by default Alternator does NOT start and does not
open a port. Run Scylla with --alternator-port=8000 to open an Alternator
API port on port 8000, as was the default until now. It's also possible
to set this in scylla.yaml.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Key monotonicity validation incurs overhead to store the last key and to compare
against it, so provide an option to enable/disable it (disabled by default).
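A minimal sketch of what such a validator does and where the overhead
comes from. The class name, the "strictly increasing" interpretation of
monotonicity and the use of plain comparable keys are assumptions made
for the example.

```python
class KeyMonotonicityValidator:
    """Checks that keys arrive in strictly increasing order (assumed rule)."""

    def __init__(self, enabled=False):
        self.enabled = enabled    # disabled by default, as in this patch
        self.last_key = None      # the stored copy is part of the overhead

    def check(self, key):
        if not self.enabled:
            return                # no copy, no comparison
        if self.last_key is not None and key <= self.last_key:
            raise ValueError(f"key {key!r} is not greater than {self.last_key!r}")
        self.last_key = key
```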
Refs #4804
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Current admission control takes a permit when a CQL request starts and
releases it when the reply is sent, but some requests may leave background
work behind after that point (some because there is genuine background
work to do, like completing a write or doing a read repair, and some because
a read/write may be stuck in a queue longer than the request's timeout), so
after Scylla replies with a timeout some resources are still occupied.
The series fixes this by passing the permit down to storage_proxy where
it is held until all background work is completed.
Fixes #4768
"
* 'gleb/admission-v3' of github.com:scylladb/seastar-dev:
transport: add a metric to follow memory available for service permit.
storage_proxy: store a permit in a read executor
storage_proxy: store a permit in a write response handler
Pass service permit to storage_proxy
transport: introduce service_permit class and use it instead of semaphore_units
transport: hold admission a permit until a reply is sent
transport: remove cql server load balancer