In the description of the current state of Alternator in docs/alternator/alternator.md, we said
that BatchWriteItem doesn't check for duplicate entries. That is not true -
we do check - and we even have tests (test_batch_write_duplicate*) to verify it.
So drop that comment.
Refs #5698 (there is still a small bug in the duplicate checking, so that
issue remains open).
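For context, a "duplicate" here means a batch that names the same item key more than once. A minimal Python sketch of this kind of check (not the actual Scylla code; the request shape follows the DynamoDB BatchWriteItem API, and the key attribute name "p" is just an example):

```python
def has_duplicate_keys(request_items, key_attrs=("p",)):
    """Return True if any table's batch names the same item key twice."""
    for writes in request_items.values():
        seen = set()
        for write in writes:
            if "PutRequest" in write:
                item = write["PutRequest"]["Item"]
            else:
                item = write["DeleteRequest"]["Key"]
            # Reduce the item to its key attributes, in a hashable form:
            # each attribute becomes a (name, type, value) tuple.
            key = tuple((a,) + next(iter(item[a].items())) for a in key_attrs)
            if key in seen:
                return True
            seen.add(key)
    return False
```

A batch that puts and deletes the same key, for example, should be rejected by such a check.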
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219164107.14716-1-nyh@scylladb.com>
This patch adds a new document, docs/protocols.md, about all the different
protocols which Scylla supports - and the different ports which they use.
This includes Scylla's internal protocol, user-facing protocols (CQL, Thrift,
DynamoDB, Redis, JMX) and things in between (REST API, Prometheus).
I wrote this document after being frustrated that when I see a port number
(e.g., "7000") or a port option name (e.g., "storage_port") it's hard to
figure out what they actually are - or why they are given such strange
names. The intention is that this file can easily be searched for option
names or for more familiar names (e.g., "CQL"), and a reader can get the
whole story - including some pointers to the relevant parts of the code (this
part of the document can be improved further - in this version such pointers
only exist for the internal protocol).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200217172049.25510-1-nyh@scylladb.com>
The instructions in docs/alternator/getting-started.md on how to run
Alternator with docker are outdated and confusing, so this patch updates
them.
First, the instructions recommended the scylladb/scylla-nightly:alternator
tag, but we only ever created this tag once, and never updated it. Since
then, Alternator has been constantly improving, and we've caught up on
a lot of features, and people who want to test or evaluate Alternator
will most likely want to run the latest nightly build, with all the latest
Alternator features. So we update the instructions to request the latest
nightly build - and mention the need to explicitly do "docker pull" (without
this step, you can find yourself running an antique nightly build which
you downloaded months ago!). This instruction can be revisited once
Alternator is GA and no longer changing quickly; we can then recommend
running the latest stable Scylla - but I think we're not there yet.
Second, in recent builds, Alternator requires that the LWT feature is
enabled, and since LWT is still experimental, this means that one needs
to add "--experimental 1" to the "docker run" command. Without it, the
command line in getting-started.md will refuse to boot, complaining that
Alternator was enabled but LWT wasn't. So this patch adds the
"--experimental 1" in the relevant places in the text. Again, this
instruction can and should be revisited once LWT goes out of experimental
mode.
Fixes #5813
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200216113601.9535-1-nyh@scylladb.com>
Merged pull request https://github.com/scylladb/scylla/pull/5485
by Kamil Braun:
This series introduces the notion of CDC generations: sets of CDC streams
used by the cluster to choose partition keys for CDC log writes.
Each CDC generation begins operating at a specific time point, called the
generation's timestamp (cdc_streams_timestamp in the code).
It continues being used by all nodes in the cluster to generate log writes
until superseded by a new generation.
Generations are chosen so that CDC log writes are colocated with their
corresponding base table writes, i.e. their partition keys (which are CDC
stream identifiers picked from the generation operating at time of making
the write) fall into the same vnode and shard as the corresponding base
table write partition keys. Currently this is probabilistic and not 100%
of log writes will be colocated - this will change in future commits,
after per-table partitioners are implemented.
CDC generations are a global property of the cluster -- they don't depend
on any particular table's configuration. Therefore the old "CDC stream
description tables", which were specific to each CDC-enabled table,
were removed and replaced by a new, global description table inside the
system_distributed keyspace.
A new generation is introduced and supersedes the previous one whenever
we insert new tokens into the token ring, which breaks the colocation
property of the previous generation. The new generation is chosen to
account for the new tokens and restore colocation. This happens when a
new node joins the cluster.
The joining node is responsible for creating and informing other nodes
about the new CDC generation. It does that by serializing it and inserting
into an internal distributed table ("CDC topology description table").
If it fails the insert, it fails the joining process. It then announces
the generation to other nodes through gossip using the generation's
timestamp, which is the partition key of the inserted distributed table
entry.
Nodes that learn about the new generation through gossip attempt to
retrieve it from the distributed table. This might fail - for example,
if the node is partitioned away from all replicas that hold this
generation's table entry. In that case the node might stop accepting
writes, since it knows that it should send log entries to a new generation
of streams, but it doesn't know what the generation is. The node will keep
trying to retrieve the data in the background until it succeeds or sees
that it is no longer necessary (e.g., because yet another generation
superseded this one). So we give up some availability to achieve safety.
However, this solution is not completely safe (might break consistency
properties): if a node learns about a new generation too late (if gossip
doesn't reach this node in time), the node might send writes to the wrong
(old) generation. In the future we will introduce a transaction-based
approach where we will always make sure that all nodes receive the new
generation before any of them starts using it (and if it's impossible
e.g. due to a network partition, we will fail the bootstrap attempt).
In practice, if the admin makes sure that the cluster works correctly
before bootstrapping a new node, and a network partition doesn't start
in the few-second window in which a new generation is announced, everything
will work as it should.
After the learning node retrieves the generation, it inserts it into an
in-memory data structure called "CDC metadata". This structure is then
used when performing writes to the CDC log -- given the timestamp of the
written mutation, the data structure will return the CDC generation
operating at this time point. CDC metadata might reject the query for
two reasons: if the timestamp belongs to an earlier generation, which
most probably doesn't have the colocation property anymore, or if it is
too far in the future, where we cannot yet know whether the current
generation will have been superseded by a different one (so we don't yet
know the set of streams that this log write should be sent to). If the client
uses server-generated timestamps, the query will never be rejected.
Clients can also use client-generated timestamps, but they must make sure
that their clocks are not too desynchronized with the database --
otherwise some or all of their writes to CDC-enabled tables will be
rejected.
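The lookup described above can be modeled as a search over generations sorted by timestamp. A toy Python sketch of the two rejection cases (purely illustrative - not the actual C++ implementation, and the future-window size is a made-up number):

```python
import bisect

def generation_for(gen_timestamps, write_ts, now, future_window=10):
    """Return the index of the generation operating at write_ts, or raise.

    gen_timestamps is a sorted list of generation start timestamps.
    Mirrors the two rejection cases: a timestamp falling into a
    superseded generation, and one too far in the future.
    """
    i = bisect.bisect_right(gen_timestamps, write_ts) - 1
    if i < 0 or i < len(gen_timestamps) - 1:
        # write_ts falls before the currently-operating generation.
        raise ValueError("timestamp belongs to an earlier generation")
    if write_ts > now + future_window:
        # The current generation might be superseded before write_ts.
        raise ValueError("timestamp too far in the future")
    return i
```

With server-generated timestamps, write_ts is always "now", so neither branch triggers.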
In the case of rolling upgrade, where we restart nodes that were
previously running without CDC, we act a bit differently - there is no
naturally selected joining node which must propose a new generation.
We have to select such a node using other means. For this we use a bully
approach: every node compares its host id with host ids of other nodes
and if it finds that it has the greatest host id, it becomes responsible
for creating the first generation.
This change also fixes the way of choosing values of the "time" column
of CDC log writes: the timeuuid is chosen in a way which preserves
ordering of corresponding base table mutations (the timestamp of this
timeuuid is equal to the base table mutation timestamp).
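To illustrate the idea (not the actual Scylla code): a version-1 UUID embeds a 60-bit count of 100 ns units since the UUID epoch (1582-10-15), so a timeuuid built directly from the mutation timestamp sorts, by its time field, in mutation-timestamp order. A hedged Python sketch:

```python
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_UNIX_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_for(timestamp_us, clock_seq=0, node=0):
    """Build a version-1 UUID whose 60-bit time field encodes the given
    microsecond Unix timestamp, so sorting by the time field preserves
    the ordering of the original timestamps."""
    t = timestamp_us * 10 + UUID_UNIX_EPOCH_OFFSET
    time_low = t & 0xFFFFFFFF
    time_mid = (t >> 32) & 0xFFFF
    time_hi_version = ((t >> 48) & 0x0FFF) | (1 << 12)  # version 1
    clock_seq_hi = 0x80 | ((clock_seq >> 8) & 0x3F)     # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi, clock_seq & 0xFF, node))
```

Two mutations with increasing timestamps thus get timeuuids whose time components compare in the same order.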
Warning: if you were running a previous CDC version (without topology
change support), make sure to disable CDC on all tables before performing
the upgrade. This will drop the log data -- back it up if needed.
TODO in future patchset: expire CDC generations. Currently, each inserted
CDC generation will stay in the distributed tables forever (until
manually removed by the administrator). When a generation is superseded,
it should become "expired", and 24 hours after expiration, it should be
removed. The distributed tables (cdc_topology_description and
cdc_description) both have an "expired" column which can be used for
this purpose.
Unit tests: dev, debug, release
dtests (dev): https://jenkins.scylladb.com/job/scylla-master/job/byo/job/byo_build_tests_dtest/907/
This patch adds support for the ConditionExpression parameter of the
item-writing operations in Alternator: PutItem, UpdateItem and DeleteItem.
We already supported conditional updates/put/delete using the "Expected"
parameter. The ConditionExpression parameter implemented here provides a
very similar feature, using a different - and also newer and more powerful -
syntax.
The implementation here reuses much of our existing expression-parsing
infrastructure. Unsurprisingly, ConditionExpression's syntax has much in
common with UpdateExpression (which we already support), and it also reuses
many of the comparison functions already implemented for "Expected". However, it's still
quite a bit of new code, because of the many different comparisons, functions,
and syntax variations we need to support.
This patch also expands alternator-test/test_condition_expression.py with
a few additional corner cases discovered during the development of this
patch.
Almost all of the tests for this feature (35 out of 39) now pass.
Two tests still fail because we don't yet support nested attributes (this
is a missing feature across Alternator), and two tests fail because of minor
idiosyncrasies in DynamoDB's error paths that we chose not to duplicate
yet (but we still remember the difference in the form of an xfailing test).
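As an illustration of the two styles, here is the same conditional PutItem expressed with the older "Expected" parameter and with ConditionExpression. This only shows the request shape (as passed to, e.g., boto3); the table and attribute names are made up, and nothing is sent to a server:

```python
item = {"p": {"S": "key1"}, "price": {"N": "5"}}

# Older style: the "Expected" parameter.
expected_style = dict(
    TableName="tbl",
    Item=item,
    Expected={"price": {"ComparisonOperator": "LT",
                        "AttributeValueList": [{"N": "10"}]}},
)

# Newer style: ConditionExpression, with values bound through
# ExpressionAttributeValues.
condition_style = dict(
    TableName="tbl",
    Item=item,
    ConditionExpression="attribute_not_exists(p) OR price < :limit",
    ExpressionAttributeValues={":limit": {"N": "10"}},
)
```

The expression syntax is what makes ConditionExpression more powerful: conditions can be combined with AND/OR/NOT and call functions like attribute_not_exists().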
Fixes #5035
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In this patch, we re-implement the three read-modify-write operations -
PutItem, UpdateItem, DeleteItem. All three operations may need to read the
item before writing it to support conditional updates (the "Expected"
parameter) and UpdateItem may also need the previous item's value for
its update expression (e.g., a user may ask to "set a=a+1" or "set a=b").
Before this patch, the implementation of RMW operations simply did a read,
and then a write - without any attempt to protect concurrent operations.
In this patch, Scylla's LWT mechanism (storage_proxy::cas()) is used
instead, to ensure that concurrent update operations are correctly
isolated even if they are conditional. This means that Alternator now
requires the experimental LWT feature to be enabled (and refuses to
boot if it isn't).
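The problem with the old unprotected read-then-write, and the retry loop that a compare-and-set primitive enables, can be sketched in miniature. This toy models a single-key store, not storage_proxy::cas():

```python
def cas(store, key, expected, new):
    """Set store[key] to new only if it currently equals expected.
    Returns True on success - a stand-in for one LWT cas() round."""
    if store.get(key) == expected:
        store[key] = new
        return True
    return False

def increment(store, key):
    """'set a = a + 1' as a read-modify-write: retry until nobody
    changed the value between our read and our write."""
    while True:
        old = store.get(key)
        new = (old or 0) + 1
        if cas(store, key, old, new):
            return new
```

A plain read-then-write would lose one of two concurrent increments; with the cas() loop, the loser's write is rejected and retried against the fresh value.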
The version presented here is configured to always use LWT for *every*
write, regardless of whether it has a condition or not, so it will
significantly slow down write-only workloads like YCSB. But the code
in this patch actually includes three other modes, which can be chosen by
setting an enum constant in the code. In the future we will want to let the
user configure this mode, globally, per table or per attribute.
Note that read requests are NOT modified, and work exactly as they did
before: i.e., strongly-consistent reads are done using a normal
CL=LOCAL_QUORUM read - not via LWT. I believe this is good enough given
Dynamo's guarantees, and critical for our read performance.
Also note that this patch doesn't yet fix the BatchWriteItem operation.
Although BatchWriteItem does not support any RMW operations - just pure
writes - we may still need to do those pure writes using LWT. This
should be fixed in a follow-up patch.
Unfortunately, this patch involves a large amount of code movement and
reorganization, because:
1. The cas operation requires each operation to be made into an object,
with a separate apply() function, forcing a lot of code to move.
2. Moreover, we need to do this for three different operations (PutItem,
UpdateItem, DeleteItem) so to avoid massive code duplication, I had
to move some common code.
3. The cas operation also forced us to change some of the utility functions'
APIs.
The end result is that this patch focuses more on a compact and
understandable *end result* than it does on an easy to understand *patch*,
so reviewers - sorry about that.
All alternator-test/ tests pass with this patch (and also with all of the
different optional modes enabled). However, other than that, I did not yet
do any real isolation tests (are concurrent operations really isolated
correctly? or is LWT just faking it? :-) ), performance tests or stress
tests - and I'll definitely need to do those as well.
Fixes#5054
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If you merge a pull request that contains multiple patches via
the github interface, it will document itself as the committer.
Work around this brain damage by using the command line.
Merged pull request https://github.com/scylladb/scylla/pull/5381 by
Peng Jian, fixing multiple small issues with Redis:
* Rename the options related to Redis API, and describe them clearly.
* Rename redis_transport_port to redis_port
* Rename redis_transport_port_ssl to redis_ssl_port
* Rename redis_default_database_count to redis_database_count
* Remove unnecessary option enable_redis_protocol
* Change the default value of the options redis_read_consistency_level and redis_write_consistency_level to LOCAL_QUORUM
* Fix the DEL command: support deleting multiple keys in one command.
* Fix the GET command: return an empty string when the requested key does not exist.
* Fix redis-test/test_del_non_existent_key: mark it xfail.
Rename option redis_transport_port to redis_port, the port on which the Redis transport listens for clients.
Rename option redis_transport_port_ssl to redis_ssl_port, the port on which the Redis TLS transport listens for clients.
Rename option redis_default_database_count to redis_database_count, which sets the Redis database count.
Rename option redis_keyspace_options to redis_keyspace_replication_strategy_options, which sets the replication strategy for the Redis keyspace.
Remove option enable_redis_protocol, which is unnecessary.
Fixes: #5335
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>
"
This patch series adds only UDF support, UDA will be in the next patch series.
With this all CQL types are mapped to Lua. Right now we setup a new
lua state and copy the values for each argument and return. This will
be optimized once profiled.
We require --experimental to enable UDF in case there is some change
to the table format.
"
* 'espindola/udf-only-v4' of https://github.com/espindola/scylla: (65 commits)
Lua: Document the conversions between Lua and CQL
Lua: Implement decimal subtraction
Lua: Implement decimal addition
Lua: Implement support for returning decimal
Lua: Implement decimal to string conversion
Lua: Implement decimal to floating point conversion
Lua: Implement support for decimal arguments
Lua: Implement support for returning varint
Lua: Implement support for returning duration
Lua: Implement support for duration arguments
Lua: Implement support for returning inet
Lua: Implement support for inet arguments
Lua: Implement support for returning time
Lua: Implement support for time arguments
Lua: Implement support for returning timeuuid
Lua: Implement support for returning uuid
Lua: Implement support for uuid and timeuuid arguments
Lua: Implement support for returning date
Lua: Implement support for date arguments
Lua: Implement support for returning timestamp
...
This document adds information about how fixes are tracked to be
backported into releases and what is the procedure that is followed to
backport those fixes.
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Merged patch series from Botond Dénes:
This series extends the existing docs/debugging.md with a detailed guide
on how to debug Scylla coredumps. The intended target audience is
developers who are debugging their first core, hence the level of
details (hopefully enough). That said this should be just as useful for
seasoned debuggers just quickly looking up some snippet they can't
remember exactly. A Troubleshooting chapter is also added in this
series for commonly-met problems.
I decided to create this guide after myself having struggled for more
than a day on just opening(!) a coredump that was produced on Ubuntu.
As my main source, I used the How-to-debug-a-coredump page from the
internal wiki, which contains a lot of useful information on debugging
coredumps; however, I found it to be missing some crucial information, as
well as being very terse, and thus primarily useful for experienced
debuggers who can fill in the blanks. The reason I'm not extending said
wiki page is that I think this information should not be hidden in some
internal wiki page. Also, docs/debugging.md now seems to be a much
better base for such a document. This document was started as a
comprehensive debugging manual for beginners (but not just).
You will notice that the information on how to debug cores from
CentOS/Redhat are quite sparse. This is because I have no experience
with such cores, so for now the respective chapters are just stubs. I
intend to complete them in the future after having gained the necessary
experience and knowledge, however those being in possession of said
knowledge are more then welcome to send a patch. :)
Botond Dénes (4):
docs/debugging.md: demote 'Starting GDB' and 'Using GDB'
docs/debugging.md: fix formatting issues
docs/debugging.md: add 'Debugging coredumps' subchapter
docs/debugging.md: add 'Troubleshooting' subchapter
docs/debugging.md | 240 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 228 insertions(+), 12 deletions(-)
To the 'Debugging Scylla with GDB' chapter. The '### Debugging
relocatable binaries built with the toolchain' subchapter is demoted to
be just a section in this new subchapter. It is also renamed to
'Relocatable binaries'.
This subchapter intends to be a complete guide on how to debug coredumps
from how to obtain the correct version of all the binaries all the way
to how to correctly open the core with GDB.
"Delete README-DPDK.md, move IDL.md to docs/ and fix
docs/review-checklist.md to point to scylla's coding style document,
instead of seastar's."
* 'documentation-cleanup/v3' of https://github.com/denesb/scylla:
docs/review-checklist.md: point to scylla's coding-style.md instead of seastar's
docs: mv coding-style.md docs/
rm README-DPDK.md
docs: mv IDL.md docs/
Merged patch set from Piotr Sarna:
Refs #5046
This commit adds handling "Authorization:" header in incoming requests.
The signature sent in the authorization is recomputed server-side
and compared with what the client sent. In case of a mismatch,
UnrecognizedClientException is returned.
The signature computation is based on boto3 Python implementation
and uses gnutls to compute HMAC hashes.
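For the curious, the heart of the check - the SigV4 signing-key derivation that the server recomputes with gnutls - is a short, fixed chain of HMAC-SHA256 steps. A Python sketch equivalent to what boto3 does on the client side (the credential-scope values in the comments are examples):

```python
import hashlib
import hmac

def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def derive_signing_key(secret_key: str, date: str, region: str,
                       service: str) -> bytes:
    """AWS Signature Version 4 key derivation: chain HMAC-SHA256
    over the fields of the credential scope."""
    k_date = _sign(("AWS4" + secret_key).encode(), date)  # e.g. "20150830"
    k_region = _sign(k_date, region)                      # e.g. "us-east-1"
    k_service = _sign(k_region, service)                  # e.g. "dynamodb"
    return _sign(k_service, "aws4_request")
```

The resulting key is then used to HMAC the "string to sign" built from the canonical request; a mismatch with the client's signature yields UnrecognizedClientException.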
This series is rebased on a previous HTTPS series in order to ease
merging these two. As such, it depends on the HTTPS series being
merged first.
Tests: alternator(local, remote)
The series also comes with a simple authorization test and a docs update.
Piotr Sarna (6):
alternator: migrate split() function to string_view
alternator: add computing the auth signature
config: add alternator_enforce_authorization entry
alternator: add verifying the auth signature
alternator-test: add a basic authorization test case
docs: update alternator authorization entry
alternator-test/test_authorization.py | 34 ++++++++
configure.py | 1 +
alternator/{server.hh => auth.hh} | 22 ++---
alternator/server.hh | 3 +-
db/config.hh | 1 +
alternator/auth.cc | 88 ++++++++++++++++++++
alternator/server.cc | 112 +++++++++++++++++++++++---
db/config.cc | 1 +
main.cc | 2 +-
docs/alternator/alternator.md | 7 +-
10 files changed, 241 insertions(+), 30 deletions(-)
create mode 100644 alternator-test/test_authorization.py
copy alternator/{server.hh => auth.hh} (58%)
create mode 100644 alternator/auth.cc
The example Python code had wrong indentation, and wouldn't actually
work if naively copy-pasted. Noticed by Noam Hasson.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190929091440.28042-1-nyh@scylladb.com>
A documentation file that is intended to be a place for anything
debugging related: getting started tutorial, tips and tricks and
advanced guides.
For now it contains a short introduction, some selected links to
more in-depth documentation, and some tips and tricks that I could think
of off the top of my head.
One of those tricks describes how to load cores obtained from
relocatable packages inside the `dbuild` container. I originally
intended to add that to `tools/toolchain/README.md` but was convinced
that `docs/debugging.md` would be a better place for this.
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190924133110.15069-1-bdenes@scylladb.com>
This patch adds a getting-started document for Alternator.
It explains how to start up a cluster that has an Alternator
API port open, and how to test that it works using either an
application or some simple and minimal Python scripts.
The goal of the document is to get a user to have an up and
running docker based cluster with alternator support in the
shortest time possible.
As part of trying to make Alternator more accessible
to users, we expect more documents to be created, so
it seems like a good idea to give all of the Alternator
docs their own directory.
This adds a "alternator-address" and "alternator-port" configuration
options to the Docker image, so people can enable Alternator with
"docker run" with:
docker run --name some-scylla -d <image> --alternator-port=8080
Message-Id: <20190902110920.19269-1-penberg@scylladb.com>
Add a link to a longer document (currently, around 40 pages) about
DynamoDB's features and how we implemented or may implement them in
Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825121201.31747-2-nyh@scylladb.com>
This adds a new document, docs/alternator.md, about Alternator.
The scope of this document should be expanded in the future. We begin
here by introducing Alternator and its current compatibility level with
Amazon DynamoDB, but it should later grow to explain the design of Alternator
and how it maps the DynamoDB data model onto Scylla's.
Whether this document should remain a short high-level overview, or a long
and detailed design document, remains an open question.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190805085340.17543-1-nyh@scylladb.com>
Recently we have started making more use of the concept of metric labels:
several metrics which share the same name, but differ in the value of some
label, such as "group" (for different scheduling groups).
This patch documents this feature in docs/metrics.md, gives the example of
scheduling groups, and explains a couple more relevant Prometheus syntax
tricks.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190909113803.15383-1-nyh@scylladb.com>
This clarifies that "rows" are clustering rows and that there is no
information about individual collection elements.
The patch also documents some properties common to all these tables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190820171204.48739-1-espindola@scylladb.com>