Commit Graph

53 Commits

Author SHA1 Message Date
Piotr Sarna
4ad577b40c alternator: add content length limit to alternator servers
This patch adds a 16MB content length limit to alternator
HTTP(S) servers. It also comes with a test, which verifies
that larger requests are refused.
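A minimal sketch of the comparison involved. It assumes "16MB" means 16 * 2^20 bytes, and the constant and function names are hypothetical. The patch itself applies the limit to the HTTP(S) servers; this only shows the check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: "16MB" read as 16 * 2^20 bytes (an assumption).
constexpr std::size_t max_request_content_length = 16 * 1024 * 1024;

// Returns true if a request body of `content_length` bytes is within the
// limit; larger requests should be refused before the body is read.
bool content_length_allowed(std::size_t content_length) {
    return content_length <= max_request_content_length;
}
```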

Fixes #5832

Tests: alternator-test(local,remote)

Message-Id: <29d5708f4bf9f41883d33d21b9cca72b05170e6c.1582285070.git.sarna@scylladb.com>
2020-02-23 14:34:20 +02:00
Nadav Har'El
b8aed18a24 alternator: unzero "scylla_alternator_total_operations" metric
In commit 388b492040, which was only supposed
to move around code, we accidentally lost the line which does

    _executor.local()._stats.total_operations++;

So since that commit, this counter was always zero...
This patch returns the line incrementing this counter.

Arguably, this counter is not very important - a user can also calculate
this number by summing up all the counters in the scylla_alternator_operation
array (these are counters for individual types of operations). Nevertheless,
as long as we do export a "scylla_alternator_total_operations" metric,
we need to correctly calculate it and can't leave it zero :-)

Fixes #5836

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219162820.14205-1-nyh@scylladb.com>
2020-02-20 08:11:15 +01:00
Piotr Sarna
3315220aea alternator: fix server when no authorization header is found
A typo caused the code to check for the wrong header and assume
that the Authorization header exists, even when it does not.
The fix comes with a regression test.
Message-Id: <58070abddae6359212aa399688e3e2704d52f419.1582108625.git.sarna@scylladb.com>
2020-02-19 13:39:50 +02:00
Piotr Sarna
bd888a2695 alternator: guard alternator-specific handlers with a gate
Alternator serves more requests than just its database operations,
e.g. a health check and returning the list of its nodes.
For safety, these operations are now also guarded by the pending
requests gate.
2020-02-16 14:15:29 +01:00
Piotr Sarna
acfed880cc alternator: guard pending alternator requests with a gate
In order to make sure that pending alternator requests are processed
during shutdown, a gate for each shard is introduced. On shutdown,
each gate will be closed and all in-progress operations will be waited upon.
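The gate idea can be modeled in a few lines. This is a hypothetical, single-threaded sketch: it is not seastar::gate (which is future-based), and request_gate and its methods are invented for illustration. Each request enters and leaves the gate; closing the gate refuses new requests and signals once the in-flight ones have finished.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical single-threaded model of a per-shard request gate.
class request_gate {
    int _pending = 0;
    bool _closed = false;
    std::function<void()> _on_drained;
public:
    // Called when a request starts; refuses new requests once closed.
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed: server is shutting down");
        }
        ++_pending;
    }
    // Called when a request finishes; fires the drain callback after the last one.
    void leave() {
        --_pending;
        if (_closed && _pending == 0 && _on_drained) {
            auto done = std::move(_on_drained);
            _on_drained = nullptr;
            done();
        }
    }
    // Shutdown: stop admitting requests, run `on_drained` once in-flight ones finish.
    void close(std::function<void()> on_drained) {
        _closed = true;
        if (_pending == 0) {
            on_drained();
        } else {
            _on_drained = std::move(on_drained);
        }
    }
    int pending() const { return _pending; }
};
```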

Fixes #5781
2020-02-16 13:48:45 +01:00
Piotr Sarna
c8ab9b3ae4 alternator: implement stopping alternator server
Stopping Scylla with alternator enabled is not clean,
because the server does not stop accepting requests
on shutdown, which leads to use-after-free events.
The first step towards a cleaner solution is to implement
alternator_server::stop(), which stops the HTTP/HTTPS servers.

Refs #5781
2020-02-16 13:34:21 +01:00
Piotr Sarna
3eb6da224b alternator: switch to keyspace-per-table approach
Instead of a monolithic alternator keyspace, each table creates its own
keyspace, named in the following pattern: `a#TABLE_NAME`.
The `a#` prefix contains an illegal CQL character in order to ensure
that these keyspaces are never created via CQL.
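A small sketch of why the prefix cannot collide with a CQL-created keyspace. It assumes keyspace names accepted by CQL consist only of ASCII letters, digits and underscores; both function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Builds the per-table keyspace name using the `a#` prefix.
std::string keyspace_name_for_table(const std::string& table_name) {
    return "a#" + table_name;
}

// Assumed CQL rule: non-empty, letters/digits/underscore only.
// Under that rule, any name containing '#' is rejected.
bool valid_cql_keyspace_name(const std::string& name) {
    if (name.empty()) {
        return false;
    }
    for (unsigned char c : name) {
        if (!std::isalnum(c) && c != '_') {
            return false;
        }
    }
    return true;
}
```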
2020-02-13 09:46:19 +01:00
Piotr Sarna
dcf54331ea alternator: allow custom names for keyspaces
The maybe_create_keyspace utility now accepts a parameter - the desired
name for a newly created keyspace.
2020-02-13 09:16:37 +01:00
Piotr Sarna
f4e51a96ca alternator: replace overloaded with overloaded_functor
Turns out we already have a utility header for a visitor
with overloaded lambdas. This patch purges the explicit
reimplementation of the same trick and uses the existing
class instead.
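The trick in question is the standard C++17 "overloaded lambdas" visitor idiom. The sketch below shows the generic idiom; it is not the definition from Scylla's utility header.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherit operator() from every lambda so std::visit picks by argument type.
template <typename... Ts>
struct overloaded : Ts... {
    using Ts::operator()...;
};
// Deduction guide so overloaded{lambda1, lambda2} works without template args.
template <typename... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

std::string describe(const std::variant<int, std::string>& v) {
    return std::visit(overloaded{
        [](int i) { return "int: " + std::to_string(i); },
        [](const std::string& s) { return "string: " + s; },
    }, v);
}
```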
Message-Id: <60c0b9a978f8208b188ef6ddc0564cb133bed707.1581496049.git.sarna@scylladb.com>
2020-02-12 14:21:42 +02:00
Gleb Natapov
38fcab3db4 alternator: pass tracing state explicitly instead of relying on it being in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for Alternator, since it creates a
new client_state object for each request, but first, it should not do
that, and second, the trace state will be dropped from client_state by
a later patch.
2020-02-10 14:50:55 +02:00
Nadav Har'El
b262eb5031 alternator: use simpler API for registering Alternator's HTTP URLs
We used the Seastar HTTP server's add() method to register URLs to
serve (so-called "routes"), but as suggested by Amnon, when we have
fixed URLs with no parameters embedded as path components, it's simpler
to use the put() method to do the same thing, and it also results in
slightly less run-time work to look up these routes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-02-05 21:14:18 +02:00
Nadav Har'El
3fecf6f641 alternator: add public API for list of nodes in current DC
If we want to balance the Alternator request load among the different nodes
(Refs #5030), the load balancer - whether it uses HTTP load balancing or
DNS - needs to be able to get an up-to-date list of live nodes to which it
can direct Alternator traffic. This list should include only the live nodes
in the same data center (geographical region) - it is expected that a
separate load balancer will be installed in each data center, and clients
from within this data center will reach this data center's load balancer.

There are multiple APIs in current Scylla to do something similar to what
we need, but as far as I know, none of them is exactly what we need or
convenient for Alternator installations: We don't want the load balancer
to use CQL, and the REST API http://localhost:10000/gossiper/endpoint/live/
doesn't do what we need (it doesn't restrict the list to one data center)
plus it's not open to connections outside the machine.

So in this patch, we implement a new HTTP request on the Alternator port -
"/localnodes", returning a JSON-formatted list of all live nodes in the
contacted node's data center:

   $ curl http://localhost:8000/localnodes
   ["127.0.0.2","127.0.0.1","127.0.0.3"]
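The filtering behind this response can be sketched as below. node_info and localnodes_json are hypothetical; in Scylla the liveness and data-center information comes from the cluster's own membership and topology state, which the sketch replaces with a plain vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of one cluster member.
struct node_info {
    std::string address;
    std::string datacenter;
    bool alive;
};

// Returns a JSON array of live nodes in `local_dc`, preserving input order.
std::string localnodes_json(const std::vector<node_info>& nodes,
                            const std::string& local_dc) {
    std::string out = "[";
    bool first = true;
    for (const auto& n : nodes) {
        if (!n.alive || n.datacenter != local_dc) {
            continue;  // skip dead nodes and nodes in other DCs
        }
        if (!first) {
            out += ',';
        }
        out += '"' + n.address + '"';
        first = false;
    }
    out += ']';
    return out;
}
```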

Like the existing health check HTTP request, this operation is public and
unauthenticated. We consider the security risk low - it allows an attacker
to enquire the list of Scylla nodes in this DC, but an attacker can achieve
the same thing by just scanning the addresses in this subnet using the health
check request (or even with ordinary DynamoDB API requests).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-02-05 21:14:18 +02:00
Piotr Sarna
4c9f2f3c0a alternator: implement tagging
The following requests are implemented:
 - TagResource
 - UntagResource
 - ListTagsOfResource

Also, more tests are added for validating inputs: ARNs, tag
values and tag keys.

Message-Id: <a7ce9534ca580736fea445813fafef75a6139e29.1579618972.git.sarna@scylladb.com>
2020-01-29 10:20:05 +01:00
Piotr Sarna
a6a65abc3c alternator: change request return type to variant<value, error>
In order to minimize the use of exceptions during normal operations,
each request handler is now able to return either a proper JSON value,
or an instance of api_error, which indicates that something went wrong,
but without having to throw, catch and rethrow C++ exceptions.
This is especially important for conditional updates, since it's
expected to be common to return ConditionalCheckFailedException.
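A sketch of the "return the error instead of throwing it" shape. The names api_error, json_value and request_return_type are stand-ins chosen for illustration (json_value is just a string placeholder for a real JSON type), and conditional_put is a hypothetical handler.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical error type; 400 is the usual status for client-caused errors.
struct api_error {
    std::string type;
    std::string message;
    int http_code = 400;
};
using json_value = std::string;  // placeholder for a real JSON value
using request_return_type = std::variant<json_value, api_error>;

// A failed condition is an ordinary return value, not a C++ exception.
request_return_type conditional_put(bool condition_holds) {
    if (!condition_holds) {
        return api_error{"ConditionalCheckFailedException", "condition not met"};
    }
    return json_value{"{}"};
}
```

The caller inspects the variant (for example with std::holds_alternative) and serializes whichever alternative it holds, so the common conditional-failure path never pays for stack unwinding.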
Message-Id: <d8996a0a270eb0d9db8fdcfb7046930b96781e69.1579515640.git.sarna@scylladb.com>
2020-01-28 12:39:23 +02:00
Nadav Har'El
aad5eeab51 alternator: better error messages when Alternator port is taken
If Alternator is requested to be enabled on a specific port but the port is
already taken, the boot fails as expected - but the error log is confusing.
It currently looks something like this:

WARN  2019-12-24 11:22:57,303 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
... (many more messages about the server shutting down)
INFO  2019-12-24 11:22:58,008 [shard 0] init - Startup failed: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)

There are two problems here. First, the "WARN" should really be an "ERROR",
because it causes the server to be shut down and the user must see this error.
Second, the final line in the log, something the user is likely to see first,
contains only the ultimate cause for the exception (an address already in use)
but not what this address was needed for.

This patch solves both issues, and the log now looks like:

ERROR 2019-12-24 14:00:54,496 [shard 0] alternator-server - Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043: std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
...
INFO  2019-12-24 14:00:55,056 [shard 0] init - Startup failed: std::_Nested_exception<std::runtime_error> (Failed to set up Alternator HTTP server on 0.0.0.0 port 8000, TLS port 8043): std::system_error (error system:98, posix_listen failed for address 0.0.0.0:8000: Address already in use)
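The `std::_Nested_exception<std::runtime_error>` in the new log line suggests the standard std::throw_with_nested mechanism: a context message wraps the original error, which stays reachable as a nested exception. A generic sketch of that technique follows; the function names are hypothetical and a std::runtime_error stands in for the real std::system_error.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Stand-in for the low-level failure.
void listen_on_port() {
    throw std::runtime_error(
        "posix_listen failed for address 0.0.0.0:8000: Address already in use");
}

// Adds context while keeping the original exception as the nested cause.
void start_alternator_server() {
    try {
        listen_on_port();
    } catch (...) {
        std::throw_with_nested(std::runtime_error(
            "Failed to set up Alternator HTTP server on 0.0.0.0 port 8000"));
    }
}

// Flattens an exception chain into "outer: inner: ..." form.
std::string describe_exception(const std::exception& e) {
    std::string result = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        result += ": " + describe_exception(inner);
    }
    return result;
}
```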

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20191224124127.7093-1-nyh@scylladb.com>
2020-01-03 15:48:20 +02:00
Dejan Mircevski
2a136ba1bc alternator: Fix race condition in set_routes()
server::set_routes() was setting the value of server::_callbacks.
This led to a race condition, as set_routes() is invoked on every
shard simultaneously.  It is also unnecessary, since _callbacks can be
initialized in the constructor.

Fixes #5220.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-10-27 12:31:24 +02:00
Piotr Sarna
657e7ef5a5 alternator: add alternator health check
The health check is performed simply by issuing a GET request
to the alternator port - it returns the following status 200
response when the server is healthy:

$ curl -i localhost:8000
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 23
Server: Seastar httpd
Date: 21 Oct 2019 12:55:33 GMT

healthy: localhost:8000
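A sketch of how such a body could be built. It assumes the server echoes the request's Host header, which matches the sample output but is not stated in the commit message; the function name is hypothetical. The 23-byte length matches the Content-Length shown above.

```cpp
#include <cassert>
#include <string>

// Assumption: the body is "healthy: " followed by the request's Host header.
std::string health_check_body(const std::string& host_header) {
    return "healthy: " + host_header;
}
```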

This commit comes with a test.
Fixes #5050
Message-Id: <3050b3819661ee19640c78372e655470c1e1089c.1571921618.git.sarna@scylladb.com>
2019-10-26 18:14:18 +03:00
Piotr Sarna
a0a33ae4f3 alternator: add additional datestamp verification
The authorization signature contains both a full obligatory date header
and a shortened datestamp - an additional verification step ensures that
the shortened stamp matches the full date.
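In AWS Signature Version 4, the full date uses the basic ISO 8601 form YYYYMMDD'T'HHMMSS'Z' and the credential scope carries a YYYYMMDD datestamp, so the check amounts to a prefix comparison. A minimal sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <string_view>

// The short datestamp must equal the first 8 characters of the full date.
bool datestamp_matches(std::string_view full_date, std::string_view datestamp) {
    return datestamp.size() == 8 && full_date.size() >= 8 &&
           full_date.substr(0, 8) == datestamp;
}
```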
2019-10-23 15:05:39 +02:00
Piotr Sarna
718cba10a1 alternator: verify that the signature has not expired
AWS signatures have a 15-minute expiration policy. For compatibility,
the same policy is applied to alternator requests. The policy also
ensures that signatures dated more than 15 minutes in the future
are treated as unsafe and thus not accepted.
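The policy reduces to bounding the absolute clock skew between the signature's timestamp and the server's clock. A sketch, with a hypothetical function name and an inclusive 15-minute boundary assumed:

```cpp
#include <cassert>
#include <chrono>

// Valid only if the signature time is within `window` of `now`,
// whether it lies in the past or in the future.
bool signature_time_valid(std::chrono::system_clock::time_point signed_at,
                          std::chrono::system_clock::time_point now,
                          std::chrono::minutes window = std::chrono::minutes(15)) {
    auto skew = now > signed_at ? now - signed_at : signed_at - now;
    return skew <= window;
}
```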
2019-10-23 15:05:39 +02:00
Piotr Sarna
524b03dea5 alternator: add key cache to authorization
In order to avoid fetching keys from system_auth.roles system table
on every request, a cache layer is introduced. And in order not to
reinvent the wheel, the existing implementation of loading_cache
with max size 1024 and a 1 minute timeout is used.
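To illustrate the caching policy only, here is a hypothetical, synchronous stand-in bounded by size and entry age. It reads the 1-minute timeout as an expiry time. Scylla's loading_cache is considerably more elaborate, and none of its API is reproduced here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical key cache: entries expire after `ttl`; at most `max_size` kept.
class key_cache {
public:
    using clock = std::chrono::steady_clock;
    using loader_fn = std::function<std::string(const std::string&)>;

    key_cache(std::size_t max_size, clock::duration ttl, loader_fn loader)
        : _max_size(max_size), _ttl(ttl), _loader(std::move(loader)) {}

    // Returns the key for `user`, calling the loader only on a miss or expiry.
    std::string get(const std::string& user, clock::time_point now = clock::now()) {
        auto it = _entries.find(user);
        if (it != _entries.end() && now - it->second.loaded_at < _ttl) {
            return it->second.key;  // fresh hit
        }
        if (it == _entries.end() && _entries.size() >= _max_size) {
            evict_oldest();  // make room for a new user
        }
        std::string key = _loader(user);  // e.g. read from system_auth.roles
        _entries[user] = entry{key, now};
        return key;
    }

    std::size_t size() const { return _entries.size(); }

private:
    struct entry {
        std::string key;
        clock::time_point loaded_at;
    };

    // Evicts the entry that was loaded longest ago (linear scan, sketch only).
    void evict_oldest() {
        auto oldest = _entries.begin();
        for (auto i = _entries.begin(); i != _entries.end(); ++i) {
            if (i->second.loaded_at < oldest->second.loaded_at) {
                oldest = i;
            }
        }
        if (oldest != _entries.end()) {
            _entries.erase(oldest);
        }
    }

    std::size_t _max_size;
    clock::duration _ttl;
    loader_fn _loader;
    std::unordered_map<std::string, entry> _entries;
};
```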
2019-10-23 15:05:39 +02:00
Piotr Sarna
6dee7737d7 alternator: use keys from system_auth.roles for authorization
Instead of having a hardcoded secret key, the server now verifies
an actual key extracted from system_auth.roles system table.
This commit comes with a test update - instead of 'whatever':'whatever',
the credentials used for a local run are 'alternator':'secret_pass',
which match the initial contents of the system_auth.roles table,
the table that acts as a key store.

Fixes #5046
2019-10-23 15:05:39 +02:00
Piotr Sarna
388b492040 alternator: move the api handler to a separate function
The lambda used for handling the api request has grown a little bit
too large, so it's moved to a separate method. Along with it,
the callbacks are now remembered inside the class itself.
2019-10-23 15:05:39 +02:00
Piotr Sarna
a93cf12668 alternator: futurize verify_signature function
The verify_signature utility will later be coupled with Scylla
authorization. In order to prepare for that, it is first transformed
into a function that returns future<>, and it also becomes a member
of class server. It becomes a member function because that will make
it easier to implement a server-local key cache.
2019-10-23 15:05:39 +02:00
Piotr Sarna
97cbb9a2c7 alternator: add verifying the auth signature
The signature sent in the "Authorization:" header is now verified
by computing the signature server-side with a matching secret key
and confirming that the signatures match.
Currently the secret key is hardcoded to be "whatever" in order
to work with current tests, but it should be replaced
by a proper key store.

Refs #5046
2019-10-10 13:51:00 +02:00
Piotr Sarna
ca58b46b4c alternator: migrate split() function to string_view
The implementation of string split was based on sstring type for
simplicity, but it turns out that more generic std::string_view
will be beneficial later to avoid unneeded string copying.
Unfortunately boost::split does not cooperate well with string views,
so a simple manual implementation is provided instead.
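A minimal string_view split in that spirit. It is a sketch with a single-character delimiter, not the function from the patch. Each piece is a view into the input, so nothing is copied, and the input must outlive the result.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on `delim`; empty fields are kept ("a,,b" gives 3 pieces).
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> pieces;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            pieces.push_back(text.substr(start));
            break;
        }
        pieces.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return pieces;
}
```

For example, splitting an AWS credential scope such as `alternator/20191023/us-east-1/dynamodb/aws4_request` on `/` yields five views without allocating any new strings.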
2019-10-10 13:50:59 +02:00
Piotr Sarna
e1b0537149 alternator: add HTTPS support
By providing a server based on a TLS socket, it's now possible
to serve HTTPS requests in alternator. The HTTPS server is enabled
by setting its port in scylla.yaml, e.g. `alternator_tls_port: XXXX`.
Alternator TLS relies on the existing TLS configuration,
which is provided by certificate, keyfile, truststore, priority_string
options.

Fixes #5042
2019-10-03 19:10:30 +02:00
Nadav Har'El
62c4ed8ee3 alternator: improve request logging
We needlessly split the trace-level log message for the request to two
messages - one containing just the operation's name, and one with the
parameters. Moreover we printed them in the opposite order (parameters
first, then the operation). So this patch combines them into one log
message.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190829165341.3600-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
c9eb9d9c76 alternator: update license blurbs
Update all the license blurbs to the one we use in the open-source
Scylla project, licensed under the AGPL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190825160321.10016-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
d6e671b04f alternator: add initial tracing to requests
Each request provides basic tracing information about itself.

Example output from tracing:

cqlsh> select request, parameters from system_traces.sessions
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 request          | parameters
------------------+-----------------------------------------------------
 Alternator Query | {'query': '{"TableName": "alternator_test_15664",
                    "KeyConditions": {"p": {"AttributeValueList":
                    [{"S": "T0FE0QCS0X"}], "ComparisonOperator": "EQ"}}}'}

cqlsh> select session_id, activity from system_traces.events
           where session_id = 39813070-c4ea-11e9-8572-000000000000;
 session_id                           | activity
--------------------------------------+-----------------------------
 39813070-c4ea-11e9-8572-000000000000 |                    Querying
 39813070-c4ea-11e9-8572-000000000000 | Performing a database query
2019-09-11 18:01:05 +03:00
Piotr Sarna
cb791abb9d alternator: enable query tracing
Probabilistic tracing can be enabled via the REST API. From now on,
Alternator will create tracing sessions for its operations as well.

Examples:

 # trace around 0.1% of all requests
curl -X POST "http://localhost:10000/storage_service/trace_probability?probability=0.001"
 # trace everything
curl -X POST "http://localhost:10000/storage_service/trace_probability?probability=1"
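One common way to implement a trace probability is an independent Bernoulli trial per request. The generic sketch below uses a hypothetical class name and is not Scylla's tracing code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Decides per request whether to trace, with probability `p` in [0, 1].
class trace_sampler {
    std::mt19937_64 _rng;
    std::bernoulli_distribution _coin;
public:
    explicit trace_sampler(double p, std::uint64_t seed = 42)
        : _rng(seed), _coin(p) {}
    bool should_trace() { return _coin(_rng); }
};
```

With p=0.001, roughly one request in a thousand is traced, matching the first curl example.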
2019-09-11 18:01:05 +03:00
Piotr Sarna
6c8c31bfc9 alternator: add client state
Keeping an instance of client_state is a convenient way of being able
to use tracing for alternator. It's also currently used in paging,
so adding a client state to executor removes the need to keep
a dummy value.
2019-09-11 18:01:05 +03:00
Nadav Har'El
2f53423a2f alternator: automatically choose RF: 1 or 3
In CQL, before a user can create a table, they must create a keyspace to
contain this table and, among other things, specify this keyspace's RF.

But in the DynamoDB API, there is no "create keyspace" operation - the
user just creates a table, and there is no way, and no opportunity,
to specify the requested RF. Presumably, Amazon always uses the same
RF for all tables, most likely 3, although this is not officially
documented anywhere.

The existing code creates the keyspace during Scylla boot, with RF=1.
This RF=1 always works, and is a good choice for a one-node test run,
but is a really bad choice for a real cluster with multiple nodes, so
this patch fixes this choice:

With this patch, the keyspace creation is delayed - it doesn't happen
when the first node of the cluster boots, but only when the user creates
the first table. Presumably, at that time, the cluster is already up,
so at that point we can make the obvious choice automatically: a one-node
cluster will get RF=1, a >=3 node cluster will get RF=3. The choice of
RF is logged - and the choice of RF=1 is considered a warning.
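The choice can be sketched as a small pure function. The message above only spells out the one-node (RF=1) and three-or-more-node (RF=3) cases; treating a two-node cluster as RF=1 is an assumption here, and the names are hypothetical.

```cpp
#include <cassert>

struct rf_choice {
    int replication_factor;
    bool warn;  // RF=1 is logged as a warning
};

// Assumption: fewer than 3 live nodes (including 2) falls back to RF=1.
rf_choice choose_replication_factor(int live_nodes) {
    if (live_nodes >= 3) {
        return {3, false};
    }
    return {1, true};
}
```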

Note that with this patch, keyspace creation is still automatic as it
was before. The user may manually create the keyspace via CQL, to
override this automatic choice. In the future we may also add additional
keyspace configuration options via configuration flags or new REST
requests, and the keyspace management code will also likely change
as we start to support clusters with multiple regions and global
tables. But for now, I think the automatic method is easiest for
users who want to test-drive Alternator without reading lengthy
instructions on how to set up the keyspace.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190820180610.5341-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Nadav Har'El
b2bd3bbc1f alternator: add "--alternator-address" configuration parameter
So far we had the "--alternator-port" option, which configures the port
on which the Alternator server listens, but the server always listened
on all addresses. It is important to also be able to configure the listen
address - it is useful in tests running several instances of Scylla on
the same machine, and useful in multi-homed machines with several interfaces.

So this patch adds the "--alternator-address" option, defaulting to 0.0.0.0
(to listen on all interfaces). It works like the many other "--*-address"
options that Scylla already has.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190808204641.28648-1-nyh@scylladb.com>
2019-09-11 18:01:05 +03:00
Piotr Sarna
0fd1354ef9 alternator: add handling rapidjson errors in the server
If a JSON parsing error is encountered, it is transformed
to a validation exception and returned to the user in JSON form.
2019-09-11 18:01:04 +03:00
Nadav Har'El
4d07e2b7c5 alternator: support BatchGetItem
This patch adds to Alternator an implementation of the BatchGetItem
operation, which allows starting a number of GetItem requests in parallel
with a single request.

The implementation is almost complete - the only missing feature is the
ability to ask only for non-top-level attributes in ProjectionExpression.
Everything else should work, and this patch also includes tests which,
as usual, pass on DynamoDB and now also on Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:33:50 +03:00
Nadav Har'El
83b91d4b49 alternator: add DeleteItem
Add support for the DeleteItem operation, which deletes an item.

The basic deletion operation is supported. Still not supported are:

1. Parameters to conditionally delete (ConditionExpression or Expected)
2. Parameters to return pre-delete content
3. ReturnItemCollectionMetrics (statistics relevant for tables with LSI)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 15:19:46 +03:00
Piotr Sarna
27f00d1693 alternator: move error class to a separate header
Error class definitions were previously in server.hh, but they
are separate entities - future .cc files can use the errors without
the need of including server definitions.
Message-Id: <b5689e0f4c9f9183161eafff718f45dd8a61b653.1559646761.git.sarna@scylladb.com>
2019-09-11 14:52:58 +03:00
Nadav Har'El
eb81b31132 alternator: add statistics
This patch adds a statistics framework to Alternator: Executor has (for
each shard) a _stats object which contains counters for various events,
and also is in charge of making these counters visible via Scylla's regular
metrics API (http://localhost:9180/metrics).

This patch includes a counter for each of DynamoDB's operation types,
and we increase the ones we support when handled. We also added counters
for total operations and unsupported operations (operation types we don't
yet handle). In the future we can easily add many more counters: Define
the counter in stats.hh, export it in stats.cc, and increment it
where relevant in executor.cc (or server.cc).
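A sketch of the counting scheme described above, leaving out the metrics export. The struct layout, field names and count_request are hypothetical. Note that total_operations must be incremented on every request, supported or not, so that it always equals the per-type counters plus the unsupported count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-shard statistics.
struct stats {
    std::uint64_t total_operations = 0;
    std::uint64_t unsupported_operations = 0;
    std::map<std::string, std::uint64_t> operation;  // per supported type
};

void count_request(stats& s, const std::string& op, bool supported) {
    ++s.total_operations;
    if (supported) {
        ++s.operation[op];
    } else {
        ++s.unsupported_operations;
    }
}
```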

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:36:26 +03:00
Piotr Sarna
b309c9d54b alternator: implement basic Query
The implementation covers the following restrictions:
 - equality for hash key;
 - equality, <, <=, >, >=, between, begins_with for sort key.
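The sort-key conditions in the list above can be sketched for string keys as follows. The enum and function names are hypothetical. std::string comparison orders characters as unsigned char values, i.e. by bytes, and BETWEEN is inclusive on both ends, as in DynamoDB.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class comparison { eq, lt, le, gt, ge, between, begins_with };

// Evaluates one sort-key condition; `b` is only used by `between`.
bool sort_key_matches(const std::string& key, comparison op,
                      const std::string& a, const std::string& b = {}) {
    switch (op) {
    case comparison::eq: return key == a;
    case comparison::lt: return key < a;
    case comparison::le: return key <= a;
    case comparison::gt: return key > a;
    case comparison::ge: return key >= a;
    case comparison::between: return a <= key && key <= b;
    case comparison::begins_with: return key.compare(0, a.size(), a) == 0;
    }
    throw std::logic_error("unknown comparison");
}
```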
Message-Id: <021989f6d0803674cbd727f9b8b3815433ceeea5.1558356119.git.sarna@scylladb.com>
2019-09-11 14:36:16 +03:00
Piotr Sarna
8525b14271 alternator: add lookup table for requests
Instead of using a really long if-else chain, requests are now
looked up via a routing table.
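A routing table of this kind can be sketched as a map from operation name to handler. The handler signature and class name are hypothetical, and a real server would return a proper API error rather than throwing std::invalid_argument. An unknown operation fails with a clear "Unsupported operation" message instead of falling off the end of an if-else chain.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handler: request body in, response body out.
using handler = std::function<std::string(const std::string&)>;

class router {
    std::unordered_map<std::string, handler> _routes;
public:
    void add(std::string op, handler h) {
        _routes.emplace(std::move(op), std::move(h));
    }
    // One hash lookup replaces a long if-else chain over operation names.
    std::string dispatch(const std::string& op, const std::string& body) const {
        auto it = _routes.find(op);
        if (it == _routes.end()) {
            throw std::invalid_argument("Unsupported operation " + op);
        }
        return it->second(body);
    }
};
```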
Message-Id: <746a34b754c3070aa9cbeaf98a6e7c6781aaee65.1557914794.git.sarna@scylladb.com>
2019-09-11 14:29:59 +03:00
Piotr Sarna
c0ecd1a334 alternator: add basic BatchWriteItem
The initial implementation only supports PutRequest requests,
without serving DeleteRequest properly.
Message-Id: <451bcbed61f7eb2307ff5722de33c2e883563643.1557914382.git.sarna@scylladb.com>
2019-09-11 14:29:50 +03:00
Nadav Har'El
9a0c13913d alternator: improve where DescribeEndpoints gets its information
Instead of blindly returning "localhost:8000" in response to
DescribeEndpoints and for sure causing us problems in the future,
the right thing to do is to return the same domain name which the
user originally used to get to us, be it "localhost:8000" or
"some.domain.name:1234". But how can we know what this domain name
was? Easy - this is why HTTP 1.1 added a mandatory "Host:" header,
and the DynamoDB driver I tested (boto3) adds it as expected,
indeed with the expected value of "localhost:8000" on my local setup.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:25:22 +03:00
Nadav Har'El
29e0f68ee0 alternator: add initial implementation of DescribeEndpoints
DescribeEndpoints is not a very important API (and by default, clients
don't use it) but I wanted to understand how DynamoDB responds to it,
and what better way than to write a test :-)

And then, if we already have a test, let's implement this request in
Scylla as well. This is a silly implementation, which always returns
"localhost:8000". In the future, this will need to be configurable -
we're not supposed to return *this* server's IP address here, but rather
a domain name which can be used to get to all servers.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:22:47 +03:00
Piotr Sarna
4def674731 alternator: implement basic scan
The most basic version of Scan request is implemented.
It still contains a list of TODOs, including support for the Segment
and TotalSegments parameters used for parallel scans.
Message-Id: <5d1bfc086dbbe64b3674b0053e58a0439e64909b.1557757402.git.sarna@scylladb.com>
2019-09-11 14:21:39 +03:00
Piotr Sarna
0ce3866fb5 alternator: lower debug messages verbosity in the HTTP server
The HTTP server still uses WARN log level to log debug messages,
which is way higher than necessary. These messages are downgraded
to TRACE level.
Message-Id: <59559277f2548d4046001bebff45ab2d3b7063b5.1557744617.git.sarna@scylladb.com>
2019-09-11 14:12:40 +03:00
Piotr Sarna
b6dde25bcc alternator: implement ListTables
ListTables is used to extract all table names created so far.
Message-Id: <04f4d804a40ff08a38125f36351e56d7426d2e3d.1557402320.git.sarna@scylladb.com>
2019-09-11 14:10:54 +03:00
Nadav Har'El
0c2a440f7f alternator: add initial UpdateItem implementation
Add an initial UpdateItem implementation. As with PutItem and GetItem, we
are still limited to string attributes. This initial implementation
of UpdateItem implements only the "PUT" action (not "DELETE" and
certainly not "ADD") and not any of the more advanced options.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 14:03:00 +03:00
Nadav Har'El
0e06d82a1f alternator: clean up api_error() interface
All operation-generated error messages should have the 400 HTTP error
code. It's a real nag to have to type it every time. So make it the
default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 13:01:47 +03:00
Nadav Har'El
8dec31d23b alternator: add initial implementation of DeleteTable
Add an initial implementation of DeleteTable, enough to make the
following test pass:

   pytest --local test_table.py::test_create_and_delete_table

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:42 +03:00
Nadav Har'El
41d4b88e78 alternator: on unknown operation, return standard API error
When given an unknown operation (we haven't implemented many of them yet...)
we should throw the appropriate api_error, not some random exception.

This allows the client to understand the operation is not supported
and stop retrying - instead of retrying thinking this was a weird
internal error.

For example, the test
   pytest --local test_table.py::test_create_and_delete_table

now fails immediately, saying "Unsupported operation DeleteTable".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2019-09-11 12:45:04 +03:00