Compare commits

..

279 Commits

Author SHA1 Message Date
Pekka Enberg
d3a05737f7 release: prepare for 0.14.1 2016-01-05 15:30:47 +02:00
Shlomi Livne
21c68d3da9 dist/redhat: Increase scylla-server service start timeout to 15 min
Fixes #749

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-05 15:30:41 +02:00
Pekka Enberg
88d544ed14 Merge "Fixes for AMI" from Shlomi
"The patch fixes a few issues caused by generalizing the ami scripts. The
 scylla_bootparam_setup script requires invocation with the ami flag, and
 scylla_install is missing some steps executed by scylla-ami.sh."
2016-01-04 15:21:24 +02:00
Shlomi Livne
638c0c0ea8 Fixing missing items in move from scylla-ami.sh to scylla_install
scylla-ami.sh moved some ami-specific files. These parts were dropped
when converging scylla-ami into scylla_install. Fix that.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 14:57:57 +02:00
Shlomi Livne
f3e96e0f0b Invoke scylla_bootparam_setup with/without ami flag
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 14:57:57 +02:00
Shlomi Livne
fa15440665 Fix error: no integer expression expected in AMI creation
The script imports /etc/sysconfig/scylla-server for configuration
settings (NR_PAGES). /etc/sysconfig/scylla-server includes an AMI
param which has a string value, and it is imported as a last step in
scylla_install (after scylla_bootparam_setup has been initiated).

The AMI variable is set up in scylla_install and is used in multiple
scripts. To resolve the conflict, move the import of
/etc/sysconfig/scylla-server to after the AMI variable has been compared.

Fixes: #744

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-01-04 14:57:33 +02:00
Takuya ASADA
c4d66a3beb dist: apply limits settings correctly on Ubuntu
Fixes #738

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-02 12:20:47 +02:00
Shlomi Livne
5023f9bbab Make sure the directory we are writing coredumps to exists
After upgrading an AMI and stopping and starting a machine,
/var/lib/scylla/coredump is not created. Create the directory if it does
not exist prior to generating a core dump.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2015-12-31 13:21:54 +02:00
Pekka Enberg
3efc145562 dist: Increase NOFILE rlimit to 200k
Commit 2ba4910 ("main: verify that the NOFILE rlimit is sufficient")
added a recommendation to set NOFILE rlimit to 200k. Update our release
binaries to do the same.
2015-12-30 12:21:01 +02:00
Avi Kivity
1ad638f8bf main: verify that the NOFILE rlimit is sufficient
Require 10k files, recommend 200k.

Allow bypassing via --developer-mode.

Fixes #692.
2015-12-30 11:05:21 +02:00
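The startup check described above can be sketched in isolation (the thresholds come from the commit message; the helper name and shape are hypothetical, not Scylla's actual code):

```cpp
#include <cassert>
#include <sys/resource.h>

// Illustrative sketch only: require 10k open files, recommend 200k,
// and allow bypassing the verification via --developer-mode.
constexpr rlim_t required_nofile = 10000;     // hard requirement (10k)
constexpr rlim_t recommended_nofile = 200000; // recommendation (200k)

bool nofile_rlimit_sufficient(rlim_t current, bool developer_mode) {
    if (developer_mode) {
        return true; // --developer-mode bypasses the check
    }
    return current >= required_nofile;
}
```

In a real check the current limit would be obtained with `getrlimit(RLIMIT_NOFILE, ...)`.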
Avi Kivity
43f4a8031d init: bail out if running not on an XFS filesystem
Allow an override via '--developer-mode true', and use it in
the docker setup, since that cannot be expected to use XFS.

Fixes #658.
2015-12-30 11:05:14 +02:00
Pekka Enberg
27dbbe1ca4 release: prepare for 0.14 2015-12-30 10:28:58 +02:00
Pekka Enberg
0aa105c9cf Merge "load report a negative value" from Amnon
"This series solves an issue with the load broadcaster that reports negative
 values due to an integer wraparound. While fixing this issue, an additional
 change was made so that the load_map returns doubles and not formatted
 strings. This is a better API: safer and better documented."
2015-12-30 10:21:55 +02:00
Nadav Har'El
f0b27671a2 murmur3 partitioner: remove outdated comment, and code
Since commit 16596385ee, long_token() is already checking
t.is_minimum(), so the comment which explains why it does not (for
performance) is no longer relevant. And we no longer need to check
t._kind before calling long_token (the check we do here is the same
as is_minimum).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 10:01:29 +02:00
Nadav Har'El
de5a3e5c5a repair: check columnFamilies list
Check the list of column families passed as an option to repair, to
provide the user with a more meaningful exception when a non-existent
column family is passed.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:59:54 +02:00
Nadav Har'El
3ae29216c8 repair: add missing ampersand
This was a plain bug - ranges_opt is supposed to parse the option into
the vector "var", but took the vector by value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:46:13 +02:00
Nadav Har'El
a0a649c1be repair: support "columnFamilies" parameter
Support the "columnFamilies" parameter of repair, allowing repair of
only some of the column families of a keyspace instead of all of them,
for example with a command like "nodetool repair keyspace cf1 cf2".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-30 09:45:28 +02:00
Lucas Meneghel Rodrigues
43d39d8b03 scylla_coredump_setup: Don't call yum on scylla server spec file
The script scylla_coredump_setup was introduced in
9b4d0592 and added to the scylla rpm spec file as a
post script. However, calling yum while another
yum instance is installing scylla-server causes a deadlock,
since the script's yum waits for the yum lock to be released, and the
original yum process waits for the script to end.

So let's remove this from the script. Debian shouldn't be
affected, since it was never added to the debian build
rules (to the best of my knowledge, after analyzing 9b4d0592),
hence I did not change it there. It would cause the same problem
with apt-get if it were used.

CC: Takuya ASADA <syuu@scylladb.com>
[ penberg: Rebase and drop statement about 'abrt' package not in Fedora. ]
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-30 09:38:36 +02:00
Nadav Har'El
ebebaa525d repair: fix missing default values
A default value was not set for the "incremental" and "parallelism"
repair parameters, so Scylla can wrongly decide that they have an
unsupported value.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 15:39:47 +02:00
Amnon Heiman
ec379649ea API: repair to use documented params
The repair API used to have an undocumented parameter list similar to
origin's.

This patch changes the way repair gets its parameters.
Instead of one undocumented string, it now lists all the different
optional parameters in the swagger file and accepts them explicitly.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 15:38:44 +02:00
Amnon Heiman
f0d68e4161 main: start the http server in the first step
This change sets the http server to start as the first step in the boot
order.

This is helpful if some other step takes a long time or gets stuck.

Fixes #725

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 14:20:57 +02:00
Avi Kivity
c8b09a69a9 lsa: disable constant_time_size in binomial_heap implementation
Corrupts the heap on boost < 1.60, and is not needed.

Fixes #698.
2015-12-29 12:59:00 +01:00
Vlad Zolotarov
756de38a9d database: actually check that a snapshot directory exists
Actually check that a snapshot directory with a given tag
exists instead of just checking that a 'snapshot' directory
exists.

Fixes issue #689

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-29 12:59:00 +01:00
Amnon Heiman
71905081b1 API: report the load map as an unformatted double
In origin, storage_service reports the load map as a formatted string.
For an API, a better option is to report the load map as a double and let
the JMX proxy do the formatting.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
Amnon Heiman
06e1facc34 load_broadcaster reports a negative size
map_reduce0 converts the result value to the type of the init value. In
load_broadcaster, 0 is of type int.
This results in an int wraparound and negative values.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-29 11:55:34 +02:00
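The wraparound mechanism described above can be shown in isolation (illustrative, not the actual map_reduce0 code): the type of the initial value determines the accumulator type, so a plain `int` init overflows once per-shard loads sum past 2^31.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// The init value's type decides the accumulator type. With an `int` init
// (a plain 0) the running sum wraps around; a 64-bit init keeps it correct.
int64_t sum_loads(const std::vector<int64_t>& loads) {
    return std::accumulate(loads.begin(), loads.end(), int64_t(0));
}
```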
Avi Kivity
41bd266ddd db: provide more information on "Unrecognized error" while loading sstables
This information can be used to understand the root cause of the failure.

Refs #692.
2015-12-29 10:23:32 +02:00
Nadav Har'El
7247f055df repair: partial support for some options
Add partial support for the "incremental" option (only support the
"false" setting, i.e., not incremental repair) and the "parallelism"
option (the choice of sequential or parallel repair is ignored - we
always use our own technique).

This is needed because scylla-jmx passes these options by default
(e.g., "incremental=false" is passed to say this is *not* incremental
repair, and we just need to allow this and ignore it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:38:09 +02:00
Nadav Har'El
3cfa39e1f0 repair: log repair options
When throwing an "unsupported repair options" exception to the caller
(such as "nodetool repair"), also list which options were not recognized.
Additionally, list the options when logging the repair operation.

This patch includes an operator<< implementation for pretty-printing an
std::unordered_map. We may want to move it later to a more central
location - even Seastar (like we have a pretty-printer for std::vector
in core/sstring.hh).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-29 09:37:30 +02:00
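A minimal version of such an unordered_map pretty-printer might look like this (a sketch; the exact formatting used in Scylla may differ):

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

// Sketch of an operator<< for std::unordered_map, printing "{k: v, ...}".
// Element order is unspecified, as with any unordered container.
template <typename K, typename V>
std::ostream& operator<<(std::ostream& os, const std::unordered_map<K, V>& m) {
    os << "{";
    bool first = true;
    for (const auto& [k, v] : m) {
        if (!first) {
            os << ", ";
        }
        first = false;
        os << k << ": " << v;
    }
    return os << "}";
}
```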
Raphael S. Carvalho
b7d36af26f compaction: fix max_purgeable calculation
max_purgeable was being incorrectly calculated because the code
that creates the vector of uncompacted sstables was wrong.
This value is used to determine whether or not a tombstone can
be purged.
The < operator is supposed to be used instead in the callback passed
as the third parameter to boost::set_difference.
This fix is a step towards closing issue #676.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-29 09:30:08 +02:00
Takuya ASADA
46767fcacf dist: fix .rpm build error (File not found: scylla_extlinux_setup)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-29 09:26:58 +02:00
Pekka Enberg
ca1f9f1c9a main: Fix implicitly disabled client encryption options
The start_native_transport() function in storage_service expects the
'enabled' option to be defined. If the option is not defined, it means
that encryption is implicitly disabled.

Fixes #718.
2015-12-28 16:24:49 +02:00
Pekka Enberg
a76b3a009b Merge "use steady_clock where monotonic clock is required" from Vlad
"The first patch in this series fixes the issue #638 in scylla.
 The second one fixes the tests to use the appropriate clock."
2015-12-28 13:35:50 +02:00
Avi Kivity
561bb79d22 Merge "CQL server SSL" from Calle
"* Update scylla.conf section
 * Add SSL capability to cql server
 * Use conf and initiate optional SSL cql server in
   main/storage_service"
2015-12-28 12:55:25 +02:00
Avi Kivity
72cb8d4461 Merge "Messaging service TLS" from Calle
"Adds support for TLS/SSL encrypted (and cert verified)
connections for message service

* Modify config option to match "native" style certificate management
* Add SSL options to messaging service and generate SSL server/client
  endpoints when required
* Add config option handling to init/main"
2015-12-28 12:54:28 +02:00
Calle Wilund
fae3bb7a24 storage_service: Set up CQL server as SSL if specified
* Massage user options in main
* Use them in storage_service, and if needed, load certificates etc
  and pass to transport/cql server.

Conflicts:
	service/storage_service.cc
2015-12-28 10:13:48 +00:00
Calle Wilund
51d3990261 cql_server: Allow using SSL socket
The optional credentials argument determines whether an SSL or normal
server socket is created.

Note: This does not follow the pattern of "socket as argument", simply
because this is a distributed object, so only trivial or immutable
objects should be passed to it.
2015-12-28 10:13:48 +00:00
Calle Wilund
d8b2581a07 scylla.conf: Update client_encryption_options with scylla syntax
Using certificate+key directly
2015-12-28 10:13:48 +00:00
Calle Wilund
5f003f9284 scylla.conf: Modify server_encryption_options section
Describes the scylla version of the option.

Note, for test usage, the below should be workable:

server_encryption_options:
    internode_encryption: all
    certificate: seastar/tests/test.crt
    truststore: seastar/tests/catest.pem
    keyfile: seastar/tests/test.key

Since the seastar test suite contains a snakeoil cert + trust
combo
2015-12-28 10:10:35 +00:00
Calle Wilund
70f293d82e main/init: Use server_encryption_options
* Reads server_encryption_options
* Interprets the above, and loads and initializes credentials
  to use with messaging service init if required
2015-12-28 10:10:35 +00:00
Calle Wilund
d1badfa108 messaging_service: Optionally create SSL endpoints
* Accept port + credentials + option for what to encrypt
* If set, enable a SSL listener at ssl_port
* Check outgoing connections by IP to determine if
  they should go to SSL/normal endpoint

Requires seastar RPC patch

Note: currently, the connections created by messaging service
does _not_ do certificate name verification. While DNS lookup
is probably not that expensive here, I am not 100% sure it is
the desired behaviour.
Normal trust is however verified.
2015-12-28 10:10:35 +00:00
Calle Wilund
1a9fb4ed7f config: Modify/use server_encryption_options
* Mark option used
* Make sub-options adapted to seastar-tls usable values (i.e. x509)

Syntax is now:

server_encryption_options:
	internode_encryption: <none, all, dc, rack>
	certificate: <path-to-PEM-x509-cert> (default conf/scylla.crt)
	keyfile: <path-to-PEM-x509-key> (default conf/scylla.key)
	truststore: <path-to-PEM-trust-store-file> (default empty,
                                                    use system trust)
2015-12-28 10:10:35 +00:00
Calle Wilund
b7baa4d1f5 config: clean up some style + move method to cc file 2015-12-28 10:10:35 +00:00
Takuya ASADA
fc29a341d2 dist: show usage and scylla-server status when login to AMI instance
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 11:40:34 +02:00
Avi Kivity
827a4d0010 Merge "streaming: Invalidate cache upon receiving of stream" from Asias
"When a node gains or regains responsibility for certain token ranges, streaming
is performed; upon receipt of the stream data, the row cache
is invalidated for that range.

Refs #484."
2015-12-28 10:24:46 +02:00
Amnon Heiman
2c79fe1488 storage_service: describe_ring return full data
The describe_ring method in storage_service did not report the start and
end tokens.

Also, for rpc addresses that are not the local address, it returned the
value representation (including the version) and not just the address.

Fixes #695

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-28 09:56:12 +02:00
Takuya ASADA
0abcf5b3f3 dist: use readable time format on coredump file, instead of unix time
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:55:05 +02:00
Takuya ASADA
940c34b896 dist: don't abort scylla_coredump_setup when 'yum remove abrt' failed
It always fails when abrt is not installed.
This also fixes build_ami.sh failing because of this error.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-28 09:40:57 +02:00
Vlad Zolotarov
0f8090d6c7 tests: use steady_clock where monotonic clock is required
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore must not be assumed monotonic,
since the wall clock may move backwards due to time/date adjustments.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:08:15 +02:00
Vlad Zolotarov
33552829b2 core: use steady_clock where monotonic clock is required
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore must not be assumed monotonic,
since the wall clock may move backwards due to time/date adjustments.

Fixes issue #638

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:07:53 +02:00
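The point of the two patches above can be shown with a small timing helper (illustrative): steady_clock is guaranteed monotonic, whereas high_resolution_clock is typically an alias of system_clock and may jump backwards on wall-clock adjustments.

```cpp
#include <cassert>
#include <chrono>

// Measure elapsed time with std::chrono::steady_clock, which never goes
// backwards; durations measured this way are always non-negative.
template <typename Fn>
std::chrono::milliseconds time_it(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```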
Takuya ASADA
7f4a1567c6 dist: support non-ami boot parameter setup, add parameters for preallocate hugepages on boot-time
Fixes #172

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:56:49 +02:00
Takuya ASADA
6bf602e435 dist: setup ntpd on AMI
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-27 17:54:32 +02:00
Avi Kivity
2b22772e3c Merge "Introduce keep alive timer for stream_session" from Asias
"Fixes stream_session hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is gone,
   the peer node will wait forever"
2015-12-27 16:56:32 +02:00
Avi Kivity
f3980f1fad Merge seastar upstream
* seastar 51154f7...8b2171e (9):
  > memcached: avoid a collision of an expiration with time_point(-1).
  > tutorial: minor spelling corrections etc.
  > tutorial: expand semaphores section
  > Merge "Use steady_clock where monotonic clock is required" from Vlad
  > Merge "TLS fixes + RPC adaption" from Calle
  > do_with() optimization
  > tutorial: explain limiting parallelism using semaphores
  > submit_io: change pending flushes criteria
  > apps: remove defunct apps/seastar

Adjust code to use steady_clock instead of high_resolution_clock.
2015-12-27 14:40:20 +02:00
Avi Kivity
0687d7401d Merge "storage_service updates" from Asias
"
- Fix erase of new_replica_endpoints in get_changed_ranges_for_leaving
- Introduce ring_delay_ms option
"
2015-12-27 12:46:37 +02:00
Nadav Har'El
06f8dd4eb2 repair: job id must start at 1
This patch fixes a bug where the *first* run of "nodetool repair" always
returned immediately, instead of waiting for the repair to complete.

Repair operations are asynchronous: Starting a repair returns a numeric
id, which can then be used to query for the repair's completion, and this
is what "nodetool repair" does (through our JMX layer). We started with
the repair ID "0", the next one is "1", and so on.

The problem is that "nodetool repair", when it sees 0 being returned,
treats it not as a regular repair ID, but rather as an answer that
there is nothing to repair - printing a message to that effect and *not*
waiting for the repair (which was correctly started) to complete.

The trivial fix is to start our repair IDs at 1, instead of 0.
We currently do not return 0 in any case (we don't know there is nothing
to repair before we actually start the work, and parameter errors
cause an exception, not a return of 0).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-27 12:42:26 +02:00
Avi Kivity
93aeedf403 Merge "Fixes for CentOS/RHEL support" from Takuya
"Recent changes to scripts cause errors on CentOS/RHEL; this patchset fixes them."
2015-12-27 12:21:29 +02:00
Glauber Costa
e299127e81 main: check if the options file can be read.
If we can't open the file, we fail with a mysterious error. It is a common
scenario, though, since people who are unaware of, or have just forgotten about,
seastar's restriction of direct io access may put those files in tmpfs or other
mount points.

We have a direct_io check that is designed exactly for this purpose, to give
the user a better error message. This patch makes use of it.

Fixes #644

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-27 12:20:40 +02:00
Asias He
f57ba6902b storage_service: Introduce ring_delay_ms option
It is hard-coded as 30 seconds at the moment.

Usage:
$ scylla --ring-delay-ms 5000

Time a node waits to hear from other nodes before joining the ring in
milliseconds.

Same as -Dcassandra.ring_delay_ms in cassandra.
2015-12-25 15:08:22 +08:00
Asias He
9c07ed8db6 storage_service: Fix erase new_replica_endpoints in get_changed_ranges_for_leaving
We need to calculate begin() and end() in the loop since elements in
new_replica_endpoints might be removed.

Refs #700
2015-12-25 15:08:22 +08:00
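The class of bug fixed above (iterators stale after elements are removed during traversal) has a standard safe idiom, sketched here on a plain container (illustrative, not the storage_service code):

```cpp
#include <cassert>
#include <unordered_set>

// Safe erase-while-iterating: advance using the iterator that erase()
// returns instead of reusing an invalidated one or stale begin()/end().
inline void erase_even(std::unordered_set<int>& s) {
    for (auto it = s.begin(); it != s.end(); ) {
        if (*it % 2 == 0) {
            it = s.erase(it); // erase() returns the next valid iterator
        } else {
            ++it;
        }
    }
}
```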
Asias He
88846bc816 storage_service: Add more debug info in decommission
It is useful for debugging decommission issues.
2015-12-25 15:08:22 +08:00
Asias He
19f1875682 gossip: Print endpoint_state_map debug info in trace level
This generates too many logs with debug level. Make it trace level.
2015-12-25 15:08:22 +08:00
Nadav Har'El
06ab43a7ee murmur3 partitioner: fix midpoint() algorithm
The midpoint() algorithm to find a token between two tokens doesn't
work correctly in case of wraparound. The code tried to handle this
case, but did it wrong. So this patch fixes the midpoint() algorithm,
and adds clearer comments about why the fixed algorithm is correct.

This patch also modifies two midpoint() tests in partitioner_test,
which were incorrect - they verified that midpoint() returns some expected
values, but the expected values were wrong!

We also add to the test a more fundamental test of midpoint() correctness,
which doesn't check the midpoint against a known value (which is easy to
get wrong, as indeed happened); rather, we simply check that the midpoint
is really inside the range (according to the token ordering operator).
This simple test failed with the old implementation of midpoint() and
passes with the new one.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-24 17:19:49 +02:00
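A wraparound-safe midpoint over unsigned 64-bit tokens can be sketched as follows (illustrative only; Scylla's murmur3 tokens and their ordering differ in detail):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction yields the forward distance from left to right even
// when the range wraps past the maximum token; adding half of that distance
// back to left therefore always lands inside the (possibly wrapping) range.
inline uint64_t midpoint(uint64_t left, uint64_t right) {
    return left + (right - left) / 2;
}
```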
Avi Kivity
3392f02b54 Merge "Make date parser more liberal" from Paweł
"This series makes date and time parsing more liberal so that Scylla
accepts the same date formats the origin does.

Fixes #521."
2015-12-24 17:18:04 +02:00
Asias He
20c258f202 streaming: Fix session hang with maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE
The problem is that we set the session state to WAIT_COMPLETE in
send_complete_message's continuation; the peer node might send
COMPLETE_MESSAGE before we run the continuation, so we set the wrong
status in COMPLETE_MESSAGE's handler and will not close the session.

Before:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: PREPARING -> WAIT_COMPLETE
   GOT COMPLETE_MESSAGE Reply
   maybe_completed: WAIT_COMPLETE -> WAIT_COMPLETE

After:

   GOT STREAM_MUTATION_DONE
   receive  task_completed
   maybe_completed: PREPARING -> WAIT_COMPLETE
   SEND COMPLETE_MESSAGE to 127.0.0.2:0
   GOT COMPLETE_MESSAGE, from=127.0.0.2, connecting=127.0.0.3, dst_cpu_id=0
   complete: WAIT_COMPLETE -> COMPLETE
   Session with 127.0.0.2 is complete
2015-12-24 20:34:44 +08:00
Asias He
c971fad618 streaming: Introduce keep alive timer for each stream_session
If the session is idle for 10 minutes, close the session. This can
detect the following hangs:

1) if the sending node is gone, the receiving peer will wait forever
2) if the node which should send COMPLETE_MESSAGE to the peer node is
gone, the peer node will wait forever

Fixes simple_kill_streaming_node_while_bootstrapping_test.
2015-12-24 20:34:44 +08:00
Asias He
f527e07be6 streaming: Get stream_session in STREAM_MUTATION handler
Get the from address from cinfo. It is needed to figure out which stream
session this mutation belongs to, since we need to update the keep
alive timer for that stream session.
2015-12-24 20:34:44 +08:00
Asias He
d7a8c655a6 streaming: Print All sessions completed after state change message
close_session prints the "All sessions completed" message; print the
state change message before that.
2015-12-24 20:34:44 +08:00
Asias He
bd276fd087 streaming: Increase retry timeout
Currently, if the node is actually down, although the streaming_timeout
is 10 seconds, the sending of the verb will return rpc_closed error
immediately, so we give up in 20 * 5 = 100 seconds. After this change,
we give up in 10 * 30 = 300 seconds at least, and 10 * (30 + 30) = 600
seconds at most.
2015-12-24 20:34:44 +08:00
Asias He
eaea09ee71 streaming: Retransmit COMPLETE_MESSAGE message
It is a one-way message at the moment. If a COMPLETE_MESSAGE is lost, no
one will close the session. The first step in fixing the issue is to
retransmit the message.
2015-12-24 20:34:44 +08:00
Asias He
d1d6395978 streaming: Print old state before setting the new state 2015-12-24 20:34:44 +08:00
Takuya ASADA
bf9547b1c4 dist: support RHEL on scylla_install
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
bb0880f024 dist: use /etc/os-release instead of /etc/redhat-release
Since other scripts use /etc/os-release, it is better to use the same one here.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
b6df28f3d5 dist: use $ID instead of $NAME to detect type of distribution
$NAME is the full name of the distribution, which is too long for a script.
$ID is a shortened one, which is more useful.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
0a4b68d35e dist: support CentOS yum repository
Fixes #671

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-24 18:48:30 +09:00
Takuya ASADA
8f4e90b87a dist: use tsc clocksource on AMI
Stop using xen clocksource, use tsc clocksource instead.
Fixes #462

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-22 22:29:32 +02:00
Amnon Heiman
b0856f7acf API: Init value for cf_map reduce should be of type int64_t
The helper functions for summing statistics over the column families are
template functions that infer the return type according to the type of the
Init param.

In the API the return value should be int64_t; passing an integer would
cause a number wraparound.

A partial output from the nodetool cfstats after the fix

nodetool cfstats keyspace1
Keyspace: keyspace1
	Read Count: 0
	Read Latency: NaN ms.
	Write Count: 4050000
	Write Latency: 0.009178098765432099 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 12
		Space used (live): 1118617445
		Space used (total): 23336562465

Fixes #682

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-22 17:33:13 +02:00
Tomasz Grabiec
88f5da5d1d Merge branch 'calle/paging_fixes' from seastar-dev.git
From Calle:

Fixes #589
Query should not return dangling static row in partition without any
regular/ck columns if a CK restriction is applied.

Refs #650
Fixes a bug in the CK range code for paging, and removes CK use for tables with
no clustering -> way simpler code. Also removed lots of workaround code no longer
required.

Note that this patch set does not fully fix #650/paging since bug #663 causes
duplicate rows. Still almost there though.
2015-12-22 11:22:42 +01:00
Avi Kivity
926d340661 logger: be robust when exceptions are thrown while stringifying args
Instead of propagating the exception, swallow it and print it out in
the log message.

Fixes #672.
2015-12-21 19:58:08 +01:00
Paweł Dziepak
cf949e98cb tests/types: add more tests for date and time parsing
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:34:17 +01:00
Paweł Dziepak
633a13f7b3 types: timestamp_from_string: accept more date formats
Boost::date_time doesn't accept some of the date and time formats that
origin does (e.g. 2013-9-22 or 2013-009-22).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:30:35 +01:00
Calle Wilund
f118222b2d query_pagers: Remove unneeded clustering + remove static workaround
Refs #640

* Remove use of the cluster key range for tables without a CK
  Checking CK existence once and using that info allows us to remove some
  stupid complexity in checking for the "last key" match
* With the fix for #589 we can also remove some superfluous code that
  compensated for that issue, and make "partition end" simpler
* Remove extra row in CK case. Not needed anymore

End result is that pager now more or less only relies on adapted query
ranges.
2015-12-21 14:19:45 +00:00
Calle Wilund
72a079d196 paging_state: Make clustering key optional 2015-12-21 14:19:45 +00:00
Calle Wilund
c868d22d0c db/serializer: Add support for optional<T> to be serialized
Template specialization.

Simply just wraps underlying type serialization and adds a "bool"
check mark first in stream.
2015-12-21 14:19:45 +00:00
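The scheme described above, a "bool" presence flag followed by the wrapped type's own serialization, can be sketched for one concrete type (the names here are illustrative, not Scylla's serializer API):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Serialize optional<uint32_t> as: a one-byte presence flag, then (if the
// value is present) the value itself in little-endian byte order.
inline void write_optional_u32(std::vector<uint8_t>& out,
                               const std::optional<uint32_t>& v) {
    out.push_back(v.has_value() ? 1 : 0); // "bool" check mark first in stream
    if (v) {
        for (int i = 0; i < 4; ++i) {
            out.push_back(static_cast<uint8_t>(*v >> (8 * i)));
        }
    }
}
```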
Calle Wilund
803b58620f data_output: specialize serialized_size for bool to ensure sync with write 2015-12-21 14:19:45 +00:00
Paweł Dziepak
d41807cb66 types: timestamp_from_string(): restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:17:50 +01:00
Paweł Dziepak
873ed78358 types: catch parsing errors in timestamp_from_string()
timestamp_from_string() is used by both timestamp and date types, so it
is better to move the try { } catch { } into the function itself instead
of expecting its callers to catch exceptions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 15:14:36 +01:00
Takuya ASADA
0d1ef007d3 dist: skip mounting RAID if it's already mounted
On AMI, scylla-server fails to restart via systemctl because scylla_prepare tries to mount /var/lib/scylla even if it's already mounted.
This patch fixes the issue.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-21 15:50:09 +02:00
Avi Kivity
c3d0ae822d Merge seastar upstream
* seastar b44d729...51154f7 (6):
  > semaphore: add with_semaphore()
  > scripts: posix_net_conf.sh: don't transform wide CPU mask
  > resource: fix build for systems without HWLOC
  > build: link libasan before all other libraries
  > Use sys_membarrier() when available
  > build: add missing library (boost_filesystem)
2015-12-21 14:45:57 +02:00
Calle Wilund
8c17e9e26c mutation_partition: Do not return static row if CK range does not match
Fixes #589

If we got no rows but have live static columns, we should only
give them back IFF we did not have any CK restrictions.
If CKs exist and we have a restriction on them, we either have matching
rows, or return nothing, since cql does not allow "is null".
2015-12-21 10:38:48 +00:00
Pekka Enberg
98454b13b9 cql3: Remove some ifdef'd code 2015-12-21 10:38:48 +00:00
Pekka Enberg
c6541b4cc2 cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-21 10:38:48 +00:00
Pekka Enberg
81d72afd85 cql3: Move delete_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
cd58ea3b96 cql3: Move modification_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
bcd602d3f8 cql3: Move parsed_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
44ba4857eb cql3: Move property_definitions implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
2759473c7a cql3: Move select_statement implementation to source file 2015-12-21 10:38:48 +00:00
Pekka Enberg
7a5d6818a3 cql3: Move update_statement implementation to source file 2015-12-21 10:38:48 +00:00
Paweł Dziepak
9aa24860d7 test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable, with
the range's beginning and end being keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
Paweł Dziepak
2fd7caafa0 sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets, it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary, which was a condition not
expected by the code responsible for looking for the relevant keys inside
the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
d8e810686a sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
99710ae0e6 db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
e1edc2111c sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Raphael S. Carvalho
22ac260059 db: add missing sstable::mark_for_deletion call
If an sstable doesn't belong to the current shard, mark_for_deletion
should still be called so that the deletion manager keeps working.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, and thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-21 10:38:48 +00:00
Asias He
2d32195c32 streaming: Invalidate cache upon receiving of stream
When a node gains or regains responsibility for certain token ranges,
streaming is performed; upon receiving the stream data, the
row cache is invalidated for that range.

Refs #484.
2015-12-21 14:44:13 +08:00
Asias He
517fd9edd4 streaming: Add helper to get distributed<database> db 2015-12-21 14:42:47 +08:00
Asias He
d51227ad9c streaming: Remove transfer_files
It is never used.
2015-12-21 14:42:47 +08:00
Asias He
c25393a3f6 database: Add non-const version of get_row_cache
We need this to invalidate row cache of a column family.
2015-12-21 14:42:47 +08:00
Tomasz Grabiec
324ad43be1 Merge branch 'penberg/cql-cleanups/v1' from seastar-dev.git
Another round of cleanups to the CQL code from Pekka.
2015-12-18 17:36:45 +01:00
Tomasz Grabiec
0862d2f531 Merge branch 'pdziepak/fix-sstables-key_reader-663/v2'
From Paweł:

"This series fixes sstables::key_reader not respecting range inclusiveness
if the bounds were the keys that were present in the index summary.

Fixes #663."
2015-12-18 17:35:09 +01:00
Paweł Dziepak
b39d1fb1fc test/sstables: add more key_reader tests
This patch introduces a test for reading keys from a single sstable with
the range beginning and end being keys present in the index summary.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:29 +01:00
Paweł Dziepak
18b8d7cccc sstables: respect range inclusiveness in key_reader
When choosing the relevant range of buckets, the code did not take into
account whether the range bounds are inclusive. That may have resulted in
more buckets being read than necessary, a condition not expected by the
code responsible for looking for relevant keys inside the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:26 +01:00
Pekka Enberg
eeadf601e6 Merge "cleanups and improvements" from Raphael 2015-12-18 13:45:11 +02:00
Pekka Enberg
9521ef6402 cql3: Remove some ifdef'd code 2015-12-18 13:29:58 +02:00
Pekka Enberg
f5597968ac cql3: Remove untranslated IMeasurableMemory code from column_identifier
We will not be using it so just remove the untranslated code.
2015-12-18 13:29:58 +02:00
Pekka Enberg
b754de8f4a cql3: Move delete_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
227e517852 cql3: Move modification_statement implementation to source file 2015-12-18 13:29:58 +02:00
Pekka Enberg
ca963d470e cql3: Move parsed_statement implementation to source file 2015-12-18 13:07:55 +02:00
Pekka Enberg
ff994cfd39 cql3: Move property_definitions implementation to source file 2015-12-18 13:04:32 +02:00
Pekka Enberg
d7db5e91b6 cql3: Move select_statement implementation to source file 2015-12-18 12:59:22 +02:00
Pekka Enberg
8b780e3958 cql3: Move update_statement implementation to source file 2015-12-18 12:54:19 +02:00
Takuya ASADA
0f46d10011 dist: add execute permission to build_ami_local.sh
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:56:44 +02:00
Pekka Enberg
e56bf8933f Improve not implemented errors
Print out the function name where we're throwing the exception from to
make it easier to debug such exceptions.
2015-12-18 10:51:37 +01:00
Paweł Dziepak
73f9850e1c tests/key_reader: make sure that the reader lives long enough
Fixes test failure in debug mode.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 10:32:37 +01:00
Pekka Enberg
39af3ec190 Merge "Implement nodetool drain" from Paweł
"This series adds support for the nodetool command 'drain'. The general idea
 of this command is to close all connections (both with clients and other
 nodes) and flush all memtables to disk.

 Fixes #662."
2015-12-18 11:16:32 +02:00
Takuya ASADA
ae10d86ba4 dist: add missing building time dependencies for Ubuntu
This is necessary to make the --cpuset parameter work correctly.

Fixes #554

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:16:02 +02:00
Takuya ASADA
01bd4959ac dist: downgrade g++ to 4.9 on Ubuntu
Since the Ubuntu package fails to build with g++-5, we need to downgrade it.

Fixes #665

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:15:22 +02:00
Takuya ASADA
aad9c9741a dist: add hwloc as a dependency
It is required for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-18 11:14:07 +02:00
Paweł Dziepak
ae3e1374b4 test.py: add missing tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 19:08:21 +01:00
Pekka Enberg
89dcc5dfb3 Merge "dist: provide generic Scylla setup script" from Takuya
"Merge AMI scripts to dist/common/scripts, making them usable in non-AMI
 environments. Provides a script that does all the setup automatically and
 can be run as a one-liner like this:

   curl http://url_to_scylla_install | sudo bash -s -- -d /dev/xvdb,/dev/xvdc -n eth0 -l ./

 Also enables coredump, saving it to /var/lib/scylla/coredump"
2015-12-17 16:01:49 +02:00
Paweł Dziepak
39a65e6294 api: enable storage_service::drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9c0b7f9bbe storage_service: implement drain()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
dcbba2303e messaging_service: restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
9661d8936b messaging_service: wait for outstanding requests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
442bc90505 compaction_manager: check whether the manager is already stopped
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
25d255390e database: add non-const getter for compaction_manager
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
31672906d3 transport: wait for outstanding requests to end during shutdown
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Paweł Dziepak
8ee1a44720 storage_service: implement get_drain_progress()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
Paweł Dziepak
28e6edf927 transport: ignore future when stopping the server
When the server is shutting down, the _stopping flag is set and listeners
are aborted using abort_accept(), which causes accept() calls to return
failed futures. However, the accept handler just checks that the
_stopping flag is set and returns, which causes a failed future to be
destroyed and a warning to be printed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:40 +01:00
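The shutdown pattern described above — consuming the expected failure instead of dropping the failed future — can be sketched with Python's asyncio (purely illustrative; Seastar's futures differ in detail, and the exception type here is a stand-in):

```python
import asyncio

async def shutdown():
    # Simulate accept() returning a failed future after abort_accept().
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(ConnectionAbortedError("abort_accept"))
    # If `fut` were simply dropped here, its exception would never be
    # retrieved and the runtime would print a warning. Consuming the
    # expected failure explicitly keeps shutdown quiet:
    try:
        await fut
    except ConnectionAbortedError:
        pass  # expected while stopping the server
    return "clean shutdown"

print(asyncio.run(shutdown()))
```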
Takuya ASADA
f7796ef7b3 dist: host gcc-5.1.1-4.fc22.src.rpm on our S3 account, since Fedora mirror deleted it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 12:53:32 +02:00
Takuya ASADA
9b4d0592fa dist: enable coredump, save it to /var/lib/scylla/coredump
Enables coredump, saving it to /var/lib/scylla/coredump

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:27 +09:00
Takuya ASADA
d0e5f8083f dist: provide generic scylla setup script
Merge AMI scripts to dist/common/scripts, making them usable in non-AMI environments.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:03 +09:00
Takuya ASADA
768ad7c4b8 dist: add SET_NIC entry on sysconfig
Add SET_NIC parameter which is already used in scylla_prepare

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:46 +09:00
Takuya ASADA
de1277de29 dist: specify NIC ifname on sysconfig, pass it to posix_net_conf.sh
Support to specify IFNAME for posix_net_conf.sh

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:19:23 +09:00
Takuya ASADA
04d9a2a210 dist: add mdadm, xfsprogs on package dependencies
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 16:59:07 +09:00
Pekka Enberg
9604d55a44 Merge "Add unit test for get_restricted_ranges()" from Tomek 2015-12-17 09:14:30 +02:00
Avi Kivity
b34a1f6a84 Merge "Preliminary changes for handling of schema changes" from Tomasz
"I extracted some less controversial changes on which the schema changes series will depend
 to somewhat reduce the noise in the main series."
2015-12-16 19:08:22 +02:00
Tomasz Grabiec
e2037ebc62 schema: Fix operator==() to include missing fields 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
5a4d47aa1b schema: Remove dead code 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7a3bae0322 schema: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f9d6c7b026 compress: Add equality operators 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
adb93ef31f types: Make name() return const& 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
f28e5f0517 tests: mutation_assertions: Make is_equal_to() check symmetry 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
3324cf0b8c tests: mutation_reader_assertions: Introduce next_mutation() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
ad99f89228 tests: mutation_assertion: Introduce has_schema() 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
7451ab4356 tests: mutation_assertion: Allow chaining of assertions 2015-12-16 18:06:55 +01:00
Tomasz Grabiec
efe08a0512 tests: mutation_assertions: Own the mutation which is checked
Easier for users because they don't have to ensure liveness.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
0cdee6d1c3 tests: row_cache: Fix test_update()
The underlying data source for the cache should not be the same memtable
that is later used to update the cache. This fixes the following
assertion failure:

row_cache_test_g: utils/logalloc.hh:289: decltype(auto) logalloc::allocating_section::operator()(logalloc::region&, Func&&) [with Func = memtable::make_reader(schema_ptr, const partition_range&)::<lambda()>]: Assertion `r.reclaiming_enabled()' failed.

The problem is that when memtable is merged into cache their regions
are also merged, so locking cache's region locks the memtable region
as well.
2015-12-16 18:06:55 +01:00
Tomasz Grabiec
09188bccde mutation_query: Make reconcilable_result printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
dd51ff0410 query: Make query::result movable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
872bfadb3d messaging_service: Remove unused parameters from send_migration_request() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
157af1036b data_output: Introduce write_view() which matches data_input::read_view() 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
054187acf2 db/serializer: Introduce to_bytes/from_bytes helpers 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
e8d49a106c query_processor: Add trace-level logging of queries 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
de09c86681 data_value: Make printable 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
2ee60d8496 tests: sstable_test: Avoid throwing during expected conditions
Makes debugging easier by making 'catch throw' not stop on expected
conditions.
2015-12-16 18:06:54 +01:00
Tomasz Grabiec
ef49c95015 tests: cql_query_env: Avoid exceptions during normal execution 2015-12-16 18:06:54 +01:00
Tomasz Grabiec
50984ad8d4 scylla-gdb.py: Allow the script to be sourced multiple times
Currently sourcing for the second time causes an exception from
pretty printer registration:

Traceback (most recent call last):
  File "./scylla-gdb.py", line 41, in <module>
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
  File "/usr/share/gdb/python/gdb/printing.py", line 152, in register_pretty_printer
    printer.name)
RuntimeError: pretty-printer already registered: scylla
2015-12-16 18:06:51 +01:00
Avi Kivity
e27a5d97f6 Merge "background mutation throttling" from Gleb
Fixes the case where background activity needed to complete CL=ONE writes
is queued up in the storage proxy, and the client adds new work faster
than it can be cleared.
2015-12-16 18:08:12 +02:00
Raphael S. Carvalho
41be378ff1 db: fix build of sstable list in column_family::compact_sstables
The last two loops were incorrectly inside the first one. That's a
bug because a new sstable may be emplaced more than once in the
sstable list, which can cause several problems. mark_for_deletion
may also be called more than once for compacted sstables, however,
it is idempotent.
Found this issue while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 17:46:17 +02:00
Avi Kivity
4c84d23f3b Merge seastar upstream
"* seastar 294ea30...b44d729 (5):
  > Merge "Properly distribute IO queues" from Glauber
  > reactor: allow more poll time in virtualized environments
  > reactor: fix idle-poll limit
  > reactor: use a vector of unique_ptr for the IO queues
  > io queues: make the queues really part of the reactor"
2015-12-16 17:42:30 +02:00
Tomasz Grabiec
0d5166dcd8 tests: Add test for get_restricted_ranges() 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
e445e4785c storage_proxy: Extract get_restricted_ranges() as a free function
To make it directly testable.
2015-12-16 13:09:01 +01:00
Tomasz Grabiec
756624ef18 Remove dead code 2015-12-16 13:09:01 +01:00
Tomasz Grabiec
eb27fb1f6b range: Introduce equal() 2015-12-16 13:09:01 +01:00
Calle Wilund
43929d0ec1 commitlog: Add some comments about the IO flow
Documentation.
2015-12-16 13:13:31 +02:00
Gleb Natapov
de63b3a824 storage_proxy: provide timeout for send_mutation verb
Providing a timeout for the send_mutation verb allows rpc to drop packets
that sit in the outgoing queue for too long.
2015-12-16 10:13:46 +02:00
Gleb Natapov
fe4bc741f4 storage_proxy: throttle mutations based on ongoing background activity
With a consistency level less than ALL, mutation processing can move to
the background (meaning the client was answered, but there is still work
to do on behalf of the request). If the background completion rate is
lower than the incoming request rate, background requests will accumulate
and eventually exhaust all memory resources. This patch aims to prevent
that situation by monitoring how much memory all current background
requests take and, when some threshold is passed, stopping requests from
moving to the background (by not replying to a client until either memory
consumption moves below the threshold or the request fully completes).

There are two main points where each background mutation consumes memory:
holding the frozen mutation until the operation is complete (in order to
hint it if it does not), and the rpc queue to each replica, where it sits
until it's sent out on the wire. The patch accounts for both of those
separately and limits the former to 10% of total memory and the latter
to 6M. Why 6M? The best answer I can give is why not :) But on a more
serious note, the number should be small enough that all the data can be
sent out in a reasonable amount of time; a single shard cannot achieve
anything close to full bandwidth, and empirical evidence shows 6M to be
a good number.
2015-12-16 10:13:46 +02:00
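The memory accounting described above can be sketched as follows (an illustrative Python sketch; the class, method names, and threshold are invented for the example and do not reflect storage_proxy's actual API):

```python
class BackgroundWriteThrottle:
    """Track memory held by in-flight background mutations and decide
    whether a new write may be acknowledged before it fully completes."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.in_flight_bytes = 0

    def start(self, mutation_bytes):
        self.in_flight_bytes += mutation_bytes

    def finish(self, mutation_bytes):
        self.in_flight_bytes -= mutation_bytes

    def may_move_to_background(self):
        # Below the threshold: reply to the client now and finish
        # replication in the background. At or above it: hold the
        # reply until the request fully completes.
        return self.in_flight_bytes < self.threshold_bytes

throttle = BackgroundWriteThrottle(threshold_bytes=100)
throttle.start(60)
assert throttle.may_move_to_background()       # 60 < 100
throttle.start(60)
assert not throttle.may_move_to_background()   # 120 >= 100: apply backpressure
throttle.finish(60)
assert throttle.may_move_to_background()       # back below the threshold
```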
Pekka Enberg
40e8a9c99c sstables/compaction: Fix compilation error with GCC 4.9.2
I am sure it's a compiler issue but I am not ready to give up and
upgrade just yet:

  sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’
                 stats->start_size, stats->end_size, {});
2015-12-16 10:03:14 +02:00
Raphael S. Carvalho
36d31a5dab fix cql_query_test
Test was failing because _qp (distributed<cql3::query_processor>) was stopped
before _db (distributed<database>).
Compaction manager is member of database, and when database is stopped,
compaction manager is also stopped. After a2fb0ec9a, compaction updates the
system table compaction history, and that requires a working query context.
We cannot simply move _qp->stop() to after _db->stop() because the former
relies on migration_manager and storage_proxy. So the most obvious fix is to
clean the global variable that stores query context after _qp was stopped.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 09:58:46 +02:00
Nadav Har'El
63c0906b16 messaging_service: drop unnecessary explicit templates
The previous patch added message_service read()/write() support for all
types which know how to serialize themselves through our "old" serialization
API (serialize()/deserialize()/serialized_size()).

So we no longer need the almost 200 lines of repetitive code in
messaging_service.{cc,hh} which defined these read/write templates
separately for a dozen different types using their *serialize() methods.
We also no longer need the helper functions read_gms()/write_gms(), which
are basically the same code as that in the template functions added in the
previous patch.

Compilation is not significantly slowed down by this patch, because it
merely replaces a dozen templates by one template that covers them all -
it does not add new template complexity, and these templates are anyway
instantiated only in messaging_service.cc (other code only calls specific
functions defined in messaging_service.cc, and does not use these templates).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
Nadav Har'El
438f6b79f7 messaging_service: allow any self-serializing type
Currently, messaging_service only supports sending types for which a read/
write function has been explicitly implemented in messaging_service.hh/cc.

Some types already have serialization/deserialization methods inside them,
and those could have been used for the serialization without having to write
new functions for each of these types. Many of these types were already
supported explicitly in messaging_service.{cc,hh}, but some were forgotten -
for example, dht::token.

So this patch adds a default implementation of messaging_service write()/read()
which will work for any type which has these serialization methods.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2015-12-15 19:07:05 +02:00
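The "default read/write that works for any self-serializing type" can be illustrated with duck typing (a Python sketch; the real code is a C++ template dispatching on the serialize()/deserialize()/serialized_size() methods, and the Token class here is a made-up stand-in for a type like dht::token):

```python
def write(out, obj):
    """Generic fallback: defer to the object's own serializer instead
    of requiring an explicit overload per type."""
    if hasattr(obj, "serialize"):
        obj.serialize(out)
    else:
        raise TypeError("no serializer for " + type(obj).__name__)

class Token:
    # Hypothetical self-serializing type.
    def __init__(self, value):
        self.value = value

    def serialize(self, out):
        out.append(("token", self.value))

buf = []
write(buf, Token(42))
assert buf == [("token", 42)]
```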
Tomasz Grabiec
a78f4656e8 Introduce ring_position_less_comparator 2015-12-15 18:00:55 +01:00
Avi Kivity
8fc7583224 Merge seastar upstream
* seastar 5b9e3da...294ea30 (9):
  > Merge "IO queues" from Glauber
  > reactor: increment check_direct_io_support to also deal with files
  > Merge "SSL/TLS initial certificate validation" from Calle
  > tutorial.md: remove inaccurate statements about x86
  > build: verify that the installed compiler is up to date
  > build: complain if fossil version of gnutls is installed
  > build: fix debian naming of gnutls-devel package
  > build: add configure-time check for gnutls-devel
  > tutorial.md: introduction to asynchronous programming
2015-12-15 16:50:16 +02:00
Gleb Natapov
e43ae7521f storage_proxy: unfuturize send_to_live_endpoints()
send_to_live_endpoints() is never waited upon; it does its job in the
background. This patch formalizes that by changing the return value to void
and also refactoring the code so that the frozen_mutation shared pointer is
not held longer than it should be: currently it is held until send_mutation()
completes, but since send_mutation() does not use the frozen_mutation
asynchronously, this is not necessary.
2015-12-15 15:40:36 +02:00
Tomasz Grabiec
305c2b0880 frozen_mutation: Introduce decorated_key() helper
Requested by Asias for use in streaming code.
2015-12-15 15:16:04 +02:00
Tomasz Grabiec
179b587d62 Abstract timestamp creation behind new_timestamp()
Replace db_clock::now_in_usec() and db_clock::now() * 1000 accesses
where the intent is to create a new auto-generate cell timestamp with
a call to new_timestamp(). Now the knowledge of how to create timestamps
is in a single place.
2015-12-15 15:16:04 +02:00
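Centralizing timestamp creation as described above can be sketched like this (a Python illustration of the idea; the actual new_timestamp() is a C++ function, and microseconds-since-epoch matches the db_clock usages it replaces):

```python
import time

def new_timestamp():
    """Single place that knows auto-generated cell timestamps are
    expressed in microseconds since the epoch."""
    return int(time.time() * 1_000_000)

# All call sites that previously did db_clock-style arithmetic inline
# now share one definition:
t1 = new_timestamp()
t2 = new_timestamp()
assert t2 >= t1  # wall-clock timestamps are non-decreasing in practice
```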
Avi Kivity
8abd013601 Merge 2015-12-15 15:00:49 +02:00
Avi Kivity
fb8a4f6c1b Merge " implement get_compactions API" from Raphael
"get_compactions returns progress information for each compaction
running in the system. It can be accessed using swagger UI.
'nodetool compactionstats' is not working yet because of some
pending work in the nodetool side."
2015-12-15 14:59:49 +02:00
Paweł Dziepak
71f92c4d14 mutation_partition: do not move rows_entry::_link
Apparently, the link hook copy constructor is a no-op and a move constructor
doesn't exist, so the code is correct, but the explicit move makes the code
needlessly confusing.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:22:23 +01:00
Paweł Dziepak
59245e7913 row_cache: add functions for invalidating entries in cache
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-15 13:21:11 +01:00
Raphael S. Carvalho
833a78e9f7 api: implement get_compactions
get_compactions returns progress information about each
ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:36 -02:00
Raphael S. Carvalho
193ede68f3 compaction: register and deregister compaction_stats
That's important for the compaction stats API, which will need stats
data for each ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:32 -02:00
Raphael S. Carvalho
e74dcc86bd compaction_manager: introduce list of compaction_stats
This list will store compaction_stats for each ongoing compaction.
That's why register and deregister methods are provided.
This change is important for compaction stats API that needs data
of each ongoing compaction, such as progress, ks, cf, etc.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:28 -02:00
Raphael S. Carvalho
a26fb15d1a db: add method to get compaction manager from cf
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:20 -02:00
Raphael S. Carvalho
1fba394dd0 sstables: store keyspace and cf in compaction_stats
The reason behind this change is that we will need ks and cf
for the compaction stats API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:02 -02:00
Raphael S. Carvalho
ac1a67c8bc sstables: move compaction_stats to header file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:49:45 -02:00
Avi Kivity
a2eac711cf Merge "compaction history support" from Raphael
"This patchset will make Scylla update the system table
COMPACTION_HISTORY whenever a compaction job finishes.
Functions were added to both update and retrieve the
content of this system table. Compaction history API
is also enabled in this series."
2015-12-15 13:22:14 +02:00
Raphael S. Carvalho
87fbe29cf9 api: add support to compaction history
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:00:21 -02:00
Takuya ASADA
9c5afb8e58 dist: add scylla-gdb.py on scylla-server-debuginfo rpm package
It will be placed at /usr/src/debug/scylla-server-development/scylla-gdb.py
Fixes #604

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-15 12:13:17 +02:00
Raphael S. Carvalho
a2fb0ec9a3 sstables: update compaction history at the end of compaction
When a compaction job finishes, call the function to update the system
table COMPACTION_HISTORY. That's also needed for the compaction
history API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:20:03 -02:00
Raphael S. Carvalho
433ed60ca3 db: add method to get compaction history
This method is intended to return content of the system table
COMPACTION_HISTORY as a vector of compaction_history_entry.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:19:04 -02:00
Raphael S. Carvalho
f3beacac28 db: add method to update the system table COMPACTION_HISTORY
It's supposed to be called at the end of compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 13:47:10 -02:00
Raphael S. Carvalho
0fa194c844 sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:53 -02:00
Raphael S. Carvalho
6142efaedb db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:34 -02:00
Raphael S. Carvalho
81f5b1716e sstables: fix comment describing sstable::mark_for_deletion
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:11 -02:00
Raphael S. Carvalho
7bbc1b49b6 db: add missing sstable::mark_for_deletion call
If an sstable doesn't belong to the current shard, mark_for_deletion
should still be called so that the deletion manager keeps working.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, and thus it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:42:26 -02:00
Tomasz Grabiec
0865ecde17 storage_proxy: Fix range splitting
There is a check whose intent was to detect wrap-around during a walk of
the ring tokens by comparing the split point with the minimum token, which
is supposed to be inserted by the ring iterator. It assumed that when we
encounter it, the range is a wrap-around. That doesn't hold when the
minimum token is part of the token metadata or the set of tokens is empty.

In such a case, a full range would be split into 3 overlapping full
ranges. The fix is to drop the assumption and instead ensure that
ranges do not wrap around by unwrapping them if necessary.

Fixes #655.
2015-12-14 16:05:54 +02:00
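Unwrapping a wrap-around range, as the fix does, can be sketched like this (an illustrative Python sketch with small integer tokens; Scylla's token and range types are richer than this):

```python
MIN_TOKEN, MAX_TOKEN = 0, 99  # made-up token ring bounds

def unwrap(start, end):
    """Split a range that wraps around the ring (start > end) into
    two non-wrapping ranges; a non-wrapping range is returned as-is."""
    if start <= end:
        return [(start, end)]
    return [(start, MAX_TOKEN), (MIN_TOKEN, end)]

assert unwrap(10, 40) == [(10, 40)]
assert unwrap(80, 20) == [(80, 99), (0, 20)]  # wraps past the ring end
```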
Takuya ASADA
3b7693feda dist: add package dependency to gnutls library
Now that Seastar depends on gnutls, we need to add it as an .rpm/.deb package dependency.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-14 13:28:28 +02:00
Pekka Enberg
ba09c545fc dist/docker: Enable SMP support
Now that Scylla has a sleep mode, we can enable SMP support again.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-14 13:23:30 +02:00
Avi Kivity
fd14cb3743 mutation_partition: fix leak in move assignment operator
The default move assignment operator calls boost::intrusive::set's move
assignment operator, which leaks, because it does not believe it owns
the data.

Fix by providing a custom implementation.
2015-12-14 10:33:19 +01:00
Asias He
9781e0d34d storage_service: Make bootstrapping/leaving/moving log more consistent
It is useful for test code to grep the log.
2015-12-11 13:57:40 +02:00
Tomasz Grabiec
1991fd5ca2 Merge branch 'pdziepak/fix-clustering-key-comparison/v2' from seastar-dev.git
From Paweł:

This series fixes comparison of byte order comparable clustering keys.

Fixes #645.
2015-12-11 12:51:02 +01:00
Paweł Dziepak
3a73496817 tests/cql: add test for ordering clustering keys
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:05:25 +01:00
Paweł Dziepak
8cab343895 compound: fix compare() of prefixable types
All components of a prefixable compound type are preceded by their
length, which makes them not byte-order comparable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 12:04:31 +01:00
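Why length prefixes break byte-order comparability can be shown in a few lines (a Python sketch; the 2-byte big-endian prefix is a hypothetical encoding chosen for illustration):

```python
def length_prefixed(component: bytes) -> bytes:
    # Each component preceded by a 2-byte big-endian length.
    return len(component).to_bytes(2, "big") + component

# Lexicographically "aa" sorts before "b"...
assert b"aa" < b"b"
# ...but the length prefix compares first, so the encoded forms sort
# the other way round — the encoding is not byte-order comparable:
assert length_prefixed(b"aa") > length_prefixed(b"b")
```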
Paweł Dziepak
8fd4b9f911 schema: remove _clustering_key_prefix_type
All clustering keys are now prefixable.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:47:24 +01:00
Paweł Dziepak
bb9a71f70c thrift: let class_from_compound_type() accept prefixable types
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-11 10:45:56 +01:00
Pekka Enberg
0d8a02453e types: Fix frozen collection type names
Frozen collection type names must be wrapped in FrozenType so that we
are able to store the types correctly in system tables.

This fixes #646 and fixes #580.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 10:41:11 +01:00
Pekka Enberg
63bdeb65f2 cql3: Implement maps::literal::test_assignment() function
The test_assignment() function is invoked via the Cassandra unit tests,
so we might as well implement it.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-12-11 09:35:13 +01:00
Asias He
57ee9676c2 storage_service: Fix default ring_delay time
It is 30 seconds instead of 5 seconds by default, to align with C*.

Please note that after this change a node takes at least 30 seconds to
complete a bootstrap.
2015-12-11 09:05:19 +02:00
Avi Kivity
b3cd672d97 Merge seastar upstream
* seastar ad07a2e...5b9e3da (2):
  > Merge "rpc cleanups and improvements" from Gleb
  > shared_future: Add missing include
2015-12-10 18:11:59 +02:00
Paweł Dziepak
9d482532f4 tests/lsa: reduce the size of large allocation
Originally, the large allocation test case attempted to allocate an object
as big as half of the space used by the lsa. That failed when the test
was executed with a lower amount of memory available, mainly due to
memory fragmentation caused by previous test cases.

This patch reduces the size of the large allocation to 3/8 of the
total space used by the lsa, which is still a lot but seems to make the
test pass even with as little memory as 64MB per shard.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 13:16:43 +01:00
Avi Kivity
d425aacaeb release: copy version string into heap
If we get a core dump from a user, it is important to be able to
identify its version.  Copy the release string into the heap (which is
copied into the core dump), so we can search for it using the "strings"
or "ident" commands.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2015-12-10 13:12:40 +02:00
Lucas Meneghel Rodrigues
2167173251 utils/logalloc.cc - Declare member minimum_size from segment_zone struct
This fixes compile error:

In function `logalloc::segment_zone::segment_zone()':
/home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-10 12:54:34 +02:00
Asias He
b7d10b710e streaming: Propagate fail to send PREPARE_DONE_MESSAGE exception
Otherwise the stream_plan will not be marked as failed.
2015-12-10 12:38:00 +02:00
Paweł Dziepak
ec453c5037 managed_bytes: fix potentially unaligned accesses
blob_storage is defined with attribute packed, which makes its alignment
requirement equal to 1. This means that its members may be unaligned.
GCC is aware of that and will generate appropriate code
(and not generate ubsan checks). However, there are a few places where
members of blob_storage are accessed via pointers; these have to be
wrapped in unaligned_cast<> to let the compiler know that the location
pointed to may not be aligned properly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 11:59:54 +02:00
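The effect of attribute packed can be seen with Python's struct module ('<' uses standard sizes with no padding, much like __attribute__((packed)); '@' uses native alignment, like a normal C struct — illustrative only, not Scylla code):

```python
import struct

# One byte followed by an 8-byte integer:
packed_size = struct.calcsize("<BQ")  # no padding: the Q sits at offset 1
native_size = struct.calcsize("@BQ")  # padding aligns the Q to 8 bytes

assert packed_size == 9
assert native_size >= packed_size  # typically 16 on 64-bit platforms
```

With the packed layout, the 8-byte field lives at an odd offset — exactly the situation where pointer accesses need an unaligned-aware cast.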
Tomasz Grabiec
43498b3158 Merge branch 'pdziepak/fix-partial-clustering-keys/v1' from seastar-dev.git
From Paweł:

This series fixes support for clustering keys whose trailing components
are null. The solution is to use clustering_key_prefix instead of
clustering_key everywhere.

Fixes #515.
2015-12-10 10:43:12 +01:00
Paweł Dziepak
66ff1421f0 tests/cql: add test for clustering keys with empty components
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:47:07 +01:00
Paweł Dziepak
64f50a4f40 db: make clustering_key a prefix
Schemas using compact storage can have clustering keys with the trailing
components not set, effectively being clustering key prefixes
instead of full clustering keys.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:47 +01:00
Paweł Dziepak
77c7ed6cc5 keys: add prefix_equality_less_compare for prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
220a3b23c0 keys: allow creating partial views of prefixes
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3c16ab080a sstables: do not assume clustering_key has the proper format
In the case of non-compound dense tables, the column name is just the value
of the clustering key (which has only one component). The current code just
casts the clustering_key to bytes_view, which works because there is no
additional metadata in single-element clustering keys.
However, that may change when the internal representation of the clustering
key is changed, so explicitly extract the proper component.

This change will become necessary when clustering_key is replaced by
clustering_key_prefix.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
5f1e9fd88f mutation_partition: remove unused find_entry()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Paweł Dziepak
3287022000 cql3: do not assume that clustering key is full
In case of schemas that use compact storage it is possible that trailing
components of clustering keys are not set.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 05:46:26 +01:00
Avi Kivity
167addbfe1 main: remove issue #417 (poll mode) warning
Fixed.
2015-12-09 19:00:32 +02:00
Avi Kivity
a352d63bf9 Merge seastar upstream
* seastar c5e595b...ad07a2e (1):
  > reactor: add command line option to disable sleep mode

Fixes #417
2015-12-09 19:00:20 +02:00
Glauber Costa
3c988e8240 perf_sstable: use current scylla default directory
When this tool was written, we were still using /var/lib/cassandra as a default
location. We should update it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2015-12-09 17:46:31 +02:00
Avi Kivity
01c3670def Merge seastar upstream
* seastar 5dc22fa...c5e595b (3):
  > memory: be less strict about NUMA bindings
  > reactor: let the resource code specify the default memory reserve
  > resource: reserve even more memory when hwloc is compiled in

Fixes #642
2015-12-09 16:47:47 +02:00
Asias He
66938ac129 streaming: Add retransmit logic for streaming verbs
Retransmit streaming-related verbs and give up after 5 minutes.

Tested with:

  lein test :only cassandra.batch-test/batch-halves-decommission

Fixes #568.
2015-12-09 15:12:36 +02:00
Avi Kivity
14794af260 Merge seastar upstream
* seastar 9f9182e...5dc22fa (1):
  > future: add repeat_until_value(): repeat an action until it returns a value
2015-12-09 15:11:59 +02:00
Avi Kivity
213700e42f Merge seastar upstream
* seastar d40453b...9f9182e (5):
  > Merge "Sleep mode support"
  > future: add futurize<T>::from_tuple(tuple<T>)
  > tls: Add missing destructor for dh_params::impl, fixes ASAN error
  > tls/socket fix: Add missing noexcept to constructor/move
  > Merge "Initial SSL/TLS socket support" from Calle
2015-12-09 11:01:13 +02:00
Avi Kivity
204610ac61 Merge "Make LSA more large-allocation-friendly" from Paweł
"This series attempts to make LSA more friendly for large (i.e. bigger
than LSA segment) allocations. It is achieved by introducing segment
zones – large, contiguous areas of segments and using them to allocate
segments instead of calling malloc() directly.
Zones can be shrunk when needed to reclaim memory, and segments can be
migrated either to reduce the number of zones or to defragment a zone
so that it can be shrunk. LSA tries to keep all segments at the lower
addresses and reclaims memory starting from the zones in the highest
parts of the address space."
2015-12-09 10:49:23 +02:00
Avi Kivity
883074e936 Merge "Fix replace_node support" from Asias
Also:

[PATCH scylla v1 0/7] gossip mark node down fix + cleanup
[PATCH scylla v1 0/2] Refuse decommissioned node to rejoin
[PATCH scylla] storage_service: Fix added node not showing up in nodetool in status joining
2015-12-09 10:42:52 +02:00
Paweł Dziepak
8ba66bb75d managed_bytes: fix copy size in move constructor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-09 10:38:28 +02:00
Asias He
b63d49c773 storage_service: Log removing replaced endpoint from system.peers
This info is important when replacing a node. Useful for debugging.
2015-12-09 12:30:52 +08:00
Asias He
d26c7e671d storage_service: Enable commented out code in handle_state_normal
Add current_owner to endpoints_to_remove if endpoint and current_owner
have the same token and endpoint is newer than current_owner.
2015-12-09 12:30:52 +08:00
Asias He
3793bb7be1 token_metadata: Add get_endpoint_to_token_map_for_reading 2015-12-09 12:30:52 +08:00
Asias He
1cc7887ffb token_metadata: Do nothing if tokens is empty.
When replacing a node, we might ignore the tokens, leaving the token
set empty. In this case, we will have

   std::unordered_map<inet_address, std::unordered_set<token>> = {ip, {}}

passed to token_metadata::update_normal_tokens(std::unordered_map<inet_address,
std::unordered_set<token>>& endpoint_tokens)

and hit the assert

   assert(!tokens.empty());
2015-12-09 12:30:52 +08:00
Asias He
e79c85964f system_keyspace: Flush system.peers in remove_endpoint
1) Start node 1, node 2, node 3
2) Stop  node 3
3) Start node 4 to replace node 3
4) Kill  node 4 (removal of node 3 in system.peers is not flushed to disk)
5) Start node 4 (will load node 3's token and host_id info in bootup)

This makes

   "Token .* changing ownership from 127.0.0.3 to 127.0.0.4"

messages printed again in step 5) which are not expected, which fails the dtest

   FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "scylla-dtest/replace_address_test.py",
   line 220, in replace_first_boot_test
       self.assertEqual(len(movedTokensList), numNodes)
   AssertionError: 512 != 256
2015-12-09 12:30:52 +08:00
Asias He
110a18987e token_metadata: Print Token changing ownership from
Needed by test.
2015-12-09 12:30:52 +08:00
Asias He
906f670a86 gossip: Print node status in handle_major_state_change
It is useful to know the STATUS value when debugging.
2015-12-09 12:29:15 +08:00
Asias He
a0325a5528 gossip: Simplify is_shutdown and friends.
Use the newly added helper get_gossip_status.
2015-12-09 12:29:15 +08:00
Asias He
9d4382c626 gossip: Introduce get_gossip_status
Get value of application_state::STATUS.
2015-12-09 12:29:15 +08:00
Asias He
5a65d8bcdd gossip: Fix endless marking a node down
Commit 56df32ba56 (gossip: Mark node as dead even if already left)
missed a node liveness check.

Fix it up.

Before: (mark a node down multiple times)

[Tue Dec  8 12:16:33 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:33 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:34 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:34 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:35 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:35 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
[Tue Dec  8 12:16:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:16:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead

After: (mark a node down only one time)

[Tue Dec  8 12:28:36 2015] INFO  [shard 0] gossip - InetAddress 127.0.0.3 is now DOWN
[Tue Dec  8 12:28:36 2015] DEBUG [shard 0] storage_service - endpoint=127.0.0.3 on_dead
2015-12-09 12:29:15 +08:00
Asias He
fa3c84db10 gossip: Kill default constructor for versioned_value
The only reason we needed it is to make
   _application_state[key] = value
work.

With the current default constructor, we increase the version number
needlessly. To fix this and to be safe, remove the default constructor
completely.
2015-12-09 12:29:15 +08:00
Asias He
52a5e954f9 gossip: Pass const ref for versioned_value in on_change and before_change 2015-12-09 12:29:15 +08:00
Asias He
3308430343 storage_service: Make before_change and on_change log print more informative
- Make before_change and on_change print the versioned_value
- Print endpoint address first in handle_state_* and
  on_change and friends.
2015-12-09 12:29:15 +08:00
Asias He
ccbd801f40 storage_service: Fix decommissioned nodes being willing to rejoin the cluster if restarted
Backport: CASSANDRA-8801

a53a6ce Decommissioned nodes will not rejoin the cluster.

Tested with:
topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test
2015-12-09 10:43:51 +08:00
Asias He
b3dd2d976a storage_service: Simplify prepare_to_join with seastar thread 2015-12-09 10:43:51 +08:00
Asias He
e9a4d93d1b storage_service: Fix added node not showing up in nodetool in status joining
The get_token_endpoint API should return a map of tokens to endpoints,
including the bootstrapping ones.

Use get_local_storage_service().get_token_to_endpoint_map() for it.

$ nodetool -p 7100 status

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID Rack
UN  127.0.0.1  12645      256     ?  eac5b6cf-5fda-4447-8104-a7bf3b773aba  rack1
UN  127.0.0.2  12635      256     ?  2ad1b7df-c8ad-4cbc-b1f1-059121d2f0c7  rack1
UN  127.0.0.3  12624      256     ?  61f82ea7-637d-4083-acc9-567e0c01b490  rack1
UJ  127.0.0.4  ?          256     ?  ced2725e-a5a4-4ac3-86de-e1c66cecfb8d  rack1

Fixes #617
2015-12-09 10:43:51 +08:00
Paweł Dziepak
63bdf52803 tests/lsa: add large allocations test
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 23:56:46 +01:00
Tomasz Grabiec
d68a8b5349 Merge branch 'dev/amnon/index_summary_size_v2' from seastar-dev.git
API for getting sstable index summary memory footprint from Amnon
2015-12-08 20:03:39 +01:00
Paweł Dziepak
73a1213160 scylla-gdb.py: print lsa zones
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
0d66300d43 lsa: add more counters
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently, which could
result in high memory fragmentation. As a result, many compaction and
eviction passes may be needed to release a sufficiently big contiguous
memory block.

These problems are solved by introduction of segment zones, contiguous
groups of segments. All segments are allocated from zones and the
algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside zone.

Segment zones can be shrunk but cannot grow. The segment pool keeps a
tree containing all zones ordered by their base addresses. This tree is
used only by the memory reclaimer. There is also a list of zones that
have at least one free segment, which is used during allocation.

Segment allocation doesn't have any preference as to which segment (and
zone) to choose. Each zone contains a free list of unused segments. If
there are no zones with free segments, a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
6c4a54fb0b tests: add tests for utils::dynamic_bitset
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2fb14a10b6 utils: add dynamic_bitset
A dynamic bitset implementation that provides functions to search for
both set and cleared bits in both directions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
40dda261f2 lsa: maintain segment to region mapping
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
c4e71bac7f tests/row_cache_alloc_stress: make sure that allocation fails
Currently test case "Testing reading when memory can't be reclaimed."
assumes that the allocation section used by row cache upon entering
will require more free memory than there is available (inc. evictable).
However, the reserves used by allocation section are adjusted
dynamically and depend solely on previous events. In other words there
is no guarantee that the reserve would be increased so much that the
allocation will fail.

The problem is solved by adding another allocation that is guaranteed
to be bigger than all evictable and free memory.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2e94086a2c lsa: use bi::list to implement segment_stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Tomasz Grabiec
6ead7a0ec5 Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git
Scattering of blobs from Avi:

This patchset converts the stack to scatter managed_bytes in lsa memory,
allowing large blobs (and collections) to be stored in memtable and cache.
Outside memtable/cache, they are still stored sequentially, but it is assumed
that the number of transient objects is bounded.

The approach taken here is to scatter managed_bytes data in multiple
blob_storage objects, but to linearize them back when accessing (for
example, to merge cells).  This allows simple access through the normal
bytes_view.  It causes an extra two copies, but copying a megabyte twice
is cheap compared to accessing a megabyte's worth of small cells, so
per-byte throughput is increased.

Testing shows that lsa large object space is kept at zero, but throughput
is bad because Scylla easily overwhelms the disk with large blobs; we'll
need Glauber's throttling patches or a really fast disk to see good
throughput with this.
2015-12-08 19:15:13 +01:00
Avi Kivity
5c5331d910 tests: test large blobs in memtables 2015-12-08 15:17:09 +02:00
Avi Kivity
0c2fba7e0b lsa: advertise our preferred maximum allocation size
Let managed_bytes know that allocating below a tenth of the segment size is
the right thing to do.
2015-12-08 15:17:09 +02:00
Avi Kivity
f9e2a9a086 mutation_partition: work on linearized atomic_cell_or_mutation objects
Ensure that when we examine atomic_cell_or_mutation objects for
merging, they are contiguous in memory.  When we are done we scatter
them again.
2015-12-08 15:17:09 +02:00
Avi Kivity
ad975ad629 atomic_cell_or_collection: linearize(), unlinearize()
Add linearize() and unlinearize() methods that allow making an
atomic_cell_or_collection object temporarily contiguous, so we can examine
it as a bytes_view.
2015-12-08 15:17:09 +02:00
Avi Kivity
13324607e6 managed_bytes: conform to allocation_strategy's max_preferred_allocation_size
Instead of allocating a single blob_storage, chain multiple blob_storage
objects in a list, each limited not to exceed the allocation_strategy's
max_preferred_allocation_size.  This allows lsa to allocate each blob_storage
object as an lsa managed object that can be migrated in memory.

Also provide linearize()/scatter() methods that can be used to temporarily
consolidate the storage into a single blob_storage.  This makes the data
contiguous, so we can use a regular bytes_view to examine it.
2015-12-08 15:17:08 +02:00
Amnon Heiman
3ce7fa181c API: Add the implementation for index_summary_off_heap_memory
This adds the implementation for the index_summary_off_heap_memory for a
single column family and for all of them.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 15:15:39 +02:00
Amnon Heiman
e786f1d02f sstable: Add get_summary function
The get_summary method returns a const reference to the summary object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Amnon Heiman
bae286a5b4 Add memory_footprint method to summary_ka
Similar to Origin's off-heap memory, memory_footprint is the size of
the queues multiplied by the structure size.

memory_footprint is used by the API to report the memory that is taken
by the summary.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 14:52:18 +02:00
Avi Kivity
2437fc956c allocation_strategy: expose preferred allocation size limit
Our premier allocation_strategy, lsa, prefers to limit allocations below
a tenth of the segment size so they can be moved around; larger allocations
are pinned and can cause memory fragmentation.

Provide an API so that objects can query for this preferred size limit.

For now, lsa is not updated to expose its own limit; this will be done
after the full stack is updated to make use of the limit, or intermediate
steps will not work correctly.
2015-12-06 16:23:42 +02:00
177 changed files with 5224 additions and 1895 deletions

.gitmodules vendored

@@ -1,6 +1,6 @@
 [submodule "seastar"]
 path = seastar
-url = ../scylla-seastar
+url = ../seastar
 ignore = dirty
 [submodule "swagger-ui"]
 path = swagger-ui


@@ -1,6 +1,6 @@
 #!/bin/sh
-VERSION=0.13.2
+VERSION=0.14.1
 if test -f version
 then


@@ -425,7 +425,7 @@
 "summary":"load value. Keys are IP addresses",
 "type":"array",
 "items":{
-"type":"mapper"
+"type":"double_mapper"
 },
 "nickname":"get_load_map",
 "produces":[
@@ -797,8 +797,72 @@
"paramType":"path"
},
{
"name":"options",
"description":"Options for the repair",
"name":"primaryRange",
"description":"If the value is the string 'true' with any capitalization, repair only the first range returned by the partitioner.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"parallelism",
"description":"Repair parallelism, can be 0 (sequential), 1 (parallel) or 2 (datacenter-aware).",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"incremental",
"description":"If the value is the string 'true' with any capitalization, perform incremental repair.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"jobThreads",
"description":"An integer specifying the parallelism on each node.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"ranges",
"description":"An explicit list of ranges to repair, overriding the default choice. Each range is expressed as token1:token2, and multiple ranges can be given as a comma separated list.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"columnFamilies",
"description":"Which column families to repair in the given keyspace. Multiple columns families can be named separated by commas. If this option is missing, all column families in the keyspace are repaired.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dataCenters",
"description":"Which data centers are to participate in this repair. Multiple data centers can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"hosts",
"description":"Which hosts are to participate in this repair. Multiple hosts can be listed separated by commas.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"trace",
"description":"If the value is the string 'true' with any capitalization, enable tracing of the repair.",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -1964,6 +2028,20 @@
}
}
},
"double_mapper":{
"id":"double_mapper",
"description":"A key value mapping between a string and a double",
"properties":{
"key":{
"type":"string",
"description":"The key"
},
"value":{
"type":"double",
"description":"The value"
}
}
},
"maplist_mapper":{
"id":"maplist_mapper",
"description":"A key value mapping, where key and value are list",


@@ -64,21 +64,21 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, function<
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return cf.get_stats().*f;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -101,7 +101,7 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::ihistogram column_family::stats::*f) {
return map_reduce_cf(ctx, 0, [f](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
}, std::plus<int64_t>());
}
@@ -133,7 +133,7 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
}
static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ctx, const sstring& name) {
return map_reduce_cf(ctx, name, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, name, int64_t(0), [](const column_family& cf) {
return cf.get_unleveled_sstables();
}, std::plus<int64_t>());
}
@@ -223,25 +223,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().total_space();
}, std::plus<int64_t>());
});
cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -256,7 +256,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().total_space();
}, std::plus<int64_t>());
});
@@ -265,21 +265,21 @@ void set_column_family(http_context& ctx, routes& r) {
warn(unimplemented::cause::INDEXES);
return ctx.db.map_reduce0([](const database& db){
return db.dirty_memory_region_group().memory_used();
}, 0, std::plus<int64_t>()).then([](int res) {
}, int64_t(0), std::plus<int64_t>()).then([](int res) {
return make_ready_future<json::json_return_type>(res);
});
});
cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
return cf.occupancy().used_space();
}, std::plus<int64_t>());
});
cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<request> req) {
warn(unimplemented::cause::INDEXES);
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](column_family& cf) {
return cf.active_memtable().region().occupancy().used_space();
}, std::plus<int64_t>());
});
@@ -304,7 +304,7 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i.second->get_stats_metadata().estimated_row_size.count();
@@ -424,11 +424,11 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, max_row_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
});
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, max_row_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
});
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -539,20 +539,20 @@ void set_column_family(http_context& ctx, routes& r) {
}, std::plus<uint64_t>());
});
cf::get_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
// We are missing the off heap memory calculation
// Return 0 is the wrong value. It's a work around
// until the memory calculation will be available
//auto id = get_uuid(req->param["name"], ctx.db.local());
return make_ready_future<json::json_return_type>(0);
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_all_index_summary_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(0);
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return sst.second->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
cf::get_compression_metadata_off_heap_memory_used.set(r, [] (std::unique_ptr<request> req) {
@@ -623,25 +623,25 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], 0, [](const column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
});
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());


@@ -21,6 +21,7 @@
#include "compaction_manager.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "db/system_keyspace.hh"
namespace api {
@@ -38,12 +39,25 @@ static future<json::json_return_type> get_cm_stats(http_context& ctx,
}
void set_compaction_manager(http_context& ctx, routes& r) {
cm::get_compactions.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
std::vector<cm::summary> map;
return make_ready_future<json::json_return_type>(map);
cm::get_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return ctx.db.map_reduce0([](database& db) {
std::vector<cm::summary> summaries;
const compaction_manager& cm = db.get_compaction_manager();
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = "compaction";
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
}
return summaries;
}, std::vector<cm::summary>(), concat<cm::summary>).then([](const std::vector<cm::summary>& res) {
return make_ready_future<json::json_return_type>(res);
});
});
cm::force_user_defined_compaction.set(r, [] (std::unique_ptr<request> req) {
@@ -83,11 +97,29 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
std::vector<cm::history> res;
return make_ready_future<json::json_return_type>(res);
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {


@@ -89,7 +89,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_token_endpoint.set(r, [] (const_req req) {
auto token_to_ep = service::get_local_storage_service().get_token_metadata().get_token_to_endpoint();
auto token_to_ep = service::get_local_storage_service().get_token_to_endpoint_map();
std::vector<storage_service_json::mapper> res;
return map_to_key_value(token_to_ep, res);
});
@@ -169,8 +169,14 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::get_load_map.set(r, [] (std::unique_ptr<request> req) {
return service::get_local_storage_service().get_load_map().then([] (auto&& load_map) {
std::vector<ss::mapper> res;
return make_ready_future<json::json_return_type>(map_to_key_value(load_map, res));
std::vector<ss::double_mapper> res;
for (auto i : load_map) {
ss::double_mapper val;
val.key = i.first;
val.value = i.second;
res.push_back(val);
}
return make_ready_future<json::json_return_type>(res);
});
});
@@ -312,18 +318,14 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::repair_async.set(r, [&ctx](std::unique_ptr<request> req) {
// Currently, we get all the repair options encoded in a single
// "options" option, and split it to a map using the "," and ":"
// delimiters. TODO: consider if it doesn't make more sense to just
// take all the query parameters as this map and pass it to the repair
// function.
static std::vector<sstring> options = {"primaryRange", "parallelism", "incremental",
"jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "trace"};
std::unordered_map<sstring, sstring> options_map;
for (auto s : split(req->get_query_param("options"), ",")) {
auto kv = split(s, ":");
if (kv.size() != 2) {
throw httpd::bad_param_exception("malformed async repair options");
for (auto o : options) {
auto s = req->get_query_param(o);
if (s != "") {
options_map[o] = s;
}
options_map.emplace(std::move(kv[0]), std::move(kv[1]));
}
// The repair process is asynchronous: repair_start only starts it and
@@ -415,15 +417,18 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_drain_progress.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>("");
+        return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
+            return ss.get_drain_progress();
+        }).then([] (auto&& progress) {
+            auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
+            return make_ready_future<json::json_return_type>(std::move(progress_str));
+        });
});
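get_drain_progress sums per-shard `drain_progress` values with `map_reduce` and an `adder<>`, then formats a single status string. A plain-C++ sketch of that reduction under the assumption that `drain_progress` is just a pair of counters with an additive combine (the real seastar `map_reduce` is asynchronous; `std::accumulate` models only the reduction step):

```cpp
#include <cassert>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical mirror of service::storage_service::drain_progress:
// per-shard counters that an adder can sum.
struct drain_progress {
    int total_cfs = 0;
    int remaining_cfs = 0;
    drain_progress operator+(const drain_progress& o) const {
        return {total_cfs + o.total_cfs, remaining_cfs + o.remaining_cfs};
    }
};

// Reduce per-shard progress the way map_reduce with adder<> would,
// then format the same "Drained x/y ColumnFamilies" string.
std::string format_drain_progress(const std::vector<drain_progress>& per_shard) {
    auto total = std::accumulate(per_shard.begin(), per_shard.end(), drain_progress{});
    char buf[64];
    std::snprintf(buf, sizeof(buf), "Drained %d/%d ColumnFamilies",
                  total.remaining_cfs, total.total_cfs);
    return buf;
}
```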
ss::drain.set(r, [](std::unique_ptr<request> req) {
-        //TBD
-        unimplemented();
-        return make_ready_future<json::json_return_type>(json_void());
+        return service::get_local_storage_service().drain().then([] {
+            return make_ready_future<json::json_return_type>(json_void());
+        });
});
ss::truncate.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD


@@ -302,6 +302,12 @@ public:
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};


@@ -82,6 +82,12 @@ public:
}
return caching_options(k, r);
}
bool operator==(const caching_options& other) const {
return _key_cache == other._key_cache && _row_cache == other._row_cache;
}
bool operator!=(const caching_options& other) const {
return !(*this == other);
}
};


@@ -68,7 +68,7 @@ public:
, _byte_order_equal(std::all_of(_types.begin(), _types.end(), [] (auto t) {
return t->is_byte_order_equal();
}))
-    , _byte_order_comparable(_types.size() == 1 && _types[0]->is_byte_order_comparable())
+    , _byte_order_comparable(!is_prefixable && _types.size() == 1 && _types[0]->is_byte_order_comparable())
, _is_reversed(_types.size() == 1 && _types[0]->is_reversed())
{ }
@@ -278,10 +278,10 @@ public:
});
}
     bytes from_string(sstring_view s) {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
     }
     sstring to_string(const bytes& b) {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
     }
}
// Returns true iff given prefix has no missing components
bool is_full(bytes_view v) const {


@@ -114,6 +114,14 @@ public:
}
return opts;
}
bool operator==(const compression_parameters& other) const {
return _compressor == other._compressor
&& _chunk_length == other._chunk_length
&& _crc_check_chance == other._crc_check_chance;
}
bool operator!=(const compression_parameters& other) const {
return !(*this == other);
}
private:
void validate_options(const std::map<sstring, sstring>& options) {
// currently, there are no options specific to a particular compressor


@@ -782,40 +782,25 @@ commitlog_total_space_in_mb: -1
# the request scheduling. Currently the only valid option is keyspace.
# request_scheduler_id: keyspace
-# Enable or disable inter-node encryption
-# Default settings are TLS v1, RSA 1024-bit keys (it is imperative that
-# users generate their own keys) TLS_RSA_WITH_AES_128_CBC_SHA as the cipher
-# suite for authentication, key exchange and encryption of the actual data transfers.
-# Use the DHE/ECDHE ciphers if running in FIPS 140 compliant mode.
-# NOTE: No custom encryption options are enabled at the moment
+# Enable or disable inter-node encryption.
+# You must also generate keys and provide the appropriate key and trust store locations and passwords.
+# No custom encryption options are currently enabled. The available options are:
 #
 # The available internode options are : all, none, dc, rack
 #
-# If set to dc cassandra will encrypt the traffic between the DCs
-# If set to rack cassandra will encrypt the traffic between the racks
+# If set to dc scylla will encrypt the traffic between the DCs
+# If set to rack scylla will encrypt the traffic between the racks
 #
-# The passwords used in these options must match the passwords used when generating
-# the keystore and truststore. For instructions on generating these files, see:
-# http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
-#
 # server_encryption_options:
 #    internode_encryption: none
-#    keystore: conf/.keystore
-#    keystore_password: cassandra
-#    truststore: conf/.truststore
-#    truststore_password: cassandra
-# More advanced defaults below:
-# protocol: TLS
-# algorithm: SunX509
-# store_type: JKS
-# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
-# require_client_auth: false
+#    certificate: conf/scylla.crt
+#    keyfile: conf/scylla.key
+#    truststore: <none, use system trust>
 # enable or disable client/server encryption.
 # client_encryption_options:
 #    enabled: false
-#    keystore: conf/.keystore
-#    keystore_password: cassandra
+#    certificate: conf/scylla.crt
+#    keyfile: conf/scylla.key
 #    require_client_auth: false
 # Set truststore and truststore_password if require_client_auth is true
@@ -839,3 +824,17 @@ commitlog_total_space_in_mb: -1
# reducing overhead from the TCP protocol itself, at the cost of increasing
# latency if you block for cross-datacenter responses.
# inter_dc_tcp_nodelay: false
# Relaxation of environment checks.
#
# Scylla places certain requirements on its environment. If these requirements are
# not met, performance and reliability can be degraded.
#
# These requirements include:
# - A filesystem with good support for asynchronous I/O (AIO). Currently,
# this means XFS.
#
# false: strict environment checks are in place; do not start if they are not met.
# true: relaxed environment checks; performance and reliability may degrade.
#
# developer_mode: false


@@ -183,6 +183,7 @@ scylla_tests = [
'tests/managed_vector_test',
'tests/crc_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
]
apps = [
@@ -280,6 +281,8 @@ scylla_core = (['database.cc',
'cql3/statements/schema_altering_statement.cc',
'cql3/statements/ks_prop_defs.cc',
'cql3/statements/modification_statement.cc',
'cql3/statements/parsed_statement.cc',
'cql3/statements/property_definitions.cc',
'cql3/statements/update_statement.cc',
'cql3/statements/delete_statement.cc',
'cql3/statements/batch_statement.cc',
@@ -339,6 +342,7 @@ scylla_core = (['database.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -482,6 +486,7 @@ tests_not_using_seastar_test_framework = set([
'tests/crc_test',
'tests/perf/perf_sstable',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
])
for t in tests_not_using_seastar_test_framework:
@@ -498,7 +503,7 @@ deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
-deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc']
+deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'log.cc', 'utils/dynamic_bitset.cc']
warnings = [
'-Wno-mismatched-tags', # clang-only


@@ -55,14 +55,11 @@ namespace cql3 {
* Represents an identifier for a CQL column definition.
* TODO : should support light-weight mode without text representation for when not interned
*/
-class column_identifier final : public selection::selectable /* implements IMeasurableMemory*/ {
+class column_identifier final : public selection::selectable {
public:
bytes bytes_;
private:
sstring _text;
#if 0
private static final long EMPTY_SIZE = ObjectSizes.measure(new ColumnIdentifier("", true));
#endif
public:
column_identifier(sstring raw_text, bool keep_case);
@@ -83,20 +80,6 @@ public:
}
#if 0
public long unsharedHeapSize()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapOf(bytes)
+ ObjectSizes.sizeOf(text);
}
public long unsharedHeapSizeExcludingData()
{
return EMPTY_SIZE
+ ObjectSizes.sizeOnHeapExcludingData(bytes)
+ ObjectSizes.sizeOf(text);
}
public ColumnIdentifier clone(AbstractAllocator allocator)
{
return new ColumnIdentifier(allocator.clone(bytes), text);


@@ -114,30 +114,26 @@ maps::literal::validate_assignable_to(database& db, const sstring& keyspace, col
assignment_testable::test_result
maps::literal::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver) {
-    throw std::runtime_error("not implemented");
-#if 0
-    if (!(receiver.type instanceof MapType))
-        return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
-    // If there is no elements, we can't say it's an exact match (an empty map is fundamentally polymorphic).
-    if (entries.isEmpty())
-        return AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
-    ColumnSpecification keySpec = Maps.keySpecOf(receiver);
-    ColumnSpecification valueSpec = Maps.valueSpecOf(receiver);
-    // It's an exact match if all are exact match, but is not assignable as soon as any is non assignable.
-    AssignmentTestable.TestResult res = AssignmentTestable.TestResult.EXACT_MATCH;
-    for (Pair<Term.Raw, Term.Raw> entry : entries)
-    {
-        AssignmentTestable.TestResult t1 = entry.left.testAssignment(keyspace, keySpec);
-        AssignmentTestable.TestResult t2 = entry.right.testAssignment(keyspace, valueSpec);
-        if (t1 == AssignmentTestable.TestResult.NOT_ASSIGNABLE || t2 == AssignmentTestable.TestResult.NOT_ASSIGNABLE)
-            return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
-        if (t1 != AssignmentTestable.TestResult.EXACT_MATCH || t2 != AssignmentTestable.TestResult.EXACT_MATCH)
-            res = AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
-    }
-    return res;
-#endif
+    if (!dynamic_pointer_cast<const map_type_impl>(receiver->type)) {
+        return assignment_testable::test_result::NOT_ASSIGNABLE;
+    }
+    // If there is no elements, we can't say it's an exact match (an empty map is fundamentally polymorphic).
+    if (entries.empty()) {
+        return assignment_testable::test_result::WEAKLY_ASSIGNABLE;
+    }
+    auto key_spec = maps::key_spec_of(*receiver);
+    auto value_spec = maps::value_spec_of(*receiver);
+    // It's an exact match if all are exact match, but is not assignable as soon as any is non assignable.
+    auto res = assignment_testable::test_result::EXACT_MATCH;
+    for (auto entry : entries) {
+        auto t1 = entry.first->test_assignment(db, keyspace, key_spec);
+        auto t2 = entry.second->test_assignment(db, keyspace, value_spec);
+        if (t1 == assignment_testable::test_result::NOT_ASSIGNABLE || t2 == assignment_testable::test_result::NOT_ASSIGNABLE)
+            return assignment_testable::test_result::NOT_ASSIGNABLE;
+        if (t1 != assignment_testable::test_result::EXACT_MATCH || t2 != assignment_testable::test_result::EXACT_MATCH)
+            res = assignment_testable::test_result::WEAKLY_ASSIGNABLE;
+    }
+    return res;
}
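The assignability fold above follows a simple rule: the result starts at EXACT_MATCH, drops to WEAKLY_ASSIGNABLE if any key or value is less than an exact match, and short-circuits to NOT_ASSIGNABLE the moment any entry fails. A self-contained model of that fold (the enum mirrors `assignment_testable::test_result`; `combine` is a hypothetical name):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Mirrors assignment_testable::test_result.
enum class test_result { NOT_ASSIGNABLE, WEAKLY_ASSIGNABLE, EXACT_MATCH };

// Fold per-entry (key, value) results the way test_assignment does:
// NOT_ASSIGNABLE short-circuits, any non-exact entry weakens the result.
test_result combine(const std::vector<std::pair<test_result, test_result>>& entries) {
    auto res = test_result::EXACT_MATCH;
    for (auto e : entries) {
        if (e.first == test_result::NOT_ASSIGNABLE || e.second == test_result::NOT_ASSIGNABLE) {
            return test_result::NOT_ASSIGNABLE;
        }
        if (e.first != test_result::EXACT_MATCH || e.second != test_result::EXACT_MATCH) {
            res = test_result::WEAKLY_ASSIGNABLE;
        }
    }
    return res;
}
```

Note the empty-map case is handled before this loop in the real code: an empty literal is only WEAKLY_ASSIGNABLE because it carries no type evidence.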
sstring


@@ -199,13 +199,7 @@ public:
}
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver);
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s[%s] = %s", column.name, selector, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -218,13 +212,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s + %s", column.name, column.name, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -237,13 +224,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s - %s", column.name, column.name, value);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};
@@ -256,12 +236,6 @@ public:
virtual shared_ptr<operation> prepare(database& db, const sstring& keyspace, const column_definition& receiver) override;
#if 0
protected String toString(ColumnSpecification column)
{
return String.format("%s = %s - %s", column.name, value, column.name);
}
#endif
virtual bool is_compatible_with(shared_ptr<raw_update> other) override;
};


@@ -178,7 +178,7 @@ query_processor::prepare(const std::experimental::string_view& query_string, con
query_processor::get_stored_prepared_statement(const std::experimental::string_view& query_string, const sstring& keyspace, bool for_thrift)
{
if (for_thrift) {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
Integer thriftStatementId = computeThriftId(queryString, keyspace);
ParsedStatement.Prepared existing = thriftPreparedStatements.get(thriftStatementId);
@@ -209,7 +209,7 @@ query_processor::store_prepared_statement(const std::experimental::string_view&
MAX_CACHE_PREPARED_MEMORY));
#endif
if (for_thrift) {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
Integer statementId = computeThriftId(queryString, keyspace);
thriftPreparedStatements.put(statementId, prepared);
@@ -334,6 +334,9 @@ query_options query_processor::make_internal_options(
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
const std::experimental::string_view& query_string,
const std::initializer_list<data_value>& values) {
if (log.is_enabled(logging::log_level::trace)) {
log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
}
auto p = prepare_internal(query_string);
auto opts = make_internal_options(p, values);
return do_with(std::move(opts),


@@ -374,7 +374,7 @@ public:
}
virtual std::vector<bytes_opt> bounds(statements::bound b, const query_options& options) const override {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
return Composites.toByteBuffers(boundsAsComposites(b, options));
#endif


@@ -41,13 +41,13 @@ public:
::shared_ptr<primary_key_restrictions<T>> do_merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) const {
if (restriction->is_multi_column()) {
-            throw std::runtime_error("not implemented");
+            throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
return ::make_shared<single_column_primary_key_restrictions<T>>(schema)->merge_to(schema, restriction);
}
::shared_ptr<primary_key_restrictions<T>> merge_to(schema_ptr schema, ::shared_ptr<restriction> restriction) override {
if (restriction->is_multi_column()) {
-            throw std::runtime_error("not implemented");
+            throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
if (restriction->is_on_token()) {
return static_pointer_cast<token_restriction>(restriction);


@@ -384,7 +384,11 @@ void result_set_builder::visitor::accept_new_row(
_builder.add(_partition_key[def->component_index()]);
break;
case column_kind::clustering_key:
-        _builder.add(_clustering_key[def->component_index()]);
+        if (_clustering_key.size() > def->component_index()) {
+            _builder.add(_clustering_key[def->component_index()]);
+        } else {
+            _builder.add({});
+        }
break;
case column_kind::regular_column:
add_value(*def, row_iterator);
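The guard added in accept_new_row protects against clustering-key prefixes that are shorter than the requested component index: instead of indexing out of range, the builder emits a null cell. A simplified sketch with `std::optional<std::string>` standing in for the real cell type (hypothetical names):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using cell = std::optional<std::string>;

// Mirror of the guard in accept_new_row: if the row's clustering prefix
// is shorter than the requested component, produce a null cell instead
// of indexing past the end of the vector.
cell clustering_component(const std::vector<cell>& clustering_key, size_t component_index) {
    if (clustering_key.size() > component_index) {
        return clustering_key[component_index];
    }
    return std::nullopt;  // corresponds to _builder.add({})
}
```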


@@ -159,7 +159,7 @@ protected:
virtual shared_ptr<restrictions::restriction> new_contains_restriction(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names,
bool is_key) override {
-        throw std::runtime_error("not implemented");
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
ColumnDefinition columnDef = toColumnDefinition(schema, entity);
Term term = toTerm(toReceivers(schema, columnDef), value, schema.ksName, bound_names);


@@ -322,7 +322,7 @@ public:
virtual future<shared_ptr<transport::messages::result_message>> execute_internal(
distributed<service::storage_proxy>& proxy,
service::query_state& query_state, const query_options& options) override {
-        throw "not implemented";
+        throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
assert !hasConditions;
for (IMutation mutation : getMutations(BatchQueryOptions.withoutPerStatementVariables(options), true, queryState.getTimestamp()))


@@ -45,6 +45,14 @@ namespace cql3 {
namespace statements {
delete_statement::delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
bool delete_statement::require_full_clustering_key() const {
return false;
}
void delete_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (_column_operations.empty()) {
m.partition().apply_delete(*s, prefix, params.make_tombstone());
@@ -96,5 +104,17 @@ delete_statement::parsed::prepare_internal(database& db, schema_ptr schema, ::sh
return stmt;
}
delete_statement::parsed::parsed(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<operation::raw_deletion>> deletions,
std::vector<::shared_ptr<relation>> where_clause,
conditions_vector conditions,
bool if_exists)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, if_exists)
, _deletions(std::move(deletions))
, _where_clause(std::move(where_clause))
{ }
}
}


@@ -55,13 +55,9 @@ namespace statements {
*/
class delete_statement : public modification_statement {
public:
-    delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
-        : modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
-    { }
+    delete_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs);
-    virtual bool require_full_clustering_key() const override {
-        return false;
-    }
+    virtual bool require_full_clustering_key() const override;
virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override;
@@ -94,11 +90,7 @@ public:
std::vector<::shared_ptr<operation::raw_deletion>> deletions,
std::vector<::shared_ptr<relation>> where_clause,
conditions_vector conditions,
-               bool if_exists)
-            : modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, if_exists)
-            , _deletions(std::move(deletions))
-            , _where_clause(std::move(where_clause))
-        { }
+               bool if_exists);
protected:
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs);


@@ -71,6 +71,81 @@ operator<<(std::ostream& out, modification_statement::statement_type t) {
return out;
}
modification_statement::modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_)
: type{type_}
, _bound_terms{bound_terms}
, s{schema_}
, attrs{std::move(attrs_)}
, _column_operations{}
{ }
bool modification_statement::uses_function(const sstring& ks_name, const sstring& function_name) const {
if (attrs->uses_function(ks_name, function_name)) {
return true;
}
for (auto&& e : _processed_keys) {
auto r = e.second;
if (r && r->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& operation : _column_operations) {
if (operation && operation->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _column_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
for (auto&& condition : _static_conditions) {
if (condition && condition->uses_function(ks_name, function_name)) {
return true;
}
}
return false;
}
uint32_t modification_statement::get_bound_terms() {
return _bound_terms;
}
sstring modification_statement::keyspace() const {
return s->ks_name();
}
sstring modification_statement::column_family() const {
return s->cf_name();
}
bool modification_statement::is_counter() const {
return s->is_counter();
}
int64_t modification_statement::get_timestamp(int64_t now, const query_options& options) const {
return attrs->get_timestamp(now, options);
}
bool modification_statement::is_timestamp_set() const {
return attrs->is_timestamp_set();
}
gc_clock::duration modification_statement::get_time_to_live(const query_options& options) const {
return gc_clock::duration(attrs->get_time_to_live(options));
}
void modification_statement::check_access(const service::client_state& state) {
warn(unimplemented::cause::PERMISSIONS);
#if 0
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.MODIFY);
// CAS updates can be used to simulate a SELECT query, so should require Permission.SELECT as well.
if (hasConditions())
state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
#endif
}
future<std::vector<mutation>>
modification_statement::get_mutations(distributed<service::storage_proxy>& proxy, const query_options& options, bool local, int64_t now) {
auto keys = make_lw_shared(build_partition_keys(options));
@@ -549,6 +624,63 @@ bool modification_statement::depends_on_column_family(const sstring& cf_name) co
return column_family() == cf_name;
}
void modification_statement::add_operation(::shared_ptr<operation> op) {
if (op->column.is_static()) {
_sets_static_columns = true;
} else {
_sets_regular_columns = true;
}
_column_operations.push_back(std::move(op));
}
void modification_statement::add_condition(::shared_ptr<column_condition> cond) {
if (cond->column.is_static()) {
_sets_static_columns = true;
_static_conditions.emplace_back(std::move(cond));
} else {
_sets_regular_columns = true;
_column_conditions.emplace_back(std::move(cond));
}
}
void modification_statement::set_if_not_exist_condition() {
_if_not_exists = true;
}
bool modification_statement::has_if_not_exist_condition() const {
return _if_not_exists;
}
void modification_statement::set_if_exist_condition() {
_if_exists = true;
}
bool modification_statement::has_if_exist_condition() const {
return _if_exists;
}
bool modification_statement::requires_read() {
return std::any_of(_column_operations.begin(), _column_operations.end(), [] (auto&& op) {
return op->requires_read();
});
}
bool modification_statement::has_conditions() {
return _if_not_exists || _if_exists || !_column_conditions.empty() || !_static_conditions.empty();
}
void modification_statement::validate_where_clause_for_conditions() {
// no-op by default
}
modification_statement::parsed::parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists)
: cf_statement{std::move(name)}
, _attrs{std::move(attrs)}
, _conditions{std::move(conditions)}
, _if_not_exists{if_not_exists}
, _if_exists{if_exists}
{ }
}
}


@@ -107,84 +107,29 @@ private:
};
public:
-    modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_)
-        : type{type_}
-        , _bound_terms{bound_terms}
-        , s{schema_}
-        , attrs{std::move(attrs_)}
-        , _column_operations{}
-    { }
+    modification_statement(statement_type type_, uint32_t bound_terms, schema_ptr schema_, std::unique_ptr<attributes> attrs_);
-    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override {
-        if (attrs->uses_function(ks_name, function_name)) {
-            return true;
-        }
-        for (auto&& e : _processed_keys) {
-            auto r = e.second;
-            if (r && r->uses_function(ks_name, function_name)) {
-                return true;
-            }
-        }
-        for (auto&& operation : _column_operations) {
-            if (operation && operation->uses_function(ks_name, function_name)) {
-                return true;
-            }
-        }
-        for (auto&& condition : _column_conditions) {
-            if (condition && condition->uses_function(ks_name, function_name)) {
-                return true;
-            }
-        }
-        for (auto&& condition : _static_conditions) {
-            if (condition && condition->uses_function(ks_name, function_name)) {
-                return true;
-            }
-        }
-        return false;
-    }
+    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const override;
     virtual bool require_full_clustering_key() const = 0;
     virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) = 0;
-    virtual uint32_t get_bound_terms() override {
-        return _bound_terms;
-    }
+    virtual uint32_t get_bound_terms() override;
-    virtual sstring keyspace() const {
-        return s->ks_name();
-    }
+    virtual sstring keyspace() const;
-    virtual sstring column_family() const {
-        return s->cf_name();
-    }
+    virtual sstring column_family() const;
-    virtual bool is_counter() const {
-        return s->is_counter();
-    }
+    virtual bool is_counter() const;
-    int64_t get_timestamp(int64_t now, const query_options& options) const {
-        return attrs->get_timestamp(now, options);
-    }
+    int64_t get_timestamp(int64_t now, const query_options& options) const;
-    bool is_timestamp_set() const {
-        return attrs->is_timestamp_set();
-    }
+    bool is_timestamp_set() const;
-    gc_clock::duration get_time_to_live(const query_options& options) const {
-        return gc_clock::duration(attrs->get_time_to_live(options));
-    }
+    gc_clock::duration get_time_to_live(const query_options& options) const;
-    virtual void check_access(const service::client_state& state) override {
-        warn(unimplemented::cause::PERMISSIONS);
-#if 0
-        state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.MODIFY);
-        // CAS updates can be used to simulate a SELECT query, so should require Permission.SELECT as well.
-        if (hasConditions())
-            state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
-#endif
-    }
+    virtual void check_access(const service::client_state& state) override;
void validate(distributed<service::storage_proxy>&, const service::client_state& state) override;
@@ -192,14 +137,7 @@ public:
virtual bool depends_on_column_family(const sstring& cf_name) const override;
-    void add_operation(::shared_ptr<operation> op) {
-        if (op->column.is_static()) {
-            _sets_static_columns = true;
-        } else {
-            _sets_regular_columns = true;
-        }
-        _column_operations.push_back(std::move(op));
-    }
+    void add_operation(::shared_ptr<operation> op);
#if 0
public Iterable<ColumnDefinition> getColumnsWithConditions()
@@ -212,31 +150,15 @@ public:
}
#endif
public:
-    void add_condition(::shared_ptr<column_condition> cond) {
-        if (cond->column.is_static()) {
-            _sets_static_columns = true;
-            _static_conditions.emplace_back(std::move(cond));
-        } else {
-            _sets_regular_columns = true;
-            _column_conditions.emplace_back(std::move(cond));
-        }
-    }
+    void add_condition(::shared_ptr<column_condition> cond);
-    void set_if_not_exist_condition() {
-        _if_not_exists = true;
-    }
+    void set_if_not_exist_condition();
-    bool has_if_not_exist_condition() const {
-        return _if_not_exists;
-    }
+    bool has_if_not_exist_condition() const;
-    void set_if_exist_condition() {
-        _if_exists = true;
-    }
+    void set_if_exist_condition();
-    bool has_if_exist_condition() const {
-        return _if_exists;
-    }
+    bool has_if_exist_condition() const;
private:
void add_key_values(const column_definition& def, ::shared_ptr<restrictions::restriction> values);
@@ -254,11 +176,7 @@ protected:
const column_definition* get_first_empty_key();
public:
-    bool requires_read() {
-        return std::any_of(_column_operations.begin(), _column_operations.end(), [] (auto&& op) {
-            return op->requires_read();
-        });
-    }
+    bool requires_read();
protected:
future<update_parameters::prefetched_rows_type> read_required_rows(
@@ -269,9 +187,7 @@ protected:
db::consistency_level cl);
public:
-    bool has_conditions() {
-        return _if_not_exists || _if_exists || !_column_conditions.empty() || !_static_conditions.empty();
-    }
+    bool has_conditions();
virtual future<::shared_ptr<transport::messages::result_message>>
execute(distributed<service::storage_proxy>& proxy, service::query_state& qs, const query_options& options) override;
@@ -428,9 +344,7 @@ protected:
* processed to check that they are compatible.
* @throws InvalidRequestException
*/
-    virtual void validate_where_clause_for_conditions() {
-        // no-op by default
-    }
+    virtual void validate_where_clause_for_conditions();
public:
class parsed : public cf_statement {
@@ -443,13 +357,7 @@ public:
const bool _if_not_exists;
const bool _if_exists;
protected:
-        parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists)
-            : cf_statement{std::move(name)}
-            , _attrs{std::move(attrs)}
-            , _conditions{std::move(conditions)}
-            , _if_not_exists{if_not_exists}
-            , _if_exists{if_exists}
-        { }
+        parsed(::shared_ptr<cf_name> name, ::shared_ptr<attributes::raw> attrs, conditions_vector conditions, bool if_not_exists, bool if_exists);
public:
virtual ::shared_ptr<parsed_statement::prepared> prepare(database& db) override;


@@ -0,0 +1,83 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2014 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/statements/parsed_statement.hh"
namespace cql3 {
namespace statements {
parsed_statement::~parsed_statement()
{ }
shared_ptr<variable_specifications> parsed_statement::get_bound_variables() {
return _variables;
}
// Used by the parser and preparable statement
void parsed_statement::set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names) {
_variables = ::make_shared<variable_specifications>(bound_names);
}
bool parsed_statement::uses_function(const sstring& ks_name, const sstring& function_name) const {
return false;
}
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_)
: statement(std::move(statement_))
, bound_names(std::move(bound_names_))
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names)
: prepared(statement_, names.get_specifications())
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names)
: prepared(statement_, std::move(names).get_specifications())
{ }
parsed_statement::prepared::prepared(::shared_ptr<cql_statement>&& statement_)
: prepared(statement_, std::vector<::shared_ptr<column_specification>>())
{ }
}
}


@@ -60,47 +60,29 @@ private:
::shared_ptr<variable_specifications> _variables;
public:
-    virtual ~parsed_statement()
-    { }
+    virtual ~parsed_statement();
-    shared_ptr<variable_specifications> get_bound_variables() {
-        return _variables;
-    }
+    shared_ptr<variable_specifications> get_bound_variables();
     // Used by the parser and preparable statement
-    void set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names)
-    {
-        _variables = ::make_shared<variable_specifications>(bound_names);
-    }
+    void set_bound_variables(const std::vector<::shared_ptr<column_identifier>>& bound_names);
class prepared {
public:
const ::shared_ptr<cql_statement> statement;
const std::vector<::shared_ptr<column_specification>> bound_names;
prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_)
: statement(std::move(statement_))
, bound_names(std::move(bound_names_))
{ }
prepared(::shared_ptr<cql_statement> statement_, std::vector<::shared_ptr<column_specification>> bound_names_);
prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names)
: prepared(statement_, names.get_specifications())
{ }
prepared(::shared_ptr<cql_statement> statement_, const variable_specifications& names);
prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names)
: prepared(statement_, std::move(names).get_specifications())
{ }
prepared(::shared_ptr<cql_statement> statement_, variable_specifications&& names);
prepared(::shared_ptr<cql_statement>&& statement_)
: prepared(statement_, std::vector<::shared_ptr<column_specification>>())
{ }
prepared(::shared_ptr<cql_statement>&& statement_);
};
virtual ::shared_ptr<prepared> prepare(database& db) = 0;
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const {
return false;
}
virtual bool uses_function(const sstring& ks_name, const sstring& function_name) const;
};
}
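The pattern throughout this series is the same: bodies that were defined inline in the header become declarations, and the definitions move to the matching .cc file, so translation units that include the header no longer recompile when an implementation changes. A minimal single-file sketch of the before/after shape (names are illustrative, not from the Scylla tree):

```cpp
#include <string>
#include <utility>

class widget {
    std::string _name;
public:
    explicit widget(std::string name) : _name(std::move(name)) {}
    // After the refactoring the header carries only this declaration.
    const std::string& name() const;
};

// In the real tree this definition would live in widget.cc; it is kept
// in the same file here so the sketch stays self-contained.
const std::string& widget::name() const {
    return _name;
}
```

The trade-off is the loss of trivial inlining across translation units, which link-time optimization can win back.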


@@ -0,0 +1,186 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2015 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/statements/property_definitions.hh"
namespace cql3 {
namespace statements {
property_definitions::property_definitions()
: _properties{}
{ }
void property_definitions::add_property(const sstring& name, sstring value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void property_definitions::add_property(const sstring& name, const std::map<sstring, sstring>& value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void property_definitions::validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete) {
for (auto&& kv : _properties) {
auto&& name = kv.first;
if (keywords.count(name)) {
continue;
}
if (obsolete.count(name)) {
#if 0
logger.warn("Ignoring obsolete property {}", name);
#endif
} else {
throw exceptions::syntax_exception(sprint("Unknown property '%s'", name));
}
}
}
std::experimental::optional<sstring> property_definitions::get_simple(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<sstring>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a string", name));
}
}
std::experimental::optional<std::map<sstring, sstring>> property_definitions::get_map(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<std::map<sstring, sstring>>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a map.", name));
}
}
bool property_definitions::has_property(const sstring& name) const {
return _properties.find(name) != _properties.end();
}
sstring property_definitions::get_string(sstring key, sstring default_value) const {
auto value = get_simple(key);
if (value) {
return value.value();
} else {
return default_value;
}
}
// Return a property value, typed as a Boolean
bool property_definitions::get_boolean(sstring key, bool default_value) const {
auto value = get_simple(key);
if (value) {
std::string s{value.value()};
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s == "1" || s == "true" || s == "yes";
} else {
return default_value;
}
}
// Return a property value, typed as a double
double property_definitions::get_double(sstring key, double default_value) const {
auto value = get_simple(key);
return to_double(key, value, default_value);
}
double property_definitions::to_double(sstring key, std::experimental::optional<sstring> value, double default_value) {
if (value) {
auto val = value.value();
try {
return std::stod(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid double value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
// Return a property value, typed as an Integer
int32_t property_definitions::get_int(sstring key, int32_t default_value) const {
auto value = get_simple(key);
return to_int(key, value, default_value);
}
int32_t property_definitions::to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value) {
if (value) {
auto val = value.value();
try {
return std::stoi(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid integer value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
long property_definitions::to_long(sstring key, std::experimental::optional<sstring> value, long default_value) {
if (value) {
auto val = value.value();
try {
return std::stol(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid long value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
}
}
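All of the typed getters above share one shape: look the key up as a string, fall back to a default when it is absent, and convert with a descriptive error otherwise. A standalone sketch of that logic, using `std::optional` in place of `std::experimental::optional` (function names here are illustrative):

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>

// Mirrors get_boolean(): case-insensitive match against the accepted
// true-forms; anything else (including garbage) reads as false.
bool parse_boolean(std::optional<std::string> value, bool default_value) {
    if (!value) {
        return default_value;
    }
    std::string s = *value;
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s == "1" || s == "true" || s == "yes";
}

// Mirrors to_int(): absent key -> default, unparsable value -> error
// that names both the value and the key.
int parse_int(const std::string& key, std::optional<std::string> value,
              int default_value) {
    if (!value) {
        return default_value;
    }
    try {
        return std::stoi(*value);
    } catch (const std::exception&) {
        throw std::runtime_error("Invalid integer value " + *value +
                                 " for '" + key + "'");
    }
}
```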


@@ -66,141 +66,38 @@ protected:
#endif
std::unordered_map<sstring, boost::any> _properties;
property_definitions()
: _properties{}
{ }
property_definitions();
public:
void add_property(const sstring& name, sstring value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void add_property(const sstring& name, sstring value);
void add_property(const sstring& name, const std::map<sstring, sstring>& value) {
auto it = _properties.find(name);
if (it != _properties.end()) {
throw exceptions::syntax_exception(sprint("Multiple definition for property '%s'", name));
}
_properties.emplace(name, value);
}
void add_property(const sstring& name, const std::map<sstring, sstring>& value);
void validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete);
void validate(const std::set<sstring>& keywords, const std::set<sstring>& obsolete) {
for (auto&& kv : _properties) {
auto&& name = kv.first;
if (keywords.count(name)) {
continue;
}
if (obsolete.count(name)) {
#if 0
logger.warn("Ignoring obsolete property {}", name);
#endif
} else {
throw exceptions::syntax_exception(sprint("Unknown property '%s'", name));
}
}
}
protected:
std::experimental::optional<sstring> get_simple(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<sstring>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a string", name));
}
}
std::experimental::optional<sstring> get_simple(const sstring& name) const;
std::experimental::optional<std::map<sstring, sstring>> get_map(const sstring& name) const;
std::experimental::optional<std::map<sstring, sstring>> get_map(const sstring& name) const {
auto it = _properties.find(name);
if (it == _properties.end()) {
return std::experimental::nullopt;
}
try {
return boost::any_cast<std::map<sstring, sstring>>(it->second);
} catch (const boost::bad_any_cast& e) {
throw exceptions::syntax_exception(sprint("Invalid value for property '%s'. It should be a map.", name));
}
}
public:
bool has_property(const sstring& name) const {
return _properties.find(name) != _properties.end();
}
bool has_property(const sstring& name) const;
sstring get_string(sstring key, sstring default_value) const {
auto value = get_simple(key);
if (value) {
return value.value();
} else {
return default_value;
}
}
sstring get_string(sstring key, sstring default_value) const;
// Return a property value, typed as a Boolean
bool get_boolean(sstring key, bool default_value) const {
auto value = get_simple(key);
if (value) {
std::string s{value.value()};
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
return s == "1" || s == "true" || s == "yes";
} else {
return default_value;
}
}
bool get_boolean(sstring key, bool default_value) const;
// Return a property value, typed as a double
double get_double(sstring key, double default_value) const {
auto value = get_simple(key);
return to_double(key, value, default_value);
}
double get_double(sstring key, double default_value) const;
static double to_double(sstring key, std::experimental::optional<sstring> value, double default_value) {
if (value) {
auto val = value.value();
try {
return std::stod(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid double value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static double to_double(sstring key, std::experimental::optional<sstring> value, double default_value);
// Return a property value, typed as an Integer
int32_t get_int(sstring key, int32_t default_value) const {
auto value = get_simple(key);
return to_int(key, value, default_value);
}
int32_t get_int(sstring key, int32_t default_value) const;
static int32_t to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value) {
if (value) {
auto val = value.value();
try {
return std::stoi(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid integer value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static int32_t to_int(sstring key, std::experimental::optional<sstring> value, int32_t default_value);
static long to_long(sstring key, std::experimental::optional<sstring> value, long default_value) {
if (value) {
auto val = value.value();
try {
return std::stol(val);
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sprint("Invalid long value %s for '%s'", val, key));
}
} else {
return default_value;
}
}
static long to_long(sstring key, std::experimental::optional<sstring> value, long default_value);
};
}


@@ -54,6 +54,31 @@ namespace statements {
thread_local const shared_ptr<select_statement::parameters> select_statement::_default_parameters = ::make_shared<select_statement::parameters>();
select_statement::parameters::parameters()
: _is_distinct{false}
, _allow_filtering{false}
{ }
select_statement::parameters::parameters(orderings_type orderings,
bool is_distinct,
bool allow_filtering)
: _orderings{std::move(orderings)}
, _is_distinct{is_distinct}
, _allow_filtering{allow_filtering}
{ }
bool select_statement::parameters::is_distinct() {
return _is_distinct;
}
bool select_statement::parameters::allow_filtering() {
return _allow_filtering;
}
select_statement::parameters::orderings_type const& select_statement::parameters::orderings() {
return _orderings;
}
select_statement::select_statement(schema_ptr schema,
uint32_t bound_terms,
::shared_ptr<parameters> parameters,
@@ -115,6 +140,14 @@ bool select_statement::depends_on_column_family(const sstring& cf_name) const {
return column_family() == cf_name;
}
const sstring& select_statement::keyspace() const {
return _schema->ks_name();
}
const sstring& select_statement::column_family() const {
return _schema->cf_name();
}
query::partition_slice
select_statement::make_partition_slice(const query_options& options) {
std::vector<column_id> static_columns;
@@ -318,6 +351,18 @@ shared_ptr<transport::messages::result_message> select_statement::process_result
return ::make_shared<transport::messages::result_message::rows>(std::move(rs));
}
select_statement::raw_statement::raw_statement(::shared_ptr<cf_name> cf_name,
::shared_ptr<parameters> parameters,
std::vector<::shared_ptr<selection::raw_selector>> select_clause,
std::vector<::shared_ptr<relation>> where_clause,
::shared_ptr<term::raw> limit)
: cf_statement(std::move(cf_name))
, _parameters(std::move(parameters))
, _select_clause(std::move(select_clause))
, _where_clause(std::move(where_clause))
, _limit(std::move(limit))
{ }
::shared_ptr<parsed_statement::prepared>
select_statement::raw_statement::prepare(database& db) {
schema_ptr schema = validation::validate_column_family(db, keyspace(), column_family());


@@ -72,20 +72,13 @@ public:
const bool _is_distinct;
const bool _allow_filtering;
public:
parameters()
: _is_distinct{false}
, _allow_filtering{false}
{ }
parameters();
parameters(orderings_type orderings,
bool is_distinct,
bool allow_filtering)
: _orderings{std::move(orderings)}
, _is_distinct{is_distinct}
, _allow_filtering{allow_filtering}
{ }
bool is_distinct() { return _is_distinct; }
bool allow_filtering() { return _allow_filtering; }
orderings_type const& orderings() { return _orderings; }
bool allow_filtering);
bool is_distinct();
bool allow_filtering();
orderings_type const& orderings();
};
private:
static constexpr int DEFAULT_COUNT_PAGE_SIZE = 10000;
@@ -195,13 +188,9 @@ public:
}
#endif
const sstring& keyspace() const {
return _schema->ks_name();
}
const sstring& keyspace() const;
const sstring& column_family() const {
return _schema->cf_name();
}
const sstring& column_family() const;
query::partition_slice make_partition_slice(const query_options& options);
@@ -457,13 +446,7 @@ public:
::shared_ptr<parameters> parameters,
std::vector<::shared_ptr<selection::raw_selector>> select_clause,
std::vector<::shared_ptr<relation>> where_clause,
::shared_ptr<term::raw> limit)
: cf_statement(std::move(cf_name))
, _parameters(std::move(parameters))
, _select_clause(std::move(select_clause))
, _where_clause(std::move(where_clause))
, _limit(std::move(limit))
{ }
::shared_ptr<term::raw> limit);
virtual ::shared_ptr<prepared> prepare(database& db) override;
private:


@@ -48,6 +48,14 @@ namespace cql3 {
namespace statements {
update_statement::update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
bool update_statement::require_full_clustering_key() const {
return true;
}
void update_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (s->is_dense()) {
if (!prefix || (prefix.size() == 1 && prefix.components().front().empty())) {
@@ -100,6 +108,16 @@ void update_statement::add_update_for_key(mutation& m, const exploded_clustering
#endif
}
update_statement::parsed_insert::parsed_insert(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<column_identifier::raw>> column_names,
std::vector<::shared_ptr<term::raw>> column_values,
bool if_not_exists)
: modification_statement::parsed{std::move(name), std::move(attrs), conditions_vector{}, if_not_exists, false}
, _column_names{std::move(column_names)}
, _column_values{std::move(column_values)}
{ }
::shared_ptr<modification_statement>
update_statement::parsed_insert::prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs)
@@ -148,6 +166,16 @@ update_statement::parsed_insert::prepare_internal(database& db, schema_ptr schem
return stmt;
}
update_statement::parsed_update::parsed_update(::shared_ptr<cf_name> name,
::shared_ptr<attributes::raw> attrs,
std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
std::vector<relation_ptr> where_clause,
conditions_vector conditions)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, false)
, _updates(std::move(updates))
, _where_clause(std::move(where_clause))
{ }
::shared_ptr<modification_statement>
update_statement::parsed_update::prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs)


@@ -64,14 +64,9 @@ public:
private static final Constants.Value EMPTY = new Constants.Value(ByteBufferUtil.EMPTY_BYTE_BUFFER);
#endif
update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs)
: modification_statement{type, bound_terms, std::move(s), std::move(attrs)}
{ }
update_statement(statement_type type, uint32_t bound_terms, schema_ptr s, std::unique_ptr<attributes> attrs);
private:
virtual bool require_full_clustering_key() const override {
return true;
}
virtual bool require_full_clustering_key() const override;
virtual void add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) override;
public:
@@ -92,11 +87,7 @@ public:
::shared_ptr<attributes::raw> attrs,
std::vector<::shared_ptr<column_identifier::raw>> column_names,
std::vector<::shared_ptr<term::raw>> column_values,
bool if_not_exists)
: modification_statement::parsed{std::move(name), std::move(attrs), conditions_vector{}, if_not_exists, false}
, _column_names{std::move(column_names)}
, _column_values{std::move(column_values)}
{ }
bool if_not_exists);
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs) override;
@@ -122,11 +113,7 @@ public:
::shared_ptr<attributes::raw> attrs,
std::vector<std::pair<::shared_ptr<column_identifier::raw>, ::shared_ptr<operation::raw_update>>> updates,
std::vector<relation_ptr> where_clause,
conditions_vector conditions)
: modification_statement::parsed(std::move(name), std::move(attrs), std::move(conditions), false, false)
, _updates(std::move(updates))
, _where_clause(std::move(where_clause))
{ }
conditions_vector conditions);
protected:
virtual ::shared_ptr<modification_statement> prepare_internal(database& db, schema_ptr schema,
::shared_ptr<variable_specifications> bound_names, std::unique_ptr<attributes> attrs);


@@ -224,14 +224,6 @@ public:
// We don't "need" that override but it saves us the allocation of a Value object if used
return options.make_temporary(_type->build_value(bind_internal(options)));
}
#if 0
@Override
public String toString()
{
return tupleToString(elements);
}
#endif
};
/**


@@ -88,14 +88,6 @@ public:
}
_specs[bind_index] = spec;
}
#if 0
@Override
public String toString()
{
return Arrays.toString(specs);
}
#endif
};
}


@@ -470,7 +470,8 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
dblog.error("malformed sstable {}: {}. Refusing to boot", fname, e.what());
throw;
} catch(...) {
dblog.error("Unrecognized error while processing {}: Refusing to boot", fname);
dblog.error("Unrecognized error while processing {}: {}. Refusing to boot",
fname, std::current_exception());
throw;
}
return make_ready_future<entry_descriptor>(std::move(comps));
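The first hunk above changes the catch-all in probe_file to log the in-flight exception rather than a generic message (Seastar's logging can format a `std::exception_ptr` directly). A portable sketch of the same idea — rethrowing inside the handler so a `catch(...)` can still report *what* failed:

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Must be called while an exception is in flight (i.e. from within a
// catch block): a bare `throw;` rethrows the active exception, which
// the nested handlers below can then inspect.
std::string describe_current_exception() {
    try {
        throw;
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown error";
    }
}
```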
@@ -731,21 +732,22 @@ column_family::compact_sstables(sstables::compaction_descriptor descriptor) {
std::unordered_set<sstables::shared_sstable> s(
sstables_to_compact->begin(), sstables_to_compact->end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
}
});
});
@@ -762,6 +764,8 @@ column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tab
auto last = sst->get_last_partition_key(*_schema);
if (belongs_to_current_shard(*_schema, first, last)) {
this->add_sstable(sst);
} else {
sst->mark_for_deletion();
}
return make_ready_future<>();
});
@@ -854,79 +858,79 @@ future<> column_family::populate(sstring sstdir) {
auto verifier = make_lw_shared<std::unordered_map<unsigned long, status>>();
auto descriptor = make_lw_shared<sstable_descriptor>();
return do_with(std::vector<future<>>(), [this, sstdir, verifier, descriptor] (std::vector<future<>>& futures) {
return lister::scan_dir(sstdir, { directory_entry_type::regular }, [this, sstdir, verifier, descriptor, &futures] (directory_entry de) {
// FIXME: The secondary indexes are in this level, but with a directory type, (starting with ".")
auto f = probe_file(sstdir, de.name).then([verifier, descriptor] (auto entry) {
if (verifier->count(entry.generation)) {
if (verifier->at(entry.generation) == status::has_toc_file) {
if (entry.component == sstables::sstable::component_type::TOC) {
throw sstables::malformed_sstable_exception("Invalid State encountered. TOC file already processed");
return do_with(std::vector<future<>>(), [this, sstdir, verifier, descriptor] (std::vector<future<>>& futures) {
return lister::scan_dir(sstdir, { directory_entry_type::regular }, [this, sstdir, verifier, descriptor, &futures] (directory_entry de) {
// FIXME: The secondary indexes are in this level, but with a directory type, (starting with ".")
auto f = probe_file(sstdir, de.name).then([verifier, descriptor] (auto entry) {
if (verifier->count(entry.generation)) {
if (verifier->at(entry.generation) == status::has_toc_file) {
if (entry.component == sstables::sstable::component_type::TOC) {
throw sstables::malformed_sstable_exception("Invalid State encountered. TOC file already processed");
} else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
throw sstables::malformed_sstable_exception("Invalid State encountered. Temporary TOC file found after TOC file was processed");
}
} else if (entry.component == sstables::sstable::component_type::TOC) {
verifier->at(entry.generation) = status::has_toc_file;
} else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
throw sstables::malformed_sstable_exception("Invalid State encountered. Temporary TOC file found after TOC file was processed");
verifier->at(entry.generation) = status::has_temporary_toc_file;
}
} else if (entry.component == sstables::sstable::component_type::TOC) {
verifier->at(entry.generation) = status::has_toc_file;
} else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
verifier->at(entry.generation) = status::has_temporary_toc_file;
}
} else {
if (entry.component == sstables::sstable::component_type::TOC) {
verifier->emplace(entry.generation, status::has_toc_file);
} else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
verifier->emplace(entry.generation, status::has_temporary_toc_file);
} else {
verifier->emplace(entry.generation, status::has_some_file);
if (entry.component == sstables::sstable::component_type::TOC) {
verifier->emplace(entry.generation, status::has_toc_file);
} else if (entry.component == sstables::sstable::component_type::TemporaryTOC) {
verifier->emplace(entry.generation, status::has_temporary_toc_file);
} else {
verifier->emplace(entry.generation, status::has_some_file);
}
}
}
// Retrieve both version and format used for this column family.
if (!descriptor->version) {
descriptor->version = entry.version;
}
if (!descriptor->format) {
descriptor->format = entry.format;
}
});
// push future returned by probe_file into an array of futures,
// so that the supplied callback will not block scan_dir() from
// reading the next entry in the directory.
futures.push_back(std::move(f));
return make_ready_future<>();
}).then([&futures] {
return when_all(futures.begin(), futures.end()).then([] (std::vector<future<>> ret) {
try {
for (auto& f : ret) {
f.get();
// Retrieve both version and format used for this column family.
if (!descriptor->version) {
descriptor->version = entry.version;
}
} catch(...) {
throw;
}
});
}).then([verifier, sstdir, descriptor, this] {
return parallel_for_each(*verifier, [sstdir = std::move(sstdir), descriptor, this] (auto v) {
if (v.second == status::has_temporary_toc_file) {
unsigned long gen = v.first;
assert(descriptor->version);
sstables::sstable::version_types version = descriptor->version.value();
assert(descriptor->format);
sstables::sstable::format_types format = descriptor->format.value();
if (engine().cpu_id() != 0) {
dblog.info("At directory: {}, partial SSTable with generation {} not relevant for this shard, ignoring", sstdir, v.first);
return make_ready_future<>();
if (!descriptor->format) {
descriptor->format = entry.format;
}
// shard 0 is responsible for removing a partial sstable.
return sstables::sstable::remove_sstable_with_temp_toc(_schema->ks_name(), _schema->cf_name(), sstdir, gen, version, format);
} else if (v.second != status::has_toc_file) {
throw sstables::malformed_sstable_exception(sprint("At directory: %s: no TOC found for SSTable with generation %d! Refusing to boot", sstdir, v.first));
}
});
// push future returned by probe_file into an array of futures,
// so that the supplied callback will not block scan_dir() from
// reading the next entry in the directory.
futures.push_back(std::move(f));
return make_ready_future<>();
}).then([&futures] {
return when_all(futures.begin(), futures.end()).then([] (std::vector<future<>> ret) {
try {
for (auto& f : ret) {
f.get();
}
} catch(...) {
throw;
}
});
}).then([verifier, sstdir, descriptor, this] {
return parallel_for_each(*verifier, [sstdir = std::move(sstdir), descriptor, this] (auto v) {
if (v.second == status::has_temporary_toc_file) {
unsigned long gen = v.first;
assert(descriptor->version);
sstables::sstable::version_types version = descriptor->version.value();
assert(descriptor->format);
sstables::sstable::format_types format = descriptor->format.value();
if (engine().cpu_id() != 0) {
dblog.info("At directory: {}, partial SSTable with generation {} not relevant for this shard, ignoring", sstdir, v.first);
return make_ready_future<>();
}
// shard 0 is responsible for removing a partial sstable.
return sstables::sstable::remove_sstable_with_temp_toc(_schema->ks_name(), _schema->cf_name(), sstdir, gen, version, format);
} else if (v.second != status::has_toc_file) {
throw sstables::malformed_sstable_exception(sprint("At directory: %s: no TOC found for SSTable with generation %d! Refusing to boot", sstdir, v.first));
}
return make_ready_future<>();
});
});
});
});
}
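The verifier map in populate() above is a small per-generation state machine: the first file seen for a generation records at least "some file", a temporary TOC upgrades that, a final TOC is terminal, and a second TOC (or a temporary TOC after the final one) for the same generation is an error. A standalone sketch of those transitions (enum and function names are illustrative):

```cpp
#include <stdexcept>
#include <unordered_map>

enum class status { has_some_file, has_temporary_toc_file, has_toc_file };
enum class component { toc, temporary_toc, other };

void record(std::unordered_map<long, status>& verifier, long gen, component c) {
    auto it = verifier.find(gen);
    if (it == verifier.end()) {
        // First component seen for this generation.
        status s = c == component::toc ? status::has_toc_file
                 : c == component::temporary_toc ? status::has_temporary_toc_file
                 : status::has_some_file;
        verifier.emplace(gen, s);
        return;
    }
    if (it->second == status::has_toc_file) {
        // The TOC was already processed; any further TOC-like component
        // for the same generation means the directory is inconsistent.
        if (c != component::other) {
            throw std::runtime_error("inconsistent TOC state");
        }
    } else if (c == component::toc) {
        it->second = status::has_toc_file;
    } else if (c == component::temporary_toc) {
        it->second = status::has_temporary_toc_file;
    }
}
```

After the scan, any generation left in `has_temporary_toc_file` is a partial sstable to be cleaned up, and one without a TOC at all is grounds for refusing to boot — exactly the post-pass in the hunk above.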
utils::UUID database::empty_version = utils::UUID_gen::get_name_UUID(bytes{});
@@ -1143,7 +1147,7 @@ void database::add_keyspace(sstring name, keyspace k) {
}
void database::update_keyspace(const sstring& name) {
throw std::runtime_error("not implemented");
throw std::runtime_error("update keyspace not implemented");
}
void database::drop_keyspace(const sstring& name) {
@@ -1945,7 +1949,7 @@ future<> column_family::snapshot(sstring name) {
}
future<bool> column_family::snapshot_exists(sstring tag) {
sstring jsondir = _config.datadir + "/snapshots/";
sstring jsondir = _config.datadir + "/snapshots/" + tag;
return engine().open_directory(std::move(jsondir)).then_wrapped([] (future<file> f) {
try {
f.get0();


@@ -194,8 +194,7 @@ private:
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
partition_presence_checker make_partition_presence_checker(lw_shared_ptr<sstable_list> old_sstables);
// We will use highres because hopefully it won't take more than a few usecs
std::chrono::high_resolution_clock::time_point _sstable_writes_disabled_at;
std::chrono::steady_clock::time_point _sstable_writes_disabled_at;
public:
// Creates a mutation reader which covers all data sources for this column family.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
@@ -216,6 +215,10 @@ public:
return _cache;
}
row_cache& get_row_cache() {
return _cache;
}
logalloc::occupancy_stats occupancy() const;
public:
column_family(schema_ptr schema, config cfg, db::commitlog& cl, compaction_manager&);
@@ -247,7 +250,7 @@ public:
// to call this separately in all shards first, to guarantee that none of them are writing
// new data before you can safely assume that the whole node is disabled.
future<int64_t> disable_sstable_write() {
_sstable_writes_disabled_at = std::chrono::high_resolution_clock::now();
_sstable_writes_disabled_at = std::chrono::steady_clock::now();
return _sstables_lock.write_lock().then([this] {
return make_ready_future<int64_t>((*_sstables->end()).first);
});
@@ -255,10 +258,10 @@ public:
// SSTable writes are now allowed again, and generation is updated to new_generation
// returns the amount of microseconds elapsed since we disabled writes.
std::chrono::high_resolution_clock::duration enable_sstable_write(int64_t new_generation) {
std::chrono::steady_clock::duration enable_sstable_write(int64_t new_generation) {
update_sstables_known_generation(new_generation);
_sstables_lock.write_unlock();
return std::chrono::high_resolution_clock::now() - _sstable_writes_disabled_at;
return std::chrono::steady_clock::now() - _sstable_writes_disabled_at;
}
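The hunks above replace `high_resolution_clock` with `steady_clock` for the disable/enable timing. The distinction matters: `steady_clock` is guaranteed monotonic, while `high_resolution_clock` may be an alias for `system_clock` on some standard libraries and can jump backwards when the wall clock is adjusted, yielding negative "elapsed" durations. A minimal sketch of measuring an interval the safe way:

```cpp
#include <chrono>
#include <thread>

// steady_clock is monotonic, so the subtraction below can never go
// negative even if the system time is changed mid-measurement.
std::chrono::steady_clock::duration measure_sleep() {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return std::chrono::steady_clock::now() - start;
}
```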
// Make sure the generation numbers are sequential, starting from "start".
@@ -321,6 +324,10 @@ public:
return _stats;
}
compaction_manager& get_compaction_manager() const {
return _compaction_manager;
}
template<typename Func, typename Result = futurize_t<std::result_of_t<Func()>>>
Result run_with_compaction_disabled(Func && func) {
++_compaction_disabled;
@@ -562,6 +569,9 @@ public:
return _commitlog.get();
}
compaction_manager& get_compaction_manager() {
return _compaction_manager;
}
const compaction_manager& get_compaction_manager() const {
return _compaction_manager;
}


@@ -35,8 +35,8 @@ class column_definition;
// keys.hh
class exploded_clustering_prefix;
class partition_key;
class clustering_key;
class clustering_key_prefix;
using clustering_key = clustering_key_prefix;
// memtable.hh
class memtable;


@@ -56,6 +56,7 @@
#include "unimplemented.hh"
#include "db/config.hh"
#include "gms/failure_detector.hh"
#include "service/storage_service.hh"
static logging::logger logger("batchlog_manager");
@@ -87,10 +88,8 @@ future<> db::batchlog_manager::start() {
);
});
});
_timer.arm(
lowres_clock::now()
+ std::chrono::milliseconds(
service::storage_service::RING_DELAY));
auto ring_delay = service::get_local_storage_service().get_ring_delay();
_timer.arm(lowres_clock::now() + ring_delay);
}
return make_ready_future<>();
}
@@ -115,7 +114,7 @@ mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<muta
mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<mutation>& mutations, const utils::UUID& id, int32_t version, db_clock::time_point now) {
auto schema = _qp.db().local().find_schema(system_keyspace::NAME, system_keyspace::BATCHLOG);
auto key = partition_key::from_singular(*schema, id);
auto timestamp = db_clock::now_in_usecs();
auto timestamp = api::new_timestamp();
auto data = [this, &mutations] {
std::vector<frozen_mutation> fm(mutations.begin(), mutations.end());
const auto size = std::accumulate(fm.begin(), fm.end(), size_t(0), [](size_t s, auto& m) {


@@ -281,6 +281,43 @@ private:
* A single commit log file on disk. Manages creation of the file and writing mutations to disk,
* as well as tracking the last mutation position of any "dirty" CFs covered by the segment file. Segment
* files are initially allocated to a fixed size and can grow to accommodate a larger value if necessary.
*
* The IO flow is somewhat convoluted and goes something like this:
*
* Mutation path:
* - Adding data to the segment usually writes into the internal buffer
* - On EOB or overflow we issue a write to disk ("cycle").
* - A cycle call will acquire the segment read lock and send the
* buffer to the corresponding position in the file
* - If we are periodic and crossed a timing threshold, or running "batch" mode
* we might be forced to issue a flush ("sync") after adding data
* - A sync call acquires the write lock, thus locking out writes
* and waiting for pending writes to finish. It then checks the
* high data mark, and issues the actual file flush.
* Note that the write lock is released prior to issuing the
* actual file flush, thus we are allowed to write data
* after a flush point concurrently with a pending flush.
*
* Sync timer:
* - In periodic mode, we try to primarily issue sync calls in
* a timer task issued every N seconds. The timer does the same
* operation as the above described sync, and resets the timeout
* so that mutation path will not trigger syncs and delay.
*
* Note that we do not care which order segment chunks finish writing
* to disk, other than that all writes below a flush point must finish before flushing.
*
* We currently do not wait for flushes to finish before issuing the next
* cycle call ("after" flush point in the file). This might not be optimal.
*
* To close and finish a segment, we first close the gate object that guards
* writing data to it, then flush it fully (including waiting for futures created
* by the timer to run their course), and finally wait for it to
* become "clean", i.e. get notified that all mutations it holds have been
* persisted to sstables elsewhere. Once this is done, we can delete the
* segment. If a segment (object) is deleted without being fully clean, we
* do not remove the file on disk.
*
*/
class db::commitlog::segment: public enable_lw_shared_from_this<segment> {
@@ -370,6 +407,7 @@ public:
void reset_sync_time() {
_sync_time = clock_type::now();
}
// See class comment for info
future<sseg_ptr> sync() {
// Note: this is not a marker for when sync was finished.
// It is when it was initiated
@@ -386,6 +424,7 @@ public:
future<> shutdown() {
return _gate.close();
}
// See class comment for info
future<sseg_ptr> flush(uint64_t pos = 0) {
auto me = shared_from_this();
assert(!me.owned());
@@ -431,6 +470,7 @@ public:
/**
* Send any buffer contents to disk and get a new tmp buffer
*/
// See class comment for info
future<sseg_ptr> cycle(size_t s = 0) {
auto size = clear_buffer_slack();
auto buf = std::move(_buffer);
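The locking scheme described in the segment comment above (cycle on the read side, sync on the write side, lock dropped before the actual flush) can be modeled as a toy sketch. All names here, and the use of `std::shared_mutex`, are illustrative assumptions, not the tree's actual seastar primitives:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Toy model: "cycle" (buffer write-out) holds the read side, so many
// cycles can run concurrently; "sync" (flush) takes the write side to
// drain pending cycles, records the high-water mark, and releases the
// lock *before* the (here simulated) file flush, so new cycles may write
// past the flush point while the flush is in flight.
struct segment_model {
    std::shared_mutex lock;
    std::atomic<uint64_t> file_pos{0};
    uint64_t flushed_to = 0;

    // Mutation path: send a buffer to its position in the "file".
    uint64_t cycle(uint64_t bytes) {
        std::shared_lock<std::shared_mutex> g(lock);
        return file_pos.fetch_add(bytes) + bytes;
    }

    // Timer/forced flush: wait out pending cycles, then flush to the mark.
    uint64_t sync() {
        uint64_t high;
        {
            std::unique_lock<std::shared_mutex> g(lock);
            high = file_pos.load();
        } // write lock released before the actual flush
        flushed_to = high; // stands in for the real file flush
        return high;
    }
};
```

The design point the comment makes is visible here: because `sync` drops the write lock before flushing, a `cycle` issued during the flush lands beyond `flushed_to` without blocking.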


@@ -117,8 +117,9 @@ template<typename K, typename V>
struct convert<std::unordered_map<K, V>> {
static Node encode(const std::unordered_map<K, V>& rhs) {
Node node(NodeType::Map);
for(typename std::map<K, V>::const_iterator it=rhs.begin();it!=rhs.end();++it)
node.force_insert(it->first, it->second);
for (auto& p : rhs) {
node.force_insert(p.first, p.second);
}
return node;
}
static bool decode(const Node& node, std::unordered_map<K, V>& rhs) {
@@ -413,3 +414,21 @@ future<> db::config::read_from_file(const sstring& filename) {
return read_from_file(std::move(f));
});
}
boost::filesystem::path db::config::get_conf_dir() {
using namespace boost::filesystem;
path confdir;
auto* cd = std::getenv("SCYLLA_CONF");
if (cd != nullptr) {
confdir = path(cd);
} else {
auto* p = std::getenv("SCYLLA_HOME");
if (p != nullptr) {
confdir = path(p);
}
confdir /= "conf";
}
return confdir;
}
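The `get_conf_dir()` lookup moved out of the header above can be sketched without boost::filesystem; `std::string` here is an illustrative stand-in for `path`, and the fall-through to a relative `"conf"` when neither variable is set mirrors the `confdir /= "conf"` in the original:

```cpp
#include <cstdlib>
#include <string>

// Sketch of the configuration-directory lookup: SCYLLA_CONF, when set,
// wins outright; otherwise "conf" is appended to SCYLLA_HOME, or used as
// a relative path when neither environment variable is set.
std::string get_conf_dir() {
    std::string confdir;
    if (const char* cd = std::getenv("SCYLLA_CONF")) {
        confdir = cd;
    } else {
        if (const char* home = std::getenv("SCYLLA_HOME")) {
            confdir = std::string(home) + "/";
        }
        confdir += "conf";
    }
    return confdir;
}
```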


@@ -121,23 +121,7 @@ public:
* @return path of the directory where configuration files are located
* according the environment variables definitions.
*/
static boost::filesystem::path get_conf_dir() {
using namespace boost::filesystem;
path confdir;
auto* cd = std::getenv("SCYLLA_CONF");
if (cd != nullptr) {
confdir = path(cd);
} else {
auto* p = std::getenv("SCYLLA_HOME");
if (p != nullptr) {
confdir = path(p);
}
confdir /= "conf";
}
return confdir;
}
static boost::filesystem::path get_conf_dir();
typedef std::unordered_map<sstring, sstring> string_map;
typedef std::vector<sstring> string_list;
@@ -682,7 +666,7 @@ public:
val(permissions_update_interval_in_ms, uint32_t, 2000, Unused, \
"Refresh interval for permissions cache (if enabled). After this interval, cache entries become eligible for refresh. On next access, an async reload is scheduled and the old value is returned until it completes. If permissions_validity_in_ms is non-zero, then this property must be non-zero." \
) \
val(server_encryption_options, string_map, /*none*/, Unused, \
val(server_encryption_options, string_map, /*none*/, Used, \
"Enable or disable inter-node encryption. You must also generate keys and provide the appropriate key and trust store locations and passwords. No custom encryption options are currently enabled. The available options are:\n" \
"\n" \
"internode_encryption : (Default: none ) Enable or disable encryption of inter-node communication using the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite for authentication, key exchange, and encryption of data transfers. The available inter-node options are:\n" \
@@ -690,20 +674,9 @@ public:
"\tnone : No encryption.\n" \
"\tdc : Encrypt the traffic between the data centers (server only).\n" \
"\track : Encrypt the traffic between the racks (server only).\n" \
"\tkeystore : (Default: conf/.keystore ) The location of a Java keystore (JKS) suitable for use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.\n" \
"\tkeystore_password : (Default: cassandra ) Password for the keystore.\n" \
"\ttruststore : (Default: conf/.truststore ) Location of the truststore containing the trusted certificate for authenticating remote servers.\n" \
"\ttruststore_password : (Default: cassandra ) Password for the truststore.\n" \
"\n" \
"The passwords used in these options must match the passwords used when generating the keystore and truststore. For instructions on generating these files, see Creating a Keystore to Use with JSSE.\n" \
"\n" \
"The advanced settings are:\n" \
"\n" \
"\tprotocol : (Default: TLS )\n" \
"\talgorithm : (Default: SunX509 )\n" \
"\tstore_type : (Default: JKS )\n" \
"\tcipher_suites : (Default: TLS_RSA_WITH_AES_128_CBC_SHA , TLS_RSA_WITH_AES_256_CBC_SHA )\n" \
"\trequire_client_auth : (Default: false ) Enables or disables certificate authentication.\n" \
"certificate : (Default: conf/scylla.crt) The location of a PEM-encoded x509 certificate used to identify and encrypt the internode communication.\n" \
"keyfile : (Default: conf/scylla.key) PEM Key file associated with certificate.\n" \
"truststore : (Default: <system truststore> ) Location of the truststore containing the trusted certificate for authenticating remote servers.\n" \
"Related information: Node-to-node encryption" \
) \
val(client_encryption_options, string_map, /*none*/, Unused, \
@@ -750,6 +723,9 @@ public:
val(replace_token, sstring, "", Used, "The tokens of the node to replace. Same as -Dcassandra.replace_token in cassandra.") \
val(replace_address, sstring, "", Used, "The listen_address or broadcast_address of the dead node to replace. Same as -Dcassandra.replace_address.") \
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \


@@ -1269,7 +1269,7 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
} else {
// FIXME:
// is_dense = CFMetaData.calculateIsDense(fullRawComparator, columnDefs);
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
bool is_compound = cell_comparator::check_compound(table_row.get_nonnull<sstring>("comparator"));


@@ -187,30 +187,6 @@ void db::serializer<partition_key_view>::skip(input& in) {
in.skip(len);
}
template<>
db::serializer<clustering_key_view>::serializer(const clustering_key_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<clustering_key_view>::write(output& out, const clustering_key_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<clustering_key_view>::read(clustering_key_view& b, input& in) {
auto len = in.read<uint16_t>();
b = clustering_key_view::from_bytes(in.read_view(len));
}
template<>
clustering_key_view db::serializer<clustering_key_view>::read(input& in) {
auto len = in.read<uint16_t>();
return clustering_key_view::from_bytes(in.read_view(len));
}
template<>
db::serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
@@ -281,7 +257,6 @@ template class db::serializer<atomic_cell_view> ;
template class db::serializer<collection_mutation_view> ;
template class db::serializer<utils::UUID> ;
template class db::serializer<partition_key_view> ;
template class db::serializer<clustering_key_view> ;
template class db::serializer<clustering_key_prefix_view> ;
template class db::serializer<frozen_mutation> ;
template class db::serializer<db::replay_position> ;


@@ -22,6 +22,8 @@
#ifndef DB_SERIALIZER_HH_
#define DB_SERIALIZER_HH_
#include <experimental/optional>
#include "utils/data_input.hh"
#include "utils/data_output.hh"
#include "bytes_ostream.hh"
@@ -57,9 +59,9 @@ public:
return *this;
}
static void write(output&, const T&);
static void read(T&, input&);
static T read(input&);
static void write(output&, const type&);
static void read(type&, input&);
static type read(input&);
static void skip(input& in);
size_t size() const {
@@ -75,11 +77,100 @@ public:
void write(data_output& out) const {
write(out, _item);
}
bytes to_bytes() const {
bytes b(bytes::initialized_later(), _size);
data_output out(b);
write(out);
return b;
}
static type from_bytes(bytes_view v) {
data_input in(v);
return read(in);
}
private:
const T& _item;
const type& _item;
size_t _size;
};
template<typename T>
class serializer<std::experimental::optional<T>> {
public:
typedef std::experimental::optional<T> type;
typedef data_output output;
typedef data_input input;
typedef serializer<T> _MyType;
serializer(const type& t)
: _item(t)
, _size(output::serialized_size<bool>() + (t ? serializer<T>(*t).size() : 0))
{}
// apply to memory, must be at least size() large.
const _MyType& operator()(output& out) const {
write(out, _item);
return *this;
}
static void write(output& out, const type& v) {
bool en = v;
out.write<bool>(en);
if (en) {
serializer<T>::write(out, *v);
}
}
static void read(type& dst, input& in) {
auto en = in.read<bool>();
if (en) {
dst = serializer<T>::read(in);
} else {
dst = {};
}
}
static type read(input& in) {
type t;
read(t, in);
return t;
}
static void skip(input& in) {
auto en = in.read<bool>();
if (en) {
serializer<T>::skip(in);
}
}
size_t size() const {
return _size;
}
void write(bytes_ostream& out) const {
auto buf = out.write_place_holder(_size);
data_output data_out((char*)buf, _size);
write(data_out, _item);
}
void write(data_output& out) const {
write(out, _item);
}
bytes to_bytes() const {
bytes b(bytes::initialized_later(), _size);
data_output out(b);
write(out);
return b;
}
static type from_bytes(bytes_view v) {
data_input in(v);
return read(in);
}
private:
const std::experimental::optional<T> _item;
size_t _size;
};
template<> serializer<utils::UUID>::serializer(const utils::UUID &);
template<> void serializer<utils::UUID>::write(output&, const type&);
template<> void serializer<utils::UUID>::read(utils::UUID&, input&);
@@ -123,11 +214,6 @@ template<> void serializer<partition_key_view>::read(partition_key_view&, input&
template<> partition_key_view serializer<partition_key_view>::read(input&);
template<> void serializer<partition_key_view>::skip(input&);
template<> serializer<clustering_key_view>::serializer(const clustering_key_view &);
template<> void serializer<clustering_key_view>::write(output&, const clustering_key_view&);
template<> void serializer<clustering_key_view>::read(clustering_key_view&, input&);
template<> clustering_key_view serializer<clustering_key_view>::read(input&);
template<> serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view &);
template<> void serializer<clustering_key_prefix_view>::write(output&, const clustering_key_prefix_view&);
template<> void serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view&, input&);
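The `serializer<std::experimental::optional<T>>` specialization added above writes one bool "engaged" flag followed by the payload only when engaged. A minimal sketch of that wire format, shown for `uint32_t` with `std::optional` and a plain byte vector in place of the tree's `data_output`/`data_input` types (assumptions here):

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Encode: bool flag first, payload only when the optional is engaged.
void write_opt_u32(std::vector<uint8_t>& out, const std::optional<uint32_t>& v) {
    out.push_back(v.has_value() ? 1 : 0); // the bool flag
    if (v) {
        uint8_t buf[4];
        std::memcpy(buf, &*v, sizeof(buf));
        out.insert(out.end(), buf, buf + sizeof(buf));
    }
}

// Decode: read the flag, then the payload only when it was set,
// mirroring the "dst = serializer<T>::read(in)" / "dst = {}" branches.
std::optional<uint32_t> read_opt_u32(const uint8_t*& in) {
    bool engaged = *in++ != 0;
    if (!engaged) {
        return std::nullopt;
    }
    uint32_t v;
    std::memcpy(&v, in, sizeof(v));
    in += sizeof(v);
    return v;
}
```

A disengaged value thus costs one byte on the wire, matching the `size()` computation above (`serialized_size<bool>()` plus the payload size only when engaged).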


@@ -464,7 +464,8 @@ static future<> build_bootstrap_info() {
static auto state_map = std::unordered_map<sstring, bootstrap_state>({
{ "NEEDS_BOOTSTRAP", bootstrap_state::NEEDS_BOOTSTRAP },
{ "COMPLETED", bootstrap_state::COMPLETED },
{ "IN_PROGRESS", bootstrap_state::IN_PROGRESS }
{ "IN_PROGRESS", bootstrap_state::IN_PROGRESS },
{ "DECOMMISSIONED", bootstrap_state::DECOMMISSIONED }
});
bootstrap_state state = bootstrap_state::NEEDS_BOOTSTRAP;
@@ -796,6 +797,8 @@ future<> remove_endpoint(gms::inet_address ep) {
}).then([ep] {
sstring req = "DELETE FROM system.%s WHERE peer = ?";
return execute_cql(req, PEERS, ep.addr()).discard_result();
}).then([] {
return force_blocking_flush(PEERS);
});
}
@@ -874,6 +877,10 @@ bool bootstrap_in_progress() {
return get_bootstrap_state() == bootstrap_state::IN_PROGRESS;
}
bool was_decommissioned() {
return get_bootstrap_state() == bootstrap_state::DECOMMISSIONED;
}
bootstrap_state get_bootstrap_state() {
return _local_cache.local()._state;
}
@@ -882,7 +889,8 @@ future<> set_bootstrap_state(bootstrap_state state) {
static std::unordered_map<bootstrap_state, sstring, enum_hash<bootstrap_state>> state_to_name({
{ bootstrap_state::NEEDS_BOOTSTRAP, "NEEDS_BOOTSTRAP" },
{ bootstrap_state::COMPLETED, "COMPLETED" },
{ bootstrap_state::IN_PROGRESS, "IN_PROGRESS" }
{ bootstrap_state::IN_PROGRESS, "IN_PROGRESS" },
{ bootstrap_state::DECOMMISSIONED, "DECOMMISSIONED" }
});
sstring state_name = state_to_name.at(state);
@@ -1002,5 +1010,55 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name, const
});
}
static map_type_impl::native_type prepare_rows_merged(std::unordered_map<int32_t, int64_t>& rows_merged) {
map_type_impl::native_type tmp;
for (auto& r: rows_merged) {
int32_t first = r.first;
int64_t second = r.second;
auto map_element = std::make_pair<data_value, data_value>(data_value(first), data_value(second));
tmp.push_back(std::move(map_element));
}
return tmp;
}
future<> update_compaction_history(sstring ksname, sstring cfname, int64_t compacted_at, int64_t bytes_in, int64_t bytes_out,
std::unordered_map<int32_t, int64_t> rows_merged)
{
// don't write anything when the history table itself is compacted, since that would in turn cause new compactions
if (ksname == "system" && cfname == COMPACTION_HISTORY) {
return make_ready_future<>();
}
auto map_type = map_type_impl::get_instance(int32_type, long_type, true);
sstring req = "INSERT INTO system.%s (id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out, rows_merged) VALUES (?, ?, ?, ?, ?, ?, ?)";
return execute_cql(req, COMPACTION_HISTORY, utils::UUID_gen::get_time_UUID(), ksname, cfname, compacted_at, bytes_in, bytes_out,
make_map_value(map_type, prepare_rows_merged(rows_merged))).discard_result();
}
future<std::vector<compaction_history_entry>> get_compaction_history()
{
sstring req = "SELECT * from system.%s";
return execute_cql(req, COMPACTION_HISTORY).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
std::vector<compaction_history_entry> history;
for (auto& row : *msg) {
compaction_history_entry entry;
entry.id = row.get_as<utils::UUID>("id");
entry.ks = row.get_as<sstring>("keyspace_name");
entry.cf = row.get_as<sstring>("columnfamily_name");
entry.compacted_at = row.get_as<int64_t>("compacted_at");
entry.bytes_in = row.get_as<int64_t>("bytes_in");
entry.bytes_out = row.get_as<int64_t>("bytes_out");
if (row.has("rows_merged")) {
entry.rows_merged = row.get_map<int32_t, int64_t>("rows_merged");
}
history.push_back(std::move(entry));
}
return std::move(history);
});
}
} // namespace system_keyspace
} // namespace db


@@ -153,7 +153,8 @@ load_dc_rack_info();
enum class bootstrap_state {
NEEDS_BOOTSTRAP,
COMPLETED,
IN_PROGRESS
IN_PROGRESS,
DECOMMISSIONED
};
#if 0
@@ -258,26 +259,28 @@ enum class bootstrap_state {
compactionLog.truncateBlocking();
}
public static void updateCompactionHistory(String ksname,
String cfname,
long compactedAt,
long bytesIn,
long bytesOut,
Map<Integer, Long> rowsMerged)
{
// don't write anything when the history table itself is compacted, since that would in turn cause new compactions
if (ksname.equals("system") && cfname.equals(COMPACTION_HISTORY))
return;
String req = "INSERT INTO system.%s (id, keyspace_name, columnfamily_name, compacted_at, bytes_in, bytes_out, rows_merged) VALUES (?, ?, ?, ?, ?, ?, ?)";
executeInternal(String.format(req, COMPACTION_HISTORY), UUIDGen.getTimeUUID(), ksname, cfname, ByteBufferUtil.bytes(compactedAt), bytesIn, bytesOut, rowsMerged);
}
public static TabularData getCompactionHistory() throws OpenDataException
{
UntypedResultSet queryResultSet = executeInternal(String.format("SELECT * from system.%s", COMPACTION_HISTORY));
return CompactionHistoryTabularData.from(queryResultSet);
}
#endif
struct compaction_history_entry {
utils::UUID id;
sstring ks;
sstring cf;
int64_t compacted_at = 0;
int64_t bytes_in = 0;
int64_t bytes_out = 0;
// Key: number of rows merged
// Value: counter
std::unordered_map<int32_t, int64_t> rows_merged;
};
future<> update_compaction_history(sstring ksname, sstring cfname, int64_t compacted_at, int64_t bytes_in, int64_t bytes_out,
std::unordered_map<int32_t, int64_t> rows_merged);
future<std::vector<compaction_history_entry>> get_compaction_history();
typedef std::vector<db::replay_position> replay_positions;
future<> save_truncation_record(const column_family&, db_clock::time_point truncated_at, db::replay_position);
@@ -519,6 +522,7 @@ enum class bootstrap_state {
bool bootstrap_complete();
bool bootstrap_in_progress();
bootstrap_state get_bootstrap_state();
bool was_decommissioned();
future<> set_bootstrap_state(bootstrap_state state);
#if 0


@@ -34,12 +34,12 @@ token byte_ordered_partitioner::get_random_token()
std::map<token, float> byte_ordered_partitioner::describe_ownership(const std::vector<token>& sorted_tokens)
{
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
token byte_ordered_partitioner::midpoint(const token& t1, const token& t2) const
{
throw std::runtime_error("not implemented");
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
unsigned


@@ -386,12 +386,22 @@ public:
friend std::ostream& operator<<(std::ostream&, const ring_position&);
};
// Trichotomic comparator for ring_position
struct ring_position_comparator {
const schema& s;
ring_position_comparator(const schema& s_) : s(s_) {}
int operator()(const ring_position& lh, const ring_position& rh) const;
};
// "less" comparator for ring_position
struct ring_position_less_comparator {
const schema& s;
ring_position_less_comparator(const schema& s_) : s(s_) {}
bool operator()(const ring_position& lh, const ring_position& rh) const {
return lh.less_compare(s, rh);
}
};
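The two comparator structs added above relate in a standard way: given a trichotomic comparator returning `<0 / 0 / >0`, the "less" predicate usable with `std::sort` or ordered containers is simply `tri(a, b) < 0`. Illustrated here with plain ints rather than `ring_position`:

```cpp
#include <algorithm>
#include <vector>

// Trichotomic comparator: negative, zero, or positive result.
struct tri_cmp {
    int operator()(int a, int b) const { return (a > b) - (a < b); }
};

// "less" comparator derived from the trichotomic one.
struct less_cmp {
    tri_cmp tri;
    bool operator()(int a, int b) const { return tri(a, b) < 0; }
};
```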
struct token_comparator {
// Return values are those of a trichotomic comparison.
int operator()(const token& t1, const token& t2) const;


@@ -88,18 +88,8 @@ inline int64_t long_token(const token& t) {
return net::ntoh(*lp);
}
// XXX: Technically, this should be inside long_token. However, long_token is
// used quite a lot in hot paths, so it is better to keep the branches off, if
// we can. Most of our comparators will check for _kind separately,
// so this should be fine.
sstring murmur3_partitioner::to_sstring(const token& t) const {
int64_t lt;
if (t._kind == dht::token::kind::before_all_keys) {
lt = std::numeric_limits<long>::min();
} else {
lt = long_token(t);
}
return ::to_sstring(lt);
return ::to_sstring(long_token(t));
}
dht::token murmur3_partitioner::from_sstring(const sstring& t) const {
@@ -122,17 +112,35 @@ int murmur3_partitioner::tri_compare(const token& t1, const token& t2) {
}
}
// Assuming that x>=y, return the positive difference x-y.
// The return type is an unsigned type, as the difference may overflow
// a signed type (e.g., consider very positive x and very negative y).
template <typename T>
static std::make_unsigned_t<T> positive_subtract(T x, T y) {
return std::make_unsigned_t<T>(x) - std::make_unsigned_t<T>(y);
}
token murmur3_partitioner::midpoint(const token& t1, const token& t2) const {
auto l1 = long_token(t1);
auto l2 = long_token(t2);
// long_token is defined as signed, but the arithmetic works out the same
// without invoking undefined behavior with a signed type.
auto delta = (uint64_t(l2) - uint64_t(l1)) / 2;
if (l1 > l2) {
// wraparound
delta += 0x8000'0000'0000'0000;
int64_t mid;
if (l1 <= l2) {
// To find the midpoint, we cannot use the trivial formula (l1+l2)/2
// because the addition can overflow the integer. To avoid this
// overflow, we first notice that the above formula is equivalent to
// l1 + (l2-l1)/2. Now, "l2-l1" can still overflow a signed integer
// (e.g., think of a very positive l2 and very negative l1), but
// because l1 <= l2 in this branch, we note that l2-l1 is positive
// and fits an *unsigned* int's range. So,
mid = l1 + positive_subtract(l2, l1)/2;
} else {
// When l2 < l1, we need to switch l1 and l2 in the above
// formula, because now l1 - l2 is positive.
// Additionally, we consider this case a "wrap around", so we need
// to behave as if l2 + 2^64 was meant instead of l2, i.e., add 2^63
// to the average.
mid = l2 + positive_subtract(l1, l2)/2 + 0x8000'0000'0000'0000;
}
auto mid = uint64_t(l1) + delta;
return get_token(mid);
}
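The overflow-avoiding midpoint above can be exercised in isolation. `ring_midpoint` below is a hypothetical free-function rendering of `murmur3_partitioner::midpoint`, with the `token` wrapper and `get_token()` stripped away:

```cpp
#include <cstdint>
#include <type_traits>

// Assuming x >= y, return the positive difference x - y. The result may
// overflow the signed type but always fits the unsigned one.
template <typename T>
static std::make_unsigned_t<T> positive_subtract(T x, T y) {
    return std::make_unsigned_t<T>(x) - std::make_unsigned_t<T>(y);
}

int64_t ring_midpoint(int64_t l1, int64_t l2) {
    if (l1 <= l2) {
        // l1 + (l2 - l1)/2, with the subtraction done unsigned so a very
        // positive l2 and a very negative l1 cannot overflow.
        return int64_t(l1 + positive_subtract(l2, l1) / 2);
    }
    // Wrap-around case: swap the operands and add 2^63 to the average.
    return int64_t(l2 + positive_subtract(l1, l2) / 2 + 0x8000'0000'0000'0000);
}
```

For example, the wrap-around branch means the midpoint of tokens 1 and -1 is the point halfway around the ring, not 0.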

dist/ami/build_ami.sh vendored

@@ -5,6 +5,16 @@ if [ ! -e dist/ami/build_ami.sh ]; then
exit 1
fi
TARGET_JSON=scylla.json
if [ "$1" != "" ]; then
TARGET_JSON=$1
fi
if [ ! -f dist/ami/$TARGET_JSON ]; then
echo "dist/ami/$TARGET_JSON not found"
exit 1
fi
cd dist/ami
if [ ! -f variables.json ]; then
@@ -20,4 +30,4 @@ if [ ! -d packer ]; then
cd -
fi
packer/packer build -var-file=variables.json scylla.json
packer/packer build -var-file=variables.json $TARGET_JSON

dist/ami/build_ami_local.sh vendored Executable file

@@ -0,0 +1,30 @@
#!/bin/sh -e
if [ ! -e dist/ami/build_ami_local.sh ]; then
echo "run build_ami_local.sh in top of scylla dir"
exit 1
fi
sudo yum -y install git
if [ ! -f dist/ami/scylla-server.x86_64.rpm ]; then
dist/redhat/build_rpm.sh
cp build/rpms/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/scylla-server.x86_64.rpm
fi
if [ ! -f dist/ami/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
sh -x -e dist/redhat/build_rpm.sh
cd ../..
cp build/scylla-jmx/build/rpms/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/scylla-tools.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
sh -x -e dist/redhat/build_rpm.sh
cd ../..
cp build/scylla-tools-java/build/rpms/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-tools.noarch.rpm
fi
exec dist/ami/build_ami.sh scylla_local.json

dist/ami/files/.bash_profile vendored Normal file

@@ -0,0 +1,45 @@
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
echo
echo ' _____ _ _ _____ ____ '
echo ' / ____| | | | | __ \| _ \ '
echo ' | (___ ___ _ _| | | __ _| | | | |_) |'
echo ' \___ \ / __| | | | | |/ _` | | | | _ < '
echo ' ____) | (__| |_| | | | (_| | |__| | |_) |'
echo ' |_____/ \___|\__, |_|_|\__,_|_____/|____/ '
echo ' __/ | '
echo ' |___/ '
echo ''
echo ''
echo 'Nodetool:'
echo ' nodetool --help'
echo 'CQL Shell:'
echo ' cqlsh'
echo 'More documentation available at: '
echo ' http://www.scylladb.com/doc/'
echo
if [ "`systemctl is-active scylla-server`" = "active" ]; then
tput setaf 4
tput bold
echo " ScyllaDB is active."
tput sgr0
else
tput setaf 1
tput bold
echo " ScyllaDB is not started!"
tput sgr0
echo "Please wait for startup. To see status of ScyllaDB, run "
echo " 'systemctl status scylla-server'"
fi


@@ -1,5 +0,0 @@
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=16G
ExternalSizeMax=16G


@@ -1,11 +0,0 @@
[Unit]
Description=Scylla Setup
After=network.target
[Service]
Type=oneshot
ExecStart=/usr/lib/scylla/scylla-setup.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target


@@ -1,56 +0,0 @@
#!/bin/sh -e
RAIDCNT=`grep xvdb /proc/mdstat | wc -l`
RAIDDEV=`grep xvdb /proc/mdstat | awk '{print $1}'`
if [ $RAIDCNT -ge 1 ]; then
echo "RAID already constructed."
mount -o noatime /dev/$RAIDDEV /var/lib/scylla
else
echo "RAID is not constructed, going to initialize..."
dnf update -y
DISKS=""
NR=0
for i in xvd{b..z}; do
if [ -b /dev/$i ];then
echo Found disk /dev/$i
DISKS="$DISKS /dev/$i"
NR=$((NR+1))
fi
done
echo Creating RAID0 for scylla using $NR disk\(s\): $DISKS
if [ $NR -ge 1 ]; then
mdadm --create --verbose --force --run /dev/md0 --level=0 -c256 --raid-devices=$NR $DISKS
blockdev --setra 65536 /dev/md0
mkfs.xfs /dev/md0 -f
echo "DEVICE $DISKS" > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
UUID=`blkid /dev/md0 | awk '{print $2}'`
mount -o noatime /dev/md0 /var/lib/scylla
else
echo "WARN: Scylla is not using XFS to store data. Performance will suffer." > /home/fedora/WARN_PLEASE_READ.TXT
fi
mkdir -p /var/lib/scylla/data
mkdir -p /var/lib/scylla/commitlog
chown scylla:scylla /var/lib/scylla/*
chown scylla:scylla /var/lib/scylla/
CPU_NR=`cat /proc/cpuinfo |grep processor|wc -l`
if [ $CPU_NR -ge 8 ]; then
NR=$((CPU_NR - 1))
grep -v SCYLLA_ARGS /etc/sysconfig/scylla-server | grep -v SET_NIC > /tmp/scylla-server
echo SCYLLA_ARGS=\"--cpuset 1-$NR --smp $NR\" >> /tmp/scylla-server
echo SET_NIC=\"yes\" >> /tmp/scylla-server
mv /tmp/scylla-server /etc/sysconfig/scylla-server
fi
/usr/lib/scylla/scylla-ami/ds2_configure.py
fi
systemctl start scylla-server.service
systemctl start scylla-jmx.service


@@ -1,11 +0,0 @@
[scylla]
name=Scylla for Fedora $releasever - $basearch
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/fedora/$releasever/$basearch/
enabled=1
gpgcheck=0
[scylla-generic]
name=Scylla for Fedora $releasever
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/fedora/$releasever/noarch/
enabled=1
gpgcheck=0


@@ -1,18 +0,0 @@
#!/bin/sh -e
setenforce 0
sed -e "s/enforcing/disabled/" /etc/sysconfig/selinux > /tmp/selinux
mv /tmp/selinux /etc/sysconfig/
dnf update -y
mv /home/fedora/scylla.repo /etc/yum.repos.d/
dnf install -y scylla-server scylla-server-debuginfo scylla-jmx scylla-tools
dnf install -y mdadm xfsprogs
cp /home/fedora/coredump.conf /etc/systemd/coredump.conf
mv /home/fedora/scylla-setup.service /usr/lib/systemd/system
mv /home/fedora/scylla-setup.sh /usr/lib/scylla
chmod a+rx /usr/lib/scylla/scylla-setup.sh
mv /home/fedora/scylla-ami /usr/lib/scylla/scylla-ami
chmod a+rx /usr/lib/scylla/scylla-ami/ds2_configure.py
systemctl enable scylla-setup.service
grep -v ' - mounts' /etc/cloud/cloud.cfg > /tmp/cloud.cfg
mv /tmp/cloud.cfg /etc/cloud/cloud.cfg

dist/ami/scylla.json vendored

@@ -18,13 +18,23 @@
"provisioners": [
{
"type": "file",
"source": "files/",
"destination": "/home/fedora"
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
},
{
"type": "shell",
"inline": [
"sudo sh -x -e /home/fedora/setup-ami.sh"
"sudo sh -x -e /home/fedora/scylla_install -a"
]
}
],

dist/ami/scylla_local.json vendored Normal file

@@ -0,0 +1,67 @@
{
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{user `access_key`}}",
"secret_key": "{{user `secret_key`}}",
"subnet_id": "{{user `subnet_id`}}",
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-a51564c0",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "fedora",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}"
}
],
"provisioners": [
{
"type": "file",
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
},
{
"type": "file",
"source": "scylla-server.x86_64.rpm",
"destination": "/home/fedora/scylla-server.x86_64.rpm"
},
{
"type": "file",
"source": "scylla-jmx.noarch.rpm",
"destination": "/home/fedora/scylla-jmx.noarch.rpm"
},
{
"type": "file",
"source": "scylla-tools.noarch.rpm",
"destination": "/home/fedora/scylla-tools.noarch.rpm"
},
{
"type": "shell",
"inline": [
"sudo yum install -y /home/fedora/scylla-server.x86_64.rpm /home/fedora/scylla-jmx.noarch.rpm /home/fedora/scylla-tools.noarch.rpm",
"sudo mv /home/fedora/scylla-ami /usr/lib/scylla/scylla-ami",
"sudo sh -x -e /home/fedora/scylla_install -a -l /home/fedora"
]
}
],
"variables": {
"access_key": "",
"secret_key": "",
"subnet_id": "",
"security_group_id": "",
"region": "",
"associate_public_ip_address": "",
"instance_type": ""
}
}


@@ -1,4 +1,5 @@
scylla - core unlimited
scylla - memlock unlimited
scylla - nofile 100000
scylla - nofile 200000
scylla - as unlimited
scylla - nproc 8096

dist/common/scripts/scylla_bootparam_setup vendored Executable file

@@ -0,0 +1,48 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_bootparam_setup -a"
echo " -a AMI instance mode"
exit 1
}
AMI=0
while getopts a OPT; do
case "$OPT" in
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ $AMI -eq 1 ]; then
. /etc/sysconfig/scylla-server
sed -e "s#append #append clocksource=tsc tsc=reliable hugepagesz=2M hugepages=$NR_HUGEPAGES #" /boot/extlinux/extlinux.conf > /tmp/extlinux.conf
mv /tmp/extlinux.conf /boot/extlinux/extlinux.conf
else
. /etc/sysconfig/scylla-server
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
fi
if [ "`grep hugepagesz /etc/default/grub`" != "" ] || [ "`grep hugepages /etc/default/grub`" != "" ]; then
sed -e "s#hugepagesz=2M ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
sed -e "s#hugepages=[0-9]* ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub2-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi
fi
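The non-AMI branch of scylla_bootparam_setup above edits the GRUB command line with two sed passes. A minimal sketch of that rewrite against a throwaway file (the NR_HUGEPAGES value of 64 is illustrative; the real script sources it from /etc/sysconfig/scylla-server):

```shell
# Sketch of the sed rewrite scylla_bootparam_setup applies to /etc/default/grub,
# run against a temp copy so nothing system-wide is touched.
NR_HUGEPAGES=64   # illustrative; the real script sources this from sysconfig
GRUB=$(mktemp)
echo 'GRUB_CMDLINE_LINUX="quiet rhgb"' > $GRUB
# First pass: strip any stale hugepage settings, mirroring the cleanup step
sed -e "s#hugepagesz=2M ##" -e "s#hugepages=[0-9]* ##" $GRUB > $GRUB.tmp && mv $GRUB.tmp $GRUB
# Second pass: prepend the hugepage parameters to the kernel command line
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" $GRUB > $GRUB.tmp && mv $GRUB.tmp $GRUB
OUT=$(cat $GRUB)
echo "$OUT"
rm -f $GRUB
```

The real script then regenerates grub.cfg with grub2-mkconfig; the sketch stops at the file rewrite.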

dist/common/scripts/scylla_coredump_setup vendored Executable file

@@ -0,0 +1,16 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
apt-get remove -y apport-noui
else
if [ -f /etc/systemd/coredump.conf ]; then
mv /etc/systemd/coredump.conf /etc/systemd/coredump.conf.save
systemctl daemon-reload
fi
fi
sysctl -p /etc/sysctl.d/99-scylla.conf

dist/common/scripts/scylla_ntp_setup vendored Executable file

@@ -0,0 +1,38 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_ntp_setup -a"
echo " -a AMI instance mode"
exit 1
}
AMI=0
while getopts a OPT; do
case "$OPT" in
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
apt-get install -y ntp ntpdate
service ntp stop
ntpdate `cat /etc/ntp.conf |grep "^server"|head -n1|awk '{print $2}'`
service ntp start
else
yum install -y ntp ntpdate || true
if [ $AMI -eq 1 ]; then
sed -e s#fedora.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
mv /tmp/ntp.conf /etc/ntp.conf
fi
systemctl enable ntpd.service
ntpdate `cat /etc/ntp.conf |grep "^server"|head -n1|awk '{print $2}'`
systemctl start ntpd.service
fi
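Both branches of scylla_ntp_setup above feed ntpdate the first `server` entry from ntp.conf via a grep/head/awk pipeline. A sketch of that extraction against a throwaway ntp.conf (the server names are illustrative):

```shell
# Sketch of the "first configured server" pipeline scylla_ntp_setup passes
# to ntpdate, pointed at a temp file instead of /etc/ntp.conf.
CONF=$(mktemp)
cat > $CONF <<'EOF'
# driftfile and other directives omitted
server 0.fedora.pool.ntp.org iburst
server 1.fedora.pool.ntp.org iburst
EOF
FIRST=$(grep "^server" $CONF | head -n1 | awk '{print $2}')
echo "would run: ntpdate $FIRST"
rm -f $CONF
```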


@@ -1,5 +1,43 @@
#!/bin/sh -e
if [ "$AMI" = "yes" ]; then
RAIDCNT=`grep xvdb /proc/mdstat | wc -l`
RAIDDEV=`grep xvdb /proc/mdstat | awk '{print $1}'`
if [ $RAIDCNT -ge 1 ]; then
echo "RAID already constructed."
if [ "`mount|grep /var/lib/scylla`" = "" ]; then
mount -o noatime /dev/$RAIDDEV /var/lib/scylla
fi
else
echo "RAID is not constructed, going to initialize..."
if [ "$AMI_KEEP_VERSION" != "yes" ]; then
yum update -y
fi
DISKS=""
for i in /dev/xvd{b..z}; do
if [ -b $i ];then
echo "Found disk $i"
if [ "$DISKS" = "" ]; then
DISKS=$i
else
DISKS="$DISKS,$i"
fi
fi
done
if [ "$DISKS" != "" ]; then
/usr/lib/scylla/scylla_raid_setup -d $DISKS
else
echo "WARN: Scylla is not using XFS to store data. Performance will suffer." > /home/fedora/WARN_PLEASE_READ.TXT
fi
/usr/lib/scylla/scylla-ami/ds2_configure.py
fi
fi
if [ "$NETWORK_MODE" = "virtio" ]; then
ip tuntap del mode tap dev $TAP
ip tuntap add mode tap dev $TAP user $USER one_queue vnet_hdr
@@ -15,10 +53,10 @@ elif [ "$NETWORK_MODE" = "dpdk" ]; then
done
else # NETWORK_MODE = posix
if [ "$SET_NIC" = "yes" ]; then
sudo sh /usr/lib/scylla/posix_net_conf.sh >/dev/null 2>&1 || true
sudo sh /usr/lib/scylla/posix_net_conf.sh $IFNAME >/dev/null 2>&1 || true
fi
fi
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
if [ "$ID" = "ubuntu" ]; then
hugeadm --create-mounts
fi
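The AMI branch added to scylla_prepare above decides whether a RAID already exists by grepping /proc/mdstat for the first EBS ephemeral disk (xvdb). A sketch of that probe against a fake mdstat file:

```shell
# Sketch of the /proc/mdstat probe in scylla_prepare, pointed at a fake file
# so it can run on any machine. The md0/xvdb entries are illustrative.
MDSTAT=$(mktemp)
cat > $MDSTAT <<'EOF'
Personalities : [raid0]
md0 : active raid0 xvdc[1] xvdb[0]
EOF
RAIDCNT=$(grep xvdb $MDSTAT | wc -l)     # >= 1 means a RAID is already built
RAIDDEV=$(grep xvdb $MDSTAT | awk '{print $1}')  # md device name, e.g. md0
rm -f $MDSTAT
```

When RAIDCNT is at least 1, the real script only remounts /dev/$RAIDDEV on /var/lib/scylla instead of re-running scylla_raid_setup.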

dist/common/scripts/scylla_raid_setup vendored Executable file

@@ -0,0 +1,61 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla-raid-setup -d /dev/hda,/dev/hdb... -r /dev/md0 -u"
echo " -d specify disks for RAID"
echo " -r MD device name for RAID"
echo " -u update /etc/fstab for RAID"
exit 1
}
RAID=/dev/md0
FSTAB=0
while getopts d:r:uh OPT; do
case "$OPT" in
"d")
DISKS=`echo $OPTARG|tr -s ',' ' '`
NR_DISK=$((`echo $OPTARG|grep , -o|wc -w` + 1))
;;
"r")
RAID=$OPTARG
;;
"u")
FSTAB=1
;;
"h")
print_usage
;;
esac
done
if [ "$DISKS" = "" ]; then
print_usage
fi
echo Creating RAID0 for scylla using $NR_DISK disk\(s\): $DISKS
if [ -e $RAID ]; then
echo "$RAID is already in use"
exit 1
fi
if [ "`mount|grep /var/lib/scylla`" != "" ]; then
echo "/var/lib/scylla is already mounted"
exit 1
fi
mdadm --create --verbose --force --run $RAID --level=0 -c256 --raid-devices=$NR_DISK $DISKS
blockdev --setra 65536 $RAID
mkfs.xfs $RAID -f
echo "DEVICE $DISKS" > /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
if [ $FSTAB -ne 0 ]; then
UUID=`blkid $RAID | awk '{print $2}'`
echo "$UUID /var/lib/scylla xfs noatime 0 0" >> /etc/fstab
fi
mount -t xfs -o noatime $RAID /var/lib/scylla
mkdir -p /var/lib/scylla/data
mkdir -p /var/lib/scylla/commitlog
mkdir -p /var/lib/scylla/coredump
chown scylla:scylla /var/lib/scylla/*
chown scylla:scylla /var/lib/scylla/

dist/common/scripts/scylla_save_coredump vendored Executable file

@@ -0,0 +1,10 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
FILE=$1
TIME=`date --date @$2 +%F-%T`
PID=$3
mkdir -p /var/lib/scylla/coredump
/usr/bin/gzip -c > /var/lib/scylla/coredump/core.$FILE-$TIME-$PID.gz
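scylla_save_coredump is invoked by the kernel through the `core_pattern` pipe (see 99-scylla.conf below), which passes %e %t %p as the executable name, epoch timestamp, and pid. A sketch of how those become the compressed core's file name (sample values are illustrative; TZ=UTC is added here for determinism, while the real script formats in local time):

```shell
# Sketch of the core file naming in scylla_save_coredump. The kernel's
# core_pattern supplies %e %t %p; gzip of the actual core is omitted.
FILE=scylla          # %e: executable name (illustrative)
EPOCH=1451990000     # %t: dump time as a Unix timestamp (illustrative)
PID=1234             # %p: pid of the dumping process (illustrative)
TIME=$(TZ=UTC date --date @$EPOCH +%F-%T)   # GNU date; real script uses local time
NAME="core.$FILE-$TIME-$PID.gz"
echo "$NAME"
```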

dist/common/scripts/scylla_sysconfig_setup vendored Executable file

@@ -0,0 +1,100 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla-sysconfig-setup -n eth0 -m posix -p 64 -u scylla -g scylla -d /var/lib/scylla -c /etc/scylla -N -a -k"
echo " -n specify NIC"
echo " -m network mode (posix, dpdk)"
echo " -p number of hugepages"
echo " -u user (dpdk requires root)"
echo " -g group (dpdk requires root)"
echo " -d scylla home directory"
echo " -c scylla config directory"
echo " -N setup NIC's interrupts, RPS, XPS"
echo " -a AMI instance mode"
echo " -k keep package version on AMI"
exit 1
}
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
SYSCONFIG=/etc/default
else
SYSCONFIG=/etc/sysconfig
fi
NIC=eth0
NETWORK_MODE=posix
NR_HUGEPAGES=64
USER=scylla
GROUP=scylla
SCYLLA_HOME=/var/lib/scylla
SCYLLA_CONF=/etc/scylla
SETUP_NIC=0
SET_NIC="no"
AMI=no
AMI_KEEP_VERSION=no
SCYLLA_ARGS=
while getopts n:m:p:u:g:d:c:Nakh OPT; do
case "$OPT" in
"n")
NIC=$OPTARG
;;
"m")
NETWORK_MODE=$OPTARG
;;
"p")
NR_HUGEPAGES=$OPTARG
;;
"u")
USER=$OPTARG
;;
"g")
GROUP=$OPTARG
;;
"d")
SCYLLA_HOME=$OPTARG
;;
"c")
SCYLLA_CONF=$OPTARG
;;
"N")
SETUP_NIC=1
;;
"a")
AMI=yes
;;
"k")
AMI_KEEP_VERSION=yes
;;
"h")
print_usage
;;
esac
done
echo Setting parameters on $SYSCONFIG/scylla-server
ETHDRV=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | sed -e "s/^.*drv=//" -e "s/ .*$//"`
ETHPCIID=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | awk '{print $1}'`
NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
if [ $NR_CPU -ge 8 ]; then
NR=$((NR_CPU - 1))
SET_NIC="yes"
SCYLLA_ARGS="--cpuset 1-$NR --smp $NR"
fi
sed -e s#^NETWORK_MODE=.*#NETWORK_MODE=$NETWORK_MODE# \
-e s#^ETHDRV=.*#ETHDRV=$ETHDRV# \
-e s#^ETHPCIID=.*#ETHPCIID=$ETHPCIID# \
-e s#^NR_HUGEPAGES=.*#NR_HUGEPAGES=$NR_HUGEPAGES# \
-e s#^USER=.*#USER=$USER# \
-e s#^GROUP=.*#GROUP=$GROUP# \
-e s#^SCYLLA_HOME=.*#SCYLLA_HOME=$SCYLLA_HOME# \
-e s#^SCYLLA_CONF=.*#SCYLLA_CONF=$SCYLLA_CONF# \
-e s#^SET_NIC=.*#SET_NIC=$SET_NIC# \
-e s#^SCYLLA_ARGS=.*#SCYLLA_ARGS="$SCYLLA_ARGS"# \
-e s#^AMI=.*#AMI="$AMI"# \
-e s#^AMI_KEEP_VERSION=.*#AMI_KEEP_VERSION="$AMI_KEEP_VERSION"# \
$SYSCONFIG/scylla-server > /tmp/scylla-server
mv /tmp/scylla-server $SYSCONFIG/scylla-server
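The long sed chain above rewrites every `KEY=...` line of the sysconfig file in one pass. A minimal sketch of that key=value rewriting technique against a throwaway file (two keys only; values are illustrative):

```shell
# Sketch of the in-place key=value rewriting scylla_sysconfig_setup performs,
# on a temp file rather than /etc/sysconfig/scylla-server.
SYSCFG=$(mktemp)
printf 'NETWORK_MODE=posix\nNR_HUGEPAGES=64\n' > $SYSCFG
NETWORK_MODE=virtio
NR_HUGEPAGES=128
# '#' is used as the s/// delimiter, as in the original, so values containing
# '/' (paths) need no escaping.
sed -e s#^NETWORK_MODE=.*#NETWORK_MODE=$NETWORK_MODE# \
    -e s#^NR_HUGEPAGES=.*#NR_HUGEPAGES=$NR_HUGEPAGES# \
    $SYSCFG > $SYSCFG.new
OUT=$(cat $SYSCFG.new)
rm -f $SYSCFG $SYSCFG.new
```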


@@ -7,6 +7,12 @@ TAP=tap0
# bridge device name (virtio)
BRIDGE=virbr0
# ethernet device name
IFNAME=eth0
# setup NIC's interrupts, RPS, XPS (posix)
SET_NIC=no
# ethernet device driver (dpdk)
ETHDRV=
@@ -30,3 +36,9 @@ SCYLLA_CONF=/etc/scylla
# additional arguments
SCYLLA_ARGS=""
# setup as AMI instance
AMI=no
# do not upgrade Scylla packages on AMI startup
AMI_KEEP_VERSION=no

dist/common/sysctl.d/99-scylla.conf vendored Normal file

@@ -0,0 +1 @@
kernel.core_pattern=|/usr/lib/scylla/scylla_save_coredump %e %t %p


@@ -4,9 +4,9 @@ IP=$(hostname -i)
sed -e "s/seeds:.*/seeds: $IP/g" /var/lib/scylla/conf/scylla.yaml > $HOME/scylla.yaml
/usr/bin/scylla --log-to-syslog 1 \
--log-to-stdout 0 \
--developer-mode true \
--default-log-level info \
--options-file $HOME/scylla.yaml \
--listen-address $IP \
--rpc-address $IP \
--network-stack posix \
--smp 1
--network-stack posix


@@ -7,21 +7,21 @@ if [ ! -e dist/redhat/build_rpm.sh ]; then
exit 1
fi
OS=`awk '{print $1}' /etc/redhat-release`
if [ "$OS" != "Fedora" ] && [ "$OS" != "CentOS" ]; then
. /etc/os-release
if [ "$ID" != "fedora" ] && [ "$ID" != "centos" ]; then
echo "Unsupported distribution"
exit 1
fi
if [ "$OS" = "Fedora" ] && [ ! -f /usr/bin/mock ]; then
if [ "$ID" = "fedora" ] && [ ! -f /usr/bin/mock ]; then
sudo yum -y install mock
elif [ "$OS" = "CentOS" ] && [ ! -f /usr/bin/yum-builddep ]; then
elif [ "$ID" = "centos" ] && [ ! -f /usr/bin/yum-builddep ]; then
sudo yum -y install yum-utils
fi
if [ ! -f /usr/bin/git ]; then
sudo yum -y install git
fi
mkdir -p $RPMBUILD/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
if [ "$OS" = "CentOS" ]; then
if [ "$ID" = "centos" ]; then
./dist/redhat/centos_dep/build_dependency.sh
fi
VERSION=$(./SCYLLA-VERSION-GEN)
@@ -33,7 +33,7 @@ rm -f version
cp dist/redhat/scylla-server.spec.in $RPMBUILD/SPECS/scylla-server.spec
sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" $RPMBUILD/SPECS/scylla-server.spec
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" $RPMBUILD/SPECS/scylla-server.spec
if [ "$OS" = "Fedora" ]; then
if [ "$ID" = "fedora" ]; then
rpmbuild -bs --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
mock rebuild --resultdir=`pwd`/build/rpms $RPMBUILD/SRPMS/scylla-server-$VERSION*.src.rpm
else


@@ -8,13 +8,18 @@ License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
BuildRequires: libaio-devel boost-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make
BuildRequires: libaio-devel boost-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel
%{?fedora:BuildRequires: ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan}
%{?rhel:BuildRequires: scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1}
Requires: systemd-libs xfsprogs
Requires: systemd-libs xfsprogs mdadm hwloc
%description
%define __debug_install_post \
%{_rpmconfigdir}/find-debuginfo.sh %{?_missing_build_ids_terminate_build:--strict-build-id} %{?_find_debuginfo_opts} "%{_builddir}/%{?buildsubdir}";\
cp scylla-gdb.py ${RPM_BUILD_ROOT}/usr/src/debug/%{name}-%{version}/;\
%{nil}
%prep
%setup -q
@@ -30,6 +35,7 @@ ninja-build -j2
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{_bindir}
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
@@ -37,6 +43,7 @@ mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
mkdir -p $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m644 dist/common/sysctl.d/99-scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
install -m644 dist/common/sysconfig/scylla-server $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
@@ -44,6 +51,7 @@ install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 dist/redhat/systemd/scylla-server.service $RPM_BUILD_ROOT%{_unitdir}/
install -m755 dist/common/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 dist/redhat/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/scripts/posix_net_conf.sh $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/dpdk/tools/dpdk_nic_bind.py $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
@@ -57,6 +65,7 @@ install -m644 licenses/* $RPM_BUILD_ROOT%{_docdir}/scylla/licenses/
install -d -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/scylla/
install -d -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/scylla/data
install -d -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/scylla/commitlog
install -d -m755 $RPM_BUILD_ROOT%{_sharedstatedir}/scylla/coredump
install -d -m755 $RPM_BUILD_ROOT%{_prefix}/lib/scylla/swagger-ui
cp -r swagger-ui/dist $RPM_BUILD_ROOT%{_prefix}/lib/scylla/swagger-ui
install -d -m755 $RPM_BUILD_ROOT%{_prefix}/lib/scylla/api
@@ -75,13 +84,27 @@ TMP=""
if [ -d /var/lib/scylla/conf ] && [ ! -L /var/lib/scylla/conf ]; then
cp -a /var/lib/scylla/conf /tmp/%{name}-%{version}-%{release}
fi
# Adding IFNAME for previous version of sysconfig
if [ -f /etc/sysconfig/scylla-server ] && [ `grep IFNAME -r /etc/sysconfig/scylla-server|wc -l` -eq 0 ]; then
echo "# ethernet device name" >> /etc/sysconfig/scylla-server
echo "IFNAME=eth0" >> /etc/sysconfig/scylla-server
fi
if [ -d /usr/lib/scylla/scylla-ami ]; then
echo "# setup as AMI instance" >> /etc/sysconfig/scylla-server
echo "AMI=no" >> /etc/sysconfig/scylla-server
echo "# do not upgrade Scylla packages on AMI startup" >> /etc/sysconfig/scylla-server
echo "AMI_KEEP_VERSION=no" >> /etc/sysconfig/scylla-server
fi
%post
grep -v api_ui_dir /etc/scylla/scylla.yaml | grep -v api_doc_dir > /tmp/scylla.yaml
echo "api_ui_dir: /usr/lib/scylla/swagger-ui/dist/" >> /tmp/scylla.yaml
echo "api_doc_dir: /usr/lib/scylla/api/api-doc/" >> /tmp/scylla.yaml
mv /tmp/scylla.yaml /etc/scylla/scylla.yaml
# Upgrade coredump settings
if [ -f /etc/systemd/coredump.conf ];then
/usr/lib/scylla/scylla_coredump_setup
fi
%systemd_post scylla-server.service
%preun
@@ -105,6 +128,7 @@ rm -rf $RPM_BUILD_ROOT
%config(noreplace) %{_sysconfdir}/sysconfig/scylla-server
%{_sysconfdir}/security/limits.d/scylla.conf
%{_sysconfdir}/sysctl.d/99-scylla.conf
%attr(0755,root,root) %dir %{_sysconfdir}/scylla
%config(noreplace) %{_sysconfdir}/scylla/scylla.yaml
%config(noreplace) %{_sysconfdir}/scylla/cassandra-rackdc.properties
@@ -118,6 +142,12 @@ rm -rf $RPM_BUILD_ROOT
%{_prefix}/lib/scylla/scylla_prepare
%{_prefix}/lib/scylla/scylla_run
%{_prefix}/lib/scylla/scylla_stop
%{_prefix}/lib/scylla/scylla_save_coredump
%{_prefix}/lib/scylla/scylla_coredump_setup
%{_prefix}/lib/scylla/scylla_raid_setup
%{_prefix}/lib/scylla/scylla_sysconfig_setup
%{_prefix}/lib/scylla/scylla_bootparam_setup
%{_prefix}/lib/scylla/scylla_ntp_setup
%{_prefix}/lib/scylla/posix_net_conf.sh
%{_prefix}/lib/scylla/dpdk_nic_bind.py
%{_prefix}/lib/scylla/dpdk_nic_bind.pyc
@@ -127,6 +157,7 @@ rm -rf $RPM_BUILD_ROOT
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/data
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/commitlog
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/coredump
%changelog
* Tue Jul 21 2015 Takuya ASADA <syuu@cloudius-systems.com>


@@ -5,13 +5,14 @@ After=network.target libvirtd.service
[Service]
Type=simple
LimitMEMLOCK=infinity
LimitNOFILE=100000
LimitNOFILE=200000
LimitAS=infinity
LimitNPROC=8096
EnvironmentFile=/etc/sysconfig/scylla-server
ExecStartPre=/usr/lib/scylla/scylla_prepare
ExecStart=/usr/lib/scylla/scylla_run
ExecStopPost=/usr/lib/scylla/scylla_stop
TimeoutStartSec=900
KillMode=process
Restart=no


@@ -38,7 +38,7 @@ sudo apt-get -y update
./dist/ubuntu/dep/build_dependency.sh
DEP="libyaml-cpp-dev liblz4-dev libsnappy-dev libcrypto++-dev libjsoncpp-dev libaio-dev ragel ninja-build git liblz4-1 libaio1 hugepages software-properties-common"
DEP="libyaml-cpp-dev liblz4-dev libsnappy-dev libcrypto++-dev libjsoncpp-dev libaio-dev ragel ninja-build git liblz4-1 libaio1 hugepages software-properties-common libgnutls28-dev libhwloc-dev libnuma-dev libpciaccess-dev"
if [ "$RELEASE" = "14.04" ]; then
DEP="$DEP libboost1.55-dev libboost-program-options1.55.0 libboost-program-options1.55-dev libboost-system1.55.0 libboost-system1.55-dev libboost-thread1.55.0 libboost-thread1.55-dev libboost-test1.55.0 libboost-test1.55-dev libboost-filesystem1.55-dev libboost-filesystem1.55.0 libsnappy1"
@@ -55,6 +55,6 @@ if [ "$RELEASE" != "15.10" ]; then
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get -y update
fi
sudo apt-get -y install g++-5
sudo apt-get -y install g++-4.9
debuild -r fakeroot -us -uc


@@ -4,11 +4,11 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-5, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, mdadm, xfsprogs, hwloc-nox
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.


@@ -4,12 +4,13 @@ DOC = $(CURDIR)/debian/scylla-server/usr/share/doc/scylla-server
SCRIPTS = $(CURDIR)/debian/scylla-server/usr/lib/scylla
SWAGGER = $(SCRIPTS)/swagger-ui
API = $(SCRIPTS)/api
SYSCTL = $(CURDIR)/debian/scylla-server/etc/sysctl.d
LIMITS= $(CURDIR)/debian/scylla-server/etc/security/limits.d
LIBS = $(CURDIR)/debian/scylla-server/usr/lib
CONF = $(CURDIR)/debian/scylla-server/etc/scylla
override_dh_auto_build:
./configure.py --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=g++-5
./configure.py --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=g++-4.9
ninja
override_dh_auto_clean:
@@ -21,6 +22,9 @@ override_dh_auto_install:
mkdir -p $(LIMITS) && \
cp $(CURDIR)/dist/common/limits.d/scylla.conf $(LIMITS)
mkdir -p $(SYSCTL) && \
cp $(CURDIR)/dist/common/sysctl.d/99-scylla.conf $(SYSCTL)
mkdir -p $(CONF) && \
cp $(CURDIR)/conf/scylla.yaml $(CONF)
cp $(CURDIR)/conf/cassandra-rackdc.properties $(CONF)
@@ -34,6 +38,7 @@ override_dh_auto_install:
mkdir -p $(SCRIPTS) && \
cp $(CURDIR)/seastar/dpdk/tools/dpdk_nic_bind.py $(SCRIPTS)
cp $(CURDIR)/dist/common/scripts/* $(SCRIPTS)
cp $(CURDIR)/dist/ubuntu/scripts/* $(SCRIPTS)
mkdir -p $(SWAGGER) && \
cp -r $(CURDIR)/swagger-ui/dist $(SWAGGER)
@@ -47,6 +52,7 @@ override_dh_auto_install:
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/data
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/commitlog
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/coredump
override_dh_strip:
dh_strip --dbg-package=scylla-server-dbg

dist/ubuntu/scripts/scylla_run vendored Executable file

@@ -0,0 +1,19 @@
#!/bin/bash -e
args="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info $SCYLLA_ARGS"
if [ "$NETWORK_MODE" = "posix" ]; then
args="$args --network-stack posix"
elif [ "$NETWORK_MODE" = "virtio" ]; then
args="$args --network-stack native"
elif [ "$NETWORK_MODE" = "dpdk" ]; then
args="$args --network-stack native --dpdk-pmd"
fi
export HOME=/var/lib/scylla
ulimit -c unlimited
ulimit -l unlimited
ulimit -n 200000
ulimit -m unlimited
ulimit -u 8096
exec sudo -E -u $USER /usr/bin/scylla $args


@@ -27,6 +27,7 @@
#include "mutation_partition_serializer.hh"
#include "utils/UUID.hh"
#include "utils/data_input.hh"
#include "query-result-set.hh"
//
// Representation layout:
@@ -49,6 +50,11 @@ frozen_mutation::key(const schema& s) const {
return partition_key_view_serializer::read(in);
}
dht::decorated_key
frozen_mutation::decorated_key(const schema& s) const {
return dht::global_partitioner().decorate_key(s, key(s));
}
frozen_mutation::frozen_mutation(bytes&& b)
: _bytes(std::move(b))
{ }
@@ -88,3 +94,11 @@ mutation_partition_view frozen_mutation::partition() const {
partition_key_view_serializer::skip(in);
return mutation_partition_view::from_bytes(in.read_view(in.avail()));
}
std::ostream& operator<<(std::ostream& out, const frozen_mutation::printer& pr) {
return out << pr.self.unfreeze(pr.schema);
}
frozen_mutation::printer frozen_mutation::pretty_printer(schema_ptr s) const {
return { *this, std::move(s) };
}


@@ -21,6 +21,7 @@
#pragma once
#include "dht/i_partitioner.hh"
#include "atomic_cell.hh"
#include "keys.hh"
#include "mutation_partition_view.hh"
@@ -51,8 +52,17 @@ public:
bytes_view representation() const { return _bytes; }
utils::UUID column_family_id() const;
partition_key_view key(const schema& s) const;
dht::decorated_key decorated_key(const schema& s) const;
mutation_partition_view partition() const;
mutation unfreeze(schema_ptr s) const;
struct printer {
const frozen_mutation& self;
schema_ptr schema;
friend std::ostream& operator<<(std::ostream&, const printer&);
};
printer pretty_printer(schema_ptr) const;
};
frozen_mutation freeze(const mutation& m);


@@ -54,7 +54,7 @@ namespace gms {
*/
class endpoint_state {
public:
using clk = std::chrono::high_resolution_clock;
using clk = std::chrono::steady_clock;
private:
heart_beat_state _heart_beat_state;
std::map<application_state, versioned_value> _application_state;
@@ -109,7 +109,11 @@ public:
}
void add_application_state(application_state key, versioned_value value) {
_application_state[key] = value;
if (_application_state.count(key)) {
_application_state.at(key) = value;
} else {
_application_state.emplace(key, value);
}
}
/* getters and setters */


@@ -91,11 +91,8 @@ void gossiper::set_seeds(std::set<inet_address> _seeds) {
}
std::chrono::milliseconds gossiper::quarantine_delay() {
return std::chrono::milliseconds(service::storage_service::RING_DELAY * 2);
}
static auto storage_service_ring_delay() {
return std::chrono::milliseconds(service::storage_service::RING_DELAY);
auto& ss = service::get_local_storage_service();
return ss.get_ring_delay() * 2;
}
auto& storage_service_value_factory() {
@@ -585,9 +582,9 @@ void gossiper::run() {
logger.trace("=== Gossip round FAIL");
}
if (logger.is_enabled(logging::log_level::debug)) {
if (logger.is_enabled(logging::log_level::trace)) {
for (auto& x : endpoint_state_map) {
logger.debug("ep={}, eps={}", x.first, x.second);
logger.trace("ep={}, eps={}", x.first, x.second);
}
}
_scheduled_gossip_task.arm(INTERVAL);
@@ -762,8 +759,9 @@ future<> gossiper::advertise_removing(inet_address endpoint, utils::UUID host_id
// remember this node's generation
int generation = state.get_heart_beat_state().get_generation();
logger.info("Removing host: {}", host_id);
logger.info("Sleeping for {}ms to ensure {} does not change", service::storage_service::RING_DELAY, endpoint);
sleep(storage_service_ring_delay()).get();
auto ring_delay = service::get_local_storage_service().get_ring_delay();
logger.info("Sleeping for {}ms to ensure {} does not change", ring_delay.count(), endpoint);
sleep(ring_delay).get();
// make sure it did not change
auto& eps = endpoint_state_map.at(endpoint);
if (eps.get_heart_beat_state().get_generation() != generation) {
@@ -823,9 +821,9 @@ future<> gossiper::assassinate_endpoint(sstring address) {
int generation = ep_state.get_heart_beat_state().get_generation();
int heartbeat = ep_state.get_heart_beat_state().get_heart_beat_version();
logger.info("Sleeping for {} ms to ensure {} does not change", service::storage_service::RING_DELAY, endpoint);
logger.info("Sleeping for {} ms to ensure {} does not change", ss.get_ring_delay().count(), endpoint);
// make sure it did not change
sleep(storage_service_ring_delay()).get();
sleep(ss.get_ring_delay()).get();
auto it = endpoint_state_map.find(endpoint);
if (it == endpoint_state_map.end()) {
@@ -1129,7 +1127,7 @@ void gossiper::handle_major_state_change(inet_address ep, const endpoint_state&
if (!is_dead_state(ep_state)) {
mark_alive(ep, ep_state);
} else {
logger.debug("Not marking {} alive due to dead state", ep);
logger.debug("Not marking {} alive due to dead state {}", ep, get_gossip_status(eps));
mark_dead(ep, ep_state);
}
_subscribers.for_each([ep, ep_state] (auto& subscriber) {
@@ -1142,14 +1140,7 @@ void gossiper::handle_major_state_change(inet_address ep, const endpoint_state&
}
bool gossiper::is_dead_state(const endpoint_state& eps) const {
if (!eps.get_application_state(application_state::STATUS)) {
return false;
}
auto value = eps.get_application_state(application_state::STATUS)->value;
std::vector<sstring> pieces;
boost::split(pieces, value, boost::is_any_of(","));
assert(pieces.size() > 0);
sstring state = pieces[0];
sstring state = get_gossip_status(eps);
for (auto& deadstate : DEAD_STATES) {
if (state == deadstate) {
return true;
@@ -1159,38 +1150,11 @@ bool gossiper::is_dead_state(const endpoint_state& eps) const {
}
bool gossiper::is_shutdown(const inet_address& endpoint) const {
auto ep_state = get_endpoint_state_for_endpoint(endpoint);
if (!ep_state) {
return false;
}
auto app_state = ep_state->get_application_state(application_state::STATUS);
if (!app_state) {
return false;
}
auto value = app_state->value;
std::vector<sstring> pieces;
boost::split(pieces, value, boost::is_any_of(","));
assert(pieces.size() > 0);
sstring state = pieces[0];
return state == sstring(versioned_value::SHUTDOWN);
return get_gossip_status(endpoint) == sstring(versioned_value::SHUTDOWN);
}
bool gossiper::is_silent_shutdown_state(const endpoint_state& ep_state) const{
auto app_state = ep_state.get_application_state(application_state::STATUS);
if (!app_state) {
return false;
}
auto value = app_state->value;
std::vector<sstring> pieces;
boost::split(pieces, value, boost::is_any_of(","));
assert(pieces.size() > 0);
sstring state = pieces[0];
sstring state = get_gossip_status(ep_state);
for (auto& deadstate : SILENT_SHUTDOWN_STATES) {
if (state == deadstate) {
return true;
@@ -1369,7 +1333,8 @@ future<> gossiper::do_shadow_round() {
return make_ready_future<>();
}).get();
}
if (clk::now() > t + storage_service_ring_delay() * 60) {
auto& ss = service::get_local_storage_service();
if (clk::now() > t + ss.get_ring_delay() * 60) {
throw std::runtime_error(sprint("Unable to gossip with any seeds (ShadowRound)"));
}
if (this->_in_shadow_round) {
@@ -1571,5 +1536,24 @@ void gossiper::force_newer_generation() {
}
}
sstring gossiper::get_gossip_status(const endpoint_state& ep_state) const {
auto app_state = ep_state.get_application_state(application_state::STATUS);
if (!app_state) {
return "";
}
auto value = app_state->value;
std::vector<sstring> pieces;
boost::split(pieces, value, boost::is_any_of(","));
assert(pieces.size() > 0);
return pieces[0];
}
sstring gossiper::get_gossip_status(const inet_address& endpoint) const {
auto ep_state = get_endpoint_state_for_endpoint(endpoint);
if (!ep_state) {
return "";
}
return get_gossip_status(*ep_state);
}
} // namespace gms


@@ -78,7 +78,7 @@ class i_failure_detector;
*/
class gossiper : public i_failure_detection_event_listener, public seastar::async_sharded_service<gossiper> {
public:
using clk = std::chrono::high_resolution_clock;
using clk = std::chrono::steady_clock;
private:
using messaging_verb = net::messaging_verb;
using messaging_service = net::messaging_service;
@@ -497,6 +497,9 @@ public:
bool is_silent_shutdown_state(const endpoint_state& ep_state) const;
void mark_as_shutdown(const inet_address& endpoint);
void force_newer_generation();
private:
sstring get_gossip_status(const endpoint_state& ep_state) const;
sstring get_gossip_status(const inet_address& endpoint) const;
};
extern distributed<gossiper> _the_gossiper;


@@ -65,9 +65,9 @@ public:
*/
virtual void on_join(inet_address endpoint, endpoint_state ep_state) = 0;
virtual void before_change(inet_address endpoint, endpoint_state current_state, application_state new_statekey, versioned_value newvalue) = 0;
virtual void before_change(inet_address endpoint, endpoint_state current_state, application_state new_statekey, const versioned_value& newvalue) = 0;
virtual void on_change(inet_address endpoint, application_state state, versioned_value value) = 0;
virtual void on_change(inet_address endpoint, application_state state, const versioned_value& value) = 0;
virtual void on_alive(inet_address endpoint, endpoint_state state) = 0;


@@ -96,11 +96,6 @@ public:
value == other.value;
}
versioned_value()
: version(version_generator::get_next_version())
, value("") {
}
private:
versioned_value(const sstring& value, int version = version_generator::get_next_version())
: version(version), value(value) {

init.cc

@@ -45,10 +45,49 @@ future<> init_storage_service(distributed<database>& db) {
});
}
future<> init_ms_fd_gossiper(sstring listen_address, uint16_t port, db::seed_provider_type seed_provider, sstring cluster_name, double phi) {
future<> init_ms_fd_gossiper(sstring listen_address
, uint16_t storage_port
, uint16_t ssl_storage_port
, sstring ms_encrypt_what
, sstring ms_trust_store
, sstring ms_cert
, sstring ms_key
, db::seed_provider_type seed_provider
, sstring cluster_name
, double phi)
{
const gms::inet_address listen(listen_address);
using encrypt_what = net::messaging_service::encrypt_what;
using namespace seastar::tls;
encrypt_what ew = encrypt_what::none;
if (ms_encrypt_what == "all") {
ew = encrypt_what::all;
} else if (ms_encrypt_what == "dc") {
ew = encrypt_what::dc;
} else if (ms_encrypt_what == "rack") {
ew = encrypt_what::rack;
}
future<> f = make_ready_future<>();
::shared_ptr<server_credentials> creds;
if (ew != encrypt_what::none) {
// note: credentials are immutable after this, and ok to share across shards
creds = ::make_shared<server_credentials>(::make_shared<dh_params>(dh_params::level::MEDIUM));
f = creds->set_x509_key_file(ms_cert, ms_key, x509_crt_format::PEM).then([creds, ms_trust_store] {
return ms_trust_store.empty()
? creds->set_system_trust()
: creds->set_x509_trust_file(ms_trust_store, x509_crt_format::PEM)
;
});
}
// Init messaging_service
return net::get_messaging_service().start(listen, std::move(port)).then([] {
return f.then([listen, storage_port, creds, ssl_storage_port, ew] {
return net::get_messaging_service().start(listen, storage_port, ew, ssl_storage_port, creds);
// #293 - do not stop anything
//engine().at_exit([] { return net::get_messaging_service().stop(); });
}).then([phi] {

init.hh

@@ -27,4 +27,13 @@
#include "database.hh"
future<> init_storage_service(distributed<database>& db);
future<> init_ms_fd_gossiper(sstring listen_address, uint16_t storage_port, db::seed_provider_type seed_provider, sstring cluster_name = "Test Cluster", double phi = 8);
future<> init_ms_fd_gossiper(sstring listen_address
, uint16_t storage_port
, uint16_t ssl_storage_port
, sstring ms_encrypt_what
, sstring ms_trust_store
, sstring ms_cert
, sstring ms_key
, db::seed_provider_type seed_provider
, sstring cluster_name = "Test Cluster"
, double phi = 8);


@@ -28,10 +28,6 @@ std::ostream& operator<<(std::ostream& out, const partition_key& pk) {
return out << "pk{" << to_hex(pk) << "}";
}
std::ostream& operator<<(std::ostream& out, const clustering_key& ck) {
return out << "ck{" << to_hex(ck) << "}";
}
std::ostream& operator<<(std::ostream& out, const clustering_key_prefix& ckp) {
return out << "ckp{" << to_hex(ckp) << "}";
}

keys.hh

@@ -51,10 +51,10 @@
class partition_key;
class partition_key_view;
class clustering_key;
class clustering_key_view;
class clustering_key_prefix;
class clustering_key_prefix_view;
using clustering_key = clustering_key_prefix;
using clustering_key_view = clustering_key_prefix_view;
// Abstracts a view to serialized compound.
template <typename TopLevelView>
@@ -301,6 +301,53 @@ public:
};
};
template <typename TopLevel>
class prefix_view_on_prefix_compound {
public:
using iterator = typename compound_type<allow_prefixes::yes>::iterator;
private:
bytes_view _b;
unsigned _prefix_len;
iterator _begin;
iterator _end;
public:
prefix_view_on_prefix_compound(const schema& s, bytes_view b, unsigned prefix_len)
: _b(b)
, _prefix_len(prefix_len)
, _begin(TopLevel::get_compound_type(s)->begin(_b))
, _end(_begin)
{
std::advance(_end, prefix_len);
}
iterator begin() const { return _begin; }
iterator end() const { return _end; }
struct less_compare_with_prefix {
typename TopLevel::compound prefix_type;
less_compare_with_prefix(const schema& s)
: prefix_type(TopLevel::get_compound_type(s))
{ }
bool operator()(const prefix_view_on_prefix_compound& k1, const TopLevel& k2) const {
return lexicographical_tri_compare(
prefix_type->types().begin(), prefix_type->types().end(),
k1.begin(), k1.end(),
prefix_type->begin(k2), prefix_type->end(k2),
tri_compare) < 0;
}
bool operator()(const TopLevel& k1, const prefix_view_on_prefix_compound& k2) const {
return lexicographical_tri_compare(
prefix_type->types().begin(), prefix_type->types().end(),
prefix_type->begin(k1), prefix_type->end(k1),
k2.begin(), k2.end(),
tri_compare) < 0;
}
};
};
template <typename TopLevel, typename TopLevelView, typename PrefixTopLevel>
class prefixable_full_compound : public compound_wrapper<TopLevel, TopLevelView> {
using base = compound_wrapper<TopLevel, TopLevelView>;
@@ -391,6 +438,12 @@ class prefix_compound_wrapper : public compound_wrapper<TopLevel, TopLevelView>
protected:
prefix_compound_wrapper(bytes&& b) : base(std::move(b)) {}
public:
using prefix_view_type = prefix_view_on_prefix_compound<TopLevel>;
prefix_view_type prefix_view(const schema& s, unsigned prefix_len) const {
return { s, this->representation(), prefix_len };
}
bool is_full(const schema& s) const {
return TopLevel::get_compound_type(s)->is_full(base::_bytes);
}
@@ -407,6 +460,25 @@ public:
t->begin(prefix), t->end(prefix),
equal);
}
// In prefix equality two sequences are equal if any of them is a prefix
// of the other. Otherwise lexicographical ordering is applied.
// Note: full compounds sorted according to lexicographical ordering are also
// sorted according to prefix equality ordering.
struct prefix_equality_less_compare {
typename TopLevel::compound prefix_type;
prefix_equality_less_compare(const schema& s)
: prefix_type(TopLevel::get_compound_type(s))
{ }
bool operator()(const TopLevel& k1, const TopLevel& k2) const {
return prefix_equality_tri_compare(prefix_type->types().begin(),
prefix_type->begin(k1), prefix_type->end(k1),
prefix_type->begin(k2), prefix_type->end(k2),
tri_compare) < 0;
}
};
};
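The prefix-equality ordering described in the comment above can be illustrated with a small standalone sketch. The helper name and the use of plain `std::vector<int>` in place of serialized compounds are illustrative assumptions, not Scylla's actual API:

```cpp
#include <vector>

// Sketch of prefix-equality three-way comparison: two sequences compare
// equal when either is a prefix of the other; otherwise plain
// lexicographical ordering decides.
inline int prefix_equality_tri_compare_sketch(const std::vector<int>& a,
                                              const std::vector<int>& b) {
    auto ai = a.begin();
    auto bi = b.begin();
    // Compare element-wise until one side runs out.
    while (ai != a.end() && bi != b.end()) {
        if (*ai < *bi) return -1;
        if (*bi < *ai) return 1;
        ++ai;
        ++bi;
    }
    // One sequence is a prefix of the other (or both ended): equal.
    return 0;
}
```

As the comment observes, full compounds sorted lexicographically are also sorted under this ordering, since prefix equality only ever merges sequences that are lexicographic neighbours.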
class partition_key_view : public compound_view_wrapper<partition_key_view> {
@@ -506,49 +578,6 @@ public:
friend std::ostream& operator<<(std::ostream& os, const exploded_clustering_prefix& ecp);
};
class clustering_key_view : public compound_view_wrapper<clustering_key_view> {
public:
clustering_key_view(bytes_view v)
: compound_view_wrapper<clustering_key_view>(v)
{ }
public:
static clustering_key_view from_bytes(bytes_view v) {
return { v };
}
};
class clustering_key : public prefixable_full_compound<clustering_key, clustering_key_view, clustering_key_prefix> {
clustering_key(bytes&& b)
: prefixable_full_compound<clustering_key, clustering_key_view, clustering_key_prefix>(std::move(b))
{ }
public:
clustering_key(const clustering_key_view& v)
: clustering_key(bytes(v.representation().begin(), v.representation().end()))
{ }
using compound = lw_shared_ptr<compound_type<allow_prefixes::no>>;
static clustering_key from_bytes(bytes b) {
return clustering_key(std::move(b));
}
static const compound& get_compound_type(const schema& s) {
return s.clustering_key_type();
}
static clustering_key from_clustering_prefix(const schema& s, const exploded_clustering_prefix& prefix) {
if (prefix.is_full(s)) {
return from_exploded(s, prefix.components());
}
assert(s.is_dense());
auto components = prefix.components();
components.resize(s.clustering_key_size());
return from_exploded(s, std::move(components));
}
friend std::ostream& operator<<(std::ostream& out, const clustering_key& ck);
};
class clustering_key_prefix_view : public prefix_compound_view_wrapper<clustering_key_prefix_view, clustering_key> {
clustering_key_prefix_view(bytes_view v)
: prefix_compound_view_wrapper<clustering_key_prefix_view, clustering_key>(v)


@@ -93,7 +93,7 @@ public:
reconnectable_snitch_helper(sstring local_dc)
: _local_dc(local_dc) {}
void before_change(gms::inet_address endpoint, gms::endpoint_state cs, gms::application_state new_state_key, gms::versioned_value new_value) override {
void before_change(gms::inet_address endpoint, gms::endpoint_state cs, gms::application_state new_state_key, const gms::versioned_value& new_value) override {
// do nothing.
}
@@ -105,7 +105,7 @@ public:
}
}
void on_change(gms::inet_address endpoint, gms::application_state state, gms::versioned_value value) override {
void on_change(gms::inet_address endpoint, gms::application_state state, const gms::versioned_value& value) override {
if (state == gms::application_state::INTERNAL_IP) {
reconnect(endpoint, value);
}


@@ -81,6 +81,9 @@ void token_metadata::update_normal_token(token t, inet_address endpoint)
}
void token_metadata::update_normal_tokens(std::unordered_set<token> tokens, inet_address endpoint) {
if (tokens.empty()) {
return;
}
std::unordered_map<inet_address, std::unordered_set<token>> endpoint_tokens ({{endpoint, tokens}});
update_normal_tokens(endpoint_tokens);
}
@@ -122,7 +125,7 @@ void token_metadata::update_normal_tokens(std::unordered_map<inet_address, std::
auto prev = _token_to_endpoint_map.insert(std::pair<token, inet_address>(t, endpoint));
should_sort_tokens |= prev.second; // new token inserted -> sort
if (prev.first->second != endpoint) {
// logger.warn("Token {} changing ownership from {} to {}", t, prev.first->second, endpoint);
logger.warn("Token {} changing ownership from {} to {}", t, prev.first->second, endpoint);
prev.first->second = endpoint;
}
}
@@ -515,6 +518,21 @@ std::vector<gms::inet_address> token_metadata::pending_endpoints_for(const token
return endpoints;
}
std::map<token, inet_address> token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map() {
std::map<token, inet_address> ret(_token_to_endpoint_map.begin(), _token_to_endpoint_map.end());
ret.insert(_bootstrap_tokens.begin(), _bootstrap_tokens.end());
return ret;
}
std::multimap<inet_address, token> token_metadata::get_endpoint_to_token_map_for_reading() {
std::multimap<inet_address, token> cloned;
for (const auto& x : _token_to_endpoint_map) {
cloned.emplace(x.second, x.first);
}
return cloned;
}
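The inversion above is the usual map-to-multimap flip; a minimal self-contained sketch with generic stand-in types (hypothetical names, `long` for tokens and `std::string` for endpoints):

```cpp
#include <map>
#include <string>

// Invert a token->endpoint map into an endpoint->token multimap, in the
// spirit of get_endpoint_to_token_map_for_reading(). A multimap is needed
// on the way back because a single endpoint typically owns many tokens.
inline std::multimap<std::string, long>
invert_token_map_sketch(const std::map<long, std::string>& token_to_endpoint) {
    std::multimap<std::string, long> endpoint_to_tokens;
    for (const auto& entry : token_to_endpoint) {
        endpoint_to_tokens.emplace(entry.second, entry.first);
    }
    return endpoint_to_tokens;
}
```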
/////////////////// class topology /////////////////////////////////////////////
inline void topology::clear() {
_dc_endpoints.clear();


@@ -877,44 +877,18 @@ public:
{
return ImmutableList.copyOf(Iterables.concat(naturalEndpoints, pendingEndpointsFor(token, keyspaceName)));
}
#endif
public:
/** @return an endpoint to token multimap representation of tokenToEndpointMap (a copy) */
public Multimap<InetAddress, Token> getEndpointToTokenMapForReading()
{
lock.readLock().lock();
try
{
Multimap<InetAddress, Token> cloned = HashMultimap.create();
for (Map.Entry<Token, InetAddress> entry : tokenToEndpointMap.entrySet())
cloned.put(entry.getValue(), entry.getKey());
return cloned;
}
finally
{
lock.readLock().unlock();
}
}
std::multimap<inet_address, token> get_endpoint_to_token_map_for_reading();
/**
* @return a (stable copy, won't be modified) Token to Endpoint map for all the normal and bootstrapping nodes
* in the cluster.
*/
public Map<Token, InetAddress> getNormalAndBootstrappingTokenToEndpointMap()
{
lock.readLock().lock();
try
{
Map<Token, InetAddress> map = new HashMap<Token, InetAddress>(tokenToEndpointMap.size() + _bootstrap_tokens.size());
map.putAll(tokenToEndpointMap);
map.putAll(_bootstrap_tokens);
return map;
}
finally
{
lock.readLock().unlock();
}
}
std::map<token, inet_address> get_normal_and_bootstrapping_token_to_endpoint_map();
#if 0
/**
* @return the Topology map of nodes to DCs + Racks
*

log.cc

@@ -98,7 +98,12 @@ logger::really_do_log(log_level level, const char* fmt, stringer** s, size_t n)
if (*p == '{' && *(p+1) == '}') {
p += 2;
if (n > 0) {
(*s++)->append(out);
try {
(*s)->append(out);
} catch (...) {
out << '<' << std::current_exception() << '>';
}
++s;
--n;
} else {
out << "???";

main.cc

@@ -45,11 +45,30 @@
#include "release.hh"
#include <cstdio>
#include <core/file.hh>
#include <sys/time.h>
#include <sys/resource.h>
logging::logger startlog("init");
namespace bpo = boost::program_options;
static boost::filesystem::path relative_conf_dir(boost::filesystem::path path) {
static auto conf_dir = db::config::get_conf_dir(); // this is not gonna change in our life time
return conf_dir / path;
}
// Note: would be neat if something like this was in config::string_map directly
// but that cruds up the YAML/boost integration so until I want to spend hairpulling
// time with that, this is an acceptable helper
template<typename K, typename V, typename KK, typename VV = V>
static V get_or_default(const std::unordered_map<K, V>& src, const KK& key, const VV& def = V()) {
auto i = src.find(key);
if (i != src.end()) {
return i->second;
}
return def;
}
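Usage of this helper follows the pattern seen later in main(): look up an option, falling back to a default when the key is absent. A self-contained sketch (the wrapper function is hypothetical; the `"internode_encryption"` lookup mirrors the one in main()):

```cpp
#include <string>
#include <unordered_map>

// Same shape as the get_or_default helper above: return the mapped value
// if the key exists, otherwise the supplied default.
template<typename K, typename V, typename KK, typename VV = V>
static V get_or_default(const std::unordered_map<K, V>& src, const KK& key,
                        const VV& def = V()) {
    auto i = src.find(key);
    if (i != src.end()) {
        return i->second;
    }
    return def;
}

// Illustrative wrapper: a missing key falls back to "none", as in main().
inline std::string read_encrypt_what(
        const std::unordered_map<std::string, std::string>& ssl_opts) {
    return get_or_default(ssl_opts, "internode_encryption", "none");
}
```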
static future<>
read_config(bpo::variables_map& opts, db::config& cfg) {
using namespace boost::filesystem;
@@ -58,12 +77,11 @@ read_config(bpo::variables_map& opts, db::config& cfg) {
if (opts.count("options-file") > 0) {
file = opts["options-file"].as<sstring>();
} else {
auto file_path = db::config::get_conf_dir();
file_path /= path("scylla.yaml");
file = file_path.string();
file = relative_conf_dir("scylla.yaml").string();
}
return cfg.read_from_file(file).handle_exception([file](auto ep) {
return check_direct_io_support(file).then([file, &cfg] {
return cfg.read_from_file(file);
}).handle_exception([file](auto ep) {
startlog.error("Could not read configuration file {}: {}", file, ep);
return make_exception_future<>(ep);
});
@@ -84,12 +102,19 @@ static logging::log_level to_loglevel(sstring level) {
}
}
static future<> disk_sanity(sstring path) {
return check_direct_io_support(path).then([path] {
return file_system_at(path).then([path] (auto fs) {
static future<> disk_sanity(sstring path, bool developer_mode) {
return check_direct_io_support(path).then([path, developer_mode] {
return file_system_at(path).then([path, developer_mode] (auto fs) {
if (fs != fs_type::xfs) {
startlog.warn("{} is not on XFS. This is a non-supported setup, and performance is expected to be very bad.\n"
"For better performance, placing your data on XFS-formatted directories is strongly recommended", path);
if (!developer_mode) {
startlog.error("{} is not on XFS. This is a non-supported setup, and performance is expected to be very bad.\n"
"For better performance, placing your data on XFS-formatted directories is required."
" To override this error, see the developer_mode configuration option.", path);
throw std::runtime_error(sprint("invalid configuration: path \"%s\" on unsupported filesystem", path));
} else {
startlog.warn("{} is not on XFS. This is a non-supported setup, and performance is expected to be very bad.\n"
"For better performance, placing your data on XFS-formatted directories is strongly recommended", path);
}
}
});
});
@@ -155,6 +180,28 @@ private:
class bad_configuration_error : public std::exception {};
static
void
verify_rlimit(bool developer_mode) {
struct rlimit lim;
int r = getrlimit(RLIMIT_NOFILE, &lim);
if (r == -1) {
throw std::system_error(errno, std::system_category());
}
auto recommended = 200'000U;
auto min = 10'000U;
if (lim.rlim_cur < min) {
if (developer_mode) {
startlog.warn("NOFILE rlimit too low (recommended setting {}, minimum setting {});"
" you may run out of file descriptors.", recommended, min);
} else {
startlog.error("NOFILE rlimit too low (recommended setting {}, minimum setting {});"
" refusing to start.", recommended, min);
throw std::runtime_error("NOFILE rlimit too low");
}
}
}
int main(int ac, char** av) {
runtime::init_uptime();
std::setvbuf(stdout, nullptr, _IOLBF, 1000);
@@ -191,21 +238,18 @@ int main(int ac, char** av) {
apply_logger_settings(cfg->default_log_level(), cfg->logger_log_level(),
cfg->log_to_stdout(), cfg->log_to_syslog());
return read_config(opts, *cfg).then([&cfg, &db, &qp, &proxy, &mm, &ctx, &opts, &dirs]() {
return read_config(opts, *cfg).then([cfg, &db, &qp, &proxy, &mm, &ctx, &opts, &dirs]() {
apply_logger_settings(cfg->default_log_level(), cfg->logger_log_level(),
cfg->log_to_stdout(), cfg->log_to_syslog());
verify_rlimit(cfg->developer_mode());
dht::set_global_partitioner(cfg->partitioner());
auto start_thrift = cfg->start_rpc();
uint16_t api_port = cfg->api_port();
uint16_t storage_port = cfg->storage_port();
ctx.api_dir = cfg->api_ui_dir();
ctx.api_doc = cfg->api_doc_dir();
double phi = cfg->phi_convict_threshold();
sstring cluster_name = cfg->cluster_name();
sstring listen_address = cfg->listen_address();
sstring rpc_address = cfg->rpc_address();
sstring api_address = cfg->api_address() != "" ? cfg->api_address() : rpc_address;
auto seed_provider= cfg->seed_provider();
sstring broadcast_address = cfg->broadcast_address();
sstring broadcast_rpc_address = cfg->broadcast_rpc_address();
@@ -228,14 +272,45 @@ int main(int ac, char** av) {
utils::fb_utilities::set_broadcast_rpc_address(rpc_address);
}
// TODO: lib.
auto is_true = [](sstring val) {
std::transform(val.begin(), val.end(), val.begin(), ::tolower);
return val == "true" || val == "1";
};
// The start_native_transport method is invoked by API as well, and uses the config object
// (through db) directly. Let's fix up default values right here instead, so it in turn can be
// kept simple
// TODO: make intrinsic part of config defaults instead
auto& ceo = cfg->client_encryption_options();
if (is_true(get_or_default(ceo, "enabled", "false"))) {
ceo["enabled"] = "true";
ceo["certificate"] = get_or_default(ceo, "certificate", relative_conf_dir("scylla.crt").string());
ceo["keyfile"] = get_or_default(ceo, "keyfile", relative_conf_dir("scylla.key").string());
} else {
ceo["enabled"] = "false";
}
using namespace locator;
return i_endpoint_snitch::create_snitch(cfg->endpoint_snitch()).then([] {
// #293 - do not stop anything
// engine().at_exit([] { return i_endpoint_snitch::stop_snitch(); });
}).then([api_address] {
return dns::gethostbyname(api_address);
}).then([&db, api_address, api_port, &ctx] (dns::hostent e){
auto ip = e.addresses[0].in.s_addr;
ctx.http_server.start().then([api_address, api_port, ip, &ctx] {
return set_server(ctx);
}).then([api_address, api_port, ip, &ctx] {
ctx.http_server.listen(ipv4_addr{ip, api_port});
}).then([api_address, api_port] {
print("Seastar HTTP server listening on %s:%s ...\n", api_address, api_port);
});
}).then([&db] {
return init_storage_service(db);
}).then([&db, cfg] {
return db.start(std::move(*cfg)).then([&db] {
// Note: changed from using a move here, because we want the config object intact.
return db.start(std::ref(*cfg)).then([&db] {
engine().at_exit([&db] {
// #293 - do not stop anything - not even db (for real)
@@ -248,8 +323,31 @@ int main(int ac, char** av) {
});
});
});
}).then([listen_address, storage_port, seed_provider, cluster_name, phi] {
return init_ms_fd_gossiper(listen_address, storage_port, seed_provider, cluster_name, phi);
}).then([cfg, listen_address] {
// Moved local parameters here, esp since with the
// ssl stuff it gets to be a lot.
uint16_t storage_port = cfg->storage_port();
uint16_t ssl_storage_port = cfg->ssl_storage_port();
double phi = cfg->phi_convict_threshold();
auto seed_provider= cfg->seed_provider();
sstring cluster_name = cfg->cluster_name();
const auto& ssl_opts = cfg->server_encryption_options();
auto encrypt_what = get_or_default(ssl_opts, "internode_encryption", "none");
auto trust_store = get_or_default(ssl_opts, "truststore");
auto cert = get_or_default(ssl_opts, "certificate", relative_conf_dir("scylla.crt").string());
auto key = get_or_default(ssl_opts, "keyfile", relative_conf_dir("scylla.key").string());
return init_ms_fd_gossiper(listen_address
, storage_port
, ssl_storage_port
, encrypt_what
, trust_store
, cert
, key
, seed_provider
, cluster_name
, phi);
}).then([&db] {
return streaming::stream_session::init_streaming_service(db);
}).then([&proxy, &db] {
@@ -281,9 +379,9 @@ int main(int ac, char** av) {
directories.insert(db.local().get_config().data_file_directories().cbegin(),
db.local().get_config().data_file_directories().cend());
directories.insert(db.local().get_config().commitlog_directory());
return do_with(std::move(directories), [] (auto& directories) {
return parallel_for_each(directories, [] (sstring pathname) {
return disk_sanity(pathname);
return do_with(std::move(directories), [&db] (auto& directories) {
return parallel_for_each(directories, [&db] (sstring pathname) {
return disk_sanity(pathname, db.local().get_config().developer_mode());
});
});
}).then([&db] {
@@ -346,19 +444,6 @@ int main(int ac, char** av) {
}
return make_ready_future<>();
});
}).then([api_address] {
return dns::gethostbyname(api_address);
}).then([&db, api_address, api_port, &ctx] (dns::hostent e){
auto ip = e.addresses[0].in.s_addr;
ctx.http_server.start().then([api_address, api_port, ip, &ctx] {
return set_server(ctx);
}).then([api_address, api_port, ip, &ctx] {
ctx.http_server.listen(ipv4_addr{ip, api_port});
}).then([api_address, api_port] {
print("Seastar HTTP server listening on %s:%s ...\n", api_address, api_port);
});
}).then([] {
startlog.warn("Polling mode enabled. ScyllaDB will use 100% of all your CPUs.\nSee https://github.com/scylladb/scylla/issues/417 for a more detailed explanation");
});
}).or_terminate();
});


@@ -33,6 +33,8 @@
#include "query-result.hh"
#include "rpc/rpc.hh"
#include "db/config.hh"
#include "dht/i_partitioner.hh"
#include "range.hh"
namespace net {
@@ -44,70 +46,6 @@ using gossip_digest_ack = gms::gossip_digest_ack;
using gossip_digest_ack2 = gms::gossip_digest_ack2;
using rpc_protocol = rpc::protocol<serializer, messaging_verb>;
using namespace std::chrono_literals;
template <typename Output>
void net::serializer::write(Output& out, const gms::gossip_digest_syn& v) const {
return write_gms(out, v);
}
template <typename Input>
gms::gossip_digest_syn net::serializer::read(Input& in, rpc::type<gms::gossip_digest_syn>) const {
return read_gms<gms::gossip_digest_syn>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const gms::gossip_digest_ack2& v) const {
return write_gms(out, v);
}
template <typename Input>
gms::gossip_digest_ack2 net::serializer::read(Input& in, rpc::type<gms::gossip_digest_ack2>) const {
return read_gms<gms::gossip_digest_ack2>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const streaming::messages::stream_init_message& v) const {
return write_gms(out, v);
}
template <typename Input>
streaming::messages::stream_init_message net::serializer::read(Input& in, rpc::type<streaming::messages::stream_init_message>) const {
return read_gms<streaming::messages::stream_init_message>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const streaming::messages::prepare_message& v) const {
return write_gms(out, v);
}
template <typename Input>
streaming::messages::prepare_message net::serializer::read(Input& in, rpc::type<streaming::messages::prepare_message>) const {
return read_gms<streaming::messages::prepare_message>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const gms::inet_address& v) const {
return write_gms(out, v);
}
template <typename Input>
gms::inet_address net::serializer::read(Input& in, rpc::type<gms::inet_address>) const {
return read_gms<gms::inet_address>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const gms::gossip_digest_ack& v) const {
return write_gms(out, v);
}
template <typename Input>
gms::gossip_digest_ack net::serializer::read(Input& in, rpc::type<gms::gossip_digest_ack>) const {
return read_gms<gms::gossip_digest_ack>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const query::read_command& v) const {
return write_gms(out, v);
}
template <typename Input>
query::read_command net::serializer::read(Input& in, rpc::type<query::read_command>) const {
return read_gms<query::read_command>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const query::result& v) const {
write_serializable(out, v);
@@ -117,33 +55,6 @@ query::result net::serializer::read(Input& in, rpc::type<query::result>) const {
return read_serializable<query::result>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const query::result_digest& v) const {
return write_gms(out, v);
}
template <typename Input>
query::result_digest net::serializer::read(Input& in, rpc::type<query::result_digest>) const {
return read_gms<query::result_digest>(in);
}
template <typename Output>
void net::serializer::write(Output& out, const utils::UUID& v) const {
return write_gms(out, v);
}
template <typename Input>
utils::UUID net::serializer::read(Input& in, rpc::type<utils::UUID>) const {
return read_gms<utils::UUID>(in);
}
// for query::range<T>
template <typename Output, typename T>
void net::serializer::write(Output& out, const query::range<T>& v) const {
write_gms(out, v);
}
template <typename Input, typename T>
query::range<T> net::serializer::read(Input& in, rpc::type<query::range<T>>) const {
return read_gms<query::range<T>>(in);
}
struct messaging_service::rpc_protocol_wrapper : public rpc_protocol { using rpc_protocol::rpc_protocol; };
@@ -156,6 +67,9 @@ public:
rpc_protocol_client_wrapper(rpc_protocol& proto, ipv4_addr addr, ipv4_addr local = ipv4_addr())
: _p(std::make_unique<rpc_protocol::client>(proto, addr, local)) {
}
rpc_protocol_client_wrapper(rpc_protocol& proto, ipv4_addr addr, ipv4_addr local, ::shared_ptr<seastar::tls::server_credentials> c)
: _p(std::make_unique<rpc_protocol::client>(proto, addr, seastar::tls::connect(c, addr, local)))
{}
auto get_stats() const { return _p->get_stats(); }
future<> stop() { return _p->stop(); }
bool error() {
@@ -244,10 +158,35 @@ void register_handler(messaging_service* ms, messaging_verb verb, Func&& func) {
}
messaging_service::messaging_service(gms::inet_address ip, uint16_t port)
: messaging_service(std::move(ip), port, encrypt_what::none, 0, nullptr)
{}
messaging_service::messaging_service(gms::inet_address ip
, uint16_t port
, encrypt_what ew
, uint16_t ssl_port
, ::shared_ptr<seastar::tls::server_credentials> credentials
)
: _listen_address(ip)
, _port(port)
, _rpc(new rpc_protocol_wrapper(serializer{}))
, _server(new rpc_protocol_server_wrapper(*_rpc, ipv4_addr{_listen_address.raw_addr(), _port})) {
, _ssl_port(ssl_port)
, _encrypt_what(ew)
, _rpc(new rpc_protocol_wrapper(serializer { }))
, _server(new rpc_protocol_server_wrapper(*_rpc, ipv4_addr { _listen_address.raw_addr(), _port }))
, _credentials(std::move(credentials))
, _server_tls([this]() -> std::unique_ptr<rpc_protocol_server_wrapper>{
if (_encrypt_what == encrypt_what::none) {
return nullptr;
}
listen_options lo;
lo.reuse_address = true;
return std::make_unique<rpc_protocol_server_wrapper>(*_rpc,
seastar::tls::listen(_credentials
, make_ipv4_address(ipv4_addr {_listen_address.raw_addr(), _ssl_port})
, lo)
);
}())
{
register_handler(this, messaging_verb::CLIENT_ID, [] (rpc::client_info& ci, gms::inet_address broadcast_address) {
ci.attach_auxiliary("baddr", broadcast_address);
return rpc::no_wait;
@@ -265,14 +204,16 @@ gms::inet_address messaging_service::listen_address() {
}
future<> messaging_service::stop() {
return when_all(
_server->stop(),
parallel_for_each(_clients, [] (auto& m) {
return parallel_for_each(m, [] (std::pair<const shard_id, shard_info>& c) {
return c.second.rpc_client->stop();
});
})
).discard_result();
return _in_flight_requests.close().then([this] {
return when_all(
_server->stop(),
parallel_for_each(_clients, [] (auto& m) {
return parallel_for_each(m, [] (std::pair<const shard_id, shard_info>& c) {
return c.second.rpc_client->stop();
});
})
).discard_result();
});
}
rpc::no_wait_type messaging_service::no_wait() {
@@ -350,8 +291,33 @@ shared_ptr<messaging_service::rpc_protocol_client_wrapper> messaging_service::ge
remove_error_rpc_client(verb, id);
}
auto remote_addr = ipv4_addr(get_preferred_ip(id.addr).raw_addr(), _port);
auto client = ::make_shared<rpc_protocol_client_wrapper>(*_rpc, remote_addr, ipv4_addr{_listen_address.raw_addr(), 0});
auto must_encrypt = [&id, this] {
if (_encrypt_what == encrypt_what::none) {
return false;
}
if (_encrypt_what == encrypt_what::all) {
return true;
}
auto& snitch_ptr = locator::i_endpoint_snitch::get_local_snitch_ptr();
if (_encrypt_what == encrypt_what::dc) {
return snitch_ptr->get_datacenter(id.addr)
== snitch_ptr->get_datacenter(utils::fb_utilities::get_broadcast_address());
}
return snitch_ptr->get_rack(id.addr)
== snitch_ptr->get_rack(utils::fb_utilities::get_broadcast_address());
}();
auto remote_addr = ipv4_addr(get_preferred_ip(id.addr).raw_addr(), must_encrypt ? _ssl_port : _port);
auto local_addr = ipv4_addr{_listen_address.raw_addr(), 0};
auto client = must_encrypt ?
::make_shared<rpc_protocol_client_wrapper>(*_rpc,
remote_addr, local_addr, _credentials) :
::make_shared<rpc_protocol_client_wrapper>(*_rpc,
remote_addr, local_addr);
it = _clients[idx].emplace(id, shard_info(std::move(client))).first;
_rpc->make_client<rpc::no_wait_type(gms::inet_address)>(messaging_verb::CLIENT_ID)(*it->second.rpc_client, utils::fb_utilities::get_broadcast_address());
return it->second.rpc_client;
@@ -391,50 +357,85 @@ std::unique_ptr<messaging_service::rpc_protocol_wrapper>& messaging_service::rpc
// Send a message for verb
template <typename MsgIn, typename... MsgOut>
auto send_message(messaging_service* ms, messaging_verb verb, shard_id id, MsgOut&&... msg) {
auto rpc_client_ptr = ms->get_rpc_client(verb, id);
auto rpc_handler = ms->rpc()->make_client<MsgIn(MsgOut...)>(verb);
auto& rpc_client = *rpc_client_ptr;
return rpc_handler(rpc_client, std::forward<MsgOut>(msg)...).then_wrapped([ms = ms->shared_from_this(), id, verb, rpc_client_ptr = std::move(rpc_client_ptr)] (auto&& f) {
try {
if (f.failed()) {
ms->increment_dropped_messages(verb);
f.get();
assert(false); // never reached
return seastar::with_gate(ms->requests_gate(), [&] {
auto rpc_client_ptr = ms->get_rpc_client(verb, id);
auto rpc_handler = ms->rpc()->make_client<MsgIn(MsgOut...)>(verb);
auto& rpc_client = *rpc_client_ptr;
return rpc_handler(rpc_client, std::forward<MsgOut>(msg)...).then_wrapped([ms = ms->shared_from_this(), id, verb, rpc_client_ptr = std::move(rpc_client_ptr)] (auto&& f) {
try {
if (f.failed()) {
ms->increment_dropped_messages(verb);
f.get();
assert(false); // never reached
}
return std::move(f);
} catch (rpc::closed_error) {
// This is a transport error
ms->remove_error_rpc_client(verb, id);
throw;
} catch (...) {
// This is expected to be a rpc server error, e.g., the rpc handler throws a std::runtime_error.
throw;
}
return std::move(f);
} catch (rpc::closed_error) {
// This is a transport error
ms->remove_error_rpc_client(verb, id);
throw;
} catch (...) {
// This is expected to be a rpc server error, e.g., the rpc handler throws a std::runtime_error.
throw;
}
});
});
}
// TODO: Remove duplicated code in send_message
template <typename MsgIn, typename... MsgOut>
auto send_message_timeout(messaging_service* ms, messaging_verb verb, shard_id id, std::chrono::milliseconds timeout, MsgOut&&... msg) {
auto rpc_client_ptr = ms->get_rpc_client(verb, id);
auto rpc_handler = ms->rpc()->make_client<MsgIn(MsgOut...)>(verb);
auto& rpc_client = *rpc_client_ptr;
return rpc_handler(rpc_client, timeout, std::forward<MsgOut>(msg)...).then_wrapped([ms = ms->shared_from_this(), id, verb, rpc_client_ptr = std::move(rpc_client_ptr)] (auto&& f) {
try {
if (f.failed()) {
ms->increment_dropped_messages(verb);
f.get();
assert(false); // never reached
template <typename MsgIn, typename Timeout, typename... MsgOut>
auto send_message_timeout(messaging_service* ms, messaging_verb verb, shard_id id, Timeout timeout, MsgOut&&... msg) {
return seastar::with_gate(ms->requests_gate(), [&] {
auto rpc_client_ptr = ms->get_rpc_client(verb, id);
auto rpc_handler = ms->rpc()->make_client<MsgIn(MsgOut...)>(verb);
auto& rpc_client = *rpc_client_ptr;
return rpc_handler(rpc_client, timeout, std::forward<MsgOut>(msg)...).then_wrapped([ms = ms->shared_from_this(), id, verb, rpc_client_ptr = std::move(rpc_client_ptr)] (auto&& f) {
try {
if (f.failed()) {
ms->increment_dropped_messages(verb);
f.get();
assert(false); // never reached
}
return std::move(f);
} catch (rpc::closed_error) {
// This is a transport error
ms->remove_error_rpc_client(verb, id);
throw;
} catch (...) {
// This is expected to be a rpc server error, e.g., the rpc handler throws a std::runtime_error.
throw;
}
return std::move(f);
} catch (rpc::closed_error) {
// This is a transport error
ms->remove_error_rpc_client(verb, id);
throw;
} catch (...) {
// This is expected to be a rpc server error, e.g., the rpc handler throws a std::runtime_error.
throw;
}
});
});
}
template <typename MsgIn, typename... MsgOut>
auto send_message_timeout_and_retry(messaging_service* ms, messaging_verb verb, shard_id id,
std::chrono::seconds timeout, int nr_retry, std::chrono::seconds wait, MsgOut... msg) {
namespace stdx = std::experimental;
using MsgInTuple = typename futurize_t<MsgIn>::value_type;
return do_with(int(nr_retry), std::move(msg)..., [ms, verb, id, timeout, wait, nr_retry] (auto& retry, const auto&... messages) {
return repeat_until_value([ms, verb, id, timeout, wait, nr_retry, &retry, &messages...] {
return send_message_timeout<MsgIn>(ms, verb, id, timeout, messages...).then_wrapped(
[verb, id, wait, nr_retry, &retry] (auto&& f) mutable {
try {
MsgInTuple ret = f.get();
if (retry != nr_retry) {
logger.info("Retry verb={} to {}, retry={}: OK", int(verb), id, retry);
}
return make_ready_future<stdx::optional<MsgInTuple>>(std::move(ret));
} catch (...) {
logger.info("Retry verb={} to {}, retry={}: {}", int(verb), id, retry, std::current_exception());
if (--retry == 0) {
throw;
}
return sleep(wait).then([] {
return make_ready_future<stdx::optional<MsgInTuple>>(stdx::nullopt);
});
}
});
}).then([] (MsgInTuple result) {
return futurize<MsgIn>::from_tuple(std::move(result));
});
});
}
@@ -444,51 +445,80 @@ auto send_message_oneway(messaging_service* ms, messaging_verb verb, shard_id id
return send_message<rpc::no_wait_type>(ms, std::move(verb), std::move(id), std::forward<MsgOut>(msg)...);
}
// Send one way message for verb
template <typename Timeout, typename... MsgOut>
auto send_message_oneway_timeout(messaging_service* ms, Timeout timeout, messaging_verb verb, shard_id id, MsgOut&&... msg) {
return send_message_timeout<rpc::no_wait_type>(ms, std::move(verb), std::move(id), timeout, std::forward<MsgOut>(msg)...);
}
// Wrappers for verbs
// Retransmission parameters for streaming verbs
// A stream plan gives up retrying in 10 minutes at most, 5 minutes at least
static constexpr int streaming_nr_retry = 10;
static constexpr std::chrono::seconds streaming_timeout{30};
static constexpr std::chrono::seconds streaming_wait_before_retry{30};
// STREAM_INIT_MESSAGE
void messaging_service::register_stream_init_message(std::function<future<unsigned> (streaming::messages::stream_init_message msg, unsigned src_cpu_id)>&& func) {
register_handler(this, messaging_verb::STREAM_INIT_MESSAGE, std::move(func));
}
future<unsigned> messaging_service::send_stream_init_message(shard_id id, streaming::messages::stream_init_message msg, unsigned src_cpu_id) {
return send_message<unsigned>(this, messaging_verb::STREAM_INIT_MESSAGE, std::move(id), std::move(msg), std::move(src_cpu_id));
return send_message_timeout_and_retry<unsigned>(this, messaging_verb::STREAM_INIT_MESSAGE, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
std::move(msg), src_cpu_id);
}
// PREPARE_MESSAGE
void messaging_service::register_prepare_message(std::function<future<streaming::messages::prepare_message> (streaming::messages::prepare_message msg, UUID plan_id,
inet_address from, inet_address connecting, unsigned src_cpu_id, unsigned dst_cpu_id)>&& func) {
register_handler(this, messaging_verb::PREPARE_MESSAGE, std::move(func));
}
future<streaming::messages::prepare_message> messaging_service::send_prepare_message(shard_id id, streaming::messages::prepare_message msg, UUID plan_id,
inet_address from, inet_address connecting, unsigned src_cpu_id, unsigned dst_cpu_id) {
return send_message<streaming::messages::prepare_message>(this, messaging_verb::PREPARE_MESSAGE, std::move(id), std::move(msg),
std::move(plan_id), std::move(from), std::move(connecting), std::move(src_cpu_id), std::move(dst_cpu_id));
return send_message_timeout_and_retry<streaming::messages::prepare_message>(this, messaging_verb::PREPARE_MESSAGE, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
std::move(msg), plan_id, from, connecting, src_cpu_id, dst_cpu_id);
}
// PREPARE_DONE_MESSAGE
void messaging_service::register_prepare_done_message(std::function<future<> (UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func) {
register_handler(this, messaging_verb::PREPARE_DONE_MESSAGE, std::move(func));
}
future<> messaging_service::send_prepare_done_message(shard_id id, UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id) {
return send_message<void>(this, messaging_verb::PREPARE_DONE_MESSAGE, std::move(id), std::move(plan_id), std::move(from), std::move(connecting), std::move(dst_cpu_id));
return send_message_timeout_and_retry<void>(this, messaging_verb::PREPARE_DONE_MESSAGE, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
plan_id, from, connecting, dst_cpu_id);
}
void messaging_service::register_stream_mutation(std::function<future<> (UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id)>&& func) {
// STREAM_MUTATION
void messaging_service::register_stream_mutation(std::function<future<> (const rpc::client_info& cinfo, UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id)>&& func) {
register_handler(this, messaging_verb::STREAM_MUTATION, std::move(func));
}
future<> messaging_service::send_stream_mutation(shard_id id, UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id) {
return send_message<void>(this, messaging_verb::STREAM_MUTATION, std::move(id), std::move(plan_id), std::move(fm), std::move(dst_cpu_id));
return send_message_timeout_and_retry<void>(this, messaging_verb::STREAM_MUTATION, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
plan_id, std::move(fm), dst_cpu_id);
}
void messaging_service::register_stream_mutation_done(std::function<future<> (UUID plan_id, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func) {
// STREAM_MUTATION_DONE
void messaging_service::register_stream_mutation_done(std::function<future<> (UUID plan_id, std::vector<range<dht::token>> ranges, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func) {
register_handler(this, messaging_verb::STREAM_MUTATION_DONE, std::move(func));
}
future<> messaging_service::send_stream_mutation_done(shard_id id, UUID plan_id, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id) {
return send_message<void>(this, messaging_verb::STREAM_MUTATION_DONE, std::move(id), std::move(plan_id), std::move(cf_id), std::move(from), std::move(connecting), std::move(dst_cpu_id));
future<> messaging_service::send_stream_mutation_done(shard_id id, UUID plan_id, std::vector<range<dht::token>> ranges, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id) {
return send_message_timeout_and_retry<void>(this, messaging_verb::STREAM_MUTATION_DONE, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
plan_id, std::move(ranges), cf_id, from, connecting, dst_cpu_id);
}
void messaging_service::register_complete_message(std::function<rpc::no_wait_type (UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func) {
// COMPLETE_MESSAGE
void messaging_service::register_complete_message(std::function<future<> (UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func) {
register_handler(this, messaging_verb::COMPLETE_MESSAGE, std::move(func));
}
future<> messaging_service::send_complete_message(shard_id id, UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id) {
return send_message_oneway(this, messaging_verb::COMPLETE_MESSAGE, std::move(id), std::move(plan_id), std::move(from), std::move(connecting), std::move(dst_cpu_id));
return send_message_timeout_and_retry<void>(this, messaging_verb::COMPLETE_MESSAGE, id,
streaming_timeout, streaming_nr_retry, streaming_wait_before_retry,
plan_id, from, connecting, dst_cpu_id);
}
void messaging_service::register_echo(std::function<future<> ()>&& func) {
@@ -541,14 +571,14 @@ future<> messaging_service::send_definitions_update(shard_id id, std::vector<fro
return send_message_oneway(this, messaging_verb::DEFINITIONS_UPDATE, std::move(id), std::move(fm));
}
void messaging_service::register_migration_request(std::function<future<std::vector<frozen_mutation>> (gms::inet_address reply_to, unsigned shard)>&& func) {
void messaging_service::register_migration_request(std::function<future<std::vector<frozen_mutation>> ()>&& func) {
register_handler(this, net::messaging_verb::MIGRATION_REQUEST, std::move(func));
}
void messaging_service::unregister_migration_request() {
_rpc->unregister_handler(net::messaging_verb::MIGRATION_REQUEST);
}
future<std::vector<frozen_mutation>> messaging_service::send_migration_request(shard_id id, gms::inet_address reply_to, unsigned shard) {
return send_message<std::vector<frozen_mutation>>(this, messaging_verb::MIGRATION_REQUEST, std::move(id), std::move(reply_to), std::move(shard));
future<std::vector<frozen_mutation>> messaging_service::send_migration_request(shard_id id) {
return send_message<std::vector<frozen_mutation>>(this, messaging_verb::MIGRATION_REQUEST, std::move(id));
}
void messaging_service::register_mutation(std::function<rpc::no_wait_type (frozen_mutation fm, std::vector<inet_address> forward,
@@ -558,9 +588,9 @@ void messaging_service::register_mutation(std::function<rpc::no_wait_type (froze
void messaging_service::unregister_mutation() {
_rpc->unregister_handler(net::messaging_verb::MUTATION);
}
future<> messaging_service::send_mutation(shard_id id, const frozen_mutation& fm, std::vector<inet_address> forward,
future<> messaging_service::send_mutation(shard_id id, clock_type::time_point timeout, const frozen_mutation& fm, std::vector<inet_address> forward,
inet_address reply_to, unsigned shard, response_id_type response_id) {
return send_message_oneway(this, messaging_verb::MUTATION, std::move(id), fm, std::move(forward),
return send_message_oneway_timeout(this, timeout, messaging_verb::MUTATION, std::move(id), fm, std::move(forward),
std::move(reply_to), std::move(shard), std::move(response_id));
}


@@ -36,6 +36,9 @@
#include "query-request.hh"
#include "db/serializer.hh"
#include "mutation_query.hh"
#include <seastar/core/gate.hh>
#include <seastar/net/tls.hh>
// forward declarations
namespace streaming { namespace messages {
@@ -278,9 +281,29 @@ struct serializer {
return read_serializable<reconcilable_result>(in);
}
// For complex types which have serialize()/deserialize(), e.g. gms::gossip_digest_syn, gms::gossip_digest_ack2
template <typename Output>
void write(Output& out, const query::result& v) const;
template <typename Input>
query::result read(Input& in, rpc::type<query::result>) const;
// Default implementation for any type which knows how to serialize itself
// with methods serialize(), deserialize() and serialized_size() with the
// following signatures:
// void serialize(bytes::iterator& out) const;
// size_t serialized_size() const;
// static T deserialize(bytes_view& in);
//
// One inefficiency inherent in this API is that deserialize() expects
// the serialized data to have been already read into a contiguous buffer,
// and to do this, the reader needs to know in advance how much to read,
// so we are forced to precede the serialized data by its length - even
// though the deserialize() function should already know where to stop.
// Even a fixed-length object will end up preceded by its length.
// This waste can be avoided by implementing special read()/write()
// functions for this type, above.
template <typename T, typename Output>
void write_gms(Output& out, const T& v) const {
void write(Output& out, const T& v) const {
uint32_t sz = v.serialized_size();
write(out, sz);
bytes b(bytes::initialized_later(), sz);
@@ -289,7 +312,7 @@ struct serializer {
out.write(reinterpret_cast<const char*>(b.c_str()), sz);
}
template <typename T, typename Input>
T read_gms(Input& in) const {
T read(Input& in, rpc::type<T>) const {
auto sz = read(in, rpc::type<uint32_t>());
bytes b(bytes::initialized_later(), sz);
in.read(reinterpret_cast<char*>(b.begin()), sz);
@@ -297,62 +320,6 @@ struct serializer {
return T::deserialize(bv);
}
template <typename Output>
void write(Output& out, const gms::gossip_digest_syn& v) const;
template <typename Input>
gms::gossip_digest_syn read(Input& in, rpc::type<gms::gossip_digest_syn>) const;
template <typename Output>
void write(Output& out, const gms::gossip_digest_ack2& v) const;
template <typename Input>
gms::gossip_digest_ack2 read(Input& in, rpc::type<gms::gossip_digest_ack2>) const;
template <typename Output>
void write(Output& out, const streaming::messages::stream_init_message& v) const;
template <typename Input>
streaming::messages::stream_init_message read(Input& in, rpc::type<streaming::messages::stream_init_message>) const;
template <typename Output>
void write(Output& out, const streaming::messages::prepare_message& v) const;
template <typename Input>
streaming::messages::prepare_message read(Input& in, rpc::type<streaming::messages::prepare_message>) const;
template <typename Output>
void write(Output& out, const gms::inet_address& v) const;
template <typename Input>
gms::inet_address read(Input& in, rpc::type<gms::inet_address>) const;
template <typename Output>
void write(Output& out, const gms::gossip_digest_ack& v) const;
template <typename Input>
gms::gossip_digest_ack read(Input& in, rpc::type<gms::gossip_digest_ack>) const;
template <typename Output>
void write(Output& out, const query::read_command& v) const;
template <typename Input>
query::read_command read(Input& in, rpc::type<query::read_command>) const;
template <typename Output>
void write(Output& out, const query::result& v) const;
template <typename Input>
query::result read(Input& in, rpc::type<query::result>) const;
template <typename Output>
void write(Output& out, const query::result_digest& v) const;
template <typename Input>
query::result_digest read(Input& in, rpc::type<query::result_digest>) const;
template <typename Output>
void write(Output& out, const utils::UUID& v) const;
template <typename Input>
utils::UUID read(Input& in, rpc::type<utils::UUID>) const;
// for query::range<T>
template <typename Output, typename T>
void write(Output& out, const query::range<T>& v) const;
template <typename Input, typename T>
query::range<T> read(Input& input, rpc::type<query::range<T>>) const;
template <typename Output, typename T>
void write(Output& out, const foreign_ptr<T>& v) const {
return write(out, *v);
@@ -431,23 +398,40 @@ public:
bool knows_version(const gms::inet_address& endpoint) const;
enum class encrypt_what {
none,
rack,
dc,
all,
};
private:
gms::inet_address _listen_address;
uint16_t _port;
uint16_t _ssl_port;
encrypt_what _encrypt_what;
// map: Node broadcast address -> Node internal IP for communication within the same data center
std::unordered_map<gms::inet_address, gms::inet_address> _preferred_ip_cache;
std::unique_ptr<rpc_protocol_wrapper> _rpc;
std::unique_ptr<rpc_protocol_server_wrapper> _server;
::shared_ptr<seastar::tls::server_credentials> _credentials;
std::unique_ptr<rpc_protocol_server_wrapper> _server_tls;
std::array<clients_map, 2> _clients;
uint64_t _dropped_messages[static_cast<int32_t>(messaging_verb::LAST)] = {};
seastar::gate _in_flight_requests;
public:
using clock_type = std::chrono::steady_clock;
public:
messaging_service(gms::inet_address ip = gms::inet_address("0.0.0.0"), uint16_t port = 7000);
messaging_service(gms::inet_address ip, uint16_t port, encrypt_what,
uint16_t ssl_port, ::shared_ptr<seastar::tls::server_credentials>);
~messaging_service();
public:
uint16_t port();
gms::inet_address listen_address();
future<> stop();
static rpc::no_wait_type no_wait();
seastar::gate& requests_gate() { return _in_flight_requests; }
public:
gms::inet_address get_preferred_ip(gms::inet_address ep);
future<> init_local_preferred_ip_cache();
@@ -468,13 +452,13 @@ public:
future<> send_prepare_done_message(shard_id id, UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id);
// Wrapper for STREAM_MUTATION verb
void register_stream_mutation(std::function<future<> (UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id)>&& func);
void register_stream_mutation(std::function<future<> (const rpc::client_info& cinfo, UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id)>&& func);
future<> send_stream_mutation(shard_id id, UUID plan_id, frozen_mutation fm, unsigned dst_cpu_id);
void register_stream_mutation_done(std::function<future<> (UUID plan_id, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func);
future<> send_stream_mutation_done(shard_id id, UUID plan_id, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id);
void register_stream_mutation_done(std::function<future<> (UUID plan_id, std::vector<range<dht::token>> ranges, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func);
future<> send_stream_mutation_done(shard_id id, UUID plan_id, std::vector<range<dht::token>> ranges, UUID cf_id, inet_address from, inet_address connecting, unsigned dst_cpu_id);
void register_complete_message(std::function<rpc::no_wait_type (UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func);
void register_complete_message(std::function<future<> (UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id)>&& func);
future<> send_complete_message(shard_id id, UUID plan_id, inet_address from, inet_address connecting, unsigned dst_cpu_id);
// Wrapper for ECHO verb
@@ -503,9 +487,9 @@ public:
future<> send_definitions_update(shard_id id, std::vector<frozen_mutation> fm);
// Wrapper for MIGRATION_REQUEST
void register_migration_request(std::function<future<std::vector<frozen_mutation>> (gms::inet_address reply_to, unsigned shard)>&& func);
void register_migration_request(std::function<future<std::vector<frozen_mutation>> ()>&& func);
void unregister_migration_request();
future<std::vector<frozen_mutation>> send_migration_request(shard_id id, gms::inet_address reply_to, unsigned shard);
future<std::vector<frozen_mutation>> send_migration_request(shard_id id);
// FIXME: response_id_type is an alias in service::storage_proxy::response_id_type
using response_id_type = uint64_t;
@@ -513,7 +497,7 @@ public:
void register_mutation(std::function<rpc::no_wait_type (frozen_mutation fm, std::vector<inet_address> forward,
inet_address reply_to, unsigned shard, response_id_type response_id)>&& func);
void unregister_mutation();
future<> send_mutation(shard_id id, const frozen_mutation& fm, std::vector<inet_address> forward,
future<> send_mutation(shard_id id, clock_type::time_point timeout, const frozen_mutation& fm, std::vector<inet_address> forward,
inet_address reply_to, unsigned shard, response_id_type response_id);
// Wrapper for MUTATION_DONE


@@ -116,6 +116,15 @@ mutation_partition::operator=(const mutation_partition& x) {
return *this;
}
mutation_partition&
mutation_partition::operator=(mutation_partition&& x) noexcept {
if (this != &x) {
this->~mutation_partition();
new (this) mutation_partition(std::move(x));
}
return *this;
}
void
mutation_partition::apply(const schema& schema, const mutation_partition& p) {
_tombstone.apply(p._tombstone);
@@ -181,10 +190,12 @@ mutation_partition::range_tombstone_for_row(const schema& schema, const clusteri
}
auto c = row_tombstones_entry::key_comparator(
clustering_key::prefix_view_type::less_compare_with_prefix(schema));
clustering_key_prefix::prefix_view_type::less_compare_with_prefix(schema));
// _row_tombstones contains only strict prefixes
for (unsigned prefix_len = 1; prefix_len < schema.clustering_key_size(); ++prefix_len) {
unsigned key_length = std::distance(key.begin(schema), key.end(schema));
assert(key_length <= schema.clustering_key_size());
for (unsigned prefix_len = 1; prefix_len <= key_length; ++prefix_len) {
auto i = _row_tombstones.find(key.prefix_view(schema, prefix_len), c);
if (i != _row_tombstones.end()) {
t.apply(i->t());
@@ -272,15 +283,6 @@ void mutation_partition::insert_row(const schema& s, const clustering_key& key,
_rows.insert(_rows.end(), *e);
}
const rows_entry*
mutation_partition::find_entry(const schema& schema, const clustering_key_prefix& key) const {
auto i = _rows.find(key, rows_entry::key_comparator(clustering_key::less_compare_with_prefix(schema)));
if (i == _rows.end()) {
return nullptr;
}
return &*i;
}
const row*
mutation_partition::find_row(const clustering_key& key) const {
auto i = _rows.find(key);
@@ -325,7 +327,7 @@ mutation_partition::clustered_row(const schema& s, const clustering_key_view& ke
boost::iterator_range<mutation_partition::rows_type::const_iterator>
mutation_partition::range(const schema& schema, const query::range<clustering_key_prefix>& r) const {
auto cmp = rows_entry::key_comparator(clustering_key::prefix_equality_less_compare(schema));
auto cmp = rows_entry::key_comparator(clustering_key_prefix::prefix_equality_less_compare(schema));
auto i1 = r.start() ? (r.start()->is_inclusive()
? _rows.lower_bound(r.start()->value(), cmp)
: _rows.upper_bound(r.start()->value(), cmp)) : _rows.cbegin();
@@ -439,20 +441,23 @@ mutation_partition::query(query::result::partition_writer& pw,
// To avoid retraction of the partition entry in case of limit == 0.
assert(limit > 0);
bool any_live = has_any_live_data(s, column_kind::static_column, static_row(), _tombstone, now);
if (!slice.static_columns.empty()) {
auto row_builder = pw.add_static_row();
get_row_slice(s, column_kind::static_column, static_row(), slice.static_columns, partition_tombstone(), now, row_builder);
row_builder.finish();
}
// Like a PK range, an empty row range should be considered an "exclude all" restriction
bool has_ck_selector = pw.ranges().empty();
auto is_reversed = slice.options.contains(query::partition_slice::option::reversed);
for (auto&& row_range : pw.ranges()) {
if (limit == 0) {
break;
}
has_ck_selector |= !row_range.is_full();
// FIXME: Optimize for a full-tuple singular range. mutation_partition::range()
// does two lookups to form a range, even for singular range. We need
// only one lookup for a full-tuple singular range though.
@@ -461,7 +466,6 @@ mutation_partition::query(query::result::partition_writer& pw,
auto row_tombstone = tombstone_for_row(s, e);
if (row.is_live(s, row_tombstone, now)) {
any_live = true;
auto row_builder = pw.add_row(e.key());
get_row_slice(s, column_kind::regular_column, row.cells(), slice.regular_columns, row_tombstone, now, row_builder);
row_builder.finish();
@@ -473,11 +477,19 @@ mutation_partition::query(query::result::partition_writer& pw,
});
}
if (!any_live) {
pw.retract();
} else {
pw.finish();
}
// If we got no rows, but have live static columns, we should only
// give them back IFF we did not have any CK restrictions.
// #589
// If ck:s exist and we restrict on them, we either have matching
// rows, or return nothing, since cql does not allow "is null".
if (pw.row_count() == 0
&& (has_ck_selector
|| !has_any_live_data(s, column_kind::static_column,
static_row(), _tombstone, now))) {
pw.retract();
} else {
pw.finish();
}
}
std::ostream&
@@ -607,6 +619,8 @@ void
merge_column(const column_definition& def,
atomic_cell_or_collection& old,
atomic_cell_or_collection&& neww) {
old.linearize();
neww.linearize();
if (def.is_atomic()) {
if (compare_atomic_cell_for_merge(old.as_atomic_cell(), neww.as_atomic_cell()) < 0) {
old = std::move(neww);
@@ -615,6 +629,7 @@ merge_column(const column_definition& def,
auto ct = static_pointer_cast<const collection_type_impl>(def.type);
old = ct->merge(old.as_collection_mutation(), neww.as_collection_mutation());
}
old.unlinearize();
}
void
@@ -761,7 +776,12 @@ uint32_t mutation_partition::do_compact(const schema& s,
trim_rows<false>(s, row_ranges, row_callback);
}
if (row_count == 0 && static_row_live) {
// #589 - Do not add extra row for statics unless we did a CK range-less query.
// See comment in query
if (row_count == 0 && static_row_live
&& std::any_of(row_ranges.begin(), row_ranges.end(), [](auto& r) {
return r.is_full();
})) {
++row_count;
}
@@ -851,8 +871,7 @@ mutation_partition::live_row_count(const schema& s, gc_clock::time_point query_t
}
rows_entry::rows_entry(rows_entry&& o) noexcept
: _link(std::move(o._link))
, _key(std::move(o._key))
: _key(std::move(o._key))
, _row(std::move(o._row))
{
using container_type = mutation_partition::rows_type;


@@ -559,7 +559,7 @@ public:
mutation_partition(const mutation_partition&);
~mutation_partition();
mutation_partition& operator=(const mutation_partition& x);
mutation_partition& operator=(mutation_partition&& x) = default;
mutation_partition& operator=(mutation_partition&& x) noexcept;
bool equal(const schema& s, const mutation_partition&) const;
friend std::ostream& operator<<(std::ostream& os, const mutation_partition& mp);
public:
@@ -656,7 +656,6 @@ public:
const rows_type& clustered_rows() const { return _rows; }
const row_tombstones_type& row_tombstones() const { return _row_tombstones; }
const row* find_row(const clustering_key& key) const;
const rows_entry* find_entry(const schema& schema, const clustering_key_prefix& key) const;
tombstone range_tombstone_for_row(const schema& schema, const clustering_key& key) const;
tombstone tombstone_for_row(const schema& schema, const clustering_key& key) const;
tombstone tombstone_for_row(const schema& schema, const rows_entry& e) const;


@@ -167,3 +167,23 @@ mutation_query(const mutation_source& source,
});
});
}
std::ostream& operator<<(std::ostream& out, const reconcilable_result::printer& pr) {
out << "{rows=" << pr.self.row_count() << ", [";
bool first = true;
for (const partition& p : pr.self.partitions()) {
if (!first) {
out << ", ";
}
first = false;
out << "{rows=" << p.row_count() << ", ";
out << p._m.pretty_printer(pr.schema);
out << "}";
}
out << "]}";
return out;
}
reconcilable_result::printer reconcilable_result::pretty_printer(schema_ptr s) const {
return { *this, std::move(s) };
}

Some files were not shown because too many files have changed in this diff.