Compare commits


478 Commits

Avi Kivity
b2eb0810a2 build: support for alternative versions of libsystemd pkgconfig
While pkgconfig is supposed to be a distribution and version neutral way
of detecting packages, it doesn't always work this way.  The sd_notify()
manual page documents that sd_notify is available via the libsystemd
package, but on centos 7.0 it is only available via the libsystemd-daemon
package (on centos 7.1+ it works as expected).

Fix by allowing for alternate versions of package names, testing each one
until a match is found.

Fixes #879.

Message-Id: <1454858862-5239-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 8b0a26f06d)
2016-02-07 17:38:10 +02:00
Avi Kivity
14d029bf71 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"

(cherry picked from commit f3ca597a01)
2016-02-04 16:43:09 +02:00
Pekka Enberg
38470b4d28 release: prepare for 0.17 2016-01-28 14:44:40 +02:00
Raphael S. Carvalho
3b7970baff compaction: delete generated sstables in event of an interrupt
Generated sstables may be either fully or partially written.
Compaction is interrupted if it was deliberately asked to stop (stop API)
or it was forced to do so in the event of a failure, e.g. running out of disk
space. There is a need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as the number of generations
to look at increases.
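
A rough sketch of the idea (hypothetical names; the actual Scylla interfaces
differ), hooking the cleanup into the compaction's failure path:

    // Sketch only: on any compaction failure or deliberate stop, remove the
    // partially/fully written output sstables. do_compaction() and
    // remove_sstable_files() are illustrative, not real Scylla functions.
    future<> compact(std::vector<sstables::shared_sstable> candidates) {
        auto new_sstables = make_lw_shared<std::vector<sstables::shared_sstable>>();
        return do_compaction(candidates, new_sstables).handle_exception(
            [new_sstables] (std::exception_ptr ep) {
                for (auto& sst : *new_sstables) {
                    remove_sstable_files(sst); // reclaim space, keep generation count low
                }
                return make_exception_future<>(ep); // keep propagating the failure
            });
    }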

Fixes #852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
2016-01-28 14:05:57 +02:00
Pekka Enberg
3c3c819280 Merge "api: Fix stream_manager" from Asias
"Fix the metrics for bytes sent and received"
2016-01-28 13:57:59 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compactions. Currently, we only support
two compaction types: COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Tomasz Grabiec
9fa62af96b database: Move implementation to .cc
Message-Id: <1453980679-27226-1-git-send-email-tgrabiec@scylladb.com>
2016-01-28 13:35:33 +02:00
Tomasz Grabiec
ca6bafbb56 canonical_mutation: Remove commented out junk 2016-01-28 12:29:20 +01:00
Tomasz Grabiec
41dc98bb79 Merge branch 'cleanup_improvements' from git@github.com:raphaelsc/scylla.git
Compaction cleanup improvements from Raphael.
2016-01-27 18:30:46 +01:00
Avi Kivity
873deb5808 Merge "move paging_state to use idl" from Gleb 2016-01-27 19:06:04 +02:00
Takuya ASADA
03caacaad0 dist: enable collectd client by default
Fixes #838

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-2-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:45 +02:00
Takuya ASADA
f33656ef03 dist: eliminate startup script
Fixes #373

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453910311-24928-1-git-send-email-syuu@scylladb.com>
2016-01-27 18:45:35 +02:00
Gleb Natapov
b065e2003f Move paging_state to use idl 2016-01-27 18:39:43 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
non-overlapping and do not wrap around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Avi Kivity
c75d1c4eeb Merge "Ubuntu 'expect stop' related fixes" from Takuya 2016-01-27 17:00:23 +02:00
Gleb Natapov
65bd429a0b Add serialization helper to use outside of rpc. 2016-01-27 16:43:06 +02:00
Takuya ASADA
4162fb158c main: raise SIGSTOP only when scylla becomes ready
supervisor_notify() is called periodically to log a message to systemd,
so raise(SIGSTOP) would be called multiple times; upstart does not expect that.
We need to call it just one time.
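
A minimal sketch of the raise-once guard (function name illustrative, not the
actual code):

    #include <csignal>

    // Raise SIGSTOP only on the first readiness notification; upstart's
    // "expect stop" protocol expects exactly one SIGSTOP from the daemon.
    void notify_ready_upstart() {
        static bool raised = false;
        if (!raised) {
            raised = true;
            raise(SIGSTOP);
        }
    }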

Fixes #846

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:26 +09:00
Takuya ASADA
851951d32d dist: run upstart job as 'scylla' user
Don't use sudo when launching scylla; run it directly from upstart.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:30:02 +09:00
Takuya ASADA
89f0fc89b4 dist: set ulimit in upstart job
An upstart job is able to specify ulimits like systemd, so drop ubuntu's scylla_run and merge it with the redhat one.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:52 +09:00
Takuya ASADA
b4accd8904 main: autodetect systemd/upstart
We can autodetect systemd/upstart from environment variables; a program argument is not needed.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-27 23:29:32 +09:00
Takuya ASADA
559f913494 dist: use nightly for prebuilt 3rdparty packages (CentOS)
Developers probably want to use the latest dependency packages, so switch to nightly.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453904521-2716-1-git-send-email-syuu@scylladb.com>
2016-01-27 16:24:49 +02:00
Gleb Natapov
19c55693fd idl: add missing header to serializer.hh 2016-01-27 15:49:29 +02:00
Amnon Heiman
7b53b99968 idl-compiler: split the idl list
Not all the idls are used by the messaging service. This patch removes
the auto-generated single include file that holds all the files and
replaces it with individual includes of the generated files.
The patch does the following:
* It removes the auto-generated inc file and cleans it out of
  configure.py.
* It places an explicit include for each generated file in
  messaging_service.
* It adds a dependency of the generated code on the idl-compiler, so a
change in the compiler will trigger recreation of the generated files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453900241-13053-1-git-send-email-amnon@scylladb.com>
2016-01-27 15:23:00 +02:00
Pekka Enberg
86173fb8cc db/commitlog: Fix debug log format string in commitlog_replayer::recover()
I saw the following Boost format string related warning during commitlog
replay:

  INFO  [shard 0] commitlog_replayer - Replaying node3/commitlog/CommitLog-1-72057594289748293.log, node3/commitlog/CommitLog-1-90071992799230277.log, node3/commitlog/CommitLog-1-108086391308712261.log, node3/commitlog/CommitLog-1-251820357.log, node3/commitlog/CommitLog-1-54043195780266309.log, node3/commitlog/CommitLog-1-36028797270784325.log, node3/commitlog/CommitLog-1-126100789818194245.log, node3/commitlog/CommitLog-1-18014398761302341.log, node3/commitlog/CommitLog-1-126100789818194246.log, node3/commitlog/CommitLog-1-251820358.log, node3/commitlog/CommitLog-1-18014398761302342.log, node3/commitlog/CommitLog-1-36028797270784326.log, node3/commitlog/CommitLog-1-54043195780266310.log, node3/commitlog/CommitLog-1-72057594289748294.log, node3/commitlog/CommitLog-1-90071992799230278.log, node3/commitlog/CommitLog-1-108086391308712262.log
  WARN  [shard 0] commitlog_replayer - error replaying: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::io::too_many_args> > (boost::too_many_args: format-string referred to less arguments than were passed)

While inspecting the code, I noticed that one of the error loggers is
missing an argument. As I don't know how the original failure was triggered,
I wasn't able to verify that it was the only one, though.

Message-Id: <1453893301-23128-1-git-send-email-penberg@scylladb.com>
2016-01-27 13:40:19 +02:00
Pekka Enberg
c4dafe24f5 Update scylla-ami submodule
* dist/ami/files/scylla-ami 77cde04...e284bcd (2):
  > Run scylla.yaml construction only once
  > Revert "Run scylla.yaml construction only once"
2016-01-27 13:30:04 +02:00
Asias He
9fee1cc43a api: Use get_bytes_{received,sent} in stream_manager
The data in session_info is not correctly updated.

Tested while decommissioning a node:

$ curl -X GET  --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/incoming";echo

$ curl -X GET --silent --header "Accept: application/json"
"http://127.0.0.$i:10000/stream_manager/metrics/outgoing";echo
2016-01-27 18:17:36 +08:00
Asias He
03aced39c4 streaming: Account number of bytes sent and received per session
The API will consume it soon.
2016-01-27 18:16:58 +08:00
Asias He
36829c4c87 api: Fix stream_manager total_incoming/outgoing bytes
Any stream, whether initiated by us or by a peer node,
can send and receive data. We should account incoming/outgoing bytes in
all streams.
2016-01-27 18:15:09 +08:00
Asias He
08f703ddf6 streaming: Add get_all_streams in stream_manager
Get all streams, both those initiated by us and those initiated by a peer node.
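
A sketch of the shape of such an accessor (member names illustrative):

    // Collect every stream_result_future, whether we initiated the plan or a
    // peer did. _initiated_streams/_receiving_streams are illustrative names.
    std::vector<shared_ptr<stream_result_future>> get_all_streams() const {
        std::vector<shared_ptr<stream_result_future>> all;
        for (auto& x : _initiated_streams) {
            all.push_back(x.second);
        }
        for (auto& x : _receiving_streams) {
            all.push_back(x.second);
        }
        return all;
    }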
2016-01-27 18:15:09 +08:00
Tomasz Grabiec
c971544e83 bytes_ostream: Adapt to Output concept used in serializer.hh
Message-Id: <1453888242-2086-1-git-send-email-tgrabiec@scylladb.com>
2016-01-27 12:13:34 +02:00
Gleb Natapov
6a581bb8b6 messaging_service: replace rpc::type with boost::type
RPC moved to boost::type to make serializers less rpc centric. Scylla
should follow.

Message-Id: <20160126164450.GA11706@scylladb.com>
2016-01-27 11:57:45 +02:00
Gleb Natapov
6f6b231839 Make serializer use new simple stream location
Message-Id: <20160127093045.GG9236@scylladb.com>
2016-01-27 11:37:37 +02:00
Raphael S. Carvalho
d54c77d5d0 change abstract_replication_strategy::get_ranges to not return wrap-arounds
The main motivation behind this change is to make it easier for
consumers to work with the ranges returned by get_ranges(), e.g. to
binary search for the range in which a token is contained. In addition, a
wrap-around range introduces corner cases, so we should avoid it altogether.

Suppose that a node owns three tokens: -5, 6, 8

get_ranges() would return the following ranges:
(8, -5], (-5, 6], (6, 8]
get_ranges() will now return the following ranges:
(-inf, -5], (-5, 6], (6, 8], (8, +inf)
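
Conceptually, unwrapping replaces the single wrap-around range with its two
non-wrapping halves; a sketch (type and factory names illustrative):

    // (start, end] with start > end wraps around the ring; split it into
    // (start, +inf) and (-inf, end]. range_t and its factories are not the
    // actual Scylla interfaces.
    std::pair<range_t, range_t> unwrap(token start, token end) {
        return { range_t::open_ended_after(start),    // (start, +inf)
                 range_t::open_ended_before(end) };   // (-inf, end]
    }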

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <4bda1428d1ebbe7c8af25aa65119edc5b97bc2eb.1453827605.git.raphaelsc@scylladb.com>
2016-01-27 09:48:31 +01:00
Avi Kivity
b9ab28a0e6 Merge "storage_service: add drain on shutdown logic & fix" from Asias
"Fixes:
- storage_service::handle_state_removing() doesn't call drain() #825
https://github.com/scylladb/scylla/issues/825

- nodetool gossipinfo is out of sync #790
https://github.com/scylladb/scylla/issues/790"
2016-01-27 10:38:56 +02:00
Avi Kivity
1d7144ac14 Merge seastar upstream
* seastar bdb273a...ec468ba (1):
  > Move simple streams used for serialization into separate header
2016-01-27 10:38:09 +02:00
Amnon Heiman
fd94009d0e Fix API init process
The last patch of the API init process had a bug: the wrong init
function was called.

This solves the issue.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453879042-26926-1-git-send-email-amnon@scylladb.com>
2016-01-27 10:03:24 +02:00
Asias He
8b4275126d storage_service: Shutdown messaging_service in decommission
It is commented out.
2016-01-27 11:48:49 +08:00
Asias He
b2f2c1c28c storage_service: Add drain on shutdown logic
We register engine().at_exit() callbacks when we initialize the services. We
do not actually call the callbacks at the moment due to #293.

It is pretty hard to see the whole picture of the order in which the services
are shut down. Instead of having each service register its own at_exit()
callback, I propose a single at_exit() callback which does the
shutdown for all the services. In cassandra, the shutdown work is done
in the storage_service::drain_on_shutdown callback.

In this patch, drain_on_shutdown is executed during shutdown.

As a result, the proper gossip shutdown is executed, which fixes #790.
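
A sketch of the single callback (assuming the usual Seastar engine().at_exit()
registration; `ss` stands in for the sharded storage_service):

    // One place to express the shutdown ordering instead of scattered
    // per-service callbacks.
    engine().at_exit([&ss] {
        return ss.local().drain_on_shutdown();
    });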

With this patch, when Ctrl-C on a node, it looks like:

INFO  [shard 0] storage_service - Drain on shutdown: starts
INFO  [shard 0] gossip - Announcing shutdown
INFO  [shard 0] storage_service - Node 127.0.0.1 state jump to normal
INFO  [shard 0] storage_service - Drain on shutdown: stop_gossiping done
INFO  [shard 0] storage_service - CQL server stopped
INFO  [shard 0] storage_service - Drain on shutdown: shutdown rpc and cql server done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown messaging_service done
INFO  [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO  [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO  [shard 0] storage_service - Drain on shutdown: done
2016-01-27 11:45:52 +08:00
Asias He
e733930dff storage_service: Call drain inside handle_state_removing
Now that drain is implemented, call it.

Fixes #825
2016-01-27 11:45:52 +08:00
Asias He
5003c6e78b config: Introduce shutdown_announce_in_ms option
Time, in milliseconds, that a node waits after sending the gossip shutdown message.

Reduces ./cql_query_test execution time

from
   real    2m24.272s
   user    0m8.339s
   sys     0m10.556s

to
   real    1m17.765s
   user    0m3.698s
   sys     0m11.578s
2016-01-27 11:19:38 +08:00
Paweł Dziepak
490201fd1c row_cache: protect against stale entries
row_cache::update() does not explicitly invalidate the entries it failed
to update in case of a failure. This could lead to inconsistency between
row cache and sstables.

In practice that's not a problem because before row_cache::update()
fails it will cause all entries in the cache to be invalidated during
memory reclaim, but it's better to be safe and explicitly remove entries
that should have been updated but could not be.
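
The defensive shape of the fix, as a sketch (helper names illustrative; the
real code is future-based rather than a plain try/catch):

    // If the update fails partway, explicitly invalidate the entries that
    // were not brought up to date instead of relying on memory reclaim.
    try {
        apply_update(m);          // illustrative update step
    } catch (...) {
        invalidate_remaining(m);  // illustrative cleanup helper
        throw;
    }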

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453829681-29239-1-git-send-email-pdziepak@scylladb.com>
2016-01-26 20:34:41 +01:00
Takuya ASADA
9b66d00115 dist: fix scylla_bootparam_setup for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453836012-6436-1-git-send-email-syuu@scylladb.com>
2016-01-26 21:24:20 +02:00
Erich Keane
c836c88850 Replace deprecated BOOST_MESSAGE with BOOST_TEST_MESSAGE
Boost Unit Test deprecated BOOST_MESSAGE as early as 1.34 and has since
permanently removed it.  This patch replaces all uses of BOOST_MESSAGE
with BOOST_TEST_MESSAGE.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1453783854-4274-1-git-send-email-erich.keane@verizon.net>
2016-01-26 19:01:40 +02:00
Amnon Heiman
b1845cddec Breaking the API initialization into stages
The API needs to be available at an early stage of the initialization;
on the other hand, not all the specific APIs are available at that time.

This patch breaks the API initialization into stages; in each stage
additional commands become available.

While doing that, the api header file was broken into api_init.hh, which
is relevant to main, and api.hh, which holds the different
api helper functions.

Fixes #754

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453822331-16729-2-git-send-email-amnon@scylladb.com>
2016-01-26 17:41:31 +02:00
Calle Wilund
e6b792b2ff commitlog bugfix: Fix batch mode
The last series accidentally broke batch mode.
With the new, fancy, potentially blocking code paths, we need to treat
batch mode differently, since in this case sync should always
come _after_ alloc-write.
The previous patch caused an infinite loop and broke jenkins.

Message-Id: <1453821077-2385-1-git-send-email-calle@scylladb.com>
2016-01-26 17:13:14 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
At the request of Avi, who reminds us that auto& is better suited for
situations in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Avi Kivity
71eb79aedd main: exit with code 0 on shutdown
To avoid confusing systemd.

Fixes #823.

Message-Id: <1453220473-28712-1-git-send-email-avi@scylladb.com>
2016-01-26 16:26:53 +02:00
Calle Wilund
89dc0f7be3 commitlog: wait for writes (if needed) on new segment as well
Also check the closed status in allocate, since waiting in the alloc queue
could lead to us re-allocating in a segment that gets closed between
entering the queue and running the continuation.

Message-Id: <1453811471-1858-1-git-send-email-calle@scylladb.com>
2016-01-26 15:05:12 +02:00
Shlomi Livne
0a553dae1f Fix test.py invocation of sstable_test
Invoking sstable_test as "./test.py  --name sstable_test --mode
release --jenkins a"
ran ... --log_sink=a.release.sstable_test -c1.boost.xml", which caused
the test to fail "with error code -11"; fix that.

In addition, the boost test printout was bad; fix that as well.

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <3af8c4b55beae673270f5302822d7b9dbba18c0f.1453809032.git.shlomi@scylladb.com>
2016-01-26 12:56:26 +01:00
Avi Kivity
fbf56b3d98 Merge "Commit log threshold / back pressure" from Calle
"Adds flush + write thresholds/limits that, when reached, causes
operations to wait before being issued.
Write ops waiting also causes further allocations to queue up,
i.e. limiting throughput.

Adds getters for some useful "backlog" measurements:

* Pending (ongoing) writes/flush
* Pending (queued, wating) allocations
* Num times write/flush threshold has been exceeded (i.e. waits occured)
* Finished, dirty segments
* Unused (preallocated) segments"
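
A sketch of how such a threshold can translate into waiting, using a
seastar::semaphore for illustration (the actual commitlog keeps its own
counters; names are illustrative):

    // Writers queue on the semaphore once max_pending_writes are in flight,
    // which in turn back-pressures new allocations.
    semaphore _write_limit{max_pending_writes};

    future<> write_segment(temporary_buffer<char> buf) {
        return with_semaphore(_write_limit, 1, [this, buf = std::move(buf)] () mutable {
            return do_write(std::move(buf)); // illustrative write call
        });
    }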
2016-01-26 13:19:58 +02:00
Avi Kivity
a53788d61d Merge "More streaming cleanup and fix" from Asias
"- Drop compression_info/stream_message
- Cleanup outgoing_file_message/prepare_message
- Fix stream manager API (more to come)"
2016-01-26 13:17:58 +02:00
Avi Kivity
486d937111 Merge seastar upstream
* seastar 97f418a...bdb273a (6):
  > rpc: alias rpc::type to boost::type
  > Fix warning_supported to properly work with Clang
  > rpc: change 'overflow' to 'underflow' in input stream processing
  > rpc: log an error that caused connection to be closed.
  > rpc: clarify deserialization error message
  > rpc: do not append new line in a logger
2016-01-26 12:58:32 +02:00
Avi Kivity
5ad4b59f99 Update scylla-ami submodule
* dist/ami/files/scylla-ami 188781c...77cde04 (1):
  > Run scylla.yaml construction only once
2016-01-26 12:58:11 +02:00
Takuya ASADA
12748cf1b9 dist: support CentOS AMI on scylla_ntp_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453801319-26072-1-git-send-email-syuu@scylladb.com>
2016-01-26 12:55:08 +02:00
Calle Wilund
f2c5315d33 commitlog: Add write/flush limits
Configured on start (for now - and with dummy values at that).
When the shard write/flush count reaches the limit, incoming ops will queue
until previous ones finish.

Consequently, if an allocation op forces a write, which blocks, any
other incoming allocations will also queue up to provide back pressure.
2016-01-26 10:19:24 +00:00
Calle Wilund
7628a4dfe0 commitlog: Add some feedback/measurement methods
Suitable to derive "back pressure" from.
2016-01-26 09:47:14 +00:00
Calle Wilund
4f5bd4b64b commitlog: split write/flush counters 2016-01-26 09:47:14 +00:00
Calle Wilund
215c8b60bf commitlog: minor cleanup - remove red squiggles in eclipse 2016-01-26 09:42:26 +00:00
Calle Wilund
61c7235c11 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-26 09:42:08 +00:00
Avi Kivity
0de7d1fc1b Merge "Add priority classes to our I/O path" from Glauber
"After the patch, all of our relevant I/O is placed on a specific priority class.
The ones which are not are left in Seastar's default priority, which will
effectively work as an idle class.

Examples of such I/O are commitlog replay and initial SSTable loading. Since they
will happen during initialization, they will run uncontended, and do not justify
having a class on their own."
2016-01-26 10:46:13 +02:00
Asias He
750573ca0c configure: Fix idl indentation 2016-01-26 15:04:45 +08:00
Asias He
cc6d928193 api: Fix peer -> streaming_plan id in stream_manager
It is wrong to get a stream plan id like below:

   utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);

We should look at all stream_sessions with the peer in question.
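
A sketch of the corrected lookup (member names and the predicate are
illustrative):

    // Collect the plan ids of every stream session involving the given peer,
    // instead of misusing the gossiper's host id as a plan id.
    std::vector<utils::UUID> get_plan_ids_for_peer(gms::inet_address peer) {
        std::vector<utils::UUID> ids;
        for (auto& x : _streams) {          // plan_id -> stream_result_future
            if (x.second->involves(peer)) { // illustrative predicate
                ids.push_back(x.first);
            }
        }
        return ids;
    }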
2016-01-26 15:00:44 +08:00
Asias He
384e81b48a streaming: Add get_peer_session_info
Like get_all_session_info, but only gets the session_info for a specific
peer.
2016-01-26 14:52:40 +08:00
Asias He
c7b156ed65 api: Fix get_{all}total_outgoing_byte in stream_manager
We should call get_total_size_sent instead of get_total_size_received
for outgoing bytes.
2016-01-26 14:22:43 +08:00
Asias He
2e69d50c0c streaming: Cleanup prepare_message
- Drop empty prepare_message.cc
- Drop #if 0'ed code
2016-01-26 13:14:04 +08:00
Asias He
bbf025968b streaming: Cleanup outgoing_file_message
- Drop the unused headers
- Drop the outgoing_file_message.cc file which is empty
2016-01-26 13:12:01 +08:00
Asias He
e8b8b454df streaming: Flatten streaming messages class namespace
There are only two messages: prepare_message and outgoing_file_message.
Actually, only prepare_message is sent on the wire.
Flatten the namespace.
2016-01-26 13:04:29 +08:00
Asias He
cab36a450b streaming: Remove stream_message
It is not useful to keep stream_message as the base class for stream
messages. Scylla uses RPC verbs to distinguish different message types.
2016-01-26 12:32:17 +08:00
Asias He
6a067bcc23 streaming: Drop unused compression_info 2016-01-26 11:55:36 +08:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

There are five available classes, defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
261c272178 introduce a priority manager
After the introduction of the Fair I/O Queueing mechanism in Seastar,
it is possible to add requests to a specific priority class, that will
end up being serviced fairly.

This patch introduces a Priority Manager service, that manages the priority
each class of request will get. At this moment, having a class for that may
sound like an overkill. However, the most interesting feature of the Fair I/O
queue comes from being able to adjust the priorities dynamically as workloads
changes: so we will benefit from having them all in the same place.

This is designed to behave like one of our services, with the exception that
it won't use the distributed interface. This is mainly because there is no
reason to introduce that complexity at this point - since we can do thread local
registration as we have been doing in Seastar, and because that would require us
to change most of our tests to start a new service.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
The whole SSTable read path can now take an io_priority. The public functions
take a default parameter, which is Seastar's default priority.
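
An illustrative shape of such a default-priority signature (not the actual
function):

    // Callers that don't care run at Seastar's default priority; internal
    // callers thread an explicit io_priority_class through.
    future<index_list> read_indexes(uint64_t summary_idx,
            const io_priority_class& pc = default_priority_class());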

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
15336e7eb7 key_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
58fdae33bd mutation_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.
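
A sketch of the difference (types illustrative):

    // A lambda stored in a bare std::function cannot declare default
    // arguments for its callers; a class wrapping the function can.
    class mutation_source {
        using fn_type = std::function<mutation_reader(schema_ptr, const query::partition_range&)>;
        fn_type _fn;
    public:
        explicit mutation_source(fn_type fn) : _fn(std::move(fn)) {}
        mutation_reader operator()(schema_ptr s,
                const query::partition_range& range = query::full_partition_range) const {
            return _fn(std::move(s), range);
        }
    };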

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Gleb Natapov
c9bd069815 messaging_service: log rpc errors
Message-Id: <20160125155005.GC23862@scylladb.com>
2016-01-25 17:59:26 +02:00
Avi Kivity
91b57c7e20 Merge "Move streaming to use IDL" from Asias 2016-01-25 17:10:22 +02:00
Asias He
f027a9babe streaming: Drop unused serialization code 2016-01-25 22:39:13 +08:00
Asias He
ad80916905 messaging_service: Add streaming implementation for idl
- stream_request
- stream_summary
- prepare_message

2016-01-25 22:36:58 +08:00
Asias He
b299cc3bee idl: Add streaming.idl.hh
- stream_request
- stream_summary
- prepare_message
2016-01-25 22:29:25 +08:00
Asias He
5e100b3426 streaming: Drop unused repaired_at in stream_request 2016-01-25 22:28:48 +08:00
Avi Kivity
6fade0501b Update test/message.cc for MESSAGE verb rename 2016-01-25 14:47:55 +02:00
Nadav Har'El
db19a43d98 repair: try harder to repair, even when some nodes are unreachable
In the existing code, when we fail to reach one of the replicas of some
range being repaired, we would give up, and not continue to repair the
living replicas of this range. The thinking behind this was that since the
repair should be considered failed anyway, there's no point in trying
to do half a job better.

However, in a discussion I had with Shlomi, he raised the following
alternative thinking, which convinced me: in a large cluster, having
one node or another temporarily dead has a high probability. In that
case, even if the repair is doomed to be considered "failed",
we want it at least to do as much as it possibly can to repair the
data on the living part of the cluster. This is what this patch does:
if we can only reach some of the replicas of a given range, the repair
will be considered failed (as before), but we will still repair the
reachable replicas of this range, if they have different checksums.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453724443-29320-1-git-send-email-nyh@scylladb.com>
2016-01-25 14:37:39 +02:00
Amnon Heiman
039e627b32 idl-compiler: Fix an issue with default values
This patch fixes an issue where a parameter with a version attribute had a
default value.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453723251-9797-1-git-send-email-amnon@scylladb.com>
2016-01-25 14:32:00 +02:00
Takuya ASADA
e9fdb426b6 dist: add pyparsing as a CentOS build time dependency
Port pyparsing from Fedora 23 and build it for python34, which is provided by epel.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720780-21105-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:26:58 +02:00
Takuya ASADA
b8b0ff0482 dist: add pyparsing as an Ubuntu build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453720081-15979-1-git-send-email-syuu@scylladb.com>
2016-01-25 13:08:48 +02:00
Avi Kivity
5c5207f122 Merge "Another round of streaming cleanup" from Asias
"- Merge stream_init_message and stream_parepare_message
- Drop  session_index / keep_ss_table_level / file_message_header"
2016-01-25 12:54:30 +02:00
Asias He
77684a5d4c messaging_service: Drop STREAM_INIT_MESSAGE
The verb is not used anymore.
Message-Id: <1453719054-29584-1-git-send-email-asias@scylladb.com>
2016-01-25 12:53:08 +02:00
Asias He
53c6cd7808 gossip: Rename echo verb to gossip_echo
It is used by gossip only. I really could not let this inconsistency
stand. Change it while we still can.
Message-Id: <1453719054-29584-2-git-send-email-asias@scylladb.com>
2016-01-25 12:53:07 +02:00
Takuya ASADA
67d2aa677e dist: add pyparsing as a Fedora build time dependency
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715594-32675-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:32 +02:00
Takuya ASADA
78d107ccaa dist: add missing dependencies for scylla-gdb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453715339-32296-1-git-send-email-syuu@scylladb.com>
2016-01-25 11:59:31 +02:00
Asias He
51fa717b8e streaming: Get rid of file_message_header
Again, we do not send sstable files, and thus no header info for
sstable files either.

TODO: Estimate mutation size we sent.
2016-01-25 17:56:43 +08:00
Asias He
eba9820b22 streaming: Remove stream_session::file_sent
It is the callback after sending file_message_header. In scylla, we do
not send the file_message_header. Drop it.
2016-01-25 17:25:34 +08:00
Asias He
592683650a streaming: Remove unused serialization code for file_message_header 2016-01-25 17:16:57 +08:00
Calle Wilund
2d1e332fba Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-25 09:11:12 +00:00
Asias He
fa4e94aa27 streaming: Get rid of keep_ss_table_level
We stream mutation instead of files, so keep_ss_table_level is not
relevant for us.
2016-01-25 16:58:57 +08:00
Asias He
2cc31ac977 streaming: Get rid of the stream_index
It is always zero.
2016-01-25 16:58:57 +08:00
Asias He
6b30f08a38 streaming: Always return zero for session_index in api/stream_manager
We will remove session_index soon. It will always be zero. Do not drop
it in the api so that the api stays compatible with c*.
2016-01-25 16:58:51 +08:00
Asias He
ad4a096b80 streaming: Get rid of stream_init_message
Unlike streaming in c*, scylla does not need to open tcp connections in
the streaming service for both incoming and outgoing messages;
seastar::rpc does the work. There is no need for a standalone
stream_init_message in the streaming negotiation stage; we can merge the
stream_init_message into stream_prepare_message.
2016-01-25 16:24:16 +08:00
Asias He
048965ea02 streaming: Do not print session_index in handle_session_prepared
session_index is always 0. It will be removed soon.
2016-01-25 16:24:16 +08:00
Avi Kivity
449b81f5d3 Merge "streaming cleanup" from Asias
"No mercy to the unused parameters and messages.
This will help the upcoming IDL serialize/deserialize work."
2016-01-25 10:21:16 +02:00
Avi Kivity
9ebd3f8098 Merge "Move gossip to use IDL" from Asias
"This changes gossip to use IDL based serialization code."
2016-01-25 10:18:34 +02:00
Asias He
20496ed9a8 tests: Stop gossip during shutdown in cql_test_env
Fixes the heap-use-after-free error in build/debug/tests/auth_test

==1415==ERROR: AddressSanitizer: heap-use-after-free on address
0x62200032cfa8 at pc 0x00000350701d bp 0x7fec96df8d40 sp
0x7fec96df8d30
READ of size 8 at 0x62200032cfa8 thread T1
    #0 0x350701c in
_ZZN3gms8gossiper3runEvENKUlOT_E0_clI6futureIJEEEEDaS2_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x350701c)
    #1 0x35795b1 in apply<gms::gossiper::run()::<lambda(auto:40&&)>,
future<> > /home/penberg/scylla/seastar/core/future.hh:1203
    #2 0x369103d in
_ZZN6futureIJEE12then_wrappedIZN3gms8gossiper3runEvEUlOT_E0_S0_EET0_S5_ENUlS5_E_clI12future_stateIJEEEEDaS5_
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x369103d)
    #3 0x369182a in run /home/penberg/scylla/seastar/core/future.hh:399
    #4 0x435f24 in
reactor::run_tasks(circular_buffer<std::unique_ptr<task,
std::default_delete<task> >, std::allocator<std::unique_ptr<task,
std::default_delete<task> > > >&) core/reactor.cc:1368
    #5 0x43a44f in reactor::run() core/reactor.cc:1672
    #6 0x952e4b in app_template::run_deprecated(int, char**,
std::function<void ()>&&) core/app-template.cc:123
    #7 0x58dc79d in test_runner::start(int,
char**)::{lambda()#1}::operator()()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x58dc79d)
    #8 0x58e6cd6 in _M_invoke /usr/include/c++/5.3.1/functional:1871
    #9 0x688639 in std::function<void ()>::operator()() const
/usr/include/c++/5.3.1/functional:2271
    #10 0x8d939c in posix_thread::start_routine(void*) core/posix.cc:51
    #11 0x7feca02a4609 in start_thread (/lib64/libpthread.so.0+0x7609)
    #12 0x7fec9ffdea4c in clone (/lib64/libc.so.6+0x102a4c)

0x62200032cfa8 is located 5800 bytes inside of 5808-byte region
[0x62200032b900,0x62200032cfb0)
freed by thread T1 here:
    #0 0x7feca4f76472 in operator delete(void*, unsigned long)
(/lib64/libasan.so.2+0x9a472)
    #1 0x3740772 in gms::gossiper::~gossiper()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x3740772)
    #2 0x2588ba1 in shared_ptr<gms::gossiper>::~shared_ptr()
seastar/core/shared_ptr.hh:389
    #3 0x4fc908c in
seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}::~stop()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4fc908c)
    #4 0x4ff722a in future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}::~then()
(/home/penberg/scylla/build/debug/tests/auth_test_g+0x4ff722a)
    #5 0x509a28c in continuation<future<>
future<>::then<seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}, future<>
>(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda(seastar::sharded<gms::gossiper>::stop()::{lambda(unsigned
int)#1}::operator()(unsigned
int)::{lambda()#1}::operator()()::{lambda()#1})#1}>::~continuation()
seastar/core/future.hh:395
    #6 0x509a40d in continuation<future<>
Message-Id: <f8f1c92c1eb88687ab0534f5e7874d53050a5b93.1453446350.git.asias@scylladb.com>
2016-01-25 08:19:18 +02:00
Asias He
bc4ac5004e streaming: Kill stream_result_future::create_and_register
The helper is used only once, in init_sending_side; in
init_receiving_side we do not use create_and_register to create the
stream_result_future. Kill the trivial helper to make the code more
consistent.

In addition, rename the variables "future" and "f" to sr (streaming_result).
2016-01-25 11:38:13 +08:00
Asias He
face74a8f2 streaming: Rename stream_result_future::init to ::init_sending_side
So we have:

- init_sending_side
  called when the node initiates a stream_session

- init_receiving_side
  called when the node is a receiver of a stream_session initiated by a peer
2016-01-25 11:38:13 +08:00
Asias He
dc94c5e42e streaming: Rename get_or_create_next_session to get_or_create_session
There is only one session for each peer in stream_coordinator.
2016-01-25 11:38:13 +08:00
Asias He
e46d4166f2 streaming: Refactor host_streaming_data
In scylla, in each stream_coordinator, there will be only one
stream_session for each remote peer. Drop the code supporting multiple
stream_sessions in host_streaming_data.

We now have

   shared_ptr<stream_session> _stream_session

instead of

   std::map<int, shared_ptr<stream_session>> _stream_sessions
2016-01-25 11:38:13 +08:00
Asias He
8a4b563729 streaming: Drop the get_or_create_session_by_id interface
The session index will always be 0 in stream_coordinator. Drop the api for it.
2016-01-25 11:38:13 +08:00
Asias He
9a346d56b9 streaming: Drop unnecessary parameters in stream_init_message
- from
  We can get it from the rpc::client_info

- session_index
  There will always be one session in stream_coordinator::host_streaming_data with a peer.

- is_for_outgoing
  In cassandra, it initiates two tcp connections, one for incoming stream and one for outgoing stream.
  logger.debug("[Stream #{}] Sending stream init for incoming stream", session.planId());
  logger.debug("[Stream #{}] Sending stream init for outgoing stream", session.planId());
  In scylla, it only initiates one "connection" for sending; the peer initiates another "connection" for receiving.
  So is_for_outgoing would always be true in scylla, and we can drop it.

- keep_ss_table_level
  In scylla, again, we stream mutations instead of sstable files. It is
  not relevant to us.
2016-01-25 11:38:13 +08:00
Asias He
1bc5cd1b22 streaming: Drop streaming/messages/session_failed_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
2a04e8d70e streaming: Drop streaming/messages/incoming_file_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
26ba21949e streaming: Drop streaming/messages/retry_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
4b4363b62d streaming: Drop streaming/messages/received_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
b3e00472ed streaming: Drop streaming/streaming.cc
It was used in the early stage of development to make sure things compiled.
2016-01-25 11:38:13 +08:00
Asias He
5a0bf10a0b streaming: Drop streaming/messages/complete_message
It is not used.
2016-01-25 11:38:13 +08:00
Asias He
bdd6a69af7 streaming: Drop unused parameters
- int connections_per_host

Scylla does not create connections per stream_session; instead it uses
rpc, so connections_per_host is not relevant to scylla.

- bool keep_ss_table_level
- int repaired_at

Scylla does not stream sstable files. They are not relevant to scylla.
2016-01-25 11:38:13 +08:00
Asias He
7b633ad127 gossip: Drop unused serialization code
- heart_beat_state
2016-01-25 11:28:29 +08:00
Asias He
4ce08ff251 messaging_service: Add heart_beat_state implementation 2016-01-25 11:28:29 +08:00
Asias He
d7c7994f37 gossip: Drop unused serialization code
- versioned_value
2016-01-25 11:28:29 +08:00
Asias He
8098ba10b7 gossip: Drop unused serialization code
- endpoint_state
2016-01-25 11:28:29 +08:00
Asias He
ecca969adf messaging_service: Add gossip::endpoint_state implementation 2016-01-25 11:28:29 +08:00
Asias He
2a0b6589dd messaging_service: Add versioned_value implementation 2016-01-25 11:28:29 +08:00
Asias He
6660658742 gossip: Drop unused serialization code
- gossip_digest_serialization_helper
- gossip_digest
2016-01-25 11:28:29 +08:00
Asias He
15f2b353b9 messaging_service: Add gossip_digest implementation 2016-01-25 11:28:29 +08:00
Asias He
736d21a912 gossip: Drop unused serialization code
- gossip_digest_syn
- gossip_digest_ack
- gossip_digest_ack2
2016-01-25 11:28:29 +08:00
Asias He
d81fc12af3 messaging_service: Add gossip_digest_ack2 implementation 2016-01-25 11:28:29 +08:00
Asias He
e67cecaee1 messaging_service: Add gossip_digest_syn implementation 2016-01-25 11:28:29 +08:00
Asias He
d94b7e49d2 idl: Add gossip_digest_syn
Added get_partioner and get_cluster_id
2016-01-25 11:28:28 +08:00
Asias He
60f5891c3f idl: Add gossip_digest_ack2 2016-01-25 11:28:26 +08:00
Takuya ASADA
0f0d1c7aed dist: don't depend on libvirtd, since we are not using it
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453470970-31036-1-git-send-email-syuu@scylladb.com>
2016-01-24 17:15:13 +02:00
Avi Kivity
65a140481c Merge " streaming COMPLETE_MESSAGE failure and message retry logic fix" from Asias
"This series:

- Add more debug info to stream session
- Fail session if we fail to send COMPLETE_MESSAGE
- Handle message retry logic for verbs used by streaming

See commit log for details."
2016-01-24 16:41:06 +02:00
Avi Kivity
6135e0ae78 Merge "Move read/write mutation path to use IDL" from Gleb 2016-01-24 13:35:04 +02:00
Avi Kivity
b415f87324 Merge "Serializer Deserializer code generation" from Amnon
"The series do the following:
It adds the code generation
Perform the needed changes in the current classes so each would have getter for
each of its serializable value and a constructor from the serialized values.
It adds a schema definition that cover gossip_diget_ack
It changes the messaging_service to use the generated code.

An overall explanation of the solution with a description of the schema IDL can
be found on the wiki page:

https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation
"
2016-01-24 12:56:42 +02:00
Gleb Natapov
b9b6f703c3 Remove old serializer for frozen_mutation and reconcilable_result 2016-01-24 12:45:41 +02:00
Gleb Natapov
067bdb23cd Move reconcilable_result and frozen_mutation to idl 2016-01-24 12:45:41 +02:00
Gleb Natapov
18dff5ebc8 Move smart pointer serialization helpers to .cc file.
They are not used outside of the .cc file, so should not be in the
header.
2016-01-24 12:45:41 +02:00
Gleb Natapov
93da9b2725 Remove redundant vector serialization code.
IDL serializer has the code to serialize vectors, so use it instead.
2016-01-24 12:45:41 +02:00
Gleb Natapov
ab6703f9bc Remove old query::result serializer 2016-01-24 12:45:41 +02:00
Gleb Natapov
afc407c6e5 Move query::result to use idl. 2016-01-24 12:45:41 +02:00
Gleb Natapov
be4e68adbf Add bytes_ostream serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4ae906b204 Add serializer overload for query::partition_range.
From now on query::partition_range will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
2d1b2765e6 Add serializer overload for query::read_command.
From now on query::read_command will use generated code.
2016-01-24 12:45:41 +02:00
Gleb Natapov
49ce2b83df Add ring_position constructor needed by serializer. 2016-01-24 12:45:41 +02:00
Gleb Natapov
6cc5b15a9c Fix read_command constructor to not copy parameters. 2016-01-24 12:45:41 +02:00
Gleb Natapov
4384c7fe85 un-nest range::bound class.
Serializer does not support nested classes yet, so move bound outside.
2016-01-24 12:45:41 +02:00
Gleb Natapov
7357b1ddfe Move specific_ranges to .hh and un-nest it.
The serializer requires the class to be defined, so it has to be in a .hh
file. It also does not support nested types yet, so move it outside of the
containing class.
2016-01-24 12:45:41 +02:00
Gleb Natapov
9ae7dc70da Prepare partition_slice to be used by serializer.
Add missing _specific_ranges getter and setter.
2016-01-24 12:45:41 +02:00
Gleb Natapov
48ab0bd613 Make constructor from bytes for partition_key and clustering_key_prefix public
Make constructor from bytes public since serializer will use it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
8deb5e424c Add idl files for more types.
Add idl for uuid/range/read_command/token/ring_position/clustering_key_prefix/partition_key.
2016-01-24 12:45:41 +02:00
Gleb Natapov
11299aa3db Add serializers for more basic types.
We will need them in following patches.
2016-01-24 12:45:41 +02:00
Gleb Natapov
a643f3d61f Reorder bool and uint8_t serializers
The bool serializer uses the uint8_t one, so it should be defined after it.
2016-01-24 12:45:41 +02:00
Gleb Natapov
cba31eb4f8 cleanup gossip_digest.idl
Remove uuid class, nonexistent application states and add ';'.
2016-01-24 12:45:37 +02:00
Avi Kivity
a3efecb8fe Merge seastar upstream
* seastar 5c2660b...97f418a (4):
  > io_queues: register individual classes with collectd
  > reactor: destroy the I/O queues explicitly
  > rpc_impl: add pragma once
  > rpc: add skip to simple_input_stream
2016-01-24 12:35:24 +02:00
Takuya ASADA
b5029dae7e dist: remove abrt from AMI, since it's not able to work with Scylla
The new CentOS Base Image contains abrt by default, so remove it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453416479-28553-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:31:50 +02:00
Amnon Heiman
0006f236a6 Add an IDL definition file
This adds the IDL definition file.
It is also covered in the wiki page:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
f266c2ed42 README.md: Add dependency for pyparsing python3
For python3, pyparsing needs to be installed explicitly. This adds the
installation of python3-pyparsing to the required dependencies in the
README.md.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
577ce0d231 Adding a specific template specialization in messaging_service to use
the serializer

This patch adds a specific template specialization so that the rpc will
use the serializer and deserializer that are auto-generated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:21 +02:00
Amnon Heiman
b625363072 Adding the serializer declaration and implementation files.
This patch adds the serializer and serializer_impl files. They hold
the functions that are not auto-generated: primitives and templates (map
and vector). It also holds the include of the auto-generated code.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
451cf2692c configure.py Add serializer code generation from schema
This patch adds rules and the idl schema to configure, which will call
the code generation to create the serialization and deserialization
functions.

There is also a rule to create the header file that includes the
auto-generated header files.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:20 +02:00
Amnon Heiman
0715dcd6ba A schema definition for gossip_digest_ack
This is a definition example for gossip_digest_ack with all its
subclasses.

It can be used by the code generator to create the serializer and
deserializer functions.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:29:14 +02:00
Amnon Heiman
d27734b9be Add a constructor to inet_address from uint32_t
inet_address uses uint32_t to store the ip address, but its constructor
takes int32_t.
So this patch adds a uint32_t constructor.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
8a4d211a99 Changes versioned_value to make it serializable
This patch contains two changes: it makes the constructor with parameters
public, and it removes the dependency on messaging_service.hh from the
header file by moving some of the code to the .cc file.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
ddc3fe1328 endpoint_state adds a constructor for all serialized parameters
An external deserialize function needs a constructor with all the
serialized parameters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:13:01 +02:00
Amnon Heiman
4a34ed82a3 Add code generation for serializer and deserializer
The code generation takes a schema file and creates two files from it,
one with a dist.hh extension containing the forward declarations and a second
with dist.impl.hh containing the actual implementation.

Because the rpc uses templating for the input and output streams, the
generated functions are templates.

For each class, struct or enum, two functions are created:
serialize - gets the output buffer as a template parameter and
serializes the object to it. There must be a public way to get to each of
the parameters in the class (either a getter, or the parameter should be
public).

deserialize - gets an input buffer and returns the deserialized
object (and, by reference, the number of chars it read).
To create the returned object, the class must have a public constructor
taking all of its parameters.
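
An illustrative shape of the generated pair of functions for some type T,
following the description above:

    // Generated per class/struct/enum; Output and Input are the stream
    // types the rpc layer instantiates them with.
    template <typename Output>
    void serialize(Output& out, const T& v);

    template <typename Input>
    T deserialize(Input& in, size_t& consumed); // consumed: chars read, by reference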

The solution description can be found here:
https://github.com/scylladb/scylla/wiki/Serializer-Deserializer-Code-generation

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-01-24 12:12:51 +02:00
Takuya ASADA
aef1e67a9b dist: remove mdadm,xfsprogs from dependencies, install it when constructing RAID by scylla_raid_setup
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-2-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Takuya ASADA
b92a075a34 main: support supervisor_notify() on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453422886-26297-1-git-send-email-syuu@scylladb.com>
2016-01-24 12:10:41 +02:00
Asias He
7ac3e835a6 messaging_service: Fix send_message_timeout_and_retry
When a verb times out and we resend the message, the peer could receive
the message more than once. This would confuse the receiver. Currently, only
the streaming code uses the retry logic.

- In case of rpc:timeout_error:

Instead of using a relatively short timeout and resending a few
times, we make the timeout big enough and let tcp do the resending.
Thus we avoid resending the message more than once, and of course the
receiver will not receive the message more than once.

- In case of rpc::closed_error:

There are two cases:
1) Failing to establish a connection.

For instance, the peer is down. It is safe to resend since we know for
sure the receiver hasn't received the message yet.

2) The connection is established.

We cannot figure out whether the remote peer has received the message
already upon receiving the rpc::closed_error exception.

Currently, we still sleep & resend the message, so the receiver
might receive the message more than once. We do not have a better choice
in this case, if we want the resend to recover from a sending error due to
a temporary network issue, since failing the whole stream_session due to
failing to send a single message is not wise.

NOTE: If the duplicated message is received after the stream_session is done,
it will be ignored since it cannot find the stream_manager anymore.
For a message like STREAM_MUTATION, it is ok to receive it twice (we apply
the mutation twice).

TODO: For other messages which use the retry logic, we need
to make sure it is ok to receive them more than once.
2016-01-22 08:20:48 +08:00
Asias He
864c7f636c streaming: Fail the session if fails to send COMPLETE_MESSAGE
We retry sending COMPLETE_MESSAGE; if it fails even with the
retry, something must be wrong. Abort the stream_session in this
case.
2016-01-22 07:44:21 +08:00
Asias He
9be671e7f5 streaming: Simplify send_complete_message
The send-once logic was open-coded. Move it into
send_complete_message() so we can simplify the caller.
2016-01-22 07:43:39 +08:00
Asias He
88e99e89d6 streaming: Add more debug info
- Add debug for the peer address info
- Add debug in stream_transfer_task and stream_receive_task
- Add debug when cancelling the keep_alive timer
- Add debug for has_active_sessions in stream_result_future::maybe_complete
2016-01-22 07:43:16 +08:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created in /utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds a support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follow for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Takuya ASADA
fae47ee4a8 dist: fetch CentOS dependencies from koji, update them to latest version
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453378765-19596-1-git-send-email-syuu@scylladb.com>
2016-01-21 15:04:45 +02:00
Asias He
755d792c78 gossip: Wait for gossip timer callback to finish in do_stop_gossiping
Also, do not rearm the timer if we have stopped gossiping.

Message-Id: <73765857b554d9914e87b24d287ff35ab0af6fce.1453378191.git.asias@scylladb.com>
2016-01-21 14:15:57 +02:00
Vlad Zolotarov
e3d7db5e57 ec2_snitch: complete the EC2Snitch -> Ec2Snitch renaming
The rename started in 72b27a91fe
was not complete. This patch fixes the places that were missed
in the above patch.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-3-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:30 +02:00
Vlad Zolotarov
9951edde1a locator::ec2_multi_region_snitch: add a get_name() implementation
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1453375025-7512-2-git-send-email-vladz@cloudius-systems.com>
2016-01-21 13:35:29 +02:00
Avi Kivity
43c81db74e Update ami submodule
* dist/ami/files/scylla-ami eb1fdd4...188781c (1):
  > Switch SimpleSnitch to Ec2Snitch
2016-01-21 13:13:23 +02:00
Vlad Zolotarov
de3bb01582 config: allow enabling the incremental backup via .yaml
Enable the incremental_backups/--incremental-backups option.
When enabled there will be a hard link created in the
<column family directory>/backup directory for every flushed
sstable.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:24 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.

This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When the 'backup' parameter is TRUE, create backup hard
links for newly written sstables in the <sstable dir>/backups/
subdirectory.
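
A sketch of the backup step (helper shape illustrative; seastar's link_file()
does the hard linking):

    // After sealing a component on disk, optionally hard-link it into the
    // backups/ subdirectory of the sstable directory.
    future<> maybe_backup(sstring dir, sstring component, bool backup) {
        if (!backup) {
            return make_ready_future<>();
        }
        return link_file(dir + "/" + component,
                         dir + "/backups/" + component);
    }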

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
Amnon Heiman
e33710d2ca API: storage_service get_logging_level
This patch adds the get_logging_level command, which returns a map between
each log name and its level.
To test the API do:
curl -X GET "http://localhost:10000/storage_service/logging_level"

This enables the `nodetool getlogginglevels` command.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-3-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:54 +02:00
Amnon Heiman
ba80121e49 migration_task: rename logger name
Logger names should not contain a space; it causes issues when trying to
modify their level from nodetool.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1453365106-27294-2-git-send-email-amnon@scylladb.com>
2016-01-21 11:58:42 +02:00
Calle Wilund
980681d28e auth: Add a simplistic "schedule" for auth db setup
The only difference from the previous sleep is that we will
explicitly delete the objects if the process terminates
before the tasks are run, i.e. make ASan happier.

Message-Id: <1453295521-29580-1-git-send-email-calle@scylladb.com>
2016-01-20 19:31:14 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidentally inverted.
GCC wasn't complaining because implicit conversion of bool to
int, and vice versa, is valid.
However, this conversion is not very safe because we could
accidentally invert parameters.
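
The pitfall in isolation (a sketch with a hypothetical signature, for
illustration only):

    #include <cstddef>

    // factory taking (buffer_size, checksummed); assumed shape
    void make_stream(std::size_t buffer_size, bool checksummed) {}

    int main() {
        make_stream(4096, true);  // intended call
        make_stream(true, 4096);  // also compiles: bool converts to size_t
                                  // and int converts to bool implicitly,
                                  // so the compiler stays silent
    }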

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Calle Wilund
07f992e42a Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 13:31:33 +00:00
Calle Wilund
63b17be4f0 auth_test: Modify yet another case to use "normal" continuation.
test_cassandra_hash also sort of expects exceptions. ASan causes false
positives here as well with seastar::thread, so do it with a normal continuation.
Message-Id: <1453295521-29580-2-git-send-email-calle@scylladb.com>
2016-01-20 15:15:45 +02:00
Takuya ASADA
2eb12681b0 dist: add 'scylla-gdb' package for CentOS
Fixes #831

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453291407-12232-1-git-send-email-syuu@scylladb.com>
2016-01-20 15:12:33 +02:00
Avi Kivity
7bc3e6ffd0 Merge seastar upstream
* seastar 0516ed0...5c2660b (4):
  > reactor: block all signals early
  > reactor: replace sigprocmask() with pthread_sigmask()
  > fstream: remove unused interface
  > foreign_ptr: remove make_local_and_release()

Fixes #601.
2016-01-20 14:59:52 +02:00
Asias He
1c2d95f2b0 streaming: Remove unused verb handlers
They are never used in scylla.
Message-Id: <1453283955-23691-2-git-send-email-asias@scylladb.com>
2016-01-20 13:58:59 +02:00
Asias He
767e25a686 streaming: Remove the _handlers helper
It was introduced to help run invoke_on_all; we can reuse the
distributed<database> db for it.
Message-Id: <1453283955-23691-1-git-send-email-asias@scylladb.com>
2016-01-20 13:58:44 +02:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to or their timestamp is
older than the column dropped_at value. Originally this check was
applied to row markers as well which caused them to be always dropped
since there is no column in the schema representing these markers.
This patch makes sure that the check whether a column is alive is performed
only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Calle Wilund
9197a886f8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-20 09:44:38 +00:00
Takuya ASADA
79b218eb1c dist: use our own CentOS7 Base image
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-4-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:56 +02:00
Takuya ASADA
b9cb91e934 dist: stop ntpd before running ntpdate
The new CentOS Base Image runs ntpd by default, so shut it down before running ntpdate.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-3-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:35 +02:00
Takuya ASADA
98e61a93ef dist: disable SELinux only when it is enabled
The new CentOS7 Base Image disables SELinux by default, and running 'setenforce 0' on the image causes an error, so we wouldn't be able to build the AMI.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453241256-23338-2-git-send-email-syuu@scylladb.com>
2016-01-20 09:40:01 +02:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, the offset is ignored when creating an input stream.
Found the problem after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
ff9b1694fe api: implement stop_compaction
stop_compaction is implemented by calling stop_compaction() of
compaction manager for each database.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
5cceb7d249 api: fix paramType of parameter of stop_compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compactions. The implementation informs an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. The compaction manager will catch this exception and re-schedule
the compaction.
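
A minimal sketch of the mechanism (hypothetical names, not the actual code):

    #include <exception>

    struct compaction_stopped : std::exception {};

    // each ongoing compaction carries an info object with a stop flag;
    // the compaction loop checks it and bails out with an exception that
    // the manager catches in order to re-schedule
    struct compaction_info {
        bool stop_requested = false;
    };

    void check_for_stop(const compaction_info& info) {
        if (info.stop_requested) {
            throw compaction_stopped();
        }
    }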

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about an ongoing compaction. Soon, we will add
information to it about whether or not a user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Tomasz Grabiec
bd34adcf22 tests: memory_footprint: Show canonical_mutation size
Message-Id: <1453227147-21918-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:59 +02:00
Tomasz Grabiec
b8c3fa4d46 cql3: Print only column name in error message
Printing a column_definition prints all fields of the struct; we want
only the name here.
Message-Id: <1453207531-16589-1-git-send-email-tgrabiec@scylladb.com>
2016-01-19 20:22:37 +02:00
Tomasz Grabiec
0596455dc2 Merge branch 'pdziepak/date-timestamp-fixes/v2'
From Paweł:

These patches contain fixes for date and timestamp types:
 - date and timestamp are considered compatible types
 - date type is added to abstract_type::parse_type()
2016-01-19 18:35:09 +01:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writers
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.
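
With make_shared constructing the object in place, the pattern presumably
collapses to:

    f = file_object acquired by open_file_dma
    auto w = make_shared<file_writer>(std::move(f), 4096);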

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
Calle Wilund
59bf54d59a commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

Message-Id: <1453216558-18359-1-git-send-email-calle@scylladb.com>
2016-01-19 17:19:52 +02:00
Avi Kivity
07e0f0a31f Merge "Support schema changes in batchlog manager" from Tomasz
"We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory so a node forgets all past versions when it restarts. This
was not implemented yet; replay would fail with an exception if the
version is unknown."
2016-01-19 17:17:47 +02:00
Calle Wilund
3f4c8d9eea commitlog_replayer: Modify logging to better match origin
* Match origin log messages
  - Demote per-file printouts to "debug" level.
* Print an all-files stat summary for whole replay (begin/summary)
  - At info level, like origin

Prompted by dtest that expects origin log output.

v2:
* Fixed broken + operator
* Use map_reduce instead of easily readable code
2016-01-19 15:14:21 +00:00
Paweł Dziepak
db30ac8d2d tests/types: add test for timestamp and date compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 15:34:45 +01:00
Avi Kivity
a2953833dc Merge seastar upstream
* seastar e93cd9d...0516ed0 (9):
  > http: use default file input stream options in file_handler
  > linecount: use default file input stream options
  > fstream: do not pass offset as part options member
  > net: move posix network stack registration to reactor.cc
  > net: throw a human-readable error if using an unregistered network stack
  > io_queue: remove pending_io counter
  > Revert "Merge "Improve rpc server-side statistics""
  > tests: corrections regarding Boost.Test 1.59 compilation failures
  > Merge "Improve rpc server-side statistics"
2016-01-19 16:33:35 +02:00
Calle Wilund
1b4b7aeb66 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-19 13:51:00 +00:00
Paweł Dziepak
900f5338e7 types: make timestamp_type and date_type compatible
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 14:03:15 +01:00
Tomasz Grabiec
ec12b75426 batchlog_manager: Store canonical_mutations
We need to be able to replay mutations created using older versions of
the table's schema. frozen_mutation can be only read using the version
it was serialized with, and there is no guarantee that the node will
know this version at the time of replay. Currently versions are kept
in-memory so a node forgets all past versions when it restarts.

To solve this, let's store canonical_mutations which, like data in
sstables, can be read using any later schema version of given table.
2016-01-19 13:46:28 +01:00
Tomasz Grabiec
e21049328f batchlog_manager: Add more debug logging 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
608b606434 canonical_mutation: Introduce column_family_id() getter 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
06d1f4b584 database: Print table name when printing mutation 2016-01-19 13:46:28 +01:00
Tomasz Grabiec
52073d619c database: Add trace-level logging of applied mutations 2016-01-19 13:46:28 +01:00
Paweł Dziepak
a6171d3e99 types: add date type to parse_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:43:36 +01:00
Paweł Dziepak
f77ab67809 types: use correct name for date_type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 13:42:53 +01:00
Tomasz Grabiec
d7cb88e0af Merge branch 'pdziepak/fixes-for-alter-table/v1'
From Paweł:

"This series contains some more fixes for issues related to alter table,
namely: incorrect parsing of collection information in comparator, missing
schema::_raw._collections in equality check, missing compatibility
information for utf8->blob, ascii->blob and ascii->utf8 casts."
2016-01-19 13:22:10 +01:00
Calle Wilund
de9f9308a5 auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is a false positive, caused by a combination of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now, let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but passes ASan.
Message-Id: <1453205136-10308-1-git-send-email-calle@scylladb.com>
2016-01-19 13:11:20 +01:00
Calle Wilund
79a5f7b19d auth_test: workaround ASan false error
test_password_authenticator_operations causes ASan failures, in a way
that I am 99% sure is a false positive, caused by a combination of
seastar threads, exception throwing and externals.

In lieu of actually identifying what ASan flaw causes this and
potentially curing it, for now, let's just rewrite the test in question
to not use seastar::async, but a normal continuation. Less easy to read,
but passes ASan.
2016-01-19 12:02:50 +00:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting an sstable, mutations that don't belong to the current shard
should be filtered out. Otherwise, mutations would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for the sstables to be compacted, so the test could fail because of mutations
being filtered out.

Fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Vlad Zolotarov
922eb218b1 locator::reconnectable_snitch_helper: don't check messaging_service version
Don't demand the messaging_service version to be the same on both
sides of the connection in order to use internal addresses.

Upstream has a similar change for CASSANDRA-6702 in commit a7cae32 ("Fix
ReconnectableSnitch reconnecting to peers during upgrade").

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452686729-32629-1-git-send-email-vladz@cloudius-systems.com>
2016-01-19 11:04:37 +02:00
Paweł Dziepak
7c9708953e tests/cql3: add tests for ALTER TABLE with multiple collections
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:24 +01:00
Paweł Dziepak
e249d4eab5 tests/type: add test for simple type compatibility
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:20 +01:00
Paweł Dziepak
440b6d058e types: fix compatibility for text types
bytes_type is_compatible_with utf8_type and ascii_type
utf8_type is_compatible_with ascii_type

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:16 +01:00
Paweł Dziepak
17ca7e06f3 schema: print collection info
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:12 +01:00
Paweł Dziepak
2e2de35dfb schema: add _raw._collections check to operator==()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:08 +01:00
Paweł Dziepak
92dc95b73b schema: fix comparator parsing
The correct format of collection information in comparator is:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>, <name2>:<type2>, ...)

not:

o.a.c.db.m.ColumnToCollection(<name1>:<type1>),
o.a.c.db.m.ColumnToCollection(<name2>:<type2>) ...

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-19 09:39:05 +01:00
Amnon Heiman
9be42bfd7b API: Add version to application state in failure_detection
The upstream of origin adds the version to the application_state in the
get_endpoints in the failure detector.

In our implementation we return an object to the jmx proxy and the proxy
does the string formatting.

This patch adds the version to the return object which is both useful as
an API and will allow the jmx proxy to add it to its output when we move
forward with the jmx version.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1448962889-19611-1-git-send-email-amnon@scylladb.com>
2016-01-19 10:23:56 +02:00
Tomasz Grabiec
5a1587353f tests: Don't depend on partition_key representation
Representation format is an implementation detail of
partition_key. Code which compares a value to representation makes
assumptions about key's representation. Compare keys to keys instead.
Message-Id: <1453136316-18125-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 19:01:56 +02:00
Pekka Enberg
2ca8606b4e streaming/stream_session: Don't stop stream manager
We cannot stop the stream manager because it's accessible via the API
server during shutdown, for example, which can cause a SIGSEGV.

Spotted by ASan.
Message-Id: <1453130811-22540-1-git-send-email-penberg@scylladb.com>
2016-01-18 16:34:19 +01:00
Pekka Enberg
422cff5e00 api/messaging_service: Fix heap-buffer-overflows in set_messaging_service()
Fix various issues in set_messaging_service() that caused
heap-buffer-overflows when JMX proxy connects to Scylla API:

  - Off-by-one error in 'num_verb' definition

  - Call to an initializer-list std::vector constructor variant that caused
    the vector to be two elements long (see the sketch below).

  - Missing verb definitions from the Swagger definition that caused
    response vector to be too small.
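
The brace-initialization pitfall in isolation (a sketch, not the actual
Scylla code):

    #include <cassert>
    #include <vector>

    int main() {
        std::vector<int> a(10, 0);  // ten zero-initialized elements
        std::vector<int> b{10, 0};  // initializer_list wins: two elements, 10 and 0
        assert(a.size() == 10);
        assert(b.size() == 2);
    }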

Spotted by ASan.
Message-Id: <1453125439-16703-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:29 +01:00
Pekka Enberg
3723beb302 service/storage_service: Fix typos in logger messages
Message-Id: <1453128076-18613-1-git-send-email-penberg@scylladb.com>
2016-01-18 15:43:04 +01:00
Takuya ASADA
d5d5857b62 dist: extend coredump size limit
16GB is not enough for some larger machines, so extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-2-git-send-email-syuu@scylladb.com>
2016-01-18 13:38:43 +02:00
Takuya ASADA
023c6dc620 dist: preserve environment variable when running scylla_prepare on sudo
sysconfig parameters are passed via environment variables, but sudo resets them by default.
We need to keep them across sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1453115792-21989-1-git-send-email-syuu@scylladb.com>
2016-01-18 13:22:14 +02:00
Gleb Natapov
dde2e80a20 storage_proxy: remove batchlog synchronously
Wait for batchlog removal before completing a query; otherwise batchlog
removal queries may accumulate. Still ignore an error if it happens,
since it is not critical, but log it.

Message-Id: <20160118095642.GB6705@scylladb.com>
2016-01-18 12:38:12 +02:00
Avi Kivity
221ef4536c messaging service: limit rpc server resources
Otherwise, a slow node can be overwhelmed by other nodes and run out of
memory.

Fixes #596.
Message-Id: <1452776394-13682-1-git-send-email-avi@scylladb.com>
2016-01-18 11:16:45 +02:00
Avi Kivity
a881e596fa Merge "Ubuntu dependency packages fix" from Takuya 2016-01-18 11:13:18 +02:00
Gleb Natapov
f97eed0c94 fix batch size checking
warn_threshold is in kbytes, v.size is in bytes, and size is in kbytes.
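
In other words, both sides must be brought to the same unit before
comparing (a sketch with hypothetical names):

    #include <cstddef>

    // warn when a batch exceeds the configured threshold; the size arrives
    // in bytes while the threshold is configured in kbytes
    bool exceeds_warn_threshold(std::size_t batch_bytes, std::size_t warn_threshold_kb) {
        return batch_bytes / 1024 > warn_threshold_kb;
    }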

Message-Id: <20160118090620.GZ6705@scylladb.com>
2016-01-18 11:08:13 +02:00
Avi Kivity
5313a28044 Merge "Fix re-addinig collections" from Paweł
"This series makes sure that Scylla rejects adding a collections if
its column name is the same as a collection that existed before and
their types are incompatible.

Fixes #782"
2016-01-18 10:58:40 +02:00
Tomasz Grabiec
237819c31f logalloc: Exclude zones' free segments in lsa/bytes-non_lsa_used_space
Historically, the purpose of the metric was to show how much memory is
in standard allocations. After zones were introduced, this would also
include free space in lsa zones, which is almost all memory, and thus
the metric lost its original meaning. This change brings it back to
its original meaning.

Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 10:48:14 +02:00
Takuya ASADA
5270da1eef dist: prevent use of abrt with scylla
scylla should be used with systemd-coredump, not abrt.
Fixes #762

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452713081-32492-1-git-send-email-syuu@scylladb.com>
2016-01-18 10:47:11 +02:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Pekka Enberg
6cc02242f6 Merge "Multi schema support in commit log" from Paweł
"This series adds support for multiple schema versions to the commit log.
 All segments contain column mappings of all schema versions used by the
 mutations contained in the segment, which are necessary in order to be
 able to read frozen mutations and upgrade them to the current schema
 version."
2016-01-18 10:11:26 +02:00
Avi Kivity
d5050e4c6a storage_proxy: make MUTATION and MUTATION_DONE verbs synchronous at the server side
While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION
completes, it sends a MUTATION_DONE message instead of responding
synchronously), we still want them to be synchronous at the server side
wrt. the RPC server itself.  This is because RPC accounts for resources
consumed by the handler only while the handler is executing; if we return
immediately, and let the code execute asynchronously, RPC believes no
resources are consumed and can instantiate more handlers than the shard
has resources for.

Fix by changing the return type of the handlers to future<no_wait_type>
(from a plain no_wait_type), and making that future complete when local
processing is over.
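
A sketch of the shape of the fix, assuming seastar's rpc::no_wait
convention (the handler registration and apply_locally() helper are
hypothetical, not the actual Scylla code):

    // before: returned rpc::no_wait immediately, so RPC considered the
    // handler finished while the mutation was still being applied.
    // after: the returned future resolves only when local processing ends,
    // keeping the handler's resources accounted for until then.
    register_mutation_handler([] (frozen_mutation fm) -> future<rpc::no_wait_type> {
        return apply_locally(std::move(fm)).then([] {
            return rpc::no_wait;
        });
    });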

Ref #596.
Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>
2016-01-18 09:59:34 +02:00
Nadav Har'El
d97cbbbe43 repair: forbid repair with "-dc" not including the current host
Theoretically, one could want to repair a single host *and* all the hosts
in one or more other data centers which don't include this host. However,
Cassandra's "nodetool repair" explicitly does not allow this, and fails if
given a list of data centers (via the "-dc" option) which doesn't include
the host starting the repair. So we need to behave like "nodetool repair"
and fail in this case too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1453037016-25775-1-git-send-email-nyh@scylladb.com>
2016-01-18 09:54:16 +02:00
Paweł Dziepak
fa7bef72d4 tests/cql3: add tests for ALTER TABLE validation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:50 +01:00
Paweł Dziepak
b7e58db7ec tests: allow any future in assert_that_failed()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:44 +01:00
Paweł Dziepak
00f7a873a5 cql3: forbid re-adding collection with incompatible type
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:38 +01:00
Paweł Dziepak
4927ff95da schema: read collections from comparator
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:33 +01:00
Paweł Dziepak
725129deb7 type_parser: accept sstring_view
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:27 +01:00
Paweł Dziepak
6372a22064 schema: use _raw._collections to generate comparator name
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:03 +01:00
Paweł Dziepak
84840c1c98 schema: keep track of removed collections
Cassandra disallows adding a column with the same name as a collection
that existed in the past in that table if the types aren't compatible.
To enforce that, Scylla needs to keep track of all collections that ever
existed in the column family.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:34:29 +01:00
Avi Kivity
c4cf4e0bcd Merge seastar upstream
* seastar a8183c1...e93cd9d (2):
  > rpc: make sure we serialize on _resources_available sempahore
  > rpc: fix support for handlers returning future<no_wait_type>
2016-01-17 18:36:22 +02:00
Avi Kivity
249dbc1d8e Merge seastar upstream
* seastar 6f9453d...a8183c1 (2):
  > rpc: fix server losing handler
  > Merge "Fair I/O Queue" from Glauber
2016-01-17 14:21:53 +02:00
Takuya ASADA
01309c0dd8 dist: add missing dependency (xfslibs-dev) for Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
9ad3365353 dist: use gdebi to resolve install-time dependencies
Since we switched to using mk-build-deps, it only resolves build-time dependencies.
We also need to install install-time dependencies.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
705285cf27 dist: resolve build-time dependencies with the mk-build-deps command, do not install them manually
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Takuya ASADA
90be81f9ba dist: add missing build time dependency for thrift package on Ubuntu
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-16 15:39:09 +09:00
Tomasz Grabiec
d332fcaefc row_cache: Restore indentation 2016-01-15 15:33:17 +01:00
Tomasz Grabiec
6b3cd35109 Merge branch 'pdziepak/multi-schema-sstables/v1'
From Paweł:

This series adds support for reading sstables using a different schema than
the one that was used to write them.
2016-01-15 14:23:18 +01:00
Paweł Dziepak
dbf23fdff5 tests/sstable: add test for multi schema
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Asias He
e10580f474 cql_server: Fix connection shutdown
_fd is of type connected_socket. shutdown_input() and shutdown_output()
return future<>. Do not ignore the future.

Message-Id: <786eee890541a18d3501ecd52415f2900c545157.1452835922.git.asias@scylladb.com>
2016-01-15 11:37:30 +02:00
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes an erratic ASan complaint about a memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Calle Wilund
e935c9cd34 select_statement: Make sure all aggregate queries use paging
Mainly to make sure we respect row limits, since normal result
generation does not respect them for aggregates.

Fixes #752 

Message-Id: <1452681048-30171-2-git-send-email-calle@scylladb.com>
2016-01-14 19:03:37 +02:00
Calle Wilund
1dc5937f40 query_pagers: fix log message in requires_paging
The message would state that all queries required paging, even when
returning the opposite to the caller.

Message-Id: <1452681048-30171-1-git-send-email-calle@scylladb.com>
2016-01-14 19:03:16 +02:00
Asias He
cc3073b42d gossip: cleanup application_state
Drop the unused one.

Message-Id: <4cc45164d55742951b618d2c7b1e8bdb997f005a.1452771260.git.asias@scylladb.com>
2016-01-14 19:01:51 +02:00
Avi Kivity
d47a58cc32 README: add libxml2 and libpciaccess packages to list of required packages
Needed for link stage.
2016-01-14 17:47:48 +02:00
Avi Kivity
cf7e6cede2 README: add hwloc and numactl to install recommendations 2016-01-14 17:30:43 +02:00
Tomasz Grabiec
b7976f3b82 config: Set default logging level to info
Commit d7b403db1f changed the default in
logging::logger. It affected tests but not the scylla binary, where it's
overwritten in main.cc.
Message-Id: <1452777008-21708-1-git-send-email-tgrabiec@scylladb.com>
2016-01-14 15:11:58 +02:00
Pekka Enberg
9306f4eb22 Merge "Disable ALTER TABLE statement unless --experimental=on" from Tomek 2016-01-14 14:30:20 +02:00
Avi Kivity
cf8ab65fbc Merge seastar upstream
* seastar 43e64c2...6f9453d (2):
  > Merge "rpc resource accounting"
  > core: Introduce smp::invoke_on_all()
2016-01-14 14:28:27 +02:00
Asias He
826b6ed877 gossip: Print node status in handle_major_state_change
Message-Id: <1452768680-32355-1-git-send-email-asias@scylladb.com>
2016-01-14 14:22:37 +02:00
Asias He
e7a899f5f3 gossip: Enable debug msg for convict
Kill one FIXME in convict

Message-Id: <1452768680-32355-2-git-send-email-asias@scylladb.com>
2016-01-14 14:22:36 +02:00
Tomasz Grabiec
054f1df0a5 cql3: Disable ALTER TABLE unless experimental features are on 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
1fd03ea1d2 tests: cql_test_env: Enable experimental features 2016-01-14 13:21:13 +01:00
Tomasz Grabiec
a13aaa62df config: Add 'experimental' switch 2016-01-14 13:21:13 +01:00
Gleb Natapov
647a09cd7b storage_proxy: improve mutation timeout logging
Message-Id: <20160114105359.GY6705@scylladb.com>
2016-01-14 12:00:35 +01:00
Pekka Enberg
733584c44d main: Start the API service as the last step
This reverts commit f0d68e4 ("main: start the http server in the first
step"). The service layer is not ready to serve clients before it's
fully up and running which causes early startup crashes everywhere.
Message-Id: <1452768015-22763-1-git-send-email-penberg@scylladb.com>
2016-01-14 12:55:50 +02:00
Tomasz Grabiec
1daaf909d7 Merge branch 'tgrabiec/row_cache_invalidate_fix'
Fixes for wrap-around range handling in row_cache.
2016-01-14 11:38:26 +01:00
Takuya ASADA
7479cde28b dist: extend root disk size to 10GB
Since the default root disk size is too small for our purpose, it's better to extend it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452762325-5620-1-git-send-email-syuu@scylladb.com>
2016-01-14 11:29:00 +02:00
Pekka Enberg
90123197e1 service/client_state: Use anonymous user when authentication is disabled
If authentication is disabled, nobody calls login() to set the current
user. There's untranslated code in the client_state constructor to do just
that.

Fixes "You have not logged in" errors when USE statement is executed
with authentication disabled.
Message-Id: <1452759946-13998-1-git-send-email-penberg@scylladb.com>
2016-01-14 09:29:33 +01:00
Avi Kivity
4143cf6385 Merge "Initial authenticator support" from Calle
"Add implementation of cassandra password authenticator, and user
password checking to CQL connections.

User/pwd are stored in system_auth table. Passwords are hashed
using glibc 'crypt_r'.

The latter is worth noting, as this is a difference compared to origin;
Origin uses Java bcrypt library for salt/hash, i.e. blowfish hashing.
Most glibc variants do _not_ have support for blowfish. To be 100%
compatible with imported origin tables we might need to add
bcrypt/blowfish sources into scylla (no packaged libs available afaict)

The code currently first attempts to use blowfish, if we happen to run
centos or Openwall, which has it compiled in. Otherwise we will fall
back to sha512, sha256 or even md5 depending on lib support.

To use:
* scylla.conf: authenticator=PasswordAuthenticator
* cqlsh -u cassandra -p cassandra

Not implemented (yet):
* "Authorizer", thus no KS/CF access checking
* CQL create/alter/delete user (create_user_statement etc). I.e. there is
  only a single user name; default "cassandra:cassandra" user/pwd combo"
2016-01-13 19:13:05 +02:00
Takuya ASADA
0511b02f90 dist: run scylla_prepare, scylla_stop on sudo
Since we changed uid on scylla-server.service to scylla, we need sudo for these scripts.

Fixes #783

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452704598-5292-1-git-send-email-syuu@scylladb.com>
2016-01-13 19:06:33 +02:00
Tomasz Grabiec
6b059fd828 row_cache: Guard against wrap-around range in make_reader() 2016-01-13 17:50:55 +01:00
Tomasz Grabiec
7fb0bc4e15 row_cache: Take the reclaim lock in invalidate()
It's needed to keep the iterators valid in case eviction is triggered
somewhere in between. It probably isn't, because destructors should not
allocate, but better be safe.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
5e05f63ee7 tests: Add more tests for row_cache::invalidate()
Refs #785.
2016-01-13 17:50:55 +01:00
Tomasz Grabiec
50cc0c162e row_cache: Make invalidate() handle wrap-around ranges
Currently for wrap around the "begin" iterator would not meet with the
"end" iterator, invoking undefined behavior in erase_and_dispose()
which results in a crash.

Fixes #785
2016-01-13 17:50:55 +01:00
Calle Wilund
8192384338 auth_test: Unit tests for auth objects 2016-01-13 15:37:39 +00:00
Calle Wilund
9e3295bc69 cql_test_env: Allow specifying db::config for the env 2016-01-13 15:35:37 +00:00
Calle Wilund
9ef05993ff config: Mark "authenticator" used + update description 2016-01-13 15:35:36 +00:00
Calle Wilund
1d811f1e8f transport::server: Add authentication support
If the system authenticator object requires authentication, issue
a challenge to the client, and process the response.
2016-01-13 15:35:36 +00:00
Calle Wilund
1c30d37285 client_state: Add user object + login
Note: all actual authorization methods are still unimplemented.
2016-01-13 15:35:36 +00:00
Calle Wilund
4692f46b8d storage_service: Initialize auth system on start 2016-01-13 15:35:36 +00:00
Calle Wilund
9a4d45e19d auth::auth/authenticator: user storage and authentication
User db storage + login/pwd db using system tables.

Authenticator object is a global shard-shared singleton, assumed
to be completely immutable, thus safe.
Actual login authentication is done via locally created stateful object
(sasl challenge), that queries db.

Uses "crypt_r" for password hashing, vs. origins use of bcrypt.
Main reason is that bcrypt does not exist as any consistent package
that can be consumed, so to guarantee full compatibility we'd have
to include the source. Not hard, but at least initially more work than
worth.
2016-01-13 15:35:35 +00:00
Calle Wilund
00de63c920 cql3::query_processor: Add processing helpers for internal usage
syntactical sugar + "process" for internal, similar to 
execute_internal, but allowing querying the whole cluster, and optional
statement caching.
2016-01-13 15:35:21 +00:00
Calle Wilund
6a5f075107 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
Message-Id: <1452615759-23213-1-git-send-email-calle@scylladb.com>
2016-01-13 12:26:49 +01:00
Paweł Dziepak
218898b297 commitlog: upgrade mutations during commitlog replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:26 +01:00
Paweł Dziepak
661849dbc3 commitlog: learn about schema versions during replay
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:50:23 +01:00
Paweł Dziepak
55d342181a commitlog: do not skip entries inside a chunk
All entries inside a chunk need to be read since any of them may
contain a column mapping.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:23:00 +01:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Paweł Dziepak
a877905bd4 commitlog: allow adding entries using commitlog_entry_writer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:17:45 +01:00
Paweł Dziepak
0254c3e30b commitlog: add commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:49 +01:00
Paweł Dziepak
434c02cdfa commitlog: keep track of schema versions
Each segment chunk should contain column mappings for all schema
versions used by the mutations it contains. In order to avoid
duplication, db::commitlog::segment remembers all schema versions already
written in the current chunk.
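
A minimal sketch of that bookkeeping (hypothetical member names; assumes
table_schema_version is hashable):

    // write a column mapping only the first time a schema version shows up
    // in the current chunk; reset the set whenever a new chunk begins
    std::unordered_set<table_schema_version> _versions_in_chunk;

    bool needs_column_mapping(const table_schema_version& v) {
        return _versions_in_chunk.insert(v).second;  // true on first sight
    }

    void on_new_chunk() {
        _versions_in_chunk.clear();  // each chunk must be self-describing
    }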

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:41 +01:00
Paweł Dziepak
9d74268234 commitlog: introduce entry_writer
The current commitlog interface requires writers to specify the size of a
new entry, which cannot depend on the segment to which the entry is
written.
If column mappings are going to be stored in the commitlog, that's not
enough, since we don't know whether a column mapping needs to be written
until we know in which segment the entry is going to be stored.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:13:26 +01:00
Calle Wilund
32e480025f cql3::query_options: Add constructors for internal processing 2016-01-13 08:49:01 +00:00
Calle Wilund
2e9ab3aff1 types.hh: Add data_type_for<bool> 2016-01-13 08:49:01 +00:00
Calle Wilund
40efd231b1 auth::authenticated_user: Object representing a named or anon user 2016-01-13 08:49:01 +00:00
Calle Wilund
51af2bcafd auth::permission: permissions for authorization
Not actually used yet. But some day...
2016-01-13 08:49:01 +00:00
Calle Wilund
6f708eae1c auth::data_resource: resource identifier for auth permissions 2016-01-13 08:49:01 +00:00
Calle Wilund
9c1d088718 exceptions: add authorization exceptions 2016-01-13 08:49:01 +00:00
Calle Wilund
cd4ae7a81e Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-13 08:48:43 +00:00
Tomasz Grabiec
e88f41fb3f messaging_service: Move REPAIR_CHECKSUM_RANGE verb out of the streaming verbs group
Message-Id: <1452620321-17223-1-git-send-email-tgrabiec@scylladb.com>
2016-01-12 20:17:08 +02:00
Calle Wilund
8de95cdee8 paging bugfix: Allow reset/removal of "specific ck range"
Refs #752

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass.

[tgrabiec: commit log cleanup, fixed issue ref]

Message-Id: <1452616259-23751-1-git-send-email-calle@scylladb.com>
2016-01-12 17:45:57 +01:00
Calle Wilund
7d7d592665 batch_statement: Modify verify_batch_size to match current origin
Fixes #614

* Use warning threshold from config
* Don't throw exceptions. We're only supposed to warn.
* Try to actually estimate mutation data payload size, not
  number of mutations.
2016-01-12 16:30:31 +00:00
Calle Wilund
81e9dc0c2a paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set the row limit for a query to be the min of the page size and the
remaining limit, but with a multinode query we might end up with more rows
than asked for, so we must do this again in post-processing.
2016-01-12 16:30:30 +00:00
Calle Wilund
ea92d7d4fd paging bugfix: Allow reset/removal of "specific ck range"
Refs #792

Paged aggregate queries will re-use the partition_slice object,
thus when setting a specific ck range for "last pk", we will hit
an exception case.
Allow removing entries (actually only the one), and overwriting
(using schema equality for keys), so we maintain the interface
while allowing the pager code to re-set the ck range for previous
page pass. 

v2: 
* Changed to schema-equality checks so we sort of maintain a 
  sane api and behaviour, even with the 1-entry map
 
v3: 
* Renamed remove "contains" in specific_ranges, and made the calling
  code use more map-like logic, again to keep things cleaner
2016-01-12 16:30:30 +00:00
Calle Wilund
e50d8b6895 paging bugfix: Ensure limit for single page is min(page size, limit left)
Fixes #752

We set the row limit for a query to be the min of the page size and the
remaining limit, but with a multinode query we might end up with more rows
than asked for, so we must do this again in post-processing.
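
In effect (a sketch with hypothetical names):

    #include <algorithm>
    #include <cstdint>

    // clamp when building the page request, and again after merging
    // results from multiple nodes
    uint32_t page_limit(uint32_t page_size, uint32_t rows_left_in_limit) {
        return std::min(page_size, rows_left_in_limit);
    }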

Message-Id: <1452606935-12899-2-git-send-email-calle@scylladb.com>
2016-01-12 17:23:04 +02:00
Vlad Zolotarov
9232ad927f messaging_service::get_rpc_client(): fix the encryption logic
According to specification
(here https://wiki.apache.org/cassandra/InternodeEncryption)
when the internode encryption is set to `dc`, the data passed between
DCs should be encrypted and similarly, when it's set to `rack`,
the inter-rack traffic should be encrypted.

Currently Scylla would encrypt the traffic inside the local DC in the
first case and inside the local RACK in the latter one.

This patch fixes the encryption logic to follow the specification
above.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452501794-23232-1-git-send-email-vladz@cloudius-systems.com>
2016-01-12 16:22:26 +02:00
Avi Kivity
4693197e37 Merge seastar upstream
* seastar fe7a49c...43e64c2 (1):
  > resource: fix failures on low-memory machines

Fixes #734.
2016-01-12 14:45:43 +02:00
Calle Wilund
5b9f196115 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 11:46:40 +00:00
Avi Kivity
39f81b95d6 main: make --developer-mode relax dma requirements
With Docker we might be running on a filesystem that does not support DMA
(aufs; or tmpfs on boot2docker), so let --developer-mode allow running
on those file systems.
Message-Id: <1452593083-25601-1-git-send-email-avi@scylladb.com>
2016-01-12 13:34:46 +02:00
Avi Kivity
d68026716e Merge seastar upstream
* seastar ad3577b...fe7a49c (2):
  > reactor: workaround tmpfs O_DIRECT vs O_EXCL bug
  > rpc: fix reordering between sending client's negotiation frame and user's data
2016-01-12 13:27:16 +02:00
Takuya ASADA
a1d1d0bd06 Revert "dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update"
This reverts commit b28b8147a0.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-2-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:09 +02:00
Takuya ASADA
5459df1e9e dist: renumber development version as 666.development
yum command think "development-xxxx.xxxx" is older than "0.x", so nightly package mistakenly update with release version.
To prevent this problem, we should add greater number prior to "development".
Also same on Ubuntu package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452592877-29721-1-git-send-email-syuu@scylladb.com>
2016-01-12 12:26:08 +02:00
Calle Wilund
fdda880920 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 10:17:22 +00:00
Avi Kivity
5809ed476f Merge "Orderly service startup for systemd"
Use systemd Type=notify to tell systemd about startup progress.

We can now use 'systemctl status scylla-server' to see where we are
in service startup, and 'systemctl start scylla-server' will wait until
either startup is complete, or we fail to start up.
2016-01-12 12:01:32 +02:00
Avi Kivity
3d5f6de683 main: notify systemd of startup progress
Send current startup stage via sd_notify STATUS variable; let it know that
startup is complete via READY=1.

Fixes #760.
2016-01-12 11:58:24 +02:00
Calle Wilund
1b54b9c2d8 Merge branch 'master' of https://github.com/scylladb/scylla 2016-01-12 09:02:05 +00:00
Calle Wilund
7f4985a017 commit log reader bugfix: Fix trying to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof), and also give false errors about corruption.
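
The missing guard, sketched (hypothetical names, not the actual code):

    // if the remainder of the chunk cannot hold even a minimal entry,
    // it is slack: jump straight to the next chunk instead of parsing it
    if (chunk_end - pos < minimal_entry_size) {
        pos = chunk_end;
    }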
Message-Id: <1452517700-599-1-git-send-email-calle@scylladb.com>
2016-01-12 10:29:07 +02:00
Tzach Livyatan
c5b332716c Fix AMI prompt from "nodetool --help" to "nodetool help"
Fixes #775

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1452586945-28738-1-git-send-email-tzach@scylladb.com>
2016-01-12 10:27:05 +02:00
Takuya ASADA
fc13b9eb66 dist: yum install epel-release before installing CentOS dependencies
Fixes #779

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1452586442-19777-1-git-send-email-syuu@scylladb.com>
2016-01-12 10:24:56 +02:00
Raphael S. Carvalho
fc6a1934b0 api: implement force_keyspace_cleanup
This will add support for a user to clean up an entire keyspace
or some of its column families.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:22 -02:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that will discard irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla will clean up an sstable by using the compaction code, in
which this sstable will be the only input used.
Compaction manager was changed to become aware of cleanup, such
that it will be able to schedule cleanup requests and also know
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation stores the generations of compacting sstables
in an unordered set per column family, so that before the strategy is called,
the compaction manager can filter out compacting sstables.
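
A sketch of the filtering (hypothetical names, not the actual code):

    // generations of sstables currently being compacted, per column family
    std::unordered_set<int64_t> _compacting;

    // hand the strategy only sstables that are not already being compacted
    std::vector<shared_sstable> candidates;
    for (auto& sst : all_sstables) {
        if (!_compacting.count(sst->generation())) {
            candidates.push_back(sst);
        }
    }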

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible for both selecting the
sstables for compaction and running the compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make possible for the compaction manager
to keep track of which sstables are being compacted at a moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
68619211f5 tests: add test for sstable rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:41 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting an sstable, discarding any keys that
are irrelevant, i.e. keys that don't belong to the current node.
Parameter cleanup was added to compact_sstables.
If set to true, irrelevant code such as the one that updates
compaction history will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Raphael S. Carvalho
5c674091dc db: move code that rebuilds sstable list to a function
That code will be used by column family cleanup, so let's put
that code into a function. This change also improves the code
readability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:04 -02:00
Raphael S. Carvalho
58189dd489 db: move generation calculation code to a function
Code that calculates generation should be put in a function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:02 -02:00
Avi Kivity
678bdd5c79 Merge "Change AMI base image to CentOS7, use systemd-coredump for Fedora/CentOS, make AMI rootfs as XFS" from Takuya 2016-01-11 18:43:57 +02:00
Tzach Livyatan
8a4f7e211b Add REST API server ip:port parameters to scylla.yaml
api_port and api_address are already valid configuration options.
Adding them to scylla.yaml lets users know they exist.

Solves issue #704

Signed-off-by: Tzach Livyatan <tzach@cloudius-systems.com>
Message-Id: <1452527028-13724-1-git-send-email-tzach@cloudius-systems.com>
2016-01-11 18:00:48 +02:00
Avi Kivity
f917f73616 Merge "Handling of schema changes" from Tomasz
"Our domain objects have schema version dependent format, for efficiency
reasons. The data structures which map between columns and values rely on
column ids, which are consecutive integers. For example, we store cells in a
vector where index into the vector is an implicit column id identifying table
column of the cell. When columns are added or removed the column ids may
shift. So, to access mutations or query results one needs to know the version
of the schema corresponding to it.

In case of query results, the schema version to which it conforms will always
be the version which was used to construct the query request. So there's no
change in the way query result consumers operate to handle schema changes. The
interfaces for querying needed to be extended to accept schema version and do
the conversions if necessary.

Shard-local interfaces work with a full definition of schema version,
represented by the schema type (usually passed as schema_ptr). Schema versions
are identified across shards and nodes with a UUID (table_schema_version
type). We maintain schema version registry (schema_registry) to avoid fetching
definitions we already know about. When we get a request using unknown schema,
we need to fetch the definition from the source, which must know it, to obtain
a shard-local schema_ptr for it.

Because mutation representation is schema version dependent, mutations of
different versions don't necessarily commute. When a column is dropped from
schema, the dropped column is no longer representable in the new schema. It is
generally fine to not hold data for dropped columns, the intent behind
dropping a column is to lose the data in that column. However, when merging an
incoming mutation with an existing mutation both of which have different
schema versions, we'd have to choose which schema should be considered
"latest" in order not to loose data. Schema changes can be made concurrently
in the cluster and initiated on different nodes so there is not always a
single notion of latest schema. However, schema changes are commutative and by
merging changes nodes eventually agree on the version.  For example adding
column A (version X) on one node and adding column B (version Y) on another
eventually results in a schema version with both A and B (version Z). We
cannot tell which version among X and Y is newer, but we can tell that version
Z is newer than both X and Y. So the solution to the problem of merging
conflicting mutations could be to ensure that such merge is performed using
the schema which is superior to schemas of both mutations.

The approach taken in the series for ensuring this is as follows. When a node
receives a mutation of an unknown schema version it first performs a schema
merge with the source of that mutation. Schema merge makes sure that current
node's version is superior to the schema of incoming mutation. Once the
version is synced with, it is remembered as such and won't be synced with on
later mutations. Because of this bookkeeping, schema versions must be
monotonic; we don't want table altering to result in any earlier version
because that would cause nodes to avoid syncing with them. The version is a
cryptographically-secure hash of schema mutations, which should fulfill this
purpose in practice.

TODO: It's possible that the node is already performing a sync triggered by
broadcasted schema mutations. To avoid triggering a second sync needlessly, the
schema merging should mark incoming versions as being synced with.

Each table shard keeps track of its current schema version, which is
considered to be superior to all versions which are going to be applied to it.
All data sources for given column family within a shard have the same notion
of current schema version. Individual entries in cache and memtables may be at
earlier versions but this is hidden behind the interface. The entries are
upgraded to current version lazily on access. Sstables are immutable, so they
don't need to track current version. Like any other data source, they can be
queried with any schema version.

Note, the series triggered a bug in demangler:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68700"
2016-01-11 17:59:14 +02:00
Avi Kivity
3092c1ebb5 Update scylla-ami submodule
* ami/files/scylla-ami 07b7118...eb1fdd4 (2):
  > move log file to /var/lib/scylla
  > move config file to /etc/scylla
2016-01-11 17:58:47 +02:00
Avi Kivity
9182ce1f61 Merge seastar upstream
* seastar d0bf6f8...ad3577b (9):
  > httpd: close connection before deleting it
  > reactor: support for non-O_DIRECT capable filesystems
  > tests: modernize linecount
  > IO queues: destruct within reactor's destructor
  > tests: Use dnsdomainname in mkcert.gmk
  > tests: memcached: workaround a possible race between flush_all and read
  > apps: memcached: reduce the error during the expiration time translation
  > timer: add missing #include
  > core: do not call open_file_dma directly

Fixes #757.
2016-01-11 17:41:39 +02:00
Takuya ASADA
6a457da969 dist: add ignore files for AMI 2016-01-11 14:22:20 +00:00
Takuya ASADA
b28b8147a0 dist: prevent 'local rpm' AMI image update to older version of scylla package by yum update
Since the yum command thinks the development version is older than the release version, we need this.
2016-01-11 14:22:13 +00:00
Takuya ASADA
dd9894a7b6 dist: cleanup build directory before creating rpms for AMI
To prevent AMI build failures, clean up the build directory first.
2016-01-11 14:21:02 +00:00
Takuya ASADA
8886fe7393 dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID
Use systemd-coredump for coredumps if the distribution is CentOS/RHEL/Fedora, and make a symlink from the RAID to /var/lib/systemd/coredump if the RAID is mounted.
2016-01-11 14:20:50 +00:00
Takuya ASADA
927957d3b9 dist: since AMI uses XFS rootfs, we don't need to warn extra disks not attached to the AMI instance
Even if extra disks are not supplied, it's still valid since we have an XFS rootfs now.
2016-01-11 14:19:35 +00:00
Takuya ASADA
47be3fd866 dist: split scylla_install script into two parts: scylla_install_pkg installs .rpm/.deb packages, scylla_setup sets up the environment after the package is installed
This enables setting up RAID/NTP/NIC after the .rpm/.deb package is installed.
2016-01-11 14:19:29 +00:00
Takuya ASADA
76f0191382 dist: remove scylla_local.json, merge it to scylla.json
We can share one packer config file for both build settings.
2016-01-11 14:18:55 +00:00
Takuya ASADA
8721e27978 dist: fetch CentOS dependencies from our yum repository by default
Only rebuild dependencies when passing -R option to build_rpm.sh
2016-01-11 14:18:49 +00:00
Takuya ASADA
b3c85aea89 dist: switch AMI base image from Fedora to CentOS
Move AMI to CentOS, use XFS for rootfs
2016-01-11 14:18:30 +00:00
Takuya ASADA
202389b2ec dist: don't need yum install and mv scylla-ami before scylla_install
This fixes 'amazon-ebs: mv: cannot stat ‘/home/fedora/scylla-ami’: No such file or directory' on build_ami_local.sh
2016-01-11 14:18:08 +00:00
Takuya ASADA
f3c32645d3 dist: add build time dependency to scylla-libstdc++-static for CentOS
This fixes link error on CentOS

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2016-01-11 14:17:52 +00:00
Takuya ASADA
780d9a26b2 configure.py: add --python option to specify python3 command path, for CentOS
Since the python3 path is /usr/bin/python3.4 on CentOS, we need to modify its path
2016-01-11 14:17:27 +00:00
Takuya ASADA
b0980ef0c4 dist: use scylla-boost instead of boost to fix compile error on CentOS
The boost package isn't usable on CentOS; use scylla-boost instead.
2016-01-11 14:17:02 +00:00
Lucas Meneghel Rodrigues
94c3c5c1e9 dist/ami: Print newline at the end of MOTD banner
The MOTD banner now printed upon .bash_profile execution,
if scylla is running, ends with a 'tput sgr0'. That command
appends an extra '[m' at the beginning of the output of any
following command. The automation scripts don't like this.

So let's add an 'echo' at the end of that path to add a newline,
avoiding the condition described above, and another one at the
'ScyllaDB is not started' path, for symmetry. I'm doing this
as it seems easier than having to develop heuristics to know
whether to remove or not that character.

CC: Shlomi Livne <slivne@scylladb.com>
Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1452216044-28374-1-git-send-email-lmr@scylladb.com>
2016-01-11 15:40:43 +02:00
Calle Wilund
244cd62edb commit log reader bugfix: Fix trying to read entries across chunk bounds
read_entry did not verify that the current chunk has enough data left
for a minimal entry. Thus we could try to read an entry from the slack
left in a chunk, and get lost in the file (pos > next, skip very much
-> eof), and also give false errors about corruption.
2016-01-11 13:07:26 +00:00
Vlad Zolotarov
0ed210e117 storage_proxy::query(): intercept exceptions coming from trace()
Exceptions originating from unimplemented to_string() methods
may interrupt the query() flow if not intercepted. Don't let that
happen.

Fixes issue #768

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-11 12:29:50 +01:00
Tomasz Grabiec
e62857da48 schema_tables: Wait for make_directory_for_column_family() to finish in merge_tables() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
71bbbceced schema_tables: Notify about table creation after it is fully inited
I'm not aware of any issues it could cause, but it makes more sense
that way.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
b6c6ee5360 tests: Add test for statement invalidation 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
036eec295f query_processor: Invalidate statements synchronously
We want the statements to be removed before we ack the schema change,
otherwise they will race with all future operations.

Since the subscriber will be invoked on each shard, there is no need
to broadcast to all shards; we can just handle the current shard.
2016-01-11 10:34:55 +01:00
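A minimal sketch of the shard-local, synchronous invalidation this describes; the class, container, and text-scanning predicate are hypothetical stand-ins, not the actual query_processor API:
```
#include <iterator>
#include <string>
#include <unordered_map>

// Hypothetical shard-local prepared-statement cache; each shard owns its
// own instance, so no cross-shard broadcast is needed.
class statement_cache {
    std::unordered_map<std::string, std::string> _prepared; // id -> CQL text
public:
    // Called synchronously from the schema-change subscriber, before the
    // schema change is acknowledged, so a stale statement cannot race
    // with operations that follow the change.
    void invalidate_table(const std::string& ks, const std::string& cf) {
        for (auto it = _prepared.begin(); it != _prepared.end();) {
            it = references(it->second, ks, cf) ? _prepared.erase(it)
                                                : std::next(it);
        }
    }
private:
    // Crude illustration only: real code would track which tables a
    // prepared statement touches instead of scanning its text.
    static bool references(const std::string& cql, const std::string& ks,
                           const std::string& cf) {
        return cql.find(ks + "." + cf) != std::string::npos;
    }
};
```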
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
facc549510 schema: Introduce equal_columns() 2016-01-11 10:34:55 +01:00
Tomasz Grabiec
0ea045b654 tests: Add notification test to schema_change_test 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
d80ffc580f schema_tables: Notify about table schema update 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
40858612e5 db: Make column_family::schema() return const& to avoid copy 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
8817e9613d migration_manager: Simplify notifications
Currently the notify_*() method family broadcasts to all shards, so
schema merging code invokes them only on shard 0, to avoid doubling
notifications. We can simplify this by making the notify_*() methods
per-instance and thus shard-local.
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5d38614f51 tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Tomasz Grabiec
5689a1b08b tests: Add test for column drop 2016-01-11 10:34:54 +01:00
Paweł Dziepak
21bbc65f3f tests/cql: add tests for ALTER TABLE
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
0276919819 cql3: complete translation of alter table statement
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
f24f677dde db/schema_tables: simplify column difference computation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
ae3acd0f9c system_tables: store schema::dropped_columns in system tables
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
b5bee9c36a schema_builder: force column id recomputation in build()
If the schema_builder is constructed from an existing schema we need to
make sure that the original column ids of regular and static columns are
*not* used since they may become invalid if columns are added or
removed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
da0f999123 schema_builder: add with_altered_column_type()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9807ddd158 schema_builder: add with_column_rename()
Columns that are part of the primary key can be renamed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
9bf13ed09b mutation_partition: drop cells from dropped_columns at upgrade
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
[tgrabiec: Merged the changes into converting_mutation_partition_applied]
2016-01-11 10:34:53 +01:00
Paweł Dziepak
3cbfa0e52f schema: add column_definition::_dropped_at
When a column is dropped its name and deletion timestamp are added
to schema::_raw._dropped_columns to prevent data resurrection in case a
column with the same name is added. To reduce the number of lookups in
_dropped_columns, this patch makes each instance of column_definition
cache this information (i.e. the timestamp of the latest removal of a
column with the same name).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
Paweł Dziepak
42dc4ce715 schema: keep track of dropped columns
Knowing which columns were dropped (and when) is important to prevent
data from the dropped columns from reappearing if a new column is added
with the same name.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:53 +01:00
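A minimal sketch of the resurrection guard this enables, assuming a per-schema map from column name to drop timestamp (names simplified, not the actual Scylla types):
```
#include <cstdint>
#include <map>
#include <string>

using timestamp_type = int64_t;

// Per-schema record of drops: column name -> timestamp of the drop.
std::map<std::string, timestamp_type> dropped_columns;

// When applying a cell for a (possibly re-added) column, ignore writes
// older than the latest drop of a column with the same name, so old
// data cannot resurrect under the new column.
bool cell_survives(const std::string& column_name, timestamp_type write_ts) {
    auto it = dropped_columns.find(column_name);
    return it == dropped_columns.end() || write_ts > it->second;
}
```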
Tomasz Grabiec
a81fa1727b tests: Add schema_change_test 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
d8ff9ee441 schema_tables: Make merge_tables() compare by mutations
Schema version is calculated from mutations, so merge_schema should
also look at mutation changes to detect schema changes whenever
version changes.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
5707c5e7ca schema_tables: Simplify merge_tables() and merge_keyspaces()
read_schema_for_keyspaces() drops empty results so the emptiness
checks are always false and we can remove some redundancy.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
bfefe5a546 schema_tables: Calculate digest from mutations
We want the node's schema version to change whenever
table_schema_version of any table changes. The latter is calculated by
hashing mutations so we should also use mutation hash when calculating
schema digest.
2016-01-11 10:34:53 +01:00
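Conceptually the digest calculation becomes the sketch below. The real code uses md5_hasher and the feed_hash() helpers introduced later in this series; a stand-in hasher is used here only to keep the sketch self-contained:
```
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in hasher with a simple mixing update(); the real code uses
// md5_hasher and feed_hash() over hashable mutations.
struct hasher {
    size_t state = 0;
    void update(const std::string& bytes) {
        state ^= std::hash<std::string>{}(bytes) + 0x9e3779b9
                 + (state << 6) + (state >> 2);
    }
};

// Node-level schema digest: feed every table's schema mutations into one
// hasher, so the digest changes exactly when some table's mutations (and
// hence its table_schema_version) change.
size_t digest_schema(const std::vector<std::string>& serialized_mutations) {
    hasher h;
    for (const auto& m : serialized_mutations) {
        h.update(m);
    }
    return h.state;
}
```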
Tomasz Grabiec
b91c92401f migration_manager: Implement migration_manager::announce_column_family_update 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
c6a52bed73 db: Fail when attempting to mutate using not synced schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
a2cdbff965 storage_proxy: Log failures of definitions update handler
Fixes #769.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
e1e8858ed1 service: Fetch and sync schema 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
cdca20775f messaging_service: Introduce get_source() 2016-01-11 10:34:53 +01:00
Tomasz Grabiec
f0d886893d db: Mark new schemas as synced 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
fb5658ede1 schema_registry: Track synced state of schema
We need to track which schema versions were synced on the current node
to avoid triggering the sync on every mutation. We need to sync before
mutating to be able to apply the incoming mutation using the current
node's schema, possibly applying irreversible transformations to it to
make it conform.
2016-01-11 10:34:52 +01:00
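A sketch of the synced-state tracking, reduced to its state machine; the real registry entry is per schema version and future-based, so the names and synchronous shape here are assumptions:
```
#include <functional>

// Per-version entry: remember whether this node already synced the
// version, so only the first mutation carrying an unknown schema
// version pays the synchronization cost.
class schema_version_state {
    bool _synced = false;
public:
    void maybe_sync(const std::function<void()>& do_sync) {
        if (!_synced) {
            do_sync();   // fetch/merge the schema before mutating
            _synced = true;
        }
    }
};
```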
Tomasz Grabiec
311e3733e0 service: migration_task: Implement using migration_manager::merge_schema_from()
To avoid duplication.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
dee0bbf3f3 migration_manager: Introduce merge_schema_from() 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
be2bdb779a tests: Introduce canonical_mutation_test 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a63971ee4c tests: memtable_test: Add test for concurrent reading and schema changes 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for a given column_family. Entries in
memtables and the cache can be at any of the previous schemas, but
they're always upgraded to the current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
da3a453003 service: Add GET_SCHEMA_VERSION remote call
The verb belongs to a separate client to avoid potential deadlocks
should connection-level throttling be introduced in the future. Another
reason is to reduce latency for version requests, since a blocked
version request can potentially block many other requests.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
a9c00cbc11 batchlog_manager: Use requested schema version 2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws a runtime
exception).
2016-01-11 10:34:52 +01:00
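A simplified sketch of the contract described above, with stand-in types: the point is that the same pinned schema_ptr covers the request, any schema pull the remote node may issue, and the parsing of the result:
```
#include <memory>
#include <string>

struct schema { std::string version; };
using schema_ptr = std::shared_ptr<const schema>;

struct read_command { std::string schema_version; /* ranges, slice, ... */ };
struct query_result { std::string payload; };

// The requestor pins a live schema_ptr for the whole exchange: the
// version travels in the command, holding `s` guarantees the local
// registry can answer if the remote node pulls the schema definition,
// and the same version is used to interpret the result.
query_result execute_read(const schema_ptr& s,
                          query_result (*send)(const read_command&)) {
    read_command cmd{s->version};
    query_result r = send(cmd);
    // ... parse r using *s, the same version used for the request ...
    return r;
}
```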
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtables and cache per entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to the
table's current schema on a given shard.

Mutating nodes need to keep a schema_ptr alive in case the schema
version is requested by the target node.
2016-01-11 10:34:51 +01:00
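A sketch of the lazy per-entry upgrade, with stand-in types; the real conversion is mutation::upgrade(), added further down this list:
```
#include <memory>
#include <string>

struct schema { std::string version; };
using schema_ptr = std::shared_ptr<const schema>;

// Entries remember the schema they were written with; a read under a
// newer schema upgrades the entry in place, so readers only ever see
// data conforming to the table's current schema on this shard.
struct mutation_entry {
    schema_ptr written_with;
    // ... cells ...

    void upgrade_if_needed(const schema_ptr& current) {
        if (written_with->version != current->version) {
            // convert cells to the new column layout, drop cells of
            // dropped columns, etc. (see "mutation: Implement upgrade()")
            written_with = current;
        }
    }
};
```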
Tomasz Grabiec
9eef4d1651 db: Learn schema versions when adding tables 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
175be4c2aa cql_query_test: Disable test_user_type 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f9ae1ed1c6 frozen_mutation: Add schema_version field 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8c6480fc46 Introduce global_schema_ptr 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f25487bc1e Introduce schema_registry 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
533aec84b3 schema: Enable shared_from_this() 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
8a05b61d68 memtable: Read under _read_section 2016-01-11 10:34:51 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read, and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
0a9436fc1a schema: Introduce frozen_schema
For passing schema across shards/nodes. Also, for keeping in
schema_registry when there's no live schema_ptr.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
060f93477b Make schema_mutations serializable
We must use the canonical_mutation form to allow for changes in the
schema of schema tables. The node which deserializes schema mutations
may not have the same version of the schema tables, so we cannot use
frozen_mutation, which is a schema-dependent form.
2016-01-11 10:34:50 +01:00
Tomasz Grabiec
e84f3717b5 Introduce canonical_mutation
frozen_schema will transfer schema definition across nodes with schema
mutations. Because different nodes may have different versions of
schema tables, we cannot use frozen_mutations to transfer these
because frozen_mutation can only be read using the same version of the
schema it was frozen with. To solve this problem, a new form of
mutation is introduced, called canonical_mutation, which can be read
using any version of the schema.
2016-01-11 10:34:50 +01:00
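The difference can be sketched with stand-in types: the canonical form keys cells by a stable identifier rather than by positional column id, which is what ties frozen_mutation to one exact schema version (both structs here are illustrative, not the real representations):
```
#include <map>
#include <string>

// frozen_mutation: cells keyed by positional column id, meaningful only
// together with the exact schema version used when freezing.
struct frozen_mutation_sketch {
    std::map<int, std::string> cells_by_column_id;
};

// canonical_mutation: cells keyed by a stable identifier (e.g. the
// column name), so any schema version of the table can interpret it.
struct canonical_mutation_sketch {
    std::map<std::string, std::string> cells_by_column_name;
};
```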
Tomasz Grabiec
3e447e4ad1 tests: mutation_test: Add tests for equality and hashing 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
48f1db5ffa mutation_assertions: Add is_not_equal_to() 2016-01-11 10:34:50 +01:00
Tomasz Grabiec
88a6a17f72 tests: Use mutation generators in frozen_mutation_test 2016-01-11 10:34:50 +01:00
Avi Kivity
6185744312 dist: redhat: drop 'sudo' in scylla_run
Systemd will change the user for us, and the extra process created by
'sudo' confuses sd_notify().
2016-01-10 18:46:43 +02:00
Avi Kivity
dd271b77b0 build: add support for optional pkg-config managed packages 2016-01-10 18:24:12 +02:00
Vlad Zolotarov
19e275be1f tests: gossip_test: initialize a broadcast address and a snitch
This patch fixes a regression introduced by
a commit ca935bf "tests: Fix gossip_test".

database service initializes a replication_strategy
object and a replication_strategy requires a snitch
service to be initialized.

A snitch service requires a broadcast address to be
set.

If any of the above is not initialized we are going
to hit the corresponding assert().

Set a snitch to a SimpleSnitch and a broadcast
address to 127.0.0.1.

Fixes issue #770

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1452421748-9605-1-git-send-email-vladz@cloudius-systems.com>
2016-01-10 13:13:37 +02:00
Tomasz Grabiec
d7b403db1f log: Change default level from warn to info
Logging at 'warn' level leaves us with too silent logs, not as helpful
as they could be in case of failure.

Message-Id: <1452283669-11675-1-git-send-email-tgrabiec@scylladb.com>
2016-01-09 09:24:22 +02:00
Tomasz Grabiec
9b2cc557c5 mutation_source_test: Add mutation generators
The goal is to provide various test cases with a way of iterating over
many combinations of mutations. It's good to have this in one place to
avoid duplication and increase coverage.
2016-01-08 21:10:27 +01:00
Tomasz Grabiec
4b92ef01fc test: Add tests for mutation upgrade 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f59ec59abc mutation: Implement upgrade()
Converts mutation to a new schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
0edfe138f8 mutation_partition_view: Make visitable also with column_mapping 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
2cfdfe261d Introduce converting_mutation_partition_applier 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
b17cbc23ab schema: Introduce column_mapping
Encapsulates information needed to convert mutation representations
between schema versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
9a3db10b85 db/serializer: Implement skip() for bytes and sstring 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13974234a4 db/serializer: Spread serializers to relax header dependencies 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d13c6d7008 types: Introduce is_atomic()
Matches column_definition::is_atomic()
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
f3556ebfc2 schema: Introduce column_count_type
Right now in some places we use column_id, and in some places size_t.
Solve this by using column_count_type, whose meaning is "an integer
sufficiently large for indexing columns". Note that we cannot use
column_id because it carries more meaning than that.
2016-01-08 21:10:26 +01:00
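A plausible shape for the alias, as a sketch (the exact underlying type is an assumption):
```
#include <cstdint>

// An integer sufficiently large for indexing columns; a distinct alias,
// because column_id carries more meaning than a plain count.
using column_count_type = uint32_t;
```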
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version needs to change not only on structural changes but also on
temporal ones. This is needed for nodes to detect whether or not the
version they see was already synchronized with, even if it has the same
structure as past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.

For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
13295563e0 schema_builder: Move compact_storage setting outside build()
Properties of the schema are set using methods of schema_builder and
different variants of build() are for different forms of the final
schema object.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies the add_table_to_schema_mutation() interface.

The current code is also a bit confusing: partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works, since the types are the same, but it
looks a bit scary.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
22254e94cc query::result_set: Add constructor from mutation 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a861b74b7e Introduce schema_mutations 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
a6084ee007 mutation: Make hashable
The computed hash is independent of any internal representation and
thus can be used as a digest across nodes and versions.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
c009fe5991 keys: Add missing clustering_key_prefix_view::get_compound_type() 2016-01-08 21:10:26 +01:00
Tomasz Grabiec
ade5cf1b4b mutation_partition: Make visitable with mutation_partition_visitor 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
bc9ee083dd db: Move atomic_cell_or_collection to separate header
To break future cyclic dependency:

  atomic_cell.hh -> schema.hh (new) -> types.hh -> atomic_cell.hh
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
6f955e1290 mutation_partition: Make equal() work with different schemas 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
75caba5b8a schema: Guarantee that column id order matches name order
For static and regular (row) columns it is very convenient in some
cases to rely on the fact that columns ordered by id are also ordered
by name. This currently holds, so make schema export this guarantee and
enable consumers to rely on it.

The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and to be easily
discoverable in case we ever have to relax it.
2016-01-08 21:10:25 +01:00
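Consumers can then make the dependency explicit along these lines, assuming the field is a constexpr boolean (a sketch, not the verified declaration):
```
// Stand-in for the real schema class, which exposes the guarantee as a
// static field.
struct schema {
    static constexpr bool row_column_ids_are_ordered_by_name = true;
};

// Fails to compile the moment the guarantee is relaxed, making every
// dependent call site easy to find.
static_assert(schema::row_column_ids_are_ordered_by_name,
              "this code assumes column ids ordered by name");
```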
Tomasz Grabiec
14d0482efa Introduce md5_hasher 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
ff3a2e1239 mutation_partition: Drop row tombstones in do_compact() 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
eb9b383531 service: migration_manager: Fix announce order to match C*
Current logic differs from C*: we first push to other nodes and then
initiate the sync locally, while C* does the opposite.
2016-01-08 21:10:25 +01:00
Tomasz Grabiec
0768deba74 query_processor: Add trace-level logging of processed statements 2016-01-08 21:10:25 +01:00
Tomasz Grabiec
dae531554a create_index_statement: Use textual column name in all messages
As pointed out by Pawel, we can rely on operator<<()
Message-Id: <1452243656-3376-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 11:06:09 +02:00
Tomasz Grabiec
5d6d039297 create_index_statement: Use textual representation of column name
Before:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column 736368656d615f76657273696f6e"

After:

  InvalidRequest: code=2200 [Invalid query] message="No column definition found for column schema_version"
Message-Id: <1452243156-2923-1-git-send-email-tgrabiec@scylladb.com>
2016-01-08 10:53:37 +02:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Avi Kivity
3377739fa3 main: wait for API http server to start
Wait for the future returned by the http server start process to resolve,
so we know it is started.  If it doesn't, we'll hit the or_terminate()
further down the line and exit with an error code.
Message-Id: <1452092806-11508-3-git-send-email-avi@scylladb.com>
2016-01-07 16:44:07 +02:00
Avi Kivity
fbe3283816 snitch: intentionally leak snitch singleton
Because our shutdown process is crippled (refs #293), we won't shutdown the
snitch correctly, and the sharded<> instance can assert during shutdown.
This interferes with the next patch, which adds orderly shutdown if the http
server fails to start.

Leak it intentionally to work around the problem.
Message-Id: <1452092806-11508-2-git-send-email-avi@scylladb.com>
2016-01-07 16:43:37 +02:00
Pekka Enberg
973c62a486 gms/gossiper: Fix compilation error
Commit 02b04e5 ("gossip: Add is_safe_for_bootstrap") needs one extra
curly bracket to compile.
Message-Id: <1452177529-13555-1-git-send-email-penberg@scylladb.com>
2016-01-07 16:42:55 +02:00
Vlad Zolotarov
07f8549683 database: filter out manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add the possibility to pass a filter functor that receives the full
path of a directory entry and returns a boolean value: TRUE if the
entry should be enumerated and FALSE if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
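A self-contained sketch of the filtering hook (types simplified; the real lister works asynchronously on directory entries), followed by a usage matching the manifest.json commit above:
```
#include <functional>
#include <string>
#include <vector>

// The filter receives the full path of a directory entry and returns
// true to enumerate it, false to skip it.
using entry_filter = std::function<bool(const std::string&)>;

std::vector<std::string> list_dir(const std::vector<std::string>& entries,
                                  const entry_filter& keep) {
    std::vector<std::string> out;
    for (const auto& e : entries) {
        if (keep(e)) {
            out.push_back(e);
        }
    }
    return out;
}

// Usage matching the commit above: skip manifest.json files during boot
// and 'nodetool refresh'.
// auto sstables = list_dir(entries, [] (const std::string& p) {
//     return p.find("manifest.json") == std::string::npos;
// });
```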
Asias He
02b04e5907 gossip: Add is_safe_for_bootstrap
Make the following tests pass:

bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test
bootstrap_test.py:TestBootstrap.killed_wiped_node_cannot_join_test

    1) start node2
    2) wait for cql connection with node2 is ready
    3) stop node2
    4) delete data and commitlog directory for node2
    5) start node2

In step 5), node2 will do the bootstrap process since its data,
including the system tables, is wiped. It will consider itself a
completely new node and can possibly stream from the wrong node and
violate consistency.

To fix this, we reject the boot if we find that the node was in
SHUTDOWN or STATUS_NORMAL.

CASSANDRA-9765
Message-Id: <47bc23f4ce1487a60c5b4fbe5bfe9514337480a8.1452158975.git.asias@scylladb.com>
2016-01-07 15:55:01 +02:00
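Reduced to its core, the check looks roughly like this; the status strings are placeholders for the real gossip states, and the real check runs against the endpoint state learned in the shadow round:
```
#include <string>

// A booting node whose peers last saw it as NORMAL or in shutdown must
// not bootstrap as if it were brand new, or it may stream from the
// wrong nodes and violate consistency.
bool is_safe_for_bootstrap(const std::string& status_seen_by_peers) {
    return status_seen_by_peers != "NORMAL"
        && status_seen_by_peers != "shutdown";
}
```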
Asias He
933614bdf9 main: Change API server starting message
It comes from the Seastar HTTP server and is inaccurate.

Message-Id: <6a634437d2bd4368400010e25969e215894c2df9.1452162686.git.asias@scylladb.com>
2016-01-07 15:53:28 +02:00
Asias He
6439f4d808 storage_service: Fix load_broadcaster in get_load_map
If get_load_map is called from the API while load_broadcaster is not
set yet, we dereference a nullptr.

Fixes #763.
Message-Id: <6f8d554f4976aea85d5cec5a76a3848234138b0a.1452152148.git.asias@scylladb.com>
2016-01-07 10:36:36 +02:00
Asias He
2345cda42f messaging_service: Rename shard_id to msg_addr
Using shard_id as the destination of the messaging_service is
confusing, since shard_id is used in the context of a CPU id.
Message-Id: <8c9ef193dc000ef06f8879e6a01df65cf24635d8.1452155241.git.asias@scylladb.com>
2016-01-07 10:36:35 +02:00
Asias He
8c909122a6 gossip: Add wait_for_gossip_to_settle
Implement the wait for gossip to settle logic in the bootup process.

CASSANDRA-4288

Fixes:
bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test

1) start node2
2) wait for cql connection with node2 is ready
3) stop node2
4) delete data and commitlog directory for node2
5) start node2

In step 5, I sometimes saw that in node2's shadow round it gets
node2's status as BOOT from other nodes in the cluster instead of
NORMAL. The problem is that we do not wait for gossip to settle before
we start the CQL server; as a result, when we stop node2 in step 3),
other nodes in the cluster have not received node2's status update to
NORMAL.
2016-01-07 10:09:25 +02:00
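A simplified, synchronous sketch of the settle condition; the real implementation is Seastar-future-based, and the round count and polling interval here are assumptions:
```
#include <chrono>
#include <functional>
#include <thread>

// Poll until the gossip view stops changing for a few consecutive
// rounds before starting the CQL server. This shows the settle
// condition only.
void wait_for_gossip_to_settle(const std::function<int()>& live_endpoints) {
    constexpr int required_stable_rounds = 3;
    constexpr auto poll_interval = std::chrono::seconds(1);
    int stable_rounds = 0;
    int last_count = -1;
    while (stable_rounds < required_stable_rounds) {
        int count = live_endpoints();
        stable_rounds = (count == last_count) ? stable_rounds + 1 : 0;
        last_count = count;
        std::this_thread::sleep_for(poll_interval);
    }
}
```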
Benoît Canet
8f725256e1 config: Mark ssl_storage_port as Used
Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1452082041-6117-1-git-send-email-benoit@scylladb.com>
2016-01-06 17:34:53 +02:00
299 changed files with 12080 additions and 5368 deletions

3
.gitignore vendored

@@ -5,3 +5,6 @@ build
build.ninja
cscope.*
/debian/
dist/ami/files/*.rpm
dist/ami/variables.json
dist/ami/scylla_deploy.sh

103
IDL.md Normal file

@@ -0,0 +1,103 @@
# IDL definition
The schema we use is similar to C++ syntax.
Use a class or struct similar to the object you need the serializer for.
Use namespaces when applicable.
## Keywords
* class/struct - a class or a struct, as in C++; a class/struct can have a final or stub marker
* namespace - has the same meaning as in C++
* enum class - has the same meaning as in C++
* final modifier for class - when a class is marked as final it will not contain a size parameter. Note that a final class cannot be extended by a future version, so use with care
* stub class - when a class is marked as stub, no code will be generated for it and it is only there as documentation
* version attributes - marked with [[version id]]; marks that a field is available from a specific version
* template - a template class definition, as in C++
## Syntax
### Namespace
```
namespace ns_name { namespace-body }
```
* ns_name: either a previously unused identifier, in which case this is an original-namespace-definition, or the name of a namespace, in which case this is an extension-namespace-definition
* namespace-body: a possibly empty sequence of declarations of any kind (including class and struct definitions as well as nested namespaces)
### class/struct
`
class-key class-name final(optional) stub(optional) { member-specification } ;(optional)
`
* class-key: one of class or struct.
* class-name: the name of the class that's being defined, optionally followed by the keyword final, optionally followed by the keyword stub
* final: when a class is marked as final, it means it cannot be extended and there is no need to serialize its size; use with care.
* stub: when a class is marked as stub, it means no code will be generated for it and it is added for documentation only.
* member-specification: a list of access specifiers and public member accessors; see class member below.
* to be compatible with C++, a class definition can be followed by a semicolon.
### enum
`enum-key identifier enum-base { enumerator-list(optional) }`
* enum-key: only enum class is supported
* identifier: the name of the enumeration that's being declared.
* enum-base: a colon (:), followed by a type-specifier-seq that names an integral type (see the C++ standard for the full list of all possible integral types).
* enumerator-list: a comma-separated list of enumerator definitions, each of which is either simply an identifier, which becomes the name of the enumerator, or an identifier with an initializer: identifier = integral value.
Note that though C++ allows a constexpr as an initializer value, it makes the documentation less readable, hence it is not permitted.
### class member
`type member-access attributes(optional) default-value(optional);`
* type: any valid C++ type, following the C++ notation. Note that there should be a serializer for the type, but declaration order is not mandatory
* member-access: the way the member can be accessed. If the member is public, it can be the name itself; if not, it can be a getter function, which should be followed by parentheses. Note that getters can (and probably should) be const methods.
* attributes: attributes are defined by square brackets. Currently they are used to mark the version in which a specific member was added: [ [ version version-number ] ] marks that the specific member was added in the given version number.
### template
`template < parameter-list > class-declaration`
* parameter-list - a non-empty comma-separated list of the template parameters.
* class-declaration - (see the class section) the class name declared becomes a template name.
## IDL example
Comments starting with forward slashes are ignored until the end of the line.
```
namespace utils {
// An example of a stub class
class UUID stub {
int64_t most_sig_bits;
int64_t least_sig_bits;
}
}
namespace gms {
//an enum example
enum class application_state:int {STATUS = 0,
LOAD,
SCHEMA,
DC};
// example of final class
class versioned_value final {
// getter and setter as public member
int version;
sstring value;
}
class heart_beat_state {
//getter as function
int32_t get_generation();
//default value example
int32_t get_heart_beat_version() = 1;
}
class endpoint_state {
heart_beat_state get_heart_beat_state();
std::map<application_state, versioned_value> get_application_state_map();
}
class gossip_digest {
inet_address get_endpoint();
int32_t get_generation();
//mark that a field was added on a specific version
int32_t get_max_version() [ [version 0.14.2] ];
}
class gossip_digest_ack {
std::vector<gossip_digest> digests();
std::map<inet_address, gms::endpoint_state> get_endpoint_state_map();
}
}
```


@@ -15,7 +15,7 @@ git submodule update --recursive
* Installing required packages:
```
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
```
* Build Scylla


@@ -1,6 +1,6 @@
#!/bin/sh
VERSION=0.15
VERSION=0.17
if test -f version
then


@@ -106,7 +106,7 @@
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"string"
"paramType":"query"
}
]
}


@@ -196,6 +196,10 @@
"value": {
"type": "string",
"description": "The version value"
},
"version": {
"type": "int",
"description": "The application state version"
}
}
}


@@ -234,12 +234,12 @@
"type":"string",
"enum":[
"CLIENT_ID",
"ECHO",
"MUTATION",
"MUTATION_DONE",
"READ_DATA",
"READ_MUTATION_DATA",
"READ_DIGEST",
"GOSSIP_ECHO",
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
@@ -247,13 +247,13 @@
"TRUNCATE",
"REPLICATION_FINISHED",
"MIGRATION_REQUEST",
"STREAM_INIT_MESSAGE",
"PREPARE_MESSAGE",
"PREPARE_DONE_MESSAGE",
"STREAM_MUTATION",
"STREAM_MUTATION_DONE",
"COMPLETE_MESSAGE",
"LAST"
"REPAIR_CHECKSUM_RANGE",
"GET_SCHEMA_VERSION"
]
}
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -52,67 +52,98 @@ static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {
return std::make_unique<reply>();
}
future<> set_server(http_context& ctx) {
future<> set_server_init(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
r.register_exeption_handler(exception_reply);
httpd::directory_handler* dir = new httpd::directory_handler(ctx.api_dir,
new content_replace("html"));
r.put(GET, "/ui", new httpd::file_handler(ctx.api_dir + "/index.html",
new content_replace("html")));
r.add(GET, url("/ui").remainder("path"), dir);
rb->set_api_doc(r);
rb->register_function(r, "storage_service",
"The storage service API");
set_storage_service(ctx,r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "gossiper",
"The gossiper API");
set_gossiper(ctx,r);
rb->register_function(r, "column_family",
"The column family API");
set_column_family(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "messaging_service",
"The messaging service API");
set_messaging_service(ctx, r);
rb->register_function(r, "storage_proxy",
"The storage proxy API");
set_storage_proxy(ctx, r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "stream_manager",
"The stream manager API");
set_stream_manager(ctx, r);
r.add(GET, url("/ui").remainder("path"), new httpd::directory_handler(ctx.api_dir,
new content_replace("html")));
rb->register_function(r, "system",
"The system related API");
set_system(ctx, r);
rb->set_api_doc(r);
});
}
static future<> register_api(http_context& ctx, const sstring& api_name,
const sstring api_desc,
std::function<void(http_context& ctx, routes& r)> f) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, api_name, api_desc, f](routes& r) {
rb->register_function(r, api_name, api_desc);
f(ctx,r);
});
}
future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
}
future<> set_server_load_sstable(http_context& ctx) {
return register_api(ctx, "column_family",
"The column family API", set_column_family);
}
future<> set_server_messaging_service(http_context& ctx) {
return register_api(ctx, "messaging_service",
"The messaging service API", set_messaging_service);
}
future<> set_server_storage_proxy(http_context& ctx) {
return register_api(ctx, "storage_proxy",
"The storage proxy API", set_storage_proxy);
}
future<> set_server_stream_manager(http_context& ctx) {
return register_api(ctx, "stream_manager",
"The stream manager API", set_stream_manager);
}
future<> set_server_gossip_settle(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "failure_detector",
"The failure detector API");
set_failure_detector(ctx,r);
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}
future<> set_server_done(http_context& ctx) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx](routes& r) {
rb->register_function(r, "compaction_manager",
"The Compaction manager API");
set_compaction_manager(ctx, r);
rb->register_function(r, "lsa", "Log-structured allocator API");
set_lsa(ctx, r);
rb->register_function(r, "commitlog",
"The commit log API");
set_commitlog(ctx,r);
rb->register_function(r, "hinted_handoff",
"The hinted handoff API");
set_hinted_handoff(ctx, r);
rb->register_function(r, "collectd",
"The collectd API");
set_collectd(ctx, r);
});
}


@@ -1,5 +1,5 @@
/*
* Copyright 2015 Cloudius Systems
* Copyright 2015 ScyllaDB
*/
/*
@@ -21,31 +21,17 @@
#pragma once
#include "http/httpd.hh"
#include "json/json_elements.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include "api_init.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db, distributed<service::storage_proxy>&
_sp) : db(_db), sp(_sp) {}
};
future<> set_server(http_context& ctx);
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;

51
api/api_init.hh Normal file

@@ -0,0 +1,51 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
namespace api {
struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
distributed<database>& db;
distributed<service::storage_proxy>& sp;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp)
: db(_db), sp(_sp) {
}
};
future<> set_server_init(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx);
future<> set_server_storage_proxy(http_context& ctx);
future<> set_server_stream_manager(http_context& ctx);
future<> set_server_gossip_settle(http_context& ctx);
future<> set_server_done(http_context& ctx);
}


@@ -49,7 +49,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {
s.ks = c->ks;
s.cf = c->cf;
s.unit = "keys";
s.task_type = "compaction";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
s.total = c->total_partitions;
summaries.push_back(std::move(s));
@@ -67,11 +67,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
cm::stop_compaction.set(r, [] (std::unique_ptr<request> req) {
//TBD
// FIXME
warn(unimplemented::cause::API);
return make_ready_future<json::json_return_type>("");
cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<request> req) {
auto type = req->get_query_param("type");
return ctx.db.invoke_on_all([type] (database& db) {
auto& cm = db.get_compaction_manager();
cm.stop_compaction(type);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<request> req) {


@@ -44,6 +44,7 @@ void set_failure_detector(http_context& ctx, routes& r) {
// method that the state index are static but the name can be changed.
version_val.application_state = static_cast<std::underlying_type<gms::application_state>::type>(a.first);
version_val.value = a.second.value;
version_val.version = a.second.version;
val.application_state.push(version_val);
}
res.push_back(val);


@@ -32,9 +32,9 @@ using namespace net;
namespace api {
using shard_info = messaging_service::shard_info;
using shard_id = messaging_service::shard_id;
using msg_addr = messaging_service::msg_addr;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST) + 1;
static const int32_t num_verb = static_cast<int32_t>(messaging_verb::LAST);
std::vector<message_counter> map_to_message_counters(
const std::unordered_map<gms::inet_address, unsigned long>& map) {
@@ -58,7 +58,7 @@ future_json_function get_client_getter(std::function<uint64_t(const shard_info&)
using map_type = std::unordered_map<gms::inet_address, uint64_t>;
auto get_shard_map = [f](messaging_service& ms) {
std::unordered_map<gms::inet_address, unsigned long> map;
ms.foreach_client([&map, f] (const shard_id& id, const shard_info& info) {
ms.foreach_client([&map, f] (const msg_addr& id, const shard_info& info) {
map[id.addr] = f(info);
});
return map;
@@ -124,7 +124,7 @@ void set_messaging_service(http_context& ctx, routes& r) {
});
get_dropped_messages_by_ver.set(r, [](std::unique_ptr<request> req) {
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb, 0);
shared_ptr<std::vector<uint64_t>> map = make_shared<std::vector<uint64_t>>(num_verb);
return net::get_messaging_service().map_reduce([map](const uint64_t* local_map) mutable {
for (auto i = 0; i < num_verb; i++) {
@@ -137,8 +137,12 @@ void set_messaging_service(http_context& ctx, routes& r) {
for (auto i : verb_counter::verb_wrapper::all_items()) {
verb_counter c;
messaging_verb v = i; // for type safety we use messaging_verb values
if ((*map)[static_cast<int32_t>(v)] > 0) {
c.count = (*map)[static_cast<int32_t>(v)];
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];
c.verb = i;
res.push_back(c);
}


@@ -30,6 +30,7 @@
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
namespace api {
@@ -271,15 +272,21 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::force_keyspace_cleanup.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
// FIXME
// the nodetool clean up is used in many tests
// this workaround will let it work until
// a cleanup is implemented
warn(unimplemented::cause::API);
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
return make_ready_future<json::json_return_type>(0);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return ctx.db.invoke_on_all([keyspace, column_families] (database& db) {
std::vector<column_family*> column_families_vec;
auto& cm = db.get_compaction_manager();
for (auto entry : column_families) {
column_family* cf = &db.find_column_family(keyspace, entry);
cm.submit_cleanup_job(cf);
}
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -398,9 +405,13 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_logging_levels.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<ss::mapper> res;
for (auto i : logging::logger_registry().get_all_logger_names()) {
ss::mapper log;
log.key = i;
log.value = logging::level_name(logging::logger_registry().get_logger_level(i));
res.push_back(log);
}
return make_ready_future<json::json_return_type>(res);
});


@@ -47,7 +47,7 @@ static hs::progress_info get_progress_info(const streaming::progress_info& info)
res.direction = info.dir;
res.file_name = info.file_name;
res.peer = boost::lexical_cast<std::string>(info.peer);
res.session_index = info.session_index;
res.session_index = 0;
res.total_bytes = info.total_bytes;
return res;
}
@@ -70,7 +70,7 @@ static hs::stream_state get_state(
for (auto info : result_future.get_coordinator().get()->get_all_session_info()) {
hs::stream_info si;
si.peer = boost::lexical_cast<std::string>(info.peer);
si.session_index = info.session_index;
si.session_index = 0;
si.state = info.state;
si.connecting = si.peer;
set_summaries(info.receiving_summaries, si.receiving_summaries);
@@ -109,14 +109,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_receiving_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_received();
}
}
}
}
return res;
@@ -126,12 +128,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_incoming_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_receiving_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_received();
}
}
}
@@ -142,14 +144,16 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
gms::inet_address ep(req->param["peer"]);
utils::UUID plan_id = gms::get_local_gossiper().get_host_id(ep);
return streaming::get_stream_manager().map_reduce0([plan_id](streaming::stream_manager& stream) {
gms::inet_address peer(req->param["peer"]);
return streaming::get_stream_manager().map_reduce0([peer](streaming::stream_manager& sm) {
int64_t res = 0;
streaming::stream_result_future* s = stream.get_sending_stream(plan_id).get();
if (s != nullptr) {
for (auto si: s->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
if (session->peer == peer) {
res += session->get_bytes_sent();
}
}
}
}
return res;
@@ -159,12 +163,12 @@ void set_stream_manager(http_context& ctx, routes& r) {
});
hs::get_all_total_outgoing_bytes.set(r, [](std::unique_ptr<request> req) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& stream) {
return streaming::get_stream_manager().map_reduce0([](streaming::stream_manager& sm) {
int64_t res = 0;
for (auto s : stream.get_initiated_streams()) {
if (s.second.get() != nullptr) {
for (auto si: s.second.get()->get_coordinator()->get_all_session_info()) {
res += si.get_total_size_received();
for (auto sr : sm.get_all_streams()) {
if (sr) {
for (auto session : sr->get_coordinator()->get_all_stream_sessions()) {
res += session->get_bytes_sent();
}
}
}


@@ -272,45 +272,6 @@ template<typename T>
class serializer;
}
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};
class column_definition;
int compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right);

57
atomic_cell_hash.hh Normal file

@@ -0,0 +1,57 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "atomic_cell.hh"
#include "hashing.hh"
template<typename Hasher>
void feed_hash(collection_mutation_view cell, Hasher& h, const data_type& type) {
auto&& ctype = static_pointer_cast<const collection_type_impl>(type);
auto m_view = ctype->deserialize_mutation_form(cell);
::feed_hash(h, m_view.tomb);
for (auto&& key_and_value : m_view.cells) {
::feed_hash(h, key_and_value.first);
::feed_hash(h, key_and_value.second);
}
}
template<>
struct appending_hash<atomic_cell_view> {
template<typename Hasher>
void operator()(Hasher& h, atomic_cell_view cell) const {
feed_hash(h, cell.is_live());
feed_hash(h, cell.timestamp());
if (cell.is_live()) {
if (cell.is_live_and_has_ttl()) {
feed_hash(h, cell.expiry());
feed_hash(h, cell.ttl());
}
feed_hash(h, cell.value());
} else {
feed_hash(h, cell.deletion_time());
}
}
};


@@ -0,0 +1,73 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "atomic_cell.hh"
#include "schema.hh"
#include "hashing.hh"
// A variant type that can hold either an atomic_cell, or a serialized collection.
// Which type is stored is determined by the schema.
class atomic_cell_or_collection final {
managed_bytes _data;
template<typename T>
friend class db::serializer;
private:
atomic_cell_or_collection(managed_bytes&& data) : _data(std::move(data)) {}
public:
atomic_cell_or_collection() = default;
atomic_cell_or_collection(atomic_cell ac) : _data(std::move(ac._data)) {}
static atomic_cell_or_collection from_atomic_cell(atomic_cell data) { return { std::move(data._data) }; }
atomic_cell_view as_atomic_cell() const { return atomic_cell_view::from_bytes(_data); }
atomic_cell_or_collection(collection_mutation cm) : _data(std::move(cm.data)) {}
explicit operator bool() const {
return !_data.empty();
}
static atomic_cell_or_collection from_collection_mutation(collection_mutation data) {
return std::move(data.data);
}
collection_mutation_view as_collection_mutation() const {
return collection_mutation_view{_data};
}
bytes_view serialize() const {
return _data;
}
bool operator==(const atomic_cell_or_collection& other) const {
return _data == other._data;
}
template<typename Hasher>
void feed_hash(Hasher& h, const column_definition& def) const {
if (def.is_atomic()) {
::feed_hash(h, as_atomic_cell());
} else {
::feed_hash(as_collection_mutation(), h, def.type);
}
}
void linearize() {
_data.linearize();
}
void unlinearize() {
_data.scatter();
}
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
};

292
auth/auth.cc Normal file

@@ -0,0 +1,292 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <seastar/core/sleep.hh>
#include "auth.hh"
#include "authenticator.hh"
#include "database.hh"
#include "cql3/query_processor.hh"
#include "cql3/statements/cf_statement.hh"
#include "cql3/statements/create_table_statement.hh"
#include "db/config.hh"
#include "service/migration_manager.hh"
const sstring auth::auth::DEFAULT_SUPERUSER_NAME("cassandra");
const sstring auth::auth::AUTH_KS("system_auth");
const sstring auth::auth::USERS_CF("users");
static const sstring USER_NAME("name");
static const sstring SUPER("super");
static logging::logger logger("auth");
// TODO: configurable
using namespace std::chrono_literals;
const std::chrono::milliseconds auth::auth::SUPERUSER_SETUP_DELAY = 10000ms;
class auth_migration_listener : public service::migration_listener {
void on_create_keyspace(const sstring& ks_name) override {}
void on_create_column_family(const sstring& ks_name, const sstring& cf_name) override {}
void on_create_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_create_function(const sstring& ks_name, const sstring& function_name) override {}
void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_update_keyspace(const sstring& ks_name) override {}
void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool) override {}
void on_update_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_update_function(const sstring& ks_name, const sstring& function_name) override {}
void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
void on_drop_keyspace(const sstring& ks_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.keyspace(ksName));
}
void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {
// TODO:
//DatabaseDescriptor.getAuthorizer().revokeAll(DataResource.columnFamily(ksName, cfName));
}
void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}
void on_drop_function(const sstring& ks_name, const sstring& function_name) override {}
void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {}
};
static auth_migration_listener auth_migration;
/**
* Poor man's job schedule. For a maximum of 2 jobs. Sic.
* Still does nothing more clever than waiting 10 seconds
* like origin, then runs the submitted tasks.
*
* Only difference compared to sleep (from which this
* borrows _heavily_) is that if tasks have not run by the time
* we exit (and do static clean up) we delete the promise + cont
*
* Should be abstracted to some sort of global server function
* probably.
*/
void auth::auth::schedule_when_up(scheduled_func f) {
struct waiter {
promise<> done;
timer<> tmr;
waiter() : tmr([this] {done.set_value();})
{
tmr.arm(SUPERUSER_SETUP_DELAY);
}
~waiter() {
if (tmr.armed()) {
tmr.cancel();
done.set_exception(std::runtime_error("shutting down"));
}
logger.trace("Deleting scheduled task");
}
void kill() {
}
};
typedef std::unique_ptr<waiter> waiter_ptr;
static thread_local std::vector<waiter_ptr> waiters;
logger.trace("Adding scheduled task");
waiters.emplace_back(std::make_unique<waiter>());
auto* w = waiters.back().get();
w->done.get_future().finally([w] {
auto i = std::find_if(waiters.begin(), waiters.end(), [w](const waiter_ptr& p) {
return p.get() == w;
});
if (i != waiters.end()) {
waiters.erase(i);
}
}).then([f = std::move(f)] {
logger.trace("Running scheduled task");
return f();
}).handle_exception([](auto ep) {
return make_ready_future();
});
}
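// A hedged usage sketch (hypothetical call site, not part of this change):
// the submitted task runs once, roughly SUPERUSER_SETUP_DELAY after startup,
// or is dropped with an exception on early shutdown.
#if 0
auth::auth::schedule_when_up([] {
    logger.trace("system is up, running deferred setup");
    return make_ready_future();
});
#endif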
bool auth::auth::is_class_type(const sstring& type, const sstring& classname) {
if (type == classname) {
return true;
}
auto i = classname.find_last_of('.');
return classname.compare(i + 1, sstring::npos, type) == 0;
}
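// I.e. a configured short name matches the trailing component of the fully
// qualified Origin class name. Illustrative (hypothetical) inputs:
#if 0
assert(auth::auth::is_class_type("PasswordAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
assert(auth::auth::is_class_type("org.apache.cassandra.auth.PasswordAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
assert(!auth::auth::is_class_type("AllowAllAuthenticator",
        "org.apache.cassandra.auth.PasswordAuthenticator"));
#endif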
future<> auth::auth::setup() {
auto& db = cql3::get_local_query_processor().db().local();
auto& cfg = db.get_config();
auto type = cfg.authenticator();
if (is_class_type(type, authenticator::ALLOW_ALL_AUTHENTICATOR_NAME)) {
return authenticator::setup(type).discard_result(); // just create the object
}
future<> f = make_ready_future();
if (!db.has_keyspace(AUTH_KS)) {
std::map<sstring, sstring> opts;
opts["replication_factor"] = "1";
auto ksm = keyspace_metadata::new_keyspace(AUTH_KS, "org.apache.cassandra.locator.SimpleStrategy", opts, true);
f = service::get_local_migration_manager().announce_new_keyspace(ksm, false);
}
return f.then([] {
return setup_table(USERS_CF, sprint("CREATE TABLE %s.%s (%s text, %s boolean, PRIMARY KEY(%s)) WITH gc_grace_seconds=%d",
AUTH_KS, USERS_CF, USER_NAME, SUPER, USER_NAME,
90 * 24 * 60 * 60)); // 3 months.
}).then([type] {
return authenticator::setup(type).discard_result();
}).then([] {
// TODO authorizer
}).then([] {
service::get_local_migration_manager().register_listener(&auth_migration); // again, only one shard...
// instead of once-timer, just schedule this later
schedule_when_up([] {
// setup default super user
return has_existing_users(USERS_CF, DEFAULT_SUPERUSER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
AUTH_KS, USERS_CF, USER_NAME, SUPER);
cql3::get_local_query_processor().process(query, db::consistency_level::ONE, {DEFAULT_SUPERUSER_NAME, true}).then([](auto) {
logger.info("Created default superuser '{}'", DEFAULT_SUPERUSER_NAME);
}).handle_exception([](auto ep) {
try {
std::rethrow_exception(ep);
} catch (exceptions::request_execution_exception&) {
logger.warn("Skipped default superuser setup: some nodes were not ready");
}
});
}
});
});
});
}
static db::consistency_level consistency_for_user(const sstring& username) {
if (username == auth::auth::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
static future<::shared_ptr<cql3::untyped_result_set>> select_user(const sstring& username) {
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution that is fine, but since tests set up and tear down the system over and over,
// we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor's statement caching instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
return cql3::get_local_query_processor().process(
sprint("SELECT * FROM %s.%s WHERE %s = ?",
auth::auth::AUTH_KS, auth::auth::USERS_CF,
USER_NAME), consistency_for_user(username),
{ username }, true);
}
future<bool> auth::auth::is_existing_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
}
future<bool> auth::auth::is_super_user(const sstring& username) {
return select_user(username).then(
[](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty() && res->one().get_as<bool>(SUPER));
});
}
future<> auth::auth::insert_user(const sstring& username, bool is_super)
throw (exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
AUTH_KS, USERS_CF, USER_NAME, SUPER),
consistency_for_user(username), { username, is_super }).discard_result();
}
future<> auth::auth::delete_user(const sstring& username) throw(exceptions::request_execution_exception) {
return cql3::get_local_query_processor().process(sprint("DELETE FROM %s.%s WHERE %s = ?",
AUTH_KS, USERS_CF, USER_NAME),
consistency_for_user(username), { username }).discard_result();
}
future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
auto& qp = cql3::get_local_query_processor();
auto& db = qp.db().local();
if (db.has_schema(AUTH_KS, name)) {
return make_ready_future();
}
::shared_ptr<cql3::statements::cf_statement> parsed = static_pointer_cast<
cql3::statements::cf_statement>(cql3::query_processor::parse_statement(cql));
parsed->prepare_keyspace(AUTH_KS);
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
// Origin sets "Legacy Cf Id" for the new table. We have no need to be
// pre-2.1 compatible (afaik), so let's skip that whole hullabaloo.
return statement->announce_migration(qp.proxy(), false).then([statement](bool) {});
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {
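// Fast path: look for the default superuser row at CL.ONE first; if it is not
// visible there, retry at CL.QUORUM, and finally fall back to asking whether
// any user row exists at all.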
auto default_user_query = sprint("SELECT * FROM %s.%s WHERE %s = ?", AUTH_KS, cfname, name_column);
auto all_users_query = sprint("SELECT * FROM %s.%s LIMIT 1", AUTH_KS, cfname);
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::ONE, { def_user_name }).then([=](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(default_user_query, db::consistency_level::QUORUM, { def_user_name }).then([all_users_query](::shared_ptr<cql3::untyped_result_set> res) {
if (!res->empty()) {
return make_ready_future<bool>(true);
}
return cql3::get_local_query_processor().process(all_users_query, db::consistency_level::QUORUM).then([](::shared_ptr<cql3::untyped_result_set> res) {
return make_ready_future<bool>(!res->empty());
});
});
});
}

120
auth/auth.hh Normal file

@@ -0,0 +1,120 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <chrono>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include "exceptions/exceptions.hh"
namespace auth {
class auth {
public:
static const sstring DEFAULT_SUPERUSER_NAME;
static const sstring AUTH_KS;
static const sstring USERS_CF;
static const std::chrono::milliseconds SUPERUSER_SETUP_DELAY;
static bool is_class_type(const sstring& type, const sstring& classname);
#if 0
public static Set<Permission> getPermissions(AuthenticatedUser user, IResource resource)
{
return permissionsCache.getPermissions(user, resource);
}
#endif
/**
* Checks if the username is stored in AUTH_KS.USERS_CF.
*
* @param username Username to query.
* @return whether or not Cassandra knows about the user.
*/
static future<bool> is_existing_user(const sstring& username);
/**
* Checks if the user is a known superuser.
*
* @param username Username to query.
* @return true if the user is a superuser, false if they aren't or don't exist at all.
*/
static future<bool> is_super_user(const sstring& username);
/**
* Inserts the user into AUTH_KS.USERS_CF (or overwrites their superuser status as a result of an ALTER USER query).
*
* @param username Username to insert.
* @param isSuper User's new status.
* @throws RequestExecutionException
*/
static future<> insert_user(const sstring& username, bool is_super) throw(exceptions::request_execution_exception);
/**
* Deletes the user from AUTH_KS.USERS_CF.
*
* @param username Username to delete.
* @throws RequestExecutionException
*/
static future<> delete_user(const sstring& username) throw(exceptions::request_execution_exception);
/**
* Sets up Authenticator and Authorizer.
*/
static future<> setup();
/**
* Set up table from given CREATE TABLE statement under system_auth keyspace, if not already done so.
*
* @param name name of the table
* @param cql CREATE TABLE statement
*/
static future<> setup_table(const sstring& name, const sstring& cql);
static future<bool> has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column_name);
// For internal use. Run function "when system is up".
typedef std::function<future<>()> scheduled_func;
static void schedule_when_up(scheduled_func);
};
}


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -36,39 +39,23 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
-#pragma once
-#include "utils/UUID.hh"
-#include "streaming/messages/stream_message.hh"
+#include "authenticated_user.hh"
-namespace streaming {
-namespace messages {
+const sstring auth::authenticated_user::ANONYMOUS_USERNAME("anonymous");
-class retry_message : public stream_message {
-public:
-using UUID = utils::UUID;
-UUID cf_id;
-int sequence_number;
-retry_message() = default;
-retry_message(UUID cf_id_, int sequence_number_)
-: stream_message(stream_message::Type::RECEIVED)
-, cf_id(cf_id_)
-, sequence_number(sequence_number_) {
-}
-#if 0
-@Override
-public String toString()
-{
-final StringBuilder sb = new StringBuilder("Retry (");
-sb.append(cfId).append(", #").append(sequenceNumber).append(')');
-return sb.toString();
-}
-#endif
-public:
-void serialize(bytes::iterator& out) const;
-static retry_message deserialize(bytes_view& v);
-size_t serialized_size() const;
-};
+auth::authenticated_user::authenticated_user()
+: _anon(true)
+{}
-} // namespace messages
-} // namespace streaming
+auth::authenticated_user::authenticated_user(sstring name)
+: _name(name), _anon(false)
+{}
+const sstring& auth::authenticated_user::name() const {
+return _anon ? ANONYMOUS_USERNAME : _name;
+}
+bool auth::authenticated_user::operator==(const authenticated_user& v) const {
+return _anon ? v._anon : _name == v._name;
+}


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -38,35 +41,39 @@
#pragma once
-#include "streaming/messages/stream_message.hh"
-#include "streaming/messages/file_message_header.hh"
-#include "sstables/sstables.hh"
-#include "mutation_reader.hh"
+#include <seastar/core/sstring.hh>
-namespace streaming {
-namespace messages {
+namespace auth {
-/**
- * IncomingFileMessage is used to receive part (or the whole) of an SSTable data file.
- */
-class incoming_file_message : public stream_message {
+class authenticated_user {
public:
-file_message_header header;
+static const sstring ANONYMOUS_USERNAME;
-incoming_file_message() = default;
-incoming_file_message(file_message_header header_, mutation_reader mr_)
-: stream_message(stream_message::Type::FILE)
-, header(std::move(header_)) {
-}
+authenticated_user();
+authenticated_user(sstring name);
+const sstring& name() const;
+/**
+ * Checks the user's superuser status.
+ * Only a superuser is allowed to perform CREATE USER and DROP USER queries.
+ * In most cases, though not necessarily, a superuser will have Permission.ALL on every resource
+ * (depends on the IAuthorizer implementation).
+ */
+bool is_super() const;
+/**
+ * If the IAuthenticator doesn't require authentication, this method may return true.
+ */
+bool is_anonymous() const {
+return _anon;
+}
-#if 0
-@Override
-public String toString()
-{
-return "File (" + header + ", file: " + sstable.getFilename() + ")";
-}
-#endif
+bool operator==(const authenticated_user&) const;
+private:
+sstring _name;
+bool _anon;
};
-} // namespace messages
-} // namespace streaming
+}

110
auth/authenticator.cc Normal file

@@ -0,0 +1,110 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "authenticator.hh"
#include "authenticated_user.hh"
#include "password_authenticator.hh"
#include "auth.hh"
#include "db/config.hh"
const sstring auth::authenticator::USERNAME_KEY("username");
const sstring auth::authenticator::PASSWORD_KEY("password");
const sstring auth::authenticator::ALLOW_ALL_AUTHENTICATOR_NAME("org.apache.cassandra.auth.AllowAllAuthenticator");
/**
 * The authenticator is assumed to be a fully stateless, immutable object (note all the const).
 * We thus store a single global instance, since that should be safe/ok.
 */
static std::unique_ptr<auth::authenticator> global_authenticator;
future<>
auth::authenticator::setup(const sstring& type) throw (exceptions::configuration_exception) {
if (auth::auth::is_class_type(type, ALLOW_ALL_AUTHENTICATOR_NAME)) {
class allow_all_authenticator : public authenticator {
public:
const sstring& class_name() const override {
return ALLOW_ALL_AUTHENTICATOR_NAME;
}
bool require_authentication() const override {
return false;
}
option_set supported_options() const override {
return option_set();
}
option_set alterable_options() const override {
return option_set();
}
future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override {
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>());
}
future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override {
return make_ready_future();
}
resource_ids protected_resources() const override {
return resource_ids();
}
::shared_ptr<sasl_challenge> new_sasl_challenge() const override {
throw std::runtime_error("Should not reach");
}
};
global_authenticator = std::make_unique<allow_all_authenticator>();
} else if (auth::auth::is_class_type(type, password_authenticator::PASSWORD_AUTHENTICATOR_NAME)) {
auto pwa = std::make_unique<password_authenticator>();
auto f = pwa->init();
return f.then([pwa = std::move(pwa)]() mutable {
global_authenticator = std::move(pwa);
});
} else {
throw exceptions::configuration_exception("Invalid authenticator type: " + type);
}
return make_ready_future();
}
auth::authenticator& auth::authenticator::get() {
assert(global_authenticator);
return *global_authenticator;
}
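// Hedged sketch of the intended startup sequence (the type string is
// illustrative): run setup() once, then use the global accessor.
#if 0
auth::authenticator::setup("org.apache.cassandra.auth.PasswordAuthenticator").then([] {
    auto& a = auth::authenticator::get();
    assert(a.require_authentication());
});
#endif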

198
auth/authenticator.hh Normal file

@@ -0,0 +1,198 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <memory>
#include <unordered_map>
#include <set>
#include <stdexcept>
#include <boost/any.hpp>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/shared_ptr.hh>
#include <seastar/core/enum.hh>
#include "bytes.hh"
#include "data_resource.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
namespace db {
class config;
}
namespace auth {
class authenticated_user;
class authenticator {
public:
static const sstring USERNAME_KEY;
static const sstring PASSWORD_KEY;
static const sstring ALLOW_ALL_AUTHENTICATOR_NAME;
/**
* Supported CREATE USER/ALTER USER options.
* Currently only PASSWORD is available.
*/
enum class option {
PASSWORD
};
using option_set = enum_set<super_enum<option, option::PASSWORD>>;
using option_map = std::unordered_map<option, boost::any, enum_hash<option>>;
using credentials_map = std::unordered_map<sstring, sstring>;
/**
* Resource id mappings, i.e. keyspace and/or column families.
*/
using resource_ids = std::set<data_resource>;
/**
* Setup is called once upon system startup to initialize the IAuthenticator.
*
* For example, use this method to create any required keyspaces/column families.
* Note: Only call from main thread.
*/
static future<> setup(const sstring& type) throw(exceptions::configuration_exception);
/**
* Returns the system authenticator. Must have called setup before calling this.
*/
static authenticator& get();
virtual ~authenticator()
{}
virtual const sstring& class_name() const = 0;
/**
* Whether or not the authenticator requires explicit login.
* If false, the user will be instantiated as AuthenticatedUser.ANONYMOUS_USER.
*/
virtual bool require_authentication() const = 0;
/**
* Set of options supported by CREATE USER and ALTER USER queries.
* Should never return null - always return an empty set instead.
*/
virtual option_set supported_options() const = 0;
/**
* Subset of supportedOptions that users are allowed to alter when performing ALTER USER [themselves].
* Should never return null - always return an empty set instead.
*/
virtual option_set alterable_options() const = 0;
/**
* Authenticates a user given a Map<String, String> of credentials.
* Should never return null - always throw AuthenticationException instead.
* Returning AuthenticatedUser.ANONYMOUS_USER is an option as well if authentication is not required.
*
* @throws authentication_exception if credentials don't match any known user.
*/
virtual future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) = 0;
/**
* Called during execution of CREATE USER query (also may be called on startup, see seedSuperuserOptions method).
* If the authenticator is static, the body of the method should be left blank, but don't throw an exception.
* options are guaranteed to be a subset of supportedOptions().
*
* @param username Username of the user to create.
* @param options Options the user will be created with.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of ALTER USER query.
* options are always guaranteed to be a subset of supportedOptions(). Furthermore, if the user performing the query
* is not a superuser and is altering himself, then options are guaranteed to be a subset of alterableOptions().
* Keep the body of the method blank if your implementation doesn't support any options.
*
* @param username Username of the user that will be altered.
* @param options Options to alter.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Called during execution of DROP USER query.
*
* @param username Username of the user that will be dropped.
* @throws exceptions::request_validation_exception
* @throws exceptions::request_execution_exception
*/
virtual future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) = 0;
/**
* Set of resources that should be made inaccessible to users and only accessible internally.
*
* @return Keyspaces, column families that will be unmodifiable by users; other resources.
* @see resource_ids
*/
virtual resource_ids protected_resources() const = 0;
class sasl_challenge {
public:
virtual ~sasl_challenge() {}
virtual bytes evaluate_response(bytes_view client_response) throw(exceptions::authentication_exception) = 0;
virtual bool is_complete() const = 0;
virtual future<::shared_ptr<authenticated_user>> get_authenticated_user() const throw(exceptions::authentication_exception) = 0;
};
/**
* Provide a sasl_challenge to be used by the CQL binary protocol server. If
* the configured authenticator requires authentication but does not implement this
* interface, we refuse to start the binary protocol server, as it will have no way
* of authenticating clients.
* @return sasl_challenge implementation
*/
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};
}

175
auth/data_resource.cc Normal file

@@ -0,0 +1,175 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "data_resource.hh"
#include <regex>
#include "service/storage_proxy.hh"
const sstring auth::data_resource::ROOT_NAME("data");
auth::data_resource::data_resource(level l, const sstring& ks, const sstring& cf)
: _ks(ks), _cf(cf)
{
if (l != get_level()) {
throw std::invalid_argument("level/keyspace/column mismatch");
}
}
auth::data_resource::data_resource()
: data_resource(level::ROOT)
{}
auth::data_resource::data_resource(const sstring& ks)
: data_resource(level::KEYSPACE, ks)
{}
auth::data_resource::data_resource(const sstring& ks, const sstring& cf)
: data_resource(level::COLUMN_FAMILY, ks, cf)
{}
auth::data_resource::level auth::data_resource::get_level() const {
if (!_cf.empty()) {
assert(!_ks.empty());
return level::COLUMN_FAMILY;
}
if (!_ks.empty()) {
return level::KEYSPACE;
}
return level::ROOT;
}
auth::data_resource auth::data_resource::from_name(
const sstring& s) {
static std::regex slash_regex("/");
auto i = std::regex_token_iterator<sstring::const_iterator>(s.begin(),
s.end(), slash_regex, -1);
auto e = std::regex_token_iterator<sstring::const_iterator>();
auto n = std::distance(i, e);
if (n > 3 || ROOT_NAME != sstring(*i++)) {
throw std::invalid_argument(sprint("%s is not a valid data resource name", s));
}
if (n == 1) {
return data_resource();
}
auto ks = *i++;
if (n == 2) {
return data_resource(ks.str());
}
auto cf = *i++;
return data_resource(ks.str(), cf.str());
}
sstring auth::data_resource::name() const {
switch (get_level()) {
case level::ROOT:
return ROOT_NAME;
case level::KEYSPACE:
return sprint("%s/%s", ROOT_NAME, _ks);
case level::COLUMN_FAMILY:
default:
return sprint("%s/%s/%s", ROOT_NAME, _ks, _cf);
}
}
auth::data_resource auth::data_resource::get_parent() const {
switch (get_level()) {
case level::KEYSPACE:
return data_resource();
case level::COLUMN_FAMILY:
return data_resource(_ks);
default:
throw std::invalid_argument("Root-level resource can't have a parent");
}
}
const sstring& auth::data_resource::keyspace() const
throw (std::invalid_argument) {
if (is_root_level()) {
throw std::invalid_argument("ROOT data resource has no keyspace");
}
return _ks;
}
const sstring& auth::data_resource::column_family() const
throw (std::invalid_argument) {
if (!is_column_family_level()) {
throw std::invalid_argument(sprint("%s data resource has no column family", name()));
}
return _cf;
}
bool auth::data_resource::has_parent() const {
return !is_root_level();
}
bool auth::data_resource::exists() const {
switch (get_level()) {
case level::ROOT:
return true;
case level::KEYSPACE:
return service::get_local_storage_proxy().get_db().local().has_keyspace(_ks);
case level::COLUMN_FAMILY:
default:
return service::get_local_storage_proxy().get_db().local().has_schema(_ks, _cf);
}
}
sstring auth::data_resource::to_string() const {
return name();
}
bool auth::data_resource::operator==(const data_resource& v) const {
return _ks == v._ks && _cf == v._cf;
}
bool auth::data_resource::operator<(const data_resource& v) const {
return _ks < v._ks ? true : (v._ks < _ks ? false : _cf < v._cf);
}
std::ostream& auth::operator<<(std::ostream& os, const data_resource& r) {
return os << r.name();
}
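// Hedged round-trip sketch of the hierarchical names handled above
// (keyspace/table values illustrative):
#if 0
auto cf = auth::data_resource::from_name("data/system_auth/users");
assert(cf.is_column_family_level());
assert(cf.keyspace() == "system_auth" && cf.column_family() == "users");
assert(cf.get_parent() == auth::data_resource("system_auth"));
assert(cf.name() == "data/system_auth/users");
#endif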

146
auth/data_resource.hh Normal file

@@ -0,0 +1,146 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <iosfwd>
#include <seastar/core/sstring.hh>
namespace auth {
class data_resource {
private:
enum class level {
ROOT, KEYSPACE, COLUMN_FAMILY
};
static const sstring ROOT_NAME;
sstring _ks;
sstring _cf;
data_resource(level, const sstring& ks = {}, const sstring& cf = {});
level get_level() const;
public:
/**
* Creates a DataResource representing the root-level resource.
* @return the root-level resource.
*/
data_resource();
/**
* Creates a DataResource representing a keyspace.
*
* @param keyspace Name of the keyspace.
*/
data_resource(const sstring& ks);
/**
* Creates a DataResource instance representing a column family.
*
* @param keyspace Name of the keyspace.
* @param columnFamily Name of the column family.
*/
data_resource(const sstring& ks, const sstring& cf);
/**
* Parses a data resource name into a DataResource instance.
*
* @param name Name of the data resource.
* @return DataResource instance matching the name.
*/
static data_resource from_name(const sstring&);
/**
* @return Printable name of the resource.
*/
sstring name() const;
/**
* @return Parent of the resource, if any. Throws IllegalStateException if it's the root-level resource.
*/
data_resource get_parent() const;
bool is_root_level() const {
return get_level() == level::ROOT;
}
bool is_keyspace_level() const {
return get_level() == level::KEYSPACE;
}
bool is_column_family_level() const {
return get_level() == level::COLUMN_FAMILY;
}
/**
* @return keyspace of the resource.
* @throws std::invalid_argument if it's the root-level resource.
*/
const sstring& keyspace() const throw(std::invalid_argument);
/**
* @return column family of the resource.
* @throws std::invalid_argument if it's not a cf-level resource.
*/
const sstring& column_family() const throw(std::invalid_argument);
/**
* @return Whether or not the resource has a parent in the hierarchy.
*/
bool has_parent() const;
/**
* @return Whether or not the resource exists in scylla.
*/
bool exists() const;
sstring to_string() const;
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
};
std::ostream& operator<<(std::ostream&, const data_resource&);
}


@@ -0,0 +1,357 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <unistd.h>
#include <crypt.h>
#include <random>
#include <chrono>
#include <seastar/core/reactor.hh>
#include "auth.hh"
#include "password_authenticator.hh"
#include "authenticated_user.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
const sstring auth::password_authenticator::PASSWORD_AUTHENTICATOR_NAME("org.apache.cassandra.auth.PasswordAuthenticator");
// name of the hash column.
static const sstring SALTED_HASH = "salted_hash";
static const sstring USER_NAME = "username";
static const sstring DEFAULT_USER_NAME = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = auth::auth::DEFAULT_SUPERUSER_NAME;
static const sstring CREDENTIALS_CF = "credentials";
static logging::logger logger("password_authenticator");
auth::password_authenticator::~password_authenticator()
{}
auth::password_authenticator::password_authenticator()
{}
// TODO: blowfish
// Origin uses the Java bcrypt library, i.e. blowfish salt
// generation and hashing, which is arguably a "better"
// password hash than the sha/md5 versions usually available in
// crypt_r. Otoh, glibc 2.7+ uses a modified sha512 algorithm,
// which should be of the same order of safety, so the only
// real issue is salted-hash compatibility with
// origin when importing system tables from there.
//
// Since bcrypt/blowfish is _not_ (afaict) available
// as a dev package/lib on most linux distros, we'd have to
// copy and compile for example OWL crypto
// (http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/)
// to be fully bit-compatible.
//
// Until we decide this is needed, let's just use crypt_r,
// and some old-fashioned random salt generation.
static constexpr size_t rand_bytes = 16;
static sstring hashpw(const sstring& pass, const sstring& salt) {
// crypt_data is huge. should this be a thread_local static?
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
auto res = crypt_r(pass.c_str(), salt.c_str(), tmp.get());
if (res == nullptr) {
throw std::system_error(errno, std::system_category());
}
return res;
}
static bool checkpw(const sstring& pass, const sstring& salted_hash) {
auto tmp = hashpw(pass, salted_hash);
return tmp == salted_hash;
}
static sstring gensalt() {
static sstring prefix;
std::random_device rd;
std::default_random_engine e1(rd());
std::uniform_int_distribution<char> dist;
sstring valid_salt = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./";
sstring input(rand_bytes, 0);
for (char&c : input) {
c = valid_salt[dist(e1) % valid_salt.size()];
}
sstring salt;
if (!prefix.empty()) {
// a usable prefix was determined on a previous call; pair it with the fresh random input
return prefix + input;
}
auto tmp = std::make_unique<crypt_data>();
tmp->initialized = 0;
// Try in order:
// blowfish 2011 fix, blowfish, sha512, sha256, md5
for (sstring pfx : { "$2y$", "$2a$", "$6$", "$5$", "$1$" }) {
salt = pfx + input;
if (crypt_r("fisk", salt.c_str(), tmp.get())) {
prefix = pfx;
return salt;
}
}
throw std::runtime_error("Could not initialize hashing algorithm");
}
static sstring hashpw(const sstring& pass) {
return hashpw(pass, gensalt());
}
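// Hedged sketch of the round trip the helpers above implement: verification
// re-hashes the candidate with the stored salted hash as the crypt_r "setting"
// string (the salt prefix is parsed back out of it), so equal output means a
// match. Values illustrative:
#if 0
auto stored = hashpw("cassandra"); // gensalt() picks e.g. a "$6$" + random-salt setting
assert(checkpw("cassandra", stored));
assert(!checkpw("not-the-password", stored));
#endif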
future<> auth::password_authenticator::init() {
gensalt(); // do this once to determine usable hashing
sstring create_table = sprint(
"CREATE TABLE %s.%s ("
"%s text,"
"%s text," // salt + hash + number of rounds
"options map<text,text>,"// for future extensions
"PRIMARY KEY(%s)"
") WITH gc_grace_seconds=%d",
auth::auth::AUTH_KS,
CREDENTIALS_CF, USER_NAME, SALTED_HASH, USER_NAME,
90 * 24 * 60 * 60); // 3 months.
return auth::setup_table(CREDENTIALS_CF, create_table).then([this] {
// instead of once-timer, just schedule this later
auth::schedule_when_up([] {
return auth::has_existing_users(CREDENTIALS_CF, DEFAULT_USER_NAME, USER_NAME).then([](bool exists) {
if (!exists) {
cql3::get_local_query_processor().process(sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?) USING TIMESTAMP 0",
auth::AUTH_KS,
CREDENTIALS_CF,
USER_NAME, SALTED_HASH
),
db::consistency_level::ONE, {DEFAULT_USER_NAME, hashpw(DEFAULT_USER_PASSWORD)}).then([](auto) {
logger.info("Created default user '{}'", DEFAULT_USER_NAME);
});
}
});
});
});
}
db::consistency_level auth::password_authenticator::consistency_for_user(const sstring& username) {
if (username == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
return db::consistency_level::LOCAL_ONE;
}
const sstring& auth::password_authenticator::class_name() const {
return PASSWORD_AUTHENTICATOR_NAME;
}
bool auth::password_authenticator::require_authentication() const {
return true;
}
auth::authenticator::option_set auth::password_authenticator::supported_options() const {
return option_set::of<option::PASSWORD>();
}
auth::authenticator::option_set auth::password_authenticator::alterable_options() const {
return option_set::of<option::PASSWORD>();
}
future<::shared_ptr<auth::authenticated_user> > auth::password_authenticator::authenticate(
const credentials_map& credentials) const
throw (exceptions::authentication_exception) {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
auto& password = credentials.at(PASSWORD_KEY);
// There used to be a thread-local, explicit cache of the prepared statement here. In normal
// execution that is fine, but since tests set up and tear down the system over and over,
// we would start using obsolete prepared statements pretty quickly.
// Rely on the query processor's statement caching instead, and let's assume
// that a string->statement map lookup is not going to cost us much.
auto& qp = cql3::get_local_query_processor();
return qp.process(
sprint("SELECT %s FROM %s.%s WHERE %s = ?", SALTED_HASH,
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME),
consistency_for_user(username), { username }, true).then_wrapped(
[=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !checkpw(password, res->one().get_as<sstring>(SALTED_HASH))) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<::shared_ptr<authenticated_user>>(::make_shared<authenticated_user>(username));
} catch (std::system_error &) {
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
}
});
}
future<> auth::password_authenticator::create(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("INSERT INTO %s.%s (%s, %s) VALUES (?, ?)",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME, SALTED_HASH);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username, hashpw(password) }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::alter(sstring username,
const option_map& options)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto password = boost::any_cast<sstring>(options.at(option::PASSWORD));
auto query = sprint("UPDATE %s.%s SET %s = ? WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, SALTED_HASH, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { hashpw(password), username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
future<> auth::password_authenticator::drop(sstring username)
throw (exceptions::request_validation_exception,
exceptions::request_execution_exception) {
try {
auto query = sprint("DELETE FROM %s.%s WHERE %s = ?",
auth::AUTH_KS, CREDENTIALS_CF, USER_NAME);
auto& qp = cql3::get_local_query_processor();
return qp.process(query, consistency_for_user(username), { username }).discard_result();
} catch (std::out_of_range&) {
throw exceptions::invalid_request_exception("PasswordAuthenticator requires PASSWORD option");
}
}
auth::authenticator::resource_ids auth::password_authenticator::protected_resources() const {
return { data_resource(auth::AUTH_KS, CREDENTIALS_CF) };
}
::shared_ptr<auth::authenticator::sasl_challenge> auth::password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge: public sasl_challenge {
public:
plain_text_password_challenge(const password_authenticator& a)
: _authenticator(a)
{}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is: {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response)
throw (exceptions::authentication_exception) override {
logger.debug("Decoding credentials from client token");
sstring username, password;
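// Scan the token backwards: the first NUL-delimited field found from the end
// is the password, the next one the username; any authzId prefix is ignored.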
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<::shared_ptr<authenticated_user>> get_authenticated_user() const
throw (exceptions::authentication_exception) override {
return _authenticator.authenticate(_credentials);
}
private:
const password_authenticator& _authenticator;
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
}
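// For reference, the matching client side of the PLAIN exchange sends the
// NUL-delimited token decoded above. A hedged sketch (values illustrative):
#if 0
const char raw[] = "\0cassandra\0cassandra"; // empty authzId, authnId, password
bytes_view token(reinterpret_cast<const signed char*>(raw), sizeof(raw) - 1);
auto challenge = auth::authenticator::get().new_sasl_challenge();
challenge->evaluate_response(token);
assert(challenge->is_complete());
#endif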


@@ -14,9 +14,12 @@
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
+ * Copyright 2016 Cloudius Systems
+ *
+ * Modified by Cloudius Systems.
- * Copyright 2015 Cloudius Systems.
- * Modified by Cloudius Systems
 */
/*
@@ -38,54 +41,33 @@
#pragma once
-namespace streaming {
-namespace messages {
+#include "authenticator.hh"
-/**
- * StreamMessage is an abstract base class that every message in the streaming protocol inherits.
- *
- * Every message carries its message type ({@link Type}) and the streaming protocol version byte.
- */
-class stream_message {
+namespace auth {
+class password_authenticator : public authenticator {
public:
-enum class Type {
-PREPARE,
-FILE,
-RECEIVED,
-RETRY,
-COMPLETE,
-SESSION_FAILED,
-};
+static const sstring PASSWORD_AUTHENTICATOR_NAME;
-Type type;
-int priority;
+password_authenticator();
+~password_authenticator();
-stream_message() = default;
+future<> init();
-stream_message(Type type_)
-: type(type_) {
-if (type == Type::PREPARE) {
-priority = 5;
-} else if (type == Type::FILE) {
-priority = 0;
-} else if (type == Type::RECEIVED) {
-priority = 4;
-} else if (type == Type::RETRY) {
-priority = 4;
-} else if (type == Type::COMPLETE) {
-priority = 1;
-} else if (type == Type::SESSION_FAILED) {
-priority = 5;
-}
-}
+const sstring& class_name() const override;
+bool require_authentication() const override;
+option_set supported_options() const override;
+option_set alterable_options() const override;
+future<::shared_ptr<authenticated_user>> authenticate(const credentials_map& credentials) const throw(exceptions::authentication_exception) override;
+future<> create(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+future<> alter(sstring username, const option_map& options) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+future<> drop(sstring username) throw(exceptions::request_validation_exception, exceptions::request_execution_exception) override;
+resource_ids protected_resources() const override;
+::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
-/**
- * @return priority of this message. higher value, higher priority.
- */
-int get_priority() {
-return priority;
-}
+static db::consistency_level consistency_for_user(const sstring& username);
};
-} // namespace messages
-} // namespace streaming
+}

49
auth/permission.cc Normal file

@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "permission.hh"
const auth::permission_set auth::ALL_DATA = auth::permission_set::of<
        auth::permission::CREATE, auth::permission::ALTER,
        auth::permission::DROP, auth::permission::SELECT,
        auth::permission::MODIFY, auth::permission::AUTHORIZE>();
const auth::permission_set auth::ALL = auth::ALL_DATA;
const auth::permission_set auth::NONE;

81
auth/permission.hh Normal file

@@ -0,0 +1,81 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright 2016 Cloudius Systems
*
* Modified by Cloudius Systems
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "enum_set.hh"
namespace auth {
enum class permission {
//Deprecated
READ,
//Deprecated
WRITE,
// schema management
CREATE, // required for CREATE KEYSPACE and CREATE TABLE.
ALTER, // required for ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX.
DROP, // required for DROP KEYSPACE and DROP TABLE.
// data access
SELECT, // required for SELECT.
MODIFY, // required for INSERT, UPDATE, DELETE, TRUNCATE.
// permission management
AUTHORIZE, // required for GRANT and REVOKE.
};
typedef enum_set<super_enum<permission,
permission::READ,
permission::WRITE,
permission::CREATE,
permission::ALTER,
permission::DROP,
permission::SELECT,
permission::MODIFY,
permission::AUTHORIZE>> permission_set;
extern const permission_set ALL_DATA;
extern const permission_set ALL;
extern const permission_set NONE;
}
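// Hedged usage sketch (assuming enum_set's of()/contains() interface):
#if 0
auto read_only = auth::permission_set::of<auth::permission::SELECT>();
assert(read_only.contains(auth::permission::SELECT));
assert(!read_only.contains(auth::permission::MODIFY));
assert(auth::ALL.contains(auth::permission::AUTHORIZE));
#endif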


@@ -22,6 +22,7 @@
#pragma once
#include "core/sstring.hh"
#include "hashing.hh"
#include <experimental/optional>
#include <iosfwd>
#include <functional>
@@ -57,3 +58,20 @@ std::ostream& operator<<(std::ostream& os, const bytes_view& b);
}
template<>
struct appending_hash<bytes> {
template<typename Hasher>
void operator()(Hasher& h, const bytes& v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.cbegin()), v.size() * sizeof(bytes::value_type));
}
};
template<>
struct appending_hash<bytes_view> {
template<typename Hasher>
void operator()(Hasher& h, bytes_view v) const {
feed_hash(h, v.size());
h.update(reinterpret_cast<const char*>(v.begin()), v.size() * sizeof(bytes_view::value_type));
}
};
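// Hedged sketch of the Hasher contract these specializations rely on: any type
// with update(const char*, size_t) fits, and length-prefixing via feed_hash
// keeps adjacent variable-length fields from colliding. Toy (illustrative) hasher:
#if 0
struct fnv1a_hasher {
    uint64_t state = 1469598103934665603ull;
    void update(const char* data, size_t size) {
        while (size--) {
            state = (state ^ static_cast<unsigned char>(*data++)) * 1099511628211ull;
        }
    }
};
fnv1a_hasher h;
feed_hash(h, b); // for some bytes b; dispatches to appending_hash<bytes> above
#endif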


@@ -24,6 +24,7 @@
#include "types.hh"
#include "net/byteorder.hh"
#include "core/unaligned.hh"
#include "hashing.hh"
/**
* Utility for writing data into a buffer when its final size is not known up front.
@@ -205,6 +206,10 @@ public:
}
}
void write(const char* ptr, size_t size) {
write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
}
// Writes given sequence of bytes with a preceding length component encoded in big-endian format
inline void write_blob(bytes_view v) {
assert((size_type)v.size() == v.size());
@@ -332,3 +337,13 @@ public:
_current->offset = pos._offset;
}
};
template<>
struct appending_hash<bytes_ostream> {
template<typename Hasher>
void operator()(Hasher& h, const bytes_ostream& b) const {
for (auto&& frag : b.fragments()) {
feed_hash(h, frag);
}
}
};

103
canonical_mutation.cc Normal file

@@ -0,0 +1,103 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "canonical_mutation.hh"
#include "mutation.hh"
#include "mutation_partition_serializer.hh"
#include "converting_mutation_partition_applier.hh"
#include "hashing_partition_visitor.hh"
template class db::serializer<canonical_mutation>;
//
// Representation layout:
//
// <canonical_mutation> ::= <column_family_id> <table_schema_version> <partition_key> <column-mapping> <partition>
//
// For <partition> see mutation_partition_serializer.cc
// For <column-mapping> see db::serializer<column_mapping>
//
canonical_mutation::canonical_mutation(bytes data)
: _data(std::move(data))
{ }
canonical_mutation::canonical_mutation(const mutation& m)
: _data([&m] {
bytes_ostream out;
db::serializer<utils::UUID>(m.column_family_id()).write(out);
db::serializer<table_schema_version>(m.schema()->version()).write(out);
db::serializer<partition_key_view>(m.key()).write(out);
db::serializer<column_mapping>(m.schema()->get_column_mapping()).write(out);
mutation_partition_serializer ser(*m.schema(), m.partition());
ser.write(out);
return to_bytes(out.linearize());
}())
{ }
utils::UUID canonical_mutation::column_family_id() const {
data_input in(_data);
return db::serializer<utils::UUID>::read(in);
}
mutation canonical_mutation::to_mutation(schema_ptr s) const {
data_input in(_data);
auto cf_id = db::serializer<utils::UUID>::read(in);
if (s->id() != cf_id) {
throw std::runtime_error(sprint("Attempted to deserialize canonical_mutation of table %s with schema of table %s (%s.%s)",
cf_id, s->id(), s->ks_name(), s->cf_name()));
}
auto version = db::serializer<table_schema_version>::read(in);
auto pk = partition_key(db::serializer<partition_key_view>::read(in));
mutation m(std::move(pk), std::move(s));
if (version == m.schema()->version()) {
db::serializer<column_mapping>::skip(in);
auto partition_view = mutation_partition_serializer::read_as_view(in);
m.partition().apply(*m.schema(), partition_view, *m.schema());
} else {
column_mapping cm = db::serializer<column_mapping>::read(in);
converting_mutation_partition_applier v(cm, *m.schema(), m.partition());
auto partition_view = mutation_partition_serializer::read_as_view(in);
partition_view.accept(cm, v);
}
return m;
}
template<>
db::serializer<canonical_mutation>::serializer(const canonical_mutation& v)
: _item(v)
, _size(db::serializer<bytes>(v._data).size())
{ }
template<>
void
db::serializer<canonical_mutation>::write(output& out, const canonical_mutation& v) {
db::serializer<bytes>(v._data).write(out);
}
template<>
canonical_mutation db::serializer<canonical_mutation>::read(input& in) {
return canonical_mutation(db::serializer<bytes>::read(in));
}

canonical_mutation.hh Normal file

@@ -0,0 +1,65 @@
/*
* Copyright (C) 2015 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "bytes.hh"
#include "schema.hh"
#include "database_fwd.hh"
#include "db/serializer.hh"
#include "mutation_partition_visitor.hh"
#include "mutation_partition_serializer.hh"
// Immutable mutation form which can be read using any schema version of the same table.
// Safe to access from other shards via const&.
// Safe to pass serialized across nodes.
class canonical_mutation {
bytes _data;
canonical_mutation(bytes);
public:
explicit canonical_mutation(const mutation&);
canonical_mutation(canonical_mutation&&) = default;
canonical_mutation(const canonical_mutation&) = default;
canonical_mutation& operator=(const canonical_mutation&) = default;
canonical_mutation& operator=(canonical_mutation&&) = default;
// Create a mutation object interpreting this canonical mutation using
// given schema.
//
// Data which is not representable in the target schema is dropped. If this
// is not intended, the user should sync the schema first.
mutation to_mutation(schema_ptr) const;
utils::UUID column_family_id() const;
friend class db::serializer<canonical_mutation>;
};
namespace db {
template<> serializer<canonical_mutation>::serializer(const canonical_mutation&);
template<> void serializer<canonical_mutation>::write(output&, const canonical_mutation&);
template<> canonical_mutation serializer<canonical_mutation>::read(input&);
extern template class serializer<canonical_mutation>;
}
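
A hedged usage sketch of the class above (the helper name is illustrative; it uses only the constructor and to_mutation() declared in this header). The point of the design: the serialized form embeds the writer's column mapping, so the reader may be on any schema version of the same table:

// Sketch: carry a mutation across a schema-version boundary.
mutation reinterpret(const mutation& m, schema_ptr receiver_schema) {
    canonical_mutation cm(m);   // captures schema version + column mapping
    // to_mutation() throws if cm belongs to a different table; otherwise,
    // on a version match the column mapping is skipped, and on a mismatch
    // cells are converted and columns unknown to receiver_schema are dropped.
    return cm.to_mutation(std::move(receiver_schema));
}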


@@ -34,6 +34,8 @@ enum class compaction_strategy_type {
};
class compaction_strategy_impl;
class sstable;
struct compaction_descriptor;
class compaction_strategy {
::shared_ptr<compaction_strategy_impl> _compaction_strategy_impl;
@@ -46,7 +48,9 @@ public:
compaction_strategy(compaction_strategy&&);
compaction_strategy& operator=(compaction_strategy&&);
future<> compact(column_family& cfs);
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:

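A hedged sketch of the intended call pattern for the new method (strategy, cf and candidates are illustrative variables of the types in the signature; compaction_descriptor's members are not shown in this hunk, so it is treated as an opaque work order):

// The strategy inspects the candidates and returns the subset worth
// compacting now (possibly none).
compaction_descriptor desc = strategy.get_sstables_for_compaction(cf, std::move(candidates));
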

@@ -169,6 +169,17 @@ rpc_address: localhost
# port for Thrift to listen for clients on
rpc_port: 9160
# port for REST API server
api_port: 10000
# IP for the REST API server
api_address: 127.0.0.1
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
batch_size_warn_threshold_in_kb: 5
###################################################
## Not currently supported, reserved for future use
###################################################
@@ -205,7 +216,7 @@ rpc_port: 9160
# reduced proportionally to the number of nodes in the cluster.
# batchlog_replay_throttle_in_kb: 1024
# Authentication backend, implementing IAuthenticator; used to identify users
# Authentication backend, identifying users
# Out of the box, Scylla provides org.apache.cassandra.auth.{AllowAllAuthenticator,
# PasswordAuthenticator}.
#
@@ -599,10 +610,6 @@ commitlog_total_space_in_mb: -1
# column_index_size_in_kb: 64
# Log WARN on any batch size exceeding this value. 5kb per batch by default.
# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
# batch_size_warn_threshold_in_kb: 5
# Number of simultaneous compactions to allow, NOT including
# validation "compactions" for anti-entropy repair. Simultaneous
# compactions can help preserve read performance in a mixed read/write


@@ -50,6 +50,9 @@ def apply_tristate(var, test, note, missing):
return False
return False
def have_pkg(package):
return subprocess.call(['pkg-config', package]) == 0
def pkg_config(option, package):
output = subprocess.check_output(['pkg-config', option, package])
return output.decode('utf-8').strip()
@@ -134,6 +137,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
'tests/keys_test',
@@ -151,6 +155,7 @@ scylla_tests = [
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/key_reader_test',
'tests/mutation_query_test',
@@ -184,6 +189,7 @@ scylla_tests = [
'tests/crc_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
]
apps = [
@@ -222,6 +228,8 @@ arg_parser.add_argument('--static-stdc++', dest = 'staticcxx', action = 'store_t
help = 'Link libgcc and libstdc++ statically')
arg_parser.add_argument('--tests-debuginfo', action = 'store', dest = 'tests_debuginfo', type = int, default = 0,
help = 'Enable(1)/disable(0) compiler debug information generation for tests')
arg_parser.add_argument('--python', action = 'store', dest = 'python', default = 'python3',
help = 'Python3 path')
add_tristate(arg_parser, name = 'hwloc', dest = 'hwloc', help = 'hwloc support')
add_tristate(arg_parser, name = 'xen', dest = 'xen', help = 'Xen support')
args = arg_parser.parse_args()
@@ -235,11 +243,15 @@ cassandra_interface = Thrift(source = 'interface/cassandra.thrift', service = 'C
scylla_core = (['database.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
'bytes.cc',
'mutation.cc',
'row_cache.cc',
'canonical_mutation.cc',
'frozen_mutation.cc',
'memtable.cc',
'schema_mutations.cc',
'release.cc',
'utils/logalloc.cc',
'utils/large_bitset.cc',
@@ -257,6 +269,7 @@ scylla_core = (['database.cc',
'sstables/partition.cc',
'sstables/filter.cc',
'sstables/compaction.cc',
'sstables/compaction_manager.cc',
'log.cc',
'transport/event.cc',
'transport/event_notifier.cc',
@@ -304,6 +317,7 @@ scylla_core = (['database.cc',
'utils/big_decimal.cc',
'types.cc',
'validation.cc',
'service/priority_manager.cc',
'service/migration_manager.cc',
'service/storage_proxy.cc',
'cql3/operator.cc',
@@ -341,7 +355,6 @@ scylla_core = (['database.cc',
'utils/bloom_filter.cc',
'utils/bloom_calculations.cc',
'utils/rate_limiter.cc',
'utils/compaction_manager.cc',
'utils/file_lock.cc',
'utils/dynamic_bitset.cc',
'gms/version_generator.cc',
@@ -375,13 +388,13 @@ scylla_core = (['database.cc',
'locator/ec2_snitch.cc',
'locator/ec2_multi_region_snitch.cc',
'message/messaging_service.cc',
'service/client_state.cc',
'service/migration_task.cc',
'service/storage_service.cc',
'service/pending_range_calculator_service.cc',
'service/load_broadcaster.cc',
'service/pager/paging_state.cc',
'service/pager/query_pagers.cc',
'streaming/streaming.cc',
'streaming/stream_task.cc',
'streaming/stream_session.cc',
'streaming/stream_request.cc',
@@ -394,13 +407,6 @@ scylla_core = (['database.cc',
'streaming/stream_coordinator.cc',
'streaming/stream_manager.cc',
'streaming/stream_result_future.cc',
'streaming/messages/stream_init_message.cc',
'streaming/messages/retry_message.cc',
'streaming/messages/received_message.cc',
'streaming/messages/prepare_message.cc',
'streaming/messages/file_message_header.cc',
'streaming/messages/outgoing_file_message.cc',
'streaming/messages/incoming_file_message.cc',
'streaming/stream_session_state.cc',
'gc_clock.cc',
'partition_slice_builder.cc',
@@ -408,6 +414,12 @@ scylla_core = (['database.cc',
'repair/repair.cc',
'exceptions/exceptions.cc',
'dns.cc',
'auth/auth.cc',
'auth/authenticated_user.cc',
'auth/authenticator.cc',
'auth/data_resource.cc',
'auth/password_authenticator.cc',
'auth/permission.cc',
]
+ [Antlr3Grammar('cql3/Cql.g')]
+ [Thrift('interface/cassandra.thrift', 'Cassandra')]
@@ -447,7 +459,21 @@ api = ['api/api.cc',
'api/system.cc'
]
scylla_tests_dependencies = scylla_core + [
idls = ['idl/gossip_digest.idl.hh',
'idl/uuid.idl.hh',
'idl/range.idl.hh',
'idl/keys.idl.hh',
'idl/read_command.idl.hh',
'idl/token.idl.hh',
'idl/ring_position.idl.hh',
'idl/result.idl.hh',
'idl/frozen_mutation.idl.hh',
'idl/reconcilable_result.idl.hh',
'idl/streaming.idl.hh',
'idl/paging_state.idl.hh',
]
scylla_tests_dependencies = scylla_core + api + idls + [
'tests/cql_test_env.cc',
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
@@ -460,15 +486,15 @@ scylla_tests_seastar_deps = [
]
deps = {
'scylla': ['main.cc'] + scylla_core + api,
'scylla': idls + ['main.cc'] + scylla_core + api,
}
tests_not_using_seastar_test_framework = set([
'tests/types_test',
'tests/keys_test',
'tests/partitioner_test',
'tests/map_difference_test',
'tests/frozen_mutation_test',
'tests/canonical_mutation_test',
'tests/perf/perf_mutation',
'tests/lsa_async_eviction_test',
'tests/lsa_sync_eviction_test',
@@ -530,6 +556,32 @@ else:
args.pie = ''
args.fpie = ''
# a list element means a list of alternative packages to consider
# the first element becomes the HAVE_pkg define
# a string element is a package name with no alternatives
optional_packages = [['libsystemd', 'libsystemd-daemon']]
pkgs = []
def setup_first_pkg_of_list(pkglist):
# The HAVE_pkg symbol is taken from the first alternative
upkg = pkglist[0].upper().replace('-', '_')
for pkg in pkglist:
if have_pkg(pkg):
pkgs.append(pkg)
defines.append('HAVE_{}=1'.format(upkg))
return True
return False
for pkglist in optional_packages:
if isinstance(pkglist, str):
pkglist = [pkglist]
if not setup_first_pkg_of_list(pkglist):
if len(pkglist) == 1:
print('Missing optional package {pkglist[0]}'.format(**locals()))
else:
alternatives = ':'.join(pkglist[1:])
print('Missing optional package {pkglist[0]} (or alternatives {alternatives})'.format(**locals()))
defines = ' '.join(['-D' + d for d in defines])
globals().update(vars(args))
@@ -562,7 +614,7 @@ elif args.dpdk_target:
seastar_cflags = args.user_cflags + " -march=nehalem"
seastar_flags += ['--compiler', args.cxx, '--cflags=%s' % (seastar_cflags)]
status = subprocess.call(['./configure.py'] + seastar_flags, cwd = 'seastar')
status = subprocess.call([python, './configure.py'] + seastar_flags, cwd = 'seastar')
if status != 0:
print('Seastar configuration failed')
@@ -591,7 +643,10 @@ for mode in build_modes:
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem'
libs = "-lyaml-cpp -llz4 -lz -lsnappy " + pkg_config("--libs", "jsoncpp") + ' -lboost_filesystem' + ' -lcrypt'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
if args.staticcxx:
@@ -623,6 +678,9 @@ with open(buildfile, 'w') as f:
rule swagger
command = seastar/json/json2code.py -f $in -o $out
description = SWAGGER $out
rule serializer
command = ./idl-compiler.py --ns ser -f $in -o $out
description = IDL compiler $out
rule ninja
command = {ninja} -C $subdir $target
restat = 1
@@ -659,6 +717,7 @@ with open(buildfile, 'w') as f:
compiles = {}
ragels = {}
swaggers = {}
serializers = {}
thrifts = set()
antlr3_grammars = set()
for binary in build_artifacts:
@@ -712,6 +771,9 @@ with open(buildfile, 'w') as f:
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.idl.hh'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
serializers[hh] = src
elif src.endswith('.json'):
hh = '$builddir/' + mode + '/gen/' + src + '.hh'
swaggers[hh] = src
@@ -730,6 +792,7 @@ with open(buildfile, 'w') as f:
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
f.write('build {}: cxx.{} {} || {} \n'.format(obj, mode, src, ' '.join(gen_headers)))
if src in extra_cxxflags:
f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode = mode, extra_cxxflags = extra_cxxflags[src], **modeval))
@@ -739,6 +802,9 @@ with open(buildfile, 'w') as f:
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {}\n'.format(hh,src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh,src))
for thrift in thrifts:
outs = ' '.join(thrift.generated('$builddir/{}/gen'.format(mode)))
f.write('build {}: thrift.{} {}\n'.format(outs, mode, thrift.source))
@@ -758,7 +824,7 @@ with open(buildfile, 'w') as f:
f.write('build {}: phony\n'.format(seastar_deps))
f.write(textwrap.dedent('''\
rule configure
command = python3 configure.py $configure_args
command = {python} configure.py $configure_args
generator = 1
build build.ninja: configure | configure.py
rule cscope

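Concretely, the new serializer rule and the .idl.hh branch above mean that, for example, idl/paging_state.idl.hh is run through ./idl-compiler.py --ns ser and lands in $builddir/<mode>/gen/idl/paging_state.dist.hh; the generated headers are then appended to gen_headers, so every object file order-depends on them and the IDL compiler runs before any compilation that might include its output.
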

@@ -0,0 +1,119 @@
/*
* Copyright (C) 2015 Cloudius Systems, Ltd.
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "mutation_partition_view.hh"
#include "schema.hh"
// Mutation partition visitor which applies the visited data to an existing
// mutation_partition. The visited data may be of a different schema.
// Data which is not representable in the new schema is dropped.
// Weak exception guarantees.
class converting_mutation_partition_applier : public mutation_partition_visitor {
const schema& _p_schema;
mutation_partition& _p;
const column_mapping& _visited_column_mapping;
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {
dst.apply(new_def, atomic_cell_or_collection(cell));
}
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell);
collection_type_impl::mutation_view new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(std::move(c));
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
}
public:
converting_mutation_partition_applier(
const column_mapping& visited_column_mapping,
const schema& target_schema,
mutation_partition& target)
: _p_schema(target_schema)
, _p(target)
, _visited_column_mapping(visited_column_mapping)
{ }
virtual void accept_partition_tombstone(tombstone t) override {
_p.apply(t);
}
virtual void accept_static_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), cell);
}
}
virtual void accept_static_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.static_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_p._static_row, column_kind::static_column, *def, col.type(), collection);
}
}
virtual void accept_row_tombstone(clustering_key_prefix_view prefix, tombstone t) override {
_p.apply_row_tombstone(_p_schema, prefix, t);
}
virtual void accept_row(clustering_key_view key, tombstone deleted_at, const row_marker& rm) override {
deletable_row& r = _p.clustered_row(_p_schema, key);
r.apply(rm);
r.apply(deleted_at);
_current_row = &r;
}
virtual void accept_row_cell(column_id id, atomic_cell_view cell) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), cell);
}
}
virtual void accept_row_cell(column_id id, collection_mutation_view collection) override {
const column_mapping::column& col = _visited_column_mapping.regular_column_at(id);
const column_definition* def = _p_schema.get_column_definition(col.name());
if (def) {
accept_cell(_current_row->cells(), column_kind::regular_column, *def, col.type(), collection);
}
}
};
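
A hedged sketch of how this visitor is driven, mirroring the else branch of canonical_mutation::to_mutation() earlier in this series (the free-function wrapper is illustrative):

// Replay a partition recorded under an old column mapping into a
// partition owned by the current schema.
void apply_converted(const column_mapping& old_mapping,
                     mutation_partition_view view,
                     const schema& new_schema,
                     mutation_partition& target) {
    converting_mutation_partition_applier v(old_mapping, new_schema, target);
    view.accept(old_mapping, v);  // incompatible or dropped cells are skipped
}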


@@ -99,9 +99,9 @@ query_options::query_options(query_options&& o, std::vector<std::vector<bytes_vi
_batch_options = std::move(tmp);
}
query_options::query_options(std::vector<bytes_opt> values)
query_options::query_options(db::consistency_level cl, std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE,
cl,
{},
std::move(values),
{},
@@ -120,6 +120,11 @@ query_options::query_options(std::vector<bytes_opt> values)
}
}
query_options::query_options(std::vector<bytes_opt> values)
: query_options(
db::consistency_level::ONE, std::move(values))
{}
db::consistency_level query_options::get_consistency() const
{
return _consistency;

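The net effect of this refactoring, as a sketch (the variables are illustrative): the old constructor keeps its behaviour by delegating with consistency ONE, while internal callers can now pick the level explicitly:

std::vector<bytes_opt> a_vals, b_vals;
query_options defaulted(std::move(a_vals));                                // CL = ONE
query_options at_quorum(db::consistency_level::QUORUM, std::move(b_vals)); // explicit CL
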

@@ -112,6 +112,7 @@ public:
// forInternalUse
explicit query_options(std::vector<bytes_opt> values);
explicit query_options(db::consistency_level, std::vector<bytes_opt> values);
db::consistency_level get_consistency() const;
bytes_view_opt get_value_at(size_t idx) const;


@@ -109,6 +109,7 @@ future<> query_processor::stop()
future<::shared_ptr<result_message>>
query_processor::process(const sstring_view& query_string, service::query_state& query_state, query_options& options)
{
log.trace("process: \"{}\"", query_string);
auto p = get_statement(query_string, query_state.get_client_state());
options.prepare(p->bound_names);
auto cql_statement = p->statement;
@@ -299,8 +300,9 @@ query_processor::parse_statement(const sstring_view& query)
}
query_options query_processor::make_internal_options(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values,
db::consistency_level cl) {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(sprint("Invalid number of values. Expecting %d but got %d", p->bound_names.size(), values.size()));
}
@@ -316,13 +318,12 @@ query_options query_processor::make_internal_options(
bound_values.push_back({n->type->decompose(v)});
}
}
return query_options(bound_values);
return query_options(cl, bound_values);
}
::shared_ptr<statements::parsed_statement::prepared> query_processor::prepare_internal(
const std::experimental::string_view& query_string) {
auto& p = _internal_statements[sstring(query_string.begin(), query_string.end())];
const sstring& query_string) {
auto& p = _internal_statements[query_string];
if (p == nullptr) {
auto np = parse_statement(query_string)->prepare(_db.local());
np->statement->validate(_proxy, *_internal_state);
@@ -332,22 +333,54 @@ query_options query_processor::make_internal_options(
}
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& values) {
if (log.is_enabled(logging::log_level::trace)) {
log.trace("execute_internal: \"{}\" ({})", query_string, ::join(", ", values));
}
auto p = prepare_internal(query_string);
return execute_internal(p, values);
}
future<::shared_ptr<untyped_result_set>> query_processor::execute_internal(
::shared_ptr<statements::parsed_statement::prepared> p,
const std::initializer_list<data_value>& values) {
auto opts = make_internal_options(p, values);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute_internal(_proxy, *_internal_state, opts).then(
[](::shared_ptr<transport::messages::result_message> msg) {
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
const sstring& query_string,
db::consistency_level cl, const std::initializer_list<data_value>& values, bool cache)
{
auto p = cache ? prepare_internal(query_string) : parse_statement(query_string)->prepare(_db.local());
if (!cache) {
p->statement->validate(_proxy, *_internal_state);
}
return process(p, cl, values);
}
future<::shared_ptr<untyped_result_set>> query_processor::process(
::shared_ptr<statements::parsed_statement::prepared> p,
db::consistency_level cl, const std::initializer_list<data_value>& values)
{
auto opts = make_internal_options(p, values, cl);
return do_with(std::move(opts),
[this, p = std::move(p)](query_options & opts) {
return p->statement->execute(_proxy, *_internal_state, opts).then(
[p](::shared_ptr<transport::messages::result_message> msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});
}
future<::shared_ptr<transport::messages::result_message>>
query_processor::process_batch(::shared_ptr<statements::batch_statement> batch, service::query_state& query_state, query_options& options) {
auto& client_state = query_state.get_client_state();
@@ -388,8 +421,12 @@ void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks
{
}
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name)
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed)
{
if (columns_changed) {
log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
remove_invalid_prepared_statements(ks_name, cf_name);
}
}
void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name)
@@ -439,9 +476,7 @@ void query_processor::migration_subscriber::remove_invalid_prepared_statements(s
}
}
for (auto& id : invalid) {
get_query_processor().invoke_on_all([id] (auto& qp) {
qp.invalidate_prepared_statement(id);
});
_qp->invalidate_prepared_statement(id);
}
}
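
A hedged usage sketch of the new process() overload declared below in query_processor.hh (the keyspace/table names and the data_value construction are illustrative):

qp.process("SELECT v FROM ks.tbl WHERE k = ?",
           db::consistency_level::QUORUM,
           { data_value(sstring("key1")) },
           true /* cache the prepared statement for reuse */)
  .then([] (::shared_ptr<untyped_result_set> rs) {
      // consume rows from rs; unlike execute_internal(), this path honors
      // the requested consistency level via statement->execute().
  });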


@@ -322,14 +322,25 @@ public:
}
#endif
private:
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const std::experimental::string_view& query);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&);
query_options make_internal_options(::shared_ptr<statements::parsed_statement::prepared>, const std::initializer_list<data_value>&, db::consistency_level = db::consistency_level::ONE);
public:
future<::shared_ptr<untyped_result_set>> execute_internal(
const std::experimental::string_view& query_string,
const sstring& query_string,
const std::initializer_list<data_value>& = { });
::shared_ptr<statements::parsed_statement::prepared> prepare_internal(const sstring& query);
future<::shared_ptr<untyped_result_set>> execute_internal(
::shared_ptr<statements::parsed_statement::prepared>,
const std::initializer_list<data_value>& = { });
future<::shared_ptr<untyped_result_set>> process(
const sstring& query_string,
db::consistency_level, const std::initializer_list<data_value>& = { }, bool cache = false);
future<::shared_ptr<untyped_result_set>> process(
::shared_ptr<statements::parsed_statement::prepared>,
db::consistency_level, const std::initializer_list<data_value>& = { });
/*
* This function provides a timestamp that is guaranteed to be higher than any timestamp
* previously used in internal queries.
@@ -486,7 +497,7 @@ public:
virtual void on_create_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;
virtual void on_update_keyspace(const sstring& ks_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name) override;
virtual void on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed) override;
virtual void on_update_user_type(const sstring& ks_name, const sstring& type_name) override;
virtual void on_update_function(const sstring& ks_name, const sstring& function_name) override;
virtual void on_update_aggregate(const sstring& ks_name, const sstring& aggregate_name) override;


@@ -42,6 +42,7 @@
#include "cql3/statements/alter_table_statement.hh"
#include "service/migration_manager.hh"
#include "validation.hh"
#include "db/config.hh"
namespace cql3 {
@@ -77,216 +78,199 @@ void alter_table_statement::validate(distributed<service::storage_proxy>& proxy,
// validated in announce_migration()
}
static const sstring ALTER_TABLE_FEATURE = "ALTER TABLE";
future<bool> alter_table_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only)
{
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
#if 0
CFMetaData meta = validateColumnFamily(keyspace(), columnFamily());
CFMetaData cfm = meta.copy();
auto& db = proxy.local().get_db().local();
db.get_config().check_experimental(ALTER_TABLE_FEATURE);
CQL3Type validator = this.validator == null ? null : this.validator.prepare(keyspace());
ColumnIdentifier columnName = null;
ColumnDefinition def = null;
if (rawColumnName != null)
{
columnName = rawColumnName.prepare(cfm);
def = cfm.getColumnDefinition(columnName);
auto schema = validation::validate_column_family(db, keyspace(), column_family());
auto cfm = schema_builder(schema);
shared_ptr<cql3_type> validator;
if (_validator) {
validator = _validator->prepare(db, keyspace());
}
shared_ptr<column_identifier> column_name;
const column_definition* def = nullptr;
if (_raw_column_name) {
column_name = _raw_column_name->prepare_column_identifier(schema);
def = get_column_definition(schema, *column_name);
}
switch (oType)
switch (_type) {
case alter_table_statement::type::add:
{
case ADD:
assert columnName != null;
if (cfm.comparator.isDense())
throw new InvalidRequestException("Cannot add new column to a COMPACT STORAGE table");
assert(column_name);
if (schema->is_dense()) {
throw exceptions::invalid_request_exception("Cannot add new column to a COMPACT STORAGE table");
}
if (isStatic)
{
if (!cfm.comparator.isCompound())
throw new InvalidRequestException("Static columns are not allowed in COMPACT STORAGE tables");
if (cfm.clusteringColumns().isEmpty())
throw new InvalidRequestException("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
if (_is_static) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Static columns are not allowed in COMPACT STORAGE tables");
}
if (!schema->clustering_key_size()) {
throw exceptions::invalid_request_exception("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
}
}
if (def) {
if (def->is_partition_key()) {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with a PRIMARY KEY part", column_name));
} else {
throw exceptions::invalid_request_exception(sprint("Invalid column name %s because it conflicts with an existing column", column_name));
}
}
// Cannot re-add a dropped counter column. See #7831.
if (schema->is_counter() && schema->dropped_columns().count(column_name->text())) {
throw exceptions::invalid_request_exception(sprint("Cannot re-add previously dropped counter column %s", column_name));
}
auto type = validator->get_type();
if (type->is_collection() && type->is_multi_cell()) {
if (!schema->is_compound()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
}
if (schema->is_super()) {
throw exceptions::invalid_request_exception("Cannot use non-frozen collections with super column families");
}
if (def != null)
{
switch (def.kind)
{
case PARTITION_KEY:
case CLUSTERING_COLUMN:
throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with a PRIMARY KEY part", columnName));
default:
throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with an existing column", columnName));
auto it = schema->collections().find(column_name->name());
if (it != schema->collections().end() && !type->is_compatible_with(*it->second)) {
throw exceptions::invalid_request_exception(sprint("Cannot add a collection with the name %s "
"because a collection with the same name and a different type has already been used in the past", column_name));
}
}
cfm.with_column(column_name->name(), type, _is_static ? column_kind::static_column : column_kind::regular_column);
break;
}
case alter_table_statement::type::alter:
{
assert(column_name);
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
auto type = validator->get_type();
switch (def->kind) {
case column_kind::partition_key:
if (type->is_counter()) {
throw exceptions::invalid_request_exception(sprint("counter type is not supported for PRIMARY KEY part %s", column_name));
}
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::clustering_key:
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception(sprint("Cannot alter clustering column %s in a non-CQL3 table", column_name));
}
// Note that CFMetaData.validateCompatibility already validates the change we're about to do. However, the error message it
// sends is a bit cryptic for a CQL3 user, so we validate here for the sake of returning a better error message.
// Do note that we need isCompatibleWith here, not just isValueCompatibleWith.
if (!type->is_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are not order-compatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
case column_kind::compact_column:
case column_kind::regular_column:
case column_kind::static_column:
// Thrift allows changing a column validator, so CFMetaData.validateCompatibility will let it slide
// if we change to an incompatible type (contrary to the comparator case). But we don't want to
// allow it for CQL3 (see #5882), so we validate it explicitly here. We only care about value
// compatibility, though, since we won't compare values (except when there is an index, but that is
// validated by ColumnDefinition already).
if (!type->is_value_compatible_with(*def->type)) {
throw exceptions::configuration_exception(sprint("Cannot change %s from type %s to type %s: types are incompatible.",
column_name,
def->type->as_cql3_type(),
validator));
}
break;
}
// In any case, we update the column definition
cfm.with_altered_column_type(column_name->name(), type);
break;
}
case alter_table_statement::type::drop:
assert(column_name);
if (!schema->is_cql3_table()) {
throw exceptions::invalid_request_exception("Cannot drop columns from a non-CQL3 table");
}
if (!def) {
throw exceptions::invalid_request_exception(sprint("Column %s was not found in table %s", column_name, column_family()));
}
if (def->is_primary_key()) {
throw exceptions::invalid_request_exception(sprint("Cannot drop PRIMARY KEY part %s", column_name));
} else {
for (auto&& column_def : boost::range::join(schema->static_columns(), schema->regular_columns())) { // find
if (column_def.name() == column_name->name()) {
cfm.without_column(column_name->name());
break;
}
}
}
break;
// Cannot re-add a dropped counter column. See #7831.
if (meta.isCounter() && meta.getDroppedColumns().containsKey(columnName))
throw new InvalidRequestException(String.format("Cannot re-add previously dropped counter column %s", columnName));
case alter_table_statement::type::opts:
if (!_properties) {
throw exceptions::invalid_request_exception("ALTER COLUMNFAMILY WITH invoked, but no parameters found");
}
AbstractType<?> type = validator.getType();
if (type.isCollection() && type.isMultiCell())
{
if (!cfm.comparator.supportCollections())
throw new InvalidRequestException("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
if (cfm.isSuper())
throw new InvalidRequestException("Cannot use non-frozen collections with super column families");
_properties->validate();
// If there used to be a collection column with the same name (that has been dropped), it will
// still appear in the ColumnToCollectionType for reasons explained in #6276. The same
// reasons mean that we can't allow adding a new collection with that name (see the ticket for details).
if (cfm.comparator.hasCollections())
{
CollectionType previous = cfm.comparator.collectionType() == null ? null : cfm.comparator.collectionType().defined.get(columnName.bytes);
if (previous != null && !type.isCompatibleWith(previous))
throw new InvalidRequestException(String.format("Cannot add a collection with the name %s " +
"because a collection with the same name and a different type has already been used in the past", columnName));
}
if (schema->is_counter() && _properties->get_default_time_to_live() > 0) {
throw exceptions::invalid_request_exception("Cannot set default_time_to_live on a table with counters");
}
cfm.comparator = cfm.comparator.addOrUpdateCollection(columnName, (CollectionType)type);
_properties->apply_to_builder(cfm);
break;
case alter_table_statement::type::rename:
for (auto&& entry : _renames) {
auto from = entry.first->prepare_column_identifier(schema);
auto to = entry.second->prepare_column_identifier(schema);
auto def = schema->get_column_definition(from->name());
if (!def) {
throw exceptions::invalid_request_exception(sprint("Cannot rename unknown column %s in table %s", from, column_family()));
}
Integer componentIndex = cfm.comparator.isCompound() ? cfm.comparator.clusteringPrefixSize() : null;
cfm.addColumnDefinition(isStatic
? ColumnDefinition.staticDef(cfm, columnName.bytes, type, componentIndex)
: ColumnDefinition.regularDef(cfm, columnName.bytes, type, componentIndex));
break;
case ALTER:
assert columnName != null;
if (def == null)
throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
AbstractType<?> validatorType = validator.getType();
switch (def.kind)
{
case PARTITION_KEY:
if (validatorType instanceof CounterColumnType)
throw new InvalidRequestException(String.format("counter type is not supported for PRIMARY KEY part %s", columnName));
if (cfm.getKeyValidator() instanceof CompositeType)
{
List<AbstractType<?>> oldTypes = ((CompositeType) cfm.getKeyValidator()).types;
if (!validatorType.isValueCompatibleWith(oldTypes.get(def.position())))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
oldTypes.get(def.position()).asCQL3Type(),
validator));
List<AbstractType<?>> newTypes = new ArrayList<AbstractType<?>>(oldTypes);
newTypes.set(def.position(), validatorType);
cfm.keyValidator(CompositeType.getInstance(newTypes));
}
else
{
if (!validatorType.isValueCompatibleWith(cfm.getKeyValidator()))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
cfm.getKeyValidator().asCQL3Type(),
validator));
cfm.keyValidator(validatorType);
}
break;
case CLUSTERING_COLUMN:
if (!cfm.isCQL3Table())
throw new InvalidRequestException(String.format("Cannot alter clustering column %s in a non-CQL3 table", columnName));
AbstractType<?> oldType = cfm.comparator.subtype(def.position());
// Note that CFMetaData.validateCompatibility already validate the change we're about to do. However, the error message it
// sends is a bit cryptic for a CQL3 user, so validating here for a sake of returning a better error message
// Do note that we need isCompatibleWith here, not just isValueCompatibleWith.
if (!validatorType.isCompatibleWith(oldType))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are not order-compatible.",
columnName,
oldType.asCQL3Type(),
validator));
cfm.comparator = cfm.comparator.setSubtype(def.position(), validatorType);
break;
case COMPACT_VALUE:
// See below
if (!validatorType.isValueCompatibleWith(cfm.getDefaultValidator()))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
cfm.getDefaultValidator().asCQL3Type(),
validator));
cfm.defaultValidator(validatorType);
break;
case REGULAR:
case STATIC:
// Thrift allows to change a column validator so CFMetaData.validateCompatibility will let it slide
// if we change to an incompatible type (contrarily to the comparator case). But we don't want to
// allow it for CQL3 (see #5882) so validating it explicitly here. We only care about value compatibility
// though since we won't compare values (except when there is an index, but that is validated by
// ColumnDefinition already).
if (!validatorType.isValueCompatibleWith(def.type))
throw new ConfigurationException(String.format("Cannot change %s from type %s to type %s: types are incompatible.",
columnName,
def.type.asCQL3Type(),
validator));
// For collections, if we alter the type, we need to update the comparator too since it includes
// the type too (note that isValueCompatibleWith above has validated that the new type doesn't
// change the underlying sorting order, but we still don't want to have a discrepancy between the type
// in the comparator and the one in the ColumnDefinition as that would be dodgy).
if (validatorType.isCollection() && validatorType.isMultiCell())
cfm.comparator = cfm.comparator.addOrUpdateCollection(def.name, (CollectionType)validatorType);
break;
if (schema->get_column_definition(to->name())) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s to %s in table %s; another column of that name already exist", from, to, column_family()));
}
// In any case, we update the column definition
cfm.addOrReplaceColumnDefinition(def.withNewType(validatorType));
break;
case DROP:
assert columnName != null;
if (!cfm.isCQL3Table())
throw new InvalidRequestException("Cannot drop columns from a non-CQL3 table");
if (def == null)
throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
switch (def.kind)
{
case PARTITION_KEY:
case CLUSTERING_COLUMN:
throw new InvalidRequestException(String.format("Cannot drop PRIMARY KEY part %s", columnName));
case REGULAR:
case STATIC:
ColumnDefinition toDelete = null;
for (ColumnDefinition columnDef : cfm.regularAndStaticColumns())
{
if (columnDef.name.equals(columnName))
toDelete = columnDef;
}
assert toDelete != null;
cfm.removeColumnDefinition(toDelete);
cfm.recordColumnDrop(toDelete);
break;
if (def->is_part_of_cell_name()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename non PRIMARY KEY part %s", from));
}
break;
case OPTS:
if (cfProps == null)
throw new InvalidRequestException(String.format("ALTER COLUMNFAMILY WITH invoked, but no parameters found"));
cfProps.validate();
if (meta.isCounter() && cfProps.getDefaultTimeToLive() > 0)
throw new InvalidRequestException("Cannot set default_time_to_live on a table with counters");
cfProps.applyToCFMetadata(cfm);
break;
case RENAME:
for (Map.Entry<ColumnIdentifier.Raw, ColumnIdentifier.Raw> entry : renames.entrySet())
{
ColumnIdentifier from = entry.getKey().prepare(cfm);
ColumnIdentifier to = entry.getValue().prepare(cfm);
cfm.renameColumn(from, to);
if (def->is_indexed()) {
throw exceptions::invalid_request_exception(sprint("Cannot rename column %s because it is secondary indexed", from));
}
break;
cfm.with_column_rename(from->name(), to->name());
}
break;
}
MigrationManager.announceColumnFamilyUpdate(cfm, false, isLocalOnly);
return true;
#endif
return service::get_local_migration_manager().announce_column_family_update(cfm.build(), false, is_local_only).then([] {
return true;
});
}
shared_ptr<transport::event::schema_change> alter_table_statement::change_event()

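For example, with the checks ported above, ALTER TABLE ... ADD of a non-frozen collection on a table whose primary key is not compound is now rejected with "Cannot use non-frozen collections with a non-composite PRIMARY KEY", and re-adding a previously dropped counter column fails per #7831, matching the removed Java paths.
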

@@ -38,6 +38,7 @@
*/
#include "batch_statement.hh"
#include "db/config.hh"
namespace cql3 {
@@ -55,6 +56,50 @@ bool batch_statement::depends_on_column_family(const sstring& cf_name) const
return false;
}
void batch_statement::verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = service::get_local_storage_proxy().get_db().local().get_config().batch_size_warn_threshold_in_kb();
class my_partition_visitor : public mutation_partition_visitor {
public:
void accept_partition_tombstone(tombstone) override {}
void accept_static_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_static_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
void accept_row_tombstone(clustering_key_prefix_view, tombstone) override {}
void accept_row(clustering_key_view, tombstone, const row_marker&) override {}
void accept_row_cell(column_id, atomic_cell_view v) override {
size += v.value().size();
}
void accept_row_cell(column_id, collection_mutation_view v) override {
size += v.data.size();
}
size_t size = 0;
};
my_partition_visitor v;
for (auto& m : mutations) {
m.partition().accept(*m.schema(), v);
}
auto size = v.size / 1024;
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
_logger.warn(
"Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}",
join(", ", ks_cf_pairs), size, warn_threshold,
size - warn_threshold, "");
}
}
}
}
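
Worked example: if a batch's cell values total 8192 bytes, size = v.size / 1024 = 8 KB; with the default batch_size_warn_threshold_in_kb of 5 (see the scylla.yaml hunk above), the warning reports the batch as exceeding the threshold by 3.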


@@ -196,27 +196,8 @@ public:
* Checks batch size to ensure threshold is met. If not, a warning is logged.
* @param cfs ColumnFamilies that will store the batch's mutations.
*/
static void verify_batch_size(const std::vector<mutation>& mutations) {
size_t warn_threshold = 1000; // FIXME: database_descriptor::get_batch_size_warn_threshold();
size_t fail_threshold = 2000; // FIXME: database_descriptor::get_batch_size_fail_threshold();
static void verify_batch_size(const std::vector<mutation>& mutations);
size_t size = mutations.size();
if (size > warn_threshold) {
std::unordered_set<sstring> ks_cf_pairs;
for (auto&& m : mutations) {
ks_cf_pairs.insert(m.schema()->ks_name() + "." + m.schema()->cf_name());
}
const char* format = "Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}";
if (size > fail_threshold) {
// FIXME: Tracing.trace(format, new Object[] {ksCfPairs, size, failThreshold, size - failThreshold, " (see batch_size_fail_threshold_in_kb)"});
_logger.error(format, join(", ", ks_cf_pairs), size, fail_threshold, size - fail_threshold, " (see batch_size_fail_threshold_in_kb)");
throw exceptions::invalid_request_exception("Batch too large");
} else {
_logger.warn(format, join(", ", ks_cf_pairs), size, warn_threshold, size - warn_threshold, "");
}
}
}
virtual future<shared_ptr<transport::messages::result_message>> execute(
distributed<service::storage_proxy>& storage, service::query_state& state, const query_options& options) override {
return execute(storage, state, options, false, options.get_timestamp(state));


@@ -81,7 +81,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
auto cd = schema->get_column_definition(target->column->name());
if (cd == nullptr) {
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", target->column->name()));
throw exceptions::invalid_request_exception(sprint("No column definition found for column %s", *target->column));
}
bool is_map = dynamic_cast<const collection_type_impl *>(cd->type.get()) != nullptr
@@ -93,7 +93,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint("Cannot create index on %s of frozen<map> column %s",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
} else {
// validateNotFullIndex
@@ -107,7 +107,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s; only non-frozen collections support %s indexes",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(target->type)));
}
// validateTargetColumnIsMapIfIndexInvolvesKeys
@@ -118,7 +118,7 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
sprint(
"Cannot create index on %s of column %s with non-map type",
index_target::index_option(target->type),
target->column->name()));
*target->column));
}
}
@@ -132,9 +132,9 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
"Cannot create index on %s(%s): an index on %s(%s) already exists and indexing "
"a map on more than one dimension at the same time is not currently supported",
index_target::index_option(target->type),
target->column->name(),
*target->column,
index_target::index_option(prev_type),
target->column->name()));
*target->column));
}
if (_if_not_exists) {
return;
@@ -164,12 +164,13 @@ cql3::statements::create_index_statement::validate(distributed<service::storage_
throw exceptions::invalid_request_exception(
sprint(
"Cannot create secondary index on partition key column %s",
target->column->name()));
*target->column));
}
}
future<bool>
cql3::statements::create_index_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) {
throw std::runtime_error("Indexes are not supported yet");
auto schema = proxy.local().get_db().local().find_schema(keyspace(), column_family());
auto target = _raw_target->prepare(schema);


@@ -270,7 +270,7 @@ modification_statement::read_required_rows(
for (auto&& pk : *keys) {
pr.emplace_back(dht::global_partitioner().decorate_key(*s, pk));
}
query::read_command cmd(s->id(), ps, std::numeric_limits<uint32_t>::max());
query::read_command cmd(s->id(), s->version(), ps, std::numeric_limits<uint32_t>::max());
// FIXME: ignoring "local"
return proxy.local().query(s, make_lw_shared(std::move(cmd)), std::move(pr), cl).then([this, ps] (auto result) {
// FIXME: copying


@@ -218,22 +218,24 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit, to_gc_clock(now));
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit, to_gc_clock(now));
int32_t page_size = options.get_page_size();
// An aggregation query will never be paged for the user, but we always page it internally to avoid OOM.
// If the user provided a page_size, we'll use that to page internally (because why not); otherwise we use our default
// Note that if there are some nodes in the cluster with a version less than 2.0, we can't use paging (CASSANDRA-6707).
if (_selection->is_aggregate() && page_size <= 0) {
auto aggregate = _selection->is_aggregate();
if (aggregate && page_size <= 0) {
page_size = DEFAULT_COUNT_PAGE_SIZE;
}
auto key_ranges = _restrictions->get_partition_key_ranges(options);
if (page_size <= 0
if (!aggregate && (page_size <= 0
|| !service::pager::query_pagers::may_need_paging(page_size,
*command, key_ranges)) {
*command, key_ranges))) {
return execute(proxy, command, std::move(key_ranges), state, options,
now);
}
@@ -241,7 +243,7 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
auto p = service::pager::query_pagers::pager(_schema, _selection,
state, options, command, std::move(key_ranges));
if (_selection->is_aggregate()) {
if (aggregate) {
return do_with(
cql3::selection::result_set_builder(*_selection, now,
options.get_serialization_format()),
@@ -308,7 +310,8 @@ future<::shared_ptr<transport::messages::result_message>>
select_statement::execute_internal(distributed<service::storage_proxy>& proxy, service::query_state& state, const query_options& options) {
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), make_partition_slice(options), limit);
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit);
auto partition_ranges = _restrictions->get_partition_key_ranges(options);
if (needs_post_query_ordering() && _limit) {

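The upshot of the aggregate change: a query such as count(*) is now always driven through the pager, using DEFAULT_COUNT_PAGE_SIZE when the client supplied no page size, so an unpaged aggregate over a large table can no longer build the whole result set in memory, while non-aggregate queries keep the old may_need_paging() fast path.
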

@@ -59,7 +59,7 @@ bool update_statement::require_full_clustering_key() const {
void update_statement::add_update_for_key(mutation& m, const exploded_clustering_prefix& prefix, const update_parameters& params) {
if (s->is_dense()) {
if (!prefix || (prefix.size() == 1 && prefix.components().front().empty())) {
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", *s->clustering_key_columns().begin()));
throw exceptions::invalid_request_exception(sprint("Missing PRIMARY KEY part %s", s->clustering_key_columns().begin()->name_as_text()));
}
// An empty name for the compact value is what we use to recognize the case where there is not column


@@ -23,6 +23,7 @@
#include "database.hh"
#include "unimplemented.hh"
#include "core/future-util.hh"
#include "db/commitlog/commitlog_entry.hh"
#include "db/system_keyspace.hh"
#include "db/consistency_level.hh"
#include "db/serializer.hh"
@@ -57,6 +58,8 @@
#include <seastar/core/enum.hh>
#include "utils/latency.hh"
#include "utils/flush_queue.hh"
#include "schema_registry.hh"
#include "service/priority_manager.hh"
using namespace std::chrono_literals;
@@ -126,9 +129,9 @@ column_family::make_partition_presence_checker(lw_shared_ptr<sstable_list> old_s
mutation_source
column_family::sstables_as_mutation_source() {
return [this] (const query::partition_range& r) {
return make_sstable_reader(r);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& r, const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, pc);
});
}
// define in .cc, since sstable is forward-declared in .hh
@@ -153,10 +156,14 @@ class range_sstable_reader final : public mutation_reader::impl {
const query::partition_range& _pr;
lw_shared_ptr<sstable_list> _sstables;
mutation_reader _reader;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr)
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr, const io_priority_class& pc)
: _pr(pr)
, _sstables(std::move(sstables))
, _pc(&pc)
{
std::vector<mutation_reader> readers;
for (const lw_shared_ptr<sstables::sstable>& sst : *_sstables | boost::adaptors::map_values) {
@@ -183,11 +190,15 @@ class single_key_sstable_reader final : public mutation_reader::impl {
mutation_opt _m;
bool _done = false;
lw_shared_ptr<sstable_list> _sstables;
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class* _pc;
public:
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key)
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key, const io_priority_class& pc)
: _schema(std::move(schema))
, _key(sstables::key::from_partition_key(*_schema, key))
, _sstables(std::move(sstables))
, _pc(&pc)
{ }
virtual future<mutation_opt> operator()() override {
@@ -206,26 +217,26 @@ public:
};
mutation_reader
column_family::make_sstable_reader(const query::partition_range& pr) const {
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr, const io_priority_class& pc) const {
if (pr.is_singular() && pr.start()->value().has_key()) {
const dht::ring_position& pos = pr.start()->value();
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return make_mutation_reader<single_key_sstable_reader>(_schema, _sstables, *pos.key());
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), pc);
} else {
// range_sstable_reader is not movable so we need to wrap it
return make_mutation_reader<range_sstable_reader>(_schema, _sstables, pr);
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, pc);
}
}
key_source column_family::sstables_as_key_source() const {
return [this] (const query::partition_range& range) {
return key_source([this] (const query::partition_range& range, const io_priority_class& pc) {
std::vector<key_reader> readers;
readers.reserve(_sstables->size());
std::transform(_sstables->begin(), _sstables->end(), std::back_inserter(readers), [&] (auto&& entry) {
auto& sst = entry.second;
auto rd = sstables::make_key_reader(_schema, sst, range);
auto rd = sstables::make_key_reader(_schema, sst, range, pc);
if (sst->is_shared()) {
rd = make_filtering_reader(std::move(rd), [] (const dht::decorated_key& dk) {
return dht::shard_of(dk.token()) == engine().cpu_id();
@@ -234,14 +245,14 @@ key_source column_family::sstables_as_key_source() const {
return rd;
});
return make_combined_reader(_schema, std::move(readers));
};
});
}
// Exposed for testing, not performance critical.
future<column_family::const_mutation_partition_ptr>
column_family::find_partition(const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [this] (auto& range) {
return do_with(this->make_reader(range), [] (mutation_reader& reader) {
column_family::find_partition(schema_ptr s, const dht::decorated_key& key) const {
return do_with(query::partition_range::make_singular(key), [s = std::move(s), this] (auto& range) {
return do_with(this->make_reader(s, range), [] (mutation_reader& reader) {
return reader().then([] (mutation_opt&& mo) -> std::unique_ptr<const mutation_partition> {
if (!mo) {
return {};
@@ -253,13 +264,13 @@ column_family::find_partition(const dht::decorated_key& key) const {
}
future<column_family::const_mutation_partition_ptr>
column_family::find_partition_slow(const partition_key& key) const {
return find_partition(dht::global_partitioner().decorate_key(*_schema, key));
column_family::find_partition_slow(schema_ptr s, const partition_key& key) const {
return find_partition(s, dht::global_partitioner().decorate_key(*s, key));
}
future<column_family::const_row_ptr>
column_family::find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, clustering_key clustering_key) const {
return find_partition(std::move(s), partition_key).then([clustering_key = std::move(clustering_key)] (const_mutation_partition_ptr p) {
if (!p) {
return make_ready_future<const_row_ptr>();
}
@@ -274,8 +285,8 @@ column_family::find_row(const dht::decorated_key& partition_key, clustering_key
}
mutation_reader
column_family::make_reader(const query::partition_range& range) const {
if (query::is_wrap_around(range, *_schema)) {
column_family::make_reader(schema_ptr s, const query::partition_range& range, const io_priority_class& pc) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
}
@@ -304,21 +315,22 @@ column_family::make_reader(const query::partition_range& range) const {
// https://github.com/scylladb/scylla/issues/185
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(range));
readers.emplace_back(mt->make_reader(s, range));
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(range));
readers.emplace_back(_cache.make_reader(s, range, pc));
} else {
readers.emplace_back(make_sstable_reader(range));
readers.emplace_back(make_sstable_reader(s, range, pc));
}
return make_combined_reader(std::move(readers));
}
// Not performance critical. Currently used for testing only.
template <typename Func>
future<bool>
column_family::for_all_partitions(Func&& func) const {
column_family::for_all_partitions(schema_ptr s, Func&& func) const {
static_assert(std::is_same<bool, std::result_of_t<Func(const dht::decorated_key&, const mutation_partition&)>>::value,
"bad Func signature");
@@ -329,13 +341,13 @@ column_family::for_all_partitions(Func&& func) const {
bool empty = false;
public:
bool done() const { return !ok || empty; }
iteration_state(const column_family& cf, Func&& func)
: reader(cf.make_reader())
iteration_state(schema_ptr s, const column_family& cf, Func&& func)
: reader(cf.make_reader(std::move(s)))
, func(std::move(func))
{ }
};
return do_with(iteration_state(*this, std::move(func)), [] (iteration_state& is) {
return do_with(iteration_state(std::move(s), *this, std::move(func)), [] (iteration_state& is) {
return do_until([&is] { return is.done(); }, [&is] {
return is.reader().then([&is](mutation_opt&& mo) {
if (!mo) {
@@ -351,8 +363,8 @@ column_family::for_all_partitions(Func&& func) const {
}
future<bool>
column_family::for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(func));
column_family::for_all_partitions_slow(schema_ptr s, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const {
return for_all_partitions(std::move(s), std::move(func));
}
class lister {
@@ -462,7 +474,15 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
}
update_sstables_known_generation(comps.generation);
assert(_sstables->count(comps.generation) == 0);
{
auto i = _sstables->find(comps.generation);
if (i != _sstables->end()) {
auto new_toc = sstdir + "/" + fname;
throw std::runtime_error(sprint("Attempted to add sstable generation %d twice: new=%s existing=%s",
comps.generation, new_toc, i->second->toc_filename()));
}
}
auto fut = sstable::get_sstable_key_range(*_schema, _schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return std::move(fut).then([this, sstdir = std::move(sstdir), comps] (range<partition_key> r) {
@@ -584,27 +604,20 @@ column_family::try_flush_memtable_to_sstable(lw_shared_ptr<memtable> old) {
_config.cf_stats->pending_memtables_flushes_bytes += memtable_size;
newtab->set_unshared();
dblog.debug("Flushing to {}", newtab->get_filename());
return newtab->write_components(*old).then([this, newtab, old] {
return newtab->open_data().then([this, newtab] {
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will back up sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as-is guarantees that we'll never partially back up a
// single sstable, so that is enough of a guarantee.
if (!incremental_backups_enabled()) {
return make_ready_future<>();
}
auto dir = newtab->get_dir() + "/backups/";
return touch_directory(dir).then([dir, newtab] {
return newtab->create_links(dir);
});
});
// Note that due to our sharded architecture, it is possible that
// in the face of a value change some shards will back up sstables
// while others won't.
//
// This is, in theory, possible to mitigate through a rwlock.
// However, this doesn't differ from the situation where all tables
// are coming from a single shard and the toggle happens in the
// middle of them.
//
// The code as-is guarantees that we'll never partially back up a
// single sstable, so that is enough of a guarantee.
auto&& priority = service::get_local_memtable_flush_priority();
return newtab->write_components(*old, incremental_backups_enabled(), priority).then([this, newtab, old] {
return newtab->open_data();
}).then_wrapped([this, old, newtab, memtable_size] (future<> ret) {
_config.cf_stats->pending_memtables_flushes_count--;
_config.cf_stats->pending_memtables_flushes_bytes -= memtable_size;
@@ -709,68 +722,119 @@ column_family::reshuffle_sstables(int64_t start) {
});
}
void
column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove) {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_remove.begin(), sstables_to_remove.end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
}
}
for (const auto& newtab : new_sstables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab->data_size());
_sstables->emplace(newtab->generation(), newtab);
}
for (const auto& oldtab : sstables_to_remove) {
oldtab->mark_for_deletion();
}
}
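
rebuild_sstable_list() is a copy-on-write publish: readers already holding the old lw_shared_ptr<sstable_list> keep a consistent snapshot, while new readers pick up the rebuilt list. A minimal sketch of the same pattern, with std::shared_ptr standing in for lw_shared_ptr and plain ids for sstables (names are illustrative):

#include <memory>
#include <unordered_set>
#include <vector>

using sstable_id = unsigned long;
using sstable_list_ptr = std::shared_ptr<const std::vector<sstable_id>>;

// Build a fresh list instead of mutating in place; readers of the old
// snapshot are unaffected until they re-fetch the pointer.
sstable_list_ptr rebuild(const sstable_list_ptr& current,
                         const std::vector<sstable_id>& added,
                         const std::vector<sstable_id>& removed) {
    std::unordered_set<sstable_id> gone(removed.begin(), removed.end());
    auto next = std::make_shared<std::vector<sstable_id>>();
    for (auto id : *current) {
        if (!gone.count(id)) {      // keep everything not being removed
            next->push_back(id);
        }
    }
    next->insert(next->end(), added.begin(), added.end());
    return next;
}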
future<>
column_family::compact_sstables(sstables::compaction_descriptor descriptor) {
column_family::compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup) {
if (!descriptor.sstables.size()) {
// if there is nothing to compact, just return.
return make_ready_future<>();
}
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor)] {
return with_lock(_sstables_lock.for_read(), [this, descriptor = std::move(descriptor), cleanup] {
auto sstables_to_compact = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
auto new_tables = make_lw_shared<std::vector<
std::pair<unsigned, sstables::shared_sstable>>>();
auto new_tables = make_lw_shared<std::vector<sstables::shared_sstable>>();
auto create_sstable = [this, new_tables] {
// FIXME: this generation calculation should be in a function.
auto gen = _sstable_generation++ * smp::count + engine().cpu_id();
auto gen = this->calculate_generation_for_new_table();
// FIXME: use "tmp" marker in names of incomplete sstable
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), _config.datadir, gen,
sstables::sstable::version_types::ka,
sstables::sstable::format_types::big);
sst->set_unshared();
new_tables->emplace_back(gen, sst);
new_tables->emplace_back(sst);
return sst;
};
return sstables::compact_sstables(*sstables_to_compact, *this,
create_sstable, descriptor.max_sstable_bytes, descriptor.level).then([this, new_tables, sstables_to_compact] {
// Build a new list of _sstables: We remove from the existing list the
// tables we compacted (by now, there might be more sstables flushed
// later), and we add the new tables generated by the compaction.
// We create a new list rather than modifying it in-place, so that
// on-going reads can continue to use the old list.
auto current_sstables = _sstables;
_sstables = make_lw_shared<sstable_list>();
// zeroing live_disk_space_used and live_sstable_count because the
// sstable list is re-created below.
_stats.live_disk_space_used = 0;
_stats.live_sstable_count = 0;
std::unordered_set<sstables::shared_sstable> s(
sstables_to_compact->begin(), sstables_to_compact->end());
for (const auto& oldtab : *current_sstables) {
// Checks if oldtab is a sstable not being compacted.
if (!s.count(oldtab.second)) {
update_stats_for_new_sstable(oldtab.second->data_size());
_sstables->emplace(oldtab.first, oldtab.second);
create_sstable, descriptor.max_sstable_bytes, descriptor.level, cleanup).then([this, new_tables, sstables_to_compact] {
this->rebuild_sstable_list(*new_tables, *sstables_to_compact);
}).then_wrapped([this, new_tables] (future<> f) {
try {
f.get();
} catch (...) {
// Delete either partially or fully written sstables of a compaction that
// was either stopped abruptly (e.g. out of disk space) or deliberately
// (e.g. nodetool stop COMPACTION).
for (auto& sst : *new_tables) {
dblog.debug("Deleting sstable {} of interrupted compaction for {}/{}", sst->get_filename(), _schema->ks_name(), _schema->cf_name());
sst->mark_for_deletion();
}
}
for (const auto& newtab : *new_tables) {
// FIXME: rename the new sstable(s). Verify a rename doesn't cause
// problems for the sstable object.
update_stats_for_new_sstable(newtab.second->data_size());
_sstables->emplace(newtab.first, newtab.second);
}
for (const auto& oldtab : *sstables_to_compact) {
oldtab->mark_for_deletion();
throw;
}
});
});
}
static bool needs_cleanup(const lw_shared_ptr<sstables::sstable>& sst,
const lw_shared_ptr<std::vector<range<dht::token>>>& owned_ranges,
schema_ptr s) {
auto first = sst->get_first_partition_key(*s);
auto last = sst->get_last_partition_key(*s);
auto first_token = dht::global_partitioner().get_token(*s, first);
auto last_token = dht::global_partitioner().get_token(*s, last);
range<dht::token> sst_token_range = range<dht::token>::make(first_token, last_token);
// return true iff sst partition range isn't fully contained in any of the owned ranges.
for (auto& r : *owned_ranges) {
if (r.contains(sst_token_range, dht::token_comparator())) {
return false;
}
}
return true;
}
future<> column_family::cleanup_sstables(sstables::compaction_descriptor descriptor) {
std::vector<range<dht::token>> r = service::get_local_storage_service().get_local_ranges(_schema->ks_name());
auto owned_ranges = make_lw_shared<std::vector<range<dht::token>>>(std::move(r));
auto sstables_to_cleanup = make_lw_shared<std::vector<sstables::shared_sstable>>(std::move(descriptor.sstables));
return parallel_for_each(*sstables_to_cleanup, [this, owned_ranges = std::move(owned_ranges), sstables_to_cleanup] (auto& sst) {
if (!owned_ranges->empty() && !needs_cleanup(sst, owned_ranges, _schema)) {
return make_ready_future<>();
}
std::vector<sstables::shared_sstable> sstable_to_compact({ sst });
return this->compact_sstables(sstables::compaction_descriptor(std::move(sstable_to_compact)), true);
});
}
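
needs_cleanup() reduces to interval containment: the sstable can be skipped iff its [first_token, last_token] span lies entirely within one locally owned range. A self-contained sketch over plain integer tokens, assuming non-wrapping ranges for simplicity:

#include <vector>

struct token_range {
    long first, last;   // inclusive bounds
    bool contains(const token_range& o) const {
        return first <= o.first && o.last <= last;
    }
};

bool needs_cleanup_sketch(const token_range& sst, const std::vector<token_range>& owned) {
    for (const auto& r : owned) {
        if (r.contains(sst)) {
            return false;   // fully owned: every key is still relevant
        }
    }
    return true;            // spills outside owned ranges: worth cleaning
}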
future<>
column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tables) {
return parallel_for_each(new_tables, [this] (auto comps) {
@@ -816,12 +880,9 @@ void column_family::trigger_compaction() {
}
}
future<> column_family::run_compaction() {
sstables::compaction_strategy strategy = _compaction_strategy;
return do_with(std::move(strategy), [this] (sstables::compaction_strategy& cs) {
return cs.compact(*this).then([this] {
_stats.pending_compactions--;
});
future<> column_family::run_compaction(sstables::compaction_descriptor descriptor) {
return compact_sstables(std::move(descriptor)).then([this] {
_stats.pending_compactions--;
});
}
@@ -975,8 +1036,6 @@ database::database(const db::config& cfg)
if (!_memtable_total_space) {
_memtable_total_space = memory::stats().total_memory() / 2;
}
bool durable = cfg.data_file_directories().size() > 0;
db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// Start compaction manager with two tasks for handling compaction jobs.
_compaction_manager.start(2);
setup_collectd();
@@ -1119,6 +1178,9 @@ future<> database::parse_system_tables(distributed<service::storage_proxy>& prox
future<>
database::init_system_keyspace() {
bool durable = _cfg->data_file_directories().size() > 0;
db::system_keyspace::make(*this, durable, _cfg->volatile_system_keyspace_for_testing());
// FIXME support multiple directories
return touch_directory(_cfg->data_file_directories()[0] + "/" + db::system_keyspace::NAME).then([this] {
return populate_keyspace(_cfg->data_file_directories()[0], db::system_keyspace::NAME).then([this]() {
@@ -1184,6 +1246,8 @@ void database::drop_keyspace(const sstring& name) {
}
void database::add_column_family(schema_ptr schema, column_family::config cfg) {
schema = local_schema_registry().learn(schema);
schema->registry_entry()->mark_synced();
auto uuid = schema->id();
lw_shared_ptr<column_family> cf;
if (cfg.enable_commitlog && _commitlog) {
@@ -1209,17 +1273,6 @@ void database::add_column_family(schema_ptr schema, column_family::config cfg) {
_ks_cf_to_uuid.emplace(std::move(kscf), uuid);
}
future<> database::update_column_family(const sstring& ks_name, const sstring& cf_name) {
auto& proxy = service::get_storage_proxy();
auto old_cfm = find_schema(ks_name, cf_name);
return db::schema_tables::create_table_from_name(proxy, ks_name, cf_name).then([old_cfm] (auto&& new_cfm) {
if (old_cfm->id() != new_cfm->id()) {
return make_exception_future<>(exceptions::configuration_exception(sprint("Column family ID mismatch (found %s; expected %s)", new_cfm->id(), old_cfm->id())));
}
return make_exception_future<>(std::runtime_error("update column family not implemented"));
});
}
future<> database::drop_column_family(db_clock::time_point dropped_at, const sstring& ks_name, const sstring& cf_name) {
auto uuid = find_uuid(ks_name, cf_name);
auto& ks = find_keyspace(ks_name);
@@ -1483,13 +1536,17 @@ compare_atomic_cell_for_merge(atomic_cell_view left, atomic_cell_view right) {
}
struct query_state {
explicit query_state(const query::read_command& cmd, const std::vector<query::partition_range>& ranges)
: cmd(cmd)
explicit query_state(schema_ptr s,
const query::read_command& cmd,
const std::vector<query::partition_range>& ranges)
: schema(std::move(s))
, cmd(cmd)
, builder(cmd.slice)
, limit(cmd.row_limit)
, current_partition_range(ranges.begin())
, range_end(ranges.end()){
}
schema_ptr schema;
const query::read_command& cmd;
query::result::builder builder;
uint32_t limit;
@@ -1503,21 +1560,21 @@ struct query_state {
};
future<lw_shared_ptr<query::result>>
column_family::query(const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
column_family::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& partition_ranges) {
utils::latency_counter lc;
_stats.reads.set_latency(lc);
return do_with(query_state(cmd, partition_ranges), [this] (query_state& qs) {
return do_with(query_state(std::move(s), cmd, partition_ranges), [this] (query_state& qs) {
return do_until(std::bind(&query_state::done, &qs), [this, &qs] {
auto&& range = *qs.current_partition_range++;
qs.reader = make_reader(range);
qs.reader = make_reader(qs.schema, range, service::get_local_sstable_query_read_priority());
qs.range_empty = false;
return do_until([&qs] { return !qs.limit || qs.range_empty; }, [this, &qs] {
return qs.reader().then([this, &qs](mutation_opt mo) {
return do_until([&qs] { return !qs.limit || qs.range_empty; }, [&qs] {
return qs.reader().then([&qs](mutation_opt mo) {
if (mo) {
auto p_builder = qs.builder.add_partition(*mo->schema(), mo->key());
auto is_distinct = qs.cmd.slice.options.contains(query::partition_slice::option::distinct);
auto limit = !is_distinct ? qs.limit : 1;
mo->partition().query(p_builder, *_schema, qs.cmd.timestamp, limit);
mo->partition().query(p_builder, *qs.schema, qs.cmd.timestamp, limit);
qs.limit -= p_builder.row_count();
} else {
qs.range_empty = true;
@@ -1538,21 +1595,21 @@ column_family::query(const query::read_command& cmd, const std::vector<query::pa
mutation_source
column_family::as_mutation_source() const {
return [this] (const query::partition_range& range) {
return this->make_reader(range);
};
return mutation_source([this] (schema_ptr s, const query::partition_range& range, const io_priority_class& pc) {
return this->make_reader(std::move(s), range, pc);
});
}
future<lw_shared_ptr<query::result>>
database::query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
database::query(schema_ptr s, const query::read_command& cmd, const std::vector<query::partition_range>& ranges) {
column_family& cf = find_column_family(cmd.cf_id);
return cf.query(cmd, ranges);
return cf.query(std::move(s), cmd, ranges);
}
future<reconcilable_result>
database::query_mutations(const query::read_command& cmd, const query::partition_range& range) {
database::query_mutations(schema_ptr s, const query::read_command& cmd, const query::partition_range& range) {
column_family& cf = find_column_family(cmd.cf_id);
return mutation_query(cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
return mutation_query(std::move(s), cf.as_mutation_source(), range, cmd.slice, cmd.row_limit, cmd.timestamp);
}
std::unordered_set<sstring> database::get_initial_tokens() {
@@ -1597,7 +1654,8 @@ std::ostream& operator<<(std::ostream& out, const atomic_cell_or_collection& c)
}
std::ostream& operator<<(std::ostream& os, const mutation& m) {
fprint(os, "{mutation: schema %p key %s data ", m.schema().get(), m.decorated_key());
const ::schema& s = *m.schema();
fprint(os, "{%s.%s key %s data ", s.ks_name(), s.cf_name(), m.decorated_key());
os << m.partition() << "}";
return os;
}
@@ -1616,28 +1674,74 @@ std::ostream& operator<<(std::ostream& out, const database& db) {
return out;
}
future<> database::apply_in_memory(const frozen_mutation& m, const db::replay_position& rp) {
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, m_schema, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
void
column_family::seal_on_overflow() {
++_mutation_count;
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
_mutation_count = 0;
seal_active_memtable();
}
}
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
future<> database::apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& rp) {
try {
auto& cf = find_column_family(m.column_family_id());
cf.apply(m, rp);
cf.apply(m, m_schema, rp);
} catch (no_such_column_family&) {
dblog.error("Attempting to mutate non-existent table {}", m.column_family_id());
}
return make_ready_future<>();
}
future<> database::do_apply(const frozen_mutation& m) {
future<> database::do_apply(schema_ptr s, const frozen_mutation& m) {
// I'm doing a nullcheck here since the init code path for db etc
// is a little in flux and commitlog is created only when db is
// initialized from datadir.
auto& cf = find_column_family(m.column_family_id());
auto uuid = m.column_family_id();
auto& cf = find_column_family(uuid);
if (!s->is_synced()) {
throw std::runtime_error(sprint("attempted to mutate using not synced schema of %s.%s, version=%s",
s->ks_name(), s->cf_name(), s->version()));
}
if (cf.commitlog() != nullptr) {
auto uuid = m.column_family_id();
bytes_view repr = m.representation();
auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
return cf.commitlog()->add_mutation(uuid, repr.size(), write_repr).then([&m, this](auto rp) {
commitlog_entry_writer cew(s, m);
return cf.commitlog()->add_entry(uuid, cew).then([&m, this, s](auto rp) {
try {
return this->apply_in_memory(m, rp);
return this->apply_in_memory(m, s, rp);
} catch (replay_position_reordered_exception&) {
// expensive, but we're assuming this is super rare.
// if we failed to apply the mutation due to future re-ordering
@@ -1645,11 +1749,11 @@ future<> database::do_apply(const frozen_mutation& m) {
// let's just try again, add the mutation to the CL once more,
// and assume success is inevitable, eventually.
dblog.debug("replay_position reordering detected");
return this->apply(m);
return this->apply(s, m);
}
});
}
return apply_in_memory(m, db::replay_position());
return apply_in_memory(m, s, db::replay_position());
}
future<> database::throttle() {
@@ -1683,9 +1787,12 @@ void database::unthrottle() {
}
}
future<> database::apply(const frozen_mutation& m) {
return throttle().then([this, &m] {
return do_apply(m);
future<> database::apply(schema_ptr s, const frozen_mutation& m) {
if (dblog.is_enabled(logging::log_level::trace)) {
dblog.trace("apply {}", m.pretty_printer(s));
}
return throttle().then([this, &m, s = std::move(s)] {
return do_apply(std::move(s), m);
});
}
@@ -2226,3 +2333,15 @@ std::ostream& operator<<(std::ostream& os, const keyspace_metadata& m) {
os << "}";
return os;
}
void column_family::set_schema(schema_ptr s) {
dblog.debug("Changing schema version of {}.{} ({}) from {} to {}",
_schema->ks_name(), _schema->cf_name(), _schema->id(), _schema->version(), s->version());
for (auto& m : *_memtables) {
m->set_schema(s);
}
_cache.set_schema(s);
_schema = std::move(s);
}


@@ -64,7 +64,7 @@
#include "mutation_reader.hh"
#include "row_cache.hh"
#include "compaction_strategy.hh"
#include "utils/compaction_manager.hh"
#include "sstables/compaction_manager.hh"
#include "utils/exponential_backoff_retry.hh"
#include "utils/histogram.hh"
#include "sstables/estimated_histogram.hh"
@@ -172,6 +172,9 @@ private:
int _compaction_disabled = 0;
class memtable_flush_queue;
std::unique_ptr<memtable_flush_queue> _flush_queue;
// Stores the generations of sstables being compacted at the moment. Needed to
// prevent an sstable from being compacted twice.
std::unordered_set<unsigned long> _compacting_generations;
private:
void update_stats_for_new_sstable(uint64_t new_sstable_data_size);
void add_sstable(sstables::sstable&& sstable);
@@ -185,11 +188,20 @@ private:
void update_sstables_known_generation(unsigned generation) {
_sstable_generation = std::max<uint64_t>(_sstable_generation, generation / smp::count + 1);
}
uint64_t calculate_generation_for_new_table() {
return _sstable_generation++ * smp::count + engine().cpu_id();
}
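
calculate_generation_for_new_table() hands each shard a disjoint arithmetic progression, so concurrently flushing or compacting shards can never pick the same generation: with smp::count == 8, shard 0 yields 0, 8, 16, ... while shard 3 yields 3, 11, 19, .... The arithmetic on its own:

#include <cstdint>

// gen = counter++ * shard_count + shard_id; the sequences are disjoint per shard.
// update_sstables_known_generation() above inverts this with
// generation / shard_count + 1 to keep the counter ahead of anything on disk.
uint64_t next_generation(uint64_t& counter, unsigned shard_count, unsigned shard_id) {
    return counter++ * shard_count + shard_id;
}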
// Rebuild existing _sstables with new_sstables added to it and sstables_to_remove removed from it.
void rebuild_sstable_list(const std::vector<sstables::shared_sstable>& new_sstables,
const std::vector<sstables::shared_sstable>& sstables_to_remove);
private:
// Creates a mutation reader which covers sstables.
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// The 'range' parameter must be live as long as the reader is used.
mutation_reader make_sstable_reader(const query::partition_range& range) const;
// Mutations returned by the reader will all have given schema.
mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range, const io_priority_class& pc) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
@@ -200,7 +212,12 @@ public:
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// Note: for data queries use query() instead.
// The 'range' parameter must be live as long as the reader is used.
mutation_reader make_reader(const query::partition_range& range = query::full_partition_range) const;
// Mutations returned by the reader will all have given schema.
// If I/O needs to be issued to read anything in the specified range, the operations
// will be scheduled under the priority class given by pc.
mutation_reader make_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range,
const io_priority_class& pc = default_priority_class()) const;
mutation_source as_mutation_source() const;
@@ -225,16 +242,21 @@ public:
column_family(schema_ptr schema, config cfg, no_commitlog, compaction_manager&);
column_family(column_family&&) = delete; // 'this' is being captured during construction
~column_family();
schema_ptr schema() const { return _schema; }
const schema_ptr& schema() const { return _schema; }
void set_schema(schema_ptr);
db::commitlog* commitlog() { return _commitlog; }
future<const_mutation_partition_ptr> find_partition(const dht::decorated_key& key) const;
future<const_mutation_partition_ptr> find_partition_slow(const partition_key& key) const;
future<const_row_ptr> find_row(const dht::decorated_key& partition_key, clustering_key clustering_key) const;
void apply(const frozen_mutation& m, const db::replay_position& = db::replay_position());
future<const_mutation_partition_ptr> find_partition(schema_ptr, const dht::decorated_key& key) const;
future<const_mutation_partition_ptr> find_partition_slow(schema_ptr, const partition_key& key) const;
future<const_row_ptr> find_row(schema_ptr, const dht::decorated_key& partition_key, clustering_key clustering_key) const;
// Applies given mutation to this column family
// The mutation is always upgraded to current schema.
void apply(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position& = db::replay_position());
void apply(const mutation& m, const db::replay_position& = db::replay_position());
// Returns at most "cmd.limit" rows
future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<lw_shared_ptr<query::result>> query(schema_ptr,
const query::read_command& cmd,
const std::vector<query::partition_range>& ranges);
future<> populate(sstring datadir);
@@ -283,7 +305,15 @@ public:
// not a real compaction policy.
future<> compact_all_sstables();
// Compact all sstables provided in the vector.
future<> compact_sstables(sstables::compaction_descriptor descriptor);
// If cleanup is set to true, compact_sstables will run on behalf of a cleanup job,
// meaning that irrelevant keys will be discarded.
future<> compact_sstables(sstables::compaction_descriptor descriptor, bool cleanup = false);
// Performs a cleanup on each sstable of this column family, excluding
// those that are irrelevant to this node or are being compacted.
// Cleanup is about discarding keys that are no longer relevant to a
// given sstable, e.g. after the node loses part of its token range to
// a newly added node.
future<> cleanup_sstables(sstables::compaction_descriptor descriptor);
future<bool> snapshot_exists(sstring name);
@@ -306,7 +336,7 @@ public:
void start_compaction();
void trigger_compaction();
future<> run_compaction();
future<> run_compaction(sstables::compaction_descriptor descriptor);
void set_compaction_strategy(sstables::compaction_strategy_type strategy);
const sstables::compaction_strategy& get_compaction_strategy() const {
return _compaction_strategy;
@@ -337,6 +367,10 @@ public:
}
});
}
std::unordered_set<unsigned long>& compacting_generations() {
return _compacting_generations;
}
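
A hedged sketch of how a busy-set like _compacting_generations is typically used so that no sstable lands in two jobs at once; claim()/release() are illustrative names, not Scylla's API:

#include <unordered_set>
#include <vector>

using generation = unsigned long;

// Claim candidate generations for a job; returns only those not already busy.
std::vector<generation> claim(std::unordered_set<generation>& busy,
                              const std::vector<generation>& candidates) {
    std::vector<generation> claimed;
    for (auto g : candidates) {
        if (busy.insert(g).second) {   // newly inserted => was not compacting
            claimed.push_back(g);
        }
    }
    return claimed;
}

void release(std::unordered_set<generation>& busy, const std::vector<generation>& gens) {
    for (auto g : gens) {
        busy.erase(g);
    }
}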
private:
// One does not need to wait on this future if all we are interested in, is
// initiating the write. The writes initiated here will eventually
@@ -360,14 +394,14 @@ private:
// so that iteration can be stopped by returning false.
// Func signature: bool (const decorated_key& dk, const mutation_partition& mp)
template <typename Func>
future<bool> for_all_partitions(Func&& func) const;
future<bool> for_all_partitions(schema_ptr, Func&& func) const;
future<sstables::entry_descriptor> probe_file(sstring sstdir, sstring fname);
void seal_on_overflow();
void check_valid_rp(const db::replay_position&) const;
public:
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
future<bool> for_all_partitions_slow(std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
future<bool> for_all_partitions_slow(schema_ptr, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
friend std::ostream& operator<<(std::ostream& out, const column_family& cf);
// Testing purposes.
@@ -541,7 +575,7 @@ class database {
circular_buffer<promise<>> _throttled_requests;
future<> init_commitlog();
future<> apply_in_memory(const frozen_mutation&, const db::replay_position&);
future<> apply_in_memory(const frozen_mutation& m, const schema_ptr& m_schema, const db::replay_position&);
future<> populate(sstring datadir);
future<> populate_keyspace(sstring datadir, sstring ks_name);
@@ -553,7 +587,7 @@ private:
friend void db::system_keyspace::make(database& db, bool durable, bool volatile_testing_only);
void setup_collectd();
future<> throttle();
future<> do_apply(const frozen_mutation&);
future<> do_apply(schema_ptr, const frozen_mutation&);
void unthrottle();
public:
static utils::UUID empty_version;
@@ -584,7 +618,6 @@ public:
void add_column_family(schema_ptr schema, column_family::config cfg);
future<> update_column_family(const sstring& ks_name, const sstring& cf_name);
future<> drop_column_family(db_clock::time_point changed_at, const sstring& ks_name, const sstring& cf_name);
/* throws std::out_of_range if missing */
@@ -619,9 +652,9 @@ public:
unsigned shard_of(const dht::token& t);
unsigned shard_of(const mutation& m);
unsigned shard_of(const frozen_mutation& m);
future<lw_shared_ptr<query::result>> query(const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<reconcilable_result> query_mutations(const query::read_command& cmd, const query::partition_range& range);
future<> apply(const frozen_mutation&);
future<lw_shared_ptr<query::result>> query(schema_ptr, const query::read_command& cmd, const std::vector<query::partition_range>& ranges);
future<reconcilable_result> query_mutations(schema_ptr, const query::read_command& cmd, const query::partition_range& range);
future<> apply(schema_ptr, const frozen_mutation&);
keyspace::config make_keyspace_config(const keyspace_metadata& ksm);
const sstring& get_snitch_name() const;
future<> clear_snapshot(sstring tag, std::vector<sstring> keyspace_names);
@@ -669,53 +702,6 @@ public:
// FIXME: stub
class secondary_index_manager {};
inline
void
column_family::apply(const mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
inline
void
column_family::seal_on_overflow() {
++_mutation_count;
if (active_memtable().occupancy().total_space() >= _config.max_memtable_size) {
// FIXME: if sparse, do some in-memory compaction first
// FIXME: maybe merge with other in-memory memtables
_mutation_count = 0;
seal_active_memtable();
}
}
inline
void
column_family::check_valid_rp(const db::replay_position& rp) const {
if (rp < _highest_flushed_rp) {
throw replay_position_reordered_exception();
}
}
inline
void
column_family::apply(const frozen_mutation& m, const db::replay_position& rp) {
utils::latency_counter lc;
_stats.writes.set_latency(lc);
check_valid_rp(rp);
active_memtable().apply(m, rp);
seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
}
}
future<> update_schema_version_and_announce(distributed<service::storage_proxy>& proxy);
#endif /* DATABASE_HH_ */


@@ -31,12 +31,19 @@ class mutation_partition;
// schema.hh
class schema;
class column_definition;
class column_mapping;
// schema_mutations.hh
class schema_mutations;
// keys.hh
class exploded_clustering_prefix;
class partition_key;
class partition_key_view;
class clustering_key_prefix;
class clustering_key_prefix_view;
using clustering_key = clustering_key_prefix;
using clustering_key_view = clustering_key_prefix_view;
// memtable.hh
class memtable;


@@ -45,6 +45,7 @@
#include <boost/range/adaptor/sliced.hpp>
#include "batchlog_manager.hh"
#include "canonical_mutation.hh"
#include "service/storage_service.hh"
#include "service/storage_proxy.hh"
#include "system_keyspace.hh"
@@ -57,6 +58,7 @@
#include "db/config.hh"
#include "gms/failure_detector.hh"
#include "service/storage_service.hh"
#include "schema_registry.hh"
static logging::logger logger("batchlog_manager");
@@ -116,14 +118,14 @@ mutation db::batchlog_manager::get_batch_log_mutation_for(const std::vector<muta
auto key = partition_key::from_singular(*schema, id);
auto timestamp = api::new_timestamp();
auto data = [this, &mutations] {
std::vector<frozen_mutation> fm(mutations.begin(), mutations.end());
std::vector<canonical_mutation> fm(mutations.begin(), mutations.end());
const auto size = std::accumulate(fm.begin(), fm.end(), size_t(0), [](size_t s, auto& m) {
return s + serializer<frozen_mutation>{m}.size();
return s + serializer<canonical_mutation>{m}.size();
});
bytes buf(bytes::initialized_later(), size);
data_output out(buf);
for (auto& m : fm) {
serializer<frozen_mutation>{m}(out);
serializer<canonical_mutation>{m}(out);
}
return buf;
}();
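
This write side pairs with the decoding loop in replay_all_failed_batches() below (data_input in(data); while (in.has_next()) ...). A round-trip sketch that assumes only the serializer<canonical_mutation> surface visible in this diff — size(), operator()(data_output&), and read(data_input&) — plus <numeric> and <deque> for the std parts:

// Mirrors the buffer construction above.
bytes serialize_batch(const std::vector<canonical_mutation>& ms) {
    const auto size = std::accumulate(ms.begin(), ms.end(), size_t(0), [](size_t s, auto& m) {
        return s + serializer<canonical_mutation>{m}.size();
    });
    bytes buf(bytes::initialized_later(), size);
    data_output out(buf);
    for (auto& m : ms) {
        serializer<canonical_mutation>{m}(out);
    }
    return buf;
}

// Mirrors the replay-side decoding loop below.
std::deque<canonical_mutation> deserialize_batch(const bytes& buf) {
    std::deque<canonical_mutation> ms;
    data_input in(buf);
    while (in.has_next()) {
        ms.emplace_back(serializer<canonical_mutation>::read(in));
    }
    return ms;
}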
@@ -151,23 +153,24 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {
auto written_at = row.get_as<db_clock::time_point>("written_at");
auto id = row.get_as<utils::UUID>("id");
// enough time for the actual write + batchlog entry mutation delivery (two separate requests).
auto timeout = get_batch_log_timeout();
if (db_clock::now() < written_at + timeout) {
logger.debug("Skipping replay of {}, too fresh", id);
return make_ready_future<>();
}
// not used currently. ever?
//auto version = row.has("version") ? row.get_as<uint32_t>("version") : /*MessagingService.VERSION_12*/6u;
auto id = row.get_as<utils::UUID>("id");
auto data = row.get_blob("data");
logger.debug("Replaying batch {}", id);
auto fms = make_lw_shared<std::deque<frozen_mutation>>();
auto fms = make_lw_shared<std::deque<canonical_mutation>>();
data_input in(data);
while (in.has_next()) {
fms->emplace_back(serializer<frozen_mutation>::read(in));
fms->emplace_back(serializer<canonical_mutation>::read(in));
}
auto mutations = make_lw_shared<std::vector<mutation>>();
@@ -179,11 +182,10 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
}
auto& fm = fms->front();
auto mid = fm.column_family_id();
return system_keyspace::get_truncated_at(mid).then([this, &fm, written_at, mutations](db_clock::time_point t) {
auto schema = _qp.db().local().find_schema(fm.column_family_id());
return system_keyspace::get_truncated_at(mid).then([this, mid, &fm, written_at, mutations](db_clock::time_point t) {
schema_ptr s = _qp.db().local().find_schema(mid);
if (written_at > t) {
auto schema = _qp.db().local().find_schema(fm.column_family_id());
mutations->emplace_back(fm.unfreeze(schema));
mutations->emplace_back(fm.to_mutation(s));
}
}).then([fms] {
fms->pop_front();


@@ -64,6 +64,8 @@
#include "utils/crc.hh"
#include "utils/runtime.hh"
#include "log.hh"
#include "commitlog_entry.hh"
#include "service/priority_manager.hh"
static logging::logger logger("commitlog");
@@ -155,6 +157,9 @@ public:
bool _shutdown = false;
semaphore _new_segment_semaphore;
semaphore _write_semaphore;
semaphore _flush_semaphore;
scollectd::registrations _regs;
// TODO: verify that we're ok with not-so-great granularity
@@ -170,7 +175,11 @@ public:
uint64_t bytes_slack = 0;
uint64_t segments_created = 0;
uint64_t segments_destroyed = 0;
uint64_t pending_operations = 0;
uint64_t pending_writes = 0;
uint64_t pending_flushes = 0;
uint64_t pending_allocations = 0;
uint64_t write_limit_exceeded = 0;
uint64_t flush_limit_exceeded = 0;
uint64_t total_size = 0;
uint64_t buffer_list_bytes = 0;
uint64_t total_size_on_disk = 0;
@@ -178,33 +187,73 @@ public:
stats totals;
void begin_op() {
future<> begin_write() {
_gate.enter();
++totals.pending_operations;
++totals.pending_writes; // redundant, given the semaphore, but easier to read
if (totals.pending_writes >= cfg.max_active_writes) {
++totals.write_limit_exceeded;
logger.trace("Write ops overflow: {}. Will block.", totals.pending_writes);
}
return _write_semaphore.wait();
}
void end_op() {
--totals.pending_operations;
void end_write() {
_write_semaphore.signal();
--totals.pending_writes;
_gate.leave();
}
future<> begin_flush() {
_gate.enter();
++totals.pending_flushes;
if (totals.pending_flushes >= cfg.max_active_flushes) {
++totals.flush_limit_exceeded;
logger.trace("Flush ops overflow: {}. Will block.", totals.pending_flushes);
}
return _flush_semaphore.wait();
}
void end_flush() {
_flush_semaphore.signal();
--totals.pending_flushes;
_gate.leave();
}
bool should_wait_for_write() const {
return _write_semaphore.waiters() > 0 || _flush_semaphore.waiters() > 0;
}
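
begin_write()/begin_flush() gate I/O through two independent semaphores while keeping gauge counters alongside. Seastar's semaphore is future-based, so the sketch below captures only the synchronous shape of the idea, using C++20 std::counting_semaphore; state is per shard, so plain counters suffice:

#include <semaphore>
#include <cstdint>

struct op_gate {
    std::counting_semaphore<> slots;
    uint64_t pending = 0;          // gauge, mirrors totals.pending_writes
    uint64_t limit_exceeded = 0;   // counter, mirrors totals.write_limit_exceeded
    const uint64_t max_ops;

    explicit op_gate(uint64_t max) : slots(max), max_ops(max) {}

    void begin() {
        if (++pending >= max_ops) {
            ++limit_exceeded;      // about to block: record the overflow
        }
        slots.acquire();           // blocks while max_ops are already in flight
    }
    void end() {
        slots.release();
        --pending;
    }
};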
segment_manager(config c)
: cfg(c), max_size(
std::min<size_t>(std::numeric_limits<position_type>::max(),
std::max<size_t>(cfg.commitlog_segment_size_in_mb,
1) * 1024 * 1024)), max_mutation_size(
max_size >> 1), max_disk_size(
size_t(
std::ceil(
cfg.commitlog_total_space_in_mb
/ double(smp::count))) * 1024 * 1024)
: cfg([&c] {
config cfg(c);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
if (cfg.max_active_writes == 0) {
cfg.max_active_writes = // TODO: call someone to get an idea...
25 * smp::count;
}
cfg.max_active_writes = std::max(uint64_t(1), cfg.max_active_writes / smp::count);
if (cfg.max_active_flushes == 0) {
cfg.max_active_flushes = // TODO: call someone to get an idea...
5 * smp::count;
}
cfg.max_active_flushes = std::max(uint64_t(1), cfg.max_active_flushes / smp::count);
return cfg;
}())
, max_size(std::min<size_t>(std::numeric_limits<position_type>::max(), std::max<size_t>(cfg.commitlog_segment_size_in_mb, 1) * 1024 * 1024))
, max_mutation_size(max_size >> 1)
, max_disk_size(size_t(std::ceil(cfg.commitlog_total_space_in_mb / double(smp::count))) * 1024 * 1024)
, _write_semaphore(cfg.max_active_writes)
, _flush_semaphore(cfg.max_active_flushes)
{
assert(max_size > 0);
if (cfg.commit_log_location.empty()) {
cfg.commit_log_location = "/var/lib/scylla/commitlog";
}
logger.trace("Commitlog {} maximum disk size: {} MB / cpu ({} cpus)",
cfg.commit_log_location, max_disk_size / (1024 * 1024),
smp::count);
_regs = create_counters();
}
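
The lambda-initialized config above first scales the zero defaults by core count and then divides back per shard: with smp::count == 8 and max_active_writes left at 0, the default becomes 25 * 8 = 200 node-wide, i.e. 25 concurrent writes per shard (and likewise 5 flushes per shard). The same arithmetic, standalone:

#include <algorithm>
#include <cstdint>

uint64_t per_shard_limit(uint64_t configured, uint64_t default_per_shard, unsigned shards) {
    uint64_t total = configured ? configured : default_per_shard * shards;
    return std::max<uint64_t>(1, total / shards);   // floor, clamped to at least 1
}
// per_shard_limit(0, 25, 8) == 25;  per_shard_limit(12, 25, 8) == 1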
~segment_manager() {
@@ -238,6 +287,8 @@ public:
}
std::vector<sstring> get_active_names() const;
uint64_t get_num_dirty_segments() const;
uint64_t get_num_active_segments() const;
using buffer_type = temporary_buffer<char>;
@@ -341,9 +392,39 @@ class db::commitlog::segment: public enable_lw_shared_from_this<segment> {
std::unordered_map<cf_id_type, position_type> _cf_dirty;
time_point _sync_time;
seastar::gate _gate;
uint64_t _write_waiters = 0;
semaphore _queue;
std::unordered_set<table_schema_version> _known_schema_versions;
friend std::ostream& operator<<(std::ostream&, const segment&);
friend class segment_manager;
future<> begin_flush() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.write_lock().then(std::bind(&segment_manager::begin_flush, _segment_manager)).finally([this] {
_dwrite.write_unlock();
});
}
void end_flush() {
_segment_manager->end_flush();
}
future<> begin_write() {
// This is maintaining the semantics of only using the write-lock
// as a gate for flushing, i.e. once we've begun a flush for position X
// we are ok with writes to positions > X
return _dwrite.read_lock().then(std::bind(&segment_manager::begin_write, _segment_manager));
}
void end_write() {
_segment_manager->end_write();
_dwrite.read_unlock();
}
public:
struct cf_mark {
const segment& s;
@@ -365,7 +446,7 @@ public:
segment(segment_manager* m, const descriptor& d, file && f, bool active)
: _segment_manager(m), _desc(std::move(d)), _file(std::move(f)), _sync_time(
clock_type::now())
clock_type::now()), _queue(0)
{
++_segment_manager->totals.segments_created;
logger.debug("Created new {} segment {}", active ? "active" : "reserve", *this);
@@ -383,9 +464,19 @@ public:
}
}
bool is_schema_version_known(schema_ptr s) {
return _known_schema_versions.count(s->version());
}
void add_schema_version(schema_ptr s) {
_known_schema_versions.emplace(s->version());
}
void forget_schema_versions() {
_known_schema_versions.clear();
}
bool must_sync() {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return true;
return false;
}
auto now = clock_type::now();
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
@@ -401,8 +492,9 @@ public:
*/
future<sseg_ptr> finish_and_get_new() {
_closed = true;
sync();
return _segment_manager->active_segment();
return maybe_wait_for_write(sync()).then([](sseg_ptr s) {
return s->_segment_manager->active_segment();
});
}
void reset_sync_time() {
_sync_time = clock_type::now();
@@ -417,7 +509,7 @@ public:
logger.trace("Sync not needed {}: ({} / {})", *this, position(), _flush_pos);
return make_ready_future<sseg_ptr>(shared_from_this());
}
return cycle().then([](auto seg) {
return cycle().then([](sseg_ptr seg) {
return seg->flush();
});
}
@@ -440,16 +532,14 @@ public:
// This is not 100% necessary; we really only need the ones below our flush pos,
// but since we pretty much assume that task ordering will make this the case anyway...
return _dwrite.write_lock().then(
return begin_flush().then(
[this, me, pos]() mutable {
_dwrite.write_unlock(); // release it already.
pos = std::max(pos, _file_pos);
if (pos <= _flush_pos) {
logger.trace("{} already synced! ({} < {})", *this, pos, _flush_pos);
return make_ready_future<sseg_ptr>(std::move(me));
}
_segment_manager->begin_op();
return _file.flush().then_wrapped([this, pos, me](auto f) {
return _file.flush().then_wrapped([this, pos, me](future<> f) {
try {
f.get();
// TODO: retry/ignore/fail/stop - optional behaviour in origin.
@@ -462,16 +552,50 @@ public:
logger.error("Failed to flush commits to disk: {}", std::current_exception());
throw;
}
}).finally([this, me] {
_segment_manager->end_op();
});
});
}).finally([this] {
end_flush();
});
}
/**
* Allocate a new buffer
*/
void new_buffer(size_t s) {
assert(_buffer.empty());
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
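
The allocation loop in new_buffer() degrades gracefully under memory pressure: start at the preferred size, halve toward the minimum aligned size on std::bad_alloc, and rethrow only once even the minimum fails. The retry skeleton in isolation; acquire is a placeholder for acquire_buffer():

#include <algorithm>
#include <cstddef>
#include <new>

template <typename AcquireFn>
auto allocate_with_fallback(size_t minimum, size_t preferred, AcquireFn acquire) {
    for (size_t k = preferred;;) {
        try {
            return acquire(k);                 // may throw std::bad_alloc
        } catch (const std::bad_alloc&) {
            if (k <= minimum) {
                throw;                         // already at the floor: give up
            }
            k = std::max(minimum, k / 2);      // halve, but never below the floor
        }
    }
}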
/**
* Send any buffer contents to disk and get a new tmp buffer
*/
// See class comment for info
future<sseg_ptr> cycle(size_t s = 0) {
future<sseg_ptr> cycle() {
auto size = clear_buffer_slack();
auto buf = std::move(_buffer);
auto off = _file_pos;
@@ -479,36 +603,6 @@ public:
_file_pos += size;
_buf_pos = 0;
// if we need new buffer, get one.
// TODO: keep a queue of available buffers?
if (s > 0) {
auto overhead = segment_overhead_size;
if (_file_pos == 0) {
overhead += descriptor_header_size;
}
auto a = align_up(s + overhead, alignment);
auto k = std::max(a, default_size);
for (;;) {
try {
_buffer = _segment_manager->acquire_buffer(k);
break;
} catch (std::bad_alloc&) {
logger.warn("Could not allocate {} k bytes output buffer ({} k required)", k / 1024, a / 1024);
if (k > a) {
k = std::max(a, k / 2);
logger.debug("Trying reduced size: {} k", k / 1024);
continue;
}
throw;
}
}
_buf_pos = overhead;
auto * p = reinterpret_cast<uint32_t *>(_buffer.get_write());
std::fill(p, p + overhead, 0);
_segment_manager->totals.total_size += k;
}
auto me = shared_from_this();
assert(!me.owned());
@@ -545,13 +639,15 @@ public:
out.write(uint32_t(_file_pos));
out.write(crc.checksum());
forget_schema_versions();
// acquire read lock
return _dwrite.read_lock().then([this, size, off, buf = std::move(buf), me]() mutable {
return begin_write().then([this, size, off, buf = std::move(buf), me]() mutable {
auto written = make_lw_shared<size_t>(0);
auto p = buf.get();
_segment_manager->begin_op();
return repeat([this, size, off, written, p]() mutable {
return _file.dma_write(off + *written, p + *written, size - *written).then_wrapped([this, size, written](auto&& f) {
auto&& priority_class = service::get_local_commitlog_priority();
return _file.dma_write(off + *written, p + *written, size - *written, priority_class).then_wrapped([this, size, written](future<size_t>&& f) {
try {
auto bytes = std::get<0>(f.get());
*written += bytes;
@@ -575,20 +671,59 @@ public:
});
}).finally([this, buf = std::move(buf)]() mutable {
_segment_manager->release_buffer(std::move(buf));
_segment_manager->end_op();
});
}).then([me] {
return make_ready_future<sseg_ptr>(std::move(me));
}).finally([me, this]() {
_dwrite.read_unlock(); // release
end_write(); // release
});
}
future<sseg_ptr> maybe_wait_for_write(future<sseg_ptr> f) {
if (_segment_manager->should_wait_for_write()) {
++_write_waiters;
logger.trace("Too many pending writes. Must wait.");
return f.finally([this] {
if (--_write_waiters == 0) {
_queue.signal(_queue.waiters());
}
});
}
return make_ready_future<sseg_ptr>(shared_from_this());
}
/**
* If an allocation causes a write, and the write causes a block,
* any allocations after that point need to wait for it to finish;
* otherwise we will just continue building up more write queue
* (and lose more ordering) eventually.
*
* Some caution here: since maybe_wait_for_write actually
* releases _all_ queued-up ops when finishing, we could get
* "bursts" of alloc->write, causing build-ups anyway.
* This should be measured properly. For now I am hoping this
* will work out as these should "block as a group". However,
* buffer memory usage might grow...
*/
bool must_wait_for_alloc() {
return _write_waiters > 0;
}
future<sseg_ptr> wait_for_alloc() {
auto me = shared_from_this();
++_segment_manager->totals.pending_allocations;
logger.trace("Previous allocation is blocking. Must wait.");
return _queue.wait().then([me] { // TODO: do we need a finally?
--me->_segment_manager->totals.pending_allocations;
return make_ready_future<sseg_ptr>(me);
});
}
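
maybe_wait_for_write() and wait_for_alloc() together release parked allocations as a group: when the last blocked writer drains, _queue.signal(_queue.waiters()) wakes every waiter at once, exactly the behaviour the comment above warns may cause bursts. A counters-only condensation of that release rule (no futures, illustrative only):

struct alloc_backpressure {
    unsigned write_waiters = 0;  // mirrors _write_waiters
    unsigned parked = 0;         // allocations queued behind blocked writes
    unsigned released = 0;

    bool must_wait() const { return write_waiters > 0; }

    void write_blocked()  { ++write_waiters; }
    void write_finished() {
        if (--write_waiters == 0) {
            released += parked;  // mirrors _queue.signal(_queue.waiters())
            parked = 0;          // everyone proceeds together ("block as a group")
        }
    }
};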
/**
* Add a "mutation" to the segment.
*/
future<replay_position> allocate(const cf_id_type& id, size_t size,
serializer_func func) {
future<replay_position> allocate(const cf_id_type& id, shared_ptr<entry_writer> writer) {
const auto size = writer->size(*this);
const auto s = size + entry_overhead_size; // total size
if (s > _segment_manager->max_mutation_size) {
return make_exception_future<replay_position>(
@@ -597,23 +732,26 @@ public:
+ " bytes is too large for the maxiumum size of "
+ std::to_string(_segment_manager->max_mutation_size)));
}
// would we make the file too big?
for (;;) {
if (position() + s > _segment_manager->max_size) {
// do this in next segment instead.
return finish_and_get_new().then(
[id, size, func = std::move(func)](auto new_seg) {
return new_seg->allocate(id, size, func);
});
}
// enough data?
if (s > (_buffer.size() - _buf_pos)) {
// TODO: if we have too many writes running, maybe we should
// wait for this?
cycle(s);
continue; // re-check file size overflow
}
break;
std::experimental::optional<future<sseg_ptr>> op;
if (must_sync()) {
op = sync();
} else if (must_wait_for_alloc()) {
op = wait_for_alloc();
} else if (!is_still_allocating() || position() + s > _segment_manager->max_size) { // would we make the file too big?
// do this in next segment instead.
op = finish_and_get_new();
} else if (_buffer.empty()) {
new_buffer(s);
} else if (s > (_buffer.size() - _buf_pos)) { // enough data?
op = maybe_wait_for_write(cycle());
}
if (op) {
return op->then([id, writer = std::move(writer)] (sseg_ptr new_seg) mutable {
return new_seg->allocate(id, std::move(writer));
});
}
_gate.enter(); // this might throw. I guess we accept this?
@@ -634,7 +772,7 @@ public:
out.write(crc.checksum());
// actual data
func(out);
writer->write(*this, out);
crc.process_bytes(p + 2 * sizeof(uint32_t), size);
@@ -645,9 +783,8 @@ public:
_gate.leave();
// finally, check if we're required to sync.
if (must_sync()) {
return sync().then([rp](auto seg) {
if (_segment_manager->cfg.mode == sync_mode::BATCH) {
return sync().then([rp](sseg_ptr) {
return make_ready_future<replay_position>(rp);
});
}
@@ -736,7 +873,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
return make_ready_future<std::experimental::optional<directory_entry_type>>(de.type);
};
return entry_type(de).then([this, de](auto type) {
return entry_type(de).then([this, de](std::experimental::optional<directory_entry_type> type) {
if (type == directory_entry_type::regular && de.name[0] != '.') {
try {
_result.emplace_back(de.name);
@@ -753,7 +890,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
};
return engine().open_directory(dirname).then([this, dirname](auto dir) {
return engine().open_directory(dirname).then([this, dirname](file dir) {
auto h = make_lw_shared<helper>(std::move(dirname), std::move(dir));
return h->done().then([h]() {
return make_ready_future<std::vector<db::commitlog::descriptor>>(std::move(h->_result));
@@ -762,7 +899,7 @@ db::commitlog::segment_manager::list_descriptors(sstring dirname) {
}
future<> db::commitlog::segment_manager::init() {
return list_descriptors(cfg.commit_log_location).then([this](auto descs) {
return list_descriptors(cfg.commit_log_location).then([this](std::vector<descriptor> descs) {
segment_id_type id = std::chrono::duration_cast<std::chrono::milliseconds>(runtime::get_boot_time().time_since_epoch()).count() + 1;
for (auto& d : descs) {
id = std::max(id, replay_position(d.id).base_id());
@@ -832,9 +969,23 @@ scollectd::registrations db::commitlog::segment_manager::create_counters() {
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_operations")
, make_typed(data_type::GAUGE, totals.pending_operations)
, per_cpu_plugin_instance, "queue_length", "pending_writes")
, make_typed(data_type::GAUGE, totals.pending_writes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "queue_length", "pending_flushes")
, make_typed(data_type::GAUGE, totals.pending_flushes)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "write_limit_exceeded")
, make_typed(data_type::DERIVE, totals.write_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "total_operations", "flush_limit_exceeded")
, make_typed(data_type::DERIVE, totals.flush_limit_exceeded)
),
add_polled_metric(type_instance_id("commitlog"
, per_cpu_plugin_instance, "memory", "total_size")
, make_typed(data_type::GAUGE, totals.total_size)
@@ -963,7 +1114,7 @@ std::ostream& db::operator<<(std::ostream& out, const db::replay_position& p) {
void db::commitlog::segment_manager::discard_unused_segments() {
logger.trace("Checking for unused segments ({} active)", _segments.size());
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](auto s) {
auto i = std::remove_if(_segments.begin(), _segments.end(), [=](sseg_ptr s) {
if (s->can_delete()) {
logger.debug("Segment {} is unused", *s);
return true;
@@ -1057,7 +1208,7 @@ void db::commitlog::segment_manager::on_timer() {
return this->allocate_segment(false).then([this](sseg_ptr s) {
if (!_shutdown) {
// insertion sort.
auto i = std::upper_bound(_reserve_segments.begin(), _reserve_segments.end(), s, [](auto s1, auto s2) {
auto i = std::upper_bound(_reserve_segments.begin(), _reserve_segments.end(), s, [](sseg_ptr s1, sseg_ptr s2) {
const descriptor& d1 = s1->_desc;
const descriptor& d2 = s2->_desc;
return d1.id < d2.id;
@@ -1069,7 +1220,7 @@ void db::commitlog::segment_manager::on_timer() {
--_reserve_allocating;
});
});
}).handle_exception([](auto ep) {
}).handle_exception([](std::exception_ptr ep) {
logger.warn("Exception in segment reservation: {}", ep);
});
arm();
@@ -1086,6 +1237,19 @@ std::vector<sstring> db::commitlog::segment_manager::get_active_names() const {
return res;
}
uint64_t db::commitlog::segment_manager::get_num_dirty_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return !s->is_still_allocating() && !s->is_clean();
});
}
uint64_t db::commitlog::segment_manager::get_num_active_segments() const {
return std::count_if(_segments.begin(), _segments.end(), [](sseg_ptr s) {
return s->is_still_allocating();
});
}
db::commitlog::segment_manager::buffer_type db::commitlog::segment_manager::acquire_buffer(size_t s) {
auto i = _temp_buffers.begin();
auto e = _temp_buffers.end();
@@ -1128,8 +1292,44 @@ void db::commitlog::segment_manager::release_buffer(buffer_type&& b) {
*/
future<db::replay_position> db::commitlog::add(const cf_id_type& id,
size_t size, serializer_func func) {
return _segment_manager->active_segment().then([=](auto s) {
return s->allocate(id, size, std::move(func));
class serializer_func_entry_writer final : public entry_writer {
serializer_func _func;
size_t _size;
public:
serializer_func_entry_writer(size_t sz, serializer_func func)
: _func(std::move(func)), _size(sz)
{ }
virtual size_t size(segment&) override { return _size; }
virtual void write(segment&, output& out) override {
_func(out);
}
};
auto writer = ::make_shared<serializer_func_entry_writer>(size, std::move(func));
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
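
For callers, add() keeps the pre-existing surface: an id, a payload size, and a function that renders the payload into a data_output. A usage sketch patterned on the add_mutation() call site replaced earlier in this diff; append_blob is an illustrative wrapper, not an existing API, and cf_id_type is assumed to be the member typedef used in the signatures above:

future<db::replay_position> append_blob(db::commitlog& log,
                                        const db::commitlog::cf_id_type& id,
                                        bytes_view repr) {
    auto write_repr = [repr] (data_output& out) { out.write(repr.begin(), repr.end()); };
    return log.add(id, repr.size(), write_repr);
}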
future<db::replay_position> db::commitlog::add_entry(const cf_id_type& id, const commitlog_entry_writer& cew)
{
class cl_entry_writer final : public entry_writer {
commitlog_entry_writer _writer;
public:
cl_entry_writer(const commitlog_entry_writer& wr) : _writer(wr) { }
virtual size_t size(segment& seg) override {
_writer.set_with_schema(!seg.is_schema_version_known(_writer.schema()));
return _writer.size();
}
virtual void write(segment& seg, output& out) override {
if (_writer.with_schema()) {
seg.add_schema_version(_writer.schema());
}
_writer.write(out);
}
};
auto writer = ::make_shared<cl_entry_writer>(cew);
return _segment_manager->active_segment().then([id, writer] (auto s) {
return s->allocate(id, writer);
});
}
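
cl_entry_writer is where schema deduplication happens: size() is evaluated against the target segment and asks whether that segment already knows this schema version, sizing the entry with or without the schema accordingly; write() then records the version. Collapsed to its essence, the per-segment cache is a version set — a standalone sketch with simplified types (the real code splits the check across size() and write()):

#include <string>
#include <unordered_set>

struct segment_schema_cache {
    std::unordered_set<std::string> known;   // stand-in for table_schema_version

    // True when the entry must embed the full schema; records it as known.
    bool needs_schema(const std::string& version) {
        return known.insert(version).second;
    }
};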
@@ -1200,11 +1400,18 @@ future<> db::commitlog::shutdown() {
return _segment_manager->shutdown();
}
size_t db::commitlog::max_record_size() const {
return _segment_manager->max_mutation_size - segment::entry_overhead_size;
}
uint64_t db::commitlog::max_active_writes() const {
return _segment_manager->cfg.max_active_writes;
}
uint64_t db::commitlog::max_active_flushes() const {
return _segment_manager->cfg.max_active_flushes;
}
future<> db::commitlog::clear() {
return _segment_manager->clear();
}
@@ -1386,10 +1593,6 @@ db::commitlog::read_log_file(file f, commit_load_reader_func next, position_type
return skip(slack);
}
if (start_off > pos) {
return skip(size - entry_header_size);
}
return fin.read_exactly(size - entry_header_size).then([this, size, crc = std::move(crc), rp](temporary_buffer<char> buf) mutable {
advance(buf);
@@ -1459,7 +1662,28 @@ uint64_t db::commitlog::get_flush_count() const {
}
uint64_t db::commitlog::get_pending_tasks() const {
return _segment_manager->totals.pending_operations;
return _segment_manager->totals.pending_writes
+ _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_writes() const {
return _segment_manager->totals.pending_writes;
}
uint64_t db::commitlog::get_pending_flushes() const {
return _segment_manager->totals.pending_flushes;
}
uint64_t db::commitlog::get_pending_allocations() const {
return _segment_manager->totals.pending_allocations;
}
uint64_t db::commitlog::get_write_limit_exceeded_count() const {
return _segment_manager->totals.write_limit_exceeded;
}
uint64_t db::commitlog::get_flush_limit_exceeded_count() const {
return _segment_manager->totals.flush_limit_exceeded;
}
uint64_t db::commitlog::get_num_segments_created() const {
@@ -1470,6 +1694,14 @@ uint64_t db::commitlog::get_num_segments_destroyed() const {
return _segment_manager->totals.segments_destroyed;
}
uint64_t db::commitlog::get_num_dirty_segments() const {
return _segment_manager->get_num_dirty_segments();
}
uint64_t db::commitlog::get_num_active_segments() const {
return _segment_manager->get_num_active_segments();
}
future<std::vector<db::commitlog::descriptor>> db::commitlog::list_existing_descriptors() const {
return list_existing_descriptors(active_config().commit_log_location);
}

View File

@@ -48,6 +48,7 @@
#include "core/stream.hh"
#include "utils/UUID.hh"
#include "replay_position.hh"
#include "commitlog_entry.hh"
class file;
@@ -114,6 +115,10 @@ public:
// Max number of segments to keep in pre-alloc reserve.
// Not (yet) configurable from scylla.conf.
uint64_t max_reserve_segments = 12;
// Max active writes/flushes. Default value
// zero means try to figure it out ourselves
uint64_t max_active_writes = 0;
uint64_t max_active_flushes = 0;
sync_mode mode = sync_mode::PERIODIC;
};
@@ -181,6 +186,13 @@ public:
});
}
/**
* Add an entry to the commit log.
*
* @param entry_writer a writer responsible for writing the entry
*/
future<replay_position> add_entry(const cf_id_type& id, const commitlog_entry_writer& entry_writer);
/**
* Modifies the per-CF dirty cursors of any commit log segments for the column family according to the position
* given. Discards any commit log segments that are no longer used.
@@ -233,14 +245,37 @@ public:
uint64_t get_completed_tasks() const;
uint64_t get_flush_count() const;
uint64_t get_pending_tasks() const;
uint64_t get_pending_writes() const;
uint64_t get_pending_flushes() const;
uint64_t get_pending_allocations() const;
uint64_t get_write_limit_exceeded_count() const;
uint64_t get_flush_limit_exceeded_count() const;
uint64_t get_num_segments_created() const;
uint64_t get_num_segments_destroyed() const;
/**
* Get number of inactive (finished) segments lingering
* due to still being dirty
*/
uint64_t get_num_dirty_segments() const;
/**
* Get number of active segments, i.e. still being allocated to
*/
uint64_t get_num_active_segments() const;
/**
* Returns the largest amount of data that can be written in a single "mutation".
*/
size_t max_record_size() const;
/**
* Return max allowed pending writes (per this shard)
*/
uint64_t max_active_writes() const;
/**
* Return max allowed pending flushes (per this shard)
*/
uint64_t max_active_flushes() const;
future<> clear();
const config& active_config() const;
@@ -283,6 +318,11 @@ public:
const sstring&, commit_load_reader_func, position_type = 0);
private:
commitlog(config);
struct entry_writer {
virtual size_t size(segment&) = 0;
virtual void write(segment&, output&) = 0;
};
};
}

View File

@@ -0,0 +1,88 @@
/*
* Copyright 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <experimental/optional>
#include "frozen_mutation.hh"
#include "schema.hh"
namespace stdx = std::experimental;
class commitlog_entry_writer {
schema_ptr _schema;
db::serializer<column_mapping> _column_mapping_serializer;
const frozen_mutation& _mutation;
bool _with_schema = true;
public:
commitlog_entry_writer(schema_ptr s, const frozen_mutation& fm)
: _schema(std::move(s)), _column_mapping_serializer(_schema->get_column_mapping()), _mutation(fm)
{ }
void set_with_schema(bool value) {
_with_schema = value;
}
bool with_schema() {
return _with_schema;
}
schema_ptr schema() const {
return _schema;
}
size_t size() const {
size_t size = data_output::serialized_size<bool>();
if (_with_schema) {
size += _column_mapping_serializer.size();
}
size += _mutation.representation().size();
return size;
}
void write(data_output& out) const {
out.write(_with_schema);
if (_with_schema) {
_column_mapping_serializer.write(out);
}
auto bv = _mutation.representation();
out.write(bv.begin(), bv.end());
}
};
class commitlog_entry_reader {
frozen_mutation _mutation;
stdx::optional<column_mapping> _column_mapping;
public:
commitlog_entry_reader(const temporary_buffer<char>& buffer)
: _mutation(bytes())
{
data_input in(buffer);
bool has_column_mapping = in.read<bool>();
if (has_column_mapping) {
_column_mapping = db::serializer<::column_mapping>::read(in);
}
auto bv = in.read_view(in.avail());
_mutation = frozen_mutation(bytes(bv.begin(), bv.end()));
}
const stdx::optional<column_mapping>& get_column_mapping() const { return _column_mapping; }
const frozen_mutation& mutation() const { return _mutation; }
};
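commitlog_entry_writer/commitlog_entry_reader above frame each entry as a leading bool ("schema included?"), an optional serialized column mapping, and the raw mutation bytes. A standalone sketch of that optional-header framing, with std::string payloads standing in for the column mapping and frozen mutation:

#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <utility>

// Frame: [u8 has_header][u32 header_len][header?][payload...]
static std::string encode(const std::optional<std::string>& header,
                          const std::string& payload) {
    std::string out;
    out.push_back(header ? 1 : 0);
    if (header) {
        uint32_t len = header->size();
        out.append(reinterpret_cast<const char*>(&len), sizeof(len));
        out.append(*header);
    }
    out.append(payload);
    return out;
}

static std::pair<std::optional<std::string>, std::string>
decode(const std::string& buf) {
    size_t pos = 1;
    std::optional<std::string> header;
    if (buf[0]) {
        uint32_t len;
        std::memcpy(&len, buf.data() + pos, sizeof(len));
        pos += sizeof(len);
        header = buf.substr(pos, len);
        pos += len;
    }
    return {header, buf.substr(pos)};
}

int main() {
    auto buf = encode(std::string("column-mapping"), "mutation-bytes");
    auto [header, payload] = decode(buf);
    assert(header && *header == "column-mapping");
    assert(payload == "mutation-bytes");
}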

View File

@@ -56,10 +56,14 @@
#include "db/serializer.hh"
#include "cql3/query_processor.hh"
#include "log.hh"
#include "converting_mutation_partition_applier.hh"
#include "schema_registry.hh"
#include "commitlog_entry.hh"
static logging::logger logger("commitlog_replayer");
class db::commitlog_replayer::impl {
std::unordered_map<table_schema_version, column_mapping> _column_mappings;
public:
impl(seastar::sharded<cql3::query_processor>& db);
@@ -70,6 +74,19 @@ public:
uint64_t skipped_mutations = 0;
uint64_t applied_mutations = 0;
uint64_t corrupt_bytes = 0;
stats& operator+=(const stats& s) {
invalid_mutations += s.invalid_mutations;
skipped_mutations += s.skipped_mutations;
applied_mutations += s.applied_mutations;
corrupt_bytes += s.corrupt_bytes;
return *this;
}
stats operator+(const stats& s) const {
stats tmp = *this;
tmp += s;
return tmp;
}
};
future<> process(stats*, temporary_buffer<char> buf, replay_position rp);
@@ -148,8 +165,6 @@ future<> db::commitlog_replayer::impl::init() {
future<db::commitlog_replayer::impl::stats>
db::commitlog_replayer::impl::recover(sstring file) {
logger.info("Replaying {}", file);
replay_position rp{commitlog::descriptor(file)};
auto gp = _min_pos[rp.shard_id()];
@@ -182,19 +197,29 @@ db::commitlog_replayer::impl::recover(sstring file) {
}
future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char> buf, replay_position rp) {
auto shard = rp.shard_id();
if (rp < _min_pos[shard]) {
logger.trace("entry {} is less than global min position. skipping", rp);
s->skipped_mutations++;
return make_ready_future<>();
}
try {
frozen_mutation fm(bytes(reinterpret_cast<const int8_t *>(buf.get()), buf.size()));
commitlog_entry_reader cer(buf);
auto& fm = cer.mutation();
auto cm_it = _column_mappings.find(fm.schema_version());
if (cm_it == _column_mappings.end()) {
if (!cer.get_column_mapping()) {
throw std::runtime_error(sprint("unknown schema version %s", fm.schema_version()));
}
logger.debug("new schema version {} in entry {}", fm.schema_version(), rp);
cm_it = _column_mappings.emplace(fm.schema_version(), *cer.get_column_mapping()).first;
}
auto shard_id = rp.shard_id();
if (rp < _min_pos[shard_id]) {
logger.trace("entry {} is less than global min position. skipping", rp);
s->skipped_mutations++;
return make_ready_future<>();
}
auto uuid = fm.column_family_id();
auto& map = _rpm[shard];
auto& map = _rpm[shard_id];
auto i = map.find(uuid);
if (i != map.end() && rp <= i->second) {
logger.trace("entry {} at {} is younger than recorded replay position {}. skipping", fm.column_family_id(), rp, i->second);
@@ -203,14 +228,15 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
}
auto shard = _qp.local().db().local().shard_of(fm);
return _qp.local().db().invoke_on(shard, [fm = std::move(fm), rp, shard, s] (database& db) -> future<> {
return _qp.local().db().invoke_on(shard, [this, cer = std::move(cer), cm_it, rp, shard, s] (database& db) -> future<> {
auto& fm = cer.mutation();
// TODO: might need better verification that the deserialized mutation
// is schema compatible. My guess is that just applying the mutation
// will not do this.
auto& cf = db.find_column_family(fm.column_family_id());
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("replaying at {} {}:{} at {}", fm.column_family_id(),
logger.debug("replaying at {} v={} {}:{} at {}", fm.column_family_id(), fm.schema_version(),
cf.schema()->ks_name(), cf.schema()->cf_name(), rp);
}
// Removed forwarding "new" RP. Instead give none/empty.
@@ -218,7 +244,15 @@ future<> db::commitlog_replayer::impl::process(stats* s, temporary_buffer<char>
// The end result should be that once sstables are flushed out
// their "replay_position" attribute will be empty, which is
// lower than anything the new session will produce.
cf.apply(fm);
if (cf.schema()->version() != fm.schema_version()) {
const column_mapping& cm = cm_it->second;
mutation m(fm.decorated_key(*cf.schema()), cf.schema());
converting_mutation_partition_applier v(cm, *cf.schema(), m.partition());
fm.partition().accept(cm, v);
cf.apply(std::move(m));
} else {
cf.apply(fm, cf.schema());
}
s->applied_mutations++;
return make_ready_future<>();
}).handle_exception([s](auto ep) {
@@ -258,32 +292,41 @@ future<db::commitlog_replayer> db::commitlog_replayer::create_replayer(seastar::
}
future<> db::commitlog_replayer::recover(std::vector<sstring> files) {
return parallel_for_each(files, [this](auto f) {
return this->recover(f);
logger.info("Replaying {}", join(", ", files));
return map_reduce(files, [this](auto f) {
logger.debug("Replaying {}", f);
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.debug("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
return make_ready_future<impl::stats>(stats);
}).handle_exception([f](auto ep) -> future<impl::stats> {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.", f);
throw;
} catch (...) {
throw;
}
});
}, impl::stats(), std::plus<impl::stats>()).then([](impl::stats totals) {
logger.info("Log replay complete, {} replayed mutations ({} invalid, {} skipped)"
, totals.applied_mutations
, totals.invalid_mutations
, totals.skipped_mutations
);
});
}
future<> db::commitlog_replayer::recover(sstring f) {
return _impl->recover(f).then([f](impl::stats stats) {
if (stats.corrupt_bytes != 0) {
logger.warn("Corrupted file: {}. {} bytes skipped.", f, stats.corrupt_bytes);
}
logger.info("Log replay of {} complete, {} replayed mutations ({} invalid, {} skipped)"
, f
, stats.applied_mutations
, stats.invalid_mutations
, stats.skipped_mutations
);
}).handle_exception([f](auto ep) {
logger.error("Error recovering {}: {}", f, ep);
try {
std::rethrow_exception(ep);
} catch (std::invalid_argument&) {
logger.error("Scylla cannot process {}. Make sure to fully flush all Cassandra commit log files to sstable before migrating.");
throw;
} catch (...) {
throw;
}
});;
return recover(std::vector<sstring>{ f });
}
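recover() above depends on impl::stats being summable, so map_reduce can fold per-file counters into a single total. The same shape, sketched with std::accumulate over plain values instead of seastar futures (names here are illustrative):

#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

struct stats {                     // mirrors impl::stats' summable design
    uint64_t invalid_mutations = 0;
    uint64_t skipped_mutations = 0;
    uint64_t applied_mutations = 0;

    stats& operator+=(const stats& s) {
        invalid_mutations += s.invalid_mutations;
        skipped_mutations += s.skipped_mutations;
        applied_mutations += s.applied_mutations;
        return *this;
    }
    stats operator+(const stats& s) const {
        stats tmp = *this;
        tmp += s;
        return tmp;
    }
};

int main() {
    std::vector<stats> per_file = {{0, 1, 10}, {2, 0, 5}, {0, 0, 7}};
    stats totals = std::accumulate(per_file.begin(), per_file.end(),
                                   stats{}, std::plus<stats>());
    std::cout << totals.applied_mutations << " replayed mutations ("
              << totals.invalid_mutations << " invalid, "
              << totals.skipped_mutations << " skipped)\n";
}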

View File

@@ -30,6 +30,7 @@
#include "core/shared_ptr.hh"
#include "core/fstream.hh"
#include "core/do_with.hh"
#include "core/print.hh"
#include "log.hh"
#include <boost/any.hpp>
@@ -432,3 +433,9 @@ boost::filesystem::path db::config::get_conf_dir() {
return confdir;
}
void db::config::check_experimental(const sstring& what) const {
if (!experimental()) {
throw std::runtime_error(sprint("%s is currently disabled. Start Scylla with --experimental=on to enable.", what));
}
}
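check_experimental() is a plain throwing guard: call sites for experimental features invoke it and let the exception propagate. A trimmed standalone sketch of the same guard, with std::string concatenation in place of sprint and a made-up feature name:

#include <iostream>
#include <stdexcept>
#include <string>

struct config {
    bool experimental = false;     // defaults to off, as in config.hh
    void check_experimental(const std::string& what) const {
        if (!experimental) {
            throw std::runtime_error(what +
                " is currently disabled. Start Scylla with --experimental=on to enable.");
        }
    }
};

int main() {
    config cfg;
    try {
        cfg.check_experimental("user defined functions");  // hypothetical feature
    } catch (const std::runtime_error& e) {
        std::cout << e.what() << '\n';
    }
}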

View File

@@ -102,6 +102,9 @@ public:
config();
// Throws exception if experimental feature is disabled.
void check_experimental(const sstring& what) const;
boost::program_options::options_description
get_options_description();
@@ -265,7 +268,7 @@ public:
"Counter writes read the current values before incrementing and writing them back. The recommended value is (16 × number_of_drives)." \
) \
/* Common automatic backup settings */ \
val(incremental_backups, bool, false, Unused, \
val(incremental_backups, bool, false, Used, \
"Backs up data updated since the last snapshot was taken. When enabled, Cassandra creates a hard link to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Removing these links is the operator's responsibility.\n" \
"Related information: Enabling incremental backups" \
) \
@@ -383,7 +386,7 @@ public:
"This setting has been removed from default configuration. It makes new (non-seed) nodes automatically migrate the right data to themselves. When initializing a fresh cluster with no data, add auto_bootstrap: false.\n" \
"Related information: Initializing a multiple node cluster (single data center) and Initializing a multiple node cluster (multiple data centers)." \
) \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Unused, \
val(batch_size_warn_threshold_in_kb, uint32_t, 5, Used, \
"Log WARN on any batch size exceeding this value in kilobytes. Caution should be taken on increasing the size of this threshold as it can lead to node instability." \
) \
val(broadcast_address, sstring, /* listen_address */, Used, \
@@ -638,8 +641,8 @@ public:
) \
/* Security properties */ \
/* Server and client security settings. */ \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Unused, \
"The authentication backend. It implements IAuthenticator, which is used to identify users. The available authenticators are:\n" \
val(authenticator, sstring, "org.apache.cassandra.auth.AllowAllAuthenticator", Used, \
"The authentication backend, used to identify users. The available authenticators are:\n" \
"\n" \
"\torg.apache.cassandra.auth.AllowAllAuthenticator : Disables authentication; no checks are performed.\n" \
"\torg.apache.cassandra.auth.PasswordAuthenticator : Authenticates users with user names and hashed passwords stored in the system_auth.credentials table. If you use the default, 1, and the node with the lone replica goes down, you will not be able to log into the cluster because the system_auth keyspace was not replicated.\n" \
@@ -690,7 +693,7 @@ public:
val(ssl_storage_port, uint32_t, 7001, Used, \
"The SSL port for encrypted communication. Unused unless enabled in encryption_options." \
) \
val(default_log_level, sstring, "warn", Used, \
val(default_log_level, sstring, "info", Used, \
"Default log level for log messages. Valid values are trace, debug, info, warn, error.") \
val(logger_log_level, string_map, /* none */, Used,\
"map of logger name to log level. Valid values are trace, debug, info, warn, error. " \
@@ -715,7 +718,10 @@ public:
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(shutdown_announce_in_ms, uint32_t, 2 * 1000, Used, "Time a node waits after sending gossip shutdown message in milliseconds. Same as -Dcassandra.shutdown_announce_in_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
/* done! */
#define _make_value_member(name, type, deflt, status, desc, ...) \
@@ -732,5 +738,4 @@ private:
int _dummy;
};
}

View File

@@ -50,16 +50,20 @@ namespace db {
namespace marshal {
type_parser::type_parser(const sstring& str, size_t idx)
: _str{str}
type_parser::type_parser(sstring_view str, size_t idx)
: _str{str.begin(), str.end()}
, _idx{idx}
{ }
type_parser::type_parser(const sstring& str)
type_parser::type_parser(sstring_view str)
: type_parser{str, 0}
{ }
data_type type_parser::parse(const sstring& str) {
return type_parser(sstring_view(str)).parse();
}
data_type type_parser::parse(sstring_view str) {
return type_parser(str).parse();
}
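The type_parser change above widens the constructors from const sstring& to sstring_view, so callers can parse without first materializing an sstring. A minimal analogue using std::string_view; the parse() body is invented for illustration, the real one builds a data_type:

#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

class type_parser {
    std::string _str;              // parser keeps its own copy, as in the patch
    size_t _idx;
    type_parser(std::string_view str, size_t idx)
        : _str{str.begin(), str.end()}, _idx{idx} {}
public:
    explicit type_parser(std::string_view str) : type_parser{str, 0} {}
    // Illustrative only: the real parse() builds a data_type.
    std::string parse() const { return _str.substr(_idx); }
    static std::string parse(std::string_view str) {
        return type_parser(str).parse();
    }
};

int main() {
    std::string_view def = "org.apache.cassandra.db.marshal.UTF8Type";
    assert(type_parser::parse(def) == std::string(def));
}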

View File

@@ -62,14 +62,15 @@ class type_parser {
public static final TypeParser EMPTY_PARSER = new TypeParser("", 0);
#endif
type_parser(const sstring& str, size_t idx);
type_parser(sstring_view str, size_t idx);
public:
explicit type_parser(const sstring& str);
explicit type_parser(sstring_view str);
/**
* Parse a string containing an type definition.
*/
static data_type parse(const sstring& str);
static data_type parse(sstring_view str);
#if 0
public static AbstractType<?> parse(CharSequence compareWith) throws SyntaxException, ConfigurationException

View File

@@ -46,6 +46,7 @@
#include "system_keyspace.hh"
#include "query_context.hh"
#include "query-result-set.hh"
#include "query-result-writer.hh"
#include "schema_builder.hh"
#include "map_difference.hh"
#include "utils/UUID_gen.hh"
@@ -53,9 +54,12 @@
#include "core/thread.hh"
#include "json.hh"
#include "log.hh"
#include "frozen_schema.hh"
#include "schema_registry.hh"
#include "db/marshal/type_parser.hh"
#include "db/config.hh"
#include "md5_hasher.hh"
#include <boost/range/algorithm/copy.hpp>
#include <boost/range/adaptor/map.hpp>
@@ -70,6 +74,36 @@ namespace schema_tables {
logging::logger logger("schema_tables");
struct qualified_name {
sstring keyspace_name;
sstring table_name;
qualified_name(sstring keyspace_name, sstring table_name)
: keyspace_name(std::move(keyspace_name))
, table_name(std::move(table_name))
{ }
qualified_name(const schema_ptr& s)
: keyspace_name(s->ks_name())
, table_name(s->cf_name())
{ }
bool operator<(const qualified_name& o) const {
return keyspace_name < o.keyspace_name
|| (keyspace_name == o.keyspace_name && table_name < o.table_name);
}
bool operator==(const qualified_name& o) const {
return keyspace_name == o.keyspace_name && table_name == o.table_name;
}
};
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table);
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after);
std::vector<const char*> ALL { KEYSPACES, COLUMNFAMILIES, COLUMNS, TRIGGERS, USERTYPES, /* not present in 2.1.8: FUNCTIONS, AGGREGATES */ };
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
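The qualified_name struct introduced above sorts by keyspace first, then table, so a std::map<qualified_name, schema_mutations> naturally groups tables by keyspace. Its hand-written operator< is equivalent to the std::tie idiom, as this sketch (not the patch's code) shows:

#include <cassert>
#include <string>
#include <tuple>

struct qualified_name {
    std::string keyspace_name;
    std::string table_name;
    bool operator<(const qualified_name& o) const {
        // Equivalent to: ks < o.ks || (ks == o.ks && table < o.table)
        return std::tie(keyspace_name, table_name)
             < std::tie(o.keyspace_name, o.table_name);
    }
};

int main() {
    assert((qualified_name{"ks1", "t2"} < qualified_name{"ks2", "t1"}));
    assert((qualified_name{"ks1", "t1"} < qualified_name{"ks1", "t2"}));
}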
@@ -95,7 +129,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"keyspace definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::yes);
builder.with(schema_builder::compact_storage::yes);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return keyspaces;
}
@@ -147,7 +183,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"table definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columnfamilies;
}
@@ -176,7 +214,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"column definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return columns;
}
@@ -200,7 +240,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"trigger definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return triggers;
}
@@ -225,7 +267,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined type definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return usertypes;
}
@@ -254,7 +298,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined function definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return functions;
}
@@ -283,7 +329,9 @@ using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
"user defined aggregate definitions"
)));
builder.set_gc_grace_seconds(std::chrono::duration_cast<std::chrono::seconds>(days(7)).count());
return builder.build(schema_builder::compact_storage::no);
builder.with(schema_builder::compact_storage::no);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build();
}();
return aggregates;
}
@@ -295,10 +343,11 @@ future<> save_system_keyspace_schema() {
// delete old, possibly obsolete entries in schema tables
return parallel_for_each(ALL, [ksm] (sstring cf) {
return db::execute_cql("DELETE FROM system.%s WHERE keyspace_name = ?", cf, ksm->name()).discard_result();
auto deletion_timestamp = schema_creation_timestamp() - 1;
return db::execute_cql(sprint("DELETE FROM system.%%s USING TIMESTAMP %s WHERE keyspace_name = ?",
deletion_timestamp), cf, ksm->name()).discard_result();
}).then([ksm] {
// (+1 to timestamp to make sure we don't get shadowed by the tombstones we just added)
auto mvec = make_create_keyspace_mutations(ksm, qctx->next_timestamp(), true);
auto mvec = make_create_keyspace_mutations(ksm, schema_creation_timestamp(), true);
return qctx->proxy().mutate_locally(std::move(mvec));
});
}
@@ -326,36 +375,30 @@ future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>&
auto map = [&proxy] (sstring table) {
return db::system_keyspace::query_mutations(proxy, table).then([&proxy, table] (auto rs) {
auto s = proxy.local().get_db().local().find_schema(system_keyspace::NAME, table);
std::vector<query::result> results;
std::vector<mutation> mutations;
for (auto&& p : rs->partitions()) {
auto mut = p.mut().unfreeze(s);
auto partition_key = value_cast<sstring>(utf8_type->deserialize(mut.key().get_component(*s, 0)));
if (partition_key == system_keyspace::NAME) {
continue;
}
auto slice = partition_slice_builder(*s).build();
results.emplace_back(mut.query(slice));
mutations.emplace_back(std::move(mut));
}
return results;
return mutations;
});
};
auto reduce = [] (auto& hash, auto&& results) {
for (auto&& rs : results) {
for (auto&& f : rs.buf().fragments()) {
hash.Update(reinterpret_cast<const unsigned char*>(f.begin()), f.size());
}
auto reduce = [] (auto& hash, auto&& mutations) {
for (const mutation& m : mutations) {
feed_hash_for_schema_digest(hash, m);
}
return make_ready_future<>();
};
return do_with(CryptoPP::Weak::MD5{}, [map, reduce] (auto& hash) {
return do_with(md5_hasher(), [map, reduce] (auto& hash) {
return do_for_each(ALL.begin(), ALL.end(), [&hash, map, reduce] (auto& table) {
return map(table).then([&hash, reduce] (auto&& results) {
return reduce(hash, results);
return map(table).then([&hash, reduce] (auto&& mutations) {
reduce(hash, mutations);
});
}).then([&hash] {
bytes digest{bytes::initialized_later(), CryptoPP::Weak::MD5::DIGESTSIZE};
hash.Final(reinterpret_cast<unsigned char*>(digest.begin()));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(digest));
return make_ready_future<utils::UUID>(utils::UUID_gen::get_name_UUID(hash.finalize()));
});
});
}
@@ -398,6 +441,28 @@ read_schema_for_keyspaces(distributed<service::storage_proxy>& proxy, const sstr
return map_reduce(keyspace_names.begin(), keyspace_names.end(), map, schema_result{}, insert);
}
static
future<mutation> query_partition_mutation(service::storage_proxy& proxy,
schema_ptr s,
lw_shared_ptr<query::read_command> cmd,
partition_key pkey)
{
auto dk = dht::global_partitioner().decorate_key(*s, pkey);
return do_with(query::partition_range::make_singular(dk), [&proxy, dk, s = std::move(s), cmd = std::move(cmd)] (auto& range) {
return proxy.query_mutations_locally(s, std::move(cmd), range)
.then([dk = std::move(dk), s](foreign_ptr<lw_shared_ptr<reconcilable_result>> res) {
auto&& partitions = res->partitions();
if (partitions.size() == 0) {
return mutation(std::move(dk), s);
} else if (partitions.size() == 1) {
return partitions[0].mut().unfreeze(s);
} else {
assert(false && "Results must have at most one partition");
}
});
});
}
future<schema_result_value_type>
read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name)
{
@@ -409,16 +474,18 @@ read_schema_partition_for_keyspace(distributed<service::storage_proxy>& proxy, c
});
}
future<schema_result_value_type>
future<mutation>
read_schema_partition_for_table(distributed<service::storage_proxy>& proxy, const sstring& schema_table_name, const sstring& keyspace_name, const sstring& table_name)
{
auto schema = proxy.local().get_db().local().find_schema(system_keyspace::NAME, schema_table_name);
auto keyspace_key = dht::global_partitioner().decorate_key(*schema,
partition_key::from_singular(*schema, keyspace_name));
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
return db::system_keyspace::query(proxy, schema_table_name, keyspace_key, clustering_range).then([keyspace_name] (auto&& rs) {
return schema_result_value_type{keyspace_name, std::move(rs)};
});
auto keyspace_key = partition_key::from_singular(*schema, keyspace_name);
auto clustering_range = query::clustering_range(clustering_key_prefix::from_clustering_prefix(
*schema, exploded_clustering_prefix({utf8_type->decompose(table_name)})));
auto slice = partition_slice_builder(*schema)
.with_range(std::move(clustering_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return query_partition_mutation(proxy.local(), std::move(schema), std::move(cmd), std::move(keyspace_key));
}
static semaphore the_merge_lock;
@@ -452,7 +519,7 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
}
future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
{
return merge_lock().then([&proxy, mutations = std::move(mutations), do_flush] () mutable {
return do_merge_schema(proxy, std::move(mutations), do_flush);
}).finally([] {
@@ -460,6 +527,35 @@ future<> merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mu
});
}
// Returns names of live table definitions of given keyspace
future<std::vector<sstring>>
static read_table_names_of_keyspace(distributed<service::storage_proxy>& proxy, const sstring& keyspace_name) {
auto s = columnfamilies();
auto pkey = dht::global_partitioner().decorate_key(*s, partition_key::from_singular(*s, keyspace_name));
return db::system_keyspace::query(proxy, COLUMNFAMILIES, pkey).then([] (auto&& rs) {
std::vector<sstring> result;
for (const query::result_set_row& row : rs->rows()) {
result.emplace_back(row.get_nonnull<sstring>("columnfamily_name"));
}
return result;
});
}
// Call inside a seastar thread
static
std::map<qualified_name, schema_mutations>
read_tables_for_keyspaces(distributed<service::storage_proxy>& proxy, const std::set<sstring>& keyspace_names)
{
std::map<qualified_name, schema_mutations> result;
for (auto&& keyspace_name : keyspace_names) {
for (auto&& table_name : read_table_names_of_keyspace(proxy, keyspace_name).get0()) {
auto qn = qualified_name(keyspace_name, table_name);
result.emplace(qn, read_table_mutations(proxy, qn).get0());
}
}
return result;
}
future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector<mutation> mutations, bool do_flush)
{
return seastar::async([&proxy, mutations = std::move(mutations), do_flush] () mutable {
@@ -474,7 +570,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// current state of the schema
auto&& old_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& old_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& old_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& old_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& old_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -494,7 +590,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// with new data applied
auto&& new_keyspaces = read_schema_for_keyspaces(proxy, KEYSPACES, keyspaces).get0();
auto&& new_column_families = read_schema_for_keyspaces(proxy, COLUMNFAMILIES, keyspaces).get0();
auto&& new_column_families = read_tables_for_keyspaces(proxy, keyspaces);
/*auto& new_types = */read_schema_for_keyspaces(proxy, USERTYPES, keyspaces).get0();
#if 0 // not in 2.1.8
/*auto& new_functions = */read_schema_for_keyspaces(proxy, FUNCTIONS, keyspaces).get0();
@@ -502,7 +598,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
#endif
std::set<sstring> keyspaces_to_drop = merge_keyspaces(proxy, std::move(old_keyspaces), std::move(new_keyspaces)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families)).get0();
merge_tables(proxy, std::move(old_column_families), std::move(new_column_families));
#if 0
mergeTypes(oldTypes, newTypes);
mergeFunctions(oldFunctions, newFunctions);
@@ -512,15 +608,7 @@ future<> do_merge_schema(distributed<service::storage_proxy>& proxy, std::vector
// it is safe to drop a keyspace only when all nested ColumnFamilies were deleted
for (auto&& keyspace_to_drop : keyspaces_to_drop) {
db.drop_keyspace(keyspace_to_drop);
}
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(keyspaces_to_drop, [] (auto& ks_name) {
return service::migration_manager::notify_drop_keyspace(ks_name);
});
} else {
return make_ready_future<>();
service::get_local_migration_manager().notify_drop_keyspace(keyspace_to_drop);
}
}).get0();
});
@@ -551,138 +639,84 @@ future<std::set<sstring>> merge_keyspaces(distributed<service::storage_proxy>& p
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
created.emplace_back(schema_result_value_type{key, std::move(value)});
}
created.emplace_back(schema_result_value_type{key, std::move(value)});
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
altered.emplace_back(keyspace_name);
} else if (!pre->empty()) {
dropped.emplace(keyspace_name);
} else if (!post->empty()) { // a (re)created keyspace
created.emplace_back(schema_result_value_type{key, std::move(post)});
}
altered.emplace_back(key);
}
return do_with(std::move(created), [&proxy, altered = std::move(altered)] (auto& created) {
return proxy.local().get_db().invoke_on_all([&created, altered = std::move(altered)] (database& db) {
return do_for_each(created, [&db] (auto&& val) {
return do_for_each(created, [&db](auto&& val) {
auto ksm = create_keyspace_from_schema_partition(val);
return db.create_keyspace(std::move(ksm));
return db.create_keyspace(ksm).then([ksm] {
service::get_local_migration_manager().notify_create_keyspace(ksm);
});
}).then([&altered, &db] () mutable {
for (auto&& name : altered) {
db.update_keyspace(name);
}
return make_ready_future<>();
});
}).then([&created] {
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
return do_for_each(created, [] (auto&& partition) {
auto ksm = create_keyspace_from_schema_partition(partition);
return service::migration_manager::notify_create_keyspace(ksm);
});
} else {
return make_ready_future<>();
}
});
}).then([dropped = std::move(dropped)] () {
return make_ready_future<std::set<sstring>>(dropped);
});
}
static void update_column_family(database& db, schema_ptr new_schema) {
column_family& cfm = db.find_column_family(new_schema->id());
bool columns_changed = !cfm.schema()->equal_columns(*new_schema);
auto s = local_schema_registry().learn(new_schema);
s->registry_entry()->mark_synced();
cfm.set_schema(std::move(s));
service::get_local_migration_manager().notify_update_column_family(cfm.schema(), columns_changed);
}
// see the comments for merge_keyspaces()
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after)
static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after)
{
return do_with(std::make_pair(std::move(after), std::move(before)), [&proxy] (auto& pair) {
auto& after = pair.first;
auto& before = pair.second;
auto changed_at = db_clock::now();
return proxy.local().get_db().invoke_on_all([changed_at, &proxy, &before, &after] (database& db) {
return seastar::async([changed_at, &proxy, &db, &before, &after] {
std::vector<schema_ptr> created;
std::vector<schema_ptr> altered;
std::vector<schema_ptr> dropped;
auto diff = difference(before, after, [](const auto& x, const auto& y) -> bool {
return *x == *y;
auto changed_at = db_clock::now();
std::vector<global_schema_ptr> created;
std::vector<global_schema_ptr> altered;
std::vector<global_schema_ptr> dropped;
auto diff = difference(before, after);
for (auto&& key : diff.entries_only_on_left) {
auto&& s = proxy.local().get_db().local().find_schema(key.keyspace_name, key.table_name);
dropped.emplace_back(s);
}
for (auto&& key : diff.entries_only_on_right) {
created.emplace_back(create_table_from_mutations(after.at(key)));
}
for (auto&& key : diff.entries_differing) {
altered.emplace_back(create_table_from_mutations(after.at(key)));
}
proxy.local().get_db().invoke_on_all([&created, &dropped, &altered, changed_at] (database& db) {
return seastar::async([&] {
for (auto&& gs : created) {
schema_ptr s = gs.get();
auto& ks = db.find_keyspace(s->ks_name());
auto cfg = ks.make_column_family_config(*s);
db.add_column_family(s, cfg);
ks.make_directory_for_column_family(s->cf_name(), s->id()).get();
service::get_local_migration_manager().notify_create_column_family(s);
}
for (auto&& gs : altered) {
update_column_family(db, gs.get());
}
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db](auto&& gs) {
schema_ptr s = gs.get();
return db.drop_column_family(changed_at, s->ks_name(), s->cf_name()).then([s] {
service::get_local_migration_manager().notify_drop_column_family(s);
});
for (auto&& key : diff.entries_only_on_left) {
auto&& rs = before[key];
for (const query::result_set_row& row : rs->rows()) {
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
dropped.emplace_back(db.find_schema(ks_name, cf_name));
}
}
for (auto&& key : diff.entries_only_on_right) {
auto&& value = after[key];
if (!value->empty()) {
auto&& tables = create_tables_from_tables_partition(proxy, value).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& key : diff.entries_differing) {
sstring keyspace_name = key;
auto&& pre = before[key];
auto&& post = after[key];
if (!pre->empty() && !post->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
auto after = create_tables_from_tables_partition(proxy, post).get0();
auto delta = difference(std::map<sstring, schema_ptr>{before.begin(), before.end()}, after, [](const schema_ptr& x, const schema_ptr& y) -> bool {
return *x == *y;
});
for (auto&& key : delta.entries_only_on_left) {
dropped.emplace_back(before[key]);
}
for (auto&& key : delta.entries_only_on_right) {
created.emplace_back(after[key]);
}
for (auto&& key : delta.entries_differing) {
altered.emplace_back(after[key]);
}
} else if (!pre->empty()) {
auto before = db.find_keyspace(keyspace_name).metadata()->cf_meta_data();
boost::copy(before | boost::adaptors::map_values, std::back_inserter(dropped));
} else if (!post->empty()) {
auto tables = create_tables_from_tables_partition(proxy, post).get0();
boost::copy(tables | boost::adaptors::map_values, std::back_inserter(created));
}
}
for (auto&& cfm : created) {
auto& ks = db.find_keyspace(cfm->ks_name());
auto cfg = ks.make_column_family_config(*cfm);
db.add_column_family(cfm, cfg);
}
parallel_for_each(altered.begin(), altered.end(), [&db] (auto&& cfm) {
return db.update_column_family(cfm->ks_name(), cfm->cf_name());
}).get();
parallel_for_each(dropped.begin(), dropped.end(), [changed_at, &db] (auto&& cfm) {
return db.drop_column_family(changed_at, cfm->ks_name(), cfm->cf_name());
}).get();
// FIXME: clean this up by reorganizing the code
// Send CQL events only once, not once per shard.
if (engine().cpu_id() == 0) {
for (auto&& cfm : created) {
service::migration_manager::notify_create_column_family(cfm).get0();
auto& ks = db.find_keyspace(cfm->ks_name());
ks.make_directory_for_column_family(cfm->cf_name(), cfm->id());
}
for (auto&& cfm : dropped) {
service::migration_manager::notify_drop_column_family(cfm).get0();
}
}
});
}).get();
});
});
}).get();
}
#if 0
@@ -871,7 +905,7 @@ std::vector<mutation> make_create_keyspace_mutations(lw_shared_ptr<keyspace_meta
addTypeToSchemaMutation(type, timestamp, mutation);
#endif
for (auto&& kv : keyspace->cf_meta_data()) {
add_table_to_schema_mutation(kv.second, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(kv.second, timestamp, true, mutations);
}
}
return mutations;
@@ -997,17 +1031,19 @@ std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadat
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
schema_ptr s = keyspaces();
auto pkey = partition_key::from_singular(*s, keyspace->name());
add_table_to_schema_mutation(table, timestamp, true, pkey, mutations);
add_table_to_schema_mutation(table, timestamp, true, mutations);
return mutations;
}
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations)
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers)
{
// When adding new schema properties, don't set cells for default values so that
// both old and new nodes will see the same version during rolling upgrades.
// For properties that can be null (and can be changed), we insert tombstones, to make sure
// we don't keep a property the user has removed
schema_ptr s = columnfamilies();
auto pkey = partition_key::from_singular(*s, table->ks_name());
mutation m{pkey, s};
auto ckey = clustering_key::from_singular(*s, table->cf_name());
m.set_clustered_cell(ckey, "cf_id", table->id(), timestamp);
@@ -1066,16 +1102,24 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
if (table->compact_columns_count() == 1) {
m.set_clustered_cell(ckey, "value_alias", table->compact_column().name_as_text(), timestamp);
} // null if none
#if 0
for (Map.Entry<ColumnIdentifier, Long> entry : table.getDroppedColumns().entrySet())
adder.addMapEntry("dropped_columns", entry.getKey().toString(), entry.getValue());
#endif
map_type_impl::mutation dropped_columns;
auto dropped_columns_column = s->get_column_definition("dropped_columns");
assert(dropped_columns_column);
auto dropped_columns_type = static_pointer_cast<const map_type_impl>(dropped_columns_column->type);
for (auto&& entry : table->dropped_columns()) {
dropped_columns.cells.emplace_back(dropped_columns_type->get_keys_type()->decompose(data_value(entry.first)),
atomic_cell::make_live(timestamp, dropped_columns_type->get_values_type()->decompose(entry.second)));
}
m.set_clustered_cell(ckey, *dropped_columns_column,
atomic_cell_or_collection::from_collection_mutation(dropped_columns_type->serialize_mutation_form(std::move(dropped_columns))));
m.set_clustered_cell(ckey, "is_dense", table->is_dense(), timestamp);
mutation columns_mutation(pkey, columns());
if (with_columns_and_triggers) {
for (auto&& column : table->all_columns_in_select_order()) {
add_column_to_schema_mutation(table, column, timestamp, pkey, mutations);
add_column_to_schema_mutation(table, column, timestamp, columns_mutation);
}
#if 0
@@ -1083,42 +1127,51 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
addTriggerToSchemaMutation(table, trigger, timestamp, mutation);
#endif
}
mutations.emplace_back(std::move(m));
return schema_mutations{std::move(m), std::move(columns_mutation)};
}
#if 0
public static Mutation makeUpdateTableMutation(KSMetaData keyspace,
CFMetaData oldTable,
CFMetaData newTable,
long timestamp,
boolean fromThrift)
{
Mutation mutation = makeCreateKeyspaceMutation(keyspace, timestamp, false);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations)
{
make_table_mutations(table, timestamp, with_columns_and_triggers).copy_to(mutations);
}
addTableToSchemaMutation(newTable, timestamp, false, mutation);
std::vector<mutation> make_update_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift)
{
// Include the serialized keyspace in case the target node missed a CREATE KEYSPACE migration (see CASSANDRA-5631).
auto mutations = make_create_keyspace_mutations(keyspace, timestamp, false);
MapDifference<ByteBuffer, ColumnDefinition> columnDiff = Maps.difference(oldTable.getColumnMetadata(),
newTable.getColumnMetadata());
add_table_to_schema_mutation(new_table, timestamp, false, mutations);
// columns that are no longer needed
for (ColumnDefinition column : columnDiff.entriesOnlyOnLeft().values())
{
// Thrift only knows about the REGULAR ColumnDefinition type, so don't assume
// columns of other types are being deleted just because they are not here.
if (fromThrift && column.kind != ColumnDefinition.Kind.REGULAR)
continue;
mutation columns_mutation(partition_key::from_singular(*columns(), old_table->ks_name()), columns());
dropColumnFromSchemaMutation(oldTable, column, timestamp, mutation);
auto diff = difference(old_table->all_columns(), new_table->all_columns());
// columns that are no longer needed
for (auto&& name : diff.entries_only_on_left) {
// Thrift only knows about the REGULAR ColumnDefinition type, so don't assume
// columns of other types are being deleted just because they are not here.
const column_definition& column = *old_table->all_columns().at(name);
if (from_thrift && !column.is_regular()) {
continue;
}
// newly added columns
for (ColumnDefinition column : columnDiff.entriesOnlyOnRight().values())
addColumnToSchemaMutation(newTable, column, timestamp, mutation);
drop_column_from_schema_mutation(old_table, column, timestamp, mutations);
}
// old columns with updated attributes
for (ByteBuffer name : columnDiff.entriesDiffering().keySet())
addColumnToSchemaMutation(newTable, newTable.getColumnDefinition(name), timestamp, mutation);
// newly added columns and old columns with updated attributes
for (auto&& name : boost::range::join(diff.entries_differing, diff.entries_only_on_right)) {
const column_definition& column = *new_table->all_columns().at(name);
add_column_to_schema_mutation(new_table, column, timestamp, columns_mutation);
}
mutations.emplace_back(std::move(columns_mutation));
warn(unimplemented::cause::TRIGGERS);
#if 0
MapDifference<String, TriggerDefinition> triggerDiff = Maps.difference(oldTable.getTriggers(), newTable.getTriggers());
// dropped triggers
@@ -1129,9 +1182,9 @@ void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestam
for (TriggerDefinition trigger : triggerDiff.entriesOnlyOnRight().values())
addTriggerToSchemaMutation(newTable, trigger, timestamp, mutation);
return mutation;
}
#endif
return mutations;
}
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp)
{
@@ -1159,13 +1212,39 @@ std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata>
return mutations;
}
static future<schema_mutations> read_table_mutations(distributed<service::storage_proxy>& proxy, const qualified_name& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, table.keyspace_name, table.table_name)
.then([&proxy, table] (mutation cf_m) {
return read_schema_partition_for_table(proxy, COLUMNS, table.keyspace_name, table.table_name)
.then([cf_m = std::move(cf_m)] (mutation col_m) {
return schema_mutations{std::move(cf_m), std::move(col_m)};
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
});
}
future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& proxy, const sstring& keyspace, const sstring& table)
{
return read_schema_partition_for_table(proxy, COLUMNFAMILIES, keyspace, table).then([&proxy, keyspace, table] (auto partition) {
if (partition.second->empty()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", keyspace, table));
}
return create_table_from_table_partition(proxy, std::move(partition.second));
return do_with(qualified_name(keyspace, table), [&proxy] (auto&& qn) {
return read_table_mutations(proxy, qn).then([qn] (schema_mutations sm) {
if (!sm.live()) {
throw std::runtime_error(sprint("%s:%s not found in the schema definitions keyspace.", qn.keyspace_name, qn.table_name));
}
return create_table_from_mutations(std::move(sm));
});
});
}
@@ -1194,18 +1273,6 @@ future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distri
}
#endif
void create_table_from_table_row_and_columns_partition(schema_builder& builder, const query::result_set_row& table_row, const schema_result::value_type& serialized_columns)
{
create_table_from_table_row_and_column_rows(builder, table_row, serialized_columns.second);
}
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition)
{
return do_with(std::move(partition), [&proxy] (auto& partition) {
return create_table_from_table_row(proxy, partition->row(0));
});
}
/**
* Deserialize table metadata from low-level representation
*
@@ -1215,31 +1282,18 @@ future<schema_ptr> create_table_from_table_row(distributed<service::storage_prox
{
auto ks_name = row.get_nonnull<sstring>("keyspace_name");
auto cf_name = row.get_nonnull<sstring>("columnfamily_name");
auto id = row.get_nonnull<utils::UUID>("cf_id");
return read_schema_partition_for_table(proxy, COLUMNS, ks_name, cf_name).then([&row, ks_name, cf_name, id] (auto serialized_columns) {
schema_builder builder{ks_name, cf_name, id};
create_table_from_table_row_and_columns_partition(builder, row, serialized_columns);
return builder.build();
});
#if 0
// FIXME:
Row serializedTriggers = readSchemaPartitionForTable(TRIGGERS, ksName, cfName);
try
{
for (TriggerDefinition trigger : createTriggersFromTriggersPartition(serializedTriggers))
cfm.addTriggerDefinition(trigger);
}
catch (InvalidRequestException e)
{
throw new RuntimeException(e);
}
#endif
return create_table_from_name(proxy, ks_name, cf_name);
}
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_column_definitions)
schema_ptr create_table_from_mutations(schema_mutations sm, std::experimental::optional<table_schema_version> version)
{
auto table_rs = query::result_set(sm.columnfamilies_mutation());
query::result_set_row table_row = table_rs.row(0);
auto ks_name = table_row.get_nonnull<sstring>("keyspace_name");
auto cf_name = table_row.get_nonnull<sstring>("columnfamily_name");
auto id = table_row.get_nonnull<utils::UUID>("cf_id");
schema_builder builder{ks_name, cf_name, id};
#if 0
AbstractType<?> rawComparator = TypeParser.parse(result.getString("comparator"));
@@ -1257,11 +1311,12 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
AbstractType<?> fullRawComparator = CFMetaData.makeRawAbstractType(rawComparator, subComparator);
#endif
std::vector<column_definition> column_defs = create_columns_from_column_rows(serialized_column_definitions,
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
std::vector<column_definition> column_defs = create_columns_from_column_rows(
query::result_set(sm.columns_mutation()),
ks_name,
cf_name,/*,
fullRawComparator, */
cf == cf_type::super);
bool is_dense;
if (table_row.has("is_dense")) {
@@ -1272,8 +1327,10 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
throw std::runtime_error(sprint("%s not implemented", __PRETTY_FUNCTION__));
}
bool is_compound = cell_comparator::check_compound(table_row.get_nonnull<sstring>("comparator"));
auto comparator = table_row.get_nonnull<sstring>("comparator");
bool is_compound = cell_comparator::check_compound(comparator);
builder.set_is_compound(is_compound);
cell_comparator::read_collections(builder, comparator);
#if 0
CellNameType comparator = CellNames.fromAbstractType(fullRawComparator, isDense);
@@ -1365,13 +1422,22 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
builder.set_bloom_filter_fp_chance(builder.get_bloom_filter_fp_chance());
}
#if 0
if (result.has("dropped_columns"))
cfm.droppedColumns(convertDroppedColumns(result.getMap("dropped_columns", UTF8Type.instance, LongType.instance)));
#endif
if (table_row.has("dropped_columns")) {
auto map = table_row.get_nonnull<map_type_impl::native_type>("dropped_columns");
for (auto&& entry : map) {
builder.without_column(value_cast<sstring>(entry.first), value_cast<api::timestamp_type>(entry.second));
};
}
for (auto&& cdef : column_defs) {
builder.with_column(cdef);
}
if (version) {
builder.with_version(*version);
} else {
builder.with_version(sm.digest());
}
return builder.build();
}
#if 0
@@ -1391,12 +1457,9 @@ void create_table_from_table_row_and_column_rows(schema_builder& builder, const
void add_column_to_schema_mutation(schema_ptr table,
const column_definition& column,
api::timestamp_type timestamp,
const partition_key& pkey,
std::vector<mutation>& mutations)
mutation& m)
{
schema_ptr s = columns();
mutation m{pkey, s};
auto ckey = clustering_key::from_exploded(*s, {utf8_type->decompose(table->cf_name()), column.name()});
auto ckey = clustering_key::from_exploded(*m.schema(), {utf8_type->decompose(table->cf_name()), column.name()});
m.set_clustered_cell(ckey, "validator", column.type->name(), timestamp);
m.set_clustered_cell(ckey, "type", serialize_kind(column.kind), timestamp);
if (!column.is_on_all_components()) {
@@ -1407,7 +1470,6 @@ void add_column_to_schema_mutation(schema_ptr table,
adder.add("index_type", column.getIndexType() == null ? null : column.getIndexType().toString());
adder.add("index_options", json(column.getIndexOptions()));
#endif
mutations.emplace_back(std::move(m));
}
sstring serialize_kind(column_kind kind)
@@ -1448,14 +1510,14 @@ void drop_column_from_schema_mutation(schema_ptr table, const column_definition&
mutations.emplace_back(m);
}
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table, /*,
AbstractType<?> rawComparator, */
bool is_super)
{
std::vector<column_definition> columns;
for (auto&& row : rows->rows()) {
for (auto&& row : rows.rows()) {
columns.emplace_back(std::move(create_column_from_column_row(row, keyspace, table, /*, rawComparator, */ is_super)));
}
return columns;

View File

@@ -43,6 +43,8 @@
#include "service/storage_proxy.hh"
#include "mutation.hh"
#include "schema.hh"
#include "hashing.hh"
#include "schema_mutations.hh"
#include <vector>
#include <map>
@@ -92,17 +94,24 @@ std::vector<mutation> make_drop_keyspace_mutations(lw_shared_ptr<keyspace_metada
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
future<> merge_tables(distributed<service::storage_proxy>& proxy, schema_result&& before, schema_result&& after);
lw_shared_ptr<keyspace_metadata> create_keyspace_from_schema_partition(const schema_result_value_type& partition);
mutation make_create_keyspace_mutation(lw_shared_ptr<keyspace_metadata> keyspace, api::timestamp_type timestamp, bool with_tables_and_types_and_functions = true);
std::vector<mutation> make_create_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
std::vector<mutation> make_update_table_mutations(
lw_shared_ptr<keyspace_metadata> keyspace,
schema_ptr old_table,
schema_ptr new_table,
api::timestamp_type timestamp,
bool from_thrift);
schema_mutations make_table_mutations(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers = true);
future<std::map<sstring, schema_ptr>> create_tables_from_tables_partition(distributed<service::storage_proxy>& proxy, const schema_result::mapped_type& result);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, const partition_key& pkey, std::vector<mutation>& mutations);
void add_table_to_schema_mutation(schema_ptr table, api::timestamp_type timestamp, bool with_columns_and_triggers, std::vector<mutation>& mutations);
std::vector<mutation> make_drop_table_mutations(lw_shared_ptr<keyspace_metadata> keyspace, schema_ptr table, api::timestamp_type timestamp);
@@ -110,13 +119,11 @@ future<schema_ptr> create_table_from_name(distributed<service::storage_proxy>& p
future<schema_ptr> create_table_from_table_row(distributed<service::storage_proxy>& proxy, const query::result_set_row& row);
void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_columns);
future<schema_ptr> create_table_from_table_partition(distributed<service::storage_proxy>& proxy, lw_shared_ptr<query::result_set>&& partition);
schema_ptr create_table_from_mutations(schema_mutations, std::experimental::optional<table_schema_version> version = {});
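(Illustration, not part of the diff: taken together with make_table_mutations above, these signatures suggest a round trip for table schemas. A minimal sketch, with the schema and timestamp as placeholders:)
schema_ptr s = make_example_schema(); // hypothetical helper producing a test schema
auto ts = api::new_timestamp();
schema_mutations sm = db::schema_tables::make_table_mutations(s, ts);
// Rebuilding from the captured mutations should yield an equivalent schema;
// the optional argument can force a specific version instead of recomputing it.
schema_ptr restored = db::schema_tables::create_table_from_mutations(std::move(sm));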
void drop_column_from_schema_mutation(schema_ptr table, const column_definition& column, long timestamp, std::vector<mutation>& mutations);
std::vector<column_definition> create_columns_from_column_rows(const schema_result::mapped_type& rows,
std::vector<column_definition> create_columns_from_column_rows(const query::result_set& rows,
const sstring& keyspace,
const sstring& table,/*,
AbstractType<?> rawComparator, */
@@ -129,11 +136,25 @@ column_definition create_column_from_column_row(const query::result_set_row& row
bool is_super);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, const partition_key& pkey, std::vector<mutation>& mutations);
void add_column_to_schema_mutation(schema_ptr table, const column_definition& column, api::timestamp_type timestamp, mutation& mutation);
sstring serialize_kind(column_kind kind);
column_kind deserialize_kind(sstring kind);
data_type parse_type(sstring str);
schema_ptr columns();
schema_ptr columnfamilies();
template<typename Hasher>
void feed_hash_for_schema_digest(Hasher& h, const mutation& m) {
// Cassandra is skipping tombstones from digest calculation
// to avoid disagreements due to tombstone GC.
// See https://issues.apache.org/jira/browse/CASSANDRA-6862.
// We achieve similar effect with compact_for_compaction().
mutation m_compacted(m);
m_compacted.partition().compact_for_compaction(*m.schema(), api::max_timestamp, gc_clock::time_point::max());
feed_hash(h, m_compacted);
}
} // namespace schema_tables
} // namespace db
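(Illustration, not part of the change: the digest helper above can be folded over a set of schema mutations with the md5_hasher used elsewhere in this series; the mutation container here is a stand-in.)
md5_hasher h;
for (const mutation& m : mutations_to_digest) { // hypothetical input
    db::schema_tables::feed_hash_for_schema_digest(h, m);
}
// Because tombstones are compacted away before hashing, two nodes that
// GC'd their tombstones at different times still agree on this digest.
auto digest = h.finalize();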


@@ -69,6 +69,11 @@ void db::serializer<bytes>::read(bytes& b, input& in) {
b = in.read<bytes>();
}
template<>
void db::serializer<bytes>::skip(input& in) {
in.read<bytes>(); // FIXME: Avoid reading
}
template<>
db::serializer<bytes_view>::serializer(const bytes_view& v)
: _item(v), _size(output::serialized_size(v)) {
@@ -104,6 +109,11 @@ void db::serializer<sstring>::read(sstring& s, input& in) {
s = in.read<sstring>();
}
template<>
void db::serializer<sstring>::skip(input& in) {
in.read<sstring>(); // FIXME: avoid reading
}
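(Both skip() specializations above still materialize the value, as the FIXMEs note. The partition_key_view::skip() removed further down shows the intended shape; a sketch of the same idea for bytes, assuming — hypothetically — a 32-bit length prefix on the wire:)
template<>
void db::serializer<bytes>::skip(input& in) {
    auto len = in.read<uint32_t>(); // assumed length prefix; actual encoding may differ
    in.skip(len);                   // skip the payload without copying it
}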
template<>
db::serializer<tombstone>::serializer(const tombstone& t)
: _item(t), _size(sizeof(t.timestamp) + sizeof(decltype(t.deletion_time.time_since_epoch().count()))) {
@@ -157,81 +167,6 @@ void db::serializer<collection_mutation_view>::read(collection_mutation_view& c,
c = collection_mutation_view::from_bytes(bytes_view_serializer::read(in));
}
template<>
db::serializer<partition_key_view>::serializer(const partition_key_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<partition_key_view>::write(output& out, const partition_key_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<partition_key_view>::read(partition_key_view& b, input& in) {
auto len = in.read<uint16_t>();
b = partition_key_view::from_bytes(in.read_view(len));
}
template<>
partition_key_view db::serializer<partition_key_view>::read(input& in) {
auto len = in.read<uint16_t>();
return partition_key_view::from_bytes(in.read_view(len));
}
template<>
void db::serializer<partition_key_view>::skip(input& in) {
auto len = in.read<uint16_t>();
in.skip(len);
}
template<>
db::serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view& key)
: _item(key), _size(sizeof(uint16_t) /* size */ + key.representation().size()) {
}
template<>
void db::serializer<clustering_key_prefix_view>::write(output& out, const clustering_key_prefix_view& key) {
bytes_view v = key.representation();
out.write<uint16_t>(v.size());
out.write(v.begin(), v.end());
}
template<>
void db::serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view& b, input& in) {
auto len = in.read<uint16_t>();
b = clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
clustering_key_prefix_view db::serializer<clustering_key_prefix_view>::read(input& in) {
auto len = in.read<uint16_t>();
return clustering_key_prefix_view::from_bytes(in.read_view(len));
}
template<>
db::serializer<frozen_mutation>::serializer(const frozen_mutation& mutation)
: _item(mutation), _size(sizeof(uint32_t) /* size */ + mutation.representation().size()) {
}
template<>
void db::serializer<frozen_mutation>::write(output& out, const frozen_mutation& mutation) {
bytes_view v = mutation.representation();
out.write(v);
}
template<>
void db::serializer<frozen_mutation>::read(frozen_mutation& m, input& in) {
m = read(in);
}
template<>
frozen_mutation db::serializer<frozen_mutation>::read(input& in) {
return frozen_mutation(bytes_serializer::read(in));
}
template<>
db::serializer<db::replay_position>::serializer(const db::replay_position& rp)
: _item(rp), _size(sizeof(uint64_t) * 2) {
@@ -256,7 +191,4 @@ template class db::serializer<sstring> ;
template class db::serializer<atomic_cell_view> ;
template class db::serializer<collection_mutation_view> ;
template class db::serializer<utils::UUID> ;
template class db::serializer<partition_key_view> ;
template class db::serializer<clustering_key_prefix_view> ;
template class db::serializer<frozen_mutation> ;
template class db::serializer<db::replay_position> ;


@@ -28,9 +28,7 @@
#include "utils/data_output.hh"
#include "bytes_ostream.hh"
#include "bytes.hh"
#include "keys.hh"
#include "database_fwd.hh"
#include "frozen_mutation.hh"
#include "db/commitlog/replay_position.hh"
namespace db {
@@ -180,6 +178,7 @@ template<> utils::UUID serializer<utils::UUID>::read(input&);
template<> serializer<bytes>::serializer(const bytes &);
template<> void serializer<bytes>::write(output&, const type&);
template<> void serializer<bytes>::read(bytes&, input&);
template<> void serializer<bytes>::skip(input&);
template<> serializer<bytes_view>::serializer(const bytes_view&);
template<> void serializer<bytes_view>::write(output&, const type&);
@@ -189,6 +188,7 @@ template<> bytes_view serializer<bytes_view>::read(input&);
template<> serializer<sstring>::serializer(const sstring&);
template<> void serializer<sstring>::write(output&, const type&);
template<> void serializer<sstring>::read(sstring&, input&);
template<> void serializer<sstring>::skip(input&);
template<> serializer<tombstone>::serializer(const tombstone &);
template<> void serializer<tombstone>::write(output&, const type&);
@@ -203,22 +203,6 @@ template<> serializer<collection_mutation_view>::serializer(const collection_mut
template<> void serializer<collection_mutation_view>::write(output&, const type&);
template<> void serializer<collection_mutation_view>::read(collection_mutation_view&, input&);
template<> serializer<frozen_mutation>::serializer(const frozen_mutation &);
template<> void serializer<frozen_mutation>::write(output&, const type&);
template<> void serializer<frozen_mutation>::read(frozen_mutation&, input&);
template<> frozen_mutation serializer<frozen_mutation>::read(input&);
template<> serializer<partition_key_view>::serializer(const partition_key_view &);
template<> void serializer<partition_key_view>::write(output&, const partition_key_view&);
template<> void serializer<partition_key_view>::read(partition_key_view&, input&);
template<> partition_key_view serializer<partition_key_view>::read(input&);
template<> void serializer<partition_key_view>::skip(input&);
template<> serializer<clustering_key_prefix_view>::serializer(const clustering_key_prefix_view &);
template<> void serializer<clustering_key_prefix_view>::write(output&, const clustering_key_prefix_view&);
template<> void serializer<clustering_key_prefix_view>::read(clustering_key_prefix_view&, input&);
template<> clustering_key_prefix_view serializer<clustering_key_prefix_view>::read(input&);
template<> serializer<db::replay_position>::serializer(const db::replay_position&);
template<> void serializer<db::replay_position>::write(output&, const db::replay_position&);
template<> void serializer<db::replay_position>::read(db::replay_position&, input&);
@@ -235,9 +219,6 @@ extern template class serializer<bytes>;
extern template class serializer<bytes_view>;
extern template class serializer<sstring>;
extern template class serializer<utils::UUID>;
extern template class serializer<partition_key_view>;
extern template class serializer<clustering_key_view>;
extern template class serializer<clustering_key_prefix_view>;
extern template class serializer<db::replay_position>;
typedef serializer<tombstone> tombstone_serializer;
@@ -247,10 +228,6 @@ typedef serializer<sstring> sstring_serializer;
typedef serializer<atomic_cell_view> atomic_cell_view_serializer;
typedef serializer<collection_mutation_view> collection_mutation_view_serializer;
typedef serializer<utils::UUID> uuid_serializer;
typedef serializer<partition_key_view> partition_key_view_serializer;
typedef serializer<clustering_key_view> clustering_key_view_serializer;
typedef serializer<clustering_key_prefix_view> clustering_key_prefix_view_serializer;
typedef serializer<frozen_mutation> frozen_mutation_serializer;
typedef serializer<db::replay_position> replay_position_serializer;
}


@@ -63,6 +63,8 @@
#include "partition_slice_builder.hh"
#include "db/config.hh"
#include "schema_builder.hh"
#include "md5_hasher.hh"
#include "release.hh"
#include <core/enum.hh>
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
@@ -73,6 +75,23 @@ std::unique_ptr<query_context> qctx = {};
namespace system_keyspace {
static const api::timestamp_type creation_timestamp = api::new_timestamp();
api::timestamp_type schema_creation_timestamp() {
return creation_timestamp;
}
// Increase whenever changing schema of any system table.
// FIXME: Make automatic by calculating from schema structure.
static const uint16_t version_sequence_number = 1;
table_schema_version generate_schema_version(utils::UUID table_id) {
md5_hasher h;
feed_hash(h, table_id);
feed_hash(h, version_sequence_number);
return utils::UUID_gen::get_name_UUID(h.finalize());
}
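(To make the intent concrete — a hedged sketch, with the table id a placeholder: the version is a name-based UUID over (table_id, version_sequence_number), so every node derives the same version, and bumping the sequence number re-versions all system tables at once.)
utils::UUID id{}; // placeholder; in practice a system table's fixed id
table_schema_version v1 = generate_schema_version(id);
table_schema_version v2 = generate_schema_version(id);
// v1 == v2 on any node, at any time, until version_sequence_number changes.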
// Currently, the type variables (uuid_type, etc.) are thread-local reference-
// counted shared pointers. This forces us to also make the built in schemas
// below thread-local as well.
@@ -101,6 +120,7 @@ schema_ptr hints() {
)));
builder.set_gc_grace_seconds(0);
builder.set_compaction_strategy_options({{ "enabled", "false" }});
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return hints;
@@ -126,6 +146,7 @@ schema_ptr batchlog() {
// .compactionStrategyOptions(Collections.singletonMap("min_threshold", "2"))
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return batchlog;
@@ -150,6 +171,7 @@ schema_ptr batchlog() {
// operations on resulting CFMetaData:
// .compactionStrategyClass(LeveledCompactionStrategy.class);
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return paxos;
@@ -171,6 +193,7 @@ schema_ptr built_indexes() {
// comment
"built column indexes"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::yes);
}();
return built_indexes;
@@ -212,6 +235,7 @@ schema_ptr built_indexes() {
// comment
"information about the local node"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return local;
@@ -242,6 +266,7 @@ schema_ptr built_indexes() {
// comment
"information about known peers in the cluster"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peers;
@@ -265,6 +290,7 @@ schema_ptr built_indexes() {
// comment
"events related to peers"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return peer_events;
@@ -286,6 +312,7 @@ schema_ptr built_indexes() {
// comment
"ranges requested for transfer"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return range_xfers;
@@ -311,6 +338,7 @@ schema_ptr built_indexes() {
// comment
"unfinished compactions"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compactions_in_progress;
@@ -340,6 +368,7 @@ schema_ptr built_indexes() {
"week-long compaction history"
)));
builder.set_default_time_to_live(std::chrono::duration_cast<std::chrono::seconds>(days(7)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return compaction_history;
@@ -368,6 +397,7 @@ schema_ptr built_indexes() {
// comment
"historic sstable read rates"
)));
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return sstable_activity;
@@ -393,6 +423,7 @@ schema_ptr size_estimates() {
"per-table primary range size estimates"
)));
builder.set_gc_grace_seconds(0);
builder.with_version(generate_schema_version(builder.uuid()));
return builder.build(schema_builder::compact_storage::no);
}();
return size_estimates;
@@ -513,7 +544,6 @@ future<> setup(distributed<database>& db, distributed<cql3::query_processor>& qp
return ms.init_local_preferred_ip_cache();
});
});
return make_ready_future<>();
}
typedef std::pair<replay_positions, db_clock::time_point> truncation_entry;
@@ -985,8 +1015,9 @@ query_mutations(distributed<service::storage_proxy>& proxy, const sstring& cf_na
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(cmd, query::full_partition_range);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query_mutations_locally(std::move(schema), std::move(cmd), query::full_partition_range);
}
future<lw_shared_ptr<query::result_set>>
@@ -994,7 +1025,8 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name) {
database& db = proxy.local().get_db().local();
schema_ptr schema = db.find_schema(db::system_keyspace::NAME, cf_name);
auto slice = partition_slice_builder(*schema).build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), std::numeric_limits<uint32_t>::max());
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(),
std::move(slice), std::numeric_limits<uint32_t>::max());
return proxy.local().query(schema, cmd, {query::full_partition_range}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});
@@ -1008,7 +1040,7 @@ query(distributed<service::storage_proxy>& proxy, const sstring& cf_name, const
auto slice = partition_slice_builder(*schema)
.with_range(std::move(row_range))
.build();
auto cmd = make_lw_shared<query::read_command>(schema->id(), std::move(slice), query::max_rows);
auto cmd = make_lw_shared<query::read_command>(schema->id(), schema->version(), std::move(slice), query::max_rows);
return proxy.local().query(schema, cmd, {query::partition_range::make_singular(key)}, db::consistency_level::ONE).then([schema, cmd] (auto&& result) {
return make_lw_shared(query::result_set::from_raw_result(schema, cmd->slice, *result));
});


@@ -84,6 +84,8 @@ extern schema_ptr hints();
extern schema_ptr batchlog();
extern schema_ptr built_indexes(); // TODO (from Cassandra): make private
table_schema_version generate_schema_version(utils::UUID table_id);
// Only for testing.
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp);
@@ -673,5 +675,7 @@ future<> set_bootstrap_state(bootstrap_state state);
executeInternal(String.format(cql, SSTABLE_ACTIVITY), keyspace, table, generation);
}
#endif
api::timestamp_type schema_creation_timestamp();
} // namespace system_keyspace
} // namespace db


@@ -251,39 +251,6 @@ std::ostream& operator<<(std::ostream& out, const ring_position& pos) {
return out << "}";
}
size_t ring_position::serialized_size() const {
size_t size = serialize_int32_size; /* _key length */
if (_key) {
size += _key.value().representation().size();
} else {
size += sizeof(int8_t); /* _token_bound */
}
return size + _token.serialized_size();
}
void ring_position::serialize(bytes::iterator& out) const {
_token.serialize(out);
if (_key) {
auto v = _key.value().representation();
serialize_int32(out, v.size());
out = std::copy(v.begin(), v.end(), out);
} else {
serialize_int32(out, 0);
serialize_int8(out, static_cast<int8_t>(_token_bound));
}
}
ring_position ring_position::deserialize(bytes_view& in) {
auto token = token::deserialize(in);
auto size = read_simple<uint32_t>(in);
if (size == 0) {
auto bound = dht::ring_position::token_bound(read_simple<int8_t>(in));
return ring_position(std::move(token), bound);
} else {
return ring_position(std::move(token), partition_key::from_bytes(to_bytes(read_simple_bytes(in, size))));
}
}
unsigned shard_of(const token& t) {
return global_partitioner().shard_of(t);
}


@@ -338,6 +338,12 @@ public:
, _key(std::experimental::make_optional(std::move(key)))
{ }
ring_position(dht::token token, token_bound bound, std::experimental::optional<partition_key> key)
: _token(std::move(token))
, _token_bound(bound)
, _key(std::move(key))
{ }
ring_position(const dht::decorated_key& dk)
: _token(dk._token)
, _key(std::experimental::make_optional(dk._key))
@@ -379,10 +385,6 @@ public:
// "less" comparator corresponding to tri_compare()
bool less_compare(const schema&, const ring_position&) const;
size_t serialized_size() const;
void serialize(bytes::iterator& out) const;
static ring_position deserialize(bytes_view& in);
friend std::ostream& operator<<(std::ostream&, const ring_position&);
};


@@ -107,7 +107,7 @@ public:
, _tokens(std::move(tokens))
, _address(address)
, _description(std::move(description))
, _stream_plan(_description, true) {
, _stream_plan(_description) {
}
range_streamer(distributed<database>& db, token_metadata& tm, inet_address address, sstring description)

dist/ami/build_ami.sh:

@@ -5,15 +5,22 @@ if [ ! -e dist/ami/build_ami.sh ]; then
exit 1
fi
TARGET_JSON=scylla.json
if [ "$1" != "" ]; then
TARGET_JSON=$1
fi
if [ ! -f dist/ami/$TARGET_JSON ]; then
echo "dist/ami/$TARGET_JSON does not found"
print_usage() {
echo "build_ami.sh -l"
echo " -l deploy locally built rpms"
exit 1
fi
}
LOCALRPM=0
while getopts lh OPT; do
case "$OPT" in
"l")
LOCALRPM=1
;;
"h")
print_usage
;;
esac
done
cd dist/ami
@@ -30,4 +37,12 @@ if [ ! -d packer ]; then
cd -
fi
packer/packer build -var-file=variables.json $TARGET_JSON
if [ $LOCALRPM = 0 ]; then
echo "sudo yum remove -y abrt; sudo sh -x -e /home/centos/scylla_install_pkg; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
else
echo "sudo yum remove -y abrt; sudo sh -x -e /home/centos/scylla_install_pkg -l /home/centos; sudo sh -x -e /usr/lib/scylla/scylla_setup -a" > scylla_deploy.sh
fi
chmod a+rx scylla_deploy.sh
packer/packer build -var-file=variables.json scylla.json


@@ -5,26 +5,27 @@ if [ ! -e dist/ami/build_ami_local.sh ]; then
exit 1
fi
rm -rf build/*
sudo yum -y install git
if [ ! -f dist/ami/scylla-server.x86_64.rpm ]; then
if [ ! -f dist/ami/files/scylla-server.x86_64.rpm ]; then
dist/redhat/build_rpm.sh
cp build/rpms/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/scylla-server.x86_64.rpm
cp build/rpmbuild/RPMS/x86_64/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
fi
if [ ! -f dist/ami/scylla-jmx.noarch.rpm ]; then
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
sh -x -e dist/redhat/build_rpm.sh
sh -x -e dist/redhat/build_rpm.sh $*
cd ../..
cp build/scylla-jmx/build/rpms/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-jmx.noarch.rpm
cp build/scylla-jmx/build/rpmbuild/RPMS/noarch/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/scylla-tools.noarch.rpm ]; then
if [ ! -f dist/ami/files/scylla-tools.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
sh -x -e dist/redhat/build_rpm.sh
cd ../..
cp build/scylla-tools-java/build/rpms/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/scylla-tools.noarch.rpm
cp build/scylla-tools-java/build/rpmbuild/RPMS/noarch/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools.noarch.rpm
fi
exec dist/ami/build_ami.sh scylla_local.json
exec dist/ami/build_ami.sh -l


@@ -23,7 +23,7 @@ echo ' |___/ '
echo ''
echo ''
echo 'Nodetool:'
echo ' nodetool --help'
echo ' nodetool help'
echo 'CQL Shell:'
echo ' cqlsh'
echo 'More documentation available at: '
@@ -35,6 +35,7 @@ if [ "`systemctl is-active scylla-server`" = "active" ]; then
tput bold
echo " ScyllaDB is active."
tput sgr0
echo
else
tput setaf 1
tput bold
@@ -42,4 +43,5 @@ else
tput sgr0
echo "Please wait for startup. To see status of ScyllaDB, run "
echo " 'systemctl status scylla-server'"
echo
fi

dist/ami/scylla.json:

@@ -8,34 +8,34 @@
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-a51564c0",
"source_ami": "ami-8ef1d6e4",
"user_data_file": "user_data.txt",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "fedora",
"ssh_username": "centos",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}"
"ami_name": "scylla_{{isotime | clean_ami_name}}",
"launch_block_device_mappings": [
{
"device_name": "/dev/sda1",
"volume_size": 10
}
]
}
],
"provisioners": [
{
"type": "file",
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
"source": "files/",
"destination": "/home/centos/"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
"source": "../../scripts/scylla_install_pkg",
"destination": "/home/centos/scylla_install_pkg"
},
{
"type": "shell",
"inline": [
"sudo sh -x -e /home/fedora/scylla_install -a"
]
"script": "scylla_deploy.sh"
}
],
"variables": {


@@ -1,67 +0,0 @@
{
"builders": [
{
"type": "amazon-ebs",
"access_key": "{{user `access_key`}}",
"secret_key": "{{user `secret_key`}}",
"subnet_id": "{{user `subnet_id`}}",
"security_group_id": "{{user `security_group_id`}}",
"region": "{{user `region`}}",
"associate_public_ip_address": "{{user `associate_public_ip_address`}}",
"source_ami": "ami-a51564c0",
"instance_type": "{{user `instance_type`}}",
"ssh_username": "fedora",
"ssh_timeout": "5m",
"ami_name": "scylla_{{isotime | clean_ami_name}}"
}
],
"provisioners": [
{
"type": "file",
"source": "files/scylla-ami",
"destination": "/home/fedora/scylla-ami"
},
{
"type": "file",
"source": "files/.bash_profile",
"destination": "/home/fedora/.bash_profile"
},
{
"type": "file",
"source": "../../scripts/scylla_install",
"destination": "/home/fedora/scylla_install"
},
{
"type": "file",
"source": "scylla-server.x86_64.rpm",
"destination": "/home/fedora/scylla-server.x86_64.rpm"
},
{
"type": "file",
"source": "scylla-jmx.noarch.rpm",
"destination": "/home/fedora/scylla-jmx.noarch.rpm"
},
{
"type": "file",
"source": "scylla-tools.noarch.rpm",
"destination": "/home/fedora/scylla-tools.noarch.rpm"
},
{
"type": "shell",
"inline": [
"sudo yum install -y /home/fedora/scylla-server.x86_64.rpm /home/fedora/scylla-jmx.noarch.rpm /home/fedora/scylla-tools.noarch.rpm",
"sudo mv /home/fedora/scylla-ami /usr/lib/scylla/scylla-ami",
"sudo sh -x -e /home/fedora/scylla_install -a -l /home/fedora"
]
}
],
"variables": {
"access_key": "",
"secret_key": "",
"subnet_id": "",
"security_group_id": "",
"region": "",
"associate_public_ip_address": "",
"instance_type": ""
}
}

dist/ami/user_data.txt (new file):

@@ -0,0 +1,2 @@
#!/bin/sh
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers


@@ -2,47 +2,22 @@
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_bootparam_setup -a"
echo " -a AMI instance mode"
exit 1
}
AMI=0
while getopts a OPT; do
case "$OPT" in
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ $AMI -eq 1 ]; then
. /etc/sysconfig/scylla-server
sed -e "s#append #append clocksource=tsc tsc=reliable hugepagesz=2M hugepages=$NR_HUGEPAGES #" /boot/extlinux/extlinux.conf > /tmp/extlinux.conf
mv /tmp/extlinux.conf /boot/extlinux/extlinux.conf
else
. /etc/sysconfig/scylla-server
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
fi
if [ "`grep hugepagesz /etc/default/grub`" != "" ] || [ "`grep hugepages /etc/default/grub`" != "" ]; then
sed -e "s#hugepagesz=2M ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
sed -e "s#hugepages=[0-9]* ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub2-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi
if [ ! -f /etc/default/grub ]; then
echo "Unsupported bootloader"
exit 1
fi
if [ "`grep hugepagesz /etc/default/grub`" != "" ] || [ "`grep hugepages /etc/default/grub`" != "" ]; then
sed -e "s#hugepagesz=2M ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
sed -e "s#hugepages=[0-9]* ##" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
fi
sed -e "s#^GRUB_CMDLINE_LINUX=\"#GRUB_CMDLINE_LINUX=\"hugepagesz=2M hugepages=$NR_HUGEPAGES #" /etc/default/grub > /tmp/grub
mv /tmp/grub /etc/default/grub
if [ "$ID" = "ubuntu" ]; then
grub-mkconfig -o /boot/grub/grub.cfg
else
grub2-mkconfig -o /boot/grub2/grub.cfg
fi


@@ -2,15 +2,43 @@
#
# Copyright (C) 2015 ScyllaDB
print_usage() {
echo "scylla_coredump_setup -s"
echo " -s store coredump to /var/lib/scylla"
exit 1
}
SYMLINK=0
while getopts sh OPT; do
case "$OPT" in
"s")
SYMLINK=1
;;
"h")
print_usage
;;
esac
done
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
apt-get remove -y apport-noui
sysctl -p /etc/sysctl.d/99-scylla.conf
else
if [ -f /etc/systemd/coredump.conf ]; then
mv /etc/systemd/coredump.conf /etc/systemd/coredump.conf.save
systemctl daemon-reload
cat << EOS > /etc/systemd/coredump.conf
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=1024G
ExternalSizeMax=1024G
EOS
if [ $SYMLINK = 1 ]; then
rm -rf /var/lib/systemd/coredump
ln -sf /var/lib/scylla/coredump /var/lib/systemd/coredump
fi
systemctl daemon-reload
echo "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e" > /etc/sysctl.d/99-coredump.conf
sysctl -p /etc/sysctl.d/99-coredump.conf
fi
sysctl -p /etc/sysctl.d/99-scylla.conf


@@ -29,10 +29,13 @@ if [ "$NAME" = "Ubuntu" ]; then
else
yum install -y ntp ntpdate || true
if [ $AMI -eq 1 ]; then
sed -e s#fedora.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
sed -e s#centos.pool.ntp.org#amazon.pool.ntp.org# /etc/ntp.conf > /tmp/ntp.conf
mv /tmp/ntp.conf /etc/ntp.conf
fi
systemctl enable ntpd.service
if [ "`systemctl is-active ntpd`" = "active" ]; then
systemctl stop ntpd.service
fi
ntpdate `cat /etc/ntp.conf |grep "^server"|head -n1|awk '{print $2}'`
systemctl enable ntpd.service
systemctl start ntpd.service
fi


@@ -30,8 +30,6 @@ if [ "$AMI" = "yes" ]; then
if [ "$DISKS" != "" ]; then
/usr/lib/scylla/scylla_raid_setup -d $DISKS
else
echo "ERROR: Scylla is not using XFS to store data. The scylla service will refuse to start." > /home/fedora/SCYLLA_SETUP_ERROR.LOG
fi
/usr/lib/scylla/scylla-ami/ds2_configure.py


@@ -43,6 +43,13 @@ if [ "`mount|grep /var/lib/scylla`" != "" ]; then
echo "/var/lib/scylla is already mounted"
exit 1
fi
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
apt-get -y install mdadm xfsprogs
else
yum -y install mdadm xfsprogs
fi
mdadm --create --verbose --force --run $RAID --level=0 -c256 --raid-devices=$NR_DISK $DISKS
blockdev --setra 65536 $RAID
mkfs.xfs $RAID -f

dist/common/scripts/scylla_setup (new executable file):

@@ -0,0 +1,76 @@
#!/bin/sh -e
#
# Copyright (C) 2015 ScyllaDB
if [ "`id -u`" -ne 0 ]; then
echo "Requires root permission."
exit 1
fi
print_usage() {
echo "scylla_setup -d /dev/hda,/dev/hdb... -n eth0 -a"
echo " -d specify disks for RAID"
echo " -n specify NIC"
echo " -a setup AMI instance"
exit 1
}
NIC=eth0
RAID=/dev/md0
AMI=0
while getopts d:n:al:h OPT; do
case "$OPT" in
"n")
NIC=$OPTARG
;;
"d")
DISKS=$OPTARG
;;
"a")
AMI=1
;;
"h")
print_usage
;;
esac
done
SYSCONFIG_SETUP_ARGS="-n $NIC"
. /etc/os-release
if [ "$ID" != "ubuntu" ]; then
if [ "`sestatus | awk '{print $3}'`" != "disabled" ]; then
setenforce 0
sed -e "s/enforcing/disabled/" /etc/sysconfig/selinux > /tmp/selinux
mv /tmp/selinux /etc/sysconfig/
fi
if [ $AMI -eq 1 ]; then
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -N -a"
if [ "$LOCAL_PKG" = "" ]; then
yum update -y
else
SYSCONFIG_SETUP_ARGS="$SYSCONFIG_SETUP_ARGS -k"
fi
grep -v ' - mounts' /etc/cloud/cloud.cfg > /tmp/cloud.cfg
mv /tmp/cloud.cfg /etc/cloud/cloud.cfg
mv /home/centos/scylla-ami /usr/lib/scylla/scylla-ami
chmod a+rx /usr/lib/scylla/scylla-ami/ds2_configure.py
fi
systemctl enable scylla-server.service
systemctl enable scylla-jmx.service
fi
if [ $AMI -eq 0 ]; then
/usr/lib/scylla/scylla_ntp_setup
/usr/lib/scylla/scylla_bootparam_setup
if [ "$DISKS" != "" ]; then
/usr/lib/scylla/scylla_raid_setup -d $DISKS -u
/usr/lib/scylla/scylla_coredump_setup -s
else
/usr/lib/scylla/scylla_coredump_setup
fi
else
/usr/lib/scylla/scylla_coredump_setup -s
/usr/lib/scylla/scylla_ntp_setup -a
/usr/lib/scylla/scylla_bootparam_setup -a
fi
/usr/lib/scylla/scylla_sysconfig_setup $SYSCONFIG_SETUP_ARGS

dist/common/sudoers.d/scylla (new file):

@@ -0,0 +1 @@
scylla ALL=(ALL) NOPASSWD:SETENV: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop


@@ -34,8 +34,11 @@ SCYLLA_HOME=/var/lib/scylla
# scylla config dir
SCYLLA_CONF=/etc/scylla
# additional arguments
SCYLLA_ARGS=""
# scylla arguments (for posix mode)
SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack posix"
## scylla arguments (for dpdk mode)
#SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info --collectd-address=127.0.0.1:25826 --collectd=1 --collectd-poll-period 3000 --network-stack native --dpdk-pmd"
# setup as AMI instance
AMI=no


@@ -1,5 +1,22 @@
#!/bin/sh -e
print_usage() {
echo "build_rpm.sh -R"
echo " -R rebuild dependency packages (CentOS)"
exit 1
}
REBUILD=0
while getopts Rh OPT; do
case "$OPT" in
"R")
REBUILD=1
;;
"h")
print_usage
;;
esac
done
RPMBUILD=`pwd`/build/rpmbuild
if [ ! -e dist/redhat/build_rpm.sh ]; then
@@ -22,7 +39,12 @@ if [ ! -f /usr/bin/git ]; then
fi
mkdir -p $RPMBUILD/{BUILD,BUILDROOT,RPMS,SOURCES,SPECS,SRPMS}
if [ "$ID" = "centos" ]; then
./dist/redhat/centos_dep/build_dependency.sh
sudo yum install -y epel-release
if [ $REBUILD = 1 ]; then
./dist/redhat/centos_dep/build_dependency.sh
else
sudo curl https://s3.amazonaws.com/downloads.scylladb.com/rpm/unstable/centos/master/latest/scylla.repo -o /etc/yum.repos.d/scylla.repo
fi
fi
VERSION=$(./SCYLLA-VERSION-GEN)
SCYLLA_VERSION=$(cat build/SCYLLA-VERSION-FILE)
@@ -37,7 +59,7 @@ if [ "$ID" = "fedora" ]; then
rpmbuild -bs --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
mock rebuild --resultdir=`pwd`/build/rpms $RPMBUILD/SRPMS/scylla-server-$VERSION*.src.rpm
else
. /etc/profile.d/scylla.sh
sudo yum-builddep -y $RPMBUILD/SPECS/scylla-server.spec
. /etc/profile.d/scylla.sh
rpmbuild -ba --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
fi


@@ -1,5 +1,5 @@
--- binutils.spec 2015-10-19 05:45:55.106745163 +0000
+++ binutils.spec.1 2015-10-19 05:45:55.807742899 +0000
--- binutils.spec.orig 2015-09-30 14:48:25.000000000 +0000
+++ binutils.spec 2016-01-20 14:42:17.856037134 +0000
@@ -17,7 +17,7 @@
%define enable_deterministic_archives 1
@@ -7,7 +7,7 @@
-Name: %{?cross}binutils%{?_with_debug:-debug}
+Name: scylla-%{?cross}binutils%{?_with_debug:-debug}
Version: 2.25
Release: 5%{?dist}
Release: 15%{?dist}
License: GPLv3+
@@ -29,6 +29,7 @@
# instead.
@@ -17,7 +17,7 @@
Source2: binutils-2.19.50.0.1-output-format.sed
Patch01: binutils-2.20.51.0.2-libtool-lib64.patch
@@ -82,6 +83,9 @@
@@ -89,6 +90,9 @@
BuildRequires: texinfo >= 4.0, gettext, flex, bison, zlib-devel
# BZ 920545: We need pod2man in order to build the manual pages.
BuildRequires: /usr/bin/pod2man
@@ -27,7 +27,7 @@
# Required for: ld-bootstrap/bootstrap.exp bootstrap with --static
# It should not be required for: ld-elf/elf.exp static {preinit,init,fini} array
%if %{run_testsuite}
@@ -105,8 +109,8 @@
@@ -112,8 +116,8 @@
%if "%{build_gold}" == "both"
Requires(post): coreutils
@@ -38,7 +38,7 @@
%endif
# On ARM EABI systems, we do want -gnueabi to be part of the
@@ -131,11 +135,12 @@
@@ -138,11 +142,12 @@
%package devel
Summary: BFD and opcodes static and dynamic libraries and header files
Group: System Environment/Libraries
@@ -50,10 +50,10 @@
Requires: zlib-devel
-Requires: binutils = %{version}-%{release}
+Requires: scylla-binutils = %{version}-%{release}
# BZ 1215242: We need touch...
Requires: coreutils
%description devel
This package contains BFD and opcodes static and dynamic libraries.
@@ -411,11 +416,11 @@
@@ -426,11 +431,11 @@
%post
%if "%{build_gold}" == "both"
%__rm -f %{_bindir}/%{?cross}ld
@@ -68,7 +68,7 @@
%endif
%if %{isnative}
/sbin/ldconfig
@@ -433,8 +438,8 @@
@@ -448,8 +453,8 @@
%preun
%if "%{build_gold}" == "both"
if [ $1 = 0 ]; then


@@ -1,5 +1,5 @@
--- boost.spec 2015-05-03 17:32:13.000000000 +0000
+++ boost.spec.1 2015-10-19 06:03:12.670534256 +0000
--- boost.spec.orig 2016-01-15 18:41:47.000000000 +0000
+++ boost.spec 2016-01-20 14:46:47.397663246 +0000
@@ -6,6 +6,11 @@
# We should be able to install directly.
%define boost_docdir __tmp_docdir
@@ -20,9 +20,9 @@
+Name: scylla-boost
+%define orig_name boost
Summary: The free peer-reviewed portable C++ source libraries
Version: 1.57.0
%define version_enc 1_57_0
Release: 6%{?dist}
Version: 1.58.0
%define version_enc 1_58_0
Release: 11%{?dist}
License: Boost and MIT and Python
-%define toplev_dirname %{name}_%{version_enc}
@@ -93,8 +93,8 @@
+Requires: scylla-boost-wave%{?_isa} = %{version}-%{release}
BuildRequires: m4
BuildRequires: libstdc++-devel%{?_isa}
@@ -151,6 +159,7 @@
BuildRequires: libstdc++-devel
@@ -156,6 +164,7 @@
%package atomic
Summary: Run-Time component of boost atomic library
Group: System Environment/Libraries
@@ -102,7 +102,7 @@
%description atomic
@@ -162,7 +171,8 @@
@@ -167,7 +176,8 @@
%package chrono
Summary: Run-Time component of boost chrono library
Group: System Environment/Libraries
@@ -112,7 +112,7 @@
%description chrono
@@ -171,6 +181,7 @@
@@ -176,6 +186,7 @@
%package container
Summary: Run-Time component of boost container library
Group: System Environment/Libraries
@@ -120,7 +120,7 @@
%description container
@@ -183,6 +194,7 @@
@@ -188,6 +199,7 @@
%package context
Summary: Run-Time component of boost context switching library
Group: System Environment/Libraries
@@ -128,7 +128,7 @@
%description context
@@ -192,6 +204,7 @@
@@ -197,6 +209,7 @@
%package coroutine
Summary: Run-Time component of boost coroutine library
Group: System Environment/Libraries
@@ -136,7 +136,7 @@
%description coroutine
Run-Time support for Boost.Coroutine, a library that provides
@@ -203,6 +216,7 @@
@@ -208,6 +221,7 @@
%package date-time
Summary: Run-Time component of boost date-time library
Group: System Environment/Libraries
@@ -144,7 +144,7 @@
%description date-time
@@ -212,7 +226,8 @@
@@ -217,7 +231,8 @@
%package filesystem
Summary: Run-Time component of boost filesystem library
Group: System Environment/Libraries
@@ -154,7 +154,7 @@
%description filesystem
@@ -223,7 +238,8 @@
@@ -228,7 +243,8 @@
%package graph
Summary: Run-Time component of boost graph library
Group: System Environment/Libraries
@@ -164,7 +164,7 @@
%description graph
@@ -243,9 +259,10 @@
@@ -248,9 +264,10 @@
%package locale
Summary: Run-Time component of boost locale library
Group: System Environment/Libraries
@@ -178,7 +178,7 @@
%description locale
@@ -255,6 +272,7 @@
@@ -260,6 +277,7 @@
%package log
Summary: Run-Time component of boost logging library
Group: System Environment/Libraries
@@ -186,7 +186,7 @@
%description log
@@ -265,6 +283,7 @@
@@ -270,6 +288,7 @@
%package math
Summary: Math functions for boost TR1 library
Group: System Environment/Libraries
@@ -194,7 +194,7 @@
%description math
@@ -274,6 +293,7 @@
@@ -279,6 +298,7 @@
%package program-options
Summary: Run-Time component of boost program_options library
Group: System Environment/Libraries
@@ -202,7 +202,7 @@
%description program-options
@@ -284,6 +304,7 @@
@@ -289,6 +309,7 @@
%package python
Summary: Run-Time component of boost python library
Group: System Environment/Libraries
@@ -210,7 +210,7 @@
%description python
@@ -298,6 +319,7 @@
@@ -303,6 +324,7 @@
%package python3
Summary: Run-Time component of boost python library for Python 3
Group: System Environment/Libraries
@@ -218,7 +218,7 @@
%description python3
@@ -310,8 +332,9 @@
@@ -315,8 +337,9 @@
%package python3-devel
Summary: Shared object symbolic links for Boost.Python 3
Group: System Environment/Libraries
@@ -230,7 +230,7 @@
%description python3-devel
@@ -322,6 +345,7 @@
@@ -327,6 +350,7 @@
%package random
Summary: Run-Time component of boost random library
Group: System Environment/Libraries
@@ -238,7 +238,7 @@
%description random
@@ -330,6 +354,7 @@
@@ -335,6 +359,7 @@
%package regex
Summary: Run-Time component of boost regular expression library
Group: System Environment/Libraries
@@ -246,7 +246,7 @@
%description regex
@@ -338,6 +363,7 @@
@@ -343,6 +368,7 @@
%package serialization
Summary: Run-Time component of boost serialization library
Group: System Environment/Libraries
@@ -254,7 +254,7 @@
%description serialization
@@ -346,6 +372,7 @@
@@ -351,6 +377,7 @@
%package signals
Summary: Run-Time component of boost signals and slots library
Group: System Environment/Libraries
@@ -262,7 +262,7 @@
%description signals
@@ -354,6 +381,7 @@
@@ -359,6 +386,7 @@
%package system
Summary: Run-Time component of boost system support library
Group: System Environment/Libraries
@@ -270,7 +270,7 @@
%description system
@@ -364,6 +392,7 @@
@@ -369,6 +397,7 @@
%package test
Summary: Run-Time component of boost test library
Group: System Environment/Libraries
@@ -278,7 +278,7 @@
%description test
@@ -373,7 +402,8 @@
@@ -378,7 +407,8 @@
%package thread
Summary: Run-Time component of boost thread library
Group: System Environment/Libraries
@@ -288,7 +288,7 @@
%description thread
@@ -385,8 +415,9 @@
@@ -390,8 +420,9 @@
%package timer
Summary: Run-Time component of boost timer library
Group: System Environment/Libraries
@@ -300,7 +300,7 @@
%description timer
@@ -397,11 +428,12 @@
@@ -402,11 +433,12 @@
%package wave
Summary: Run-Time component of boost C99/C++ pre-processing library
Group: System Environment/Libraries
@@ -318,7 +318,7 @@
%description wave
@@ -412,27 +444,20 @@
@@ -417,27 +449,20 @@
%package devel
Summary: The Boost C++ headers and shared development libraries
Group: Development/Libraries
@@ -352,7 +352,7 @@
%description static
Static Boost C++ libraries.
@@ -443,11 +468,7 @@
@@ -448,11 +473,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -365,7 +365,7 @@
%description doc
This package contains the documentation in the HTML format of the Boost C++
@@ -460,7 +481,7 @@
@@ -465,7 +486,7 @@
%if 0%{?rhel} >= 6
BuildArch: noarch
%endif
@@ -374,19 +374,18 @@
%description examples
This package contains example source files distributed with boost.
@@ -471,9 +492,10 @@
@@ -476,8 +497,9 @@
%package openmpi
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: openmpi%{?_isa}
BuildRequires: openmpi-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
+Requires: scylla-boost-serialization%{?_isa} = %{version}-%{release}
%description openmpi
@@ -483,10 +505,11 @@
@@ -487,10 +509,11 @@
%package openmpi-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -402,7 +401,7 @@
%description openmpi-devel
@@ -496,9 +519,10 @@
@@ -500,9 +523,10 @@
%package openmpi-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -416,7 +415,7 @@
%description openmpi-python
@@ -508,8 +532,9 @@
@@ -512,8 +536,9 @@
%package graph-openmpi
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -428,12 +427,11 @@
%description graph-openmpi
@@ -526,11 +551,11 @@
@@ -530,10 +555,10 @@
%package mpich
Summary: Run-Time component of Boost.MPI library
Group: System Environment/Libraries
+Requires: scylla-env
Requires: mpich%{?_isa}
BuildRequires: mpich-devel
-Requires: boost-serialization%{?_isa} = %{version}-%{release}
-Provides: boost-mpich2 = %{version}-%{release}
@@ -443,7 +441,7 @@
%description mpich
@@ -540,12 +565,12 @@
@@ -543,12 +568,12 @@
%package mpich-devel
Summary: Shared library symbolic links for Boost.MPI
Group: System Environment/Libraries
@@ -462,7 +460,7 @@
%description mpich-devel
@@ -555,11 +580,11 @@
@@ -558,11 +583,11 @@
%package mpich-python
Summary: Python run-time component of Boost.MPI library
Group: System Environment/Libraries
@@ -479,7 +477,7 @@
%description mpich-python
@@ -569,10 +594,10 @@
@@ -572,10 +597,10 @@
%package graph-mpich
Summary: Run-Time component of parallel boost graph library
Group: System Environment/Libraries
@@ -494,7 +492,7 @@
%description graph-mpich
@@ -586,7 +611,8 @@
@@ -589,7 +614,8 @@
%package build
Summary: Cross platform build system for C++ projects
Group: Development/Tools
@@ -504,7 +502,7 @@
BuildArch: noarch
%description build
@@ -600,6 +626,7 @@
@@ -613,6 +639,7 @@
%package jam
Summary: A low-level build tool
Group: Development/Tools
@@ -512,7 +510,7 @@
%description jam
Boost.Jam (BJam) is the low-level build engine tool for Boost.Build.
@@ -1134,7 +1161,7 @@
@@ -1186,7 +1213,7 @@
%files devel
%defattr(-, root, root, -)
%doc LICENSE_1_0.txt


@@ -12,33 +12,40 @@ sudo yum install -y wget yum-utils rpm-build rpmdevtools gcc gcc-c++ make patch
mkdir -p build/srpms
cd build/srpms
if [ ! -f binutils-2.25-5.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/binutils-2.25-5.fc22.src.rpm
if [ ! -f binutils-2.25-15.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/binutils/2.25/15.fc23/src/binutils-2.25-15.fc23.src.rpm
fi
if [ ! -f isl-0.14-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/i/isl-0.14-3.fc22.src.rpm
if [ ! -f isl-0.14-4.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/isl/0.14/4.fc23/src/isl-0.14-4.fc23.src.rpm
fi
if [ ! -f gcc-5.1.1-4.fc22.src.rpm ]; then
wget https://s3.amazonaws.com/scylla-centos-dep/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f gcc-5.3.1-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gcc/5.3.1/2.fc23/src/gcc-5.3.1-2.fc23.src.rpm
fi
if [ ! -f boost-1.57.0-6.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/b/boost-1.57.0-6.fc22.src.rpm
if [ ! -f boost-1.58.0-11.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/boost/1.58.0/11.fc23/src/boost-1.58.0-11.fc23.src.rpm
fi
if [ ! -f ninja-build-1.5.3-2.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/n/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f ninja-build-1.6.0-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ninja-build/1.6.0/2.fc23/src/ninja-build-1.6.0-2.fc23.src.rpm
fi
if [ ! -f ragel-6.8-3.fc22.src.rpm ]; then
wget http://download.fedoraproject.org/pub/fedora/linux/releases/22/Everything/source/SRPMS/r/ragel-6.8-3.fc22.src.rpm
if [ ! -f ragel-6.8-5.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/ragel/6.8/5.fc23/src/ragel-6.8-5.fc23.src.rpm
fi
if [ ! -f gdb-7.10.1-30.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/gdb/7.10.1/30.fc23/src/gdb-7.10.1-30.fc23.src.rpm
fi
if [ ! -f pyparsing-2.0.3-2.fc23.src.rpm ]; then
wget https://kojipkgs.fedoraproject.org//packages/pyparsing/2.0.3/2.fc23/src/pyparsing-2.0.3-2.fc23.src.rpm
fi
cd -
sudo yum install -y epel-release
sudo yum install -y cryptopp cryptopp-devel jsoncpp jsoncpp-devel lz4 lz4-devel yaml-cpp yaml-cpp-devel thrift thrift-devel scons gtest gtest-devel python34
sudo ln -sf /usr/bin/python3.4 /usr/bin/python3
@@ -47,6 +54,8 @@ sudo yum install -y flex bison dejagnu zlib-static glibc-static sharutils bc lib
sudo yum install -y gcc-objc
sudo yum install -y asciidoc
sudo yum install -y gettext
sudo yum install -y rpm-devel python34-devel guile-devel readline-devel ncurses-devel expat-devel texlive-collection-latexrecommended xz-devel libselinux-devel
sudo yum install -y dos2unix
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
cd dist/redhat/centos_dep
@@ -56,48 +65,62 @@ if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
fi
do_install scylla-env-1.0-1.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-5.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-binutils-2.25-15.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/binutils-2.25-15.fc23.src.rpm
patch $RPMBUILD/SPECS/binutils.spec < dist/redhat/centos_dep/binutils.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/binutils.spec
fi
do_install scylla-binutils-2.25-5.el7.centos.x86_64.rpm
do_install scylla-binutils-2.25-15.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-isl-0.14-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/isl-0.14-4.fc23.src.rpm
patch $RPMBUILD/SPECS/isl.spec < dist/redhat/centos_dep/isl.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/isl.spec
fi
do_install scylla-isl-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-3.el7.centos.x86_64.rpm
do_install scylla-isl-0.14-4.el7.centos.x86_64.rpm
do_install scylla-isl-devel-0.14-4.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.1.1-4.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.1.1-4.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gcc-5.3.1-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gcc-5.3.1-2.fc23.src.rpm
patch $RPMBUILD/SPECS/gcc.spec < dist/redhat/centos_dep/gcc.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gcc.spec
fi
do_install scylla-*5.1.1-4*
do_install scylla-*5.3.1-2*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.57.0-6.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.57.0-6.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-boost-1.58.0-11.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/boost-1.58.0-11.fc23.src.rpm
patch $RPMBUILD/SPECS/boost.spec < dist/redhat/centos_dep/boost.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/boost.spec
fi
do_install scylla-boost*
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.5.3-2.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ninja-build-1.6.0-2.fc23.src.rpm
patch $RPMBUILD/SPECS/ninja-build.spec < dist/redhat/centos_dep/ninja-build.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ninja-build.spec
fi
do_install scylla-ninja-build-1.5.3-2.el7.centos.x86_64.rpm
do_install scylla-ninja-build-1.6.0-2.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-3.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-3.fc22.src.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-ragel-6.8-5.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/ragel-6.8-5.fc23.src.rpm
patch $RPMBUILD/SPECS/ragel.spec < dist/redhat/centos_dep/ragel.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/ragel.spec
fi
do_install scylla-ragel-6.8-3.el7.centos.x86_64.rpm
do_install scylla-ragel-6.8-5.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/x86_64/scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/gdb-7.10.1-30.fc23.src.rpm
patch $RPMBUILD/SPECS/gdb.spec < dist/redhat/centos_dep/gdb.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/gdb.spec
fi
do_install scylla-gdb-7.10.1-30.el7.centos.x86_64.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm ]; then
rpm --define "_topdir $RPMBUILD" -ivh build/srpms/pyparsing-2.0.3-2.fc23.src.rpm
patch $RPMBUILD/SPECS/pyparsing.spec < dist/redhat/centos_dep/pyparsing.diff
rpmbuild --define "_topdir $RPMBUILD" -ba $RPMBUILD/SPECS/pyparsing.spec
fi
do_install python34-pyparsing-2.0.3-2.el7.centos.noarch.rpm
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-antlr3-tool-3.5.2-1.el7.centos.noarch.rpm ]; then
mkdir build/scylla-antlr3-tool-3.5.2


@@ -1,30 +1,14 @@
--- gcc.spec 2015-10-19 06:31:44.889189647 +0000
+++ gcc.spec.1 2015-10-19 07:56:17.445991665 +0000
@@ -1,22 +1,15 @@
%global DATE 20150618
%global SVNREV 224595
%global gcc_version 5.1.1
--- gcc.spec.orig 2015-12-08 16:03:46.000000000 +0000
+++ gcc.spec 2016-01-21 08:47:49.160667342 +0000
@@ -1,6 +1,7 @@
%global DATE 20151207
%global SVNREV 231358
%global gcc_version 5.3.1
+%define _prefix /opt/scylladb
# Note, gcc_release must be integer, if you want to add suffixes to
# %{release}, append them after %{gcc_release} on Release: line.
%global gcc_release 4
%global _unpackaged_files_terminate_build 0
%global _performance_build 1
%global multilib_64_archs sparc64 ppc64 ppc64p7 s390x x86_64
-%ifarch %{ix86} x86_64 ia64 ppc ppc64 ppc64p7 alpha %{arm} aarch64
-%global build_ada 1
-%else
%global build_ada 0
-%endif
-%ifarch %{ix86} x86_64 ppc ppc64 ppc64le ppc64p7 s390 s390x %{arm} aarch64
-%global build_go 1
-%else
%global build_go 0
-%endif
%ifarch %{ix86} x86_64 ia64
%global build_libquadmath 1
%else
@@ -82,7 +75,8 @@
%global gcc_release 2
@@ -84,7 +85,8 @@
%global multilib_32_arch i686
%endif
Summary: Various compilers (C, C++, Objective-C, Java, ...)
@@ -34,7 +18,7 @@
Version: %{gcc_version}
Release: %{gcc_release}%{?dist}
# libgcc, libgfortran, libgomp, libstdc++ and crtstuff have
@@ -97,6 +91,7 @@
@@ -99,6 +101,7 @@
%global isl_version 0.14
URL: http://gcc.gnu.org
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
@@ -42,7 +26,7 @@
# Need binutils with -pie support >= 2.14.90.0.4-4
# Need binutils which can omit dot symbols and overlap .opd on ppc64 >= 2.15.91.0.2-4
# Need binutils which handle -msecure-plt on ppc >= 2.16.91.0.2-2
@@ -108,7 +103,7 @@
@@ -110,7 +113,7 @@
# Need binutils which support .cfi_sections >= 2.19.51.0.14-33
# Need binutils which support --no-add-needed >= 2.20.51.0.2-12
# Need binutils which support -plugin
@@ -51,7 +35,7 @@
# While gcc doesn't include statically linked binaries, during testing
# -static is used several times.
BuildRequires: glibc-static
@@ -143,15 +138,15 @@
@@ -145,15 +148,15 @@
BuildRequires: libunwind >= 0.98
%endif
%if %{build_isl}
@@ -71,7 +55,7 @@
# Need .eh_frame ld optimizations
# Need proper visibility support
# Need -pie support
@@ -166,7 +161,7 @@
@@ -168,7 +171,7 @@
# Need binutils that support .cfi_sections
# Need binutils that support --no-add-needed
# Need binutils that support -plugin
@@ -80,7 +64,7 @@
# Make sure gdb will understand DW_FORM_strp
Conflicts: gdb < 5.1-2
Requires: glibc-devel >= 2.2.90-12
@@ -174,17 +169,15 @@
@@ -176,17 +179,15 @@
# Make sure glibc supports TFmode long double
Requires: glibc >= 2.3.90-35
%endif
@@ -102,7 +86,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
AutoReq: true
@@ -226,12 +219,12 @@
@@ -228,12 +229,12 @@
The gcc package contains the GNU Compiler Collection version 5.
You'll need this package in order to compile C code.
@@ -117,7 +101,7 @@
%endif
Obsoletes: libmudflap
Obsoletes: libmudflap-devel
@@ -239,17 +232,19 @@
@@ -241,17 +242,19 @@
Obsoletes: libgcj < %{version}-%{release}
Obsoletes: libgcj-devel < %{version}-%{release}
Obsoletes: libgcj-src < %{version}-%{release}
@@ -141,7 +125,7 @@
Autoreq: true
%description c++
@@ -257,50 +252,55 @@
@@ -259,50 +262,55 @@
It includes support for most of the current C++ specification,
including templates and exception handling.
@@ -209,7 +193,7 @@
Autoreq: true
%description objc
@@ -311,29 +311,32 @@
@@ -313,29 +321,32 @@
%package objc++
Summary: Objective-C++ support for GCC
Group: Development/Languages
@@ -249,7 +233,7 @@
%endif
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
@@ -343,260 +346,286 @@
@@ -345,260 +356,286 @@
The gcc-gfortran package provides support for compiling Fortran
programs with the GNU Compiler Collection.
@@ -608,7 +592,7 @@
Cpp is the GNU C-Compatible Compiler Preprocessor.
Cpp is a macro processor which is used automatically
by the C compiler to transform your program before actual
@@ -621,8 +650,9 @@
@@ -623,8 +660,9 @@
%package gnat
Summary: Ada 83, 95, 2005 and 2012 support for GCC
Group: Development/Languages
@@ -620,7 +604,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Autoreq: true
@@ -631,40 +661,44 @@
@@ -633,82 +671,90 @@
GNAT is a GNU Ada 83, 95, 2005 and 2012 front-end to GCC. This package includes
development tools, the documents and Ada compiler.
@@ -674,8 +658,13 @@
+Requires: scylla-libgo-devel = %{version}-%{release}
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Requires(post): %{_sbindir}/update-alternatives
@@ -675,38 +709,42 @@
-Requires(post): %{_sbindir}/update-alternatives
-Requires(postun): %{_sbindir}/update-alternatives
+Requires(post): /sbin/update-alternatives
+Requires(postun): /sbin/update-alternatives
Autoreq: true
%description go
The gcc-go package provides support for compiling Go programs
with the GNU Compiler Collection.
@@ -728,7 +717,7 @@
Requires: gmp-devel >= 4.1.2-8, mpfr-devel >= 2.2.1, libmpc-devel >= 0.8.1
%description plugin-devel
@@ -726,7 +764,8 @@
@@ -728,7 +774,8 @@
Summary: Debug information for package %{name}
Group: Development/Debug
AutoReqProv: 0
@@ -738,21 +727,21 @@
%description debuginfo
This package provides debug information for package %{name}.
@@ -961,11 +1000,10 @@
@@ -958,11 +1005,11 @@
--enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu \
--enable-plugin --enable-initfini-array \
--disable-libgcj \
-%if 0%{fedora} >= 21 && 0%{fedora} <= 22
--with-default-libstdcxx-abi=c++98 \
--with-default-libstdcxx-abi=gcc4-compatible \
-%endif
%if %{build_isl}
- --with-isl \
--with-isl \
+ --with-isl-include=/opt/scylladb/include/ \
+ --with-isl-lib=/opt/scylladb/lib64/ \
%else
--without-isl \
%endif
@@ -974,11 +1012,9 @@
@@ -971,11 +1018,9 @@
%else
--disable-libmpx \
%endif
@@ -764,7 +753,7 @@
%ifarch %{arm}
--disable-sjlj-exceptions \
%endif
@@ -1009,9 +1045,6 @@
@@ -1006,9 +1051,6 @@
%if 0%{?rhel} >= 7
--with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 \
%endif
@@ -774,7 +763,7 @@
%endif
%ifarch ppc
--build=%{gcc_target_platform} --target=%{gcc_target_platform} --with-cpu=default32
@@ -1273,16 +1306,15 @@
@@ -1270,16 +1312,15 @@
mv %{buildroot}%{_prefix}/%{_lib}/libmpx.spec $FULLPATH/
%endif
@@ -797,7 +786,7 @@
%endif
%ifarch ppc
rm -f $FULLPATH/libgcc_s.so
@@ -1816,7 +1848,7 @@
@@ -1819,7 +1860,7 @@
chmod 755 %{buildroot}%{_prefix}/bin/c?9
cd ..
@@ -806,7 +795,7 @@
%find_lang cpplib
# Remove binaries we will not be including, so that they don't end up in
@@ -1866,11 +1898,7 @@
@@ -1869,11 +1910,7 @@
# run the tests.
make %{?_smp_mflags} -k check ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ \
@@ -818,7 +807,7 @@
echo ====================TESTING=========================
( LC_ALL=C ../contrib/test_summary || : ) 2>&1 | sed -n '/^cat.*EOF/,/^EOF/{/^cat.*EOF/d;/^EOF/d;/^LAST_UPDATED:/d;p;}'
echo ====================TESTING END=====================
@@ -1897,13 +1925,13 @@
@@ -1900,13 +1937,13 @@
--info-dir=%{_infodir} %{_infodir}/gcc.info.gz || :
fi
@@ -834,7 +823,21 @@
if [ $1 = 0 -a -f %{_infodir}/cpp.info.gz ]; then
/sbin/install-info --delete \
--info-dir=%{_infodir} %{_infodir}/cpp.info.gz || :
@@ -1954,7 +1982,7 @@
@@ -1945,19 +1982,19 @@
fi
%post go
-%{_sbindir}/update-alternatives --install \
+/sbin/update-alternatives --install \
%{_prefix}/bin/go go %{_prefix}/bin/go.gcc 92 \
--slave %{_prefix}/bin/gofmt gofmt %{_prefix}/bin/gofmt.gcc
%preun go
if [ $1 = 0 ]; then
- %{_sbindir}/update-alternatives --remove go %{_prefix}/bin/go.gcc
+ /sbin/update-alternatives --remove go %{_prefix}/bin/go.gcc
fi
# Because glibc Prereq's libgcc and /sbin/ldconfig
# comes from glibc, it might not exist yet when
# libgcc is installed
@@ -843,7 +846,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1964,7 +1992,7 @@
@@ -1967,7 +2004,7 @@
end
end
@@ -852,7 +855,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1974,120 +2002,120 @@
@@ -1977,120 +2014,120 @@
end
end
@@ -1011,7 +1014,7 @@
%defattr(-,root,root,-)
%{_prefix}/bin/cc
%{_prefix}/bin/c89
@@ -2409,7 +2437,7 @@
@@ -2414,7 +2451,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -1020,7 +1023,7 @@
%defattr(-,root,root,-)
%{_prefix}/lib/cpp
%{_prefix}/bin/cpp
@@ -2420,10 +2448,10 @@
@@ -2425,10 +2462,10 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1
@@ -1034,7 +1037,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -2461,7 +2489,7 @@
@@ -2469,7 +2506,7 @@
%endif
%doc rpm.doc/changelogs/gcc/cp/ChangeLog*
@@ -1043,7 +1046,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libstdc++.so.6*
%dir %{_datadir}/gdb
@@ -2473,7 +2501,7 @@
@@ -2481,7 +2518,7 @@
%dir %{_prefix}/share/gcc-%{gcc_version}/python
%{_prefix}/share/gcc-%{gcc_version}/python/libstdcxx
@@ -1052,7 +1055,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/include/c++
%dir %{_prefix}/include/c++/%{gcc_version}
@@ -2488,7 +2516,7 @@
@@ -2507,7 +2544,7 @@
%endif
%doc rpm.doc/changelogs/libstdc++-v3/ChangeLog* libstdc++-v3/README*
@@ -1061,7 +1064,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2509,7 +2537,7 @@
@@ -2528,7 +2565,7 @@
%endif
%if %{build_libstdcxx_docs}
@@ -1070,7 +1073,7 @@
%defattr(-,root,root)
%{_mandir}/man3/*
%doc rpm.doc/libstdc++-v3/html
@@ -2548,7 +2576,7 @@
@@ -2567,7 +2604,7 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1objplus
@@ -1079,7 +1082,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libobjc.so.4*
@@ -2602,11 +2630,11 @@
@@ -2621,11 +2658,11 @@
%endif
%doc rpm.doc/gfortran/*
@@ -1093,7 +1096,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2652,12 +2680,12 @@
@@ -2671,12 +2708,12 @@
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/gnat1
%doc rpm.doc/changelogs/gcc/ada/ChangeLog*
@@ -1108,7 +1111,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2683,7 +2711,7 @@
@@ -2702,7 +2739,7 @@
%exclude %{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/adalib/libgnarl.a
%endif
@@ -1117,7 +1120,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2707,7 +2735,7 @@
@@ -2726,7 +2763,7 @@
%endif
%endif
@@ -1126,7 +1129,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libgomp.so.1*
%{_prefix}/%{_lib}/libgomp-plugin-host_nonshm.so.1*
@@ -2715,14 +2743,14 @@
@@ -2734,14 +2771,14 @@
%doc rpm.doc/changelogs/libgomp/ChangeLog*
%if %{build_libquadmath}
@@ -1143,7 +1146,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2735,7 +2763,7 @@
@@ -2754,7 +2791,7 @@
%endif
%doc rpm.doc/libquadmath/ChangeLog*
@@ -1152,7 +1155,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2754,12 +2782,12 @@
@@ -2773,12 +2810,12 @@
%endif
%if %{build_libitm}
@@ -1167,7 +1170,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2772,7 +2800,7 @@
@@ -2791,7 +2828,7 @@
%endif
%doc rpm.doc/libitm/ChangeLog*
@@ -1176,7 +1179,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2791,11 +2819,11 @@
@@ -2810,11 +2847,11 @@
%endif
%if %{build_libatomic}
@@ -1190,7 +1193,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2815,11 +2843,11 @@
@@ -2834,11 +2871,11 @@
%endif
%if %{build_libasan}
@@ -1204,7 +1207,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2841,11 +2869,11 @@
@@ -2860,11 +2897,11 @@
%endif
%if %{build_libubsan}
@@ -1218,7 +1221,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2867,11 +2895,11 @@
@@ -2886,11 +2923,11 @@
%endif
%if %{build_libtsan}
@@ -1232,7 +1235,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2883,11 +2911,11 @@
@@ -2902,11 +2939,11 @@
%endif
%if %{build_liblsan}
@@ -1246,7 +1249,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2899,11 +2927,11 @@
@@ -2918,11 +2955,11 @@
%endif
%if %{build_libcilkrts}
@@ -1260,7 +1263,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2923,12 +2951,12 @@
@@ -2942,12 +2979,12 @@
%endif
%if %{build_libmpx}
@@ -1275,7 +1278,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2990,12 +3018,12 @@
@@ -3009,12 +3046,12 @@
%endif
%doc rpm.doc/go/*
@@ -1290,7 +1293,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3023,7 +3051,7 @@
@@ -3042,7 +3079,7 @@
%{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/libgo.so
%endif
@@ -1299,7 +1302,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3041,12 +3069,12 @@
@@ -3060,12 +3097,12 @@
%endif
%endif

dist/redhat/centos_dep/gdb.diff

@@ -0,0 +1,29 @@
--- gdb.spec.orig 2015-12-06 04:10:30.000000000 +0000
+++ gdb.spec 2016-01-20 14:49:12.745843903 +0000
@@ -16,7 +16,10 @@
}
Summary: A GNU source-level debugger for C, C++, Fortran, Go and other languages
-Name: %{?scl_prefix}gdb
+Name: %{?scl_prefix}scylla-gdb
+%define orig_name gdb
+Requires: scylla-env
+%define _prefix /opt/scylladb
# Freeze it when GDB gets branched
%global snapsrc 20150706
@@ -572,12 +575,8 @@
BuildRequires: rpm-devel%{buildisa}
BuildRequires: zlib-devel%{buildisa} libselinux-devel%{buildisa}
%if 0%{!?_without_python:1}
-%if 0%{?rhel:1} && 0%{?rhel} <= 7
-BuildRequires: python-devel%{buildisa}
-%else
-%global __python %{__python3}
-BuildRequires: python3-devel%{buildisa}
-%endif
+BuildRequires: python34-devel%{?_isa}
+%global __python /usr/bin/python3.4
%if 0%{?rhel:1} && 0%{?rhel} <= 7
# Temporarily before python files get moved to libstdc++.rpm
# libstdc++%{bits_other} is not present in Koji, the .spec script generating


@@ -1,5 +1,5 @@
--- isl.spec 2015-01-06 16:24:49.000000000 +0000
+++ isl.spec.1 2015-10-18 12:12:38.000000000 +0000
--- isl.spec.orig 2016-01-20 14:41:16.891802146 +0000
+++ isl.spec 2016-01-20 14:43:13.838336396 +0000
@@ -1,5 +1,5 @@
Summary: Integer point manipulation library
-Name: isl


@@ -1,34 +1,56 @@
1c1
< Name: ninja-build
---
> Name: scylla-ninja-build
8d7
< Source1: ninja.vim
10a10
> Requires: scylla-env
14,16c14,15
< BuildRequires: re2c >= 0.11.3
< Requires: emacs-filesystem
< Requires: vim-filesystem
---
> #BuildRequires: scylla-re2c >= 0.11.3
> %define _prefix /opt/scylladb
35,37c34
< # TODO: Install ninja_syntax.py?
< mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
<
---
> mkdir -p %{buildroot}/opt/scylladb/bin
39,43d35
< install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
< install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
< install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
< install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
53,58d44
< %{_datadir}/bash-completion/completions/ninja-bash-completion
< %{_datadir}/emacs/site-lisp/ninja-mode.el
< %{_datadir}/vim/vimfiles/syntax/ninja.vim
< %{_datadir}/vim/vimfiles/ftdetect/ninja.vim
< # zsh does not have a -filesystem package
< %{_datadir}/zsh/
--- ninja-build.spec.orig 2016-01-20 14:41:16.892802134 +0000
+++ ninja-build.spec 2016-01-20 14:44:42.453227192 +0000
@@ -1,19 +1,18 @@
-Name: ninja-build
+Name: scylla-ninja-build
Version: 1.6.0
Release: 2%{?dist}
Summary: A small build system with a focus on speed
License: ASL 2.0
URL: http://martine.github.com/ninja/
Source0: https://github.com/martine/ninja/archive/v%{version}.tar.gz#/ninja-%{version}.tar.gz
-Source1: ninja.vim
# Rename mentions of the executable name to be ninja-build.
Patch1000: ninja-1.6.0-binary-rename.patch
+Requires: scylla-env
BuildRequires: asciidoc
BuildRequires: gtest-devel
BuildRequires: python2-devel
-BuildRequires: re2c >= 0.11.3
-Requires: emacs-filesystem
-Requires: vim-filesystem
+#BuildRequires: scylla-re2c >= 0.11.3
+%define _prefix /opt/scylladb
%description
Ninja is a small build system with a focus on speed. It differs from other
@@ -32,15 +31,8 @@
./ninja -v ninja_test
%install
-# TODO: Install ninja_syntax.py?
-mkdir -p %{buildroot}/{%{_bindir},%{_datadir}/bash-completion/completions,%{_datadir}/emacs/site-lisp,%{_datadir}/vim/vimfiles/syntax,%{_datadir}/vim/vimfiles/ftdetect,%{_datadir}/zsh/site-functions}
-
+mkdir -p %{buildroot}/opt/scylladb/bin
install -pm755 ninja %{buildroot}%{_bindir}/ninja-build
-install -pm644 misc/bash-completion %{buildroot}%{_datadir}/bash-completion/completions/ninja-bash-completion
-install -pm644 misc/ninja-mode.el %{buildroot}%{_datadir}/emacs/site-lisp/ninja-mode.el
-install -pm644 misc/ninja.vim %{buildroot}%{_datadir}/vim/vimfiles/syntax/ninja.vim
-install -pm644 %{SOURCE1} %{buildroot}%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-install -pm644 misc/zsh-completion %{buildroot}%{_datadir}/zsh/site-functions/_ninja
%check
# workaround possible too low default limits
@@ -50,12 +42,6 @@
%files
%doc COPYING HACKING.md README doc/manual.html
%{_bindir}/ninja-build
-%{_datadir}/bash-completion/completions/ninja-bash-completion
-%{_datadir}/emacs/site-lisp/ninja-mode.el
-%{_datadir}/vim/vimfiles/syntax/ninja.vim
-%{_datadir}/vim/vimfiles/ftdetect/ninja.vim
-# zsh does not have a -filesystem package
-%{_datadir}/zsh/
%changelog
* Mon Nov 16 2015 Ben Boeckel <mathstuf@gmail.com> - 1.6.0-2

dist/redhat/centos_dep/pyparsing.diff

@@ -0,0 +1,40 @@
--- pyparsing.spec.orig 2016-01-25 19:11:14.663651658 +0900
+++ pyparsing.spec 2016-01-25 19:12:49.853875369 +0900
@@ -1,4 +1,4 @@
-%if 0%{?fedora}
+%if 0%{?centos}
%global with_python3 1
%endif
@@ -15,7 +15,7 @@
BuildRequires: dos2unix
BuildRequires: glibc-common
%if 0%{?with_python3}
-BuildRequires: python3-devel
+BuildRequires: python34-devel
%endif # if with_python3
%description
@@ -30,11 +30,11 @@
The package contains documentation for pyparsing.
%if 0%{?with_python3}
-%package -n python3-pyparsing
+%package -n python34-pyparsing
Summary: An object-oriented approach to text processing (Python 3 version)
Group: Development/Libraries
-%description -n python3-pyparsing
+%description -n python34-pyparsing
pyparsing is a module that can be used to easily and directly configure syntax
definitions for any number of text parsing applications.
@@ -90,7 +90,7 @@
%{python_sitelib}/pyparsing.py*
%if 0%{?with_python3}
-%files -n python3-pyparsing
+%files -n python34-pyparsing
%doc CHANGES README LICENSE
%{python3_sitelib}/pyparsing*egg-info
%{python3_sitelib}/pyparsing.py*


@@ -1,11 +1,11 @@
--- ragel.spec 2014-08-18 11:55:49.000000000 +0000
+++ ragel.spec.1 2015-10-18 12:18:23.000000000 +0000
--- ragel.spec.orig 2015-06-18 22:12:28.000000000 +0000
+++ ragel.spec 2016-01-20 14:49:53.980327766 +0000
@@ -1,17 +1,20 @@
-Name: ragel
+Name: scylla-ragel
+%define orig_name ragel
Version: 6.8
Release: 3%{?dist}
Release: 5%{?dist}
Summary: Finite state machine compiler
Group: Development/Tools


@@ -1,14 +0,0 @@
#!/bin/sh -e
args="--log-to-syslog 1 --log-to-stdout 0 --default-log-level info $SCYLLA_ARGS"
if [ "$NETWORK_MODE" = "posix" ]; then
args="$args --network-stack posix"
elif [ "$NETWORK_MODE" = "virtio" ]; then
args="$args --network-stack native"
elif [ "$NETWORK_MODE" = "dpdk" ]; then
args="$args --network-stack native --dpdk-pmd"
fi
export HOME=/var/lib/scylla
exec sudo -E -u $USER /usr/bin/scylla $args


@@ -8,10 +8,11 @@ License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
BuildRequires: libaio-devel boost-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel
%{?fedora:BuildRequires: ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan}
%{?rhel:BuildRequires: scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1}
Requires: systemd-libs xfsprogs mdadm hwloc
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: systemd-libs hwloc
Conflicts: abrt
%description
@@ -28,30 +29,29 @@ Requires: systemd-libs xfsprogs mdadm hwloc
./configure.py --with scylla --disable-xen --enable-dpdk --mode=release
%endif
%if 0%{?rhel}
./configure.py --with scylla --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=/opt/scylladb/bin/g++
python3.4 ./configure.py --with scylla --disable-xen --enable-dpdk --mode=release --static-stdc++ --compiler=/opt/scylladb/bin/g++ --python python3.4
%endif
ninja-build -j2
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{_bindir}
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
mkdir -p $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m644 dist/common/sysctl.d/99-scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sysctl.d/
install -m644 dist/common/sysconfig/scylla-server $RPM_BUILD_ROOT%{_sysconfdir}/sysconfig/
install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/security/limits.d/
install -m644 dist/common/sudoers.d/scylla $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 dist/redhat/systemd/scylla-server.service $RPM_BUILD_ROOT%{_unitdir}/
install -m755 dist/common/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 dist/redhat/scripts/* $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/scripts/posix_net_conf.sh $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 seastar/dpdk/tools/dpdk_nic_bind.py $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
install -m755 build/release/scylla $RPM_BUILD_ROOT%{_bindir}
@@ -128,7 +128,7 @@ rm -rf $RPM_BUILD_ROOT
%config(noreplace) %{_sysconfdir}/sysconfig/scylla-server
%{_sysconfdir}/security/limits.d/scylla.conf
%{_sysconfdir}/sysctl.d/99-scylla.conf
%{_sysconfdir}/sudoers.d/scylla
%attr(0755,root,root) %dir %{_sysconfdir}/scylla
%config(noreplace) %{_sysconfdir}/scylla/scylla.yaml
%config(noreplace) %{_sysconfdir}/scylla/cassandra-rackdc.properties
@@ -140,9 +140,8 @@ rm -rf $RPM_BUILD_ROOT
%{_unitdir}/scylla-server.service
%{_bindir}/scylla
%{_prefix}/lib/scylla/scylla_prepare
%{_prefix}/lib/scylla/scylla_run
%{_prefix}/lib/scylla/scylla_stop
%{_prefix}/lib/scylla/scylla_save_coredump
%{_prefix}/lib/scylla/scylla_setup
%{_prefix}/lib/scylla/scylla_coredump_setup
%{_prefix}/lib/scylla/scylla_raid_setup
%{_prefix}/lib/scylla/scylla_sysconfig_setup


@@ -1,20 +1,23 @@
[Unit]
Description=Scylla Server
After=network.target libvirtd.service
After=network.target
[Service]
Type=simple
Type=notify
LimitMEMLOCK=infinity
LimitNOFILE=200000
LimitAS=infinity
LimitNPROC=8096
WorkingDirectory=/var/lib/scylla
Environment="HOME=/var/lib/scylla"
EnvironmentFile=/etc/sysconfig/scylla-server
ExecStartPre=/usr/lib/scylla/scylla_prepare
ExecStart=/usr/lib/scylla/scylla_run
ExecStopPost=/usr/lib/scylla/scylla_stop
ExecStartPre=/usr/bin/sudo -E /usr/lib/scylla/scylla_prepare
ExecStart=/usr/bin/scylla $SCYLLA_ARGS
ExecStopPost=/usr/bin/sudo -E /usr/lib/scylla/scylla_stop
TimeoutStartSec=900
KillMode=process
Restart=no
User=scylla
[Install]
WantedBy=multi-user.target
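
The unit file above switches from Type=simple to Type=notify and runs /usr/bin/scylla directly, so the daemon itself must tell systemd when startup has completed. A minimal sketch of that handshake, assuming libsystemd's sd-daemon.h is available (illustrative only, not Scylla's actual startup code; link with whatever pkg-config reports for libsystemd, or libsystemd-daemon on older distributions):

#include <systemd/sd-daemon.h>
#include <cstdio>

int main() {
    // ... perform initialization: bind sockets, load configuration, etc. ...

    // Report readiness to systemd. With Type=notify, systemd does not
    // consider the service started (and TimeoutStartSec keeps ticking)
    // until READY=1 arrives on the notification socket.
    // sd_notify() returns > 0 on success, 0 if NOTIFY_SOCKET is unset
    // (i.e. not running under systemd), and < 0 on error.
    if (sd_notify(0, "READY=1") <= 0)
        std::fprintf(stderr, "sd_notify failed or NOTIFY_SOCKET not set\n");

    // ... enter main event loop ...
    return 0;
}

With Type=simple, systemd treats the service as started the moment the process is forked; Type=notify defers that until the daemon signals readiness, which is why the unit also sets TimeoutStartSec=900 to allow for a long initialization.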


@@ -9,6 +9,19 @@ if [ -e debian ] || [ -e build/release ]; then
rm -rf debian build
mkdir build
fi
sudo apt-get -y update
if [ ! -f /usr/bin/git ]; then
sudo apt-get -y install git
fi
if [ ! -f /usr/bin/mk-build-deps ]; then
sudo apt-get -y install devscripts
fi
if [ ! -f /usr/bin/equivs-build ]; then
sudo apt-get -y install equivs
fi
if [ ! -f /usr/bin/add-apt-repository ]; then
sudo apt-get -y install software-properties-common
fi
RELEASE=`lsb_release -r|awk '{print $2}'`
CODENAME=`lsb_release -c|awk '{print $2}'`
@@ -21,9 +34,6 @@ fi
VERSION=$(./SCYLLA-VERSION-GEN)
SCYLLA_VERSION=$(cat build/SCYLLA-VERSION-FILE)
SCYLLA_RELEASE=$(cat build/SCYLLA-RELEASE-FILE)
if [ "$SCYLLA_VERSION" = "development" ]; then
SCYLLA_VERSION=0development
fi
echo $VERSION > version
./scripts/git-archive-all --extra version --force-submodules --prefix scylla-server ../scylla-server_$SCYLLA_VERSION-$SCYLLA_RELEASE.orig.tar.gz
@@ -34,27 +44,13 @@ sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" debian/changelog
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" debian/changelog
sed -i -e "s/@@CODENAME@@/$CODENAME/g" debian/changelog
sudo apt-get -y update
./dist/ubuntu/dep/build_dependency.sh
DEP="libyaml-cpp-dev liblz4-dev libsnappy-dev libcrypto++-dev libjsoncpp-dev libaio-dev ragel ninja-build git liblz4-1 libaio1 hugepages software-properties-common libgnutls28-dev libhwloc-dev libnuma-dev libpciaccess-dev"
if [ "$RELEASE" = "14.04" ]; then
DEP="$DEP libboost1.55-dev libboost-program-options1.55.0 libboost-program-options1.55-dev libboost-system1.55.0 libboost-system1.55-dev libboost-thread1.55.0 libboost-thread1.55-dev libboost-test1.55.0 libboost-test1.55-dev libboost-filesystem1.55-dev libboost-filesystem1.55.0 libsnappy1"
else
DEP="$DEP libboost-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev libboost-test-dev libboost-filesystem-dev libboost-filesystem-dev libsnappy1v5"
fi
if [ "$RELEASE" = "15.10" ]; then
DEP="$DEP libjsoncpp0v5 libcrypto++9v5 libyaml-cpp0.5v5 antlr3"
else
DEP="$DEP libjsoncpp0 libcrypto++9 libyaml-cpp0.5"
fi
sudo apt-get -y install $DEP
if [ "$RELEASE" != "15.10" ]; then
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get -y update
fi
sudo apt-get -y install g++-4.9
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot -us -uc


@@ -4,11 +4,11 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, g++-4.9, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, mdadm, xfsprogs, hwloc-nox
Depends: ${shlibs:Depends}, ${misc:Depends}, hugepages, adduser, hwloc-nox
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.

Some files were not shown because too many files have changed in this diff.