Compare commits


228 Commits

Author SHA1 Message Date
Pekka Enberg
e5edfa98a6 release: prepare for 1.2.6 2016-11-01 12:24:57 +02:00
Pekka Enberg
972a6701b4 cql3: Fix selecting same column multiple times
Under the hood, the selectable::add_and_get_index() function
deliberately filters out duplicate columns. This causes
simple_selector::get_output_row() to return a row with all duplicate
columns filtered out, which triggers an assertion because the row
does not match the metadata (which contains the duplicate columns).

The fix is rather simple: just make selection::from_selectors() use
selection_with_processing if the number of selectors and column
definitions doesn't match -- like Apache Cassandra does.
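A minimal sketch of that rule, assuming selectors and column definitions are modelled as plain column names. add_and_get_index() and choose_selection() below are simplified, hypothetical stand-ins, not Scylla's real signatures:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for selectable::add_and_get_index(): returns the index
// of `name` in `defs`, appending it only if it is not already present, so
// duplicate columns are filtered out.
std::size_t add_and_get_index(std::vector<std::string>& defs, const std::string& name) {
    auto it = std::find(defs.begin(), defs.end(), name);
    if (it != defs.end()) {
        return static_cast<std::size_t>(it - defs.begin());
    }
    defs.push_back(name);
    return defs.size() - 1;
}

enum class selection_kind { simple, with_processing };

// Use a processing selection whenever de-duplication made the number of
// selectors and the number of column definitions diverge.
selection_kind choose_selection(const std::vector<std::string>& selectors) {
    std::vector<std::string> defs;
    for (const auto& s : selectors) {
        add_and_get_index(defs, s);
    }
    return selectors.size() == defs.size() ? selection_kind::simple
                                           : selection_kind::with_processing;
}
```

With this rule, `SELECT a, a` (two selectors, one column definition) takes the processing path, which can emit the duplicate column twice.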

Fixes #1367
Message-Id: <1477989740-6485-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit e1e8ca2788)
2016-11-01 09:35:30 +00:00
Tomasz Grabiec
745ee206bf tests: Add test for UUID type ordering
Message-Id: <1473956716-5209-2-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 2282599394)
2016-09-20 12:11:16 +02:00
Tomasz Grabiec
890988c7e5 types: fix uuid_type_impl::less
timeuuid_type_impl::compare_bytes is a "trichotomic" comparator (-1,
0, 1) while less() is a "less" comparator (false, true). The code
incorrectly returns c1 instead of c1 < 0, which breaks the ordering.
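The difference between the two comparator kinds can be sketched as follows. The function names are hypothetical, and compare_bytes() here is assumed to behave like std::memcmp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Trichotomic comparator: negative, zero or positive (like std::memcmp).
int compare_bytes(const unsigned char* a, const unsigned char* b, std::size_t n) {
    return std::memcmp(a, b, n);
}

// Buggy "less": any non-zero result converts to true, so a > b also
// reports "less", which breaks strict weak ordering.
bool less_buggy(const unsigned char* a, const unsigned char* b, std::size_t n) {
    int c1 = compare_bytes(a, b, n);
    return c1;
}

// Fixed "less": only a negative comparison result means a < b.
bool less_fixed(const unsigned char* a, const unsigned char* b, std::size_t n) {
    int c1 = compare_bytes(a, b, n);
    return c1 < 0;
}
```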

Fixes #1196.
Message-Id: <1473956716-5209-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 804fe50b7f)
2016-09-20 12:11:09 +02:00
Pekka Enberg
906ddc16b2 Merge "Fix regression in cql_query_test" from Tomek 2016-09-19 13:54:08 +03:00
Tomasz Grabiec
be0b7336d5 bound_view: Fix use-after-free involving bottom()/top() bound_views
The key is captured by reference, so we can't pass temporaries to it.
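A sketch of the hazard and the safe alternative, using hypothetical names (key_view, empty_key(), bottom()) rather than bound_view's real API: a view holding a reference must refer to an object that outlives it, for example one with static storage duration.

```cpp
#include <cassert>
#include <string>

// Hypothetical view that stores a reference to its key instead of a copy.
struct key_view {
    const std::string& key;   // not owned: the referent must outlive the view
};

// The empty key has static storage duration, so views into it never dangle.
const std::string& empty_key() {
    static const std::string k;
    return k;
}

key_view bottom() {
    // Unsafe alternative: key_view{std::string{}} would bind `key` to a
    // temporary that does not outlive the returned view.
    return key_view{empty_key()};
}
```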

Introduced in 5dca11087e.

The problem doesn't exist on branch-1.3 and newer.
2016-09-19 11:54:21 +02:00
Tomasz Grabiec
58d92b304b keys: Don't require schema from make_empty()
Backported from 57413618e8
2016-09-19 11:53:43 +02:00
Shlomi Livne
cc8ab6de2e release: prepare for 1.2.5
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
2016-09-16 22:39:35 +03:00
Paweł Dziepak
5dca11087e Merge "Fix abort when querying with contradicting clustering restrictions" from Tomek
"This series fixes #1670 on top of 1.2 branch.

Fixes abort when querying with contradicting clustering column
restrictions, for example:

   SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2"
2016-09-15 15:17:34 +01:00
Tomasz Grabiec
5971f7f4fa Fix abort when querying with contradicting clustering restrictions
Example of affected query:

 SELECT * FROM test WHERE k = 0 AND ck < 1 and ck > 2

Refs #1670.
2016-09-15 13:11:22 +02:00
Tomasz Grabiec
b6d2a73c56 Import bounds_view 2016-09-15 13:11:09 +02:00
Tomasz Grabiec
2db8626dbf database: Ignore spaces in initial_token list
Currently we get a boost::lexical_cast error on startup if initial_token
contains a list with spaces after the commas, e.g.:

  initial_token: -1100081313741479381, -1104041856484663086, ...
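A minimal sketch of whitespace-tolerant parsing. It trims each comma-separated item before converting it, and uses std::stoll rather than boost::lexical_cast; parse_initial_tokens() is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Strip leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) {
        return {};
    }
    auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split on ',' and trim each item, so "a, b" parses the same as "a,b".
std::vector<std::int64_t> parse_initial_tokens(const std::string& list) {
    std::vector<std::int64_t> tokens;
    std::istringstream in(list);
    std::string item;
    while (std::getline(in, item, ',')) {
        auto t = trim(item);
        if (!t.empty()) {
            tokens.push_back(std::stoll(t));
        }
    }
    return tokens;
}
```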

Fixes #1664.
Message-Id: <1473840915-5682-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit a498da1987)
2016-09-14 12:03:41 +03:00
Pekka Enberg
ba5d52c94e release: prepare for 1.2.4 2016-08-25 17:33:56 +03:00
Paweł Dziepak
ffed8a5603 mutation_partition: fix iterator invalidation in trim_rows
Reversed iterators are adaptors for 'normal' iterators. These underlying
iterators point to different objects than the reversed iterators
themselves.

The consequence of this is that removing an element pointed to by a
reversed iterator may invalidate a reversed iterator that points to a
completely different object.

This is what happens in trim_rows for reversed queries. Erasing a row
can invalidate the end iterator, and then the loop fails to stop.

The solution is to introduce
reversal_traits::erase_dispose_and_update_end(), a function which erases
and disposes of the object pointed to by a given iterator, but also takes
a reference to an end iterator and updates it if necessary to make sure
that it stays valid.
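The same effect can be reproduced with std::list. The sketch below is a hypothetical analogue of erase_dispose_and_update_end() for a standard container, not the intrusive-container code itself:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// For a reverse_iterator r, *r is *std::prev(r.base()): r.base() refers to the
// node *after* *r in forward order. Erasing the node that a stored `end`
// reverse iterator's base() refers to therefore invalidates `end`, even
// though *end is a different element.
template <typename T>
typename std::list<T>::reverse_iterator
erase_and_update_end(std::list<T>& l,
                     typename std::list<T>::reverse_iterator it,
                     typename std::list<T>::reverse_iterator& end) {
    auto victim = std::prev(it.base());        // node that *it refers to
    bool end_depends_on_victim = (end.base() == victim);
    auto next = l.erase(victim);               // forward iterator after victim
    if (end_depends_on_victim) {
        // Rebuild `end` at the same logical position from a valid iterator.
        end = typename std::list<T>::reverse_iterator(next);
    }
    // Next element in reverse order (the one before the erased node).
    return typename std::list<T>::reverse_iterator(next);
}
```

In the usage below, erasing element 3 would leave `end` built on a dangling iterator without the update, and the loop would never see `it == end`.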

Fixes #1609.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1472080609-11642-1-git-send-email-pdziepak@scylladb.com>
(cherry picked from commit 6012a7e733)
2016-08-25 17:31:46 +03:00
Piotr Jastrzebski
ec51c8e1b8 Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
(cherry picked from commit f212a6cfcb)

[tgrabiec: resolved trivial conflict]
2016-08-12 16:38:21 +02:00
Avi Kivity
50056a6df6 Update seastar submodule
* seastar 27e13e7...d6ccc19 (1):
  > Merge "Fix the SMP queue poller" from Tomasz

Fixes #1553.
2016-08-10 10:17:06 +03:00
Duarte Nunes
184b62d790 schema_builder: Ensure dense tables have compact col
This patch ensures that when the schema is dense, regardless of
compact_storage being set, the single regular column is translated
into a compact column.

This fixes an issue where Thrift dynamic column families are
translated to a dense schema with a regular column, instead of a
compact one.

Since a compact column is also a regular column (e.g., for purposes of
querying), no further changes are required.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470062410-1414-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5995aebf39)

Fixes #1535.
2016-08-03 13:50:54 +02:00
Duarte Nunes
f5a1f402f5 schema: Dense schemas are correctly upgraded
When upgrading a dense schema, we would drop the cells of the regular
(compact) column. This patch fixes this by making the regular and
compact column kinds compatible.

Fixes #1536

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470172097-7719-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 1516cd4c08)
2016-08-03 13:39:49 +02:00
Avi Kivity
9f09812733 checked_file: preserve DMA alignment
Inherit the alignment parameters from the underlying file instead of
defaulting to 4096.  This gives better read performance on disks with 512-byte
sectors.

Fixes #1532.
Message-Id: <1470122188-25548-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9f35e4d328)
2016-08-02 12:23:21 +03:00
Duarte Nunes
e9b7352adb storage_service: Fix get_range_to_address_map_in_local_dc
This patch fixes a couple of bugs in
get_range_to_address_map_in_local_dc.

Fixes #1517

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 7d1b7e8da3)
2016-07-29 11:24:50 +02:00
Pekka Enberg
2461a85c0f Update seastar submodule
* seastar 3558f41...27e13e7 (2):
  > iotune: Fix SIGFPE with some executions
  > iotune: provide a status dump if we can't calculate a proper number of io_queues
2016-07-29 11:13:58 +03:00
Gleb Natapov
9503145e38 api: fix use after free in sum_sstable
get_sstables_including_compacted_undeleted() may return a temporary
shared_ptr, which is destroyed before the loop runs unless it is stored
locally.
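A sketch of the pattern, with a hypothetical get_sstables() standing in for get_sstables_including_compacted_undeleted() and sstables reduced to their sizes:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

using sstable_list = std::vector<std::uint64_t>;   // hypothetical: sizes only

// Returns a freshly built list whose only owner is the returned shared_ptr.
std::shared_ptr<const sstable_list> get_sstables() {
    return std::make_shared<const sstable_list>(sstable_list{100, 200, 300});
}

std::uint64_t sum_sstable_sizes() {
    // Unsafe before C++23: in `for (auto s : *get_sstables())` the loop range
    // binds to the pointee, but the temporary shared_ptr is destroyed at the
    // end of the range initializer, freeing the list before the loop runs.
    auto sstables = get_sstables();   // keep the owner alive for the loop
    std::uint64_t total = 0;
    for (auto size : *sstables) {
        total += size;
    }
    return total;
}
```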

Fixes #1514

Message-Id: <20160728100504.GD2502@scylladb.com>
(cherry picked from commit 3531dd8d71)
2016-07-28 14:34:08 +03:00
Tomasz Grabiec
9d99dd46cb tests: lsa_async_eviction_test: Use chunked_fifo<>
This protects against large reallocations during push(), which are
done under the reclaim lock and may fail.
2016-07-27 18:40:35 +02:00
Pekka Enberg
c9dfbf7913 release: prepare for 1.2.3 2016-07-27 13:32:11 +03:00
Avi Kivity
4f02a5f4b3 bloom_filter: fix overflow for large filters
We use ::abs(), which has an int parameter, on long arguments, resulting
in incorrect results.

Switch to std::abs() instead, which has the correct overloads.
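A sketch of the overload issue, using a hypothetical bit_index() helper rather than the actual bloom_filter code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: map a signed 64-bit hash to a bit index.
// std::abs is overloaded for int, long and long long, so the full 64-bit
// value is kept; the C function abs(int) would first truncate it to int.
// Precondition: hash != INT64_MIN (its absolute value is not representable).
std::uint64_t bit_index(std::int64_t hash, std::uint64_t num_bits) {
    return static_cast<std::uint64_t>(std::abs(hash)) % num_bits;
}
```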

Fixes #1494.

Message-Id: <1469347802-28933-1-git-send-email-avi@scylladb.com>
(cherry picked from commit 900639915d)
2016-07-24 11:32:54 +03:00
Tomasz Grabiec
7457ed982d schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using a supplied timestamp-generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.
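The race can be illustrated with a simplified, single-threaded model of a joinpoint (joinpoint_model is hypothetical). Each group of `shards` consecutive arrivals receives one value; the real object also makes arrivals wait for the whole group, which is omitted here:

```cpp
#include <cassert>

class joinpoint_model {
    unsigned shards_;
    unsigned arrived_ = 0;
    long current_ = 0;
    long next_;
public:
    joinpoint_model(unsigned shards, long first_value)
        : shards_(shards), next_(first_value) {}

    long arrive() {
        if (arrived_ == 0) {
            current_ = next_++;   // first arrival of a group picks the value
        }
        if (++arrived_ == shards_) {
            arrived_ = 0;         // group complete: next arrival starts a new one
        }
        return current_;
    }
};
```

With one shared instance and arrivals from tables A and B interleaved, shard 0 of both tables completes the first group, so table A's two shards end up with different timestamps. One instance per table keeps each table consistent.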

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 15:53:46 +02:00
Avi Kivity
16a5be622c Update seastar submodule
* seastar 86d9b13...3558f41 (5):
  > Fix chunked_fifo move assignment
  > semaphore: switch to chunked_fifo
  > fair_queue: add missing include
  > chunked_fifo: implement back()
  > Chunked FIFO queue
2016-07-19 14:49:58 +03:00
Takuya ASADA
caab57bb24 dist/redhat/centos_dep: disable go and ada languages in scylla-gcc package, since ScyllaDB never uses them
The centos-master Jenkins job failed while building libgo, but we don't need the Go language, so let's disable it in the scylla-gcc package.
We never use Ada either, so disable it too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1468166660-23323-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit d2caa486ba)
2016-07-19 11:03:42 +03:00
Tomasz Grabiec
3efa1211ec types: Fix update_types()
We should replace the old type, not insert the new type before the old type.
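A sketch of replace-versus-insert on a hypothetical registry of user types keyed by name (user_type and update_type() are illustrative, not the real update_types()):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct user_type {
    std::string name;
    int version;
};

// Replacing in place keeps exactly one entry per name; inserting the new
// version before the old one would leave the stale definition behind.
void update_type(std::vector<user_type>& types, const user_type& updated) {
    auto it = std::find_if(types.begin(), types.end(),
                           [&](const user_type& t) { return t.name == updated.name; });
    if (it != types.end()) {
        *it = updated;            // replace, not insert
    } else {
        types.push_back(updated);
    }
}
```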

Fixes #1465

Message-Id: <1468861076-20397-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit ce768858f5)
2016-07-18 20:14:55 +03:00
Avi Kivity
3898216831 Update seastar submodule
* seastar 34c0f6c...86d9b13 (1):
  > rpc: do not call shutdown function on already closed fd

Fixes #1463.
2016-07-18 15:26:00 +03:00
Avi Kivity
0af39f2d0c Update seastar submodule
* seastar f3826f0...34c0f6c (2):
  > rpc: fix race between send loop and expiration timer
  > reactor: create new files with a more reasonable default mode
2016-07-17 13:36:18 +03:00
Avi Kivity
e296fef581 Fix bad backport (259b2592d4) 2016-07-15 14:18:50 +03:00
Avi Kivity
5ee6a00b0f db: don't over-allocate memory for mutation_reader
column_family::make_reader() doesn't deal with sstables directly, so it
doesn't need to reserve memory for them.

Fixes #1453.
Message-Id: <1468429143-4354-1-git-send-email-avi@scylladb.com>

(cherry picked from commit d3c87975b0)
2016-07-15 14:11:01 +03:00
Avi Kivity
64df5f3f38 db: estimate queued read size more conservatively
There are plenty of continuations involved, so don't assume it fits in 1k.
Message-Id: <1468429516-4591-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 23edc1861a)
2016-07-15 14:09:47 +03:00
Avi Kivity
259b2592d4 db: do not create column family directories belonging to foreign keyspaces
Currently, for any column family, we create a directory for it in all
keyspace directories.  This is incredibly awkward.

Fix by iterating over just the keyspace's column families, not all
column families in existence.

Fixes #1457.
Message-Id: <1468495182-18424-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 1048e1071b)
2016-07-15 14:08:46 +03:00
Avi Kivity
51eba96c14 transport: encode user-defined type metadata
Right now we fall back to tuples, which confuses the client.

Fixes #1443.

Reviewed-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468167120-1945-1-git-send-email-avi@scylladb.com>
(cherry picked from commit f126efd7f2)
2016-07-12 11:12:37 +03:00
Avi Kivity
66e8204c79 Update seastar submodule
* seastar 31d988c...f3826f0 (3):
  > Fix boost version check
  > reactor: more fix for smp poll with older boost
  > reactor: fix build on older boost due to spsc_queue::read_available()
2016-07-05 00:43:11 +03:00
Avi Kivity
7f1c63afa3 auth: fix performance problem when looking up permissions
data_resource lookup uses data_resource::name(), which uses sprint(), which
uses (indirectly) locale, which takes a global lock.  This is a bottleneck
on large machines.

Fix by not using name() during lookup.
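One way to avoid formatting during lookup is to key the cache by the resource's components. This is a hypothetical sketch; the key layout and permission representation are assumptions:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache key: the resource's components (keyspace, table).
using resource_key = std::pair<std::string, std::string>;
// Hypothetical cache value: a permission bitmask.
using permission_cache = std::map<resource_key, unsigned>;

// Compare the components directly instead of first formatting a name string
// (the formatting path is what ended up taking the global locale lock).
unsigned lookup_permissions(const permission_cache& cache,
                            const std::string& keyspace, const std::string& table) {
    auto it = cache.find(resource_key{keyspace, table});
    return it == cache.end() ? 0u : it->second;
}
```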

Fixes #1419
Message-Id: <1467616296-17645-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 76cc0c0cf9)
2016-07-04 17:56:29 +03:00
Avi Kivity
8547f34d60 mutation_reader: make restricting_mutation_reader even more restricting
While limiting the number of concurrently executing sstable readers reduces
our memory load, the queued readers, although consuming a small amount of
memory, can still grow without bounds.

To limit the damage, add two limits on the queue:
 - a timeout, which is equal to the read timeout
 - a queue length limit, which is equal to 2% of the shard memory divided
   by an estimate of the queued request size (1kb)

Together, these limits bound the amount of memory needed by queued disk
requests in case the disk can't keep up.
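The queue length limit is simple arithmetic. The sketch assumes "1kb" means 1024 bytes; max_queued_reads() is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>

// 2% of the shard's memory divided by the estimated size of one queued read.
std::size_t max_queued_reads(std::size_t shard_memory_bytes,
                             std::size_t est_request_bytes = 1024) {
    return shard_memory_bytes * 2 / 100 / est_request_bytes;
}
```

For example, a shard with 1 GiB of memory gets a limit of 20971 queued reads.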
Message-Id: <1467206055-30769-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 9ac730dcc9)
2016-06-29 17:29:00 +03:00
Avi Kivity
a3078c9b9d Fix backport of restricting_mutation_reader 2016-06-27 19:57:31 +03:00
Avi Kivity
00692d891e db: add statistics about queued reads
Fixes #1398.

(cherry picked from commit f03cd6e913)
2016-06-27 19:43:16 +03:00
Avi Kivity
94aa879d19 db: restrict replica read concurrency
Since reading mutations can consume a large amount of memory, which, moreover,
is not predictable at the time the read is initiated, restrict the number
of reads to 100 per shard.  This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.

Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled.  This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.

Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads.  Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
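A simplified, non-blocking sketch of per-shard admission with separate limiters for user and system reads. read_limiter and limiter_for() are hypothetical, and giving the system limiter the same limit of 100 is an assumption; the real code queues waiters on a semaphore instead of rejecting them:

```cpp
#include <cassert>
#include <string>

class read_limiter {
    unsigned limit_;
    unsigned in_flight_ = 0;
public:
    explicit read_limiter(unsigned limit) : limit_(limit) {}
    bool try_admit() {
        if (in_flight_ == limit_) {
            return false;         // caller would have to wait
        }
        ++in_flight_;
        return true;
    }
    void release() { --in_flight_; }
};

// Separate limiters so user reads cannot block system-keyspace reads.
read_limiter user_reads{100};
read_limiter system_reads{100};

read_limiter& limiter_for(const std::string& keyspace) {
    return keyspace == "system" ? system_reads : user_reads;
}
```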

Fixes #1398.

(cherry picked from commit edeef03b34)
2016-06-27 19:43:07 +03:00
Avi Kivity
8361b01b9d mutation_reader: introduce restricting_reader
A restricting_reader wraps a mutation_reader, and restricts its concurrency
using a provided semaphore; this allows controlling read concurrency, which
is important since reads can consume a lot of resources ((number of
participating sstables) * 128k after we have streaming mutations, and a lot
more before).

Fixes #1398.

(cherry picked from commit bea7d7ee94)
2016-06-27 19:42:59 +03:00
Avi Kivity
67e80fd595 Update seastar submodule
* seastar 0bcdd28...31d988c (2):
  > reactor: run idle poll handler with a pure poll function
  > resource: don't abort on too-high io queue count

Fixes #1395.
Fixes #1400.
2016-06-27 19:31:43 +03:00
Avi Kivity
b3915e0363 Seastar: prepare a branch for 1.2 backports 2016-06-27 19:30:13 +03:00
Avi Kivity
985c4ffcc6 release: prepare for 1.2.2 2016-06-27 19:29:57 +03:00
Avi Kivity
c56fc99b7f main: handle exceptions during startup
If we don't, std::terminate() causes a core dump, even though an
exception is sort-of-expected here and can be handled.

Add an exception handler to fix.
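A minimal sketch of a top-level handler, with hypothetical start_services() and run() functions standing in for the startup sequence in main:

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>

// Hypothetical startup step that may fail with an expected error.
void start_services(bool fail) {
    if (fail) {
        throw std::runtime_error("bad configuration");
    }
}

// Catching at the top level turns the failure into an error message and a
// non-zero exit code instead of std::terminate() and a core dump.
int run(bool fail) {
    try {
        start_services(fail);
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "startup failed: " << e.what() << '\n';
        return 1;
    }
}
```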

Fixes #1379.
Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 5af22f6cb1)
2016-06-23 10:03:30 +03:00
Pekka Enberg
85d33e2ee4 release: prepare for 1.2.1 2016-06-21 16:22:17 +03:00
Duarte Nunes
ffeef2f072 database: Actually decrease query_state limit
query_state expects the current row limit to be updated so it
can be enforced across partition ranges. A regression introduced
in e4e8acc946 prevented that from
happening by passing a copy of the limit to querying_reader.

This patch fixes the issue by having column_family::query update
the limit as it processes partitions from the querying_reader.
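A sketch of why the limit must be updated in place across ranges. query_range() and query() are hypothetical and far simpler than column_family::query:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using rows = std::vector<int>;   // hypothetical stand-in for query results

// Reads rows from one partition range and decrements the caller's limit in
// place. If `remaining` were passed by value, each range would see the full
// limit again and the overall query could return too many rows.
void query_range(const rows& range, std::uint32_t& remaining, rows& out) {
    for (int r : range) {
        if (remaining == 0) {
            return;
        }
        out.push_back(r);
        --remaining;
    }
}

rows query(const std::vector<rows>& ranges, std::uint32_t limit) {
    rows out;
    for (const auto& range : ranges) {
        query_range(range, limit, out);
    }
    return out;
}
```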

Fixes #1338

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1465804012-30535-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit c896309383)
2016-06-21 10:03:22 +03:00
Pekka Enberg
d3ffa00eb2 systemd: Use PermissionsStartOnly instead of running sudo
Use the PermissionsStartOnly systemd option to apply the permission
related configurations only to the start command. This allows us to stop
using "sudo" for ExecStartPre and ExecStopPost hooks and drop the
"requiretty" /etc/sudoers hack from Scylla's RPM.

Tested-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1466407587-31734-1-git-send-email-penberg@scylladb.com>
(cherry picked from commit 1d5f7be447)
2016-06-21 08:49:17 +03:00
Nadav Har'El
ad50d83302 Rewriting shared sstables only after all shards loaded sstables
After commit faa4581, each shard only starts splitting its shared sstables
after opening all sstables. This was important because compaction needs to
be aware of all sstables.

However, another bug remained: If one shard finishes loading its sstables
and starts the splitting compactions, and in parallel a different shard is
still opening sstables - the second shard might find a half-written sstable
being written by the first shard, and abort on a malformed sstable.

So in this patch we start the shared sstable rewrites - on all shards -
only after all shards finished loading their sstables. Doing this is easy,
because main.cc already contains a list of sequential steps where each
uses invoke_on_all() to make sure the step completes on all shards before
continuing to the next step.

Fixes #1371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit 3372052d48)
2016-06-20 18:20:01 +03:00
Avi Kivity
c6a9844dfe dist: fix scylla-kernel-conf postinstall scriptlet failure
Because we build on CentOS 7, which does not have the %sysctl_apply macro,
the macro is not expanded, and therefore executed incorrectly even on 7.2,
which does.

Fix by expanding the macro manually.

Fixes #1360.
Message-Id: <1466250006-19476-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 07045ffd7c)
2016-06-20 09:37:06 +03:00
Nadav Har'El
dececbc0b9 Rewrite shared sstables only after entire CF is read
Starting in commit 721f7d1d4f, we start "rewriting" a shared sstable (i.e.,
splitting it into individual shards) as soon as it is loaded in each shard.

However, as discovered in issue #1366, this is too soon: our compaction
process relies in several places on compaction being done only after all
the sstables of the same CF have been loaded. One example is that we
need to know the content of the other sstables to decide which tombstones
we can expire (this is issue #1366). Another example is that we use the
last generation number we are aware of to decide the number of the next
compaction output - and this is wrong before we saw all sstables.

So with this patch, while loading sstables we only make a list of shared
sstables which need to be rewritten - and the actual rewrite is only started
when we finish reading all the sstables for this CF. We need to do this in
two cases: reboot (when we load all the existing sstables we find on disk),
and nodetool refresh (when we import a set of new sstables).
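The two-phase approach can be sketched as follows, using hypothetical types and functions: loading only records the shared sstables, and rewrites allocate output generations after the highest generation on disk is known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor of an sstable found on disk.
struct sstable_info {
    long generation;
    bool shared;   // owned by more than one shard
};

struct load_result {
    long max_generation = 0;
    std::vector<long> to_rewrite;   // recorded during loading, not yet rewritten
};

// Loading phase: only record what needs rewriting.
load_result load_column_family(const std::vector<sstable_info>& on_disk) {
    load_result r;
    for (const auto& s : on_disk) {
        r.max_generation = std::max(r.max_generation, s.generation);
        if (s.shared) {
            r.to_rewrite.push_back(s.generation);
        }
    }
    return r;
}

// Rewrite phase, started only after loading finished: outputs get generations
// above every generation seen, so they cannot collide with existing sstables.
std::vector<long> start_rewrites(const load_result& r) {
    std::vector<long> outputs;
    long next = r.max_generation;
    for (std::size_t i = 0; i < r.to_rewrite.size(); ++i) {
        outputs.push_back(++next);
    }
    return outputs;
}
```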

Fixes #1366.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>
(cherry picked from commit faa45812b2)
2016-06-19 17:11:14 +03:00
Asias He
f2031bf3db repair: Switch log level to warn instead of error
dtest treats error-level log messages as serious errors. It is not a
serious error for streaming to fail to send a verb and fail a streaming
session, which triggers a repair failure, for example when the peer node
is gone or stopped. Switch to log level warn instead of error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <406fb0c4a45b81bd9c0aea2a898d7ca0787b23e9.1465979288.git.asias@scylladb.com>
(cherry picked from commit de0fd98349)
2016-06-18 11:42:21 +03:00
Asias He
da77b8885f streaming: Switch log level to warn instead of error
dtest treats error-level log messages as serious errors. It is not a
serious error for streaming to fail to send a verb and fail a streaming
session, for example when the peer node is gone or stopped. Switch to
log level warn instead of error.

Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test

Fixes: #1335
Message-Id: <0149d30044e6e4d80732f1a20cd20593de489fc8.1465979288.git.asias@scylladb.com>
(cherry picked from commit 94c9211b0e)
2016-06-18 11:42:10 +03:00
Asias He
86434378d1 streaming: Fix indentation in do_send_mutations
Message-Id: <bc8cfa7c7b29f08e70c0af6d2fb835124d0831ac.1464857352.git.asias@scylladb.com>
(cherry picked from commit 96463cc17c)
2016-06-18 11:41:51 +03:00
Pekka Enberg
e5d24d5940 service/storage_service: Make do_isolate_on_error() more robust
Currently, we only stop the CQL transport server. Extract a
stop_transport() function from drain_on_shutdown() and call it from
do_isolate_on_error() to also shut down the inter-node RPC transport,
Thrift, and other communications services.

Fixes #1353

(cherry picked from commit d72c608868)

Conflicts:
	service/storage_service.cc

(cherry picked from commit 7e052a4e91)
2016-06-16 14:01:33 +03:00
Nadav Har'El
0a2d4204bd Rewrite shared sstables soon after startup
Several shards may share the same sstable - e.g., when re-starting scylla
with a different number of shards, or when importing sstables from an
external source. Sharing an sstable is fine, but it can result in excessive
disk space use because the shared sstable cannot be deleted until all
the shards using it have finished compacting it. Normally, we have no idea
when the shards will decide to compact these sstables - e.g., with size-
tiered-compaction a large sstable will take a long time until we decide
to compact it. So what this patch does is to initiate compaction of the
shared sstables - on each shard using it - so that as soon as possible after
the restart, the original sstable is split into separate
sstables per shard, and the original sstable can be deleted. If several
sstables are shared, we serialize this compaction process so that each
shard only rewrites one sstable at a time. Regular compactions may happen
in parallel, but they will not be able to choose any of the shared
sstables because those are already marked as being compacted.

Commit 3f2286d0 increased the need for this patch, because since that
commit, if we don't delete the shared sstable, we also cannot delete
additional sstables which the different shards compacted with it. For one
scylla user, this resulted in so much excessive disk space use, that it
literally filled the whole disk.

After this patch, commit 3f2286d0 (or the discussion in issue #1318 on how
to improve it) is no longer necessary, because we will never compact a shared
sstable together with any other sstable - as explained above, the shared
sstables are marked as "being compacted" so the regular compactions will
avoid them.

Fixes #1314.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1465406235-15378-1-git-send-email-nyh@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 721f7d1d4f)
2016-06-16 14:01:33 +03:00
Tomasz Grabiec
74b8f63e8f row_cache: Make stronger guarantees in clear/invalidate
Correctness of current uses of clear() and invalidate() relies on the fact
that the cache is not populated using readers created before
invalidation. Sstables are first modified and then cache is
invalidated. This is not guaranteed by current implementation
though. As pointed out by Avi, a populating read may race with the
call to clear(). If that read started before clear() and completed
after it, the cache may be populated with data which does not
correspond to the new sstable set.

To provide such a guarantee, the invalidate() variants were adjusted to
synchronize using _populate_phaser, similarly to row_cache::update().

(cherry picked from commit 170a214628)

Conflicts:
	database.cc
2016-06-16 14:01:33 +03:00
Tomasz Grabiec
9b764b726b row_cache: Implement clear() using invalidate()
Reduces code duplication.

(cherry picked from commit 2ab18dcd2d)
2016-06-16 14:01:33 +03:00
Pekka Enberg
07ba03ce7b utils/exceptions: Whitelist EEXIST and ENOENT in should_stop_on_system_error()
There are various call-sites that explicitly check for EEXIST and
ENOENT:

  $ git grep "std::error_code(E"
  database.cc:                            if (e.code() != std::error_code(EEXIST, std::system_category())) {
  database.cc:            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:        if (e.code() != std::error_code(ENOENT, std::system_category())) {
  database.cc:                            if (e.code() != std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {
  sstables/sstables.cc:            if (e.code() == std::error_code(ENOENT, std::system_category())) {

Commit 961e80a ("Be more conservative when deciding when to shut down
due to disk errors") turned these errors into a storage_io_exception
that is not expected by the callers, which causes 'nodetool snapshot'
functionality to break, for example.

Whitelist the two error codes to restore the old behavior of
io_check().
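A sketch of the resulting policy. The function name comes from this commit, but the body below is a hypothetical illustration, not the actual implementation:

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// Stop on any system_error except the two codes that callers explicitly
// expect and handle themselves.
bool should_stop_on_system_error(const std::system_error& e) {
    if (e.code() == std::error_code(EEXIST, std::system_category()) ||
        e.code() == std::error_code(ENOENT, std::system_category())) {
        return false;
    }
    return true;
}
```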
Message-Id: <1465454446-17954-1-git-send-email-penberg@scylladb.com>

(cherry picked from commit 8df5aa7b0c)
2016-06-16 14:01:33 +03:00
Avi Kivity
de690a6997 Be more conservative when deciding when to shut down due to disk errors
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80ab74)
2016-06-16 14:01:33 +03:00
Pekka Enberg
7b53e969d2 dist/docker: Use Scylla 1.2 RPM repository 2016-06-15 19:50:02 +03:00
Pekka Enberg
c384b23112 release: prepare for 1.2.0 2016-06-13 15:18:13 +03:00
Shlomi Livne
3688542323 dist/common: Update scylla_io_setup to use settings done in cpuset.conf
scylla_io_setup is searching for --smp and --cpuset setting in
SCYLLA_ARGS. We have moved the settings of this args into
/etc/scylla.d/cpuset.conf and they are set by scylla_cpuset_setup into
CPUSET.

Fixes: #1327

Signed-off-by: Shlomi Livne <shlomi@scylladb.com>
Message-Id: <2735e3abdd63d245ec96cfa1e65f766b1c12132e.1465508701.git.shlomi@scylladb.com>
(cherry picked from commit ac6f2b5c13)
2016-06-10 09:38:17 +03:00
Pekka Enberg
7916182cfa Revert "Be more conservative when deciding when to shut down due to disk errors"
This reverts commit a6179476c5.

The change breaks 'nodetool snapshot', for example.
2016-06-09 10:11:29 +03:00
Tomasz Grabiec
ec1fd3945f Revert "config: adjust boost::program_options validator to work with db::string_map"
This reverts commit 653e250d04.

Compilation is broken with this patch:

[155/264] CXX build/release/db/config.o
FAILED: g++ -MMD -MT build/release/db/config.o -MF build/release/db/config.o.d -std=gnu++1y -g  -Wall -Werror -fvisibility=hidden -pthread -I/home/shlomi/scylla/seastar -I/home/shlomi/scylla/seastar/build/release/gen  -march=nehalem -Wno-overloaded-virtual -DHAVE_HWLOC -DHAVE_NUMA  -O2 -I/usr/include/jsoncpp/  -Wno-maybe-uninitialized -DHAVE_LIBSYSTEMD=1 -I. -I build/release/gen -I seastar -I seastar/build/release/gen -c -o build/release/db/config.o db/config.cc
db/config.cc:57:13: error: ‘void db::validate(boost::any&, const std::vector<std::__cxx11::basic_string<char> >&, db::string_map*, int)’ defined but not used [-Werror=unused-function]
 static void validate(boost::any& out, const std::vector<std::string>& in,
             ^
cc1plus: all warnings being treated as errors

This branch doesn't have commits which introduce the problem which
this patch fixes, so let's just revert it.
2016-06-08 11:05:47 +02:00
Gleb Natapov
653e250d04 config: adjust boost::program_options validator to work with db::string_map
Fixes #1320

Message-Id: <20160607064511.GX9939@scylladb.com>
(cherry picked from commit 9635e67a84)
2016-06-07 10:43:30 +03:00
Amnon Heiman
6255076c20 rate_moving_average: mean_rate is not initialized
The rate_moving_average is used by timed_rate_moving_average to return
its internal values.

If there are no timed events, mean_rate is not properly initialized.
To solve that, mean_rate is now initialized to 0 in the structure
definition.
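A sketch of the fix using default member initializers; only mean_rate comes from this commit, and the count field is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Every field has a defined value even if no timed event was ever recorded.
struct rate_moving_average {
    std::uint64_t count = 0;   // hypothetical field
    double mean_rate = 0;
};
```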

Refs #1306

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1465231006-7081-1-git-send-email-amnon@scylladb.com>
(cherry picked from commit 2cf882c365)
2016-06-07 09:44:26 +03:00
Pekka Enberg
420ebe28fd release: prepare for 1.2.rc2 2016-06-06 16:17:26 +03:00
Avi Kivity
a6179476c5 Be more conservative when deciding when to shut down due to disk errors
Currently we only shut down on EIO.  Expand this to shut down on any
system_error.

This may cause us to shut down prematurely due to a transient error,
but this is better than not shutting down due to a permanent error
(such as ENOSPC or EPERM).  We may whitelist certain errors in the future
to improve the behavior.

Fixes #1311.
Message-Id: <1465136956-1352-1-git-send-email-avi@scylladb.com>

(cherry picked from commit 961e80ab74)
2016-06-06 16:15:25 +03:00
Raphael S. Carvalho
342726a23c compaction: leveled: improve log message for overlapping table
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <2dcbe3c8131f1d88a3536daa0b6cdd25c6e41d76.1464883077.git.raphaelsc@scylladb.com>
(cherry picked from commit 17b56eb459)
2016-06-06 16:13:40 +03:00
Raphael S. Carvalho
e9946032f4 compaction: disable parallel compaction for leveled strategy
It was discussed that leveled strategy may not benefit from parallel
compaction feature because almost all compaction jobs will have similar
size. It was also found that leveled strategy wasn't working correctly
with it because two overlapping sstables (targeting the same level)
could be created in parallel by two ongoing compactions.

Fixes #1293.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <60fe165d611c0283ca203c6d3aa2662ab091e363.1464883077.git.raphaelsc@scylladb.com>
(cherry picked from commit 588ce915d6)
2016-06-06 16:13:36 +03:00
Pekka Enberg
5e0b113732 Update scylla-ami submodule
* dist/ami/files/scylla-ami 72ae258...863cc45 (3):
  > Move --cpuset/--smp parameter settings from scylla_sysconfig_setup to scylla_ami_setup
  > convert scylla_install_ami to bash script
  > 'sh -x -e' is not valid since all scripts converted to bash script, so remove them
2016-06-06 13:38:53 +03:00
Asias He
c70faa4f23 streaming: Reduce memory usage when sending mutations
Limit disk bandwidth to 5MB/s to emulate a slow disk:
echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.write_bps_device
echo "8:0 5000000" > /cgroup/blkio/limit/blkio.throttle.read_bps_device

Start scylla node 1 with low memory:
scylla -c 1 -m 128M --auto-bootstrap false

Run c-s:
taskset -c 7 cassandra-stress write duration=5m cl=ONE -schema 'replication(factor=1)' -pop seq=1..100000  -rate threads=20 limit=2000/s -node 127.0.0.1

Start scylla node 2 with low memory:
scylla -c 1 -m 128M --auto-bootstrap true

Without this patch, I saw std::bad_alloc during streaming:

ERROR 2016-06-01 14:31:00,196 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: std::bad_alloc (std::bad_alloc)
...
ERROR 2016-06-01 14:31:10,172 [shard 0] database - failed to move memtable to cache: std::bad_alloc (std::bad_alloc)
...

To fix:

1. Apply the streaming mutation limiter before we read the mutation into
memory, to avoid wasting memory holding a mutation that we cannot send
yet.

2. Reduce the parallelism of sending streaming mutations. Previously we
sent all ranges in parallel; now we send them one by one.

   before: nr_vnode * nr_shard * (send_info + cf.make_reader memory usage)

   after: nr_shard * (send_info + cf.make_reader memory usage)

This saves memory usage by at least a factor of nr_vnode, which is 256
by default.
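The before/after estimate above can be written out as a tiny model. This is only a sketch: `peak_before`, `peak_after` and `per_reader_bytes` are hypothetical names, and the per-reader cost is an assumed input standing in for the (send_info + cf.make_reader) footprint, which is not measured here.

```cpp
#include <cstddef>

// Illustrative model only. `per_reader_bytes` is an assumed input standing in
// for the memory held by one (send_info + cf.make_reader) pair.
constexpr std::size_t peak_before(std::size_t nr_vnode, std::size_t nr_shard,
                                  std::size_t per_reader_bytes) {
    // Every vnode range is streamed in parallel on every shard.
    return nr_vnode * nr_shard * per_reader_bytes;
}

constexpr std::size_t peak_after(std::size_t nr_shard, std::size_t per_reader_bytes) {
    // Ranges are streamed one by one, so each shard holds a single reader.
    return nr_shard * per_reader_bytes;
}

// With the default of 256 vnodes, the modelled peak shrinks 256-fold.
static_assert(peak_before(256, 1, 1024) == 256 * peak_after(1, 1024),
              "sequential ranges divide the peak by nr_vnode");
```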

In my setup, fix 1) alone was not enough; with both fixes 1) and 2), I
saw no std::bad_alloc. I also did not see the streaming bandwidth drop
due to 2).

In addition, I tested grow_cluster_test.py:GrowClusterTest.test_grow_3_to_4,
as described in:

https://github.com/scylladb/scylla/issues/1270#issuecomment-222585375

With this patch, I saw no std::bad_alloc any more.

Fixes: #1270

Message-Id: <7703cf7a9db40e53a87f0f7b5acbb03fff2daf43.1464785542.git.asias@scylladb.com>
(cherry picked from commit 206955e47c)
2016-06-02 11:18:59 +03:00
Gleb Natapov
15ad4c9033 storage_proxy: drop debug output
Message-Id: <20160601132641.GK2381@scylladb.com>
(cherry picked from commit 26b50eb8f4)
2016-06-01 17:14:32 +03:00
Pekka Enberg
d094329b6e Revert "Revert "main: change order between storage service and drain execution during exit""
This reverts commit b3ed55be1d.

The issue is in the failing dtest, not this commit. Gleb writes:

  "The bug is in the test, not the patch. Test waits for repair session
   to end one way or the other when node is killed, but for nodetool to
   know if repair is completed it needs to poll for it.  If node dies
   before nodetool managed to see repair completion it will be stuck
   forever since jmx is alive, but does not provide answers any more.
   The patch changes timing, repair is completed much closer to exit now,
   so problem appears, but it may happen even without the patch.

   The fix is for dtest to kill jmx as part of killing a node
   operation."

Now that Lucas fixed the problem in scylla-ccm, revert the revert.

(cherry picked from commit 0255318bf3)
2016-06-01 08:51:51 +03:00
Pekka Enberg
dcab915f21 release: prepare for 1.2.rc1 2016-05-30 13:14:38 +03:00
Pekka Enberg
b3ed55be1d Revert "main: change order between storage service and drain execution during exit"
This reverts commit 0ebd8b18b7.

The change breaks repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
2016-05-30 12:48:09 +03:00
Avi Kivity
e515933c70 dist: tune scheduler for lower latency
Scylla-jmx and collectd can preempt scylla and induce long latencies.  Tune
the scheduler to provide lower latencies.

Since we normally do not context switch when the support processes are
not running (one thread per core, remember?), there should be no effect
on throughput.

The tunings are provided in a separate package, which can be uninstalled
if the server is shared with other applications which are negatively
affected by the tuning.

Fixes #1218.
Message-Id: <1464529625-12825-1-git-send-email-avi@scylladb.com>
2016-05-30 08:42:19 +03:00
Avi Kivity
e8e00338d1 config: document defragment_memory_on_idle
Message-Id: <1464261650-14136-2-git-send-email-avi@scylladb.com>
2016-05-30 08:39:26 +03:00
Avi Kivity
b50cb3eca8 config: rename compact_on_idle
compact_on_idle will lead users to think we're talking about sstable
compaction, not log-structured-allocator compaction.

Rename the variable to reduce the probability of confusion.
Message-Id: <1464261650-14136-1-git-send-email-avi@scylladb.com>
2016-05-30 08:39:13 +03:00
Yoav Kleinberger
e580ac5dae docker: fix Ubuntu Dockerfile
One needs to update the repository info before one can install packages.
Fixes issue #1296.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <a906e76d584baff5988cb31a4003de27455e0741.1464529740.git.yoav@scylladb.com>
2016-05-29 17:00:25 +03:00
Avi Kivity
3f6ecb9f28 Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb 2016-05-29 15:58:55 +03:00
Gleb Natapov
2efbccc901 storage_proxy: do only local read repair if non matching data was recently modified
When a read and a write to a partition happen in parallel, the reader may
detect a digest mismatch that could cause a cross-DC read repair attempt.
The repair is not really needed, so the added latency is not justified.

This patch tries to prevent such parallel accesses from causing a heavy
cross-DC repair operation by checking the timestamp of the most recent
modification. If the modification happened less than "write timeout"
seconds ago, the patch assumes that the read raced with a write and
cancels the cross-DC repair, but only if the CL is LOCAL_*.
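The decision described above can be sketched as a small predicate. This is an illustration under assumptions, not the storage_proxy code: `local_repair_only` and `is_local` are hypothetical helpers, the enum is a reduced subset of consistency levels, and the write timeout is passed in rather than read from configuration.

```cpp
#include <chrono>

// Subset of the CQL consistency levels, for illustration only.
enum class consistency_level { ONE, QUORUM, ALL, LOCAL_ONE, LOCAL_QUORUM };

bool is_local(consistency_level cl) {
    return cl == consistency_level::LOCAL_ONE ||
           cl == consistency_level::LOCAL_QUORUM;
}

// Hypothetical helper: true means "repair only in the local DC", because the
// mismatch most likely comes from a read racing with a recent write.
bool local_repair_only(consistency_level cl,
                       std::chrono::system_clock::time_point last_modified,
                       std::chrono::system_clock::time_point now,
                       std::chrono::milliseconds write_timeout) {
    bool recently_modified = now - last_modified < write_timeout;
    return is_local(cl) && recently_modified;
}
```

Keeping the clock reading as a parameter (`now`) makes the predicate deterministic and easy to test.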
2016-05-29 15:26:51 +03:00
Amnon Heiman
d4123ba613 API: column_family count sstable space used correctly
The space calculation counters in column family had two problems:
1. The total bytes value is an ever-growing counter, which is meaningless
for the API.

2. Simply summing the size over all shards ignores the fact that the
same sstable file can be referenced by multiple shards; this is
especially noticeable during migration.

To solve this, the implementation was modified so that, instead of
collecting the sizes, the API collects a map of file name to size
and then does the summing.

This removes the duplicates and fixes the total bytes calculation.
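The deduplicating sum can be sketched as follows. The names `sstable_sizes` and `total_space_used` are hypothetical, and each shard's contribution is assumed to already be a map from sstable file name to size; because the key is the file name, an sstable shared by several shards is counted once.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// File name -> size in bytes, as reported by one shard.
using sstable_sizes = std::map<std::string, std::uint64_t>;

std::uint64_t total_space_used(const std::vector<sstable_sizes>& per_shard) {
    sstable_sizes merged;
    for (const auto& shard : per_shard) {
        // insert() ignores keys that are already present, so an sstable
        // referenced by several shards ends up in `merged` only once.
        merged.insert(shard.begin(), shard.end());
    }
    std::uint64_t total = 0;
    for (const auto& entry : merged) {
        total += entry.second;
    }
    return total;
}
```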

Calling cfstats before the change, with load, after a compaction happened:

$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1068253.0 76435
	Read Count: 75915
	Read Latency: 0.5953986037015082 ms.
	Write Count: 76435
	Write Latency: 0.013975966507490025 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 5
		Space used (live): 44261215
		Space used (total): 219724478

After the fix:

$ nodetool cfstats keyspace1
Keyspace: keyspace1
Verify write latency 1863206.0 124219
	Read Count: 125401
	Read Latency: 0.9381053978835895 ms.
	Write Count: 124219
	Write Latency: 0.01499936402643718 ms.
	Pending Flushes: 0
		Table: standard1
		SSTable count: 6
		Space used (live): 50402904
		Space used (total): 50402904
		Space used by snapshots (total): 0

Fixes: #1042

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1464518757-14666-2-git-send-email-amnon@scylladb.com>
2016-05-29 14:11:03 +03:00
Gleb Natapov
32c9a06faf messaging_service: abort retrying send during exit
Fixes #862

Message-Id: <1463579574-15789-3-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:36 +03:00
Gleb Natapov
0ebd8b18b7 main: change order between storage service and drain execution during exit
The comment even says drain_on_shutdown should be called first, but for
that it has to be registered last.

Fixes #862

Message-Id: <1463579574-15789-2-git-send-email-gleb@scylladb.com>
2016-05-29 11:39:24 +03:00
Glauber Costa
30d54cef38 database: add a comment explaining the choice of function in CF stop
We have recently committed a fix for a streaming bug that involved
reverting column_family::stop() back to calling the custom seal functions
explicitly for both memtables and streaming memtables.

Here we add a comment to explain why that had to be done.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <fe94b5883e9c29adc7fc9ee9f498894c057e7b64.1464293167.git.glauber@scylladb.com>
2016-05-29 11:28:15 +03:00
Avi Kivity
8e124b31aa Merge "gossip: Refactor waiting for supported features" from Duarte
"This patch changes the way we wait for supported features. We no longer
sleep periodically, waking up to check if the wanted features are now
available. Instead, we register waiters in a condition variable that is
signaled whenever new endpoint information is received.

We also add a new poll interface based on the feature class, which
encapsulates the availability of a cluster feature."
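The waiting scheme described in this merge can be sketched with standard C++ primitives. This is an assumption-laden illustration: Scylla's gossiper runs on Seastar and would use its own single-threaded primitives, whereas this sketch uses std::mutex and std::condition_variable, and `feature_tracker` with its methods is a hypothetical stand-in for the gossiper plus gms::feature.

```cpp
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the gossiper's feature bookkeeping.
class feature_tracker {
    std::mutex _mutex;
    std::condition_variable _cv;
    std::set<std::string> _supported;
public:
    // Called whenever new endpoint information is received.
    void on_endpoint_info(const std::set<std::string>& features) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _supported.insert(features.begin(), features.end());
        }
        _cv.notify_all(); // wake every waiter so it can re-check its feature
    }

    // Blocks until `feature` is available; no periodic sleep-and-poll loop.
    void wait_for(const std::string& feature) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [&] { return _supported.count(feature) > 0; });
    }

    // Non-blocking poll, analogous to asking a feature object if it is enabled.
    bool is_enabled(const std::string& feature) {
        std::lock_guard<std::mutex> lock(_mutex);
        return _supported.count(feature) > 0;
    }
};
```

The predicate overload of `wait()` re-checks the condition after every wakeup, so spurious wakeups and notifications for unrelated features are harmless.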
2016-05-27 20:24:25 +03:00
Duarte Nunes
f613dabf53 gossip: Introduce the gms::feature class
This class encapsulates the waiting for a cluster feature. A feature
object is registered with the gossiper, which is responsible for later
marking it as enabled.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
4684b8ecbb gossip: Refactor waiting for features
This patch changes the sleep-based mechanism of detecting new features
by instead registering waiters with a condition variable that is
signaled whenever new endpoint information is received.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Duarte Nunes
422f244172 gossip: Don't timeout when waiting for features
This patch removes the timeout when waiting for features,
since future patches will make this argument unnecessary.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-05-27 17:20:51 +00:00
Avi Kivity
fab4cc8d6d Merge seastar upstream
* seastar 8bfbb1a...0bcdd28 (1):
  > Merge "introduce sleep_abortable() that throws exception on application exit" from Gleb
2016-05-27 20:14:49 +03:00
Duarte Nunes
b3011c9039 gossip: Rename set_heart_beat_state
...to set_heart_beat_state_and_update_timestamp in order to make it
explicit to callers that the update timestamp is also changed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-3-git-send-email-duarte@scylladb.com>
2016-05-27 09:11:39 +03:00
Duarte Nunes
8c0e2e05b7 gossip: Fix modification to shadow endpoint state
This patch fixes an inadvertent change to the shadow endpoint state
map in gossiper::run, done by calling get_heart_beat_state() which
also updates the endpoint state's timestamp. This did not happen for
the normal map, but did happen for the shadow map. As a result, every
time gossiper::run() was scheduled, endpoint_map_changed would always
be true and all the shards would make superfluous copies of the
endpoint state maps.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1464309023-3254-2-git-send-email-duarte@scylladb.com>
2016-05-27 09:10:38 +03:00
Pekka Enberg
b7e79b72d5 Merge "Introduce SET_NIC for non-AMI environment" from Takuya
"This patchset provides a way to enable SET_NIC (posix_net_conf.sh) in
 non-AMI environments.
 It also adds support for the -mq option of the script.
 It also contains a number of bug fixes to the scripts.

 Fixes #1192"
2016-05-26 13:37:06 +03:00
Yoav Kleinberger
26c0d86401 tools/scyllatop: improved user interface: scrollable views
NOTE: scyllatop now requires the urwid library

Previously, if there were more metrics than lines in the terminal
window, the user could not see some of the metrics. Now the user can
scroll.

As an added bonus, the program will not crash when the window size
changes.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1464098832-5755-1-git-send-email-yoav@scylladb.com>
2016-05-26 13:36:28 +03:00
Piotr Jastrzebski
136b8148d2 Use idle CPU to compact LSA memory
Register an idle CPU handler that compacts a single segment
every time there's nothing better to execute on CPU.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>
2016-05-26 12:43:53 +03:00
Avi Kivity
d7f36a093f Merge seastar upstream
* seastar e5faea8...8bfbb1a (1):
  > reactor: advertise the logging_failures metric as a DERIVE counter

Fixes #1292.
2016-05-26 11:46:08 +03:00
Tomasz Grabiec
f0c2b1d161 config: Fix typos
Message-Id: <1464201938-4778-1-git-send-email-tgrabiec@scylladb.com>
2016-05-26 08:19:57 +03:00
Asias He
f1b3cb4a08 storage_service: Catch and fail an invalid configuration with --replace-address
Vlad reported a strange user configuration:

   SCYLLA_ARGS="--log-to-syslog 1 --log-to-stdout 0 --default-log-level
   info --collectd-address=127.0.0.1:25826 --collectd=1
   --collectd-poll-period 60000 --network-stack posix --num-io-queues 32
   --max-io-requests 128 --replace-address 10.0.4.131"

   seed_provider:
       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
         parameters:
             - seeds: "10.0.4.131"

   Meanwhile, 10.0.4.131 is the IP address of the node itself.

When the node was started, the following messages were reported:

   Apr 13 06:31:12 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (20 seconds passed)
   Apr 13 06:31:13 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (21 seconds passed)
   Apr 13 06:31:14 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (22 seconds passed)
   Apr 13 06:31:15 n0 scylla[19681]: [shard 0] gossip - Connect seeds again
   ... (23 seconds passed)

The configuration is invalid because, for --replace-address to
work, at least one working seed node should be alive. Catch the
configuration error and fail with an appropriate error message.

Fixes #1183
Message-Id: <a94a082d896313e7a668915ae21fe2c03719da3a.1464164058.git.asias@scylladb.com>
2016-05-25 14:42:19 +03:00
Asias He
fed1e65e1e gossip: Do not insert the same node into _live_endpoints_just_added
_live_endpoints_just_added tracks the peer nodes which just became live.
When a down node comes back, the peer nodes can receive multiple messages
which would mark the node up, e.g., messages that piled up in the sender's
TCP stack after the node was blocked with gdb and then released. Each such
message triggers an echo message, and when the reply to the echo
message is received (real_mark_alive), the same node is pushed back onto
_live_endpoints_just_added more than once. Thus, we see the
same node favored more than once:

INFO  2016-04-12 12:09:57,399 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:58,412 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:09:59,429 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:00,429 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:01,430 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:02,442 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2
INFO  2016-04-12 12:10:03,454 [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2

To fix, do not insert the node if it is already in
_live_endpoints_just_added.
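A minimal sketch of the "insert only if absent" check, assuming the container is a std::list and simplifying inet_address to a string; `mark_just_added` is a hypothetical helper name, not the gossiper's actual function.

```cpp
#include <algorithm>
#include <list>
#include <string>

using inet_address = std::string; // simplification for this sketch

// Record a node that just became live, at most once, no matter how many
// mark-alive messages (and echo replies) arrive for it.
void mark_just_added(std::list<inet_address>& live_endpoints_just_added,
                     const inet_address& node) {
    auto it = std::find(live_endpoints_just_added.begin(),
                        live_endpoints_just_added.end(), node);
    if (it == live_endpoints_just_added.end()) {
        live_endpoints_just_added.push_back(node);
    }
}
```

A linear std::find is reasonable here because the list only holds the few nodes that came up recently.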

Fixes #1178
Message-Id: <6bcfad4430fbc63b4a8c40ec86a2744bdfafb40f.1464161975.git.asias@scylladb.com>
2016-05-25 14:19:40 +03:00
Glauber Costa
46f60f52d9 database: do not use implicitly stated seal function when closing the CF
In commit 4981362f57, I introduced a regression that was thankfully
caught by our dtest infrastructure.

That patch is a preparation patch for the active reclaim patchset that is to
come, and it consolidated all the flushes using the memtable_list's seal_fn
function instead of calling the seal function explicitly.

The problem here is that the streaming memtables have the delayed mechanism,
about which the memtable_list is unaware. Calling memtable_list's
seal_active_memtable() for the streaming memtables calls the delayed version,
which does not guarantee a flush. If we're lucky, we will indeed flush after the
timer expires, but if we're not, we'll just stop the CF with unflushed data.

There are two options to fix this: the first is to teach the memtable_list about
the delayed/forced mechanism, and the second is to just call the correct
function explicitly during shutdown, and then when the time comes to add
continuations to the result of the seal, add them here as well.

Although the second option involves a bit more work and duplication, I think it
is better in the sense that the delayed / forced mechanism really is something
that belongs to streaming only. Since streaming is the only user, I don't think it
justifies complicating the memtable_list with this concept.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <b26017c825ccf585f39f58c4ab3787d78e551f5f.1464126884.git.glauber@scylladb.com>
2016-05-25 08:21:24 +03:00
Avi Kivity
2d4d6c9c92 Merge seastar upstream
* seastar aed893e...e5faea8 (5):
  > Catch exceptions thrown by idle cpu handler
  > core::gate: add a get_count() method
  > reactor: Introduce idle CPU handler
  > core: add missing header for g++-4.9
  > Add lksctp-tools-devel do required packages
2016-05-24 20:42:41 +03:00
Pekka Enberg
ceb29f9d32 Merge "Introduce upload dir for sstable migration" from Raphael
"This change is intended to make migration process safer and easier.
 All column families will now have a directory called upload.
 With this feature, users may choose to copy migrated sstables to upload
 directory of respective column families, and run 'nodetool refresh'.
 That's supposed to be the preferred option from now on."
2016-05-24 16:36:47 +03:00
Gleb Natapov
7f6b12c97a query: add user provided timestamp to read_command
If the read query supplies a timestamp, move it to read_command to be
used later; otherwise, get the local timestamp.
2016-05-24 15:19:35 +03:00
Pekka Enberg
d7d8c76fe5 transport/server: Add CQL frame LZ4 compression support
The default CQL frame compression algorithm in Cassandra is LZ4. Add
support for decompressing incoming frames and compressing outgoing
frames with LZ4 if the CQL driver asks for that.

Fixes #416

Message-Id: <1464086807-11325-1-git-send-email-penberg@scylladb.com>
2016-05-24 15:03:33 +03:00
Takuya ASADA
53cebb4a5e dist/ubuntu: don't rebuild dependency packages by default
As on CentOS, do not build dependencies by default; install binary packages from our repository instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1464023451-21436-1-git-send-email-syuu@scylladb.com>
2016-05-24 14:10:59 +03:00
Gleb Natapov
12cf60c302 messaging_service: add timestamp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Gleb Natapov
1e6f64f4ab query: add latest modification timestamp to result structure 2016-05-24 13:27:34 +03:00
Gleb Natapov
5fef0717cc query: find latest modification timestamp while calculating result digest 2016-05-24 13:27:34 +03:00
Avi Kivity
9637c2232c Merge "Move the JMX timer polling logic to Scylla" from Amnon 2016-05-24 13:07:52 +03:00
Raphael S. Carvalho
c2fa3b796d db: fix read consistency after refresh
If an sstable loaded by refresh covers a row that is cached by the
column family, a read query may fail to return consistent data.
What we should do is clear the cache for the column family being
loaded with new sstables.

Fixes #1212.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a08c9885a5ceb0b2991e40337acf5b7679580a66.1464072720.git.raphaelsc@scylladb.com>
2016-05-24 12:11:41 +03:00
Takuya ASADA
5d5d525a14 dist/ubuntu: fix incorrect dependency package name
PyYAML is the CentOS/RHEL/Fedora package name; python-yaml is the correct one for Ubuntu.

Fixes #1279

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1463987823-22837-1-git-send-email-syuu@scylladb.com>
2016-05-23 10:21:29 +03:00
Pekka Enberg
8a7197e390 dist/docker: Fetch RPM repository from Scylla web site
Fix the hard-coded Scylla RPM repository by downloading it from the Scylla
web site. This makes it easier to switch between different versions.

Message-Id: <1463981271-25231-1-git-send-email-penberg@scylladb.com>
2016-05-23 09:45:41 +03:00
Piotr Jastrzebski
2be4ec4e06 Add lksctp-tools-devel to required packages
in fedora build instructions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <15f3db34f12f01cb9da32fd14c16ba87e64ad5f4.1463947999.git.piotr@scylladb.com>
2016-05-23 08:26:02 +03:00
Avi Kivity
5e5317b228 dist: add build dependencies for sctp
Required by new seastar
2016-05-22 19:10:25 +03:00
Avi Kivity
5bb1255da1 Merge seastar upstream
* seastar 6a849ac...aed893e (3):
  > net: move 'transport' enum to seastar namespace
  > net: sctp protocol support for posix stack
  > future: Support get() when state is at a promise
2016-05-22 16:32:33 +03:00
Amnon Heiman
e26002d581 idl-compiler: default constructor of complex types
This patch solves a problem where a complex type is defined as version
dependent (with the version attribute) but doesn't have a default value.

In those cases the default constructor is used, but for complex
(template) types, param_type should be used to get the C++ type.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1463916723-15322-1-git-send-email-amnon@scylladb.com>
2016-05-22 15:32:29 +03:00
Raphael S. Carvalho
e5f0314afd db: introduce upload directory for sstable migration
This change is intended to make migration process safer and easier.
All column families will now have a directory called upload.
With this feature, users may choose to copy migrated sstables to upload
directory of respective column families, and call 'nodetool refresh'.
That's supposed to be the preferred option from now on.

For each sstable in upload directory, refresh will do the following:
1) Mutate sstable level to 0.
2) Create hard links to its components in the column family dir, using
a new generation. We make it safe by creating a hard link to a temporary
TOC first.
3) Remove all of its components from the upload directory.

This new code runs after refresh has checked for new sstables in the column
family directory. Otherwise, we could have a generation conflict.
Unlike the first step, this new step runs with sstable writes enabled.
It's easier here because we know exactly which sstables are new.

After that, refresh will load new sstables found in column family
and upload directories.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:26:21 -03:00
Raphael S. Carvalho
70b793e4d3 tests: add test for statistics rewrite
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:26:12 -03:00
Raphael S. Carvalho
74c8a87777 sstables: fix statistics rewrite
It didn't work because it tried to overwrite the existing statistics
file with the exclusive flag.
It's fixed by writing the new statistics into a temporary file and
renaming it into place.

If Scylla fails in the middle of a rewrite, a temporary file is left
over, so the boot code was adjusted to delete temporary files created
by this rewrite procedure.
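The write-to-temporary-then-rename pattern, plus the boot-time cleanup, can be sketched with std::filesystem. This is not the sstables code: `rewrite_file`, `remove_leftover_temporaries` and the ".tmp" suffix are hypothetical, and a production version would also fsync the file and its directory before and after the rename.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical suffix for the temporary file; the real naming may differ.
const std::string tmp_suffix = ".tmp";

// Replace `target` without ever truncating it in place.
bool rewrite_file(const fs::path& target, const std::string& contents) {
    fs::path tmp = target;
    tmp += tmp_suffix;                 // e.g. "...-Statistics.db.tmp"
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
        out.flush();
        if (!out) {
            return false;              // the original file is left untouched
        }
    }                                  // file closed here (no fsync in this sketch)
    fs::rename(tmp, target);           // POSIX rename() replaces target atomically
    return true;
}

// At boot: delete temporaries left behind by a rewrite interrupted by a crash.
void remove_leftover_temporaries(const fs::path& dir) {
    std::vector<fs::path> leftovers;   // collect first, then delete
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.path().extension().string() == tmp_suffix) {
            leftovers.push_back(entry.path());
        }
    }
    for (const auto& p : leftovers) {
        fs::remove(p);
    }
}
```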

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-20 17:24:15 -03:00
Pekka Enberg
94e7e61cd0 api: Register snitch API earlier
Currently, we register snitch API in set_server_gossip_settle() which
waits until a node has joined the cluster. This makes 'nodetool status'
not properly show the status of a joining node. Fix the issue by
registering snitch API earlier.

Fixes #1269.
Message-Id: <1463576381-15484-1-git-send-email-penberg@scylladb.com>
2016-05-20 14:24:14 +03:00
Gleb Natapov
7a54b5ebbb gossiper: cleanup mark_alive() even more
Message-Id: <20160519100513.GE984@scylladb.com>
2016-05-19 12:47:19 +02:00
Takuya ASADA
03a762bb0b dist/common/scripts: Ask to set SET_NIC=yes on scylla_setup interactive prompt
We now support SET_NIC in non-AMI environments, so ask the user whether to use it at the scylla_setup interactive prompt.
2016-05-19 06:26:23 +09:00
Takuya ASADA
88fde0a91e dist/ami: fix dependency unresolved error on AMI build script with local package, by adding scylla-conf package
Since we added the scylla-conf package, scylla-server/-tools cannot be installed without it, which makes --localrpm fail.
Fix this by copying the scylla-conf package to the AMI and installing it.
2016-05-19 06:26:23 +09:00
Takuya ASADA
898243929f dist/common/scripts: specify queue settings for posix_net_conf.sh on scylla_prepare
posix_net_conf.sh requires the -sq/-mq options, so detect the number of queues and specify the option in scylla_prepare.
2016-05-19 06:26:23 +09:00
Takuya ASADA
f84b7b094f dist/common/scripts: drop special condition to enable SET_NIC on AMI, do this on AMI installation script
Remove the AMI special case for SET_NIC; do this in scylla-ami-setup.service instead.
2016-05-19 06:25:41 +09:00
Takuya ASADA
49cdd0b786 dist: move '--cpuset' and '--smp' configuration to scylla_cpuset_setup / cpuset.conf
These parameters are only required for AMI, not for non-AMI environments that want to enable SET_NIC, so split them into an individual script / conf file and call it from the AMI install script.
2016-05-19 06:25:28 +09:00
Takuya ASADA
46fa80a5a6 dist/common/scripts: replace IFNAME variable when --nic specified to scylla_sysconfig_setup
scylla_sysconfig_setup had a bug where it did not replace the IFNAME variable; fix it.
2016-05-19 06:25:15 +09:00
Glauber Costa
4eff07d773 database: reorder initialization
In a preparation move for the LSA throttler, we have reordered the
initialization fields in database.hh so that the sizes of the regions are
computed before the initialization of the region.

However, that seemingly innocent move broke one of our tests. The reason
is that if we don't destroy the column families before destroying the
region, we may end up with a use-after-free in the memtable destructor,
which itself expects to call into the region.

This patch reorders the initialization so that the CF list still comes after the
dirty regions (therefore being destroyed first), while maintaining the relative
ordering between size / region that we needed in the first place.
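The fix relies on a core C++ rule: non-static data members are constructed in declaration order and destroyed in reverse order. The sketch below uses hypothetical, heavily simplified types (not the real database class) to show why declaring the column families after the size and region makes them go away first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Records construction/destruction events so the order can be inspected.
std::vector<std::string> events;

struct region {
    explicit region(std::size_t size) { events.push_back("region(" + std::to_string(size) + ")"); }
    ~region() { events.push_back("~region"); }
};

struct column_families {
    region& _region; // memtables would still call into this on destruction
    explicit column_families(region& r) : _region(r) { events.push_back("cfs"); }
    ~column_families() { events.push_back("~cfs"); }
};

struct database {
    std::size_t dirty_size;   // 1. size computed first
    region dirty_region;      // 2. region built from the size
    column_families cfs;      // 3. declared last, so destroyed first
    database() : dirty_size(1024), dirty_region(dirty_size), cfs(dirty_region) {}
};
```

Swapping the declarations of `dirty_region` and `cfs` would make `~region` run before `~cfs`, which is the use-after-free pattern described above.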

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0669984b5bccdb2c950f2444bdee4427abad56ba.1463508884.git.glauber@scylladb.com>
2016-05-18 11:02:40 +03:00
Asias He
eb9ac9ab91 gms: Optimize gossiper::is_alive
In perf-flame, I saw this breakdown:

service::storage_proxy::create_write_response_handler (2.66% cpu)

  gossiper::is_alive takes 0.72% cpu
  locator::token_metadata::pending_endpoints_for takes 1.2% cpu

After this patch:

service::storage_proxy::create_write_response_handler (2.17% cpu)

  gossiper::is_alive does not show up at all
  locator::token_metadata::pending_endpoints_for takes 1.3% cpu

There is no need to copy the endpoint_state from the endpoint_state_map
to check if a node is alive. Optimize it since gossiper::is_alive is
called in the fast path.
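A sketch of the copy-free lookup, assuming a hash map keyed by a simplified string address; the types are hypothetical stand-ins, and reporting an unknown endpoint as "not alive" is an assumption of this sketch. The point is that `find()` plus a read through the iterator never copies the (potentially large) endpoint_state.

```cpp
#include <map>
#include <string>
#include <unordered_map>

using inet_address = std::string; // simplified address type

struct endpoint_state {
    bool alive = false;
    std::map<int, std::string> application_states; // makes copies non-trivial
};

using endpoint_state_map = std::unordered_map<inet_address, endpoint_state>;

// Reads the flag through an iterator; no endpoint_state is copied.
bool is_alive(const endpoint_state_map& states, const inet_address& ep) {
    auto it = states.find(ep);
    // Assumption of this sketch: an unknown endpoint is reported as not alive.
    return it != states.end() && it->second.alive;
}
```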

Message-Id: <2144310aef8d170cab34a2c96cb67cabca761ca8.1463540290.git.asias@scylladb.com>
2016-05-18 10:12:38 +03:00
Avi Kivity
6ec0000df8 Merge "fix migration of tables with level > 0" from Raphael 2016-05-17 19:14:01 +03:00
Raphael S. Carvalho
cbc2e96a58 tests: check that overlapping sstable has its level changed to 0
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-17 11:11:05 -03:00
Raphael S. Carvalho
ee0f66eef6 db: fix migration of sstables with level greater than 0
Refresh will rewrite the statistics of any migrated sstable with level
> 0. However, this operation is currently not working because the O_EXCL
flag is used, meaning that create will fail.

It turns out that we don't actually need to change the on-disk level of
an sstable by overwriting its statistics file.
We can just set the in-memory level of the sstable to 0. If Scylla reboots
before all migrated sstables are compacted, leveled strategy is smart
enough to detect sstables that overlap, and set their in-memory level
to 0.
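One way to detect overlapping sstables within a level is to sort each level by first key and compare neighbours. The sketch below is a simplified illustration, not the leveled strategy's code: keys are strings instead of decorated keys, `demote_overlapping` is a hypothetical name, and demoting only the later sstable of each overlapping pair is an assumed policy.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

struct sstable_info {
    std::string first_key; // keys simplified to strings
    std::string last_key;
    int level;
};

// Set the in-memory level of overlapping sstables (level > 0) to 0.
void demote_overlapping(std::vector<sstable_info>& sstables) {
    std::vector<sstable_info*> sorted;
    for (auto& s : sstables) {
        if (s.level > 0) {
            sorted.push_back(&s);
        }
    }
    // Group by level, then order by first key inside each level.
    std::sort(sorted.begin(), sorted.end(), [](const sstable_info* a, const sstable_info* b) {
        return std::tie(a->level, a->first_key) < std::tie(b->level, b->first_key);
    });
    const sstable_info* previous = nullptr;
    for (sstable_info* cur : sorted) {
        if (previous && previous->level == cur->level &&
            cur->first_key <= previous->last_key) {
            cur->level = 0;  // overlaps its neighbour: demote in memory only
        } else {
            previous = cur;  // becomes the reference for the next comparison
        }
    }
}
```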

Fixes #1124.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-05-17 11:08:08 -03:00
Gleb Natapov
76e0eb426e gossiper: simplify mark_alive()
The code runs in a thread so there is no need to use heap to
communicate between statements.

Message-Id: <20160517120245.GK984@scylladb.com>
2016-05-17 15:37:21 +03:00
Avi Kivity
4413176051 Merge "reduce performance degradation when adding node" from Asias
"With this series, the operations per second drop during adding node period gets
much better.

Before:
45K to 10K

After:
45k to 38K

Refs: #1223
"
2016-05-17 14:31:31 +03:00
Asias He
089734474b token_metadata: Speed up pending_endpoints_for
pending_endpoints_for is called frequently by
storage_proxy::create_write_response_handler when doing cql query.

Before this patch, each call to pending_endpoints_for involved
converting a multimap (std::unordered_multimap<range<token>,
inet_address>) to a map (std::unordered_map<range<token>,
std::unordered_set<inet_address>>).

To speed up the token to pending endpoint mapping search, an interval map
is introduced. It is faster than searching the map linearly and can
avoid caching the token/pending endpoint mapping.

With this patch, the operations per second drop during adding node
period gets much better.

Before:
45K to 10K

After:
45k to 38K

(The numbers were measured with the streaming code modified to skip
sending data, to rule out the streaming factor.)
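The commit uses a boost interval_map; the sketch below swaps in a plain std::map of disjoint segments to illustrate the same idea under stated assumptions: tokens are simplified to int64_t, ranges are half-open on the left, (start, end], with no ring wrap-around, and `pending_endpoint_index` is a hypothetical name. Overlapping ranges are split at every boundary once, so each lookup is a single O(log n) search instead of a scan over all ranges.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using token = std::int64_t;
using inet_address = std::string;

struct pending_range { token start; token end; inet_address endpoint; }; // (start, end]

class pending_endpoint_index {
    // Key: segment start (exclusive); value: endpoints pending for (key, next key].
    std::map<token, std::set<inet_address>> _segments;
public:
    explicit pending_endpoint_index(const std::vector<pending_range>& ranges) {
        // Split at every boundary so each segment is covered by a fixed set of ranges.
        for (const auto& r : ranges) {
            _segments[r.start];
            _segments[r.end];
        }
        for (const auto& r : ranges) {
            for (auto it = _segments.find(r.start);
                 it != _segments.end() && it->first < r.end; ++it) {
                it->second.insert(r.endpoint);
            }
        }
    }

    std::set<inet_address> endpoints_for(token t) const {
        auto it = _segments.lower_bound(t); // first segment start >= t
        if (it == _segments.begin()) {
            return {};                      // t is not above any segment start
        }
        --it;                               // largest start < t, so t is in (start, next]
        return it->second;
    }
};
```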

Refs: #1223
2016-05-17 17:32:15 +08:00
Asias He
ee0585cee9 dht: Add default constructor for token
It is needed to put token into a boost interval_map in the following
patch.
2016-05-17 17:32:15 +08:00
Amnon Heiman
ad34f80e6f API: change cache_service, column_family and storage_proxy to rate object
The API now exposes rate_moving_average and
rate_moving_average_and_histogram.

The old endpoints remain for the transition period, but are marked as
deprecated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:56:52 +03:00
Amnon Heiman
b33ed48527 API Definition: change cache_service, column_family and storage_proxy to use rate objects
This patch replaces the latency histogram with
rate_moving_average_and_histogram and the counters with
rate_moving_average.

The old endpoints were left unchanged but marked as deprecated where
needed.
2016-05-17 11:55:06 +03:00
Amnon Heiman
20a48b0f20 API: column family stats break the map_reduce functionality
This patch replaces the helper function for column family with two
functions: one that collects the relevant column family from all shards
and another one that does the translation to a JSON object.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
750f30cf07 column_family: Change histogram to timed_rate_moving_average_and_histogram
As part of moving the derived statistics into scylla, this replaces the
histogram object in the column_family with
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
468bcfbf1f row_cache: Change counter to timed_rate_moving_average_and_histogram
As part of moving the derived statistics into scylla, this replaces the
counter in the row_cache stats with
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:53:15 +03:00
Amnon Heiman
64e0c8cd1b storage_proxy: Change histogram to timed_rate_moving_average_and_histogram
As part of moving the derived statistics into scylla, this replaces the
histogram object in the storage_proxy with
timed_rate_moving_average_and_histogram, and the read, write and range
counters were replaced by rate_moving_average.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:52:16 +03:00
Amnon Heiman
f6a5a4e3da API: Add helper function for the rate objects
This patch adds the helper functions that are used to sum the
rate_moving_average and rate_moving_average_and_histogram.

The current sum functionality for histogram was modified to support
rate and histogram but return a histogram. This way current endpoints
continue to behave the same.

It also cleans up the histogram-related methods by using the plus operator
of the histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:49:34 +03:00
Amnon Heiman
8ef25ceb05 Add weighted average rate related objects
This patch adds a few data structures for derived and accumulative
statistics that are similar to the yammer implementation used by the
JMX.

It also adds a plus operator to histogram, which cleans up the histogram
usage.

moving_average - An exponentially-weighted moving average that calculates
an event rate over a given interval.

rate_moving_average and timed_rate_moving_average - Calculate the 1m, 5m and
15m EWMA, an all-time average and a counter.

rate_moving_average_and_histogram and
timed_rate_moving_average_and_histogram - Combine a histogram with a
rate_moving_average. They also expose a histogram API, so it is an
easy task to replace a histogram with a
timed_rate_moving_average_and_histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:47:49 +03:00
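The EWMA-based meters described in the commit above follow the Yammer/Dropwizard "Meter" design. This is a sketch of that technique, not Scylla's code. The class names `ewma_rate` and `rate_meter` are hypothetical, and so is the assumption that a timer calls `tick()` every 5 seconds. Each tick converts the events counted since the last tick into an instantaneous rate. That rate is then blended into the running average using a smoothing factor derived from the window length.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Exponentially weighted moving average of an event rate (events/second).
// Assumes tick() is invoked once every `tick_seconds` by some timer.
class ewma_rate {
    double _alpha;            // smoothing factor derived from the window
    double _tick_seconds;     // length of one tick interval
    double _rate = 0.0;       // smoothed rate
    bool _initialized = false;
    uint64_t _uncounted = 0;  // events seen since the last tick
public:
    explicit ewma_rate(double window_minutes, double tick_seconds = 5.0)
        : _alpha(1.0 - std::exp(-tick_seconds / (60.0 * window_minutes)))
        , _tick_seconds(tick_seconds) {}

    void update(uint64_t n = 1) { _uncounted += n; }

    void tick() {
        double instant_rate = _uncounted / _tick_seconds;
        _uncounted = 0;
        if (_initialized) {
            _rate += _alpha * (instant_rate - _rate);
        } else {
            // The first tick seeds the average with the observed rate.
            _rate = instant_rate;
            _initialized = true;
        }
    }

    double rate() const { return _rate; }
};

// 1m/5m/15m rates plus an all-time mean rate and a total counter.
struct rate_meter {
    static constexpr double tick_seconds = 5.0;
    ewma_rate m1{1.0, tick_seconds};
    ewma_rate m5{5.0, tick_seconds};
    ewma_rate m15{15.0, tick_seconds};
    uint64_t count = 0;
    double elapsed_seconds = 0.0;

    void mark(uint64_t n = 1) {
        count += n;
        m1.update(n);
        m5.update(n);
        m15.update(n);
    }

    void tick() {
        elapsed_seconds += tick_seconds;
        m1.tick();
        m5.tick();
        m15.tick();
    }

    double mean_rate() const {
        return elapsed_seconds > 0 ? count / elapsed_seconds : 0.0;
    }
};
```

A longer window gives a smaller alpha. As a result, after a burst stops, the 15m rate decays more slowly than the 1m rate.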
Glauber Costa
17b9203719 database: invert order of elements
So that the sizes of the region can be initialized first

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <dc3df186a977b492d83c0a397f206c2db940aa37.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:39 +03:00
Glauber Costa
2ff6d38d0c database: use a single constructor for the column family
We've been keeping two constructors for the column family to allow for a
version without the commitlog. But it's by now quite complicated to maintain
the two, because changes always have to be made in two places.

This patch adds a private constructor that does the actual construction, and
has the public constructors call it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <dd3cb0b9c20ad154a6131bad6ece619f70ed5025.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:39 +03:00
Glauber Costa
8fede5b98e memtables: isolate logic for disk writes disabled
When we have disk writes disabled, we exit immediately from the flush
function. We can just encode that separately and pass a different function
in the memtable_list creation. That simplifies the memtable flush a bit.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <908e3b5eb2c6ee84b8ad7b31c3673be5531a087c.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:38 +03:00
Glauber Costa
4981362f57 memtables: always seal through memtable_list seal function
I would like to be able to apply a function at the end of every flush, that is
common for both memtables and streaming memtables. For instance, to unthrottle
current waiters. Right now some calls to seal_active_memtable are open coded,
calling the column family's function directly, for both the main memtable list
and the streaming list.

This patch moves all the current open-coded callers to call the respective
memtable_list function.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <0c780254f3c4eb03e2bcd856b83941cf49a84b85.1463448522.git.glauber@scylladb.com>
2016-05-17 11:28:37 +03:00
Takuya ASADA
4972a72380 dist: drop 'sudo -E' and SETENV for security reason, source envfile from scripts
As Nadav pointed out, SETENV and sudo -E might cause a security hole:
https://github.com/scylladb/scylla/issues/1028#issuecomment-196202171
So drop them now, and source the envfiles from the scylla_prepare / scylla_stop scripts
instead.

Also, on the "[PATCH] ubuntu: Fix the init script variable sourcing" thread,
we had a problem passing variables from the envfiles to scylla_prepare /
scylla_stop on Ubuntu, so it seems better to source them from these scripts.

Additionally, this fixes #1249

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462989906-30062-1-git-send-email-syuu@scylladb.com>
2016-05-17 10:31:03 +03:00
Pekka Enberg
9c450f673c cql3: Clean up prepared_metadata class
Return vectors by const reference in prepared_metadata class and add a
FIXME to result_message class.

Message-Id: <1463425756-20225-1-git-send-email-penberg@scylladb.com>
2016-05-17 10:02:14 +03:00
Pekka Enberg
217c1ffa95 cql3: Specify result set flag ABI explicitly
As Avi points out, the flag values are an ABI. So specify them explicitly.

Message-Id: <1463413379-8355-1-git-send-email-penberg@scylladb.com>
2016-05-16 19:00:52 +03:00
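To illustrate the point that flag values are an ABI, the sketch below pins each enumerator to its wire value instead of relying on implicit numbering. The values come from the CQL binary protocol's Rows metadata flags: Global_tables_spec 0x0001, Has_more_pages 0x0002, No_metadata 0x0004. The identifiers `result_set_flag` and `has_flag` are illustrative and are not necessarily Scylla's names.

```cpp
#include <cassert>
#include <cstdint>

// Rows metadata flags from the CQL binary protocol. These go on the wire,
// so every value is written out explicitly; reordering the enumerators
// can no longer silently change the protocol.
enum class result_set_flag : uint32_t {
    global_tables_spec = 0x0001,
    has_more_pages     = 0x0002,
    no_metadata        = 0x0004,
};

// Combine two flags into the 32-bit value that is serialized.
constexpr uint32_t operator|(result_set_flag a, result_set_flag b) {
    return static_cast<uint32_t>(a) | static_cast<uint32_t>(b);
}

// Test whether a serialized flags value has a given flag set.
constexpr bool has_flag(uint32_t flags, result_set_flag f) {
    return (flags & static_cast<uint32_t>(f)) != 0;
}
```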
Avi Kivity
a3b23d75b9 Merge "Fix Prepared message metadata serialization"
"The Prepared message has a metadata section that's similar to result set
metadata but not exactly the same. Fix serialization by introducing a
separate prepared_metadata class like Origin has and implement
serialization as per the CQL protocol specification. This fixes one CQL
binary protocol version 4 issue that we currently have.

The changes have been verified by running the gocql integration tests
using v4. Please note that this series does *not* enable v4 for clients
because Cassandra 2.1.x series only supports CQL binary protocol v3."
2016-05-16 18:59:54 +03:00
Pekka Enberg
868ff5107c cql3: Introduce prepared_metadata class
Introduce a new prepared_metadata class that holds prepared statement
metadata and implement CQL binary protocol serialization that works for
all versions.
2016-05-16 18:06:01 +03:00
Tomasz Grabiec
272e89846d Merge branch 'cache' from git@github.com:haaawk/scylla.git
From Piotr:

Fixes #656.

It makes it possible to slice using clustering ranges in mutation
readers.  We don't have row index yet so the slicing is just ignoring
data which is out of range.
2016-05-16 14:44:33 +02:00
Piotr Jastrzebski
dcba6f5c45 Pass clustering_row_ranges to mutation readers.
This will allow readers to reduce the amount of data read.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 14:36:57 +02:00
Pekka Enberg
a68671e247 cql3: Add column_specification::all_in_same_table() helper
We need it for the prepared_metadata class that we're about to introduce.
2016-05-16 14:13:31 +03:00
Takuya ASADA
80037aa95b dist/common/scripts: don't proceed to run scylla_raid_setup when no disks are selected in interactive RAID setup
When no disks are selected, run the disk selection prompt again.
Fixes #1260

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1463388933-3640-1-git-send-email-syuu@scylladb.com>
2016-05-16 13:45:17 +03:00
Pekka Enberg
adfb4d7bbd cql3: Move result_set class implementation to source file 2016-05-16 13:20:45 +03:00
Pekka Enberg
8552f222f5 cql3: Clean up result_set class
Kill some left-over ifdef'd code from the result_set class.

Message-Id: <1463392997-22921-1-git-send-email-penberg@scylladb.com>
2016-05-16 13:09:37 +03:00
Piotr Jastrzebski
23c23abe53 Make memtable mutation_reader slice using clustering ranges.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:41 +02:00
Piotr Jastrzebski
484d2ecd0a Slice data with clustering key range in sstable reader
Add additional parameters to mp_row_consumer to be able to fetch
only cells for given clustering key ranges

This will be used in row_cache once it works at the clustering key
level instead of the partition key level.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:30 +02:00
Piotr Jastrzebski
8307681975 Introduce clustering_ranges type.
It will be used to slice data returned by mutation_readers.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2016-05-16 11:46:09 +02:00
Amnon Heiman
7e07d97e4b API utils: Adding rate moving average
rate_moving_average and rate_moving_average_and_histogram are types that
are used by the JMX. They are based on the yammer meter and timer and
are used to collect derivative information.

Specifically: rate_moving_average calculates rates, and
rate_moving_average_and_histogram collects rates and a
histogram.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-16 11:40:19 +03:00
Pekka Enberg
17765b6c06 Merge seastar upstream
* seastar 3dec26f...6a849ac (4):
  > seastar::socket: Be resilient against ENOTCONN
  > Merge " improve performance and predictability of syscall thread communications" from Glauber
  > rpc_test: Shutdown properly
  > [PATCH} future: better detect get_future() on already used promise
2016-05-16 08:04:47 +03:00
Yoav Kleinberger
de7952a8db tools/scyllatop: log input from collectd for easier debugging
When running with DEBUG verbosity, scyllatop will now log every single
value it receives from collectd. When you suspect that scyllatop is
somehow distorting values, this is a good way to check it.

Signed-off-by: Yoav Kleinberger <yoav@scylladb.com>
Message-Id: <1463320730-6631-1-git-send-email-yoav@scylladb.com>
2016-05-15 19:17:10 +03:00
Tomasz Grabiec
1eabe9b840 storage_proxy: Add trace-level logging for mutating
Message-Id: <1462978554-31217-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 13:52:56 +03:00
Tomasz Grabiec
7207cc8b1a storage_proxy: Improve error reporting
Knowing the source node can help in debugging the issue.
Message-Id: <1462978535-31164-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 13:52:39 +03:00
Pekka Enberg
b5d9aa866d Merge "Fixes for schema synchronization" from Tomek
"Writes may start to be rejected by replicas after issuing alter table
 which doesn't affect columns. This affects all versions with alter table
 support.

 Fixes #1258"
2016-05-12 09:43:25 +03:00
Duarte Nunes
7dbeef3c39 storage_service: Fix ignored future in on_alive
This patch ensures the future created by invoke_on_all is not ignored
by waiting on it, which is safe to do since we are within a
seastar::async context.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1462989837-7326-1-git-send-email-duarte@scylladb.com>
2016-05-12 09:03:46 +03:00
Tomasz Grabiec
13d8cd0ae9 migration_manager: Invalidate prepared statements on every schema change
Currently we only do that when column set changes. When prepared
statements are executed, parameters like read repair chance are read
from schema version stored in the statement. Not invalidating prepared
statements on changes of such parameters will appear as if alter took
no effect.

Fixes #1255.
Message-Id: <1462985495-9767-1-git-send-email-tgrabiec@scylladb.com>
2016-05-12 08:58:40 +03:00
Tomasz Grabiec
90c31701e3 tests: Add unit tests for schema_registry 2016-05-11 17:31:22 +02:00
Tomasz Grabiec
443e5aef5a schema_registry: Fix possible hang in maybe_sync() if syncer doesn't defer
Spotted during code review.

If it doesn't defer, we may execute then_wrapped() body before we
change the state. Fix by moving then_wrapped() body after state changes.
2016-05-11 17:31:22 +02:00
Tomasz Grabiec
8703136a4f migration_manager: Fix schema syncing with older version
The problem was that "s" would not be marked as synced-with if it came from
shard != 0.

As a result, mutation using that schema would fail to apply with an exception:

  "attempted to mutate using not synced schema of ..."

The problem could surface when altering schema without changing
columns and restarting one of the nodes so that it forgets past
versions.

Fixes #1258.

Will be covered by dtest:

  SchemaManagementTest.test_prepared_statements_work_after_node_restart_after_altering_schema_without_changing_columns
2016-05-11 17:29:14 +02:00
Takuya ASADA
8503600e30 dist/common/systemd: drop hardcoded path
Stop using /var/lib/scylla; use $SCYLLA_HOME instead.
systemd does not seem to expand variables in Environment="HOME=$SCYLLA_HOME", but both CentOS and Ubuntu are able to run scylla-server without $HOME, so drop it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462977871-26632-1-git-send-email-syuu@scylladb.com>
2016-05-11 17:53:53 +03:00
Calle Wilund
152bd82a05 alter_keyspace_statement: Handle missing replication strategy
ALTER KEYSPACE should allow no replication strategy to be set,
in which case old strategy should be kept.
Initial translation from origin missed this.

Fixes #1256

Message-Id: <1462967584-2875-2-git-send-email-calle@scylladb.com>
2016-05-11 16:02:22 +03:00
Calle Wilund
5604fb8aa3 cql3::statements::cf_prop_defs: Fix compaction min/max not handled
Property parsing code was looking at wrong property level
for initial guard statement.

Fixes #1257

Message-Id: <1462967584-2875-1-git-send-email-calle@scylladb.com>
2016-05-11 16:02:16 +03:00
Takuya ASADA
c38b5fbb3d dist/common/scripts: On scylla_io_setup, run iotune on the correct data directory specified in scylla.yaml
Currently scylla_io_setup is hardcoded to run iotune on /var/lib/scylla, but the user may change the data directory by modifying scylla.yaml, and it may be on a different block device.
So use scylla_config_get.py to get the configuration from scylla.yaml and pass it to iotune.

Fixes #1167

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462955824-21983-2-git-send-email-syuu@scylladb.com>
2016-05-11 13:02:25 +03:00
Takuya ASADA
53820393da dist/common/scripts: add scylla.yaml parser for scripts
To parse scylla.yaml, scylla_config_get.py is added.
It can be used like 'scylla_config_get.py [key name]' from a shell script or the command line.
This is needed for scylla_io_setup, to get 'data_file_directories' from a shell script.
Currently it does not support specifying key names of nested data structures, but that is enough for scylla_io_setup.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462955824-21983-1-git-send-email-syuu@scylladb.com>
2016-05-11 13:02:23 +03:00
Pekka Enberg
d93d46e721 Merge "ALTER KEYSPACE" from Calle
"Implementation of ALTER KEYSPACE.
Fixes #429"
2016-05-10 22:07:06 +03:00
Takuya ASADA
a73924b4e0 dist/ubuntu/dep: introduce scylla-gdb-7.11 for Ubuntu 14.04LTS
Introduce scylla-gdb-7.11 for Ubuntu 14.04LTS, to get better gdb support for recent versions of g++.

Fixes #969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-3-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:32 +03:00
Takuya ASADA
9ff2efb28b dist/common/dep: add Ubuntu support for scylla-env
Since Ubuntu 14.04LTS needs the scylla-gdb package, which installs to /opt/scylladb, we need to port the scylla-env package to Ubuntu as well.
This change introduces the scylla-env package to Ubuntu 14.04LTS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-2-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:32 +03:00
Takuya ASADA
43cc77d1b8 dist/redhat/centos_dep: move scylla-env to dist/common to share with Ubuntu
Since Ubuntu 14.04LTS needs the scylla-gdb package, which installs to /opt/scylladb, we need to port the scylla-env package to Ubuntu as well.
To do that, first share the package directory in dist/common/dep.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462825880-20866-1-git-send-email-syuu@scylladb.com>
2016-05-10 17:53:31 +03:00
Calle Wilund
147aa81177 Cql.g: Handle ALTER KEYSPACE 2016-05-10 14:36:46 +00:00
Calle Wilund
5c36d2e09e alter_keyspace_statement: Implement
Note: Like create keyspace, we don't properly validate the
replication strategy yet.
2016-05-10 14:36:17 +00:00
Piotr Jastrzebski
240a185727 Stop scanning keyspace data directory when populating.
Iterate over column families and check/create directories for them
instead of scanning keyspace data directory and filtering directories
against column families that exist in system tables for this keyspace.

Fixes #1008

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <26da66eec67a1ab1318917a66161915cdef924ab.1462890592.git.piotr@scylladb.com>
2016-05-10 17:35:55 +03:00
Calle Wilund
63b6c6bb5a migration_manager: Implement announce_keyspace_update
More or less the same as create keyspace...
2016-05-10 14:34:51 +00:00
Calle Wilund
8cdf4e37fb schema_tables: Fix merge_keyspaces to handle alter keyspace
Must keep "altered" alive into the call chain.
2016-05-10 14:32:51 +00:00
Calle Wilund
6ef7885ae3 database: Implement update_keyspace
Reloads keyspace metadata and replaces it in the existing keyspace.
Note: since keyspace metadata, and consequently the replication
strategy, now become volatile, keyspace::metadata now returns a
shared pointer by value (i.e. keep-alive).
The replication strategy should receive the same treatment, but
since it is used extensively and never kept across a
continuation, I've just added a comment for now.
2016-05-10 14:31:30 +00:00
Raphael S. Carvalho
d80d194873 compaction_manager: stop compaction tasks in parallel
Purpose is to speed up shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <a8db3492f1ceeea2a886d3920e5effa841ea155f.1462838670.git.raphaelsc@scylladb.com>
2016-05-10 10:03:35 +03:00
Avi Kivity
28cc6f97af Merge 2016-05-09 14:25:25 +03:00
Calle Wilund
917bf850fa transport::server: Do not treat accept exception as fatal
1.) It most likely is not, i.e. either tcp or more likely, ssl
    negotiation failure. In any case, we can still try next
    connection.
2.) Not retrying will cause us to "leak" the accept, and then hang
    on shutdown.

Also, promote logging message on accept exception to "warn", since
dtest(s?) depend on seeing log output.

Message-Id: <1462283265-27051-4-git-send-email-calle@scylladb.com>
2016-05-09 14:13:07 +03:00
Calle Wilund
437ebe7128 cql_server: Use credentials_builder to init tls
Slightly cleaner, and shard-safe tls init.

Message-Id: <1462283265-27051-3-git-send-email-calle@scylladb.com>
2016-05-09 14:12:59 +03:00
Calle Wilund
58f7edb04f messaging_service: Change tls init to use credentials_builder
To simplify init of msg service, use credentials_builder
to encapsulate tls options so actual credentials can be
more easily created in each shard.

Message-Id: <1462283265-27051-2-git-send-email-calle@scylladb.com>
2016-05-09 14:12:53 +03:00
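The pattern these commits rely on works as follows. A copyable builder holds only the TLS options. Each shard builds its own credentials object from a copy of the builder, so no `shared_ptr` is shared across shards. The sketch below uses hypothetical stand-in types (`credentials_builder_sketch`, `set_key_pair`, `build_per_shard`) and does not use seastar's actual credentials_builder API. Its loop plays the role of handing a copy to each shard.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the non-shareable, per-shard credentials object.
struct credentials_sketch {
    std::string cert_file;
    std::string key_file;
};

// Hypothetical copyable "factory": holds plain option values only.
class credentials_builder_sketch {
    std::string _cert_file;
    std::string _key_file;
public:
    void set_key_pair(std::string cert, std::string key) {
        _cert_file = std::move(cert);
        _key_file = std::move(key);
    }

    // Each call produces a fresh object with its own reference count.
    std::shared_ptr<credentials_sketch> build() const {
        return std::make_shared<credentials_sketch>(credentials_sketch{_cert_file, _key_file});
    }
};

// Simulates start-up: every shard receives a copy of the builder and
// builds credentials locally, so no pointer is shared between shards.
std::vector<std::shared_ptr<credentials_sketch>>
build_per_shard(const credentials_builder_sketch& b, unsigned shards) {
    std::vector<std::shared_ptr<credentials_sketch>> out;
    for (unsigned s = 0; s < shards; ++s) {
        credentials_builder_sketch local = b;  // copy handed to shard `s`
        out.push_back(local.build());
    }
    return out;
}
```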
Avi Kivity
29e103a2ae Merge seastar upstream
* seastar 7782ad4...3dec26f (3):
  > tests/mkcert.gmk: Fix makefile bug in snakeoil cert generator
  > tls_test: Add case to do a little checking of credentials_builder
  > tls: Add credentials_builder - copyable credentials "factory"
2016-05-09 14:12:29 +03:00
Tomasz Grabiec
1ca5ceadff Merge tag '1235-v2' from https://github.com/avikivity/scylla
From Avi:

When we shut down, we may have to give up on some pending atomic
sstable deletions, because not all shards may have agreed to delete
all members of the set.

This is expected, so silence these frightening error messages.

Fixes #1235.
2016-05-09 12:22:41 +02:00
Duarte Nunes
dada385826 rpc: Secure connection attempts can be cancelled
This patch adds support for secure connection attempts to be
cancellable.

Fixes #862

Includes seastar upstream merge:

* seastar f1a3520...7782ad4 (1):
  > Merge "rpc: Allow client connections to be cancelled" from Duarte

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1462783335-10731-1-git-send-email-duarte@scylladb.com>
2016-05-09 11:44:53 +03:00
Takuya ASADA
f7d41ba07a dist: Extract scylla.yaml and create metapackage
This patch creates a scylla-conf package containing
scylla.yaml and a scylla package acting as a metapackage.

Fixes #421

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1462280987-26909-1-git-send-email-syuu@scylladb.com>
2016-05-09 11:23:28 +03:00
Avi Kivity
4b34152870 Merge seastar upstream
* seastar ab74536...f1a3520 (2):
  > rpc: clear outgoing queue of a socket after failed connection
  > Merge "unconnected socket (now seastar::socket)" from Duarte

Fixes #1236.
2016-05-09 10:16:15 +03:00
Raphael S. Carvalho
3ac22bc0d7 compaction_manager: simplify code that waits for cleanup termination
Now that a task is created on demand, it's possible to wait for
termination of cleanup without extra machinery.
However, shared_future<> is now used because we may have more
than one fiber waiting for completion of task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <209de365c7782742dc2876a66f9d0784998cae53.1462599296.git.raphaelsc@scylladb.com>
2016-05-08 11:26:36 +03:00
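The reason a shared future is needed here is that more than one fiber may wait for the same cleanup task. Each waiter therefore needs its own handle to a single completion. The sketch below shows that idea with `std::shared_future` as a stand-in for seastar's `shared_future<>`, so that it runs without seastar. The `cleanup_task` and `wait_for_cleanup` names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical cleanup task: it owns a promise that is fulfilled when the
// cleanup terminates, plus a shareable future derived from it.
struct cleanup_task {
    std::promise<void> done;
    std::shared_future<void> finished = done.get_future().share();
};

// Any number of callers may wait; each receives its own copy of the
// shared future, all of which become ready at the same time.
std::shared_future<void> wait_for_cleanup(const cleanup_task& t) {
    return t.finished;
}
```

A plain (non-shared) future could only be retrieved and waited on once, which is why it is not enough once several waiters exist.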
Avi Kivity
ee7225a9cb sstables: silence atomic deletion cancellation logs during sstable deletion
Those logs are expected during shutdown.
2016-05-07 20:37:49 +03:00
Avi Kivity
80302d98dd database: silence atomic deletion cancellation logs during compaction
Those logs are expected during shutdown.
2016-05-07 20:37:48 +03:00
Avi Kivity
43221fc7e2 sstables: make delete_atomically() throw a distinct exception when cancelled
Throwing a runtime_error makes it impossible to catch the cancellation
exception, so replace it with a distinct exception class.
2016-05-07 20:37:46 +03:00
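A distinct exception type is what allows callers to treat cancellation differently from real failures, as the two "silence ... logs" commits do. The sketch below assumes a hypothetical class name, `atomic_deletion_cancelled`, because the commit only says "a distinct exception class". It also replaces the real asynchronous `delete_atomically()` with a synchronous stub.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical distinct exception for a cancelled atomic deletion.
class atomic_deletion_cancelled : public std::exception {
    std::string _msg;
public:
    explicit atomic_deletion_cancelled(std::string msg) : _msg(std::move(msg)) {}
    const char* what() const noexcept override { return _msg.c_str(); }
};

// Synchronous stub standing in for the real deletion routine.
void delete_atomically_stub(bool shutting_down) {
    if (shutting_down) {
        throw atomic_deletion_cancelled("atomic deletion cancelled at shutdown");
    }
    // ... the actual deletion would happen here ...
}

// Caller: cancellation is expected during shutdown and handled quietly;
// any other exception type still propagates to the caller.
bool try_delete(bool shutting_down) {
    try {
        delete_atomically_stub(shutting_down);
        return true;
    } catch (const atomic_deletion_cancelled&) {
        return false;  // expected, so no error-level log
    }
}
```

With a generic `std::runtime_error`, the `catch` clause above could not distinguish cancellation from other errors without inspecting the message text.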
Calle Wilund
709dd82d59 storage_service: Add logging to match origin
Points out whether the CQL server is listening in SSL mode.
Message-Id: <1462368016-32394-2-git-send-email-calle@scylladb.com>
2016-05-06 13:27:55 +03:00
Raphael S. Carvalho
bf18025937 main: stop compaction manager earlier
Avi says:
"During shutdown, we prevent new compactions, but perhaps too late.
Memtables are flushed and these can trigger compaction."

To solve that, let's stop compaction manager at a very early step
of shutdown. We will still try to stop compaction manager in
database::stop() because user may ask for a shutdown before scylla
was fully started. It's fine to stop compaction manager twice.
Only the first call will actually stop the manager.

Fixes #1238.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c64ab11f3c91129c424259d317e48abc5bde6ff3.1462496694.git.raphaelsc@scylladb.com>
2016-05-06 07:41:29 +03:00
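The statement that "only the first call will actually stop the manager" amounts to an idempotent stop. Below is a minimal sketch under stated assumptions: `compaction_manager_sketch` is hypothetical and synchronous, whereas the real `stop()` returns a seastar future. It shows how a guard flag makes the second call from `database::stop()` harmless.

```cpp
#include <cassert>

// Hypothetical, synchronous stand-in for the compaction manager.
class compaction_manager_sketch {
    bool _stopped = false;
    int _shutdown_runs = 0;  // how many times real shutdown work ran
public:
    // Safe to call more than once: early in shutdown (from main) and
    // again from database::stop(). Only the first call does any work.
    void stop() {
        if (_stopped) {
            return;
        }
        _stopped = true;
        ++_shutdown_runs;  // stand-in for stopping all compaction tasks
    }

    bool stopped() const { return _stopped; }
    int shutdown_runs() const { return _shutdown_runs; }
};
```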
Calle Wilund
d8ea85cd90 messaging_service: Add logging to match origin
To announce rpc port + ssl if on.

Message-Id: <1462368016-32394-1-git-send-email-calle@scylladb.com>
2016-05-05 10:26:01 +03:00
Raphael S. Carvalho
b8277979ef compaction_manager: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <82c6b93b24cbcc97f5eff3f91b05d4c1b415ecee.1462412927.git.raphaelsc@scylladb.com>
2016-05-05 10:06:56 +03:00
Avi Kivity
3aefa4f1d2 Merge seastar upstream
* seastar e536555...ab74536 (4):
  > reactor: kill max_inline_continuations
  > smp: optimize smp_message_queue::flush_request_batch() for empty queue
  > thread: do not yield if idle
  > Merge "Fixes for iotune" from Glauber
2016-05-05 09:48:58 +03:00
Gleb Natapov
f1cd52ff3f tests: test for result row counting
Message-Id: <1462377579-2419-2-git-send-email-gleb@scylladb.com>
2016-05-04 18:18:17 +02:00
Gleb Natapov
b75475de80 query: fix result row counting for results with multiple partitions
Message-Id: <1462377579-2419-1-git-send-email-gleb@scylladb.com>
2016-05-04 18:18:15 +02:00
Gleb Natapov
2a00c06dd5 query: fix non full clustering key deserialization
A clustering key prefix may have fewer columns than described in the schema.
Deserialization should stop when the end of the buffer is reached.

Message-Id: <20160503140420.GP23113@scylladb.com>
2016-05-04 17:42:28 +02:00
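The fix can be sketched as a decoding loop that stops at whichever comes first: the schema's column count or the end of the buffer. The encoding used here is a hypothetical choice for illustration: each component is a 16-bit big-endian length followed by that many bytes. `deserialize_prefix` is likewise not Scylla's function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes up to `schema_columns` length-prefixed components. A prefix may
// hold fewer components than the schema declares, so running out of
// buffer ends decoding normally instead of being treated as an error.
std::vector<std::string> deserialize_prefix(const std::string& buf, size_t schema_columns) {
    std::vector<std::string> components;
    size_t pos = 0;
    while (components.size() < schema_columns && pos < buf.size()) {
        if (buf.size() - pos < 2) {
            throw std::runtime_error("truncated length field");
        }
        size_t len = (size_t(uint8_t(buf[pos])) << 8) | uint8_t(buf[pos + 1]);
        pos += 2;
        if (buf.size() - pos < len) {
            throw std::runtime_error("truncated component");
        }
        components.push_back(buf.substr(pos, len));
        pos += len;
    }
    return components;
}
```

Only a component that is cut off midway is reported as corruption. A buffer that simply ends between components is a valid, shorter prefix.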
Raphael S. Carvalho
5aeeb0b3e8 compaction: add support to parallel compaction on the same column family
It was noticed that small sstables will accumulate for a column family because
scylla was limited to two compactions per shard, and a column family could have
at most one compaction running at a given shard. With the number of sstables
increasing rapidly, read performance is degraded.

At the moment, our compaction manager works by running two compaction task
handlers that run in parallel to the rest of the system. Each task handler
gets to run when needed, gets a column family from compaction manager queue,
runs compaction on it, and goes to sleep again. That's basically its cycle.
Compaction manager only allows one instance of a column family to be on its
queue, meaning that it's impossible for a column family to be compacted in
parallel. One compaction starts after another for a given column family.

To solve the problem described, we want to concurrently run compaction jobs
of a column family that have different "size tier" (or "weight").
For those unfamiliar, a compaction job contains a list of sstables that will be
compacted together.
The "size tier" of a compaction job is the log of the total size of the input
sstables. So a compaction job only gets to run if its "size tier" is not the
same as that of an ongoing compaction. There is no point in compacting concurrently at
the same "size tier", because that slows down both compactions.

We will no longer queue column families in compaction manager. Instead, we
create a new fiber to run compaction on demand.
This fiber that runs asynchronously will do the following:
1) Get a compaction job from compaction strategy.
2) Calculate "size tier" of compaction job.
3) Run the compaction job if its "size tier" is not the same as that of an ongoing
compaction for the given column family.
As before, it may decide to re-compact a column family based on a stat stored
in column family object.

Ran all compaction-related dtests.

Fixes #1216.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d30952ff136192a522bde4351926130addec8852.1462311908.git.raphaelsc@scylladb.com>
2016-05-04 11:46:09 +03:00
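The "size tier" rule above can be sketched in two parts. The first computes a job's tier as the floor of the log of its total input size. The second keeps a per-column-family set of tiers that are currently compacting and refuses a job whose tier is already present. The commit does not state a log base, so base 2 here is an assumption. `size_tier` and `tier_gate` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <set>
#include <vector>

// Size tier of a compaction job: floor(log2(total input size)).
// Base 2 is an illustrative assumption; the commit only says "the log".
int size_tier(const std::vector<uint64_t>& input_sizes) {
    uint64_t total = std::accumulate(input_sizes.begin(), input_sizes.end(), uint64_t(0));
    if (total == 0) {
        return 0;
    }
    return static_cast<int>(std::log2(double(total)));
}

// Per-column-family record of the tiers currently being compacted.
class tier_gate {
    std::set<int> _running;
public:
    // Registers the job and returns true only if no ongoing compaction of
    // this column family has the same size tier.
    bool try_start(int tier) { return _running.insert(tier).second; }

    // Called when a compaction job finishes.
    void finish(int tier) { _running.erase(tier); }
};
```

Two jobs whose inputs total roughly the same size would compete for the same work at the same tier, so the second is refused. A job at a very different tier, for example a large compaction, may run alongside it.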
Calle Wilund
6d2caedafd auth: Make auth.* schemas use deterministic UUIDs
In initial implementation I figured this was not required, but
we get issues communicating across nodes if system tables
don't have the same UUID, since creation is forcefully local, yet
shared.

Just do a manual re-create of the schema with a name-based UUID, and
use migration manager directly.
Message-Id: <1462194588-11964-1-git-send-email-calle@scylladb.com>
2016-05-03 10:48:24 +03:00
Avi Kivity
24f90b087f Merge "fix range queries with limiter to not generate more requests than needed" from Gleb
Fixes #1204.
2016-05-02 15:14:45 +03:00
Gleb Natapov
3039e4c7de storage_proxy: stop range query with limit after the limit is reached 2016-05-02 15:10:15 +03:00
Gleb Natapov
db322d8f74 query: put live row count into query::result
The patch calculates row count during result building and while merging.
If one of results that are being merged does not have row count the
merged result will not have one either.
2016-05-02 15:10:15 +03:00
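The merge rule stated above is that if any merged result lacks a row count, so does the merged result. That rule maps naturally onto an optional count. In the sketch below, `partial_result` and `merged_row_count` are hypothetical stand-ins for query::result and its merging code. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical partial result carrying an optional live-row count.
struct partial_result {
    std::optional<uint32_t> row_count;
};

// The merged count is known only if every input knows its own count;
// a single unknown count makes the merged count unknown as well.
std::optional<uint32_t> merged_row_count(const std::vector<partial_result>& results) {
    uint32_t total = 0;
    for (const auto& r : results) {
        if (!r.row_count) {
            return std::nullopt;
        }
        total += *r.row_count;
    }
    return total;
}
```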
Gleb Natapov
41c586313a storage_proxy: fix calculation of concurrency queried ranges 2016-05-02 15:10:15 +03:00
Gleb Natapov
c364ab9121 storage_proxy: add logging for range query row count estimation 2016-05-02 15:10:15 +03:00
Calle Wilund
751ba2f0bf messaging_service: Change init to use per-shard tls credentials
Fixes: #1220

While the server_credentials object is technically immutable
(esp with last change in seastar), the ::shared_ptr holding them
is not safe to share across shards.

Pre-create cpu x credentials and then move-hand them out in service
start-up instead.

Fixes assertion error in debug builds. And just maybe real memory
corruption in release.

Requires seastar tls change:
"Change server_credentials to copy dh_params input"

Message-Id: <1462187704-2056-1-git-send-email-calle@scylladb.com>
2016-05-02 15:04:40 +03:00
Raphael S. Carvalho
ae95ce1bd7 sstables: optimize leveled compaction strategy
Leveled compaction strategy is doing a lot of work whenever it's asked to get
a list of sstables to be compacted. It's checking if a sstable overlaps with
another sstable in the same level twice. First, when adding a sstable to a
list with sstables at the same level. Second, after adding all sstables to
their respective lists.

It's enough to check that a sstable creates an overlap in its level only once.
So I am changing the code to unconditionally insert a sstable to its respective
list, and after that, it will call repair_overlapping_sstables() that will send
any sstable that creates an overlap in its level to L0 list.

By the way, the optimization isn't in the compaction itself, instead in the
strategy code that gets a set of sstables to be compacted.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <8c8526737277cb47987a3a5dbd5ff3bb81a6d038.1461965074.git.raphaelsc@scylladb.com>
2016-05-02 11:18:39 +03:00
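The single-pass approach described above works in two steps. First, every sstable is inserted into its level's list unconditionally. Then one pass per level demotes any sstable that overlaps its neighbour to L0. The sketch below uses a hypothetical `sst` record with an inclusive `[first, last]` key range. In each overlapping pair it demotes the later-starting sstable, which is an illustrative policy; the real `repair_overlapping_sstables()` may choose differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sstable summary: level plus inclusive [first, last] key range.
struct sst {
    int level;
    long first;
    long last;
    int id;
};

// levels[0] is L0 (overlap allowed); levels 1..N must be non-overlapping.
// Each level is sorted by first key once, and any sstable overlapping the
// last kept sstable of its level is moved to L0.
std::vector<std::vector<sst>> repair_overlapping(std::vector<std::vector<sst>> levels) {
    for (size_t l = 1; l < levels.size(); ++l) {
        auto& lvl = levels[l];
        std::sort(lvl.begin(), lvl.end(),
                  [](const sst& a, const sst& b) { return a.first < b.first; });
        std::vector<sst> kept;
        for (auto& s : lvl) {
            if (!kept.empty() && s.first <= kept.back().last) {
                s.level = 0;            // overlap: send it back to L0
                levels[0].push_back(s);
            } else {
                kept.push_back(s);
            }
        }
        lvl = std::move(kept);
    }
    return levels;
}
```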
Avi Kivity
dc69999fd8 Merge seastar upstream
* seastar dab58e4...e536555 (5):
  > rpc: introduce outgoing packet queue
  > Add condition variable implementation.
  > future-utils: support futures with multiple values in map_reduce
  > tests: rpc: stop client and server
  > tls_test: Add test for large-ish buffer send/recieve
2016-05-02 11:10:33 +03:00
Takuya ASADA
122330a5eb dist/common/scripts: add interactive prompt for package installation check, also check scylla-tools installed
Currently scylla_setup is unusable when the user does not want to install scylla-jmx, because it checks for the package unconditionally, and some users (or developers) do not want to install it. So let's ask on the interactive prompt whether to skip the check.

Also, the scylla-tools package should be installed in most cases, so add a check for that package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1460662354-10221-1-git-send-email-syuu@scylladb.com>
2016-05-01 14:50:50 +03:00
Takuya ASADA
cc74b6ff5f dist/ubuntu: move lines from rules to .install/.dirs/.docs
To simplify the build script, and to make it easier to split into two packages,
use .install/.dirs/.docs instead of rules.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461960695-30647-1-git-send-email-syuu@scylladb.com>
2016-05-01 10:16:35 +03:00
Avi Kivity
434db0bc8b Update scylla-ami submodule
* dist/ami/files/scylla-ami 7019088...72ae258 (1):
  > Add --repo option to scylla_install_ami to construct AMI with custom repository URL
2016-04-28 16:41:30 +03:00
Takuya ASADA
6723978891 dist/ami: Add --repo option to build_ami.sh to construct AMI with custom repository URL
To build AMI from specified build of .rpm, custom repo URL option is required.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1461849370-11963-1-git-send-email-syuu@scylladb.com>
2016-04-28 16:40:49 +03:00
172 changed files with 8597 additions and 1827 deletions

.gitmodules
@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
-url = ../seastar
+url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui

@@ -15,7 +15,7 @@ git submodule update --recursive
* Installing required packages:
```
-sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing
+sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel
```
* Build Scylla

@@ -1,6 +1,6 @@
#!/bin/sh
-VERSION=666.development
+VERSION=1.2.6
if test -f version
then

@@ -487,6 +487,36 @@
}
]
},
{
"path": "/cache_service/metrics/row/hits_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get row hits moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_row_hits_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/row/requests_moving_avrage",
"operations": [
{
"method": "GET",
"summary": "Get row requests moving avrage",
"type": "#/utils/rate_moving_average",
"nickname": "get_row_requests_moving_avrage",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/cache_service/metrics/row/size",
"operations": [

@@ -1094,7 +1094,7 @@
"method":"GET",
"summary":"Get read latency histogram",
"$ref": "#/utils/histogram",
-"nickname":"get_read_latency_histogram",
+"nickname":"get_read_latency_histogram_depricated",
"produces":[
"application/json"
],
@@ -1121,6 +1121,49 @@
"items":{
"$ref": "#/utils/histogram"
},
"nickname":"get_all_read_latency_histogram_depricated",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/column_family/metrics/read_latency/moving_average_histogram/{name}",
"operations":[
{
"method":"GET",
"summary":"Get read latency moving avrage histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname":"get_read_latency_histogram",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keysspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/read_latency/moving_average_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get read latency moving avrage histogram from all column family",
"type":"array",
"items":{
"$ref": "#/utils/rate_moving_average_and_histogram"
},
"nickname":"get_all_read_latency_histogram",
"produces":[
"application/json"
@@ -1260,7 +1303,7 @@
"method":"GET",
"summary":"Get write latency histogram",
"$ref": "#/utils/histogram",
-"nickname":"get_write_latency_histogram",
+"nickname":"get_write_latency_histogram_depricated",
"produces":[
"application/json"
],
@@ -1287,6 +1330,49 @@
"items":{
"$ref": "#/utils/histogram"
},
"nickname":"get_all_write_latency_histogram_depricated",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/column_family/metrics/write_latency/moving_average_histogram/{name}",
"operations":[
{
"method":"GET",
"summary":"Get write latency moving average histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname":"get_write_latency_histogram",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}
]
},
{
"path":"/column_family/metrics/write_latency/moving_average_histogram/",
"operations":[
{
"method":"GET",
"summary":"Get write latency moving average histogram of all column families",
"type":"array",
"items":{
"$ref": "#/utils/rate_moving_average_and_histogram"
},
"nickname":"get_all_write_latency_histogram",
"produces":[
"application/json"

@@ -716,6 +716,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/read/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get read timeout rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_read_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get read unavailable rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_read_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/histogram",
"operations": [
@@ -723,7 +753,7 @@
"method": "GET",
"summary": "Get read metrics",
"$ref": "#/utils/histogram",
"nickname": "get_read_metrics_latency_histogram",
"nickname": "get_read_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
@@ -738,6 +768,36 @@
"method": "GET",
"summary": "Get range metrics",
"$ref": "#/utils/histogram",
"nickname": "get_range_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/read/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get read metrics",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_read_metrics_latency_histogram",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get range metrics rate and histogram",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_range_metrics_latency_histogram",
"produces": [
"application/json"
@@ -776,6 +836,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/range/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get range timeout rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_range_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/range/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get range unavailable rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_range_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/timeouts",
"operations": [
@@ -806,6 +896,36 @@
}
]
},
{
"path": "/storage_proxy/metrics/write/timeouts_rates",
"operations": [
{
"method": "GET",
"summary": "Get write timeout rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_write_metrics_timeouts_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/unavailables_rates",
"operations": [
{
"method": "GET",
"summary": "Get write unavailable rates",
"type": "#/utils/rate_moving_average",
"nickname": "get_write_metrics_unavailables_rates",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/histogram",
"operations": [
@@ -813,6 +933,21 @@
"method": "GET",
"summary": "Get write metrics",
"$ref": "#/utils/histogram",
"nickname": "get_write_metrics_latency_histogram_depricated",
"produces": [
"application/json"
],
"parameters": []
}
]
},
{
"path": "/storage_proxy/metrics/write/moving_avrage_histogram",
"operations": [
{
"method": "GET",
"summary": "Get write metrics",
"$ref": "#/utils/rate_moving_average_and_histogram",
"nickname": "get_write_metrics_latency_histogram",
"produces": [
"application/json"

@@ -65,6 +65,41 @@
"description":"The series of values to which the counts in `buckets` correspond"
}
}
}
}
},
"rate_moving_average": {
"id":"rate_moving_average",
"description":"A meter metric which measures mean throughput and one, five, and fifteen-minute exponentially-weighted moving average throughputs",
"properties":{
"rates": {
"type":"array",
"items":{
"type":"double"
},
"description":"One-, five- and fifteen-minute rates"
},
"mean_rate": {
"type":"double",
"description":"The mean rate from startup"
},
"count": {
"type":"long",
"description":"Total number of events from startup"
}
}
},
"rate_moving_average_and_histogram": {
"id":"rate_moving_average_and_histogram",
"description":"A timer metric which aggregates timing durations and provides duration statistics, plus throughput statistics",
"properties":{
"meter": {
"type":"rate_moving_average",
"description":"The metric rate moving average"
},
"hist": {
"type":"histogram",
"description":"The metric histogram"
}
}
}
}
}
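The `rate_moving_average` model's description closely follows the Dropwizard (Codahale) Metrics meter. The following is a minimal C++ sketch of how such a meter can be maintained, assuming a Dropwizard-style 5-second tick and a smoothing factor `alpha = 1 - exp(-interval/window)`. The `ewma` and `meter` types are hypothetical and are not Scylla's `utils::rate_moving_average` implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical single-window exponentially-weighted moving average.
// mark() counts events; tick() is assumed to run every `interval_s`
// seconds and folds the pending count into the smoothed rate.
class ewma {
public:
    ewma(double window_s, double interval_s)
        : _alpha(1.0 - std::exp(-interval_s / window_s)), _interval_s(interval_s) {}

    void mark(uint64_t n = 1) { _pending += n; }

    void tick() {
        double instant_rate = static_cast<double>(_pending) / _interval_s;
        _pending = 0;
        if (_initialized) {
            _rate += _alpha * (instant_rate - _rate);
        } else {
            _rate = instant_rate;  // the first tick seeds the average
            _initialized = true;
        }
    }

    double rate() const { return _rate; }  // events per second

private:
    double _alpha;
    double _interval_s;
    uint64_t _pending = 0;
    double _rate = 0.0;
    bool _initialized = false;
};

// Hypothetical meter with one-, five- and fifteen-minute EWMAs plus a total
// count, mirroring the "rates", "mean_rate" and "count" fields of the model.
struct meter {
    ewma m1{60.0, 5.0};
    ewma m5{300.0, 5.0};
    ewma m15{900.0, 5.0};
    uint64_t count = 0;

    void mark(uint64_t n = 1) {
        count += n;
        m1.mark(n);
        m5.mark(n);
        m15.mark(n);
    }
    void tick() { m1.tick(); m5.tick(); m15.tick(); }

    // Mean rate since startup, given the elapsed time in seconds.
    double mean_rate(double elapsed_s) const {
        return elapsed_s > 0 ? count / elapsed_s : 0.0;
    }
};
```

After a quiet tick every window decays toward zero, but the fifteen-minute rate decays most slowly, which is why the three rates diverge after a burst.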

@@ -83,6 +83,10 @@ future<> set_server_storage_service(http_context& ctx) {
return register_api(ctx, "storage_service", "The storage service API", set_storage_service);
}
future<> set_server_snitch(http_context& ctx) {
return register_api(ctx, "endpoint_snitch_info", "The endpoint snitch info API", set_endpoint_snitch);
}
future<> set_server_gossip(http_context& ctx) {
return register_api(ctx, "gossiper",
"The gossiper API", set_gossiper);
@@ -118,10 +122,6 @@ future<> set_server_gossip_settle(http_context& ctx) {
rb->register_function(r, "cache_service",
"The cache service API");
set_cache_service(ctx,r);
rb->register_function(r, "endpoint_snitch_info",
"The endpoint snitch info API");
set_endpoint_snitch(ctx, r);
});
}

@@ -110,44 +110,7 @@ future<json::json_return_type> sum_stats(distributed<T>& d, V F::*f) {
});
}
inline double pow2(double a) {
return a * a;
}
// FIXME: Move to utils::ihistogram::operator+=()
inline utils::ihistogram add_histogram(utils::ihistogram res,
const utils::ihistogram& val) {
if (res.count == 0) {
return val;
}
if (val.count == 0) {
return std::move(res);
}
if (res.min > val.min) {
res.min = val.min;
}
if (res.max < val.max) {
res.max = val.max;
}
double ncount = res.count + val.count;
// To get an estimated sum we take the estimated mean
// and multiply it by the true count
res.sum = res.sum + val.mean * val.count;
double a = res.count/ncount;
double b = val.count/ncount;
double mean = a * res.mean + b * val.mean;
res.variance = (res.variance + pow2(res.mean - mean) )* a +
(val.variance + pow2(val.mean -mean))* b;
res.mean = mean;
res.count = res.count + val.count;
for (auto i : val.sample) {
res.sample.push_back(i);
}
return res;
}
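`add_histogram()` pools two histogram summaries. The merged mean is the count-weighted mean, and the merged variance adds each part's variance to the squared offset of that part's mean from the merged mean. The diff replaces this function with `std::plus<utils::ihistogram>()` and `operator+=`, which presumably implement the same rule. The following standalone sketch shows the pooling step, using a hypothetical `summary` struct that omits the sample vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical summary mirroring the scalar fields add_histogram() combines.
struct summary {
    int64_t count = 0;
    int64_t min = 0;
    int64_t max = 0;
    double mean = 0.0;
    double variance = 0.0;  // population variance
};

// Pooled merge, same formula as add_histogram():
// var = a*(var1 + (m1 - m)^2) + b*(var2 + (m2 - m)^2), a = n1/n, b = n2/n
summary merge(const summary& x, const summary& y) {
    if (x.count == 0) return y;
    if (y.count == 0) return x;
    summary r;
    r.count = x.count + y.count;
    r.min = std::min(x.min, y.min);
    r.max = std::max(x.max, y.max);
    double n = static_cast<double>(r.count);
    double a = x.count / n;
    double b = y.count / n;
    r.mean = a * x.mean + b * y.mean;
    r.variance = a * (x.variance + (x.mean - r.mean) * (x.mean - r.mean))
               + b * (y.variance + (y.mean - r.mean) * (y.mean - r.mean));
    return r;
}
```

For the samples {1, 2, 3} and {5, 7}, merging their summaries gives mean 3.6 and variance 4.64, the same as computing both directly over {1, 2, 3, 5, 7}.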
inline
httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
@@ -156,15 +119,39 @@ httpd::utils_json::histogram to_json(const utils::ihistogram& val) {
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::ihistogram F::*f) {
inline
httpd::utils_json::rate_moving_average meter_to_json(const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
return m;
}
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, utils::ihistogram(),
add_histogram).then([](const utils::ihistogram& val) {
inline
httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::rate_moving_average_and_histogram& val) {
httpd::utils_json::rate_moving_average_and_histogram h;
h.hist = val.hist;
h.meter = meter_to_json(val.rate);
return h;
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).hist;}, utils::ihistogram(),
std::plus<utils::ihistogram>()).then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
inline int64_t min_int64(int64_t a, int64_t b) {
return std::min(a,b);
}
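`sum_timer_stats()` and the new cache-service handlers reduce per-shard meters with `std::plus<utils::rate_moving_average>()`, so rates from different shards are added. The following sequential sketch shows that reduction. It assumes element-wise addition of the one-, five- and fifteen-minute rates, the mean rate, and the count. `rate_snapshot` and `sum_shards` are hypothetical stand-ins for the real type and for `map_reduce0`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a utils::rate_moving_average snapshot.
struct rate_snapshot {
    std::array<double, 3> rates{};  // 1-, 5- and 15-minute rates (events/s)
    double mean_rate = 0.0;
    uint64_t count = 0;
};

// Per-second throughputs of independent shards add up element-wise.
rate_snapshot operator+(rate_snapshot a, const rate_snapshot& b) {
    for (std::size_t i = 0; i < a.rates.size(); ++i) {
        a.rates[i] += b.rates[i];
    }
    a.mean_rate += b.mean_rate;
    a.count += b.count;
    return a;
}

// Sequential analogue of d.map_reduce0(mapper, rate_snapshot(), std::plus<>()).
rate_snapshot sum_shards(const std::vector<rate_snapshot>& shards) {
    return std::accumulate(shards.begin(), shards.end(), rate_snapshot{},
                           std::plus<rate_snapshot>());
}
```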

@@ -38,6 +38,7 @@ struct http_context {
};
future<> set_server_init(http_context& ctx);
future<> set_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx);
future<> set_server_gossip(http_context& ctx);
future<> set_server_load_sstable(http_context& ctx);

@@ -200,24 +200,40 @@ void set_cache_service(http_context& ctx, routes& r) {
});
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.count();
}, std::plus<uint64_t>());
});
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, 0, [](const column_family& cf) {
return cf.get_row_cache().stats().hits + cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf(ctx, uint64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();
}, std::plus<uint64_t>());
});
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, ratio_holder(), [](const column_family& cf) {
return ratio_holder(cf.get_row_cache().stats().hits + cf.get_row_cache().stats().misses,
cf.get_row_cache().stats().hits);
return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),
cf.get_row_cache().stats().hits.count());
}, std::plus<ratio_holder>());
});
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use num entries instead
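`get_row_hit_rate` aggregates through `ratio_holder`, which appears to carry the request total and the hit count separately. If so, both values are summed across tables before dividing, which yields a request-weighted hit rate rather than an average of per-table ratios. The following sketch works under that assumption, with hypothetical `ratio` and `cache_stats` types.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical accumulator: numerator and denominator are kept apart so
// per-table values can be summed before the division.
struct ratio {
    double total = 0;  // denominator: hits + misses
    double sub = 0;    // numerator: hits
    double value() const { return total > 0 ? sub / total : 0.0; }
};

ratio operator+(ratio a, const ratio& b) {
    a.total += b.total;
    a.sub += b.sub;
    return a;
}

struct cache_stats { uint64_t hits; uint64_t misses; };

double hit_rate(const std::vector<cache_stats>& tables) {
    ratio r = std::accumulate(tables.begin(), tables.end(), ratio{},
        [](ratio acc, const cache_stats& s) {
            return acc + ratio{double(s.hits + s.misses), double(s.hits)};
        });
    return r.value();
}
```

With one table at 90 hits out of 100 requests and another at 1 out of 10, the sketch reports 91/110 (about 0.83), whereas averaging the two ratios would give 0.5.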

@@ -77,14 +77,14 @@ future<json::json_return_type> get_cf_stats(http_context& ctx,
}
static future<json::json_return_type> get_cf_stats_count(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, name, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, f](database& db) {
// Histograms information is sample of the actual load
@@ -92,7 +92,7 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
// with count. The information is gathered in nanoseconds,
// but reported in microseconds
column_family& cf = db.find_column_family(uuid);
return ((cf.get_stats().*f).count/1000.0) * (cf.get_stats().*f).mean;
return ((cf.get_stats().*f).hist.count/1000.0) * (cf.get_stats().*f).hist.mean;
}, 0.0, std::plus<double>()).then([](double res) {
return make_ready_future<json::json_return_type>((int64_t)res);
});
@@ -100,28 +100,29 @@ static future<json::json_return_type> get_cf_stats_sum(http_context& ctx, const
static future<json::json_return_type> get_cf_stats_count(http_context& ctx,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
return map_reduce_cf(ctx, int64_t(0), [f](const column_family& cf) {
return (cf.get_stats().*f).count;
return (cf.get_stats().*f).hist.count;
}, std::plus<int64_t>());
}
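`get_cf_stats_sum()` estimates total latency from a sampled histogram. It multiplies the sample mean by the true operation count and divides by 1000 to convert nanoseconds to microseconds. The following minimal sketch shows that arithmetic; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimated total = (true count) x (sample mean), with
// the mean recorded in nanoseconds and the result reported in microseconds.
int64_t estimated_total_latency_us(int64_t count, double mean_ns) {
    double total_us = (count / 1000.0) * mean_ns;
    return static_cast<int64_t>(total_us);  // truncates, like the (int64_t) cast
}
```

For example, 2000 operations with a 1500 ns mean give an estimated 3000 µs.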
static future<json::json_return_type> get_cf_histogram(http_context& ctx, const sstring& name,
utils::ihistogram column_family::stats::*f) {
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {return p.find_column_family(uuid).get_stats().*f;},
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).hist;},
utils::ihistogram(),
add_histogram)
std::plus<utils::ihistogram>())
.then([](const utils::ihistogram& val) {
return make_ready_future<json::json_return_type>(to_json(val));
});
}
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::ihistogram column_family::stats::*f) {
static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::ihistogram(const database&)> fun = [f] (const database& db) {
utils::ihistogram res;
for (auto i : db.get_column_families()) {
res = add_histogram(res, i.second->get_stats().*f);
res += (i.second->get_stats().*f).hist;
}
return res;
};
@@ -132,6 +133,33 @@ static future<json::json_return_type> get_cf_histogram(http_context& ctx, utils:
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, const sstring& name,
utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
utils::UUID uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([f, uuid](const database& p) {
return (p.find_column_family(uuid).get_stats().*f).rate();},
utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>())
.then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
});
}
static future<json::json_return_type> get_cf_rate_and_histogram(http_context& ctx, utils::timed_rate_moving_average_and_histogram column_family::stats::*f) {
std::function<utils::rate_moving_average_and_histogram(const database&)> fun = [f] (const database& db) {
utils::rate_moving_average_and_histogram res;
for (auto i : db.get_column_families()) {
res += (i.second->get_stats().*f).rate();
}
return res;
};
return ctx.db.map(fun).then([](const std::vector<utils::rate_moving_average_and_histogram> &res) {
std::vector<httpd::utils_json::rate_moving_average_and_histogram> r;
boost::copy(res | boost::adaptors::transformed(timer_to_json), std::back_inserter(r));
return make_ready_future<json::json_return_type>(r);
});
}
static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ctx, const sstring& name) {
return map_reduce_cf(ctx, name, int64_t(0), [](const column_family& cf) {
return cf.get_unleveled_sstables();
@@ -173,6 +201,51 @@ static ratio_holder mean_row_size(column_family& cf) {
return res;
}
static std::unordered_map<sstring, uint64_t> merge_maps(std::unordered_map<sstring, uint64_t> a,
const std::unordered_map<sstring, uint64_t>& b) {
a.insert(b.begin(), b.end());
return a;
}
static json::json_return_type sum_map(const std::unordered_map<sstring, uint64_t>& val) {
uint64_t res = 0;
for (auto i : val) {
res += i.second;
}
return res;
}
static future<json::json_return_type> sum_sstable(http_context& ctx, const sstring name, bool total) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([uuid, total](database& db) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? db.find_column_family(uuid).get_sstables_including_compacted_undeleted() :
db.find_column_family(uuid).get_sstables();
for (auto t : *sstables) {
m[t.second->get_filename()] = t.second->bytes_on_disk();
}
return m;
}, std::unordered_map<sstring, uint64_t>(), merge_maps).
then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
});
}
static future<json::json_return_type> sum_sstable(http_context& ctx, bool total) {
return map_reduce_cf_raw(ctx, std::unordered_map<sstring, uint64_t>(), [total](column_family& cf) {
std::unordered_map<sstring, uint64_t> m;
auto sstables = (total) ? cf.get_sstables_including_compacted_undeleted() :
cf.get_sstables();
for (auto t : *sstables) {
m[t.second->get_filename()] = t.second->bytes_on_disk();
}
return m;
},merge_maps).then([](const std::unordered_map<sstring, uint64_t>& val) {
return sum_map(val);
});
}
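`sum_sstable()` collects `bytes_on_disk()` per sstable filename on every shard, merges the maps, and only then sums them. `std::unordered_map::insert` leaves an existing key untouched, so an sstable reported by more than one shard is presumably counted once instead of once per shard. The following sketch shows the merge-then-sum step; the filenames and sizes are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using size_map = std::unordered_map<std::string, uint64_t>;

// Same idea as merge_maps(): insert() skips keys already present.
size_map merge_maps(size_map a, const size_map& b) {
    a.insert(b.begin(), b.end());
    return a;
}

uint64_t sum_map(const size_map& m) {
    uint64_t res = 0;
    for (const auto& kv : m) {
        res += kv.second;
    }
    return res;
}
```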
template <typename T>
class sum_ratio {
uint64_t _n = 0;
@@ -384,10 +457,14 @@ void set_column_family(http_context& ctx, routes& r) {
return get_cf_stats_count(ctx, &column_family::stats::writes);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::reads);
});
cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats_sum(ctx,req->param["name"] ,&column_family::stats::reads);
});
@@ -396,18 +473,30 @@ void set_column_family(http_context& ctx, routes& r) {
return get_cf_stats_sum(ctx, req->param["name"] ,&column_family::stats::writes);
});
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::reads);
});
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_all_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::reads);
});
cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, req->param["name"], &column_family::stats::writes);
});
cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_histogram(ctx, &column_family::stats::writes);
});
cf::get_all_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_rate_and_histogram(ctx, &column_family::stats::writes);
});
cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::pending_compactions);
});
@@ -429,19 +518,19 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::live_disk_space_used);
return sum_sstable(ctx, req->param["name"], false);
});
cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::live_disk_space_used);
return sum_sstable(ctx, false);
});
cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, req->param["name"], &column_family::stats::total_disk_space_used);
return sum_sstable(ctx, req->param["name"], true);
});
cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return get_cf_stats(ctx, &column_family::stats::total_disk_space_used);
return sum_sstable(ctx, true);
});
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -652,27 +741,35 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_all_row_cache_hit.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().hits;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cf::get_all_row_cache_miss.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), [](const column_family& cf) {
return cf.get_row_cache().stats().misses;
}, std::plus<int64_t>());
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const column_family& cf) {
return cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});

@@ -34,31 +34,44 @@ future<> foreach_column_family(http_context& ctx, const sstring& name, std::func
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer).then([](const I& res) {
}, init, reducer);
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
auto uuid = get_uuid(name, ctx.db.local());
return ctx.db.map_reduce0([mapper, uuid](database& db) {
return mapper(db.find_column_family(uuid));
}, init, reducer).then([result](const I& res) mutable {
}, init, reducer);
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer, result).then([result](const I& res) mutable {
result = res;
return make_ready_future<json::json_return_type>(result);
});
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return ctx.db.map_reduce0([mapper, init, reducer](database& db) {
auto res = init;
@@ -66,10 +79,18 @@ future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
res = reducer(res, mapper(*i.second.get()));
}
return res;
}, init, reducer).then([](const I& res) {
}, init, reducer);
}
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(ctx, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
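The refactoring splits each `map_reduce_cf()` overload into two parts. `map_reduce_cf_raw()` returns the reduced value itself, and a thin wrapper converts that value to `json::json_return_type`. Callers such as the `*_moving_avrage` handlers can then post-process the raw value (for example with `meter_to_json`) before serializing it. The following synchronous sketch shows the same layering. It models shards as a vector and uses `std::to_string` as a hypothetical stand-in for the JSON conversion.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Raw layer: fold mapper results with reducer, return the value of type I.
template <class Shard, class Mapper, class I, class Reducer>
I map_reduce_raw(const std::vector<Shard>& shards, I init, Mapper mapper, Reducer reducer) {
    I res = init;
    for (const auto& s : shards) {
        res = reducer(res, mapper(s));
    }
    return res;
}

// Presentation layer: only adds the final conversion.
template <class Shard, class Mapper, class I, class Reducer>
std::string map_reduce_json(const std::vector<Shard>& shards, I init, Mapper mapper, Reducer reducer) {
    return std::to_string(map_reduce_raw(shards, init, mapper, reducer));
}
```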
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t column_family::stats::*f);

@@ -33,6 +33,25 @@ namespace sp = httpd::storage_proxy_json;
using proxy = service::storage_proxy;
using namespace json;
static future<utils::rate_moving_average> sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return d.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average(),
std::plus<utils::rate_moving_average>());
}
static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
return make_ready_future<json::json_return_type>(m);
});
}
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average proxy::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
});
}
static future<json::json_return_type> sum_estimated_histogram(http_context& ctx, sstables::estimated_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return p.get_stats().*f;}, sstables::estimated_histogram(),
sstables::merge).then([](const sstables::estimated_histogram& val) {
@@ -42,8 +61,8 @@ static future<json::json_return_type> sum_estimated_histogram(http_context& ctx
});
}
static future<json::json_return_type> total_latency(http_context& ctx, utils::ihistogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).mean * (p.get_stats().*f).count;}, 0.0,
static future<json::json_return_type> total_latency(http_context& ctx, utils::timed_rate_moving_average_and_histogram proxy::stats::*f) {
return ctx.sp.map_reduce0([f](const proxy& p) {return (p.get_stats().*f).hist.mean * (p.get_stats().*f).hist.count;}, 0.0,
std::plus<double>()).then([](double val) {
int64_t res = val;
return make_ready_future<json::json_return_type>(res);
@@ -291,41 +310,77 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_read_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::read_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::range_slice_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::range_slice_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::write_timeouts);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_stats(ctx.sp, &proxy::stats::write_unavailables);
return sum_timed_rate_as_long(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_timeouts);
});
sp::get_read_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::read_unavailables);
});
sp::get_range_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_timeouts);
});
sp::get_range_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::range_slice_unavailables);
});
sp::get_write_metrics_timeouts_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_timeouts);
});
sp::get_write_metrics_unavailables_rates.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timed_rate_as_obj(ctx.sp, &proxy::stats::write_unavailables);
});
sp::get_range_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_write_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
sp::get_read_metrics_latency_histogram_depricated.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::range);
});
sp::get_write_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::write);
});
sp::get_read_metrics_latency_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_read_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_estimated_histogram(ctx, &proxy::stats::estimated_read);
});
@@ -342,7 +397,7 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_range_estimated_histogram.set(r, [&ctx](std::unique_ptr<request> req) {
return sum_histogram_stats(ctx.sp, &proxy::stats::read);
return sum_timer_stats(ctx.sp, &proxy::stats::read);
});
sp::get_range_latency.set(r, [&ctx](std::unique_ptr<request> req) {


@@ -97,7 +97,7 @@ namespace std {
template <>
struct hash<auth::data_resource> {
size_t operator()(const auth::data_resource & v) const {
-return std::hash<sstring>()(v.name());
+return v.hash_value();
}
};
@@ -354,9 +354,12 @@ future<> auth::auth::setup_table(const sstring& name, const sstring& cql) {
::shared_ptr<cql3::statements::create_table_statement> statement =
static_pointer_cast<cql3::statements::create_table_statement>(
parsed->prepare(db)->statement);
-// Origin sets "Legacy Cf Id" for the new table. We have no need to be
-// pre-2.1 compatible (afaik), so lets skip a whole lotta hoolaballo
-return statement->announce_migration(qp.proxy(), false).then([statement](bool) {});
+auto schema = statement->get_cf_meta_data();
+auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
+schema_builder b(schema);
+b.set_uuid(uuid);
+return service::get_local_migration_manager().announce_new_column_family(b.build(), false);
}
future<bool> auth::auth::has_existing_users(const sstring& cfname, const sstring& def_user_name, const sstring& name_column) {


@@ -41,6 +41,7 @@
#pragma once
+#include "utils/hash.hh"
#include <iosfwd>
#include <set>
#include <seastar/core/sstring.hh>
@@ -137,6 +138,10 @@ public:
bool operator==(const data_resource&) const;
bool operator<(const data_resource&) const;
+size_t hash_value() const {
+return utils::tuple_hash()(_ks, _cf);
+}
};
/**


@@ -28,7 +28,11 @@ class checked_file_impl : public file_impl {
public:
checked_file_impl(disk_error_signal_type& s, file f)
-: _signal(s) , _file(f) {}
+: _signal(s) , _file(f) {
+_memory_dma_alignment = f.memory_dma_alignment();
+_disk_read_dma_alignment = f.disk_read_dma_alignment();
+_disk_write_dma_alignment = f.disk_write_dma_alignment();
+}
virtual future<size_t> write_dma(uint64_t pos, const void* buffer, size_t len, const io_priority_class& pc) override {
return do_io_check(_signal, [&] {

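The new `checked_file_impl` constructor copies the wrapped file's DMA alignment requirements into the wrapper. The minimal model below shows why this matters: a decorator that forgets to forward these fields reports its base-class defaults instead of the real requirements of the file it wraps. The class names are hypothetical, this is not the Seastar `file_impl` API, and the alignment values are placeholders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified base exposing DMA alignment through protected
// fields, loosely mirroring the members assigned in the diff.
// The default values here are placeholders.
class file_impl_sketch {
protected:
    uint32_t _memory_dma_alignment = 4096;
    uint32_t _disk_read_dma_alignment = 4096;
    uint32_t _disk_write_dma_alignment = 4096;
public:
    virtual ~file_impl_sketch() = default;
    uint32_t memory_dma_alignment() const { return _memory_dma_alignment; }
    uint32_t disk_read_dma_alignment() const { return _disk_read_dma_alignment; }
    uint32_t disk_write_dma_alignment() const { return _disk_write_dma_alignment; }
};

// A concrete file whose (made-up) device needs only 512-byte alignment.
class small_block_file : public file_impl_sketch {
public:
    small_block_file() {
        _memory_dma_alignment = 512;
        _disk_read_dma_alignment = 512;
        _disk_write_dma_alignment = 512;
    }
};

// Decorator in the spirit of checked_file_impl: copy the wrapped file's
// requirements so callers querying the wrapper see the real values.
class checked_sketch : public file_impl_sketch {
    const file_impl_sketch& _file;
public:
    explicit checked_sketch(const file_impl_sketch& f) : _file(f) {
        _memory_dma_alignment = f.memory_dma_alignment();
        _disk_read_dma_alignment = f.disk_read_dma_alignment();
        _disk_write_dma_alignment = f.disk_write_dma_alignment();
    }
    const file_impl_sketch& underlying() const { return _file; }
};
```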

@@ -0,0 +1,127 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "keys.hh"
#include "schema.hh"
#include "range.hh"
/**
* Represents the kind of bound in a range tombstone.
*/
enum class bound_kind : uint8_t {
excl_end = 0,
incl_start = 1,
// values 2 to 5 are reserved for forward Origin compatibility
incl_end = 6,
excl_start = 7,
};
std::ostream& operator<<(std::ostream& out, const bound_kind k);
bound_kind invert_kind(bound_kind k);
int32_t weight(bound_kind k);
static inline bound_kind flip_bound_kind(bound_kind bk)
{
switch (bk) {
case bound_kind::excl_end: return bound_kind::excl_start;
case bound_kind::incl_end: return bound_kind::incl_start;
case bound_kind::excl_start: return bound_kind::excl_end;
case bound_kind::incl_start: return bound_kind::incl_end;
}
abort();
}
class bound_view {
const static thread_local clustering_key empty_prefix;
public:
const clustering_key_prefix& prefix;
bound_kind kind;
bound_view(const clustering_key_prefix& prefix, bound_kind kind)
: prefix(prefix)
, kind(kind)
{ }
struct compare {
// To make it assignable and to avoid taking a schema_ptr, we
// wrap the schema reference.
std::reference_wrapper<const schema> _s;
compare(const schema& s) : _s(s)
{ }
bool operator()(const clustering_key_prefix& p1, int32_t w1, const clustering_key_prefix& p2, int32_t w2) const {
auto type = _s.get().clustering_key_prefix_type();
auto res = prefix_equality_tri_compare(type->types().begin(),
type->begin(p1), type->end(p1),
type->begin(p2), type->end(p2),
tri_compare);
if (res) {
return res < 0;
}
auto d1 = p1.size(_s);
auto d2 = p2.size(_s);
if (d1 == d2) {
return w1 < w2;
}
return d1 < d2 ? w1 <= 0 : w2 > 0;
}
bool operator()(const bound_view b, const clustering_key_prefix& p) const {
return operator()(b.prefix, weight(b.kind), p, 0);
}
bool operator()(const clustering_key_prefix& p, const bound_view b) const {
return operator()(p, 0, b.prefix, weight(b.kind));
}
bool operator()(const bound_view b1, const bound_view b2) const {
return operator()(b1.prefix, weight(b1.kind), b2.prefix, weight(b2.kind));
}
};
bool equal(const schema& s, const bound_view other) const {
return kind == other.kind && prefix.equal(s, other.prefix);
}
bool adjacent(const schema& s, const bound_view other) const {
return invert_kind(other.kind) == kind && prefix.equal(s, other.prefix);
}
static bound_view bottom(const schema& s) {
return {empty_prefix, bound_kind::incl_start};
}
static bound_view top(const schema& s) {
return {empty_prefix, bound_kind::incl_end};
}
/*
template<template<typename> typename T, typename U>
concept bool Range() {
return requires (T<U> range) {
{ range.start() } -> stdx::optional<U>;
{ range.end() } -> stdx::optional<U>;
};
};*/
template<template<typename> typename Range>
static std::pair<bound_view, bound_view> from_range(const schema& s, const Range<clustering_key_prefix>& range) {
return {
range.start() ? bound_view(range.start()->value(), range.start()->is_inclusive() ? bound_kind::incl_start : bound_kind::excl_start) : bottom(s),
range.end() ? bound_view(range.end()->value(), range.end()->is_inclusive() ? bound_kind::incl_end : bound_kind::excl_end) : top(s),
};
}
friend std::ostream& operator<<(std::ostream& out, const bound_view& b) {
return out << "{bound: prefix=" << b.prefix << ", kind=" << b.kind << "}";
}
};
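`bound_view::compare` first compares the components two prefixes share. If those are equal, it breaks the tie either by weight (same length) or by whether the shorter prefix's weight is positive. The sketch below reproduces that decision structure on integer-vector prefixes. The header only declares `weight(bound_kind)`, so the weight values used here are an assumption, chosen so that a plain key (weight 0) sits between the start and end bounds on the same prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical simplification: a clustering prefix is a vector of ints.
using prefix_sketch = std::vector<int>;

// Same enumerators and values as bound_kind in the header.
enum class bound_kind : uint8_t {
    excl_end = 0,
    incl_start = 1,
    incl_end = 6,
    excl_start = 7,
};

// Assumed weights (weight() is only declared in the header). A plain
// clustering key is compared with weight 0.
int32_t weight(bound_kind k) {
    switch (k) {
    case bound_kind::excl_end: return -2;
    case bound_kind::incl_start: return -1;
    case bound_kind::incl_end: return 1;
    case bound_kind::excl_start: return 2;
    }
    std::abort();
}

// Same decision structure as bound_view::compare::operator()(p1, w1, p2, w2).
bool bound_less(const prefix_sketch& p1, int32_t w1, const prefix_sketch& p2, int32_t w2) {
    // Compare the components both prefixes share, in order.
    std::size_t n = std::min(p1.size(), p2.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (p1[i] != p2[i]) {
            return p1[i] < p2[i];
        }
    }
    std::size_t d1 = p1.size();
    std::size_t d2 = p2.size();
    if (d1 == d2) {
        // Identical prefixes: the weights alone decide the order.
        return w1 < w2;
    }
    // One prefix extends the other: the shorter one sorts first unless its
    // weight is positive (inclusive end or exclusive start), in which case it
    // sorts after every longer key sharing that prefix.
    return d1 < d2 ? w1 <= 0 : w2 > 0;
}
```

For example, the inclusive start bound on `{1}` sorts before the key `{1, 5}`, and the inclusive end bound on `{1}` sorts after it. This is what lets a one-component bound cover all rows that share that first component.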

clustering_key_filter.cc

@@ -0,0 +1,124 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "clustering_key_filter.hh"
#include "keys.hh"
#include "query-request.hh"
#include "range.hh"
namespace query {
const std::vector<range<clustering_key_prefix>>&
clustering_key_filtering_context::get_ranges(const partition_key& key) const {
static thread_local std::vector<range<clustering_key_prefix>> full_range = {{}};
return _factory ? _factory->get_ranges(key) : full_range;
}
clustering_key_filtering_context clustering_key_filtering_context::create_no_filtering() {
return clustering_key_filtering_context{};
}
const clustering_key_filtering_context no_clustering_key_filtering =
clustering_key_filtering_context::create_no_filtering();
class stateless_clustering_key_filter_factory : public clustering_key_filter_factory {
clustering_key_filter _filter;
std::vector<range<clustering_key_prefix>> _ranges;
public:
stateless_clustering_key_filter_factory(std::vector<range<clustering_key_prefix>>&& ranges,
clustering_key_filter&& filter)
: _filter(std::move(filter)), _ranges(std::move(ranges)) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
return _filter;
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
return _filter;
}
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) override {
return _ranges;
}
};
class partition_slice_clustering_key_filter_factory : public clustering_key_filter_factory {
schema_ptr _schema;
const partition_slice& _slice;
clustering_key_prefix::prefix_equal_tri_compare _cmp;
public:
partition_slice_clustering_key_filter_factory(schema_ptr s, const partition_slice& slice)
: _schema(std::move(s)), _slice(slice), _cmp(*_schema) {}
virtual clustering_key_filter get_filter(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const range<clustering_key_prefix>& r) { return r.contains(key, _cmp); });
};
}
virtual clustering_key_filter get_filter_for_sorted(const partition_key& key) override {
const clustering_row_ranges& ranges = _slice.row_ranges(*_schema, key);
return [this, &ranges] (const clustering_key& key) {
return std::any_of(std::begin(ranges), std::end(ranges),
[this, &key] (const range<clustering_key_prefix>& r) { return r.contains(key, _cmp); });
};
}
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) override {
return _slice.row_ranges(*_schema, key);
}
};
static const shared_ptr<clustering_key_filter_factory>
create_partition_slice_filter(schema_ptr s, const partition_slice& slice) {
return ::make_shared<partition_slice_clustering_key_filter_factory>(std::move(s), slice);
}
const clustering_key_filtering_context
clustering_key_filtering_context::create(schema_ptr schema, const partition_slice& slice) {
static thread_local clustering_key_filtering_context accept_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(std::vector<range<clustering_key_prefix>>{{}},
[](const clustering_key&) { return true; }));
static thread_local clustering_key_filtering_context reject_all = clustering_key_filtering_context(
::make_shared<stateless_clustering_key_filter_factory>(std::vector<range<clustering_key_prefix>>{},
[](const clustering_key&) { return false; }));
if (slice.get_specific_ranges()) {
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
const clustering_row_ranges& ranges = slice.default_row_ranges();
if (ranges.empty()) {
return reject_all;
}
if (ranges.size() == 1 && ranges[0].is_full()) {
return accept_all;
}
return clustering_key_filtering_context(create_partition_slice_filter(schema, slice));
}
}
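For the default row ranges, `clustering_key_filtering_context::create` picks one of three filters:

- If there are no ranges, it rejects every key.
- If there is exactly one full range, it accepts every key.
- Otherwise, it tests each key against the ranges with `std::any_of`.

The self-contained sketch below follows that order over integer keys. `range_sketch`, `bound_sketch`, and `make_filter` are hypothetical names, and the partition-specific path (`get_specific_ranges()`) is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical simplified clustering range over int keys; an empty optional
// means the range is unbounded on that side.
struct bound_sketch { int value; bool inclusive; };

struct range_sketch {
    std::optional<bound_sketch> start, end;

    bool is_full() const { return !start && !end; }

    bool contains(int k) const {
        if (start && (k < start->value || (k == start->value && !start->inclusive))) return false;
        if (end && (k > end->value || (k == end->value && !end->inclusive))) return false;
        return true;
    }
};

using filter_sketch = std::function<bool(int)>;

// Same decision order as clustering_key_filtering_context::create for the
// default (non-partition-specific) ranges.
filter_sketch make_filter(std::vector<range_sketch> ranges) {
    if (ranges.empty()) {
        return [](int) { return false; };  // like reject_all
    }
    if (ranges.size() == 1 && ranges[0].is_full()) {
        return [](int) { return true; };   // like accept_all
    }
    return [ranges = std::move(ranges)](int k) {
        return std::any_of(ranges.begin(), ranges.end(),
                           [k](const range_sketch& r) { return r.contains(k); });
    };
}
```

Handling the empty and full cases up front means full scans and empty slices never perform per-key range checks. That is the same effect the preconstructed `static thread_local` contexts in `create` achieve.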

clustering_key_filter.hh

@@ -0,0 +1,75 @@
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <vector>
#include "core/shared_ptr.hh"
#include "database_fwd.hh"
#include "schema.hh"
template<typename T> class range;
namespace query {
class partition_slice;
// A predicate that tells if a clustering key should be accepted.
using clustering_key_filter = std::function<bool(const clustering_key&)>;
// A factory for clustering key filter which can be reused for multiple clustering keys.
class clustering_key_filter_factory {
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
virtual clustering_key_filter get_filter(const partition_key&) = 0;
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
virtual clustering_key_filter get_filter_for_sorted(const partition_key&) = 0;
virtual const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key&) = 0;
virtual ~clustering_key_filter_factory() = default;
};
class clustering_key_filtering_context {
private:
shared_ptr<clustering_key_filter_factory> _factory;
clustering_key_filtering_context() {};
clustering_key_filtering_context(shared_ptr<clustering_key_filter_factory> factory) : _factory(factory) {}
public:
// Create a clustering key filter that can be used for multiple clustering keys with no restrictions.
clustering_key_filter get_filter(const partition_key& key) const {
return _factory ? _factory->get_filter(key) : [] (const clustering_key&) { return true; };
}
// Create a clustering key filter that can be used for multiple clustering keys but they have to be sorted.
clustering_key_filter get_filter_for_sorted(const partition_key& key) const {
return _factory ? _factory->get_filter_for_sorted(key) : [] (const clustering_key&) { return true; };
}
const std::vector<range<clustering_key_prefix>>& get_ranges(const partition_key& key) const;
static const clustering_key_filtering_context create(schema_ptr, const partition_slice&);
static clustering_key_filtering_context create_no_filtering();
};
extern const clustering_key_filtering_context no_clustering_key_filtering;
}


@@ -51,6 +51,9 @@ public:
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<lw_shared_ptr<sstable>> candidates);
+// Return if parallel compaction is allowed by strategy.
+bool parallel_compaction() const;
static sstring name(compaction_strategy_type type) {
switch (type) {
case compaction_strategy_type::null:


@@ -805,3 +805,11 @@ commitlog_total_space_in_mb: -1
# true: relaxed environment checks; performance and reliability may degraade.
#
# developer_mode: false
+# Idle-time background processing
+#
+# Scylla can perform certain jobs in the background while the system is otherwise idle,
+# freeing processor resources when there is other work to be done.
+#
+# defragment_memory_on_idle: true


@@ -162,6 +162,7 @@ modes = {
scylla_tests = [
'tests/mutation_test',
+'tests/schema_registry_test',
'tests/canonical_mutation_test',
'tests/range_test',
'tests/types_test',
@@ -292,6 +293,7 @@ scylla_core = (['database.cc',
'mutation_query.cc',
'key_reader.cc',
'keys.cc',
+'clustering_key_filter.cc',
'sstables/sstables.cc',
'sstables/compress.cc',
'sstables/row.cc',
@@ -350,6 +352,7 @@ scylla_core = (['database.cc',
'cql3/statements/grant_statement.cc',
'cql3/statements/revoke_statement.cc',
'cql3/statements/alter_type_statement.cc',
+'cql3/statements/alter_keyspace_statement.cc',
'cql3/update_parameters.cc',
'cql3/ut_name.cc',
'cql3/user_options.cc',
@@ -368,6 +371,7 @@ scylla_core = (['database.cc',
'cql3/operator.cc',
'cql3/relation.cc',
'cql3/column_identifier.cc',
+'cql3/column_specification.cc',
'cql3/constants.cc',
'cql3/query_processor.cc',
'cql3/query_options.cc',
@@ -383,6 +387,7 @@ scylla_core = (['database.cc',
'cql3/selection/selection.cc',
'cql3/selection/selector.cc',
'cql3/restrictions/statement_restrictions.cc',
+'cql3/result_set.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/schema_tables.cc',


@@ -35,7 +35,7 @@ class converting_mutation_partition_applier : public mutation_partition_visitor
deletable_row* _current_row;
private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
-return new_def.kind == kind && new_def.type->is_value_compatible_with(*old_type);
+return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (is_compatible(new_def, old_type, kind) && cell.timestamp() > new_def.dropped_at()) {


@@ -32,6 +32,7 @@ options {
@parser::includes {
#include "cql3/selection/writetime_or_ttl.hh"
+#include "cql3/statements/alter_keyspace_statement.hh"
#include "cql3/statements/alter_table_statement.hh"
#include "cql3/statements/create_keyspace_statement.hh"
#include "cql3/statements/drop_keyspace_statement.hh"
@@ -316,9 +317,7 @@ cqlStatement returns [shared_ptr<parsed_statement> stmt]
| st13=dropIndexStatement { $stmt = st13; }
#endif
| st14=alterTableStatement { $stmt = st14; }
-#if 0
| st15=alterKeyspaceStatement { $stmt = st15; }
-#endif
| st16=grantStatement { $stmt = st16; }
| st17=revokeStatement { $stmt = st17; }
| st18=listPermissionsStatement { $stmt = st18; }
@@ -809,15 +808,18 @@ dropTriggerStatement returns [DropTriggerStatement expr]
{ $expr = new DropTriggerStatement(cf, name.toString(), ifExists); }
;
+#endif
/**
* ALTER KEYSPACE <KS> WITH <property> = <value>;
*/
-alterKeyspaceStatement returns [AlterKeyspaceStatement expr]
-@init { KSPropDefs attrs = new KSPropDefs(); }
+alterKeyspaceStatement returns [shared_ptr<cql3::statements::alter_keyspace_statement> expr]
+@init {
+auto attrs = make_shared<cql3::statements::ks_prop_defs>();
+}
: K_ALTER K_KEYSPACE ks=keyspaceName
-K_WITH properties[attrs] { $expr = new AlterKeyspaceStatement(ks, attrs); }
+K_WITH properties[attrs] { $expr = make_shared<cql3::statements::alter_keyspace_statement>(ks, attrs); }
;
-#endif
/**
* ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;


@@ -0,0 +1,56 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2016 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/column_specification.hh"
namespace cql3 {
bool column_specification::all_in_same_table(const std::vector<::shared_ptr<column_specification>>& names)
{
assert(!names.empty());
auto first = names.front();
return std::all_of(std::next(names.begin()), names.end(), [first] (auto&& spec) {
return spec->ks_name == first->ks_name && spec->cf_name == first->cf_name;
});
}
}
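`column_specification::all_in_same_table` compares every specification after the first against the first one, checking only the keyspace and table names. The sketch below reproduces that check on a hypothetical `spec_sketch` type that carries just those two fields.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reduced column_specification with only the compared fields.
struct spec_sketch {
    std::string ks_name;
    std::string cf_name;
};

// Same shape as column_specification::all_in_same_table: every entry after
// the first must name the same keyspace and table as the first one.
// Like the original, it requires a non-empty input.
bool all_in_same_table(const std::vector<std::shared_ptr<spec_sketch>>& names) {
    assert(!names.empty());
    auto first = names.front();
    return std::all_of(std::next(names.begin()), names.end(), [&first](const auto& spec) {
        return spec->ks_name == first->ks_name && spec->cf_name == first->cf_name;
    });
}
```

Because the function asserts on an empty list, callers must guard against one. This is why the `prepared_metadata` constructor in `cql3/result_set.cc` checks `!names.empty()` before calling it.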


@@ -75,6 +75,8 @@ public:
bool is_reversed_type() const {
return ::dynamic_pointer_cast<const reversed_type_impl>(type) != nullptr;
}
+static bool all_in_same_table(const std::vector<::shared_ptr<column_specification>>& names);
};
}


@@ -432,10 +432,9 @@ void query_processor::migration_subscriber::on_update_keyspace(const sstring& ks
void query_processor::migration_subscriber::on_update_column_family(const sstring& ks_name, const sstring& cf_name, bool columns_changed)
{
-if (columns_changed) {
-log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
-remove_invalid_prepared_statements(ks_name, cf_name);
-}
+// #1255: Ignoring columns_changed deliberately.
+log.info("Column definitions for {}.{} changed, invalidating related prepared statements", ks_name, cf_name);
+remove_invalid_prepared_statements(ks_name, cf_name);
}
void query_processor::migration_subscriber::on_update_user_type(const sstring& ks_name, const sstring& type_name)


@@ -394,6 +394,9 @@ public:
return bounds_range_type::bound(prefix, is_inclusive(b));
};
auto range = bounds_range_type(read_bound(statements::bound::START), read_bound(statements::bound::END));
+if (query::is_wrap_around(range, *_schema)) {
+return {};
+}
return { range };
}
#if 0


@@ -352,7 +352,9 @@ single_column_primary_key_restrictions<partition_key>::bounds_ranges(const query
template<>
std::vector<query::clustering_range>
single_column_primary_key_restrictions<clustering_key_prefix>::bounds_ranges(const query_options& options) const {
-auto bounds = compute_bounds(options);
+auto wrapping_bounds = compute_bounds(options);
+auto bounds = boost::copy_range<query::clustering_row_ranges>(wrapping_bounds
+| boost::adaptors::filtered([&](auto&& r) { return !query::is_wrap_around(r, *_schema); }));
auto less_cmp = clustering_key_prefix::less_compare(*_schema);
std::sort(bounds.begin(), bounds.end(), [&] (query::clustering_range& x, query::clustering_range& y) {
if (!x.start() && !y.start()) {

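Both restriction hunks above discard clustering ranges that `query::is_wrap_around` reports as wrapping. One returns no ranges at all; the other filters them out before sorting. The sketch below assumes that a range "wraps around" when its end sorts before its start (for example `c > 5 AND c < 2`), and that such a range selects no rows. The `crange` type and both functions are hypothetical. The ordering used after filtering is this sketch's own choice, because the original comparator is cut off in this view.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical int-keyed clustering range; a missing bound is unbounded.
struct crange {
    std::optional<int> start;
    std::optional<int> end;
};

// Assumed definition: both bounds present and the end sorts before the start.
bool is_wrap_around(const crange& r) {
    return r.start && r.end && *r.end < *r.start;
}

// Drop wrapping ranges, then order the rest: unbounded starts first,
// followed by ascending start values.
std::vector<crange> non_wrapping_sorted(const std::vector<crange>& in) {
    std::vector<crange> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [](const crange& r) { return !is_wrap_around(r); });
    std::sort(out.begin(), out.end(), [](const crange& x, const crange& y) {
        if (!x.start) {
            return y.start.has_value();
        }
        if (!y.start) {
            return false;
        }
        return *x.start < *y.start;
    });
    return out;
}
```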
cql3/result_set.cc

@@ -0,0 +1,181 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2015 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "cql3/result_set.hh"
namespace cql3 {
metadata::metadata(std::vector<::shared_ptr<column_specification>> names_)
: metadata(flag_enum_set(), std::move(names_), names_.size(), {})
{ }
metadata::metadata(flag_enum_set flags, std::vector<::shared_ptr<column_specification>> names_, uint32_t column_count,
::shared_ptr<const service::pager::paging_state> paging_state)
: _flags(flags)
, names(std::move(names_))
, _column_count(column_count)
, _paging_state(std::move(paging_state))
{ }
// The maximum number of values that the ResultSet can hold. This can be bigger than columnCount due to CASSANDRA-4911
uint32_t metadata::value_count() {
return _flags.contains<flag::NO_METADATA>() ? _column_count : names.size();
}
void metadata::add_non_serialized_column(::shared_ptr<column_specification> name) {
// See comment above. Because columnCount doesn't account the newly added name, it
// won't be serialized.
names.emplace_back(std::move(name));
}
bool metadata::all_in_same_cf() const {
if (_flags.contains<flag::NO_METADATA>()) {
return false;
}
return column_specification::all_in_same_table(names);
}
void metadata::set_has_more_pages(::shared_ptr<const service::pager::paging_state> paging_state) {
if (!paging_state) {
return;
}
_flags.set<flag::HAS_MORE_PAGES>();
_paging_state = std::move(paging_state);
}
void metadata::set_skip_metadata() {
_flags.set<flag::NO_METADATA>();
}
metadata::flag_enum_set metadata::flags() const {
return _flags;
}
uint32_t metadata::column_count() const {
return _column_count;
}
::shared_ptr<const service::pager::paging_state> metadata::paging_state() const {
return _paging_state;
}
const std::vector<::shared_ptr<column_specification>>& metadata::get_names() const {
return names;
}
prepared_metadata::prepared_metadata(const std::vector<::shared_ptr<column_specification>>& names,
const std::vector<uint16_t>& partition_key_bind_indices)
: _names{names}
, _partition_key_bind_indices{partition_key_bind_indices}
{
if (!names.empty() && column_specification::all_in_same_table(_names)) {
_flags.set<flag::GLOBAL_TABLES_SPEC>();
}
}
prepared_metadata::flag_enum_set prepared_metadata::flags() const {
return _flags;
}
const std::vector<::shared_ptr<column_specification>>& prepared_metadata::names() const {
return _names;
}
const std::vector<uint16_t>& prepared_metadata::partition_key_bind_indices() const {
return _partition_key_bind_indices;
}
result_set::result_set(std::vector<::shared_ptr<column_specification>> metadata_)
: _metadata(::make_shared<metadata>(std::move(metadata_)))
{ }
result_set::result_set(::shared_ptr<metadata> metadata)
: _metadata(std::move(metadata))
{ }
size_t result_set::size() const {
return _rows.size();
}
bool result_set::empty() const {
return _rows.empty();
}
void result_set::add_row(std::vector<bytes_opt> row) {
assert(row.size() == _metadata->value_count());
_rows.emplace_back(std::move(row));
}
void result_set::add_column_value(bytes_opt value) {
if (_rows.empty() || _rows.back().size() == _metadata->value_count()) {
std::vector<bytes_opt> row;
row.reserve(_metadata->value_count());
_rows.emplace_back(std::move(row));
}
_rows.back().emplace_back(std::move(value));
}
void result_set::reverse() {
std::reverse(_rows.begin(), _rows.end());
}
void result_set::trim(size_t limit) {
if (_rows.size() > limit) {
_rows.resize(limit);
}
}
metadata& result_set::get_metadata() {
return *_metadata;
}
const metadata& result_set::get_metadata() const {
return *_metadata;
}
const std::deque<std::vector<bytes_opt>>& result_set::rows() const {
return _rows;
}
}
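`result_set::add_column_value` builds rows one cell at a time. It opens a new row when there is none yet or when the last row already holds `value_count()` cells. The sketch below models that rule, together with `reverse()` and `trim()`. It uses `std::optional<std::string>` as an assumed stand-in for `bytes_opt` and a fixed value count in place of the `metadata` object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for cql3::result_set with a fixed value count.
class result_set_sketch {
    std::size_t _value_count;
    std::deque<std::vector<std::optional<std::string>>> _rows;
public:
    explicit result_set_sketch(std::size_t value_count) : _value_count(value_count) {}

    // Append one cell, starting a new row when there is none yet or the last
    // row is already full (same rule as add_column_value).
    void add_column_value(std::optional<std::string> value) {
        if (_rows.empty() || _rows.back().size() == _value_count) {
            std::vector<std::optional<std::string>> row;
            row.reserve(_value_count);
            _rows.emplace_back(std::move(row));
        }
        _rows.back().emplace_back(std::move(value));
    }

    void reverse() { std::reverse(_rows.begin(), _rows.end()); }

    // Keep at most `limit` rows.
    void trim(std::size_t limit) {
        if (_rows.size() > limit) {
            _rows.resize(limit);
        }
    }

    const std::deque<std::vector<std::optional<std::string>>>& rows() const { return _rows; }
};
```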


@@ -50,15 +50,11 @@
namespace cql3 {
class metadata {
-#if 0
-public static final CBCodec<Metadata> codec = new Codec();
-public static final Metadata EMPTY = new Metadata(EnumSet.of(Flag.NO_METADATA), null, 0, null);
-#endif
public:
enum class flag : uint8_t {
-GLOBAL_TABLES_SPEC,
-HAS_MORE_PAGES,
-NO_METADATA
+GLOBAL_TABLES_SPEC = 0,
+HAS_MORE_PAGES = 1,
+NO_METADATA = 2,
};
using flag_enum = super_enum<flag,
@@ -83,208 +79,31 @@ private:
::shared_ptr<const service::pager::paging_state> _paging_state;
public:
-metadata(std::vector<::shared_ptr<column_specification>> names_)
-: metadata(flag_enum_set(), std::move(names_), names_.size(), {})
-{ }
+metadata(std::vector<::shared_ptr<column_specification>> names_);
metadata(flag_enum_set flags, std::vector<::shared_ptr<column_specification>> names_, uint32_t column_count,
-::shared_ptr<const service::pager::paging_state> paging_state)
-: _flags(flags)
-, names(std::move(names_))
-, _column_count(column_count)
-, _paging_state(std::move(paging_state))
-{ }
+::shared_ptr<const service::pager::paging_state> paging_state);
// The maximum number of values that the ResultSet can hold. This can be bigger than columnCount due to CASSANDRA-4911
-uint32_t value_count() {
-return _flags.contains<flag::NO_METADATA>() ? _column_count : names.size();
-}
+uint32_t value_count();
-void add_non_serialized_column(::shared_ptr<column_specification> name) {
-// See comment above. Because columnCount doesn't account the newly added name, it
-// won't be serialized.
-names.emplace_back(std::move(name));
-}
+void add_non_serialized_column(::shared_ptr<column_specification> name);
private:
-bool all_in_same_cf() const {
-if (_flags.contains<flag::NO_METADATA>()) {
-return false;
-}
-assert(!names.empty());
-auto first = names.front();
-return std::all_of(std::next(names.begin()), names.end(), [first] (auto&& spec) {
-return spec->ks_name == first->ks_name && spec->cf_name == first->cf_name;
-});
-}
+bool all_in_same_cf() const;
public:
-void set_has_more_pages(::shared_ptr<const service::pager::paging_state> paging_state) {
-if (!paging_state) {
-return;
-}
+void set_has_more_pages(::shared_ptr<const service::pager::paging_state> paging_state);
-_flags.set<flag::HAS_MORE_PAGES>();
-_paging_state = std::move(paging_state);
-}
+void set_skip_metadata();
-void set_skip_metadata() {
-_flags.set<flag::NO_METADATA>();
-}
+flag_enum_set flags() const;
-flag_enum_set flags() const {
-return _flags;
-}
+uint32_t column_count() const;
-uint32_t column_count() const {
-return _column_count;
-}
-auto paging_state() const {
-return _paging_state;
-}
-auto const& get_names() const {
-return names;
-}
-#if 0
-@Override
-public String toString()
-{
-StringBuilder sb = new StringBuilder();
-if (names == null)
-{
-sb.append("[").append(columnCount).append(" columns]");
-}
-else
-{
-for (ColumnSpecification name : names)
-{
-sb.append("[").append(name.name);
-sb.append("(").append(name.ksName).append(", ").append(name.cfName).append(")");
-sb.append(", ").append(name.type).append("]");
-}
-}
-if (flags.contains(Flag.HAS_MORE_PAGES))
-sb.append(" (to be continued)");
-return sb.toString();
-}
-private static class Codec implements CBCodec<Metadata>
-{
-public Metadata decode(ByteBuf body, int version)
-{
-// flags & column count
-int iflags = body.readInt();
-int columnCount = body.readInt();
-EnumSet<Flag> flags = Flag.deserialize(iflags);
-PagingState state = null;
-if (flags.contains(Flag.HAS_MORE_PAGES))
-state = PagingState.deserialize(CBUtil.readValue(body));
-if (flags.contains(Flag.NO_METADATA))
-return new Metadata(flags, null, columnCount, state);
-boolean globalTablesSpec = flags.contains(Flag.GLOBAL_TABLES_SPEC);
-String globalKsName = null;
-String globalCfName = null;
-if (globalTablesSpec)
-{
-globalKsName = CBUtil.readString(body);
-globalCfName = CBUtil.readString(body);
-}
-// metadata (names/types)
-List<ColumnSpecification> names = new ArrayList<ColumnSpecification>(columnCount);
-for (int i = 0; i < columnCount; i++)
-{
-String ksName = globalTablesSpec ? globalKsName : CBUtil.readString(body);
-String cfName = globalTablesSpec ? globalCfName : CBUtil.readString(body);
-ColumnIdentifier colName = new ColumnIdentifier(CBUtil.readString(body), true);
-AbstractType type = DataType.toType(DataType.codec.decodeOne(body, version));
-names.add(new ColumnSpecification(ksName, cfName, colName, type));
-}
-return new Metadata(flags, names, names.size(), state);
-}
-public void encode(Metadata m, ByteBuf dest, int version)
-{
-boolean noMetadata = m.flags.contains(Flag.NO_METADATA);
-boolean globalTablesSpec = m.flags.contains(Flag.GLOBAL_TABLES_SPEC);
-boolean hasMorePages = m.flags.contains(Flag.HAS_MORE_PAGES);
-assert version > 1 || (!m.flags.contains(Flag.HAS_MORE_PAGES) && !noMetadata): "version = " + version + ", flags = " + m.flags;
-dest.writeInt(Flag.serialize(m.flags));
-dest.writeInt(m.columnCount);
-if (hasMorePages)
-CBUtil.writeValue(m.pagingState.serialize(), dest);
-if (!noMetadata)
-{
-if (globalTablesSpec)
-{
-CBUtil.writeString(m.names.get(0).ksName, dest);
-CBUtil.writeString(m.names.get(0).cfName, dest);
-}
-for (int i = 0; i < m.columnCount; i++)
-{
-ColumnSpecification name = m.names.get(i);
-if (!globalTablesSpec)
-{
-CBUtil.writeString(name.ksName, dest);
-CBUtil.writeString(name.cfName, dest);
-}
-CBUtil.writeString(name.name.toString(), dest);
-DataType.codec.writeOne(DataType.fromType(name.type, version), dest, version);
-}
-}
-}
-public int encodedSize(Metadata m, int version)
-{
-boolean noMetadata = m.flags.contains(Flag.NO_METADATA);
-boolean globalTablesSpec = m.flags.contains(Flag.GLOBAL_TABLES_SPEC);
-boolean hasMorePages = m.flags.contains(Flag.HAS_MORE_PAGES);
-int size = 8;
-if (hasMorePages)
-size += CBUtil.sizeOfValue(m.pagingState.serialize());
-if (!noMetadata)
-{
-if (globalTablesSpec)
-{
-size += CBUtil.sizeOfString(m.names.get(0).ksName);
-size += CBUtil.sizeOfString(m.names.get(0).cfName);
-}
-for (int i = 0; i < m.columnCount; i++)
-{
-ColumnSpecification name = m.names.get(i);
-if (!globalTablesSpec)
-{
-size += CBUtil.sizeOfString(name.ksName);
-size += CBUtil.sizeOfString(name.cfName);
-}
-size += CBUtil.sizeOfString(name.name.toString());
-size += DataType.codec.oneSerializedSize(DataType.fromType(name.type, version), version);
-}
-}
-return size;
-}
-}
-#endif
+::shared_ptr<const service::pager::paging_state> paging_state() const;
+const std::vector<::shared_ptr<column_specification>>& get_names() const;
};
inline ::shared_ptr<cql3::metadata> make_empty_metadata()
@@ -294,223 +113,61 @@ inline ::shared_ptr<cql3::metadata> make_empty_metadata()
return result;
}
class result_set {
#if 0
private static final ColumnIdentifier COUNT_COLUMN = new ColumnIdentifier("count", false);
#endif
class prepared_metadata {
public:
enum class flag : uint8_t {
GLOBAL_TABLES_SPEC = 0,
};
using flag_enum = super_enum<flag,
flag::GLOBAL_TABLES_SPEC>;
using flag_enum_set = enum_set<flag_enum>;
private:
flag_enum_set _flags;
std::vector<::shared_ptr<column_specification>> _names;
std::vector<uint16_t> _partition_key_bind_indices;
public:
prepared_metadata(const std::vector<::shared_ptr<column_specification>>& names,
const std::vector<uint16_t>& partition_key_bind_indices);
flag_enum_set flags() const;
const std::vector<::shared_ptr<column_specification>>& names() const;
const std::vector<uint16_t>& partition_key_bind_indices() const;
};
class result_set {
public:
::shared_ptr<metadata> _metadata;
std::deque<std::vector<bytes_opt>> _rows;
public:
result_set(std::vector<::shared_ptr<column_specification>> metadata_)
: _metadata(::make_shared<metadata>(std::move(metadata_)))
{ }
result_set(std::vector<::shared_ptr<column_specification>> metadata_);
result_set(::shared_ptr<metadata> metadata)
: _metadata(std::move(metadata))
{ }
result_set(::shared_ptr<metadata> metadata);
size_t size() const {
return _rows.size();
}
size_t size() const;
bool empty() const {
return _rows.empty();
}
bool empty() const;
void add_row(std::vector<bytes_opt> row) {
assert(row.size() == _metadata->value_count());
_rows.emplace_back(std::move(row));
}
void add_row(std::vector<bytes_opt> row);
void add_column_value(bytes_opt value) {
if (_rows.empty() || _rows.back().size() == _metadata->value_count()) {
std::vector<bytes_opt> row;
row.reserve(_metadata->value_count());
_rows.emplace_back(std::move(row));
}
void add_column_value(bytes_opt value);
_rows.back().emplace_back(std::move(value));
}
void reverse();
void reverse() {
std::reverse(_rows.begin(), _rows.end());
}
void trim(size_t limit) {
if (_rows.size() > limit) {
_rows.resize(limit);
}
}
void trim(size_t limit);
template<typename RowComparator>
void sort(RowComparator&& cmp) {
std::sort(_rows.begin(), _rows.end(), std::forward<RowComparator>(cmp));
}
metadata& get_metadata() {
return *_metadata;
}
metadata& get_metadata();
const metadata& get_metadata() const {
return *_metadata;
}
const metadata& get_metadata() const;
// Returns a range of rows. A row is a range of bytes_opt.
auto const& rows() const {
return _rows;
}
#if 0
public CqlResult toThriftResult()
{
assert metadata.names != null;
String UTF8 = "UTF8Type";
CqlMetadata schema = new CqlMetadata(new HashMap<ByteBuffer, String>(),
new HashMap<ByteBuffer, String>(),
// The 2 following ones shouldn't be needed in CQL3
UTF8, UTF8);
for (int i = 0; i < metadata.columnCount; i++)
{
ColumnSpecification spec = metadata.names.get(i);
ByteBuffer colName = ByteBufferUtil.bytes(spec.name.toString());
schema.name_types.put(colName, UTF8);
AbstractType<?> normalizedType = spec.type instanceof ReversedType ? ((ReversedType)spec.type).baseType : spec.type;
schema.value_types.put(colName, normalizedType.toString());
}
List<CqlRow> cqlRows = new ArrayList<CqlRow>(rows.size());
for (List<ByteBuffer> row : rows)
{
List<Column> thriftCols = new ArrayList<Column>(metadata.columnCount);
for (int i = 0; i < metadata.columnCount; i++)
{
Column col = new Column(ByteBufferUtil.bytes(metadata.names.get(i).name.toString()));
col.setValue(row.get(i));
thriftCols.add(col);
}
// The key of CqlRow shoudn't be needed in CQL3
cqlRows.add(new CqlRow(ByteBufferUtil.EMPTY_BYTE_BUFFER, thriftCols));
}
CqlResult res = new CqlResult(CqlResultType.ROWS);
res.setRows(cqlRows).setSchema(schema);
return res;
}
@Override
public String toString()
{
try
{
StringBuilder sb = new StringBuilder();
sb.append(metadata).append('\n');
for (List<ByteBuffer> row : rows)
{
for (int i = 0; i < row.size(); i++)
{
ByteBuffer v = row.get(i);
if (v == null)
{
sb.append(" | null");
}
else
{
sb.append(" | ");
if (metadata.flags.contains(Flag.NO_METADATA))
sb.append("0x").append(ByteBufferUtil.bytesToHex(v));
else
sb.append(metadata.names.get(i).type.getString(v));
}
}
sb.append('\n');
}
sb.append("---");
return sb.toString();
}
catch (Exception e)
{
throw new RuntimeException(e);
}
}
public static class Codec implements CBCodec<ResultSet>
{
/*
* Format:
* - metadata
* - rows count (4 bytes)
* - rows
*/
public ResultSet decode(ByteBuf body, int version)
{
Metadata m = Metadata.codec.decode(body, version);
int rowCount = body.readInt();
ResultSet rs = new ResultSet(m, new ArrayList<List<ByteBuffer>>(rowCount));
// rows
int totalValues = rowCount * m.columnCount;
for (int i = 0; i < totalValues; i++)
rs.addColumnValue(CBUtil.readValue(body));
return rs;
}
public void encode(ResultSet rs, ByteBuf dest, int version)
{
Metadata.codec.encode(rs.metadata, dest, version);
dest.writeInt(rs.rows.size());
for (List<ByteBuffer> row : rs.rows)
{
// Note that we do only want to serialize only the first columnCount values, even if the row
// as more: see comment on Metadata.names field.
for (int i = 0; i < rs.metadata.columnCount; i++)
CBUtil.writeValue(row.get(i), dest);
}
}
public int encodedSize(ResultSet rs, int version)
{
int size = Metadata.codec.encodedSize(rs.metadata, version) + 4;
for (List<ByteBuffer> row : rs.rows)
{
for (int i = 0; i < rs.metadata.columnCount; i++)
size += CBUtil.sizeOfValue(row.get(i));
}
return size;
}
}
public static enum Flag
{
// The order of that enum matters!!
GLOBAL_TABLES_SPEC,
HAS_MORE_PAGES,
NO_METADATA;
public static EnumSet<Flag> deserialize(int flags)
{
EnumSet<Flag> set = EnumSet.noneOf(Flag.class);
Flag[] values = Flag.values();
for (int n = 0; n < values.length; n++)
{
if ((flags & (1 << n)) != 0)
set.add(values[n]);
}
return set;
}
public static int serialize(EnumSet<Flag> flags)
{
int i = 0;
for (Flag flag : flags)
i |= 1 << flag.ordinal();
return i;
}
}
#endif
const std::deque<std::vector<bytes_opt>>& rows() const;
};
}
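The `Flag` enum kept under `#if 0` above stores result-set flags as bits indexed by declaration order. That is why its comment warns that the order matters: reordering the constants would change the integer written by `Metadata.codec.encode()`. Below is a minimal C++ sketch of that encoding. It assumes a plain `std::set` in place of Scylla's `enum_set`, and the names `result_flag`, `serialize_flags` and `deserialize_flags` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical C++ mirror of the Java Flag enum: each flag occupies the bit
// given by its ordinal, so declaration order is part of the wire format.
enum class result_flag : int {
    GLOBAL_TABLES_SPEC = 0,
    HAS_MORE_PAGES = 1,
    NO_METADATA = 2,
};

constexpr int result_flag_count = 3;

// Packs a set of flags into the 4-byte int that precedes the column count.
int32_t serialize_flags(const std::set<result_flag>& flags) {
    int32_t bits = 0;
    for (result_flag f : flags) {
        bits |= int32_t(1) << static_cast<int>(f);
    }
    return bits;
}

// Unpacks the int. Bits beyond the known flags are ignored, as in the Java loop.
std::set<result_flag> deserialize_flags(int32_t bits) {
    std::set<result_flag> flags;
    for (int n = 0; n < result_flag_count; ++n) {
        if (bits & (int32_t(1) << n)) {
            flags.insert(static_cast<result_flag>(n));
        }
    }
    return flags;
}
```

For example, a page that has more results and carries no metadata serializes to binary `110`, which is 6.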

@@ -232,7 +232,7 @@ uint32_t selection::add_column_for_ordering(const column_definition& c) {
raw_selector::to_selectables(raw_selectors, schema), db, schema, defs);
auto metadata = collect_metadata(schema, raw_selectors, *factories);
if (processes_selection(raw_selectors)) {
if (processes_selection(raw_selectors) || raw_selectors.size() != defs.size()) {
return ::make_shared<selection_with_processing>(schema, std::move(defs), std::move(metadata), std::move(factories));
} else {
return ::make_shared<simple_selection>(schema, std::move(defs), std::move(metadata), false);
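The commit message for #1367 explains the cause. `selectable::add_and_get_index()` drops duplicate columns from `defs`, so `SELECT a, a, b` ends up with two definitions but three selectors. In that case `simple_selection` would emit rows narrower than the metadata. The added `raw_selectors.size() != defs.size()` test routes such queries to `selection_with_processing` instead.

The sketch below models only that counting logic. Column names as strings, values as `int`, and all function and struct names are hypothetical simplifications. It also ignores the function calls and other cases that `processes_selection()` covers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for selectable::add_and_get_index(): returns the
// position of `name` in `defs`, appending it only if it is not there yet.
std::size_t add_and_get_index(std::vector<std::string>& defs, const std::string& name) {
    for (std::size_t i = 0; i < defs.size(); ++i) {
        if (defs[i] == name) {
            return i;  // duplicate: reuse the existing definition
        }
    }
    defs.push_back(name);
    return defs.size() - 1;
}

struct selection_plan {
    std::vector<std::string> defs;            // deduplicated columns read from storage
    std::vector<std::size_t> selector_index;  // one entry per selector in the query
    bool needs_processing = false;            // true -> selection_with_processing
};

// Mirrors the patched condition: use per-selector processing whenever the
// selector count differs from the number of distinct column definitions.
selection_plan plan_selection(const std::vector<std::string>& selectors) {
    selection_plan p;
    for (const auto& s : selectors) {
        p.selector_index.push_back(add_and_get_index(p.defs, s));
    }
    p.needs_processing = selectors.size() != p.defs.size();
    return p;
}

// Builds one output row with one value per selector, so duplicates reappear
// and the row width matches the result metadata.
std::vector<int> output_row(const selection_plan& p, const std::vector<int>& fetched) {
    std::vector<int> row;
    for (std::size_t idx : p.selector_index) {
        row.push_back(fetched[idx]);
    }
    return row;
}
```

With `SELECT a, a, b`, only two values are fetched, for example `{10, 20}`. The processed row expands them back to `{10, 10, 20}`.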

@@ -0,0 +1,102 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2015 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "alter_keyspace_statement.hh"
#include "service/migration_manager.hh"
#include "db/system_keyspace.hh"
#include "database.hh"
cql3::statements::alter_keyspace_statement::alter_keyspace_statement(sstring name, ::shared_ptr<ks_prop_defs> attrs)
: _name(name)
, _attrs(std::move(attrs))
{}
const sstring& cql3::statements::alter_keyspace_statement::keyspace() const {
return _name;
}
future<> cql3::statements::alter_keyspace_statement::check_access(const service::client_state& state) {
return state.has_keyspace_access(_name, auth::permission::ALTER);
}
void cql3::statements::alter_keyspace_statement::validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) {
try {
service::get_local_storage_proxy().get_db().local().find_keyspace(_name); // throws on failure
auto tmp = _name;
std::transform(tmp.begin(), tmp.end(), tmp.begin(), ::tolower);
if (tmp == db::system_keyspace::NAME) {
throw exceptions::invalid_request_exception("Cannot alter system keyspace");
}
_attrs->validate();
if (!bool(_attrs->get_replication_strategy_class()) && !_attrs->get_replication_options().empty()) {
throw exceptions::configuration_exception("Missing replication strategy class");
}
#if 0
// The strategy is validated through KSMetaData.validate() in announceKeyspaceUpdate below.
// However, for backward compatibility with thrift, this doesn't validate unexpected options yet,
// so doing proper validation here.
AbstractReplicationStrategy.validateReplicationStrategy(name,
AbstractReplicationStrategy.getClass(attrs.getReplicationStrategyClass()),
StorageService.instance.getTokenMetadata(),
DatabaseDescriptor.getEndpointSnitch(),
attrs.getReplicationOptions());
#endif
} catch (no_such_keyspace& e) {
std::throw_with_nested(exceptions::invalid_request_exception("Unknown keyspace " + _name));
}
}
future<bool> cql3::statements::alter_keyspace_statement::announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) {
auto old_ksm = service::get_local_storage_proxy().get_db().local().find_keyspace(_name).metadata();
return service::get_local_migration_manager().announce_keyspace_update(_attrs->as_ks_metadata_update(old_ksm), is_local_only).then([] {
return true;
});
}
shared_ptr<transport::event::schema_change> cql3::statements::alter_keyspace_statement::change_event() {
return make_shared<transport::event::schema_change>(
transport::event::schema_change::change_type::UPDATED,
keyspace());
}

@@ -41,80 +41,29 @@
#pragma once
#include <memory>
#include "cql3/statements/schema_altering_statement.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include <memory>
namespace cql3 {
namespace statements {
class alter_keyspace_statement : public schema_altering_statement {
sstring _name;
std::unique_ptr<ks_prop_defs> _attrs;
::shared_ptr<ks_prop_defs> _attrs;
public:
alter_keyspace_statement(sstring name, std::unique_ptr<ks_prop_defs>&& attrs)
: _name{name}
, _attrs{std::move(attrs)}
{ }
alter_keyspace_statement(sstring name, ::shared_ptr<ks_prop_defs> attrs);
virtual const sstring& keyspace() const override {
return _name;
}
const sstring& keyspace() const override;
#if 0
public void checkAccess(ClientState state) throws UnauthorizedException, InvalidRequestException
{
state.hasKeyspaceAccess(name, Permission.ALTER);
}
public void validate(ClientState state) throws RequestValidationException
{
KSMetaData ksm = Schema.instance.getKSMetaData(name);
if (ksm == null)
throw new InvalidRequestException("Unknown keyspace " + name);
if (ksm.name.equalsIgnoreCase(SystemKeyspace.NAME))
throw new InvalidRequestException("Cannot alter system keyspace");
attrs.validate();
if (attrs.getReplicationStrategyClass() == null && !attrs.getReplicationOptions().isEmpty())
{
throw new ConfigurationException("Missing replication strategy class");
}
else if (attrs.getReplicationStrategyClass() != null)
{
// The strategy is validated through KSMetaData.validate() in announceKeyspaceUpdate below.
// However, for backward compatibility with thrift, this doesn't validate unexpected options yet,
// so doing proper validation here.
AbstractReplicationStrategy.validateReplicationStrategy(name,
AbstractReplicationStrategy.getClass(attrs.getReplicationStrategyClass()),
StorageService.instance.getTokenMetadata(),
DatabaseDescriptor.getEndpointSnitch(),
attrs.getReplicationOptions());
}
}
public boolean announceMigration(boolean isLocalOnly) throws RequestValidationException
{
KSMetaData ksm = Schema.instance.getKSMetaData(name);
// In the (very) unlikely case the keyspace was dropped since validate()
if (ksm == null)
throw new InvalidRequestException("Unknown keyspace " + name);
MigrationManager.announceKeyspaceUpdate(attrs.asKSMetadataUpdate(ksm), isLocalOnly);
return true;
}
public Event.SchemaChange changeEvent()
{
return new Event.SchemaChange(Event.SchemaChange.Change.UPDATED, keyspace());
}
#endif
future<> check_access(const service::client_state& state) override;
void validate(distributed<service::storage_proxy>& proxy, const service::client_state& state) override;
future<bool> announce_migration(distributed<service::storage_proxy>& proxy, bool is_local_only) override;
shared_ptr<transport::event::schema_change> change_event() override;
};
}
}

@@ -162,7 +162,7 @@ void cf_prop_defs::apply_to_builder(schema_builder& builder) {
}
std::experimental::optional<sstring> tmp_value = {};
if (has_property(KW_MINCOMPACTIONTHRESHOLD)) {
if (has_property(KW_COMPACTION)) {
if (get_compaction_options().count(KW_MINCOMPACTIONTHRESHOLD)) {
tmp_value = get_compaction_options().at(KW_MINCOMPACTIONTHRESHOLD);
}
@@ -170,7 +170,7 @@ void cf_prop_defs::apply_to_builder(schema_builder& builder) {
int min_compaction_threshold = to_int(KW_MINCOMPACTIONTHRESHOLD, tmp_value, builder.get_min_compaction_threshold());
tmp_value = {};
if (has_property(KW_MAXCOMPACTIONTHRESHOLD)) {
if (has_property(KW_COMPACTION)) {
if (get_compaction_options().count(KW_MAXCOMPACTIONTHRESHOLD)) {
tmp_value = get_compaction_options().at(KW_MAXCOMPACTIONTHRESHOLD);
}
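The `cf_prop_defs` hunk changes which property gates the lookup. The minimum and maximum thresholds are read from the `compaction` sub-map, so the patched code checks for `KW_COMPACTION` rather than for a top-level `KW_MINCOMPACTIONTHRESHOLD` or `KW_MAXCOMPACTIONTHRESHOLD` property. If the key is absent, `to_int()` falls back to the builder's current value.

The sketch below shows that lookup. It assumes the option keys are spelled `min_threshold` and `max_threshold`, and it uses `std::optional` instead of `std::experimental::optional`. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using options_map = std::map<std::string, std::string>;

// Hypothetical helper mirroring the patched lookup: the threshold lives inside
// the `compaction` sub-map, so it is only consulted when that map is present.
std::optional<std::string> compaction_option(const std::optional<options_map>& compaction,
                                             const std::string& key) {
    if (compaction && compaction->count(key)) {
        return compaction->at(key);
    }
    return std::nullopt;
}

// Simplified stand-in for to_int(): parse the value if given, else keep the default.
int to_int_or(const std::optional<std::string>& value, int default_value) {
    return value ? std::stoi(*value) : default_value;
}
```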

@@ -79,6 +79,18 @@ lw_shared_ptr<keyspace_metadata> ks_prop_defs::as_ks_metadata(sstring ks_name) {
return keyspace_metadata::new_keyspace(ks_name, get_replication_strategy_class().value(), options, get_boolean(KW_DURABLE_WRITES, true));
}
lw_shared_ptr<keyspace_metadata> ks_prop_defs::as_ks_metadata_update(lw_shared_ptr<keyspace_metadata> old) {
auto options = get_replication_options();
options.erase(REPLICATION_STRATEGY_CLASS_KEY);
auto sc = get_replication_strategy_class();
if (!sc) {
sc = old->strategy_name();
options = old->strategy_options();
}
return keyspace_metadata::new_keyspace(old->name(), *sc, options, get_boolean(KW_DURABLE_WRITES, true));
}
}
}
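`as_ks_metadata_update()` merges an `ALTER KEYSPACE` request into the existing keyspace. It strips the strategy class key from the supplied replication options. If no class was given, it keeps the old strategy name and the old option map in full. Note that `durable_writes` falls back to `true`, not to the old value, when the statement omits it.

The sketch below restates that merge. It uses a trimmed-down, hypothetical `ks_meta` struct and assumes the class key is the string `"class"`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using options_map = std::map<std::string, std::string>;

// Hypothetical, trimmed-down stand-in for keyspace_metadata.
struct ks_meta {
    std::string name;
    std::string strategy_name;
    options_map strategy_options;
    bool durable_writes;
};

// Sketch of the merge done by ks_prop_defs::as_ks_metadata_update().
ks_meta metadata_update(const ks_meta& old,
                        options_map replication,              // options from the ALTER statement
                        std::optional<bool> durable_writes) {
    // Assumption: "class" plays the role of REPLICATION_STRATEGY_CLASS_KEY.
    std::optional<std::string> strategy;
    auto it = replication.find("class");
    if (it != replication.end()) {
        strategy = it->second;
        replication.erase(it);
    }
    if (!strategy) {
        // No strategy given: keep the old strategy together with all its options.
        strategy = old.strategy_name;
        replication = old.strategy_options;
    }
    // As in the original, an omitted durable_writes defaults to true.
    return ks_meta{old.name, *strategy, replication, durable_writes.value_or(true)};
}
```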

@@ -66,6 +66,7 @@ public:
std::map<sstring, sstring> get_replication_options() const;
std::experimental::optional<sstring> get_replication_strategy_class() const;
lw_shared_ptr<keyspace_metadata> as_ks_metadata(sstring ks_name);
lw_shared_ptr<keyspace_metadata> as_ks_metadata_update(lw_shared_ptr<keyspace_metadata> old);
#if 0
public KSMetaData asKSMetadataUpdate(KSMetaData old) throws RequestValidationException

@@ -221,7 +221,7 @@ select_statement::execute(distributed<service::storage_proxy>& proxy, service::q
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit, to_gc_clock(now));
make_partition_slice(options), limit, to_gc_clock(now), options.get_timestamp(state));
int32_t page_size = options.get_page_size();
@@ -313,7 +313,7 @@ select_statement::execute_internal(distributed<service::storage_proxy>& proxy, s
int32_t limit = get_limit(options);
auto now = db_clock::now();
auto command = ::make_lw_shared<query::read_command>(_schema->id(), _schema->version(),
make_partition_slice(options), limit);
make_partition_slice(options), limit, to_gc_clock(now), options.get_timestamp(state));
auto partition_ranges = _restrictions->get_partition_key_ranges(options);
if (needs_post_query_ordering() && _limit) {

@@ -87,6 +87,13 @@ public:
}
};
lw_shared_ptr<memtable_list>
column_family::make_memory_only_memtable_list() {
auto seal = [this] { return make_ready_future<>(); };
auto get_schema = [this] { return schema(); };
return make_lw_shared<memtable_list>(std::move(seal), std::move(get_schema), _config.max_memtable_size, _config.dirty_memory_region_group);
}
lw_shared_ptr<memtable_list>
column_family::make_memtable_list() {
auto seal = [this] { return seal_active_memtable(); };
@@ -101,30 +108,14 @@ column_family::make_streaming_memtable_list() {
return make_lw_shared<memtable_list>(std::move(seal), std::move(get_schema), _config.max_streaming_memtable_size, _config.streaming_dirty_memory_region_group);
}
column_family::column_family(schema_ptr schema, config config, db::commitlog& cl, compaction_manager& compaction_manager)
column_family::column_family(schema_ptr schema, config config, db::commitlog* cl, compaction_manager& compaction_manager)
: _schema(std::move(schema))
, _config(std::move(config))
, _memtables(make_memtable_list())
, _streaming_memtables(_config.enable_disk_writes ? make_streaming_memtable_list() : make_memtable_list())
, _memtables(_config.enable_disk_writes ? make_memtable_list() : make_memory_only_memtable_list())
, _streaming_memtables(_config.enable_disk_writes ? make_streaming_memtable_list() : make_memory_only_memtable_list())
, _sstables(make_lw_shared<sstable_list>())
, _cache(_schema, sstables_as_mutation_source(), sstables_as_key_source(), global_cache_tracker())
, _commitlog(&cl)
, _compaction_manager(compaction_manager)
, _flush_queue(std::make_unique<memtable_flush_queue>())
{
if (!_config.enable_disk_writes) {
dblog.warn("Writes disabled, column family no durable.");
}
}
column_family::column_family(schema_ptr schema, config config, no_commitlog cl, compaction_manager& compaction_manager)
: _schema(std::move(schema))
, _config(std::move(config))
, _memtables(make_memtable_list())
, _streaming_memtables(_config.enable_disk_writes ? make_streaming_memtable_list() : make_memtable_list())
, _sstables(make_lw_shared<sstable_list>())
, _cache(_schema, sstables_as_mutation_source(), sstables_as_key_source(), global_cache_tracker())
, _commitlog(nullptr)
, _commitlog(cl)
, _compaction_manager(compaction_manager)
, _flush_queue(std::make_unique<memtable_flush_queue>())
{
@@ -147,8 +138,11 @@ column_family::make_partition_presence_checker(lw_shared_ptr<sstable_list> old_s
mutation_source
column_family::sstables_as_mutation_source() {
return mutation_source([this] (schema_ptr s, const query::partition_range& r, const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, pc);
return mutation_source([this] (schema_ptr s,
const query::partition_range& r,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) {
return make_sstable_reader(std::move(s), r, ck_filtering, pc);
});
}
@@ -180,16 +174,23 @@ class range_sstable_reader final : public mutation_reader::impl {
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class& _pc;
query::clustering_key_filtering_context _ck_filtering;
public:
range_sstable_reader(schema_ptr s, lw_shared_ptr<sstable_list> sstables, const query::partition_range& pr, const io_priority_class& pc)
range_sstable_reader(schema_ptr s,
lw_shared_ptr<sstable_list> sstables,
const query::partition_range& pr,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc)
: _pr(pr)
, _sstables(std::move(sstables))
, _pc(pc)
, _ck_filtering(ck_filtering)
{
std::vector<mutation_reader> readers;
for (const lw_shared_ptr<sstables::sstable>& sst : *_sstables | boost::adaptors::map_values) {
// FIXME: make sstable::read_range_rows() return ::mutation_reader so that we can drop this wrapper.
mutation_reader reader = make_mutation_reader<sstable_range_wrapping_reader>(sst, s, pr, pc);
mutation_reader reader =
make_mutation_reader<sstable_range_wrapping_reader>(sst, s, pr, _ck_filtering, _pc);
if (sst->is_shared()) {
reader = make_filtering_reader(std::move(reader), belongs_to_current_shard);
}
@@ -214,22 +215,30 @@ class single_key_sstable_reader final : public mutation_reader::impl {
// Use a pointer instead of copying, so we don't need to regenerate the reader if
// the priority changes.
const io_priority_class& _pc;
query::clustering_key_filtering_context _ck_filtering;
public:
single_key_sstable_reader(schema_ptr schema, lw_shared_ptr<sstable_list> sstables, const partition_key& key, const io_priority_class& pc)
single_key_sstable_reader(schema_ptr schema,
lw_shared_ptr<sstable_list> sstables,
const partition_key& key,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc)
: _schema(std::move(schema))
, _key(sstables::key::from_partition_key(*_schema, key))
, _sstables(std::move(sstables))
, _pc(pc)
, _ck_filtering(ck_filtering)
{ }
virtual future<mutation_opt> operator()() override {
if (_done) {
return make_ready_future<mutation_opt>();
}
return parallel_for_each(*_sstables | boost::adaptors::map_values, [this](const lw_shared_ptr<sstables::sstable>& sstable) {
return sstable->read_row(_schema, _key, _pc).then([this](mutation_opt mo) {
apply(_m, std::move(mo));
});
return parallel_for_each(*_sstables | boost::adaptors::map_values,
[this](const lw_shared_ptr<sstables::sstable>& sstable) {
return sstable->read_row(_schema, _key, _ck_filtering, _pc)
.then([this](mutation_opt mo) {
apply(_m, std::move(mo));
});
}).then([this] {
_done = true;
return std::move(_m);
@@ -238,16 +247,28 @@ public:
};
mutation_reader
column_family::make_sstable_reader(schema_ptr s, const query::partition_range& pr, const io_priority_class& pc) const {
column_family::make_sstable_reader(schema_ptr s,
const query::partition_range& pr,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) const {
// restricts a reader's concurrency if the configuration specifies it
auto restrict_reader = [&] (mutation_reader&& in) {
if (_config.read_concurrency_config.sem) {
return make_restricted_reader(_config.read_concurrency_config, 1, std::move(in));
} else {
return std::move(in);
}
};
if (pr.is_singular() && pr.start()->value().has_key()) {
const dht::ring_position& pos = pr.start()->value();
if (dht::shard_of(pos.token()) != engine().cpu_id()) {
return make_empty_reader(); // range doesn't belong to this shard
}
return make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), pc);
return restrict_reader(make_mutation_reader<single_key_sstable_reader>(std::move(s), _sstables, *pos.key(), ck_filtering, pc));
} else {
// range_sstable_reader is not movable so we need to wrap it
return make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, pc);
return restrict_reader(make_mutation_reader<range_sstable_reader>(std::move(s), _sstables, pr, ck_filtering, pc));
}
}
@@ -306,14 +327,17 @@ column_family::find_row(schema_ptr s, const dht::decorated_key& partition_key, c
}
mutation_reader
column_family::make_reader(schema_ptr s, const query::partition_range& range, const io_priority_class& pc) const {
column_family::make_reader(schema_ptr s,
const query::partition_range& range,
const query::clustering_key_filtering_context& ck_filtering,
const io_priority_class& pc) const {
if (query::is_wrap_around(range, *s)) {
// make_combined_reader() can't handle streams that wrap around yet.
fail(unimplemented::cause::WRAP_AROUND);
}
std::vector<mutation_reader> readers;
readers.reserve(_memtables->size() + _sstables->size());
readers.reserve(_memtables->size() + 1);
// We're assuming that cache and memtables are both read atomically
// for single-key queries, so we don't need to special case memtable
@@ -336,13 +360,13 @@ column_family::make_reader(schema_ptr s, const query::partition_range& range, co
// https://github.com/scylladb/scylla/issues/185
for (auto&& mt : *_memtables) {
readers.emplace_back(mt->make_reader(s, range, pc));
readers.emplace_back(mt->make_reader(s, range, ck_filtering, pc));
}
if (_config.enable_cache) {
readers.emplace_back(_cache.make_reader(s, range, pc));
readers.emplace_back(_cache.make_reader(s, range, ck_filtering, pc));
} else {
readers.emplace_back(make_sstable_reader(s, range, pc));
readers.emplace_back(make_sstable_reader(s, range, ck_filtering, pc));
}
return make_combined_reader(std::move(readers));
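`make_reader()` passes one reader per memtable, plus a cache or sstable reader, to `make_combined_reader()`. The combined reader must produce a single key-ordered stream in which a partition present in several sources appears only once. A common way to do this is a k-way merge over a min-heap.

The sketch below shows that technique. It uses a hypothetical `fragment` type (an `int` key and a list of cell strings). The actual combined reader merges `mutation` objects, and this is not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical "mutation": a partition key plus a list of cell values.
using fragment = std::pair<int, std::vector<std::string>>;
using source = std::vector<fragment>;  // assumed sorted by key

// k-way merge of sorted sources that folds fragments sharing a key into one,
// in the spirit of combining memtable and cache/sstable readers.
std::vector<fragment> combine(const std::vector<source>& sources) {
    // Heap entries: (key, source index, position); the smallest key is on top.
    using cursor = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<cursor, std::vector<cursor>, std::greater<cursor>> heap;
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (!sources[i].empty()) {
            heap.emplace(sources[i][0].first, i, 0);
        }
    }
    std::vector<fragment> out;
    while (!heap.empty()) {
        auto [key, src, pos] = heap.top();
        heap.pop();
        const auto& cells = sources[src][pos].second;
        if (!out.empty() && out.back().first == key) {
            // Same partition from another source: merge instead of emitting twice.
            out.back().second.insert(out.back().second.end(), cells.begin(), cells.end());
        } else {
            out.emplace_back(key, cells);
        }
        if (pos + 1 < sources[src].size()) {
            heap.emplace(sources[src][pos + 1].first, src, pos + 1);
        }
    }
    return out;
}
```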
@@ -458,12 +482,6 @@ future<> lister::scan_dir(sstring name, lister::dir_entry_types type, walker_typ
});
}
static std::vector<sstring> parse_fname(sstring filename) {
std::vector<sstring> comps;
boost::split(comps , filename ,boost::is_any_of(".-"));
return comps;
}
static bool belongs_to_current_shard(const schema& s, const partition_key& first, const partition_key& last) {
auto key_shard = [&s] (const partition_key& pk) {
auto token = dht::global_partitioner().get_token(s, pk);
@@ -475,12 +493,75 @@ static bool belongs_to_current_shard(const schema& s, const partition_key& first
return (s1 <= me) && (me <= s2);
}
static bool belongs_to_other_shard(const schema& s, const partition_key& first, const partition_key& last) {
auto key_shard = [&s] (const partition_key& pk) {
auto token = dht::global_partitioner().get_token(s, pk);
return dht::shard_of(token);
};
auto s1 = key_shard(first);
auto s2 = key_shard(last);
auto me = engine().cpu_id();
return (s1 != me) || (me != s2);
}
static bool belongs_to_current_shard(const schema& s, range<partition_key> r) {
assert(r.start());
assert(r.end());
return belongs_to_current_shard(s, r.start()->value(), r.end()->value());
}
static bool belongs_to_other_shard(const schema& s, range<partition_key> r) {
assert(r.start());
assert(r.end());
return belongs_to_other_shard(s, r.start()->value(), r.end()->value());
}
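The two range checks above decide how `load_sstable()` treats each sstable. Shard `me` may own part of the file when its id lies between the shards of the first and last keys. The file is shared when either endpoint maps to another shard.

The sketch below turns this into a three-way decision. It works on shard ids directly, whereas the real code derives them with `dht::shard_of()`. Like the original checks, it assumes shard ids do not decrease along the sorted key range. `sstable_shard_span`, `load_action` and `classify` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical input: shard ids of the sstable's first and last partition keys.
struct sstable_shard_span {
    unsigned first_shard;
    unsigned last_shard;
};

// Same test as belongs_to_current_shard(): shard `me` may own some keys.
bool relevant_to(const sstable_shard_span& s, unsigned me) {
    return s.first_shard <= me && me <= s.last_shard;
}

// Same test as belongs_to_other_shard(): some key may belong to another shard.
bool shared_with_others(const sstable_shard_span& s, unsigned me) {
    return s.first_shard != me || me != s.last_shard;
}

enum class load_action { ignore_and_delete, load_and_split_later, load };

// Mirrors the branches in load_sstable(): skip and mark for deletion, load and
// queue for a later rewrite, or simply load.
load_action classify(const sstable_shard_span& s, unsigned me) {
    if (!relevant_to(s, me)) {
        return load_action::ignore_and_delete;
    }
    return shared_with_others(s, me) ? load_action::load_and_split_later : load_action::load;
}
```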
future<> column_family::load_sstable(sstables::sstable&& sstab, bool reset_level) {
auto sst = make_lw_shared<sstables::sstable>(std::move(sstab));
return sst->get_sstable_key_range(*_schema).then([this, sst, reset_level] (range<partition_key> r) mutable {
// Checks whether or not sstable belongs to current shard.
if (!belongs_to_current_shard(*_schema, r)) {
dblog.debug("sstable {} not relevant for this shard, ignoring", sst->get_filename());
sst->mark_for_deletion();
return make_ready_future<>();
}
bool in_other_shard = belongs_to_other_shard(*_schema, std::move(r));
return sst->load().then([this, sst, in_other_shard, reset_level] () mutable {
if (in_other_shard) {
// If we're here, this sstable is shared by this and other
// shard(s). Shared sstables cannot be deleted until all
// shards compacted them, so to reduce disk space usage we
// want to start splitting them now.
// However, we need to delay this compaction until we read all
// the sstables belonging to this CF, because we need all of
// them to know which tombstones we can drop, and what
// generation number is free.
_sstables_need_rewrite.push_back(sst);
}
if (reset_level) {
// When loading a migrated sstable, set level to 0 because
// it may overlap with existing tables in levels > 0.
// This step is optional, because even if we didn't do this
// scylla would detect the overlap, and bring back some of
// the sstables to level 0.
sst->set_sstable_level(0);
}
add_sstable(sst);
});
});
}
// load_sstable() wants to start rewriting sstables which are shared between
// several shards, but we can't start any compaction before all the sstables
// of this CF were loaded. So call this function to start rewrites, if any.
void column_family::start_rewrite() {
for (auto sst : _sstables_need_rewrite) {
dblog.info("Splitting {} for shard", sst->get_filename());
_compaction_manager.submit_sstable_rewrite(this, sst);
}
_sstables_need_rewrite.clear();
}
future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sstring fname) {
using namespace sstables;
@@ -505,24 +586,9 @@ future<sstables::entry_descriptor> column_family::probe_file(sstring sstdir, sst
}
}
auto sst = std::make_unique<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
auto fut = sst->get_sstable_key_range(*_schema);
return std::move(fut).then([this, sst = std::move(sst), sstdir = std::move(sstdir), comps] (range<partition_key> r) mutable {
// Checks whether or not sstable belongs to current shard.
if (!belongs_to_current_shard(*_schema, std::move(r))) {
dblog.debug("sstable {} not relevant for this shard, ignoring",
sstables::sstable::filename(sstdir, _schema->ks_name(), _schema->cf_name(), comps.version, comps.generation, comps.format,
sstables::sstable::component_type::Data));
sstable::mark_sstable_for_deletion(_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation, comps.version, comps.format);
return make_ready_future<>();
}
auto fut = sst->load();
return std::move(fut).then([this, sst = std::move(sst)] () mutable {
add_sstable(std::move(*sst));
return make_ready_future<>();
});
}).then_wrapped([fname, comps] (future<> f) {
return load_sstable(sstables::sstable(
_schema->ks_name(), _schema->cf_name(), sstdir, comps.generation,
comps.version, comps.format)).then_wrapped([fname, comps] (future<> f) {
try {
f.get();
} catch (malformed_sstable_exception& e) {
@@ -662,10 +728,6 @@ column_family::seal_active_memtable() {
auto old = _memtables->back();
dblog.debug("Sealing active memtable, partitions: {}, occupancy: {}", old->partition_count(), old->occupancy());
if (!_config.enable_disk_writes) {
return make_ready_future<>();
}
if (old->empty()) {
dblog.debug("Memtable is empty");
return make_ready_future<>();
@@ -766,6 +828,12 @@ column_family::start() {
future<>
column_family::stop() {
// Please note that in here, we shouldn't use the implicit seal function in each memtable's
// list. The reason is that for the streaming memtables, the memtable_list's seal function does
// not guarantee anything to be immediately flushed, and will set a timer instead (so we can
// coalesce writes). During stop, we need to force flushing behavior so we call
// seal_active_streaming_memtable() instead. That problem does not exist for memtables and we
// could call _memtables->seal_active_memtable() here. We don't, for consistency with streaming.
seal_active_memtable();
seal_active_streaming_memtable();
return _compaction_manager.remove(this).then([this] {
@@ -778,6 +846,51 @@ column_family::stop() {
});
}
future<std::vector<sstables::entry_descriptor>> column_family::flush_upload_dir() {
struct work {
sstable_list sstables;
std::unordered_map<int64_t, sstables::entry_descriptor> descriptors;
std::vector<sstables::entry_descriptor> flushed;
};
return do_with(work(), [this] (work& work) {
return lister::scan_dir(_config.datadir + "/upload/", { directory_entry_type::regular },
[this, &work] (directory_entry de) {
auto comps = sstables::entry_descriptor::make_descriptor(de.name);
if (comps.component != sstables::sstable::component_type::TOC) {
return make_ready_future<>();
}
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(),
_config.datadir + "/upload", comps.generation,
comps.version, comps.format);
work.sstables.emplace(comps.generation, std::move(sst));
work.descriptors.emplace(comps.generation, std::move(comps));
return make_ready_future<>();
}, &manifest_json_filter).then([this, &work] {
work.flushed.reserve(work.descriptors.size());
return do_for_each(work.sstables, [this, &work] (auto& pair) {
auto gen = this->calculate_generation_for_new_table();
auto& sst = pair.second;
auto&& comps = std::move(work.descriptors.at(pair.first));
comps.generation = gen;
work.flushed.push_back(std::move(comps));
// Read toc content as it will be needed for moving and deleting a sstable.
return sst->read_toc().then([&sst] {
return sst->mutate_sstable_level(0);
}).then([this, &sst, gen] {
return sst->create_links(_config.datadir, gen);
}).then([&sst] {
return sstables::remove_by_toc_name(sst->toc_filename());
});
});
}).then([&work] {
return make_ready_future<std::vector<sstables::entry_descriptor>>(std::move(work.flushed));
});
});
}
future<std::vector<sstables::entry_descriptor>>
column_family::reshuffle_sstables(std::set<int64_t> all_generations, int64_t start) {
@@ -905,6 +1018,12 @@ column_family::rebuild_sstable_list(const std::vector<sstables::shared_sstable>&
});
_sstables_compacted_but_not_deleted.erase(e, _sstables_compacted_but_not_deleted.end());
rebuild_statistics();
}).handle_exception([] (std::exception_ptr e) {
try {
std::rethrow_exception(e);
} catch (sstables::atomic_deletion_cancelled& adc) {
dblog.debug("Failed to delete sstables after compaction: {}", adc);
}
});
});
}
@@ -971,19 +1090,14 @@ future<> column_family::cleanup_sstables(sstables::compaction_descriptor descrip
future<>
column_family::load_new_sstables(std::vector<sstables::entry_descriptor> new_tables) {
return parallel_for_each(new_tables, [this] (auto comps) {
auto sst = make_lw_shared<sstables::sstable>(_schema->ks_name(), _schema->cf_name(), _config.datadir, comps.generation, comps.version, comps.format);
return sst->load().then([this, sst] {
return sst->mutate_sstable_level(0);
}).then([this, sst] {
auto first = sst->get_first_partition_key(*_schema);
auto last = sst->get_last_partition_key(*_schema);
if (belongs_to_current_shard(*_schema, first, last)) {
this->add_sstable(sst);
} else {
sst->mark_for_deletion();
}
return make_ready_future<>();
});
return this->load_sstable(sstables::sstable(
_schema->ks_name(), _schema->cf_name(), _config.datadir,
comps.generation, comps.version, comps.format), true);
}).then([this] {
start_rewrite();
// Drop entire cache for this column family because it may be populated
// with stale data.
return get_row_cache().clear();
});
}
@@ -1034,14 +1148,6 @@ void column_family::set_compaction_strategy(sstables::compaction_strategy_type s
_compaction_strategy = make_compaction_strategy(strategy, _schema->compaction_strategy_options());
}
bool column_family::compaction_manager_queued() const {
return _compaction_manager_queued;
}
void column_family::set_compaction_manager_queued(bool compaction_manager_queued) {
_compaction_manager_queued = compaction_manager_queued;
}
bool column_family::pending_compactions() const {
return _stats.pending_compactions > 0;
}
@@ -1113,7 +1219,12 @@ future<> column_family::populate(sstring sstdir) {
return do_with(std::vector<future<>>(), [this, sstdir, verifier, descriptor] (std::vector<future<>>& futures) {
return lister::scan_dir(sstdir, { directory_entry_type::regular }, [this, sstdir, verifier, descriptor, &futures] (directory_entry de) {
// FIXME: The secondary indexes are in this level, but with a directory type, (starting with ".")
auto f = probe_file(sstdir, de.name).then([verifier, descriptor] (auto entry) {
auto f = probe_file(sstdir, de.name).then([verifier, descriptor, sstdir] (auto entry) {
if (entry.component == sstables::sstable::component_type::TemporaryStatistics) {
return remove_file(sstables::sstable::filename(sstdir, entry.ks, entry.cf, entry.version, entry.generation,
entry.format, sstables::sstable::component_type::TemporaryStatistics));
}
if (verifier->count(entry.generation)) {
if (verifier->at(entry.generation) == status::has_toc_file) {
if (entry.component == sstables::sstable::component_type::TOC) {
@@ -1143,6 +1254,7 @@ future<> column_family::populate(sstring sstdir) {
if (!descriptor->format) {
descriptor->format = entry.format;
}
return make_ready_future<>();
});
// push future returned by probe_file into an array of futures,
@@ -1205,8 +1317,7 @@ database::database() : database(db::config())
{}
database::database(const db::config& cfg)
: _streaming_dirty_memory_region_group(&_dirty_memory_region_group)
, _cfg(std::make_unique<db::config>(cfg))
: _cfg(std::make_unique<db::config>(cfg))
, _memtable_total_space([this] {
_stats = make_lw_shared<db_stats>();
@@ -1217,6 +1328,7 @@ database::database(const db::config& cfg)
return memtable_total_space;
}())
, _streaming_memtable_total_space(_memtable_total_space / 4)
, _streaming_dirty_memory_region_group(&_dirty_memory_region_group)
, _version(empty_version)
, _enable_incremental_backups(cfg.incremental_backups())
, _memtables_throttler(_memtable_total_space, _dirty_memory_region_group)
@@ -1225,8 +1337,7 @@ database::database(const db::config& cfg)
&_memtables_throttler
)
{
// Start compaction manager with two tasks for handling compaction jobs.
_compaction_manager.start(2);
_compaction_manager.start();
setup_collectd();
dblog.info("Row: max_vector_size: {}, internal_count: {}", size_t(row::max_vector_size), size_t(row::internal_count));
@@ -1269,6 +1380,38 @@ database::setup_collectd() {
, "total_operations", "total_reads")
, scollectd::make_typed(scollectd::data_type::DERIVE, _stats->total_reads)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "total_operations", "sstable_read_queue_overloads")
, scollectd::make_typed(scollectd::data_type::COUNTER, _stats->sstable_read_queue_overloaded)
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "queue_length", "active_reads")
, scollectd::make_typed(scollectd::data_type::GAUGE, [this] { return max_concurrent_reads() - _read_concurrency_sem.current(); })
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "queue_length", "queued_reads")
, scollectd::make_typed(scollectd::data_type::GAUGE, [this] { return _read_concurrency_sem.waiters(); })
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "queue_length", "active_reads_system_keyspace")
, scollectd::make_typed(scollectd::data_type::GAUGE, [this] { return max_system_concurrent_reads() - _system_read_concurrency_sem.current(); })
));
_collectd.push_back(
scollectd::add_polled_metric(scollectd::type_instance_id("database"
, scollectd::per_cpu_plugin_instance
, "queue_length", "queued_reads_system_keyspace")
, scollectd::make_typed(scollectd::data_type::GAUGE, [this] { return _system_read_concurrency_sem.waiters(); })
));
}
database::~database() {
@@ -1287,48 +1430,29 @@ future<> database::populate_keyspace(sstring datadir, sstring ks_name) {
auto i = _keyspaces.find(ks_name);
if (i == _keyspaces.end()) {
dblog.warn("Skipping undefined keyspace: {}", ks_name);
return make_ready_future<>();
} else {
dblog.info("Populating Keyspace {}", ks_name);
return lister::scan_dir(ksdir, { directory_entry_type::directory }, [this, ksdir, ks_name] (directory_entry de) {
auto comps = parse_fname(de.name);
if (comps.size() < 2) {
dblog.error("Keyspace {}: Skipping malformed CF {} ", ksdir, de.name);
return make_ready_future<>();
}
sstring cfname = comps[0];
sstring uuidst = comps[1];
try {
auto&& uuid = [&] {
try {
return find_uuid(ks_name, cfname);
} catch (const std::out_of_range& e) {
std::throw_with_nested(no_such_column_family(ks_name, cfname));
}
}();
auto& cf = find_column_family(uuid);
// #870: Check that the directory name matches
// the current, expected UUID of the CF.
if (utils::UUID(uuidst) == uuid) {
// FIXME: Increase parallelism.
auto sstdir = ksdir + "/" + de.name;
dblog.info("Keyspace {}: Reading CF {} ", ksdir, cfname);
return cf.populate(sstdir);
}
// Nope. Warn and ignore.
dblog.info("Keyspace {}: Skipping obsolete version of CF {} ({})", ksdir, cfname, uuidst);
} catch (marshal_exception&) {
// Bogus UUID part of directory name
dblog.warn("{}, CF {}: malformed UUID: {}. Ignoring", ksdir, comps[0], uuidst);
} catch (no_such_column_family&) {
dblog.warn("{}, CF {}: schema not loaded!", ksdir, comps[0]);
}
return make_ready_future<>();
});
auto& ks = i->second;
return parallel_for_each(ks.metadata()->cf_meta_data() | boost::adaptors::map_values,
[ks_name, &ks, this] (schema_ptr s) {
utils::UUID uuid = s->id();
lw_shared_ptr<column_family> cf = _column_families[uuid];
sstring cfname = cf->schema()->cf_name();
auto sstdir = ks.column_family_directory(cfname, uuid);
dblog.info("Keyspace {}: Reading CF {} ", ks_name, cfname);
return ks.make_directory_for_column_family(cfname, uuid).then([cf, sstdir] {
return cf->populate(sstdir);
}).handle_exception([ks_name, cfname, sstdir](std::exception_ptr eptr) {
std::string msg =
sprint("Exception while populating keyspace '%s' with column family '%s' from file '%s': %s",
ks_name, cfname, sstdir, eptr);
dblog.error("Exception while populating keyspace '{}' with column family '{}' from file '{}': {}",
ks_name, cfname, sstdir, eptr);
throw std::runtime_error(msg.c_str());
});
});
}
return make_ready_future<>();
}
future<> database::populate(sstring datadir) {
@@ -1476,8 +1600,17 @@ void database::add_keyspace(sstring name, keyspace k) {
_keyspaces.emplace(std::move(name), std::move(k));
}
void database::update_keyspace(const sstring& name) {
throw std::runtime_error("update keyspace not implemented");
future<> database::update_keyspace(const sstring& name) {
auto& proxy = service::get_storage_proxy();
return db::schema_tables::read_schema_partition_for_keyspace(proxy, db::schema_tables::KEYSPACES, name).then([this, name](db::schema_tables::schema_result_value_type&& v) {
auto& ks = find_keyspace(name);
auto tmp_ksm = db::schema_tables::create_keyspace_from_schema_partition(v);
auto new_ksm = ::make_lw_shared<keyspace_metadata>(tmp_ksm->name(), tmp_ksm->strategy_name(), tmp_ksm->strategy_options(), tmp_ksm->durable_writes(),
boost::copy_range<std::vector<schema_ptr>>(ks.metadata()->cf_meta_data() | boost::adaptors::map_values), ks.metadata()->user_types());
ks.update_from(std::move(new_ksm));
return service::get_local_migration_manager().notify_update_keyspace(ks.metadata());
});
}
void database::drop_keyspace(const sstring& name) {
@@ -1631,6 +1764,11 @@ keyspace::set_replication_strategy(std::unique_ptr<locator::abstract_replication
_replication_strategy = std::move(replication_strategy);
}
void keyspace::update_from(::lw_shared_ptr<keyspace_metadata> ksm) {
_metadata = std::move(ksm);
create_replication_strategy(_metadata->strategy_options());
}
column_family::config
keyspace::make_column_family_config(const schema& s) const {
column_family::config cfg;
@@ -1643,6 +1781,7 @@ keyspace::make_column_family_config(const schema& s) const {
cfg.max_streaming_memtable_size = _config.max_streaming_memtable_size;
cfg.dirty_memory_region_group = _config.dirty_memory_region_group;
cfg.streaming_dirty_memory_region_group = _config.streaming_dirty_memory_region_group;
cfg.read_concurrency_config = _config.read_concurrency_config;
cfg.cf_stats = _config.cf_stats;
cfg.enable_incremental_backups = _config.enable_incremental_backups;
@@ -1658,7 +1797,11 @@ keyspace::column_family_directory(const sstring& name, utils::UUID uuid) const {
future<>
keyspace::make_directory_for_column_family(const sstring& name, utils::UUID uuid) {
return io_check(touch_directory, column_family_directory(name, uuid));
auto cfdir = column_family_directory(name, uuid);
return seastar::async([cfdir = std::move(cfdir)] {
io_check(touch_directory, cfdir).get();
io_check(touch_directory, cfdir + "/upload").get();
});
}
no_such_keyspace::no_such_keyspace(const sstring& ks_name)
@@ -1821,6 +1964,7 @@ column_family::query(schema_ptr s, const query::read_command& cmd, query::result
auto add_partition = [&qs] (uint32_t live_rows, mutation&& m) {
auto pb = qs.builder.add_partition(*qs.schema, m.key());
m.partition().query_compacted(pb, *qs.schema, live_rows);
qs.limit -= live_rows;
};
return do_with(querying_reader(qs.schema, as_mutation_source(), range, qs.cmd.slice, qs.limit, qs.cmd.timestamp, add_partition),
[] (auto&& rd) { return rd.read(); });
@@ -1828,18 +1972,21 @@ column_family::query(schema_ptr s, const query::read_command& cmd, query::result
return make_ready_future<lw_shared_ptr<query::result>>(
make_lw_shared<query::result>(qs.builder.build()));
}).finally([lc, this]() mutable {
_stats.reads.mark(lc);
if (lc.is_start()) {
_stats.estimated_read.add(lc.latency(), _stats.reads.count);
}
_stats.reads.mark(lc);
if (lc.is_start()) {
_stats.estimated_read.add(lc.latency(), _stats.reads.hist.count);
}
});
}
}
mutation_source
column_family::as_mutation_source() const {
return mutation_source([this] (schema_ptr s, const query::partition_range& range, const io_priority_class& pc) {
return this->make_reader(std::move(s), range, pc);
return mutation_source([this] (schema_ptr s,
const query::partition_range& range,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) {
return this->make_reader(std::move(s), range, ck_filtering, pc);
});
}
@@ -1865,7 +2012,7 @@ std::unordered_set<sstring> database::get_initial_tokens() {
std::unordered_set<sstring> tokens;
sstring tokens_string = get_config().initial_token();
try {
boost::split(tokens, tokens_string, boost::is_any_of(sstring(",")));
boost::split(tokens, tokens_string, boost::is_any_of(sstring(", ")));
} catch (...) {
throw std::runtime_error(sprint("Unable to parse initial_token=%s", tokens_string));
}
@@ -1931,7 +2078,7 @@ column_family::apply(const mutation& m, const db::replay_position& rp) {
_memtables->seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
_stats.estimated_write.add(lc.latency(), _stats.writes.hist.count);
}
}
@@ -1944,7 +2091,7 @@ column_family::apply(const frozen_mutation& m, const schema_ptr& m_schema, const
_memtables->seal_on_overflow();
_stats.writes.mark(lc);
if (lc.is_start()) {
_stats.estimated_write.add(lc.latency(), _stats.writes.count);
_stats.estimated_write.add(lc.latency(), _stats.writes.hist.count);
}
}
@@ -2095,6 +2242,14 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
}
cfg.dirty_memory_region_group = &_dirty_memory_region_group;
cfg.streaming_dirty_memory_region_group = &_streaming_dirty_memory_region_group;
cfg.read_concurrency_config.sem = &_read_concurrency_sem;
cfg.read_concurrency_config.timeout = _cfg->read_request_timeout_in_ms() * 1ms;
// Assume a queued read takes up 10kB of memory, and allow 2% of memory to be filled up with such reads.
cfg.read_concurrency_config.max_queue_length = memory::stats().total_memory() * 0.02 / 10000;
cfg.read_concurrency_config.raise_queue_overloaded_exception = [this] {
++_stats->sstable_read_queue_overloaded;
throw std::runtime_error("sstable inactive read queue overloaded");
};
cfg.cf_stats = &_cf_stats;
cfg.enable_incremental_backups = _enable_incremental_backups;
return cfg;
@@ -2186,7 +2341,7 @@ future<> database::truncate(const keyspace& ks, column_family& cf, timestamp_fun
// gotten all things to disk. Again, need queue-ish or something.
f = cf.flush();
} else {
cf.clear();
f = cf.clear();
}
return cf.run_with_compaction_disabled([f = std::move(f), &cf, auto_snapshot, tsf = std::move(tsf)]() mutable {
@@ -2522,7 +2677,7 @@ future<> column_family::flush() {
// FIXME: this will synchronously wait for this write to finish, but doesn't guarantee
// anything about previous writes.
_stats.pending_flushes++;
return seal_active_memtable().finally([this]() mutable {
return _memtables->seal_active_memtable().finally([this]() mutable {
_stats.pending_flushes--;
// In origin memtable_switch_count is incremented inside
// ColumnFamilyMetrics Flush.run
@@ -2544,7 +2699,7 @@ future<> column_family::flush(const db::replay_position& pos) {
// We ignore this for now and just say that if we're asked for
// a CF and it exists, we pretty much have to have data that needs
// flushing. Let's do it.
return seal_active_memtable();
return _memtables->seal_active_memtable();
}
// FIXME: We can do much better than this in terms of cache management. Right
@@ -2561,29 +2716,36 @@ future<> column_family::flush_streaming_mutations(std::vector<query::partition_r
// need this code to go away as soon as we can (see FIXME above). So the double gate is a better
// temporary counter measure.
return with_gate(_streaming_flush_gate, [this, ranges = std::move(ranges)] {
return seal_active_streaming_memtable_delayed().finally([this, ranges = std::move(ranges)] {
if (_config.enable_cache) {
for (auto& range : ranges) {
_cache.invalidate(range);
}
return _streaming_memtables->seal_active_memtable().finally([this, ranges = std::move(ranges)] {
if (!_config.enable_cache) {
return make_ready_future<>();
}
return do_with(std::move(ranges), [this] (auto& ranges) {
return parallel_for_each(ranges, [this](auto&& range) {
return _cache.invalidate(range);
});
});
return do_with(std::move(ranges), [this] (auto& ranges) {
return parallel_for_each(ranges, [this](auto&& range) {
return _cache.invalidate(range);
});
});
});
});
}
void column_family::clear() {
_cache.clear();
future<> column_family::clear() {
_memtables->clear();
_memtables->add_memtable();
_streaming_memtables->clear();
_streaming_memtables->add_memtable();
return _cache.clear();
}
// NOTE: does not need to be futurized, but might eventually, depending on
// if we implement notifications, whatnot.
future<db::replay_position> column_family::discard_sstables(db_clock::time_point truncated_at) {
assert(_compaction_disabled > 0);
assert(!compaction_manager_queued());
return with_lock(_sstables_lock.for_read(), [this, truncated_at] {
db::replay_position rp;
@@ -2603,13 +2765,13 @@ future<db::replay_position> column_family::discard_sstables(db_clock::time_point
_sstables = std::move(pruned);
dblog.debug("cleaning out row cache");
_cache.clear();
return parallel_for_each(remove, [](sstables::shared_sstable s) {
return sstables::delete_atomically({s});
}).then([rp] {
return make_ready_future<db::replay_position>(rp);
}).finally([remove] {}); // keep the objects alive until here.
return _cache.clear().then([rp, remove = std::move(remove)] () mutable {
return parallel_for_each(remove, [](sstables::shared_sstable s) {
return sstables::delete_atomically({s});
}).then([rp] {
return make_ready_future<db::replay_position>(rp);
}).finally([remove] {}); // keep the objects alive until here.
});
});
}
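`column_family::flush_upload_dir()` (implemented above) moves every SSTable found in `<datadir>/upload/` into the column family directory under a freshly allocated generation. It hard-links each component first and then removes the originals. The sketch below shows that link-then-remove sequence, synchronously, with `std::filesystem` (C++17). It rests on several assumptions. `uploaded_sstable`, `component_name()` and `flush_upload_dir_sketch()` are hypothetical names. The `<generation>-<component>` file naming is invented and does not match real SSTable filenames. The step that resets the SSTable level to 0 is left out, because it rewrites SSTable metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical: one uploaded SSTable is a generation plus its component files.
struct uploaded_sstable {
    int64_t generation;
    std::vector<std::string> components; // e.g. "Data.db", "TOC.txt"
};

// Hypothetical naming scheme "<generation>-<component>" (not the real format).
std::string component_name(int64_t gen, const std::string& comp) {
    return std::to_string(gen) + "-" + comp;
}

// Moves each SSTable from cf_dir/upload into cf_dir, assigning new generations
// starting at next_gen. Returns the new generations in processing order.
std::vector<int64_t> flush_upload_dir_sketch(const fs::path& cf_dir,
                                             const std::vector<uploaded_sstable>& uploaded,
                                             int64_t next_gen) {
    std::vector<int64_t> flushed;
    flushed.reserve(uploaded.size());
    for (const auto& sst : uploaded) {
        const int64_t gen = next_gen++;
        // Link every component into the column family directory first...
        for (const auto& comp : sst.components) {
            fs::create_hard_link(cf_dir / "upload" / component_name(sst.generation, comp),
                                 cf_dir / component_name(gen, comp));
        }
        // ...and only then remove the originals from upload/.
        for (const auto& comp : sst.components) {
            fs::remove(cf_dir / "upload" / component_name(sst.generation, comp));
        }
        flushed.push_back(gen);
    }
    return flushed;
}
```

Linking before removing means that if the process stops midway, every component still exists in at least one of the two directories.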

@@ -249,6 +249,7 @@ public:
size_t max_streaming_memtable_size = 5'000'000;
logalloc::region_group* dirty_memory_region_group = nullptr;
logalloc::region_group* streaming_dirty_memory_region_group = nullptr;
restricted_mutation_reader_config read_concurrency_config;
::cf_stats* cf_stats = nullptr;
};
struct no_commitlog {};
@@ -262,13 +263,13 @@ public:
int64_t live_sstable_count = 0;
/** Estimated number of compactions pending for this column family */
int64_t pending_compactions = 0;
utils::ihistogram reads{256};
utils::ihistogram writes{256};
utils::timed_rate_moving_average_and_histogram reads{256};
utils::timed_rate_moving_average_and_histogram writes{256};
sstables::estimated_histogram estimated_read;
sstables::estimated_histogram estimated_write;
sstables::estimated_histogram estimated_sstable_per_read;
utils::ihistogram tombstone_scanned;
utils::ihistogram live_scanned;
utils::timed_rate_moving_average_and_histogram tombstone_scanned;
utils::timed_rate_moving_average_and_histogram live_scanned;
};
struct snapshot_details {
@@ -300,6 +301,7 @@ private:
// server.
lw_shared_ptr<memtable_list> _streaming_memtables;
lw_shared_ptr<memtable_list> make_memory_only_memtable_list();
lw_shared_ptr<memtable_list> make_memtable_list();
lw_shared_ptr<memtable_list> make_streaming_memtable_list();
@@ -309,6 +311,11 @@ private:
// have not been deleted yet, so must not GC any tombstones in other sstables
// that may delete data in these sstables:
std::vector<sstables::shared_sstable> _sstables_compacted_but_not_deleted;
// sstables that are shared between several shards so we want to rewrite
// them (split the data belonging to this shard to a separate sstable),
// but for correct compaction we need to start the compaction only after
// reading all sstables.
std::vector<sstables::shared_sstable> _sstables_need_rewrite;
// Control background fibers waiting for sstables to be deleted
seastar::gate _sstable_deletion_gate;
// There are situations in which we need to stop writing sstables. Flushers will take
@@ -322,8 +329,6 @@ private:
db::commitlog* _commitlog;
sstables::compaction_strategy _compaction_strategy;
compaction_manager& _compaction_manager;
// Whether or not a cf is queued by its compaction manager.
bool _compaction_manager_queued = false;
int _compaction_disabled = 0;
class memtable_flush_queue;
std::unique_ptr<memtable_flush_queue> _flush_queue;
@@ -339,6 +344,7 @@ private:
void update_stats_for_new_sstable(uint64_t disk_space_used_by_sstable);
void add_sstable(sstables::sstable&& sstable);
void add_sstable(lw_shared_ptr<sstables::sstable> sstable);
future<> load_sstable(sstables::sstable&& sstab, bool reset_level = false);
lw_shared_ptr<memtable> new_memtable();
lw_shared_ptr<memtable> new_streaming_memtable();
future<stop_iteration> try_flush_memtable_to_sstable(lw_shared_ptr<memtable> memt);
@@ -369,7 +375,10 @@ private:
// Caller needs to ensure that column_family remains live (FIXME: relax this).
// The 'range' parameter must be live as long as the reader is used.
// Mutations returned by the reader will all have given schema.
mutation_reader make_sstable_reader(schema_ptr schema, const query::partition_range& range, const io_priority_class& pc) const;
mutation_reader make_sstable_reader(schema_ptr schema,
const query::partition_range& range,
query::clustering_key_filtering_context ck_filtering,
const io_priority_class& pc) const;
mutation_source sstables_as_mutation_source();
key_source sstables_as_key_source() const;
@@ -407,6 +416,7 @@ public:
// will be scheduled under the priority class given by pc.
mutation_reader make_reader(schema_ptr schema,
const query::partition_range& range = query::full_partition_range,
const query::clustering_key_filtering_context& ck_filtering = query::no_clustering_key_filtering,
const io_priority_class& pc = default_priority_class()) const;
mutation_source as_mutation_source() const;
@@ -427,9 +437,13 @@ public:
}
logalloc::occupancy_stats occupancy() const;
private:
column_family(schema_ptr schema, config cfg, db::commitlog* cl, compaction_manager&);
public:
column_family(schema_ptr schema, config cfg, db::commitlog& cl, compaction_manager&);
column_family(schema_ptr schema, config cfg, no_commitlog, compaction_manager&);
column_family(schema_ptr schema, config cfg, db::commitlog& cl, compaction_manager& cm)
: column_family(schema, std::move(cfg), &cl, cm) {}
column_family(schema_ptr schema, config cfg, no_commitlog, compaction_manager& cm)
: column_family(schema, std::move(cfg), nullptr, cm) {}
column_family(column_family&&) = delete; // 'this' is being captured during construction
~column_family();
const schema_ptr& schema() const { return _schema; }
@@ -456,7 +470,7 @@ public:
future<> flush();
future<> flush(const db::replay_position&);
future<> flush_streaming_mutations(std::vector<query::partition_range> ranges = std::vector<query::partition_range>{});
void clear(); // discards memtable(s) without flushing them to disk.
future<> clear(); // discards memtable(s) without flushing them to disk.
future<db::replay_position> discard_sstables(db_clock::time_point);
// Important warning: disabling writes will only have an effect in the current shard.
@@ -483,6 +497,17 @@ public:
return std::chrono::steady_clock::now() - _sstable_writes_disabled_at;
}
// This function will iterate through upload directory in column family,
// and will do the following for each sstable found:
// 1) Mutate sstable level to 0.
// 2) Create hard links to its components in column family dir.
// 3) Remove all of its components in upload directory.
// At the end, it's expected that upload dir is empty and all of its
// previous content was moved to column family dir.
//
// Return a vector containing descriptor of sstables to be loaded.
future<std::vector<sstables::entry_descriptor>> flush_upload_dir();
// Make sure the generation numbers are sequential, starting from "start".
// Generations before "start" are left untouched.
//
@@ -546,8 +571,6 @@ public:
return _compaction_strategy;
}
bool compaction_manager_queued() const;
void set_compaction_manager_queued(bool compaction_manager_queued);
bool pending_compactions() const;
const stats& get_stats() const {
@@ -618,6 +641,7 @@ private:
future<sstables::entry_descriptor> probe_file(sstring sstdir, sstring fname);
void check_valid_rp(const db::replay_position&) const;
public:
void start_rewrite();
// Iterate over all partitions. Protocol is the same as std::all_of(),
// so that iteration can be stopped by returning false.
future<bool> for_all_partitions_slow(schema_ptr, std::function<bool (const dht::decorated_key&, const mutation_partition&)> func) const;
@@ -727,6 +751,7 @@ public:
size_t max_streaming_memtable_size = 5'000'000;
logalloc::region_group* dirty_memory_region_group = nullptr;
logalloc::region_group* streaming_dirty_memory_region_group = nullptr;
restricted_mutation_reader_config read_concurrency_config;
::cf_stats* cf_stats = nullptr;
};
private:
@@ -738,10 +763,24 @@ public:
: _metadata(std::move(metadata))
, _config(std::move(cfg))
{}
const lw_shared_ptr<keyspace_metadata>& metadata() const {
void update_from(lw_shared_ptr<keyspace_metadata>);
/** Note: return by shared pointer value, since the meta data is
* semi-volatile. I.e. we could do alter keyspace at any time, and
* boom, it is replaced.
*/
lw_shared_ptr<keyspace_metadata> metadata() const {
return _metadata;
}
void create_replication_strategy(const std::map<sstring, sstring>& options);
/**
* This should not really be returned by reference, since replication
* strategy is also volatile in that it could be replaced at "any" time.
* However, all current uses at least are "instantaneous", i.e. they do not
* carry it across a continuation. So it is sort of safe for now, but
* should eventually be refactored.
*/
locator::abstract_replication_strategy& get_replication_strategy();
const locator::abstract_replication_strategy& get_replication_strategy() const;
column_family::config make_column_family_config(const schema& s) const;
@@ -770,7 +809,7 @@ public:
const sstring& datadir() const {
return _config.datadir;
}
private:
sstring column_family_directory(const sstring& name, utils::UUID uuid) const;
};
@@ -792,22 +831,30 @@ public:
class database {
::cf_stats _cf_stats;
static constexpr size_t max_concurrent_reads() { return 100; }
static constexpr size_t max_system_concurrent_reads() { return 10; }
struct db_stats {
uint64_t total_writes = 0;
uint64_t total_reads = 0;
uint64_t sstable_read_queue_overloaded = 0;
};
lw_shared_ptr<db_stats> _stats;
std::unique_ptr<db::config> _cfg;
size_t _memtable_total_space = 500 << 20;
size_t _streaming_memtable_total_space = 500 << 20;
logalloc::region_group _dirty_memory_region_group;
logalloc::region_group _streaming_dirty_memory_region_group;
semaphore _read_concurrency_sem{max_concurrent_reads()};
restricted_mutation_reader_config _read_concurrency_config;
semaphore _system_read_concurrency_sem{max_system_concurrent_reads()};
restricted_mutation_reader_config _system_read_concurrency_config;
std::unordered_map<sstring, keyspace> _keyspaces;
std::unordered_map<utils::UUID, lw_shared_ptr<column_family>> _column_families;
std::unordered_map<std::pair<sstring, sstring>, utils::UUID, utils::tuple_hash> _ks_cf_to_uuid;
std::unique_ptr<db::commitlog> _commitlog;
std::unique_ptr<db::config> _cfg;
size_t _memtable_total_space = 500 << 20;
size_t _streaming_memtable_total_space = 500 << 20;
utils::UUID _version;
// compaction_manager object is referenced by all column families of a database.
compaction_manager _compaction_manager;
@@ -876,7 +923,7 @@ public:
keyspace& find_keyspace(const sstring& name);
const keyspace& find_keyspace(const sstring& name) const;
bool has_keyspace(const sstring& name) const;
void update_keyspace(const sstring& name);
future<> update_keyspace(const sstring& name);
void drop_keyspace(const sstring& name);
const auto& keyspaces() const { return _keyspaces; }
std::vector<sstring> get_non_system_keyspaces() const;
@@ -948,6 +995,9 @@ public:
std::unordered_set<sstring> get_initial_tokens();
std::experimental::optional<gms::inet_address> get_replace_address();
bool is_replacing();
semaphore& system_keyspace_read_concurrency_sem() {
return _system_read_concurrency_sem;
}
};
// FIXME: stub
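The header adds two semaphores: `_read_concurrency_sem` with `max_concurrent_reads()` = 100 units and `_system_read_concurrency_sem` with 10 units. It also adds a `sstable_read_queue_overloaded` counter. In `database.cc`, `make_keyspace_config()` caps the read queue at `total_memory * 0.02 / 10000` entries, so 1 GiB of memory allows 2147 queued reads. The sketch below is a hypothetical, single-threaded model, not `seastar::semaphore` or `restricted_mutation_reader`. It shows how the exported gauges relate to one another. `active_reads` is the capacity minus the available units. `queued_reads` is the number of waiters. A read that arrives when the queue is full increments the overload counter and throws. The exact order of the queue check in the real reader is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Same arithmetic as make_keyspace_config(): ~10 kB per queued read, 2% of memory.
std::size_t max_queue_length(std::uint64_t total_memory) {
    return static_cast<std::size_t>(total_memory * 0.02 / 10000);
}

// Hypothetical model of read admission; not the Scylla/Seastar implementation.
class read_admission_model {
    std::size_t _capacity;    // like max_concurrent_reads()
    std::size_t _available;   // like semaphore::current()
    std::size_t _waiters = 0; // like semaphore::waiters()
    std::size_t _max_queue;
public:
    std::uint64_t overloaded = 0; // like db_stats::sstable_read_queue_overloaded

    read_admission_model(std::size_t capacity, std::size_t max_queue)
        : _capacity(capacity), _available(capacity), _max_queue(max_queue) {}

    // Returns true if the read may run now, false if it was queued.
    // Throws when the queue is already full.
    bool admit() {
        if (_available > 0) {
            --_available;
            return true;
        }
        if (_waiters >= _max_queue) {
            ++overloaded;
            throw std::runtime_error("sstable inactive read queue overloaded");
        }
        ++_waiters;
        return false;
    }

    // A finished read hands its unit to a waiter if there is one.
    void release() {
        if (_waiters > 0) {
            --_waiters;
        } else {
            ++_available;
        }
    }

    std::size_t active_reads() const { return _capacity - _available; }
    std::size_t queued_reads() const { return _waiters; }
};
```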

@@ -324,6 +324,7 @@ public:
val(sstable_preemptive_open_interval_in_mb, uint32_t, 50, Unused, \
"When compacting, the replacement opens SSTables before they are completely written and uses in place of the prior SSTables for any range previously written. This setting helps to smoothly transfer reads between the SSTables by reducing page cache churn and keeps hot rows hot." \
) \
val(defragment_memory_on_idle, bool, true, Used, "Set to true to defragment memory when the cpu is idle. This reduces the amount of work Scylla performs when processing client requests.") \
/* Memtable settings */ \
val(memtable_allocation_type, sstring, "heap_buffers", Invalid, \
"Specify the way Cassandra allocates and manages memtable memory. See Off-heap memtables in Cassandra 2.1. Options are:\n" \
@@ -712,17 +713,17 @@ public:
val(api_ui_dir, sstring, "swagger-ui/dist/", Used, "The directory location of the API GUI") \
val(api_doc_dir, sstring, "api/api-doc/", Used, "The API definition file directory") \
val(load_balance, sstring, "none", Used, "CQL request load balancing: 'none' or 'round-robin'") \
val(consistent_rangemovement, bool, true, Used, "When set to true, range movements will be consistent. It means: 1) it will refuse to bootstrapp a new node if other bootstrapping/leaving/moving nodes detected. 2) data will be streamed to a new node only from the node which is no longer responsible for the token range. Same as -Dcassandra.consistent.rangemovement in cassandra") \
val(consistent_rangemovement, bool, true, Used, "When set to true, range movements will be consistent. It means: 1) it will refuse to bootstrap a new node if other bootstrapping/leaving/moving nodes detected. 2) data will be streamed to a new node only from the node which is no longer responsible for the token range. Same as -Dcassandra.consistent.rangemovement in cassandra") \
val(join_ring, bool, true, Used, "When set to true, a node will join the token ring. When set to false, a node will not join the token ring. Users can use nodetool join to initiate ring joining later. Same as -Dcassandra.join_ring in cassandra.") \
val(load_ring_state, bool, true, Used, "When set to true, load tokens and host_ids previously saved. Same as -Dcassandra.load_ring_state in cassandra.") \
val(replace_node, sstring, "", Used, "The UUID of the node to replace. Same as -Dcassandra.replace_node in cassandra.") \
val(replace_token, sstring, "", Used, "The tokens of the node to replace. Same as -Dcassandra.replace_token in cassandra.") \
val(replace_address, sstring, "", Used, "The listen_address or broadcast_address of the dead node to replace. Same as -Dcassandra.replace_address.") \
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped sucessfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(replace_address_first_boot, sstring, "", Used, "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.") \
val(override_decommission, bool, false, Used, "Set true to force a decommissioned node to join the cluster") \
val(ring_delay_ms, uint32_t, 30 * 1000, Used, "Time a node waits to hear from other nodes before joining the ring in milliseconds. Same as -Dcassandra.ring_delay_ms in cassandra.") \
val(shutdown_announce_in_ms, uint32_t, 2 * 1000, Used, "Time a node waits after sending gossip shutdown message in milliseconds. Same as -Dcassandra.shutdown_announce_in_ms in cassandra.") \
val(developer_mode, bool, false, Used, "Relax environement checks. Setting to true can reduce performance and reliability significantly.") \
val(developer_mode, bool, false, Used, "Relax environment checks. Setting to true can reduce performance and reliability significantly.") \
val(skip_wait_for_gossip_to_settle, int32_t, -1, Used, "An integer to configure the wait for gossip to settle. -1: wait normally, 0: do not wait at all, n: wait for at most n polls. Same as -Dcassandra.skip_wait_for_gossip_to_settle in cassandra.") \
val(experimental, bool, false, Used, "Set to true to unlock experimental features.") \
/* done! */

@@ -674,17 +674,19 @@ future<std::set<sstring>> merge_keyspaces(distributed<service::storage_proxy>& p
for (auto&& key : diff.entries_differing) {
altered.emplace_back(key);
}
return do_with(std::move(created), [&proxy, altered = std::move(altered)] (auto& created) {
return proxy.local().get_db().invoke_on_all([&created, altered = std::move(altered)] (database& db) {
return do_for_each(created, [&db](auto&& val) {
auto ksm = create_keyspace_from_schema_partition(val);
return db.create_keyspace(ksm).then([ksm] {
return service::get_local_migration_manager().notify_create_keyspace(ksm);
return do_with(std::move(created), [&proxy, altered = std::move(altered)] (auto& created) mutable {
return do_with(std::move(altered), [&proxy, &created](auto& altered) {
return proxy.local().get_db().invoke_on_all([&created, &altered] (database& db) {
return do_for_each(created, [&db](auto&& val) {
auto ksm = create_keyspace_from_schema_partition(val);
return db.create_keyspace(ksm).then([ksm] {
return service::get_local_migration_manager().notify_create_keyspace(ksm);
});
}).then([&altered, &db]() {
return do_for_each(altered, [&db](auto& name) {
return db.update_keyspace(name);
});
});
}).then([&altered, &db] () mutable {
for (auto&& name : altered) {
db.update_keyspace(name);
}
});
});
}).then([dropped = std::move(dropped)] () {
@@ -709,15 +711,21 @@ static void merge_tables(distributed<service::storage_proxy>& proxy,
std::map<qualified_name, schema_mutations>&& before,
std::map<qualified_name, schema_mutations>&& after)
{
struct dropped_table {
global_schema_ptr schema;
utils::joinpoint<db_clock::time_point> jp{[] {
return make_ready_future<db_clock::time_point>(db_clock::now());
}};
};
std::vector<global_schema_ptr> created;
std::vector<global_schema_ptr> altered;
std::vector<global_schema_ptr> dropped;
std::vector<dropped_table> dropped;
auto diff = difference(before, after);
for (auto&& key : diff.entries_only_on_left) {
auto&& s = proxy.local().get_db().local().find_schema(key.keyspace_name, key.table_name);
logger.info("Dropping {}.{} id={} version={}", s->ks_name(), s->cf_name(), s->id(), s->version());
dropped.emplace_back(s);
dropped.emplace_back(dropped_table{s});
}
for (auto&& key : diff.entries_only_on_right) {
auto s = create_table_from_mutations(after.at(key));
@@ -730,9 +738,7 @@ static void merge_tables(distributed<service::storage_proxy>& proxy,
altered.emplace_back(s);
}
do_with(utils::make_joinpoint([] { return db_clock::now();})
, [&created, &dropped, &altered, &proxy](auto& tsf) {
return proxy.local().get_db().invoke_on_all([&created, &dropped, &altered, &tsf] (database& db) {
proxy.local().get_db().invoke_on_all([&created, &dropped, &altered] (database& db) {
return seastar::async([&] {
for (auto&& gs : created) {
schema_ptr s = gs.get();
@@ -747,14 +753,13 @@ static void merge_tables(distributed<service::storage_proxy>& proxy,
for (auto&& gs : altered) {
update_column_family(db, gs.get()).get();
}
parallel_for_each(dropped.begin(), dropped.end(), [&db, &tsf](auto&& gs) {
schema_ptr s = gs.get();
return db.drop_column_family(s->ks_name(), s->cf_name(), [&tsf] { return tsf.value(); }).then([s] {
parallel_for_each(dropped.begin(), dropped.end(), [&db](dropped_table& dt) {
schema_ptr s = dt.schema.get();
return db.drop_column_family(s->ks_name(), s->cf_name(), [&dt] { return dt.jp.value(); }).then([s] {
return service::get_local_migration_manager().notify_drop_column_family(s);
});
}).get();
});
});
}).get();
}

@@ -1022,6 +1022,10 @@ void make(database& db, bool durable, bool volatile_testing_only) {
kscfg.enable_disk_writes = !volatile_testing_only;
kscfg.enable_commitlog = !volatile_testing_only;
kscfg.enable_cache = true;
// don't make system keyspace reads wait for user reads
kscfg.read_concurrency_config.sem = &db.system_keyspace_read_concurrency_sem();
kscfg.read_concurrency_config.timeout = {};
kscfg.read_concurrency_config.max_queue_length = std::numeric_limits<size_t>::max();
keyspace _ks{ksm, std::move(kscfg)};
auto rs(locator::abstract_replication_strategy::create_replication_strategy(NAME, "LocalStrategy", service::get_local_storage_service().get_token_metadata(), ksm->strategy_options()));
_ks.set_replication_strategy(std::move(rs));

@@ -87,6 +87,10 @@ public:
// [0x00, 0x80] == 1/512
// [0xff, 0x80] == 1 - 1/512
managed_bytes _data;
token() : _kind(kind::before_all_keys) {
}
token(kind k, managed_bytes d) : _kind(std::move(k)), _data(std::move(d)) {
}

@@ -36,6 +36,8 @@ extern thread_local disk_error_signal_type sstable_read_error;
extern thread_local disk_error_signal_type sstable_write_error;
extern thread_local disk_error_signal_type general_disk_error;
bool should_stop_on_system_error(const std::system_error& e);
template<typename Func, typename... Args>
std::enable_if_t<!is_future<std::result_of_t<Func(Args&&...)>>::value,
std::result_of_t<Func(Args&&...)>>
@@ -44,7 +46,7 @@ do_io_check(disk_error_signal_type& signal, Func&& func, Args&&... args) {
// calling function
return func(std::forward<Args>(args)...);
} catch (std::system_error& e) {
if (is_system_error_errno(EIO)) {
if (should_stop_on_system_error(e)) {
signal();
throw storage_io_error(e);
}
@@ -62,7 +64,7 @@ auto do_io_check(disk_error_signal_type& signal, Func&& func, Args&&... args) {
try {
std::rethrow_exception(ep);
} catch (std::system_error& sys_err) {
if (is_system_error_errno(EIO)) {
if (should_stop_on_system_error(sys_err)) {
signal();
throw storage_io_error(sys_err);
}
@@ -70,7 +72,7 @@ auto do_io_check(disk_error_signal_type& signal, Func&& func, Args&&... args) {
return futurize<std::result_of_t<Func(Args&&...)>>::make_exception_future(ep);
});
} catch (std::system_error& e) {
if (is_system_error_errno(EIO)) {
if (should_stop_on_system_error(e)) {
signal();
throw storage_io_error(e);
}

dist/ami/build_ami.sh

@@ -6,9 +6,9 @@ if [ ! -e dist/ami/build_ami.sh ]; then
fi
print_usage() {
echo "build_ami.sh --localrpm --unstable"
echo "build_ami.sh --localrpm --repo [URL]"
echo " --localrpm deploy locally built rpms"
echo " --unstable use unstable branch"
echo " --repo specify repository URL"
exit 1
}
LOCALRPM=0
@@ -19,9 +19,9 @@ while [ $# -gt 0 ]; do
INSTALL_ARGS="$INSTALL_ARGS --localrpm"
shift 1
;;
"--unstable")
INSTALL_ARGS="$INSTALL_ARGS --unstable"
shift 1
"--repo")
INSTALL_ARGS="$INSTALL_ARGS --repo $2"
shift 2
;;
*)
print_usage
@@ -52,8 +52,9 @@ if [ $LOCALRPM -eq 1 ]; then
if [ "$ID" = "centos" ]; then
rm -rf build/*
sudo yum -y install git
if [ ! -f dist/ami/files/scylla-server.x86_64.rpm ]; then
if [ ! -f dist/ami/files/scylla-conf.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-server.x86_64.rpm ]; then
dist/redhat/build_rpm.sh
cp build/rpmbuild/RPMS/x86_64/scylla-conf-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-conf.x86_64.rpm
cp build/rpmbuild/RPMS/x86_64/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
fi
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
@@ -79,6 +80,7 @@ if [ $LOCALRPM -eq 1 ]; then
echo "Build .deb before running build_ami.sh"
exit 1
fi
cp ../scylla-conf_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-ubuntu1_amd64.deb dist/ami/files/scylla-conf_amd64.deb
cp ../scylla-server_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-ubuntu1_amd64.deb dist/ami/files/scylla-server_amd64.deb
fi
if [ ! -f dist/ami/files/scylla-jmx_all.deb ]; then

@@ -0,0 +1,5 @@
scylla-env (1.0-ubuntu1) trusty; urgency=medium
* Initial release.
-- Takuya ASADA <syuu@scylladb.com> Mon, 09 May 2016 18:32:52 +0000

@@ -0,0 +1 @@
9

@@ -0,0 +1,12 @@
Source: scylla-env
Maintainer: Takuya ASADA <syuu@scylladb.com>
Section: misc
Priority: optional
Standards-Version: 1.0
Build-Depends: debhelper (>= 9)
Package: scylla-env
Architecture: all
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: language tool for constructing recognizers, compilers etc
A language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

@@ -0,0 +1,676 @@
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: ScyllaDB
Upstream-Contact: http://www.scylladb.com/
Source: https://github.com/scylladb/scylla
Files: *
Copyright: Copyright (c) 2016 ScyllaDB
License: AGPL-3.0
Files: debian/*
Copyright: Copyright (c) 2016 ScyllaDB
License: AGPL-3.0
License: AGPL-3.0
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
.
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
.
Preamble
.
The GNU Affero General Public License is a free, copyleft license for
software and other kinds of works, specifically designed to ensure
cooperation with the community in the case of network server software.
.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
our General Public Licenses are intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.
.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
.
Developers that use our General Public Licenses protect your rights
with two steps: (1) assert copyright on the software, and (2) offer
you this License which gives you legal permission to copy, distribute
and/or modify the software.
.
A secondary benefit of defending all users' freedom is that
improvements made in alternate versions of the program, if they
receive widespread use, become available for other developers to
incorporate. Many developers of free software are heartened and
encouraged by the resulting cooperation. However, in the case of
software used on network servers, this result may fail to come about.
The GNU General Public License permits making a modified version and
letting the public access it on a server without ever releasing its
source code to the public.
.
The GNU Affero General Public License is designed specifically to
ensure that, in such cases, the modified source code becomes available
to the community. It requires the operator of a network server to
provide the source code of the modified version running there to the
users of that server. Therefore, public use of a modified version, on
a publicly accessible server, gives the public access to the source
code of the modified version.
.
An older license, called the Affero General Public License and
published by Affero, was designed to accomplish similar goals. This is
a different license, not a version of the Affero GPL, but Affero has
released a new version of the Affero GPL which permits relicensing under
this license.
.
The precise terms and conditions for copying, distribution and
modification follow.
.
TERMS AND CONDITIONS
.
0. Definitions.
.
"This License" refers to version 3 of the GNU Affero General Public License.
.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
.
A "covered work" means either the unmodified Program or a work based
on the Program.
.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
.
1. Source Code.
.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
.
The Corresponding Source for a work in source code form is that
same work.
.
2. Basic Permissions.
.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
.
4. Conveying Verbatim Copies.
.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
.
5. Conveying Modified Source Versions.
.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
.
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
.
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
.
6. Conveying Non-Source Forms.
.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
.
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
.
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
.
7. Additional Terms.
.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
.
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
.
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
.
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
.
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
.
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
.
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
.
8. Termination.
.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
.
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
.
9. Acceptance Not Required for Having Copies.
.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
.
10. Automatic Licensing of Downstream Recipients.
.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
.
11. Patents.
.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
.
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
.
12. No Surrender of Others' Freedom.
.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
.
13. Remote Network Interaction; Use with the GNU General Public License.
.
Notwithstanding any other provision of this License, if you modify the
Program, your modified version must prominently offer all users
interacting with it remotely through a computer network (if your version
supports such interaction) an opportunity to receive the Corresponding
Source of your version by providing access to the Corresponding Source
from a network server at no charge, through some standard or customary
means of facilitating copying of software. This Corresponding Source
shall include the Corresponding Source for any work covered by version 3
of the GNU General Public License that is incorporated pursuant to the
following paragraph.
.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the work with which it is combined will remain governed by version
3 of the GNU General Public License.
.
14. Revised Versions of this License.
.
The Free Software Foundation may publish revised and/or new versions of
the GNU Affero General Public License from time to time. Such new versions
will be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU Affero General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU Affero General Public License, you may choose any version ever published
by the Free Software Foundation.
.
If the Program specifies that a proxy can decide which future
versions of the GNU Affero General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
.
15. Disclaimer of Warranty.
.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
.
16. Limitation of Liability.
.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
.
17. Interpretation of Sections 15 and 16.
.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
.
END OF TERMS AND CONDITIONS
.
How to Apply These Terms to Your New Programs
.
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
.
Also add information on how to contact you by electronic and paper mail.
.
If your software can interact with users remotely through a computer
network, you should also make sure that it provides a way for users to
get its source. For example, if your program is a web application, its
interface could display a "Source" link that leads users to an archive
of the code. There are many ways you could offer source, and different
solutions will be better for different programs; see section 13 for the
specific requirements.
.
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU AGPL, see
<http://www.gnu.org/licenses/>.

@@ -0,0 +1,2 @@
profile.d/* etc/profile.d
ld.so.conf.d/* /etc/ld.so.conf.d

@@ -0,0 +1,7 @@
#!/bin/sh
set -e
ldconfig
#DEBHELPER#

dist/common/dep/scylla-env-1.0/debian/rules (executable file)

@@ -0,0 +1,4 @@
#!/usr/bin/make -f
%:
	dh $@

dist/common/scripts/scylla_config_get.py (executable file)

@@ -0,0 +1,54 @@
#!/usr/bin/python
#
# Copyright 2016 ScyllaDB
#
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
import sys
import yaml
import argparse
def get(config, key):
    s = open(config).read()
    cfg = yaml.load(s)
    try:
        val = cfg[key]
    except KeyError:
        print "key '%s' not found" % key
        sys.exit(1)
    if isinstance(val, list):
        for v in val:
            print "%s" % v
    elif isinstance(val, dict):
        for k, v in val.items():
            print "%s:%s" % (k, v)
    else:
        print val

def main():
    parser = argparse.ArgumentParser(description='scylla.yaml config reader/writer from shellscript.')
    parser.add_argument('-c', '--config', dest='config', action='store',
                        default='/etc/scylla/scylla.yaml',
                        help='path to scylla.yaml')
    parser.add_argument('-g', '--get', dest='get', action='store',
                        required=True, help='get parameter')
    args = parser.parse_args()
    get(args.config, args.get)

if __name__ == "__main__":
    main()

dist/common/scripts/scylla_cpuset_setup (executable file)

@@ -0,0 +1,42 @@
#!/bin/bash -e
#
# Copyright (C) 2016 ScyllaDB
print_usage() {
    echo "scylla_cpuset_setup --cpuset 1-7 --smp 7"
    echo " --cpuset CPUs to use (in cpuset(7) format; default: all))"
    echo " --smp number of threads (default: one per CPU)"
    exit 1
}

CPUSET=
SMP=

while [ $# -gt 0 ]; do
    case "$1" in
        "--cpuset")
            CPUSET=$2
            shift 2
            ;;
        "--smp")
            SMP=$2
            shift 2
            ;;
        *)
            print_usage
            ;;
    esac
done

if [ "$CPUSET" = "" ] && [ "$SMP" = "" ]; then
    print_usage
fi

OUT="CPUSET=\""
if [ "$CPUSET" != "" ]; then
    OUT="$OUT--cpuset $CPUSET "
fi
if [ "$SMP" != "" ]; then
    OUT="$OUT--smp $SMP "
fi
OUT="$OUT\""
echo $OUT > /etc/scylla.d/cpuset.conf
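
To show what `scylla_cpuset_setup` actually writes, the sketch below reproduces its `OUT` construction. `render_cpuset_conf` is a hypothetical helper, not part of Scylla. It prints the line instead of writing `/etc/scylla.d/cpuset.conf`, and it quotes its output. For these inputs the quoting produces the same line as the script's unquoted `echo $OUT`.

```shell
#!/bin/bash
# Hypothetical helper mirroring scylla_cpuset_setup's OUT construction;
# prints the resulting line instead of writing /etc/scylla.d/cpuset.conf.
render_cpuset_conf() {
    local cpuset="$1" smp="$2"
    local out='CPUSET="'
    if [ -n "$cpuset" ]; then
        out="$out--cpuset $cpuset "
    fi
    if [ -n "$smp" ]; then
        out="$out--smp $smp "
    fi
    out="$out\""
    printf '%s\n' "$out"
}

render_cpuset_conf 1-7 7    # CPUSET="--cpuset 1-7 --smp 7 "
render_cpuset_conf "" 4     # CPUSET="--smp 4 "
```

The unit file loads the resulting `CPUSET` variable through `EnvironmentFile=/etc/scylla.d/*.conf` and appends `$CPUSET` to its `ExecStart=` line.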

@@ -1,5 +1,18 @@
#!/bin/bash
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
. /etc/default/scylla-server
else
. /etc/sysconfig/scylla-server
fi
for i in /etc/scylla.d/*.conf; do
if [ "$i" = "/etc/scylla.d/*.conf" ]; then
break
fi
. "$i"
done
print_usage() {
echo "scylla_io_setup --ami"
echo " --ami setup AMI instance"
@@ -30,16 +43,9 @@ output_to_user()
logger -p user.err "$1"
}
. /etc/os-release
if [ "$NAME" = "Ubuntu" ]; then
. /etc/default/scylla-server
else
. /etc/sysconfig/scylla-server
fi
if [ `is_developer_mode` -eq 0 ]; then
SMP=`echo $SCYLLA_ARGS|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([0-9]*\).*$/\2/"`
CPUSET=`echo $SCYLLA_ARGS|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[0-9\-]*\).*$/\1/"`
SMP=`echo $CPUSET|grep smp|sed -e "s/^.*smp\(\s\+\|=\)\([0-9]*\).*$/\2/"`
CPUSET=`echo $CPUSET|grep cpuset|sed -e "s/^.*\(--cpuset\(\s\+\|=\)[0-9\-]*\).*$/\1/"`
if [ $AMI_OPT -eq 1 ]; then
NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
NR_DISKS=`lsblk --list --nodeps --noheadings | grep -v xvda | grep xvd | wc -l`
@@ -69,7 +75,8 @@ if [ `is_developer_mode` -eq 0 ]; then
echo "SEASTAR_IO=\"--num-io-queues $NR_IO_QUEUES --max-io-requests $NR_REQS\"" > /etc/scylla.d/io.conf
else
iotune --evaluation-directory /var/lib/scylla --format envfile --options-file /etc/scylla.d/io.conf $CPUSET
DATA_DIR=`/usr/lib/scylla/scylla_config_get.py --config $SCYLLA_CONF/scylla.yaml --get data_file_directories|head -n1`
iotune --evaluation-directory $DATA_DIR --format envfile --options-file /etc/scylla.d/io.conf $CPUSET
if [ $? -ne 0 ]; then
output_to_user "/var/lib/scylla did not pass validation tests, it may not be on XFS and/or has limited disk space."
output_to_user "This is a non-supported setup, and performance is expected to be very bad."
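
Several of these scripts now source every file matching `/etc/scylla.d/*.conf`. Under bash's default settings, a glob that matches nothing stays as the literal pattern. That is why the loop compares `$i` against the pattern string before sourcing. Below is a minimal sketch of the same idea, assuming bash. `source_confs` is a hypothetical helper, and it swaps the string comparison for an `[ -e ]` existence test, which skips the unexpanded pattern in the same way.

```shell
#!/bin/bash
# Hypothetical helper: source every *.conf file in a directory, as the
# Scylla scripts do for /etc/scylla.d. Without `shopt -s nullglob`, an
# unmatched pattern expands to itself, so skip entries that don't exist.
source_confs() {
    local dir="$1" f
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue
        . "$f"
    done
}

# Example usage against a scratch directory.
d=$(mktemp -d)
source_confs "$d"        # no .conf files: nothing is sourced, no error
echo 'SEASTAR_IO="--num-io-queues 2"' > "$d/io.conf"
source_confs "$d"
echo "$SEASTAR_IO"       # --num-io-queues 2
rm -r "$d"
```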

@@ -1,5 +1,18 @@
#!/bin/bash -e
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
. /etc/default/scylla-server
else
. /etc/sysconfig/scylla-server
fi
for i in /etc/scylla.d/*.conf; do
if [ "$i" = "/etc/scylla.d/*.conf" ]; then
break
fi
. "$i"
done
if [ "$AMI" = "yes" ] && [ -f /etc/scylla/ami_disabled ]; then
rm /etc/scylla/ami_disabled
exit 1
@@ -20,10 +33,15 @@ elif [ "$NETWORK_MODE" = "dpdk" ]; then
done
else # NETWORK_MODE = posix
if [ "$SET_NIC" = "yes" ]; then
/usr/lib/scylla/posix_net_conf.sh $IFNAME
NRXQ=`find /sys/class/net/$IFNAME/queues -name "rx-*"|wc -l`
if [ $NRXQ -gt 1 ]; then
CONF_ARGS=-mq
else
CONF_ARGS=-sq
fi
/usr/lib/scylla/posix_net_conf.sh $IFNAME $CONF_ARGS
fi
fi
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
hugeadm --create-mounts
fi
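
In posix mode, the new branch picks the argument for `posix_net_conf.sh` by counting the NIC's receive queues in sysfs. It uses `-mq` when there is more than one `rx-*` queue and `-sq` otherwise, presumably multi-queue and single-queue modes. The sketch below captures that decision in a hypothetical function, `pick_net_conf_args`, that takes the queue directory as a parameter. This lets it run against a scratch directory instead of `/sys/class/net/$IFNAME/queues`. Unlike the script, it limits `find` to the top level with `-maxdepth 1`.

```shell
#!/bin/bash
# Hypothetical helper mirroring the posix-mode logic above: choose the
# posix_net_conf.sh mode from the number of rx-* queue directories.
pick_net_conf_args() {
    local queues_dir="$1" nrxq
    nrxq=$(find "$queues_dir" -maxdepth 1 -name 'rx-*' | wc -l)
    if [ "$nrxq" -gt 1 ]; then
        printf '%s\n' "-mq"
    else
        printf '%s\n' "-sq"
    fi
}

# Example against a scratch directory laid out like a sysfs queues dir.
q=$(mktemp -d)
mkdir "$q/rx-0" "$q/tx-0"
pick_net_conf_args "$q"   # -sq
mkdir "$q/rx-1"
pick_net_conf_args "$q"   # -mq
rm -r "$q"
```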

@@ -8,13 +8,15 @@ if [ "`id -u`" -ne 0 ]; then
fi
print_usage() {
echo "scylla_setup --disks /dev/hda,/dev/hdb... --nic eth0 --ntp-domain centos --ami --developer-mode --no-kernel-check --no-enable-service --no-selinux-setup --no-bootparam-setup --no-ntp-setup --no-raid-setup --no-coredump-setup --no-sysconfig-setup"
echo "scylla_setup --disks /dev/hda,/dev/hdb... --nic eth0 --ntp-domain centos --ami --setup-nic --developer-mode --no-kernel-check --no-verify-package --no-enable-service --no-selinux-setup --no-bootparam-setup --no-ntp-setup --no-raid-setup --no-coredump-setup --no-sysconfig-setup"
echo " --disks specify disks for RAID"
echo " --nic specify NIC"
echo " --ntp-domain specify NTP domain"
echo " --ami setup AMI instance"
echo " --setup-nic optimize NIC queue"
echo " --developer-mode enable developer mode"
echo " --no-kernel-check skip kernel version check"
echo " --no-verify-package skip verifying packages"
echo " --no-enable-service skip enabling service"
echo " --no-selinux-setup skip selinux setup"
echo " --no-bootparam-setup skip bootparam setup"
@@ -42,9 +44,23 @@ interactive_ask_service() {
done
}
verify_package() {
if [ "$ID" = "ubuntu" ]; then
dpkg -s $1 > /dev/null 2>&1 &&:
else
rpm -q $1 > /dev/null 2>&1 &&:
fi
if [ $? -eq 1 ]; then
echo "$1 package is not installed."
exit 1
fi
}
AMI=0
SET_NIC=0
DEV_MODE=0
KERNEL_CHECK=1
VERIFY_PACKAGE=1
ENABLE_SERVICE=1
SELINUX_SETUP=1
BOOTPARAM_SETUP=1
@@ -78,6 +94,10 @@ while [ $# -gt 0 ]; do
AMI=1
shift 1
;;
"--setup-nic")
SET_NIC=1
shift 1
;;
"--developer-mode")
DEV_MODE=1
shift 1
@@ -86,6 +106,10 @@ while [ $# -gt 0 ]; do
KERNEL_CHECK=0
shift 1
;;
"--no-verify-package")
VERIFY_PACKAGE=0
shift 1
;;
"--no-enable-service")
ENABLE_SERVICE=0
shift 1
@@ -147,17 +171,19 @@ if [ $INTERACTIVE -eq 1 ]; then
ENABLE_SERVICE=$?
fi
if [ "$ID" = "ubuntu" ] || [ "$ID" = "debian" ]; then
dpkg -s scylla-jmx > /dev/null 2>&1 &&:
NO_SCYLLA_JMX=$?
else
rpm -q scylla-jmx > /dev/null 2>&1 &&:
NO_SCYLLA_JMX=$?
if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you verify ScyllaDB packages installed?" &&:
VERIFY_PACKAGE=$?
fi
if [ $NO_SCYLLA_JMX -eq 1 ]; then
echo "scylla-jmx package is not installed."
exit 1
if [ $VERIFY_PACKAGE -eq 1 ]; then
verify_package scylla-jmx
verify_package scylla-tools
fi
if [ $INTERACTIVE -eq 1 ]; then
interactive_ask_service "Do you want to enable ScyllaDB services?" &&:
ENABLE_SERVICE=$?
fi
if [ $ENABLE_SERVICE -eq 1 ]; then
@@ -207,6 +233,9 @@ if [ $INTERACTIVE -eq 1 ]; then
echo -n "> "
read dsk
if [ "$dsk" = "done" ]; then
if [ "$DISKS" = "" ]; then
continue
fi
break
fi
if [ "$dsk" = "" ]; then
@@ -262,10 +291,16 @@ if [ $INTERACTIVE -eq 1 ]; then
fi
done
fi
interactive_ask_service "Do you want to optimize NIC queue settings?" &&:
SET_NIC=$?
fi
fi
if [ $SYSCONFIG_SETUP -eq 1 ]; then
/usr/lib/scylla/scylla_sysconfig_setup --nic $NIC
SETUP_ARGS=
if [ $SET_NIC -eq 1 ]; then
SETUP_ARGS="--setup-nic"
fi
/usr/lib/scylla/scylla_sysconfig_setup --nic $NIC $SETUP_ARGS
fi
if [ $INTERACTIVE -eq 1 ]; then
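
`verify_package` and the `interactive_ask_service` calls rely on the `cmd &&:` idiom. When a script runs with errexit (`bash -e`), a command that fails on the left of `&&` does not abort the script. The next line can therefore still read its exit status from `$?`. The sketch below shows the idiom in isolation. `exit_status_of` is a hypothetical helper standing in for the `dpkg -s`/`rpm -q` calls.

```shell
#!/bin/bash
set -e
# Hypothetical helper demonstrating the `cmd &&:` idiom. With errexit on, a
# failing command on the left-hand side of && does not terminate the shell,
# and $? afterwards is that command's exit status.
exit_status_of() {
    "$@" > /dev/null 2>&1 &&:
    echo $?
}

exit_status_of true              # 0
exit_status_of false             # 1
exit_status_of sh -c 'exit 3'    # 3
echo "still running"
```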

@@ -1,5 +1,18 @@
#!/bin/bash -e
. /etc/os-release
if [ "$ID" = "ubuntu" ]; then
. /etc/default/scylla-server
else
. /etc/sysconfig/scylla-server
fi
for i in /etc/scylla.d/*.conf; do
if [ "$i" = "/etc/scylla.d/*.conf" ]; then
break
fi
. "$i"
done
if [ "$NETWORK_MODE" = "virtio" ]; then
ip tuntap del mode tap dev $TAP
elif [ "$NETWORK_MODE" = "dpdk" ]; then

@@ -75,13 +75,8 @@ echo Setting parameters on $SYSCONFIG/scylla-server
ETHDRV=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | sed -e "s/^.*drv=//" -e "s/ .*$//"`
ETHPCIID=`/usr/lib/scylla/dpdk_nic_bind.py --status | grep if=$NIC | awk '{print $1}'`
NR_CPU=`cat /proc/cpuinfo |grep processor|wc -l`
if [ "$AMI" = "yes" ] && [ $NR_CPU -ge 8 ] && [ "$SET_NIC" = "no" ]; then
NR=$((NR_CPU - 1))
SET_NIC="yes"
SCYLLA_ARGS="$SCYLLA_ARGS --cpuset 1-$NR --smp $NR"
fi
sed -e s#^NETWORK_MODE=.*#NETWORK_MODE=$NETWORK_MODE# \
-e s#^IFNAME=.*#IFNAME=$NIC# \
-e s#^ETHDRV=.*#ETHDRV=$ETHDRV# \
-e s#^ETHPCIID=.*#ETHPCIID=$ETHPCIID# \
-e s#^NR_HUGEPAGES=.*#NR_HUGEPAGES=$NR_HUGEPAGES# \

dist/common/scylla.d/cpuset.conf (normal file)

@@ -0,0 +1,4 @@
# DO NO EDIT
# This file should be automatically configure by scylla_cpuset_setup
#
# CPUSET="--cpuset 0 --smp 1"

@@ -1 +1 @@
scylla ALL=(ALL) NOPASSWD:SETENV: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop,/usr/lib/scylla/scylla_io_setup,/usr/lib/scylla/scylla-ami/scylla_ami_setup
scylla ALL=(ALL) NOPASSWD: /usr/lib/scylla/scylla_prepare,/usr/lib/scylla/scylla_stop,/usr/lib/scylla/scylla_io_setup,/usr/lib/scylla/scylla-ami/scylla_ami_setup

@@ -0,0 +1,14 @@
# Prevent auto-scaling from doing anything to our tunables
kernel.sched_tunable_scaling = 0
# Preempt sooner
kernel.sched_min_granularity_ns = 500000
# Don't delay unrelated workloads
kernel.sched_wakeup_granularity_ns = 500000
# Schedule all tasks in this period
kernel.sched_latency_ns = 1000000
# autogroup seems to prevent sched_latency_ns from being respected
kernel.sched_autogroup_enabled = 0

@@ -2,18 +2,18 @@
Description=Scylla Server
[Service]
PermissionsStartOnly=true
Type=notify
LimitMEMLOCK=infinity
LimitNOFILE=200000
LimitAS=infinity
LimitNPROC=8096
WorkingDirectory=/var/lib/scylla
Environment="HOME=/var/lib/scylla"
EnvironmentFile=@@SYSCONFDIR@@/scylla-server
EnvironmentFile=/etc/scylla.d/*.conf
ExecStartPre=/usr/bin/sudo -E /usr/lib/scylla/scylla_prepare
ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE
ExecStopPost=/usr/bin/sudo -E /usr/lib/scylla/scylla_stop
WorkingDirectory=$SCYLLA_HOME
ExecStartPre=/usr/lib/scylla/scylla_prepare
ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET
ExecStopPost=/usr/lib/scylla/scylla_stop
TimeoutStartSec=900
KillMode=process
Restart=on-abnormal

@@ -2,8 +2,8 @@ FROM centos:7
MAINTAINER Avi Kivity <avi@cloudius-systems.com>
RUN curl http://downloads.scylladb.com/rpm/centos/scylla-1.2.repo -o /etc/yum.repos.d/scylla.repo
RUN yum -y install epel-release
ADD scylla.repo /etc/yum.repos.d/
RUN yum -y clean expire-cache
RUN yum -y update
RUN yum -y remove boost-thread boost-system

@@ -1,23 +0,0 @@
[scylla]
name=Scylla for Centos $releasever - $basearch
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/centos/$releasever/$basearch/
enabled=1
gpgcheck=0
[scylla-generic]
name=Scylla for centos $releasever
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/centos/$releasever/noarch/
enabled=1
gpgcheck=0
[scylla-3rdparty]
name=Scylla 3rdParty for Centos $releasever - $basearch
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/3rdparty/centos/$releasever/$basearch/
enabled=1
gpgcheck=0
[scylla-3rdparty-generic]
name=Scylla 3rdParty for Centos $releasever
baseurl=https://s3.amazonaws.com/downloads.scylladb.com/rpm/3rdparty/centos/$releasever/noarch/
enabled=1
gpgcheck=0

@@ -1,4 +1,5 @@
FROM ubuntu:14.04
RUN sudo apt-get update
RUN sudo apt-get install -y wget
RUN sudo wget -O /etc/apt/sources.list.d/scylla.list http://downloads.scylladb.com/deb/ubuntu/scylla.list
RUN sudo apt-get update

@@ -57,24 +57,24 @@ VERSION=$(./SCYLLA-VERSION-GEN)
SCYLLA_VERSION=$(cat build/SCYLLA-VERSION-FILE)
SCYLLA_RELEASE=$(cat build/SCYLLA-RELEASE-FILE)
echo $VERSION >version
./scripts/git-archive-all --extra version --force-submodules --prefix scylla-server-$SCYLLA_VERSION $RPMBUILD/SOURCES/scylla-server-$VERSION.tar
./scripts/git-archive-all --extra version --force-submodules --prefix scylla-$SCYLLA_VERSION $RPMBUILD/SOURCES/scylla-$VERSION.tar
rm -f version
cp dist/redhat/scylla-server.spec.in $RPMBUILD/SPECS/scylla-server.spec
sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" $RPMBUILD/SPECS/scylla-server.spec
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" $RPMBUILD/SPECS/scylla-server.spec
cp dist/redhat/scylla.spec.in $RPMBUILD/SPECS/scylla.spec
sed -i -e "s/@@VERSION@@/$SCYLLA_VERSION/g" $RPMBUILD/SPECS/scylla.spec
sed -i -e "s/@@RELEASE@@/$SCYLLA_RELEASE/g" $RPMBUILD/SPECS/scylla.spec
if [ "$ID" = "fedora" ]; then
if [ $JOBS -gt 0 ]; then
rpmbuild -bs --define "_topdir $RPMBUILD" --define "_smp_mflags -j$JOBS" $RPMBUILD/SPECS/scylla-server.spec
rpmbuild -bs --define "_topdir $RPMBUILD" --define "_smp_mflags -j$JOBS" $RPMBUILD/SPECS/scylla.spec
else
rpmbuild -bs --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
rpmbuild -bs --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla.spec
fi
mock rebuild --resultdir=`pwd`/build/rpms $RPMBUILD/SRPMS/scylla-server-$VERSION*.src.rpm
mock rebuild --resultdir=`pwd`/build/rpms $RPMBUILD/SRPMS/scylla-$VERSION*.src.rpm
else
sudo yum-builddep -y $RPMBUILD/SPECS/scylla-server.spec
sudo yum-builddep -y $RPMBUILD/SPECS/scylla.spec
. /etc/profile.d/scylla.sh
if [ $JOBS -gt 0 ]; then
rpmbuild -ba --define "_topdir $RPMBUILD" --define "_smp_mflags -j$JOBS" $RPMBUILD/SPECS/scylla-server.spec
rpmbuild -ba --define "_topdir $RPMBUILD" --define "_smp_mflags -j$JOBS" $RPMBUILD/SPECS/scylla.spec
else
rpmbuild -ba --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla-server.spec
rpmbuild -ba --define "_topdir $RPMBUILD" $RPMBUILD/SPECS/scylla.spec
fi
fi

@@ -58,10 +58,10 @@ sudo yum install -y rpm-devel python34-devel guile-devel readline-devel ncurses-
sudo yum install -y dos2unix
if [ ! -f $RPMBUILD/RPMS/noarch/scylla-env-1.0-1.el7.centos.noarch.rpm ]; then
cd dist/redhat/centos_dep
cd dist/common/dep
tar cpf $RPMBUILD/SOURCES/scylla-env-1.0.tar scylla-env-1.0
cd -
rpmbuild --define "_topdir $RPMBUILD" --ba dist/redhat/centos_dep/scylla-env.spec
rpmbuild --define "_topdir $RPMBUILD" --ba dist/common/dep/scylla-env.spec
fi
do_install scylla-env-1.0-1.el7.centos.noarch.rpm

@@ -1,5 +1,5 @@
--- gcc.spec.orig 2015-12-08 16:03:46.000000000 +0000
+++ gcc.spec 2016-01-21 08:47:49.160667342 +0000
+++ gcc.spec 2016-07-10 06:07:27.612453480 +0000
@@ -1,6 +1,7 @@
%global DATE 20151207
%global SVNREV 231358
@@ -8,7 +8,24 @@
# Note, gcc_release must be integer, if you want to add suffixes to
# %{release}, append them after %{gcc_release} on Release: line.
%global gcc_release 2
@@ -84,7 +85,8 @@
@@ -9,16 +10,8 @@
# Hardening slows the compiler way too much.
%undefine _hardened_build
%global multilib_64_archs sparc64 ppc64 ppc64p7 s390x x86_64
-%ifarch %{ix86} x86_64 ia64 ppc ppc64 ppc64p7 alpha %{arm} aarch64
-%global build_ada 1
-%else
%global build_ada 0
-%endif
-%ifarch %{ix86} x86_64 ppc ppc64 ppc64le ppc64p7 s390 s390x %{arm} aarch64
-%global build_go 1
-%else
%global build_go 0
-%endif
%ifarch %{ix86} x86_64 ia64
%global build_libquadmath 1
%else
@@ -84,7 +77,8 @@
%global multilib_32_arch i686
%endif
Summary: Various compilers (C, C++, Objective-C, Java, ...)
@@ -18,7 +35,7 @@
Version: %{gcc_version}
Release: %{gcc_release}%{?dist}
# libgcc, libgfortran, libgomp, libstdc++ and crtstuff have
@@ -99,6 +101,7 @@
@@ -99,6 +93,7 @@
%global isl_version 0.14
URL: http://gcc.gnu.org
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
@@ -26,7 +43,7 @@
# Need binutils with -pie support >= 2.14.90.0.4-4
# Need binutils which can omit dot symbols and overlap .opd on ppc64 >= 2.15.91.0.2-4
# Need binutils which handle -msecure-plt on ppc >= 2.16.91.0.2-2
@@ -110,7 +113,7 @@
@@ -110,7 +105,7 @@
# Need binutils which support .cfi_sections >= 2.19.51.0.14-33
# Need binutils which support --no-add-needed >= 2.20.51.0.2-12
# Need binutils which support -plugin
@@ -35,7 +52,7 @@
# While gcc doesn't include statically linked binaries, during testing
# -static is used several times.
BuildRequires: glibc-static
@@ -145,15 +148,15 @@
@@ -145,15 +140,15 @@
BuildRequires: libunwind >= 0.98
%endif
%if %{build_isl}
@@ -55,7 +72,7 @@
# Need .eh_frame ld optimizations
# Need proper visibility support
# Need -pie support
@@ -168,7 +171,7 @@
@@ -168,7 +163,7 @@
# Need binutils that support .cfi_sections
# Need binutils that support --no-add-needed
# Need binutils that support -plugin
@@ -64,7 +81,7 @@
# Make sure gdb will understand DW_FORM_strp
Conflicts: gdb < 5.1-2
Requires: glibc-devel >= 2.2.90-12
@@ -176,17 +179,15 @@
@@ -176,17 +171,15 @@
# Make sure glibc supports TFmode long double
Requires: glibc >= 2.3.90-35
%endif
@@ -86,7 +103,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
AutoReq: true
@@ -228,12 +229,12 @@
@@ -228,12 +221,12 @@
The gcc package contains the GNU Compiler Collection version 5.
You'll need this package in order to compile C code.
@@ -101,7 +118,7 @@
%endif
Obsoletes: libmudflap
Obsoletes: libmudflap-devel
@@ -241,17 +242,19 @@
@@ -241,17 +234,19 @@
Obsoletes: libgcj < %{version}-%{release}
Obsoletes: libgcj-devel < %{version}-%{release}
Obsoletes: libgcj-src < %{version}-%{release}
@@ -125,7 +142,7 @@
Autoreq: true
%description c++
@@ -259,50 +262,55 @@
@@ -259,50 +254,55 @@
It includes support for most of the current C++ specification,
including templates and exception handling.
@@ -193,7 +210,7 @@
Autoreq: true
%description objc
@@ -313,29 +321,32 @@
@@ -313,29 +313,32 @@
%package objc++
Summary: Objective-C++ support for GCC
Group: Development/Languages
@@ -233,7 +250,7 @@
%endif
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
@@ -345,260 +356,286 @@
@@ -345,260 +348,286 @@
The gcc-gfortran package provides support for compiling Fortran
programs with the GNU Compiler Collection.
@@ -592,7 +609,7 @@
Cpp is the GNU C-Compatible Compiler Preprocessor.
Cpp is a macro processor which is used automatically
by the C compiler to transform your program before actual
@@ -623,8 +660,9 @@
@@ -623,8 +652,9 @@
%package gnat
Summary: Ada 83, 95, 2005 and 2012 support for GCC
Group: Development/Languages
@@ -604,7 +621,7 @@
Requires(post): /sbin/install-info
Requires(preun): /sbin/install-info
Autoreq: true
@@ -633,82 +671,90 @@
@@ -633,82 +663,90 @@
GNAT is a GNU Ada 83, 95, 2005 and 2012 front-end to GCC. This package includes
development tools, the documents and Ada compiler.
@@ -717,7 +734,7 @@
Requires: gmp-devel >= 4.1.2-8, mpfr-devel >= 2.2.1, libmpc-devel >= 0.8.1
%description plugin-devel
@@ -728,7 +774,8 @@
@@ -728,7 +766,8 @@
Summary: Debug information for package %{name}
Group: Development/Debug
AutoReqProv: 0
@@ -727,7 +744,7 @@
%description debuginfo
This package provides debug information for package %{name}.
@@ -958,11 +1005,11 @@
@@ -958,11 +997,11 @@
--enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu \
--enable-plugin --enable-initfini-array \
--disable-libgcj \
@@ -741,7 +758,7 @@
%else
--without-isl \
%endif
@@ -971,11 +1018,9 @@
@@ -971,11 +1010,9 @@
%else
--disable-libmpx \
%endif
@@ -753,7 +770,7 @@
%ifarch %{arm}
--disable-sjlj-exceptions \
%endif
@@ -1006,9 +1051,6 @@
@@ -1006,9 +1043,6 @@
%if 0%{?rhel} >= 7
--with-cpu-32=power8 --with-tune-32=power8 --with-cpu-64=power8 --with-tune-64=power8 \
%endif
@@ -763,7 +780,7 @@
%endif
%ifarch ppc
--build=%{gcc_target_platform} --target=%{gcc_target_platform} --with-cpu=default32
@@ -1270,16 +1312,15 @@
@@ -1270,16 +1304,15 @@
mv %{buildroot}%{_prefix}/%{_lib}/libmpx.spec $FULLPATH/
%endif
@@ -786,7 +803,7 @@
%endif
%ifarch ppc
rm -f $FULLPATH/libgcc_s.so
@@ -1819,7 +1860,7 @@
@@ -1819,7 +1852,7 @@
chmod 755 %{buildroot}%{_prefix}/bin/c?9
cd ..
@@ -795,7 +812,7 @@
%find_lang cpplib
# Remove binaries we will not be including, so that they don't end up in
@@ -1869,11 +1910,7 @@
@@ -1869,11 +1902,7 @@
# run the tests.
make %{?_smp_mflags} -k check ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++ \
@@ -807,7 +824,7 @@
echo ====================TESTING=========================
( LC_ALL=C ../contrib/test_summary || : ) 2>&1 | sed -n '/^cat.*EOF/,/^EOF/{/^cat.*EOF/d;/^EOF/d;/^LAST_UPDATED:/d;p;}'
echo ====================TESTING END=====================
@@ -1900,13 +1937,13 @@
@@ -1900,13 +1929,13 @@
--info-dir=%{_infodir} %{_infodir}/gcc.info.gz || :
fi
@@ -823,7 +840,7 @@
if [ $1 = 0 -a -f %{_infodir}/cpp.info.gz ]; then
/sbin/install-info --delete \
--info-dir=%{_infodir} %{_infodir}/cpp.info.gz || :
@@ -1945,19 +1982,19 @@
@@ -1945,19 +1974,19 @@
fi
%post go
@@ -846,7 +863,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1967,7 +2004,7 @@
@@ -1967,7 +1996,7 @@
end
end
@@ -855,7 +872,7 @@
if posix.access ("/sbin/ldconfig", "x") then
local pid = posix.fork ()
if pid == 0 then
@@ -1977,120 +2014,120 @@
@@ -1977,120 +2006,120 @@
end
end
@@ -1014,7 +1031,7 @@
%defattr(-,root,root,-)
%{_prefix}/bin/cc
%{_prefix}/bin/c89
@@ -2414,7 +2451,7 @@
@@ -2414,7 +2443,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -1023,7 +1040,7 @@
%defattr(-,root,root,-)
%{_prefix}/lib/cpp
%{_prefix}/bin/cpp
@@ -2425,10 +2462,10 @@
@@ -2425,10 +2454,10 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1
@@ -1037,7 +1054,7 @@
%{!?_licensedir:%global license %%doc}
%license gcc/COPYING* COPYING.RUNTIME
@@ -2469,7 +2506,7 @@
@@ -2469,7 +2498,7 @@
%endif
%doc rpm.doc/changelogs/gcc/cp/ChangeLog*
@@ -1046,7 +1063,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libstdc++.so.6*
%dir %{_datadir}/gdb
@@ -2481,7 +2518,7 @@
@@ -2481,7 +2510,7 @@
%dir %{_prefix}/share/gcc-%{gcc_version}/python
%{_prefix}/share/gcc-%{gcc_version}/python/libstdcxx
@@ -1055,7 +1072,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/include/c++
%dir %{_prefix}/include/c++/%{gcc_version}
@@ -2507,7 +2544,7 @@
@@ -2507,7 +2536,7 @@
%endif
%doc rpm.doc/changelogs/libstdc++-v3/ChangeLog* libstdc++-v3/README*
@@ -1064,7 +1081,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2528,7 +2565,7 @@
@@ -2528,7 +2557,7 @@
%endif
%if %{build_libstdcxx_docs}
@@ -1073,7 +1090,7 @@
%defattr(-,root,root)
%{_mandir}/man3/*
%doc rpm.doc/libstdc++-v3/html
@@ -2567,7 +2604,7 @@
@@ -2567,7 +2596,7 @@
%dir %{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/cc1objplus
@@ -1082,7 +1099,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libobjc.so.4*
@@ -2621,11 +2658,11 @@
@@ -2621,11 +2650,11 @@
%endif
%doc rpm.doc/gfortran/*
@@ -1096,7 +1113,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2671,12 +2708,12 @@
@@ -2671,12 +2700,12 @@
%{_prefix}/libexec/gcc/%{gcc_target_platform}/%{gcc_version}/gnat1
%doc rpm.doc/changelogs/gcc/ada/ChangeLog*
@@ -1111,7 +1128,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2702,7 +2739,7 @@
@@ -2702,7 +2731,7 @@
%exclude %{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/adalib/libgnarl.a
%endif
@@ -1120,7 +1137,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2726,7 +2763,7 @@
@@ -2726,7 +2755,7 @@
%endif
%endif
@@ -1129,7 +1146,7 @@
%defattr(-,root,root,-)
%{_prefix}/%{_lib}/libgomp.so.1*
%{_prefix}/%{_lib}/libgomp-plugin-host_nonshm.so.1*
@@ -2734,14 +2771,14 @@
@@ -2734,14 +2763,14 @@
%doc rpm.doc/changelogs/libgomp/ChangeLog*
%if %{build_libquadmath}
@@ -1146,7 +1163,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2754,7 +2791,7 @@
@@ -2754,7 +2783,7 @@
%endif
%doc rpm.doc/libquadmath/ChangeLog*
@@ -1155,7 +1172,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2773,12 +2810,12 @@
@@ -2773,12 +2802,12 @@
%endif
%if %{build_libitm}
@@ -1170,7 +1187,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2791,7 +2828,7 @@
@@ -2791,7 +2820,7 @@
%endif
%doc rpm.doc/libitm/ChangeLog*
@@ -1179,7 +1196,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2810,11 +2847,11 @@
@@ -2810,11 +2839,11 @@
%endif
%if %{build_libatomic}
@@ -1193,7 +1210,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2834,11 +2871,11 @@
@@ -2834,11 +2863,11 @@
%endif
%if %{build_libasan}
@@ -1207,7 +1224,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2860,11 +2897,11 @@
@@ -2860,11 +2889,11 @@
%endif
%if %{build_libubsan}
@@ -1221,7 +1238,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2886,11 +2923,11 @@
@@ -2886,11 +2915,11 @@
%endif
%if %{build_libtsan}
@@ -1235,7 +1252,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2902,11 +2939,11 @@
@@ -2902,11 +2931,11 @@
%endif
%if %{build_liblsan}
@@ -1249,7 +1266,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2918,11 +2955,11 @@
@@ -2918,11 +2947,11 @@
%endif
%if %{build_libcilkrts}
@@ -1263,7 +1280,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -2942,12 +2979,12 @@
@@ -2942,12 +2971,12 @@
%endif
%if %{build_libmpx}
@@ -1278,7 +1295,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3009,12 +3046,12 @@
@@ -3009,12 +3038,12 @@
%endif
%doc rpm.doc/go/*
@@ -1293,7 +1310,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3042,7 +3079,7 @@
@@ -3042,7 +3071,7 @@
%{_prefix}/lib/gcc/%{gcc_target_platform}/%{gcc_version}/libgo.so
%endif
@@ -1302,7 +1319,7 @@
%defattr(-,root,root,-)
%dir %{_prefix}/lib/gcc
%dir %{_prefix}/lib/gcc/%{gcc_target_platform}
@@ -3060,12 +3097,12 @@
@@ -3060,12 +3089,12 @@
%endif
%endif

@@ -1,4 +1,4 @@
Name: scylla-server
Name: scylla
Version: @@VERSION@@
Release: @@RELEASE@@%{?dist}
Summary: Scylla is a highly scalable, eventually consistent, distributed, partitioned row DB.
@@ -7,23 +7,40 @@ Group: Applications/Databases
License: AGPLv3
URL: http://www.scylladb.com/
Source0: %{name}-@@VERSION@@-@@RELEASE@@.tar
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing dnf-yum}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: systemd-libs hwloc collectd
Conflicts: abrt
Requires: scylla-server scylla-jmx scylla-tools scylla-kernel-conf
Obsoletes: scylla-server < 1.1
%description
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
This package installs all required packages for ScyllaDB, including
scylla-server, scylla-jmx, scylla-tools.
%files
%defattr(-,root,root)
%prep
%setup -q
%package server
Group: Applications/Databases
Summary: The Scylla database server
License: AGPLv3
URL: http://www.scylladb.com/
BuildRequires: libaio-devel libstdc++-devel cryptopp-devel hwloc-devel numactl-devel libpciaccess-devel libxml2-devel zlib-devel thrift-devel yaml-cpp-devel lz4-devel snappy-devel jsoncpp-devel systemd-devel xz-devel openssl-devel libcap-devel libselinux-devel libgcrypt-devel libgpg-error-devel elfutils-devel krb5-devel libcom_err-devel libattr-devel pcre-devel elfutils-libelf-devel bzip2-devel keyutils-libs-devel xfsprogs-devel make gnutls-devel systemd-devel lksctp-tools-devel
%{?fedora:BuildRequires: boost-devel ninja-build ragel antlr3-tool antlr3-C++-devel python3 gcc-c++ libasan libubsan python3-pyparsing dnf-yum}
%{?rhel:BuildRequires: scylla-libstdc++-static scylla-boost-devel scylla-ninja-build scylla-ragel scylla-antlr3-tool scylla-antlr3-C++-devel python34 scylla-gcc-c++ >= 5.1.1, python34-pyparsing}
Requires: scylla-conf systemd-libs hwloc collectd PyYAML python-urwid
Conflicts: abrt
%description server
This package contains ScyllaDB server.
%define __debug_install_post \
%{_rpmconfigdir}/find-debuginfo.sh %{?_missing_build_ids_terminate_build:--strict-build-id} %{?_find_debuginfo_opts} "%{_builddir}/%{?buildsubdir}";\
cp scylla-gdb.py ${RPM_BUILD_ROOT}/usr/src/debug/%{name}-%{version}/;\
%{nil}
%prep
%setup -q
%build
%if 0%{?fedora}
./configure.py --disable-xen --enable-dpdk --mode=release
@@ -45,6 +62,7 @@ mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
mkdir -p $RPM_BUILD_ROOT%{_sysctldir}/
mkdir -p $RPM_BUILD_ROOT%{_docdir}/scylla/
mkdir -p $RPM_BUILD_ROOT%{_unitdir}
mkdir -p $RPM_BUILD_ROOT%{_prefix}/lib/scylla/
@@ -54,6 +72,7 @@ install -m644 dist/common/limits.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/sec
install -m644 dist/common/sudoers.d/scylla $RPM_BUILD_ROOT%{_sysconfdir}/sudoers.d/
install -m644 dist/common/collectd.d/scylla.conf $RPM_BUILD_ROOT%{_sysconfdir}/collectd.d/
install -m644 dist/common/scylla.d/*.conf $RPM_BUILD_ROOT%{_sysconfdir}/scylla.d/
install -m644 dist/common/sysctl.d/*.conf $RPM_BUILD_ROOT%{_sysctldir}/
install -d -m755 $RPM_BUILD_ROOT%{_sysconfdir}/scylla
install -m644 conf/scylla.yaml $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
install -m644 conf/cassandra-rackdc.properties $RPM_BUILD_ROOT%{_sysconfdir}/scylla/
@@ -82,33 +101,24 @@ cp -r api/api-doc $RPM_BUILD_ROOT%{_prefix}/lib/scylla/api
cp -r tools/scyllatop $RPM_BUILD_ROOT%{_prefix}/lib/scylla/scyllatop
cp -P dist/common/sbin/* $RPM_BUILD_ROOT%{_sbindir}/
%pre
%pre server
/usr/sbin/groupadd scylla 2> /dev/null || :
/usr/sbin/useradd -g scylla -s /sbin/nologin -r -d %{_sharedstatedir}/scylla scylla 2> /dev/null || :
%if 0%{?rhel}
sed -e "s/Defaults requiretty/#Defaults requiretty/" /etc/sudoers > /tmp/sudoers
cp /tmp/sudoers /etc/sudoers
rm /tmp/sudoers
%endif
%post
grep -v api_ui_dir /etc/scylla/scylla.yaml | grep -v api_doc_dir > /tmp/scylla.yaml
echo "api_ui_dir: /usr/lib/scylla/swagger-ui/dist/" >> /tmp/scylla.yaml
echo "api_doc_dir: /usr/lib/scylla/api/api-doc/" >> /tmp/scylla.yaml
mv /tmp/scylla.yaml /etc/scylla/scylla.yaml
%post server
# Upgrade coredump settings
if [ -f /etc/systemd/coredump.conf ];then
/usr/lib/scylla/scylla_coredump_setup
fi
%systemd_post scylla-server.service
%preun
%preun server
%systemd_preun scylla-server.service
%postun
%postun server
%systemd_postun
%posttrans
%posttrans server
if [ -d /tmp/%{name}-%{version}-%{release} ]; then
cp -a /tmp/%{name}-%{version}-%{release}/* /etc/scylla/
rm -rf /tmp/%{name}-%{version}-%{release}/
@@ -119,16 +129,13 @@ systemctl restart collectd
%clean
rm -rf $RPM_BUILD_ROOT
%files
%files server
%defattr(-,root,root)
%config(noreplace) %{_sysconfdir}/sysconfig/scylla-server
%{_sysconfdir}/security/limits.d/scylla.conf
%{_sysconfdir}/sudoers.d/scylla
%config(noreplace) %{_sysconfdir}/collectd.d/scylla.conf
%attr(0755,root,root) %dir %{_sysconfdir}/scylla
%config(noreplace) %{_sysconfdir}/scylla/scylla.yaml
%config(noreplace) %{_sysconfdir}/scylla/cassandra-rackdc.properties
%attr(0755,root,root) %dir %{_sysconfdir}/scylla.d
%config(noreplace) %{_sysconfdir}/scylla.d/*.conf
%{_docdir}/scylla/README.md
@@ -153,6 +160,7 @@ rm -rf $RPM_BUILD_ROOT
%{_prefix}/lib/scylla/scylla_io_setup
%{_prefix}/lib/scylla/scylla_dev_mode_setup
%{_prefix}/lib/scylla/scylla_kernel_check
%{_prefix}/lib/scylla/scylla_cpuset_setup
%{_prefix}/lib/scylla/posix_net_conf.sh
%{_prefix}/lib/scylla/dpdk_nic_bind.py
%{_prefix}/lib/scylla/dpdk_nic_bind.pyc
@@ -160,11 +168,55 @@ rm -rf $RPM_BUILD_ROOT
%{_prefix}/lib/scylla/swagger-ui/dist/*
%{_prefix}/lib/scylla/api/api-doc/*
%{_prefix}/lib/scylla/scyllatop/*
%{_prefix}/lib/scylla/scylla_config_get.py
%{_prefix}/lib/scylla/scylla_config_get.pyc
%{_prefix}/lib/scylla/scylla_config_get.pyo
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/data
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/commitlog
%attr(0755,scylla,scylla) %dir %{_sharedstatedir}/scylla/coredump
%package conf
Group: Applications/Databases
Summary: Scylla configuration package
License: AGPLv3
URL: http://www.scylladb.com/
Obsoletes: scylla-server < 1.1
%description conf
This package contains the main scylla configuration file.
%post conf
grep -v api_ui_dir /etc/scylla/scylla.yaml | grep -v api_doc_dir > /tmp/scylla.yaml
echo "api_ui_dir: /usr/lib/scylla/swagger-ui/dist/" >> /tmp/scylla.yaml
echo "api_doc_dir: /usr/lib/scylla/api/api-doc/" >> /tmp/scylla.yaml
mv /tmp/scylla.yaml /etc/scylla/scylla.yaml
%files conf
%defattr(-,root,root)
%attr(0755,root,root) %dir %{_sysconfdir}/scylla
%config(noreplace) %{_sysconfdir}/scylla/scylla.yaml
%config(noreplace) %{_sysconfdir}/scylla/cassandra-rackdc.properties
%package kernel-conf
Group: Applications/Databases
Summary: Scylla configuration package for the Linux kernel
License: AGPLv3
URL: http://www.scylladb.com/
%description kernel-conf
This package contains Linux kernel configuration changes for the Scylla database. Install this package
if Scylla is the main application on your server and you wish to optimize its latency and throughput.
%post kernel-conf
# We cannot use the sysctl_apply rpm macro because it is not present in 7.0
# following is a "manual" expansion
/usr/lib/systemd/systemd-sysctl 99-scylla-sched.conf >/dev/null 2>&1 || :
%files kernel-conf
%defattr(-,root,root)
%{_sysctldir}/*.conf
%changelog
* Tue Jul 21 2015 Takuya ASADA <syuu@cloudius-systems.com>
- initial version of scylla.spec

@@ -1,5 +1,24 @@
#!/bin/bash -e
print_usage() {
echo "build_deb.sh --rebuild-dep"
echo " --rebuild-dep rebuild dependency packages"
exit 1
}
REBUILD=0
while [ $# -gt 0 ]; do
case "$1" in
"--rebuild-dep")
REBUILD=1
shift 1
;;
*)
print_usage
;;
esac
done
if [ ! -e dist/ubuntu/build_deb.sh ]; then
echo "run build_deb.sh in top of scylla dir"
exit 1
@@ -68,7 +87,17 @@ fi
cp dist/common/systemd/scylla-server.service.in debian/scylla-server.service
sed -i -e "s#@@SYSCONFDIR@@#/etc/default#g" debian/scylla-server.service
./dist/ubuntu/dep/build_dependency.sh
if [ "$RELEASE" = "14.04" ] && [ $REBUILD -eq 0 ]; then
if [ ! -f /etc/apt/sources.list.d/scylla-3rdparty-trusty.list ]; then
cd /etc/apt/sources.list.d
sudo wget https://s3.amazonaws.com/downloads.scylladb.com/deb/3rdparty/ubuntu/scylla-3rdparty-trusty.list
cd -
fi
sudo apt-get -y update
sudo apt-get -y --allow-unauthenticated install antlr3 antlr3-c++-dev libthrift-dev libthrift0 thrift-compiler
else
./dist/ubuntu/dep/build_dependency.sh
fi
if [ "$RELEASE" = "14.04" ]; then
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test

@@ -4,11 +4,19 @@ Homepage: http://scylladb.com
Section: database
Priority: optional
Standards-Version: 3.9.5
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing, libxml2-dev, @@BUILD_DEPENDS@@
Build-Depends: debhelper (>= 9), libyaml-cpp-dev, liblz4-dev, libsnappy-dev, libcrypto++-dev, libjsoncpp-dev, libaio-dev, libthrift-dev, thrift-compiler, antlr3, antlr3-c++-dev, ragel, ninja-build, git, libboost-program-options1.55-dev | libboost-program-options-dev, libboost-filesystem1.55-dev | libboost-filesystem-dev, libboost-system1.55-dev | libboost-system-dev, libboost-thread1.55-dev | libboost-thread-dev, libboost-test1.55-dev | libboost-test-dev, libgnutls28-dev, libhwloc-dev, libnuma-dev, libpciaccess-dev, xfslibs-dev, python3-pyparsing, libxml2-dev, libsctp-dev, python-urwid, @@BUILD_DEPENDS@@
Package: scylla-conf
Architecture: any
Description: Scylla database main configuration file
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
Replaces: scylla-server (<< 1.1)
Conflicts: scylla-server (<< 1.1)
Package: scylla-server
Architecture: amd64
Depends: ${shlibs:Depends}, ${misc:Depends}, adduser, hwloc-nox, collectd, @@DEPENDS@@
Depends: ${shlibs:Depends}, ${misc:Depends}, adduser, hwloc-nox, collectd, scylla-conf, python-yaml, python-urwid, @@DEPENDS@@
Description: Scylla database server binaries
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
@@ -22,3 +30,17 @@ Description: debugging symbols for scylla-server
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
This package contains the debugging symbols for scylla-server.
Package: scylla-kernel-conf
Architecture: any
Description: Scylla kernel tuning configuration
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.
Package: scylla
Section: metapackages
Architecture: any
Depends: scylla-server, scylla-jmx, scylla-tools, scylla-kernel-conf
Description: Scylla database metapackage
Scylla is a highly scalable, eventually consistent, distributed,
partitioned row DB.

dist/ubuntu/debian/scylla-conf.dirs
@@ -0,0 +1 @@
etc/scylla

@@ -0,0 +1,2 @@
conf/scylla.yaml etc/scylla
conf/cassandra-rackdc.properties etc/scylla

@@ -0,0 +1 @@
etc/sysctl.d

@@ -0,0 +1 @@
dist/common/sysctl.d/99-scylla-sched.conf /etc/sysctl.d

@@ -0,0 +1,7 @@
#!/bin/sh
set -e
sysctl -p/etc/sysctl.d/99-scylla-sched.conf
#DEBHELPER#

@@ -0,0 +1 @@
scylla-gdb.py usr/lib/scylla

dist/ubuntu/debian/scylla-server.dirs
@@ -0,0 +1,6 @@
etc/scylla.d
usr/lib/scylla
var/lib/scylla
var/lib/scylla/data
var/lib/scylla/commitlog
var/lib/scylla/coredump

dist/ubuntu/debian/scylla-server.docs
@@ -0,0 +1,4 @@
*.md
NOTICE.txt
ORIGIN
licenses

@@ -0,0 +1,16 @@
dist/common/limits.d/scylla.conf etc/security/limits.d
dist/ubuntu/sysctl.d/99-scylla.conf etc/sysctl.d
dist/common/sudoers.d/scylla etc/sudoers.d
dist/common/collectd.d/scylla.conf etc/collectd/collectd.conf.d
dist/common/scylla.d/*.conf etc/scylla.d
seastar/scripts/dpdk_nic_bind.py usr/lib/scylla
seastar/scripts/posix_net_conf.sh usr/lib/scylla
dist/common/scripts/* usr/lib/scylla
dist/ubuntu/scripts/* usr/lib/scylla
tools/scyllatop usr/lib/scylla
swagger-ui/dist usr/lib/scylla/swagger-ui
api/api-doc usr/lib/scylla/api
build/release/scylla usr/bin
build/release/iotune usr/bin
dist/common/bin/scyllatop usr/bin
dist/common/sbin/* usr/sbin

@@ -27,11 +27,6 @@ chdir /var/lib/scylla
env HOME=/var/lib/scylla
pre-start script
eval "`grep -v -e "^\s*#" -e "^$" /etc/default/scylla-server|sed -e 's/^/export /'`"
. /etc/scylla.d/dev-mode.conf
. /etc/scylla.d/io.conf
export DEV_MODE
export SEASTAR_IO
if [ "$AMI" = "yes" ]; then
sudo /usr/lib/scylla/scylla-ami/scylla_ami_setup
fi
@@ -39,19 +34,16 @@ pre-start script
end script
script
eval "`grep -v -e "^\s*#" -e "^$" /etc/default/scylla-server|sed -e 's/^/export /'`"
. /etc/scylla.d/dev-mode.conf
. /etc/scylla.d/io.conf
export DEV_MODE
export SEASTAR_IO
exec /usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE
. /etc/default/scylla-server
for i in /etc/scylla.d/*.conf; do
if [ "$i" = "/etc/scylla.d/*.conf" ]; then
break
fi
. "$i"
done
exec /usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET
end script
post-stop script
eval "`grep -v -e "^\s*#" -e "^$" /etc/default/scylla-server|sed -e 's/^/export /'`"
. /etc/scylla.d/dev-mode.conf
. /etc/scylla.d/io.conf
export DEV_MODE
export SEASTAR_IO
sudo /usr/lib/scylla/scylla_stop
end script

@@ -15,6 +15,34 @@ if [ "$RELEASE" = "14.04" ] || [ "$DISTRIBUTION" = "Debian" ]; then
debuild -r fakeroot --no-tgz-check -us -uc
cd -
fi
if [ ! -f build/scylla-env_1.0-0ubuntu1_all.deb ]; then
rm -rf build/scylla-env-1.0
cp -a dist/common/dep/scylla-env-1.0 build/
cd build/scylla-env-1.0
debuild -r fakeroot --no-tgz-check -us -uc
cd -
fi
if [ ! -f build/scylla-gdb_7.11-0ubuntu1_amd64.deb ]; then
rm -rf build/gdb-7.11
if [ ! -f build/gdb_7.11-0ubuntu1.dsc ]; then
wget -O build/gdb_7.11-0ubuntu1.dsc http://archive.ubuntu.com/ubuntu/pool/main/g/gdb/gdb_7.11-0ubuntu1.dsc
fi
if [ ! -f build/gdb_7.11.orig.tar.xz ]; then
wget -O build/gdb_7.11.orig.tar.xz http://archive.ubuntu.com/ubuntu/pool/main/g/gdb/gdb_7.11.orig.tar.xz
fi
if [ ! -f build/gdb_7.11-0ubuntu1.debian.tar.xz ]; then
wget -O build/gdb_7.11-0ubuntu1.debian.tar.xz http://archive.ubuntu.com/ubuntu/pool/main/g/gdb/gdb_7.11-0ubuntu1.debian.tar.xz
fi
cd build
dpkg-source -x gdb_7.11-0ubuntu1.dsc
mv gdb_7.11.orig.tar.xz scylla-gdb_7.11.orig.tar.xz
cd -
cd build/gdb-7.11
patch -p0 < ../../dist/ubuntu/dep/gdb.diff
echo Y | sudo mk-build-deps -i -r
debuild -r fakeroot --no-tgz-check -us -uc
cd -
fi
fi
if [ ! -f build/antlr3-c++-dev_3.5.2-1_all.deb ]; then

dist/ubuntu/dep/gdb.diff

File diff suppressed because it is too large.

dist/ubuntu/rules.in
@@ -1,17 +1,5 @@
#!/usr/bin/make -f
DOC = $(CURDIR)/debian/scylla-server/usr/share/doc/scylla-server
SCRIPTS = $(CURDIR)/debian/scylla-server/usr/lib/scylla
SWAGGER = $(SCRIPTS)/swagger-ui
API = $(SCRIPTS)/api
SYSCTL = $(CURDIR)/debian/scylla-server/etc/sysctl.d
SUDOERS = $(CURDIR)/debian/scylla-server/etc/sudoers.d
LIMITS= $(CURDIR)/debian/scylla-server/etc/security/limits.d
COLLECTD= $(CURDIR)/debian/scylla-server/etc/collectd/collectd.conf.d
SCYLLAD= $(CURDIR)/debian/scylla-server/etc/scylla.d
LIBS = $(CURDIR)/debian/scylla-server/usr/lib
CONF = $(CURDIR)/debian/scylla-server/etc/scylla
override_dh_auto_build:
./configure.py --disable-xen --enable-dpdk --mode=release --static-stdc++ --static-thrift --compiler=@@COMPILER@@
ninja
@@ -21,65 +9,6 @@ override_dh_auto_clean:
rm -rf Cql.tokens
rm -rf build.ninja seastar/build.ninja
override_dh_auto_install:
mkdir -p $(LIMITS) && \
cp $(CURDIR)/dist/common/limits.d/scylla.conf $(LIMITS)
mkdir -p $(SYSCTL) && \
cp $(CURDIR)/dist/ubuntu/sysctl.d/99-scylla.conf $(SYSCTL)
mkdir -p $(SUDOERS) && \
cp $(CURDIR)/dist/common/sudoers.d/scylla $(SUDOERS)
mkdir -p $(COLLECTD) && \
cp $(CURDIR)/dist/common/collectd.d/scylla.conf $(COLLECTD)
mkdir -p $(SCYLLAD) && \
cp $(CURDIR)/dist/common/scylla.d/*.conf $(SCYLLAD)
mkdir -p $(CONF) && \
cp $(CURDIR)/conf/scylla.yaml $(CONF)
cp $(CURDIR)/conf/cassandra-rackdc.properties $(CONF)
mkdir -p $(DOC) && \
cp $(CURDIR)/*.md $(DOC)
cp $(CURDIR)/NOTICE.txt $(DOC)
cp $(CURDIR)/ORIGIN $(DOC)
cp -r $(CURDIR)/licenses $(DOC)
mkdir -p $(SCRIPTS) && \
cp $(CURDIR)/seastar/scripts/dpdk_nic_bind.py $(SCRIPTS)
cp $(CURDIR)/seastar/scripts/posix_net_conf.sh $(SCRIPTS)
cp $(CURDIR)/dist/common/scripts/* $(SCRIPTS)
cp $(CURDIR)/dist/ubuntu/scripts/* $(SCRIPTS)
cp -r $(CURDIR)/tools/scyllatop $(SCRIPTS)
mkdir -p $(SWAGGER) && \
cp -r $(CURDIR)/swagger-ui/dist $(SWAGGER)
mkdir -p $(API) && \
cp -r $(CURDIR)/api/api-doc $(API)
mkdir -p $(CURDIR)/debian/scylla-server/usr/bin/ && \
cp $(CURDIR)/build/release/scylla \
$(CURDIR)/debian/scylla-server/usr/bin/
cp $(CURDIR)/build/release/iotune \
$(CURDIR)/debian/scylla-server/usr/bin/
cp $(CURDIR)/dist/common/bin/scyllatop \
$(CURDIR)/debian/scylla-server/usr/bin/
mkdir -p $(CURDIR)/debian/scylla-server/usr/sbin/ && \
cp -P $(CURDIR)/dist/common/sbin/* \
$(CURDIR)/debian/scylla-server/usr/sbin/
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/data
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/commitlog
mkdir -p $(CURDIR)/debian/scylla-server/var/lib/scylla/coredump
mkdir -p $(CURDIR)/debian/scylla-server-dbg/usr/lib/scylla
cp $(CURDIR)/scylla-gdb.py $(CURDIR)/debian/scylla-server-dbg/usr/lib/scylla
override_dh_installinit:
dh_installinit --no-start @@DH_INSTALLINIT@@

@@ -97,7 +97,7 @@ public:
return _heart_beat_state;
}
void set_heart_beat_state(heart_beat_state hbs) {
void set_heart_beat_state_and_update_timestamp(heart_beat_state hbs) {
update_timestamp();
_heart_beat_state = hbs;
}

gms/feature.hh
@@ -0,0 +1,54 @@
/*
* Copyright (C) 2016 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
namespace gms {
/**
* A gossip feature tracks whether all the nodes the current one is
* aware of support the specified feature.
*
* A feature should only be created once the gossiper is available.
*/
class feature final {
sstring _name;
bool _enabled;
friend class gossiper;
public:
explicit feature(sstring name, bool enabled = false);
~feature();
feature(const feature& other)
: feature(other._name, other._enabled)
{ }
feature& operator=(const feature& other) = delete;
const sstring& name() const {
return _name;
}
explicit operator bool() const {
return _enabled;
}
friend inline std::ostream& operator<<(std::ostream& os, const feature& f) {
return os << "{ gossip feature = " << f._name << " }";
}
};
} // namespace gms
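The doc comment in `gms/feature.hh` describes a gossip feature as a flag that only becomes true once every node the local node knows about supports it. Below is a minimal, single-threaded C++ sketch of that idea, not Scylla's implementation: `feature_flag`, `feature_registry`, `update_node()`, `common_features()`, `includes_all()` and the node addresses and feature names used with them are hypothetical. It assumes each node's advertised features are kept as a sorted `std::set<std::string>`, intersects them across nodes, and uses `std::includes` for the subset test, similar in spirit to `check_features()` in the gossiper changes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for gms::feature: a named flag that stays false
// until a registry sees the feature on every known node.
class feature_flag {
    std::string name_;
    bool enabled_ = false;
    friend class feature_registry;
public:
    explicit feature_flag(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    explicit operator bool() const { return enabled_; }
};

// Hypothetical registry loosely modeled on register_feature() and
// maybe_enable_features(); no gossip transport, no concurrency.
class feature_registry {
    // Features advertised by each known node, keyed by node address.
    std::map<std::string, std::set<std::string>> node_features_;
    // Flags still waiting for cluster-wide support, keyed by feature name.
    std::map<std::string, std::vector<feature_flag*>> pending_;

    // Features supported by every known node (empty if no node is known).
    std::set<std::string> common_features() const {
        if (node_features_.empty()) {
            return {};
        }
        auto it = node_features_.begin();
        std::set<std::string> common = it->second;
        for (++it; it != node_features_.end(); ++it) {
            std::set<std::string> next;
            std::set_intersection(common.begin(), common.end(),
                                  it->second.begin(), it->second.end(),
                                  std::inserter(next, next.end()));
            common = std::move(next);
        }
        return common;
    }

    // std::includes requires both ranges to be sorted; std::set guarantees it.
    static bool includes_all(const std::set<std::string>& have,
                             const std::set<std::string>& needed) {
        return std::includes(have.begin(), have.end(), needed.begin(), needed.end());
    }

public:
    // Enable the flag immediately if possible, otherwise keep it pending.
    void register_feature(feature_flag* f) {
        assert(f != nullptr);
        if (includes_all(common_features(), {f->name()})) {
            f->enabled_ = true;
        } else {
            pending_[f->name()].push_back(f);
        }
    }

    // Record a node's advertised features, then re-check pending flags.
    void update_node(const std::string& node, std::set<std::string> features) {
        node_features_[node] = std::move(features);
        maybe_enable_features();
    }

    // Enable every pending flag whose feature is now supported cluster-wide.
    void maybe_enable_features() {
        const auto common = common_features();
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (includes_all(common, {it->first})) {
                for (feature_flag* f : it->second) {
                    f->enabled_ = true;
                }
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
    }
};
```

For example, with node 10.0.0.1 advertising `feature_a` and `feature_b` but 10.0.0.2 advertising only `feature_a`, registering both flags enables `feature_a` at once and leaves `feature_b` pending until `update_node()` reports it on the second node. Re-checking pending flags on every state change mirrors how `maybe_enable_features()` is invoked whenever the endpoint state map changes.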

@@ -57,6 +57,7 @@
#include <seastar/core/thread.hh>
#include <chrono>
#include "dht/i_partitioner.hh"
#include <boost/range/algorithm/set_algorithm.hpp>
namespace gms {
@@ -534,7 +535,7 @@ void gossiper::run() {
// We don't care about heart_beat change on other CPUs - so ignore this
// specific change.
//
shadow_endpoint_state_map[br_addr].set_heart_beat_state(hbs);
shadow_endpoint_state_map[br_addr].get_heart_beat_state() = hbs;
logger.trace("My heartbeat is now {}", endpoint_state_map[br_addr].get_heart_beat_state().get_heart_beat_version());
std::vector<gossip_digest> g_digests;
@@ -596,6 +597,8 @@ void gossiper::run() {
if (endpoint_map_changed || live_endpoint_changed || unreachable_endpoint_changed) {
if (endpoint_map_changed) {
shadow_endpoint_state_map = endpoint_state_map;
_features_condvar.broadcast();
maybe_enable_features();
}
if (live_endpoint_changed) {
@@ -612,6 +615,8 @@ void gossiper::run() {
if (engine().cpu_id() != 0) {
if (endpoint_map_changed) {
local_gossiper.endpoint_state_map = shadow_endpoint_state_map;
local_gossiper._features_condvar.broadcast();
local_gossiper.maybe_enable_features();
}
if (live_endpoint_changed) {
@@ -1109,21 +1114,13 @@ void gossiper::mark_alive(inet_address addr, endpoint_state& local_state) {
local_state.mark_dead();
msg_addr id = get_msg_addr(addr);
logger.trace("Sending a EchoMessage to {}", id);
auto ok = make_shared<bool>(false);
ms().send_gossip_echo(id).then_wrapped([this, id, ok] (auto&& f) mutable {
try {
f.get();
logger.trace("Got EchoMessage Reply");
*ok = true;
} catch (...) {
logger.warn("Fail to send EchoMessage to {}: {}", id, std::current_exception());
}
return make_ready_future<>();
}).get();
if (*ok) {
this->set_last_processed_message_at();
this->real_mark_alive(id.addr, local_state);
try {
ms().send_gossip_echo(id).get();
logger.trace("Got EchoMessage Reply");
set_last_processed_message_at();
real_mark_alive(id.addr, local_state);
} catch(...) {
logger.warn("Fail to send EchoMessage to {}: {}", id, std::current_exception());
}
}
@@ -1133,7 +1130,10 @@ void gossiper::real_mark_alive(inet_address addr, endpoint_state& local_state) {
local_state.mark_alive();
local_state.update_timestamp(); // prevents do_status_check from racing us and evicting if it was down > A_VERY_LONG_TIME
_live_endpoints.insert(addr);
_live_endpoints_just_added.push_back(addr);
auto it = std::find(_live_endpoints_just_added.begin(), _live_endpoints_just_added.end(), addr);
if (it == _live_endpoints_just_added.end()) {
_live_endpoints_just_added.push_back(addr);
}
_unreachable_endpoints.erase(addr);
_expire_time_endpoint_map.erase(addr);
logger.debug("removing expire time for endpoint : {}", addr);
@@ -1230,7 +1230,7 @@ void gossiper::apply_new_states(inet_address addr, endpoint_state& local_state,
// don't assert here, since if the node restarts the version will go back to zero
//int oldVersion = local_state.get_heart_beat_state().get_heart_beat_version();
local_state.set_heart_beat_state(remote_state.get_heart_beat_state());
local_state.set_heart_beat_state_and_update_timestamp(remote_state.get_heart_beat_state());
// if (logger.isTraceEnabled()) {
// logger.trace("Updating heartbeat state version to {} from {} for {} ...",
// local_state.get_heart_beat_state().get_heart_beat_version(), oldVersion, addr);
@@ -1447,7 +1447,7 @@ void gossiper::add_saved_endpoint(inet_address ep) {
if (it != endpoint_state_map.end()) {
ep_state = it->second;
logger.debug("not replacing a previous ep_state for {}, but reusing it: {}", ep, ep_state);
ep_state.set_heart_beat_state(heart_beat_state(0));
ep_state.set_heart_beat_state_and_update_timestamp(heart_beat_state(0));
}
ep_state.mark_dead();
endpoint_state_map[ep] = ep_state;
@@ -1529,6 +1529,7 @@ future<> gossiper::do_stop_gossiping() {
get_local_failure_detector().unregister_failure_detection_event_listener(&g);
}
g.uninit_messaging_service_handler();
g._features_condvar.broken();
return make_ready_future<>();
}).get();
});
@@ -1590,12 +1591,12 @@ bool gossiper::is_alive(inet_address ep) {
if (ep == get_broadcast_address()) {
return true;
}
auto eps = get_endpoint_state_for_endpoint(ep);
auto it = endpoint_state_map.find(ep);
// we could assert not-null, but having isAlive fail screws a node over so badly that
// it's worth being defensive here so minor bugs don't cause disproportionate
// badness. (See CASSANDRA-1463 for an example).
if (eps) {
return eps->is_alive();
if (it != endpoint_state_map.end()) {
return it->second.is_alive();
} else {
logger.warn("unknown endpoint {}", ep);
return false;
@@ -1749,32 +1750,69 @@ std::set<sstring> gossiper::get_supported_features() const {
return common_features;
}
static future<stop_iteration> check_features(auto features, auto need_features, auto expire) {
static bool check_features(std::set<sstring> features, std::set<sstring> need_features) {
logger.info("Checking if need_features {} in features {}", need_features, features);
if (std::includes(features.begin(), features.end(), need_features.begin(), need_features.end())) {
return make_ready_future<stop_iteration>(stop_iteration::yes);
}
if (gossiper::now() > expire) {
throw std::runtime_error(sprint("Unable to wait for feature %s", need_features));
return boost::range::includes(features, need_features);
}
future<> gossiper::wait_for_feature_on_all_node(std::set<sstring> features) {
return _features_condvar.wait([this, features = std::move(features)] {
return check_features(get_supported_features(), features);
});
}
future<> gossiper::wait_for_feature_on_node(std::set<sstring> features, inet_address endpoint) {
return _features_condvar.wait([this, features = std::move(features), endpoint = std::move(endpoint)] {
return check_features(get_supported_features(endpoint), features);
});
}
void gossiper::register_feature(feature* f) {
if (check_features(get_local_gossiper().get_supported_features(), {f->name()})) {
f->_enabled = true;
} else {
return sleep(std::chrono::seconds(2)).then([] {
return make_ready_future<stop_iteration>(stop_iteration::no);
});
_registered_features.emplace(f->name(), std::vector<feature*>()).first->second.emplace_back(f);
}
}
future<> gossiper::wait_for_feature_on_all_node(std::set<sstring> features, std::chrono::seconds timeout) const {
auto expire = now() + timeout;
return repeat([this, features, expire] {
return check_features(get_supported_features(), features, expire);
});
void gossiper::unregister_feature(feature* f) {
auto&& fs = _registered_features[f->name()];
auto it = std::find(fs.begin(), fs.end(), f);
if (it != fs.end()) {
fs.erase(it);
}
}
future<> gossiper::wait_for_feature_on_node(std::set<sstring> features, inet_address endpoint, std::chrono::seconds timeout) const {
auto expire = now() + timeout;
return repeat([this, features, endpoint, expire] {
return check_features(get_supported_features(endpoint), features, expire);
});
void gossiper::maybe_enable_features() {
if (_registered_features.empty()) {
return;
}
auto&& features = get_supported_features();
for (auto it = _registered_features.begin(); it != _registered_features.end(); ) {
if (features.find(it->first) != features.end()) {
for (auto&& f : it->second) {
f->_enabled = true;
}
it = _registered_features.erase(it);
} else {
++it;
}
}
}
feature::feature(sstring name, bool enabled)
: _name(name)
, _enabled(enabled) {
if (!_enabled) {
get_local_gossiper().register_feature(this);
}
}
feature::~feature() {
if (!_enabled) {
get_local_gossiper().unregister_feature(this);
}
}
} // namespace gms
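The hunk above swaps the polling loop (`repeat` plus a two-second `sleep`) for `_features_condvar`. It also adds a feature registry. A `feature` constructed as disabled registers itself, and `maybe_enable_features()` switches it on once its name appears in `get_supported_features()`.

The block below is a minimal standalone sketch of that register/enable flow. It is not Scylla's code. `flag`, `flag_registry` and the explicit `supported` arguments are hypothetical stand-ins for `gms::feature`, the gossiper and its gossip-derived feature set.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for gms::feature: a named on/off switch.
struct flag {
    std::string name;
    bool enabled = false;
};

// Hypothetical stand-in for the gossiper's _registered_features map.
class flag_registry {
public:
    // Like register_feature(): enable at once if already supported,
    // otherwise remember the flag until the feature shows up.
    void register_flag(flag* f, const std::set<std::string>& supported) {
        if (supported.count(f->name)) {
            f->enabled = true;
        } else {
            _pending[f->name].push_back(f);
        }
    }

    // Like unregister_feature(): forget a flag that is being destroyed.
    void unregister_flag(flag* f) {
        auto it = _pending.find(f->name);
        if (it == _pending.end()) {
            return;
        }
        auto& v = it->second;
        v.erase(std::remove(v.begin(), v.end(), f), v.end());
    }

    // Like maybe_enable_features(): enable every pending flag whose name is
    // now supported, erasing its map entry while iterating.
    void maybe_enable(const std::set<std::string>& supported) {
        for (auto it = _pending.begin(); it != _pending.end();) {
            if (supported.count(it->first)) {
                for (flag* f : it->second) {
                    f->enabled = true;
                }
                it = _pending.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::size_t pending_names() const { return _pending.size(); }

private:
    std::unordered_map<std::string, std::vector<flag*>> _pending;
};
```

Erasing through the iterator returned by `erase()` keeps the loop valid, which is the same pattern `maybe_enable_features()` uses.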

@@ -48,12 +48,14 @@
#include "gms/versioned_value.hh"
#include "gms/application_state.hh"
#include "gms/endpoint_state.hh"
#include "gms/feature.hh"
#include "message/messaging_service.hh"
#include <boost/algorithm/string.hpp>
#include <experimental/optional>
#include <algorithm>
#include <chrono>
#include <set>
#include <seastar/core/condition-variable.hh>
namespace gms {
@@ -514,17 +516,23 @@ private:
uint64_t _nr_run = 0;
bool _ms_registered = false;
bool _gossiped_to_seed = false;
private:
condition_variable _features_condvar;
std::unordered_map<sstring, std::vector<feature*>> _registered_features;
friend class feature;
public:
// Get features supported by a particular node
std::set<sstring> get_supported_features(inet_address endpoint) const;
// Get features supported by all the nodes this node knows about
std::set<sstring> get_supported_features() const;
// Wait for features are available on all nodes this node knows about
future<> wait_for_feature_on_all_node(std::set<sstring> features,
std::chrono::seconds timeout = std::chrono::seconds(300)) const;
future<> wait_for_feature_on_all_node(std::set<sstring> features);
// Wait for features are available on a particular node
future<> wait_for_feature_on_node(std::set<sstring> features, inet_address endpoint,
std::chrono::seconds timeout = std::chrono::seconds(300)) const;
future<> wait_for_feature_on_node(std::set<sstring> features, inet_address endpoint);
private:
void register_feature(feature* f);
void unregister_feature(feature* f);
void maybe_enable_features();
};
extern distributed<gossiper> _the_gossiper;
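In the header, the polling `wait_for_feature_*` overloads with a `timeout` parameter give way to versions that wait on `_features_condvar`. `do_stop_gossiping()` calls `_features_condvar.broken()` so that pending waiters do not hang on shutdown.

The sketch below models that idea with `std::condition_variable` rather than Seastar's `condition_variable`, so treat it as a loose analogue. `feature_waiter`, `update_supported` and `break_waiters` are hypothetical names. Seastar's `broken()` fails the waiters with an exception. Here the broken state is instead a plain flag that the waiter turns into a `std::runtime_error`.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of waiting for features with a condition variable.
class feature_waiter {
public:
    // Called when gossip learns a new supported-feature set.
    void update_supported(std::set<std::string> features) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _supported = std::move(features);
        }
        _cv.notify_all(); // every waiter re-checks its predicate
    }

    // Rough analogue of _features_condvar.broken() on shutdown.
    void break_waiters() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _broken = true;
        }
        _cv.notify_all();
    }

    // Returns once every feature in `needed` is supported. Throws if woken
    // by break_waiters() while the features are still missing. If the
    // predicate already holds, wait() returns without blocking.
    void wait_for(const std::set<std::string>& needed) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [&] { return _broken || includes_all(needed); });
        if (!includes_all(needed)) {
            throw std::runtime_error("feature wait aborted");
        }
    }

private:
    // Same check as check_features(): std::includes needs sorted ranges,
    // and std::set iterates in sorted order.
    bool includes_all(const std::set<std::string>& needed) const {
        return std::includes(_supported.begin(), _supported.end(),
                             needed.begin(), needed.end());
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::set<std::string> _supported;
    bool _broken = false;
};
```

Compared with the removed `repeat`/`sleep` loop, waiters no longer wake every two seconds or give up after a fixed timeout. They wake only when someone signals the condition variable.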

@@ -877,7 +877,7 @@ $name$temp_param serializer<$name$temp_param>::read(Input& buf) {""").substitute
continue
local_param = "__local_" + str(index)
if "attribute" in param:
deflt = param["default"][0] if "default" in param else param["type"] + "()"
deflt = param["default"][0] if "default" in param else param_type(param["type"]) + "()"
fprintln(cout, Template(""" $typ $local = (in.size()>0) ?
$func(in, boost::type<$typ>()) : $default;""").substitute({'func' : DESERIALIZER, 'typ': param_type(param["type"]), 'local' : local_param, 'default': deflt}))
else:

@@ -28,6 +28,7 @@ class result_digest final {
class result {
bytes_ostream buf();
std::experimental::optional<query::result_digest> digest();
api::timestamp_type last_modified() [ [version 1.2] ] = api::missing_timestamp;
};
}

init.cc

@@ -64,18 +64,23 @@ void init_ms_fd_gossiper(sstring listen_address
}
future<> f = make_ready_future<>();
::shared_ptr<server_credentials> creds;
std::shared_ptr<credentials_builder> creds;
if (ew != encrypt_what::none) {
// note: credentials are immutable after this, and ok to share across shards
creds = ::make_shared<server_credentials>(::make_shared<dh_params>(dh_params::level::MEDIUM));
creds = std::make_shared<credentials_builder>();
creds->set_dh_level(dh_params::level::MEDIUM);
creds->set_x509_key_file(ms_cert, ms_key, x509_crt_format::PEM).get();
ms_trust_store.empty() ? creds->set_system_trust().get() :
creds->set_x509_trust_file(ms_trust_store, x509_crt_format::PEM).get();
if (ms_trust_store.empty()) {
creds->set_system_trust().get();
} else {
creds->set_x509_trust_file(ms_trust_store, x509_crt_format::PEM).get();
}
}
// Init messaging_service
net::get_messaging_service().start(listen, storage_port, ew, ssl_storage_port, creds).get();
// #293 - do not stop anything
//engine().at_exit([] { return net::get_messaging_service().stop(); });
// Init failure_detector

keys.cc

@@ -23,6 +23,7 @@
#include "keys.hh"
#include "dht/i_partitioner.hh"
#include "clustering_bounds_comparator.hh"
std::ostream& operator<<(std::ostream& out, const partition_key& pk) {
return out << "pk{" << to_hex(pk) << "}";
@@ -52,3 +53,43 @@ partition_key_view::ring_order_tri_compare(const schema& s, partition_key_view k
}
return legacy_tri_compare(s, k2);
}
std::ostream& operator<<(std::ostream& out, const bound_kind k) {
switch(k) {
case bound_kind::excl_end:
return out << "excl end";
case bound_kind::incl_start:
return out << "incl start";
case bound_kind::incl_end:
return out << "incl end";
case bound_kind::excl_start:
return out << "excl start";
}
abort();
}
bound_kind invert_kind(bound_kind k) {
switch(k) {
case bound_kind::excl_start: return bound_kind::incl_end;
case bound_kind::incl_start: return bound_kind::excl_end;
case bound_kind::excl_end: return bound_kind::incl_start;
case bound_kind::incl_end: return bound_kind::excl_start;
}
abort();
}
int32_t weight(bound_kind k) {
switch(k) {
case bound_kind::excl_end:
return -2;
case bound_kind::incl_start:
return -1;
case bound_kind::incl_end:
return 1;
case bound_kind::excl_start:
return 2;
}
abort();
}
const thread_local clustering_key_prefix bound_view::empty_prefix = clustering_key::make_empty();
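The `weight()` values added to keys.cc order the four bound kinds around a shared prefix p. An exclusive end `..., p)` sorts before an inclusive start `[p, ...`, and both come before the rows with prefix p (negative weights). An inclusive end `..., p]` sorts before an exclusive start `(p, ...`, and both come after those rows (positive weights). `invert_kind()` maps each bound to its complement at the same position.

The sketch below copies the two helpers so that it compiles on its own. The `bound_kind` enum definition and the `sorts_before` helper are assumptions for illustration, since this hunk does not show the enum.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumed enum shape; only the enumerator names come from the hunk.
enum class bound_kind : uint8_t { excl_end, incl_start, incl_end, excl_start };

// Copied from the hunk: the complement of a bound at the same prefix.
bound_kind invert_kind(bound_kind k) {
    switch (k) {
    case bound_kind::excl_start: return bound_kind::incl_end;
    case bound_kind::incl_start: return bound_kind::excl_end;
    case bound_kind::excl_end: return bound_kind::incl_start;
    case bound_kind::incl_end: return bound_kind::excl_start;
    }
    std::abort();
}

// Copied from the hunk: position of a bound relative to rows with its prefix.
int32_t weight(bound_kind k) {
    switch (k) {
    case bound_kind::excl_end: return -2;
    case bound_kind::incl_start: return -1;
    case bound_kind::incl_end: return 1;
    case bound_kind::excl_start: return 2;
    }
    std::abort();
}

// Hypothetical helper: for two bounds with the same prefix, the one with
// the smaller weight sorts first.
bool sorts_before(bound_kind a, bound_kind b) {
    return weight(a) < weight(b);
}
```

Inverting twice returns the original kind. An excl_end at p therefore pairs with an incl_start at p, and an incl_end at p pairs with an excl_start at p.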

Some files were not shown because too many files have changed in this diff.