Commit Graph

14560 Commits

Author SHA1 Message Date
Jesse Haber-Kucharsky
0d1ea0a357 auth/authenticated_user: Mark functions noexcept 2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky
6cb3b06112 auth/authenticated_user: Remove outdated comment 2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky
64f844b870 auth/authenticated_user: Hide internal constant 2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky
15a2b93970 auth/authenticated_user: Use default ctors 2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky
fa94ee5a3a auth/authenticated_user: Move defns into namespace 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
4fad30ef42 auth/authenticated_user: Remove whitespace 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
2dd632f6e8 auth/authenticated_user: Use string_view in ctor 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
fa159c0ac4 auth: Mark authenticated_user final 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
f18dd25e7e cql3: Fix DROP ROLE IF EXISTS
Checking if the role to be dropped has superuser requires that the role
exists, which means `auth::nonexistent_role` was thrown even when IF
EXISTS was specified.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
b69c27d210 auth/standard_role_manager: Avoid string copies 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
bcc1fbad3a auth/service.hh: Fix documentation for errors
There is a distinct difference between throwing an exceptional
immediately and returning an exceptional future.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
741d215516 auth: Switch to roles from users
This is a large change, but it's a necessary evil.

This change brings us to a minimally-functional implementation of roles.
There are many additional changes that are necessary, including refined
grammar, bug fixes, code hygiene, and internal code structure changes.
In the interest of keeping this patch somewhat read-able, those changes
will come in subsequent patches. Until that time, roles are still marked
"unimplemented".

IMPORTANT: This code does not include any mechanism for transitioning a
cluster from user-based access-control to role-based access control. All
existing access-control metadata will be ignored (though not deleted).

Specific changes:

- All user-specific CQL statements now delegate to their roles
  equivalent. The statements are effectively the same, but CREATE USER
  will include LOGIN automatically. Also, LIST USERS only lists roles
  with LOGIN.

- A call to LIST PERMISSIONS will now also list permissions of roles
  that have been granted to the caller, in addition to permissions which
  have been granted directly.

- Much of the logic of creating, altering, and deleting roles has been
  moved to `auth::service`, since these operations require cooperation
  between the authenticator, authorizer, and role-manager.

- LIST USERS actually works as expected now (fixes #2968).
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
41f893d676 Don't use "experimental" optional
We're in C++17 country now.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
903ea32f30 auth/standard_role_manager: Fix life-time bug
It worked most of the time, but changes in other areas of the code must
have triggered the conditions necessary to make it fail.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
8878ce456c cql3/statements: Use convenient type alias 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
36b283f7ea auth: Allow empty role updates 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
34280c18bb tests: Rename helper function for clarity 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
635dc3d5ed auth: Include missing header 2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
f2b78499fe auth: Fix logic in service::role_has_superuser
The previous code has an off-by-one error since the iterator is
incremented unconditionally prior to being compared to the end of the
collection.

This new version is also shorter thanks to `seastar::do_until`.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
28a840db72 auth: Add error handling for incompatible modules
The components of access-control (authentication, authorization, and
role-management) are designed as abstract interfaces, but due to
decisions of Apache Cassandra, certain implementations are dependent on
other particular implementations.

This change throws a new exception,
`auth::incompatible_module_combination`, when a dependency is not
satisfied.
2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky
b3dc90d5d2 auth: Refactor authentication options
The set of allowed options is quite small, so we benefit from a static
representation (member variables) over a dynamic map.

We also logically move the "OPTIONS" option to the domain of the
authenticator (from user management), since this is where it is applied.

This refactor also aims to reduce compilation time by moving
`authentication_options` into its own header file.

While changes to `user_options` were necessary to accommodate the new
structure, that class will be deprecated shortly in the switch to roles.
Therefore, the changes are strictly temporary.
2018-02-14 14:15:57 -05:00
Paweł Dziepak
6c1503241d Merge seastar upstream
* seastar 2b0a81d...383ccd6 (9):
  > future-util: relax concept requirements for do_for_each()
  > seastar-addr2line: improve UX for bactraces read from stdin
  > noncopyable_function: Lift the noexcept guarantee
  > queue: doxygen documentation
  > queue: documentation
  > build: reinstate -Wsign-compare
  > iotune: don't compare sign and unsigned types
  > future-util: Remove unused local in with_scheduling_group()
  > tests/test-utils: Add macro for running tests within a seastar thread
2018-02-14 14:37:42 +00:00
Duarte Nunes
6f7233fbaf cql3/statements/truncate_statement: Prevent MV from being truncated
To truncate an MV, one must truncate the base table.

Fixes #3188

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180209162720.32757-1-duarte@scylladb.com>
2018-02-13 11:37:27 +00:00
Duarte Nunes
771852e731 Merge 'Fix possible stall in calculate_pending_ranges' from Asias
When the cluster is large or the num_tokens is big, calculate_pending_ranges
can take long time to complete. It now runs in the gossip thread so it can
block the gossip processing. Another problem is it runs in a plain for loop and
can cause the reactor stall.

User see this stall with decommission operations.

I can reproduce up to 4 seconds stall within a two-node cluster each with
`--num-tokens 3072` during decommission.

Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout

Fixes #3203

* tag 'asias/issue_3203_v2.1' of github.com:scylladb/seastar-dev:
  storage_service: Do not wait for update_pending_ranges in handle_state_leaving
  token_metadata: Handle affected_ranges with do_for_each
  token_metadata: Split token_metadata::calculate_pending_ranges
  token_metadata: Futurize calculate_pending_ranges
  storage_service: Futurize storage_service::do_update_pending_ranges
  token_metadata: Speed up token_metadata::get_endpoint
2018-02-13 11:12:22 +00:00
Asias He
74b4035611 storage_service: Do not wait for update_pending_ranges in handle_state_leaving
The call chain is:

storage_service::on_change() -> storage_service::handle_state_leaving()
-> storage_service::update_pending_ranges()

Listeners run as part of gossip message processing, which is
serialized. This means we won't be processing any gossip messages until
update_pending_ranges completes. update_pending_ranges takes time to
complete.

Since we do not wait for update_pending_ranges to complete any more,
multiple update_pending_ranges operations can run at the same time, use
serialized_action to serialize it.

Tested with update_cluster_layout_tests.py
2018-02-13 19:00:43 +08:00
Asias He
c17ce79977 token_metadata: Handle affected_ranges with do_for_each
affected_ranges can be very large in a large cluster or node with big
num_tokens account. calculate_natural_endpoints takes more time to
process in this case as well.

Futurize calculate_pending_ranges_for_leaving and handle the loop with
do_for_each to give some time for the reactor to breath, so it does not
block.
2018-02-13 19:00:43 +08:00
Asias He
60143a7517 token_metadata: Split token_metadata::calculate_pending_ranges
token_metadata::calculate_pending_ranges is a complicated function.
Split it into 3 parts for leaving operation, moving opeartion,
bootstrap opeartion.
2018-02-13 19:00:43 +08:00
Asias He
1834dd023f token_metadata: Futurize calculate_pending_ranges
Now, do_update_pending_ranges is futurized. We can finally futurize
token_metadata::calculate_pending_ranges in order to convert the loops
inside it to do_for_each insead of plain for loops to avoid reactor
stall.
2018-02-13 19:00:43 +08:00
Asias He
33c43b78c7 storage_service: Futurize storage_service::do_update_pending_ranges
Preparation work for the futurizing of the time consuming
token_metadata::calculate_pending_ranges.

In addition, we use do_for_each for the loop. It is better than the
plain for loop because the reactor can yield to avoid stalls in cases
there are tons of keyspaces.
2018-02-13 19:00:43 +08:00
Asias He
96266fc76a token_metadata: Speed up token_metadata::get_endpoint
token_metadata::calculate_pending_ranges ->
abstract_replication_strategy::calculate_natural_endpoints
-> token_metadata::get_endpoint()

With std::map

   INFO  2018-02-09 14:58:32,960 [shard 0] token_metadata - In
   calculate_pending_ranges: affected_ranges.size=6145 stars
   Reactor stalled for 4000 ms on shard 0.
   Backtrace:
     0x00000000004b12cb
     0x00000000004b1561
     /lib64/libpthread.so.0+0x00000000000123af
     0x0000000001159e25
     0x00000000011581eb
     0x000000000114f122
     0x000000000119f8c7
     0x00000000011985a4
     0x00000000011a7e16
     0x0000000001364741
     0x00000000013fe9fd
     0x00000000013ff792
     0x00000000014024b2
     0x000000000141a66f
     0x000000000141d7be
     0x00000000010ed234
     0x000000000112fdaa
     0x00000000011301f4
     0x000000000043543d
   INFO  2018-02-09 14:58:35,993 [shard 0] token_metadata - In
   calculate_pending_ranges: affected_ranges.size=6145 ends

With std::unordered_map

    INFO  2018-02-09 14:47:50,251 [shard 0] token_metadata - In
    calculate_pending_ranges: affected_ranges.size=6145 stars
    INFO  2018-02-09 14:47:51,585 [shard 0] token_metadata - In
    calculate_pending_ranges: affected_ranges.size=6145 ends
2018-02-13 19:00:42 +08:00
Duarte Nunes
ac6abf8021 Merge 'CQL clustering column secondary indexing support' from Pekka
"This patch series adds support for clustering column secondary indexing.

Fixes #2961

Tests: unit-tests (release)"

* 'penberg/cql-2i-clustering-key-indexing/v2' of github.com:penberg/scylla:
  tests/cql_query_test: Add indexed clustering key query test
  cql3: Fix clustering column secondary indexing
  cql3/statements: Add values() helper to restrictions
  cql3/restrictions: Fix multi_column_restriction::values()
  cql3/restrictions: Fix single_column_primary_key_restrictions::values()
2018-02-12 18:49:34 +00:00
Amnon Heiman
d88c27614e scylla-housekeeping: add configuration for api-address
This patch makes the api address and port configurable.

Fixes #2332

Message-Id: <20180204095628.1210-1-amnon@scylladb.com>
2018-02-12 15:26:46 +02:00
Amnon Heiman
449f9af0db API: Use stream_range_as_array to return token endpoints
The token_to_endpoint map can get big that trying to convert it to a
vector will cause large allocation warning.

This patch replace the implementation, so the return json array will be
created directly from the map by using stream_range_as_array helper
function.

Fixes #3185

Message-Id: <20180207153306.30921-1-amnon@scylladb.com>
2018-02-12 15:24:07 +02:00
Avi Kivity
e77ecda1da tests: avoid signed/unsigned compares
Container indices are size_t, and in other places we gratuituously
declare a limit as unsigned and the loop index as signed.

Tests: unit (release)
Message-Id: <20180212121642.10525-1-avi@scylladb.com>
2018-02-12 12:25:21 +00:00
Avi Kivity
87f10bc853 sstables: continuous_data_consumer: make _remain an unsigned type
All of the adjustments to _remain already ensure it is greater than 0,
and indeed a negative _remain doesn't make sense.

Switching to an unsigne types allows us to re-enable -Wsign-compare.

Tests: unit (release)
Message-Id: <20180212121636.10463-1-avi@scylladb.com>
2018-02-12 12:25:21 +00:00
Avi Kivity
55168592ad compaction_manager: fix use-after-free of column_family
Commit cce1a2bce8 ("Use the CPU scheduler")
placed some compaction manager code in a scheduling_group. Unfortunately,
downstream code relied on the callers not deferring, so it can rely
on the column_family's existence. That doesn't happen if the column_family
is removed quickly, as with_scheduling_group() always defers.

Fix applying the scheduling group after we've taken the lock and guaranteed
the stability of the column_family object.

Fixes #3196.
Message-Id: <20180211165155.18179-1-avi@scylladb.com>
2018-02-11 17:53:35 +00:00
Avi Kivity
3f5a8229ac tests: fix for sstable::get_index_reader() removal
71495691aa removed sstable::get_index_reader(),
but forgot to update its callers in tests/.  Update the callers to construct
a temporary shared_index_list and create the index_reader directly.

This is none too clean, but shared_index_lists needs to be retired, and then
the changes in this patch can go away too.

Tests: unit (release)
Message-Id: <20180211164739.17862-1-avi@scylladb.com>
2018-02-11 17:53:08 +00:00
Vladimir Krivopalov
71495691aa Use separate shared_index_lists per sstable_mutation_reader instead of a single one per sstable.
With the changes introduced in #2981, it is no longer safe to share
index_entries among multiple sstable_mutation_readers.
The original intent behind sharing index_entries among index_readers was
to avoid re-reading same pages twice as we have two index readers -
lower and upper bound - for every sstable_mutation_reader. In fact, the
shared entries were held at the sstable object level so index_readers
from different sstable_mutation_readers could have accessed them.

Now, with calls to index_reader::advance_to(pos)/index_reader::advance_past(pos),
index_entry can be accessed in a way that modifies its state if we need
to read more promoted index blocks. It is safe to keep sharing them
between two index_readers within the same sstable_mutation_reader as the
invariant is maintained that readers can be only moved forward.
We cannot safely assume, however, that this invariant holds for multiple
sstable_mutation_readers as it may happen that one of them has read and
thrown away some promoted index blocks that another one needs. So we
restrict sharing to per-sstable_mutation_reader level.

Fixes #3189.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <83957d007621fe4c62af49aebf1838bb2f32ee55.1518226793.git.vladimir@scylladb.com>
2018-02-10 15:08:45 +02:00
Duarte Nunes
d757c87107 cql3/query_processor: Remove prepared statements upon dropping a view
Fixes #3198

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180209143652.31852-1-duarte@scylladb.com>
2018-02-09 16:30:28 +00:00
Avi Kivity
432268f582 Merge "branch 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla" from Raphael
"The motivation is that it's no longer needed after new resharding
algorithm that is the sole responsible for working with shared
sstables and regular compaction will not work with those!
So resharding will schedule deletion of shared sstables once it's
certain that shards that own them have the new unshared sstables.
The manager was needed for orchestrating deletion of shared sstable
across shards. It brings extra complexity that's not longer needed,
and it was also overloading shard 0, but the latter could have
been fixed.

Tests:
- unit: release mode
- dtest: resharding_test.py"

* 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla:
  Remove SSTable's atomic deletion manager
  Stop using SSTable's atomic deletion manager
  database: split column_family::rebuild_sstable_list
2018-02-08 19:10:16 +02:00
Duarte Nunes
456b678e0b database.hh: Fix data query stage argument type
Fixes a merge gone wrong.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180208163338.25238-1-duarte@scylladb.com>
2018-02-08 16:35:10 +00:00
Avi Kivity
404172652e Merge "Use xxHash for digest instead of MD5" from Duarte
"This series changes digest calculation to use a faster algorithm
(xxHash) and to also cache calculated cell hashes that can be kept in
memory to speed up subsequent digest requests.

The MD5 hash function has proved to be slow for large cell values:

size = 256; elapsed = 4us
size = 512; elapsed = 8us
size = 1024; elapsed = 14us
size = 2048; elapsed = 21us
size = 4096; elapsed = 33us
size = 8192; elapsed = 51us
size = 16384; elapsed = 86us
size = 32768; elapsed = 150us
size = 65536; elapsed = 278us
size = 131072; elapsed = 531us
size = 262144; elapsed = 1032us
size = 524288; elapsed = 2026us
size = 1048576; elapsed = 4004us
size = 2097152; elapsed = 7943us
size = 4194304; elapsed = 15800us
size = 8388608; elapsed = 31731us
size = 16777216; elapsed = 64681us
size = 33554432; elapsed = 130752us
size = 67108864; elapsed = 263154us

The xxHash is a non-cryptographic, 64bit (there's work in progress on
the 128 version) hash that can be used to replace MD5. It performs much
better:

size = 256; elapsed = 2us
size = 512; elapsed = 1us
size = 1024; elapsed = 1us
size = 2048; elapsed = 2us
size = 4096; elapsed = 2us
size = 8192; elapsed = 3us
size = 16384; elapsed = 5us
size = 32768; elapsed = 8us
size = 65536; elapsed = 14us
size = 131072; elapsed = 28us
size = 262144; elapsed = 59us
size = 524288; elapsed = 116us
size = 1048576; elapsed = 226us
size = 2097152; elapsed = 456us
size = 4194304; elapsed = 935us
size = 8388608; elapsed = 1848us
size = 16777216; elapsed = 4723us
size = 33554432; elapsed = 10507us
size = 67108864; elapsed = 21622us

Performance was tested using a 3 node cluster with 1 cpu and 8GB,
and with the following cassandra-stress loaders. Measurements are for
the read workload.

sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100
sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100

xxhash + caching:

Results:
op rate                   : 32699 [READ:32699]
partition rate            : 32699 [READ:32699]
row rate                  : 32699 [READ:32699]
latency mean              : 3.0 [READ:3.0]
latency median            : 3.0 [READ:3.0]
latency 95th percentile   : 3.9 [READ:3.9]
latency 99th percentile   : 4.5 [READ:4.5]
latency 99.9th percentile : 6.6 [READ:6.6]
latency max               : 24.0 [READ:24.0]
Total partitions          : 10000000 [READ:10000000]
Total errors              : 0 [READ:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:05:05
END

md5:

Results:
op rate                   : 25241 [READ:25241]
partition rate            : 25241 [READ:25241]
row rate                  : 25241 [READ:25241]
latency mean              : 3.9 [READ:3.9]
latency median            : 3.9 [READ:3.9]
latency 95th percentile   : 5.1 [READ:5.1]
latency 99th percentile   : 5.8 [READ:5.8]
latency 99.9th percentile : 8.0 [READ:8.0]
latency max               : 24.8 [READ:24.8]
Total partitions          : 10000000 [READ:10000000]
Total errors              : 0 [READ:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:06:36
END

This translates into a 21% improvoment for this workload.

Bigger cell values were also tested:

sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100
sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100

xxhash + caching:

Results:
op rate                   : 19964 [READ:19964]
partition rate            : 19964 [READ:19964]
row rate                  : 19964 [READ:19964]
latency mean              : 4.9 [READ:4.9]
latency median            : 4.6 [READ:4.6]
latency 95th percentile   : 7.2 [READ:7.2]
latency 99th percentile   : 11.5 [READ:11.5]
latency 99.9th percentile : 13.6 [READ:13.6]
latency max               : 29.2 [READ:29.2]
Total partitions          : 10000000 [READ:10000000]
Total errors              : 0 [READ:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:08:20
END

md5:

Results:
op rate                   : 12773 [READ:12773]
partition rate            : 12773 [READ:12773]
row rate                  : 12773 [READ:12773]
latency mean              : 7.7 [READ:7.7]
latency median            : 7.3 [READ:7.3]
latency 95th percentile   : 10.2 [READ:10.2]
latency 99th percentile   : 16.8 [READ:16.8]
latency 99.9th percentile : 19.2 [READ:19.2]
latency max               : 71.5 [READ:71.5]
Total partitions          : 10000000 [READ:10000000]
Total errors              : 0 [READ:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:13:02
END

This translates into a 37% improvoment for this workload.

Fixes #2884

Tests: unit-tests (release), dtests (smp=2)

Note: dtests are kinda broken in master (> 30 failures), so take the
tests tag with a grain of himalayan salt."

* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits)
  tests/row_cache_test: Test hash caching
  tests/memtable_test: Test hash caching
  tests/mutation_test: Use xxHash instead of MD5 for some tests
  tests/mutation_test: Test xx_hasher alongside md5_hasher
  schema: Remove unneeded include
  service/storage_proxy: Enable hash caching
  service/storage_service: Add and use xxhash feature
  message/messaging_service: Specify algorithm when requesting digest
  storage_proxy: Extract decision about digest algorithm to use
  cache_flat_mutation_reader: Pre-calculate cell hash
  partition_snapshot_reader: Pre-calculate cell hash
  query::partition_slice: Add option to specify when digest is requested
  row: Use cached hash for hash calculation
  mutation_partition: Replace hash_row_slice with appending_hash
  mutation_partition: Allow caching cell hashes
  mutation_partition: Force vector_storage internal storage size
  test.py: Increase memory for row_cache_stress_test
  atomic_cell_hash: Add specialization for atomic_cell_or_collection
  query-result: Use digester instead of md5_hasher
  range_tombstone: Replace feed_hash() member function with appending_hash
  ...
2018-02-08 18:24:58 +02:00
Avi Kivity
6298655178 Merge "Inline and optimise more aggressively" from Paweł
"We have noticed in the past that the compiler is too conservative when it comes
to deciding which functions to inline. Since inlining functions enables further
optimisations such as const folding in some cases the difference in performance
was significant enough to force us to add [[gnu::always_inline]] attribute in
numerous places. However, this is neither a partical nor an elegant solution.

A better way to deal with the problem is to adjust the compiler tunables that
control the heuristics used for making inlining decisions. In particular,
inline-unit-growth seems to affect the performance of the emitted code most.

Apart from making the compiler more eager to inline functions bumping the
optimisation level to -O3 also seems to have a positive impact on the
performance.

Fixes #1644.

Tests: unit-test (release)

Performance tested with gcc 7.3.

Macrobenchmark
perf_simple_query
Flags: -c4 --duration 60
All results are medians.

         ./before    ./after   diff
 read   338662.12  405377.80  19.7%
 write  387378.89  466744.15  20.5%

Microbenchmarks
single run duration:      1.000s
number of runs:           5

BEFORE
test                                      iterations      median         mad         min         max
combined.one_row                              858933   536.389ns     0.819ns   534.823ns   537.208ns
combined.single_active                          8469    77.131us    11.000ns    77.118us    77.145us
combined.many_overlapping                       1199   664.105us   160.807ns   663.818us   668.527us
combined.disjoint_interleaved                   8100    75.522us    22.254ns    75.500us    75.732us
combined.disjoint_ranges                        8288    72.580us    10.571ns    72.568us    72.599us
memtable.one_partition_one_row               1216233   825.581ns     0.446ns   821.450ns   826.027ns
memtable.one_partition_many_rows              127336     7.855us     2.153ns     7.853us     7.898us
memtable.many_partitions_one_row               57919    17.356us     6.028ns    17.259us    17.362us
memtable.many_partitions_many_rows              4751   210.496us   102.339ns   210.393us   211.188us

AFTER
test                                      iterations      median         mad         min         max
combined.one_row                             1002321   450.292ns     0.313ns   447.202ns   450.605ns
combined.single_active                          9605    67.086us     8.620ns    67.073us    67.115us
combined.many_overlapping                       1476   519.554us     5.334ns   519.549us   519.953us
combined.disjoint_interleaved                   9280    64.363us     5.328ns    64.335us    64.369us
combined.disjoint_ranges                        9481    61.893us     3.620ns    61.885us    61.903us
memtable.one_partition_one_row               1432668   699.775ns     0.106ns   696.023ns   699.918ns
memtable.one_partition_many_rows              153692     6.536us     6.885ns     6.501us     6.543us
memtable.many_partitions_one_row               63319    15.879us     5.080ns    15.793us    15.884us
memtable.many_partitions_many_rows              5659   176.717us    66.770ns   176.650us   177.778us"

* tag 'optimise-and-inline/v2' of https://github.com/pdziepak/scylla:
  configure.py: set optimisation level to -O3
  configure.py: set inline-unit-growth to 300
  configure.py: flag_supported: support flags with spaces
  configure.py: rename warning_supported to flag_supported
  configure.py: pass optimisation flags to seastar/configure.py
  cql3/select_statement: do not capture stack variables by reference
2018-02-08 17:45:41 +02:00
Tomasz Grabiec
cce1a2bce8 Merge "Use the CPU scheduler" from Glauber & Avi
In this patchset I am resubmitting Avi's enablement of the CPU scheduler
in his behalf. I've done a ton of testing in the series and there are
some improvements / changes that I had previously sent as a separate series.

What you see here is the result of merging that work.

After this patchset is applied, workloads are smoother and we are able to
uphold the pre-defined shares among the various actors.

We also finally have everything we need to merge the CPU and I/O controllers.
After that is done the code is now much simpler. But also, as a bonus,
controllers that were previously available for I/O only (compactions) are
enabled for CPU as well.

* git@github.com:glommer/scylla.git cpusched-v7:

Avi Kivity (4):
  database, sstables, compaction: convert use of thread_scheduling_group
    to seastar cpu scheduler
  memtable, database: make memtable::clear_gently() inherit
    scheduling_group
  config: mark background_writer_scheduling_quota as Unused
  database: place data_query execution stage into scheduling_group

Glauber Costa (9):
  database, main: set up scheduling_groups for our main tasks
  row_cache: actually use the scheduling group for update_cache
  allow update_cache and clear_gently to use the entire task quota.
  database: remove cpu_flush_quota metric
  controllers: retire auto_adjust_flush_quota
  controllers: allow memtable I/O controller to have shares statically
    set
  controllers: update control points for memtable I/O controller
  controllers: allow a static priority to override the controller output
  controllers: unify the I/O and CPU controllers
2018-02-08 15:58:40 +01:00
Paweł Dziepak
eb5b76ea50 configure.py: set optimisation level to -O3 2018-02-08 14:46:11 +00:00
Paweł Dziepak
bc65659a46 configure.py: set inline-unit-growth to 300
It has been discovered that the compiler is too conservative when
deciding which functions to inline. In particular, the limiting tunable
turned out to be inline-unit-growth which limits inlining in large
translation units.
2018-02-08 14:46:11 +00:00
Paweł Dziepak
89063a9cc0 configure.py: flag_supported: support flags with spaces 2018-02-08 14:46:11 +00:00
Paweł Dziepak
8f4b30b572 configure.py: rename warning_supported to flag_supported
warning_supported() can be used to detect support of any compiler flag,
not just warnings.
2018-02-08 14:46:11 +00:00
Paweł Dziepak
a8372b87eb configure.py: pass optimisation flags to seastar/configure.py 2018-02-08 14:46:11 +00:00
Paweł Dziepak
b635fec9bf cql3/select_statement: do not capture stack variables by reference
Default capture by reference considered harmful in async code.
2018-02-08 14:46:10 +00:00