11716 Commits

Author SHA1 Message Date
Duarte Nunes
11bd3bd29f database: Ensure new write_type is correctly printed
By removing the default case in the switch statement over a write_type
variable, we ensure the compiler warns us about lack of exhaustiveness
in case we add a value to the enum but forget to change the
corresponding operator<<().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
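
The exhaustiveness technique described in commit 11bd3bd29f can be sketched as follows. This is a minimal illustration, not the actual Scylla code: the enumerator names and the describe() helper are assumptions made for the example. It relies on -Wswitch (enabled by -Wall in GCC), which warns about an unhandled enumerator only when the switch has no default label.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical enum standing in for the real write_type; the enumerator
// names are illustrative and not taken from the Scylla sources.
enum class write_type { simple, batch, counter, view };

std::ostream& operator<<(std::ostream& os, write_type t) {
    // No default label: if an enumerator is added to write_type but not
    // handled here, -Wswitch reports the missing case at compile time.
    switch (t) {
    case write_type::simple:  return os << "SIMPLE";
    case write_type::batch:   return os << "BATCH";
    case write_type::counter: return os << "COUNTER";
    case write_type::view:    return os << "VIEW";
    }
    // Only reached for values outside the enum (for example a bad cast).
    return os << "UNKNOWN";
}

// Hypothetical helper for the example: render a write_type as a string.
std::string describe(write_type t) {
    std::ostringstream ss;
    ss << t;
    return ss.str();
}
```

Calling describe(write_type::view) yields the string "VIEW". Adding a fifth enumerator without a matching case would produce a compiler warning instead of silently falling into a default branch.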
Nadav Har'El
365df8f900 materialized views: match base and view replicas
A function to find the appropriate replica to send a view update to.

This patch creates a new source file db/view/view.cc. We should
eventually move a lot more of the materialized views code there.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
16206e9f15 column_family: Generate view updates
This patch adds the generate_view_updates() function to the
column_family class, which will use the view_update_builder to
generate updates to the column_family's materialized views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
90cb35db04 column_family: Adds affected_views() function
This patch adds the affected_views() function to determine which of
the column family's views a given update affects.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
d5a61a8c48 view: Add view_update_builder class
This patch adds the view_update_builder class, which is responsible
for calculating the mutations to apply to a column family's
materialized views, given a streamed_mutation representing an update
to the base table and a streamed_mutation representing the
pre-existing rows which the update covers.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
2ab9ba995a range_tombstone_accumulator: Expose current tombstone
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
f3c5ea392a range_tombstone_accumulator: apply() takes value
range_tombstone_accumulator::apply() now takes a value so the caller
can decide whether to move or copy the argument.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
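
The by-value convention described in commit f3c5ea392a can be sketched as follows. The accumulator class below is a hypothetical stand-in for range_tombstone_accumulator, and std::string stands in for a range tombstone. Only the parameter-passing idiom is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for range_tombstone_accumulator.
class accumulator {
    std::vector<std::string> _items;
public:
    // Taking the argument by value lets the caller choose: passing an
    // lvalue copies it, passing std::move(x) moves it. Either way the
    // function owns `item` and can move it into storage.
    void apply(std::string item) {
        _items.push_back(std::move(item));
    }
    std::size_t size() const { return _items.size(); }
    const std::string& back() const { return _items.back(); }
};
```

A caller that still needs its object writes acc.apply(rt) and pays for one copy. A caller that is done with it writes acc.apply(std::move(rt)) and pays only for moves.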
Duarte Nunes
3991a58f08 view_updates: Generate updates
This patch adds the view_updates::generate_update() function to
generate view updates given a base row update and the corresponding,
pre-existing row. This function will decide which of the previously
introduced functions to call based on whether there is a pre-existing
row and whether there exists a regular base column that's part of the
view's PK.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
861d2dfb61 view_updates: Adds function to replace row
This patch adds a function to replace a view row given a base
table update and the pre-existing row, which simply deletes the old
view entry and adds a new one.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
7901ce7de4 view_updates: Update view entry
This patch introduces the view_updates::update_entry function,
which creates the updates to apply to the existing view entry given
the base table row before and after the update.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
b34ae6d6da view_updates: Delete old view entry
This patch introduces the view_updates::delete_old_entry function,
which creates a view row mutation to delete an entry given an updated
base table row.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
Duarte Nunes
7e150a18eb mutation_partition: Introduce shadowable tombstone
This patch introduces shadowable row tombstones. A shadowable row
tombstone is valid only if the row has no live marker. In other words,
the row tombstone is only valid as long as no newer insert is done
(thus setting a live row marker; note that if the row timestamp set
is lower than the tombstone's, then the tombstone remains in effect
as usual).

If a row has a shadowable tombstone with timestamp Ti and that row
is updated with a timestamp Tj, such that Tj > Ti (and that update
sets the row marker), then the shadowable tombstone is shadowed by
that update. A concrete consequence is that if the update has cells
with timestamp lower than Ti, then those cells are preserved (since
the deletion is removed), and this is contrary to a regular,
non-shadowable row tombstone where the tombstone is preserved and
such cells are removed.

Currently, only Materialized Views require shadowable row tombstones,
which solve a problem with view row deletions. Consider a base row with
columns p, v1, v2, PRIMARY KEY (p) denormalized into a view row consisting
of columns p, v1, v2 PRIMARY KEY (p, v1), and the following operations:

1) INSERT INTO base (p, v1, v2) VALUES (0, 0, 1) USING TIMESTAMP 0;
2) UPDATE base SET v1 = 1 USING TIMESTAMP 1 WHERE p = 0;
3) UPDATE base SET v1 = 0 USING TIMESTAMP 2 WHERE p = 0;

Without shadowable tombstones, the view contains:

At 1), pk = (0, 0), row_marker@T0, v2=1@T0
At 2), pk = (0, 0), row_marker@T0, row_tombstone@T1, v2=1@T0
       pk = (0, 1), row_marker@T1, v2=1@T0
At 3), pk = (0, 0), row_marker@T2, row_tombstone@T1, v2=1@T0
       pk = (0, 1), row_marker@T1, row_tombstone@T2, v2=1@T0

Notice how, if we read row (0, 0), the value of v2 will be shadowed by
the row tombstone we previously inserted.

With a view's row tombstone becoming shadowable, at 3) the row (0, 0)
will look like pk = (0, 0), row_marker@T2, shadowable_tombstone@T1, v2=1@T0,
which is equivalent to pk = (0, 0), row_marker@T2, v2=1@T0.

Since the shadowable tombstone is shadowed by the new row marker (T1 <
T2), now v2 would be taken into account.

Finally, note that this patch doesn't generalize the idea of
shadowable tombstone, instead taking advantage of the fact that they
are only needed by Materialized Views. This saves changing the
tombstone representation to account for an extra flag, the bits such
representation would require, and also avoids changes to the storage
format.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:36:45 +01:00
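
The shadowing rule described in commit 7e150a18eb can be sketched as follows. This is a simplified model under stated assumptions, not the mutation_partition implementation: the view_row struct, the `missing` sentinel and both function names are hypothetical, and TTLs and cell-level tombstones are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using timestamp = std::int64_t;
// Sentinel meaning "no marker" or "no tombstone" in this sketch.
constexpr timestamp missing = std::numeric_limits<timestamp>::min();

struct view_row {
    timestamp row_marker = missing;  // timestamp of the live row marker, if any
    timestamp tombstone = missing;   // timestamp of the row tombstone, if any
    bool shadowable = false;         // whether the row tombstone is shadowable
};

// Timestamp of the deletion that actually applies to the row's cells.
timestamp effective_tombstone(const view_row& r) {
    if (r.shadowable && r.row_marker != missing && r.row_marker > r.tombstone) {
        return missing;  // a newer row marker shadows the tombstone
    }
    return r.tombstone;
}

// A cell survives if it is newer than the effective tombstone.
bool cell_is_live(const view_row& r, timestamp cell_ts) {
    return cell_ts > effective_tombstone(r);
}
```

For row (0, 0) at step 3 of the example (row marker at T2, tombstone at T1, v2 written at T0), cell_is_live() returns false with a regular tombstone and true with a shadowable one.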
Duarte Nunes
e0f642180f view_updates: Create view entry
This patch introduces the view_updates::create_entry function, which
creates a view row mutation given a new base table row.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:31 +01:00
Duarte Nunes
b8b8a8099c view_updates: Compute row marker
This patch adds a function to compute the row marker of a view row
given the base row. There are two cases to consider when building the
row marker: 1) there is a column C that is a regular base column but
is in the view PK; and 2) the columns for the base and the view PKs
are the same.

For 1), the view row marker timestamp will be the greater of the base
row marker's timestamp and C's. The TTL will be that of C. This means that if
C expires, the view row marker will expire as well (and the row, if no
other column is keeping it alive). Note that if the base row marker
expires but not C, then the base row will still be live due to C and
we shouldn't expire the view row.

For 2), the view row timestamp will be the same as the base row
timestamp. The TTL should be set in such a way that both base and view
rows live for the same time. We thus set the view row TTL to be the
max of any other TTL in the base row. This is particularly important
in the case where the base row marker has a TTL, but a column *absent*
from the view holds a greater one.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:31 +01:00
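
The two cases described in commit b8b8a8099c can be sketched as follows. This is an illustration only: the `stamped` struct and both function names are hypothetical, and the sketch assumes every value carries a TTL (values that never expire are not modelled).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A write timestamp plus a TTL in seconds. Assumption of this sketch:
// every value has a TTL; non-expiring values are not handled.
struct stamped {
    std::int64_t timestamp;
    std::int64_t ttl;
};

// Case 1: a regular base column C is part of the view's primary key.
// The timestamp is the greater of the two; the TTL is that of C.
stamped marker_with_extra_pk_column(stamped base_marker, stamped c) {
    return { std::max(base_marker.timestamp, c.timestamp), c.ttl };
}

// Case 2: base and view primary keys consist of the same columns.
// The timestamp is the base marker's; the TTL is the largest TTL found
// in the base row, so that base and view rows live equally long.
stamped marker_with_same_pk(stamped base_marker,
                            const std::vector<stamped>& base_cells) {
    std::int64_t ttl = base_marker.ttl;
    for (const stamped& cell : base_cells) {
        ttl = std::max(ttl, cell.ttl);
    }
    return { base_marker.timestamp, ttl };
}
```

In case 2 the list of base cells includes columns that are absent from the view, which is how a longer TTL on such a column keeps the view row alive.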
Duarte Nunes
7321938bcf view: Introduce view_updates class
This patch introduces the view_updates class, which is responsible
for generating and storing updates to a particular materialized view.

The updates will be generated from an updated base row and the
pre-existing one (if any), in later patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:31 +01:00
Duarte Nunes
0f8dbc9243 collection_type_impl: Iterate over collection cells
This patch introduces the collection_type_impl::for_each_cell()
function, which allows the caller to iterate over the cells of a
particular collection_mutation_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:31 +01:00
Duarte Nunes
082ef56df1 view: Store pk view column that's non-pk in the base
To help calculate the view mutations from a base update, we store in
the view class the column that's part of the view's primary key but
not part of the base's, if such column exists.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
734ad80390 view: Add matches_view_filter() function
This patch adds the matches_view_filter() function which specifies
whether a given base row matches the view filter. Unlike
may_be_affected_by(), this function has no false positives.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
7be0f319d4 single_column_restriction: Filter clustering rows
This patch adds the is_satisfied_by() function to
single_column_restriction, which, given a clustering row, returns
whether the restriction applies.

This is useful for secondary indexing such as materialized views,
where filters on regular columns precisely select which base table
rows to denormalize.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
3b52440ff3 statement_restrictions: Expose non-pk restrictions
This patch exposes the non-primary key column restrictions in a given
select statement, exposing them as single_column_restrictions.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
e987d87ab1 collection_type_impl: Identify concrete types
This patch adds the is_set() and is_list() functions to
collection_type_impl, which identify the concrete collection
type.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
71faa4a4eb abstract_restriction: Rename uses_function()
This patch renames abstract_restriction::uses_function() to
term_uses_function(), as it was previously hiding a function with the
same name in the restriction base class.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
21d1bbb527 view: Add may_be_affected_by function
This patch adds the may_be_affected_by() function to the view class,
which is responsible for determining whether an update to a base table
affects one of its views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
c35d14e285 column_family: Store a pointer to view
Instead of storing the view in the column_family's map of materialized
views, store a lw_shared_ptr so that the view can be removed while it
is being updated.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Duarte Nunes
69171c28f0 cql3/util: Fix use-after-free
This patch fixes a use-after-free error in
rename_column_in_where_clause(), where we were creating a boost
adaptor on an rvalue.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-02-06 13:35:30 +01:00
Avi Kivity
3896c27e5f Merge "DNS use in scylla" from Calle
"Fixes #1531

Adds lookup to gms::inet_address and uses it in (hopefully all) the
salient places where configured symbolic names are interpreted.

Removes the dummy dns module in scylla in favour of the seastar one."

* 'calle/use-dns' of github.com:cloudius-systems/seastar-dev:
  remove scylla dns code
  service::storage_service: Remove dependency on scylla dns
  main.cc: remove scylla dns dependency
  main/init: Lookup inet addresses from config by dns lookup
  db::system_keyspace: Find rpc_address by lookup
  gms::inet_address: Add lookup functionality.
  scylla tls: Add option support for client auth and tls opts
2017-02-06 13:50:42 +02:00
Avi Kivity
da8d00199e Merge
2017-02-06 13:43:07 +02:00
Avi Kivity
fdfabbf8bb Merge seastar upstream
* seastar f07f8ed...83a41c8 (8):
  > Cleaning the metrics API
  > tutorial: pick the name "asynchronous function".
  > tutorial: explain the difference between exception and exception future
  > tutorial: abstract
  > ninja: don't bother building c-ares shared libraries
  > ninja: unbreak build ordering
  > ninja: unbreak "ninja -t clean"
  > Add libtool to dependencies
2017-02-06 13:42:38 +02:00
Calle Wilund
44503f8253 remove scylla dns code
Use seastar facilities instead.
2017-02-06 11:36:57 +00:00
Calle Wilund
ab800c225a service::storage_service: Remove dependency on scylla dns
Use seastar facilities instead
2017-02-06 11:36:57 +00:00
Calle Wilund
c4c4eb06c4 main.cc: remove scylla dns dependency
Use seastar facilities instead.
2017-02-06 11:36:57 +00:00
Avi Kivity
b18e54307f tests: add --operations-per-shard option to perf_simple_query
This helps achieve more repeatable runs that can then be compared via the
Linux perf tool.  The option overrides duration-based testing and runs the
test for a specific number of iterations.
Message-Id: <20170204172937.8462-1-avi@scylladb.com>
2017-02-06 12:08:04 +01:00
Gleb Natapov
3c372525ed storage_proxy: use storage_proxy clock instead of explicit lowres_clock
Merge commit 45b6070832 used a butchered version of the storage_proxy
patch to adjust to the rpc timer change instead of the one I sent. This
patch fixes the differences.

Message-Id: <20170206095237.GA7691@scylladb.com>
2017-02-06 12:51:36 +02:00
Calle Wilund
feffc2bbe1 main/init: Lookup inet addresses from config by dns lookup
I.e. allow symbolic names in addition to ip addresses.
2017-02-06 09:45:37 +00:00
Calle Wilund
ef26ab0e1b db::system_keyspace: Find rpc_address by lookup
2017-02-06 09:45:37 +00:00
Calle Wilund
0a740b5ccf gms::inet_address: Add lookup functionality.
To find addresses by name.
2017-02-06 09:45:37 +00:00
Calle Wilund
ff8f82f21c scylla tls: Add option support for client auth and tls opts
Refs #1813 (fixes scylla part)

Added require_client_auth and priority_string options to
server_encryption_options/client_encryption_options and process them.

Allows TLS method/algo specification. Also enabled enforcing known cert
authentication for both node-to-node and client communication.
2017-02-06 09:45:09 +00:00
Avi Kivity
6e9e28d5a3 cell_locking: work around for missing boost::container::small_vector
small_vector doesn't exist on Ubuntu 14.04's boost, use std::vector
instead.
2017-02-05 20:48:36 +02:00
Avi Kivity
2510b756fc dist: add build dependency on automake
Needed by seastar's c-ares.
2017-02-05 20:16:27 +02:00
Takuya ASADA
e82932b774 dist/common/systemd: introduce scylla-housekeeping restart mode
scylla-housekeeping needs to run in 'restart mode' to check the version during
scylla-server restart, but that mode was not invoked under systemd, so add it.

Existing scylla-housekeeping.timer renamed to scylla-housekeeping-daily.timer,
since it is running 'daily mode'.

Fixes #1953

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1486180031-18093-1-git-send-email-syuu@scylladb.com>
2017-02-05 10:46:04 +02:00
Avi Kivity
4175f40da1 dist: add libtool build dependency for seastar/c-ares
2017-02-05 10:42:53 +02:00
Takuya ASADA
12b5e7288d dist/common/scripts/scylla_setup: show restart message when SELinux was disabled on the script
Disabling SELinux requires server restart, so warn user to restart before
running Scylla.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1485817393-25919-2-git-send-email-syuu@scylladb.com>
2017-02-05 10:10:18 +02:00
Takuya ASADA
c28a574b9e dist/common/scripts: stop setting hugepages boot parameter
Stop setting the hugepages boot parameter since we don't use it in the default
configuration (posix mode), but keep scylla_bootparam_setup to set up the
clocksource on AMI.

Fixes #1758

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1485817393-25919-1-git-send-email-syuu@scylladb.com>
2017-02-05 10:10:18 +02:00
Paweł Dziepak
37b0c71f1d cell_locking: fix partition_entry::equal_compare
The comparator constructor took the schema by value instead of by const
lvalue reference and, consequently, later tried to access an object that
had long since been destroyed.
Message-Id: <20170202135853.8190-1-pdziepak@scylladb.com>
2017-02-03 19:49:18 +01:00
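
The lifetime problem described in commit 37b0c71f1d can be sketched as follows. The schema struct, its prefix_length field and the string keys are hypothetical stand-ins chosen for the example. The real comparator operates on Scylla's schema and partition entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the real schema type.
struct schema {
    std::size_t prefix_length;  // how many leading characters identify a key
};

class equal_compare {
    const schema& _schema;  // non-owning: the schema must outlive the comparator
public:
    // Buggy form (shown only as a comment):
    //     explicit equal_compare(schema s) : _schema(s) {}
    // This binds _schema to the by-value parameter, which is destroyed when
    // the constructor returns, leaving a dangling reference.
    explicit equal_compare(const schema& s) : _schema(s) {}

    // Keys are equal when their first prefix_length characters match.
    bool operator()(const std::string& a, const std::string& b) const {
        return a.compare(0, _schema.prefix_length,
                         b, 0, _schema.prefix_length) == 0;
    }
};
```

With the constructor taking a const reference, the comparator refers to the caller's schema object, so it stays valid for as long as that object does.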
Avi Kivity
7a00dd6985 Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The number of scheduled
tasks is at most 1; there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when they contain sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the amount of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()
2017-02-02 17:49:31 +02:00
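
The fix described in merge 7a00dd6985 (at most one releaser per region group) can be sketched as follows. This is a synchronous toy model under stated assumptions, not the LSA code: the class and member names are hypothetical, and the real implementation uses a Seastar fiber that is woken when pressure is lifted rather than a re-entrancy flag.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of a region group with blocked requests.
class region_group_sketch {
    std::deque<std::function<void()>> _blocked;
    bool _under_pressure = true;
    bool _releasing = false;           // true while the single releaser runs
    std::size_t _releaser_starts = 0;  // how many releasers were ever started

    void release_blocked() {
        if (_releasing) {
            return;  // a releaser is already draining the queue
        }
        _releasing = true;
        ++_releaser_starts;
        // Execute blocked requests until pressure occurs or none are left.
        while (!_under_pressure && !_blocked.empty()) {
            auto req = std::move(_blocked.front());
            _blocked.pop_front();
            req();  // may change the group size and call update() again
        }
        _releasing = false;
    }
public:
    void block(std::function<void()> req) { _blocked.push_back(std::move(req)); }

    // Called whenever the group size changes.
    void update(bool under_pressure) {
        _under_pressure = under_pressure;
        if (!_under_pressure) {
            release_blocked();
        }
    }
    std::size_t releaser_starts() const { return _releaser_starts; }
    std::size_t blocked() const { return _blocked.size(); }
};
```

Even if every released request changes the group size and calls update() again, only one releaser is ever started per drain, instead of one extra releasing task per size change as in the old behaviour.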
Paweł Dziepak
788892e931 counters: fix build failure on gcc5
Message-Id: <20170202132049.4497-1-pdziepak@scylladb.com>
2017-02-02 14:23:49 +01:00
Piotr Jastrzebski
36b2c4df19 row_cache_test: extend test_mvcc
Make the test execute with and without an active
reader to memtable that's flushed to cache.

This improves test coverage of the MVCC code.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <007b6cd1ba7a84ea5675ea82e454bf1adf3b3330.1485954941.git.piotr@scylladb.com>
2017-02-02 13:51:32 +01:00
Tomasz Grabiec
5458a32f13 gdb: Introduce commands for inspecting pending task queue
Message-Id: <1485426236-6627-1-git-send-email-tgrabiec@scylladb.com>
2017-02-02 13:15:17 +02:00
Avi Kivity
000edc36c4 Merge "Counters" from Paweł
"This series introduces support for counters. The implementation of
counters more or less follows the design described on our wiki page [1].
Counter cells contain many shards with replicas being able to modify
and announce new versions only of the shards that they own. Historically,
there were three types of shards: local, remote and global. In these
patches only support for the global ones is added.

[1] https://github.com/scylladb/scylla/wiki/Counters

Currently, counters are only enabled as an experimental feature, as there
are still several things that need to be done before they become production
ready. Namely, the performance is expected to be quite poor (especially
for writes), there is no proper tracing support and timed out counter
requests may not be recognized and dropped early. There are also no
counter-related metrics.

However, apart from these problems there are no other missing parts of
counter implementation and they are expected to work correctly.

Fixes #577."

* 'pdziepak/counters/v3-rebased' of github.com:cloudius-systems/seastar-dev: (38 commits)
  perf_simple_query: add counter tables tests
  thrift: add support for counter operations
  cql3: allow counters in CREATE TABLE statements
  cql3: selection: do not panic when seeing counters
  storage_proxy: support counter updates
  storage_proxy: add get_live_endpoints()
  cql3: add counter increment and decrement operations
  db: add operations for applying counter updates
  counters: implement transforming counter deltas to shards
  add infrastructure for locking counter cells
  add fnv1a hasher
  position_in_partition: add feed_hash()
  position_in_partition: add functions for querying object type
  types: make counter_type_impl report its cql3_type
  transport: encode counters as long_type
  mutation_partition: make for_each_cell() accessible outside source file
  messaging_service: add COUNTER_MUTATION verb
  storage_service: add COUNTERS feature
  idl: add idl description of consistency level
  schema: make is_counter() return correct value
  ...
2017-02-02 12:40:09 +02:00
Paweł Dziepak
8671d8329d perf_simple_query: add counter tables tests
2017-02-02 10:35:14 +00:00