Commit Graph

4972 Commits

Author SHA1 Message Date
Duarte Nunes
ded9221187 db/view: Apply tracked tombstones for new updates
When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.

Fixes #4321
Tests: unit(dev)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 12:01:39 +00:00
Glauber Costa
043d102ab6 commitlog: fix typo in error message
maxiumum -> maximum

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190326191108.7573-1-glauber@scylladb.com>
2019-03-26 21:32:56 +02:00
Avi Kivity
f259a4c3b4 Merge "Remove usage of static gossiper object in init.cc and storage_service" from Asias
"
This series removes the usage of the static gossiper object in init.cc
and storage_service.

Follow up series will remove more in other components. This is the
effort to clean up the component dependencies and have better shutdown
procedure.

Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test,  dtests.
"

* tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev:
  storage_service: Do not use the global gms::get_local_gossiper()
  storage_service: Pass gossiper object to storage_service
  gms: Remove i_failure_detector.hh
  gossip: Get rid of the gms::get_local_failure_detector static object
  dht: Do not use failure_detector::is_alive in failure_detector_source_filter
  tests: Fix stop snitch in gossip_test.cc
  gossiper: Do not use value_factory from storage_service object
  gossiper: Use cfg options from _cfg instead of get_local_storage_service
  gossiper: Pass db::config object to gossiper class
  init: Pass gossiper object to init_ms_fd_gossiper
2019-03-26 08:54:46 +02:00
Avi Kivity
a7520c0ba9 Merge "Turn cql3_type into a trivial wrapper over data_type" from Rafael
"
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this series cql3_type was a fairly lightweight
structure. This series pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

The one corner case is varchar. In the old system cql3_type::varchar
and cql3_type::text don't compare equal, but they both map to the same
data_type.

In the new system they would compare equal, so we avoid the confusion
by just removing the cql3_type::varchar variable.

Tests: unit (dev)
"

* 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla:
  Turn cql3_type into a trivial wrapper over data_type
  Delete cql3_type::varchar
  Simplify db::cql_type_parser::parse
  Add a test for the varchar column representation
2019-03-25 15:03:16 +02:00
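The shape described in the series above can be sketched as follows. This is a minimal illustration, not the actual Scylla definitions: the abstract_type stand-in, the pointer-based operator== and the variable names in the usage note are all assumptions.

```cpp
#include <memory>
#include <string>

// Simplified stand-in for the real abstract_type (assumption).
struct abstract_type {
    std::string name;
};

using data_type = std::shared_ptr<const abstract_type>;

// cql3_type as a trivial wrapper: a single data_type member and no
// back-pointer from abstract_type, so no reference cycle can form.
struct cql3_type {
    data_type type;

    // Assumed semantics: two wrappers are equal when they wrap the same type object.
    bool operator==(const cql3_type& other) const {
        return type == other.type;
    }
};
```

With this layout, two wrappers built from the same data_type necessarily compare equal, which is why a separate cql3_type::varchar that maps to the same data_type as cql3_type::text would be indistinguishable from it.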
Tomasz Grabiec
80020118d0 Merge "Fix a couple of bugs related to large entry deletion" from Rafael
The crash observed in issue #4335 happens because
delete_large_data_entries is passed a deleted name.

Normally we don't get a crash, but a garbage name and we fail to
delete entries from system.large_*.

Adding a test for the fix found another issue that the second patch
in this series fixes.

Tests: unit (dev)

Fixes #4335.

* https://github.com/espindola/scylla guthub/fix-use-after-free-v4:
  large_data_handler: Fix a use after destruction
  large_data_handler: Make a variable non static
  Allow large_data_handler to be stopped twice
  Allow table to be stopped twice
  Test that large data entries are deleted
2019-03-25 10:37:36 +01:00
Duarte Nunes
93a1c27b31 service/storage_proxy: Don't consider view hints for MV backpressure
When a view replica becomes unavailable, updates to it are stored as
hints at the paired base replica. This on-disk queue of pending view
updates grows as long as there are view updates and the view replica
remains unavailable. Currently, we take that relative queue size into
account when calculating the delay for new base writes, in the context
of the backpressure algorithm for materialized views.

However, the way we're calculating that on-disk backlog is wrong,
since we calculate it per-device and then feed it to all the hints
managers for that device. This means that normal hints will show up as
backlog for the view hints manager, which in turn introduces delays.
This can make the view backpressure mechanism kick-in even if the
cluster uses no materialized views.

There's yet another way in which considering the view hints backlog is
wrong: a view replica that is unavailable for some period of time can
cause the backlog to grow to a point where all base writes incur
the maximum delay of 1 second. This turns a single-node failure into
cluster unavailability.

The fix to both issues is to simply not take this on-disk backlog into
account for the backpressure algorithm.

Fixes #4351
Fixes #4352

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190321170418.25953-1-duarte@scylladb.com>
2019-03-24 20:29:56 +02:00
Asias He
af579a055b gossip: Get rid of the gms::get_local_failure_detector static object
Store the failure_detector object inside gossiper object.

- The global sharded<failure_detector> object is no longer needed

- No need to initialize sharded<failure_detector> manually which
simplifies the code in tests/cql_test_env.cc and init.cc.
2019-03-22 09:08:51 +08:00
Rafael Ávila de Espíndola
c8da28a3eb Allow large_data_handler to be stopped twice
This will be used in a testcase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:47:23 -07:00
Avi Kivity
a9cf07369f Merge "Add local indexes" from Piotr
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"

* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
  tests: add cases for local index prefix optimization
  tests: add create/drop local index test case
  tests: add non-standard names cases to local index tests
  tests: add multi pk case for local index tests
  tests: add test for malformed local index definitions
  tests: add local index paging test
  tests: add local indexing test
  cql3: add CREATE INDEX syntax for local indexes
  cql3: use serialization function to create index target string
  index: add serialization function for index targets
  index: use proper local index target when adding index
  index: add parsing target column name from local index targets
  db: add checking for local index in schema tables
  index: add checking if serialized target implies local index
  index: enable parsing multi-key targets
  index: move target parser code to .cc file
  json: add non-throwing overload for to_json_value
  cql3: add checking for local indexes in has_supporting_index()
  cql3: move finding index restrictions to prepare stage
  cql3: add picking an index by score
  ...
2019-03-21 12:46:00 -03:00
Rafael Ávila de Espíndola
53ab298957 Turn cql3_type into a trivial wrapper over data_type
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this patch cql3_type was a fairly lightweight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:10:28 -07:00
Rafael Ávila de Espíndola
7f64a6ec4b Simplify db::cql_type_parser::parse
Since its first version, db::cql_type_parser::parse had special cases
for native and user defined types.

Those are not necessary, as the general parser has no problem handling
them.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola
8d9baf9843 large_data_handler: Make a variable non static
The value computed is not static since
f254664fe6, but unfortunately that was
missed in that commit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:31:21 -07:00
Rafael Ávila de Espíndola
e7749e7aee large_data_handler: Fix a use after destruction
The path leading to the issue was:

The sstable name is allocated and passed to maybe_delete_large_data_entries by reference

   auto name = sst->get_filename();
   return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size());

A future is created with a reference to it

  large_partitions = with_sem([&s, &filename, this] {
     return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS);
  });

The semaphore blocks.

The filename is destroyed.

delete_large_data_entries is called with a destroyed filename.

The reason this did not reproduce trivially in a debug build was that
the sstable itself was in the stack and the destructed value was read
as an internal value, and so asan had nothing to complain about.

Unfortunately we also had no tests that the entry in
system.large_rows was actually deleted.

This patch passes the name by value. It might create up to 3 copies of
it. If that is too inefficient it can probably be avoided with a
do_with in maybe_delete_large_data_entries.

Fixes #4335

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:30:42 -07:00
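The lifetime problem described above can be illustrated without Seastar. The sketch below is an assumption-laden simplification: a std::function stands in for the deferred continuation, and delete_entries and make_deferred_by_value are hypothetical names, not functions from the patch.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for delete_large_data_entries: reports what it was asked to delete.
std::string delete_entries(const std::string& filename) {
    return "deleted:" + filename;
}

// Fixed pattern: the closure owns its own copy of the name, so it stays valid
// even if the caller's string is destroyed before the closure finally runs.
std::function<std::string()> make_deferred_by_value(std::string filename) {
    return [filename = std::move(filename)] {
        return delete_entries(filename);
    };
}

// The buggy pattern would capture `[&filename]` instead: the closure would then
// hold a reference to a caller-local string that may already be gone when the
// deferred work executes.
```

Capturing by value costs a copy of the string, which matches the trade-off the commit message mentions.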
Nadav Har'El
7c874057f5 materialized_views: propagate "view virtual columns" between nodes
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.

What this patch does is to make ALL a function instead of a constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.

Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
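The single-source-of-truth approach with a per-thread cache can be sketched like this. It is an illustration only: table_schema is a simplified stand-in, and the table list is a placeholder rather than the real set of schema tables.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for a schema object (assumption).
struct table_schema {
    std::string name;
};

// The one place where the schema tables are listed. Placeholder contents.
const std::vector<table_schema>& all_tables() {
    static const std::vector<table_schema> tables = {
        {"keyspaces"}, {"tables"}, {"columns"}, {"view_virtual_columns"},
    };
    return tables;
}

// Derives the names from all_tables(), so the two lists cannot drift apart.
// The vector is built once per thread, on first use, and then reused.
const std::vector<std::string>& all_table_names() {
    static thread_local const std::vector<std::string> names = [] {
        std::vector<std::string> result;
        for (const auto& t : all_tables()) {
            result.push_back(t.name);
        }
        return result;
    }();
    return names;
}
```

Adding a table to all_tables() is then enough for it to show up in all_table_names() as well.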
Piotr Sarna
e0d7807eed db: add checking for local index in schema tables
Based on which targets the index has, it will be either local
or global - local indexes have their full base partition key
embedded in their targets.
2019-03-20 10:20:24 +01:00
Piotr Sarna
90d47ca183 schema: add is_local_index cached value to index metadata
In order to quickly distinguish global indexes from local ones,
a cached boolean value is introduced.
2019-03-20 09:51:46 +01:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Rafael Ávila de Espíndola
63251b66c1 db: Record large cells
Fixes #4234.

Large cells are now recorded in system.large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
d17083b483 Create a system.large_cells table
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
8b4ae95168 large_data_handler: Run large data recording in parallel
With this change the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.

We use a semaphore to bound how far behind system.large_* table updates
can get.

This should avoid delaying sstables writes in the common case, which
is more relevant once we warn of large cells since the default
threshold will be just 1MB.

Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
54b856e5e4 large_data_handler: propagate a future out of stop()
stop() will close a semaphore in a followup patch, so it needs to return a
future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
989ab33507 large_data_handler: Remove const from a few functions
These will use a member semaphore variable in a followup patch, so they
cannot be const.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
5fcb3ff2d7 db: don't use _stopped directly
This gives flexibility in how it is implemented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
a17a936882 large_data_handler: assert it is not used after stop()
This should have been changed in the patch

db: stop the commit log after the tables during shutdown

But unfortunately I missed it then.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
f3089bf3d1 db: refactor a try_record helper
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola
d7f263d334 db: Rename (maybe_)?update_large_partitions
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
f254664fe6 db: refactor large data deletion code
The code for deleting entries from system.large_partitions was almost
a duplicate from the code for deleting entries from system.large_rows.

This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
16ed9a2574 db: stop the commit log after the tables during shutdown
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Avi Kivity
026821fb59 Merge "Record large rows in the system.large_rows table" from Rafael
"
This fixes #3988.

We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"

* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
  Add a testcase for large rows
  Populate system.large_rows.
  Create a system.large_rows table
  Extract a key_to_str helper
  Don't call record_large_rows if stopped
  Add a delete_large_rows_entries method to large_data_handler
  db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
  Rename maybe_delete_large_partitions_entry
  Rename log_large_row to record_large_rows
  Rename maybe_log_large_row to maybe_record_large_rows
2019-03-04 18:31:10 +02:00
Avi Kivity
da0a25859b Merge "Improvements to commitlog logs" from Paweł
"
This series contains minor improvements to commitlog log messages that
have helped investigating #4231, but are not specific to that bug.
"

* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
  commitlog: use consistent chunk offsets in logs
  commitlog: provide more information in logs
  commitlog: remove unnecessary comment
2019-03-04 14:52:46 +02:00
Paweł Dziepak
00b33de25c commitlog: use consistent chunk offsets in logs
Logs in commitlog writer use offset in the file of the chunk header to
identify chunks. However, the replayer is using offset after the header
for the same purpose. This causes unnecessary confusion suggesting that
the replayer is reading at the wrong position.

This patch changes the replayer so that it reports chunk header offsets.
2019-03-04 12:15:50 +00:00
Paweł Dziepak
813b00a1a6 commitlog: provide more information in logs
This commit adds some more information to the logs. Motivated by
experiences with investigating #4231.

 * size of each write
 * position of each write
 * log message for final write
2019-03-04 12:15:50 +00:00
Paweł Dziepak
1a657e9c5f commitlog: remove unnecessary comment
2019-03-04 12:15:50 +00:00
Paweł Dziepak
434023425d commitlog: write the correct buffer size
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly fragmented) buffer. The size of that buffer in memory may be
larger than the size in the file.

cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.

This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
2019-03-04 10:25:48 +00:00
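Why the write length must match the on-file size can be shown with a small model. This is a sketch under stated assumptions: the chunk struct, write_chunk and the use of a std::string as the "file" are hypothetical and do not mirror the real cycle() code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a chunk: the in-memory buffer may be larger than
// the number of bytes that actually belong in the file.
struct chunk {
    std::vector<char> buffer;
    std::size_t on_file_size;  // precondition: on_file_size <= buffer.size()
};

// Writes only the on-file part of the chunk at the given offset and returns
// the offset of the next chunk, computed from the same on-file size.
std::size_t write_chunk(std::string& file, std::size_t offset, const chunk& c) {
    if (file.size() < offset + c.on_file_size) {
        file.resize(offset + c.on_file_size);
    }
    std::copy_n(c.buffer.begin(), c.on_file_size,
                file.begin() + static_cast<std::ptrdiff_t>(offset));
    return offset + c.on_file_size;
}
```

If the whole in-memory buffer were written instead, a chunk written late could overwrite the beginning of the following chunk, since that chunk's offset was derived from the smaller on-file size.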
Piotr Sarna
5f85a7a821 db,view: fix virtual columns liveness checks
When looking for optimization paths, columns selected in a view
are checked against multiple conditions - unfortunately virtual
columns were erroneously skipped from that check, which resulted
in ignoring their TTLs. That can lead to overoptimizing
and not including vital liveness info into view rows,
which can then result in rows disappearing too early.
2019-02-28 10:47:19 +01:00
Rafael Ávila de Espíndola
25f81cf3e3 Populate system.large_rows.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola
66d8a0cf93 Create a system.large_rows table
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
da4c0da78a Extract a key_to_str helper
It will be used in more places in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
b7fd03d0fd Don't call record_large_rows if stopped
The implementations of large_data_handler should only be called if
large_data_handler hasn't been stopped yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
0c401f56f8 Add a delete_large_rows_entries method to large_data_handler
This will be responsible for removing large rows from
system.large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
81a21ea425 db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
These functions will record into tables in a followup patch, so they
will need to return a future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
d4c001cba8 Rename maybe_delete_large_partitions_entry
It will also delete large rows, so rename it to
maybe_delete_large_data_entries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
e9a13aff90 Rename log_large_row to record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
6fb7066755 Rename maybe_log_large_row to maybe_record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Avi Kivity
5f94bc902a transport: add option to disable shard-aware drivers
The shard-aware drivers can cause a huge amount of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.

Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).

Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
2019-02-26 12:44:11 +01:00
Benny Halevy
13ffda5c31 database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.

2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Tomasz Grabiec
8687666169 schema_tables: Add trace-level logging of schema mutations
Can be useful in diagnosing problems with application of schema
mutations.

do_merge_schema() is called on every change of schema of the local
node.

create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.

Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
2019-02-21 12:16:38 +02:00
Avi Kivity
9adfd11374 Merge "Avoid including cryptopp headers" from Rafael
"
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

This patch series introduces a single .cc file that has to include
cryptopp headers.
"

* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
  Avoid including cryptopp headers
  Delete dead code
2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola
fd5ea2df5a Avoid including cryptopp headers
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793

To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.

While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
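The pimpl arrangement mentioned above can be sketched in a single listing. Everything here is illustrative: the hasher class, its members and the FNV-1a arithmetic are stand-ins chosen so the example needs no third-party library, and the file names in the comments are hypothetical.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// ---- hasher.hh: what the rest of the code base would include ----
class hasher {
    struct impl;                  // only declared here, defined in the .cc file
    std::unique_ptr<impl> _impl;
public:
    hasher();
    ~hasher();                    // defined out of line because impl is incomplete here
    void update(const std::string& data);
    std::uint32_t finalize() const;
};

// ---- hasher.cc: the only file that would include the third-party headers ----
struct hasher::impl {
    std::uint32_t state = 2166136261u;  // FNV-1a offset basis, a stand-in for a real digest
};

hasher::hasher() : _impl(std::make_unique<impl>()) {}
hasher::~hasher() = default;

void hasher::update(const std::string& data) {
    for (unsigned char c : data) {
        _impl->state ^= c;
        _impl->state *= 16777619u;      // FNV-1a 32-bit prime
    }
}

std::uint32_t hasher::finalize() const {
    return _impl->state;
}
```

Because users of the header never see the definition of impl, any pragma that leaks out of the third-party headers is confined to the one translation unit that includes them.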
Piotr Sarna
bd52e05ae2 view: minimize generated view updates for unselected columns
In some cases generating view updates for columns that were not
selected in CREATE VIEW statement is redundant - it is the case
when the update will not influence row liveness in any way.
Currently, these cases are optimized out:
 - row marker is live and only unselected columns were updated;
 - row marker is not live and only unselected columns were updated,
   and in the process nothing was created or deleted and there was
   no TTL involved;
2019-02-20 14:05:27 +01:00
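The two optimized cases listed above can be written as a single predicate. This is only a restatement of the commit message as code: update_summary and can_skip_view_update are hypothetical names, and the real check operates on mutation data rather than on precomputed flags.

```cpp
// Hypothetical summary of a base-table update, reduced to the facts the
// commit message mentions.
struct update_summary {
    bool marker_live;              // is the row marker live?
    bool only_unselected_updated;  // did the update touch only unselected columns?
    bool created_or_deleted;       // was anything created or deleted in the process?
    bool ttl_involved;             // was a TTL involved?
};

// Returns true when generating a view update would be redundant.
bool can_skip_view_update(const update_summary& u) {
    if (!u.only_unselected_updated) {
        return false;
    }
    if (u.marker_live) {
        return true;               // first case: live marker
    }
    // second case: marker not live, nothing created or deleted, no TTL
    return !u.created_or_deleted && !u.ttl_involved;
}
```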