Commit Graph

1311 Commits

Author SHA1 Message Date
Asias He
af579a055b gossip: Get rid of the gms::get_local_failure_detector static object
Store the failure_detector object inside gossiper object.

- The global sharded<failure_detector> object is removed.

- There is no need to initialize sharded<failure_detector> manually, which
simplifies the code in tests/cql_test_env.cc and init.cc.
2019-03-22 09:08:51 +08:00
Avi Kivity
a9cf07369f Merge "Add local indexes" from Piotr
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"

* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
  tests: add cases for local index prefix optimization
  tests: add create/drop local index test case
  tests: add non-standard names cases to local index tests
  tests: add multi pk case for local index tests
  tests: add test for malformed local index definitions
  tests: add local index paging test
  tests: add local indexing test
  cql3: add CREATE INDEX syntax for local indexes
  cql3: use serialization function to create index target string
  index: add serialization function for index targets
  index: use proper local index target when adding index
  index: add parsing target column name from local index targets
  db: add checking for local index in schema tables
  index: add checking if serialized target implies local index
  index: enable parsing multi-key targets
  index: move target parser code to .cc file
  json: add non-throwing overload for to_json_value
  cql3: add checking for local indexes in has_supporting_index()
  cql3: move finding index restrictions to prepare stage
  cql3: add picking an index by score
  ...
2019-03-21 12:46:00 -03:00
Nadav Har'El
7c874057f5 materialized_views: propagate "view virtual columns" between nodes
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.

This patch makes ALL a function instead of a constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
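
The caching described above can be sketched roughly as follows. This is a minimal illustration in plain C++, not the actual Scylla code: the table_schema struct and the table names returned by all_tables() are placeholders assumed for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema object; only the name matters here.
struct table_schema {
    std::string name;
};

// Single source of truth: the list of schema tables (illustrative entries).
inline std::vector<table_schema> all_tables() {
    return {{"keyspaces"}, {"tables"}, {"columns"}, {"view_virtual_columns"}};
}

// Derives the names from all_tables() and caches them per thread.
// The thread_local vector is built on the first call in each thread.
inline const std::vector<std::string>& all_table_names() {
    static thread_local const std::vector<std::string> names = [] {
        std::vector<std::string> result;
        for (const auto& t : all_tables()) {
            result.push_back(t.name);
        }
        assert(!result.empty());  // sanity check: there is always at least one schema table
        return result;
    }();
    return names;
}
```

Because the names are derived from all_tables(), a table added there shows up in the name list without a second edit.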

Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Piotr Sarna
e0d7807eed db: add checking for local index in schema tables
Based on which targets the index has, it will be either local
or global - local indexes have their full base partition key
embedded in their targets.
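
As a rough illustration of that rule, the check could look like the sketch below. It assumes targets and partition key columns are available as plain lists of column names; the function name and signature are hypothetical and do not reflect the real schema table code.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true when every base partition key column appears among the
// index targets, which is the property the message uses to call an index local.
inline bool is_local_index(const std::vector<std::string>& base_partition_key,
                           const std::vector<std::string>& index_targets) {
    assert(!base_partition_key.empty());  // a table always has a partition key
    return std::all_of(base_partition_key.begin(), base_partition_key.end(),
        [&](const std::string& column) {
            return std::find(index_targets.begin(), index_targets.end(), column)
                   != index_targets.end();
        });
}
```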
2019-03-20 10:20:24 +01:00
Piotr Sarna
90d47ca183 schema: add is_local_index cached value to index metadata
In order to quickly distinguish global indexes from local ones,
a cached boolean value is introduced.
2019-03-20 09:51:46 +01:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Rafael Ávila de Espíndola
63251b66c1 db: Record large cells
Fixes #4234.

Large cells are now recorded in system.large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
d17083b483 Create a system.large_cells table
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
8b4ae95168 large_data_handler: Run large data recording in parallel
With this change, the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.

We use a semaphore to bound how far behind system.large_* table updates
can get.

This should avoid delaying sstables writes in the common case, which
is more relevant once we warn of large cells since the default
threshold will be just 1MB.
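
The idea of bounding the backlog can be modelled with a small single-threaded sketch. This is not Seastar code and uses no real semaphore: the bounded_recorder class and its methods are hypothetical, and the queue of pending writes stands in for semaphore units held by in-flight updates.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Single-threaded model of "record in the background, but bound the backlog".
class bounded_recorder {
    std::size_t _max_pending;
    std::deque<std::function<void()>> _pending;  // writes still in flight
public:
    explicit bounded_recorder(std::size_t max_pending) : _max_pending(max_pending) {
        assert(max_pending > 0);
    }

    // Queues a write. Returns true if the caller could continue immediately,
    // false if it first had to wait for the oldest write to finish.
    bool record(std::function<void()> write) {
        bool immediate = true;
        if (_pending.size() == _max_pending) {
            complete_one();      // no units left: the caller waits here
            immediate = false;
        }
        _pending.push_back(std::move(write));
        return immediate;
    }

    // Completes the oldest in-flight write, releasing one unit.
    void complete_one() {
        if (_pending.empty()) {
            return;
        }
        auto write = std::move(_pending.front());
        _pending.pop_front();
        write();
    }

    // Drains everything, as a stop() would before shutdown.
    void stop() {
        while (!_pending.empty()) {
            complete_one();
        }
    }

    std::size_t backlog() const { return _pending.size(); }
};
```

In the common case record() returns at once, and the writer only stalls when the backlog has reached its bound.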

Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
54b856e5e4 large_data_handler: propagate a future out of stop()
stop() will close a semaphore in a followup patch, so it needs to return a
future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
989ab33507 large_data_handler: Remove const from a few functions
These will use a member semaphore variable in a followup patch, so they
cannot be const.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
5fcb3ff2d7 db: don't use _stopped directly
This gives flexibility in how it is implemented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
a17a936882 large_data_handler: assert it is not used after stop()
This should have been changed in the patch

db: stop the commit log after the tables during shutdown

But unfortunately I missed it then.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
f3089bf3d1 db: refactor a try_record helper
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola
d7f263d334 db: Rename (maybe_)?update_large_partitions
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
f254664fe6 db: refactor large data deletion code
The code for deleting entries from system.large_partitions was almost
a duplicate from the code for deleting entries from system.large_rows.

This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
16ed9a2574 db: stop the commit log after the tables during shutdown
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Avi Kivity
026821fb59 Merge "Record large rows in the system.large_rows table" from Rafael
"
This fixes #3988.

We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"

* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
  Add a testcase for large rows
  Populate system.large_rows.
  Create a system.large_rows table
  Extract a key_to_str helper
  Don't call record_large_rows if stopped
  Add a delete_large_rows_entries method to large_data_handler
  db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
  Rename maybe_delete_large_partitions_entry
  Rename log_large_row to record_large_rows
  Rename maybe_log_large_row to maybe_record_large_rows
2019-03-04 18:31:10 +02:00
Avi Kivity
da0a25859b Merge "Improvements to commitlog logs" from Paweł
"
This series contains minor improvements to commitlog log messages that
have helped investigating #4231, but are not specific to that bug.
"

* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
  commitlog: use consistent chunk offsets in logs
  commitlog: provide more information in logs
  commitlog: remove unnecessary comment
2019-03-04 14:52:46 +02:00
Paweł Dziepak
00b33de25c commitlog: use consistent chunk offsets in logs
Logs in commitlog writer use offset in the file of the chunk header to
identify chunks. However, the replayer is using offset after the header
for the same purpose. This causes unnecessary confusion suggesting that
the replayer is reading at the wrong position.

This patch changes the replayer so that it reports chunk header offsets.
2019-03-04 12:15:50 +00:00
Paweł Dziepak
813b00a1a6 commitlog: provide more information in logs
This commit adds some more information to the logs. Motivated by
experience with investigating #4231.

 * size of each write
 * position of each write
 * log message for final write
2019-03-04 12:15:50 +00:00
Paweł Dziepak
1a657e9c5f commitlog: remove unnecessary comment
2019-03-04 12:15:50 +00:00
Paweł Dziepak
434023425d commitlog: write the correct buffer size
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly fragmented) buffer. The size of that buffer in memory may be
larger than the size in the file.

cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.

This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
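
A simplified model of the fix, assuming a chunk carries both an in-memory buffer and a separate on-file size. The chunk struct and write_chunk() are hypothetical names for this sketch, and a std::vector<char> stands in for the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk: the in-memory buffer may be larger than what belongs on disk.
struct chunk {
    std::vector<char> buffer;   // in-memory size is buffer.size()
    std::size_t on_file_size;   // number of bytes that belong in the file
};

// Writes a chunk at `offset` and returns the offset of the next chunk.
// Only on_file_size bytes are written, so the write never spills into
// the region that the next chunk occupies.
inline std::size_t write_chunk(std::vector<char>& file, std::size_t offset, const chunk& c) {
    assert(c.on_file_size <= c.buffer.size());
    if (file.size() < offset + c.on_file_size) {
        file.resize(offset + c.on_file_size);
    }
    std::copy_n(c.buffer.data(), c.on_file_size, file.data() + offset);
    return offset + c.on_file_size;
}
```

If the whole 8-byte buffer of a chunk with a 4-byte on-file size were written instead, a chunk already placed at the next offset would be overwritten, which is the kind of corruption the message describes.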
2019-03-04 10:25:48 +00:00
Piotr Sarna
5f85a7a821 db,view: fix virtual columns liveness checks
When looking for optimization paths, columns selected in a view
are checked against multiple conditions. Unfortunately, virtual
columns were erroneously excluded from that check, which resulted
in ignoring their TTLs. That can lead to overoptimizing
and omitting vital liveness info from view rows,
which can then result in rows disappearing too early.
2019-02-28 10:47:19 +01:00
Rafael Ávila de Espíndola
25f81cf3e3 Populate system.large_rows.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola
66d8a0cf93 Create a system.large_rows table
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
da4c0da78a Extract a key_to_str helper
It will be used in more places in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
b7fd03d0fd Don't call record_large_rows if stopped
The large_data_handler implementations should only be called if
large_data_handler hasn't been stopped yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
0c401f56f8 Add a delete_large_rows_entries method to large_data_handler
This will be responsible for removing large rows from
system.large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
81a21ea425 db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
These functions will record into tables in a followup patch, so they
will need to return a future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
d4c001cba8 Rename maybe_delete_large_partitions_entry
It will also delete large rows, so rename it to
maybe_delete_large_data_entries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
e9a13aff90 Rename log_large_row to record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
6fb7066755 Rename maybe_log_large_row to maybe_record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Avi Kivity
5f94bc902a transport: add option to disable shard-aware drivers
The shard-aware drivers can cause a huge number of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.

Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).

Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
2019-02-26 12:44:11 +01:00
Benny Halevy
13ffda5c31 database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.

2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Tomasz Grabiec
8687666169 schema_tables: Add trace-level logging of schema mutations
Can be useful in diagnosing problems with application of schema
mutations.

do_merge_schema() is called on every change of schema of the local
node.

create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.

Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
2019-02-21 12:16:38 +02:00
Avi Kivity
9adfd11374 Merge "Avoid including cryptopp headers" from Rafael
"
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

This patch series introduces a single .cc file that has to include
cryptopp headers.
"

* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
  Avoid including cryptopp headers
  Delete dead code
2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola
fd5ea2df5a Avoid including cryptopp headers
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793

To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
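
The pimpl arrangement might look roughly like the sketch below, shown as one listing with comments marking the header/source split. The hasher class is hypothetical, and a simple FNV-1a hash stands in for the real md5/sha1 digests so the example has no third-party dependency.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// --- hasher.hh (what the rest of the code base would include) ---
class hasher {
    struct impl;                    // defined only in the .cc file
    std::unique_ptr<impl> _impl;
public:
    hasher();
    ~hasher();                      // defined where impl is a complete type
    void update(const std::string& data);
    std::uint32_t finalize() const;
};

// --- hasher.cc (the only file that would include the third-party header) ---
// The third-party include, and any pragmas it leaks, would stay in this file.
struct hasher::impl {
    std::uint32_t state = 2166136261u;   // FNV-1a offset basis, stand-in for a real digest
};

hasher::hasher() : _impl(std::make_unique<impl>()) {}
hasher::~hasher() = default;

void hasher::update(const std::string& data) {
    assert(_impl != nullptr);            // invariant: set in the constructor
    for (unsigned char c : data) {
        _impl->state ^= c;
        _impl->state *= 16777619u;       // FNV-1a prime
    }
}

std::uint32_t hasher::finalize() const { return _impl->state; }
```

Code that includes only the header part never sees the library's headers, so a stray diagnostic pragma cannot affect it.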

While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
Piotr Sarna
bd52e05ae2 view: minimize generated view updates for unselected columns
In some cases generating view updates for columns that were not
selected in CREATE VIEW statement is redundant - it is the case
when the update will not influence row liveness in any way.
Currently, these cases are optimized out:
 - row marker is live and only unselected columns were updated;
 - row marker is not live and only unselected columns were updated,
   and in the process nothing was created or deleted and there was
   no TTL involved;
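
The two cases listed above can be written as a single predicate. This is a sketch under the assumption that the relevant facts about an update are already available as booleans; update_summary and can_skip_view_update() are hypothetical names, not the actual view code.

```cpp
// Hypothetical summary of one base-table update, as seen by the view code.
struct update_summary {
    bool row_marker_live;
    bool only_unselected_columns_updated;
    bool created_or_deleted_cells;
    bool ttl_involved;
};

// Returns true when the view update is redundant under the two listed cases.
inline bool can_skip_view_update(const update_summary& u) {
    if (!u.only_unselected_columns_updated) {
        return false;               // selected columns changed: always update
    }
    if (u.row_marker_live) {
        return true;                // case 1: marker is live
    }
    // case 2: marker not live, nothing created or deleted, no TTL
    return !u.created_or_deleted_cells && !u.ttl_involved;
}
```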
2019-02-20 14:05:27 +01:00
Piotr Sarna
dbe8491655 view: cache is_index for view pointer
It's detrimental to keep querying index manager whether a view
is backing a secondary index every time, so this value is cached
at construct time.
At the same time, this value is not simply passed to view_info
when being created in secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
2019-02-20 12:52:32 +01:00
Nadav Har'El
05db7d8957 Materialized views: name the "batch_memory_max" constant
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
2019-02-17 13:28:16 +00:00
Rafael Ávila de Espíndola
9cd14f2602 Don't write to system.large_partition during shutdown
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.

There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.

This patch then adds another step when shutting down a database: first
stop updating system.large_partition.

This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.

This seems to impact only tests, since main.cc calls _exit directly.

Tests: unit (release,debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
2019-02-15 10:49:10 +01:00
Gleb Natapov
0b84b04f97 consistency_level: make it more const correct
Message-Id: <20190214122631.GF19055@scylladb.com>
2019-02-14 14:52:51 +02:00
Nadav Har'El
fec562ec8f Materialized views: limit size of row batching during bulk view building
The bulk materialized-view building process (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.

Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.

As a side-effect, this patch also solves another bug: In the worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them do.
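
The batching rule can be sketched as follows. The constants use the values mentioned in this log (128 rows, 1MB), but make_batches() and the use of strings as rows are assumptions for illustration, not the view builder's real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t batch_size = 128;                 // rows per build step
constexpr std::size_t batch_memory_max = 1024 * 1024;   // 1MB

// Splits rows into batches, closing a batch once it holds batch_size rows
// or its accumulated size reaches batch_memory_max, whichever comes first.
inline std::vector<std::vector<std::string>> make_batches(const std::vector<std::string>& rows) {
    std::vector<std::vector<std::string>> batches;
    std::vector<std::string> current;
    std::size_t current_bytes = 0;
    for (const auto& row : rows) {
        current.push_back(row);
        current_bytes += row.size();
        if (current.size() >= batch_size || current_bytes >= batch_memory_max) {
            batches.push_back(std::move(current));
            current.clear();        // reset the moved-from vector to a known empty state
            current_bytes = 0;
        }
    }
    if (!current.empty()) {
        batches.push_back(std::move(current));
    }
    return batches;
}
```

With small rows the row count closes each batch; with 600KB rows the size threshold closes a batch after two rows.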

Fixes #4213.

This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
2019-02-14 12:04:40 +02:00
Calle Wilund
e70286a849 db/extensions: Allow schema extensions to turn themselves off
Fixes #4222

Iff an extension creation callback returns null (not exception)
we treat this as "I'm not needed" and simply ignore it.
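
A minimal sketch of that convention, assuming extensions are built by factory callbacks keyed by name. The schema_extension struct, extension_factory alias and create_extensions() function are hypothetical and only illustrate the null-versus-exception distinction.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical extension type and factory callback.
struct schema_extension {
    std::string name;
};
using extension_factory =
    std::function<std::unique_ptr<schema_extension>(const std::string& options)>;

// Builds all extensions. A factory that returns null is skipped;
// a factory that throws still propagates its exception to the caller.
inline std::vector<std::unique_ptr<schema_extension>>
create_extensions(const std::map<std::string, extension_factory>& factories,
                  const std::string& options) {
    std::vector<std::unique_ptr<schema_extension>> result;
    for (const auto& entry : factories) {
        auto ext = entry.second(options);
        if (!ext) {
            continue;               // null means "I'm not needed"
        }
        result.push_back(std::move(ext));
    }
    return result;
}
```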

Message-Id: <20190213124311.23238-1-calle@scylladb.com>
2019-02-13 14:50:51 +02:00
Calle Wilund
4e657c0633 system_keyspace: Add waitable for trunc. migration
For tests. Hooray for separation of concern.
2019-02-13 09:08:12 +00:00
Calle Wilund
64e8c6f31d storage_service: Add features disabling for tests
2019-02-13 09:08:12 +00:00
Calle Wilund
12ebcf1ec7 commitlog_replay: Use dedicated table for truncation
Fixes #4083

Instead of a sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. This makes query/update easier
and is easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
2019-02-13 09:08:12 +00:00
Calle Wilund
4a52ed7884 commitlog: Accept recycled (not yet re-used) segments in replay
Refs #4085

Changes commitlog descriptor to both accept "Recycled-Commitlog..."
file names, and preserve said name in the descriptor.

This ensures we pick up the not-yet-used recycled segments left
from a crash for replay. The replay in turn will simply ignore
the recycled files, and post actual replay they will be deleted
as needed.

Message-Id: <20190129123311.16050-1-calle@scylladb.com>
2019-02-12 12:23:55 +02:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
there is a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent them.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. Many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is either better, faster, or worse: the only option left.  I
argue that having a flag that allows us to import them when we are sure
it is safe is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00