When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.
Fixes#4321
Tests: unit(dev)
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
This series removes the usage of the static gossiper object in init.cc
and storage_service.
Follow up series will remove more in other components. This is the
effort to clean up the component dependencies and have better shutdown
procedure.
Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test, dtests.
"
* tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev:
storage_service: Do not use the global gms::get_local_gossiper()
storage_service: Pass gossiper object to storage_service
gms: Remove i_failure_detector.hh
gossip: Get rid of the gms::get_local_failure_detector static object
dht: Do not use failure_detector::is_alive in failure_detector_source_filter
tests: Fix stop snitch in gossip_test.cc
gossiper: Do not use value_factory from storage_service object
gossiper: Use cfg options from _cfg instead of get_local_storage_service
gossiper: Pass db::config object to gossiper class
init: Pass gossiper object to init_ms_fd_gossiper
"
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.
To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.
Even before this series cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.
This avoids the reference cycle and is easier to understand IMHO.
The one corner case is varchar. In the old system cql3_type::varchar
and cql3_type::text don't compare equal, but they both map to the same
data_type.
In the new system they would compare equal, so we avoid the confusion
by just removing the cql3_type::varchar variable.
Tests: unit (dev)
"
* 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla:
Turn cql3_type into a trivial wrapper over data_type
Delete cql3_type::varchar
Simplify db::cql_type_parser::parse
Add a test for the varchar column representation
The crash observed in issue #4335 happens because
delete_large_data_entries is passed a deleted name.
Normally we don't get a crash, but a garbage name and we fail to
delete entries from system.large_*.
Adding a test for the fix found another issue that the second patch
is this series fixes.
Tests: unit (dev)
Fixes#4335.
* https://github.com/espindola/scylla guthub/fix-use-after-free-v4:
large_data_handler: Fix a use after destruction
large_data_handler: Make a variable non static
Allow large_data_handler to be stopped twice
Allow table to be stopped twice
Test that large data entries are deleted
When a view replica becomes unavailable, updates to it are stored as
hints at the paired based replica. This on-disk queue of pending view
updates grows as long as there are view updated and the view replica
remains unavailable. Currently, we take that relative queue size into
account when calculating the delay for new base writes, in the context
of the backpressure algorithm for materialized views.
However, the way we're calculating that on-disk backlog is wrong,
since we calculate it per-device and then feed it to all the hints
managers for that device. This means that normal hints will show up as
backlog for the view hints manager, which in turn introduces delays.
This can make the view backpressure mechanism kick-in even if the
cluster uses no materialized views.
There's yet another way in which considering the view hints backlog is
wrong: a view replica that is unavailable for some period of time can
cause the backlog to grow to a point where all base writes are applied
the maximum delay of 1 second. This turns a single-node failure into
cluster unavailability.
The fix to both issues is to simply not take this on-disk backlog into
account for the backpressure algorithm.
Fixes#4351Fixes#4352
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190321170418.25953-1-duarte@scylladb.com>
Store the failure_detector object inside gossiper object.
- No more the global object sharded<failure_detector>
- No need to initialize sharded<failure_detector> manually which
simplifies the code in tests/cql_test_env.cc and init.cc.
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key - index will be queried
locally.
"
* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
tests: add cases for local index prefix optimization
tests: add create/drop local index test case
tests: add non-standard names cases to local index tests
tests: add multi pk case for local index tests
tests: add test for malformed local index definitions
tests: add local index paging test
tests: add local indexing test
cql3: add CREATE INDEX syntax for local indexes
cql3: use serialization function to create index target string
index: add serialization function for index targets
index: use proper local index target when adding index
index: add parsing target column name from local index targets
db: add checking for local index in schema tables
index: add checking if serialized target implies local index
index: enable parsing multi-key targets
index: move target parser code to .cc file
json: add non-throwing overload for to_json_value
cql3: add checking for local indexes in has_supporting_index()
cql3: move finding index restrictions to prepare stage
cql3: add picking an index by score
...
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.
To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.
Even before this patch cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.
This avoids the reference cycle and is easier to understand IMHO.
Tests: unit (dev)
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Since its first version, db::cql_type_parser::parse had special cases
for native and user defined types.
Those are not necessary, as the general parser has no problem handling
them.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The value computed is not static since
f254664fe6, but unfortunately that was
missed in that commit.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The path leading to the issue was:
The sstable name is allocated and passed to maybe_delete_large_data_entries by reference
auto name = sst->get_filename();
return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size());
A future is created with a reference to it
large_partitions = with_sem([&s, &filename, this] {
return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS);
});
The semaphore blocks.
The filename is destroyed.
delete_large_data_entries is called with a destroyed filename.
The reason this did not reproduce trivially in a debug build was that
the sstable itself was in the stack and the destructed value was read
as an internal value, and so asan had nothing to complain about.
Unfortunately we also had no tests that the entry in
system.large_rows was actually deleted.
This patch passes the name by value. It might create up to 3 copies of
it. If that is too inefficient it can probably be avoided with a
do_with in maybe_delete_large_data_entries.
Fixes#4335
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.
What this patch does is to make ALL a function instead of constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.
So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.
Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes#4339, which was about
virtual columns in materialized views not being propagated to other nodes.
Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.
Fixes#4339
Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
With this changes the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.
We use a semaphore to bound how behind system.large_* table updates
can get.
This should avoid delaying sstables writes in the common case, which
is more relevant once we warn of large cells since the the default
threshold will be just 1MB.
Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
These will use a member semaphore variable in a followup patch, so they
cannot be const.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This should have been changed in the patch
db: stop the commit log after the tables during shutdown
But unfortunately I missed it then.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The code for deleting entries from system.large_partitions was almost
a duplicate from the code for deleting entries from system.large_rows.
This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
This fixes#3988.
We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"
* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
Add a testcase for large rows
Populate system.large_rows.
Create a system.large_rows table
Extract a key_to_str helper
Don't call record_large_rows if stopped
Add a delete_large_rows_entries method to large_data_handler
db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
Rename maybe_delete_large_partitions_entry
Rename log_large_row to record_large_rows
Rename maybe_log_large_row to maybe_record_large_rows
"
This series contains minor improvements to commitlog log messages that
have helped investigating #4231, but are not specific to that bug.
"
* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
commitlog: use consistent chunk offsets in logs
commitlog: provide more information in logs
commitlog: remove unnecessary comment
Logs in commitlog writer use offset in the file of the chunk header to
identify chunks. However, the replayer is using offset after the header
for the same purpose. This causes unnecessary confusion suggesting that
the replayer is reading at the wrong position.
This patch changes the replayer so that it reports chunk header offsets.
This commits adds some more information to the logs. Motivated, by
experiences with investigating #4231.
* size of each write
* position of each write
* log message for final write
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly, fragmented buffer). The size of that buffer in memory may be
larger than the size in the file.
cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.
This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
When looking for optimization paths, columns selected in a view
are checked against multiple conditions - unfortunately virtual
columns were erroneously skipped from that check, which resulted
in ignoring their TTLs. That can lead to overoptimizing
and not including vital liveness info into view rows,
which can then result in row disappearing too early.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The implementations large_data_handler should only be called if
large_data_handler hasn't been stopped yet.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
These functions will record into tables in a followup patch, so they
will need to return a future.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
The shard-aware drivers can cause a huge amount of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.
Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).
Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.
2. We would like the caller to handle any exception on this path,
especially in the prepatation part, before calling delete_large_partitions_entry().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Can be useful in diagnosing problems with application of schema
mutations.
do_merge_schema() is called on every change of schema of the local
node.
create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.
Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
"
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
This patch series introduces a single .cc file that has to include
cryptopp headers.
"
* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
Avoid including cryptopp headers
Delete dead code
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
In some cases generating view updates for columns that were not
selected in CREATE VIEW statement is redundant - it is the case
when the update will not influence row liveness in anyway.
Currently, these cases are optimized out:
- row marker is live and only unselected columns were updated;
- row marked is not live and only unselected columns were updated,
and in the process nothing was created or deleted and there was
no TTL involved;