Virtual columns should not be visible to the user,
so they are now hidden not only from directly selecting them,
but also via WRITETIME() and TTL() keywords.
Fixes #4288
Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so
that we can override package names for enterprise AMIs.
Message-Id: <20190225063319.19516-1-penberg@scylladb.com>
"
After adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") we have
noticed a regression in the results of some perf_fast_forward tests.
This was caused by those tests not disabling partition-level
fast-forwarding even though it was not needed and the commit in question
fixed an incorrect optimisation in such cases.
However, after solving that issue it has also become apparent that
mutation_reader_merger performs worse when the fast-forwarding is
disabled. This was attributed to logic responsible for dropping readers
as soon as they have reached the end of stream (which cannot be done if
fast-forwarding is enabled). This problem was mitigated by avoiding a
scan of the list and by removing readers in small batches.
Fixes #4246.
Fixes #4254.
Tests: unit(dev)
"
* tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla:
mutation_reader_merger: drop unneded readers in small batches
mutation_reader_merger: track readers by iterators and not pointers
tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
* seastar 2313dec...ab54765 (10):
> Fix C++-17-only uses of static_assert() with a single parameter.
> README.md: fix out-of-date explanation of C++ dialect
> net: fix tcp load balancer accounting leak while moving socket to other shard
> Revert "deleter: prevent early memory free caused by deleter append."
> deleter: prevent early memory free caused by deleter append.
> Solve seastar.unit.thread failure in debug mode
> Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb
> build: Fix the required version of `fmt`
> app_template: fix use after move in app constructor
> build: Rename CMake variable for private flags
Fixes #4269.
* 'jhk/define_debug/v1' of https://github.com/hakuch/scylla:
build: Remove the `DEBUG_SHARED_PTR` pp variable
build: Prefer the Seastar version of a pp variable
Scylla currently prints a welcome message when it starts, with the
Scylla version, but this is not printed to the regular log so in some
cases (e.g., Jenkins runs) we do not see it in the log. So let's add
a regular INFO-level log message with the same information.
Also, Scylla currently doesn't print any specific log message when it
normally completes its shutdown. In some cases, users may end up
wondering whether Scylla hung in the middle of the shutdown, or in
fact exited normally. Refs #4238. So in this patch we add a "shutdown
complete" message as the very last message in a successful shutdown.
We print Scylla's version also in the shutdown message, which may be
useful to see in the logs when shutting down one version of Scylla
and starting a different version.
Finally, we also add a log message when initialization is complete,
which may also be useful to understand whether Scylla hung during
initialization.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217140659.19512-1-nyh@scylladb.com>
It was observed that destroying readers as soon as they are not needed
negatively affects performance of relatively small reads. We don't want
to keep them alive for too long either, since they may own a lot of
memory, but deferring the destruction slightly and removing them in
batches of 4 seems to solve the problem for the small reads.
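The batching idea can be sketched as follows. This is only an illustration under stated assumptions: `reader`, `reader_pool`, `finished()` and `alive()` are hypothetical names, not the actual mutation_reader_merger code; only the batch size of 4 comes from the description above.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for a reader that may own a lot of memory.
struct reader { int id; };

class reader_pool {
public:
    using handle = std::list<reader>::iterator;

    handle add(int id) { return _readers.insert(_readers.end(), reader{id}); }

    // Record that a reader reached end of stream. Its destruction is
    // deferred until a full batch of finished readers has accumulated.
    void finished(handle h) {
        _to_remove.push_back(h);
        if (_to_remove.size() >= batch_size) {
            for (auto it : _to_remove) {
                _readers.erase(it);
            }
            _to_remove.clear();
        }
    }

    std::size_t alive() const { return _readers.size(); }

private:
    static constexpr std::size_t batch_size = 4; // value taken from the text above
    std::list<reader> _readers;
    std::vector<handle> _to_remove; // finished but not yet destroyed
};
```

With five readers added, marking three as finished destroys nothing; marking the fourth destroys all four at once.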
mutation_reader_merger uses a std::list of mutation_reader to keep them
alive while the rest of the logic operates on non-owning pointers.
This means that when it is time to drop some of the readers that are
no longer needed, the merger needs to scan the list looking for them.
That's not ideal.
The solution is to make the logic use iterators to elements in that
list, which allows for O(1) removal of an unneeded reader. Iterators into
a std::list are just pointers to the node and are not invalidated by
unrelated additions and removals.
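The std::list property relied on here can be checked in isolation. The function below is a standalone illustration with a made-up name, not code from the patch: it erases one element through a saved iterator, performs an unrelated insertion, and verifies that the other saved iterators still refer to their elements.

```cpp
#include <list>
#include <string>

// Returns true if iterators to untouched elements survive an erase and an
// unrelated insert, as std::list guarantees.
inline bool erase_keeps_other_iterators_valid() {
    std::list<std::string> readers;
    auto a = readers.insert(readers.end(), "a");
    auto b = readers.insert(readers.end(), "b");
    auto c = readers.insert(readers.end(), "c");

    readers.erase(b);                      // O(1): no scan of the list
    readers.insert(readers.begin(), "z");  // unrelated insertion

    return *a == "a" && *c == "c" && readers.size() == 3;
}
```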
Several of the test cases in perf_fast_forward do not need
partition-level fast-forwarding. However, since the defaults are used to
construct most of the readers the fast-forwarding is enabled regardless.
This showed an apparent regression in the perf_fast_forward results
after adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") which
disabled an optimisation that was invalid when partition-level
fast-forwarding was requested.
This patch ensures that all single-partition reads that do not need
partition-level fast-forwarding keep it disabled.
"
Currently we keep the entries in a circular_buffer, which uses
a contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.
A similar problem exists for the offset vector built
in write_promoted_index().
This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.
The serialization of the first entry is deferred, so that
serialization is avoided if there will be fewer than 2
entries. Promoted index is not added for such partitions.
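A rough sketch of the deferral, assuming a simplified entry layout: `index_entry`, `promoted_index_writer` and the fixed-width little-endian encoding are hypothetical, and std::deque merely stands in for a fragmented buffer such as bytes_ostream.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <initializer_list>
#include <optional>

// Hypothetical, simplified promoted index entry.
struct index_entry {
    uint64_t offset;
    uint64_t width;
};

class promoted_index_writer {
public:
    void add(const index_entry& e) {
        if (++_count == 1) {
            _first = e;              // hold back: there may never be a second entry
            return;
        }
        if (_first) {
            serialize(*_first);      // a second entry arrived, flush the deferred one
            _first.reset();
        }
        serialize(e);
    }

    std::size_t serialized_bytes() const { return _out.size(); }

private:
    void serialize(const index_entry& e) {
        for (uint64_t v : {e.offset, e.width}) {
            for (int i = 0; i < 8; ++i) {
                _out.push_back(static_cast<uint8_t>(v >> (8 * i)));
            }
        }
    }

    std::optional<index_entry> _first;
    std::deque<uint8_t> _out;        // chunked storage, stand-in for bytes_ostream
    std::size_t _count = 0;
};
```

With a single entry nothing is serialized; the first entry is written only when the second one arrives.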
There still remains a problem that large-enough promoted index can cause OOM.
Refs #4217
Tests:
- unit (release)
- scylla-bench write
Branches: 3.0
"
* tag 'fix-large-alloc-for-promoted-index-v3' of github.com:tgrabiec/scylla:
sstables: mc: writer: Avoid large allocations for maintaining promoted index
sstables: mc: writer: Avoid double-serialization of the promoted index
"
The delete_atomically function is required to delete a set of sstables
atomically, i.e., either delete all of them or none of them. Deleting only
some sstables in the set might result in data resurrection if
sstable A, holding a tombstone that covers a mutation in sstable B, is deleted
while sstable B remains.
This patchset introduces a log file holding a list of SSTable TOC files
to delete for recovering a partial delete_atomically operation.
A new subdirectory is created in the sstables dir called `pending_delete`
holding in-flight logs.
The logs are created with a temporary name (using a .tmp suffix)
and renamed to the final .log name once ready. This indicates
the commit point for the operation.
When populating the column family, all files in the pending_delete
sub-directory are examined. Temporary log files are just removed,
and committed log files are read, replayed, and deleted.
Fixes #4082
Tests: unit (dev), database_test (debug)
"
* 'projects/delete_atomically_recovery/v5' of https://github.com/bhalevy/scylla:
tests: database_test: add test_distributed_loader_with_pending_delete
distributed_loader: replay and cleanup pending_delete log files
distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
docs: add sstables-directory-structure.md
sstables: commit sstables to delete_atomically into a pending_delete log file
sstables: delete_atomically: delete sstables in a thread
sstables: component_basename: reuse with sstring component
sstables: introduce component_basename
database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
sstables: add delete_sstable_and_maybe_large_data_entries
sstables: call remove_by_toc_name in dtor if marked_for_deletion
Scan the table's pending_delete sub-directory if it exists.
Remove any temporary pending_delete log files to roll back the respective
delete_atomically operation.
Replay completed pending_delete log files to roll forward the respective
delete_atomically operation, and finally delete the log files.
Cleanup of temporary sstable directories and pending_delete
sstables is done in a preliminary scan phase when populating the column family
so that we won't attempt to load the to-be-deleted sstables.
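The roll-back / roll-forward rule can be sketched with std::filesystem. This is not the distributed_loader code: `recover_pending_deletes` is a hypothetical name, the log is assumed to hold one TOC file name per line relative to the table directory, and "deleting an sstable" is reduced to removing the named file.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Scan <sstdir>/pending_delete: temporary logs are removed (roll back),
// committed logs are replayed and then removed (roll forward).
inline void recover_pending_deletes(const fs::path& sstdir) {
    const fs::path dir = sstdir / "pending_delete";
    if (!fs::exists(dir)) {
        return;
    }
    // Collect the paths first so the directory is not modified while iterating.
    std::vector<fs::path> logs;
    for (const auto& entry : fs::directory_iterator(dir)) {
        logs.push_back(entry.path());
    }
    for (const auto& log : logs) {
        if (log.extension() == ".tmp") {
            fs::remove(log);               // operation never reached its commit point
            continue;
        }
        if (log.extension() != ".log") {
            continue;
        }
        std::ifstream in(log);
        std::string toc;
        while (std::getline(in, toc)) {
            if (!toc.empty()) {
                fs::remove(sstdir / toc);  // stand-in for deleting the whole sstable
            }
        }
        in.close();
        fs::remove(log);                   // replay finished
    }
}
```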
Fixes #4082
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In preparation for replaying pending_delete log files,
we would like to first remove any temporary sst dirs
and later handle pending_delete log files, and only
then populate the column family.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To facilitate recovery of a delete_atomically operation that crashed
midway, add a replayable log file holding the committed sstables to delete.
It will be used by populate_column_family to replay the atomic deletion.
1. Write the toc names of sstables to be deleted into a temporary file.
2. Once flushed and closed, rename the temp log file into the final name
and flush the pending_delete directory.
3. Delete the sstables.
4. Remove the pending_delete log file
and flush the pending_delete directory.
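Steps 1 and 2 follow the usual write-to-temp-then-rename pattern. The sketch below uses std::filesystem rather than Seastar file APIs; the function name, the `<name>.log.tmp` / `<name>.log` naming and the one-name-per-line format are assumptions, and the flushes of the file and of the directory mentioned above are not shown because the standard library has no portable call for them.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Write the TOC names into a temporary log, then rename it to its final name.
// The rename is the commit point of the delete_atomically operation.
inline fs::path commit_pending_delete_log(const fs::path& sstdir,
                                          const std::string& name,
                                          const std::vector<std::string>& toc_names) {
    const fs::path dir = sstdir / "pending_delete";
    fs::create_directories(dir);
    const fs::path tmp = dir / (name + ".log.tmp");
    const fs::path final_path = dir / (name + ".log");
    {
        std::ofstream out(tmp, std::ios::trunc);
        for (const auto& toc : toc_names) {
            out << toc << '\n';
        }
    }   // file is closed here; a real implementation would flush it to disk first
    fs::rename(tmp, final_path);
    return final_path;
}
```

A crash before the rename leaves only a `.tmp` file, which recovery can discard; a crash after it leaves a complete log that can be replayed.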
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
component_basename returns just the basename for the component filename
without the leading sstdir path.
To be used for delete_atomically's pending_delete log file.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.
2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be called by delete_atomically,
rather than passing a vector to delete_sstables.
This way, there is no need to build the `sstables_to_delete_atomically` vector.
To be replaced in the future with a sstable method once we
provide the large_data_handler upon construction.
Handle exceptions from remove_by_toc_name or maybe_delete_large_partitions_entry
by merely logging an error. There is nothing else we can do at this point.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
No need to call delete_sstables which works on a list of sstables
(by toc name).
Also, add FIXME comment about not calling
large_data_handler.maybe_delete_large_partitions_entry
on this path.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
checksummed_file_writer does not override allocate_buffer(), so it inherits
data_sink_impl's default allocate_buffer, which does not care about alignment.
The buffer is then passed to the real file_data_sink_impl, and thence to the file
itself, which cannot complete the write since it is not properly aligned.
This doesn't fail in release mode, since the Seastar allocator will supply a
properly aligned buffer even if not asked to do so. The ASAN allocator usually
does supply an aligned buffer, but not always, which causes the test to fail.
Fix by forwarding the allocate_buffer() function to the underlying data_sink.
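The shape of the fix can be illustrated without Seastar. Every type below (`sink`, `aligned_file_sink`, `checksumming_sink`, `buffer`) is a hypothetical stand-in, and the 4096-byte alignment is an assumed value; the point is only that a wrapping sink must delegate buffer allocation to the sink that knows the alignment requirement.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

using raw_ptr = std::unique_ptr<char, void (*)(void*)>;

struct buffer {
    raw_ptr data;
    std::size_t size;
};

struct sink {
    virtual ~sink() = default;
    // Default allocation: no alignment guarantee beyond what malloc gives.
    virtual buffer allocate_buffer(std::size_t size) {
        return {raw_ptr(static_cast<char*>(std::malloc(size)), std::free), size};
    }
};

// Sink that needs aligned buffers, as direct file I/O typically does.
struct aligned_file_sink : sink {
    static constexpr std::size_t alignment = 4096; // assumed value
    buffer allocate_buffer(std::size_t size) override {
        // aligned_alloc requires the size to be a multiple of the alignment.
        std::size_t rounded = (size + alignment - 1) / alignment * alignment;
        return {raw_ptr(static_cast<char*>(std::aligned_alloc(alignment, rounded)), std::free), rounded};
    }
};

// Wrapper that forwards allocation instead of inheriting the default.
struct checksumming_sink : sink {
    sink& _out;
    explicit checksumming_sink(sink& out) : _out(out) {}
    buffer allocate_buffer(std::size_t size) override {
        return _out.allocate_buffer(size);
    }
};
```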
Fixes #4262.
Branches: branch-3.0
Message-Id: <20190221184115.6695-1-avi@scylladb.com>
Limits are stored as uint32_t everywhere, but in some places
int32_t was used, which created inconsistencies when comparing
the value to std::numeric_limits<Type>::max().
In order to solve inconsistencies, the types are unified to uint32_t,
and instead of explicitly calling numeric limit max,
an already existing constant value query::max_rows is utilized.
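The kind of inconsistency described above can be reproduced in a few lines. The constant below only mirrors the role of query::max_rows and its value is an assumption; the function names are made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical constant playing the role of query::max_rows (value assumed).
constexpr uint32_t max_rows = std::numeric_limits<uint32_t>::max();

// Inconsistent pattern: an unsigned limit compared with the signed maximum.
constexpr bool is_unlimited_mixed(uint32_t limit) {
    return limit == static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
}

// Unified pattern: one type, one named constant.
constexpr bool is_unlimited(uint32_t limit) {
    return limit == max_rows;
}
```

A limit equal to the unsigned maximum is recognised by `is_unlimited` but not by `is_unlimited_mixed`, because the two maxima differ.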
Fixes #4253
Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>
We've seen schema application failing with marshal_exception
here. That's not enough information to figure out what is the
problem. Knowing which table and column is affected would make
diagnosis much easier in certain cases.
This patch wraps errors in query::deserialization_error with more
information.
Example output:
query::deserialization_error (failed on column system_schema.tables#bloom_filter_fp_chance \
(version: c179c1d7-9503-3f66-a5b3-70e72af3392a, id: 0, index: 0, type: org.apache.cassandra.db.marshal.DoubleType):\
seastar::internal::backtraced<marshal_exception> (marshaling error: read_simple - not enough bytes (expected 8, got 3)
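The wrapping itself is a catch-and-rethrow with added context. A minimal sketch, assuming simplified exception types and a hypothetical helper `with_column_context`; the real patch attaches more fields (version, id, index, type) than shown here.

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-ins for the exception types named above.
struct marshal_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct deserialization_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Run f(); if it fails with a marshaling error, rethrow with the table and
// column that were being deserialized.
template <typename Func>
auto with_column_context(const std::string& table, const std::string& column, Func&& f) {
    try {
        return f();
    } catch (const marshal_error& e) {
        throw deserialization_error("failed on column " + table + "#" + column + ": " + e.what());
    }
}
```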
Message-Id: <20190221113219.13018-1-tgrabiec@scylladb.com>
This patch removes the log message about "compaction_manager - Asked to stop"
at the very end of Scylla runs. This log message is confusing because it
only has the "asked to stop" part, without finally a "stopped", and may
lead a user to incorrectly fear that the shutdown hung - when it in fact
finished just fine.
The database object holds a compaction_manager and stop()s it when the
database is stop()ed - and that is the very last thing our shutdown does.
However, much earlier, as the *first* shutdown operation (i.e., the last
at_exit() in main.cc), we already stop() the compaction manager.
The second stop() call does nothing, but unfortunately prints the log
message just before checking if it has anything to stop. So this patch
just moves the log message to after the check.
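In outline, the change is a reordering of two statements. The class below is a hypothetical stand-in that records log lines in a vector; the "Stopped" line is assumed for illustration.

```cpp
#include <string>
#include <vector>

class compaction_manager_sketch {
public:
    explicit compaction_manager_sketch(std::vector<std::string>& log) : _log(log) {}

    void stop() {
        if (_stopped) {
            return;                       // second call: nothing to do, nothing logged
        }
        _log.push_back("Asked to stop");  // logged only when there is work to stop
        _stopped = true;
        _log.push_back("Stopped");
    }

private:
    bool _stopped = false;
    std::vector<std::string>& _log;
};
```

Calling `stop()` twice produces the two messages once, rather than a trailing "Asked to stop" with no matching completion.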
Fixes #4238.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217142657.19963-1-nyh@scylladb.com>
"
Fixes #4256
This miniseries fixes a problem with inserting NULL values through
INSERT JSON interface.
Tests: unit (dev)
"
* 'fix_insert_json_with_null' of https://github.com/psarna/scylla:
tests: add test for INSERT JSON with null values
cql3: add missing value erasing to json parser
Can be useful in diagnosing problems with application of schema
mutations.
do_merge_schema() is called on every change of schema of the local
node.
create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.
Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
"
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
This patch series introduces a single .cc file that has to include
cryptopp headers.
"
* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
Avoid including cryptopp headers
Delete dead code
cryptopp's config.h has the following pragma:
#pragma GCC diagnostic ignored "-Wunused-function"
It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.
The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793
To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.
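A pimpl of this kind looks roughly as follows, shown in one listing with the header and .cc parts marked. The class name `hasher` and its interface are hypothetical, and FNV-1a is used only as a stand-in digest so that the sketch has no third-party dependency.

```cpp
#include <cstdint>
#include <memory>
#include <string_view>

// ---- header part: no third-party includes, so no leaked pragmas ----
class hasher {
public:
    hasher();
    ~hasher();
    void update(std::string_view data);
    uint64_t finalize() const;

private:
    struct impl;                  // defined only in the .cc part
    std::unique_ptr<impl> _impl;
};

// ---- .cc part: the only place that would include the library headers ----
struct hasher::impl {
    uint64_t state = 0xcbf29ce484222325ull;   // FNV-1a offset basis (stand-in)
};

hasher::hasher() : _impl(std::make_unique<impl>()) {}
hasher::~hasher() = default;      // defined here, where impl is a complete type

void hasher::update(std::string_view data) {
    for (unsigned char c : data) {
        _impl->state ^= c;
        _impl->state *= 0x100000001b3ull;     // FNV-1a prime
    }
}

uint64_t hasher::finalize() const {
    return _impl->state;
}
```

The destructor is declared in the header and defined next to `impl`, which is what lets `std::unique_ptr` work with the incomplete type.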
While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
This code would have to be refactored by the next patch. Since it is
commented out, just delete it.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
"
This series addresses the issue of redundant view updates,
generated for columns that were not selected for given materialized view.
Cases covered (quote):
* If a base row has a live row marker, then we can avoid generating
view updates if only unselected columns change;
* If a base row has no live row marker, then we can avoid generating
view updates if unselected columns are updated, unless they are newly
created, deleted, or they have a TTL.
Additionally, this series includes caching selected columns and is_index information
to avoid unnecessary CPU cycles spent on recomputing these two.
Fixes #3819
"
* 'send_less_view_updates_if_not_necessary_4' of https://github.com/psarna/scylla:
tests: add cases for view update generation optimizations
view: minimize generated view updates for unselected columns
view: cache is_index for view pointer
index: make non-pointer overload of is_index function
index: avoid copying when checking for is_index
In some cases generating view updates for columns that were not
selected in CREATE VIEW statement is redundant - it is the case
when the update will not influence row liveness in any way.
Currently, these cases are optimized out:
- row marker is live and only unselected columns were updated;
- row marker is not live and only unselected columns were updated,
and in the process nothing was created or deleted and there was
no TTL involved.
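The two cases reduce to a small predicate. The struct and function below are hypothetical and flatten the real per-column analysis into four booleans; they only restate the rule listed above.

```cpp
// Hypothetical summary of how a base-table update touched a row.
struct base_update_summary {
    bool row_marker_live;
    bool selected_column_changed;
    bool unselected_created_or_deleted;
    bool unselected_has_ttl;
};

// True when no view update needs to be generated for this base update.
inline bool can_skip_view_update(const base_update_summary& u) {
    if (u.selected_column_changed) {
        return false;                 // the view's visible data changed
    }
    if (u.row_marker_live) {
        return true;                  // liveness is already guaranteed by the marker
    }
    // No live marker: unselected columns may decide row liveness.
    return !u.unselected_created_or_deleted && !u.unselected_has_ttl;
}
```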
It's detrimental to keep querying index manager whether a view
is backing a secondary index every time, so this value is cached
at construction time.
At the same time, this value is not simply passed to view_info
when being created in secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed when retried after memory
reclamation.
We should ignore bad_alloc thrown inside allocating section body and
fail only when the whole section fails.
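The retry semantics can be sketched generically. `run_allocating_section`, the `reclaim` callback and the attempt limit are assumptions for illustration and do not reflect the actual allocating section interface.

```cpp
#include <new>

// Run body(); on bad_alloc, reclaim memory and retry. The failure is
// reported only when the last attempt fails, i.e. when the whole section fails.
template <typename Body, typename Reclaim>
auto run_allocating_section(Body&& body, Reclaim&& reclaim, int max_attempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            return body();
        } catch (const std::bad_alloc&) {
            if (attempt >= max_attempts) {
                throw;            // the section as a whole has failed
            }
            reclaim();            // free memory, then retry the body
        }
    }
}
```

A bad_alloc thrown by an early attempt is therefore not treated as an error by itself; only exhaustion of all attempts is.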
Fixes #2924
Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>