24 Commits

Author SHA1 Message Date
Piotr Sarna
986004a959 loader: move uploaded view pending sstables to staging
Tables uploaded via `nodetool refresh` used to be left in the upload/
directory if view updates needed to be generated from them. Since view
update generation is asynchronous, sstables left in that directory could
be erroneously overwritten if the user uploaded another batch of
sstables and some of the names collided.
To remedy this, uploaded sstables that need view updates are moved
to the staging/ directory with a unique generation number, where they
await view update generation.
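
A minimal sketch of the idea, not the loader's actual code: each uploaded
file is renamed into staging/ under a freshly allocated generation number, so
a later upload that reuses the same name cannot overwrite it. The names
`next_generation` and `move_to_staging`, and the "<generation>-<name>" file
layout, are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical generation allocator: a process-wide counter that never
// hands out the same number twice.
std::atomic<std::int64_t> next_generation{1};

// Hypothetical helper: moves one uploaded entry into <table_dir>/staging
// under a new generation number and returns the resulting path.
fs::path move_to_staging(const fs::path& table_dir, const fs::path& uploaded) {
    assert(fs::exists(uploaded));
    const fs::path staging = table_dir / "staging";
    fs::create_directories(staging);
    const std::int64_t gen = next_generation.fetch_add(1);
    // Illustrative naming only; real sstable file names are structured differently.
    const fs::path target =
        staging / (std::to_string(gen) + "-" + uploaded.filename().string());
    fs::rename(uploaded, target);
    return target;
}
```

Calling `move_to_staging` twice for two uploads that share a file name yields
two distinct paths under staging/, which is the property the commit relies on.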

Fixes #4047
2019-03-20 13:44:29 +01:00
Benny Halevy
1021eb29c9 distributed_loader: fix old format counters exception
table::load_sstable: fix missing arg in old format counters exception

Properly catch and log the exception in load_new_sstables.
Abort when the exception is caught to keep current behavior.

Seen with migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test
without enable_dangerous_direct_import_of_cassandra_counters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190301091235.2914-1-bhalevy@scylladb.com>
2019-03-04 17:36:09 +01:00
Benny Halevy
043673b236 distributed_loader: replay and cleanup pending_delete log files
Scan the table's pending_delete sub-directory if it exists.
Remove any temporary pending_delete log files to roll back the respective
delete_atomically operation.
Replay completed pending_delete log files to roll forward the respective
delete_atomically operation, and finally delete the log files.

Cleanup of temporary sstable directories and pending_delete
sstables is done in a preliminary scan phase when populating the column family
so that we won't attempt to load the to-be-deleted sstables.
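
The roll-back / roll-forward rule described above can be sketched as follows.
This is an illustration under stated assumptions, not the code from the
commit: it assumes temporary logs end in ".tmp", completed logs end in
".log", and each line of a completed log names one file to delete relative
to the table directory. `replay_pending_delete` is a hypothetical name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Assumed layout (not taken from the commit): "*.tmp" is an unfinished log,
// "*.log" is a completed log listing one file name per line.
void replay_pending_delete(const fs::path& table_dir) {
    const fs::path log_dir = table_dir / "pending_delete";
    if (!fs::is_directory(log_dir)) {
        return;  // no pending_delete sub-directory, nothing to do
    }
    // Collect entries first so nothing is removed while iterating.
    std::vector<fs::path> logs;
    for (const auto& entry : fs::directory_iterator(log_dir)) {
        logs.push_back(entry.path());
    }
    for (const auto& log : logs) {
        if (log.extension() == ".tmp") {
            fs::remove(log);  // roll back: the delete never committed
        } else if (log.extension() == ".log") {
            {
                std::ifstream in(log);
                std::string name;
                while (std::getline(in, name)) {
                    if (!name.empty()) {
                        // Roll forward; a file that is already gone is fine.
                        fs::remove(table_dir / name);
                    }
                }
            }
            fs::remove(log);  // finally delete the log itself
        }
    }
}
```

Running this before loading sstables means files named in a completed log are
already gone by the time the directory is populated.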

Fixes #4082

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
ee3ad75492 distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
In preparation for replaying pending_delete log files,
we would like to first remove any temporary sst dirs
and later handle pending_delete log files, and only
then populate the column family.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
is there a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in an older Cassandra, we
would misrepresent it.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. Many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is better, faster, or, worse, the only option left.  I
argue that having a flag that allows us to import them when we are sure
it is safe is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00
Benny Halevy
74ef09a3a2 distributed_loader: populate_column_family should scan directories too
To detect and cleanup leftover temporary sstable directories.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bd85975277 sstables: fix is_temp_dir
1. fs::canonical requires that the path exist,
   and there is no need for fs::canonical here.
2. fs::path::extension returns the extension with its leading dot.
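
Both points can be illustrated with a small, purely lexical check. This is a
sketch rather than the patched function: the ".sstable" marker is an
assumption about how temporary sstable directories are named, and the
function below is a stand-in, not the real sstable::is_temp_dir.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Assumed marker for temporary sstable directories; the real constant may differ.
// Note the leading dot: fs::path::extension() includes it.
const std::string temp_dir_extension = ".sstable";

// Purely lexical check: it never touches the filesystem, so the path does
// not have to exist (unlike fs::canonical, which fails on a missing path).
bool is_temp_dir(const fs::path& dir) {
    return dir.extension().string() == temp_dir_extension;
}
```

Comparing against "sstable" without the dot would never match, which is the
kind of mistake the second point describes.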

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
c2a5f3b842 distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir
populate_column_family currently lists only regular files, ignoring all directories.
A later patch in this series allows it to also list directories so as to clean up
the temporary sstable directories, yet valid sub-directories, like staging|upload|snapshots,
may still exist and need to be ignored.

Other kinds of handling, like validating recognized sub-directories and halting on
unrecognized sub-directories are possible, yet out of scope for this patch(set).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
9bd7b2f4e6 distributed_loader: remove temporary sstable directories only on shard 0
Similar to calling remove_sstable_with_temp_toc later on in
populate_column_family(), we need only one thread to do the
cleanup work and the existing convention is that it's shard 0.

Since lister::rmdir is checking remove_file of all entries
(recursively) and the dir itself, doing that concurrently would fail.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bcfb2e509b distributed_loader: push future returned by rmdir into futures vector
2019-01-27 14:14:32 +02:00
Piotr Sarna
5d76a635ca distributed_loader: migrate flush_upload_dir to thread
The code for flushing the upload dir is overcomplicated,
so in order to make it a little simpler, it is moved
to a threaded context.

Refs #4118

Message-Id: <232cca077bae7116cfa87de9c9b4ba60efc2a01d.1548077720.git.sarna@scylladb.com>
2019-01-21 15:48:17 +02:00
Tomasz Grabiec
d7c701d2d1 Merge "Type-erase gratuitous templates with functions" from Avi
Many areas of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.

    text    data     bss      dec      hex filename
85160690   42120  284910 85487720  5187068 scylla.before
84824762   42120  284910 85151792  5135030 scylla.after

* https://github.com/avikivity/scylla detemplate/v2:
  api/commitlog: de-template acquire_cl_metric()
  database: de-template do_parse_schema_tables
  database: merge for_all_partitions and for_all_partitions_slow
  hints: de-template scan_for_hints_dirs()
  schema_tables: partially de-template make_map_mutation()
  distributed_loader: de-template
  tests: commitlog_test: de-template
  tests: cql_auth_query_test: de-template
  test: de-template eventually() and eventually_true()
  tests: flush_queue_test: de-template
  hint_test: de-template
  tests: mutation_fragment_test: de-template
  test: mutation_test: de-template
2019-01-21 11:32:22 +01:00
Avi Kivity
baf9480c8d distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
functions with the help of noncopyable_function<>, reducing code bloat.

One of the lambdas used as an actual argument was adjusted, because the de-templated
callee only accepts functions returning a future, while the original accepted both
functions returning a future and functions returning void (similar to future::then).
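
A rough sketch of the pattern, using only the standard library:
std::function stands in for noncopyable_function, and the tiny `done_future`
type stands in for a Seastar future. `for_each_file`, `file_action` and
`done_future` are hypothetical names, not taken from distributed_loader.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Minimal stand-in for an asynchronous result; this is not Seastar's future.
struct done_future {};

// Before (for comparison): one instantiation per distinct callable type,
// and Func could return either void or a future.
//   template <typename Func>
//   void for_each_file(const std::vector<std::string>& files, Func&& action);

// After: a single ordinary function that is compiled once. The callable
// must return done_future; a lambda returning void no longer converts.
using file_action = std::function<done_future(const std::string&)>;

void for_each_file(const std::vector<std::string>& files, const file_action& action) {
    for (const auto& f : files) {
        action(f);
    }
}
```

A caller whose lambda previously returned nothing has to be adjusted to end
with `return done_future{};`, which mirrors the adjustment the commit
message mentions.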
2019-01-20 15:55:20 +02:00
Avi Kivity
6e6372e8d2 Revert "Merge "Type-eaese gratuitous templates with functions" from Avi"
This reverts commit 31c6a794e9, reversing
changes made to 4537ec7426. It causes bad_function_calls
in some situations:

INFO  2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
INFO  2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables
INFO  2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)
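
The revert message does not say what triggered the exception. As general
background only, std::bad_function_call is what the standard library throws
when a std::function that holds no target is invoked; the hypothetical
helper below demonstrates just that mechanism and makes no claim about the
actual bug.

```cpp
#include <cassert>
#include <functional>

// Returns true if invoking `fn` raises std::bad_function_call, as happens
// when `fn` is empty (default-constructed or assigned nullptr).
bool throws_bad_function_call(const std::function<int()>& fn) {
    try {
        fn();
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```

A template parameter bound to a lambda can never be "empty" in this sense,
so this failure mode only becomes possible once a type-erased wrapper is
introduced.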
2019-01-20 11:32:14 +02:00
Piotr Sarna
3d65eb5d4a distributed_loader: restore indentation
2019-01-18 10:59:37 +01:00
Piotr Sarna
e50e9b5150 distributed_loader: restore always mutating to level 0
When introducing view update generation path for sstables
in /upload directory, mutating these sstables was moved
to regular path only. It was wrong, because sstables that
need view updates generated from them may still need
to be downgraded to LCS level 0, so they won't disrupt
LCS assumptions after being loaded.

Reported-by: Nadav Har'El <nyh@scylladb.com>
2019-01-18 10:35:20 +01:00
Avi Kivity
b6239134c2 distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
functions with the help of noncopyable_function<>, reducing code bloat.
2019-01-17 18:56:22 +02:00
Piotr Sarna
0eb703dc80 all: rename view_update_from_staging_generator
The new name, view_update_generator, is both more concise
and correct, since we now generate from directories
other than "/staging".
2019-01-15 17:31:47 +01:00
Piotr Sarna
a5d24e40e0 distributed_loader: fix indentation
Bad indentation was introduced in the previous commit.
2019-01-15 17:31:37 +01:00
Piotr Sarna
13c8c84045 service: add generating view updates from uploaded sstables
SSTables loaded into the system via the /upload dir may sometimes need
view updates generated from them (if their table has accompanying
views).

Fixes #4047
2019-01-15 17:31:37 +01:00
Piotr Sarna
76616f6803 distributed_loader: use proper directory for opening SSTable
The previous implementation assumed that each SSTable resides directly
in the table::datadir directory, while what should actually be used
is the directory path from the SSTable descriptor.
This patch prevents a regression when adding staging sstables support
for upload/ dir.
2019-01-15 16:47:01 +01:00
Duarte Nunes
b851cb1a9a distributed_loader: Forbid uploading MV sstables
Instead suggest that the views be re-created.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103142933.35354-1-duarte@scylladb.com>
2019-01-03 16:31:20 +02:00
Avi Kivity
c180a18dbb Distribute distributed_loader into its own header and source files
distributed_loader is a sizeable fraction of database.cc, so moving it
out reduces compile time and improves readability.
Message-Id: <20181230200926.15074-1-avi@scylladb.com>
2018-12-31 14:27:27 +02:00