scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	a921479e71	Merge tag '807-v3' from https://github.com/avikivity/scylla From Avi: This patchset introduces a linearization context for managed_bytes objects. Within this context, any scattered managed_bytes (found only in lsa regions, so limited to memtable and cache) are auto-linearized for the lifetime of the context. This ensures that key and value lookups can use fast contiguous iterators instead of using slow discontiguous iterators (or crashing, as is the case now).	2016-02-16 14:29:48 +01:00
Avi Kivity	ce74718950	Merge "Preparation for specifying query result format in IDL" from Tomasz	2016-02-15 19:41:18 +02:00
Raphael S. Carvalho	59bbe98c21	sstables: keep track of compacting sstables in compaction manager itself Avi says: "Something like unordered_set<unsigned long> is error prone, because ints tend to mix up (also, need to use a sized type, unsigned long varies among machines)." With that in mind, it's better if we keep track of compacting sstables in an unordered_set<shared_sstable>. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <249f0fd4cfcf786cf3c37a79978f7743d07f48ad.1455120811.git.raphaelsc@scylladb.com>	2016-02-15 18:35:43 +02:00
Tomasz Grabiec	9d11968ad8	Rename serialization_format to cql_serialization_format	2016-02-15 16:53:56 +01:00
Raphael S. Carvalho	a487ef1ff3	sstables: improve log message when a sstable is sealed Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <e391243212d83347b1b50c728bee24f6a2ecc950.1455230788.git.raphaelsc@scylladb.com>	2016-02-14 12:05:16 +02:00
Raphael S. Carvalho	ed61fe5831	sstables: make compaction stop report user-friendly When scylla stopped an ongoing compaction, the event was reported as an error. This patch introduces a specialized exception for compaction stop so that the event can be handled appropriately. Before: ERROR [shard 0] compaction_manager - compaction failed: read exception: std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.) After: INFO [shard 0] compaction_manager - compaction info: Compaction for keyspace1/standard1 was stopped due to shutdown. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>	2016-02-11 12:16:53 +02:00
Avi Kivity	3c60310e38	key: relax some APIs to accept partition_key_view instead of const partition_key& Using a partition_key_view can save an allocation in some cases. We will make use of it when we linearize a partition_key; during the process we are given a simple byte pointer, and constructing a partition_key from that requires an allocation.	2016-02-09 19:55:13 +02:00
Avi Kivity	f3ca597a01	Merge "Sstable cleanup fixes" from Tomasz " - Added waiting for async cleanup on clean shutdown - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"	2016-02-04 12:36:13 +02:00
Tomasz Grabiec	c7ef3703cc	sstable: Make sstable deletion never leave sstable set in a non-bootable state Refs #860 Refs #802 An sstable file set with any component missing is interpreted as a critical error during boot. Currently sstable removal procedure could leave the files in a non-bootable state if the process crashed after TOC was removed but before all components were removed as well. To solve this problem, start the removal by renaming the TOC file to a so called "temporary TOC". Upon boot such kind of TOC file is interpreted as an sstable which is safe to remove. This kind of TOC was added before to deal with a similar scenario but in the opposite direction - when writing a new sstable.	2016-02-03 17:36:17 +01:00
Tomasz Grabiec	c8a98b487c	sstables: Remove coupling-hiding duplication	2016-02-03 17:36:17 +01:00
Tomasz Grabiec	355874281a	sstables: Do not register exit hooks from static initializer Fixes #868. Registering exit hooks while reactor is already iterating over exit hooks is not allowed and currently leads to undefined behavior observed in #868. While we should make the failure more user friendly, registering exit hooks concurrently with shutdown will not be allowed. We don't expect exit hooks to be registered after exit starts because this would violate the guarantee which says that exit hooks are executed in reverse order of registration. Starting exit sequence in the middle of initialization sequence would result in use after free errors. Btw, I'm not sure if currently there's anything which prevents this. To solve this problem, move the exit hook to initialization sequence. In case of tests, the cleanup has to be called explicitly.	2016-02-03 17:35:50 +01:00
Raphael S. Carvalho	4041f8cffc	compaction: stop all ongoing compaction during shutdown Currently, we wait for ongoing compaction during shutdown, but that may take 'forever' if compacting huge sstables with a slow disk. Compaction of huge sstables will take a considerable amount of time even with fast disks. Therefore, all ongoing compaction should be stopped during shutdown. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3370f17ce4274df417ea60651f33fc5d4de91199.1454441286.git.raphaelsc@scylladb.com>	2016-02-03 10:18:51 +02:00
Raphael S. Carvalho	cf22c827f9	compaction_manager: fix assertion when stopping task Task is stopped by closing gate and forcing it to exit via gate exception. The problem is that task->compacting_cf may be set to the column family being compacted, and compaction_manager::remove would see it and try to stop the same task again, which would lead to problems. The fix is to clean task->compacting_cf when stopping task. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <3473e93c1a107a619322769d65fa020529b5501b.1454441286.git.raphaelsc@scylladb.com>	2016-02-03 10:18:15 +02:00
Raphael S. Carvalho	a46aa47ab1	make sstables::compact_sstables return list of created sstables Now, sstables::compact_sstables() receives as input a list of sstables to be compacted, and outputs a list of sstables generated by compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>	2016-01-31 12:39:20 +02:00
Raphael S. Carvalho	ee84f310d9	move deletion of sstables generated by interrupted compaction This deletion should be handled by sstables::compact_sstables, which is the responsible for creation of new sstables. It also simplifies the code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>	2016-01-31 12:39:20 +02:00
Glauber Costa	7214649b8a	sstables: const where const is due Some SSTable methods are not marked as const. But they should be. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <72cd3ef0157eb38e7fd48d0c989f2342cbc42f3c.1454103008.git.glauber@scylladb.com>	2016-01-31 12:36:36 +02:00
Raphael S. Carvalho	ba4260ea8f	api: print proper compaction type There are several compaction types, and we should print the correct one when listing ongoing compaction. Currently, we only support compaction types: COMPACTION and CLEANUP. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>	2016-01-28 13:47:00 +02:00
Raphael S. Carvalho	45c446d6eb	compaction: pass dht::token by reference Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-27 13:25:41 -02:00
Raphael S. Carvalho	fc541e2f08	compaction: remove code to sort local ranges storage_service::get_local_ranges returns sorted ranges, which are not overlapping nor wrap-around. As a result, there is no need for the consumer to do anything. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-27 13:15:36 -02:00
Glauber Costa	3f94070d4e	use auto&& instead of auto& for priority classes. By Avi's request, who reminds us that auto& is more suited for situations in which we are assigning to the variable in question. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>	2016-01-26 17:00:20 +02:00
Glauber Costa	b63611e148	mark I/O operations with priority classes After this patch, our I/O operations will be tagged into a specific priority class. The available classes are 5, and were defined in the previous patch: 1) memtable flush 2) commitlog writes 3) streaming mutation 4) SSTable compaction 5) CQL query Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Glauber Costa	8e4bf025ae	sstables: wire priority for read path All the SSTable read path can now take an io_priority. The public functions will take a default parameter which is Seastar's default priority. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Glauber Costa	56c11a8109	sstables: wire priority for write path All variants of write_component now take an io_priority. The public interfaces are by default set to Seastar's default priority. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Glauber Costa	03d5a89b90	sstables: mandate a buffer size parameter for data_stream_at The only user for the default size is data_read, sitting at row.cc. That reader wants to read and process a chunk all at once. So there's really no reason to use the default buffer size - except that this code is old. We should do as we do in other single-key / single-range readers and try to read all at once if possible, by looking at the size we received as a parameter. Cleaning up the data_stream_at interface then comes as a nice side effect. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Pekka Enberg	81996bd10b	Merge "Improvements to compaction manager" from Raphael	2016-01-21 20:54:49 +02:00
Raphael S. Carvalho	bb909798bc	compaction_manager: introduce can_submit Purpose is to reuse code and also make it easier to read. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-21 15:42:23 -02:00
Raphael S. Carvalho	653a07d75d	compaction_manager: introduce signal_less_busy_task Purpose is to reuse code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-21 15:31:44 -02:00
Raphael S. Carvalho	2164aa8d5b	move compaction manager from /utils to /sstables Compaction manager was initially created at utils because it was more generic, and wasn't only intended for compaction. It was more like a task handler based on futures, but now it's only intended to manage compaction tasks, and thus should be moved elsewhere. /sstables is where compaction code is located. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-21 15:23:05 -02:00
Pekka Enberg	b5833e8002	Merge "Enable incremental backups option" from Vlad "This series moves the "backup" logic into the sstable::write_components() methods, adds a support for enabling backup for sstables flushed in the compaction flow (in addition to a regular flushing flow which had this support already) and enables the "incremental_backups" configuration option." I fixed up a merge conflict with commit `5e953b5` ("Merge "Add support to stop ongoing compaction" from Raphael").	2016-01-21 18:52:07 +02:00
Pekka Enberg	5e953b5e47	Merge "Add support to stop ongoing compaction" from Raphael "stop compaction is about temporarily interrupting all ongoing compaction of a given type. That will also be needed for 'nodetool stop <compaction_type>'. The test was about starting scylla, stressing it, stopping compaction using the API and checking that scylla was able to recover. Scylla will print a message as follow for each compaction that was stopped: ERROR [shard 0] compaction_manager - compaction failed: read exception: std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.) INFO [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"	2016-01-21 18:34:10 +02:00
Vlad Zolotarov	c2ab54e9c7	sstables flushing: enable incremental backup (if requested) Enable incremental backup when sstables are flushed if incremental backup has been requested. It has been enabled in the regular flushing flow before but wasn't in the compaction flow. This patch enables it in both places and does it using a backup capability of sstable::write_components() method(s). Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-01-21 12:13:20 +02:00
Vlad Zolotarov	cb5c66f264	sstable::write_components(): add a 'backup' parameter When 'backup' parameter is TRUE - create backup hard links for a newly written sstables in <sstable dir>/backups/ subdirectory. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-01-21 12:04:45 +02:00
Raphael S. Carvalho	f001bb0f53	sstables: fix make_checksummed_file_output_stream Arguments buffer_size and true were accidentally inverted. GCC wasn't complaining because implicit conversion of bool to int, and vice-versa, is valid. However, this conversion is not very safe because we could accidentally invert parameters. This should fix the last problem with sstable_test. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>	2016-01-20 16:01:38 +01:00
Paweł Dziepak	33892943d9	sstables: do not drop row marker when reading mutation Since `581271a243` "sstables: ignore data belonging to dropped columns" we silently drop cells if there is no column in the current schema that they belong to or their timestamp is older than the column dropped_at value. Originally this check was applied to row markers as well which caused them to be always dropped since there is no column in the schema representing these markers. This patch makes sure that the check whether column is alive is performed only if the cell is not a row marker. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>	2016-01-20 12:35:41 +01:00
Raphael S. Carvalho	c318f3baa3	sstables: fix sstable::data_stream_at After `63967db8`, offset is ignored when creating a input stream. Found the problem after sstable_test failed recently. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>	2016-01-20 09:35:57 +02:00
Raphael S. Carvalho	3bd240d9e8	compaction: add ability to stop an ongoing compaction That's needed for nodetool stop, which is called to stop all ongoing compaction. The implementation is about informing an ongoing compaction that it was asked to stop, so the compaction itself will trigger an exception. Compaction manager will catch this exception and re-schedule the compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Raphael S. Carvalho	ec4c73d451	compaction: rename compaction_stats to compaction_info compaction_info makes more sense because this structure doesn't only store stats about ongoing compaction. Soon, we will add information to it about whether or not an user asked to stop the respective ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Glauber Costa	63967db8bf	sstables: always use a file_*_stream_options in our readers and writes Instead of using the APIs that explicitly pass things like buffer_size, always use the options instance instead. This will make it easier to pass extra options in the future. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>	2016-01-19 18:26:37 +02:00
Glauber Costa	c3ac5257b5	sstables: don't repeat file_writer creation all the time When this code was originally written, we used to operate on a generic output_stream. We created a file output stream, and then moved it into the generic object. Many patches and reworks later, we now have a file_writer object, but that pattern was never reworked. So in a couple of places we have something like this: f = file_object acquired by open_file_dma auto out = file_writer(std::move(f), 4096); auto w = make_shared<file_writer>(std::move(out)); The last statement is just totally redundant. make_shared can create an object from its parameters without trouble, so we can just pass the parameter list directly to it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>	2016-01-19 18:26:36 +02:00
Raphael S. Carvalho	0c67b1d22b	compaction: filter out mutation that doesn't belong to shard When compacting sstable, mutation that doesn't belong to current shard should be filtered out. Otherwise, mutation would be duplicated in all shards that share the sstable being compacted. sstable_test will now run with -c1 because arbitrary keys are chosen for sstables to be compacted, so test could fail because of mutations being filtered out. fixes #527. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>	2016-01-19 10:16:41 +01:00
Pekka Enberg	7d3a3bd201	Merge "column family cleanup support" from Raphael "This patch is intended to add support to column family cleanup, which will make 'nodetool cleanup' possible. Why is this feature needed? Remove irrelevant data from a node that loses part of its token range to a newly added node."	2016-01-18 10:15:05 +02:00
Paweł Dziepak	cfc0a132a9	sstable: handle multi-cell vs atomic incompatibilities Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-15 13:12:40 +01:00
Paweł Dziepak	581271a243	sstables: ignore data belonging to dropped columns Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-01-15 13:12:40 +01:00
Tomasz Grabiec	ccd609185f	sstables: Add ability to wait for async sstable cleanup tasks This patch adds a function which waits for the background cleanup work which is started from sstable destructors. We wait for those cleanups on reactor exit so that unit tests don't leak. This fixes erratic ASAN complaint about memory leak when running schema_change_test in debug mode: Indirect leak of 64 byte(s) in 1 object(s) allocated from: 0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912) 0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765 0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513 0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690 0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880 0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012 0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619 The leak is about allocations related to close() syscall tasks invoked from sstable destructor, which were not waited for. Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>	2016-01-15 11:32:15 +02:00
Raphael S. Carvalho	d44a5d1e94	compaction: filter out compacting sstables The implementation is about storing generation of compacting sstables in an unordered set per column family, so before strategy is called, compaction manager will filter out compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 01:18:29 -02:00
Raphael S. Carvalho	9c13c1c738	compaction: move compaction execution from strategy to manager Currently, compaction strategy is the responsible for both getting the sstables selected for compaction and running compaction. Moving the code that runs compaction from strategy to manager is a big improvement, which will also make possible for the compaction manager to keep track of which sstables are being compacted at a moment. This change will also be needed for cleanup and concurrent compaction on the same column family. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 00:04:27 -02:00
Raphael S. Carvalho	ed80ed82ef	sstables: prepare compact_sstables to work with cleanup Cleanup is about rewriting a sstable discarding any keys that are irrelevant, i.e. keys that don't belong to current node. Parameter cleanup was added to compact_sstables. If set to true, irrelevant code such as the one that updates compaction history will be skipped. Logic was also added to discard irrelevant keys. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-11 21:43:40 -02:00
Tomasz Grabiec	4e5a52d6fa	db: Make read interface schema version aware The intent is to make data returned by queries always conform to a single schema version, which is requested by the client. For CQL queries, for example, we want to use the same schema which was used to compile the query. The other node expects to receive data conforming to the requested schema. Interface on shard level accepts schema_ptr, across nodes we use table_schema_version UUID. To transfer schema_ptr across shards, we use global_schema_ptr. Because schema is identified with UUID across nodes, requestors must be prepared for being queried for the definition of the schema. They must hold a live schema_ptr around the request. This guarantees that schema_registry will always know about the requested version. This is not an issue because for queries the requestor needs to hold on to the schema anyway to be able to interpret the results. But care must be taken to always use the same schema version for making the request and parsing the results. Schema requesting across nodes is currently stubbed (throws runtime exception).	2016-01-11 10:34:52 +01:00
Tomasz Grabiec	5184381a0b	memtable: Deconstify memtable in readers We want to upgrade entries on read and for that we need mutating permission.	2016-01-11 10:34:51 +01:00
Avi Kivity	0c755d2c94	db: reduce log spam when ignoring an sstable With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000 log messages about sstables being ignored. This is not reasonable. Reduce the log level to debug, and move the message to database.cc, because at its original location, the containing function has nothing to do with the message itself. Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>	2016-01-07 19:23:25 +02:00
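
The argument-order bug described in commit f001bb0f53 (a buffer size and a bool flag swapped without any compiler diagnostic) can be illustrated with a minimal C++ sketch. Everything below is hypothetical: `stream_config`, `make_stream_unsafe`, `make_stream_safe`, and the `checksummed` enum are illustrative names, not Scylla's or Seastar's actual API, and the scoped-enum approach is only one possible way to guard against this kind of swap, not necessarily what the project adopted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the options a stream factory might receive.
struct stream_config {
    std::size_t buffer_size;
    bool checksum;
};

// Unsafe variant: bool and integer types convert to each other implicitly,
// so a call with the two arguments swapped still compiles.
inline stream_config make_stream_unsafe(std::size_t buffer_size, bool checksum) {
    return {buffer_size, checksum};
}

// A scoped enum does not convert to or from integers or bool implicitly,
// so swapping the arguments of make_stream_safe is a compile error.
enum class checksummed { no, yes };

inline stream_config make_stream_safe(std::size_t buffer_size, checksummed c) {
    return {buffer_size, c == checksummed::yes};
}
```

With these definitions, `make_stream_unsafe(true, 4096)` compiles and silently yields a one-byte buffer with the flag set, whereas `make_stream_safe(checksummed::yes, 4096)` is rejected by the compiler.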


535 Commits