scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 02:20:37 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	b4835756fd	tests: Fix compilation error introduced in `920fe4278a`	2015-09-08 12:52:30 +02:00
Tomasz Grabiec	920fe4278a	Cleanup leftovers after compaction_counter to reclaim_counter rename	2015-09-08 10:19:19 +02:00
Tomasz Grabiec	15ae1a92cb	Merge branch 'pdziepak/compaction-remove-items/v4' from seastar-dev.git From Pawel: This series makes compaction remove items that are no longer items: - expired cells are changed into tombstones - items covered by higher level tombstones are removed - expired tombstones are removed if possible Fixes #70. Fixes #71.	2015-09-08 09:23:00 +02:00
Paweł Dziepak	b17f5c442f	tests/sstable: uncomment part of compaction test Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-07 21:21:38 +02:00
Paweł Dziepak	969fe6b878	sstables: make compact_sstables() take ref to column_family Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-07 21:20:32 +02:00
Paweł Dziepak	5fa42d6b5f	tests/sstables: construct schema using schema_builder schema_builder is necessary to set gc_grace_period. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-07 21:20:32 +02:00
Calle Wilund	d614143f5e	Commitlog/database: Fixup series "Commit log flush request on disk overflow" Also at seastar-dev: calle/commitlog_flush_v3 (And, yes, this time I _did_ update the remote!) Refs #262 The original series was committed from a stale version (v2) due to the author's inability to multitask and update git repos. v3: * Removed future<> return value from callbacks, i.e. the flush callback is now only fully synchronous over the actual call	2015-09-07 21:29:19 +03:00
Avi Kivity	dee9060b12	Merge "Commit log flush request on disk overflow" from Calle "Fixes #262 Handles CL disk size exceeding the configured max size by calling flush handlers for each dirty CF id / high replay_position mark (instead of uncontrolled delete as previously). * Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no real change, but synced). * Divide the max disk size by cpus (so sum of all shards == max) * Abstract flush callbacks in CL * Handler in DB that initiates memtable->sstable writes when called. Note that the flush request is done "synchronously" in new_segment() (i.e. when getting a new segment and crossing the threshold). This is however more or less congruent with Origin, which will do a request-sync in the corresponding case. Actual dealing with the request should, at least in production code, be done async, and in DB it is, i.e. we initiate sstable writes. Hopefully they finish soon, and CL segments will be released (before the next segment is allocated). If the flush request does _not_ eventually result in any CF:s becoming clean and segments released, we could potentially be issuing flushes repeatedly, but never more often than on every new segment."	2015-09-07 18:46:48 +03:00
Paweł Dziepak	ac602b13b5	tests: fix signed/unsigned comparison Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-09-07 16:41:00 +02:00
Avi Kivity	e37dfab853	Merge "Stability improvements" from Tomasz "Fixes #259 and other problems found along the way."	2015-09-07 16:45:44 +03:00
Calle Wilund	fdb921afb2	Commitlog: Add flushing of segment CF:s on disk overflow * Do not throw away commitlog segments on disk size overflow. Issue a flush request (i.e. calculate the RP we want to free up to, and issue a request for every dirty CF). "Abstracted" as a registerable callback, i.e. it is the DB's responsibility to actually do something with it.	2015-09-07 13:21:43 +02:00
Tomasz Grabiec	bf6062493e	tests: Introduce tests/perf_row_cache_update	2015-09-07 09:41:36 +02:00
Tomasz Grabiec	10453c71d2	tests: perf: Make iterations between clock readings in time_it() configurable	2015-09-07 09:41:36 +02:00
Asias He	7cc768a864	gossip: Fix wrong cluster name and partitioner name Right now, gossip returns hard-coded cluster and partitioner names. sstring get_cluster_name() { // FIXME: DatabaseDescriptor.getClusterName() return "my_cluster_name"; } sstring get_partitioner_name() { // FIXME: DatabaseDescriptor.getPartitionerName() return "my_partitioner_name"; } Fix it by setting the correct names from the configuration options. With this, cqlsh 127.0.0.$i -e "SELECT * from system.local;" returns the correct cluster_name. Fixes #291	2015-09-07 09:21:18 +03:00
Tomasz Grabiec	49bf844418	tests: Introduce row_cache_alloc_stress Tests stability of row_cache operations under low/fragmented memory.	2015-09-06 21:25:44 +02:00
Tomasz Grabiec	49f094ad5f	tests: Add test for row_cache::update()	2015-09-06 21:25:44 +02:00
Tomasz Grabiec	c82325a76c	lsa: Make region evictor signal forward progress In some cases a region may be in a state where it is not empty and nothing can be evicted from it. For example, when creating the first entry, the reclaimer may get invoked during creation before the entry gets linked. We therefore can't rely on emptiness as a stop condition for reclamation; the eviction function shall signal us if it made forward progress.	2015-09-06 21:25:44 +02:00
Tomasz Grabiec	704cfc13d8	tests: cql_query_test: Init local cache only once It's a singleton, so we can't attempt to init it more than once. Fixes cql_query_test failure: /home/tgrabiec/src/urchin2/seastar/core/future.hh:315: void future_state<>::set(): Assertion `_u.st == state::future' failed. unknown location(0): fatal error in "test_create_table_statement": signal: SIGABRT (application abort requested) seastar/tests/test-utils.cc(31): last checkpoint	2015-09-04 20:01:55 +02:00
Glauber Costa	b1c59ab995	sstable_mutation_test: test condition related to #188 This patch tests that collections within a mutation behave properly. That is what led to #188. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-09-02 06:01:39 +03:00
Tomasz Grabiec	870e9e5729	lsa: Replace compaction_lock with broader reclaim_lock Disabling compaction of a region is currently done in order to keep the references valid. But disabling only compaction is not enough, we also need to disable eviction, as it also invalidates references. Rather than introducing another type of lock, compaction and eviction are controlled together, generalized as "reclaiming" (hence the reclaim_lock).	2015-09-01 17:29:04 +03:00
Tomasz Grabiec	3115a1aaa0	tests: logalloc_test: Disable test_compaction_lock with default allocator It relies on the fact that the process has a fixed amount of memory assigned and std::bad_alloc is thrown in a timely manner when it fills up, which is the case for seastar's allocator, but not with the default allocator. With the latter the OOM killer kills the process.	2015-09-01 15:17:43 +03:00
Tomasz Grabiec	66fcff8ff9	tests: Introduce tests for lsa eviction	2015-08-31 21:57:23 +02:00
Tomasz Grabiec	2d6d15308e	tests: logalloc_test: Add test for compaction_lock	2015-08-31 21:50:17 +02:00
Tomasz Grabiec	29e33dee4a	tests: mutation_test: Restore indentation	2015-08-31 21:50:17 +02:00
Tomasz Grabiec	ff8c81b25f	memtable: Encapsulate unsafe accessors	2015-08-31 21:50:17 +02:00
Calle Wilund	9ba84e458a	Commitlog: Handle partial writes in segment::cycle * Fixes #247 * Re-introduce test_allocation_failure, but allow for the "failure" to not happen. I.e. if run with low memory settings, the test will check that allocation failure is graceful. With lots of memory it will check partial write.	2015-08-31 20:02:05 +03:00
Paweł Dziepak	78eb61b38e	tests: add test for managed_vector Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-08-31 17:29:16 +02:00
Paweł Dziepak	f1167a594a	tests/cql_env: make sure that value views are correct Query options need to have correct _value_views in order for get_value_at() to work. With this patch we switch to the constructor that generates value views from the passed values and sets the remaining options to their default values. Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-08-31 17:25:36 +02:00
Paweł Dziepak	4b9791230a	tests/perf/simple_query: fix write mode Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-08-31 17:25:32 +02:00
Avi Kivity	f2a79aa7f6	Merge Prepare for closing sstables, part 1 Read-ahead will require that we close input_streams. As part of that we have to close sstables, and mutation_readers (which encapsulate input_streams). This is part 1 of a patchset series to do that. (The overarching goal is to enable read-ahead for sstables, see #244) Conflicts: sstables/compaction.cc	2015-08-31 16:15:18 +03:00
Avi Kivity	702de43ce3	Merge "Commit log replay" from Calle "Initial implementation/transposition of commit log replay. * Changes replay position to be shard aware * Commit log segment ID:s now follow basically the same scheme as origin; max(previous ID, wall clock time in ms) + shard info (for us) * SStables now use the DB definition of replay_position. * Stores and propagates (compaction) flush replay positions in sstables * If CL segments are left over from a previous run, they and existing sstables are inspected for high water mark, and then replayed from those marks to amend mutations potentially lost in a crash * Note that CPU count change is "handled" in so much that shard matching is per the _previous_ run's shards, not current. Known limitations: * Mutations deserialized from old CL segments are _not_ fully validated against existing schemas. * System::truncated_at (not currently used) does not handle sharding afaik, so watermark ID:s coming from there are dubious. * Mutations that fail to apply (invalid, broken) are not placed in blob files like origin. Partly because I am lazy, but also partly because our serial format differs, and we currently have no tools to do anything useful with it * No replay filtering (Origin allows a system property to designate a filter file, detailing which keyspace/cf:s to replay). Partly because we have no system properties. There is no unit test for the commit log replayer (yet), because I could not really come up with a good one given the test infrastructure that exists (tricky to kill stuff just "right"). The functionality is verified by manual testing, i.e. running scylla, building up data (cassandra-stress), kill -9 + restart. This of course does not really fully validate whether the resulting DB is 100% valid compared to the one at the time of the kill -9, but at least it verified that replay took place, and mutations were applied. (Note that origin also lacks validity testing)" Fixes #98.	2015-08-31 15:58:12 +03:00
Avi Kivity	7090dffe91	mutation_reader: switch to a class based implementation Using a lambda for implementing a mutation_reader is nifty, but does not allow us to add methods. Switch to a class-based implementation in anticipation of adding a close() method.	2015-08-31 15:53:53 +03:00
Calle Wilund	e068ffb5a5	Commitlog: Make file reader provide replay_position for entries	2015-08-31 14:29:47 +02:00
Calle Wilund	4ac07fa87d	Commitlog test: remove some hardcoded assumptions on segment IDs To enable changing the ID generation scheme.	2015-08-31 14:29:45 +02:00
Calle Wilund	0fcf7e3e91	Commitlog: Make "position" type 32-bit to align replay_position with Origin * Note: removed commitlog_test:test_allocation_failure because with segments limited to 4GB -> mutation limited to 2GB, actually forcing a fail is not guaranteed or even likely.	2015-08-31 14:29:44 +02:00
Avi Kivity	8c69098c89	Merge "Optimize memtable's scanning_reader" from Tomasz "I saw about 4% improvement in perf_sstable write on muninn with this. The decorated_key comparison is gone from the perf profile now. Now most of the work inside the reader is for copying the mutation."	2015-08-31 15:07:27 +03:00
Tomasz Grabiec	110a55886c	lsa: Introduce region::compaction_counter()	2015-08-31 13:58:42 +02:00
Glauber Costa	a9ab31dd9c	index_entry: move its fields to private visibility And provide accessors. This will give us the freedom to change their internal storage. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-29 14:05:36 -05:00
Glauber Costa	13d59c9618	index_entry: do away with the disk_string<> fields Now that we are using the NSM, and not the general parser for the index, there is no reason to keep using disk_string<>s in it. Since it is standing in the way of further optimizations, let's get rid of it and use bytes directly. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-29 14:05:36 -05:00
Avi Kivity	c734ef2b72	Merge seastar upstream * seastar 10e09b0...2e041c2 (7): > Merge "Change app_template::run() to terminate when callback is done" from Tomasz > resource: Fix compilation for hwloc version 1.8.0 > memory: Fix infinite recursion when throwing std::bad_alloc > core/reactor: Throw the right error code when connect() fails > future: improve exception safety > xen: add missing virtual destructors > circular_buffer: do not destroy uninitialized object app_template::run() users updated to call app_template::run_deprecated().	2015-08-28 23:52:49 +03:00
Glauber Costa	bd272fe6aa	perf_sstable: test sequential reads from an sstable. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 09:02:11 -05:00
Glauber Costa	b194509a6d	perf_write: test for full writes It writes 5 columns (configurable) per row. This will exercise other paths aside from the index. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 09:02:11 -05:00
Glauber Costa	dcd312a982	perf_sstable: more than just the index My plan was originally to have two separate sets of tests: one for the index, and one for the data. With most of the code having ended up in the .hh file anyway, this distinction became a bit pointless. Let's put everything here. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 09:02:11 -05:00
Glauber Costa	b3b0aff85e	perf_sstable_index: add test for index_read Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 09:02:11 -05:00
Glauber Costa	93e55969f2	sstables: modify read_indexes so it no longer takes a quantity read_indexes was one of the first functions coded in the sstable read path. At the time, I made the (now so obviously) wrong decision to code it generically enough that we could specify the number of items to be read, instead of an upper bound in the file. The main reason for that was that without the Summary, we have no way to know where to stop reading, and the Summary is a relatively new addition to the C* codebase: while I didn't really check when it got in, the code is full of tests for its presence. That turned out to be totally useless: we always read the indexes with the help of the Summary. While the Summary is a relatively new addition to C*, it is present in all versions we aim to support, meaning that reads without the Summary will never happen in our codebase. Even if, in the future, we happen to ditch the Summary file, we are very likely to do so in favor of some other structure that also allows us to manipulate precise borders in the Index. The code as it is, however, would not be too big of a problem if it weren't causing us performance problems. But it is, and the majority of it is caused by the fact that our underlying read_indexes does not know in advance how many bytes to read, forcing us to do an element-per-element read. It's time for a change. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-27 16:44:25 +03:00
Avi Kivity	5f62f7a288	Revert "Merge "Commit log replay" from Calle" Due to test breakage. This reverts commit `43a4491043`, reversing changes made to `5dcf1ab71a`.	2015-08-27 12:39:08 +03:00
Avi Kivity	0fff367230	Merge "test for compaction metadata's ancestors" from Raphael	2015-08-27 11:07:53 +03:00
Avi Kivity	43a4491043	Merge "Commit log replay" from Calle "Initial implementation/transposition of commit log replay. * Changes replay position to be shard aware * Commit log segment ID:s now follow basically the same scheme as origin; max(previous ID, wall clock time in ms) + shard info (for us) * SStables now use the DB definition of replay_position. * Stores and propagates (compaction) flush replay positions in sstables * If CL segments are left over from a previous run, they and existing sstables are inspected for high water mark, and then replayed from those marks to amend mutations potentially lost in a crash * Note that CPU count change is "handled" in so much that shard matching is per the _previous_ run's shards, not current. Known limitations: * Mutations deserialized from old CL segments are _not_ fully validated against existing schemas. * System::truncated_at (not currently used) does not handle sharding afaik, so watermark ID:s coming from there are dubious. * Mutations that fail to apply (invalid, broken) are not placed in blob files like origin. Partly because I am lazy, but also partly because our serial format differs, and we currently have no tools to do anything useful with it * No replay filtering (Origin allows a system property to designate a filter file, detailing which keyspace/cf:s to replay). Partly because we have no system properties. There is no unit test for the commit log replayer (yet), because I could not really come up with a good one given the test infrastructure that exists (tricky to kill stuff just "right"). The functionality is verified by manual testing, i.e. running scylla, building up data (cassandra-stress), kill -9 + restart. This of course does not really fully validate whether the resulting DB is 100% valid compared to the one at the time of the kill -9, but at least it verified that replay took place, and mutations were applied. (Note that origin also lacks validity testing)"	2015-08-27 10:53:36 +03:00
Glauber Costa	873cf17cf4	sstable tests: allow for the creation of sstables of non-default buffer size. This can now be used in the sstable_index_write performance test. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-25 18:31:50 -05:00
Glauber Costa	f4d8310d88	perf_sstable_index: calculate time spent before the map reduce operation. Not doing that will include the smp communication costs in the total cost of the operation. This will not be very significant when comparing one run against the other when the results clearly differ, but the proposed way yields error figures that are much lower, so results are generally better. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-25 18:31:49 -05:00
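
The "Commit log replay" merge (702de43ce3) describes segment IDs as max(previous ID, wall clock time in ms) plus shard info. The sketch below illustrates one way such an ID could be built. It is a hypothetical illustration, not the ScyllaDB implementation: the 10-bit shard field in the high bits and the names next_segment_id, shard_of and base_of are all assumptions. It also adds 1 to the previous ID so that IDs stay strictly increasing within a millisecond, which is a choice of this sketch rather than something the commit states.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical bit layout (an assumption, not taken from the ScyllaDB source):
// the top kShardBits bits carry the shard, the remaining bits a time-based base ID.
constexpr unsigned kShardBits = 10;
constexpr unsigned kBaseBits = 64 - kShardBits;
constexpr uint64_t kBaseMask = (uint64_t(1) << kBaseBits) - 1;

// Current wall-clock time in milliseconds since the Unix epoch.
uint64_t wall_clock_ms() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count());
}

// Extract the shard and the time-based part from a packed ID.
uint32_t shard_of(uint64_t id) { return static_cast<uint32_t>(id >> kBaseBits); }
uint64_t base_of(uint64_t id) { return id & kBaseMask; }

// Next segment ID: base = max(previous base + 1, now_ms), then tag it with the shard.
// The "+ 1" keeps IDs strictly increasing when two segments are created in the
// same millisecond or when the wall clock steps backwards.
uint64_t next_segment_id(uint64_t previous_id, uint64_t now_ms, uint32_t shard) {
    assert(shard < (uint32_t(1) << kShardBits));
    uint64_t base = std::max(base_of(previous_id) + 1, now_ms & kBaseMask);
    return (uint64_t(shard) << kBaseBits) | (base & kBaseMask);
}
```

A caller would pass wall_clock_ms() as now_ms; taking the time as a parameter keeps the function deterministic and easy to test. Because the shard sits in the high bits, replay can recover which shard wrote a segment with shard_of(), matching the commit's note that shard matching follows the previous run's shards.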
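
The lsa commit c82325a76c explains why reclamation cannot stop on region emptiness: a region can be non-empty while nothing in it is evictable, so the evictor must report forward progress instead. The following is a minimal sketch of that stop condition under stated assumptions. region_sketch, evict_one_chunk and reclaim are hypothetical names, not the logalloc API, and the 64-byte chunk size is arbitrary.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for an LSA region (not the ScyllaDB logalloc API).
struct region_sketch {
    std::size_t reclaimable;  // bytes an evictor could free right now
    std::size_t occupied;     // bytes held, including entries not yet linked (unevictable)
    // Frees some memory; returns true only if it made forward progress.
    std::function<bool(region_sketch&)> evictor;
};

// Example evictor: frees up to 64 bytes per call, reports false when stuck.
bool evict_one_chunk(region_sketch& r) {
    if (r.reclaimable == 0) {
        return false;  // region may still be non-empty, but nothing can be evicted
    }
    std::size_t chunk = r.reclaimable < 64 ? r.reclaimable : 64;
    r.reclaimable -= chunk;
    r.occupied -= chunk;
    return true;
}

// Reclaim until `target` bytes are freed or the evictor stops making progress.
// Looping on `occupied > 0` instead would spin forever on a region whose
// remaining memory is not evictable.
std::size_t reclaim(region_sketch& r, std::size_t target) {
    std::size_t freed = 0;
    while (freed < target) {
        std::size_t before = r.occupied;
        if (!r.evictor(r)) {
            break;  // no forward progress: give up instead of spinning
        }
        freed += before - r.occupied;
    }
    return freed;
}
```

For a region holding 150 bytes of which only 100 are evictable, reclaim(r, 1000) frees those 100 bytes and then returns, even though 50 bytes remain occupied.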


803 Commits