scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 12:36:56 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	4e093bc3a4	sstables: mc: writer: Avoid double-serialization of the promoted index	2019-02-18 16:03:07 +01:00
Avi Kivity	a3de5581ce	Update seastar submodule * seastar 428f4ac...11546d4 (9): > reactor: Fix an infinite loop caused the by high resolution timer not being monitored > build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE` > build: Unify dependency versions > future-util: optimize parallel_for_each() with single element > core/sharded.hh: fix doxygen for "Multicore" group > build: switch from travis-ci to circleci > perftune.py: fix irqbalance tuning on Ubuntu 18 > build: Make the use of sanitizers transitive > net: ipv6: fix ipv6 detection and tests by binding to loopback	2019-02-12 18:42:07 +02:00
Avi Kivity	c7aa73af51	Merge "Automatically pause shard readers when not used" from Botond " Recently, there has been a series of incidents of the multishard combining reader deadlocking, when the concurrency of reads were severely restricted and there was no timeout for the read. Several fixes have been merged (`414b14a6b`, `21b4b2b9a`, `ee193f1ab`, `170fa382f`) but eliminating all occurrences of deadlocks proved to be a whack-a-mole game. After the last bug report I have decided that instead of trying to plug new wholes as we find them, I'll try to make wholes impossible to appear in the first place. To translate this into the multishard reader, instead of sprinkling new `reader.pause()` calls all over the place in the multishard reader to solve the newly found deadlocks, make the pausing of readers fully automatic on the shard reader level. Readers are now always kept in a paused state, except when actually used. This eliminates the entire class of deadlock bugs. This patch-set also aims at simplifying the multishard reader code, as well as the code of the existing `lifecycle_policy` implementations. This effort resulted in: * mutation_reader.cc: no change in SLOC, although it now also contains logic that used to be duplicated in every `lifecycle_policy` implementation; * multishard_mutation_query.cc: 150 SLOC removed; * database.cc: 30 SLOC removed; Also the code is now (hopefully) simpler, safer and has a clearer structure. Fixes #4050 (main issue) Fixes #3970 Fixes #3998 (deprecates really) " * 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla: query_mutations_on_all_shards(): make states light-weight query_mutations_on_all_shards(): get rid of read_context::paused_reader query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state query_mutations_on_all_shards(): pause looked-up readers query_mutation_on_all_shards(): remove unecessary indirection shard_reader: auto pause readers after being used reader_concurrency_semaphore::inactive_read_handle: fix handle semantics shard_reader: make reader creation sync shard_reader: use semaphore directly to pause-resume shard_reader: recreate_reader(): fix empty range case foreign_reader: rip out the now unused private API shard_reader: move away from foreign_reader multishard_combining_reader: make shard_reader a shared pointer multishard_combining_reader: move the shard reader definition out multishard_combining_reader: disentangle shard_reader	2019-02-12 16:22:52 +02:00
Botond Dénes	db106a32c8	query_mutations_on_all_shards(): make states light-weight Previously the different states a reader can be in were all separate structs, and were joined together by a variant. When this was designed this made sense as states were numerous and quite different. By this point however the number of states has been reduced to 4, with 3 of them being almost the same. Thus it makes sense to merge these states into single struct and keep track of the current state with an enum field. This can theoretically increase the chances of mistakes, but in practice I expect the opposite, due to the simpler (and less) code. Also, all the important checks that verify that a reader is in the state expected by the code are all left in place. A byproduct of this change is that the amount of cross-shard writes is greatly reduced. Whereas previously the whole state object had to be rewritten on state change, now a single enum value has to be updated. Cross shard reads are reduced as well to the read of a few foreign pointers, all state-related data is now kept on the shard where the associated reader lives.	2019-02-12 16:20:51 +02:00
Botond Dénes	65b2eb0939	query_mutations_on_all_shards(): get rid of read_context::paused_reader	2019-02-12 16:20:51 +02:00
Botond Dénes	ec44a4dbb1	query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state These two states are now the same, with the artificial distinction that all readers are promoted to readey_to_save state after the compaction state and the combined buffer is dismantled. From a practical perspective this distinction is meaningless so merge the two states into a single `saving` state.	2019-02-12 16:20:51 +02:00
Botond Dénes	9a1bd24d82	query_mutations_on_all_shards(): pause looked-up readers On the beginning of each page, all saved readers from the previous pages (if any) are looked up, so they can be reused. Some of these saved readers can end up not being used at all for the current page, in which case they will needlessly sit on their permit for the duration of filling the page. Avoid this by immediately pausing all looked-up readers. This also allows a nice unifying of the reader saving logic, as now all readers will be in a paused state when `save_reader()` is called. Previously, looked-up, but not used readers were an exception to this, requiring extra logic to handle both cases. This logic can now be removed.	2019-02-12 16:20:51 +02:00
Botond Dénes	61b9ed7faf	query_mutation_on_all_shards(): remove unecessary indirection	2019-02-12 16:20:51 +02:00
Botond Dénes	9000626647	shard_reader: auto pause readers after being used Previously it was the responsibility of the layer above (multishard combining reader) to pause readers, which happened via an explicit `pause()` call. This proved to be a very bad design as we kept finding spots where the multishard reader should have paused the reader to avoid potential deadlocks (due to starved reader concurrency semaphores), but didn't. This commit moves the responsibility of pausing the reader into the shard reader. The reader is now kept in a paused state, except when it is actually used (a `fill_buffer()` or `fast_forward_to()` call is executing). This is fully transparent to the layer above. As a side note, the shard reader now also hides when the reader is created. This also used to be the responsibility of the multishard reader, and although it caused no problems so far, it can be considered a leak of internal details. The shard reader now automatically creates the remote reader on the first time it is attempted to be used. The code has been reorganized, such that there is now a clear separation of responsibilities. The multishard combining reader handles the combining of the output of the shard readers, as well as issuing read-aheads. The shard reader handles read-ahead and creating the remote reader when needed, as well as transferring the results of remote reads to the "home" shard. The remote reader (`shard_reader::remote_reader`, new in this patch) handles pausing-resuming as well as recreating the reader after it was evicted. Layers don't access each other's internals (like they used to). After this commit, the reader passed to `destroy_reader()` will always be in paused state.	2019-02-12 16:20:51 +02:00
Botond Dénes	ab5d717052	reader_concurrency_semaphore::inactive_read_handle: fix handle semantics That is: * make it move only; * make moved-from handles null handles; * add (public) default constructor, which constructs a null handle;	2019-02-12 16:20:51 +02:00
Botond Dénes	37006135dc	shard_reader: make reader creation sync Reader creation happens through the `reader_lifecycle_policy` interface, which offers a `create_reader()` method. This method accepts a shard parameter (among others) and returns a future. Its implementation is expected to go to the specified shard and then return with the created reader. The method is expected to be called from the shard where the shard reader (and consequently the multishard reader) lives. This API, while reasonable enough, has a serious flaw. It doesn't make batching possible. For example, if the shard reader issues a call to the remote shard to fill the remote reader's buffer, but finds that it was evicted while paused, it has to come back to the local shard just to issue the recreate call. This makes the code both convoluted and slow. Change the reader creation API to be synchronous, that is, callable from the shard where the reader has to be created, allowing for simple call sites and batching. This change requires that implementations of the lifecycle policy update any per-reader data-structure they have from the remote shard. This is not a problem however, as these data-structures are usually partitioned, such that they can be accessed safely from a remote shard. Another, very pleasant, consequence of this change is that now all methods of the lifecycle interface are sync and thus calls to them cannot overlap anymore. This patch also removes the `test_multishard_combining_reader_destroyed_with_pending_create_reader` unit test, which is not useful anymore. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	57d1f6589c	shard_reader: use semaphore directly to pause-resume The shard reader relies on the `reader_lifecycle_policy` for pausing and resuming the remote reader. The lifecycle policy's API was designed to be as general as possible, allowing for any implementation of pause/resume. However, in practice, we have a single implementation of pause/resume: registering/unregistering the reader with the relevant `reader_concurrency_semaphore`, and we don't expect any new implementations to appear in the future. Thus, the generic API of the lifecycle policy, is needlessly abstract making its implementations needlessly complex. We can instead make this very concrete and have the lifecycle policy just return the relevant semaphore, removing the need for every implementor of the lifecycle policy interface to have a duplicate implementation of the very same logic. For now just emulate the old interface inside shard reader. We will overhaul the shard reader after some further changes to minimize noise.	2019-02-12 16:20:51 +02:00
Botond Dénes	fae5a2a8c8	shard_reader: recreate_reader(): fix empty range case If the shard reader is created for a singular range (has a single partition), and then it is evicted after reaching EOS, when recreated we would have to create a reader that reads an empty range, since the only partition the range has was already read. Since it is not possible to create a reader with an empty range, we just didn't recreate the reader in this case. This is incorrect however, as the code might still attempt to read from this reader, if only due to a bug, and would trigger a crash. The correct fix is to create an empty reader that will immediately be at EOS.	2019-02-12 16:20:51 +02:00
Botond Dénes	cd807586f6	foreign_reader: rip out the now unused private API Drop all the glue code, needed in the past so the shard reader can be implemented on top of foreign reader. As the shard reader moved away from foreign reader, this glue code is not needed anymore.	2019-02-12 16:20:51 +02:00
Botond Dénes	d80bc3c0a5	shard_reader: move away from foreign_reader In the past, shard reader wrapped a foreign reader instance, adding functionality required by the multishard reader on top. This has worked well to a certain degree, but after the addition of pause-resume of shard reader, the cooperation with foreign reader became more-and-more a struggle. It has now gotten to a point, where it feels like shard reader is fighting foreign reader as much as it reuses it. This manifested itself in the ever growing amount of glue code, and hacks baked into foreign reader (which is supposed to be of general use), specific to the usage in the multishard reader. It is time we don't force this code-reuse anymore and instead implement all the required functionality in shard reader directly.	2019-02-12 16:20:51 +02:00
Botond Dénes	da0c01c68b	multishard_combining_reader: make shard_reader a shared pointer Some members of shard reader have to be accessed even after it is destroyed. This is required by background work that might still be pending when the reader is destroyed. This was solved by creating a special `state` struct, which contained all the members of the shard readers that had to be accessed even after it was destroyed. This state struct was managed through a shared pointer, that each continuation that was expected to outlive the reader, held a copy of. This however created a minefield, where each line of the code had to be carefully audited to access only fields that will be guaranteed to remain valid. Fix this mess by making the whole class a shared pointer, with `enable_shared_from_this`. Now each continuation just has to make sure to keep `this` alive and code can now access all members freely (well, almost).	2019-02-12 16:20:51 +02:00
Botond Dénes	f1c3421eb4	multishard_combining_reader: move the shard reader definition out Shard reader started its life as a very thin layer above foreign reader, with just some convenience methods added. As usual, by now it has grown into a hairy monster, its class definition out-growing even that of the multishard reader itself. It is time shard reader is moved into the top-level scope, improving the readability of both classes.	2019-02-12 16:20:51 +02:00
Botond Dénes	7114b59309	multishard_combining_reader: disentangle shard_reader Currently shard reader has a reference to the owning multishard reader and it freely accesses its members. This resulted in a mess, where it's not clear what exactly shard reader depends on. Disentangle this mess, by making the shard reader self-sufficient, passing all it depends on into its constructor.	2019-02-12 16:20:51 +02:00
Nadav Har'El	85e5791710	tests/view_schema_test: fix flakiness caused by missing eventually() All tests that involve writing to a base table and then reading from the view table must use the eventually() function to account for the fact that the view update is asynchronous, and may be visible only some time after writing the base table. Forgetting an eventually() can cause the test to become flaky and sometimes fail because the expected data is not yet in the view. Botond noticed these failures in practice in two subtests (test_partition_key_filtering_with_slice and test_clustering_key_in_restrictions). This patch fixes both tests, and I also reviewed the entire source file view_schem_test.cc and found additional places missing an eventually() (and also places that unnecessarily used eventually() to read from the base table), and fixed those as well. Fixes #4212 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212121140.14679-1-nyh@scylladb.com>	2019-02-12 16:10:30 +02:00
Paweł Dziepak	eb03cf00f5	sstable: write_components: drop default for encoding stats There is no value if having a default value for encoding_stats parameter of write_components(). If anything it weakens the tests by encouraging not using the real encoding stats which is not what the actual sstable write path in Scylla does. This patch removes the default value and makes most of the tests provide real encoding statistics. The ones that do not are those that have no easy way of obtaining those (and those stats are not that important for the test itself) or there is a reason for not using those (sstable_3_x_test::test_sstable_write_large_row uses row size thresholds based on size with default-constructed encoding_stats). Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>	2019-02-12 16:08:24 +02:00
Calle Wilund	4a52ed7884	commitlog: Accept recycled (not yet re-used) segments in replay Refs #4085 Changes commitlog descriptor to both accept "Recycled-Commitlog..." file names, and preserve said name in the descriptor. This ensures we pick up the not-yet-used recycled segments left from a crash for replay. The replay in turn will simply ignore the recycled files, and post actual replay they will be deleted as needed. Message-Id: <20190129123311.16050-1-calle@scylladb.com>	2019-02-12 12:23:55 +02:00
Nadav Har'El	93baa334ea	create-relocatable-package.py: speed up slow compression create-relocatable-package.py currently (refs #4194) builds a compressed tar file, but does so using a painfully slow Python implementation of gzip, which is a problem considering the huge size (around 2 gigabytes) of Scylla's executable. On my machine, running it for a release build of Scylla takes a whopping 6 minutes. Just replacing the Python compression with a pipe to an external "gzip" process speeds up the run to just 2 minutes. But gzip is still not optimal, using only one thread even when on a many-core machine. If we switch to "pigz", a parallel implementation of "gzip", all cores are used and on my machine the compression speeds up to just 23 seconds - that's 15 times faster than before this patch. So this patch has create-relocatable-package.py use an external pigz process. "pigz" is now required on the build system (if you want to create packages), so is added to install-dependencies.sh. [avi: update toolchain] Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190212090333.3970-1-nyh@scylladb.com>	2019-02-12 11:19:04 +02:00
Nadav Har'El	1cf1af1502	scylla_setup: fix non-interactive behavior In commit `ec66dd6562`, in non-interactive runs of scylla_setup all options were unintentionally set to "false", regardless of the options passed on the scylla_setup command line. This can lead to all sorts of wrong behaviors, and in particular one test setup assumed it was enabling the Scylla service (which was previously the default) but after this commit, it no longer did. This patch restores the previous behavior: Non-interactive invocations of scylla_setup adhere to the defaults and the command-line options, rather than blindly choosing "false". Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190211214105.32613-1-nyh@scylladb.com>	2019-02-12 10:50:00 +02:00
Avi Kivity	da9628c6dc	auth: password_authenticator: protect against NULL salted_hash In case salted_hash was NULL, we'd access uninitialized memory when dereferencing the optional in get_as<>(). Protect against that by using get_opt() and failing authentication if we see a NULL. Fixes #4168. Tests: unit (release) Branches: 3.0, 2.3 Message-Id: <20190211173820.8053-1-avi@scylladb.com>	2019-02-11 18:54:03 +01:00
Avi Kivity	cb51fcab9d	README: improbe dbuild instructions Add a quick start, document more options, and link from the main README. Message-Id: <20190210154606.21739-1-avi@scylladb.com>	2019-02-11 09:25:25 +01:00
Avi Kivity	2724a66a12	docker: don't send .git during "docker build" It's huge and useless during "docker build" operations. Message-Id: <20190208161848.21125-1-avi@scylladb.com>	2019-02-11 09:17:14 +01:00
Glauber Costa	e0bfd1c40a	allow Cassandra SSTables with counters to be imported if they are new enough Right now Cassandra SSTables with counters cannot be imported into Scylla. The reason for that is that Cassandra changed their counter representation in their 2.1 version and kept transparently supporting both representations. We do not support their old representation, nor there is a sane way to figure out by looking at the data which one is in use. For safety, we had made the decision long ago to not import any tables with counters: if a counter was generated in older Cassandra, we would misrepresent them. In this patch, I propose we offer a non-default way to import SSTables with counters: we can gate it with a flag, and trust that the user knows what they are doing when flipping it (at their own peril). Cassandra 2.1 is by now pretty old. many users can safely say they've never used anything older. While there are tools like sstableloader that can be used to import those counters, there are often situations in which directly importing SSTables is either better, faster, or worse: the only option left. I argue that having a flag that allow us to import them when we are sure it is safe is better than having no option at all. With this patch I was able to successfully import Cassandra tables with counters that were generated in Cassandra 2.1, reshard and compact their SSTables, and read the data back to get the same values in Scylla as in Cassandra. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190210154028.12472-1-glauber@scylladb.com>	2019-02-10 17:50:48 +02:00
Glauber Costa	61ea54eff6	tools: toolchain: dbuild: use host networking This is convenient to test scylla directly by invoking build/dev/scylla. This needs to be done under docker because the shared objects scylla looks for may not exist in the host system. During quick development we may not want to go through the trouble of packaging relocatable scylla every time to test changes. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190209021033.8400-1-glauber@scylladb.com>	2019-02-10 12:16:47 +02:00
Duarte Nunes	d2d885fb93	Merge 'Fix misdetection of remote counter shards' from Paweł " The code reading counter cells form sstables verifies that there are no unsupported local or remote shards. The latter are detected by checking if all shards are present in the counter cell header (only remote shards do not have entries there). However, the logic responsible for doing that was incorrectly computing the total number of counter shards in a cell if the header was larger than a single counter shard. This resulted in incorrect complaints that remote shards are present. Fixes #4206 Tests: unit(release) " * tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla: tests/sstables: test counter cell header with large number of shards sstables/counters: fix remote counter shard detection	2019-02-10 12:16:31 +02:00
Paweł Dziepak	4eeb8eeed5	tests/sstables: test counter cell header with large number of shards The logic responsible for reading counters from sstables was getting confused by large headers. The size of the header depends directly on the number of shards. This tests checks that we can handle cells with large number of counter shards properly.	2019-02-08 17:06:31 +00:00
Paweł Dziepak	df1ac03154	sstables/counters: fix remote counter shard detection Each counter cell has a header with an entry for each local and global shards. The detection of remote shards is done by checking if there are any counter shards that do not have an entry in the header. This is done by computing the number of counter shards in a cell and comparing it to the number of header entries. However, the computation was wrong and included the size taken by the header itself. As a result, if the header was as big or larger than a single counter shard Scylla incorrectly complained about remote shards.	2019-02-08 17:04:22 +00:00
Glauber Costa	8ba6b569b1	relocatable python: make sure all shared objects are relocated The interpreter as it is right now has a bug: I incorrectly assumed that all the shared libraries that python dynamically links would be in lib-dynload. That is not true, and at least some of them are in site-packages. With that, we were loading system libraries for some shared objects. The approach taken to fix this is to just check if we're seeing a shared library and relocate everything we see: we will end up relocating the ones in lib64 too, but that not only should be okay, it is probably even more fool-proof. While doing that I noticed that I had forgotten to incorporate one of previous feedback from Avi (that we're leaving temporary files behind). So I'm fixing that as well. [avi: update toolchain] Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190208115501.7234-1-glauber@scylladb.com>	2019-02-08 18:42:24 +02:00
Glauber Costa	fb742473e2	replace /usr/local as a source of packages in the python relocatable interpreter I was playing with the python3 interpreter trying to get pip to work, just to see how far we can go. We don't really need pip, but I figured it would be a good stress test to make sure that the process is working and robust. And it didn't really work, because although pip will correctly install things into $relocatable_root/local/lib, sys.path will still refer to a hardcoded /usr/local. While this should not affect Scylla, since we expect to have all our modules in out path anyway -- and that path is searched before /usr/local, it is still dangerous to make an absolute reference like this. Unfortunately, /usr/local/ it is included unconditionally by site.py, which is executed when the interpreter is started and there is no environment variable I found to change that (the help string refers to PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever) There is a way to tell site.py not to bother to add user sites, by passing the -s flag, which this patch does. Aside from doing that, we also enhance PYTHONPATH to include a reference to ./local/{lib,lib64}/python<version>/site-packages. After applying this patch, I was able to build an interpreter containing only python3-pip and python3-setuptools, and build the relocatable environment from there. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20190206052104.25927-1-glauber@scylladb.com>	2019-02-08 18:41:52 +02:00
Paweł Dziepak	64b1a2caf9	tests: modernise tmpdir tmpdir is a helper class representing a temporary directory. Unfortunately, it suffers for some problems such as lack of proper encapsulation and weak typing. This has caused bugs in the past when the user code accidentally modified the member variable with the path to the directory. This patch modernises tmpdir and updates its users. The path is stored in a std::filesystem::path and available read-only to the class users. mkdtemp and boost are replaced by standard solution. The users are update to use path more (when it didn't involve too many changes to their code) and stop using lw_shared_ptr to store the tmpdir when it wasn't necessary. tmpdir intentionally doesn't provide any helpers for getting the path as a string in order to discourage weak types. Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>	2019-02-07 20:18:14 +02:00
Avi Kivity	e2e25720c1	Update seastar submodule * seastar c3be06d...428f4ac (13): > build: make the "dist" test respect the build type > Merge 'Add support for docker --cpuset-cpus' from Juliana > Merge "Add support for Coroutines TS" from Paweł > Merge "Modernize dependency management" from Avi > future: propagate broken_promise exception to abandoned continuations > net/inet_address: avoid clang Wmissing-braces > build: Default to the "Release" type if unspecified > rpc: log an exception that may happen while processing an RPC message > Add a --split-dwarf option to configure.py > build: Fix the `StdFilesystem` module > Compress debug info by default > Add an option for building with split dwarf > Dockerfile: install stow	2019-02-07 20:08:15 +02:00
Paweł Dziepak	de2a447576	utils/extremum_tracking: drop default constructor Default constructed extremum_tracker has uninitialised _default_value which basically makes it never correct to do that. Since this class is a mechanism and not a value it doesn't really need to be a regular type, so let's drop the default constructor. Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>	2019-02-07 18:31:25 +02:00
Tomasz Grabiec	7184289015	Merge "Various fixes and improvements for sstables statistics" from Paweł This series contains several fixes and improvements as well as new tests for sstable code dealing with statistics. * https://github.com/pdziepak/scylla.git sstable-stats-fixes/v1-rebased: sstables: compaction: don't access moved-from vector of sstables memtable: move encoding_stats_collector implementation out of header sstables: seal_statistics(): pass encoding_stats by constant reference sstables/mc/writer: don't assume all schema columns are present tests/sstable3: improvements to file compare tests: extract mutation data model tests/data_model: add support for expiring atomic cells tests/data_model: allow specifying timestamp for row markers tests/memtable: test column tracking for encoding stats sstables: use correct source of statistics in get_encoding_stats_for_compaction() utils/extremum_tracking: preserve "not-set" status on merge sstables/metadata_collector: move the default values to the global tracker tests/sstables: test for reading serialisation header tests/sstables: pass encoding stats to write_components() tests/sstable: test merging encoding_stats Fixes #4202.	2019-02-07 12:35:29 +01:00
Paweł Dziepak	67252de195	tests/sstable: test merging encoding_stats	2019-02-07 10:17:06 +00:00
Paweł Dziepak	e25603fbf7	tests/sstables: pass encoding stats to write_components() By default write_components() uses a safe default for encoding_stats which indicates that all columns are present. This may hide so bugs, so let's pass the real thing in the tests that this may matter.	2019-02-07 10:17:06 +00:00
Paweł Dziepak	d44d5ebf86	tests/sstables: test for reading serialisation header	2019-02-07 10:17:06 +00:00
Paweł Dziepak	ebf667fb9c	sstables/metadata_collector: move the default values to the global tracker column_stats is a per-partition tracker, while metadata_collector is the global one. The statistics gathered by column_stats are merged into the metadata_collector. In order to ensure that we get proper default values in case no value of particular kind (e.g. no TTLs) was seen they need to be set on the global tracker, not the per-partition one.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	2680022df0	utils/extremum_tracking: preserve "not-set" status on merge extremum_tracker allows choosing a default value that's going to be used only if no "real" values were provided. Since it is never compared with the actual input values it can be anything. For instance, if the minimum tracker default value is 0 and there was one update with the value 1 the detected minimum is going to be 1 (the default is ignored). However, this doesn't work when the trackers are merged since that process always leaves the destination tracker in the "set" state regardless whether any of the merged trakcers has ever seen any value. This is fixed by this patch, by properly preserving _is_set state on merge.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	84d8ee35d4	sstables: use correct source of statistics in get_encoding_stats_for_compaction() sstable class is responsible for much more things that it should. In particular, it takes care of both writing and reading sstables. The problem that it causes is that it is very easy to confuse those two. This is what has happened in get_encoding_stats_for_compaction(). Originally, it was using _c_stats as a source of the statistics, which is used only during the write and per-partition. Needless to say, the returned encoding_stats were bogus. The correct source of those statistics is get_stats_metadata().	2019-02-07 10:16:50 +00:00
Paweł Dziepak	e315448d0a	tests/memtable: test column tracking for encoding stats	2019-02-07 10:16:50 +00:00
Paweł Dziepak	591d5195a9	tests/data_model: allow specifying timestamp for row markers	2019-02-07 10:16:50 +00:00
Paweł Dziepak	b07cba6a89	tests/data_model: add support for expiring atomic cells	2019-02-07 10:16:50 +00:00
Paweł Dziepak	aab0b7360f	tests: extract mutation data model	2019-02-07 10:16:50 +00:00
Paweł Dziepak	fa216be260	tests/sstable3: improvements to file compare This patch introduces some improvement to file comparison: - exception flags are set so that any error triggers an exceptions and guarantees that they are not silently ignored - std::ios_base::binary flag is passed to open() - istreambuf_iterator is used instead of istream_iterator. It is better suited for comparing binary data.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	bc61471132	sstables/mc/writer: don't assume all schema columns are present The writer constructor prepares lists of present static and regular columns, those should be used for any further checks.	2019-02-07 10:16:50 +00:00
Paweł Dziepak	0132bcc035	sstables: seal_statistics(): pass encoding_stats by constant reference	2019-02-07 10:16:50 +00:00

1 2 3 4 5 ...

17952 Commits